Class SimpleXMLFileIndexingWriter

All Implemented Interfaces:
DocWriter

public class SimpleXMLFileIndexingWriter extends XMLFileIndexingWriter
This is the default writer for generic XML formats. Creates a Lucene Document from any valid XML file by stripping the XML tags to extract and index the content. The full content of all Elements and Attributes is indexed in the default and admindefault fields and is stemmed and indexed in the stems field. The reader for this type of Document is XMLDocReader.
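As a rough illustration of what "stripping the XML tags" can mean, the following minimal sketch collects attribute values and element text from an XML string using only the JDK's DOM parser. The class and method names (XmlContentSketch, extractContent) are hypothetical and are not part of this library; the real writer's extraction, field handling, and stemming may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the documented API.
public class XmlContentSketch {

    /** Returns attribute values and element text in document order, tags removed. */
    public static String extractContent(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String> parts = new ArrayList<>();
            collect(doc.getDocumentElement(), parts);
            return String.join(" ", parts);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Visits an element: its attribute values first, then its text and child elements.
    private static void collect(Node element, List<String> parts) {
        NamedNodeMap attrs = element.getAttributes();
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                addIfNotBlank(attrs.item(i).getNodeValue(), parts);
            }
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                addIfNotBlank(child.getNodeValue(), parts);
            } else if (type == Node.ELEMENT_NODE) {
                collect(child, parts);
            }
        }
    }

    private static void addIfNotBlank(String value, List<String> parts) {
        String trimmed = value == null ? "" : value.trim();
        if (!trimmed.isEmpty()) {
            parts.add(trimmed);
        }
    }
}
```

The resulting string is the kind of text that could then be placed in a full-content search field; analysis and stemming are left out of this sketch.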
Author:
John Weatherley
  • Constructor Details

    • SimpleXMLFileIndexingWriter

      public SimpleXMLFileIndexingWriter()
      Constructor for the SimpleXMLFileIndexingWriter object
  • Method Details

    • getDocType

      public String getDocType() throws Exception
      Gets the XML format for this document, for example "oai_dc", "adn", "dlese_ims", or "dlese_anno".
      Specified by:
      getDocType in interface DocWriter
      Specified by:
      getDocType in class FileIndexingServiceWriter
      Returns:
      The docType value
      Throws:
      Exception - If an error occurs.
    • getReaderClass

      public String getReaderClass()
      Gets the name of the concrete DocReader class that is used to read this type of Document, which is "org.dlese.dpc.index.reader.XMLDocReader".
      Specified by:
      getReaderClass in interface DocWriter
      Specified by:
      getReaderClass in class FileIndexingServiceWriter
      Returns:
      The String "org.dlese.dpc.index.reader.XMLDocReader".
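How the indexing framework turns this class name into a reader instance is not described here. One common approach, shown purely as an assumption, is reflective instantiation through a public no-argument constructor. The class ReaderLoaderSketch and its method are hypothetical, and whether XMLDocReader offers such a constructor is not confirmed by this page.

```java
// Hypothetical helper, not part of the documented API.
public class ReaderLoaderSketch {

    /** Instantiates the named class through its no-argument constructor. */
    public static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing constructor, and constructor failures.
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

With the library on the classpath, the value returned by getReaderClass() could be passed to such a helper; without it, any class that has a no-argument constructor serves to try the sketch.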
    • init

      public void init(File sourceFile, org.apache.lucene.document.Document existingDoc) throws Exception
      This method is called prior to processing and may be used for any necessary set-up. It should throw an exception with an appropriate message if an error occurs.
      Specified by:
      init in class XMLFileIndexingWriter
      Parameters:
      sourceFile - The source file being indexed.
      existingDoc - The Document that already exists in the index for this source file.
      Throws:
      Exception - If an error occurs.
    • getWhatsNewDate

      protected Date getWhatsNewDate() throws Exception
      Returns the date used to determine "What's new" in the library, which is null (unknown).
      Specified by:
      getWhatsNewDate in class XMLFileIndexingWriter
      Returns:
      The what's new date for the item
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getWhatsNewType

      protected String getWhatsNewType() throws Exception
      Returns null (unknown).
      Specified by:
      getWhatsNewType in class XMLFileIndexingWriter
      Returns:
      null.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • destroy

      protected void destroy()
      Does nothing.
      Specified by:
      destroy in class FileIndexingServiceWriter
    • getValidationReport

      protected String getValidationReport() throws Exception
      Gets a report detailing any errors found in the validation of the data, or null if no error was found. This method performs schema validation over the XML.
      Overrides:
      getValidationReport in class FileIndexingServiceWriter
      Returns:
      Null if no data validation errors were found, otherwise a String that details the nature of the error.
      Throws:
      Exception - If an error occurs while performing the validation.
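The contract above (null when the data is valid, otherwise a String describing the problem) can be sketched with the JDK's javax.xml.validation API. This is only an illustration under assumptions: the class ValidationReportSketch and its method are hypothetical, the schema is passed in as a string for simplicity, and how the real writer locates its schema is not described on this page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the documented API.
public class ValidationReportSketch {

    /** Returns null if xml is valid against xsd, otherwise the validator's error message. */
    public static String validationReport(String xsd, String xml) {
        Validator validator;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            validator = schema.newValidator();
        } catch (SAXException e) {
            // The schema itself is broken: a set-up failure, not a data validation error.
            throw new IllegalArgumentException("Invalid schema", e);
        }
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null; // no validation errors found
        } catch (SAXException e) {
            return e.getMessage(); // describes the first validation error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping schema set-up failures separate from data validation errors mirrors the distinction drawn above between the returned report and the thrown Exception.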
    • _getIds

      protected String[] _getIds()
      Returns null so that ID handling is left to the superclass.
      Specified by:
      _getIds in class XMLFileIndexingWriter
      Returns:
      Null
    • getUrls

      public String[] getUrls()
      Gets the urls attribute of the SimpleXMLFileIndexingWriter object
      Specified by:
      getUrls in class XMLFileIndexingWriter
      Returns:
      The urls value
    • getDescription

      public String getDescription()
      Gets the description attribute of the SimpleXMLFileIndexingWriter object
      Specified by:
      getDescription in class XMLFileIndexingWriter
      Returns:
      The description value
    • getTitle

      public String getTitle()
      Gets the title attribute of the SimpleXMLFileIndexingWriter object
      Specified by:
      getTitle in class XMLFileIndexingWriter
      Returns:
      The title value
    • indexFullContentInDefaultAndStems

      public boolean indexFullContentInDefaultAndStems()
      Places the entire XML content into the default and stems search fields.
      Specified by:
      indexFullContentInDefaultAndStems in class XMLFileIndexingWriter
      Returns:
      True
    • addFields

      protected void addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, File sourceFile) throws Exception
      Nothing to do here. All functionality handled by super.
      Specified by:
      addFields in class XMLFileIndexingWriter
      Parameters:
      newDoc - The new Document that is being created for this resource
      existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
      sourceFile - The source file being indexed
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
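Several methods in this class return null or do nothing and leave the work to the superclass. The sketch below illustrates that general template-method arrangement with stand-in types. BaseWriter, SimpleWriter, and the Map used in place of a Lucene Document are all hypothetical; the sketch does not reproduce the actual logic of XMLFileIndexingWriter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the hook pattern, not the library's code.
public class TemplateHookSketch {

    /** Stand-in base class: drives the process and consults subclass hooks. */
    abstract static class BaseWriter {
        protected abstract String[] ids();            // null means "let the base class decide"
        protected abstract boolean indexFullContent();
        protected abstract void addFields(Map<String, String> doc);

        final Map<String, String> build(String fallbackId, String content) {
            Map<String, String> doc = new LinkedHashMap<>();
            String[] ids = ids();
            doc.put("id", ids == null ? fallbackId : String.join(",", ids));
            if (indexFullContent()) {
                doc.put("default", content); // stemming is omitted in this sketch
            }
            addFields(doc); // the subclass may add format-specific fields
            return doc;
        }
    }

    /** Stand-in for a generic writer: every hook defers to the base class. */
    static class SimpleWriter extends BaseWriter {
        @Override protected String[] ids() { return null; }
        @Override protected boolean indexFullContent() { return true; }
        @Override protected void addFields(Map<String, String> doc) { /* nothing to add */ }
    }

    public static Map<String, String> demo() {
        return new SimpleWriter().build("file-1", "Plate tectonics");
    }
}
```

A format-specific writer would override the same hooks to return real IDs or to add extra fields, while the generic writer keeps them minimal.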