Class ItemFileIndexingWriter

All Implemented Interfaces:
DocWriter
Direct Known Subclasses:
ADNFileIndexingWriter, DleseIMSFileIndexingWriter

public abstract class ItemFileIndexingWriter extends XMLFileIndexingWriter
Abstract class for writing a Lucene Document for a collection of item-level metadata records of a specific format (DLESE IMS, ADN-Item, ADN-Collection, etc.). The reader for this type of Document is XMLDocReader or ItemDocReader.


The Lucene Document fields that are created by this class are (in addition to the ones listed for FileIndexingServiceWriter):

title - The title for the resource. Stored.
description - The description for the resource. Stored.
url - The URL to the resource. Stored.
Stored. Prefixed with a '0' to support wildcard searching.
metadatapfx - The metadata prefix (format) for this record, for example 'adn' or 'oai_dc'. Stored. Prefixed with a '0' to support wildcard searching.
accessionstatus - The accession status for this record. Stored. Prefixed with a '0' to support wildcard searching.
annotypes - Annotation types that refer to this record. Keyword.
annopathways - Annotation pathways that refer to this record. Keyword.
associatedids - A list of record IDs that refer to the same resource. Keyword.
valid - Indicates whether the record is valid [true | false]. Not stored.
validationreport - Text describing an error in the validation of the data for this record. Stored. Only indexed if there was a validation error indicated by the valid field containing false.
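
The '0' prefix noted for several fields above can be sketched in plain Java. This is a minimal illustration, not the class's actual implementation: the helper names withWildcardPrefix, exactQuery and matchAllQuery are hypothetical, and it assumes the prefix exists so that a trailing-wildcard query such as metadatapfx:0* can match every record in a query syntax that does not accept a leading wildcard.

```java
// Hypothetical helpers illustrating the '0' prefix convention.
class WildcardPrefixSketch {

    // Value as it would be written to the index, e.g. "adn" -> "0adn".
    static String withWildcardPrefix(String value) {
        return "0" + value;
    }

    // Query text matching one specific value in a prefixed field.
    static String exactQuery(String field, String value) {
        return field + ":" + withWildcardPrefix(value);
    }

    // Query text matching every record that has the field (assumed purpose of the prefix).
    static String matchAllQuery(String field) {
        return field + ":0*";
    }
}
```

A search for records in the 'adn' format would then use the text produced by exactQuery("metadatapfx", "adn"), while matchAllQuery("metadatapfx") would select records of any format.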

Author:
John Weatherley
  • Constructor Details

    • ItemFileIndexingWriter

      public ItemFileIndexingWriter()
  • Method Details

    • getKeywords

      protected abstract String getKeywords() throws Exception
      Returns the item's keywords sorted and separated by the '+' symbol. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'keywords' and is also indexed in the 'default' field.
      Returns:
      The keywords String
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
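
A subclass might build the keywords String along the following lines. This is a sketch under stated assumptions: the class name KeywordsSketch and the method joinKeywords are hypothetical, and the keywords are assumed to have already been extracted from the record as a list.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: sorts keywords and joins them with '+'.
class KeywordsSketch {

    static String joinKeywords(List<String> keywords) {
        if (keywords == null) {
            return ""; // an empty String is an acceptable return value
        }
        return keywords.stream()
                .filter(k -> k != null && !k.trim().isEmpty()) // skip blanks
                .map(String::trim)
                .sorted()                                      // natural String order
                .collect(Collectors.joining("+"));
    }
}
```

An implementation of getKeywords() could then return the result of joinKeywords applied to the values read from the record.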
    • getCreatorLastName

      protected abstract String getCreatorLastName() throws Exception
      Returns the item creator's last name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the 'default' field only.
      Returns:
      The creator's last name String
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getCreator

      protected abstract String getCreator() throws Exception
      Returns the item creator's full name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'creator'.
      Returns:
      Creator's full name
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getAccessionStatus

      protected abstract String getAccessionStatus() throws Exception
      Returns the accession status of this record, for example 'accessioned'. The String is tokenized, stored and indexed under the field key 'accessionstatus'.
      Returns:
      The accession status.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getAccessionDate

      protected abstract Date getAccessionDate() throws Exception
      Returns the accession date for the item, or null if this item is not accessioned.
      Returns:
      The accession date for the item, or null if this item is not accessioned.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getCreationDate

      protected abstract Date getCreationDate() throws Exception
      Returns the date this item was first created, or null if not available.
      Returns:
      The item creation date or null
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getContent

      protected abstract String getContent()
      Returns the content of the item this record catalogs, or null if not available. For example the full HTML text of the Web page.
      Returns:
      The content of the item, or null
    • getAssociatedMmdRecs

      protected abstract MmdRec[] getAssociatedMmdRecs()
      Returns the MmdRecs for records in other collections that catalog the same resource. Does not include myMmdRec.
      Returns:
      The associated MmdRecs, null or empty if none
    • getAllMmdRecs

      protected abstract MmdRec[] getAllMmdRecs()
      Returns the MmdRecs for all records associated with this resource, including myMmdRec.
      Returns:
      All MmdRecs for this resource, null or empty if none
    • getMyMmdRec

      protected abstract MmdRec getMyMmdRec()
      Returns the MmdRec for this record only.
      Returns:
      The MmdRec for this record, or null
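
The relationship between the three MmdRec accessors above can be illustrated as follows. This sketch uses plain String IDs as a hypothetical stand-in for MmdRec objects, and the ordering (this record first) is an assumption made only for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: "all" records = this record + associated records.
class MmdRecSetsSketch {

    static List<String> allRecs(String myRec, List<String> associatedRecs) {
        List<String> all = new ArrayList<>();
        if (myRec != null) {
            all.add(myRec);
        }
        if (associatedRecs != null) {
            for (String rec : associatedRecs) {
                // The associated set is documented as excluding this record.
                if (rec != null && !rec.equals(myRec)) {
                    all.add(rec);
                }
            }
        }
        return all;
    }
}
```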
    • getContentType

      protected abstract String getContentType()
      Returns the content type of the item this record catalogs, or null if not available. For example "text/html" or "html".
      Returns:
      The content type of the item, or null
    • getHasRelatedResource

      protected abstract boolean getHasRelatedResource() throws Exception
      Returns true if the item has one or more related resources, false otherwise.
      Returns:
      True if the item has one or more related resources, false otherwise.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getRelatedResourceIds

      protected abstract String[] getRelatedResourceIds() throws Exception
      Returns the IDs of related resources that are cataloged by ID, or null if none are present.
      Returns:
      Related resource IDs, or null if none are available
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getRelatedResourceUrls

      protected abstract String[] getRelatedResourceUrls() throws Exception
      Returns the URLs of related resources that are cataloged by URL, or null if none are present.
      Returns:
      Related resource URLs, or null if none are available
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
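
One way a subclass could keep getHasRelatedResource() consistent with getRelatedResourceIds() and getRelatedResourceUrls() is to derive the boolean from the two arrays. The class and method names below are hypothetical, and this is only a sketch of that design choice, not the behavior of any existing subclass.

```java
// Hypothetical helper: an item has a related resource if either array is non-empty.
class RelatedResourceSketch {

    static boolean hasRelatedResource(String[] relatedIds, String[] relatedUrls) {
        boolean hasIds = relatedIds != null && relatedIds.length > 0;
        boolean hasUrls = relatedUrls != null && relatedUrls.length > 0;
        return hasIds || hasUrls;
    }
}
```

Both arrays may be null, matching the documented return values of the two accessor methods.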
    • addFrameworkFields

      protected abstract void addFrameworkFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc) throws Exception
      Adds fields to the index that are unique to the given framework.

      Example code:
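      protected void addFrameworkFields(Document newDoc, Document existingDoc) throws Exception {
        // Add a field that is specific to this framework.
        // Note: the arguments accepted by the Field constructor (store and index options) depend on the Lucene version in use.
        String customContent = "Some content";
        newDoc.add(new Field("mycustomfield", customContent));
      }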

      Parameters:
      newDoc - The new Document that is being created for this resource
      existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getDocType

      public abstract String getDocType() throws Exception
      Returns a unique document type key for this kind of record, corresponding to the format type. For example "adn", "dlese_ims", or "dlese_anno". The string is parsed using the Lucene StandardAnalyzer, so it must be lowercase and should not contain any stop words.
      Specified by:
      getDocType in interface DocWriter
      Specified by:
      getDocType in class FileIndexingServiceWriter
      Returns:
      The docType String
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getReaderClass

      public abstract String getReaderClass()
      Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".
      Specified by:
      getReaderClass in interface DocWriter
      Specified by:
      getReaderClass in class FileIndexingServiceWriter
      Returns:
      The name of the DocReader.
    • initItem

      public abstract void initItem(File source, org.apache.lucene.document.Document existingDoc) throws Exception
      This method is called prior to processing and may be used for any necessary set-up. This method should throw an Exception with an appropriate message if an error occurs.
      Parameters:
      source - The source file being indexed
      existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
      Throws:
      Exception - If an error occurred during set-up.
    • destroy

      protected abstract void destroy()
      This method is called at the conclusion of processing and may be used for tear-down.
      Specified by:
      destroy in class FileIndexingServiceWriter
    • getValidationReport

      protected abstract String getValidationReport() throws Exception
      Gets a report detailing any errors found in the validation of the data, or null if no error was found. This could be implemented by simply performing XML schema validation on the file, or can involve more customized validation of the data if necessary. This method is called after all other methods that access the data (XMLFileIndexingWriter.getTitle(), addFrameworkFields(Document, Document), etc.) so that data verification can be done during those calls, if needed.
      Overrides:
      getValidationReport in class FileIndexingServiceWriter
      Returns:
      Null if no data validation errors were found, otherwise a String that details the nature of the error.
      Throws:
      Exception - If an error occurs while performing the validation.
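
The return contract described above (null when the data is valid, a descriptive String otherwise) can be sketched with the JDK's built-in XML parser. Note the substitution: this example only checks that the XML is well-formed, as a simplified stand-in for full XML schema validation. The class name, method name and report wording are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: well-formedness check standing in for schema validation.
class ValidationReportSketch {

    // Returns null if the XML parses cleanly, otherwise a short error report.
    // Failures unrelated to the data itself (e.g. parser configuration) propagate as exceptions.
    static String validationReport(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null; // no error found
        } catch (SAXException e) {
            return "XML is not well-formed: " + e.getMessage();
        }
    }
}
```

Separating data errors (returned as a report) from processing errors (thrown) mirrors the distinction the method documentation draws between its return value and its Throws clause.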
    • init

      public void init(File source, org.apache.lucene.document.Document existingDoc) throws Exception
      Initialize the subclasses and record data service data.
      Specified by:
      init in class XMLFileIndexingWriter
      Parameters:
      source - The source file being indexed.
      existingDoc - A Document that previously existed in the index for this item, if present
      Throws:
      Exception - Thrown if error reading the XML map
    • getMyAnnoResultDocs

      protected ResultDocList getMyAnnoResultDocs() throws Exception
      Gets the annotations for this record, null or zero length if none available. Overrides method in XMLFileIndexingWriter because IDs need initializing.
      Overrides:
      getMyAnnoResultDocs in class XMLFileIndexingWriter
      Returns:
      The myAnnoResultDocs value
      Throws:
      Exception - If an error occurs
    • addFields

      protected final void addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, File sourceFile) throws Exception
      Adds fields to the index that are common to all item-level documents. These include the title, description, id and url, as well as the collection(s), accession status and annotation references.
      Specified by:
      addFields in class XMLFileIndexingWriter
      Parameters:
      newDoc - The new Document that is being created for this resource
      existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
      sourceFile - The sourceFile that is being indexed.
      Throws:
      Exception - If an error occurs
    • getWhatsNewDate

      protected Date getWhatsNewDate() throws Exception
      Returns the date used to determine "What's new" in the library, which is the item's accession date.
      Specified by:
      getWhatsNewDate in class XMLFileIndexingWriter
      Returns:
      The what's new date for the item
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getWhatsNewType

      protected String getWhatsNewType() throws Exception
      Returns 'itemnew', 'itemannoinprogress' or 'itemannocomplete', whichever came most recently.
      Specified by:
      getWhatsNewType in class XMLFileIndexingWriter
      Returns:
      The string 'itemnew' or 'itemannoinprogress' or 'itemannocomplete'.
      Throws:
      Exception - If an error occurs while getting the what's new type.
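
The "whichever came most recently" rule can be sketched as a comparison of three dates. This is an illustration under stated assumptions, not the class's actual logic: the three Date parameters are hypothetical inputs (one date per type), and returning null when no date is available is a choice made only for this example.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: pick the what's new type whose date is the latest.
class WhatsNewTypeSketch {

    static String whatsNewType(Date accessioned, Date annoInProgress, Date annoComplete) {
        Map<String, Date> candidates = new LinkedHashMap<>();
        candidates.put("itemnew", accessioned);
        candidates.put("itemannoinprogress", annoInProgress);
        candidates.put("itemannocomplete", annoComplete);

        String latestType = null;
        Date latestDate = null;
        for (Map.Entry<String, Date> entry : candidates.entrySet()) {
            Date date = entry.getValue();
            // Skip types with no date; keep the strictly latest one seen so far.
            if (date != null && (latestDate == null || date.after(latestDate))) {
                latestType = entry.getKey();
                latestDate = date;
            }
        }
        return latestType; // null if no date was available (assumption)
    }
}
```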