Class ADNFileIndexingWriter

All Implemented Interfaces:
DocWriter

public class ADNFileIndexingWriter extends ItemFileIndexingWriter
Creates a Lucene Document from an ADN-item metadata source file.

The Lucene Document fields created by this class, in addition to the ones listed for FileIndexingServiceWriter, are:

doctype - Set to 'adn'. Stored. Note: the actual indexing of this field happens in the superclass FileIndexingServiceWriter.
additional fields - A number of additional fields are defined. See the Java code for method addFrameworkFields(Document, Document) for details.

Author:
John Weatherley, Ryan Deardorff
  • Constructor Details

    • ADNFileIndexingWriter

      public ADNFileIndexingWriter()
      Create an ADNFileIndexingWriter that indexes the given collection in field collection.
    • ADNFileIndexingWriter

      public ADNFileIndexingWriter(boolean isDupDoc)
      Create an ADNFileIndexingWriter that indexes the given collection in field collection.
      Parameters:
      isDupDoc - False to force this to be processed as a non-dup
  • Method Details

    • finalize

      protected void finalize() throws Throwable
      Perform finalization... closing resources, etc.
      Overrides:
      finalize in class Object
      Throws:
      Throwable - If error
    • getNumInstances

      public static long getNumInstances()
      Gets the numInstances attribute of the ADNFileIndexingWriter class
      Returns:
      The numInstances value
    • initItem

      public void initItem(File source, org.apache.lucene.document.Document existingDoc) throws Exception
      Initialize the XML map, MmdRecs and other data prior to processing
      Specified by:
      initItem in class ItemFileIndexingWriter
      Parameters:
      source - The source file being indexed.
      existingDoc - A Document that previously existed in the index for this item, if present
      Throws:
      Exception - Thrown if error reading the XML map
    • destroy

      protected void destroy()
      Release map resources for GC after processing.
      Specified by:
      destroy in class ItemFileIndexingWriter
    • getCollections

      public String[] getCollections() throws Exception
      Returns unique collection keys for the item being indexed, separated by spaces. For example 'dcc', 'comet' or 'dwel'. Since this may be a multi-doc, it may have multiple collections, so this method overrides the default getCollection() method.
      Overrides:
      getCollections in class XMLFileIndexingWriter
      Returns:
      The collection keys
      Throws:
      Exception - If error
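The idea of returning each collection key only once for a multi-doc can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual implementation: CollectionKeysSketch and uniqueCollectionKeys are hypothetical names, and the input list stands in for the keys gathered from each record that catalogs the resource.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the documented API.
final class CollectionKeysSketch {

    // Returns the distinct, non-blank keys in first-seen order.
    static String[] uniqueCollectionKeys(List<String> keysPerRecord) {
        Set<String> unique = new LinkedHashSet<>();
        for (String key : keysPerRecord) {
            if (key != null && !key.trim().isEmpty()) {
                unique.add(key.trim());
            }
        }
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] keys = uniqueCollectionKeys(Arrays.asList("dcc", "comet", "dcc", "dwel"));
        System.out.println(String.join(" ", keys)); // prints: dcc comet dwel
    }
}
```

A LinkedHashSet is used here so that duplicates are dropped while the first-seen order is kept; whether the real method preserves order is not stated in this documentation.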
    • getAccessionStatus

      protected String getAccessionStatus() throws Exception
      Returns the accession status of this record, for example 'accessioned'. The String is tokenized, stored and indexed under the field key 'accessionstatus'.
      Specified by:
      getAccessionStatus in class ItemFileIndexingWriter
      Returns:
      The accession status.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getHasRelatedResource

      protected boolean getHasRelatedResource() throws Exception
      Returns true if the item has one or more related resources, false otherwise.
      Specified by:
      getHasRelatedResource in class ItemFileIndexingWriter
      Returns:
      True if the item has one or more related resources, false otherwise.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getRelatedResourceIds

      protected String[] getRelatedResourceIds() throws Exception
      Returns the IDs of related resources that are cataloged by ID, or null if none are present
      Specified by:
      getRelatedResourceIds in class ItemFileIndexingWriter
      Returns:
      Related resource IDs, or null if none are available
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getRelatedResourceUrls

      protected String[] getRelatedResourceUrls() throws Exception
      Returns the URLs of related resources that are cataloged by URL, or null if none are present
      Specified by:
      getRelatedResourceUrls in class ItemFileIndexingWriter
      Returns:
      Related resource URLs, or null if none are available
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getAccessionDate

      protected Date getAccessionDate() throws Exception
      Returns the accession date for the item, or null if this item is not accessioned. If this is a multi-doc, returns the oldest accession date of the bunch, corresponding to the first time this resource appeared in the library.
      Specified by:
      getAccessionDate in class ItemFileIndexingWriter
      Returns:
      The accession date for the item, or null if this item is not accessioned.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
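The multi-doc rule described for getAccessionDate (take the oldest accession date, or null when nothing is accessioned) can be sketched like this. AccessionDateSketch and oldestAccessionDate are hypothetical names introduced for illustration, and the list of per-record dates is an assumed input; the real method reads its dates from the indexed records.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Hypothetical helper for illustration only; not part of the documented API.
final class AccessionDateSketch {

    // Returns the earliest non-null date, or null when no record has an accession date.
    static Date oldestAccessionDate(List<Date> datesPerRecord) {
        Date oldest = null;
        for (Date candidate : datesPerRecord) {
            if (candidate != null && (oldest == null || candidate.before(oldest))) {
                oldest = candidate;
            }
        }
        return oldest;
    }

    public static void main(String[] args) {
        Date early = new Date(500L);
        Date late = new Date(1000L);
        // A null entry models a record that is not accessioned.
        Date oldest = oldestAccessionDate(Arrays.asList(late, null, early));
        System.out.println(oldest.equals(early)); // prints: true
    }
}
```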
    • getCreationDate

      protected Date getCreationDate() throws Exception
      Returns the date this item was first created, or null if not available.
      Specified by:
      getCreationDate in class ItemFileIndexingWriter
      Returns:
      The item creation date or null
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getReaderClass

      public String getReaderClass()
      Gets the name of the concrete DocReader class that is used to read this type of Document, which is "ItemDocReader".
      Specified by:
      getReaderClass in interface DocWriter
      Specified by:
      getReaderClass in class ItemFileIndexingWriter
      Returns:
      The String "org.dlese.dpc.index.reader.ItemDocReader".
    • indexFullContentInDefaultAndStems

      public boolean indexFullContentInDefaultAndStems()
      Default and stems fields handled here, so do not index full content.
      Specified by:
      indexFullContentInDefaultAndStems in class XMLFileIndexingWriter
      Returns:
      False
    • getAssociatedMmdRecs

      protected MmdRec[] getAssociatedMmdRecs()
      Returns the MmdRecs for records in other collections that catalog the same resource, not including myMmdRec.
      Specified by:
      getAssociatedMmdRecs in class ItemFileIndexingWriter
      Returns:
      The associated MmdRecs, or null if none
    • getAllMmdRecs

      protected MmdRec[] getAllMmdRecs()
      Returns the MmdRecs for all records that catalog this resource, including myMmdRec.
      Specified by:
      getAllMmdRecs in class ItemFileIndexingWriter
      Returns:
      All MmdRecs for this resource, null or empty if none
    • getMyMmdRec

      protected MmdRec getMyMmdRec()
      Returns the MmdRec for this record only.
      Specified by:
      getMyMmdRec in class ItemFileIndexingWriter
      Returns:
      The MmdRec for this record, or null
    • getValidationReport

      protected String getValidationReport() throws Exception
      Gets a report detailing any errors found in the validation of the data, or null if no error was found.
      Specified by:
      getValidationReport in class ItemFileIndexingWriter
      Returns:
      Null if no data validation errors were found, otherwise a String that details the nature of the error.
      Throws:
      Exception - If error in performing the validation.
    • getDocType

      public final String getDocType()
      Gets the docType attribute of the ADNFileIndexingWriter, which is 'adn'.
      Specified by:
      getDocType in interface DocWriter
      Specified by:
      getDocType in class ItemFileIndexingWriter
      Returns:
      The docType, which is 'adn'.
    • _getIds

      protected String[] _getIds() throws Exception
      Gets the id(s) for this item. If multiple IDs exist, the first one is the primary.
      Specified by:
      _getIds in class XMLFileIndexingWriter
      Returns:
      The id value
      Throws:
      Exception - If an error occurs
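Because the first element of the returned array is the primary ID, a caller can pick it out as in the sketch below. PrimaryIdSketch and primaryId are hypothetical names, the ID strings are made up, and the null handling is an assumption, since the documentation does not say what is returned when an item has no ID.

```java
// Hypothetical helper for illustration only; not part of the documented API.
final class PrimaryIdSketch {

    // Returns the first (primary) ID, or null when the array is null or empty.
    static String primaryId(String[] ids) {
        return (ids == null || ids.length == 0) ? null : ids[0];
    }

    public static void main(String[] args) {
        // "ID-1" and "ID-2" are placeholder values, not real record IDs.
        System.out.println(primaryId(new String[] {"ID-1", "ID-2"})); // prints: ID-1
    }
}
```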
    • getTitle

      public final String getTitle() throws Exception
      Gets the title attribute of the ADNFileIndexingWriter object
      Specified by:
      getTitle in class XMLFileIndexingWriter
      Returns:
      The title value
      Throws:
      Exception - If an error occurs
    • getDescription

      public final String getDescription() throws Exception
      Gets the description attribute of the ADNFileIndexingWriter object
      Specified by:
      getDescription in class XMLFileIndexingWriter
      Returns:
      The description value
      Throws:
      Exception - If an error occurs
    • getUrls

      public final String[] getUrls() throws Exception
      Gets the url(s) from the ADN record(s).
      Specified by:
      getUrls in class XMLFileIndexingWriter
      Returns:
      The urls value
      Throws:
      Exception - If an error occurs
    • getKeywords

      protected String getKeywords() throws Exception
      Returns the item's keywords sorted and separated by the '+' symbol. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'keywords' and is also indexed in the 'default' field.
      Specified by:
      getKeywords in class ItemFileIndexingWriter
      Returns:
      The keywords String
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
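The "sorted and separated by the '+' symbol" format for keywords can be illustrated with a short sketch. KeywordsSketch and joinKeywords are hypothetical names, and the sort order used here (natural, case-sensitive String order) is an assumption, because the documentation does not specify which ordering the real method applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of the documented API.
final class KeywordsSketch {

    // Sorts a copy of the keywords (natural String order, an assumption) and joins them with '+'.
    static String joinKeywords(List<String> keywords) {
        List<String> sorted = new ArrayList<>(keywords);
        Collections.sort(sorted);
        return String.join("+", sorted);
    }

    public static void main(String[] args) {
        String joined = joinKeywords(Arrays.asList("ocean", "climate", "weather"));
        System.out.println(joined); // prints: climate+ocean+weather
    }
}
```

An empty input list yields an empty String, which matches the note that an empty String is an acceptable return value.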
    • getCreatorLastName

      protected String getCreatorLastName() throws Exception
      Returns the item creator's last name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the 'default' field only.
      Specified by:
      getCreatorLastName in class ItemFileIndexingWriter
      Returns:
      The creator's last name String
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getCreator

      protected String getCreator() throws Exception
      Returns the item creator's full name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'creator'.
      Specified by:
      getCreator in class ItemFileIndexingWriter
      Returns:
      Creator's full name
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getCost

      protected String getCost() throws Exception
      Returns the item's cost. The String is stored and indexed under the field key 'cost'.
      Returns:
      Resource cost
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getBoundingBox

      protected BoundingBox getBoundingBox() throws Exception
      Gets the boundingBox attribute of the ADNFileIndexingWriter object
      Overrides:
      getBoundingBox in class XMLFileIndexingWriter
      Returns:
      The boundingBox value
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getContent

      protected String getContent()
      Returns the content of the item this record catalogs, or null if not available. For example the full HTML text of the Web page.
      Specified by:
      getContent in class ItemFileIndexingWriter
      Returns:
      The content of the item, or null
    • getContentType

      protected String getContentType()
      Returns the content type of the item this record catalogs, or null if not available. For example "text/html" or "html".
      Specified by:
      getContentType in class ItemFileIndexingWriter
      Returns:
      The content type of the item, or null
    • addFrameworkFields

      protected final void addFrameworkFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc) throws Exception
      Adds custom fields to the index that are unique to this framework.
      Specified by:
      addFrameworkFields in class ItemFileIndexingWriter
      Parameters:
      newDoc - The new Document to which the framework fields are added
      existingDoc - A Document that previously existed in the index for this item, if present
      Throws:
      Exception - If an error occurs
    • setIsSingleDoc

      public void setIsSingleDoc(boolean isSingleDoc)
      Sets whether this writer should write a single-record doc rather than a multi-item doc.
      Parameters:
      isSingleDoc - The new isSingleDoc value
    • getGradeRange

      protected String[] getGradeRange()
      Gets the gradeRange attribute of the ADNFileIndexingWriter object
      Returns:
      The gradeRange value
    • getResourceTypes

      protected String[] getResourceTypes()
      Gets the resourceTypes attribute of the ADNFileIndexingWriter object
      Returns:
      The resourceTypes value
    • getContentStandards

      protected String[] getContentStandards()
      Gets the contentStandards attribute of the ADNFileIndexingWriter object
      Returns:
      The contentStandards value
    • getSubjects

      protected String[] getSubjects()
      Gets the subjects attribute of the ADNFileIndexingWriter object
      Returns:
      The subjects value
    • getCreatorEmailPrimary

      protected String getCreatorEmailPrimary() throws Exception
      Gets the creator's primary email.
      Returns:
      The creator's primary email.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getCreatorEmailAlt

      protected String getCreatorEmailAlt() throws Exception
      Gets the creator's alternate email.
      Returns:
      The creator's alternate email.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getOrganizationEmail

      protected String getOrganizationEmail() throws Exception
      Gets the organization email.
      Returns:
      The organization email.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getOrganizationInstName

      protected String getOrganizationInstName() throws Exception
      Gets the organization's institution name. ADN XPath: lifecycle/contributors/contributor/organization/instName
      Returns:
      The organization name.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getOrganizationInstDepartment

      protected String getOrganizationInstDepartment() throws Exception
      Gets the organization's institution department name. ADN XPath: lifecycle/contributors/contributor/organization/instDept
      Returns:
      The organization's institution department name.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getPersonInstName

      protected String getPersonInstName() throws Exception
      Gets the person's institution name. ADN XPath: lifecycle/contributors/contributor/person/instName
      Returns:
      The institution name.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getPersonInstDepartment

      protected String getPersonInstDepartment() throws Exception
      Gets the person's institution department name. ADN XPath: lifecycle/contributors/contributor/person/instDept
      Returns:
      The institution department name.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getUrlMirrors

      protected String getUrlMirrors() throws Exception
      Gets the mirror URLs encoded as terms, if any.
      Returns:
      The URL mirrors encoded as terms, or empty string.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getAudienceToolFor

      protected String getAudienceToolFor() throws Exception
      Gets the audience 'tool for' value.
      Returns:
      The audience 'tool for' value.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getAudienceBeneficiary

      protected String getAudienceBeneficiary() throws Exception
      The audience beneficiary.
      Returns:
      The audience beneficiary.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getAudienceTypicalAgeRange

      protected String getAudienceTypicalAgeRange() throws Exception
      The audience typical age range.
      Returns:
      The audience typical age range.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getAudienceInstructionalGoal

      protected String getAudienceInstructionalGoal() throws Exception
      The audience instructionalGoal.
      Returns:
      The audience instructionalGoal.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getAudienceTeachingMethod

      protected String getAudienceTeachingMethod() throws Exception
      The audience teachingMethod.
      Returns:
      The audience teachingMethod.
      Throws:
      Exception - This method should throw an Exception with an appropriate error message if an error occurs.
    • getPlaceNames

      protected String getPlaceNames()
      Gets all place names as text. Place names are extracted from the following XPaths: general/simplePlacesAndEvents/placeAndEvent/place, geospatialCoverages/geospatialCoverage/boundBox/bbPlaces/place/name and geospatialCoverages/geospatialCoverage/detGeos/detGeo/detPlaces/place/name.
      Returns:
      All place names as text.
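To show how the three XPaths listed for getPlaceNames could be evaluated against a record, here is a minimal sketch using the JDK's standard XPath API. It is not the class's actual implementation: PlaceNamesSketch and placeNames are hypothetical names, the sample XML is a made-up, namespace-free fragment with an assumed root element, and joining the names with a single space is an assumption about what "as text" means.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of the documented API.
final class PlaceNamesSketch {

    // The three XPaths from the getPlaceNames() description, relative to the record root.
    static final String[] PLACE_XPATHS = {
        "general/simplePlacesAndEvents/placeAndEvent/place",
        "geospatialCoverages/geospatialCoverage/boundBox/bbPlaces/place/name",
        "geospatialCoverages/geospatialCoverage/detGeos/detGeo/detPlaces/place/name"
    };

    // Collects the text of every matching node and joins the results with spaces (assumed format).
    static String placeNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> names = new ArrayList<>();
            for (String path : PLACE_XPATHS) {
                NodeList nodes = (NodeList) xpath.evaluate(
                        path, doc.getDocumentElement(), XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    String text = nodes.item(i).getTextContent().trim();
                    if (!text.isEmpty()) {
                        names.add(text);
                    }
                }
            }
            return String.join(" ", names);
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract place names", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample record; the root element name is an assumption.
        String xml = "<itemRecord>"
                + "<general><simplePlacesAndEvents><placeAndEvent>"
                + "<place>Colorado</place><event>Flood</event>"
                + "</placeAndEvent></simplePlacesAndEvents></general>"
                + "<geospatialCoverages><geospatialCoverage><boundBox><bbPlaces>"
                + "<place><name>Boulder</name></place>"
                + "</bbPlaces></boundBox></geospatialCoverage></geospatialCoverages>"
                + "</itemRecord>";
        System.out.println(placeNames(xml)); // prints: Colorado Boulder
    }
}
```

A record that declares an XML namespace would need a namespace-aware parser and prefixed XPaths; the sketch sidesteps that by using a namespace-free sample.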
    • getEventNames

      protected String getEventNames()
      Gets all event names as text. Event names are extracted from the following XPaths: general/simplePlacesAndEvents/placeAndEvent/event, geospatialCoverages/geospatialCoverage/boundBox/bbEvents/event/name and geospatialCoverages/geospatialCoverage/detGeos/detGeo/detEvents/event/name.
      Returns:
      All event names as text.
    • getTemporalCoverageNames

      protected String getTemporalCoverageNames()
      Gets all temporal coverage names as text. Temporal coverage names are extracted from the following XPaths: general/simpleTemporalCoverages/description, and temporalCoverages/timeAndPeriod/periods/period/name.
      Returns:
      All temporal coverage names as text.