Package org.dlese.dpc.index.writer
Class ADNFileIndexingWriter
java.lang.Object
org.dlese.dpc.index.writer.FileIndexingServiceWriter
org.dlese.dpc.index.writer.XMLFileIndexingWriter
org.dlese.dpc.index.writer.ItemFileIndexingWriter
org.dlese.dpc.index.writer.ADNFileIndexingWriter
- All Implemented Interfaces:
DocWriter
Creates a Lucene Document from an ADN-item metadata source file.
The Lucene Document fields that are created by this class are (in
addition to the ones listed for FileIndexingServiceWriter):
doctype - Set to 'adn'. Stored. Note: the actual indexing of this field happens in
the superclass FileIndexingServiceWriter.
additional fields - A number of additional fields are defined. See the Java code for
method addFrameworkFields(Document, Document) for details.
- Author:
- John Weatherley, Ryan Deardorff
-
Constructor Summary
ADNFileIndexingWriter() - Create an ADNFileIndexingWriter that indexes the given collection in field collection.
ADNFileIndexingWriter(boolean isDupDoc) - Create an ADNFileIndexingWriter that indexes the given collection in field collection.
-
Method Summary
protected String[] _getIds() - Gets the id(s) for this item.
protected final void addFrameworkFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc) - Adds custom fields to the index that are unique to this framework.
protected void destroy() - Release map resources for GC after processing.
protected void finalize() - Perform finalization: closing resources, etc.
protected Date getAccessionDate() - Returns the accession date for the item, or null if this item is not accessioned.
protected String getAccessionStatus() - Returns the accession status of this record, for example 'accessioned'.
protected MmdRec[] getAllMmdRecs() - Returns the MmdRecs for all records that catalog this resource, including myMmdRec.
protected MmdRec[] getAssociatedMmdRecs() - Returns the MmdRecs for records in other collections that catalog the same resource, not including myMmdRec.
protected String getAudienceBeneficiary() - Gets the audience beneficiary.
protected String getAudienceInstructionalGoal() - Gets the audience instructionalGoal.
protected String getAudienceTeachingMethod() - Gets the audience teachingMethod.
protected String getAudienceToolFor() - Gets the audience toolFor value.
protected String getAudienceTypicalAgeRange() - Gets the audience typical age range.
protected BoundingBox getBoundingBox() - Gets the boundingBox attribute of the ADNFileIndexingWriter object.
String[] getCollections() - Returns unique collection keys for the item being indexed, separated by spaces.
protected String getContent() - Returns the content of the item this record catalogs, or null if not available.
protected String[] getContentStandards() - Gets the contentStandards attribute of the ADNFileIndexingWriter object.
protected String getContentType() - Returns the content type of the item this record catalogs, or null if not available.
protected String getCost() - Returns the item's cost.
protected Date getCreationDate() - Returns the date this item was first created, or null if not available.
protected String getCreator() - Returns the item creator's full name.
protected String getCreatorEmailAlt() - Gets the creator's alternate email.
protected String getCreatorEmailPrimary() - Gets the creator's primary email.
protected String getCreatorLastName() - Returns the item creator's last name.
final String getDescription() - Gets the description attribute of the ADNFileIndexingWriter object.
final String getDocType() - Gets the docType attribute of the ADNFileIndexingWriter, which is 'adn'.
protected String getEventNames() - Gets all event names as text.
protected String[] getGradeRange() - Gets the gradeRange attribute of the ADNFileIndexingWriter object.
protected boolean getHasRelatedResource() - Returns true if the item has one or more related resources, false otherwise.
protected String getKeywords() - Returns the item's keywords sorted and separated by the '+' symbol.
protected MmdRec getMyMmdRec() - Returns the MmdRec for this record only.
static long getNumInstances() - Gets the numInstances attribute of the ADNFileIndexingWriter class.
protected String getOrganizationEmail() - Gets the organization email.
protected String getOrganizationInstDepartment() - Gets the organization's institution department name.
protected String getOrganizationInstName() - Gets the organization's institution name.
protected String getPersonInstDepartment() - Gets the person's institution department name.
protected String getPersonInstName() - Gets the person's institution name.
protected String getPlaceNames() - Gets all place names as text.
String getReaderClass() - Gets the name of the concrete DocReader class that is used to read this type of Document, which is "ItemDocReader".
protected String[] getRelatedResourceIds() - Returns the IDs of related resources that are cataloged by ID, or null if none are present.
protected String[] getRelatedResourceUrls() - Returns the URLs of related resources that are cataloged by URL, or null if none are present.
protected String[] getResourceTypes() - Gets the resourceTypes attribute of the ADNFileIndexingWriter object.
protected String[] getSubjects() - Gets the subjects attribute of the ADNFileIndexingWriter object.
protected String getTemporalCoverageNames() - Gets all temporal coverage names as text.
final String getTitle() - Gets the title attribute of the ADNFileIndexingWriter object.
protected String getUrlMirrors() - Gets the mirror URLs encoded as terms, if any.
final String[] getUrls() - Gets the url(s) from the ADN record(s).
protected String getValidationReport() - Gets a report detailing any errors found in the validation of the data, or null if no error was found.
boolean indexFullContentInDefaultAndStems() - The default and stems fields are handled by this class, so the full content is not indexed in them.
void initItem(...) - Initialize the XML map, MmdRecs and other data prior to processing.
void setIsSingleDoc(boolean isSingleDoc) - Sets whether this writer should write a single-record doc rather than a multi-item doc.
Methods inherited from class org.dlese.dpc.index.writer.ItemFileIndexingWriter
addFields, getMyAnnoResultDocs, getWhatsNewDate, getWhatsNewType, init
Methods inherited from class org.dlese.dpc.index.writer.XMLFileIndexingWriter
addCustomFields, getDeletedDoc, getDocGroup, getDom4jDoc, getFieldContent, getFieldContent, getFieldName, getIds, getIndex, getMyCollectionDoc, getOaiModtime, getPrimaryId, getRecordDataService, getRelatedIds, getRelatedIdsMap, getRelatedUrls, getRelatedUrlsMap, getTermStringFromStringArray, getXmlIndexer, getXmlIndexerFieldsConfig
Methods inherited from class org.dlese.dpc.index.writer.FileIndexingServiceWriter
abortIndexing, addDocToRemove, addToAdminDefaultField, addToDefaultField, create, getConfigAttributes, getDocsource, getFileContent, getFileIndexingPlugin, getFileIndexingService, getLuceneDoc, getPreviousRecordDoc, getSessionAttributes, getSourceDir, getSourceFile, isMakingDeletedDoc, isValidationEnabled, prtln, prtlnErr, setConfigAttributes, setDebug, setFileIndexingPlugin, setFileIndexingService, setIsMakingDeletedDoc, setValidationEnabled
-
Constructor Details
-
ADNFileIndexingWriter
public ADNFileIndexingWriter()
Create an ADNFileIndexingWriter that indexes the given collection in field collection.
-
ADNFileIndexingWriter
public ADNFileIndexingWriter(boolean isDupDoc)
Create an ADNFileIndexingWriter that indexes the given collection in field collection.
- Parameters:
isDupDoc - False to force this to be processed as a non-dup.
-
-
Method Details
-
finalize
Perform finalization: closing resources, etc.
-
getNumInstances
public static long getNumInstances()
Gets the numInstances attribute of the ADNFileIndexingWriter class.
- Returns:
- The numInstances value
-
initItem
Initialize the XML map, MmdRecs and other data prior to processing.
- Specified by:
initItem in class ItemFileIndexingWriter
- Parameters:
source - The source file being indexed.
existingDoc - A Document that previously existed in the index for this item, if present.
- Throws:
Exception - Thrown if an error occurs reading the XML map.
-
destroy
protected void destroy()
Release map resources for GC after processing.
- Specified by:
destroy in class ItemFileIndexingWriter
-
getCollections
Returns unique collection keys for the item being indexed, separated by spaces, for example 'dcc', 'comet' or 'dwel'. Since this may be a multi-doc, it may have multiple collections, so this method overrides the default implementation.
- Overrides:
getCollections in class XMLFileIndexingWriter
- Returns:
- The collection keys
- Throws:
Exception - If an error occurs
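For a multi-doc, the keys come from several records and may repeat. The following is a hypothetical sketch, not the class's actual code: it assumes the per-record collection keys are already available as a String array, removes duplicates while keeping first-seen order, and shows how the array can be rendered in the space-separated form described above. The class and method names are invented for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; not part of the org.dlese.dpc API.
final class CollectionKeysSketch {

    // Returns the distinct, non-blank collection keys in first-seen order.
    static String[] uniqueCollectionKeys(String[] perRecordKeys) {
        Set<String> unique = new LinkedHashSet<>();
        if (perRecordKeys != null) {
            for (String key : perRecordKeys) {
                if (key != null && !key.trim().isEmpty()) {
                    unique.add(key.trim());
                }
            }
        }
        return unique.toArray(new String[0]);
    }

    // Joins the keys with single spaces, e.g. for a tokenized index field.
    static String asSpaceSeparated(String[] keys) {
        return String.join(" ", keys);
    }
}
```

With the input {"dcc", "comet", "dcc", "dwel"}, the sketch yields the array {"dcc", "comet", "dwel"}, which joins to "dcc comet dwel". A LinkedHashSet is used so the output order is stable and predictable.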
-
getAccessionStatus
Returns the accession status of this record, for example 'accessioned'. The String is tokenized, stored and indexed under the field key 'accessionstatus'.
- Specified by:
getAccessionStatus in class ItemFileIndexingWriter
- Returns:
- The accession status.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getHasRelatedResource
Returns true if the item has one or more related resources, false otherwise.
- Specified by:
getHasRelatedResource in class ItemFileIndexingWriter
- Returns:
- True if the item has one or more related resources, false otherwise.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getRelatedResourceIds
Returns the IDs of related resources that are cataloged by ID, or null if none are present.
- Specified by:
getRelatedResourceIds in class ItemFileIndexingWriter
- Returns:
- Related resource IDs, or null if none are available
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getRelatedResourceUrls
Returns the URLs of related resources that are cataloged by URL, or null if none are present.
- Specified by:
getRelatedResourceUrls in class ItemFileIndexingWriter
- Returns:
- Related resource URLs, or null if none are available
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
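A related resource can be cataloged either by ID or by URL, and both getters return null when none are present. The sketch below assumes, without confirmation from this page, that the has-related-resource flag is simply "at least one ID or URL is present"; the class and method names are hypothetical.

```java
// Hypothetical sketch; the real writer may derive this flag differently.
final class RelatedResourceSketch {

    // True when at least one related resource is cataloged by ID or by URL.
    // Either array may be null, matching the "null if none" contract above.
    static boolean hasRelatedResource(String[] relatedIds, String[] relatedUrls) {
        return isNonEmpty(relatedIds) || isNonEmpty(relatedUrls);
    }

    private static boolean isNonEmpty(String[] values) {
        return values != null && values.length > 0;
    }
}
```

Treating an empty array the same as null keeps callers from having to distinguish the two "none" cases.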
-
getAccessionDate
Returns the accession date for the item, or null if this item is not accessioned. If this is a multi-doc, returns the oldest accession date among its records, corresponding to the first time this resource appeared in the library.
- Specified by:
getAccessionDate in class ItemFileIndexingWriter
- Returns:
- The accession date for the item, or null if this item is not accessioned.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
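The multi-doc rule above amounts to taking the minimum over the records' accession dates while skipping records that are not accessioned. This is a hypothetical sketch of that rule only: it assumes one java.util.Date per record (null when not accessioned) and does not reflect how the real class reads those dates from MmdRecs.

```java
import java.util.Date;

// Hypothetical sketch of the multi-doc rule; not the class's actual code.
final class AccessionDateSketch {

    // Returns the oldest non-null date, or null when no record is accessioned.
    static Date oldestAccessionDate(Date[] perRecordDates) {
        if (perRecordDates == null) {
            return null;
        }
        Date oldest = null;
        for (Date d : perRecordDates) {
            if (d != null && (oldest == null || d.before(oldest))) {
                oldest = d;
            }
        }
        return oldest;
    }
}
```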
-
getCreationDate
Returns the date this item was first created, or null if not available.
- Specified by:
getCreationDate in class ItemFileIndexingWriter
- Returns:
- The item creation date or null
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getReaderClass
Gets the name of the concrete DocReader class that is used to read this type of Document, which is "ItemDocReader".
- Specified by:
getReaderClass in interface DocWriter
- Specified by:
getReaderClass in class ItemFileIndexingWriter
- Returns:
- The String "org.dlese.dpc.index.reader.ItemDocReader".
-
indexFullContentInDefaultAndStems
public boolean indexFullContentInDefaultAndStems()
The default and stems fields are handled by this class, so the full content is not indexed in them.
- Specified by:
indexFullContentInDefaultAndStems in class XMLFileIndexingWriter
- False
-
getAssociatedMmdRecs
Returns the MmdRecs for records in other collections that catalog the same resource, not including myMmdRec.
- Specified by:
getAssociatedMmdRecs in class ItemFileIndexingWriter
- Returns:
- The associated MmdRecs, or null if none
-
getAllMmdRecs
Returns the MmdRecs for all records that catalog this resource, including myMmdRec.
- Specified by:
getAllMmdRecs in class ItemFileIndexingWriter
- Returns:
- All MmdRecs for this resource, null or empty if none
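Read together, getMyMmdRec, getAssociatedMmdRecs and getAllMmdRecs suggest that "all" is this record's MmdRec plus the associated ones from other collections. The sketch below illustrates only that relationship; it is hypothetical, uses a type parameter T as a stand-in for MmdRec (whose API is not shown on this page), and assumes that this record's entry comes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; T stands in for MmdRec.
final class AllRecsSketch {

    // Combines this record's entry with the associated (other-collection)
    // entries. Returns an empty list when neither is present.
    static <T> List<T> allRecs(T myRec, T[] associatedRecs) {
        List<T> all = new ArrayList<>();
        if (myRec != null) {
            all.add(myRec);
        }
        if (associatedRecs != null) {
            all.addAll(Arrays.asList(associatedRecs));
        }
        return all;
    }
}
```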
-
getMyMmdRec
Returns the MmdRec for this record only.
- Specified by:
getMyMmdRec in class ItemFileIndexingWriter
- Returns:
- The MmdRec for this record, or null
-
getValidationReport
Gets a report detailing any errors found in the validation of the data, or null if no error was found.
- Specified by:
getValidationReport in class ItemFileIndexingWriter
- Returns:
- Null if no data validation errors were found, otherwise a String that details the nature of the error.
- Throws:
Exception - If an error occurs while performing the validation.
-
getDocType
Gets the docType attribute of the ADNFileIndexingWriter, which is 'adn'.
- Specified by:
getDocType in interface DocWriter
- Specified by:
getDocType in class ItemFileIndexingWriter
- Returns:
- The docType, which is 'adn'.
-
_getIds
Gets the id(s) for this item. If multiple IDs exist, the first one is the primary ID.
- Specified by:
_getIds in class XMLFileIndexingWriter
- Returns:
- The id value(s)
- Throws:
Exception - If an error occurs
-
getTitle
Gets the title attribute of the ADNFileIndexingWriter object.
- Specified by:
getTitle in class XMLFileIndexingWriter
- Returns:
- The title value
- Throws:
Exception - If an error occurs
-
getDescription
Gets the description attribute of the ADNFileIndexingWriter object.
- Specified by:
getDescription in class XMLFileIndexingWriter
- Returns:
- The description value
- Throws:
Exception - If an error occurs
-
getUrls
Gets the url(s) from the ADN record(s).
- Specified by:
getUrls in class XMLFileIndexingWriter
- Returns:
- The urls value
- Throws:
Exception - If an error occurs
-
getKeywords
Returns the item's keywords sorted and separated by the '+' symbol. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'keywords' and is also indexed in the 'default' field.
- Specified by:
getKeywords in class ItemFileIndexingWriter
- Returns:
- The keywords String
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
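The documented output format (keywords sorted, joined with '+') can be illustrated with a short sketch. This is hypothetical and not the class's code: it assumes the keywords arrive as a List of Strings, trims them, drops blank entries, and sorts in Java's case-sensitive natural String order; the real writer may normalize case or handle blanks differently.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical sketch of the documented output format; not the class's code.
final class KeywordsSketch {

    // Trims keywords, drops nulls and blanks, sorts them in natural
    // (case-sensitive) order and joins them with '+'. Returns "" if none.
    static String joinKeywords(List<String> keywords) {
        if (keywords == null) {
            return "";
        }
        return keywords.stream()
                .filter(Objects::nonNull)
                .map(String::trim)
                .filter(k -> !k.isEmpty())
                .sorted()
                .collect(Collectors.joining("+"));
    }
}
```

For example, the keywords "volcano", " earthquake " and "climate" become "climate+earthquake+volcano". Returning an empty String rather than null fits the statement above that either is acceptable.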
-
getCreatorLastName
Returns the item creator's last name. An empty String or null is acceptable. The String is tokenized, stored and indexed in the 'default' field only.
- Specified by:
getCreatorLastName in class ItemFileIndexingWriter
- Returns:
- The creator's last name String
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getCreator
Returns the item creator's full name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'creator'.
- Specified by:
getCreator in class ItemFileIndexingWriter
- Returns:
- Creator's full name
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getCost
Returns the item's cost. The String is stored and indexed under the field key 'cost'.
- Returns:
- Resource cost
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getBoundingBox
Gets the boundingBox attribute of the ADNFileIndexingWriter object.
- Overrides:
getBoundingBox in class XMLFileIndexingWriter
- Returns:
- The boundingBox value
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getContent
Returns the content of the item this record catalogs, or null if not available, for example the full HTML text of the Web page.
- Specified by:
getContent in class ItemFileIndexingWriter
- Returns:
- The content of the item, or null
-
getContentType
Returns the content type of the item this record catalogs, or null if not available, for example "text/html" or "html".
- Specified by:
getContentType in class ItemFileIndexingWriter
- Returns:
- The content type of the item, or null
-
addFrameworkFields
protected final void addFrameworkFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc) throws Exception
Adds custom fields to the index that are unique to this framework.
- Specified by:
addFrameworkFields in class ItemFileIndexingWriter
- Parameters:
newDoc - The new Document to which the framework-specific fields are added.
existingDoc - The Document that previously existed in the index for this item, if present.
- Throws:
Exception - If an error occurs
-
setIsSingleDoc
public void setIsSingleDoc(boolean isSingleDoc)
Sets whether this writer should write a single-record doc rather than a multi-item doc.
- Parameters:
isSingleDoc - True to write a single-record doc rather than a multi-item doc.
-
getGradeRange
Gets the gradeRange attribute of the ADNFileIndexingWriter object.
- Returns:
- The gradeRange value
-
getResourceTypes
Gets the resourceTypes attribute of the ADNFileIndexingWriter object.
- Returns:
- The resourceTypes value
-
getContentStandards
Gets the contentStandards attribute of the ADNFileIndexingWriter object.
- Returns:
- The contentStandards value
-
getSubjects
Gets the subjects attribute of the ADNFileIndexingWriter object.
- Returns:
- The subjects value
-
getCreatorEmailPrimary
Gets the creator's primary email.
- Returns:
- The creator's primary email.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getCreatorEmailAlt
Gets the creator's alternate email.
- Returns:
- The creator's alternate email.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getOrganizationEmail
Gets the organization email.
- Returns:
- The organization email.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getOrganizationInstName
Gets the organization's institution name. ADN XPath: lifecycle/contributors/contributor/organization/instName
- Returns:
- The organization's institution name.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getOrganizationInstDepartment
Gets the organization's institution department name. ADN XPath: lifecycle/contributors/contributor/organization/instDept
- Returns:
- The organization's institution department name.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getPersonInstName
Gets the person's institution name. ADN XPath: lifecycle/contributors/contributor/person/instName
- Returns:
- The institution name.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getPersonInstDepartment
Gets the person's institution department name. ADN XPath: lifecycle/contributors/contributor/person/instDept
- Returns:
- The institution department name.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getUrlMirrors
Gets the mirror URLs encoded as terms, if any.
- Returns:
- The URL mirrors encoded as terms, or an empty String.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getAudienceToolFor
Gets the audience toolFor value.
- Returns:
- The audience toolFor value.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getAudienceBeneficiary
Gets the audience beneficiary.
- Returns:
- The audience beneficiary.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getAudienceTypicalAgeRange
Gets the audience typical age range.
- Returns:
- The audience typical age range.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getAudienceInstructionalGoal
Gets the audience instructionalGoal.
- Returns:
- The audience instructionalGoal.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getAudienceTeachingMethod
Gets the audience teachingMethod.
- Returns:
- The audience teachingMethod.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getPlaceNames
Gets all place names as text. Place names are extracted from the following XPaths:
general/simplePlacesAndEvents/placeAndEvent/place
geospatialCoverages/geospatialCoverage/boundBox/bbPlaces/place/name
geospatialCoverages/geospatialCoverage/detGeos/detGeo/detPlaces/place/name
- Returns:
- All place names as text.
-
getEventNames
Gets all event names as text. Event names are extracted from the following XPaths:
general/simplePlacesAndEvents/placeAndEvent/event
geospatialCoverages/geospatialCoverage/boundBox/bbEvents/event/name
geospatialCoverages/geospatialCoverage/detGeos/detGeo/detEvents/event/name
- Returns:
- All event names as text.
-
getTemporalCoverageNames
Gets all temporal coverage names as text. Temporal coverage names are extracted from the following XPaths:
general/simpleTemporalCoverages/description
temporalCoverages/timeAndPeriod/periods/period/name
- Returns:
- All temporal coverage names as text.
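getPlaceNames, getEventNames and getTemporalCoverageNames all follow the same pattern: evaluate a fixed list of XPaths against the ADN record and join the matched text. The sketch below shows that pattern with the JDK's javax.xml.xpath API. It is hypothetical and is not the writer's own code: the class and method names are invented, the space separator is an assumption, and it assumes a record without an XML namespace, which real ADN records do declare and which this sketch does not attempt to handle.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; assumes a record without an XML namespace.
final class CoverageNamesSketch {

    // The place-name XPaths listed for getPlaceNames above.
    static final String[] PLACE_XPATHS = {
        "general/simplePlacesAndEvents/placeAndEvent/place",
        "geospatialCoverages/geospatialCoverage/boundBox/bbPlaces/place/name",
        "geospatialCoverages/geospatialCoverage/detGeos/detGeo/detPlaces/place/name"
    };

    // Evaluates each XPath relative to the root element and joins the
    // trimmed, non-empty text of every matching node with single spaces.
    static String namesAsText(String xml, String... xpaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            StringBuilder text = new StringBuilder();
            for (String path : xpaths) {
                NodeList nodes = (NodeList) xpath.evaluate(path, root, XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    String value = nodes.item(i).getTextContent().trim();
                    if (!value.isEmpty()) {
                        if (text.length() > 0) {
                            text.append(' ');
                        }
                        text.append(value);
                    }
                }
            }
            return text.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read record XML", e);
        }
    }
}
```

The same helper applies to event names and temporal coverage names by passing the XPaths listed for those methods instead of PLACE_XPATHS.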
-