Class ItemFileIndexingWriter
- All Implemented Interfaces:
DocWriter
- Direct Known Subclasses:
ADNFileIndexingWriter,DleseIMSFileIndexingWriter
Document for a collection of
item-level metadata records of a specific format (DLESE IMS, ADN-Item, ADN-Collection, etc). The reader
for this type of Document is XMLDocReader or ItemDocReader.
The Lucene Document fields that are created by this class are (in
addition to the ones listed for FileIndexingServiceWriter):
title - The title for the resource. Stored.
description - The description for the resource. Stored.
url - The URL to the resource. Stored.
Stored. Prefixed with a '0' to support wildcard searching.
metadatapfx - The metadata prefix (format) for this record, for example 'adn' or
'oai_dc'. Stored. Prefixed with a '0' to support wildcard searching.
accessionstatus - The accession status for this record. Stored. Prefixed with a '0'
to support wildcard searching.
annotypes - Annotation types that refer to this record. Keyword.
annopathways - Annotation pathways that refer to this record. Keyword.
associatedids - A list of record IDs that refer to the same resource. Keyword.
valid - Indicates whether the record is valid [true | false]. Not stored.
validationreport - Text describing an error in the validation of the data for this
record. Stored. Only indexed if there was a validation error indicated by the valid field containing
false.
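The '0' prefix used on fields such as metadatapfx and accessionstatus can be pictured with the small sketch below. It is an illustration only: the class and method names are hypothetical and not part of ItemFileIndexingWriter, and the rationale (that a query such as 0* can then match every value without a leading wildcard) is an assumption about why the prefix helps.

```java
// Hypothetical helper, not part of the DLESE API: illustrates the '0' prefix convention.
final class WildcardPrefix {
    static final String PREFIX = "0";

    // Prefix the raw value so every indexed term starts with the same character.
    static String encode(String raw) {
        return PREFIX + raw;
    }

    // Strip the prefix again when the original value is needed.
    static String decode(String indexed) {
        return indexed.startsWith(PREFIX) ? indexed.substring(PREFIX.length()) : indexed;
    }

    // Simplified trailing-wildcard match: "0*" matches every encoded term.
    static boolean matchesPrefixQuery(String indexedTerm, String query) {
        if (query.endsWith("*")) {
            return indexedTerm.startsWith(query.substring(0, query.length() - 1));
        }
        return indexedTerm.equals(query);
    }
}
```

With this convention, a query of 0* selects all records that have any value in the field, while 0oai* narrows to values beginning with "oai".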
- Author:
- John Weatherley
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method - Description
protected final void addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, File sourceFile) - Adds fields to the index that are common to all item-level documents.
protected abstract void addFrameworkFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc) - Adds fields to the index that are unique to the given framework.
protected abstract void destroy() - This method is called at the conclusion of processing and may be used for tear-down.
protected abstract Date getAccessionDate - Returns the accession date for the item, or null if this item is not accessioned.
protected abstract String getAccessionStatus - Returns the accession status of this record, for example 'accessioned'.
protected abstract MmdRec[] getAllMmdRecs - Returns the MmdRecs for all records associated with this resource, including myMmdRec.
protected abstract MmdRec[] getAssociatedMmdRecs - Returns the MmdRecs for records in other collections that catalog the same resource.
protected abstract String getContent - Returns the content of the item this record catalogs, or null if not available.
protected abstract String getContentType - Returns the content type of the item this record catalogs, or null if not available.
protected abstract Date getCreationDate - Returns the date this item was first created, or null if not available.
protected abstract String getCreator - Returns the item creator's full name.
protected abstract String getCreatorLastName - Returns the item creator's last name.
abstract String getDocType - Returns a unique document type key for this kind of record, corresponding to the format type.
protected abstract boolean getHasRelatedResource - Returns true if the item has one or more related resources, false otherwise.
protected abstract String getKeywords - Returns the item's keywords sorted and separated by the '+' symbol.
protected ResultDocList getMyAnnoResultDocs - Gets the annotations for this record, null or zero length if none available.
protected abstract MmdRec getMyMmdRec - Returns the MmdRec for this record only.
abstract String getReaderClass - Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".
protected abstract String[] getRelatedResourceIds - Returns the IDs of related resources that are cataloged by ID, or null if none are present.
protected abstract String[] getRelatedResourceUrls - Returns the URLs of related resources that are cataloged by URL, or null if none are present.
protected abstract String getValidationReport - Gets a report detailing any errors found in the validation of the data, or null if no error was found.
protected Date getWhatsNewDate - Returns the date used to determine "What's new" in the library, which is the item's accession date.
protected String getWhatsNewType - Returns 'itemnew' or 'itemannoinprogress' or 'itemannocomplete', whichever came most recently.
void init - Initializes the subclasses and record data service data.
abstract void initItem - This method is called prior to processing and may be used for any necessary set-up.
Methods inherited from class org.dlese.dpc.index.writer.XMLFileIndexingWriter
_getIds, addCustomFields, getBoundingBox, getCollections, getDeletedDoc, getDescription, getDocGroup, getDom4jDoc, getFieldContent, getFieldContent, getFieldName, getIds, getIndex, getMyCollectionDoc, getOaiModtime, getPrimaryId, getRecordDataService, getRelatedIds, getRelatedIdsMap, getRelatedUrls, getRelatedUrlsMap, getTermStringFromStringArray, getTitle, getUrls, getXmlIndexer, getXmlIndexerFieldsConfig, indexFullContentInDefaultAndStems
Methods inherited from class org.dlese.dpc.index.writer.FileIndexingServiceWriter
abortIndexing, addDocToRemove, addToAdminDefaultField, addToDefaultField, create, getConfigAttributes, getDocsource, getFileContent, getFileIndexingPlugin, getFileIndexingService, getLuceneDoc, getPreviousRecordDoc, getSessionAttributes, getSourceDir, getSourceFile, isMakingDeletedDoc, isValidationEnabled, prtln, prtlnErr, setConfigAttributes, setDebug, setFileIndexingPlugin, setFileIndexingService, setIsMakingDeletedDoc, setValidationEnabled
-
Constructor Details
-
ItemFileIndexingWriter
public ItemFileIndexingWriter()
-
-
Method Details
-
getKeywords
Returns the item's keywords sorted and separated by the '+' symbol. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'keywords' and is also indexed in the 'default' field.
- Returns:
- The keywords String
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
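As a rough illustration of the return format described for getKeywords, the sketch below sorts a list of keywords and joins them with '+'. The class name KeywordJoiner and its join method are hypothetical helpers, not part of this API, and natural String ordering is assumed for the sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper showing the "sorted and separated by '+'" format.
final class KeywordJoiner {
    // Returns the keywords in natural String order joined by '+',
    // or an empty String when there are none (null is also acceptable per the contract).
    static String join(List<String> keywords) {
        if (keywords == null || keywords.isEmpty()) {
            return "";
        }
        List<String> sorted = new ArrayList<>(keywords); // copy so the caller's list is untouched
        Collections.sort(sorted);
        return String.join("+", sorted);
    }
}
```

For example, the keywords ocean, climate and geology would be returned as climate+geology+ocean.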
-
getCreatorLastName
Returns the item creator's last name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the 'default' field only.
- Returns:
- The creator's last name String
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getCreator
Returns the item creator's full name. An empty String or null is acceptable. The String is tokenized, stored and indexed under the field key 'creator'.
- Returns:
- Creator's full name
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getAccessionStatus
Returns the accession status of this record, for example 'accessioned'. The String is tokenized, stored and indexed under the field key 'accessionstatus'.
- Returns:
- The accession status.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getAccessionDate
Returns the accession date for the item, or null if this item is not accessioned.
- Returns:
- The accession date for the item, or null if this item is not accessioned.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getCreationDate
Returns the date this item was first created, or null if not available.
- Returns:
- The item creation date or null
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getContent
Returns the content of the item this record catalogs, or null if not available. For example, the full HTML text of the Web page.
- Returns:
- The content of the item, or null
-
getAssociatedMmdRecs
Returns the MmdRecs for records in other collections that catalog the same resource. Does not include myMmdRec.
- Returns:
- The associated MmdRecs, null or empty if none
-
getAllMmdRecs
Returns the MmdRecs for all records associated with this resource, including myMmdRec.
- Returns:
- All MmdRecs for this resource, null or empty if none
-
getMyMmdRec
Returns the MmdRec for this record only.
- Returns:
- The MmdRec for this record, or null
-
getContentType
Returns the content type of the item this record catalogs, or null if not available. For example "text/html" or "html".
- Returns:
- The content type of the item, or null
-
getHasRelatedResource
Returns true if the item has one or more related resources, false otherwise.
- Returns:
- True if the item has one or more related resources, false otherwise.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getRelatedResourceIds
Returns the IDs of related resources that are cataloged by ID, or null if none are present.
- Returns:
- Related resource IDs, or null if none are available
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getRelatedResourceUrls
Returns the URLs of related resources that are cataloged by URL, or null if none are present.
- Returns:
- Related resource URLs, or null if none are available
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
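The three related-resource methods (getHasRelatedResource, getRelatedResourceIds, getRelatedResourceUrls) share a "null if none" contract. The sketch below shows one way a subclass might keep them consistent. The class RelatedResources and its methods are hypothetical, and deriving the boolean from the two arrays is an assumption rather than documented behavior.

```java
import java.util.List;

// Hypothetical helper: keeps the "null if none" contract in one place.
final class RelatedResources {
    // Converts a list to an array, returning null when there is nothing to report.
    static String[] toArrayOrNull(List<String> values) {
        if (values == null || values.isEmpty()) {
            return null;
        }
        return values.toArray(new String[0]);
    }

    // Assumed relationship: the item has a related resource if either array has entries.
    static boolean hasRelatedResource(String[] ids, String[] urls) {
        return (ids != null && ids.length > 0) || (urls != null && urls.length > 0);
    }
}
```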
-
addFrameworkFields
protected abstract void addFrameworkFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc) throws Exception
Adds fields to the index that are unique to the given framework.
Example code:
protected void addFrameworkFields(Document newDoc, Document existingDoc) throws Exception {
    // Add a framework-specific field to the new Document.
    // The Field constructor arguments (store/index options) depend on the Lucene version in use.
    String customContent = "Some content";
    newDoc.add(new Field("mycustomefield", customContent));
}
- Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getDocType
Returns a unique document type key for this kind of record, corresponding to the format type. For example "adn," "dlese_ims," or "dlese_anno". The string is parsed using the Lucene StandardAnalyzer so it must be lowercase and should not contain any stop words.
- Specified by:
getDocType in interface DocWriter
- Specified by:
getDocType in class FileIndexingServiceWriter
- Returns:
- The docType String
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
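The two constraints on the docType key (lowercase, no stop words) could be checked with something like the sketch below. This is not part of the API: the class DocTypeCheck is hypothetical, the stop-word set is a small illustrative subset rather than the analyzer's actual list, and splitting on non-alphanumeric characters is a simplification of how the analyzer tokenizes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sanity check for a docType key.
final class DocTypeCheck {
    // Illustrative subset of common English stop words (assumption, not the analyzer's list).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "or", "to", "in"));

    static boolean isAcceptable(String docType) {
        if (docType == null || docType.isEmpty()) {
            return false;
        }
        // Must already be lowercase.
        if (!docType.equals(docType.toLowerCase(Locale.ROOT))) {
            return false;
        }
        // No part of the key may be a stop word.
        for (String part : docType.split("[^a-z0-9]+")) {
            if (STOP_WORDS.contains(part)) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions "adn" and "dlese_ims" pass, while "ADN" and "the_record" are rejected.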
-
getReaderClass
Gets the fully qualified name of the concrete DocReader class that is used to read this type of Document, for example "org.dlese.dpc.index.reader.ItemDocReader".
- Specified by:
getReaderClass in interface DocWriter
- Specified by:
getReaderClass in class FileIndexingServiceWriter
- Returns:
- The name of the DocReader.
-
initItem
public abstract void initItem(File source, org.apache.lucene.document.Document existingDoc) throws Exception
This method is called prior to processing and may be used for any necessary set-up. This method should throw an Exception with an appropriate message if an error occurs.
- Parameters:
source - The source file being indexed
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
- Throws:
Exception - If an error occurred during set-up.
-
destroy
protected abstract void destroy()
This method is called at the conclusion of processing and may be used for tear-down.
- Specified by:
destroy in class FileIndexingServiceWriter
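The call order implied by initItem (before processing), the field-adding and validation methods, and destroy (at the conclusion) can be sketched with a stand-in class. LifecycleSketch and DemoWriter below are hypothetical and deliberately avoid the real Lucene and DLESE types. Calling destroy from a finally block is an assumption, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the real writer: only the call order is modeled.
abstract class LifecycleSketch {
    protected final List<String> log = new ArrayList<>();

    abstract void initItem(String source);
    abstract void addFrameworkFields();
    abstract String getValidationReport();
    abstract void destroy();

    // Hypothetical driver: set-up, field creation, validation, then tear-down.
    final List<String> process(String source) {
        try {
            initItem(source);
            addFrameworkFields();
            String report = getValidationReport();
            log.add(report == null ? "valid" : "invalid: " + report);
        } finally {
            destroy(); // assumed to run even if an earlier step fails
        }
        return log;
    }
}

// Minimal concrete subclass that records each callback.
final class DemoWriter extends LifecycleSketch {
    @Override void initItem(String source) { log.add("initItem " + source); }
    @Override void addFrameworkFields() { log.add("addFrameworkFields"); }
    @Override String getValidationReport() { return null; }
    @Override void destroy() { log.add("destroy"); }
}
```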
-
getValidationReport
Gets a report detailing any errors found in the validation of the data, or null if no error was found. This could be implemented by simply performing XML schema validation on the file, or can involve more customized validation of the data if necessary. This method is called after all other methods that access the data (XMLFileIndexingWriter.getTitle(), addFrameworkFields(Document, Document), etc.) so that data verification can be done during those calls, if needed.
- Overrides:
getValidationReport in class FileIndexingServiceWriter
- Returns:
- Null if no data validation errors were found, otherwise a String that details the nature of the error.
- Throws:
Exception - If an error occurs while performing the validation.
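For the "simply performing XML schema validation" option, a minimal sketch using the standard javax.xml.validation API might look like the following. The class SchemaValidationSketch and its method are hypothetical; taking the XML and schema as Strings is a simplification for illustration, whereas a real implementation would presumably read the source file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: returns null when valid, otherwise a description of the error.
final class SchemaValidationSketch {
    static String validationReport(String xml, String xsd) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // A broken schema is a failure to perform validation, not a validation result.
            throw new IllegalArgumentException("Schema could not be loaded", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null; // no validation errors
        } catch (SAXException e) {
            String message = e.getMessage();
            return message != null ? message : e.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The null-versus-String return mirrors the contract above: null means the record is valid, and any other value is the text that would be stored in the validationreport field.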
-
init
Initializes the subclasses and record data service data.
- Specified by:
init in class XMLFileIndexingWriter
- Parameters:
source - The source file being indexed.
existingDoc - A Document that previously existed in the index for this item, if present
- Throws:
Exception - Thrown if there is an error reading the XML map
-
getMyAnnoResultDocs
Gets the annotations for this record, null or zero length if none available. Overrides method in XMLFileIndexingWriter because IDs need initializing.
- Overrides:
getMyAnnoResultDocs in class XMLFileIndexingWriter
- Returns:
- The myAnnoResultDocs value
- Throws:
Exception - If error
-
addFields
protected final void addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, File sourceFile) throws Exception
Adds fields to the index that are common to all item-level documents. These include the title, description, id and url as well as accession status, annotation references, and collection(s).
- Specified by:
addFields in class XMLFileIndexingWriter
- Parameters:
newDoc - The new Document that is being created for this resource
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present
sourceFile - The source file that is being indexed.
- Throws:
Exception - If an error occurs
-
getWhatsNewDate
Returns the date used to determine "What's new" in the library, which is the item's accession date.
- Specified by:
getWhatsNewDate in class XMLFileIndexingWriter
- Returns:
- The what's new date for the item
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getWhatsNewType
Returns 'itemnew' or 'itemannoinprogress' or 'itemannocomplete', whichever came most recently.
- Specified by:
getWhatsNewType in class XMLFileIndexingWriter
- Returns:
- The string 'itemnew' or 'itemannoinprogress' or 'itemannocomplete'.
- Throws:
Exception - If there is an error getting the what's new type.
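The "whichever came most recently" rule can be sketched as a comparison of three dates. The class WhatsNewSketch and its parameter names are hypothetical, and mapping 'itemnew' to the accession date and the other two values to annotation dates is an assumption made for illustration.

```java
import java.util.Date;

// Hypothetical helper: picks the type whose date is the latest.
final class WhatsNewSketch {
    static String whatsNewType(Date accessioned, Date annoInProgress, Date annoComplete) {
        String type = "itemnew"; // assumed default, tied to the accession date
        Date latest = accessioned;
        if (annoInProgress != null && (latest == null || annoInProgress.after(latest))) {
            type = "itemannoinprogress";
            latest = annoInProgress;
        }
        if (annoComplete != null && (latest == null || annoComplete.after(latest))) {
            type = "itemannocomplete";
        }
        return type;
    }
}
```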
-