Package org.dlese.dpc.index.writer
Class SimpleXMLFileIndexingWriter
java.lang.Object
org.dlese.dpc.index.writer.FileIndexingServiceWriter
org.dlese.dpc.index.writer.XMLFileIndexingWriter
org.dlese.dpc.index.writer.SimpleXMLFileIndexingWriter
- All Implemented Interfaces:
DocWriter
This is the default writer for generic XML formats. It creates a Lucene
Document from any valid XML file by stripping the XML tags to extract and
index the content. The full content of all Elements and Attributes is indexed in the default and
admindefault fields, and is also stemmed and indexed in the stems field. The reader for this type of Document is
XMLDocReader.
- Author:
- John Weatherley
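The tag-stripping step described above can be sketched with the JDK's DOM parser. This is a minimal, hypothetical illustration, not the DLESE implementation: the names `XmlContentExtractor` and `extractContent` are invented here, and the sketch simply concatenates element text and attribute values into one string of the kind that could be indexed in a full-text field. It skips namespace declarations and performs no stemming.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: gathers element text and attribute values from an XML record. */
final class XmlContentExtractor {
    private XmlContentExtractor() {}

    /** Returns element text and attribute values, element by element, separated by spaces. */
    static String extractContent(String xml) throws Exception {
        org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        collect(dom.getDocumentElement(), out);
        return out.toString().trim();
    }

    private static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // Attribute values count as content; namespace declarations do not.
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                if (!attr.getNodeName().startsWith("xmlns")) {
                    append(out, attr.getNodeValue());
                }
            }
        } else if (node.getNodeType() == Node.TEXT_NODE
                || node.getNodeType() == Node.CDATA_SECTION_NODE) {
            append(out, node.getNodeValue());
        }
        // Recurse into children so nested elements are included.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    /** Appends non-blank text followed by a single separating space. */
    private static void append(StringBuilder out, String text) {
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            out.append(trimmed).append(' ');
        }
    }
}
```

For example, `extractContent("<record id='r1'><title>Ocean Waves</title></record>")` yields `"r1 Ocean Waves"`.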
Constructor Summary
- SimpleXMLFileIndexingWriter(): Constructor for the SimpleXMLFileIndexingWriter object.
Method Summary
- protected String[] _getIds(): Returns null so that the superclass handles the IDs.
- protected void addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, File sourceFile): Nothing to do here.
- protected void destroy(): Does nothing.
- getDescription(): Gets the description attribute of the SimpleXMLFileIndexingWriter object.
- getDocType(): Gets the XML format for this document, for example "oai_dc", "adn", "dlese_ims", or "dlese_anno".
- getReaderClass(): Gets the name of the concrete DocReader class that is used to read this type of Document, which is "org.dlese.dpc.index.reader.XMLDocReader".
- getTitle(): Gets the title attribute of the SimpleXMLFileIndexingWriter object.
- String[] getUrls(): Gets the urls attribute of the SimpleXMLFileIndexingWriter object.
- protected String getValidationReport(): Gets a report detailing any errors found in the validation of the data, or null if no errors were found.
- protected Date getWhatsNewDate(): Returns the date used to determine "What's new" in the library, which is null (unknown).
- protected String getWhatsNewType(): Returns null (unknown).
- boolean indexFullContentInDefaultAndStems(): Place the entire XML content into the default and stems search fields.
- void init(File sourceFile, org.apache.lucene.document.Document existingDoc): This method is called prior to processing and may be used for any necessary set-up.
Methods inherited from class org.dlese.dpc.index.writer.XMLFileIndexingWriter:
addCustomFields, getBoundingBox, getCollections, getDeletedDoc, getDocGroup, getDom4jDoc, getFieldContent, getFieldContent, getFieldName, getIds, getIndex, getMyAnnoResultDocs, getMyCollectionDoc, getOaiModtime, getPrimaryId, getRecordDataService, getRelatedIds, getRelatedIdsMap, getRelatedUrls, getRelatedUrlsMap, getTermStringFromStringArray, getXmlIndexer, getXmlIndexerFieldsConfig
Methods inherited from class org.dlese.dpc.index.writer.FileIndexingServiceWriter:
abortIndexing, addDocToRemove, addToAdminDefaultField, addToDefaultField, create, getConfigAttributes, getDocsource, getFileContent, getFileIndexingPlugin, getFileIndexingService, getLuceneDoc, getPreviousRecordDoc, getSessionAttributes, getSourceDir, getSourceFile, isMakingDeletedDoc, isValidationEnabled, prtln, prtlnErr, setConfigAttributes, setDebug, setFileIndexingPlugin, setFileIndexingService, setIsMakingDeletedDoc, setValidationEnabled
-
Constructor Details
-
SimpleXMLFileIndexingWriter
public SimpleXMLFileIndexingWriter()
Constructor for the SimpleXMLFileIndexingWriter object.
-
-
Method Details
-
getDocType
Gets the XML format for this document, for example "oai_dc", "adn", "dlese_ims", or "dlese_anno".
- Specified by:
getDocType in interface DocWriter
- Specified by:
getDocType in class FileIndexingServiceWriter
- Returns:
- The docType value
- Throws:
Exception - If an error occurs.
-
getReaderClass
Gets the name of the concrete DocReader class that is used to read this type of Document, which is "org.dlese.dpc.index.reader.XMLDocReader".
- Specified by:
getReaderClass in interface DocWriter
- Specified by:
getReaderClass in class FileIndexingServiceWriter
- Returns:
- The String "org.dlese.dpc.index.reader.XMLDocReader".
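Because the writer reports its reader only as a class name string, a caller can resolve that name at run time with reflection. The sketch below assumes a hypothetical `ReaderSketch` interface standing in for `DocReader` and a hypothetical `XmlReaderSketch` standing in for `XMLDocReader`; how the DLESE framework actually instantiates readers is not described in this documentation.

```java
/** Hypothetical stand-in for the DocReader interface. */
interface ReaderSketch {
    String describe();
}

/** Hypothetical stand-in for a concrete reader such as XMLDocReader. */
final class XmlReaderSketch implements ReaderSketch {
    public XmlReaderSketch() {}

    @Override
    public String describe() {
        return "generic XML reader";
    }
}

final class ReaderLoader {
    private ReaderLoader() {}

    /**
     * Loads the named class, checks that it implements ReaderSketch
     * (throwing ClassCastException if not), and creates an instance
     * through its no-argument constructor.
     */
    static ReaderSketch newReader(String className) throws ReflectiveOperationException {
        Class<? extends ReaderSketch> cls =
                Class.forName(className).asSubclass(ReaderSketch.class);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

Checking the type with `asSubclass` before instantiation means a misconfigured class name fails fast instead of producing an object of the wrong kind.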
-
init
This method is called prior to processing and may be used for any necessary set-up. This method should throw an exception with an appropriate message if an error occurs.
- Specified by:
init in class XMLFileIndexingWriter
- Parameters:
sourceFile - The source file being indexed.
existingDoc - An existing Document for this resource that already resides in the index.
- Throws:
Exception - If an error occurs.
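The set-up and tear-down hooks can be pictured as a template method in which the base class calls `init` before doing its own work, then `addFields`, then `destroy`. This is a hypothetical model only: `LifecycleWriterSketch`, `SimpleWriterSketch`, `index`, and the `steps` list are invented for illustration, and using `finally` to guarantee that `destroy` runs is a design choice of the sketch, not documented behavior of `FileIndexingServiceWriter`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class; it models only the hook order, not the real FileIndexingServiceWriter. */
abstract class LifecycleWriterSketch {
    /** Records the steps so the call order can be inspected. */
    final List<String> steps = new ArrayList<>();

    /** Runs init, the base-class work, and addFields; destroy always runs last. */
    final void index(String sourceName) throws Exception {
        try {
            init(sourceName);
            steps.add("base fields for " + sourceName); // stands in for work the base class does itself
            addFields(sourceName);
        } finally {
            destroy();
        }
    }

    protected abstract void init(String sourceName) throws Exception;

    protected abstract void addFields(String sourceName) throws Exception;

    protected abstract void destroy();
}

/** Mirrors SimpleXMLFileIndexingWriter: the hooks contribute no fields of their own. */
final class SimpleWriterSketch extends LifecycleWriterSketch {
    @Override
    protected void init(String sourceName) {
        steps.add("init"); // recorded only to show when set-up runs
    }

    @Override
    protected void addFields(String sourceName) {
        // Nothing to do here: all fields come from the base class.
    }

    @Override
    protected void destroy() {
        steps.add("destroy"); // recorded only to show when tear-down runs
    }
}
```

After `new SimpleWriterSketch().index("record.xml")`, `steps` holds `init`, `base fields for record.xml`, `destroy`, in that order.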
-
getWhatsNewDate
Returns the date used to determine "What's new" in the library, which is null (unknown).
- Specified by:
getWhatsNewDate in class XMLFileIndexingWriter
- Returns:
- The what's new date for the item
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-
getWhatsNewType
Returns null (unknown).
- Specified by:
getWhatsNewType in class XMLFileIndexingWriter
- Returns:
- null.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
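Since both "What's new" methods return null (unknown) for this writer, code that consumes them has to treat null as "no value". A minimal sketch, assuming hypothetical field names `whatsnewdate` and `whatsnewtype` and a `yyyyMMdd` UTC date encoding; none of these come from the DLESE index schema.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TimeZone;

final class WhatsNewFieldsSketch {
    private WhatsNewFieldsSketch() {}

    /** Builds hypothetical "what's new" fields, skipping any value that is null (unknown). */
    static Map<String, String> whatsNewFields(Date whatsNewDate, String whatsNewType) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (whatsNewDate != null) {
            // A fixed-width, UTC date string sorts correctly as plain text.
            SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            fields.put("whatsnewdate", fmt.format(whatsNewDate));
        }
        if (whatsNewType != null) {
            fields.put("whatsnewtype", whatsNewType);
        }
        return fields;
    }
}
```

With the values this writer returns, `whatsNewFields(null, null)` produces an empty map, so the record simply never appears in a "What's new" listing.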
-
destroy
protected void destroy()
Does nothing.
- Specified by:
destroy in class FileIndexingServiceWriter
-
getValidationReport
Gets a report detailing any errors found in the validation of the data, or null if no errors were found. This method performs schema validation over the XML.
- Overrides:
getValidationReport in class FileIndexingServiceWriter
- Returns:
- Null if no data validation errors were found, otherwise a String that details the nature of the error.
- Throws:
Exception - If an error occurs while performing the validation.
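The schema validation performed by `getValidationReport` can be approximated with the standard `javax.xml.validation` API. In this sketch the schema is passed in as a string, which is an assumption: where the real writer obtains its schema is not documented here. The sketch also reports only the first error, whereas a fuller report could collect every error through a custom `ErrorHandler`. The class name `ValidationReportSketch` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class ValidationReportSketch {
    private ValidationReportSketch() {}

    /**
     * Validates xml against the W3C XML Schema in xsd. Returns null if the
     * document is valid, otherwise a message describing the first error.
     * A malformed schema propagates as an exception, mirroring
     * "Exception - If an error occurs while performing the validation."
     */
    static String validationReport(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validation error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            String message = e.getMessage();
            return message != null ? message : e.toString();
        }
    }
}
```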
-
_getIds
Returns null so that the superclass handles the IDs.
- Specified by:
_getIds in class XMLFileIndexingWriter
- Returns:
- Null
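Returning null "to handle by super" is a hook-with-fallback pattern: the base class calls the subclass hook and falls back to its own logic when the hook returns null. The sketch below models that pattern with hypothetical names (`IdHandlingSketch`, `idsFromBaseClass`, `DeferringIdSketch`); the rule `XMLFileIndexingWriter` actually uses to derive IDs is not documented on this page.

```java
/** Hypothetical base class modelling "return null to let the superclass handle it". */
abstract class IdHandlingSketch {
    /** Subclass hook: the record's IDs, or null to defer to the base class. */
    protected abstract String[] _getIds();

    /** Hypothetical base-class rule, used only when the hook returns null. */
    protected String[] idsFromBaseClass() {
        return new String[] {"ID-FROM-BASE-CLASS"};
    }

    /** Prefers the subclass answer and falls back to the base-class rule on null. */
    final String[] getIds() {
        String[] ids = _getIds();
        return ids != null ? ids : idsFromBaseClass();
    }
}

/** Mirrors SimpleXMLFileIndexingWriter._getIds(), which returns null. */
final class DeferringIdSketch extends IdHandlingSketch {
    @Override
    protected String[] _getIds() {
        return null;
    }
}
```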
-
getUrls
Gets the urls attribute of the SimpleXMLFileIndexingWriter object
- Specified by:
getUrls in class XMLFileIndexingWriter
- Returns:
- The urls value
-
getDescription
Gets the description attribute of the SimpleXMLFileIndexingWriter object
- Specified by:
getDescription in class XMLFileIndexingWriter
- Returns:
- The description value
-
getTitle
Gets the title attribute of the SimpleXMLFileIndexingWriter object
- Specified by:
getTitle in class XMLFileIndexingWriter
- Returns:
- The title value
-
indexFullContentInDefaultAndStems
public boolean indexFullContentInDefaultAndStems()
Place the entire XML content into the default and stems search fields.
- Specified by:
indexFullContentInDefaultAndStems in class XMLFileIndexingWriter
- Returns:
- True
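A boolean hook like this one typically lets the base class decide whether the full extracted content goes into the default and stems fields. The following is a hypothetical sketch of that gating with invented names (`FullContentSketch`, `fullContentFields`, `GenericXmlSketch`); it stores the stems value unstemmed, whereas a real Lucene index would pass it through a stemming analyzer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical base class showing how a boolean hook can gate which fields receive the full content. */
abstract class FullContentSketch {
    protected abstract boolean indexFullContentInDefaultAndStems();

    /** Returns field name to value; empty when the subclass opts out. */
    final Map<String, String> fullContentFields(String fullContent) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (indexFullContentInDefaultAndStems()) {
            fields.put("default", fullContent);
            fields.put("stems", fullContent); // stored unstemmed in this sketch
        }
        return fields;
    }
}

/** Mirrors SimpleXMLFileIndexingWriter, whose hook returns true. */
final class GenericXmlSketch extends FullContentSketch {
    @Override
    protected boolean indexFullContentInDefaultAndStems() {
        return true;
    }
}
```

Keeping the decision in a single overridable method lets format-specific writers opt out of full-content indexing without duplicating the base class's field-building code.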
-
addFields
protected void addFields(org.apache.lucene.document.Document newDoc, org.apache.lucene.document.Document existingDoc, File sourceFile) throws Exception
Nothing to do here. All functionality is handled by the superclass.
- Specified by:
addFields in class XMLFileIndexingWriter
- Parameters:
newDoc - The new Document that is being created for this resource.
existingDoc - An existing Document that currently resides in the index for the given resource, or null if none was previously present.
sourceFile - The source file being indexed.
- Throws:
Exception - This method should throw an Exception with an appropriate error message if an error occurs.
-