Package org.dlese.dpc.index.writer.xml
Class XMLIndexer
java.lang.Object
org.dlese.dpc.index.writer.xml.XMLIndexer
Adds index fields to a Lucene Document from any well-formed XML. Individual field names are derived from the xPath to each element and attribute in the XML instance document. Fields are encoded to support text, keyword and stemmed search. Also creates standard fields for IDs, URLs, title, description and geospatial bounding box footprint. The 'default' and 'stems' fields are also indexed as text and stemmed text, respectively.
An XMLIndexerFieldsConfig may be supplied to configure specific search fields for given XML formats. If a field is defined in the XMLIndexerFieldsConfig, and content is available at the given xPath, it will override the value set for ids, urls, title or description. In addition, field values configured by schema override those configured by xmlFormat.
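The override order described above can be pictured as a simple precedence lookup. The following is a minimal sketch, not the XMLIndexer implementation: the class FieldPrecedenceSketch, the method resolve and the sample values are hypothetical, and it assumes that a missing or blank configured value falls through to the next source.

```java
import java.util.stream.Stream;

/** Hypothetical illustration of the documented precedence; not part of the DLESE API. */
final class FieldPrecedenceSketch {

    /**
     * Picks the first non-blank value in precedence order:
     * schema-configured, then xmlFormat-configured, then the indexer's own default.
     * Returns null if none of the three has content.
     */
    static String resolve(String bySchema, String byXmlFormat, String defaultValue) {
        return Stream.of(bySchema, byXmlFormat, defaultValue)
                .filter(v -> v != null && !v.trim().isEmpty())
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        // No schema-level value: the xmlFormat-level value is used.
        System.out.println(resolve(null, "Title from xmlFormat config", "Default title"));
        // Both configured: the schema-level value takes precedence.
        System.out.println(resolve("Title from schema config", "Title from xmlFormat config", "Default title"));
    }
}
```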
- Author:
- John Weatherley
Constructor Summary
Constructors
Constructor - Description
XMLIndexer(String xmlString, String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig) - Constructor for the XMLIndexer object
XMLIndexer(URL urlToXml, String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig) - Constructor for the XMLIndexer object
XMLIndexer(org.dom4j.Document localizedXmlDocument, String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig) - Constructor for the XMLIndexer object
Method Summary
Modifier and Type, Method - Description
getBoundingBox() - Returns the value of boundingBox.
getDescription() - Returns the value of description.
getFullXmlAttributeContent() - Gets the full content of each Attribute in the XML.
getFullXmlElementContent() - Gets the full content of each Element in the XML.
String[] getIds() - Returns the value of ids.
String[] getIdsEncoded() - Returns unique IDs for the item being indexed encoded for indexing.
getRelatedIds() - Gets the ids of related records.
getRelatedIdsMap() - Gets the ids of related records.
getRelatedUrls() - Gets the urls of related records.
getRelatedUrlsMap() - Gets the urls of related records.
getTitle() - Returns the value of title.
String[] getUrls() - Returns the value of urls.
org.dom4j.Document getXmlDocument() - Gets the localized Dom4j Document for this XML instance.
getXPathFieldsPrefix() - Returns the value of xPathFieldsPrefix, or null if none.
void indexFields(org.apache.lucene.document.Document luceneDoc) - Indexes the contents of the XML, adding fields to the Lucene Document that is supplied.
boolean indexJavaBeanFields(org.apache.lucene.document.Document luceneDoc) - Indexes Java Bean XML that was encoded with the java.beans.XMLEncoder class, using the bean properties as field names.
void indexXpathFields(org.apache.lucene.document.Document luceneDoc) - Indexes the content of each element and attribute in the source XML as individual search fields, using the xPath to the element or attribute as the field name.
void setBoundingBox(BoundingBox boundingBox) - Sets the value of boundingBox.
void setDescription(String description) - Sets the value of description.
void setIds - Sets the value of ids.
void setIndexDefaultAndStemsField(boolean indexDefaultAndStemsField) - Sets whether to index the default, admindefault, and stems fields for this record.
void setTitle - Sets the value of title.
void setUrls - Sets the value of urls.
void setXPathFieldsPrefix(String xPathFieldsPrefix) - Sets the value of xPathFieldsPrefix, which is prepended to the xPath fields when indexed.
-
Constructor Details
-
XMLIndexer
public XMLIndexer(org.dom4j.Document localizedXmlDocument, String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig)
Constructor for the XMLIndexer object
- Parameters:
localizedXmlDocument - A localized XML Document
xmlFormat - The XML format being indexed, for example adn or oai_dc
xmlIndexerFieldsConfig - The config, or null if not used
-
XMLIndexer
public XMLIndexer(String xmlString, String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig) throws Exception
Constructor for the XMLIndexer object
- Parameters:
xmlString - A valid XML string
xmlFormat - The XML format being indexed, for example adn or oai_dc
xmlIndexerFieldsConfig - The config, or null if not used
- Throws:
Exception - If error
-
XMLIndexer
public XMLIndexer(URL urlToXml, String xmlFormat, XMLIndexerFieldsConfig xmlIndexerFieldsConfig) throws Exception
Constructor for the XMLIndexer object
- Parameters:
urlToXml - URL to an XML document
xmlFormat - The XML format being indexed, for example adn or oai_dc
xmlIndexerFieldsConfig - The config, or null if not used
- Throws:
Exception - If error
-
-
Method Details
-
setIndexDefaultAndStemsField
public void setIndexDefaultAndStemsField(boolean indexDefaultAndStemsField) throws IllegalStateException
Sets whether to index the default, admindefault, and stems fields for this record.
- Parameters:
indexDefaultAndStemsField - The value to assign indexDefaultAndStemsField.
- Throws:
IllegalStateException - If called after indexFields has been called
-
getTitle
Returns the value of title.
- Returns:
- The title value
- Throws:
IllegalStateException - If called before indexFields has been called
-
setTitle
Sets the value of title.
- Parameters:
title - The value to assign title.
- Throws:
IllegalStateException - If called after indexFields has been called
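The Throws clauses on the getters and setters imply two phases: setters are valid only before indexFields is called, and most getters only after it. The sketch below mimics that call-order contract with a hypothetical stand-in class, IndexerLifecycleSketch. It is not the real XMLIndexer, its indexFields does no indexing, and the exception messages are made up.

```java
/** Hypothetical stand-in that mimics the documented call-order contract; not the real XMLIndexer. */
final class IndexerLifecycleSketch {
    private String title;
    private boolean indexed = false;

    /** Allowed only before indexing, as documented for the setters. */
    void setTitle(String title) {
        if (indexed) {
            throw new IllegalStateException("setTitle called after indexFields");
        }
        this.title = title;
    }

    /** Stands in for indexFields; here it only flips the state flag. */
    void indexFields() {
        indexed = true;
    }

    /** Allowed only after indexing, as documented for the getters. */
    String getTitle() {
        if (!indexed) {
            throw new IllegalStateException("getTitle called before indexFields");
        }
        return title;
    }

    public static void main(String[] args) {
        IndexerLifecycleSketch indexer = new IndexerLifecycleSketch();
        indexer.setTitle("Sample title");        // 1. configure overrides first
        indexer.indexFields();                   // 2. then index
        System.out.println(indexer.getTitle());  // 3. read values last
    }
}
```

With the real class, the same ordering would apply: call any setters, then indexFields, then the getters.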
-
getDescription
Returns the value of description.
- Returns:
- The description value
- Throws:
IllegalStateException - If called before indexFields has been called
-
setDescription
Sets the value of description.
- Parameters:
description - The value to assign description.
- Throws:
IllegalStateException - If called after indexFields has been called
-
getUrls
Returns the value of urls.
- Returns:
- The urls value
- Throws:
IllegalStateException - If called before indexFields has been called
-
setUrls
Sets the value of urls.
- Parameters:
urls - The value to assign urls.
- Throws:
IllegalStateException - If called after indexFields has been called
-
getIds
Returns the value of ids.
- Returns:
- The ids value
- Throws:
IllegalStateException - If called before indexFields has been called
-
setIds
Sets the value of ids.
- Parameters:
ids - The value to assign ids.
- Throws:
IllegalStateException - If called after indexFields has been called
-
getIdsEncoded
Returns unique IDs for the item being indexed, encoded for indexing. If more than one ID is present, the first one is the primary.
- Returns:
- The id Strings encoded for indexing
- Throws:
IllegalStateException - If called before indexFields has been called
-
getRelatedIds
Gets the ids of related records.
- Returns:
- The related ids
- Throws:
IllegalStateException - If called before indexFields has been called
-
getRelatedUrls
Gets the urls of related records.
- Returns:
- The related urls
- Throws:
IllegalStateException - If called before indexFields has been called
-
getRelatedIdsMap
Gets the ids of related records. The Map key contains the relationship (isAnnotatedBy, etc.) and the Map value contains a List of Strings that indicate the ids of the target records.
- Returns:
- The related ids
- Throws:
IllegalStateException - If called before indexFields has been called
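To make the shape of the returned Map concrete, here is a small sketch that builds a relationship-to-ids map by hand and flattens it into a single list. The class RelatedIdsSketch, the record ids and the second relationship name (isReferencedBy) are made up for illustration; whether getRelatedIds returns exactly this flattened view is an assumption, not something the documentation states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the relationship -> ids structure; not part of the DLESE API. */
final class RelatedIdsSketch {

    /** Flattens relationship -> ids into one list of ids, preserving the map's iteration order. */
    static List<String> allIds(Map<String, List<String>> relatedIdsMap) {
        List<String> ids = new ArrayList<>();
        for (List<String> targets : relatedIdsMap.values()) {
            ids.addAll(targets);
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, List<String>> related = new LinkedHashMap<>();
        related.put("isAnnotatedBy", Arrays.asList("ANNO-001", "ANNO-002")); // made-up ids
        related.put("isReferencedBy", Arrays.asList("REC-123"));             // made-up relationship and id

        for (Map.Entry<String, List<String>> entry : related.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
        System.out.println(allIds(related));
    }
}
```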
-
getRelatedUrlsMap
Gets the urls of related records. The Map key contains the relationship (isAnnotatedBy, etc.) and the Map value contains a List of Strings that indicate the urls of the target records.
- Returns:
- The related urls
- Throws:
IllegalStateException - If called before indexFields has been called
-
getXPathFieldsPrefix
Returns the value of xPathFieldsPrefix, or null if none.
-
setXPathFieldsPrefix
Sets the value of xPathFieldsPrefix, which is prepended to the xPath fields when indexed. Set to null to use none (default).
- Parameters:
xPathFieldsPrefix - The value to prepend to the xPath fields, or null for none
- Throws:
IllegalStateException
-
getBoundingBox
Returns the value of boundingBox.
-
setBoundingBox
Sets the value of boundingBox.
- Parameters:
boundingBox - The value to assign boundingBox.
-
getFullXmlElementContent
Gets the full content of each Element in the XML. Attribute content is not included. If this is a Java Bean, gets the content of all Bean properties. The indexFields method must be called before using this method.
- Returns:
- The full Element content
- Throws:
IllegalStateException - If called before indexFields has been called
-
getFullXmlAttributeContent
Gets the full content of each Attribute in the XML. Element content is not included. The indexFields method must be called before using this method.
- Returns:
- The full Attribute content
- Throws:
IllegalStateException - If called before indexFields has been called
-
getXmlDocument
public org.dom4j.Document getXmlDocument()
Gets the localized Dom4j Document for this XML instance.
- Returns:
- The xml Document
-
indexFields
Indexes the contents of the XML, adding fields to the Lucene Document that is supplied.
- Parameters:
luceneDoc - The Document to add fields to
- Throws:
Exception - If error, provides an appropriate message to display in indexing reports.
-
indexXpathFields
Indexes the content of each element and attribute in the source XML as individual search fields, using the xPath to the element or attribute as the field name. If an xPath field prefix has been indicated, it will be inserted at the beginning of the field path.
- Parameters:
luceneDoc - The Document to add fields to
- Throws:
Exception - If error, provides an appropriate message to display in indexing reports.
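As a rough picture of xPath-derived field names, the sketch below walks an XML string with the JDK's DOM parser and collects one path per element and attribute, with an optional prefix placed at the front. It is an illustration only: the class XPathFieldNamesSketch is hypothetical, and the exact path syntax XMLIndexer uses (for example how attributes or repeated elements are written) is not specified in this documentation, so the "/a/b" and "/a/b/@attr" forms here are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical illustration of xPath-style field names; not the XMLIndexer implementation. */
final class XPathFieldNamesSketch {

    /** Returns one path per element and attribute, each starting with the prefix (if any). */
    static List<String> fieldNames(String xml, String prefix) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<String> names = new ArrayList<>();
        collect(doc.getDocumentElement(), prefix == null ? "" : prefix, names);
        return names;
    }

    /** Depth-first walk: records the element path, then its attributes, then its child elements. */
    private static void collect(Element element, String parentPath, List<String> names) {
        String path = parentPath + "/" + element.getNodeName();
        names.add(path);
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            names.add(path + "/@" + attributes.item(i).getNodeName());
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                collect((Element) child, path, names);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<record id=\"r1\"><title>Rivers</title></record>";
        System.out.println(fieldNames(xml, null));      // [/record, /record/@id, /record/title]
        System.out.println(fieldNames(xml, "/prefix")); // [/prefix/record, /prefix/record/@id, /prefix/record/title]
    }
}
```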
-
indexJavaBeanFields
Indexes Java Bean XML that was encoded with the java.beans.XMLEncoder class, using the bean properties as field names. If this is not Java Bean encoded XML, nothing is done and false is returned.
- Parameters:
luceneDoc - The Document to add fields to
- Returns:
- True if this is a Java Bean and property fields were indexed.
- Throws:
Exception - If error, provides an appropriate message to display in indexing reports.
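For readers unfamiliar with XMLEncoder output, the sketch below shows one plausible way to recognize it and read simple bean properties. It relies on the fact that java.beans.XMLEncoder writes a root element of the form <java class="java.beans.XMLDecoder"> and encodes properties as <void property="..."> elements. How XMLIndexer itself detects Java Bean XML is not documented here, so the detection rule, the class JavaBeanXmlSketch and the sample bean class name example.Resource are all assumptions.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical illustration of reading XMLEncoder-style XML; not the XMLIndexer implementation. */
final class JavaBeanXmlSketch {

    /** Returns property -> text value pairs, or an empty map if the XML is not XMLEncoder output. */
    static Map<String, String> beanProperties(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Map<String, String> properties = new LinkedHashMap<>();
        Element root = doc.getDocumentElement();
        // Assumed detection rule: XMLEncoder output has a <java class="java.beans.XMLDecoder"> root.
        boolean isBeanXml = "java".equals(root.getNodeName())
                && "java.beans.XMLDecoder".equals(root.getAttribute("class"));
        if (!isBeanXml) {
            return properties; // analogous to indexJavaBeanFields doing nothing and returning false
        }
        NodeList voids = root.getElementsByTagName("void");
        for (int i = 0; i < voids.getLength(); i++) {
            Element property = (Element) voids.item(i);
            if (property.hasAttribute("property")) {
                // Simple properties only: nested beans would need a recursive walk.
                properties.put(property.getAttribute("property"), property.getTextContent().trim());
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        String beanXml = "<java version=\"1.8.0\" class=\"java.beans.XMLDecoder\">"
                + "<object class=\"example.Resource\">"
                + "<void property=\"title\"><string>Rivers</string></void>"
                + "</object></java>";
        System.out.println(beanProperties(beanXml));
        System.out.println(beanProperties("<record><title>Rivers</title></record>"));
    }
}
```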
-