Package org.dlese.dpc.index.writer
Class IndexingTools
java.lang.Object
org.dlese.dpc.index.writer.IndexingTools
Tools to aid in indexing.
- Author:
- John Weatherley
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type   Method   Description
static final void addToAdminDefaultField(org.apache.lucene.document.Document myDoc, String content)
  Indexes the given text into the admin default field.
static final void addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc, String content)
  Indexes the given text into the default and stems fields.
static final String encodeToTerm(String text)
  Same as org.dlese.dpc.index.SimpleLuceneIndex#encodeToTerm(String).
static final String encodeToTerm(String text, boolean encodeWildCards)
  Same as org.dlese.dpc.index.SimpleLuceneIndex#encodeToTerm(String,boolean).
static final String[] extractSeparatePhrasesFromString(String separatedPhrases)
  Extracts the phrases from a String that was created using the method makeSeparatePhrasesFromNodes(List nodes) or makeSeparatePhrasesFromStrings(List strings).
static final String[] extractStringsFromString(String separatedWords)
  Extracts the words from a String that was created using the method makeStringFromNodes(List nodes).
static final String[] getAnalyzedTerms(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
  Extracts all terms in any field from a Lucene query using the given Analyzer.
static final org.apache.lucene.analysis.Token[] getAnalyzedTokens(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
  Extracts all Tokens from a Lucene query using the given Analyzer.
static final StringBuffer getAnalyzerOutput(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
  Creates a StringBuffer to display the tokens created by a given analyzer.
static final String makeSeparatePhrasesFromNodes(List nodes)
  Creates a String separated by the phrase separator term from the text of each of the Element or Attribute dom4j Nodes provided.
static final String makeSeparatePhrasesFromStrings(String[] strings)
  Creates a String separated by the phrase separator term from each of the Strings provided.
static final String makeSeparatePhrasesFromStrings(List strings)
  Creates a String separated by the phrase separator term from each of the Strings provided.
static final String makeStringFromNodes(List nodes)
  Creates a String separated by spaces from the text of each of the Element or Attribute dom4j Nodes provided.
static final String tokenizeID(String ID)
  Tokenizes a DLESE ID by replacing the char - with a blank space.
static final String tokenizeURI(String uri)
  Tokenizes a URI by replacing the unindexable chars with a blank space.
-
Field Details
-
defaultFieldName
Default field 'default'.
-
stemsFieldName
Stems field 'stems'.
-
adminDefaultFieldName
Admin default field 'admindefault'.
-
PHRASE_SEPARATOR
String used to separate and preserve phrases indexed as text. Includes leading and trailing white space.
-
-
Constructor Details
-
IndexingTools
public IndexingTools()
-
-
Method Details
-
addToDefaultAndStemsFields
public static final void addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc, String content)
Indexes the given text into the default and stems fields.
- Parameters:
  myDoc - Document to add to
  content - Content to add
-
addToAdminDefaultField
public static final void addToAdminDefaultField(org.apache.lucene.document.Document myDoc, String content)
Indexes the given text into the admin default field.
- Parameters:
  myDoc - Document to add to
  content - Content to add
-
makeSeparatePhrasesFromNodes
Creates a String separated by the phrase separator term from the text of each of the Element or Attribute dom4j Nodes provided. The input list may be null.
A call to this method might look like:
String value = makeSeparatePhrasesFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));
- Parameters:
  nodes - List of Elements or Attributes
- Returns:
  A String or null
-
makeSeparatePhrasesFromStrings
Creates a String separated by the phrase separator term from each of the Strings provided. The input list may be null.
- Parameters:
  strings - List of Strings or null
- Returns:
  A String or null
-
makeSeparatePhrasesFromStrings
Creates a String separated by the phrase separator term from each of the Strings provided. The input array may be null.
- Parameters:
  strings - Array of Strings or null
- Returns:
  A String or null
-
extractSeparatePhrasesFromString
Extracts the phrases from a String that was created using the method makeSeparatePhrasesFromNodes(List nodes) or makeSeparatePhrasesFromStrings(List strings).
- Parameters:
  separatedPhrases - String that contains the phrase separator to separate phrases
- Returns:
  An array of phrase Strings, or null if the input is null
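The round trip between the make and extract methods can be illustrated with a minimal sketch. It does not use IndexingTools itself: the separator value below is hypothetical (the actual PHRASE_SEPARATOR value is not shown on this page), and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class PhraseSeparatorSketch {
    // HYPOTHETICAL value: stands in for IndexingTools.PHRASE_SEPARATOR, whose
    // real value is not given here. Like the real one, it has leading and
    // trailing white space.
    static final String SEPARATOR = " __phrase_sep__ ";

    // Joins the phrases with the separator; a null list yields null.
    static String makeSeparatePhrases(List<String> phrases) {
        if (phrases == null) {
            return null;
        }
        return String.join(SEPARATOR, phrases);
    }

    // Splits a joined String back into its phrases; null input yields null.
    static String[] extractSeparatePhrases(String separatedPhrases) {
        if (separatedPhrases == null) {
            return null;
        }
        // Pattern.quote treats the separator as a literal, not a regex.
        return separatedPhrases.split(Pattern.quote(SEPARATOR));
    }

    public static void main(String[] args) {
        String joined = makeSeparatePhrases(Arrays.asList("plate tectonics", "ocean currents"));
        System.out.println(joined);
        System.out.println(Arrays.toString(extractSeparatePhrases(joined)));
    }
}
```

Because multi-word phrases are joined with a dedicated separator rather than a space, each phrase can be recovered intact after splitting.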
-
makeStringFromNodes
Creates a String separated by spaces from the text of each of the Element or Attribute dom4j Nodes provided. The input list may be null.
A call to this method might look like:
String value = makeStringFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));
- Parameters:
  nodes - List of dom4j Nodes of Elements or Attributes
- Returns:
  A String or null
-
extractStringsFromString
Extracts the words from a String that was created using the method makeStringFromNodes(List nodes).
- Parameters:
  separatedWords - String of words separated by spaces, as created by makeStringFromNodes(List nodes)
- Returns:
  An array of word Strings
-
tokenizeID
Tokenizes a DLESE ID by replacing the char - with a blank space.
- Parameters:
  ID - The ID String
- Returns:
  The tokenized ID
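The described behavior amounts to a single character replacement. The following is a minimal sketch of it, not the library's source; the class name and the sample ID are made up for illustration.

```java
final class TokenizeIdSketch {
    // Replaces every '-' in the ID with a blank space, as the description states.
    static String tokenizeID(String id) {
        return id.replace('-', ' ');
    }

    public static void main(String[] args) {
        // The ID below is an illustrative value, not taken from a real record.
        System.out.println(tokenizeID("DLESE-000-000-000-001"));
        // prints "DLESE 000 000 000 001"
    }
}
```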
-
tokenizeURI
Tokenizes a URI by replacing the unindexable chars with a blank space.
- Parameters:
  uri - A URL or URI
- Returns:
  The tokenized URI
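This page does not say which characters count as unindexable. As a rough sketch only, the example below assumes that every character other than an ASCII letter or digit is replaced; the actual set used by IndexingTools may differ, and the class name is made up for illustration.

```java
final class UriTokenizeSketch {
    // ASSUMPTION: "unindexable" is taken to mean anything that is not an
    // ASCII letter or digit. The real method may use a different set.
    static String tokenizeURI(String uri) {
        return uri.replaceAll("[^A-Za-z0-9]", " ");
    }

    public static void main(String[] args) {
        // Each replaced character becomes exactly one blank space, so "://"
        // turns into three spaces.
        System.out.println(tokenizeURI("http://www.dlese.org/library/index.jsp"));
    }
}
```

Replacing punctuation with spaces lets a whitespace-based analyzer index the host name and path segments as separate terms.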
-
encodeToTerm
Same as org.dlese.dpc.index.SimpleLuceneIndex#encodeToTerm(String).
- Parameters:
  text - Text
- Returns:
  Encoded text
-
encodeToTerm
Same as org.dlese.dpc.index.SimpleLuceneIndex#encodeToTerm(String,boolean).
- Parameters:
  text - Text
  encodeWildCards - True to encode the '*' wildcard char, false to leave unencoded.
- Returns:
  Encoded text
-
getAnalyzedTokens
public static final org.apache.lucene.analysis.Token[] getAnalyzedTokens(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
Extracts all Tokens from a Lucene query using the given Analyzer.
- Parameters:
  textToParse - The text to analyze with the analyzer
  field - The field this Analyzer should interpret the text as, or null to use 'default'
  analyzer - The analyzer to use
- Returns:
  The Tokens generated by the analyzer
-
getAnalyzedTerms
public static final String[] getAnalyzedTerms(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
Extracts all terms in any field from a Lucene query using the given Analyzer.
- Parameters:
  textToParse - The text to analyze with the analyzer
  field - The field this Analyzer should interpret the text as, or null to use 'default'
  analyzer - The analyzer to use
- Returns:
  The terms generated by the analyzer
-
getAnalyzerOutput
public static final StringBuffer getAnalyzerOutput(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
Creates a StringBuffer to display the tokens created by a given analyzer. Output is of the form: [token1] [token2].
- Parameters:
  textToParse - The text to analyze with the analyzer
  field - The Lucene field name, or null to use 'default'
  analyzer - The analyzer to use
- Returns:
  A StringBuffer displaying the tokens
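The [token1] [token2] output form can be illustrated without Lucene. In the sketch below, a lowercasing whitespace splitter stands in for a real Analyzer; that stand-in, the use of StringBuilder instead of StringBuffer, and all class and method names are assumptions made for illustration.

```java
import java.util.Locale;

final class AnalyzerOutputSketch {
    // STAND-IN for a Lucene Analyzer: lowercases the text and splits it on
    // white space. A real Analyzer may also stem, drop stop words, and so on.
    static String[] analyze(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.toLowerCase(Locale.ROOT).split("\\s+");
    }

    // Renders the tokens as "[token1] [token2]", separated by single spaces.
    static StringBuilder format(String[] tokens) {
        StringBuilder out = new StringBuilder();
        for (String token : tokens) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append('[').append(token).append(']');
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(format(analyze("Ocean  Currents")));
    }
}
```

Bracketing each token makes it easy to see exactly where an analyzer placed token boundaries, which is useful when debugging why a query does or does not match.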
-