java.lang.Object

org.dlese.dpc.index.writer.IndexingTools

public class IndexingTools extends Object

Tools to aid in indexing.

Author: John Weatherley

Field Summary

- static final String adminDefaultFieldName
  Admin default field 'admindefault'
- static final String defaultFieldName
  Default field 'default'
- static final String PHRASE_SEPARATOR
  String used to separate and preserve phrases indexed as text. Includes leading and trailing white space.
- static final String stemsFieldName
  Stems field 'stems'

Constructor Summary

- IndexingTools()

Method Summary

- static final void addToAdminDefaultField(org.apache.lucene.document.Document myDoc, String content)
  Indexes the given text into the admin default field.
- static final void addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc, String content)
  Indexes the given text into the default and stems fields.
- static final String encodeToTerm(String text)
  Same as org.dlese.dpc.index.SimpleLuceneIndex.encodeToTerm(String).
- static final String encodeToTerm(String text, boolean encodeWildCards)
  Same as org.dlese.dpc.index.SimpleLuceneIndex.encodeToTerm(String,boolean).
- static final String[] extractSeparatePhrasesFromString(String separatedPhrases)
  Extracts the phrases from a String that was created using the method makeSeparatePhrasesFromNodes(List nodes) or makeSeparatePhrasesFromStrings(List strings).
- static final String[] extractStringsFromString(String separatedWords)
  Extracts the words from a String that was created using the method makeStringFromNodes(List nodes).
- static final String[] getAnalyzedTerms(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
  Extracts all terms in any field from a Lucene query using the given Analyzer.
- static final org.apache.lucene.analysis.Token[] getAnalyzedTokens(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
  Extracts all Tokens from a Lucene query using the given Analyzer.
- static final StringBuffer getAnalyzerOutput(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
  Creates a StringBuffer to display the tokens created by a given analyzer.
- static final String makeSeparatePhrasesFromNodes(List nodes)
  Creates a String separated by the phrase separator term from the text of each of the Element or Attributes dom4j Nodes provided.
- static final String makeSeparatePhrasesFromStrings(String[] strings)
  Creates a String separated by the phrase separator term from each of the Strings provided.
- static final String makeSeparatePhrasesFromStrings(List strings)
  Creates a String separated by the phrase separator term from each of the Strings provided.
- static final String makeStringFromNodes(List nodes)
  Creates a String separated by spaces from the text of each of the Element or Attributes dom4j Nodes provided.
- static final String tokenizeID(String ID)
  Tokenizes a DLESE ID by replacing the char - with a blank space.
- static final String tokenizeURI(String uri)
  Tokenizes a URI by replacing the unindexable chars with a blank space.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- defaultFieldName
  
  public static final String defaultFieldName
  
  Default field 'default'
  See Also:
  
  Constant Field Values
- stemsFieldName
  
  public static final String stemsFieldName
  
  Stems field 'stems'
  See Also:
  
  Constant Field Values
- adminDefaultFieldName
  
  public static final String adminDefaultFieldName
  
  Admin default field 'admindefault'
  See Also:
  
  Constant Field Values
- PHRASE_SEPARATOR
  
  public static final String PHRASE_SEPARATOR
  
  String used to separate and preserve phrases indexed as text. Includes leading and trailing white space.
  See Also:
  
  Constant Field Values
Constructor Details
- IndexingTools
  
  public IndexingTools()
Method Details
- addToDefaultAndStemsFields
  
  public static final void addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc, String content)
  
  Indexes the given text into the default and stems fields.
  
  Parameters:
  
  myDoc - Document to add to
  
  content - Content to add
- addToAdminDefaultField
  
  public static final void addToAdminDefaultField(org.apache.lucene.document.Document myDoc, String content)
  
  Indexes the given text into the admin default field.
  
  Parameters:
  
  myDoc - Document to add to
  
  content - Content to add
- makeSeparatePhrasesFromNodes
  
  public static final String makeSeparatePhrasesFromNodes(List nodes)
  
  Creates a String separated by the phrase separator term from the text of each of the Element or Attributes dom4j Nodes provided. The input list may be null.
  A call to this method might look like:
  String value = makeSeparatePhrasesFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));
  
  Parameters:
  
  nodes - List of Elements or Attributes
  
  Returns:
  
  A String or null
- makeSeparatePhrasesFromStrings
  
  public static final String makeSeparatePhrasesFromStrings(List strings)
  
  Creates a String separated by the phrase separator term from each of the Strings provided. The input list may be null.
  
  Parameters:
  
  strings - List of Strings or null
  
  Returns:
  
  A String or null
- makeSeparatePhrasesFromStrings
  
  public static final String makeSeparatePhrasesFromStrings(String[] strings)
  
  Creates a String separated by the phrase separator term from each of the Strings provided. The input array may be null.
  
  Parameters:
  
  strings - Array of Strings or null
  
  Returns:
  
  A String or null
- extractSeparatePhrasesFromString
  
  public static final String[] extractSeparatePhrasesFromString(String separatedPhrases)
  
  Extracts the phrases from a String that was created using the method makeSeparatePhrasesFromNodes(List nodes) or makeSeparatePhrasesFromStrings(List strings).
  
  Parameters:
  
  separatedPhrases - String that uses the phrase separator to separate phrases
  
  Returns:
  
  An array of phrase Strings, or null if the input is null
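
The round trip between makeSeparatePhrasesFromStrings and extractSeparatePhrasesFromString can be sketched with the JDK alone. This is a minimal illustration, not the class's actual code: the class name `PhraseSeparatorSketch` and the `SEPARATOR` value are hypothetical (the real value of PHRASE_SEPARATOR is not shown on this page), and the null handling is assumed from the "A String or null" return descriptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class PhraseSeparatorSketch {
    // Hypothetical stand-in: the real value of IndexingTools.PHRASE_SEPARATOR is not given on this page.
    static final String SEPARATOR = " __phrase_sep__ ";

    /** Joins phrases with the separator. Null in, null out (an assumption). */
    static String join(List<String> phrases) {
        if (phrases == null) {
            return null;
        }
        return String.join(SEPARATOR, phrases);
    }

    /** Splits on the literal separator so each multi-word phrase comes back intact. */
    static String[] split(String separatedPhrases) {
        if (separatedPhrases == null) {
            return null;
        }
        return separatedPhrases.split(Pattern.quote(SEPARATOR));
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("ocean currents", "plate tectonics"));
        System.out.println(joined);
        System.out.println(Arrays.toString(split(joined)));
    }
}
```

Because the separator carries its own leading and trailing white space, a text analyzer would likely see it as a distinct token between phrases, which is presumably how phrase boundaries are preserved in the index.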
- makeStringFromNodes
  
  public static final String makeStringFromNodes(List nodes)
  
  Creates a String separated by spaces from the text of each of the Element or Attributes dom4j Nodes provided. The input list may be null.
  A call to this method might look like:
  String value = makeStringFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));
  
  Parameters:
  
  nodes - List of dom4j Nodes of Elements or Attributes
  
  Returns:
  
  A String or null
- extractStringsFromString
  
  public static final String[] extractStringsFromString(String separatedWords)
  
  Extracts the words from a String that was created using the method makeStringFromNodes(List nodes).
  
  Parameters:
  
  separatedWords - String of words separated by spaces, such as one created by makeStringFromNodes(List nodes)
  
  Returns:
  
  An array of word Strings
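
For contrast with the phrase separator, the space-separated form kept by makeStringFromNodes and extractStringsFromString can be sketched as follows. The class `SpaceJoinSketch` is hypothetical, it works on plain Strings rather than dom4j Nodes, and returning an empty array for null or blank input is an assumption, since the page does not say what the real method does in that case.

```java
import java.util.Arrays;
import java.util.List;

final class SpaceJoinSketch {
    /** Joins the texts with single spaces. Null in, null out (an assumption). */
    static String join(List<String> texts) {
        if (texts == null) {
            return null;
        }
        return String.join(" ", texts);
    }

    /** Splits on runs of white space. Null or blank input yields an empty array (an assumption). */
    static String[] split(String separatedWords) {
        if (separatedWords == null || separatedWords.trim().isEmpty()) {
            return new String[0];
        }
        return separatedWords.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("ocean currents", "plate tectonics"));
        System.out.println(Arrays.toString(split(joined)));
    }
}
```

Note that the two input phrases come back as four separate words, so phrase boundaries are lost in this form.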
- tokenizeID
  
  public static final String tokenizeID(String ID)
  
  Tokenizes a DLESE ID by replacing the char - with a blank space.
  
  Parameters:
  
  ID - The ID String
  
  Returns:
  
  The tokenized ID
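
The replacement described for tokenizeID amounts to a single character substitution. A minimal sketch, assuming nothing beyond that description (the class name `TokenizeIdSketch` and the sample ID are made up for illustration):

```java
final class TokenizeIdSketch {
    /** Replaces every '-' in the ID with a blank space. */
    static String tokenizeId(String id) {
        return id.replace('-', ' ');
    }

    public static void main(String[] args) {
        // The ID below is a made-up example, not taken from a real collection.
        System.out.println(tokenizeId("DLESE-000-000-000-001"));
    }
}
```

Splitting the ID this way would let an analyzer index each segment as its own token, so a search on one segment can match the record.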
- tokenizeURI
  
  public static final String tokenizeURI(String uri)
  
  Tokenizes a URI by replacing the unindexable chars with a blank space.
  
  Parameters:
  
  uri - A URL or URI
  
  Returns:
  
  The tokenized URI
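
The page does not list which characters tokenizeURI treats as unindexable. The sketch below assumes, purely for illustration, that every character other than an ASCII letter or digit is replaced; the real method may use a different set. The class name `TokenizeUriSketch` is hypothetical.

```java
final class TokenizeUriSketch {
    /**
     * Assumption: "unindexable" is taken to mean any character that is not an
     * ASCII letter or digit. Each such character becomes one blank space.
     */
    static String tokenizeUri(String uri) {
        return uri.replaceAll("[^A-Za-z0-9]", " ");
    }

    public static void main(String[] args) {
        System.out.println(tokenizeUri("http://www.dlese.org/a-b"));
    }
}
```

Each replaced character becomes exactly one space, so the result has the same length as the input.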
- encodeToTerm
  
  public static final String encodeToTerm(String text)
  
  Same as org.dlese.dpc.index.SimpleLuceneIndex.encodeToTerm(String).
  
  Parameters:
  
  text - Text
  
  Returns:
  
  Encoded text
- encodeToTerm
  
  public static final String encodeToTerm(String text, boolean encodeWildCards)
  
  Same as org.dlese.dpc.index.SimpleLuceneIndex.encodeToTerm(String,boolean).
  
  Parameters:
  
  text - Text
  
  encodeWildCards - True to encode the '*' wildcard char, false to leave unencoded.
  
  Returns:
  
  Encoded text
- getAnalyzedTokens
  
  public static final org.apache.lucene.analysis.Token[] getAnalyzedTokens(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
  
  Extracts all Tokens from a Lucene query using the given Analyzer.
  
  Parameters:
  
  textToParse - The text to analyze with the analyzer
  
  field - The field this Analyzer should interpret the text as, or null to use 'default'
  
  analyzer - The analyzer to use
  
  Returns:
  
  The Tokens generated by the analyzer
- getAnalyzedTerms
  
  public static final String[] getAnalyzedTerms(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
  
  Extracts all terms in any field from a Lucene query using the given Analyzer.
  
  Parameters:
  
  textToParse - The text to analyze with the analyzer
  
  field - The field this Analyzer should interpret the text as, or null to use 'default'
  
  analyzer - The analyzer to use
  
  Returns:
  
  The terms generated by the analyzer
- getAnalyzerOutput
  
  public static final StringBuffer getAnalyzerOutput(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
  
  Creates a StringBuffer to display the tokens created by a given analyzer. Output is of the form: [token1] [token2].
  
  Parameters:
  
  textToParse - The text to analyze with the analyzer
  
  field - The lucene field name, or null to use default
  
  analyzer - The analyzer to use
  
  Returns:
  
  A StringBuffer containing the tokens produced by the analyzer, in the form [token1] [token2]
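
The "[token1] [token2]" output form described for getAnalyzerOutput can be sketched without Lucene by formatting a list of token strings. This is an assumption-laden illustration of the format only: the class `AnalyzerOutputSketch` is hypothetical, and the token list stands in for whatever the supplied Analyzer would actually produce.

```java
import java.util.Arrays;
import java.util.List;

final class AnalyzerOutputSketch {
    /** Wraps each token in brackets and separates the tokens with single spaces. */
    static StringBuffer format(List<String> tokens) {
        StringBuffer out = new StringBuffer();
        for (String token : tokens) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append('[').append(token).append(']');
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in tokens; a real Analyzer decides how the text is split and normalized.
        System.out.println(format(Arrays.asList("ocean", "current")));
    }
}
```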
