Class IndexingTools

java.lang.Object
org.dlese.dpc.index.writer.IndexingTools

public class IndexingTools extends Object
Tools to aid in indexing.
Author:
John Weatherley
  • Field Details

    • defaultFieldName

      public static final String defaultFieldName
      Default field 'default'
    • stemsFieldName

      public static final String stemsFieldName
      Stems field 'stems'
    • adminDefaultFieldName

      public static final String adminDefaultFieldName
      Admin default field 'admindefault'
    • PHRASE_SEPARATOR

      public static final String PHRASE_SEPARATOR
      String used to separate and preserve phrases indexed as text, includes leading and trailing white space.
  • Constructor Details

    • IndexingTools

      public IndexingTools()
  • Method Details

    • addToDefaultAndStemsFields

      public static final void addToDefaultAndStemsFields(org.apache.lucene.document.Document myDoc, String content)
      Indexes the given text into the default and stems fields.
      Parameters:
      myDoc - Document to add to
      content - Content to add
    • addToAdminDefaultField

      public static final void addToAdminDefaultField(org.apache.lucene.document.Document myDoc, String content)
      Indexes the given text into the admin default field.
      Parameters:
      myDoc - Document to add to
      content - Content to add
    • makeSeparatePhrasesFromNodes

      public static final String makeSeparatePhrasesFromNodes(List nodes)
      Creates a String from the text of each of the dom4j Element or Attribute Nodes provided, with the texts separated by the phrase separator (PHRASE_SEPARATOR). The input list may be null.

      A call to this method might look like:
      String value = makeSeparatePhrasesFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));

      Parameters:
      nodes - List of Elements or Attributes
      Returns:
      A String or null
    • makeSeparatePhrasesFromStrings

      public static final String makeSeparatePhrasesFromStrings(List strings)
      Creates a String separated by the phrase separator term from each of the Strings provided. The input list may be null.

      Parameters:
      strings - List of Strings or null
      Returns:
      A String or null
    • makeSeparatePhrasesFromStrings

      public static final String makeSeparatePhrasesFromStrings(String[] strings)
      Creates a String separated by the phrase separator term from each of the Strings provided. The input array may be null.

      Parameters:
      strings - Array of Strings or null
      Returns:
      A String or null
    • extractSeparatePhrasesFromString

      public static final String[] extractSeparatePhrasesFromString(String separatedPhrases)
      Extracts the phrases from a String that was created using the method makeSeparatePhrasesFromNodes(List nodes) or makeSeparatePhrasesFromStrings(List strings).
      Parameters:
      separatedPhrases - String that uses the phrase separator to separate phrases
      Returns:
      An array of phrase Strings, or null if the input is null
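
      The relationship between the makeSeparatePhrasesFrom... methods and extractSeparatePhrasesFromString can be pictured as a join/split round trip. The following is a minimal, self-contained sketch, not the IndexingTools source: the class PhraseSeparatorSketch, its methods, and the separator value " __phrase__ " are hypothetical stand-ins, since the actual value of PHRASE_SEPARATOR is not given here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only; this is NOT the IndexingTools implementation.
class PhraseSeparatorSketch {

    // Hypothetical stand-in for IndexingTools.PHRASE_SEPARATOR.
    // Like the documented constant, it has leading and trailing white space.
    static final String SEPARATOR = " __phrase__ ";

    // Joins the phrases with the separator; a null list yields null.
    static String join(List<String> phrases) {
        if (phrases == null) {
            return null;
        }
        return String.join(SEPARATOR, phrases);
    }

    // Splits a joined String back into phrases; null input yields null.
    static String[] split(String separatedPhrases) {
        if (separatedPhrases == null) {
            return null;
        }
        // Quote the separator so it is matched literally, not as a regex.
        return separatedPhrases.split(Pattern.quote(SEPARATOR));
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("plate tectonics", "climate change"));
        System.out.println(joined);
        System.out.println(Arrays.toString(split(joined)));
    }
}
```

      Because the separator sits between phrases, multi-word phrases such as "plate tectonics" survive the round trip intact.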
    • makeStringFromNodes

      public static final String makeStringFromNodes(List nodes)
      Creates a String from the text of each of the dom4j Element or Attribute Nodes provided, with the texts separated by spaces. The input list may be null.

      A call to this method might look like:
      String value = makeStringFromNodes(xmlDoc.selectNodes("/news-oppsRecord/topics/topic"));

      Parameters:
      nodes - List of dom4j Nodes of Elements or Attributes
      Returns:
      A String or null
    • extractStringsFromString

      public static final String[] extractStringsFromString(String separatedWords)
      Extracts the words from a String that was created using the method makeStringFromNodes(List nodes).
      Parameters:
      separatedWords - String of words separated by spaces, as created by makeStringFromNodes(List nodes)
      Returns:
      An array of word Strings
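
      For contrast with the phrase methods, the space-separated form produced by makeStringFromNodes and consumed by extractStringsFromString can be sketched as below. This is an illustration under stated assumptions, not the library code: the class WordStringSketch and its methods are hypothetical, node text is represented by plain Strings, and returning null for null input to split is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; this is NOT the IndexingTools implementation.
class WordStringSketch {

    // Joins the given texts with single spaces; a null list yields null.
    static String join(List<String> texts) {
        if (texts == null) {
            return null;
        }
        return String.join(" ", texts);
    }

    // Splits on runs of white space (null handling is an assumption).
    static String[] split(String separatedWords) {
        if (separatedWords == null) {
            return null;
        }
        return separatedWords.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("plate tectonics", "climate"));
        System.out.println(joined);
        System.out.println(Arrays.toString(split(joined)));
    }
}
```

      In this sketch the boundary between "plate tectonics" and "climate" is lost after joining, so splitting yields three single words. Where phrases must be recovered, the phrase-separator methods are the better fit.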
    • tokenizeID

      public static final String tokenizeID(String ID)
      Tokenizes a DLESE ID by replacing each '-' character with a blank space.
      Parameters:
      ID - The ID String
      Returns:
      The tokenized ID
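
      The replacement described for tokenizeID can be sketched in one line. The class TokenizeIdSketch is hypothetical, the sample ID is made up for illustration, and returning null for a null ID is an assumption rather than documented behavior.

```java
// Illustrative sketch only; this is NOT the IndexingTools implementation.
class TokenizeIdSketch {

    // Replaces every '-' with a blank space so each ID segment
    // can be indexed as a separate token.
    static String tokenizeID(String id) {
        if (id == null) {
            return null; // assumption: null in, null out
        }
        return id.replace('-', ' ');
    }

    public static void main(String[] args) {
        // Made-up ID used only to show the transformation.
        System.out.println(tokenizeID("DLESE-000-000-000-001"));
    }
}
```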
    • tokenizeURI

      public static final String tokenizeURI(String uri)
      Tokenizes a URI by replacing the unindexable chars with a blank space.
      Parameters:
      uri - A URL or URI
      Returns:
      The tokenized URI
    • encodeToTerm

      public static final String encodeToTerm(String text)
      Same as org.dlese.dpc.index.SimpleLuceneIndex.encodeToTerm(String).
      Parameters:
      text - Text
      Returns:
      Encoded text
    • encodeToTerm

      public static final String encodeToTerm(String text, boolean encodeWildCards)
      Same as org.dlese.dpc.index.SimpleLuceneIndex.encodeToTerm(String,boolean).
      Parameters:
      text - Text
      encodeWildCards - True to encode the '*' wildcard char, false to leave unencoded.
      Returns:
      Encoded text
    • getAnalyzedTokens

      public static final org.apache.lucene.analysis.Token[] getAnalyzedTokens(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
      Extracts all Tokens from a Lucene query using the given Analyzer.
      Parameters:
      textToParse - The text to analyze with the analyzer
      field - The field this Analyzer should interpret the text as, or null to use 'default'
      analyzer - The analyzer to use
      Returns:
      The Tokens generated by the analyzer
    • getAnalyzedTerms

      public static final String[] getAnalyzedTerms(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
      Extracts all terms in any field from a Lucene query using the given Analyzer.
      Parameters:
      textToParse - The text to analyze with the analyzer
      field - The field this Analyzer should interpret the text as, or null to use 'default'
      analyzer - The analyzer to use
      Returns:
      The terms generated by the analyzer
    • getAnalyzerOutput

      public static final StringBuffer getAnalyzerOutput(String textToParse, String field, org.apache.lucene.analysis.Analyzer analyzer)
      Creates a StringBuffer to display the tokens created by a given analyzer. Output is of the form: [token1] [token2].
      Parameters:
      textToParse - The text to analyze with the analyzer
      field - The Lucene field name, or null to use 'default'
      analyzer - The analyzer to use
      Returns:
      A StringBuffer containing the tokens, each enclosed in square brackets
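
      The "[token1] [token2]" display format can be sketched without Lucene. In the example below the class AnalyzerOutputSketch and the method format are hypothetical, and lowercasing plus white-space splitting stands in for a real Analyzer, so this shows only the output shape, not how getAnalyzerOutput obtains its tokens.

```java
// Illustrative sketch only; this is NOT the IndexingTools implementation.
class AnalyzerOutputSketch {

    // Wraps each token in square brackets, separated by single spaces.
    static StringBuffer format(String[] tokens) {
        StringBuffer out = new StringBuffer();
        for (String token : tokens) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append('[').append(token).append(']');
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for analysis: lowercase, then split on white space.
        String[] tokens = "Ocean Currents".toLowerCase().split("\\s+");
        System.out.println(format(tokens));
    }
}
```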