Package org.dlese.dpc.index
Class FileIndexingService
java.lang.Object
org.dlese.dpc.index.FileIndexingService
Indexes files into a SimpleLuceneIndex and automatically updates the index whenever the files change. This class uses a FileIndexingServiceWriter to create the Lucene Documents that are placed in the SimpleLuceneIndex. It looks for changes made to items in a directory of files and updates the index automatically by adding, updating or deleting items as appropriate. The frequency of update checks is configurable. There should be only one instance of this class for each SimpleLuceneIndex that is being populated with this class.
- Author:
- John Weatherley
-
Field Summary
Fields
static final int INDEXING_ABORTED - Indicates that indexing was aborted by request.
static final int INDEXING_DIR_DOES_NOT_EXIST - Indicates that the indexing directory does not exist.
static final int INDEXING_DIR_READ_ERROR - Indicates a read error on the directory.
static final int INDEXING_ERROR - Indicates that indexing completed with a severe error.
static final int INDEXING_ITEM_ERROR - Indicates that indexing completed successfully, but one or more items were indexed with errors.
static final int INDEXING_SUCCESS - Indicates that indexing completed normally.
-
Constructor Summary
Constructors
FileIndexingService(SimpleLuceneIndex index, long updateFrequency, boolean saveDeletes, String idFieldToRemove, String fileIndexingServiceDataDir, int maxNumFilesToIndex) - Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency.
FileIndexingService(SimpleLuceneIndex index, long updateFrequency, String fileIndexingServiceDataDir, int maxNumFilesToIndex) - Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency.
-
Method Summary
Methods
boolean addDirectory(File srcDir, Class documentWriterClass, HashMap documentWriterConfigAttributes, FileIndexingPlugin plugin, int indexingPriority) - Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.
boolean addDirectory(String sourceFileDirectory, Class documentWriterClass, HashMap documentWriterConfigAttributes, FileIndexingPlugin plugin, int indexingPriority) - Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.
void changeUpdateFrequency(long updateFrequency) - Changes the frequency of reindexing to the new value.
boolean deleteDirectory(File srcDir) - Deletes the files in the given directory from the index and removes it from the configuration.
boolean deleteDirectory(String sourceFileDirectory) - Deletes the files in the given directory from the index and removes it from the configuration.
getAttribute(String key) - Gets an attribute Object from this FileIndexingService.
getConfiguredDirectories() - Gets a HashMap of all directories that are configured in this FileIndexingService, keyed by absolute path.
static String getDateStamp() - Returns a string for the current time and date, suitable for display in log files and output to standard output.
getIndexingMessages() - Gets the last 10 indexing status messages.
long getLastSyncTime() - Gets the lastSyncTime attribute of the FileIndexingService object.
int getNumRecordsToAdd() - Gets the numRecordsToAdd attribute of the FileIndexingService object.
int getNumRecordsToDelete() - Gets the numRecordsToDelete attribute of the FileIndexingService object.
int getNumRecordsToReplace() - Gets the numRecordsToReplace attribute of the FileIndexingService object.
static String getSimpleDateStamp() - Returns a string for the current time and date, suitable for display in log files and output to standard output.
long getUpdateFrequency() - Gets the updateFrequency attribute of the FileIndexingService object.
void indexFile(File fileToIndex, FileIndexingPlugin plugin) - Indexes a single file.
void indexFiles(boolean reindexAll, File directory, FileIndexingObserver observer) - Updates the index to reflect the files in the directory indicated, which must have been previously added to this FileIndexingService using addDirectory(java.lang.String, java.lang.Class, java.util.HashMap, org.dlese.dpc.index.writer.FileIndexingPlugin, int).
void indexFiles(boolean reindexAll, FileIndexingObserver observer) - Updates the index to reflect the files in the directories this service is monitoring, with the option to run the update in the background.
boolean isDirectoryConfigured(File srcDir) - Determines whether the given directory is configured for indexing.
boolean isIndexing() - Determines whether indexing is in progress.
int reindexDocs(String query, boolean reindexAll) - Reindexes Documents managed by this FileIndexingService that match the given Lucene query.
int reindexDocs(String field, String[] terms, boolean reindexAll) - Re-indexes all documents that match the given terms within the given field.
int reindexDocs(String field, String term, boolean reindexAll) - Re-indexes all documents that match the given term within the given field.
void reindexDocs(org.apache.lucene.document.Document[] docs, boolean reindexAll) - Reindexes the given Documents.
void reindexDocs(ResultDocList docs, boolean reindexAll) - Reindexes the Documents in the given ResultDocs.
final void removeDocs(String field, String[] terms, FileIndexingServiceWriter docWriter) - Removes all documents that match the given terms within the given field.
final void removeDocs(String field, String[] terms, FileIndexingServiceWriter docWriter, boolean saveDeletes) - Removes all documents that match the given terms within the given field.
final void removeDocs(String field, String term, FileIndexingServiceWriter docWriter) - Removes all documents that match the given term within the given field.
final void removeDocs(String field, String term, FileIndexingServiceWriter docWriter, boolean saveDeletedRecords) - Removes all documents that match the given term within the given field.
void setAttribute(String key, Object attribute) - Sets an attribute Object that will be available for access here and from the FileIndexingServiceWriters.
static void setDebug(boolean db) - Sets the debug attribute.
void setValidationEnabled(boolean validateFiles) - Sets whether or not to validate the files being indexed and create a validation report, which is indexed.
void startTester(String docRoot, String sourceFileDirectory) - Starts a FileMoveTester if one is not already initialized.
void startTimerThread(long updateFrequency) - Starts or restarts the timer thread with the given update frequency.
void stopIndexing() - Stops the indexing process if it is currently taking place.
void stopTester() - Stops the FileMoveTester.
void stopTimerThread() - Stops the indexing timer thread.
-
Field Details
-
INDEXING_SUCCESS
public static final int INDEXING_SUCCESS
Indicates that indexing completed normally.
-
INDEXING_ABORTED
public static final int INDEXING_ABORTED
Indicates that indexing was aborted by request.
-
INDEXING_ERROR
public static final int INDEXING_ERROR
Indicates that indexing completed with a severe error.
-
INDEXING_ITEM_ERROR
public static final int INDEXING_ITEM_ERROR
Indicates that indexing completed successfully, but one or more items were indexed with errors.
-
INDEXING_DIR_DOES_NOT_EXIST
public static final int INDEXING_DIR_DOES_NOT_EXIST
Indicates that the indexing directory does not exist.
-
INDEXING_DIR_READ_ERROR
public static final int INDEXING_DIR_READ_ERROR
Indicates a read error on the directory.
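The status constants above lend themselves to a simple dispatch when a caller wants to log or react to an indexing result. The following is a minimal, self-contained sketch, not the library's code: the class `IndexingStatusSketch`, the methods `describe` and `indexIsUsable`, and the numeric values are all hypothetical, since this page does not list the real constant values or say where the codes are reported.

```java
/** Stand-in for the status codes; the numeric values are placeholders, not the real ones. */
final class IndexingStatusSketch {
    static final int INDEXING_SUCCESS = 1;
    static final int INDEXING_ABORTED = 2;
    static final int INDEXING_ERROR = 3;
    static final int INDEXING_ITEM_ERROR = 4;
    static final int INDEXING_DIR_DOES_NOT_EXIST = 5;
    static final int INDEXING_DIR_READ_ERROR = 6;

    /** Maps a status code to the description given in the field documentation. */
    static String describe(int status) {
        switch (status) {
            case INDEXING_SUCCESS:
                return "Indexing completed normally";
            case INDEXING_ABORTED:
                return "Indexing was aborted by request";
            case INDEXING_ERROR:
                return "Indexing completed with a severe error";
            case INDEXING_ITEM_ERROR:
                return "Indexing completed, but one or more items were indexed with errors";
            case INDEXING_DIR_DOES_NOT_EXIST:
                return "Indexing directory does not exist";
            case INDEXING_DIR_READ_ERROR:
                return "Read error on the directory";
            default:
                return "Unknown status code: " + status;
        }
    }

    /** True for the two outcomes that the documentation describes as a completed run. */
    static boolean indexIsUsable(int status) {
        return status == INDEXING_SUCCESS || status == INDEXING_ITEM_ERROR;
    }
}
```

Treating INDEXING_ITEM_ERROR as usable follows the wording "completed successfully, but one or more items were indexed with errors"; a stricter caller might prefer to treat it as a failure.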
-
-
Constructor Details
-
FileIndexingService
public FileIndexingService(SimpleLuceneIndex index, long updateFrequency, String fileIndexingServiceDataDir, int maxNumFilesToIndex)
Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency. Validation of files is enabled by default.
- Parameters:
index - The SimpleLuceneIndex that will be populated and updated with Documents created from files
updateFrequency - The frequency at which files are checked for updates, in seconds. Zero or less indicates no updates should be performed.
fileIndexingServiceDataDir - The directory where serialized data will be stored
maxNumFilesToIndex - Max number of files to index per iteration
-
FileIndexingService
public FileIndexingService(SimpleLuceneIndex index, long updateFrequency, boolean saveDeletes, String idFieldToRemove, String fileIndexingServiceDataDir, int maxNumFilesToIndex)
Indexes files to the given SimpleLuceneIndex, checking for changes in the files and reindexing them at the given update frequency. Validation of files is enabled by default.
- Parameters:
index - The SimpleLuceneIndex that will be populated and updated with Documents created from files
updateFrequency - The frequency at which files are checked for updates, in seconds. Zero or less indicates no updates should be performed.
saveDeletes - True to save removed documents in the index and mark them deleted, else they will be removed from the index.
idFieldToRemove - An ID field whose docs should be removed if found in duplicate.
fileIndexingServiceDataDir - The directory where persistent data files will be stored
maxNumFilesToIndex - The number of files to index per iteration
-
-
Method Details
-
setAttribute
Sets an attribute Object that will be available for access here and from the FileIndexingServiceWriters.
- Parameters:
key - The key used to reference the attribute.
attribute - Any Java Object.
-
stopIndexing
public void stopIndexing()
Stops the indexing process if it is currently taking place. This method may take a few seconds to complete.
-
getAttribute
Gets an attribute Object from this FileIndexingService.
- Parameters:
key - The key used to reference the attribute.
- Returns:
- The Java Object that is stored under the given key, or null if none exists.
-
changeUpdateFrequency
public void changeUpdateFrequency(long updateFrequency)
Changes the frequency of reindexing to the new value. Same as startTimerThread(long updateFrequency).
- Parameters:
updateFrequency - The frequency, in seconds, at which files are checked for changes and reindexed.
-
startTimerThread
public void startTimerThread(long updateFrequency)
Starts or restarts the timer thread with the given update frequency. Same as changeUpdateFrequency(long updateFrequency).
- Parameters:
updateFrequency - The number of seconds between index updates.
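The start-or-restart behavior described for startTimerThread and changeUpdateFrequency can be illustrated with `java.util.Timer`. This is a sketch under stated assumptions, not the service's actual implementation: the class `UpdateTimerSketch` and its methods are hypothetical, and the rule that zero or less disables updates is taken from the constructor documentation.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical restartable timer; frequencies are in seconds, as in the documentation. */
final class UpdateTimerSketch {
    private Timer timer;                 // null while no timer thread is running
    private long updateFrequencySeconds; // last frequency requested

    /** Starts the timer, or restarts it if one is already running. */
    synchronized void start(long frequencySeconds, final Runnable updateTask) {
        stop(); // restarting means cancelling the previous schedule first
        updateFrequencySeconds = frequencySeconds;
        if (frequencySeconds <= 0) {
            return; // zero or less: perform no periodic updates
        }
        long periodMillis = frequencySeconds * 1000L;
        timer = new Timer("index-update-timer", true); // daemon thread
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                updateTask.run();
            }
        }, periodMillis, periodMillis);
    }

    /** Stops the timer thread if one is running. */
    synchronized void stop() {
        if (timer != null) {
            timer.cancel();
            timer = null;
        }
    }

    synchronized boolean isRunning() {
        return timer != null;
    }

    synchronized long getUpdateFrequency() {
        return updateFrequencySeconds;
    }
}
```

Cancelling before rescheduling keeps at most one timer thread alive, which is why a single method can serve as both "start" and "change frequency".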
-
stopTimerThread
public void stopTimerThread()
Stops the indexing timer thread.
-
setValidationEnabled
public void setValidationEnabled(boolean validateFiles)
Sets whether or not to validate the files being indexed and create a validation report, which is indexed. If set to true, the files will be validated, otherwise they will not. Default is true.
- Parameters:
validateFiles - True to validate, else false.
-
addDirectory
public boolean addDirectory(String sourceFileDirectory, Class documentWriterClass, HashMap documentWriterConfigAttributes, FileIndexingPlugin plugin, int indexingPriority)
Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.
- Parameters:
sourceFileDirectory - The file directory that will be monitored for updates.
documentWriterClass - The document writer class to use for this directory.
documentWriterConfigAttributes - The configuration attributes for the document writer.
plugin - The FileIndexingPlugin to use for this directory.
indexingPriority - The indexing priority for this directory.
- Returns:
- True if the directory was added successfully.
-
addDirectory
public boolean addDirectory(File srcDir, Class documentWriterClass, HashMap documentWriterConfigAttributes, FileIndexingPlugin plugin, int indexingPriority)
Adds a directory of files to be monitored for changes, or replaces the current one if one exists with the same absolute path.
- Parameters:
srcDir - The file directory that will be monitored for updates.
documentWriterClass - The document writer class to use for this directory.
documentWriterConfigAttributes - The configuration attributes for the document writer.
plugin - The FileIndexingPlugin to use for this directory.
indexingPriority - The indexing priority for this directory.
- Returns:
- True if the directory was added successfully.
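The "replaces the current one if one exists with the same absolute path" rule falls out naturally from a map keyed by absolute path, which is also how getConfiguredDirectories is described. The sketch below is an illustration only: `DirectoryRegistrySketch` is a hypothetical stand-in that stores just the indexing priority, whereas the real service also tracks the writer class, its attributes and the plugin.

```java
import java.io.File;
import java.util.HashMap;

/** Hypothetical registry of monitored directories, keyed by absolute path. */
final class DirectoryRegistrySketch {
    // Value is the indexing priority only; the real configuration holds more.
    private final HashMap<String, Integer> configured = new HashMap<>();

    /** Adds the directory, replacing any entry with the same absolute path. */
    boolean addDirectory(File srcDir, int indexingPriority) {
        if (srcDir == null) {
            return false;
        }
        configured.put(srcDir.getAbsolutePath(), indexingPriority);
        return true;
    }

    boolean isDirectoryConfigured(File srcDir) {
        return srcDir != null && configured.containsKey(srcDir.getAbsolutePath());
    }

    /** Returns true only if the directory was configured and has now been removed. */
    boolean deleteDirectory(File srcDir) {
        return srcDir != null && configured.remove(srcDir.getAbsolutePath()) != null;
    }

    Integer priorityOf(File srcDir) {
        return configured.get(srcDir.getAbsolutePath());
    }

    int size() {
        return configured.size();
    }
}
```

Note that `File.getAbsolutePath()` does not normalize segments such as `.` or `..`, so two spellings of the same directory can produce different keys; `getCanonicalPath()` would be the stricter choice if that matters.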
-
isIndexing
public boolean isIndexing()
Determines whether indexing is in progress.
- Returns:
- True if indexing is in progress, false if not
-
isDirectoryConfigured
Determines whether the given directory is configured for indexing.
- Parameters:
srcDir - A directory of indexable files.
- Returns:
- True if this directory is already configured for indexing, false otherwise.
-
getConfiguredDirectories
Gets a HashMap of all directories that are configured in this FileIndexingService, keyed by absolute path.
- Returns:
- The configuredDirectories value
-
deleteDirectory
Deletes the files in the given directory from the index and removes it from the configuration. Assumes the directory was previously added to the index using the addDirectory(java.lang.String, java.lang.Class, java.util.HashMap, org.dlese.dpc.index.writer.FileIndexingPlugin, int) method.
- Parameters:
sourceFileDirectory - The directory of files needing to be removed from the index.
- Returns:
- True if the directory of files existed in the index and was removed.
-
deleteDirectory
Deletes the files in the given directory from the index and removes it from the configuration. Assumes the directory was previously added to the index using the addDirectory(java.lang.String, java.lang.Class, java.util.HashMap, org.dlese.dpc.index.writer.FileIndexingPlugin, int) method.
- Parameters:
srcDir - The directory of files needing to be removed from the index.
- Returns:
- True if the directory of files existed in the index and was removed.
-
getUpdateFrequency
public long getUpdateFrequency()
Gets the updateFrequency attribute of the FileIndexingService object.
- Returns:
- The updateFrequency value
-
getLastSyncTime
public long getLastSyncTime()
Gets the lastSyncTime attribute of the FileIndexingService object.
- Returns:
- The lastSyncTime value
-
getNumRecordsToDelete
public int getNumRecordsToDelete()
Gets the numRecordsToDelete attribute of the FileIndexingService object.
- Returns:
- The numRecordsToDelete value
-
getNumRecordsToAdd
public int getNumRecordsToAdd()
Gets the numRecordsToAdd attribute of the FileIndexingService object.
- Returns:
- The numRecordsToAdd value
-
getNumRecordsToReplace
public int getNumRecordsToReplace()
Gets the numRecordsToReplace attribute of the FileIndexingService object.
- Returns:
- The numRecordsToReplace value
-
indexFiles
Updates the index to reflect the files in the directories this service is monitoring, with the option to run the update in the background. Any new, deleted or modified files that appear in the directories will be reflected in the index.
- Parameters:
reindexAll - True to reindex all files regardless of file modification date, false to reindex only those files that have changed since the last indexing.
observer - The FileIndexingObserver that will be notified when indexing is complete, or null to use none
-
indexFiles
Updates the index to reflect the files in the directory indicated, which must have been previously added to this FileIndexingService using addDirectory(java.lang.String, java.lang.Class, java.util.HashMap, org.dlese.dpc.index.writer.FileIndexingPlugin, int). Any new, deleted or modified files that appear in the directory will be reflected in the index.
- Parameters:
reindexAll - True to reindex all files regardless of file modification date, false to reindex only those files that have changed since the last indexing.
directory - The directory to index.
observer - The FileIndexingObserver that will be notified when indexing is complete, or null to use none
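The effect of the reindexAll flag can be shown as a small planning step that sorts files into the add, replace and delete groups that getNumRecordsToAdd, getNumRecordsToReplace and getNumRecordsToDelete appear to count. This is a sketch under assumptions: `ChangeDetectionSketch`, `Plan` and the use of modification times held in plain maps are hypothetical, and the real service may detect changes differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical change planner comparing indexed state with the files on disk. */
final class ChangeDetectionSketch {

    /** Paths grouped by the action the index would need. */
    static final class Plan {
        final List<String> toAdd = new ArrayList<>();
        final List<String> toReplace = new ArrayList<>();
        final List<String> toDelete = new ArrayList<>();
    }

    /**
     * @param indexed    path to modification time recorded at the last indexing
     * @param onDisk     path to current modification time
     * @param reindexAll true to replace every known file regardless of modification time
     */
    static Plan plan(Map<String, Long> indexed, Map<String, Long> onDisk, boolean reindexAll) {
        Plan plan = new Plan();
        // Sorted copies give a deterministic order for the resulting lists.
        Map<String, Long> sortedOnDisk = new TreeMap<>(onDisk);
        Map<String, Long> sortedIndexed = new TreeMap<>(indexed);

        for (Map.Entry<String, Long> file : sortedOnDisk.entrySet()) {
            Long indexedModTime = indexed.get(file.getKey());
            if (indexedModTime == null) {
                plan.toAdd.add(file.getKey());      // new file
            } else if (reindexAll || file.getValue() > indexedModTime) {
                plan.toReplace.add(file.getKey());  // forced, or modified since last indexing
            }
        }
        for (String path : sortedIndexed.keySet()) {
            if (!onDisk.containsKey(path)) {
                plan.toDelete.add(path);            // file no longer present
            }
        }
        return plan;
    }
}
```

With reindexAll set to false, an unchanged file appears in none of the three lists, which is what makes incremental updates cheap.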
-
indexFile
public void indexFile(File fileToIndex, FileIndexingPlugin plugin) throws FileIndexingServiceException
Indexes a single file. The operation is executed serially to completion.
- Parameters:
fileToIndex - The file to index.
plugin - A FileIndexingPlugin or null.
- Throws:
FileIndexingServiceException - If unable to index
-
removeDocs
Removes all documents that match the given term within the given field. Removed documents will either be saved in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true"), or removed from the index altogether, as determined by the parameter passed in at the constructor. This is useful for removing a single document that is indexed with a unique ID field, or to remove a group of documents matching the same term for a given field. For example, you might pass in an ID of a record that needs to be removed along with the ID field that it is indexed under, or the file path corresponding to a record along with the field "docsource". Note this is the same as SimpleLuceneIndex.removeDocs(String,String) but is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.
- Parameters:
field - The field that is searched.
term - The term that is matched for removal.
docWriter - The FileIndexingServiceWriter to use
-
removeDocs
Removes all documents that match the given terms within the given field. Removed documents will either be saved in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true"), or removed from the index altogether, as determined by the parameter passed in at the constructor. This is useful for removing multiple documents that are indexed with a unique ID field. For example, you might pass in an array of IDs needing to be removed. Note this is the same as SimpleLuceneIndex.removeDocs(String,String[]) but is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.
- Parameters:
field - The field that is searched.
terms - The terms that are matched for removal.
docWriter - The FileIndexingServiceWriter to use
-
removeDocs
public final void removeDocs(String field, String term, FileIndexingServiceWriter docWriter, boolean saveDeletedRecords)
Removes all documents that match the given term within the given field. Removed documents will either be saved in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true"), or removed from the index altogether, as determined by the parameter passed in to this method. This is useful for removing a single document that is indexed with a unique ID field, or to remove a group of documents matching the same term for a given field. For example, you might pass in an ID of a record that needs to be removed along with the ID field that it is indexed under, or the file path corresponding to a record along with the field "docsource". Note this is the same as SimpleLuceneIndex.removeDocs(String,String) but is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.
- Parameters:
field - The field that is searched.
term - The term that is matched for removal.
docWriter - The FileIndexingServiceWriter to use
saveDeletedRecords - True to save the removed documents in the index and mark them deleted, else they will be removed from the index.
-
removeDocs
public final void removeDocs(String field, String[] terms, FileIndexingServiceWriter docWriter, boolean saveDeletes)
Removes all documents that match the given terms within the given field. Removed documents will either be saved in the index and marked as deleted (indicated by the Lucene field "deleted" being indexed as "true"), or removed from the index altogether, as determined by the parameter passed in to this method. This is useful for removing multiple documents that are indexed with a unique ID field. For example, you might pass in an array of IDs needing to be removed. Note this is the same as SimpleLuceneIndex.removeDocs(String,String[]) but is synchronized with other operations occurring in this FileIndexingService and handles deletes accordingly.
- Parameters:
field - The field that is searched.
terms - The terms that are matched for removal.
docWriter - Writer used to perform the delete.
saveDeletes - True to save the removed documents in the index and mark them deleted, else they will be removed from the index.
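The two removal modes described above (keep the document and index a "deleted" field as "true", or drop it from the index) can be illustrated with an in-memory stand-in for the index. This is a sketch only: `RemoveDocsSketch` models documents as field maps rather than Lucene Documents, and it returns a count for illustration whereas the documented methods return void.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory index showing the saveDeletes behavior. */
final class RemoveDocsSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    synchronized void add(Map<String, String> doc) {
        docs.add(new HashMap<>(doc)); // defensive copy
    }

    /**
     * Removes or marks every document whose value for 'field' is one of 'terms'.
     * Synchronized so it cannot interleave with other operations on this object.
     */
    synchronized int removeDocs(String field, String[] terms, boolean saveDeletes) {
        Set<String> wanted = new HashSet<>(Arrays.asList(terms));
        int affected = 0;
        Iterator<Map<String, String>> it = docs.iterator();
        while (it.hasNext()) {
            Map<String, String> doc = it.next();
            if (!wanted.contains(doc.get(field))) {
                continue;
            }
            if (saveDeletes) {
                doc.put("deleted", "true"); // keep the document, flag it as deleted
            } else {
                it.remove();                // drop the document altogether
            }
            affected++;
        }
        return affected;
    }

    synchronized int size() {
        return docs.size();
    }

    synchronized int countMarkedDeleted() {
        int count = 0;
        for (Map<String, String> doc : docs) {
            if ("true".equals(doc.get("deleted"))) {
                count++;
            }
        }
        return count;
    }
}
```

Keeping flagged documents lets a search layer report that a record once existed, at the cost of having to filter on the "deleted" field in ordinary queries.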
-
reindexDocs
Re-indexes all documents that match the given term within the given field. Requires that the file for the given document is still in its original location. If the file is not in its original location, the index will remove the document without updating it, and the document will not be marked as deleted. This is useful for updating a single document that is indexed with a unique ID field, or to update a group of documents matching the same term for a given field. For example, you might pass in an ID of a record that needs updating along with the ID field that it is indexed under, or the file path corresponding to a record that needs updating along with the field "docsource".
- Parameters:
field - The field that is searched.
term - The term that is matched for updates.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
- Returns:
- The number of matching documents to be updated.
-
reindexDocs
Re-indexes all documents that match the given terms within the given field. This is useful for updating multiple documents that are indexed with a unique ID field. For example, you might pass in an array of IDs needing to be updated along with the ID field that they are indexed under, or an array of file paths corresponding to records that need updating along with the field "docsource".
- Parameters:
field - The field that is searched.
terms - The terms that are matched for updates.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
- Returns:
- The number of matching documents to be updated.
-
reindexDocs
Reindexes Documents managed by this FileIndexingService that match the given Lucene query.
- Parameters:
query - A Lucene search query.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
- Returns:
- The number of matching documents to be updated.
-
reindexDocs
public void reindexDocs(org.apache.lucene.document.Document[] docs, boolean reindexAll)
Reindexes the given Documents.
- Parameters:
docs - Lucene Documents from the same index that is managed by this FileIndexingService.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
-
reindexDocs
Reindexes the Documents in the given ResultDocs.
- Parameters:
docs - Lucene ResultDocs from the same index that is managed by this FileIndexingService.
reindexAll - True to reindex all matching results, false to reindex only those matches whose files have been modified since the last update.
-
getIndexingMessages
Gets the last 10 indexing status messages.
- Returns:
- The indexingMessages.
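Keeping only the last 10 messages is a bounded-buffer pattern. The sketch below shows one way to do it with `java.util.ArrayDeque`; the class `IndexingMessagesSketch`, the `addMessage` method and the choice of `List<String>` as the return type are assumptions, since this page does not show how the service stores its messages or what type it returns.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bounded log that retains only the most recent messages. */
final class IndexingMessagesSketch {
    static final int MAX_MESSAGES = 10;

    private final ArrayDeque<String> messages = new ArrayDeque<>();

    /** Appends a message, discarding the oldest one once the limit is reached. */
    synchronized void addMessage(String message) {
        if (messages.size() == MAX_MESSAGES) {
            messages.removeFirst();
        }
        messages.addLast(message);
    }

    /** Returns a snapshot of the retained messages, oldest first. */
    synchronized List<String> getIndexingMessages() {
        return new ArrayList<>(messages);
    }
}
```

Returning a copy means callers can iterate over the messages while indexing continues to append new ones.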
-
startTester
Starts a FileMoveTester only if one is not already initialized. The FileMoveTester simulates moving files in and out of the sourceFile directory, for testing purposes only. Warning: FileMoveTester moves metadata files. Only use with test records!
- Parameters:
docRoot - The context document root, as obtained by calling getServletContext().getRealPath("/");
sourceFileDirectory - The source file directory that files are moved in and out of.
-
stopTester
public void stopTester()
Stops the FileMoveTester.
-
getSimpleDateStamp
Returns a string for the current time and date, suitable for display in log files and output to standard output.
- Returns:
- The dateStamp value
-
getDateStamp
Returns a string for the current time and date, suitable for display in log files and output to standard output.
- Returns:
- The dateStamp value
-
setDebug
public static void setDebug(boolean db)
Sets the debug attribute.
- Parameters:
db - The new debug value
-