I am using Lucene (more specifically, Compass) to log threads in a forum, and I need a way to extract the keywords behind each discussion. I don't want to index every entry someone makes. Instead, I'd keep a list of 'keywords' that are relevant to a certain context, and only add an entry to the index if it matches those keywords and scores above a threshold.
I want to use the power of an analyser to strip out noise and normalise the text, then get the resulting tokens back from the analyser so I can match them against the keywords and count how many times each word is mentioned.
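To make the goal concrete, here is a rough sketch of the matching step I have in mind. It does not use Lucene at all: `tokenize` is a naive stand-in for whatever would hand me the analyser's tokens, and `KeywordFilter`, `countKeywords`, `shouldIndex` and the threshold value are names and numbers I made up for illustration. I am also assuming the threshold applies to the total number of keyword hits in an entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class KeywordFilter {

    // Placeholder for the analyser (assumption): lower-cases the text and
    // splits on anything that is not a letter or digit. A real analyser
    // would also handle stop words, stemming and so on.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts how often each keyword occurs among the tokens.
    // Keywords that never occur are simply absent from the map.
    static Map<String, Integer> countKeywords(List<String> tokens, Set<String> keywords) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            if (keywords.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // An entry qualifies when its total keyword hits reach the threshold
    // (hypothetical rule: total hits >= threshold).
    static boolean shouldIndex(String entry, Set<String> keywords, int threshold) {
        int total = 0;
        for (int count : countKeywords(tokenize(entry), keywords).values()) {
            total += count;
        }
        return total >= threshold;
    }

    public static void main(String[] args) {
        Set<String> keywords = new HashSet<>(Arrays.asList("lucene", "index"));
        String entry = "Lucene builds an index; the index is what Lucene searches.";
        System.out.println(countKeywords(tokenize(entry), keywords));
        System.out.println(shouldIndex(entry, keywords, 3));
    }
}
```

Only entries for which `shouldIndex` returns true would be handed on for indexing. What I am missing is the real replacement for `tokenize`.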
Is there a way to get the tokens from an analyser without having the overhead of indexing every entry made?
My current idea is to maintain a RAMDirectory that holds all entries, search it using my list of keywords, and then pass the matching Documents to the persistence manager so that only the relevant entries are actually stored.