I am using Lucene.Net 2.0 to index some fields from a database table. One of the fields is a 'Name' field which allows special characters. When I perform a search, it does not find my document that contains a term with special characters.

I index my field as such:

using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

Directory DALDirectory = FSDirectory.GetDirectory(@"C:\Indexes\Name", false);
Analyzer analyzer = new StandardAnalyzer();
IndexWriter indexWriter = new IndexWriter(DALDirectory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

Document doc = new Document();
doc.Add(new Field("Name", "Test (Test)", Field.Store.YES, Field.Index.TOKENIZED));
indexWriter.AddDocument(doc);

indexWriter.Optimize();
indexWriter.Close();

And I search doing the following:

// Normalize the input and escape Lucene's special query characters.
value = value.Trim().ToLower();
value = QueryParser.Escape(value);

// Build an exact term query and collect every match in the index.
Query searchQuery = new TermQuery(new Term(field, value));
Searcher searcher = new IndexSearcher(DALDirectory);

TopDocCollector collector = new TopDocCollector(searcher.MaxDoc());
searcher.Search(searchQuery, collector);
ScoreDoc[] hits = collector.TopDocs().scoreDocs;

If I search the 'Name' field for the value 'Test', it finds the document. If I perform the same search with the value 'Test (Test)', it does not find the document.

Even more strangely, if I remove the QueryParser.Escape line and search for a GUID (which, of course, contains hyphens), it finds documents where the GUID value matches, but performing the same search with the value 'Test (Test)' still yields no results.

I am unsure what I am doing wrong. I am using the QueryParser.Escape method to escape the special characters, and I am storing the field and searching it according to Lucene.Net's examples.

Any thoughts?

+2  A: 

StandardAnalyzer strips out the special characters during indexing. You can pass in an explicit list of stopwords (excluding the ones you want to keep).
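
A quick way to see this is to dump the token stream (a minimal sketch; the Console output is only for illustration):

Analyzer analyzer = new StandardAnalyzer();
TokenStream stream = analyzer.TokenStream("Name", new System.IO.StringReader("Test (Test)"));
Token token;
while ((token = stream.Next()) != null)
{
    // Prints "test" twice -- the parentheses never reach the index.
    System.Console.WriteLine(token.TermText());
}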

Mikos
Should I consider using another Analyzer to achieve my goal? What about switching from TOKENIZED to UN_TOKENIZED when storing fields with special characters?
Brandon
Well, if you don't tokenize the field you cannot "search" on it. You have a couple of choices: write your own analyzer (it's very simple) or pass the list of stop words to StandardAnalyzer. Something like:

Hashtable htStopwords = new Hashtable();
Analyzer analyzer = new StandardAnalyzer(htStopwords);
Mikos
You can also look at StopAnalyzer or SimpleAnalyzer; they might help. The problem is that you could end up with a lot of noise words. But if that is not an issue...
Mikos
A: 

While indexing, you tokenized the field, so your input string produces two tokens: "test" and "test". For the search, you are constructing the query by hand, i.e. using TermQuery instead of QueryParser, which would have tokenized the search text.
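
A sketch of the difference (the literal values are illustrative):

// Hand-built term: looks for the single literal term "test (test)",
// which was never indexed, so nothing matches.
Query termQuery = new TermQuery(new Term("Name", "test (test)"));

// QueryParser runs the text through the analyzer first; the parentheses
// are stripped, and the resulting query (Name:test Name:test) does
// match the tokenized field.
QueryParser parser = new QueryParser("Name", new StandardAnalyzer());
Query parsedQuery = parser.Parse(QueryParser.Escape("Test (Test)"));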

For a match on the entire value, you need to index the field UN_TOKENIZED. Then the input string is kept as a single token, "Test (Test)", and your current search code will work. You have to watch the case of the input string carefully: if you index lower-case text, you have to lower-case the search value as well.
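
A minimal sketch of that change, reusing the names from the question:

// Index the value as a single, unanalyzed token.
doc.Add(new Field("Name", "Test (Test)", Field.Store.YES, Field.Index.UN_TOKENIZED));

// The indexed token is "Test (Test)" verbatim, so the TermQuery must
// match it exactly -- including case.
Query searchQuery = new TermQuery(new Term("Name", "Test (Test)"));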

It is generally good practice to use the same analyzer during indexing and searching. You can use KeywordAnalyzer to generate a single token from the input string.
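
For instance, a sketch of indexing and searching with KeywordAnalyzer (reusing DALDirectory from the question):

// KeywordAnalyzer emits the entire field value as one token, so a
// TOKENIZED field behaves like UN_TOKENIZED here.
Analyzer analyzer = new KeywordAnalyzer();
IndexWriter writer = new IndexWriter(DALDirectory, analyzer, true);

Document doc = new Document();
doc.Add(new Field("Name", "Test (Test)", Field.Store.YES, Field.Index.TOKENIZED));
writer.AddDocument(doc);
writer.Close();

// Search with the exact same string (no lower-casing).
Query query = new TermQuery(new Term("Name", "Test (Test)"));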

Shashikant Kore