When I use Luke to search my Lucene index with the StandardAnalyzer, I can see that the field I am searching contains values of the form MY_VALUE. However, when I search for field:"MY_VALUE", the query is parsed as field:"my value".

Is there a simple way to escape the underscore (_) character so that the query matches it literally?

EDIT:

4/1/2010 11:08AM PST

I think there is a bug in the tokenizer in Lucene 2.9.1, and it was probably present in earlier versions too. Load up Luke and search for "BB_HHH_FFFF5_SSSS". When the value contains a number, the following tokens are returned:

"bb hhh_ffff5_ssss"

After some testing, I've found that this is because of the number. If I input

"BB_HHH_FFFF_SSSS", I get

"bb hhh ffff ssss"

At this point I'm leaning towards a tokenizer bug, unless a number is supposed to change the behavior this way, but I fail to see why it would.

Can anyone confirm this?

+1  A: 

It doesn't look like you used the StandardAnalyzer to index that field. In Luke you'll need to select the analyzer that you used to index that field in order to match MY_VALUE correctly.

Incidentally, you might be able to match MY_VALUE by using the KeywordAnalyzer.

bajafresh4life
No, I did use the StandardAnalyzer for indexing, which is why this is weird.
Matt
If you indexed using the Standard Analyzer then your index will contain "my" and "value" as two different tokens. Try searching for "my value" (including the quotes) and you might get results.
Thomas
I would double-check which analyzer you're using for indexing. If you've used the StandardAnalyzer for indexing, it's impossible to have MY_VALUE as a term, since StandardAnalyzer always splits on underscores.
bajafresh4life
+1  A: 

I don't think you'll be able to use the standard analyser for this use case.

Judging by what I think your requirements are, the keyword analyser should work fine with little effort (the whole field becomes a single term).

I think some of the confusion arises from looking at the field in Luke. The stored value is not what queries run against; what matters are the indexed terms. I suspect that when you look at the terms indexed for your field, they'll be "my" and "value".
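To make the stored-value versus terms distinction concrete, here is a toy sketch in plain Java. It does not call Lucene at all: `AnalysisSketch`, `splitTerms` and `keywordTerms` are hypothetical names invented for illustration, and `splitTerms` only assumes an analyzer that lowercases and breaks on every non-alphanumeric character. The real StandardAnalyzer grammar has more rules than this (the digit case in the question's edit is one where it behaves differently), so treat it as a rough model rather than a description of Lucene's tokenizer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: these helpers are hypothetical and do NOT use Lucene.
final class AnalysisSketch {

    // Assumption: lowercase the value and break on every run of
    // non-alphanumeric characters. Real analyzers apply richer rules.
    static List<String> splitTerms(String fieldValue) {
        List<String> terms = new ArrayList<>();
        String lowered = fieldValue.toLowerCase(Locale.ROOT);
        for (String part : lowered.split("[^a-z0-9]+")) {
            if (!part.isEmpty()) { // skip the empty piece a leading separator produces
                terms.add(part);
            }
        }
        return terms;
    }

    // Keyword-style analysis: the whole value becomes one unmodified term.
    static List<String> keywordTerms(String fieldValue) {
        return Collections.singletonList(fieldValue);
    }

    public static void main(String[] args) {
        System.out.println(splitTerms("MY_VALUE"));   // prints [my, value]
        System.out.println(keywordTerms("MY_VALUE")); // prints [MY_VALUE]
    }
}
```

Under this model, a query for the single term MY_VALUE can only match if the field was indexed keyword-style. With the splitting model, the index holds "my" and "value", which is why the phrase query "my value" is the one that would find the document.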

Hope this helps,

Moleski