I'm building a Java Lucene-based search system that, on addition, adds a certain number of meta-fields, one of which is a sourceId field, which denotes where the entry came from.

I'm now trying to retrieve all documents from a particular source, but the index doesn't appear to be able to find them. However, if I search for a wildcard value, the returned documents all have the correct value for this field.

The Lucene query I'm using is quite simple, basically index-source-id:1, but it fails to return any hits. If I search for content:a* I get dozens of documents, all of which, when asked, return the value 1 for the index-source-id field, which is correct.

Any ideas?

+1  A: 

I have only worked with the PHP port, but have you checked which text analyzer you are using? This FAQ seems to indicate that, as in the PHP version, you need to use a different one that doesn't remove digits.
You can find a list of analyzers here
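
To make the digit problem concrete, here is a minimal plain-Java sketch. It does not use Lucene at all; it only imitates letter-only tokenization (split at every non-letter, then lowercase), which is how SimpleAnalyzer is documented to behave as far as I know. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterTokenizerSketch {

    // Hypothetical helper: splits text at every non-letter character
    // and lowercases what remains. Digits never make it into a token.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The field value "1" yields no tokens, so nothing is indexed for it.
        System.out.println(tokenize("1"));             // prints []
        System.out.println(tokenize("Source 1 feed")); // prints [source, feed]
    }
}
```

If the id field goes through an analyzer like this, the value is still stored and can be read back from a hit, but there is no indexed term for the query index-source-id:1 to match, which would fit the symptoms described in the question.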

Just to be sure, you have set the id to be indexable?

Yacoby
I have set the ID to be indexable, yup. I was looking for a list of Analyzers, but couldn't find one that said it specifically dealt with numbers. It appears StandardAnalyzer does, which I thought had been deprecated, so perhaps that might help.
Martin
Rebuilding the index, and then searching, with StandardAnalyzer instead of SimpleAnalyzer did the trick!
Martin
For future reference, you do not want to analyze (or tokenize) id fields, since they're supposed to be atomic by nature and, as Einstein showed us with his buddies in the Manhattan Project, splitting atoms isn't a good thing to do...
Esko
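
The point about keeping id fields atomic can be sketched with a toy inverted index in plain Java. This is not Lucene code: the class, its methods, and the ids "src-1" and "src-12" are all hypothetical, and the tokenizing path simply splits on anything that is not a letter or digit. It shows that an exact lookup works when the whole id is indexed as a single term, and stops working once the id is split into pieces.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AtomicIdSketch {

    // term -> ids of the documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index the id unchanged, as one atomic term.
    public void addAtomic(int docId, String id) {
        postings.computeIfAbsent(id, k -> new TreeSet<>()).add(docId);
    }

    // Index the id after splitting it on non-alphanumeric characters.
    public void addTokenized(int docId, String id) {
        for (String token : id.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Exact term lookup, with no analysis of the query text.
    public Set<Integer> lookup(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        AtomicIdSketch atomic = new AtomicIdSketch();
        atomic.addAtomic(0, "src-1");
        atomic.addAtomic(1, "src-12");
        System.out.println(atomic.lookup("src-1"));    // prints [0]

        AtomicIdSketch tokenized = new AtomicIdSketch();
        tokenized.addTokenized(0, "src-1");
        tokenized.addTokenized(1, "src-12");
        System.out.println(tokenized.lookup("src-1")); // prints []
        System.out.println(tokenized.lookup("src"));   // prints [0, 1]
    }
}
```

In the tokenized index the full id no longer exists as a term, and the shared fragment "src" matches both documents, so distinct sources can no longer be told apart by an exact match.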