I want to support partial matching, so I am appending * to the end of search queries. I've noticed that a search query of gatorade returns 12 results, whereas gatorade* returns 7. So * seems to mean "1 or many" rather than "0 or many". How can I get "0 or many" behavior? Am I going about partial matching in Solr all wrong? Thanks.

+2  A: 

First, I think Solr wildcards are better summarized by "0 or many" than "1 or many". I doubt that's the source of your problem. (For example, see the javadocs for WildcardQuery.)

Second, are you using stemming? My first guess is that you're dealing with a stemming issue. Solr wildcards can behave oddly with stemming. This is because wildcard expansion works by searching through the list of terms stored in the inverted index; these terms are going to be in stemmed form (perhaps something like "gatorad"), rather than the words from the original source text (perhaps "gatorade" or "gatorades").

For example, suppose you have a stemmer that maps both "gatorade" and "gatorades" to the stem "gatorad". This means your inverted index will not contain either "gatorade" or "gatorades", only "gatorad". If you then issue the query gatorade*, Solr will walk the term index looking for all the stems beginning with "gatorade". But there are no such stems, so you won't get any matches. Similarly, if you search for gatorades*, Solr will look for all stems beginning with "gatorades", and again find none.
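To make the example above concrete, here is a rough Python sketch of the idea. It is a toy model, not Solr code: toy_stem, build_index, term_query and prefix_query are all hypothetical names made up for illustration, and the model assumes (as described above) that a plain term is stemmed before lookup while a wildcard prefix is compared directly against the stored stems.

```python
# Toy model of a stemmed inverted index. None of this is Solr's API;
# every function here is a hypothetical stand-in for illustration.

def toy_stem(word):
    """Hypothetical stemmer: maps 'gatorade' and 'gatorades' to 'gatorad'."""
    w = word.lower()
    if w.endswith("es"):
        return w[:-2]
    if w.endswith("e") or w.endswith("s"):
        return w[:-1]
    return w


def build_index(docs):
    """Map each stem to the set of document ids whose text contains it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(toy_stem(word), set()).add(doc_id)
    return index


def term_query(index, word):
    """Plain term query: the query word is stemmed, then looked up."""
    return set(index.get(toy_stem(word), set()))


def prefix_query(index, prefix):
    """Wildcard query 'prefix*': the prefix is NOT stemmed (assumption of
    this model); it is matched against the stems stored in the index."""
    p = prefix.lower()
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(p):
            hits |= doc_ids
    return hits


docs = {1: "gatorade", 2: "gatorades on sale", 3: "water"}
index = build_index(docs)

print(sorted(index))                       # the stems actually stored
print(term_query(index, "gatorade"))       # stemmed lookup
print(prefix_query(index, "gatorade"))     # no stored stem starts with this
print(prefix_query(index, "gatorad"))      # prefix of the stored stem
```

In this model the plain query finds both Gatorade documents, the wildcard query gatorade* finds none, and gatorad* finds both again, which is the kind of mismatch a stemmer can produce.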

Third, for optimal help, I'd suggest posting some more information, in particular:

  • Some particular query URLs you are submitting to Solr
  • An excerpt from your schema.xml file. In particular, include A) the field elements for the fields you are having trouble with, and B) the field type definitions corresponding to those fields
Chris
+1 need more info from the OP to correctly diagnose this.
Mauricio Scheffer
A: 

So what I was looking for is to rewrite the search term 'gatorade' as 'gatorade OR gatorade*', which gives me all the matches I'm looking for.

agentile
I suspect this approach is not adequate to give expected results in all cases. See my revised answer for some potential insight into why. (It depends on the details of your stemmer, but if you were using my hypothetical stemmer, your revised query would indeed find all the docs that contain "gatorade", but it would miss all the docs that A] contain "gatorades" but B] don't contain "gatorade".)
Chris
A: 

If you want a query to return all documents that match either a stemmed form of gatorade or words that begin with gatorade, you'll need to construct the query yourself: +(gatorade gatorade*). You could alternatively extend the SolrParser to do this, but that's more work.
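As a rough sketch of "constructing the query yourself" on the client side, something like the following Python could build the string. The function names term_or_prefix and escape_term are hypothetical, the list of escaped characters is my reading of the Lucene query syntax and may need adjusting for your Solr version, and the sketch only handles a single word.

```python
# Client-side sketch: build "+(term term*)" from one user-supplied word.
# escape_term and term_or_prefix are hypothetical helper names.

# Characters treated as special by the Lucene query syntax (assumed list).
_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term):
    """Backslash-escape query-syntax characters in a single term."""
    return "".join("\\" + c if c in _SPECIAL else c for c in term)


def term_or_prefix(term):
    """Return a query matching the analyzed term OR anything it prefixes."""
    term = term.strip()
    if not term or any(c.isspace() for c in term):
        raise ValueError("expected a single non-empty word")
    safe = escape_term(term)
    return "+({0} {0}*)".format(safe)


print(term_or_prefix("gatorade"))
```

The resulting string would then be sent as the q parameter of the search request, URL-encoded as usual.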

Dallan Quass