views:

93

answers:

1

I'm writing a search feature for a database of NFL players.

The user enters a search string like "Jason Campbell" or "Campbell" or "Jason".

I'm having trouble getting the appropriate results.

Which Analyzer should I use when indexing? Which Query when querying? Should I distinguish between first name and last name or just index the full name string?

I'd like the following behavior:

Query: "Jason Campbell" -> Result: exact match for 1 player, Jason Campbell

Query: "Campbell" -> Result: all players with Campbell in their name

Query: "Jason" -> Result: all players with Jason in their name

Query: "Cambel" [misspelled] -> Result: all players with Campbell in their name

+1  A: 

StandardAnalyzer should work fine for all of the above queries. Enclose your first query in double quotes for an exact match; your last query requires a fuzzy query. For example, you could search for Cambell~0.5 and get Campbell as a match (the numeric value after the tilde controls the fuzziness).
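
To make the fuzzy part more concrete, here is a minimal sketch in plain Java (no Lucene dependency) of the edit-distance idea that fuzzy matching builds on. The class name `EditDistanceSketch` and the `similarity` formula (1 minus the distance divided by the shorter length) are assumptions made for illustration; they are not a statement of how any particular Lucene version scores fuzzy matches.

```java
final class EditDistanceSketch {

    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions, or substitutions needed to turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical similarity in the spirit of a "~0.5" threshold:
    // 1.0 means identical, lower values mean more edits were needed.
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) editDistance(a, b) / shorter;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("cambell", "campbell")); // one missing 'p'
        System.out.println(editDistance("cambel", "campbell"));  // missing 'p' and 'l'
        System.out.println(similarity("cambel", "campbell") >= 0.5);
    }
}
```

Under this assumed formula, both misspellings stay above a 0.5 threshold against "campbell", while an unrelated name such as "jason" falls well below it.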

BTW I would suggest using Solr which provides features for spell-check and auto-suggest so you wouldn't have to reinvent the wheel. This is similar to Google's "did you mean..."

Mikos
Which Query implementation would you use? I'm having a tough time getting TermQuery to match an exact phrase. (You can set FuzzyQuery's fuzziness factor programmatically; there's no need for the tilde notation.)
As the name suggests (no pun intended), a TermQuery is for a single term; you should pick the query type based on the case. If you want to match "John Smith" *exactly*, use PhraseQuery. If you want to match "Johnson Smith" when the user types "John Smith", look at FuzzyQuery.
Mikos
I basically ended up using something like you suggested, thanks. First try an exact match using either TermQuery or PhraseQuery, depending on how many terms are in the query. The same approach applies to the fuzzy query: FuzzyQuery takes a single term as its input, so multi-term queries have to be built up using BooleanQuery. Thanks, this helped.
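
For anyone landing here later, the "exact first, then fuzzy" flow described above can be sketched as follows. This is plain Java over an in-memory list, not Lucene code: the exact tier stands in for TermQuery/PhraseQuery, and the fuzzy tier stands in for one fuzzy clause per term combined in a BooleanQuery. The class name `TieredNameSearch`, the `maxEdits` parameter, and the choice to require every query term to match in the fuzzy tier are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class TieredNameSearch {

    // Lowercase and split on whitespace, roughly what an analyzer would do.
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Levenshtein distance between two terms.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Tier 1: the query terms appear exactly, in order and adjacent,
    // inside the name (a single term reduces to a plain term match).
    static boolean exactMatch(List<String> name, List<String> query) {
        return Collections.indexOfSubList(name, query) >= 0;
    }

    // Tier 2: every query term is within maxEdits of some name term.
    static boolean fuzzyMatch(List<String> name, List<String> query, int maxEdits) {
        for (String q : query) {
            boolean found = false;
            for (String n : name) {
                if (editDistance(q, n) <= maxEdits) { found = true; break; }
            }
            if (!found) return false;
        }
        return true;
    }

    // Try the exact tier first; fall back to fuzzy only if it found nothing.
    static List<String> search(List<String> players, String query, int maxEdits) {
        List<String> q = tokens(query);
        List<String> hits = new ArrayList<>();
        for (String p : players) {
            if (exactMatch(tokens(p), q)) hits.add(p);
        }
        if (!hits.isEmpty()) return hits;
        for (String p : players) {
            if (fuzzyMatch(tokens(p), q, maxEdits)) hits.add(p);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample data for the demo only.
        List<String> players = Arrays.asList(
                "Jason Campbell", "Calais Campbell", "Jason Witten", "Tom Brady");
        System.out.println(search(players, "Jason Campbell", 2));
        System.out.println(search(players, "Campbell", 2));
        System.out.println(search(players, "Cambel", 2));
    }
}
```

With this sample list, "Jason Campbell" returns only that player, "Campbell" and the misspelled "Cambel" both return the two Campbells, and "Jason" returns both Jasons, which matches the behavior asked for in the question.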