Hello,

I'm trying to search for prefix matches on a large list of last names, so Wein* should find Weinberg, Weinkamm, etc.

I could do this by creating a special field, and adding

<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" preserveOriginal="1"/>

to its type specification in schema.xml. When I add the line above only to the index-time analyzer and leave it out of the query analyzer, I can search with just special_field:Wein and get the expected results.
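For reference, here is a rough sketch of what the whole type could look like. The names lastname_prefix and the choice of KeywordTokenizerFactory plus LowerCaseFilterFactory are assumptions for illustration, not something taken from my actual schema; the query analyzer carries the same tokenizer and lowercasing but no n-gram filter, so the typed prefix is matched as-is against the indexed grams. The preserveOriginal attribute is left out here.

```xml
<!-- Hypothetical field type name; adjust to your own schema. -->
<fieldType name="lastname_prefix" class="solr.TextField">
  <!-- Index time: "Weinberg" is lowercased and expanded to w, we, wei, wein, ... -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
  </analyzer>
  <!-- Query time: no n-gram filter, so "Wein" becomes the single term "wein" -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The special field from above, bound to the assumed type -->
<field name="special_field" type="lastname_prefix" indexed="true" stored="false"/>
```

With something like this in place, special_field:Wein is an ordinary term lookup and needs no wildcard.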

Now I see that Solr also has a *-syntax. What's the connection between EdgeNGramFilterFactory and the *-syntax?

Am I doing things correctly or is there a better, more regular way?

Thanks!

+1  A: 

I don't recommend the Wein* query. It is implemented internally as a PrefixQuery, which rewrites the original query to include every indexed term that has the prefix "Wein". Depending on how large your index is (I mean how many terms it contains), this query rewriting can become a bottleneck.

The EdgeNGramFilter at index time is a better approach. This solution will use more space, but queries will be processed much faster.

Rodes
Thanks. I don't expect many query terms, so I went with the wildcard syntax and am quite happy with it.
CruftyCraft
I'm referring to the number of index terms, not query terms. Solr/Lucene does a linear search over all indexed terms to select the subset of terms that share the prefix. Once the subset is selected, the query is expanded to include those terms. So the bottleneck is the linear search. That's why I still recommend the EdgeNGramFilter, unless you have only a few terms in your index.
Rodes
I understand. I might make some changes in this direction once we hit a performance limit. Thanks.
CruftyCraft
A: 

Note: I also asked this question in the Lucene forum where I got a good answer: http://lucene.472066.n3.nabble.com/How-to-do-partial-beginning-matches-td781147.html

CruftyCraft