Hi,

I'm working on a web application that uses Lucene.net (version 2.0.0.4) for store search. Through my web application, users can search for stores in the US that are located within 50 miles of a given location. I'm using a third-party API to find all the cities within a radius. For a city such as Edison, NJ, it gives me around 450 cities within 40 miles (the API returns a .NET hashtable containing 450 cities). By iterating over this hashtable, I am using the BooleanQuery/Query classes to build the Lucene query.

In this scenario, I find that it takes a long time to build and execute the query and return the search results through Lucene. Is there any way I can optimize this code?

Thanks!

A: 

I think the key to making this perform well is to think about how you store your data and to keep some metadata around it.

What do I mean by that?

Keep a list of the cities that have a store (in NJ, for example), and filter the cities that come back from your third-party API against that master list. You might find that you only have 5 matches out of the 450 returned. Similarly, I wouldn't combine 450 queries into one query; try to chunk them into smaller batches.
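To make the idea concrete, here is a minimal sketch in plain Python rather than Lucene.net code. The function names, the sample cities, and the batch size are all made up for illustration; the point is only to show the filter-then-chunk step that would run before any query is built.

```python
def filter_known_cities(nearby_cities, cities_with_stores):
    """Keep only the nearby cities that actually have a store."""
    known = set(cities_with_stores)
    return [city for city in nearby_cities if city in known]


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical data: what the third-party API returned, and the master list.
nearby = ["Edison, NJ", "Metuchen, NJ", "Iselin, NJ", "Piscataway, NJ"]
master = {"Edison, NJ", "Piscataway, NJ", "Newark, NJ"}

candidates = filter_known_cities(nearby, master)

# Each batch would become one smaller query instead of one huge one.
batches = chunk(candidates, 1)
```

Only the cities in `candidates` need a clause in the query, so the 450-term query shrinks to however many cities really have stores.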

Also, if you can create indexes based on states you might find that a smaller index - specifically for NJ - can handle your query more efficiently than selecting a specific state's data in a larger index.

Hope this helps, Ciaran

+1  A: 

When you are building your index, map the cities to latitude and longitude coordinates. In the web app, when you are doing a radius search, map the searched city to coordinates and do a range query (you'll need to convert the distance to whatever units your coordinates are in).

This is imperfect in that you will be searching a square instead of a circle, but you could write some code to filter results outside the original radius if you need to be precise.
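A rough sketch of the two steps, in plain Python with an in-memory list standing in for the index (this is not Lucene.net API). The conversion factor of about 69 miles per degree of latitude, the helper names, and the sample coordinates are my own assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius
MILES_PER_DEGREE_LAT = 69.0   # approximate; good enough for a coarse box


def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) for a square around a point."""
    dlat = radius_miles / MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def stores_within(stores, lat, lon, radius_miles):
    """Coarse square filter (the range query), then an exact circle filter."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_miles)
    in_box = [s for s in stores
              if min_lat <= s[1] <= max_lat and min_lon <= s[2] <= max_lon]
    return [s for s in in_box
            if haversine_miles(lat, lon, s[1], s[2]) <= radius_miles]


# Hypothetical stores as (name, latitude, longitude); coordinates are approximate.
stores = [
    ("Newark", 40.7357, -74.1724),
    ("Philadelphia", 39.9526, -75.1652),  # inside the square, outside the circle
    ("Boston", 42.3601, -71.0589),
]
edison = (40.5187, -74.4121)
result = [s[0] for s in stores_within(stores, edison[0], edison[1], 40)]
```

In a real index the square filter would be two range queries (one on latitude, one on longitude), and only the few hits that survive would go through the exact distance check.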

KenE

A: 

KenE's answer is a good one, and you should google "lucene spatial search" for more info about that approach.

There's another way you can go, assuming the radius is always 40 miles: just reverse the process.

Have a field called nearby_city. For every store in your index, add the list of cities that are within its 40-mile radius. Now, when you search for a store near Edison, NJ, simply add a nearby_city:"Edison, NJ" term to your query. Only stores within 40 miles of that city will match your query.
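Roughly, the indexing and search sides could look like the sketch below. It is plain Python with dictionaries standing in for Lucene documents, not Lucene.net code; `cities_near` is a hypothetical lookup representing whatever the third-party radius API returns at indexing time, and the store names are invented.

```python
def build_documents(stores, cities_near):
    """Attach a multi-valued nearby_city field to every store at index time."""
    docs = []
    for name, city in stores:
        # A store is always "near" its own city, plus every city in the radius.
        nearby = set(cities_near.get(city, [])) | {city}
        docs.append({"name": name, "city": city, "nearby_city": sorted(nearby)})
    return docs


def search_near(docs, city):
    """Equivalent of adding a single nearby_city:"<city>" term to the query."""
    return [d["name"] for d in docs if city in d["nearby_city"]]


# Hypothetical radius lookup, computed once while building the index.
cities_near = {
    "Newark, NJ": ["Edison, NJ", "Iselin, NJ"],
    "Edison, NJ": ["Newark, NJ", "Iselin, NJ"],
    "Boston, MA": ["Cambridge, MA"],
}
stores = [
    ("Store A", "Newark, NJ"),
    ("Store B", "Edison, NJ"),
    ("Store C", "Boston, MA"),
]

docs = build_documents(stores, cities_near)
matches = search_near(docs, "Edison, NJ")
```

The expensive radius lookup moves to indexing time, so each search adds one term instead of 450 clauses. The trade-off is a larger index and a radius that is fixed when the index is built.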

itsadok