Hi,

I'm using Lucene for a job search portal built on .NET, and I'm facing performance issues in the following use case: when searching for jobs, the user can select a job location (for example, Atlanta, GA) and a radial distance (say, 50 miles). Lucene takes a long time to return the search results.

FYI, we maintain a SQL Server 2005 database that stores the city, state, longitude, and latitude of US and Canadian cities (about 1 million records in total).

Is there any way I can improve the performance of this location-based job search?

A: 

Basically, you have two types of search parameters: textual and spatial. You can probably use one type to filter the results you got from the other. For example, for someone looking for a .NET developer job near Atlanta, GA, you could either first retrieve all the .NET developer jobs and filter for location, or retrieve all jobs around Atlanta and filter for .NET developer ones. I believe the first should be faster.

You can also store the job locations directly in Lucene, and incorporate them in the search. A rough draft is:

Indexing:
1. When you receive a new 'wanted' ad, find its geo-location using the database.
2. Store the location as a Lucene field in the ad's document.

Retrieval:
1. Retrieve all jobs according to textual matches.
2. Use geometrical calculations for finding distances between the user's place and the job location.
3. Filter jobs according to distance.
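The retrieval steps above can be sketched as follows. This is only a minimal illustration in Python (the arithmetic ports directly to C#), assuming each textual hit already carries the latitude and longitude stored at indexing time. The names `haversine_miles`, `filter_by_radius`, and the sample hits are hypothetical, and the haversine formula is one common choice for the "geometrical calculations", not necessarily the one the linked resources use.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, good enough for job search


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def filter_by_radius(jobs, user_lat, user_lon, radius_miles):
    """Keep only the jobs whose stored location is within radius_miles."""
    return [
        job for job in jobs
        if haversine_miles(user_lat, user_lon, job["lat"], job["lon"]) <= radius_miles
    ]


# Hypothetical textual hits, each with the coordinates stored in its document.
text_hits = [
    {"title": ".NET developer", "city": "Marietta, GA", "lat": 33.9526, "lon": -84.5499},
    {"title": ".NET developer", "city": "Savannah, GA", "lat": 32.0809, "lon": -81.0912},
]

# User searching within 50 miles of Atlanta, GA.
nearby = filter_by_radius(text_hits, 33.7490, -84.3880, 50)
```

Because the distance check runs only over the textual hits, its cost grows with the size of the result set rather than with the size of the index.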

Lucene in Action has an example of spatial search similar in spirit. A second edition is in the making. Also, check out Sujit Pal's suggestions for spatial search with Lucene and Patrick O'Leary's framework. There are also Locallucene and LocalSolr, but I do not know how mature they are.

Yuval F
A: 

Hi, my index size is about 4 MB. I'm using the following code to build the query for the nearest cities:

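// htNearestCities maps a city name to a string[] whose element [1] is its state.
foreach (string city in htNearestCities.Keys)
{
    // One sub-query per nearby city: city AND state must both match.
    cityStateQuery = new BooleanQuery();
    queryCity = queryParserCity.Parse("\"" + city + "\"");
    queryState = queryParserState.Parse("\"" + ((string[])htNearestCities[city])[1] + "\"");
    cityStateQuery.Add(queryCity, BooleanClause.Occur.MUST);
    cityStateQuery.Add(queryState, BooleanClause.Occur.MUST);

    // OR the per-city sub-queries together into the overall location query.
    findLocationQuery.Add(cityStateQuery, BooleanClause.Occur.SHOULD);
}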
4MB? Lucene's a bit overkill for such a small data set.
Gandalf
We are expecting millions of records to be indexed in Lucene down the line.
A: 

You may ultimately want to have Lucene handle the spatial search by indexing tiles. But if you're certain that the Lucene query is the slow part, not the lookup of the nearby cities, then start by indexing the state and city together, much like a multi-column index in a relational database: a 'state:city' field with values like 'GA:Atlanta'. Then the city/state intersection isn't done at query time.
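As a rough illustration of the combined field, the sketch below builds one 'state:city' value per nearby location. It is written in Python for brevity, and the helper names `city_state_key` and `location_terms` are hypothetical. It assumes the field is indexed as a single un-analyzed token and that the terms are turned into term queries directly, since ':' is the field separator in Lucene's query parser syntax and would otherwise need escaping.

```python
def city_state_key(state, city):
    """Build the combined field value, e.g. 'GA:Atlanta'."""
    return f"{state.strip().upper()}:{city.strip()}"


def location_terms(nearest_cities):
    """Turn (city, state) pairs into one de-duplicated term per location.

    Each term would become a single optional (SHOULD) clause on the combined
    field, replacing the city-AND-state sub-query built per location.
    """
    return sorted({city_state_key(state, city) for city, state in nearest_cities})


# Hypothetical result of the nearest-cities lookup for Atlanta, GA.
terms = location_terms([("Atlanta", "GA"), ("Marietta", "GA"), ("Atlanta", "GA")])
```

The same `city_state_key` function would have to be used at indexing time, so that the stored value and the query term match exactly.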

Coady