Hi there,
Am working on web based Job search application using Lucene.User on my site can search for jobs which are within a radius of 100 miles from say "Boston,MA" or any other location. Also, I need to show the search results sorted by "relevance"(ie. Score returned by lucene) in descending order.
I'm using a 3rd party API to fetch all the cities within given radius of a city.This API returns me around 864 cities within 100 miles radius of "Boston,MA".
I'm building the city/state Lucene query using the following logic which is part of my "BuildNearestCitiesQuery" method. Here nearestCities is a hashtable returned by the above API.It contains 864 cities with CityName ass key and StateCode as value. And finalQuery is a Lucene BooleanQuery object which contains other search criteria entered by the user like:skills,keywords,etc.
foreach (string city in nearestCities.Keys)
{
BooleanQuery tempFinalQuery = finalQuery;
cityStateQuery = new BooleanQuery();
queryCity = queryParserCity.Parse(city);
queryState = queryParserState.Parse(((string[])nearestCities[city])[1]);
cityStateQuery.Add(queryCity, BooleanClause.Occur.MUST); //must is like an AND
cityStateQuery.Add(queryState, BooleanClause.Occur.MUST);
}
nearestCityQuery.Add(cityStateQuery, BooleanClause.Occur.SHOULD); //should is like an OR
finalQuery.Add(nearestCityQuery, BooleanClause.Occur.MUST);
I then input finalQuery object to Lucene's Search method to get all the jobs within 100 miles radius.:
searcher.Search(finalQuery, collector);
I found out this BuildNearestCitiesQuery method takes a whopping 29 seconds on an average to execute which obviously is unacceptable by any standards of a website.I also found out that the statements involving "Parse" take a considerable amount of time to execute as compared to other statements.
A job for a given location is a dynamic attribute in the sense that a city could have 2 jobs(meeting a particular search criteria) today,but zero job for the same search criteria after 3 days.So,I cannot use any "Caching" over here.
Is there any way I can optimize this logic?or for that matter my whole approach/algorithm towards finding all jobs within 100 miles using Lucene?
FYI,here is how my indexing in Lucene looks like:
doc.Add(new Field("jobId", job.JobID.ToString().Trim(), Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("title", job.JobTitle.Trim(), Field.Store.YES, Field.Index.TOKENIZED));
doc.Add(new Field("description", job.JobDescription.Trim(), Field.Store.NO, Field.Index.TOKENIZED));
doc.Add(new Field("city", job.City.Trim(), Field.Store.YES, Field.Index.TOKENIZED , Field.TermVector.YES));
doc.Add(new Field("state", job.StateCode.Trim(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("citystate", job.City.Trim() + ", " + job.StateCode.Trim(), Field.Store.YES, Field.Index.UN_TOKENIZED , Field.TermVector.YES));
doc.Add(new Field("datePosted", jobPostedDateTime, Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("company", job.HiringCoName.Trim(), Field.Store.YES, Field.Index.TOKENIZED));
doc.Add(new Field("jobType", job.JobTypeID.ToString(), Field.Store.NO, Field.Index.UN_TOKENIZED,Field.TermVector.YES));
doc.Add(new Field("sector", job.SectorID.ToString(), Field.Store.NO, Field.Index.UN_TOKENIZED, Field.TermVector.YES));
doc.Add(new Field("showAllJobs", "yy", Field.Store.NO, Field.Index.UN_TOKENIZED));
Thanks a ton for reading!I would really appreciate your help on this.
Janis