Hi guys,

PROBLEM: I need to write an advanced search feature for a website. All the data is stored in MySQL and I'm using Zend Framework on top. I know that I can write a script that takes the search form input and builds an SQL query out of it, but this becomes extremely slow if there are a lot of hits. Then I would have to get down to the gritty details of optimizing the database tables/fields/etc., which I'm trying to avoid if possible.

Lucene: I gave Lucene a try, but since it's a full-text search engine, it does not allow any mathematical operators!! So if I wanted to get all the records where field_x > 5, there is no way to do it (correct?)

General Practice? I would like to know how large sites deal with this dilemma. Is there a standard way of doing this that I don't know about, or does everyone have to deal with the nasty details of optimizing the database at some point? I was hoping that some fast indexing/searching technology existed (e.g. Lucene) that would address this problem.

ANY OTHER COMMENTS OR SUGGESTIONS ARE MOST WELCOME!!

Thanks a lot guys! Ali

+1  A: 

Use Lucene for your text-based searches, and use SQL for field_x > 5 searches. I say this because text-based search is hard to get right, and you're probably better off leaving that to an expert.

If you need your users to have the capability of building mathematical expression searches, consider writing an expression builder dialog like this example to collect the search phrase. Then use a parameterized SQL query to execute the search.

SqlWhereBuilder ASP.NET Server Control
http://www.codeproject.com/KB/custom-controls/SqlWhereBuilder.aspx
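
For the purely numeric side, a parameterized query can be sketched as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the `records` table, the index name, and the `find_records` helper are hypothetical names, not anything from the question.

```python
import sqlite3


def find_records(conn, min_value):
    """Return (id, field_x) rows whose field_x is strictly greater than min_value."""
    # The "?" placeholder lets the driver bind the value, so user input
    # is never concatenated into the SQL text.
    cur = conn.execute(
        "SELECT id, field_x FROM records WHERE field_x > ? ORDER BY id",
        (min_value,),
    )
    return cur.fetchall()


# Hypothetical in-memory table standing in for the real MySQL data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, field_x INTEGER)")
# An index on the filtered column is usually what keeps range conditions fast.
conn.execute("CREATE INDEX idx_records_field_x ON records (field_x)")
conn.executemany(
    "INSERT INTO records (id, field_x) VALUES (?, ?)",
    [(1, 3), (2, 5), (3, 8), (4, 13)],
)

rows = find_records(conn, 5)
print(rows)
```

With MySQL the placeholder syntax depends on the driver, but the idea is the same: the comparison stays in SQL and the value is bound as a parameter.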

Robert Harvey
Thanks for your comment Robert. I don't need anything fancy like the expression-builder you mentioned. Just need to query the database using mathematical expressions internally.
Ali
OK. You should consider moving your memo fields from your MySQL database into the Lucene index so that you can do text search on them. Presumably you can also store a key in the Lucene index so you can pull the matching rows from the MySQL side. Now you have the best of both worlds: you can do full-text searches, and can still do math searches on the SQL database.
Robert Harvey
Yes, I understand how that would work. Thanks a lot for your help Rob :)
Ali
+1  A: 

You can use filters in Lucene to carry out a text search of a reduced set of records. So you would query the database first to get all records where field_x > 5, build a filter (a list of Lucene document IDs), and pass this into the Lucene search method along with the text query. I'm just learning about this; here's a link to a question I asked (it uses Lucene.Net and C# but it may help). Ignore my question, just check out the accepted answer:

http://stackoverflow.com/questions/1079934/how-do-you-implement-a-custom-filter-with-lucene-net
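
The idea can be sketched without Lucene at all. The following is a toy illustration, not the Lucene filter API: a plain Python dictionary plays the role of the text index, sqlite3 stands in for MySQL, and `build_text_index`, `filtered_search`, and the sample data are all hypothetical.

```python
import sqlite3
from collections import defaultdict


def build_text_index(docs):
    """Map each lowercase word to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def filtered_search(conn, index, word, min_value):
    """Text search for `word`, restricted to records where field_x > min_value."""
    # Step 1: the database supplies the ids that pass the numeric condition.
    allowed = {
        row[0]
        for row in conn.execute(
            "SELECT id FROM records WHERE field_x > ?", (min_value,)
        )
    }
    # Step 2: the text search only keeps hits inside that allowed set (the "filter").
    return sorted(index.get(word.lower(), set()) & allowed)


# Hypothetical sample data: text lives in the index, numbers live in SQL.
docs = {1: "red apple", 2: "green apple", 3: "red car"}
index = build_text_index(docs)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, field_x INTEGER)")
conn.executemany(
    "INSERT INTO records (id, field_x) VALUES (?, ?)", [(1, 3), (2, 8), (3, 9)]
)

print(filtered_search(conn, index, "apple", 5))
```

In a real setup the shared key would be stored in each Lucene document so that database ids can be translated into the set of documents the filter allows.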

Nick
To be honest, I don't even need a full-text search much. I just want to reduce the load on the database and let a fast, specialized index (such as Lucene) handle my large searches. So, is there a faster way than querying the DB if I just have mathematical conditions?
Ali
+1  A: 

You can use Zend Lucene for textual search, and combine it with MySQL for joins. Please see Mark Krellenstein's Search Engine vs DBMS paper about the choice. Basically, search engines are better for ranked text search; databases are better for more complex data manipulations, such as joins, using different record structures.

For a simple x>5 type query, you can use a range query inside Lucene.
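
One caveat worth sketching: assuming range queries compare terms as strings, as in the classic Lucene query syntax, numbers have to be indexed zero-padded to a fixed width so that string order matches numeric order. The helper names and the width of 10 digits below are assumptions for illustration; the code only builds the query string and does not call Lucene.

```python
WIDTH = 10  # assumed fixed width; must cover the largest value you index


def pad(value, width=WIDTH):
    """Encode a non-negative integer as a fixed-width, zero-padded term."""
    if value < 0:
        raise ValueError("this simple encoding only handles non-negative integers")
    return str(value).zfill(width)


def greater_than_query(field, value, width=WIDTH):
    """Build an exclusive range query string meaning field > value."""
    # Curly braces mark an exclusive range in the classic query syntax.
    # The all-nines upper bound is itself excluded, so pick a width
    # comfortably larger than any real value.
    upper = "9" * width
    return f"{field}:{{{pad(value, width)} TO {upper}}}"


# Unpadded strings sort "10" before "9"; padded terms sort numerically.
print(sorted(["10", "9", "5"]))
print(sorted(pad(n) for n in [10, 9, 5]))
print(greater_than_query("field_x", 5))
```

The same padding has to be applied when the field is indexed, otherwise the range bounds and the stored terms will not line up.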

Yuval F
Thanks Yuval. I was already using Zend_Lucene. I read Mark's article and it's very interesting. I suppose I will go with the database option for now, until my needs for full-text search grow to such an extent that it will be worth the extra effort of keeping the index up-to-date. What I don't understand is why Lucene won't allow you to perform these simple mathematical operations in its queries?! I'm no expert at writing search engines (obviously :), but compared to what they've already accomplished, it seems like it would be trivial. Thanks again.
Ali