views:

84

answers:

2

I have a location search website for a city, we started out with collecting data for all possible categories in the city like Schools, Colleges, Departmental Stores etc and stored their information in a separate table, as each entry had different details apart from their name, address and phone number.

We had to integrate search in the website to enable people to find information, so we built an index table where in we stored the categories and related keywords for the same category and the table which much be fetched if that category was searched for. Later on we added the functionality of searching on the name and address as well by adding another master table containing those fields from all the tables to one place. Now my doubt is the following

  • The application design is improper, and we have written queries like select * from master where name like "%$input%" , all over, since our database is MYSQL and PHP on serverside, is there any suggestion for me to improve on the design of the system?
  • People want more features like splitting the keywords and ranking them according to relevance etc, is there any ready framework available which runs search on a database.
  • I tried using Full Text Search in MYSQL and it seems effective to me, is that enough?

Correct me if i am wrong, i had a look into Lucene and Google Custom Search, don't they work on making an index by crawling existing webpages and building their own index? I have a collection of tables on a mysql database on which i have to apply searching. What options do i have?

+1  A: 

Take a look at sphinx: http://www.sphinxsearch.com/

Per their site:

How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.

It's quite popular with a lot of people in the rails community right now, and they all rave about how awesome it is :)

Ian Selby
+1  A: 

To address your points:

  1. Using %input% is very bad. That will cause a full table scan every query. Under any amount of load or on even a remotely large dataset your DB server will choke.

  2. An RDBMS alone is not a good solution for this. You are looking in the right place by seeking a separate solution for search. Something which can communicate well with your RDBMS is good; something that runs inside an RDBMS won't do what you need.

  3. Full Text Search in MySQL is workable for very basic keyword searches, nothing more. The scope of usefulness is extremely limited - you need a highly predictable usage model to leverage the built-in searching. It is called "search" but it's not really search the way most people think of it. Compared to the quality of search results we have come to expect from Google and Bing, it does not compare. In that sense of the word "search", it is something else - like Notepad vs Word. They both are things to type in, but that's about it.

As far as separate systems for handling search, Lucene is very good. Lucene works however you want it to work, essentially. You can interact with it programatically to insert indexable documents. Likewise, a Google Appliance (not Google Custom Search) can be given direct meta feeds which expose whatever you want to be indexed, such as data directly from a database.

Rex M
how can i make lucene work on databases?
Anirudh Goel
@Anirudh read the API documentation. You will need to either write a stand-alone program which runs through your database periodically and updates the Lucene index, or in your application, as part of inserting new records, also put them in Lucene.
Rex M