views:

1002

answers:

6

Which techniqes would you use to implement a search for contents in a column on a very big table in MySql? Say for instance that you have 10.000.000 emails stored in a table in the database and would like to implement a subject search, that would enable me to search for one or more words that was present in the email subject. If the user searched for "christmas santa" you should find a emails with subjects like "Santa visits us this christmas" and "christmas, will santa ever show".

My idea is to process all the words in the subjects (strip all numbers, special signs, commas etc) and save each word in an index table, where I have a unique index on the word column. Then I would link that to the email table by a many to many relationship table.

Is there a better way to perform wildcard searches on very big tables ?

Is there databases that natively supports this kind of searches ?

A: 

This sounds like a a full text search, which SQL Server supports.

But your idea is generally sound. You're effectively computing an "index" on your table in advance to speed up searches.

Craig Shearer
A: 

check "full text search" in MySQL docs (AFAIK, all current DBMS support this)

Javier
+2  A: 

MySQL's MyISAM tables support a FULLTEXT index, which helps in this kind of search.

But it's not the speediest technology available for this kind of search. And you can't use it on data stored in InnoDB tables.

I've heard some good things about Sphinx Search, but I haven't used it yet.

Here's another blog about Sphinx: http://capttofu.livejournal.com/13037.html

Bill Karwin
A: 

You want to look at the MATCH...AGAINST function.

See, for example: Using MySQL Full-text Searching

Eli
+7  A: 

You could use FULLTEXT indexes if you are using MyISAM as the storage engine. However, MySQL in general is not very good with text search.

A much better option would be to go with a dedicated text indexing solution such as Lucene or Sphinx. Personally I'd recommend Sphinx - it has great integration with PHP and MySQL and is very, very fast (can be used to speed up even ordinary queries - performs very fast grouping and ordering).

Wikipedia has a nice list of different indexing engines - here.

Eran Galperin
+1  A: 

While a mysql fulltext index is possible, I suspect I would look at using something designed to be a search engine like Lucene.

Zoredache