views: 94
answers: 4
I have 10,000,000 records. Which would be the best technique to search them? I am currently using full-text search, but it is slow. Please suggest.

+1  A: 

It depends on several simple questions:

  • what kind of data is processed? (simple entries like "Firstname, Lastname" or more complex datasets?)
  • how is it structured? (plain database table? partitioned?)
  • what do you search for? (e.g. searching for names in a telephone directory)
Daniel
+5  A: 

There is no one-size-fits-all solution but you can try out:

Sphinx

How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.

Sphinx is a full-text search engine, distributed under GPL version 2. A commercial license is also available for embedded use.

Generally, it's a standalone search engine, meant to provide fast, size-efficient and relevant fulltext search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently built-in data sources support fetching data either via direct connection to MySQL or PostgreSQL, or using XML pipe mechanism (a pipe to indexer in special XML-based format which Sphinx recognizes).

As for the name, Sphinx is an acronym which is officially decoded as SQL Phrase Index. Yes, I know about CMU's Sphinx project.

http://www.sphinxsearch.com/
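To give a feel for how it plugs in: once the indexer has built an index and searchd is running, the application queries Sphinx and then fetches the matching rows from the real database by id. The sketch below is a rough illustration only; it assumes searchd exposes a SphinxQL (MySQL-protocol) listener on port 9306 and that an index named "records" already exists. The index name and search term are placeholders, not from the question.

    <?php
    // Rough sketch: query Sphinx through its SphinxQL (MySQL-protocol) listener.
    // Assumes searchd is running on localhost:9306 and an index called "records"
    // has already been built by the indexer; the index name is a placeholder.
    $sphinx = new PDO('mysql:host=127.0.0.1;port=9306');

    $term   = $sphinx->quote('search terms here');
    $result = $sphinx->query("SELECT id FROM records WHERE MATCH($term) LIMIT 20");

    // Sphinx only returns document ids (plus attributes); the full rows are
    // then fetched from the real database using those ids.
    $ids = $result->fetchAll(PDO::FETCH_COLUMN);
    print_r($ids);

Whether SphinxQL is available depends on the Sphinx version; older setups would typically use the bundled sphinxapi.php client instead.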

Zend_Search_Lucene (part of the Zend Framework):

Zend_Search_Lucene is a general purpose text search engine written entirely in PHP 5. Since it stores its index on the filesystem and does not require a database server, it can add search capabilities to almost any PHP-driven website. Zend_Search_Lucene supports the following features:

  • Ranked searching - best results returned first
  • Many powerful query types: phrase queries, boolean queries, wildcard queries, proximity queries, range queries and many others.
  • Search by specific field (e.g., title, author, contents)

http://framework.zend.com/
http://framework.zend.com/manual/en/zend.search.lucene.overview.html
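As a rough sketch of what that looks like in practice (the index path and field names below are placeholders, and it assumes the Zend Framework class files are on the include path): you build the index once on the filesystem, then reopen it to search.

    <?php
    // Rough Zend_Search_Lucene sketch: build a filesystem index, then query it.
    // '/path/to/index' and the field names are placeholders.
    require_once 'Zend/Search/Lucene.php';

    // Create the index (use Zend_Search_Lucene::open() on later requests).
    $index = Zend_Search_Lucene::create('/path/to/index');

    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Keyword('db_id', '12345'));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', 'Example record title'));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', 'Full body text that is searchable but not stored'));
    $index->addDocument($doc);
    $index->commit();

    // Searching: hits come back ranked, best results first.
    $hits = Zend_Search_Lucene::open('/path/to/index')->find('contents:searchable');
    foreach ($hits as $hit) {
        echo $hit->score, ' ', $hit->db_id, "\n";
    }

Whether it holds up at 10 million documents is worth testing: the index lives on the local filesystem, so searching avoids the database entirely, but indexing that many rows in pure PHP will take time.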

Chris T
+1 for suggesting Zend_Search_Lucene. I've never used it with this many items, but I've never had performance problems with it.
Maerlyn
A: 

Since I haven't worked with datasets as large as this, here are some ideas that may work:

The first question is whether these records are static (GeoIP data, for example) or not.

  • I'd try to optimize the database as much as I can (try using EXPLAIN if you're using MySQL)
  • Look at every kind of query that is likely to be run, and try to optimize the database against those queries
  • If the indexes are fine, I'd add some kind of cache where previous result sets are saved (see the sketch at the end of this answer). This is handy when your database isn't updated regularly.
  • You can cron the job above (for example, the most-used search queries and their results can be precached too)
  • Try to adapt these ideas to your needs

If you can provide some more details, maybe I can refine my tips.
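A rough sketch of the caching idea, assuming a Memcached server, a PDO connection, and a MySQL FULLTEXT index on (title, body); the table and column names are only illustrative:

    <?php
    // Rough cache-aside sketch for caching previous result sets.
    // Assumes a Memcached server on localhost; table/column names are placeholders.
    function cachedSearch(PDO $db, Memcached $cache, $term)
    {
        $key  = 'search:' . md5($term);

        $rows = $cache->get($key);
        if ($rows !== false) {
            return $rows;                       // cache hit: skip the database
        }

        $stmt = $db->prepare(
            "SELECT id, title FROM records
             WHERE MATCH(title, body) AGAINST (:term) LIMIT 50"
        );
        $stmt->execute(array(':term' => $term));
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

        $cache->set($key, $rows, 600);          // keep for 10 minutes
        return $rows;
    }

    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);
    $db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    print_r(cachedSearch($db, $cache, 'some query'));

The TTL is the knob to tune: the less often the data changes, the longer cached result sets can safely live.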

fabrik
These are very helpful techniques, thanks
Jos
A: 

Use Solr. It's Lucene with some additions, easily accessible over HTTP. It's blazing fast compared to any full-text search in MySQL.
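A rough sketch of what a query could look like from PHP, assuming a Solr instance on the default port with the records already indexed; the host, port and field names are placeholders:

    <?php
    // Rough sketch: query Solr's HTTP search handler and read the JSON response.
    // Host, port and the queried field are placeholders.
    $q   = urlencode('contents:"search terms"');
    $url = "http://localhost:8983/solr/select?q={$q}&rows=20&wt=json";

    $response = json_decode(file_get_contents($url), true);

    echo $response['response']['numFound'], " matches\n";
    foreach ($response['response']['docs'] as $doc) {
        echo $doc['id'], "\n";
    }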

Kamil Szot