I have 10,000,000 records. What would be the best technique to search them? Currently I'm using full-text search, but it is slow. Please suggest an alternative.
It depends on several simple questions:
- What kind of data is processed? (Simple entries like "Firstname, Lastname", or more complex datasets?)
- How is it structured? (A plain database table? Partitioned?)
- What do you search for? (e.g., names in a telephone directory)
There is no one-size-fits-all solution, but you can try out:
Sphinx
How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.
Sphinx is a full-text search engine, distributed under GPL version 2. A commercial license is also available for embedded use.
Generally, it's a standalone search engine meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently, built-in data sources support fetching data either via a direct connection to MySQL or PostgreSQL, or using the XML pipe mechanism (a pipe to indexer in a special XML-based format which Sphinx recognizes).
As for the name, Sphinx is an acronym which is officially decoded as SQL Phrase Index. Yes, I know about CMU's Sphinx project.
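For illustration, here's a minimal sketch using the official PHP client (sphinxapi.php) that ships with Sphinx. The index name "records" and the searchd host/port are assumptions; match them to your own sphinx.conf, and build the index with the indexer tool first.

```php
<?php
// Minimal sketch: query a Sphinx index from PHP via the bundled API.
// Assumes searchd is running locally and an index named "records"
// has been defined in sphinx.conf and built with indexer.
require_once 'sphinxapi.php';

$cl = new SphinxClient();
$cl->SetServer('localhost', 9312);   // default searchd port
$cl->SetMatchMode(SPH_MATCH_ALL);    // all query words must match
$cl->SetLimits(0, 20);               // return the first 20 matches

$result = $cl->Query('john smith', 'records');

if ($result === false) {
    echo 'Search failed: ' . $cl->GetLastError();
} elseif (!empty($result['matches'])) {
    // Sphinx returns matching document IDs; fetch the actual rows
    // from your database by ID afterwards.
    $ids = implode(',', array_keys($result['matches']));
    echo "Found {$result['total']} matches, IDs: $ids";
}
```

The nice part of this split is that MySQL only ever sees fast primary-key lookups; Sphinx handles the heavy full-text matching.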
Zend_Search_Lucene (part of the Zend Framework):
Zend_Search_Lucene is a general purpose text search engine written entirely in PHP 5. Since it stores its index on the filesystem and does not require a database server, it can add search capabilities to almost any PHP-driven website. Zend_Search_Lucene supports the following features:
- Ranked searching - best results returned first
- Many powerful query types: phrase queries, boolean queries, wildcard queries, proximity queries, range queries and many others
- Search by specific field (e.g., title, author, contents)
http://framework.zend.com/
http://framework.zend.com/manual/en/zend.search.lucene.overview.html
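A minimal sketch of building and querying an index with Zend_Search_Lucene; the index path and field names here are illustrative:

```php
<?php
// Minimal sketch: index a record and search it with Zend_Search_Lucene.
require_once 'Zend/Search/Lucene.php';

// Build the index once (use Zend_Search_Lucene::open() on later runs).
$index = Zend_Search_Lucene::create('/path/to/index');

$doc = new Zend_Search_Lucene_Document();
// Text fields are stored and indexed; UnStored fields are indexed only.
$doc->addField(Zend_Search_Lucene_Field::Text('title', 'Example record'));
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', 'Full body text of the record'));
$index->addDocument($doc);
$index->commit();

// Query it, including a field-specific search.
$hits = Zend_Search_Lucene::open('/path/to/index')->find('title:example');
foreach ($hits as $hit) {
    echo $hit->score . ' ' . $hit->title . "\n";
}
```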
Because I haven't worked with datasets as large as this, here are some ideas that may work:
The first question is whether these records are static (GeoIP data, for example) or not.
- I'd try to optimize the database as much as I can (try using EXPLAIN if you're using MySQL).
- Look at every kind of query that can occur, and optimize the database against those queries.
- If the indexes are fine, I'd go with some kind of cache where I'd save previous result sets; this is handy when your database isn't updated regularly (see the sketch after this list).
- You can cron the job above (for example, the most-used search queries and their results can be precached too).
- Adapt these ideas to your needs.
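As a rough illustration of the caching idea above, here's a sketch using a simple file cache keyed by a hash of the query. The cache path and TTL are arbitrary, and run_search() is a hypothetical placeholder for your actual database query:

```php
<?php
// Sketch: serve repeated searches from a file cache instead of
// re-running them against the 10M-row table every time.
function cached_search($query, $ttl = 3600)
{
    $file = '/tmp/search_cache/' . md5($query) . '.json';

    // Serve from cache while the entry is fresh enough.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return json_decode(file_get_contents($file), true);
    }

    $results = run_search($query);   // placeholder: your real DB query
    @mkdir(dirname($file), 0777, true);
    file_put_contents($file, json_encode($results));
    return $results;
}
```

A cron job can then warm this cache by calling cached_search() for the most frequent queries, so even the first user of the day gets a cached result.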
If you can provide some more details, maybe I can refine these tips.
Use Solr. It's Lucene with some additions, easily accessible over plain HTTP. It's blazing fast compared to any full-text search in MySQL.
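For example, here's a minimal sketch of querying Solr over HTTP from PHP; the host, port, and the "contents" field are assumptions based on a default Solr install and your own schema:

```php
<?php
// Sketch: hit Solr's select handler over HTTP and read the JSON response.
$query = urlencode('contents:"john smith"');
$url   = "http://localhost:8983/solr/select?q=$query&rows=20&wt=json";

$response = json_decode(file_get_contents($url), true);

foreach ($response['response']['docs'] as $doc) {
    echo $doc['id'] . "\n";
}
// numFound is the total match count, without fetching every row.
echo 'Total: ' . $response['response']['numFound'];
```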