Hi guys,
Just wondering if there are any tips on improving search times (full-text).
How do large sites like Stack Overflow, Reddit, etc., implement their search functions?
(Sorry for the vagueness - I am a newbie)
Have a read of the MySQL Guide to Fine-Tuning Full-Text Search. It describes many techniques the engine can use to make searches faster or more exhaustive.
Oh wow, there are entire courses and papers written on this...
Firstly, if you're storing in a database, there are indexes and different joins and views and all sorts of fun for speeding up your queries.
However, you've specified full-text search, so I'll direct you to this page, which has a comparison of the most common techniques. It is written for arrays, but it will give you an understanding of how splitting or searching can be improved or varied.
Next, take a read of this Wikipedia article on string searching. There is the naive search, where you just scan through the text, and there are approaches where you build an index first so that future searches can jump straight to the right place, like chapters or page numbers in a book.
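To make the "index first" idea concrete, here is a rough sketch of an inverted index in Python. It is only an illustration: the function names (tokenize, build_index, search) and the sample documents are made up for this post, and real engines add stemming, ranking, and much more on top of this.

```python
from collections import defaultdict
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start with the matches for the first word, then intersect the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample data, just for illustration.
docs = {
    1: "How do I speed up full-text search in MySQL?",
    2: "Tips for tuning a search index",
    3: "Baking bread at home",
}
index = build_index(docs)
print(search(index, "search index"))
```

The index is built once up front, so each query only touches the word lists it needs instead of rescanning every document.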
The index or pattern storage techniques are also very useful in compression, and that's yet another way to help speed up searching: if you build the compressed string, you can be clever and jump straight to the relevant compressed section, then extract and compare. How well this works depends on whether you are searching for a limited number of patterns or whether anything goes.
Then there's fuzzy searching as well, where you don't need an exact match. You can rank results on some 'closeness' score, like a percentage of matching characters.
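As a rough illustration of a 'closeness' score, here is a small sketch using Python's standard difflib module. SequenceMatcher.ratio() returns a similarity between 0 and 1 based on matching blocks of characters, which is one possible score among many (edit distance is another common choice). The helper names and the 0.7 threshold are my own assumptions, not a recommendation.

```python
import difflib

def similarity(a, b):
    """Return a similarity score between 0 and 1 for two strings."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_search(query, candidates, threshold=0.7):
    """Return (score, candidate) pairs at or above the threshold, best first."""
    scored = [(similarity(query, c), c) for c in candidates]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)

# Hypothetical word list; the query has a typo on purpose.
words = ["search", "index", "lucene"]
for score, word in fuzzy_search("serch", words):
    print(f"{word}: {score:.2f}")
```

Comparing the query against every candidate like this is slow on large data sets, so in practice you would usually combine it with an index to narrow down the candidates first.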
Hopefully that gives you a good starting point at least!
Apache Lucene is the canonical open source full text indexing engine. I'd start there if I needed to build a search feature for a web site.