Can anybody recommend a good book(s)/paper(s)/article(s) on Full Text Search (and maybe indexing in general). I'm pretty anal about having to understand what's happening behind the scenes in my applications, and I'm having trouble understanding why Sphinx and other external FTS's leaves MySQL/MyISAM in the dust.
+2
A:
I found the postgres Full Text Search page http://www.postgresql.org/docs/8.3/static/textsearch.html very enlightening.
Especially: http://www.postgresql.org/docs/8.3/static/textsearch-intro.html
Textual search operators have existed in databases for years. PostgreSQL has ~, ~*, LIKE, and ILIKE operators for textual data types, but they lack many essential properties required by modern information systems:
- There is no linguistic support, even for English. Regular expressions are not sufficient because they cannot easily handle derived words, e.g., satisfies and satisfy. You might miss documents that contain satisfies, although you probably would like to find them when searching for satisfy. It is possible to use OR to search for multiple derived forms, but this is tedious and error-prone (some words can have several thousand derivatives).
- They provide no ordering (ranking) of search results, which makes them ineffective when thousands of matching documents are found.
- They tend to be slow because there is no index support, so they must process all documents for every search.
Christopher
2009-06-26 17:21:59
+5
A:
For understanding full text search from the bottom up, I recommend "Managing Gigabytes".
George Phillips
2009-06-26 18:47:02
+2
A:
There is an excellent free Information Retrieval book (Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008), including text search, available free (legit) here.
bvmou
2009-06-29 06:27:31