views:

454

answers:

2

Currently working on a project that is centered around a medical nomenclature known as SNOMED. At the heart of snomed is are three relational datasets that are 350,000, 1.1 mil, and 1.3 mil records in length. We want to be able to quickly query this dataset for the data entry portion where we would like to have some shape or form of auto-completion/suggestion.

Its currently in a MySQL MyISAM DB just for dev purposes but we want to start playing with some in memory options. It's currently 30MB + 90MB + 70MB in size including the indexes. The MEMORY MySQL Engine and MemCached were the obvious ones, so my question is which of these would you suggest or is there something better out there?

We're working in Python primarily at the app level if that makes a difference. Also we're running on a single small dedicated server moving to 4GB DDR2 soon.

Edit: Additional Info

We're interested in keeping the suggesting and autocompletion fast. Something that will peform well for these types of queires is desirable. Each term in snomed typically has several synonyms, abbreviations, and a preferred name. We will be querying this dataset heavily (90MB in size including index). We're also considering building an inverted index to speed things up and return more relevant results (many of the terms are long "Entire coiled artery of decidua basalis (body structure)"). Lucene or some other full text search may be appropriate.

+2  A: 

From your use case, it sounds like you want to do full-text searching; I would suggest sphinx. It's blazing fast, even on large data sets. You can integrate memcached if you need extra speed.

Todd Gardner
+1  A: 

Please see

For how to do this with Lucene. Lucene is the closest to industry standard full-text search library. It is fast and gives quality results. However, It takes time to master Lucene - you have to handle many low-level details. An easier way may be to use Solr, a Lucene sub-project which is much easier to set up, and can give JSON output, that can be used for autocomplete.

As Todd said, you can also use Sphinx. I have never used it, but heard it is highly integrable with MySQL. I failed to find how to implement autocomplete using Sphinx - maybe you should post this as a separate question.

Yuval F
as for the autocomplete, turns out you can do prefix and infix indexes in sphinx
nategood