Hey,
We have an application that needs to perform very fast searches over ~2 million records.
Each search has to match against a large free-text field and filter a number of integer/decimal fields by range, along with various functions/computations & sorting.
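To make the query shape concrete, here's a deliberately naive in-memory Java (16+) sketch: a free-text match (all query terms ANDed through a tiny inverted index), then inclusive range filters on one integer and one decimal field, then a sort. The `Item` fields, the class name and the sample data are hypothetical stand-ins for our real schema, and the linear range post-filter is the part I'd expect to struggle at 2 million records.

```java
import java.math.BigDecimal;
import java.util.*;
import java.util.stream.Collectors;

public class ParametricSearchSketch {

    // Hypothetical record: one free-text field plus one integer and one decimal field.
    record Item(int id, String description, int quantity, BigDecimal price) {}

    private final List<Item> items = new ArrayList<>();
    // Inverted index: lower-cased term -> ids of items whose description contains that term.
    private final Map<String, Set<Integer>> index = new HashMap<>();

    void add(Item item) {
        items.add(item);
        for (String term : tokenize(item.description())) {
            index.computeIfAbsent(term, t -> new HashSet<>()).add(item.id());
        }
    }

    // Lower-cases the text and splits on runs of non-word characters.
    private static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Every query term must match (AND); an empty query matches nothing.
    // Then apply inclusive numeric ranges and sort by price, highest first.
    List<Integer> search(String query, int minQty, int maxQty,
                         BigDecimal minPrice, BigDecimal maxPrice) {
        Set<Integer> candidates = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = index.getOrDefault(term, Set.of());
            if (candidates == null) {
                candidates = new HashSet<>(ids);
            } else {
                candidates.retainAll(ids);
            }
        }
        final Set<Integer> matched = candidates == null ? Set.of() : candidates;
        return items.stream()
                .filter(i -> matched.contains(i.id()))
                .filter(i -> i.quantity() >= minQty && i.quantity() <= maxQty)
                .filter(i -> i.price().compareTo(minPrice) >= 0
                          && i.price().compareTo(maxPrice) <= 0)
                .sorted(Comparator.comparing(Item::price).reversed())
                .map(Item::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ParametricSearchSketch s = new ParametricSearchSketch();
        s.add(new Item(1, "Red leather sofa, barely used", 1, new BigDecimal("450.00")));
        s.add(new Item(2, "Blue fabric sofa", 2, new BigDecimal("300.00")));
        s.add(new Item(3, "Red leather armchair", 1, new BigDecimal("200.00")));
        s.add(new Item(4, "Red leather sofa set", 2, new BigDecimal("900.00")));
        System.out.println(s.search("red sofa", 1, 2, new BigDecimal("100"), new BigDecimal("1000")));
    }
}
```

Running `main` prints `[4, 1]`: item 2 lacks "red" and item 3 lacks "sofa", and the survivors come back most expensive first.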
At the moment we handle this with a big MSSQL database, using its built-in full-text search engine, plus some replication to move the load off the transactional tables.
However, as you may have guessed, this solution doesn't scale particularly well.
I've written a small Lucene-based document store and am generally quite impressed with the results: text searches take not much longer than half a second (on 100k records).
The hard part is the parametric searching. I'm aware Lucene supports basic range matching, but I feel we need something more powerful.
I've also built a small test database using db4o, which has powerful query capabilities; however, its queries are quite slow, taking over 15 seconds on only 100k records, whereas SQL takes about 1.5 seconds for the free-text & parametric searches.
Also, our data needs an update resolution of under 10 minutes (changes must show up in searches within that time), and roughly 15% of the records change every day. Our SQL Server is handling this for now, but it's starting to creak.
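For a sense of scale, assuming ~2 million records and that changes are spread evenly over the day (in practice they're likely to be burstier), the update load works out roughly as:

```latex
\begin{aligned}
2{,}000{,}000 \times 0.15 &= 300{,}000 \ \text{changed records per day} \\
\frac{24 \times 60\ \text{min}}{10\ \text{min}} &= 144 \ \text{update windows per day} \\
\frac{300{,}000}{144} &\approx 2{,}083 \ \text{changed records per 10-minute window}
\end{aligned}
```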
Any guidance on suitable technologies & approaches would be appreciated.
Cheers, Dave