Hey,

We've got an application that needs to perform very fast searches over ~2 million records.

Searches need to cover both a large free-text field and a number of integer/decimal fields across various ranges, along with assorted functions/computations & sorting.

Currently, we're handling this with a big MSSQL database, using the built-in full-text engine, and some replication to move the load off the transactional tables.

However - as you may have guessed, this solution isn't the most scalable.

I've written up a little Lucene-based document store, and am generally quite impressed by the results, with text searches taking not much longer than half a second (on 100k records).
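
For reference, the searching side is just the stock QueryParser + IndexSearcher - something like this (Lucene 3.x-style API; the index path and field names are made up for illustration):

```java
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TextSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher(
                IndexReader.open(FSDirectory.open(new File("/path/to/index"))));

        // Parse the user's terms against the big free-text field
        QueryParser parser = new QueryParser(Version.LUCENE_30, "body",
                new StandardAnalyzer(Version.LUCENE_30));
        Query q = parser.parse("widget gearbox");

        TopDocs hits = searcher.search(q, 20);
        System.out.println("total hits: " + hits.totalHits);
        searcher.close();
    }
}
```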

The hard part is the parametric searching - I'm aware Lucene does basic range matching - however I feel we need something more powerful.
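
From what I've read, the numeric fields need to be indexed as NumericField so the trie-encoded NumericRangeQuery clauses stay fast, and you then AND them onto the text query. This is the rough shape of what I've been testing (field names and bounds purely illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class ParametricSearch {
    // At index time the parametric fields go in as NumericField, so Lucene
    // stores them trie-encoded for fast range queries
    static void addNumericFields(Document doc, double price, int quantity) {
        doc.add(new NumericField("price", Field.Store.YES, true).setDoubleValue(price));
        doc.add(new NumericField("quantity", Field.Store.YES, true).setIntValue(quantity));
    }

    // AND the free-text clause together with the numeric range clauses
    static Query combine(Query textQuery) {
        BooleanQuery combined = new BooleanQuery();
        combined.add(textQuery, BooleanClause.Occur.MUST);
        combined.add(NumericRangeQuery.newDoubleRange("price", 10.0, 99.99, true, true),
                BooleanClause.Occur.MUST);
        combined.add(NumericRangeQuery.newIntRange("quantity", 1, 500, true, true),
                BooleanClause.Occur.MUST);
        return combined;
    }

    // Sort on a numeric field instead of relevance
    static Sort byPrice() {
        return new Sort(new SortField("price", SortField.DOUBLE));
    }
}
```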

I've made a little test database using db4o, which has powerful query capabilities; however, these queries are quite slow, taking over 15 sec on only 100k records, whereas SQL takes about 1.5 seconds for the freetext & parametric searches.
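
For what it's worth, the db4o test was a native query over a plain record class, roughly like this (class and field names made up). I gather that when db4o can't optimise the predicate into indexed lookups, it falls back to instantiating and testing every stored object, which might explain the timings:

```java
import com.db4o.Db4o;
import com.db4o.ObjectContainer;
import com.db4o.ObjectSet;
import com.db4o.query.Predicate;

public class Db4oTest {
    // Plain persistent class; fields made up for illustration
    public static class Record {
        String body;
        double price;
    }

    public static void main(String[] args) {
        ObjectContainer db = Db4o.openFile("records.db4o");
        try {
            // Native query: db4o evaluates the predicate over candidate
            // Records; without index support this means loading each one
            ObjectSet<Record> hits = db.query(new Predicate<Record>() {
                @Override
                public boolean match(Record r) {
                    return r.price >= 10.0 && r.price <= 100.0
                            && r.body != null && r.body.contains("widget");
                }
            });
            System.out.println("hits: " + hits.size());
        } finally {
            db.close();
        }
    }
}
```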

Also, our database needs an update resolution of less than 10 min, with approx 15% of the records changing on a daily basis. Our SQL server is handling this currently, but it's starting to creak.
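
If we went the Lucene route, I'm assuming the updates would be handled by re-indexing changed records keyed on a unique id and reopening the reader every few minutes - an untested sketch:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class IndexUpdater {
    // Replace the old copy of a record keyed on its unique "id" field;
    // updateDocument is a delete-then-add, so no duplicates accumulate
    public static void updateRecord(IndexWriter writer, String id, Document newDoc)
            throws Exception {
        writer.updateDocument(new Term("id", id), newDoc);
        writer.commit(); // in practice, batch and commit periodically
    }

    // Refresh the searcher-side reader; reopen() only loads changed
    // segments, so doing this every few minutes should be cheap
    public static IndexReader refresh(IndexReader current) throws Exception {
        IndexReader newer = current.reopen();
        if (newer != current) {
            current.close();
        }
        return newer;
    }
}
```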

Any guidance on suitable technologies & approaches would be appreciated.

Cheers, Dave

A: 

LinkedIn wrote an add-on to Lucene called bobo to extend it with faceted search, which might be worth looking into. But I think bobo is really only needed if you have an absolutely massive index - there must be something really weird going on if a search on 100k documents is taking that long.
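
From memory, a bobo browse looks roughly like this - you wrap the plain Lucene reader with one facet handler per parametric field (untested sketch; field names are made up, and the exact class names may vary between bobo versions, so check whichever release you grab):

```java
import java.io.File;
import java.util.Arrays;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.FSDirectory;

import com.browseengine.bobo.api.BoboBrowser;
import com.browseengine.bobo.api.BoboIndexReader;
import com.browseengine.bobo.api.BrowseFacet;
import com.browseengine.bobo.api.BrowseRequest;
import com.browseengine.bobo.api.BrowseResult;
import com.browseengine.bobo.api.BrowseSelection;
import com.browseengine.bobo.api.FacetSpec;
import com.browseengine.bobo.facets.FacetHandler;
import com.browseengine.bobo.facets.impl.SimpleFacetHandler;

public class BoboSketch {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("/path/to/index")));

        // One facet handler per field you want to browse on; SimpleFacetHandler
        // covers a single-valued string field, and bobo also ships range facet
        // handlers for the numeric side
        FacetHandler<?> category = new SimpleFacetHandler("category");
        BoboIndexReader boboReader =
                BoboIndexReader.getInstance(reader, Arrays.<FacetHandler<?>>asList(category));

        BrowseRequest req = new BrowseRequest();
        req.setQuery(new MatchAllDocsQuery()); // or the parsed free-text query
        req.setCount(20);

        // Drill into one facet value, and ask for counts of the top 10 values
        BrowseSelection sel = new BrowseSelection("category");
        sel.addValue("gearboxes");
        req.addSelection(sel);
        FacetSpec spec = new FacetSpec();
        spec.setMaxCount(10);
        req.setFacetSpec("category", spec);

        BrowseResult result = new BoboBrowser(boboReader).browse(req);
        System.out.println("hits: " + result.getNumHits());
        for (BrowseFacet f : result.getFacetMap().get("category").getFacets()) {
            System.out.println(f.getValue() + " (" + f.getFacetValueHitCount() + ")");
        }
        boboReader.close();
    }
}
```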

Xodarap