views:

71

answers:

1

Hi,

I previously asked a similar question on this topic, I ended up deriving several solutions which worked, one based on bloom filters + ngrams, the other based on hash tables + ngrams. Both solutions perform fine with small data sets (<1000 texts, usually tweets) but the computation time grew exponentially meaning doing 10,000 could take hours.

I am currently working in Ruby and perhaps, that is the problem but are there any other solutions or approaches I could attempt to solve this problem?

Thanks, Ben

+1  A: 

If you are looking to do text searching in large sets of data, you might have to look into something like solr. There is a really easy to setup solr gem called sunspot http://outoftime.github.com/sunspot/

smnirven