I was wondering if anyone could point me to a very large dictionary of random words that could be used to test some high-performance string data structures. The ones I'm finding are in the ~2 MB range, but I'd like something larger if possible. I'm guessing there must be a large standard string dataset somewhere that could be used. Thanks!

+1  A: 

http://norvig.com/big.txt

The above link is mentioned in Norvig's spell checker article: http://norvig.com/spell-correct.html
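If it helps, here is a rough sketch of how that file could be turned into a word list for feeding a trie, hash set, or similar structure. It assumes big.txt has already been downloaded into the working directory; the function names `extract_words` and `load_vocabulary` are made up for this example, and treating every run of ASCII letters as a word is just one possible tokenization.

```python
import re


def extract_words(text):
    """Lowercase the text and return every run of ASCII letters as a word."""
    return re.findall(r"[a-z]+", text.lower())


def load_vocabulary(path):
    """Read a local text file and return (all tokens, sorted unique words)."""
    # errors="replace" is a precaution in case the file has stray non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        tokens = extract_words(f.read())
    return tokens, sorted(set(tokens))


if __name__ == "__main__":
    # Assumption: big.txt was saved next to this script beforehand.
    tokens, vocab = load_vocabulary("big.txt")
    print(len(tokens), "tokens,", len(vocab), "unique words")
```

The full token list (with repeats) is probably more useful for lookup and insert benchmarks, while the unique list gives the distinct keys.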

Duniyadnd
+1  A: 

I'd recommend taking a look through the material available from TREC (the Text REtrieval Conference). It has some good datasets that might meet your requirements.

borrible