Hi,

I am looking for a corpus of text to run some trial full-text-style searches across. Either something I can download, or a system that generates it. Something a bit more random would be better, e.g. 1,000,000 Wikipedia articles in a format that is easy to insert into a two-column database table (id, text).

Any ideas or suggestions?

+1  A: 

Why not use a Wikipedia dump?

Ben S
Mainly because uncompressed it is many GB and it is in wiki markup - I am just looking for plain text.
Chris Padfield
+1  A: 

Project Gutenberg has 32,000 books available.
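
If the books are saved as plain-text files, getting them into the (id, text) layout from the question could look roughly like the sketch below. It is only an illustration under assumptions: the files are already downloaded into one local directory as `*.txt`, SQLite stands in for whatever database is being tested, and the table name `corpus` and the function name `load_corpus` are made up for this example.

```python
import sqlite3
from pathlib import Path


def load_corpus(text_dir, db_path):
    """Insert every *.txt file in text_dir as one (id, text) row.

    Returns the total number of rows in the table afterwards.
    Assumes the files are already on disk; nothing is downloaded here.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical two-column table; id is assigned by SQLite.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS corpus ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
        )
        paths = sorted(Path(text_dir).glob("*.txt"))
        # Read files lazily so only one book is held in memory at a time.
        rows = (
            (p.read_text(encoding="utf-8", errors="replace"),)
            for p in paths
        )
        with conn:  # commits on success, rolls back on error
            conn.executemany("INSERT INTO corpus (text) VALUES (?)", rows)
        return conn.execute("SELECT COUNT(*) FROM corpus").fetchone()[0]
    finally:
        conn.close()
```

Calling `load_corpus("books", "corpus.db")` would load one row per file. Splitting each book into paragraphs or chapters before inserting would give many more, shorter rows if that suits the search trials better.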

Peter Tillemans
A: 

I'll throw this out there since I'm familiar with it - Prosper.com makes their member loan listings available for analysis through an XML export. The export would have about 50,000 loan requests with descriptions and over 1,000,000 member profiles (although many of those are empty).

Eric Petroelje
Thanks, this could be useful. Still quite a bit of processing to get it to work - but will give it a go.
Chris Padfield