I was asked to write a program that will encrypt and decrypt "normal English" text based on letter frequencies.

The question is: where can I find text samples whose letter frequencies match the official ones?

So far I have tried "War and Peace" by Lev Tolstoy, but it didn't work well.

LE: I don't need just a list of words; I need a text sample to process.
LE2: The goal is to correctly guess 20 of the 26 letters in a 2000-character text.
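
For reference, one simple way to approach a goal like this is rank matching: sort the ciphertext letters by how often they occur and pair them with a reference ranking of English letters. The sketch below is only an illustration under assumptions. `REFERENCE_ORDER` is one commonly quoted ordering (rankings differ between corpora), the function names are made up for this example, and a simple substitution cipher is assumed.

```python
from collections import Counter
import string

# Assumed reference ranking, most to least frequent; it varies by corpus.
REFERENCE_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def rank_letters(text):
    """Return a-z sorted by descending count in text (ties alphabetical)."""
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    # Letters that never occur have count 0 and end up last.
    return "".join(sorted(string.ascii_lowercase, key=lambda c: (-counts[c], c)))

def guess_key(ciphertext):
    """Map each ciphertext letter to the reference letter of the same rank."""
    return dict(zip(rank_letters(ciphertext), REFERENCE_ORDER))

def count_correct(guess, true_key):
    """Count letters guessed correctly; true_key maps cipher -> plain."""
    return sum(1 for c, p in true_key.items() if guess.get(c) == p)
```

With only 2000 characters, letters of similar frequency can easily swap ranks, so `count_correct` is a convenient way to measure how close a given sample text gets to 20 of 26.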

+1  A: 

Check out infochimps; they have a bunch of freely available datasets that may be useful.

Noon Silk
wasn't able to find anything useful, thanks anyway
Victor Z.
+1  A: 

You're searching for English text corpora, e.g. http://faculty.washington.edu/ebender/corpora/corpora.html#modern. Out of what's listed there, I know that Project Gutenberg is free; many of the others might not be.

I'm not sure what you mean by the official frequencies -- the point of the frequencies is to match what you find in the wild, and if they don't, that's the frequency table's problem.
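
To check how far a given sample is from whatever table you were handed, you can measure the sample's own letter frequencies and compare. This is a minimal sketch, not a definitive method: the function names are hypothetical, the reference table is passed in by the caller rather than hard-coded, and the sum of absolute differences is just one possible distance.

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Relative frequency of each letter a-z, ignoring case and non-letters."""
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    total = sum(counts.values())
    if total == 0:
        # No letters at all: report zero for every letter.
        return {c: 0.0 for c in string.ascii_lowercase}
    return {c: counts[c] / total for c in string.ascii_lowercase}

def table_distance(sample_freqs, table_freqs):
    """Sum of absolute differences between two frequency tables."""
    return sum(
        abs(sample_freqs[c] - table_freqs.get(c, 0.0))
        for c in string.ascii_lowercase
    )
```

Running `letter_frequencies` over a few candidate texts and keeping the one with the smallest `table_distance` to your table is a quick way to pick a sample.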

Darius Bacon
thanks, found some text samples, going to test frequencies now.
Victor Z.
A: 

Try this list of English words:

http://www.openbsd.org/cgi-bin/cvsweb/src/share/dict/

Sean Comeau