views: 2242

answers: 8

What would be the best way to get a function that returns a random English word (preferably a noun) without keeping a list of all possible words in a file beforehand?

Edit: Sorry for the poor wording. After the tiniest amount of thought, I suppose I am after some sort of online list or API that would not require me to store a massive list of all the words on my server.

A: 

Well, you have three options:

  • Hard-code the list of words and initialize an array with it.
  • Fetch the list from an internet location instead of a file.
  • Keep a list of possible words in a file.

The only way to avoid the above is if you're not concerned whether the word is real: you can just generate random-length strings of characters. (There's no way to programmatically generate words without a dictionary list to go from.)
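To make the two extremes concrete, here is a minimal sketch: one function picks from a hard-coded list, the other builds a random-length string of letters. The `NOUNS` list and both function names are made up for illustration, and a real list would be far longer.

```python
import random
import string

# Hypothetical hard-coded list; a real one would be much longer.
NOUNS = ["apple", "bridge", "candle", "garden", "river", "window"]


def random_noun(rng=random):
    """Return one word chosen uniformly from the hard-coded list."""
    return rng.choice(NOUNS)


def random_gibberish(min_len=3, max_len=8, rng=random):
    """Return a random-length string of lowercase letters (not a real word)."""
    length = rng.randint(min_len, max_len)  # both bounds are inclusive
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
```

`random_noun()` always returns a real word but only from the list you typed in; `random_gibberish()` needs no list but will almost never return a real word.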

lc
+7  A: 

You can't. There is no algorithm to generate meaningful words. You can only generate words that sound like English, but they won't have any meaning.

Alex Reitbort
+4  A: 

You could have the function try and parse an online resource such as:

http://www.zokutou.co.uk/randomword/

Gary Willoughby
+2  A: 

Another theoretical approach: you could scrape the random Wikipedia article page and return the N-th word of the article.
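A rough sketch of the word-picking half of that idea, assuming you already have the article's plain text as a string (fetching the page and stripping the HTML is left out here). The function names are hypothetical. Tokens that contain digits or non-ASCII letters are dropped, which removes dates and numbers but will not catch foreign words spelled with plain ASCII letters.

```python
import string


def candidate_words(text):
    """Return lowercased tokens made only of ASCII letters."""
    words = []
    for token in text.split():
        token = token.strip(string.punctuation)
        # Skip numbers, dates and tokens with accented or non-Latin letters.
        if token.isascii() and token.isalpha():
            words.append(token.lower())
    return words


def nth_word(text, n):
    """Return the n-th (0-based) candidate word, wrapping around, or None."""
    words = candidate_words(text)
    if not words:
        return None
    return words[n % len(words)]
```

Wrapping with `n % len(words)` is just one way to handle an `n` larger than the article; returning `None` or raising an error would work too.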

splattne
It's a nice idea, but you might need to filter out dates, numbers, and non-English words.
Ben
The results wouldn't be very random -- you'd tend to get the same few words a lot, and all sorts of other problems.
Whatsit
@Whatsit I guess you're right. On the other hand: what does "random English word" really mean? If you ask somebody for a random word, you will get a similar statistical distribution.
splattne
+10  A: 

Word lists need not take up all that much space.

Here's a wordlist with over 5000 words, all nouns. It clocks in at under 50K, about the size of a medium-sized JPEG image.

I'll leave choosing a random one as an exercise for the reader.
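For readers who would rather skip the exercise, one possible solution is below. It assumes the list is stored with one word per line; the file name `nouns.txt` and the function names are placeholders, not part of the linked list.

```python
import random


def load_words(lines):
    """Strip whitespace and drop blank lines from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def random_word(words, rng=random):
    """Return one word chosen uniformly at random from the list."""
    return rng.choice(words)


# Usage, assuming a hypothetical file "nouns.txt" with one word per line:
# with open("nouns.txt", encoding="utf-8") as f:
#     words = load_words(f)
# print(random_word(words))
```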

Triptych
This really is the best option. You could easily keep the entire list in memory, and you'll have complete control over the source -- no unexpected changes, no connection issues, no security concerns -- and it should be much easier to implement overall.
Whatsit
And you don't even need to keep it all in memory.
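One way to do that (an assumption about what is meant here, not necessarily the commenter's method) is single-item reservoir sampling: read the file line by line and keep only the current pick. The function name is hypothetical.

```python
import random


def random_line(lines, rng=random):
    """Pick one non-blank line uniformly at random in a single pass."""
    chosen = None
    seen = 0
    for line in lines:
        word = line.strip()
        if not word:
            continue
        seen += 1
        # Replace the current pick with probability 1/seen, which leaves
        # every word seen so far equally likely to be the final choice.
        if rng.randrange(seen) == 0:
            chosen = word
    return chosen
```

Passing an open file object as `lines` works, since iterating over a file yields one line at a time. The cost is reading the whole file on every call.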
Triptych
A: 

There's a random word generator here. It's not English, but it's English-ish: the words are similar enough to real language that a user can read them and hold them in short-term memory.

Source code is in C# and a bit kludged, but you could use a similar approach in Python to generate lots of words without having to store a massive list.
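The C# source isn't reproduced here, so the following is not that generator's algorithm. It is a minimal sketch of one common technique, chaining consonant-vowel syllables, with letter sets and names chosen purely for illustration.

```python
import random

# Illustrative letter sets; awkward consonants such as q and x are left out.
CONSONANTS = "bcdfghjklmnprstvw"
VOWELS = "aeiou"


def pseudo_word(syllables=2, rng=random):
    """Build a pronounceable non-word from consonant-vowel syllables."""
    parts = []
    for _ in range(syllables):
        parts.append(rng.choice(CONSONANTS) + rng.choice(VOWELS))
    # Half the time, close the word with a consonant for some variety.
    if rng.random() < 0.5:
        parts.append(rng.choice(CONSONANTS))
    return "".join(parts)
```

The output is readable but carries no meaning, and it may occasionally collide with a real word.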

Alternatively, you could call the web service on the demo page directly - it's hosted on GoDaddy though, so no guarantees it will work in production!

Luke Sampson
+1  A: 

You can download the "words common to SOWPODS and TWL" lists from http://www.math.toronto.edu/jjchew/scrabble/lists/ . I put all the words in those files together and the list weighed in at about 642k. Not huge by any standards. The lists do contain a whole lot of obscure words though, since they are meant for tournament Scrabble use. The good thing is that the lists form a substantial subset of the English language.
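Combining several downloaded files into one list might look like the sketch below. It assumes each file has exactly one word per line, which may not match the actual format of those lists, and the function name is made up.

```python
def merge_word_lists(*sources):
    """Combine several iterables of lines into one sorted, de-duplicated list."""
    merged = set()
    for lines in sources:
        for line in lines:
            word = line.strip().lower()
            if word:  # skip blank lines
                merged.add(word)
    return sorted(merged)
```

Each argument can be an open file object or any other iterable of strings.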

Chinmay Kanchi
A: 
Jason
That's not Python.
SilentGhost