views:

284

answers:

5

I need a database of every valid word in English. I checked the /usr/share/dict/words file, but it contains fewer than 100k words, while Wikipedia says English has 475k words. Where can I get the complete list (American spelling)?

Also, is there a single website that provides word lists for other languages too, including Asian and European ones?

Edit: Forgot to add, I do not need names etc. Just valid English words.
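
For reference, here is a minimal sketch of how a plain one-word-per-line file such as /usr/share/dict/words could be counted with names left out. It assumes that names are the capitalized entries and that possessives contain an apostrophe, which is only a heuristic (a word listed only in capitalized form gets dropped too). The function names are made up for illustration.

```python
def load_plain_words(lines):
    """Keep lowercase, purely alphabetic entries from an iterable of lines.

    Heuristic: capitalized entries are treated as names, and entries with
    apostrophes, hyphens, or digits are dropped.
    """
    words = set()
    for line in lines:
        entry = line.strip()
        if entry and entry.isalpha() and entry.islower():
            words.add(entry)
    return words


def load_plain_words_from_file(path):
    """Read a one-word-per-line file and apply the same filter."""
    with open(path, encoding="utf-8") as fh:
        return load_plain_words(fh)
```

Calling `len(load_plain_words_from_file("/usr/share/dict/words"))` would then give the number of entries that survive the filter on a given system.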

+1  A: 

You can find what you need on infochimps.org.

They have a list of 350,000 simple (i.e. non-compound) words available for free download.

Word List - 350,000+ Simple English Words

Regarding other languages, you might want to poke around on Wiktionary. Here is a link to all the database backups - the information isn't organized so nicely, but if they have a language, you can download the data in SQL format.

danben
A: 

You didn't say what you need this list for. If a list used as a blacklist for password checks is enough, cracklib might be good for you. It contains over 1.5M words.

honk
No, not for a blacklist. I am doing some sort of word game/graph.
+3  A: 

The WordNet database might be helpful. I once worked on a Firefox add-on that dealt with words and all kinds of associations between them, from simple to complicated. It looks like WordNet would be very useful to you.

MySQL format - http://androidtech.com/html/wordnet-mysql-20.php
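
If only a flat word list is needed, the plain-text WordNet database files can also be read directly. The sketch below assumes the usual layout of the `index.noun`, `index.verb`, `index.adj` and `index.adv` files: license header lines start with two spaces, each data line starts with the lemma followed by a space, and multi-word entries use underscores instead of spaces. The function name and the `include_phrases` parameter are made up for illustration.

```python
def lemmas_from_index(lines, include_phrases=False):
    """Collect lemmas from the lines of a WordNet index.* file.

    Assumed layout: header lines begin with two spaces; data lines begin
    with the lemma, and multi-word lemmas use '_' in place of spaces.
    """
    lemmas = set()
    for line in lines:
        if line.startswith("  ") or not line.strip():
            continue  # skip license header and blank lines
        lemma = line.split(" ", 1)[0]
        if "_" in lemma and not include_phrases:
            continue  # skip multi-word entries such as "hot_dog"
        lemmas.add(lemma.replace("_", " "))
    return lemmas
```

Running this over each of the four index files and taking the union of the results would give one combined list. Note that WordNet only covers nouns, verbs, adjectives and adverbs, so function words such as "the" would be missing.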

Do they have a downloadable list too?
Yes, you can download their database in many formats (CSV, MySQL database, etc.), and there are even APIs you can use from .NET, Java, etc. This is the download page - http://wordnet.princeton.edu/wordnet/download/
Is this the one? http://wordnetcode.princeton.edu/3.0/WNdb-3.0.tar.gz
I did not download it myself; it was already there when I started coding, so I don't know which files are in which download. I just know that you can download it in different formats. If you tell me which format you want, I may be able to help.
Looks like a very interesting project indeed.
Wim Hollebrandse
A: 

There's no such thing as a "complete" list. Different people have different ways of measuring -- for example, they might include slang, neologisms, multi-word phrases, offensive terms, foreign words, verb conjugations, and so on. Some people have even counted a million words! So you'll have to decide what you want in a word list.

JW
A: 

You may check the *spell en-GB dictionary, used by Mozilla, OpenOffice, and plenty of other software.
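
As a rough sketch, the base words can be pulled out of a Hunspell-style `.dic` file like this. It assumes the common layout where the first line is an entry count and each following line is `word` or `word/FLAGS`. It does not apply the affix rules from the matching `.aff` file, so inflected forms that exist only through those rules will not appear. The function name is made up for illustration.

```python
def stems_from_dic(lines):
    """Extract base words from the lines of a Hunspell-style .dic file.

    Assumed layout: an optional numeric count on the first line, then one
    entry per line, optionally followed by '/' and affix flags.
    """
    stems = set()
    for i, line in enumerate(lines):
        entry = line.strip()
        if not entry:
            continue
        if i == 0 and entry.isdigit():
            continue  # first line is the entry count, not a word
        stems.add(entry.split("/", 1)[0])  # drop the affix flags
    return stems
```

The same function should work for dictionaries of other languages that ship in this format, as long as the file is opened with the encoding declared in its `.aff` file.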

mloskot