I would like to download an English dictionary -- not just a word list -- in a structured format such as TXT, XML, or SQL.

Specifically, I need phonetic pronunciation and parts of speech (definition is not required).

Surprisingly, I can't find this online anywhere. Wiktionary is available for download, but it is only the MediaWiki articles themselves. Crawling all articles and extracting the phonetics and parts of speech would be a huge exercise.

Is this available anywhere? I don't mind paying.

Edit: a few people have asked what I would like to do. My immediate need is just curiosity, for example "what are the most common two-syllable verbs?". Eventually my hope is to build a tool that helps you find available domain names by pairing the correct parts of speech, with bonus points for phonetic matches.

Note: cross-posted on English Language and Usage.

A: 

Portman, when I used the SpellChecker tool from DevExpress, I learned that the OpenOffice dictionaries exist, and I'm pretty sure they have a well-defined data structure. I recommend using them in combination with any free or paid text-to-speech tool.

Hope that helps,

Ramon Araujo
@ramon - he's looking for pronunciations and parts of speech, not just a list of words (which is what DevExpress and OpenOffice provide).
Jess
@Jess - DevExpress uses the OpenOffice word list, but it also has a SpellChecker. I recommended that he use the standard .dic and .aff files to find the words, then a tool to handle the pronunciation.
Ramon Araujo
@Ramon - the OpenOffice files are actually a subset of Aspell. They include only spelling. No parts of speech and no pronunciation.
Portman
@Portman - Totally agree. My suggestion was to use them as a list of words to be "spoken" by any free text-to-speech tool. There are plenty of them on the Internet ;)
Ramon Araujo
@Ramon - I think he wants ACTUAL pronunciation that he can parse. It's not like he's going to listen to the TTS engine's pronunciation then write it down (and TTS engines usually aren't terribly good beyond the top 10,000 most common words).
Jess
A: 

This is not a direct answer to your question, but the Double Metaphone algorithm is very good at finding word or phrase matches for search engine application servers (such as Solr and others).

I cannot tell what your intended use of this is, so I can't tell if my suggestion is useful or not. If it is close to your intended use, the Wikipedia page about Double Metaphone has a listing of about a dozen implementations of it which may be worth exploring.

http://en.wikipedia.org/wiki/Double_Metaphone

Chris Adragna
+1  A: 

WordNet is one of the best dictionaries I know. Perhaps you will find something there: http://wordnet.princeton.edu/wordnet/related-projects/
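
For the parts of speech specifically, here is a minimal Python sketch. It assumes the plain-text WordNet database files (index.noun, index.verb, index.adj, index.adv), where, as far as I know, each data line starts with the lemma followed by a one-letter part-of-speech code, and license header lines start with two spaces. The sample lines below are made up for illustration (the numbers are not real synset offsets), and `parse_index_lines` is a hypothetical helper name, not part of any WordNet tool.

```python
from collections import defaultdict

# One-letter codes assumed from the WordNet index file layout.
POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective", "r": "adverb"}

# Illustrative lines only: the trailing numeric fields are invented.
SAMPLE = [
    "  1 This line stands in for the license header",
    "run n 2 1 @ 2 1 00000001 00000002",
    "run v 1 1 @ 1 1 00000003",
    "hot_dog n 1 1 @ 1 0 00000004",
]

def parse_index_lines(lines):
    """Map each lemma to the set of parts of speech it appears with."""
    pos_by_word = defaultdict(set)
    for line in lines:
        # Skip blank lines and header lines (assumed to start with two spaces).
        if not line.strip() or line.startswith("  "):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        lemma, pos = fields[0], fields[1]
        # Multi-word lemmas are assumed to use underscores instead of spaces.
        pos_by_word[lemma.replace("_", " ")].add(POS_NAMES.get(pos, pos))
    return pos_by_word

pos_by_word = parse_index_lines(SAMPLE)
print(sorted(pos_by_word["run"]))
```

To use it on real data you would pass the lines of each index file in turn; everything after the first two fields is ignored here, so the sketch does not depend on the rest of the format.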

chris
This looks promising. I wish the data wasn't in a custom format, but it looks extractable.
Portman
+1  A: 

Go to http://www.speech.cs.cmu.edu/cgi-bin/cmudict and you will find the download page for the pronunciation dictionary at https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/cmudict/

The latest version is currently cmudict.0.7a.

This is what I am currently using to implement the syllable counter for http://www.haikuvillage.com. It's in Ruby and I'd be happy to open source it for you if that helps.
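
In the meantime, here is a rough idea of how syllable counting from this file can work. This is a Python sketch, not the haikuvillage code. It assumes the cmudict.0.7a layout: comment lines start with ";;;", each entry is a word followed by ARPAbet phonemes, alternate pronunciations are marked like "WORD(1)", and every vowel phoneme ends in a stress digit (0, 1 or 2). The sample entries are illustrative, and the function names are my own.

```python
import re

# Illustrative entries in the assumed cmudict.0.7a layout.
SAMPLE = """\
;;; comment lines start with three semicolons
DICTIONARY  D IH1 K SH AH0 N EH2 R IY0
HAIKU  HH AY1 K UW0
READ  R EH1 D
READ(1)  R IY1 D
"""

def parse_line(line):
    """Return (word, phonemes), or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    head, *phones = line.split()
    # Drop the variant marker, so "READ(1)" is filed under "READ".
    word = re.sub(r"\(\d+\)$", "", head)
    return word, phones

def load_pronunciations(lines):
    """Map each word to the list of its pronunciations."""
    entries = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            word, phones = parsed
            entries.setdefault(word, []).append(phones)
    return entries

def count_syllables(phones):
    """Count vowel phonemes, i.e. those ending in a stress digit."""
    return sum(1 for p in phones if p[-1].isdigit())

entries = load_pronunciations(SAMPLE.splitlines())
print(count_syllables(entries["DICTIONARY"][0]))  # 4
print(count_syllables(entries["HAIKU"][0]))       # 2
```

Words with several pronunciations keep all of them, so a caller can decide whether to take the first one or accept any syllable count that fits.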

matthuhiggins
Cool! This is extremely helpful. Now I need parts of speech...
Portman