views:

499

answers:

3

Can someone point me to where I can download an English dictionary as a txt or xml file? I am building a simple app for myself and am looking for something I could start using immediately without learning a complex API.

Support for synonyms would be great; that is, it should be easy to retrieve all the synonyms for a particular word.

It would be absolutely fantastic if the dictionary listed both the British and American spellings of words where they differ.

Even a small dictionary (a few thousand words) would be OK; I only need it for a small project.

I would even be willing to buy one if the price is reasonable and the dictionary is easy to use. Simple XML would be great.

Any directions, please?

+1  A: 

Try WordNet.

C4H5As
+3  A: 

WordNet is what you want. It's big, containing over a hundred thousand entries, and it's freely available.

However, it's not stored as XML. To access the data, you'll want to use one of the existing WordNet APIs for your language of choice.

Using the APIs is generally pretty straightforward, so I don't think you have to worry much about "learning (a) complex API". For example, borrowing from the WordNet HOWTO for the Python-based Natural Language Toolkit (NLTK):

 >>> from nltk.corpus import wordnet
 >>> 
 >>> # Get All Synsets for 'dog'
 >>> # This is essentially all senses of the word in the db
 >>> wordnet.synsets('dog')
 [Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), 
  Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), 
  Synset('andiron.n.01'), Synset('chase.v.01')]

 >>> # Get the definition and usage for the first synset
 >>> # (in NLTK 3+, definition(), examples() and lemmas() are methods)
 >>> wordnet.synset('dog.n.01').definition()
 'a member of the genus Canis (probably descended from the common 
 wolf) that has been domesticated by man since prehistoric times; 
 occurs in many breeds'
 >>> wordnet.synset('dog.n.01').examples()
 ['the dog barked all night']

 >>> # Get antonyms for 'good'
 >>> wordnet.synset('good.a.01').lemmas()[0].antonyms()
 [Lemma('bad.a.01.bad')]

 >>> # Get synonyms for the first noun sense of 'dog'
 >>> wordnet.synset('dog.n.01').lemmas()
 [Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), 
 Lemma('dog.n.01.Canis_familiaris')]

 >>> # Get synonyms for all senses of 'dog'
 >>> for synset in wordnet.synsets('dog'): print(synset.lemmas())
 [Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), 
 Lemma('dog.n.01.Canis_familiaris')]
 ...
 [Lemma('frank.n.02.frank'), Lemma('frank.n.02.frankfurter'), 
 ...
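
Since the question asks for all the synonyms of a word in one go, the per-sense loop above can be folded into a small helper. This is only a sketch: `all_synonyms` is a name made up here, not part of NLTK, and it assumes the NLTK 3 interface where `lemmas()` and `name()` are methods.

```python
def all_synonyms(synsets):
    """Collect distinct lemma names across an iterable of synsets.

    Hypothetical helper, not an NLTK function. It assumes each synset
    has a lemmas() method and each lemma has a name() method, as in
    NLTK 3. Order of first appearance is preserved.
    """
    seen = []
    for synset in synsets:
        for lemma in synset.lemmas():
            # WordNet joins multi-word lemmas with underscores
            name = lemma.name().replace('_', ' ')
            if name not in seen:
                seen.append(name)
    return seen
```

With NLTK and its WordNet corpus installed, calling `all_synonyms(wordnet.synsets('dog'))` should give one flat list covering every sense of 'dog', so you will probably want to filter by sense or part of speech for anything beyond a quick lookup.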

While there is an American English bias in WordNet, it supports British spellings and usage. For example, you can look up 'colour' and one of the synsets for 'lift' is 'elevator.n.01'.

Notes on XML

If having the data represented as XML is essential, you could easily use one of the APIs to access the WordNet database and convert it into XML, e.g. see Thinking XML: Querying WordNet as XML.
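
As a rough idea of what that conversion could look like, here is a minimal sketch using only Python's standard library. The element names (`wordnet`, `synset`, `definition`, `lemma`) and the function `synsets_to_xml` are assumptions made up for illustration; this is not the schema used in the article mentioned above. The sample row is typed in by hand from the 'dog.n.01' output shown earlier.

```python
import xml.etree.ElementTree as ET

def synsets_to_xml(entries):
    """Build an XML tree from (synset_name, definition, lemma_names) tuples.

    The element and attribute names are an assumed, made-up layout.
    """
    root = ET.Element("wordnet")
    for name, definition, lemma_names in entries:
        synset_el = ET.SubElement(root, "synset", name=name)
        ET.SubElement(synset_el, "definition").text = definition
        for lemma_name in lemma_names:
            ET.SubElement(synset_el, "lemma").text = lemma_name
    return root

# Hand-written sample row copied from the 'dog.n.01' session output
sample = [
    ("dog.n.01",
     "a member of the genus Canis (probably descended from the common "
     "wolf) that has been domesticated by man since prehistoric times; "
     "occurs in many breeds",
     ["dog", "domestic_dog", "Canis_familiaris"]),
]

xml_text = ET.tostring(synsets_to_xml(sample), encoding="unicode")
print(xml_text)
```

With NLTK 3 installed, the rows could presumably be generated from the real database with something like `(s.name(), s.definition(), s.lemma_names()) for s in wordnet.all_synsets()` instead of the hand-written sample.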

dmcer
+1  A: 

I have used Roget's thesaurus in the past. It has the synonymy information in plain text files. There is also some Java code to help you parse the text.

These pages provide links to a bunch of thesauri/lexical resources, some of which are freely downloadable.

http://www.w3.org/2001/sw/Europe/reports/thes/thes_links.html

http://www-a2k.is.tokushima-u.ac.jp/member/kita/NLP/lex.html

hashable