Wiktionary is a wiki dictionary that covers many languages. It even has translations. I'd be interested in parsing it and playing with the data. Has anyone done anything like this before? Is there any library I can use? (Preferably Python.)

A:

Wiktionary runs on MediaWiki, which has an API.

One of the subpages for the API documentation is Client code, which lists some Python libraries.
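
As a rough illustration of that API, here is a sketch that asks the MediaWiki `action=parse` endpoint for the raw wikitext of a single entry, using only the standard library. It assumes the English Wiktionary endpoint at `https://en.wiktionary.org/w/api.php` and the `formatversion=2` JSON response shape. The helper names and the User-Agent string are hypothetical placeholders; Wikimedia asks clients to send a descriptive User-Agent.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for the English Wiktionary's MediaWiki Action API.
API_URL = "https://en.wiktionary.org/w/api.php"


def build_parse_url(title: str) -> str:
    """Build an action=parse request that returns only the page's wikitext."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # flatter JSON: payload["parse"]["wikitext"] is a string
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_wikitext(payload: dict) -> str:
    """Pull the wikitext out of a decoded formatversion=2 response."""
    if "error" in payload:
        raise ValueError(payload["error"].get("info", "MediaWiki API error"))
    return payload["parse"]["wikitext"]


def fetch_wikitext(title: str) -> str:
    """Fetch one entry over the network (hypothetical User-Agent; use your own)."""
    request = urllib.request.Request(
        build_parse_url(title),
        headers={"User-Agent": "wiktionary-sketch/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_wikitext(json.load(response))
```

Something like `fetch_wikitext("dog")` would hand back the entry's source as a string. Turning that into definitions or translations still means parsing Wiktionary's wikitext templates yourself, which is where the client libraries listed on that page may help.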

Amber
A: 

I guarantee you somebody has parsed it to build a rainbow table for password cracking.

T.E.D.
A: 

I once downloaded a Wiktionary dump while trying to gather words and definitions for Slavic languages, and I used ElementTree to walk through the XML file that makes up the dump.

I would avoid trying to scrape or crawl the site; just download the XML dump that Wikimedia provides for Wiktionary. Go to the Wikimedia downloads page, look for the English Wiktionary dumps (enwiktionary), and open the most recent dump. You'll probably want the pages-articles.xml.bz2 file, which contains just the article content, with no history or comments.

Parse it with whichever Python XML library you prefer; I personally prefer ElementTree. Good luck.
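
A minimal sketch of the dump approach, assuming the usual MediaWiki export layout (`<page>` elements containing `<title>`, `<ns>` and a `<revision>` with a `<text>` child). It streams the `.bz2` file with `xml.etree.ElementTree.iterparse` and clears finished pages, so the large dump never has to fit in memory. The export namespace URI changes between dump versions, so the sketch strips namespaces instead of hard-coding one. `iter_pages` and `iter_dump` are hypothetical names, and keeping only namespace 0 (the dictionary entries themselves, not templates or other pages) is an assumption you can switch off.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple, Union


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix MediaWiki export files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(
    source: Union[str, IO[bytes]], main_only: bool = True
) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for each <page> in an uncompressed XML stream."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of <mediawiki>
    for event, elem in context:
        if event != "end" or _local(elem.tag) != "page":
            continue
        title, ns, text = "", "0", ""
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "ns":
                ns = (child.text or "0").strip()
            elif name == "text":
                text = child.text or ""
        # Drop finished pages from the tree so memory use stays roughly flat.
        root.clear()
        if main_only and ns != "0":
            continue  # skip templates, appendices, etc. (assumed filter)
        yield title, text


def iter_dump(path: str, main_only: bool = True) -> Iterator[Tuple[str, str]]:
    """Stream pages straight out of a pages-articles .xml.bz2 dump."""
    with bz2.open(path, "rb") as f:
        yield from iter_pages(f, main_only=main_only)
```

For example, `for title, wikitext in iter_dump("enwiktionary-...-pages-articles.xml.bz2"):` would walk every entry. Pick out the language sections you care about (for instance `==Russian==`) from `wikitext` before doing anything heavier.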

razzmataz