I have a project in which I collect all the Wikipedia articles belonging to a particular category, pull them out of the Wikipedia dump, and put them into our database. To do this I need to parse the Wikipedia dump file. Is there an efficient parser for this job? I am a Python developer, so I would prefer a parser written in Python. If there isn't one, please suggest a parser in another language and I will try to write a Python port of it and publish it, so that other people can use it or at least try it. For now, I have started writing a manual parser that walks each node and extracts what I need.
views: 1508
answers: 3
+3
A:
There is example code for this at http://jjinux.blogspot.com/2009/01/python-parsing-wikipedia-dumps-using.html
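For a rough idea of what such a parser can look like, here is a minimal sketch using only the standard library. It is not the code from the linked post. It assumes the dump is a MediaWiki XML export with `<page>`, `<title>` and `<text>` elements, and that category membership can be detected from `[[Category:...]]` links in the wikitext, which misses categories added through templates and does not follow subcategories. The names `iter_pages`, `in_category` and `SAMPLE` are made up for this illustration.

```python
import io
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    `source` is a filename or a binary file object, for example
    bz2.open(path, "rb") for a compressed dump.
    """
    # iterparse streams the file, so the whole dump is never loaded at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        # Drop this page's subtree; only the emptied element stays in memory.
        elem.clear()


def in_category(wikitext, category):
    """Return True if the wikitext links directly to the given category.

    Assumption: membership is written as [[Category:Name]] or
    [[Category:Name|sort key]] in the page text.
    """
    pattern = r"\[\[\s*Category\s*:\s*" + re.escape(category) + r"\s*(\||\]\])"
    return re.search(pattern, wikitext, re.IGNORECASE) is not None


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Python</title><revision><text>A language. [[Category:Programming languages]]</text></revision></page>
  <page><title>Paris</title><revision><text>A city. [[Category:Capitals in Europe]]</text></revision></page>
</mediawiki>"""

matches = [title for title, text in iter_pages(io.BytesIO(SAMPLE))
           if in_category(text, "Programming languages")]
print(matches)  # ['Python']
```

With a real dump you would pass the opened dump file instead of `io.BytesIO(SAMPLE)` and write each match to the database rather than collecting titles in a list.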
Swaroop C H
2009-03-19 10:00:28