If I have downloaded Wikipedia XML dumps, is there any way of removing all of the internal links from within an XML file?

Thanks

A: 

Wikipedia database dumps and information about using them are located here: Wikipedia:Database download. You should do this instead of writing a script to scrape Wikipedia.

Chad Birch
Yeah, I found that. Sorry, I got my question wrong! How can I remove the internal links from the XML files?
A: 

Removed as Q has now changed

Could you please update your question? This will increase the probability of getting help. (Note that this site works like a wiki, where you can edit your content again.)
0xA3
No prob, welcome to SO :-)
0xA3
A: 

One thing you could do, if you are importing them into a local wiki, is to import all the files you want, then use a bot (e.g. pywikipediabot, which is easy to use) to get rid of all the internal links.

Adrian Archer
Better yet, if your wiki is going to be used somewhere that you have internet access, you could change all the internal links to [[wikipedia:PageName|PageName]], then they would refer back to their original articles.
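
A rough Python sketch of that rewrite, in case it helps. The function name `to_interwiki` and the regular expression are illustrative assumptions, and the `wikipedia:` prefix only resolves if the local wiki has that interwiki prefix configured. Targets containing a colon (such as `Category:` or `File:` links) are deliberately left alone.

```python
import re

# Matches [[Target]] and [[Target|label]] where Target has no colon,
# so namespaced and already-prefixed links are skipped.
WIKILINK = re.compile(r"\[\[([^\[\]|:]+)(?:\|([^\[\]]*))?\]\]")


def to_interwiki(wikitext: str, prefix: str = "wikipedia") -> str:
    """Rewrite internal links as [[prefix:Target|label]] (hypothetical helper)."""
    def rewrite(match):
        target = match.group(1)
        # Fall back to the target when the link has no explicit label.
        label = match.group(2) or target
        return f"[[{prefix}:{target}|{label}]]"
    return WIKILINK.sub(rewrite, wikitext)
```

Because rewritten links contain a colon, running the function a second time over the same text leaves them unchanged.
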
Adrian Archer
A: 

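You could do a search and replace in your favorite text editor, replacing [[ and ]] with nothing. Note that a piped link such as [[Target|label]] would then be left as Target|label, so those need separate handling.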
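
If a script is more convenient than an editor, something like the following Python sketch might do. The name `strip_internal_links` and the pattern are assumptions for illustration only. It keeps the visible label of piped links, but it does not handle nested brackets, and links such as `[[File:...|thumb|caption]]` or `[[Category:...]]` would probably need their own treatment.

```python
import re

# Matches [[Target]] and [[Target|label]]; nested brackets are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")


def strip_internal_links(wikitext: str) -> str:
    """Replace each internal link with the text a reader would see."""
    def visible_text(match):
        target, label = match.group(1), match.group(2)
        # Use the label when one is given, otherwise the target itself.
        return label if label else target
    return WIKILINK.sub(visible_text, wikitext)
```

For example, `strip_internal_links("[[Paris]] is in [[France|the Republic]].")` gives `Paris is in the Republic.`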

Adrian Archer
A: 

I would try to use XSLT to transform the XML file into another XML file.
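
Note that XSLT 1.0 has no regular-expression replace, so stripping the brackets in pure XSLT is awkward. As an alternative, and not XSLT itself, here is a Python sketch of the same XML-to-XML idea using the standard library. It assumes the wikitext lives in `<text>` elements, as in the MediaWiki export format. The function name is hypothetical, and it parses the whole input in memory, so a multi-gigabyte dump would need a streaming parser instead.

```python
import re
import xml.etree.ElementTree as ET

# Matches [[Target]] and [[Target|label]]; nested brackets are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")


def _visible_text(match):
    # Keep the label of a piped link, otherwise keep the target.
    return match.group(2) if match.group(2) else match.group(1)


def strip_links_in_dump(xml_text: str) -> str:
    """Return the XML with internal links removed from every <text> element."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Real dumps use a namespace ("{...}text"), so compare the local name only.
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            elem.text = WIKILINK.sub(_visible_text, elem.text)
    return ET.tostring(root, encoding="unicode")
```

With a namespaced dump, ElementTree may write the elements back with generated prefixes such as `ns0:`. The result is equivalent XML but looks different from the input.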

lothar