If I have downloaded Wikipedia XML dumps, is there any way of removing all of the internal links from within an XML file?
Thanks
Wikipedia database dumps, and information about using them, are available at Wikipedia:Database download. You should use these dumps instead of writing a script to scrape Wikipedia.
One thing you could do, if you are importing them into a local wiki, is to import all the files you want and then use a bot (e.g. pywikipediabot, which is easy to use) to remove all the internal links.
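You could do a search and replace in your favorite text editor, replacing [[ and ]] with nothing. Note that piped links such as [[Target|label]] will then be left as Target|label, so the part up to and including the pipe has to be removed as well.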
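A regular expression can do the same search and replace while keeping only the visible label of piped links such as [[Paris, France|the capital]]. This is a minimal sketch, assuming links are not nested inside each other (image captions containing links are not handled) and that namespace links like [[Category:Foo]] can simply be flattened to their text; `strip_internal_links` is a hypothetical helper name.

```python
import re

# Matches [[Target]] or [[Target|label]]; assumes links are not nested.
LINK_RE = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")


def strip_internal_links(wikitext: str) -> str:
    """Replace each internal link with its visible text."""
    def repl(m: re.Match) -> str:
        target, label = m.group(1), m.group(2)
        # Piped link: keep the label; plain link: keep the target.
        return label if label is not None else target
    return LINK_RE.sub(repl, wikitext)


print(strip_internal_links("See [[Paris]] and [[Paris, France|the capital]]."))
# See Paris and the capital.
```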
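I would try using XSLT to transform the XML file into another XML file with the links removed from the page text. Because the links sit inside the text of each <text> element rather than being XML elements themselves, this needs string processing, such as the regular-expression replace() function available in XSLT 2.0.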
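The same XML-to-XML transformation can be sketched without XSLT; this swaps in Python's standard `xml.etree.ElementTree` instead. It is a minimal sketch, assuming the page wikitext lives in `<text>` elements (as in the MediaWiki export format), that links are not nested, and that the dump is small enough to load into memory; a full English Wikipedia dump is not, and would need a streaming approach. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches [[Target]] or [[Target|label]]; nested links are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")


def strip_internal_links(wikitext: str) -> str:
    # Keep the label of a piped link, otherwise the link target.
    return LINK_RE.sub(
        lambda m: m.group(2) if m.group(2) is not None else m.group(1), wikitext
    )


def strip_links_in_tree(root: ET.Element) -> None:
    # Rewrite every <text> element (the page wikitext), whatever its namespace.
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "text" and elem.text:
            elem.text = strip_internal_links(elem.text)


def strip_links_in_dump(in_path: str, out_path: str) -> None:
    # Loads the whole file into memory: fine for a small dump, not a full one.
    tree = ET.parse(in_path)
    root = tree.getroot()
    if root.tag.startswith("{"):
        # Reuse the dump's namespace as the default so output tags stay unprefixed.
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    strip_links_in_tree(root)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
```

Usage would be something like `strip_links_in_dump("dump.xml", "dump-nolinks.xml")`, with both file names chosen for illustration.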