Wikipedia with Python | ansaurus


+2 Q:

Wikipedia with Python

Hello,

I have this very simple Python code to read XML from the Wikipedia API:

import urllib
from xml.dom import minidom

usock = urllib.urlopen("http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500")
xmldoc=minidom.parse(usock)
usock.close()
print xmldoc.toxml()

But this code returns with these errors:

Traceback (most recent call last):
  File "/home/user/workspace/wikipediafoundations/src/list.py", line 5, in <module>
    xmldoc=minidom.parse(usock)
  File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: syntax error: line 1, column 62

I have no clue what is wrong, as I am just learning Python. Is there a way to get an error with more detail? Does anyone know the solution? Also, please recommend a better language to do this in, if there is one.

Thank You,
Venkat Rao

+7 A:

The URL you're requesting returns an HTML page that displays the XML, not the raw XML itself:

http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500

So the XML parser fails on it. You can see this by pasting the above URL into a browser. Try adding format=xml at the end:

http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500&format=xml

as documented on the linked page:

http://en.wikipedia.org/w/api.php
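
For reference, here is a rough Python 3 sketch of the same request with format=xml added. It is only an illustration under a few assumptions, not the code from the question: the helper names (build_links_url, link_titles, fetch_links) are made up here, the https scheme and the User-Agent string are my own choices, and I am assuming each link comes back as a <pl> element with a title attribute. The parse step also reports the line and column from ExpatError together with the start of the response, which gives a bit more detail when the server sends back something other than XML.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Same endpoint as in the question, using https (an assumption).
API_URL = "https://en.wikipedia.org/w/api.php"


def build_links_url(title, limit=500):
    """Build the query URL, explicitly asking for XML output."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "links",
        "pllimit": limit,
        "format": "xml",  # without this, the API returns an HTML page
    }
    return API_URL + "?" + urlencode(params)


def link_titles(xml_text):
    """Parse the response and return the title of every <pl> element.

    Assumes links are reported as <pl title="..."/> elements.
    """
    try:
        doc = minidom.parseString(xml_text)
    except ExpatError as err:
        # Show where parsing stopped and what the payload starts with.
        raise ValueError(
            "response is not XML (line %d, column %d); it starts with %r"
            % (err.lineno, err.offset, xml_text[:80])
        )
    return [node.getAttribute("title") for node in doc.getElementsByTagName("pl")]


def fetch_links(title):
    """Download and parse the links for one page (needs network access)."""
    # The User-Agent value is a placeholder; use one that identifies your script.
    request = Request(
        build_links_url(title),
        headers={"User-Agent": "links-example/0.1"},
    )
    with urlopen(request) as response:
        return link_titles(response.read())
```

Calling fetch_links("Fractal") should then return a list of linked page titles, provided the response has the shape assumed above.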

ars 2010-08-11 03:43:45

Thank you. That was easy.

Venkat S. Rao 2010-08-11 03:47:10

@user, since @ars's answer solved your problem, **accept it** -- that is, click on the checkmark-shaped icon to the left of his answer's text. This is fundamental SO etiquette!

Alex Martelli 2010-08-11 04:36:29
