ansaurus

Question

Answer 1

A:

Any chance you could post the XML snippet? The parser reports the error on the very first line. My guess is that the formatting is off (or the location is being reported incorrectly), which is causing Expat to raise an exception right off the bat.

My guess is that the first line violates one of the rules for well-formed XML anyway. For reference, you might compare against http://en.wikipedia.org/wiki/XML

heckj 2010-07-15 05:54:19

Answer 2

A:

Looks like something is wrong with your XML file, right about line 1, column 4.

I tried this, and what I got doesn't look like XML to me. Here are the first eight characters, as Alex suggested:

>>> raw_result.read(8)
'BRTR\x00\x00\x00\x03'
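The same check can be wrapped in a small helper that runs before the payload is handed to the parser. This is only a sketch: the name `looks_like_xml` and the second sample string are hypothetical, and the test (does the payload start with `<` after an optional UTF-8 byte-order mark and whitespace?) is a heuristic, not a well-formedness check.

```python
import codecs


def looks_like_xml(first_bytes):
    """Heuristic: True if the payload starts with '<' after an optional BOM and whitespace."""
    # Drop a UTF-8 byte-order mark if one is present.
    if first_bytes.startswith(codecs.BOM_UTF8):
        first_bytes = first_bytes[len(codecs.BOM_UTF8):]
    # XML documents begin with '<' (a declaration, comment, or root element).
    return first_bytes.lstrip().startswith(b"<")


# The eight bytes shown above are rejected; an XML declaration is accepted.
print(looks_like_xml(b"BRTR\x00\x00\x00\x03"))
print(looks_like_xml(b"<?xml version='1.0'?>"))
```

A payload that fails this check is better inspected by hand than passed to minidom, since the parser error will only point at line 1.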

Fred Larson 2010-07-15 05:54:32

Answer 3

A:

Your server is picky about the accept header in deciding what to send back and in which format. The following should work:

In [264]: from xml.dom import minidom

In [265]: import urllib2

In [266]: req = urllib2.Request(query, headers={'Accept':'application/xml'})

In [267]: rsp = urllib2.urlopen(req)

In [268]: xml = minidom.parse(rsp)

In [269]: xml.toxml()[:64]
Out[269]: u'<?xml version="1.0" ?><sparql xmlns="http://www.w3.org/2005/spar'

Note the accept header in urllib2.Request.
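For readers on Python 3, where urllib2 no longer exists, roughly the same request could be sketched with `urllib.request`. The helper names `build_xml_request` and `fetch_xml` and the example URL are hypothetical, and `fetch_xml` is defined but not called here because it needs a live server.

```python
import urllib.request
from xml.dom import minidom


def build_xml_request(url):
    """Build a request that asks the server for an XML response."""
    return urllib.request.Request(url, headers={"Accept": "application/xml"})


def fetch_xml(url):
    """Fetch the URL and parse the response body with minidom (needs network access)."""
    with urllib.request.urlopen(build_xml_request(url)) as rsp:
        return minidom.parse(rsp)


# Hypothetical endpoint, used only to show that the header is attached.
req = build_xml_request("http://example.org/query")
print(req.get_header("Accept"))
```

Keeping the request construction separate from the fetch makes it easy to confirm the header is set before any network call is made.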

ars 2010-07-15 06:31:11

thanks, that works perfectly.

Jeff 2010-07-15 06:35:05

Answer 4

A:

It seems that the RDF server is delivering plain text to your urllib.urlopen call.

By setting the right header, for example

Accept: application/sparql-results+xml, */*;q=0.5

you should be able to get the XML response. See the RDF protocol specification of openRDF for details; openRDF supports more than one result format.
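Since more than one format can come back, it may help to look at the response's Content-Type header before parsing. The following is a sketch only: `is_xml_media_type` is a hypothetical helper, and the sample header values are illustrative rather than taken from the server in the question.

```python
def is_xml_media_type(content_type):
    """True if a Content-Type header value names an XML media type."""
    # Ignore parameters such as '; charset=UTF-8' and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("application/xml", "text/xml") or media_type.endswith("+xml")


# Illustrative header values (assumptions, not captured from a real response).
print(is_xml_media_type("application/sparql-results+xml; charset=UTF-8"))
print(is_xml_media_type("application/octet-stream"))
```

With a Python 3 `urllib.request` response object `rsp`, the value to test would typically come from `rsp.headers.get("Content-Type")`.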

zovision 2010-07-15 06:31:39

trouble parsing XML in python