ansaurus

Question

Should I strip the XML declaration from suds output before parsing with lxml?

Answer 1

+1 A:

Hmm, I'm currently implementing my first Suds-based solution and parsing my responses with lxml without a problem, but I think this could be because I'm doing it in a pretty blunt and dumb way. Here's what my code looks like:

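# Imports this snippet relies on (Python 2 module paths, assumed from context)
from StringIO import StringIO
from urllib2 import URLError
from lxml import etree

# The rest runs inside a method of my importer class
try:
    result = self.client.service.ExportOwnersDetails(fAccess=self.access_id, fParams=params)
except URLError:
    # TODO: Log timeout here, handle
    return
response = str(result.fReturn)

# An empty response also fails this check
if '<?xml ' not in response:
    # TODO: Log import error here, handle
    return
response = StringIO(response)
xml = etree.parse(response)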

Like I said, not very clever (and obviously I still have some logging to do), but that's my approach. The fAccess, fParams, fReturn nonsense is the naming convention at the third-party provider I'm integrating with.

Tom 2010-03-16 21:30:42

Well, you could use `etree.fromstring(response)` instead of having to convert to a StringIO first (`etree.parse()` is for reading files, `etree.fromstring()` happily accepts strings). But the StringIO conversion may be the reason you’re not seeing the same errors I do…

mikl 2010-03-16 22:01:07

Duh, I knew I'd been away from lxml too long. fromstring() worked fine for me. Thanks for asking a question so you could clean up my code.

Tom 2010-03-16 23:26:58
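
For anyone skimming the comments above, here is a rough sketch of the `parse()` versus `fromstring()` difference. It is written for Python 3, the payload is a made-up stand-in for the service response, and the fallback import is only there so the sketch still runs where lxml is not installed (the standard library exposes the same two calls).

```python
from io import BytesIO

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Fallback so the sketch still runs without lxml; both calls exist here too.
    import xml.etree.ElementTree as etree

# Hypothetical payload standing in for the third-party service's response.
payload = b'<?xml version="1.0" encoding="UTF-8"?><owners><owner id="1"/></owners>'

# fromstring() takes in-memory data and returns the root element directly.
root_from_string = etree.fromstring(payload)

# parse() wants a filename or file-like object and returns an ElementTree,
# so the data has to be wrapped first and the root fetched separately.
tree = etree.parse(BytesIO(payload))
root_from_file = tree.getroot()
```

Both routes end up with the same `owners` root element; `fromstring()` just skips the file-like wrapper.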

Answer 2

+1 A:

You and lxml are correct; a valid XML document must be a stream of bytes encoded as declared in the `<?xml ... ?>` declaration (default: UTF-8).

I'd suggest a third option: leave it in unicode with an XML header that omits the encoding declaration but leaves the version in there (future-safe). That will keep lxml happy and avoid the overhead of you encoding it again.

I'd also suggest some gentle enquiry at the suds site and having a poke around in their source.

John Machin 2010-03-16 21:54:50

I suppose simply removing the encoding part is a reasonable way to go about it, thanks. I think I’ll hit up the suds guys to see if a fix for this edge case is worthy of inclusion into the main library.

mikl 2010-03-16 22:05:31
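
A minimal sketch of the two approaches discussed in this answer, assuming the response has already been decoded to a unicode string. The helper names and the sample document are hypothetical, and the regular expression only looks at a declaration sitting at the very start of the text; it is not a full XML declaration parser.

```python
import re

# Finds encoding="..." inside a leading XML declaration, if one is present.
_DECL_ENCODING = re.compile(
    r'''^(<\?xml\b[^>]*?)\s+encoding\s*=\s*["']([^"']+)["']'''
)

def to_declared_bytes(text):
    """Encode the text the way its own declaration claims (default UTF-8)."""
    match = _DECL_ENCODING.match(text)
    encoding = match.group(2) if match else "utf-8"
    return text.encode(encoding)

def drop_declared_encoding(text):
    """Keep the text as unicode, removing only encoding="..." from the declaration."""
    return _DECL_ENCODING.sub(r"\1", text, count=1)

# Hypothetical response text, already decoded to unicode by the SOAP client.
doc = u'<?xml version="1.0" encoding="ISO-8859-1"?><name>Jos\xe9</name>'

as_bytes = to_declared_bytes(doc)      # bytes in ISO-8859-1, declaration intact
as_text = drop_declared_encoding(doc)  # unicode, declaration keeps the version only
```

Either result should be acceptable to `etree.fromstring()`. What lxml refuses, with a `ValueError`, is a unicode string that still carries an encoding declaration. Note that `to_declared_bytes()` raises `LookupError` if the declared encoding name is unknown to Python.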
