I'm encoding challenged, so this is probably simple, but I'm stuck.
I'm trying to parse an XML file emailed to App Engine's new receive-mail functionality. At first I just pasted the XML into the body of the message, and it parsed fine with cElementTree. Then I switched to sending it as an attachment, and parsing it with cElementTree produces this error:
SyntaxError: not well-formed (invalid token): line 3, column 10
I've printed the XML received both ways (pasted in the body and sent as an attachment), and the two look the same to me. I assume pasting it into the message body changes the encoding in a way that attaching the file does not, but I don't know how to fix it.
The first few lines look like this:
<?xml version="1.0" standalone="yes"?>
<gpx xmlns="http://www.topografix.com/GPX/1/0" version="1.0" creator="TopoFusion 2.85" xmlns:TopoFusion="http://www.TopoFusion.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd http://www.TopoFusion.com http://www.TopoFusion.com/topofusion.xsd">
<name><![CDATA[Pacific Crest Trail section K hike 4]]></name><desc><![CDATA[Pacific Crest Trail section K hike 4. Five Lakes to Old Highway 40 near Donner. As described in Day Hikes on the PCT California edition by George & Patricia Semb. See pages 150-152 for access and exit trailheads. GPS data provided by the USFS]]></desc><author><![CDATA[MikeOnTheTrail]]></author><email><![CDATA[[email protected]]]></email><url><![CDATA[http://www.pcta.org]]></url>
<urlname><![CDATA[Pacific Crest Trail Association Homepage]]></urlname>
<time>2006-07-08T02:16:05Z</time>
Edited to add more info:
I have a GPX file that's a few thousand lines. If I paste it into the body of the message I can parse it correctly, like so:
gpxcontent = message.bodies(content_type='text/plain')
for x in gpxcontent:
    gpxcontent = x[1].decode()
for event, elem in ET.iterparse(StringIO.StringIO(gpxcontent), events=("start", "start-ns")):
If I instead send it as an attachment using Gmail and then extract it like so:
if isinstance(message.attachments, tuple):
    attachments = [message.attachments]
gpxcontent = attachments[0][3].decode()
for event, elem in ET.iterparse(StringIO.StringIO(gpxcontent), events=("start", "start-ns")):
I get the error above. Line 3 column 10 seems to be the start of ![CDATA on the third line.
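Since the two versions only look the same when printed, one way to check whether they really are identical is to compare them position by position and show the repr() of the text around the first mismatch. This is only a minimal sketch: `first_difference` is a hypothetical helper, not part of App Engine or cElementTree, and the two arguments stand for the decoded body and the decoded attachment.

```python
def first_difference(a, b, context=20):
    """Return (index, repr_of_a, repr_of_b) around the first position where
    a and b differ, or None if the two strings are identical."""
    limit = min(len(a), len(b))
    for i in range(limit):
        # Compare one-character slices so this works for both text and byte strings.
        if a[i:i + 1] != b[i:i + 1]:
            break
    else:
        # No mismatch in the shared prefix: either identical or one is longer.
        if len(a) == len(b):
            return None
        i = limit
    start = max(0, i - context)
    # repr() makes invisible characters (e.g. \r, \t, escapes) visible.
    return i, repr(a[start:i + context]), repr(b[start:i + context])
```

Calling it with the string that parses and the string that fails would show exactly which characters differ near line 3, which printing the XML does not reveal.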