I have created a GAE app that parses RSS feeds using cElementTree. Testing on my local installation of GAE works fine. After I uploaded the app and tested it, I got a SyntaxError.

The error is:

Traceback (most recent call last):
  File "/base/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 509, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/palmfeedparser/1-6.339910418736930444/pipes.py", line 285, in post
    tree = ET.parse(urlopen(URL))
  File "", line 45, in parse
  File "", line 32, in parse
SyntaxError: no element found: line 14039, column 45

I did what Mr. Alex Martelli suggested, and it printed out the following on my local machine:

[' <ac:tag><![CDATA[Mobilit\xc3\xa4t]]></ac:tag>\n', ' </ac:tags>\n', ' <ac:images>\n', ' <ac:image ac:number="1">\n', ' <ac:asset_url ac:type="app">http://cdn.downloads.example.com/public/1198/de/images/1/A/01.png</ac:asset_url>\n']

I uploaded the app and it printed out:

['      <ac:tag><![CDATA[Mobilit\xc3\xa4t]]></ac:tag>\n', '     </ac:tags>\n', '     <ac:images>\n', '      <ac:image ac:number="1">\n', '       <ac:asset_url ac:type="app">http://cdn.downloads.example.com/public/1198/de/images/1/A/01.png</ac:asset_url>\n']

These lines correspond to the following lines in the RSS feed I am reading:

<ac:tags>
  <ac:tag><![CDATA[Mobilität]]></ac:tag>

 </ac:tags>
 <ac:images>
  <ac:image ac:number="1">
   <ac:asset_url ac:type="app">http://cdn.downloads.example.com/public/1198/de/images/1/A/01.png</ac:asset_url>

I notice that there is a blank line before the closing ac:tags tag. Line 14039 corresponds to this blank line.

Update:

I use urllib.urlopen to access the URL of the feed. I displayed the contents it fetches both locally and on GAE proper. Locally, no content is truncated. After uploading the app, testing shows that the feed, which has 15289 lines, is truncated to 14185 lines.

What method can I use to fetch this huge feed? Would urlfetch work?

Thanks in advance for your help!

A_iyer

A: 

You may have run into one of the mysterious limits placed on GAE.

urlopen has been overridden by Google to use its urlfetch method, so there shouldn't be any difference between the two (though it might be worth trying; there are a lot of hidden things in GAE).

Newline characters shouldn't affect cElementTree.
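A quick way to check both points locally is sketched below. It assumes a current Python in which xml.etree.ElementTree exposes ParseError (the old GAE runtime raised a plain SyntaxError instead, as in the traceback above). The miniature feed and the helper name check_feed are made up for illustration; the truncation is simulated by dropping the trailing lines of the feed.

```python
import xml.etree.ElementTree as ET

# Made-up miniature feed with a blank line before a closing tag,
# mirroring the layout described in the question.
FEED = (
    "<rss>\n"
    " <tags>\n"
    "  <tag>Mobility</tag>\n"
    "\n"
    " </tags>\n"
    " <images>\n"
    "  <image number=\"1\">01.png</image>\n"
    " </images>\n"
    "</rss>\n"
)


def check_feed(data):
    """Return (ok, detail); ok is False when the XML fails to parse."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        return False, "line %d, column %d: %s" % (line, column, err)
    return True, "parsed %d lines" % len(data.splitlines())


# Simulate a fetch cut off at a line boundary: keep only the first six lines.
truncated = "".join(FEED.splitlines(True)[:6])

print(check_feed(FEED))       # the blank line is harmless
print(check_feed(truncated))  # the missing tail makes the parse fail
```

If the full feed parses and only the shortened copy fails, the blank line is not the culprit and the missing tail of the document is.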

Are there any other logging messages coming through in your AppEngine Logs? (Relating to the urlopen request?)

Paul
Thanks for your reply, Paul! I tried with urlfetch and, as you said, there was no difference. The newline characters didn't affect cElementTree. The feed is truncated with a trailing ' character that causes the error. GAE doesn't show any logging messages other than the SyntaxError. Are there ways to prevent the truncation in GAE?
A_iyer