views: 266 | answers: 3

Because the Twisted getPage function doesn't give me access to headers, I had to write my own getPageWithHeaders function.

import traceback
from twisted.web.client import HTTPClientFactory, _makeGetterFactory

def getPageWithHeaders(url, contextFactory=None, *args, **kwargs):
    try:
        # Return the factory itself rather than factory.deferred so the
        # response headers stay reachable.
        return _makeGetterFactory(url, HTTPClientFactory,
                                  contextFactory=contextFactory,
                                  *args, **kwargs)
    except Exception:
        traceback.print_exc()

This is exactly the same as the normal getPage function, except that I added the try/except block and return the factory object itself instead of factory.deferred.

For some reason, I sometimes get a maximum recursion depth exceeded error here. It happens consistently a few times out of 700, usually on different sites each time. Can anyone shed any light on this? I'm not clear why or how this could be happening, and the Twisted codebase is large enough that I don't even know where to look.

EDIT: Here's the traceback I get, which seems bizarrely incomplete:

Traceback (most recent call last):
  File "C:\keep-alive\utility\background.py", line 70, in getPageWithHeaders
    factory = _makeGetterFactory(url, HTTPClientFactory, timeout=60 , contextFactory=context, *args, **kwargs)
  File "c:\Python26\lib\site-packages\twisted\web\client.py", line 449, in _makeGetterFactory
    factory = factoryFactory(url, *args, **kwargs)
  File "c:\Python26\lib\site-packages\twisted\web\client.py", line 248, in __init__
    self.headers = InsensitiveDict(headers)
RuntimeError: maximum recursion depth exceeded

This is the entire traceback, which clearly isn't long enough to have exceeded our max recursion depth. Is there something else I need to do in order to get the full stack? I've never had this problem before; typically when I do something like

import traceback

def f(): return f()
try: f()
except: traceback.print_exc()

then I get the kind of "maximum recursion depth exceeded" traceback you'd expect, with a long run of references to f().

+1  A: 

You should look at the traceback you're getting together with the exception -- that will tell you what function(s) is/are recursing too deeply, "below" _makeGetterFactory. Most likely you'll find that your own getPageWithHeaders is involved in the recursion, exactly because instead of properly returning a deferred it tries to return a factory that's not ready yet. What happens if you do go back to returning the deferred?

Alex Martelli
I've added the traceback to my question. As for going back to getPage, interestingly, I sometimes call getPage and sometimes call getPageWithHeaders, and this problem never occurs with getPage. So I'm guessing that my function is somehow causing this problem, though I can't possibly see how.
Eli Courtwright
A: 

The URL opener is likely following an unending series of 301 or 302 redirects.
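A minimal sketch of how that failure mode surfaces as a recursion error (the `fetch` callable here is a hypothetical stand-in, not a Twisted API): a client that follows each redirect by calling itself grows the stack by one frame per hop, so a self-redirecting URL eventually trips Python's recursion limit unless a hop limit is enforced.

```python
def fetch_following_redirects(fetch, url, depth=0, max_redirects=20):
    """Follow 301/302 redirects recursively, bailing out after max_redirects.

    `fetch` is a hypothetical callable returning (status, location_or_body);
    it stands in for whatever actually performs the HTTP request.
    """
    status, value = fetch(url)
    if status in (301, 302):
        if depth >= max_redirects:
            raise RuntimeError("too many redirects starting from %s" % url)
        # Each hop adds a stack frame; without the limit above, a redirect
        # loop would eventually exceed the maximum recursion depth.
        return fetch_following_redirects(fetch, value, depth + 1, max_redirects)
    return value
```

With an explicit hop limit you get a clear error naming the looping URL instead of an opaque recursion-depth traceback.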

Ted Dziuba
+2  A: 

The specific traceback that you're looking at is a bit mystifying. You could try traceback.print_stack rather than traceback.print_exc to get a look at the entire stack above the problematic code, rather than just the stack going back to where the exception is caught.
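As a concrete illustration of the difference (plain stdlib, nothing Twisted-specific): `print_exc`/`format_exc` only covers the frames between the `try` and the `raise`, while `print_stack`/`format_stack` covers everything above the current frame.

```python
import traceback

def outer():
    return middle()

def middle():
    try:
        inner()
    except ValueError:
        # Only the frames from this try down to the raise:
        exc_text = traceback.format_exc()
        # The full call stack above this point (outer, module level, ...):
        stack_text = "".join(traceback.format_stack())
        return exc_text, stack_text

def inner():
    raise ValueError("boom")

exc_text, stack_text = outer()
```

Here `outer` shows up in `stack_text` but not in `exc_text`, which is exactly why a short `print_exc` output can hide where the deep recursion actually lives.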

Without seeing more of your traceback I can't be certain, but you may be running into the problem where Deferreds will raise a recursion limit exception if you chain too many of them together.
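That effect can be reproduced without Twisted at all. The toy helper below fires a callback chain recursively, one stack frame per callback, which is roughly the shape of the problem with very long Deferred chains; it is an illustrative sketch, not Twisted's actual implementation.

```python
def fire_chain(callbacks, value):
    """Fire a list of callbacks the naive way: one stack frame per callback."""
    if not callbacks:
        return value
    head, rest = callbacks[0], callbacks[1:]
    # Recursing once per callback means a long enough chain exhausts the stack.
    return fire_chain(rest, head(value))

short = fire_chain([lambda v: v + 1] * 10, 0)    # a short chain is fine

try:
    fire_chain([lambda v: v + 1] * 5000, 0)      # far past the default limit
    blew_up = False
except RuntimeError:                             # maximum recursion depth exceeded
    blew_up = True
```

The error surfaces wherever the stack happens to run out, not where the chain was built, which is one reason the resulting tracebacks can look unrelated to the real cause.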

If you turn on Deferred debugging (from twisted.internet.defer import setDebugging; setDebugging(True)) you may get more useful tracebacks in some cases, but please be aware that this may also slow down your server quite a bit.

Glyph