I'm trying to test some Python code that uses urllib2 and lxml.

I've seen several blog posts and Stack Overflow posts where people want to test exceptions thrown by urllib2, but I haven't seen examples that test successful calls.

Am I going down the correct path?

Does anyone have a suggestion for getting this to work?

Here is what I have so far:

import mox
import urllib
import urllib2
import socket
from lxml import etree

# set up the test
m = mox.Mox()
response = m.CreateMock(urllib.addinfourl)
response.fp = m.CreateMock(socket._fileobject)
response.name = None # Needed because the file name is checked.
response.fp.read().AndReturn("""<?xml version="1.0" encoding="utf-8"?>
<foo>bar</foo>""")
response.geturl().AndReturn("http://rss.slashdot.org/Slashdot/slashdot")
response.read = response.fp.read # Needed since __init__ is not called on addinfourl.
m.StubOutWithMock(urllib2, 'urlopen')
urllib2.urlopen(mox.IgnoreArg(), timeout=10).AndReturn(response)
m.ReplayAll()

# code under test
response2 = urllib2.urlopen("http://rss.slashdot.org/Slashdot/slashdot", timeout=10)
# Note: response2.fp.read() and response2.read() do not behave the same, as defined above.
# In [21]: response2.fp.read()
# Out[21]: '<?xml version="1.0" encoding="utf-8"?>\n<foo>bar</foo>'
# In [22]: response2.read()
# Out[22]: <mox.MockMethod object at 0x97f326c>
xcontent = etree.parse(response2)

# verify test
m.VerifyAll()

It fails with:

Traceback (most recent call last):
  File "/home/jon/mox_question.py", line 22, in <module>
    xcontent = etree.parse(response2)
  File "lxml.etree.pyx", line 2583, in lxml.etree.parse (src/lxml/lxml.etree.c:25057)
  File "parser.pxi", line 1487, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:63708)
  File "parser.pxi", line 1517, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:63999)
  File "parser.pxi", line 1400, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:62985)
  File "parser.pxi", line 990, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:60508)
  File "parser.pxi", line 542, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:56659)
  File "parser.pxi", line 624, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:57472)
  File "lxml.etree.pyx", line 235, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:6222)
  File "parser.pxi", line 371, in lxml.etree.copyToBuffer (src/lxml/lxml.etree.c:55252)
TypeError: reading from file-like objects must return byte strings or unicode strings

This is because response.read() does not return what I expected it to return.

A: 

It looks like your failure isn't related to mox at all: the line causing the error is reading from response2, which is a direct call to Slashdot. Perhaps inspect that object and see what its content is?

EDIT: I didn't see the m.StubOutWithMock(urllib2, 'urlopen') line above, so I thought you were comparing two calls; one mocked (response) and one not (response2). An updated answer is below.

Anthony Briggs
If you look at urllib.py, self.read is set equal to self.fp.read. These two calls should return the same data. From my comments in the code, self.fp.read is returning a string, while self.read is returning <mox.MockMethod object>. This is because `__init__` is not called on addinfourl, so I added the method assignment into my test code, and it doesn't return what I expect.
jmkacz
What happens if you define it explicitly? I.e., instead of `response.read = response.fp.read`, use `response.read().AndReturn("""<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>""")`. It's possible that mox is doing some sort of magic behind the scenes when you call the .read() method.
Anthony Briggs
I had tried that, but the low-level library calls read with the number of bytes it wants, so you can't really mock out read in this case.
jmkacz
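To illustrate the point about read being called with a byte count, here is a minimal sketch. It assumes Python 3 and uses the standard library's xml.etree.ElementTree as a stand-in for lxml, so the exact sizes lxml requests may differ. RecordingReader is a hypothetical helper, not part of any library.

```python
import io
import xml.etree.ElementTree as ET


class RecordingReader:
    """Hypothetical file-like wrapper that logs the size passed to read()."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)
        self.sizes = []

    def read(self, size=-1):
        # The parser decides how many bytes to ask for on each call.
        self.sizes.append(size)
        return self._buffer.read(size)


reader = RecordingReader(b'<?xml version="1.0" encoding="utf-8"?>\n<foo>bar</foo>')
tree = ET.parse(reader)

# Every recorded call carries an explicit size, so an expectation
# recorded for a bare read() would not match what the parser does.
print(reader.sizes)
```

This suggests why a real in-memory file object is easier to hand to the parser than a mocked read with a fixed call signature.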
A: 

I wouldn't delve into urllib2 internals at all. I think that's beyond the scope of what you care about. Here's a simple way to do it with StringIO. The key point is that what you intend to parse as XML just needs to be file-like in the duck-typing sense; it doesn't need to be an actual addinfourl instance.

import StringIO
import mox
import urllib2
from lxml import etree

# set up the test
m = mox.Mox()
response = StringIO.StringIO("""<?xml version="1.0" encoding="utf-8"?>
<foo>bar</foo>""")
m.StubOutWithMock(urllib2, 'urlopen')
urllib2.urlopen(mox.IgnoreArg(), timeout=10).AndReturn(response)
m.ReplayAll()

# code under test
response2 = urllib2.urlopen("http://rss.slashdot.org/Slashdot/slashdot", timeout=10)
xcontent = etree.parse(response2)

# verify test
m.VerifyAll()
Peter Lyons
Thanks Peter. One more twist. What if I also wanted to check the response code? So, if (response2.getcode() == 200): parse; else: raise an exception.
jmkacz
I added `response.getcode = lambda: 200` after defining response, and it seems to be working.
jmkacz
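For reference, the status-code check discussed in these comments could look roughly like the sketch below. It is a Python 3 translation under stated assumptions: unittest.mock and xml.etree.ElementTree from the standard library replace mox and lxml, and fetch_xml, make_response, and FEED_URL are hypothetical names invented for the example.

```python
import io
import urllib.request
import xml.etree.ElementTree as ET
from unittest import mock

FEED_URL = "http://example.invalid/feed"  # placeholder URL, never contacted


def fetch_xml(url):
    """Hypothetical code under test: parse the body only on HTTP 200."""
    response = urllib.request.urlopen(url, timeout=10)
    if response.getcode() != 200:
        raise IOError("unexpected status %d" % response.getcode())
    return ET.parse(response)


def make_response(body, code=200):
    """Build an in-memory file-like object with a getcode() attribute."""
    response = io.BytesIO(body)
    response.getcode = lambda: code  # same trick as in the comment above
    return response


# Successful call: urlopen is replaced, so no network traffic happens.
ok = make_response(b'<?xml version="1.0" encoding="utf-8"?>\n<foo>bar</foo>')
with mock.patch("urllib.request.urlopen", return_value=ok) as fake_urlopen:
    tree = fetch_xml(FEED_URL)
    fake_urlopen.assert_called_once_with(FEED_URL, timeout=10)
root_text = tree.getroot().text

# Failing call: a non-200 status should raise instead of parsing.
failure = None
with mock.patch("urllib.request.urlopen", return_value=make_response(b"", code=500)):
    try:
        fetch_xml(FEED_URL)
    except IOError as error:
        failure = str(error)
```

Attaching getcode to the instance keeps the fake response small; a dedicated stub class would be the tidier choice if more response methods were needed.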
OK, great. None of this is going to win any awards for elegance, but it gets the job done.
Peter Lyons