I have a function with a Python doctest that fails because one of the test input strings contains a backslash that is treated as an escape character, even though I've written the string as a raw string.

My doctest looks like this:

>>> infile = [ "Todo:        fix me", "/** todo: fix", "* me", "*/", r"""//\todo      stuff to fix""", "TODO fix me too", "toDo bug 4663" ]
>>> find_todos( infile )
['fix me', 'fix', 'stuff to fix', 'fix me too', 'bug 4663']

And the function, which is intended to extract the todo text from each line that carries some variation of a todo marker, looks like this:

todos = list()
for line in infile:
    print line
    if todo_match_obj.search( line ):
        todos.append( todo_match_obj.search( line ).group( 'todo' ) )

And the regular expression called todo_match_obj is:

r"""(?:/{0,2}\**\s?todo):?\s*(?P<todo>.+)"""
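For reference, the pieces above can be assembled into a self-contained sketch. The function wrapper, the `return`, and the `re.IGNORECASE` flag are assumptions: the question shows neither the `def` line nor the compile flags, but the expected output only makes sense if "Todo", "TODO" and "toDo" all match. The debugging `print` is left out.

```python
import re

# Assumption: the pattern is compiled case-insensitively, since the expected
# output includes "Todo", "TODO" and "toDo"; the question does not show flags.
todo_match_obj = re.compile(
    r"""(?:/{0,2}\**\s?todo):?\s*(?P<todo>.+)""", re.IGNORECASE
)

def find_todos(infile):
    """Return the todo text of every line that carries a todo marker."""
    todos = []
    for line in infile:
        match = todo_match_obj.search(line)  # unanchored: scans the whole line
        if match:
            todos.append(match.group('todo'))
    return todos

infile = ["Todo:        fix me", "/** todo: fix", "* me", "*/",
          r"""//\todo      stuff to fix""", "TODO fix me too", "toDo bug 4663"]
result = find_todos(infile)
```

Called directly like this, outside of a doctest, the sketch returns the list the doctest expects for the sample input.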

A quick conversation with my ipython shell gives me:

In [35]: print "//\todo"
//      odo

In [36]: print r"""//\todo"""
//\todo

And, just in case the doctest implementation uses stdout (I haven't checked, sorry):

In [37]: sys.stdout.write( r"""//\todo""" )
//\todo

My regex-fu is not strong by any standard, and I realize that I could be missing something here.

EDIT: Following Alex Martelli's answer, I would like suggestions for a regular expression that would actually match the blasted r"""//\todo fix me""". I know that I did not originally ask for someone to do my homework, and I will accept Alex's answer, as it really did answer my question (or confirm my fears). But I promise to upvote any good solutions to my problem here :)

EDITEDIT: for reference, a bug has been filed with the kodos project: bug #437633

I'm using Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)

Thank you for reading this far (if you skipped directly down here, I understand).

+2  A: 

Read your original regex carefully:

r"""(?:/{0,2}\**\s?todo):?\s*(?P<todo>.+)"""

It matches: zero to two slashes, then 0+ stars, then 0 or 1 "whitespace characters" (blanks, tabs etc), then the literal characters 'todo' (and so on).

Your rawstring is:

r"""//\todo      stuff to fix"""

so there's a literal backslash between the slashes and the 'todo', therefore of course the regex doesn't match it. It can't -- nowhere in that regex are you expressing any desire to optionally match a literal backslash.

Edit: A RE pattern, very close to yours, that would accept and ignore an optional backslash right before the 't' would be:

r"""(?:/{0,2}\**\s?\\?todo):?\s*(?P<todo>.+)"""

note that the backslash does have to be repeated, to "escape itself", in this case.
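A minimal sketch of the difference, assuming the patterns are compiled with `re.IGNORECASE` (the question does not show the flags). It uses the anchored `match` to compare the two patterns on the raw string, and then the unanchored `search` that the question's function calls.

```python
import re

# Pattern from the question: no provision for a backslash before 'todo'.
original = re.compile(
    r"""(?:/{0,2}\**\s?todo):?\s*(?P<todo>.+)""", re.IGNORECASE)
# Edited pattern: '\\?' accepts an optional literal backslash.
relaxed = re.compile(
    r"""(?:/{0,2}\**\s?\\?todo):?\s*(?P<todo>.+)""", re.IGNORECASE)

line = r"""//\todo      stuff to fix"""

# Anchored at the start of the line: the slashes are consumed, then the
# original pattern needs 'todo' but finds the backslash.
anchored_original = original.match(line)
anchored_relaxed = relaxed.match(line)

# Unanchored: search may begin the match at the 't' after the backslash.
unanchored_original = original.search(line)
```

Here `anchored_original` is `None` and `anchored_relaxed.group('todo')` is `'stuff to fix'`. With `search`, the original pattern also finds `'stuff to fix'`, because every prefix element is optional and the match can simply start at `todo`. Whether the backslash matters therefore depends on whether the pattern is applied with `match` or `search`.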

Alex Martelli
Well, this error space _was_ mentioned in my post ;) Also, what confuses me is that kodos (the Python Regex debugger) thinks that the above regular expression matches that raw string. This does not, of course, make my regular expression any better.
Steen
@Steen, it looks like you've found a minor bug in kodos, and I suggest you report it on their bug tracker.
Alex Martelli
@Alex Yes, unfortunately it looks like it. I'll go bug them.
Steen
Yes! Beautiful. Thanks for your correction to my regex.
Steen
A: 

This gets even stranger as I venture down the road of doctests.

Consider this Python script.

If you uncomment lines 22 and 23, the script passes just fine, as the method returns True, which is both asserted and explicitly compared.

But if you run the file as it stands in the link, the doctest will fail with the message:

% python doctest_test.py
**********************************************************************
File "doctest_test.py", line 3, in __main__.doctest_test
Failed example:
    doctest_test( r"""//    odo""" )
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib/python2.6/doctest.py", line 1241, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__.doctest_test[0]>", line 1, in <module>
        doctest_test( r"""//    odo""" )
      File "doctest_test.py", line 14, in doctest_test
        assert input_string == compare_string
    AssertionError
**********************************************************************
1 items had failures:
   1 of   1 in __main__.doctest_test
***Test Failed*** 1 failures.

Can someone enlighten me here?

I'm still using Python 2.6.4 for this.
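One likely explanation, offered as an assumption because the linked script is not reproduced here: the `//    odo` in the failed example suggests the enclosing docstring is an ordinary (non-raw) string. In that case Python turns `\t` into a tab when it compiles the docstring, before doctest ever parses the example, so the inner `r"""..."""` prefix arrives too late. The sketch below uses a hypothetical `identity` helper and two hypothetical docstrings, one plain and one raw, to show the effect.

```python
import doctest

def identity(text):
    """Hypothetical helper: return the argument unchanged."""
    return text

# Same example twice. In the plain string, Python converts \t to a tab
# before doctest sees it; in the raw string the backslash survives.
plain_doc = """
>>> identity(r"//\todo") == "//" + chr(92) + "todo"
True
"""
raw_doc = r"""
>>> identity(r"//\todo") == "//" + chr(92) + "todo"
True
"""

def count_failures(docstring):
    """Run the examples in docstring and return the number that fail."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(docstring, {"identity": identity},
                              "sketch", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test, out=lambda text: None)  # discard the report
    return results.failed

plain_failures = count_failures(plain_doc)
raw_failures = count_failures(raw_doc)
```

Under these assumptions `plain_failures` is 1 and `raw_failures` is 0. If the same cause applies to the script above, making the docstring itself raw (an `r` prefix on the docstring), or doubling the backslash inside it, should let the example pass.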

I'm placing this answer under 'community wiki', as it does not really answer the question and should not earn reputation for it.

Steen