What I am trying to do: Parse a query for a leading or trailing ? which will result in a search on the rest of the string.

"foobar?" or "?foobar" results in a search. "foobar" results in some other behavior.

This code works as expected in the interpreter:

 >>> import re
 >>> print re.match(".+\?\s*$","foobar?")
 <_sre.SRE_Match object at 0xb77c4d40>
 >>> print re.match(".+\?\s*$","foobar")
 None

This code from a Django app does not:

doSearch = { "text":"Search for: ", "url":"http://www.google.com/#&q=QUERY", "words":["^\?\s*",".+\?\s*$"] }
...
subQ = myCore.lookForPrefix(someQuery, doSearch["words"])
...
def lookForPrefix(query,listOfPrefixes):
    for l in listOfPrefixes:
        if re.match(l, query):
            return re.sub(l,'', query)
    return False

The Django code never matches the trailing "?". All the other regexes work fine.

Any ideas about why not?

+3  A: 

The problem is in your second regex. It matches the whole query, so using re.sub() will replace it all with an empty string. I.e. lookForPrefix('foobar?',listOfPrefixes) will return ''. You are likely checking the return value in an if, so it evaluates the empty string as false.
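The failure can be reproduced outside Django. This is a minimal sketch using the patterns from the question; look_for_prefix_original is just a renamed copy of the question's function for illustration, and the patterns are written as raw strings.

```python
import re

# The question's original patterns: the second one consumes the whole query.
words = [r"^\?\s*", r".+\?\s*$"]

def look_for_prefix_original(query, patterns):
    # Same logic as the lookForPrefix function in the question.
    for pattern in patterns:
        if re.match(pattern, query):
            return re.sub(pattern, "", query)
    return False

result = look_for_prefix_original("foobar?", words)
print(repr(result))  # the match succeeds, but the whole string is replaced
print(bool(result))  # an empty string is falsy, so `if result:` is skipped
```

So the regex does match; the caller simply cannot tell the empty string apart from False when it tests the result for truthiness.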

To solve this, you just need to change the second regex to \?\s*$ and use re.search() instead of re.match(), as the latter requires that your regex matches from the beginning of the string.

doSearch = { "text":"Search for: ", "url":"http://www.google.com/#&q=QUERY", "words":["^\?\s*","\?\s*$"] }

def lookForPrefix(query,listOfPrefixes):
    for l in listOfPrefixes:
        if re.search(l, query):
            return re.sub(l,'', query)
    return False

The result:

>>> lookForPrefix('?foobar', doSearch["words"])
'foobar'
>>> lookForPrefix('foobar?', doSearch["words"])
'foobar'
>>> lookForPrefix('foobar', doSearch["words"])
False

EDIT: In fact, you might as well combine the two regexes into one: ^\?\s*|\?\s*$. That will work equally well.
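A rough sketch of the single-pattern version, in case it helps. The names SEARCH_MARKER and strip_search_marker are made up for this example and are not part of the original code.

```python
import re

# One pattern covering a leading "?" or a trailing "?" (plus any spaces next to it).
SEARCH_MARKER = re.compile(r"^\?\s*|\?\s*$")

def strip_search_marker(query):
    # Return the query without its search marker, or False if there is none.
    if SEARCH_MARKER.search(query):
        return SEARCH_MARKER.sub("", query)
    return False

for query in ["?foobar", "foobar?", "foobar", "?"]:
    print(repr(query), "->", repr(strip_search_marker(query)))
```

Note that a query consisting of only "?" still comes back as an empty string, so the caller is safer checking `result is not False` than relying on truthiness.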

Max Shawabkeh
That works. I still don't understand why. Even if I use "^.+\?\s*$" and re.match(), it does not work. Shouldn't that expression match any string with one or more characters followed by a ? and any number of spaces, then the end of the string? Thanks!
Art
The problem is not in the match, as it does match, but in the replacement, since in that case it replaces the whole string instead of just the trailing question mark, returning an empty string as a result.
Max Shawabkeh
Ahhhhhh! Yes, I see. Thank you so much.
Art
A: 

You probably want to use raw strings for regexes, such as: r'^\s\?'. Raw strings will prevent problems with escaped characters becoming other values (r'\0' is the same as '\\0', but different from '\0', which is a single null character).
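A short sketch of the difference. The patterns in the question happen to work without the r prefix because \? and \s are not recognised string escapes, so Python leaves the backslash in place; an escape like \b is not so forgiving. The variable names here are only for illustration.

```python
import re

raw = r"\0"        # two characters: a backslash and the digit 0
escaped = "\\0"    # the same two characters, written with a doubled backslash
null_char = "\0"   # one character: NUL

print(len(raw), len(escaped), len(null_char))
print(raw == escaped, raw == null_char)

# In a regular string "\b" is a backspace character, so the word boundary is lost.
with_raw = re.search(r"\bfoo\b", "a foo b")
without_raw = re.search("\bfoo\b", "a foo b")
print(with_raw is not None, without_raw is None)
```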

Also r'^\?\s*|\?\s*$' will NOT work as intended by Max S. because the | is alternating between \s* and \?. The regex proposed in the EDIT interprets to: question mark at the beginning of the line followed by any number of spaces OR a question mark, followed by any number of spaces and the end of the line.

I believe Max S. intended: r'(^\?\s*)|(\?\s*$)', which interprets to: a question mark followed by any number of spaces at the beginning or end of the line.

Sean Reifschneider
I'm afraid you are wrong about the pipe. A pipe outside of brackets will separate the whole regex, meaning "try everything to the left of the pipe, and if that fails, try everything to the right". For example, `re.findall('^a.|d.$', 'abacde')` returns `['ab', 'de']`.
Max Shawabkeh
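The point about the pipe is easy to check in the interpreter. A small sketch comparing the ungrouped pattern from the EDIT with the grouped one suggested in this answer; the sample queries are arbitrary.

```python
import re

ungrouped = r"^\?\s*|\?\s*$"
grouped = r"(^\?\s*)|(\?\s*$)"

queries = ["?foobar", "foobar?", "foobar", "? foo ?  "]
ungrouped_results = [re.sub(ungrouped, "", q) for q in queries]
grouped_results = [re.sub(grouped, "", q) for q in queries]

# Both patterns strip exactly the same text from every sample query.
print(ungrouped_results == grouped_results)

# The example from the comment: each side of the pipe is tried as a whole.
pairs = re.findall("^a.|d.$", "abacde")
print(pairs)  # ['ab', 'de']
```

The parentheses change what gets captured, not how the alternation is split.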