ansaurus

Question

Greedy versus Non-Greedy matching in Python re

Answer 1

+3 A:

I'd recommend using yubikey-python for interfacing to a yubikey from Python, but that's a side (and strictly pragmatic) issue ;-).

In theory, there should be no cases where a choice between greedy and non-greedy causes a RE to match in one case and fail in another: it should only affect what gets matched (and, as you mention, performance), not whether the match succeeds at all, since REs are supposed to backtrack for exactly that purpose.
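A minimal sketch of that point, using made-up toy patterns rather than the OP's: the two REs differ only in the `?` after the quantifier, and on an anchored match they should agree on success or failure while splitting the text between the groups differently.

```python
import re

# Hypothetical toy patterns (not the OP's); the only difference is the "?" after {0,3}.
greedy = re.compile(r'^(a{0,3})(a*)$')
lazy = re.compile(r'^(a{0,3}?)(a*)$')

text = 'aaaa'
greedy_match = greedy.match(text)
lazy_match = lazy.match(text)

# Both succeed; only the split between group 1 and group 2 differs.
print(greedy_match.groups())
print(lazy_match.groups())

# A string that cannot match fails for both, after backtracking.
print(greedy.match('aaab'), lazy.match('aaab'))
```

The greedy version gives group 1 as many characters as it can, while the lazy version gives it as few as it can; backtracking fills in the rest either way.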

Problem is, I cannot reproduce the problem -- I don't have a yubikey at hand and the tests in this file show no differences between the two REs' match/no-match behavior.

Could you please post a couple of failing examples (where one matches and the other one doesn't), ideally by editing your question, so I can reproduce the problem and try to cut it down to its minimum? Sounds like there may be a RE bug, but without reproducible cases I can't check if and when it's been fixed, already reported, or what. Thanks!

Edit: the OP has now posted one failing example, but I still can't reproduce it:

$ py26
Python 2.6.5 (r265:79359, Mar 24 2010, 01:32:55) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r1 = re.compile(r'^\t?[^a-z0-9]?([cbdefghijklnrtuv1-8]{0,32})\t?([cbdefghijklnrtuv1-8]{32})\t?\r?\n?$')
>>> r2 = re.compile(r'^\t?[^a-z0-9]?([cbdefghijklnrtuv1-8]{0,32}?)\t?([cbdefghijklnrtuv1-8]{32})\t?\r?\n?$'
... )
>>> nox="vvbrentlnccnhgfgrtetilbvckjcegblehfvbihrdcui"
>>> r1.match(nox)
<_sre.SRE_Match object at 0xcc458>
>>> r2.match(nox)
<_sre.SRE_Match object at 0xcc920>
>>>

i.e., match succeeds in both cases, as it should -- and that's exactly the same 2.6.5 Python version as the OP is using. OP, pls, show the results of this simple sequence of commands on your platform and tell us exactly what the platform is, since it looks like a weird platform-dependent bug... thanks!
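For anyone who wants to run the same check on their own platform, here is a rough sketch that reports the Python version and platform and lists any inputs on which the two patterns disagree. The helper name `disagreements` and the two extra sample strings are assumptions added for illustration; only the two patterns and the `nox` string come from the session above.

```python
import platform
import re

# The two patterns from the session above; they differ only in the "?" after {0,32}.
r1 = re.compile(r'^\t?[^a-z0-9]?([cbdefghijklnrtuv1-8]{0,32})\t?([cbdefghijklnrtuv1-8]{32})\t?\r?\n?$')
r2 = re.compile(r'^\t?[^a-z0-9]?([cbdefghijklnrtuv1-8]{0,32}?)\t?([cbdefghijklnrtuv1-8]{32})\t?\r?\n?$')


def disagreements(samples):
    """Return the samples on which exactly one of the two patterns matches."""
    return [s for s in samples if (r1.match(s) is None) != (r2.match(s) is None)]


# The OP's string, plus two made-up variants: one with a line ending, one too short.
nox = "vvbrentlnccnhgfgrtetilbvckjcegblehfvbihrdcui"
samples = [nox, nox + "\r\n", nox[:10]]

print(platform.python_version(), platform.platform())
print(disagreements(samples))
```

An empty list means the two patterns agree on every sample; anything else is a reproducible case worth posting together with the version line.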

Alex Martelli 2010-08-01 16:12:03

@FM, yep, tx, fixing now.

Alex Martelli 2010-08-01 17:12:07

Alex, even though mine was a non-question, I've accepted your answer as being the most thoughtful and informative. No reflection on other answers, though!

Brent.Longborough 2010-08-15 08:38:00

Answer 2

A:

You're right: simply switching from greedy to non-greedy quantifiers should not cause a regex to stop working. It can change how quickly the regex matches (or fails to match), how much it matches, and which parts get captured in which groups; that's all.

(The following "solution" is not applicable, but the question still doesn't indicate that a case-insensitive match is being performed, so I'll leave it.)

Your problem is that the strings with the optional extras also have uppercase letters in them, and your regex only allows for lowercase letters. Stick a (?i) on the front of the regex and it works just fine.
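As a small illustration of the case-sensitivity point, with a shortened, hypothetical pattern and input rather than the OP's real ones: the lowercase-only character class rejects uppercase text unless the match is made case-insensitive, either with an inline `(?i)` at the front or with the `re.I` flag.

```python
import re

# Hypothetical shortened pattern and input, for illustration only.
pattern = r'^[cbdefghijklnrtuv]{4}$'
text = 'CBDE'

case_sensitive = re.match(pattern, text)         # lowercase class, uppercase text
inline_flag = re.match('(?i)' + pattern, text)   # inline flag at the very front
flag_argument = re.match(pattern, text, re.I)    # same effect via the flags argument

print(case_sensitive, inline_flag, flag_argument)
```

The inline form and the flag argument should behave the same here; which one to use is mostly a matter of where the pattern is defined.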

Alan Moore 2010-08-01 17:23:32

@Alan, but the OP said he's experiencing failures only on strings **without** the optional extras, so the presence of uppercase **in** the optional extras seems irrelevant to his reported problem.

Alex Martelli 2010-08-01 17:28:08

@Alex: So he did, but when I tested it, the "without" string *matched*, and the "with" strings didn't. It seems I unconsciously revised the question to fit the observed behavior. Are we being scammed, or what? ;)

Alan Moore 2010-08-01 17:34:45

@Alan: Sorry, in the interests of keeping it (too) simple, I omitted the ",re.I".

Brent.Longborough 2010-08-01 19:03:11
