Sorry if this is redundant.

I need to match an expression in Python with a regular expression that only matches an even number of occurrences of a letter. For example:

# Even number of A's
AAA # no match
AA # match
fsfaAAasdf # match
sAfA  # match
sdAAewAsA # match
AeAiA  # no match

EDIT: Sorry for the confusion. You are right: an even number of A's SHOULD match. Like I said, I was tired. Thank you.

EDIT2: If you downvote my question, please comment on what the problem is. I want to make it as clear and as useful as possible to people. Thank you.

+2  A: 

'A*' means "match any number of A's", including 0.

Here's how to match a string with an even number of a's, upper- or lowercase:

import re

even_a = re.compile(r'''
    ^
    [^a]*
    (
        (
            a[^a]*
        ){2}
    # if there must be at least 2 (not just 0), change the
    # '*' on the following line to '+'
    )*
    $
    ''', re.IGNORECASE | re.VERBOSE)

You are probably using a as an example. If you want to match a specific character other than a, replace each a with %s and then fill in the character with % formatting:

[...]
$
''' % (other_char, other_char, other_char)
[...]
Ross Rogers
0 is an even number, too. But I suspect something like "(AA)+" is more like what the OP is looking for.
Paul McGuire
Oops, I just reread the OP's test cases, and he wants non-contiguous letters, too. Better just to iterate over the string and count.
Paul McGuire
Actually, reading this again, it looks like he wants to match a string with doubled As, and no single As anywhere.
Anon.
Thanks, I corrected the question. Sorry for the confusion.
drozzy
matching 0 A's is, however, a tautology.
Alex Brown
The author of the question originally asked why his `a*` regex was matching a string like `bb`. And some days, matching 0 `a`'s isn't super obvious :-)
Ross Rogers
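As a sketch of the %s substitution described in Ross Rogers' answer above, the hypothetical helper `even_count_regex` below builds the same pattern (on one line, without re.VERBOSE) for any single character. It passes the character through `re.escape` first, which is an addition not in the original answer, so that characters with a special meaning in regexes, such as `.`, are matched literally. It keeps the answer's re.IGNORECASE flag.

```python
import re

def even_count_regex(ch):
    # Hypothetical helper: the answer's pattern, generalised to any single character.
    # re.escape makes characters such as '.' or '-' match literally.
    c = re.escape(ch)
    # [^c]* skips other characters; each ((c[^c]*){2}) iteration consumes exactly two c's.
    pattern = r'^[^%s]*((%s[^%s]*){2})*$' % (c, c, c)
    return re.compile(pattern, re.IGNORECASE)

even_a = even_count_regex('a')
for s in ['AAA', 'AA', 'fsfaAAasdf', 'sAfA', 'sdAAewAsA', 'AeAiA']:
    print(s, bool(even_a.match(s)))
```

Because of re.IGNORECASE, the lowercase a's in fsfaAAasdf are counted as well; that string still has four in total, so it matches. A string with no a's at all also matches, since zero is even.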
A: 

First of all, note that /A*/ matches the empty string.

Secondly, there are some things that you just can't do with regular expressions. This'll be a lot easier if you just walk through the string and count up all occurrences of the letter you're looking for.

Anon.
Yes, easier and faster. But regexps are too much fun.
Tom Leys
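A minimal sketch of the counting approach, using the built-in `str.count` (the helper name `has_even_count` is hypothetical). As written it is case-sensitive, so only uppercase A's are counted:

```python
def has_even_count(s, letter='A'):
    # Hypothetical helper: count the letter directly; zero occurrences counts as even.
    # Case-sensitive: 'a' and 'A' are counted separately.
    return s.count(letter) % 2 == 0

for s in ['AAA', 'AA', 'fsfaAAasdf', 'sAfA', 'sdAAewAsA', 'AeAiA']:
    print(s, has_even_count(s))
```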
A: 

A* means match "A" zero or more times.

For an even number of "A", try: (AA)+

FrustratedWithFormsDesigner
This will match any string that contains a pair, even if it also contains an unpaired A; e.g. AlalaAA will match.
Tom Leys
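To illustrate the comment above: with `re.search`, `(AA)+` succeeds as soon as any adjacent pair appears anywhere in the string, so it neither rejects odd totals nor accepts A's that are not next to each other. A quick check (Python 3):

```python
import re

pair = re.compile(r'(AA)+')

# search() only needs one adjacent pair somewhere, so an odd total still matches.
print(bool(pair.search('AlalaAA')))   # True, although it has three A's
# Non-adjacent A's never form a pair, so an even total can still fail.
print(bool(pair.search('sAfA')))      # False, although it has two A's
```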
+6  A: 

Try this regular expression:

^[^A]*((AA)+[^A]*)*$

And if the As don’t need to be consecutive:

^[^A]*(A[^A]*A[^A]*)*$
Gumbo
+1. Looks correct: find any number of non-A characters, followed by pairs of AA with other characters in between. However, it would be slightly more efficient if the (AA)* were (AA)+: ^[^A]*((AA)+[^A]*)*$. If the original poster wants "ab" not to match, the regexp will need a + on the end too, i.e. ^[^A]*((AA)+[^A]*)+$, to force at least one pair.
Tom Leys
Second version seems to work for me!
drozzy
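Gumbo's two patterns can be checked against the question's test strings with a short script like the one below (the variable names are illustrative). `re.match` is used because both patterns are already anchored with ^ and $.

```python
import re

# Gumbo's two patterns, unchanged.
consecutive = re.compile(r'^[^A]*((AA)+[^A]*)*$')  # A's must come in adjacent pairs
anywhere = re.compile(r'^[^A]*(A[^A]*A[^A]*)*$')   # any even number of A's

for s in ['AAA', 'AA', 'fsfaAAasdf', 'sAfA', 'sdAAewAsA', 'AeAiA']:
    print(s, bool(consecutive.match(s)), bool(anywhere.match(s)))
```

The first pattern rejects sAfA and sdAAewAsA because their A's are not adjacent, while the second accepts every string the question marks as a match and rejects AAA and AeAiA.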
A: 

It's impossible to count arbitrarily using regular expressions, for example to make sure that parentheses are balanced. Counting requires 'memory', which calls for something at least as powerful as a pushdown automaton, although in this case you can use the regular expression that @Gumbo provided.

The suggestion to use finditer is the best workaround for the general case.

Kaleb Pederson
However, the question does not require matching. It is not like he is asking to have as many As as Bs or a similar counting problem.
Tom Leys
Yeah, I misread :(. My fault.
Kaleb Pederson
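The finditer workaround mentioned in this answer is not spelled out in the thread. One way it might look, as a minimal sketch with a hypothetical helper name, is to count the matches with `re.finditer` and check the parity in Python rather than in the pattern. For a single literal letter, `str.count` does the same job; finditer is only worth it when the thing being counted is itself a pattern.

```python
import re

def even_by_finditer(s, pattern='A'):
    # Hypothetical helper: count the non-overlapping matches of `pattern`
    # in `s` and check the parity of that count (zero counts as even).
    count = sum(1 for _ in re.finditer(pattern, s))
    return count % 2 == 0

print(even_by_finditer('sdAAewAsA'))        # True: four A's
print(even_by_finditer('AeAiA'))            # False: three A's
print(even_by_finditer('ab-ab-ab', r'ab'))  # False: three matches of 'ab'
```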
+1  A: 

'*' means 0 or more occurrences, so 'AA' should do the trick.

The question is whether you want it to match 'AAA'. If not, you would have to do something like:

import re
r = re.compile(r'(^|[^A])(AA)+(?!A)')
r.search(p)  # p is the string to test

That would match a run of an even (and only even) number of consecutive 'A's.

Now, if you want to match 'if there is any even number of consecutive letters', this would do the trick:

re.compile(r'(.)\1')

However, this wouldn't exclude the 'odd' occurrences. But it is not clear from your question whether you really want that.

Update: This works for your test cases:

re.compile('^([^A]*)AA([^A]|AA)*$')
ondra
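Running ondra's final pattern over the test strings as they now appear in the question shows what it accepts. It requires at least one adjacent AA, and after that only other characters or further adjacent pairs, so it rejects sAfA and sdAAewAsA even though the question marks them as matches; it also rejects a string with no A's at all.

```python
import re

# ondra's final pattern, unchanged.
ondra_pattern = re.compile('^([^A]*)AA([^A]|AA)*$')

for s in ['AAA', 'AA', 'fsfaAAasdf', 'sAfA', 'sdAAewAsA', 'AeAiA']:
    print(s, bool(ondra_pattern.match(s)))
```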
A: 

This searches for a block with an odd number of A's. If it finds one, the string is bad for you:

(?<!A)A(AA)*(?!A)

If I understand correctly, the Python code should look like:

import re

if re.search(r"(?<!A)A(AA)*(?!A)", "AeAAi"):
    print("fail")
Kobi
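Wrapped in a function and run over the question's test strings, Kobi's check can be sketched as below (`has_odd_run` is a hypothetical name, and the code is Python 3). Note what it actually tests: whether some maximal run of consecutive A's has odd length. That is not the same as the total count, so sAfA and sdAAewAsA are reported as "fail" even though each contains an even number of A's.

```python
import re

# Kobi's pattern: an A not preceded by A, then pairs of A's, not followed by A,
# i.e. a maximal run of consecutive A's with odd length.
odd_run = re.compile(r'(?<!A)A(AA)*(?!A)')

def has_odd_run(s):
    # Hypothetical helper: True if any run of consecutive A's has odd length.
    return odd_run.search(s) is not None

for s in ['AAA', 'AA', 'fsfaAAasdf', 'sAfA', 'sdAAewAsA', 'AeAiA']:
    print(s, 'fail' if has_odd_run(s) else 'ok')
```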
A: 
Bryan Oakley
Sorry mate, I need a regex.
drozzy
Why do you need a regex?
Bryan Oakley
Because it's faster, more reusable, more compact, and cooler. Also, it's my question, and I'm not asking about the BEST way of doing it, just how to do it with a regex.
drozzy
Faster? How do you know? More reusable? Debatable. More compact? Not as important as readability. Cooler? Who cares? A few years from now you will likely regret having to revisit some of your "cool" code. Now, the last part of your comment I can agree with: you didn't ask for the best way. For a random person like me it's hard to tell, since most people ultimately want the best solution but just don't know how to ask, so people like me generally try to steer people in what we think is the right direction. Fair enough.
Bryan Oakley
Thanks for caring.
drozzy
? I said thanks and upvoted the comment. Stop being cynical.
drozzy