Assume the following strings:

  • A01B100
  • A01.B100
  • A01
  • A01............................B100 ( whatever between A and B )

The numbers should match \d+. A is always present in the string, while B may not be. A is always followed by one or more digits, and so is B when present. What regex could I use to capture the digits after A and after B?

I have the following regex:

(A(\d+)).*?(B?(\d+)?)

but this only works for the first and the third case.
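To see why, here is a minimal Python sketch (the function name `capture` and the sample list are mine, not part of the question). The lazy `.*?` is followed by a group in which every piece is optional, so the engine is satisfied with an empty match right after the A-number and never scans forward to B:

```python
import re

# The pattern from the question, copied unchanged.
PATTERN = re.compile(r"(A(\d+)).*?(B?(\d+)?)")

SAMPLES = ["A01B100", "A01.B100", "A01", "A01............................B100"]

def capture(text):
    """Return (A digits, B digits) as captured by the question's pattern."""
    m = PATTERN.search(text)
    return (m.group(2), m.group(4)) if m else None

for s in SAMPLES:
    # B digits come back as None whenever anything separates A and B.
    print(s, capture(s))
```

Only the string where B immediately follows the A-number yields B's digits. "A01" looks right only because there is nothing to capture.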

+1  A: 
(?ms)^A(\d+)(?:[^\n\r]*B(\d+))?$

Assuming one string per line:

  • the [^\n\r]* is a greedy match for any characters (except newlines) after Axx, meaning it could gobble an intermediate Byy before the last B:

    A01...B01...B23

would be matched, with 01 and 23 detected.
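
A quick way to check this, sketched in Python (the names `LINE_PATTERN`, `TEXT` and `captures` are illustrative, and the one-string-per-line layout is the assumption stated above):

```python
import re

# Pattern from the answer above, copied unchanged.
LINE_PATTERN = re.compile(r"(?ms)^A(\d+)(?:[^\n\r]*B(\d+))?$")

# One candidate string per line, as the answer assumes.
TEXT = "\n".join([
    "A01B100",
    "A01.B100",
    "A01",
    "A01............................B100",
    "A01...B01...B23",
])

def captures(text):
    """Return one (A digits, B digits) tuple per matching line."""
    return [m.groups() for m in LINE_PATTERN.finditer(text)]

for pair in captures(TEXT):
    print(pair)
```

On the last line, the character class runs to the end of the line and backtracks, so the final B-number (23) is the one captured.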

VonC
I think it should be without ^ and $, because A and B can be inside another string.
Geo
A: 
A\d+.*(B\d+)?

OK, so that provides something which passes all test cases... BUT it has some false positives.

A\d+(.*B\d+)?

It seems other characters should only appear if B(whatever) is after them, so use the above instead.

# Perl test-case hackup
use strict;
use warnings;
my @array = ('A01B100', 'A01.B100', 'A01', 'A01............................B100', 'A01FAIL', 'NEVER');
for (@array) {
    # Print only the strings the anchored pattern accepts.
    print "$_\n" if /^A\d+(.*B\d+)?$/;
}
Autocracy
A: 
import re
# Note: \.* only allows literal dots between the A-number and the B-number.
m = re.match(r"A(?P<d1>\d+)\.*(B(?P<d2>\d+))?", "A01.B100")
print(m.groupdict())
Evan Fosmark
+3  A: 
  • Must A precede B? Assuming yes.
  • Can B appear more than once? Assuming no.
  • Can B appear except as part of a B-number group? Assuming no.

Then,

A\d+.*?(B\d+)?

using the lazy .*? or

A\d+[^B]*(B\d+)?

which is more efficient but requires that B be a single character.

EDIT: Upon further reflection, I have parenthesized the patterns in a less-than-perfect way. The following patterns should require fewer assumptions:

A\d+(.*?B\d+)?
A\d+([^B]*B\d+)?
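
As a rough comparison of the two approaches, here is a Python sketch. The capture groups around the digits and the helper `digits` are my additions for illustration, and the second test string contains a stray B that is not part of a B-number, which breaks the "B only appears in a B-number group" assumption on purpose:

```python
import re

# Lazy variant: skip as little as possible before a B-number.
LAZY = r"A(\d+)(?:.*?B(\d+))?"
# Negated-class variant: skip anything that is not a literal "B".
NEGATED = r"A(\d+)(?:[^B]*B(\d+))?"

def digits(pattern, text):
    """Return (A digits, B digits), or None if A is not found."""
    m = re.search(pattern, text)
    return m.groups() if m else None

for text in ["A01.B100", "A01---Bambam-B90"]:
    print(text, digits(LAZY, text), digits(NEGATED, text))
```

Both variants agree on the simple string. On the second one, the negated class cannot step over the stray B, so the optional group is skipped and B's digits are lost, while the lazy variant keeps expanding until it reaches B90.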
Thom Smith
What about if the string is A01---Bambam-B90?
Geo
I assumed that B cannot appear except as part of a B-number group. (Additionally, I am not clear on whether B is a single character 'B' or an abbreviation for another pattern.) If either of my assumptions here is untrue, then the [^B] pattern will fail. The .*? pattern should still work, though, unless I'm misunderstanding the interaction of the greedy ? and lazy *? operators.
Thom Smith
I've slightly modified the patterns to clearly express the proper interaction of the lazy and greedy operators.
Thom Smith
Thanks for the update.
Geo
I believe you want the greedy version, ".*", if you assume that the B-number group must come at the end of a line. The lazy .*? will stop at the first "B" followed by digits that it encounters, whereas the greedy .* will go to the last B-number on the line (the one you want).
tgray
The pattern with correct groupings (you'll want groups \1 and \3): A(\d+)(.*B(\d+))?
tgray
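
The greedy-versus-lazy difference discussed in these comments can be sketched in Python as follows. The names `GREEDY`, `LAZY_FIRST` and `b_digits` are illustrative, and I use a non-capturing wrapper so the two digit runs are simply groups 1 and 2:

```python
import re

# Greedy: .* runs to the end and backtracks, so it finds the LAST B-number.
GREEDY = re.compile(r"A(\d+)(?:.*B(\d+))?")
# Lazy: .*? grows one character at a time, so it finds the FIRST B-number.
LAZY_FIRST = re.compile(r"A(\d+)(?:.*?B(\d+))?")

def b_digits(pattern, text):
    """Return the digits captured after B, or None when B is absent."""
    m = pattern.search(text)
    return m.group(2) if m else None

sample = "A01...B01...B23"
print(b_digits(GREEDY, sample))      # last B-number on the line
print(b_digits(LAZY_FIRST, sample))  # first B-number on the line
```

Which one is right depends on which B-number you consider the real one when several appear. For a string with no B at all, both patterns still match and report no B digits.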