I have strings like "AMS.I-I.D. ver.5" and "AM0011 ver. 2", from which the "ver. *" part needs to be stripped. I currently do that with: (^A.*)(ver.\s\d{1,2})

The problem occurs when there is no version number, as in "AM0003". How can I make the (ver.\s\d{1,2}) part optional?

A: 

To make a capture group optional in a regex, add the ? quantifier after it.

(^A.*)(ver\.\s\d{1,2})?

Colin Hebert
+6  A: 

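Adding a question mark alone does not work because your first group matches greedily: .* consumes the whole string, including any version suffix, so the optional second group is left matching nothing. Try changing the first group to a non-greedy match and then making the second group optional: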

^(A.*?)(ver\.\s\d{1,2})?$
     ^                 ^
 non-greedy        optional

Note that in both parts the only change is the addition of a question mark, but the question mark has a different meaning in each case: after * it makes the quantifier non-greedy, and after a group it makes the group optional.

Also, one of your examples has no whitespace between the text "ver." and the version number, so you should consider making the whitespace optional in your regular expression (for example, \s? instead of \s).

See the regular expression in action on Rubular.
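
To illustrate the difference between the greedy and the non-greedy first group, here is a minimal Python sketch. Python's re module is an assumption (the question does not name a language), the helper name strip_version is hypothetical, and the whitespace after "ver." is written as \s? so that "ver.5" also matches.

```python
import re

# Greedy first group: .* swallows the whole string, so the optional
# version group always ends up empty.
greedy = re.compile(r"^(A.*)(ver\.\s?\d{1,2})?$")

# Non-greedy first group: .*? takes as little as possible, which lets
# the optional version group match when a version is present.
lazy = re.compile(r"^(A.*?)(ver\.\s?\d{1,2})?$")


def strip_version(s):
    """Hypothetical helper: return s without a trailing 'ver. N' part."""
    m = lazy.match(s)
    # Group 1 keeps the space that preceded "ver.", so trim it.
    return m.group(1).rstrip() if m else s


for sample in ["AMS.I-I.D. ver.5", "AM0011 ver. 2", "AM0003"]:
    print(repr(greedy.match(sample).group(2)),  # always None
          repr(lazy.match(sample).group(2)),    # 'ver.5', 'ver. 2', None
          repr(strip_version(sample)))
```

With the non-greedy pattern, group 1 still includes the space before "ver.", which is why the sketch trims it afterwards.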

Mark Byers
Also, it might be a good idea to escape the dot.
Tim Pietzcker
@Tim Pietzcker: +1 Yes, I didn't spot that error in the original regular expression. Fixed now.
Mark Byers
A: 

Since the examples show a space between the two words (the product ID and the version information), I would expect to design a regex which uses that space to separate the parts. In Perl:

$line =~ s/^(A\S+)\s+ver\.\s?\d{1,2}/$1/;

This removes (without capturing) the version; if the version is not present, then the substitution does nothing.

$line =~ s/^(A\S+)(?:\s+(ver\.\s?\d{1,2}))?/$1/;

This is an almost trivial variation on the idea; it captures the version string if present (as well as substituting, etc.). Note the subtlety: the space before the version string is part of the optional, non-capturing group '(?:...)?', so it is matched but not captured, while the version information itself is captured without leading spaces.

Quoting the regexes in the abstract, without tying them to the Perl context (though they still use PCRE, that is, Perl Compatible Regular Expressions, notation), you could write:

^(A\S+)(?:\s+(ver\.\s?\d{1,2}))?
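
As a rough sketch of how the same pattern could be used outside Perl, here it is with Python's re module (an assumption; any engine with PCRE-style syntax should behave similarly). The function names split_id and strip_version are hypothetical.

```python
import re

# Same pattern as above: the product ID is a run of non-space characters
# starting with "A"; the version part is optional, and the whitespace
# before it is matched but not captured.
PATTERN = re.compile(r"^(A\S+)(?:\s+(ver\.\s?\d{1,2}))?")


def split_id(line):
    """Return (product_id, version) where version is None if absent."""
    m = PATTERN.match(line)
    if m is None:
        return line, None
    return m.group(1), m.group(2)


def strip_version(line):
    """Equivalent of the Perl substitution: keep only the product ID."""
    return PATTERN.sub(r"\1", line)


print(split_id("AMS.I-I.D. ver.5"))   # ('AMS.I-I.D.', 'ver.5')
print(split_id("AM0003"))             # ('AM0003', None)
print(strip_version("AM0011 ver. 2")) # AM0011
```

Because \S+ stops at the first space, no non-greedy quantifier or trailing trim is needed here.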
Jonathan Leffler