ansaurus

Question

How can I match everything that comes after the last occurrence of some character in a Perl regular expression?

Answer 1

+7 A:

my($substr) = $string =~ /.*x(.*)/;

From perldoc perlre:

By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match.

That's why .*x will match up to the last occurrence of x.
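A minimal sketch of the greedy behaviour described above, using the sample string that appears in other answers in this thread. The variable names are illustrative, and the /s modifier is an addition here, on the assumption that the input may contain newlines (without /s, `.` does not match a newline).

```perl
use strict;
use warnings;

my $string = 'axxxghdfx445';

# Greedy .* first grabs the whole string, then backtracks just far enough
# for the literal x to match, so that x is the last one in the string.
my ($greedy) = $string =~ /.*x(.*)/s;

# For contrast, the non-greedy .*? stops at the first x instead.
my ($lazy) = $string =~ /.*?x(.*)/s;

print "greedy: $greedy\n";   # greedy: 445
print "lazy:   $lazy\n";     # lazy:   xxghdfx445
```

If the string contains no x at all, the match fails and the variable is left undefined.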

eugene y 2010-08-09 10:30:24

Answer 2

+3 A:

The simplest way would be to use /([^x]*)$/
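One difference worth checking before picking this form: because the pattern never requires an x, it also matches strings that contain no x and then returns the whole string, whereas /.*x(.*)/ fails in that case. A small comparison sketch follows; the helper names tail_by_class and tail_by_greedy are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparison; the names are not from the thread.
sub tail_by_class  { my ($s) = @_; my ($t) = $s =~ /([^x]*)$/; return $t }
sub tail_by_greedy { my ($s) = @_; my ($t) = $s =~ /.*x(.*)/s; return $t }

for my $s ('axxxghdfx445', 'no-delimiter', 'ends-with-x') {
    my $class  = tail_by_class($s);
    my $greedy = tail_by_greedy($s);
    # A failed match leaves the capture undefined, so guard before printing.
    printf "%-14s class=[%s] greedy=[%s]\n",
        $s, $class, defined $greedy ? $greedy : 'undef';
}
```

Which behaviour is preferable depends on whether "no x present" should count as "everything" or as "no match".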

reko_t 2010-08-09 10:32:32

Answer 3

+4 A:

The first answer is a good one, but when talking about "something that does not contain"... I like to use the regex that "matches" it:

my($substr) = $string =~ /.*x([^x]*)$/;

Very useful in some cases.

benzebuth 2010-08-09 10:35:47

+1 indeed very useful

David B 2010-08-09 10:53:28

You don't need the leading '.*x' in the regex.

runrig 2010-08-09 18:00:32

That's right, thanks, but that was to illustrate the "something that does not contain x".

benzebuth 2010-08-10 07:48:24

Answer 4

+1 A:

Regular expression: /([^x]+)$/ # assuming x is not the last character of the string.
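The assumption in that comment matters because + requires at least one non-x character. A short sketch of the difference between the * and + quantifiers when x is the final character (the sample string is made up for the illustration):

```perl
use strict;
use warnings;

my $string = 'abcx';   # x is the last character

# With *, the capture may be empty, so the match succeeds with ''.
my ($star) = $string =~ /([^x]*)$/;

# With +, at least one non-x character is required before the end,
# so the match fails and the variable stays undefined.
my ($plus) = $string =~ /([^x]+)$/;

print 'star: ', defined $star ? "'$star'" : 'undef', "\n";
print 'plus: ', defined $plus ? "'$plus'" : 'undef', "\n";
```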

Nikhil Jain 2010-08-09 10:37:30

Answer 5

A:

The simplest way is not a regular expression, but a simple split() and taking the last element.

$string="axxxghdfx445";
@s = split /x/ , $string;
print $s[-1];
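One edge case to be aware of with this approach: by default split drops trailing empty fields, so a string that ends in x yields the text before that x rather than an empty string. A minimal sketch, assuming that an empty result is what is wanted in that case; the sample string is made up:

```perl
use strict;
use warnings;

my $string = 'abxcdx';   # the last x is the final character

# By default split discards trailing empty fields, so the last element
# is 'cd' rather than the empty string that follows the last x.
my $default = (split /x/, $string)[-1];

# A negative limit keeps trailing empty fields.
my $kept = (split /x/, $string, -1)[-1];

print "default: '$default'\n";   # default: 'cd'
print "kept:    '$kept'\n";      # kept:    ''
```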

ghostdog74 2010-08-09 10:38:44

Is it really worth it to create a temporary array? `print((split /x/, $string)[-1]);` would work just as well.

Zaid 2010-08-09 11:04:58

Depends, right? What if I want to use the other elements?

ghostdog74 2010-08-09 11:51:21

Answer 6

+3 A:

Yet another way to do it. It's not as simple as a single regular expression, but if you're optimizing for speed, this approach will probably be faster than anything using regex, including split.

my $s     = 'axxxghdfx445';
my $p     = rindex $s, 'x';
my $match = $p < 0 ? undef : substr($s, $p + 1);
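The same idea can be wrapped in a reusable subroutine. This is only a sketch: the name after_last and its two-argument interface are assumptions made for the example, and it generalises the snippet above slightly by allowing a multi-character separator.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions, not from the thread):
# returns the text after the last occurrence of $sep, or undef if $sep is absent.
sub after_last {
    my ($s, $sep) = @_;
    my $p = rindex $s, $sep;
    return undef if $p < 0;
    return substr $s, $p + length $sep;
}

print after_last('axxxghdfx445', 'x'), "\n";                        # 445
print after_last('/usr/local/lib', '/'), "\n";                      # lib
print defined after_last('abc', 'x') ? 'found' : 'not found', "\n"; # not found
```

Since rindex does a plain substring search, the separator is taken literally and needs no regex escaping.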

FM 2010-08-09 12:06:11

Answer 7

+2 A:

I'm surprised no one has mentioned the special variable that does this, $': "$'" returns everything after the matched string. (perldoc perlre)

my $str = 'axxxghdfx445';
$str =~ /x/;

# $' contains 'xxghdfx445' (everything after the first 'x' matched)
print $';

However, there is a cost (emphasis mine):

WARNING: Once Perl sees that you need one of $&, "$`", or "$'" anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program. Perl uses the same mechanism to produce $1, $2, etc, so you also pay a price for each pattern that contains capturing parentheses. (To avoid this cost while retaining the grouping behaviour, use the extended regular expression "(?: ... )" instead.) But if you never use $&, "$`" or "$'", then patterns without capturing parentheses will not be penalized. So avoid $&, "$'", and "$`" if you can, but if you can't (and some algorithms really appreciate them), once you've used them once, use them at will, because you've already paid the price. As of 5.005, $& is not so costly as the other two.

But wait, there's more! You get two operators for the price of one, act NOW!

As a workaround for this problem, Perl 5.10.0 introduces "${^PREMATCH}", "${^MATCH}" and "${^POSTMATCH}", which are equivalent to "$`", $& and "$'", except that they are only guaranteed to be defined after a successful match that was executed with the "/p" (preserve) modifier. The use of these variables incurs no global performance penalty, unlike their punctuation char equivalents, however at the trade-off that you have to tell perl when you want to use them.

my $str = 'axxxghdfx445';
$str =~ /x/p;

# ${^POSTMATCH} contains 'xxghdfx445' (everything after the first 'x' matched)
print ${^POSTMATCH};

I would humbly submit that this route is the best and most straightforward approach in most cases, since it does not require that you do special things with your pattern construction in order to retrieve the postmatch portion, and there is no performance penalty.

Ether 2010-08-09 15:01:06

Except you are catching the first occurrence of 'x', not the last.

runrig 2010-08-09 18:03:28

@runrig: lol, way to derail my entire thesis :) Indeed, in this particular case, `/x([^x]*)$/` would be better... I got too caught up in describing `$'`, which everyone else had overlooked.

Ether 2010-08-09 21:41:16
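For completeness, the post-match variable can still be used for the last occurrence if the pattern itself is made to end at the last x. The sketch below combines the greedy prefix from Answer 1 with the /p modifier from Answer 7; this combination is not proposed anywhere in the thread, and the variable name $post is illustrative.

```perl
use strict;
use warnings;
use 5.010;   # ${^POSTMATCH} and /p need Perl 5.10 or later

my $str = 'axxxghdfx445';
my $post;

# Greedy .* pushes the matched x to the last one in the string;
# /p makes ${^POSTMATCH} available after a successful match.
if ($str =~ /.*x/sp) {
    $post = ${^POSTMATCH};
    print "$post\n";   # 445
}
```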
