Suppose I have variables

$x1 = 'XX a b XX c d XX';
$x2 = 'XX a b XX c d XX e f XX';

I want a regular expression that will find each instance of letters between XX. I'm looking for a general solution, because I don't know how many XX's there are.

I tried /XX(.*?)XX/g, but it only matches "a b" for $x1 and "a b", "e f" for $x2, because once the first match is found the engine has already consumed the second "XX".

Thanks for any help.

+7  A: 

Try using a positive lookahead:

/XX(.*?)(?=XX)/g
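
For illustration, here is a small sketch comparing the pattern from the question with the lookahead pattern, both run with /g in list context on $x2 from the question. The array names @consuming and @lookahead are just placeholders for this example.

```perl
use strict;
use warnings;

my $x2 = 'XX a b XX c d XX e f XX';

# The closing XX is consumed by the match, so it cannot open the next one.
my @consuming = $x2 =~ /XX(.*?)XX/g;

# The lookahead only checks that XX follows; it does not consume it,
# so the same XX can open the next match.
my @lookahead = $x2 =~ /XX(.*?)(?=XX)/g;

print join('|', @consuming), "\n";   # prints " a b | e f "
print join('|', @lookahead), "\n";   # prints " a b | c d | e f "
```

The captured pieces still carry their surrounding spaces; trim them afterwards if needed.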
Sjoerd
Beautiful, thanks!
itzy
+3  A: 

You can use split:

@stuff_between_xx = split /XX/, $x1;

To get the number of parts instead, call split in scalar context:

$stuff_between_xx = split /XX/, $x1;
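
One detail worth checking, sketched here with $x1 from the question: because the string starts with the separator, split produces a leading empty field (trailing empty fields are dropped by default), so the scalar-context count includes it. The names @fields, $count and @between are only illustrative.

```perl
use strict;
use warnings;

my $x1 = 'XX a b XX c d XX';

# List context: the leading XX yields an empty first field.
my @fields = split /XX/, $x1;      # ('', ' a b ', ' c d ')

# Scalar context: number of fields, including the empty one.
my $count = split /XX/, $x1;       # 3

# Drop empty fields to keep only what sits between the XX markers.
my @between = grep { length } @fields;

print "$count fields, ", scalar(@between), " non-empty\n";   # prints "3 fields, 2 non-empty"
```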
knittl
Thanks, that'll work. It's funny how you get stuck thinking in one way, and don't see obvious solutions. But I am curious if anyone has another solution that would work just with regex -- mostly so I can learn.
itzy
This assigns to `$stuff_between_xx` the **number** of parts found
kemp
@kemp: whoops, corrected
knittl
A: 
my $x2 = 'XX a b XX c d XX e f XX';

my @parts = grep { $_ ne '' } split /\s*XX\s*/, $x2;
kemp
+3  A: 

Like knittl, I would suggest split. But you might want to remove the whitespace as well:

my @stuff = split /\s*XX\s*/, $line;

You could also use lookaheads, but you don't really need them, because a reasonably complex alternation works as well. The version that keeps the whitespace would just be:

my @stuff = $line =~ m/XX((?:[^X]|X[^X])*)/g; 

The alternation says that you'll take anything if it's not an 'X'--but you will take an 'X' if it's not followed by another 'X'. There will be one character of lookahead, but it can consume characters aggressively, without backtracking.
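
A quick sketch of how that alternation behaves, using a made-up input that contains a lone X inside a field. Note that the final XX also matches, with an empty capture, so you may want to filter empty strings out.

```perl
use strict;
use warnings;

# Hypothetical input: the lone X in "Xb" must not end the field.
my $line = 'XX a Xb XX c d XX';

my @stuff = $line =~ m/XX((?:[^X]|X[^X])*)/g;
# @stuff is (' a Xb ', ' c d ', ''): the last XX matches with nothing after it.

# Keep only the non-empty captures.
my @nonempty = grep { length } @stuff;

print join('|', @nonempty), "\n";   # prints " a Xb | c d "
```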

The trimming version will have to backtrack to get rid of space characters, so the expression is uglier.

my @stuff = $line =~ m/XX\s*((?:[^X]|X[^X])*(?:[^X\s]|X[^X]))/g;
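
If the trimming pattern feels too dense, another option (a different technique, not the pattern above) is to capture with the simpler alternation and strip the whitespace afterwards. This is only a sketch, run on the sample string from the question:

```perl
use strict;
use warnings;

my $line = 'XX a b XX c d XX';

# Capture everything between the markers, whitespace included.
my @stuff = $line =~ m/XX((?:[^X]|X[^X])*)/g;

# Trim each captured piece in place.
for (@stuff) {
    s/^\s+//;
    s/\s+$//;
}

# The final XX yields an empty capture; drop it.
@stuff = grep { length } @stuff;

print join('|', @stuff), "\n";   # prints "a b|c d"
```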
Axeman