In this string:

"<0> <<1>> <2>> <3> <4>"

I want to match all instances of "<\d{1,2}>" except those I have escaped with an extra set of angle brackets. That is, I want to match 0, 2, 3 and 4 but not 1:

"<0> <<1>> <2>> <3> <4>"

I want to do this in a single regular expression, but the best I could get is:

(^|[^\<])\<(?<1>\d{1,2})>([^>]|$)

This matches 0, 3 and 4 but not 2:

"<0> <<1>> <2>> <3> <4>"

Does anyone know how this can be done with a single regular expression?

+2  A: 

You can use a negative look-behind zero-width assertion:

(?<!<)<\d{1,2}>
Konrad Rudolph
Hmmm... RegexBuddy says this does not work at all?
Unkwntech
I was too eager and forgot the first `<` sign. Try again. Perl accepts it, so RegexBuddy should, too.
Konrad Rudolph
I still run into the same problem when I try to test both sides. For example, on "<0> <<1>> <2>> <<3> <4> <5>", the pattern "(?<!<)<\d{1,2}>(?!>)" will not match 2 or 3.
nferr
nferr: sorry, I assumed that you only cared about the brackets before the value, not after it. In that case, Bojan's solution is the right one.
Konrad Rudolph
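To make the difference between the two patterns in these comments concrete, here is a minimal Python sketch (Python is used only for illustration, and the helper name `tag_numbers` is made up). It adds a capture group around the digits, which is not in the original patterns, and runs both against nferr's test string.

```python
import re

def tag_numbers(pattern, text):
    """Return the digits of every tag that `pattern` matches in `text`."""
    return [m.group(1) for m in re.finditer(pattern, text)]

text = "<0> <<1>> <2>> <<3> <4> <5>"

# Look-behind only: rejects any tag with an extra '<' in front of it.
behind_only = r"(?<!<)<(\d{1,2})>"
# Look-behind plus look-ahead: rejects a tag if EITHER side has an extra bracket.
both_sides = r"(?<!<)<(\d{1,2})>(?!>)"

print(tag_numbers(behind_only, text))  # ['0', '2', '4', '5']
print(tag_numbers(both_sides, text))   # ['0', '4', '5']
```

Neither pattern yields 0, 2, 3, 4, 5, because the two assertions are combined with AND, while the requirement is to reject a tag only when both extra brackets are present.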
+5  A: 

You can also try conditionals: (?(?<=<)(<\d{1,2}>(?!>))|(<\d{1,2}>))

Bojan Resnik
Thanks, this works great!
nferr
Drop a few parentheses and you will be able to assign the matches to an array: @a=/(?(?<=<)<\d{1,2}>(?!>)|<\d{1,2}>)/g;
Beano
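Not every flavor accepts a lookaround as the condition of a conditional. Python's `re` module, for one, only accepts a group reference there. As a hedged sketch, the same logic can be written as a plain alternation: accept the tag if there is no extra `<` before it, or if there is no extra `>` after it. The names `UNESCAPED` and `unescaped_tags` are made up for this example.

```python
import re

# Equivalent of the conditional, written as "no '<' before" OR "no '>' after".
UNESCAPED = re.compile(r"(?<!<)<\d{1,2}>|<\d{1,2}>(?!>)")

def unescaped_tags(text):
    """Return the numbers of all tags that are not wrapped as <<n>>."""
    # With no capture groups, findall returns whole tags such as '<2>';
    # slice off the surrounding brackets.
    return [tag[1:-1] for tag in UNESCAPED.findall(text)]

print(unescaped_tags("<0> <<1>> <2>> <<3> <4> <5>"))  # ['0', '2', '3', '4', '5']
```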
+1  A: 

Presuming that with the input set

 "<0> <<1>> <2>> <3> <4><<5>"

we want to match 0, 2, 3, 4 and 5.

The problem is that you need both zero-width look-ahead and zero-width look-behind, but there are three contexts that should match (a tag with an extra '<' before it, a tag with an extra '>' after it, and a tag with neither) and one that should not (a tag with both, '<' and '>'). Also, if you want to extract the matches so that you can assign them to an array, you need to avoid capturing things you don't need. So I ended up with the inelegant

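use strict;
use warnings;
use Data::Dumper;

my $str = "<0> <<1>> <2>> <3> <4><<5>";

# A tag: angle brackets with no nested brackets inside.
my $brace_pair = qr/<[^<>]+>/;
# Accept a tag unless it has both an extra '<' before it and an extra '>' after it.
my @matches = $str =~ /(?:(?<!<)$brace_pair(?!>))|(?:$brace_pair(?!>))|(?:(?<!<)$brace_pair)/g;

print Dumper(\@matches);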

If you wanted to cram this into a single expression, you could.

Beano
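The four contexts described in this answer can be checked one by one. The following is only a sketch in Python (not the Perl used above), with made-up names `BRACE_PAIR`, `PATTERN` and `cases`. It also assumes a small simplification: the first of the three alternatives is already covered by the other two, so only two are kept.

```python
import re

BRACE_PAIR = r"<[^<>]+>"  # same sub-pattern as $brace_pair in the Perl code
# "nothing extra after" OR "nothing extra before"; the both-sides-clean
# alternative from the Perl version is implied by either of these.
PATTERN = re.compile(rf"{BRACE_PAIR}(?!>)|(?<!<){BRACE_PAIR}")

cases = {
    "<7>": True,     # no extra brackets
    "<<7>": True,    # extra '<' only
    "<7>>": True,    # extra '>' only
    "<<7>>": False,  # both: the escaped form
}

for text, expected in cases.items():
    found = PATTERN.search(text) is not None
    print(f"{text!r:10} matched={found} expected={expected}")
```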
A: 

Here's an alternative to a single regex. Split it into a list at the >< boundary and then just exclude <...>.

#!/usr/bin/perl -lw

$s = "<0> <<1>> <2>> <3> <4>";

print join " ",
      map { /(\d+)/; $1 }
      grep !/^<.*>$/,
      split />\s*</, $s;
Schwern
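For readers not using Perl, the split-then-filter idea might look roughly like this in Python. This is a sketch only: the function name `split_and_filter` is made up, and it assumes every piece that survives the filter contains at least one digit.

```python
import re

def split_and_filter(text):
    """Split at '>...<' boundaries, drop escaped pieces, return the digits."""
    pieces = re.split(r">\s*<", text)
    # After the split, only an escaped tag still has brackets on both ends.
    kept = [p for p in pieces if not re.fullmatch(r"<.*>", p)]
    # Assumes each kept piece contains digits, as in the sample input.
    return [re.search(r"\d+", p).group() for p in kept]

print(split_and_filter("<0> <<1>> <2>> <3> <4>"))  # ['0', '2', '3', '4']
```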
A: 

In case you're using a regex flavor (like Java's) that supports lookarounds but not conditionals, here's another approach:

(?=(<\d{1,2}>))(?!(?<=<)\1(?=>))\1

The first lookahead ensures that you're at the beginning of a tag and captures it for later use. The subexpression in the second lookahead matches the tag again, but only if it's preceded by a < and followed by a >. Making it a negative lookahead achieves the NOT(x AND y) semantics you're looking for. Finally, the second \1 matches the tag again, this time for real (i.e., not in a lookaround).

BTW, I could have used > instead of (?=>) in the second lookahead, but I think this way is easier to read and expresses my intent better.

Alan Moore
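As a quick check, the pattern above also appears to work unchanged in Python's `re` module, which supports lookarounds and backreferences but not lookaround conditionals. A minimal sketch, with the made-up names `TAG` and `matched_tags`:

```python
import re

# The pattern from the answer, unchanged. Group 1 is captured inside a lookahead.
TAG = re.compile(r"(?=(<\d{1,2}>))(?!(?<=<)\1(?=>))\1")

def matched_tags(text):
    """Return every tag that is not both preceded by '<' and followed by '>'."""
    return [m.group(1) for m in TAG.finditer(text)]

print(matched_tags("<0> <<1>> <2>> <<3> <4> <5>"))
# ['<0>', '<2>', '<3>', '<4>', '<5>']
```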
A: 

Here is a quick and easy way to do this with Perl.

use strict;
use warnings;

my $str = "<0> <<1>> <2>> <3> <4>";
my @array = grep {defined $_} $str =~ /<<\d+>>|<(\d+)>/g;

print join( ', ', @array ), "\n";
Brad Gilbert
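The trick here is that the first alternative consumes a fully escaped tag without capturing anything, so only unescaped tags leave a value in the capture group. A rough Python equivalent is sketched below. The function name `consume_escaped_first` is made up, and note that in Python's `findall` a group that did not take part in the match comes back as an empty string rather than an undefined value.

```python
import re

def consume_escaped_first(text):
    """Match escaped tags without capturing; keep only the captured numbers."""
    # findall returns group 1 for each match; it is '' when the <<n>> branch matched.
    captures = re.findall(r"<<\d+>>|<(\d+)>", text)
    return [c for c in captures if c]

print(consume_escaped_first("<0> <<1>> <2>> <3> <4>"))  # ['0', '2', '3', '4']
```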