Using a Perl or Unix regex, how would I capture a word that is not one of a given set of values? Here is what I am trying to achieve:

(\w:not('int','long'))
+9  A: 

I'm not sure whether this is valid Perl syntax, but in a "generic" regex flavor you can say:

/\b(?!int\b|long\b)\w+\b/

If you want to capture the word, put parens around \w+, like this:

/\b(?!int\b|long\b)(\w+)\b/
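
As a rough check of how this behaves in Perl, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not from the question. The negative lookahead rejects only the exact words "int" and "long", so longer words that merely start with those letters are still captured.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen to include near-misses such as "integer".
my $text = 'int count; long total; integer width; longer name';

# \b anchors at a word boundary; (?!int\b|long\b) fails if the word that
# starts here is exactly "int" or "long"; (\w+) captures everything else.
# In list context with /g, the match returns every captured word.
my @words = $text =~ /\b(?!int\b|long\b)(\w+)\b/g;

print join(',', @words), "\n";   # prints: count,total,integer,width,longer,name
```

The \b inside each lookahead alternative is what keeps "integer" and "longer" from being rejected along with "int" and "long".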
harpo
That's what I would have done.
Clement Herreman
@Adam: don't you need `(\w+)` to capture what we want to find instead? Or am I confused? (I.e., you need the parens to have the item available in `$1`, no?)
Telemachus
@Telemachus: Whoops, you are right! I need more coffee. I was reading "match" where the letters clearly spelled "capture".
Adam Bellaire
So would the syntax be /\b(?!int\b|long\b)(\w+)\b/ or /\b((?!int\b|long\b)\w+)\b/?
Nat Ryall
Yes, you can add the parens around \w+ if you want to grab the word from $1.
harpo
The final \b is redundant. \w+ must match at least one word character, and since it is at the end of the regex, it will match as many word characters as there are, up to the end of the string or the next non-word character.
ysth
+6  A: 

It is generally faster to match every word and then filter out the unwanted ones with a hash lookup:

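# Lookup table of the words to exclude.
my %exclude = map { $_ => 1 } qw/int long/;
# Pull every word out of $_, then keep only those not in the table.
my @words   = grep { not exists $exclude{$_} } /(?:\b|^) (\w+) (?:\b|$)/gx;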

especially on versions of Perl prior to 5.10 (the release in which alternation got a massive speed increase).
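
For a self-contained version of the same idea, here is a minimal sketch that matches against a named variable instead of $_. The sample string is hypothetical, and the pattern is simplified to a bare (\w+), on the assumption that the explicit boundary groups are not needed because \w+ is greedy and already consumes each whole word.

```perl
use strict;
use warnings;

# Hypothetical sample input; not from the original answer.
my $text = 'int count; long total; integer width';

# Lookup table of the words to reject.
my %exclude = map { $_ => 1 } qw/int long/;

# Extract every word with a simple pattern, then drop the excluded ones
# with a constant-time hash lookup instead of a regex alternation.
my @words = grep { not exists $exclude{$_} } $text =~ /(\w+)/g;

print "@words\n";   # prints: count total integer width
```

Adding another excluded word only means adding it to the qw// list; the regex itself stays the same.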

Chas. Owens
Nice approach. Thoroughly generic :)
Nic Gibson