I want to match all lines that contain any uppercase characters, ignoring the string A_.

To complicate matters, I also want to ignore everything after a different string, e.g. an opening comment marker.

Here are examples of what should and shouldn't match:

Matches:

  • fooBar
  • foo Bar foo
  • A_fooBar
  • fooBar /* Comment */

Non-matches (A_ should not trigger a match):

  • A_foobar
  • foo A_bar
  • foobar
  • foo bar foo bar
  • foobar /* Comment */

thanks :)

+1  A: 

Try:

(?<!A_)[a-zA-Z]+

(?<!...) is called a negative lookbehind.
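As a rough illustration of how a negative lookbehind differs from a negative lookahead, here is a small Python sketch. Python's re module is only one flavor picked for the example, and the sample string A_bar is made up. It suggests why the lookbehind on its own does not skip the A_ token: it only blocks a match that starts right after A_, so the A itself and the tail of the word can still match.

```python
import re

# (?<!A_) is a negative lookbehind: the match may not be preceded by "A_".
lookbehind = re.compile(r"(?<!A_)[a-zA-Z]+")
# (?!A_) is a negative lookahead: the match may not start with "A_".
lookahead = re.compile(r"(?!A_)[a-zA-Z]+")

text = "A_bar"  # made-up sample input

behind_hits = lookbehind.findall(text)
ahead_hits = lookahead.findall(text)

print(behind_hits)  # ['A', 'ar']
print(ahead_hits)   # ['bar']
```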

As for your specific problem, it's kind of cheating but try:

^([#\.]|(?<!A_))[A-Za-z]{2,}

I get:

fooBar => fooBar
foo Bar foo => foo
A_fooBar (no match)
fooBar /* Comment */ => fooBar
A_foobar (no match)
foo A_bar => foo
foobar => foobar
foo bar foo bar => foo
foobar /* Comment */ => foobar
cletus
Thanks, but I don't want to match [a-zA-Z]. This is what I have so far: ^([A-Z]|[#.])[^{]*?(?<=[A-Z]). Now I need to exclude any matches that have A_ as their only uppercase characters.
Alan
It matches CSS, in case you are wondering.
Alan
That expression doesn't make a lot of sense. I'm running a little test and it only matches the ones with A at the front.
cletus
The expression above only matches the uppercase characters. It checks the start of the line and allows it to start with # or . (as it's CSS): [A-Z]|[#.]. Then it matches anything other than a {, stops, and looks behind for an uppercase character. This works fine for me. The only complication I have now is preventing it from matching if it sees A_ when it looks back.
Alan
A: 

This one does it, although the comment handling isn't extremely robust. (It assumes that a comment is always at the end of the line.)

.*((A(?!_)|([B-Z]))(?<!/\*.*)).*\r\n
Mike Hanson
This looks pretty promising, Mike, thanks. I think it's falling down when there are multiple _; still looking into it.
Alan
+1  A: 

Does it have to be a single regex? In Perl, you could do something like:

if ($string =~ /[A-Z]/ && $string !~ /A_/)

It's not as cool as a single expression with lookbehind, but it's probably easier to read and maintain.
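For comparison, here is a minimal multi-step sketch in Python. It is a variation on the idea rather than a translation of the Perl line: a literal "has uppercase and does not contain A_" test would reject A_fooBar, which the question lists as a match, so this version removes the A_ token before looking for uppercase. The function name has_real_uppercase is hypothetical, and it assumes a comment always runs from the first /* to the end of the line.

```python
import re

def has_real_uppercase(line: str) -> bool:
    """Return True if the line has an uppercase letter outside A_ and comments."""
    code = line.split("/*", 1)[0]   # drop everything from the first "/*" onward
    code = code.replace("A_", "")   # ignore the A_ token itself
    return re.search(r"[A-Z]", code) is not None

# Examples taken from the question.
should_match = ["fooBar", "foo Bar foo", "A_fooBar", "fooBar /* Comment */"]
should_not_match = ["A_foobar", "foo A_bar", "foobar",
                    "foo bar foo bar", "foobar /* Comment */"]

for s in should_match + should_not_match:
    print(f"{s!r}: {has_real_uppercase(s)}")
```

The input string is never modified; the stripping happens on a local copy, which may matter for a static-analysis tool that must report against the original line.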

SDGator
Thanks SDGator, but I don't think I have the ability to do that.
Alan
+1  A: 

My answer:

/([B-Z]|A[^_]|A$)/

I would remove the comment at an earlier stage, if at all possible.

Test:

#!perl
use warnings;
use strict;

my @matches = (
"fooBar",
"foo Bar foo",
"A_fooBar",
"fooBar /* Comment */");

my @nomatches = (
"A_foobar",
"foo A_bar",
"foobar",
"foo bar foo bar",
"foobar /* Comment */");

my $regex = qr/([B-Z]|A[^_]|A$)/;

for my $m (@matches) {
    $m =~ s:/\*.*$::;
    die "FAIL $m" unless $m =~ $regex;
}
for my $m (@nomatches) {
    $m =~ s:/\*.*$::;
    die "FAIL $m" unless $m !~ $regex;
}

Try it: http://codepad.org/EJhWtqkP

Kinopiko
Thanks Kinopiko, I love the simplicity of your solution. I am writing expressions for use in static code analysis, so I won't actually be removing anything. This is why I don't want to match inside a comment.
Alan
Just copy the string and do a match on the copied one.
Kinopiko
+1  A: 

This should (also?) do it:

(?!A_)[A-Z](?!((?!/\*).)*\*/)

A short explanation:

(?!A_)[A-Z]     # if no 'A_' can be seen, match any uppercase letter
(?!             # start negative look ahead
  ((?!/\*).)    #   if no '/*' can be seen, match any character (except line breaks)
  *             #   match zero or more of the previous match
  \*/           #   match '*/'
)               # end negative look ahead

So, in plain English:

Match any uppercase letter that does not begin 'A_', but reject it if a '*/' can be seen further along the line without first encountering a '/*' (that is, if the letter sits inside a comment).
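A quick way to check this pattern against the examples from the question is a short Python script. This is only a sketch using Python's re module, which happens to accept the pattern unchanged; other flavors may behave differently, and the variable names are illustrative.

```python
import re

# The pattern from this answer, copied verbatim.
pattern = re.compile(r"(?!A_)[A-Z](?!((?!/\*).)*\*/)")

# Examples taken from the question.
should_match = ["fooBar", "foo Bar foo", "A_fooBar", "fooBar /* Comment */"]
should_not_match = ["A_foobar", "foo A_bar", "foobar",
                    "foo bar foo bar", "foobar /* Comment */"]

matched = [s for s in should_match if pattern.search(s)]
rejected = [s for s in should_not_match if not pattern.search(s)]

print(len(matched), "of", len(should_match), "expected matches found")
print(len(rejected), "of", len(should_not_match), "expected non-matches rejected")
```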

Bart Kiers
A: 

Try this:

^(?:[^A-Z/]|A_|/(?!\*))*+[A-Z]

This will work in any flavor that supports possessive quantifiers, e.g. PowerGrep, Java and PHP. The .NET flavor doesn't, but it does support atomic groups:

^(?>(?:[^A-Z/]|A_|/(?!\*))*)[A-Z]

If neither of those features is available, you can use another lookahead to prevent it matching the A_ on the rebound:

^(?:[^A-Z/]|A_|/(?!\*))*(?!A_)[A-Z]
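The lookahead variant uses no possessive quantifiers or atomic groups, so it can be tried in most flavors. Below is a small Python sketch that runs it over the examples from the question; the variable names are illustrative, and the results apply to Python's re module only.

```python
import re

# The third (plain lookahead) variant from this answer, copied verbatim.
pattern = re.compile(r"^(?:[^A-Z/]|A_|/(?!\*))*(?!A_)[A-Z]")

# Examples taken from the question.
should_match = ["fooBar", "foo Bar foo", "A_fooBar", "fooBar /* Comment */"]
should_not_match = ["A_foobar", "foo A_bar", "foobar",
                    "foo bar foo bar", "foobar /* Comment */"]

matched = [s for s in should_match if pattern.search(s)]
rejected = [s for s in should_not_match if not pattern.search(s)]

# The match runs from the start of the line up to the first qualifying letter.
prefix = pattern.search("A_fooBar").group(0)

print(matched)
print(rejected)
print(prefix)  # A_fooB
```

Because the pattern is anchored with ^ and consumes the A_ tokens on the way, the match text is a prefix of the line rather than the single uppercase letter.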
Alan Moore