I've created the following regex pattern in an attempt to match a string 6 characters in length ending in either "PRI" or "SEC", unless the string = "SIGSEC". For example, I want to match ABCPRI, XYZPRI, ABCSEC and XYZSEC, but not SIGSEC.

(\w{3}PRI$|[^SIG].*SEC$)

It is very close and sort of works (if I pass in "SINSEC", it returns a partial match on "NSEC"), but I don't have a good feeling about it in its current form. Also, I may have a need to add more exclusions besides "SIG" later and realize that this probably won't scale too well. Any ideas?

BTW, I'm using System.Text.RegularExpressions.Regex.Match() in C#

Thanks, Rich

+5  A: 

Assuming your regex engine supports negative lookaheads, try this:

((?!SIGSEC)\w{3}(?:SEC|PRI))

Edit: A commenter pointed out that .NET does support negative lookaheads, so this should work fine (thanks, Charlie).
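
For what it's worth, here is a minimal C# sketch of how a pattern like this might be used with Regex.IsMatch. The ^ and $ anchors, the extra FOOSEC exclusion, and the LookaheadDemo / IsAllowed names are assumptions added for illustration; they are not part of the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Anchored variant of the lookahead pattern.
    // FOOSEC is a hypothetical second exclusion, added only to show how the list grows.
    private static readonly Regex Pattern =
        new Regex(@"^(?!SIGSEC|FOOSEC)\w{3}(?:SEC|PRI)$");

    // Returns true when the whole input is three word characters plus SEC or PRI
    // and is not one of the excluded strings.
    public static bool IsAllowed(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        foreach (string s in new[] { "ABCPRI", "XYZSEC", "SIGSEC", "FOOSEC", "SINSEC" })
        {
            Console.WriteLine("{0}: {1}", s, IsAllowed(s));
        }
    }
}
```

Adding another exclusion should then just be a matter of appending another alternative inside the (?!...) group.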

Dan
.NET regular expressions do support negative lookaheads, so this will work
Charlie
Ah, good to know, thanks Charlie. I'm really not a .NET guy ;)
Dan
This works perfectly Dan, thanks! Ran a quick test and it will be trivial to add the additional exclusion matches.
Rich
As a side note, .NET regex supports unlimited-length patterns in every kind of lookaround, including lookbehind. In fact, the .NET and JGsoft engines are the only regex engines that allow "full regular expressions inside lookbehind".
Pop Catalin
A: 

Personally, I'd be inclined to build up the exclusion list in a second variable and then include it in the full expression. It's the approach I've used in the past when I've had to build any complex expression.

Something like exclude = 'someexpression'; prefix = 'list of prefixes'; suffix = 'list of suffixes'; expression = '{prefix}{exclude}{suffix}';
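
A rough C# sketch of that idea, assuming the exclusions are whole strings and that they go into a negative lookahead at the front of the pattern (one possible arrangement, not necessarily the one meant above). PatternBuilder, Build, and the FOOSEC entry are hypothetical names and data used only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    // Builds a pattern of the form ^(?!(?:EX1|EX2)$)\w{3}(?:SUF1|SUF2)$
    // from a list of excluded strings and a list of allowed suffixes.
    public static Regex Build(string[] exclusions, string[] suffixes)
    {
        // Regex.Escape keeps any metacharacters in the lists literal.
        string exclude = string.Join("|", exclusions.Select(Regex.Escape));
        string suffix = string.Join("|", suffixes.Select(Regex.Escape));

        // Skip the lookahead entirely when there is nothing to exclude.
        string lookahead = exclusions.Length > 0 ? "(?!(?:" + exclude + ")$)" : "";

        return new Regex("^" + lookahead + @"\w{3}(?:" + suffix + ")$");
    }

    public static void Main()
    {
        Regex regex = Build(new[] { "SIGSEC", "FOOSEC" }, new[] { "PRI", "SEC" });
        Console.WriteLine(regex.ToString());        // the assembled pattern
        Console.WriteLine(regex.IsMatch("ABCSEC"));
        Console.WriteLine(regex.IsMatch("SIGSEC"));
    }
}
```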

warren
A: 

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." -Jamie Zawinski

Matt Cruikshank
A: 

You may not even want to do the exclusions in the regex. For example, if this were Perl (I don't know C#, but you can probably follow along), I'd do it like this

if ( ( $str =~ /^\w{3}(?:PRI|SEC)$/ ) && ( $str ne 'SIGSEC' ) )

to be clear. It's doing exactly what you wanted:

  • Three word characters, followed by PRI or SEC, and
  • It's not SIGSEC

Nobody says you have to force everything into one regex.
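
For C# readers, roughly the same two-step check might look like the sketch below. The TwoStepCheck / IsWanted names are made up for illustration, and the ordinal string comparison is an assumption about how the exclusion should be compared.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoStepCheck
{
    // Step 1: the shape check (three word characters, then PRI or SEC).
    // Step 2: a plain string comparison for the one excluded value.
    public static bool IsWanted(string str)
    {
        return Regex.IsMatch(str, @"^\w{3}(?:PRI|SEC)$")
            && !string.Equals(str, "SIGSEC", StringComparison.Ordinal);
    }

    public static void Main()
    {
        Console.WriteLine(IsWanted("ABCPRI"));
        Console.WriteLine(IsWanted("SIGSEC"));
    }
}
```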

Andy Lester
I agree, this is probably the most sensible way to do it. However it looks like he's trying to extract these things from text with a regular expression - not having to worry about dealing with matches you don't want could potentially lead to a cleaner solution.
Dan
+1  A: 

You can try this one:

@"\w{3}(?:PRI|(?<!SIG)SEC)"
  • Matches 3 "word" characters
  • Matches PRI or SEC (but not SEC after SIG, i.e. SIGSEC is excluded). (?<!x)y is a negative lookbehind: it matches y only if it is not preceded by x.

Also, I may have a need to add more exclusions besides "SIG" later and realize that this probably won't scale too well

With this pattern you can easily add more exceptions. For example, the following excludes both SIGSEC and FOOSEC:

@"\w{3}(?:PRI|(?<!SIG|FOO)SEC)"
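
A short sketch of how the lookbehind version might be exercised from C#. The ^ and $ anchors are an addition for whole-string matching (the pattern above is unanchored), and LookbehindDemo / IsAllowed are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Same lookbehind pattern as above, anchored so the whole input must match.
    private static readonly Regex Pattern =
        new Regex(@"^\w{3}(?:PRI|(?<!SIG|FOO)SEC)$");

    public static bool IsAllowed(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        foreach (string s in new[] { "ABCSEC", "SIGPRI", "SIGSEC", "FOOSEC" })
        {
            Console.WriteLine("{0}: {1}", s, IsAllowed(s));
        }
    }
}
```

Note that the lookbehind only guards the SEC branch, so SIGPRI and FOOPRI still match.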
aku
+2  A: 

To help break down Dan's (correct) answer, here's how it works:

(           // outer capturing group to bind everything
 (?!SIGSEC) // negative lookahead: a match only works if "SIGSEC" does not appear next
 \w{3}      // exactly three "word" characters
 (?:        // non-capturing group - we don't care which of the following things matched
   SEC|PRI  // either "SEC" or "PRI"
 )
)

All together: ((?!SIGSEC)\w{3}(?:SEC|PRI))
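
If you want to keep a breakdown like this inside the code itself, .NET lets you write the pattern across several lines with RegexOptions.IgnorePatternWhitespace. The // comments in the listing above are annotations only; inside the pattern the comment marker is #. A minimal sketch, with CommentedPattern and FirstMatch as made-up names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedPattern
{
    // With IgnorePatternWhitespace, unescaped whitespace is ignored and
    // everything from # to the end of the line is treated as a comment.
    private static readonly Regex Pattern = new Regex(@"
        (               # outer capturing group
          (?!SIGSEC)    # negative lookahead: fail if SIGSEC starts here
          \w{3}         # exactly three word characters
          (?:           # non-capturing group
            SEC|PRI     # either SEC or PRI
          )
        )",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the first captured match in the input, or null if there is none.
    public static string FirstMatch(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatch("see ABCSEC here"));
    }
}
```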

Charlie
Nicely summarised :)
Dan
Thanks for the fixup of my final listing.
Charlie
+1  A: 

Why not use more readable code? In my opinion this is much more maintainable.

private Boolean HasValidEnding(String input)
{
    if (input.EndsWith("SEC",StringComparison.Ordinal) || input.EndsWith("PRI",StringComparison.Ordinal))
    {
        if (!input.Equals("SIGSEC",StringComparison.Ordinal))
        {
            return true;
        }
    }
    return false;
}

or in one line

private Boolean HasValidEnding(String input)
{
    return (input.EndsWith("SEC",StringComparison.Ordinal) || input.EndsWith("PRI",StringComparison.Ordinal)) && !input.Equals("SIGSEC",StringComparison.Ordinal);
}

It's not that I don't use regular expressions, but in this case I wouldn't use them.

Davy Landman
Yep, I had actually started with something exactly along those lines but the requirements changed and I decided to externalize the logic. I opted for using a regex inside a config file so as not to have to make code changes when new exclusion strings need to be added.
Rich
A: 

Go and get RegexBuddy from RegExBuddy.com. It is an amazingly simple tool that will help you figure out even the most complicated regex easily.

Toby Allen