I have the following regex:

(?!^[&#]$)^([A-Za-z0-9-'.,&@:?!()$#/\])$

So it should allow A-Z, a-z, 0-9, and these special characters: '.,&@:?!()$#/\

I want it NOT to match if the following sequence of characters appears anywhere in the string:

&#

When I run this regex with just "&#" as input, it doesn't match my pattern and I get an error, which is what I want. When I run it with '.,&@:?!()$#/\ABC123, it matches my pattern and there are no errors.

However when I run it with:

'.,&#@:?!()$#/\ABC123

It doesn't produce an error either, even though the input contains &#. I'm doing something wrong with the check for the &# sequence.

Can someone tell me what I've done wrong? I'm not great with these things.

Thanks.

+1  A: 

I would actually do it in two parts:

  1. Check your allowed character set. To do this I would look for characters that are not allowed, and return false if there's a match. That means I have a nice simple expression:
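    [^A-Za-z0-9'.,&@:?!()$#/\\-]
  2. Check your banned substring. And since it is just a substring, I probably wouldn't even use a regex for that part.

You didn't mention your language, but in C# it would look like this:

using System.Text.RegularExpressions;

bool IsValid(string input)
{
    // Invalid if the input contains the banned "&#" sequence
    // or any character outside the allowed set.
    return !(input.Contains("&#")
             || Regex.IsMatch(input, @"[^A-Za-z0-9'.,&@:?!()$#/\\-]"));
}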
Joel Coehoorn
Yeah, I agree, and that's how I'd normally do it, but see below.
John Batdorf
A: 

I'm doing it in C#, but the problem is that I'm working against an SDK that takes the regex as a value I pass in from a config file, so I don't have the code-behind love. :) So I have to do it in one shot, right? I thought this was possible.

John Batdorf
A: 

Assuming Perl-compatible regular expressions:

To reject any string that contains '&#':

(?!.*&#)^([A-Za-z0-9-'.,&@:?!()$#/\\]*)$

You don't need the parentheses, though, because you are matching the entire string.
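A minimal sketch of the single-pattern approach, using Python's `re` module as a stand-in for the .NET engine (character classes and `(?!...)` lookahead behave the same way in both). `PATTERN`, `NARROW` and `is_valid` are illustrative names, not part of any SDK. The lookahead here is `(?!.*&#)`, which scans the whole string; a scan written as `[^&]*&#` cannot step past the first `&`, so an input such as `a&b&#` slips through it.

```python
import re

# One pattern, suitable for a config value: the lookahead at the start rejects
# any string containing "&#"; the class then validates every character.
PATTERN = re.compile(r"^(?!.*&#)[A-Za-z0-9-'.,&@:?!()$#/\\]*$")

# Narrower lookahead for comparison: [^&]* cannot step past an "&", so only an
# "&#" at the first ampersand is detected.
NARROW = re.compile(r"^(?![^&]*&#)[A-Za-z0-9-'.,&@:?!()$#/\\]*$")


def is_valid(text: str) -> bool:
    """Return True if text uses only allowed characters and never contains '&#'."""
    return PATTERN.match(text) is not None


samples = ["'.,&@:?!()$#/\\ABC123", "'.,&#@:?!()$#/\\ABC123", "a&b&#"]
for sample in samples:
    print(repr(sample), is_valid(sample), NARROW.match(sample) is not None)
```

The third sample is rejected by `PATTERN` but accepted by `NARROW`.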

Rob
A: 

^((?!&#)[A-Za-z0-9-'.,&@:?!()$#/\\])*$

Note that the last \ is escaped (doubled); Stack Overflow automatically turns \\ into \ unless it's inside backticks.
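A small check of this per-character lookahead (often called a tempered token), again with Python's `re` standing in for .NET; `TEMPERED` is an illustrative name. Because the group is repeated with `*`, the pattern also accepts the empty string.

```python
import re

# Before consuming each character, (?!&#) verifies that no "&#" starts at the
# current position, so the sequence is rejected wherever it appears.
TEMPERED = re.compile(r"^((?!&#)[A-Za-z0-9-'.,&@:?!()$#/\\])*$")

for sample in ["ABC&123", "ABC&", "ABC&#", "&#", ""]:
    print(repr(sample), TEMPERED.match(sample) is not None)
```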

foson
+3  A: 

Borrowing a technique for matching quoted strings, remove & from your character class, add an alternative for & not followed by #, and allow the string to optionally end with &:

^((?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&[^#])*&?)$
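To see why the optional trailing `&?` is there, here is a comparison sketch (Python's `re` standing in for .NET; `WITH_TRAILING` and `WITHOUT_TRAILING` are illustrative names). The `&[^#]` alternative needs a character after the ampersand, so without `&?` a string ending in `&` cannot match.

```python
import re

# The pattern above: "&" is only allowed when followed by a non-"#" character,
# plus an optional "&" at the very end.
WITH_TRAILING = re.compile(r"^((?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&[^#])*&?)$")
# Same pattern without the final "&?", for comparison only.
WITHOUT_TRAILING = re.compile(r"^((?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&[^#])*)$")

for sample in ["A&B", "ABC&", "A&#B"]:
    print(repr(sample),
          WITH_TRAILING.match(sample) is not None,
          WITHOUT_TRAILING.match(sample) is not None)
```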

Ben Blank
BAM! You're right on the money. Thank you so much.
John Batdorf
A: 

I'd recommend using two regular expressions in a conditional:

// Two separate checks: the banned sequence first, then the allowed character set.
if (Regex.IsMatch(input, "&#"))
    return false;
else
    return Regex.IsMatch(input, @"^[A-Za-z0-9-'.,&@:?!()$#/\\]+$");

I believe your second "main" regex of

^([A-Za-z0-9-'.,&@:?!()$#/\])$

has several errors:

  • It will match only a single character from your set.
  • The '\' character in regular expressions is an escape: it either gives the next character a special meaning (e.g., '\n' is the line feed character) or makes a special character literal. So the sequence '\]' is a literal ']', which means your bracketed list is never terminated.
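The unterminated bracket is easy to confirm by compiling the pattern as it appears in the question. This is a sketch using Python's `re`, which, like .NET, refuses to compile an unclosed character set; `SINGLE_CHAR` and `broken_compiles` are illustrative names. Doubling the backslash closes the set, and the result then matches exactly one character, which is the first problem listed above.

```python
import re

# As displayed: "\]" is an escaped, literal "]", so the set is never closed.
try:
    re.compile(r"^([A-Za-z0-9-'.,&@:?!()$#/\])$")
    broken_compiles = True
except re.error as exc:
    broken_compiles = False
    print("re.error:", exc)

# With the backslash doubled, the set closes right after the literal backslash,
# but without a quantifier it still accepts only a single character.
SINGLE_CHAR = re.compile(r"^([A-Za-z0-9-'.,&@:?!()$#/\\])$")
print(SINGLE_CHAR.match("A") is not None, SINGLE_CHAR.match("AB") is not None)
```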

You may be better off using

^[A-Za-z0-9-'.,&@:?!()$#/\\]+$

Note that the backslash character is represented by a double backslash.

The '+' quantifier requires at least one character from the set; if a zero-length string should also pass, replace the '+' with a '*'.
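A quick sketch of the quantifier difference on the empty string (Python's `re` as a stand-in for .NET; `ONE_OR_MORE` and `ZERO_OR_MORE` are illustrative names). Note that this pattern on its own still accepts `&#`; in this answer that case is handled by the separate first check.

```python
import re

ONE_OR_MORE = re.compile(r"^[A-Za-z0-9-'.,&@:?!()$#/\\]+$")
ZERO_OR_MORE = re.compile(r"^[A-Za-z0-9-'.,&@:?!()$#/\\]*$")

# "+" needs at least one allowed character; "*" also accepts "".
# "ABC 123" fails both because a space is not in the set.
for sample in ["", "ABC123", "ABC 123"]:
    print(repr(sample),
          ONE_OR_MORE.match(sample) is not None,
          ZERO_OR_MORE.match(sample) is not None)
```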

Perry Pederson
The errors you pointed out weren't entirely the OP's fault. The forum software ate a couple of asterisks and a backslash. That's what happens when you try to talk about regexes without code-ifying them.
Alan Moore
By the way, if John really had accidentally escaped the closing square bracket, the regex wouldn't even have compiled.
Alan Moore
A: 

Just FYI, although Ben Blank's regex works, it's more complicated than it needs to be. I would do it like this:

^(?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&(?!#))+$

Because I used a negative lookahead instead of a negated character class, the regex doesn't need any extra help to match an ampersand at the end of the string.
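Running both single-pattern answers side by side (a sketch with Python's `re` standing in for .NET; `BEN` and `ALAN` are illustrative names) shows a practical difference beyond readability. `&[^#]` consumes whatever character follows the ampersand, so the negated-class version accepts `A&&#` (the second `&` is eaten before the `#` is checked) and `A&%` (a character outside the allowed set). The lookahead version consumes only the `&` and rejects both. It also uses `+`, so it rejects the empty string, which the negated-class version accepts.

```python
import re

# Negated-class version: "&" must be followed by a non-"#" character, with an
# optional "&" allowed at the very end.
BEN = re.compile(r"^((?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&[^#])*&?)$")
# Lookahead version: "&" is allowed only where it is not followed by "#".
ALAN = re.compile(r"^(?:[A-Za-z0-9-'.,@:?!()$#/\\]+|&(?!#))+$")

for sample in ["ABC&", "A&B", "&#", "A&&#", "A&%", ""]:
    print(f"{sample!r:8} ben={BEN.match(sample) is not None} "
          f"alan={ALAN.match(sample) is not None}")
```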

Alan Moore