I am trying to apply negation in a regular expression in .NET, but it does not work. When the string contains a valid last name, the regex should not match; for an invalid last name, it should match. A valid name allows only letters, spaces, and single quotes, with a length between 1 and 40. Somebody suggested parsing the XML, but I don't want to do that. I know there is another way: remove the negation from the regex and invert the match condition in code. I don't want that either. I need a pure regex solution.

Here is my code. It matches the valid last name, but I don't want it to match.

string toBevalidated = @"<FirstName>SomeName</FirstName><LastName>Some</LastName><Address1>Addre1</Address1>";
var regex = new Regex(@"<LastName>([^a-zA-Z'\s])|(.{41,})</LastName>");
var match = regex.Match(toBevalidated);

// Check to see if a match was found
if (match.Success)
{
    Console.WriteLine("Success");
}
else
{
    Console.WriteLine("Failed");
}

EDIT: There is some confusion here, so let me give some examples of what I intended. When the last name is valid, the regex should not match. For example, the samples below should not match the regex:

case 1

<FirstName>SomeName</FirstName><LastName>Brian</LastName><Address1>Addre1</Address1>

case 2

<FirstName>SomeName</FirstName><LastName>O'neil</LastName><Address1>Addre1</Address1>

case 3

<FirstName>SomeName</FirstName><LastName>Peter John</LastName><Address1>Addre1</Address1>

When the last name is invalid, the regex should match:

case 4

<FirstName>SomeName</FirstName><LastName>Brian123</LastName><Address1>Addre1</Address1>

case 5

<FirstName>SomeName</FirstName><LastName>#Brian</LastName><Address1>Addre1</Address1>

case 6

<FirstName>SomeName</FirstName><LastName>BrianBrianBrianBrianBrianBrianBrianBrianBrianBrian</LastName><Address1>Addre1</Address1>

If you need more information, please let me know.

A: 

Try rewriting the regex as <LastName>([a-zA-Z'\s]{1,40})</LastName> so that it matches only valid names, and do the negation in code: if (!match.Success) ...
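
A minimal sketch of that idea, assuming the length limit of 1-40 stated in the question. The class and method names (LastNameCheck, IsInvalid) are made up for illustration and do not come from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

static class LastNameCheck
{
    // Matches only when the element content is 1-40 allowed characters
    // (letters, whitespace, single quote). The 1-40 range is an assumption
    // taken from the question's description of a valid name.
    static readonly Regex ValidLastName =
        new Regex(@"<LastName>[a-zA-Z'\s]{1,40}</LastName>");

    // Hypothetical helper: the negation lives in code, not in the pattern.
    public static bool IsInvalid(string xml)
    {
        return !ValidLastName.IsMatch(xml);
    }

    static void Main()
    {
        Console.WriteLine(IsInvalid("<LastName>O'neil</LastName>"));   // False
        Console.WriteLine(IsInvalid("<LastName>Brian123</LastName>")); // True
    }
}
```

Note that this sketch also reports an input as invalid when it contains no <LastName> element at all, which may or may not be what you want.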

SoftwareJonas
Are you sure this regex matches <LastName>Some123</LastName>?
amz
Robert Rossney
No, it does not match either of these expressions, since it is supposed to match only the allowed expressions; the rest must be done in code.
SoftwareJonas
+1  A: 

It would have been helpful if you'd given an example of this not behaving as you expected it to, but I suspect it's because you're only matching an invalid character if it's a single invalid character, e.g.

<LastName>5</LastName>

That will match (I believe; I haven't checked) but this won't:

<LastName>55</LastName>

I think you could do something like:

<LastName>(.*[^a-zA-Z'\s].*)|(.{41,})</LastName>

to ensure that there's at least one invalid character in there (or that there are 41 or more characters). But there may be corner cases here where that's inappropriate.

EDIT: Got it. Alternation has the lowest precedence, so the | operator was taking everything before it (including <LastName>) as one option, instead of just the preceding group. The final regular expression is:

<LastName>((.*[^a-zA-Z'\s].*)|(.{41,}))</LastName>

And here's some sample code:

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
        string pattern = @"<LastName>((.*[^a-zA-Z'\s].*)|(.{41,}))</LastName>";
        Regex regex = new Regex(pattern);

        string[] samples = {
            "<FirstName>SomeName</FirstName><LastName>Brian</LastName><Address1>Addre1</Address1>",
            "<FirstName>SomeName</FirstName><LastName>O'neil</LastName><Address1>Addre1</Address1>",
            "<FirstName>SomeName</FirstName><LastName>Peter John</LastName><Address1>Addre1</Address1>",
            "<FirstName>SomeName</FirstName><LastName>Brian123</LastName><Address1>Addre1</Address1>",                
            "<FirstName>SomeName</FirstName><LastName>#Brian</LastName><Address1>Addre1</Address1>",
            "<FirstName>SomeName</FirstName><LastName>BrianBrianBrianBrianBrianBrianBrianBrianBrianBrian</LastName><Address1>Addre1</Address1>",
        };

        foreach (var sample in samples)
        {
            bool valid = !regex.IsMatch(sample);
            Console.WriteLine("Valid: {0} Text: {1}", valid, sample);
        }
    }
}
Jon Skeet
@Jon I tried your regex like this: <LastName>(.*[^a-zA-Z'\s].*)|(.{41,})</LastName>. But it also matches a valid last name. The string I matched was <FirstName>SomeName</FirstName><LastName>brian</LastName><Address1>Addre1</Address1>.
amz
@amz: I'll have a look when I get home.
Jon Skeet
@amz: Fixed it now - have another look.
Jon Skeet
A: 

OK,

I couldn't get it to work in one pass, but I think it will work in two passes: first check for invalid characters, then check the length.

Match m = Regex.Match(@"<FirstName>SomeName</FirstName><LastName>Some</LastName><Address1>Addre1</Address1>", "<LastName>(.*[^a-zA-Z'\\s].*)</LastName>");

m = Regex.Match(@"<FirstName>SomeName</FirstName><LastName>SomeSomSomeSomeSomeSomeSomeSomeSomeSomeeSomeSomeSomeSomeSomeSomeSome</LastName><Address1>Addre1</Address1>", "<LastName>[a-zA-Z'\\s]{41,}</LastName>");

I haven't checked all the cases you provided; please try it and let me know if it works.

Thanks to Skeet for the correction: [^a-zA-Z'\s] does need .* before and after it, otherwise it won't match names containing special characters.

The second part of the combined regex pattern, the one that checks the length, matches everything, and that's why it does not work in a single pass.
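
The two passes could be wrapped up roughly like this. It is only a sketch: the two patterns are the ones shown above, while the class and method names (TwoPassLastNameCheck, IsInvalid) are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class TwoPassLastNameCheck
{
    // Pass 1: the last name contains at least one disallowed character.
    static readonly Regex BadCharacter =
        new Regex(@"<LastName>(.*[^a-zA-Z'\s].*)</LastName>");

    // Pass 2: the last name has only allowed characters but is 41 or more long.
    static readonly Regex TooLong =
        new Regex(@"<LastName>[a-zA-Z'\s]{41,}</LastName>");

    // Hypothetical helper: the length check runs only if pass 1 found nothing.
    public static bool IsInvalid(string xml)
    {
        return BadCharacter.IsMatch(xml) || TooLong.IsMatch(xml);
    }

    static void Main()
    {
        Console.WriteLine(IsInvalid("<LastName>Peter John</LastName>")); // False
        Console.WriteLine(IsInvalid("<LastName>#Brian</LastName>"));     // True
    }
}
```

Neither pass rejects an empty <LastName></LastName>, so the lower bound of 1 character would need a separate check.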

Good luck.

A_Nablsi
It is working. I will test it against more names and let you know. Thank you.
amz
The second expression did not work for a string that is more than 40 characters long and contains special characters: SomeSomeSomeSomeSomeSomeSomeSomeSomeSome#
amz
Of course it won't work alone. You have to check for special characters first using the first regex. Once the last name in the XML contains no special characters, you do the second pass and check the length using the second regex.
A_Nablsi
That makes sense.
amz
Does that solve the problem, or does it have to be one pass rather than two?
A_Nablsi
Two passes would be fine. I have to run this against a bunch of names and will get back to you soon. I will upvote and mark it as the answer.
amz
This also won't work for `<LastName a='b'>Foo</LastName>` or `<LastName xmlns=''>Foo</LastName>` or `<x:LastName>Foo</x:LastName>`, and it will incorrectly flag `<LastName>O'BrienO'BrienO'BrienO'Brien</LastName>` as being more than 41 characters.
Robert Rossney
This regex gave me a headache; I won't think about it anymore :)
A_Nablsi