I am using .NET. I want to match a last name that contains characters other than a-z, A-Z, space and single quote, or whose length is not between 1 and 40 characters. The string to be matched is XML that looks like this:

<FirstName>SomeName</FirstName><LastName>SomeLastName</LastName><Address1>Addre1</Address1>

I wrote a regular expression, but it matches only the valid case, `[a-zA-Z'.\s]{1,40}`: `<LastName>[a-zA-Z'.\s]{1,40}</LastName>` (EDIT: the `<LastName>` tag was missing from my original post). I want the negation of this expression. Is that possible, or should I take a different approach?
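To make the goal concrete, here is a minimal Python sketch of the "valid last name" filter as the question describes it. It assumes the host code only checks whether a search succeeds; Python's `search(...) is not None` stands in for .NET's `Match.Success`. The constructs used (character classes, `{m,n}` quantifiers) behave the same in Python's `re` module as in .NET. The sample strings are the question's own XML, once with a valid last name and once with a digit in it.

```python
import re

# The "valid last name" filter from the question: 1 to 40 letters,
# whitespace characters, apostrophes or periods between the tags.
valid_last_name = re.compile(r"<LastName>[a-zA-Z'.\s]{1,40}</LastName>")

ok_xml = "<FirstName>SomeName</FirstName><LastName>SomeLastName</LastName><Address1>Addre1</Address1>"
bad_xml = "<FirstName>SomeName</FirstName><LastName>brian6</LastName><Address1>Addre1</Address1>"

# search() plays the role of Regex.Match(...).Success in .NET.
print(valid_last_name.search(ok_xml) is not None)   # True
print(valid_last_name.search(bad_xml) is not None)  # False
```

The question asks for a single pattern whose result is the reverse of these two lines.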

A:

You can use negated character classes: `[^abc]` matches any character that is NOT a, b, or c. For your case, you might want `[^a-zA-Z'.\s]{1,40}`.
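A quick Python illustration of negated classes (the syntax is the same in .NET). The sample inputs `abcxyz`, `brian6` and `O'Brien` are made up for illustration.

```python
import re

# [^abc] matches a single character that is not a, b or c.
print(re.findall(r"[^abc]", "abcxyz"))  # ['x', 'y', 'z']

# Negating the question's class finds runs of disallowed characters.
disallowed = re.compile(r"[^a-zA-Z'.\s]{1,40}")
print(disallowed.findall("brian6"))   # ['6']
print(disallowed.findall("O'Brien"))  # []
```

Note that a negated class describes individual characters: `[^a-zA-Z'.\s]{1,40}` finds the `6` inside `brian6` rather than matching the whole value, so it is not the logical negation of `[a-zA-Z'.\s]{1,40}`.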

Since your data is in XML tags, you will probably want to extract the values from those tags first. XML and regular expressions don't always mix well.
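If the calling code can be changed (the asker later says it cannot), the extract-then-validate approach might look like this Python sketch using the standard library's `xml.etree.ElementTree`. The wrapper element `root` and the function name `last_name_is_invalid` are hypothetical; the wrapper is needed only because the sample fragment has several top-level elements.

```python
import re
import xml.etree.ElementTree as ET

VALID_LAST_NAME = re.compile(r"[a-zA-Z'.\s]{1,40}")

def last_name_is_invalid(fragment: str) -> bool:
    # The fragment has several top-level elements, so wrap it in a dummy
    # root element to make it a well-formed document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # findtext returns "" for an empty element and the default when missing.
    value = root.findtext("LastName", default="")
    # fullmatch: the WHOLE value must fit the allowed class and length.
    return VALID_LAST_NAME.fullmatch(value) is None

print(last_name_is_invalid("<FirstName>SomeName</FirstName><LastName>SomeLastName</LastName><Address1>Addre1</Address1>"))  # False
print(last_name_is_invalid("<FirstName>SomeName</FirstName><LastName>brian6</LastName><Address1>Addre1</Address1>"))  # True
```

Once the value is isolated, the length and character checks become a plain `fullmatch`, with no need to negate anything inside the pattern.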


If you absolutely must deal with the XML tags in the regex you could try something like this:

<FirstName>([^a-zA-Z'.\s]{1,40})</FirstName><LastName>([^a-zA-Z'.\s]{1,40})</LastName>

Capture group 1 will be the first name, capture group 2 will be the last name.
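A short Python check of the two capture groups. The sample values `1234` and `5678` are made up; they consist only of characters outside the class, which this pattern requires for every character between the tags.

```python
import re

both_names = re.compile(
    r"<FirstName>([^a-zA-Z'.\s]{1,40})</FirstName><LastName>([^a-zA-Z'.\s]{1,40})</LastName>"
)

m = both_names.search("<FirstName>1234</FirstName><LastName>5678</LastName><Address1>Addre1</Address1>")
print(m.group(1), m.group(2))  # 1234 5678

# A mixed value such as "brian6" is not matched: every character between
# the tags must come from the negated class.
print(both_names.search("<FirstName>1234</FirstName><LastName>brian6</LastName>"))  # None
```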


I misread the original question: if you want to match strings of MORE than 40 characters, the quantifier should be `{41,}`, not `{1,40}`. This ensures you only match strings with more than 40 characters.
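The two quantifiers side by side in a small Python sketch, applied to the question's character class; `{m,}` means "at least m repetitions, no upper bound". The test names (`"a" * 40`, `"a" * 41`) are made up for illustration.

```python
import re

# {1,40}: between 1 and 40 repetitions; {41,}: 41 or more.
in_range = re.compile(r"[a-zA-Z'.\s]{1,40}")
too_long = re.compile(r"[a-zA-Z'.\s]{41,}")

for n in (40, 41):
    name = "a" * n
    print(n, in_range.fullmatch(name) is not None, too_long.fullmatch(name) is not None)
# 40 True False
# 41 False True
```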

FrustratedWithFormsDesigner
It failed in cases where there are more than 40 characters.
amz
Due to code restrictions, I cannot parse the XML. Is it possible to apply the negation directly on the XML?
amz
If the XML will *remain* this simple, you could just find `<FirstName>`, `</FirstName>`, `<LastName>`, `</LastName>` and all the address stuff (if you're not interested in it), replace them with an empty string, and THEN do the regex matching.
FrustratedWithFormsDesigner
If I had the chance to make a lot of code changes, I would apply the if condition in the .NET code itself using the `Match.Success == false` property. But I want this implemented in the regex itself.
amz
@FrustratedWithFormsDesigner I tried your regex `<LastName>([^a-zA-Z'.\s]{41,})</LastName>` and it did not match a string whose last name contains a number: `<FirstName>SomeName</FirstName><LastName>brian6</LastName><Address1>Addre1</Address1>`.
amz
@amz: No, it will not. That's because none of the values in your tags match `[^a-zA-Z'.\s]{41,}`. The string "12345" will match (well, it would if it were at least 41 characters long). That is the negation of your original expression. What were you expecting it to match? Maybe you wanted something other than the simple negation of your original pattern?
FrustratedWithFormsDesigner
I got an answer in another thread: http://stackoverflow.com/questions/4044272/reg-ex-negation-not-working-in-xml-string
amz
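Why `<LastName>([^a-zA-Z'.\s]{41,})</LastName>` rejects `brian6` can be checked in a few lines of Python: the quantifier applies to the negated class, so it needs 41 or more consecutive characters that are all outside `[a-zA-Z'.\s]`. The `1`-based test strings are made up for illustration.

```python
import re

long_disallowed = re.compile(r"<LastName>([^a-zA-Z'.\s]{41,})</LastName>")

print(long_disallowed.search("<LastName>brian6</LastName>"))  # None
print(long_disallowed.search("<LastName>" + "1" * 41 + "</LastName>") is not None)  # True
# Periods and spaces belong to the excluded set, so they break the run.
print(long_disallowed.search("<LastName>" + "12. " * 11 + "</LastName>"))  # None
```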
A: 

Hi,

The negation character is `^`, placed immediately after the opening `[` of a character class. So your expression would read like the following:

[^a-zA-Z'\s]{1,40}

See Microsoft's documentation on negated character classes.
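A quick Python check of why the case of the shorthand matters inside a negated class: lower-case `\s` removes whitespace from the matched set, while upper-case `\S` ("any non-whitespace character") removes everything except whitespace. The sample string `Mc Donald-2` is made up for illustration.

```python
import re

# Lower-case \s inside a negated class excludes whitespace characters.
print(re.findall(r"[^a-zA-Z'\s]", "Mc Donald-2"))  # ['-', '2']

# Upper-case \S means "any non-whitespace", so negating it leaves only
# whitespace characters.
print(re.findall(r"[^a-zA-Z'\S]", "Mc Donald-2"))  # [' ']
```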

Enjoy

Doug
I thought the caret only had to be in there once, after the opening square bracket.
FrustratedWithFormsDesigner
@FrustratedWithFormsDesigner - Good catch, yes, you are correct. Thanks!
Doug
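The point about the caret can be confirmed in Python, which treats `^` inside a class the same way .NET does: it negates only as the first character after `[`, and is a literal character anywhere else. The sample string `a^bc` is made up.

```python
import re

# At the start of a class, ^ negates it...
print(re.findall(r"[^ab]", "a^bc"))  # ['^', 'c']
# ...anywhere else in the class it is an ordinary literal character.
print(re.findall(r"[a^b]", "a^bc"))  # ['a', '^', 'b']
```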
A: 

Try this pattern:

"<LastName>([^a-zA-Z'\s])|(.{41,})</LastName>"
A_Nablsi
It did not work for `<LastName>SomeName</LastName>`. The regex should not match that string.
amz
Yes, it clearly won't work for that, because this pattern matches a character that is not a-z, not A-Z, not ' and not whitespace, or any characters with a length > 40. That's what you said you need: the negation of a regex that matches English letters, quote and space with a length between 1 and 40.
A_Nablsi
If you don't include it in the LastName node, it will work for the test text you used, since it matches text longer than 40 characters. Try it now; I updated the pattern.
A_Nablsi
@A_Nablsi, it still matched '<FirstName>SomeName</FirstName><LastName>Some</LastName><Address1>Addre1</Address1>', which it should not, so it failed.
amz
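Why this pattern still matched the `Some` example can be seen in a short Python sketch (Python's `re` parses alternation the same way .NET does): `|` has the lowest precedence, so it splits the entire pattern into `<LastName>([^a-zA-Z'\s])` and `(.{41,})</LastName>` rather than splitting only the part between the tags.

```python
import re

nablsi = re.compile(r"<LastName>([^a-zA-Z'\s])|(.{41,})</LastName>")

xml = "<FirstName>SomeName</FirstName><LastName>Some</LastName><Address1>Addre1</Address1>"
m = nablsi.search(xml)

# The second alternative is not tied to <LastName>, so .{41,} runs over the
# preceding tags: the 45 characters before </LastName> satisfy it.
print(m.group(1))  # None
print(m.group(2))  # <FirstName>SomeName</FirstName><LastName>Some
```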
A: 

[EDIT] - Removed other stuff. Here's something that worked for all conditions (including empty) in my tests, including having the XML in the tested string.

/^(<LastName><\/LastName>)|(<LastName>.*[^a-zA-Z'\s]+.*<\/LastName>)|(<LastName>(.{41,})<\/LastName>)$/
Kevin Nelson
Yes, there are hundreds of filters already written. They all check `Match.Success == true` after applying the regex in .NET. In my situation I cannot change the code to check `Match.Success == false` for this one filter alone. That is why I want to implement the whole thing as a negation, without touching the .NET code.
amz
Somehow `<LastName>` was missing from my question. Please check the question again. I need this regex to be applied to the XML, not just to the extracted last name.
amz
I modified the regex like this: `<LastName>^([a-zA-Z'\s]*[^a-zA-Z'\s]+[a-zA-Z'\s]*)|([a-zA-Z'\s]{41,})</LastName>`, but it did not match a string that contains a number in the last name: `<FirstName>SomeName</FirstName><LastName>brian6</LastName><Address1>Addre1</Address1>`
amz
Okay, that should do it. I posted an expression above that also handles the XML being in the string.
Kevin Nelson
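The `/.../` around the pattern above are JavaScript/Perl-style delimiters, not part of the regex, so in .NET (and in this Python sketch) only the text between them is used. Testing that inner pattern against the question's full XML suggests a precedence caveat: without an enclosing group, `^` binds only to the first alternative and `$` only to the last, so when other elements surround `<LastName>`, only the middle alternative can match. The `grouped` pattern is a suggested adjustment, not part of the original answer; it keeps the answer's character class and uses `[^<]` so no alternative runs across neighbouring tags. The helper `sample` is hypothetical.

```python
import re

# Kevin Nelson's pattern, minus the /.../ delimiters.
kevin = re.compile(
    r"^(<LastName><\/LastName>)"
    r"|(<LastName>.*[^a-zA-Z'\s]+.*<\/LastName>)"
    r"|(<LastName>(.{41,})<\/LastName>)$"
)

def sample(last_name: str) -> str:
    # Hypothetical helper: the question's XML with a chosen last name.
    return ("<FirstName>SomeName</FirstName><LastName>" + last_name
            + "</LastName><Address1>Addre1</Address1>")

print(kevin.search(sample("brian6")) is not None)  # True
print(kevin.search(sample("")) is not None)        # False: ^ needs the string to start with <LastName>
print(kevin.search(sample("a" * 41)) is not None)  # False: $ needs </LastName> at the very end

# Adjusted sketch: alternatives grouped inside the tags; the trailing ?
# makes the group optional, which covers the empty case.
grouped = re.compile(r"<LastName>(?:[^<]*[^a-zA-Z'\s<][^<]*|[^<]{41,})?</LastName>")

print(grouped.search(sample("brian6")) is not None)        # True
print(grouped.search(sample("")) is not None)              # True
print(grouped.search(sample("a" * 41)) is not None)        # True
print(grouped.search(sample("SomeLastName")) is not None)  # False
```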
A: 

You seem to want to know how to negate a pattern match without using some "not"-type logic in the language, but placing it in the pattern match itself.

If that's what you really mean, all you need to do is convert your "regex" into "^(?:(?!regex).)*$".

The first is true of any string that contains "regex", and the second is true of any string that does not contain "regex".

I suppose if you want to be mindful of multilined input strings, that should be "\A(?:(?!regex)(?s).)*\z" just to be super-careful.
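A minimal Python sketch of this idiom, using the question's whole "valid last name" pattern as the inner regex. Two flavor differences are worth labelling: Python spells the end-of-string anchor `\Z` (where .NET uses `\z`), and Python 3.11+ rejects a global inline flag such as `(?s)` that is not at the start of the pattern, so the sketch passes `re.DOTALL` instead. The helper name `negate` is hypothetical.

```python
import re

def negate(inner: str) -> re.Pattern:
    # \A(?:(?!inner).)*\Z : consume the string one character at a time, but
    # only from positions where `inner` does not start a match. The whole
    # string matches only if `inner` matches nowhere in it.
    return re.compile(r"\A(?:(?!" + inner + r").)*\Z", re.DOTALL)

valid = r"<LastName>[a-zA-Z'.\s]{1,40}</LastName>"
invalid_last_name = negate(valid)

ok_xml = "<FirstName>SomeName</FirstName><LastName>SomeLastName</LastName><Address1>Addre1</Address1>"
bad_xml = "<FirstName>SomeName</FirstName><LastName>brian6</LastName><Address1>Addre1</Address1>"

print(invalid_last_name.search(ok_xml) is not None)   # False
print(invalid_last_name.search(bad_xml) is not None)  # True
```

Because the result is a plain match/no-match, this fits a caller that only checks `Match.Success == true`. One caveat of the idiom: a string with no `<LastName>` element at all also matches.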

tchrist
@tchrist I tried your regex like this: <LastName>^(?:(?!([a-zA-Z'.\s]{1,40})).)*$</LastName>. But it did not match the string that contains a number in the last name: <FirstName>SomeName</FirstName><LastName>brian6</LastName><Address1>Addre1</Address1>
amz
@amz that's not right. You've misunderstood. Of course it didn't match, you have whole-string anchors in the middle of the pattern. Your character class is all wrong. You have to say what you do not want, not what you do. If you don't want a number, match what you want and then look for whether there's a number there. I'm afraid that complex regex constructs are a bit complicated for where you are right now on the learning path.
tchrist
I got an answer in another thread: http://stackoverflow.com/questions/4044272/reg-ex-negation-not-working-in-xml-string
amz
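The anchor problem tchrist points out can be reproduced in Python (same `^`/`$` semantics as .NET when no multiline option is set): `^` and `$` placed after `<LastName>` and before `</LastName>` can never succeed inside a one-line string. The lookahead pattern that follows is an illustration of "match what you want, then check", not a pattern posted in this thread: a negative lookahead right after the opening tag fails exactly when 1 to 40 allowed characters and the closing tag follow.

```python
import re

xml = "<FirstName>SomeName</FirstName><LastName>brian6</LastName><Address1>Addre1</Address1>"

# amz's attempt: ^ and $ are whole-string anchors here, so they cannot
# succeed right after "<LastName>" or right before "</LastName>".
attempt = re.compile(r"<LastName>^(?:(?!([a-zA-Z'.\s]{1,40})).)*$</LastName>")
print(attempt.search(xml))  # None

# Negative lookahead after the opening tag: matches only when the element
# content is NOT 1-40 allowed characters followed by </LastName>.
invalid_last_name = re.compile(r"<LastName>(?![a-zA-Z'.\s]{1,40}</LastName>)")
print(invalid_last_name.search(xml) is not None)                                  # True
print(invalid_last_name.search("<LastName>SomeLastName</LastName>") is not None)  # False
```

Empty and over-long values are also flagged, since `{1,40}` then cannot reach the closing tag.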