I have a regular expression validator for email addresses in .NET 2.0 that uses client-side validation (JavaScript).

The current expression is "\w+([-+.']\w+)@\w+([-.]\w+).\w+([-.]\w+)" which works for my needs (or so I thought).

However, I ran into a problem with apostrophes when I copied and pasted an email address from Outlook into the form's text field:

Chris.O’[email protected]

You can see the apostrophe is a different character from what I get if I just type into a text box:

' vs ’ - but both are apostrophes
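To make the difference between the two characters concrete, here is a minimal sketch that prints their code points. It assumes the pasted character is U+2019 RIGHT SINGLE QUOTATION MARK; `codePointLabel` is a hypothetical helper written only for this illustration.

```javascript
// The apostrophe typed on a keyboard and the one pasted from Outlook
// (assumed here to be U+2019) look similar but are different characters.
const typed = "'";        // U+0027 APOSTROPHE
const pasted = "\u2019";  // U+2019 RIGHT SINGLE QUOTATION MARK

// Hypothetical helper: format a character's code point as U+XXXX.
function codePointLabel(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

console.log(codePointLabel(typed));   // U+0027
console.log(codePointLabel(pasted));  // U+2019
console.log(typed === pasted);        // false
```

A character class that lists only `'` will therefore not match the pasted character.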

Okay, I thought, let's just add this character to the validation string, so I get:

"\w+([-+.'’]\w+)@\w+([-.]\w+).\w+([-.]\w+)"

I copy and paste the "special" apostrophe into the validation expression, then type the email address and paste the apostrophe from the same clipboard item, but the validation still fails.

The apostrophe doesn't look the same in the .NET code-behind file as it does in the .NET form, and because the validation is still failing, I presume it is being treated as a different character because of some sort of encoding of the .cs source file.

Does this sound plausible? Has anyone else encountered the same problem?

Thanks

+1  A: 

You should add a '+' after ([-+.'`]\w+), to allow for multiple groups of 'words'. The expression you gave only allows for two words, and you have three: Chris, O, Brian.

Hope this makes things clearer.
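The effect of the quantifier can be sketched as follows. This is only an illustration under stated assumptions: the patterns cover just the local part of the address, the `^` anchor is added for the test, and `example.com` is a made-up domain.

```javascript
// Local-part patterns only; the domain part is left out to isolate the quantifier.
const single = /^\w+([-+.']\w+)@/;    // exactly one separator group
const repeated = /^\w+([-+.']\w+)+@/; // one or more separator groups
const optional = /^\w+([-+.']\w+)*@/; // zero or more separator groups

const address = "Chris.O'Brian@example.com"; // hypothetical address, three words

console.log(single.test(address));    // false: only "Chris" and "O" fit
console.log(repeated.test(address));  // true: ".O" and "'Brian" both match the group

// With +, an address without any separator is rejected; * accepts it.
console.log(repeated.test("chris@example.com")); // false
console.log(optional.test("chris@example.com")); // true
```

Whether `+` or `*` is the better choice depends on whether single-word local parts should be accepted.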

Eric Minkes
+1  A: 

Applications like Outlook tend to use 'Smart Quotes', which replace the plain apostrophe with a typographic one.

Here's some background information

pavium
A: 

In XML you could test the value of an apostrophe character by evaluating it against its character entity reference:

&apos;

That entity does not exist in the SGML form of HTML, however. And as an added bonus, JavaScript cannot compare a single quote to a double quote: when compared, they evaluate to true. The only solution there is to convert single quote and double quote characters to a character entity reference of your own invention, perform the comparison, and then replace those invented entity references with the proper quote characters.

+1  A: 

If you just pasted the ’ (U+2019 RIGHT SINGLE QUOTATION MARK) into your document and it didn't work, it means that your document does not use Unicode.

When you encode and send the file as UTF-8 (for example), it works just fine without further modifications. Otherwise you have to escape it via \u2019, which also works in JavaScript's regular expressions:

"\w+([-+.'\u2019]\w+)@\w+([-.]\w+).\w+([-.]\w+)"
gix
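
As a rough sketch of the `\u2019` escape in client-side code, the snippet below tests one address written with each apostrophe. The `*` quantifiers, the escaped dot before the top-level domain, the `^`/`$` anchors, and the `example.com` address are assumptions made for this illustration and are not part of the pattern as posted.

```javascript
// Writing U+2019 as an escape keeps the pattern independent of the
// encoding in which the source file is saved or served.
const withEscape = /^\w+([-+.'\u2019]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;
const withoutEscape = /^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/;

const typed = "Chris.O'Brian@example.com";       // hypothetical address, U+0027
const pasted = "Chris.O\u2019Brian@example.com"; // same address with U+2019

console.log(withEscape.test(typed));      // true
console.log(withEscape.test(pasted));     // true
console.log(withoutEscape.test(pasted));  // false
```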