
Hi

I made an XML Schema that contains this element:

<xs:element name="Email">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Some of the email addresses in one of my XML documents fail validation, and I get this error:

'Email' element is invalid - The value '[email protected]' is invalid according to its datatype 'String' - The Pattern constraint failed. LineNumber: 15404 LinePostion: 32

Comparing the addresses that passed with the ones that failed, I noticed that every failing address contains an underscore (_). I'm not sure whether that is the cause.

Edit

So I changed my regex to this:

 <xs:pattern value="[\w_]+([-+.'][\w_]+)*@[\w_]+([-.][\w_]+)*\.[\w_]+([-.][\w_]+)*"/>

It now works, but I don't understand why \w does not match the underscore.

A: 

That could very well be the cause: your regex won't recognize an email address with an underscore.

Check out this topic: http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses

I have it bookmarked because it's so useful.

NinjaCat
Actually, you guys are right, \w should match underscores.
NinjaCat
A: 

Yes. Your pattern does not match the underscore character. Just try adding it:

\w+([-+.'_]\w+)*...
relet
A:

Something is odd, because \w normally matches underscores. Try adding _ wherever you expect an underscore, by changing those \w occurrences to [\w_].

orangeoctopus
Hmm, this seems to work. I don't understand why \w isn't matching them.
chobo2
A: 

Something is indeed strange: the \w character class includes underscores (as Rubular shows), so your address should validate. Is it possible there's another problem, such as a stray space? A separate issue is that no regular expression correctly accepts every valid email address and rejects everything else; this Stack Overflow question has a good answer. There may be a better way to validate email addresses than this schema regex.

Antal S-Z
Hmm, I don't think there are any stray spaces (none that I can see). I added "_" to the pattern and it works (see my edit).
chobo2
A:

From the XML Schema reference on regular expressions:

\w - Any character that might appear in a word. A shortcut for '[#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}]' (all characters except the set of "punctuation", "separator", and "other" characters).

In Unicode, the underscore is 'LOW LINE' (U+005F), general category Punctuation, Connector [Pc].

So the underscore falls outside \w, because XML Schema defines its character classes strictly in terms of Unicode categories.
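To see the difference concretely, here is a minimal Python sketch that approximates XML Schema's \w by looking up each character's Unicode general category with the standard `unicodedata` module. The helper name `xsd_word_char` is hypothetical, and Python's Unicode tables may not be the exact version your validator uses. For contrast, it also checks Python's own `re` \w, which (like the Rubular test mentioned earlier) does include the underscore.

```python
import re
import unicodedata

def xsd_word_char(ch: str) -> bool:
    """Approximate XML Schema's \\w for a single character.

    Per the definition quoted above, \\w is every code point except the
    Unicode categories P* (punctuation), Z* (separator) and C* (other).
    """
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")

for ch in ["a", "7", "é", "+", "_", "-", "@"]:
    xsd = xsd_word_char(ch)
    python_w = re.fullmatch(r"\w", ch) is not None  # Python's own \w
    print(f"{ch!r}: category={unicodedata.category(ch)} xsd_w={xsd} python_w={python_w}")
```

The underscore row is the key one: its category is Pc, so it is excluded from XML Schema's \w even though Python's \w accepts it. Note also that symbols such as '+' (category Sm) fall inside XML Schema's \w, which Perl-style engines do not allow.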

But for an email regex you should use strict ASCII, such as [0-9A-Za-z_-], instead of \w (I bet an email address with non-Latin characters is invalid :) ). Better still, find a proven regex, or check the RFC for the proper email format.
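A minimal Python sketch of that advice, assuming you keep the asker's overall structure and only swap \w for an explicit ASCII class (the hyphen is already covered by the separator groups). `WORD`, `EMAIL_PATTERN` and `matches_like_xsd` are hypothetical names, and this is still a pragmatic filter, not a full RFC grammar. Because XML Schema patterns are implicitly anchored to the whole value, `re.fullmatch` is used rather than `re.search`.

```python
import re

# Explicit ASCII "word" class standing in for \w (an assumption: adjust
# the allowed characters to your own policy).
WORD = "[0-9A-Za-z_]"

# Same shape as the revised schema pattern: local part, "@", then a
# domain that must contain at least one dot.
EMAIL_PATTERN = re.compile(
    rf"{WORD}+([-+.']{WORD}+)*@{WORD}+([-.]{WORD}+)*\.{WORD}+([-.]{WORD}+)*"
)

def matches_like_xsd(value: str) -> bool:
    # XML Schema patterns must match the entire value, so use fullmatch().
    return EMAIL_PATTERN.fullmatch(value) is not None

for addr in ["first_last@example.com", "o'neil+tag@mail.example.co.uk",
             "user@localhost", "user name@example.com", "ünïcode@example.com"]:
    print(addr, matches_like_xsd(addr))
```

The same character class can be written directly into the xs:pattern value in place of each \w, which keeps the schema's behaviour independent of how a given validator interprets Unicode categories.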

mykhal