Hi,

We are using JAXB 1.0.6 (the project started on JDK 1.4, and this is the last compatible version) to create XML files from an XSD specification. The XSD defines an attribute "email" with the following regexp pattern:

<xs:simpleType name="EmailAddress">
 <xs:restriction base="xs:string">
  <xs:minLength value="0"/>
  <xs:maxLength value="60"/>
  <xs:pattern value="([\w%\.\-]+@[\w%\.\-]+\.[a-zA-Z]{2,6})?"/>
 </xs:restriction>
</xs:simpleType>

If you try to enter an email address such as [email protected], validation fails with the following message:

    attribute "email" has a bad value: the value does not match the regular expression "([\w%\.\-]+@[\w%\.\-]+\.[a-zA-Z]{2,6})?"

IMHO, the character class \w is equivalent to [a-zA-Z0-9_].

So [email protected] should satisfy the expression. If you leave out the underscore, validation passes. Why is this happening?

Regards

+1  A: 

Hmm. Why do you expect the \w to be equivalent to [a-zA-Z0-9_]? Have you tried replacing the \w with the expression?

At first glance, the XML Schema specification (search for \w) defines \w as

all characters except the set of "punctuation", "separator" and "other" characters [as defined by Unicode]

Unicode classifies the underscore as connector punctuation (category Pc, which is part of \p{P}; search for \p{P} in the linked document), so the XSD \w does not match it.
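The difference can be reproduced in plain Java. This is only a minimal sketch, not what JAXB does internally. It assumes that the XSD \w (everything except \p{P}, \p{Z} and \p{C}) can be approximated in java.util.regex by the remaining Unicode categories [\p{L}\p{M}\p{N}\p{S}]. The class name, the helper methods and the address first_last@example.com are made up for illustration.

```java
import java.util.regex.Pattern;

public class XsdWordClassDemo {
    // Java's default \w is [a-zA-Z_0-9], so it includes the underscore.
    public static final String JAVA_WORD = "\\w";

    // Assumed approximation of the XSD \w: letters, marks, numbers and symbols,
    // i.e. everything except punctuation (P), separators (Z) and "other" (C).
    public static final String XSD_WORD = "\\p{L}\\p{M}\\p{N}\\p{S}";

    // Builds the email pattern from the schema around the given word-class body.
    public static String emailPattern(String wordClassBody) {
        String cls = "[" + wordClassBody + "%.\\-]+";
        return "(" + cls + "@" + cls + "\\.[a-zA-Z]{2,6})?";
    }

    // True if the whole input matches the email pattern built from wordClassBody.
    public static boolean matchesEmail(String wordClassBody, String input) {
        return Pattern.matches(emailPattern(wordClassBody), input);
    }

    public static void main(String[] args) {
        String withUnderscore = "first_last@example.com"; // hypothetical address
        String withoutUnderscore = "firstlast@example.com"; // hypothetical address

        System.out.println(matchesEmail(JAVA_WORD, withUnderscore));      // true
        System.out.println(matchesEmail(XSD_WORD, withUnderscore));       // false
        System.out.println(matchesEmail(XSD_WORD, withoutUnderscore));    // true
        // Adding the underscore explicitly to the class makes the address pass again.
        System.out.println(matchesEmail(XSD_WORD + "_", withUnderscore)); // true
    }
}
```

If this reading of the specification is right, writing [\w_%\.\-] instead of [\w%\.\-] in the schema pattern should let addresses with an underscore through.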

Grzegorz Oledzki
I was naive enough to believe that regexp implementations are identical everywhere. Now I see that \w in an XSD pattern has a different meaning than in Java. Thanks.
huo73