Hi,

I have the following regex type in my xsd file:

<xsd:simpleType name="Host">
    <xsd:restriction base="xsd:string">
        <xsd:pattern
            value="\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b">
        </xsd:pattern>
    </xsd:restriction>
</xsd:simpleType>

When I generate classes from this schema with xjc (run from Ant), I get the following error:

  [xjc] [ERROR] InvalidRegex: Pattern value '\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b' is not a valid regular expression. The reported error was: 'This expression is not supported in the current option setting.' at column '2'.
  [xjc]   line 10 of file:/.../src/META-INF/portscan.xsd

I can get past this by changing every backslash (\) to a double backslash (\\):

<xsd:simpleType name="Host">
    <xsd:restriction base="xsd:string">
        <xsd:pattern
            value="\\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b">
        </xsd:pattern>
    </xsd:restriction>
</xsd:simpleType>

But then, when the validation runs during the marshalling I am getting the following exception:

Caused by - class org.xml.sax.SAXParseException: cvc-pattern-valid: Value '80.245.120.45' is not facet-valid with respect to pattern '\\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b' for type 'Host'.

Obviously, the double backslash (\\) is what makes the validation fail. But how can I encode a single backslash so that xjc accepts it?

Edit:

Ah well, I found the answer: it seems "\b" isn't supported in the regexes xjc accepts. Leaving the word boundaries out fixed the issue; the classes are now generated without error and validation seems to work at runtime. Yay! :)

Though does anyone know if this is secure without the word boundaries? Maybe there's an alternative?

A: 

The regex flavor defined in the XML Schema specification does not support word boundaries.

In your case, the word boundaries are not needed. Pattern facets in XML schema types always require the regular expression to match the entire string, as if the regex started with a start-of-string anchor ^ or \A and ended with an end-of-string anchor $ or \z. Because XML schema regexes always match the whole string, you cannot use these anchors in your regexes either.

Jan Goyvaerts
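
To illustrate the point above, here is a minimal sketch that validates a few values against the Host pattern with the \b removed, using the JDK's javax.xml.validation API. The class name HostPatternDemo, the wrapper element host, and the helper isValidHost are made up for this example and are not part of the original schema; the pattern itself is the one from the question minus the word boundaries.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class HostPatternDemo {
    // Octet expression copied from the question.
    static final String OCTET = "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // Inline schema: the Host type without \b, plus a hypothetical
    // <host> element so there is something to validate.
    // "\\." is Java string escaping; the schema sees a single \.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='host' type='Host'/>"
      + "<xsd:simpleType name='Host'>"
      + "<xsd:restriction base='xsd:string'>"
      + "<xsd:pattern value='"
      + OCTET + "\\." + OCTET + "\\." + OCTET + "\\." + OCTET + "'/>"
      + "</xsd:restriction>"
      + "</xsd:simpleType>"
      + "</xsd:schema>";

    // Returns true if <host>value</host> is valid against the schema.
    static boolean isValidHost(String value) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            String doc = "<host>" + value + "</host>";
            schema.newValidator().validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException | IOException e) {
            // Validation errors (including cvc-pattern-valid) end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "80.245.120.45", "x80.245.120.45", "80.245.120.45 extra", "256.1.1.1"
        };
        for (String s : samples) {
            System.out.println("'" + s + "' valid: " + isValidHost(s));
        }
    }
}
```

Only the first sample should be accepted. The values with extra leading or trailing text are rejected even though the pattern has no \b, ^ or $, because the pattern facet has to match the whole value.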