I'm using Python's lxml to validate XML documents against a schema. I have a schema with an element:

<xs:element name="link-url" type="xs:anyURL"/>

and I test it against, for example, this fragment of an XML document:

<a link-url="server/path"/>

I would like this test to fail because the link-url value doesn't start with http://. I tried switching anyURI to anyURL, but that raises an exception because anyURL is not a valid type.

Is this possible with lxml? Is it possible at all with schema validation?

+1  A: 

(I'm pretty sure xs:anyURL is not valid. The XML Schema standard calls it anyURI. And since link-url is an attribute, shouldn't you be using xs:attribute instead of xs:element?)

You could restrict the URIs by defining a new simpleType based on xs:anyURI and adding a pattern restriction to it. For example,

#!/usr/bin/env python2.6

from lxml import etree
from StringIO import StringIO

schema_doc = etree.parse(StringIO('''
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:simpleType name="httpURL">
        <xs:restriction base="xs:anyURI">
            <xs:pattern value='https?://.+'/>
            <!-- accepts only http:// or https:// URIs. -->
        </xs:restriction>
    </xs:simpleType>

    <xs:element name="a">
        <xs:complexType>
            <xs:attribute name="link-url" type="httpURL"/>
        </xs:complexType>
    </xs:element>
    </xs:schema>
''')) #/
schema = etree.XMLSchema(schema_doc)

schema.assertValid(etree.parse(StringIO('<a link-url="http://sd" />')))
assert not schema(etree.parse(StringIO('<a link-url="server/path" />')))
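
For anyone on Python 3, here is a rough sketch of the same check, assuming lxml is installed. It also reads the validator's error_log so you can see why a document was rejected. The helper name check is made up for this illustration and is not part of lxml.

```python
from lxml import etree

# Same schema as above: an anyURI restricted to http:// or https:// values.
SCHEMA = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="httpURL">
    <xs:restriction base="xs:anyURI">
      <xs:pattern value="https?://.+"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="a">
    <xs:complexType>
      <xs:attribute name="link-url" type="httpURL"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(SCHEMA))


def check(xml_bytes):
    """Hypothetical helper: return (is_valid, error messages) for one document."""
    doc = etree.fromstring(xml_bytes)
    is_valid = schema.validate(doc)  # returns a bool instead of raising
    messages = [entry.message for entry in schema.error_log]
    return is_valid, messages


good_ok, good_errors = check(b'<a link-url="http://sd"/>')
bad_ok, bad_errors = check(b'<a link-url="server/path"/>')

print(good_ok, good_errors)
print(bad_ok, bad_errors)
```

An xs:pattern has to match the whole attribute value, so no explicit anchors are needed. If https:// should be rejected as well, changing the pattern to http://.+ should do it.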
KennyTM