Hi all, I am new to Python and I'm trying to parse an XML file with SAX without validating it.

The head of my XML file is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE n:document SYSTEM "schema.dtd">
<n:document....

and I've tried to parse it with Python 2.5.2:

from xml.sax import make_parser, handler
import sys

parser = make_parser()
parser.setFeature(handler.feature_namespaces,True)
parser.setFeature(handler.feature_validation,False)
parser.setContentHandler(handler.ContentHandler())
parser.parse(sys.argv[1])

but I got an error:

python doc.py document.xml
(...)
  File "/usr/lib/python2.5/urllib2.py", line 244, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: schema.dtd

I don't want the SAX parser to look for the DTD at all. What am I doing wrong? Thanks!

A:

expatreader treats the DTD external subset as an external general entity, so turning off validation alone does not stop it from trying to fetch schema.dtd. The feature you want to turn off is:

parser.setFeature(handler.feature_external_ges, False)

However, it's a bit dodgy to point the DTD external subset at a nonexistent URL; as this shows, it's not only validating parsers that read it.
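For reference, here is a minimal sketch of the script from the question with that feature switched off. The wrapper function `parse_without_dtd` and its optional `content_handler` argument are names made up for this illustration, not part of xml.sax. It is written so that it should also run on Python 3, where (as far as I know) external general entities are already off by default since 3.7.1, so the extra setFeature call is harmless there.

```python
import sys
from xml.sax import make_parser, handler


def parse_without_dtd(source, content_handler=None):
    """Parse `source` (a file name or file-like object) without fetching the DTD.

    `parse_without_dtd` is a hypothetical helper name used only for this sketch.
    """
    parser = make_parser()
    parser.setFeature(handler.feature_namespaces, True)
    parser.setFeature(handler.feature_validation, False)
    # expatreader loads the DTD external subset as an external general
    # entity, so this is the switch that stops the lookup of schema.dtd.
    parser.setFeature(handler.feature_external_ges, False)
    parser.setContentHandler(content_handler or handler.ContentHandler())
    parser.parse(source)


if __name__ == "__main__" and len(sys.argv) > 1:
    parse_without_dtd(sys.argv[1])
```

Run it the same way as before, e.g. python doc.py document.xml. Note that with the external subset skipped, any entities or default attribute values declared only in schema.dtd will not be available to the parser.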

bobince