I'm using a separate .dtd file as the external DTD for my custom XML file:

names.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE names SYSTEM "names.dtd">
<names>
    <name>
        <text>Pep&eacute;</text>
        <creator>&lost;</creator>
        <history>&lost;</history>
    </name>
    <name>
        <text>Charles</text>
        <creator>James</creator>
        <history>&lost;</history>
    </name>
</names>

names.dtd

<!ELEMENT names (name+)>
<!ELEMENT name (text, creator+, history)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT creator (#PCDATA)>
<!ELEMENT history (#PCDATA)>

<!-- Placeholder/unknown history or creator name -->
<!ENTITY lost "Lost in the depths of time.">
<!ENTITY eacute "é">

However, when I load names.xml I get the following error:

XML Parsing Error: undefined entity
Location: http://localhost/.../names.xml
Line Number 5, Column 18:

<text>Pep&eacute;</text>
---------^

Just for clarification: names.xml and names.dtd are in the same directory, and using http://localhost/.../names.dtd as the system identifier doesn't work either.

However, it does work when I put the <!ENTITY declarations inside a <!DOCTYPE internal subset in names.xml. Can anyone advise on this?
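For comparison, here is a sketch of the internal-subset arrangement described above, assuming the declarations are simply copied from names.dtd into the DOCTYPE of names.xml. The `<!ELEMENT names (name+)>` declaration is an assumption (the original DTD never declares the root element), and the second <name> is omitted for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Internal DTD subset: the declarations live inside the document itself,
     so a parser that never fetches an external DTD still sees the entities -->
<!DOCTYPE names [
  <!ELEMENT names (name+)>
  <!ELEMENT name (text, creator+, history)>
  <!ELEMENT text (#PCDATA)>
  <!ELEMENT creator (#PCDATA)>
  <!ELEMENT history (#PCDATA)>
  <!ENTITY lost "Lost in the depths of time.">
  <!ENTITY eacute "&#233;">
]>
<names>
    <name>
        <text>Pep&eacute;</text>
        <creator>&lost;</creator>
        <history>&lost;</history>
    </name>
</names>
```

The difference is only where the declarations live: a parser must read the internal subset, but it is allowed to skip the external one.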

+2  A: 

Firefox does not load external DTDs (nor does Safari; it appears that no browser does). Your DTD and XML work fine in xmllint if I tell it to load external DTDs:

$ xmllint --loaddtd names.xml 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE names SYSTEM "names.dtd">
<names>
    <name>
        <text>Pep&eacute;</text>
        <creator>&lost;</creator>
        <history>&lost;</history>
    </name>
    <name>
        <text>Charles</text>
        <creator>James</creator>
        <history>&lost;</history>
    </name>
</names>

Edit: As hsivonen points out in the comments, having browsers load external DTDs to resolve the entities declared in them is a bad idea. You should generally not use DOCTYPEs or DTDs on the web. If you want to validate a document, use a separate schema (RELAX NG is recommended for this purpose) rather than a DTD referenced from the document itself.
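To illustrate the RELAX NG suggestion, here is a minimal sketch of a schema in RELAX NG's XML syntax that mirrors the element declarations in names.dtd. The file name names.rng and the requirement of at least one <name> are assumptions, since the original DTD never declares the <names> root. RELAX NG has no equivalent of entity declarations, so &lost; and &eacute; would have to be written out in the instance document (as literal text, or as the character reference &#233;).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- names.rng (hypothetical file name): same structure as names.dtd -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <!-- Root element: assumed to contain one or more <name> children -->
    <element name="names">
      <oneOrMore>
        <ref name="name"/>
      </oneOrMore>
    </element>
  </start>
  <define name="name">
    <!-- Children of an element pattern form a sequence, like (text, creator+, history) -->
    <element name="name">
      <element name="text"><text/></element>
      <oneOrMore>
        <element name="creator"><text/></element>
      </oneOrMore>
      <element name="history"><text/></element>
    </element>
  </define>
</grammar>
```

With libxml2 installed, a document could then be checked with `xmllint --noout --relaxng names.rng names.xml`, keeping validation out of the document and out of the browser.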

Brian Campbell
It would be a very bad idea for browsers to load DTDs: http://hsivonen.iki.fi/no-dtd/
hsivonen
Yes, you're right. I was wondering if any of them loaded even local DTDs. Good reference on why DTDs are a bad idea, though.
Brian Campbell
@hsivonen Updated my answer to include info on why DTDs are a bad idea; thanks for the good article on that.
Brian Campbell
+2  A: 

If you're opening the document in Firefox to find out whether your DTD is correct, don't. Firefox doesn't pass the XML and DTD through a proper XML parser. Open your XML document in IE instead, which will pass it through the MSXML parser.

When you open the XML document in IE, it throws an error saying your DTD uses invalid characters. You need to use the numeric character reference for é (&#233;) rather than the character itself. Here is the code I got to work:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE names SYSTEM "names.dtd">
<names>
    <name>
        <text>Pep&eacute;</text>
        <creator>&lost;</creator>
        <history>&lost;</history>
    </name>
    <name>
        <text>Charles</text>
        <creator>James</creator>
        <history>&lost;</history>
    </name>
</names>

and

<!ELEMENT names (name+)>
<!ELEMENT name (text, creator+, history)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT creator (#PCDATA)>
<!ELEMENT history (#PCDATA)>

<!ENTITY lost "Lost in the depths of time.">
<!ENTITY eacute "&#233;">
Justin Niessner
You can use the ‘é’ character directly in the external DTD subset if it is encoded correctly. By default it should be in UTF-8; you can change this by including a “text declaration” at the top of the .dtd with a different ‘encoding’. (A text declaration is basically the same as the <?xml?> declaration.)
bobince
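A sketch of what the comment above describes: names.dtd starting with a text declaration and using the literal é. This assumes the file really is saved as UTF-8; if it were saved as ISO-8859-1, the encoding pseudo-attribute would have to say so instead. The `names` element declaration is again an added assumption.

```xml
<?xml encoding="UTF-8"?>
<!-- Text declaration above: in an external DTD the version is optional,
     but the encoding must be given -->
<!ELEMENT names (name+)>
<!ELEMENT name (text, creator+, history)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT creator (#PCDATA)>
<!ELEMENT history (#PCDATA)>

<!-- Placeholder/unknown history or creator name -->
<!ENTITY lost "Lost in the depths of time.">
<!-- Literal character is fine as long as the bytes match the declared encoding -->
<!ENTITY eacute "é">
```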
Incidentally, it is legal for an XML parser not to include external references such as the DTD external subset, and it's a good thing browsers don't allow it on web pages, as it could enable cross-site scripting. What happens with undeclared entity references in this case is implementation-defined.
bobince
Correction: Firefox uses a proper XML parser but the entity resolver (the thing that resolves system ids into byte streams) has been hacked to resolve external DTDs to zero-length streams.
hsivonen
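Since browsers will not read the external subset, one way to sidestep the problem is to drop the DOCTYPE and avoid custom entities altogether. A numeric character reference such as &#233; is resolved by every conforming parser without any DTD, while the placeholder text that &lost; expanded to has to be written out literally. A sketch of names.xml along those lines:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- No DOCTYPE: without a DTD only numeric character references and the
     five predefined entities (lt, gt, amp, apos, quot) are available -->
<names>
    <name>
        <text>Pep&#233;</text>
        <creator>Lost in the depths of time.</creator>
        <history>Lost in the depths of time.</history>
    </name>
    <name>
        <text>Charles</text>
        <creator>James</creator>
        <history>Lost in the depths of time.</history>
    </name>
</names>
```

The trade-off is repetition of the placeholder string; if that matters, it could be handled by the application reading the file (for example, an empty <creator/> meaning "unknown") rather than by the parser.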