I have an XML file that gets bulk-loaded into a database. At the top of the XML there is a DOCTYPE declaration for NEWFILE:

<!DOCTYPE NEWFILE SYSTEM "XXX_OUT_1234_YYMMDD_00.dtd">

What I'd like to know is what it is used for and whether it is needed there at all. The data gets loaded into SQL with a C# bulk uploader, using a schema and the filename (the basics are below; there are quite a few more steps, but I don't think they are relevant):

SQLXMLBulkLoad3Class objBL = new SQLXMLBulkLoad3Class();
objBL.Execute(schema, filename);

The schema file has a name like XXX_OUT_1234_090700_06.xsd, similar to the file name in the DOCTYPE.

The reason I ask is that the DOCTYPE is going to be removed, and I am not sure whether it really has a use. I've looked around, but DOCTYPEs are mostly discussed in the context of websites (this is an old Windows Forms app). Most of the information I find is like the quote below, and I'd like to know what the DOCTYPE does in this case.

Validating against a DTD is straight forward if the piece of XML contains a DOCTYPE declaration with a SYSTEM identifier that can be resolved at validation time. Simply create a Validator object using one of the single argument constructors.

+3  A: 

Good question. Mostly people simply ignore the actual content of the DOCTYPE statement :)

The (basic) syntax of a document type declaration is

<!DOCTYPE root-element PUBLIC "publicID" "systemID">

The public identifier is optional, so you can also say (the SYSTEM keyword appears only in this form):

<!DOCTYPE root-element SYSTEM "systemID">

In both of these, the token following DOCTYPE is the name of the root element of the XML or SGML document containing the declaration. So:

<!DOCTYPE NEWFILE SYSTEM "XXX_OUT_1234_YYMMDD_00.dtd">
<NEWFILE>
...
</NEWFILE>
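To see those pieces from code, here is a minimal C# sketch that reads the DOCTYPE name, the SYSTEM identifier and the root element name from an XML string. The class and method names (DoctypeInfo, Describe) are made up for illustration. Setting XmlResolver to null is an assumption made so that the parser does not try to fetch the external DTD while loading.

```csharp
using System;
using System.Xml;

static class DoctypeInfo
{
    // Hypothetical helper: summarizes the DOCTYPE of an XML string.
    public static string Describe(string xml)
    {
        // XmlResolver = null: do not try to load the external DTD file.
        var doc = new XmlDocument { XmlResolver = null };
        doc.LoadXml(xml);

        XmlDocumentType doctype = doc.DocumentType;
        if (doctype == null)
        {
            return "no DOCTYPE";
        }

        // Name is the token after DOCTYPE; SystemId is the quoted DTD location.
        return "doctype=" + doctype.Name
             + ", system=" + doctype.SystemId
             + ", root=" + doc.DocumentElement.Name;
    }
}
```

Calling Describe on the first lines of your file should report NEWFILE as both the DOCTYPE name and the root element, with the .dtd file name as the SYSTEM identifier.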

Both the PUBLIC and SYSTEM identifiers refer to the DTD for this instance document. The SYSTEM identifier can be used to locate a physical file containing the DTD. The PUBLIC identifier is usually used with catalogs to provide a locally cached lookup for the DTD. There's a bit more to it than that, but those are the basics.

In your case there is only a SYSTEM identifier. If you were using a validating parser (I suspect you aren't) that validated your document against a DTD (document type definition this time), it would use this information to find the DTD itself. If you don't have a catalog (you almost certainly don't), the application would look for "XXX_OUT_1234_YYMMDD_00.dtd" in the same location as the instance file itself, because a relative system identifier is resolved against the document's own location. If that DTD file isn't actually there and you aren't getting errors, nothing is looking it up and you can safely remove the declaration.
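If you want to check what a validating parser would do with the declaration, the following is a rough C# sketch, assuming .NET 4.0 or later (where DtdProcessing exists). It is not part of your bulk-load code, and the names DtdCheck and CountDtdErrors are hypothetical. It reads the file with DTD validation switched on and counts the problems reported. If the DTD file cannot be found next to the XML file, the reader would typically throw an exception instead of reporting validation errors.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

static class DtdCheck
{
    // Hypothetical helper: validates xmlPath against the DTD named in its DOCTYPE
    // and returns how many validation problems were reported.
    public static int CountDtdErrors(string xmlPath)
    {
        int errors = 0;

        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,   // read the DOCTYPE instead of rejecting it
            ValidationType = ValidationType.DTD,   // validate against the DTD
            XmlResolver = new XmlUrlResolver()     // resolve the SYSTEM identifier relative to xmlPath
        };
        settings.ValidationEventHandler += (sender, e) =>
        {
            errors++;
            Console.WriteLine(e.Severity + ": " + e.Message);
        };

        using (XmlReader reader = XmlReader.Create(xmlPath, settings))
        {
            // Reading to the end drives validation of the whole document.
            while (reader.Read()) { }
        }

        return errors;
    }
}
```

Running this once against a sample file tells you whether a usable DTD exists at all. If nothing in the real pipeline performs this kind of validation, the DOCTYPE is not doing any work there.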

If you have a schema for this content, then the DOCTYPE is fairly redundant (it still has some uses if the document relies on entities declared in the DTD, but yours probably doesn't). You can almost certainly remove it safely; I would be very surprised if that process were validating against a DTD.

Nic Gibson
This is a great answer, thank you.
Andy