I have an ADO.NET DataSet that I'm loading from an XML file via ReadXml. The data and the schema are in separate files.
Right now, it takes close to 13 seconds to load this DataSet. I can cut this to 700 milliseconds if I don't read the DataSet's schema and just let ReadXml infer the schema, but then the resulting DataSet doesn't contain any constraints.
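The trade-off can be reproduced in miniature. The sketch below uses hypothetical in-memory stand-ins for the two files: a flat `Item` table whose schema declares a unique constraint on `Code`. The names `SchemaComparison`, `Item`, `Code`, and `Name` are illustrative assumptions, not the real schema. It only compares the resulting constraints, not the load times.

```csharp
using System.Data;
using System.IO;

public static class SchemaComparison
{
    // Hypothetical stand-in for the external .xsd file: one flat table
    // with an xs:unique constraint on the Code column.
    public const string Xsd = @"<xs:schema id=""NewDataSet"" xmlns="""" xmlns:xs=""http://www.w3.org/2001/XMLSchema"" xmlns:msdata=""urn:schemas-microsoft-com:xml-msdata"">
  <xs:element name=""NewDataSet"" msdata:IsDataSet=""true"">
    <xs:complexType>
      <xs:choice minOccurs=""0"" maxOccurs=""unbounded"">
        <xs:element name=""Item"">
          <xs:complexType>
            <xs:sequence>
              <xs:element name=""Code"" type=""xs:string"" minOccurs=""0"" />
              <xs:element name=""Name"" type=""xs:string"" minOccurs=""0"" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
    <xs:unique name=""ItemCodeKey"">
      <xs:selector xpath="".//Item"" />
      <xs:field xpath=""Code"" />
    </xs:unique>
  </xs:element>
</xs:schema>";

    // Hypothetical stand-in for the external data file.
    public const string Xml =
        "<NewDataSet><Item><Code>a</Code><Name>Alpha</Name></Item>" +
        "<Item><Code>b</Code><Name>Beta</Name></Item></NewDataSet>";

    // Inferred schema: ReadXml builds tables and columns from the data alone.
    public static DataSet LoadInferred()
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(Xml));
        return ds;
    }

    // Explicit schema: ReadXmlSchema first, then ReadXml into that schema.
    public static DataSet LoadWithSchema()
    {
        var ds = new DataSet();
        ds.ReadXmlSchema(new StringReader(Xsd));
        ds.ReadXml(new StringReader(Xml));
        return ds;
    }
}
```

With these stand-ins, `LoadInferred().Tables["Item"].Constraints` should be empty, while `LoadWithSchema()` should carry the single `UniqueConstraint` declared by `xs:unique`.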
I've tried doing this:
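```csharp
using System;
using System.Data;
using System.Diagnostics;

// xsdPath and xmlPath point at the separate schema and data files.
var ds = new DataSet();
var sw = Stopwatch.StartNew();

Console.WriteLine("Reading dataset with external schema.");
ds.ReadXmlSchema(xsdPath);
Console.WriteLine("Reading the schema took {0} milliseconds.", sw.ElapsedMilliseconds);

// Suspend notifications, index maintenance and constraints while loading.
foreach (DataTable dt in ds.Tables)
{
    dt.BeginLoadData();
}

ds.ReadXml(xmlPath);
Console.WriteLine("ReadXml completed after {0} milliseconds.", sw.ElapsedMilliseconds);

// Turn them back on once all the data is in.
foreach (DataTable dt in ds.Tables)
{
    dt.EndLoadData();
}
Console.WriteLine("Process complete at {0} milliseconds.", sw.ElapsedMilliseconds);
```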
When I do this, reading the schema takes 27 ms, and reading the DataSet takes 12,000+ milliseconds. And that's the time reported before I call EndLoadData on any of the DataTables.
This is not an enormous amount of data: it's about 1.5 MB, there are no nested relations, and every table contains two or three columns of 6 to 30 characters. The only difference I can see when I read the schema up front is that the schema includes all of the unique constraints. But BeginLoadData is supposed to turn constraints off (as well as change notification, etc.), so that shouldn't apply here. (And yes, I've also tried just setting EnforceConstraints to false.)
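BeginLoadData is documented to suspend notifications, index maintenance, and constraints until EndLoadData is called. A minimal sketch of that behavior, assuming a standalone `DataTable` (not inside a DataSet) with a hypothetical unique constraint on `Code`: a duplicate key loads without complaint, and the violation should only surface as a `ConstraintException` when EndLoadData turns constraints back on. `LoadDataDemo` and the column names are illustrative.

```csharp
using System.Data;

public static class LoadDataDemo
{
    // Loads two rows that violate a unique constraint between
    // BeginLoadData and EndLoadData and reports what happened.
    public static (int rowsLoaded, bool endLoadThrew) Run()
    {
        var table = new DataTable("Item");
        DataColumn code = table.Columns.Add("Code", typeof(string));
        table.Columns.Add("Name", typeof(string));
        table.Constraints.Add(new UniqueConstraint("ItemCodeKey", code));

        // Constraints are suspended from here on.
        table.BeginLoadData();
        table.LoadDataRow(new object[] { "a", "Alpha" }, true);
        table.LoadDataRow(new object[] { "a", "Duplicate" }, true); // no exception yet

        bool threw = false;
        try
        {
            // Constraints are re-enabled and checked here.
            table.EndLoadData();
        }
        catch (ConstraintException)
        {
            threw = true;
        }
        return (table.Rows.Count, threw);
    }
}
```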
I've read many reports of people improving the load time of DataSets by reading the schema first instead of having the object infer the schema. In my case, inferring the schema makes for a process that's about 20 times faster than having the schema provided explicitly.
This is making me a little crazy. This DataSet's schema is generated from metainformation, and I'm tempted to write a method that builds it programmatically and then just deserializes the data with an XmlReader. But I'd much prefer not to.
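The programmatic route might look roughly like the sketch below. It assumes a hypothetical `Item` table with `Code` and `Name` columns in place of the real metainformation-driven schema, and it hands the XmlReader to `ReadXml` with `XmlReadMode.IgnoreSchema` rather than parsing elements by hand. Whether this is actually faster than `ReadXmlSchema` for the real data is untested here.

```csharp
using System.Data;
using System.IO;
using System.Xml;

public static class ProgrammaticSchema
{
    // Hypothetical stand-in for a schema generated from metainformation:
    // one flat table with a unique constraint, built in code instead of read from an .xsd.
    public static DataSet BuildSchema()
    {
        var ds = new DataSet("NewDataSet");
        DataTable item = ds.Tables.Add("Item");
        DataColumn code = item.Columns.Add("Code", typeof(string));
        item.Columns.Add("Name", typeof(string));
        item.Constraints.Add(new UniqueConstraint("ItemCodeKey", code));
        return ds;
    }

    // Reads data into the prebuilt schema. IgnoreSchema makes ReadXml use the
    // existing tables and columns instead of inferring or reading a schema.
    public static DataSet Load(TextReader data)
    {
        DataSet ds = BuildSchema();
        using (XmlReader reader = XmlReader.Create(data))
        {
            ds.ReadXml(reader, XmlReadMode.IgnoreSchema);
        }
        return ds;
    }
}
```

In the real program, `data` would be a `StreamReader` over the file at `xmlPath`.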
What am I missing? What else can I do to improve the speed here?