views:

1896

answers:

2

I'm have a ADO DataSet that I'm loading from its XML file via ReadXml. The data and the schema are in separate files.

Right now, it takes close to 13 seconds to load this DataSet. I can cut this to 700 milliseconds if I don't read the DataSet's schema and just let ReadXml infer the schema, but then the resulting DataSet doesn't contain any constraints.

I've tried doing this:

Console.WriteLine("Reading dataset with external schema.");
ds.ReadXmlSchema(xsdPath);
Console.WriteLine("Reading the schema took {0} milliseconds.", sw.ElapsedMilliseconds);
foreach (DataTable dt in ds.Tables)
{
   dt.BeginLoadData();
}
ds.ReadXml(xmlPath);
Console.WriteLine("ReadXml completed after {0} milliseconds.", sw.ElapsedMilliseconds);
foreach (DataTable dt in ds.Tables)
{
   dt.EndLoadData();
}
Console.WriteLine("Process complete at {0} milliseconds.", sw.ElapsedMilliseconds);

When I do this, reading the schema takes 27ms, and reading the DataSet takes 12000+ milliseconds. And that's the time reported before I call EndLoadData on all the DataTables.

This is not an enormous amount of data - it's about 1.5mb, there are no nested relations, and all of the tables contain two or three columns of 6-30 characters. The only thing I can figure that's different if I read the schema up front is that the schema includes all of the unique constraints. But BeginLoadData is supposed to turn constraints off (as well as change notification, etc.). So that shouldn't apply here. (And yes, I've tried just setting EnforceConstraints to false.)

I've read many reports of people improving the load time of DataSets by reading the schema first instead of having the object infer the schema. In my case, inferring the schema makes for a process that's about 20 times faster than having the schema provided explicitly.

This is making me a little crazy. This DataSet's schema is generated off of metainformation, and I'm tempted to write a method that creates it programatically and just deseralizes it with an XmlReader. But I'd much prefer not to.

What am I missing? What else can I do to improve the speed here?

A: 

It's not an answer, exactly (though it's better than nothing, which is what I've gotten so far), but after a long time struggling with this problem I discovered that it's completely absent when my program's not running inside Visual Studio.

Something I didn't mention before, which makes this even more mystifying, is that when I loaded a different (but comparably large) XML document into the DataSet, the program performed just fine. I'm now wondering if one of my DataSets has some kind of metainformation attached to it that Visual Studio is checking at runtime while the other one doesn't. I dunno.

Robert Rossney
A: 

Another dimesion to try is to read the dataset without the schema and then Merge it into a typed dataset that has the constraints enabled. That way it has all of the data on hand as it builds the indexes used to enforce constraints -- maybe it would be more efficient?

From MSDN:

The Merge method is typically called at the end of a series of procedures that involve validating changes, reconciling errors, updating the data source with the changes, and finally refreshing the existing DataSet

.

Tom A