views:

1017

answers:

5

Here's the deal. I have an XML document with a lot of records. Something like this:

print("<?xml version="1.0" encoding="utf-8" ?>
      <Orders>
       <Order>
         <Phone>1254</Phone>
         <City>City1</City>
      <State>State</State>
      </Order>
      <Order>
         <Phone>98764321</Phone>
         <City>City2</City>
        <State>State2</State>
      </Order>  
     </Orders>");

There's also an XSD schema file. I would like to extract data from this file and insert these records into a database table. First of course I would like to validate each order record. For example if there are 5 orders in the file and 2 of them fail validation I would like to insert the 3 that passed validation into the db and left the other 2. There can be thousands of records in one xml file. What would be the best approach here. And how would the validation go for this since I need to discard the failed records and only use the ones that passed validation. At the moment I'm using XmlReaderSettings to validate the XML document records. Should I extract these records into another XML file or a Dataset or a custom object before I insert into a DB. I'm using .Net 3.5. Any code or link is welcome.

+1  A: 

If the data maps fairly cleanly to an object model, you could try using xsd.exe to generate some classes from the .xsd, and process the classes into your DAL of choice. The problem is that if the volume is high (you mention thousands of records), you will most likely have a lot of round-trips.

Another option might be to pass the data "as is" through to the database and use SQL/XML to process the data in TSQL - presumably as a stored procedure that accepts a parameter of type xml (SQL Server 2005 etc).

Marc Gravell
A: 

A lot of that depends on what "validation" means in your scenario. I assume, since you're using an .xsd, you are already validating that the data is syntactically correct. So, validation probably means you'll be calling other services or procedures to determine if an order is valid?

You might want to look at Sql Server Integration Services. The XML Task in SSIS lets you do things like XPath queries, merging, likely anything and everything you'd need to do with that document. You could also use that do to all of your upfront validation with schema file too.

Marc's option of passing that data to a stored procedure might work in this scenario too, but SSIS (or, even DTS but you're going to give up too much related to XML to make it as nice of an option) will let you visually orchestrate all of this work. Plus, it'll make it easier for these things to run out of process so you should end up with a much more scalable solution.

joshua.ewer
A: 

You have a couple of options:

  1. XmlDataDocument or XmlDocument. The downside to this approach is that the data will be cached in memory, which is bad if you have a lot of it. On the other hand, you get good in-memory querying facilities with DataSet. XmlDocument requires that you use XPath queries to work on the data, whereas XmlDataDocument gives you an experience more like the DataSet functionality.

  2. XmlReader. This is a good, fast approach because the data isn't cached; you read it in a bit at a time as a stream. You move from one element to the next, and query information about that element in your application to decide what to do with it. This does mean that you maintain in your application's memory the tree level that you're at, but with a simple XML file structure like yours this should be very simple.

I recommend option 2 in your case. It should scale well in terms of memory usage, and should provide the simplest implementation for processing a file.

Neil Barnwell
A: 

By validation I mean validating each node. The nodes that have at least one error need to be inserted into a new xml document. Basically at the end I should have 2 xml documents. One containing the successful nodes and the other containing the failure nodes. Any way I can accomplish that? I'm using LINQ.

Daud
A: 

I agree with idea that you should use an XmlReader, but I thought I'd try something a little different.

Basically, I am first validating the whole XDocument, then if there are errors, I enumerate through the orders and bin them as needed. It's not pretty, but maybe it'll give you some ideas.

        XDocument doc = XDocument.Load("sample.xml");
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add("", "sample.xsd");

        bool errors = false;
        doc.Validate(schemas, (sender, e) =>
        {
            errors = true;
        });

        List<XElement> good = new List<XElement>();
        List<XElement> bad = new List<XElement>();
        var orders = doc.Descendants("Order");
        if (errors)
        {
            foreach (var order in orders)
            {
                errors = false;
                order.Validate(order.GetSchemaInfo().SchemaElement, schemas, (sender, e) =>
                {
                    errors = true;
                });

                if (errors)
                    bad.Add(order);
                else
                    good.Add(order);
            }
        }
        else
        {
            good = orders.ToList();
        }

Instead of the lambda expressions, you could use a common function, but I just threw this together. Also, you could build two XDocuments instead of shoving the order elements into a list. I'm sure there are a ton of other problems here too, but maybe this will spark something.

Richard Morgan