tags:

views: 1449

answers: 5

I have a large XML document that needs to be processed 100 records at a time.

The processing is being done within a Windows Service written in C#.

The structure is as follows:

    <docket xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="docket.xsd">
        <order>
            <Date>2008-10-13</Date>
            <orderNumber>050758023</orderNumber>
            <ParcelID/>
            <CustomerName>sddsf</CustomerName>
            <DeliveryName>dsfd</DeliveryName>
            <Address1>sdf</Address1>
            <Address2>sdfsdd</Address2>
            <Address3>sdfdsfdf</Address3>
            <Address4>dffddf</Address4>
            <PostCode/>
        </order>
        <order>
            <Date>2008-10-13</Date>
            <orderNumber>050758023</orderNumber>
            <ParcelID/>
            <CustomerName>sddsf</CustomerName>
            <DeliveryName>dsfd</DeliveryName>
            <Address1>sdf</Address1>
            <Address2>sdfsdd</Address2>
            <Address3>sdfdsfdf</Address3>
            <Address4>dffddf</Address4>
            <PostCode/>
        </order>

        .....

    </docket>

There could be thousands of orders in a docket.

I need to chop this into 100-element chunks.

However, each chunk of 100 orders still needs to be wrapped in the parent "docket" node and carry the same namespace declarations, etc.

Is this possible?

A: 

Naive and iterative, but it works [EDIT: in .NET 3.5 only]:

    public List<XDocument> ChunkDocket(XDocument docket, int chunkSize)
    {
        var newDockets = new List<XDocument>();
        // Work on a copy so the caller's document is left untouched.
        var d = new XDocument(docket);
        var orders = d.Root.Elements("order");
        XDocument newDocket = null;

        do
        {
            newDocket = new XDocument(new XElement("docket"));
            // Take() is lazy, so each pass grabs the next batch of whatever orders remain.
            var chunk = orders.Take(chunkSize);
            // Adding elements that still have a parent clones them into the new docket...
            newDocket.Root.Add(chunk);
            // ...so the originals must be removed from the source separately to shrink it.
            chunk.Remove();
            newDockets.Add(newDocket);
        } while (orders.Any());
        // Note: an empty input still yields one empty docket because of the do/while.

        return newDockets;
    }
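
One thing to watch: the new XElement("docket") above drops the xsi attributes from the original root, which the question asks to keep. A minimal tweak (a sketch, not tested against your schema) is to copy the root's attributes onto each chunk:

    // Attributes that still have a parent are cloned when added, so this is safe per chunk.
    newDocket = new XDocument(new XElement("docket", d.Root.Attributes()));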
Jim Burger
I know it's horribly inefficient.
Jim Burger
A: 

Hi, thanks for replying. I should have said that this is .NET 2.0.

I have just got home from work and don't remember my OpenID, so I can't log in as myself.

+2  A: 

Another naive solution, this time for .NET 2.0. It should give you an idea of how to go about what you want. It uses XPath expressions instead of LINQ to XML, and chunks a 100-order docket into 10 dockets in under a second on my devbox.

    public List<XmlDocument> ChunkDocket(XmlDocument docket, int chunkSize)
    {
        List<XmlDocument> newDockets = new List<XmlDocument>();

        int orderCount = docket.SelectNodes("//docket/order").Count;
        int chunkStart = 0;
        XmlDocument newDocket = null;
        XmlElement root = null;
        XmlNodeList chunk = null;

        while (chunkStart < orderCount)
        {
            newDocket = new XmlDocument();
            root = newDocket.CreateElement("docket");
            newDocket.AppendChild(root);

            // Select the next slice of orders by position; XPath positions are 1-based,
            // so this picks orders chunkStart+1 through chunkStart+chunkSize.
            chunk = docket.SelectNodes(String.Format(
                "//docket/order[position() > {0} and position() <= {1}]",
                chunkStart, chunkStart + chunkSize));

            chunkStart += chunkSize;

            XmlNode targetNode = null;
            foreach (XmlNode c in chunk)
            {
                // ImportNode makes a deep copy owned by the new document.
                targetNode = newDocket.ImportNode(c, true);
                root.AppendChild(targetNode);
            }

            newDockets.Add(newDocket);
        }

        return newDockets;
    }
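
A hypothetical usage sketch (the file paths are made up for illustration):

    XmlDocument docket = new XmlDocument();
    docket.Load(@"c:\dockets\docket.xml");

    List<XmlDocument> chunks = ChunkDocket(docket, 100);
    for (int i = 0; i < chunks.Count; i++)
    {
        chunks[i].Save(String.Format(@"c:\dockets\docket_{0}.xml", i));
    }

And if each chunk must carry the original root's xsi attributes, they could be copied inside ChunkDocket right after AppendChild(root):

    // Copy the xsi attributes from the source root onto each chunk's root.
    foreach (XmlAttribute a in docket.DocumentElement.Attributes)
    {
        root.Attributes.Append((XmlAttribute)newDocket.ImportNode(a, true));
    }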
Jim Burger
A: 

If the reason for processing 100 orders at a time is performance, e.g. it takes too much time and memory to open a big file, you can use XmlReader to stream through the order elements one at a time without degrading performance.

    using (XmlReader reader = XmlReader.Create(@"c:\foo\Doket.xml"))
    {
        while (reader.Read())
        {
            // Only react to start tags; end tags also report LocalName "order".
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "order")
            {
                // read each child element and its value from the reader,
                // or deserialize the order element using an XmlSerializer and an Order class
            }
        }
    }
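
To flesh out the XmlSerializer comment above, here is a minimal sketch; the Order class, the ProcessOrders method name, and the mapped fields are assumptions based on the sample XML in the question, not code from the original poster:

    using System.Xml;
    using System.Xml.Serialization;

    [XmlRoot("order")]
    public class Order
    {
        // Fields are matched to child elements by name; unknown elements are ignored.
        public string Date;
        public string orderNumber;
        public string CustomerName;
        // remaining elements (ParcelID, DeliveryName, Address1-4, PostCode) omitted
    }

    public static void ProcessOrders(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        using (XmlReader reader = XmlReader.Create(path))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "order")
                {
                    // ReadSubtree scopes the serializer to just this <order> element.
                    Order o = (Order)serializer.Deserialize(reader.ReadSubtree());
                    // collect orders into a batch of 100 here and process when full
                }
            }
        }
    }

The NodeType check matters twice over: end tags also report LocalName "order", and after Deserialize consumes the subtree the outer reader is left sitting on that end tag.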
codemeit
A: 

Hi - thanks so much for this, Jim. I didn't think of using XPath. I was looping over the doc and using an XmlTextWriter to create new XmlDocuments, but was generally creating some pretty horrific code. Nice solution. I will accept it when I am in work tomorrow and you'll get the credit you deserve. Thanks again.

Hey no problem, I love this stuff :)
Jim Burger