I am writing code that pulls data from database tables and writes it to an XML file (to be picked up by another process). There are about 60,000 records, and each record can have multiple entities, i.e. a member can have several contacts, each contact can have several phone numbers, etc. The entire process takes several hours, and I've narrowed down the slowness to the actual pull of the data (not writing the XML, or any data processing after it's pulled, etc.). I've tried the code two different ways, with similar results. First I leveraged LINQ queries and created all the elements in one statement:

Dim output =
    From m In dc.members
    Select New XElement("member", _
        New XElement("id", m.member_id), _
        New XElement("address", m.Address), _
        New XElement("city", m.City), _
        New XElement("state", m.State), _
        New XElement("contacts", _
            From c in m.contacts
            Select New XElement("contact", _
                New XElement("contact_name", c.name), _
                New XElement("contact_address", c.address), _
...

I thought it might be the creation of all the XElements that was slowing it down, so I tried writing the elements directly to the XML file using For loops:

Dim output As New Xml.XmlTextWriter("my.xml", Nothing)
For Each m in dc.members
    output.WriteStartElement("member")
    output.WriteElementString("id", m.member_id)
    output.WriteElementString("address", m.Address)
    output.WriteElementString("city", m.City)
    output.WriteElementString("state", m.State)
    output.WriteStartElement("contacts")
    For Each c in m.contacts
        output.WriteStartElement("contact")
        output.WriteElementString("contact_name", c.name)
        output.WriteElementString("contact_address", c.address)
....

That produced almost no change in the amount of time the process took. I then tried stripping out all the elements and reduced the code down to just the database pulls, and it was similarly slow.

Is there a faster/better way to pull all this normalized data from the database so I can get it to the XML file as quickly as possible?

+4  A: 

Each pass of the inner loop is hitting the database. Use a LINQ expression to fetch just the data you want in a single hit. In C# (sorry, I don't know VB.Net) it would look something like:

var members = from m in dc.members
              select new {
                  m.member_id,
                  m.Address,
                  ...
                  contacts = from c in m.contacts
                             select new {
                                 c.name,
                                 c.address
                             }
              };
var output = from m in members
             select new XElement...

As a sidenote, you can use XML syntax in VB.Net...

Dim output = _
    From m In members _
    Select <member>
        <id><%= m.member_id %></id>
        <address><%= m.Address %></address>
        ...
        <contacts>
            <%= From c in m.contacts _
                Select <contact>
                    <contact_name><%= c.name %></contact_name>
                    <contact_address><%= c.address %></contact_address>
                    ...
                </contact>
            %>
        </contacts>
    </member>
Marcelo Cantos
This doesn't solve the lazy loading problem. The `members` variable just becomes an in-memory query, and it still hits the database multiple times when the variable is used in the next step.
gfrizzle
Hmm, that's surprising. (Thinking out loud) perhaps you could say something like `from c in dc.contacts group c by c.member ...`. You are ultimately trying to induce a SQL `GROUP BY` clause, so using it more directly might help.
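A sketch of that idea in VB.Net (hypothetical: it assumes the `contacts` table carries a `member_id` foreign-key column, which the question doesn't show):

    ' Fetch every contact in a single query, forcing execution with ToList(),
    ' then group client-side by member id for cheap in-memory lookups.
    Dim contactsByMember = _
        (From c In dc.contacts _
         Select c.member_id, c.name, c.address).ToList() _
        .ToLookup(Function(c) c.member_id)

    ' Later, each member's contacts come from memory, not another round trip.
    Dim contactsForMember = contactsByMember(m.member_id)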
Marcelo Cantos
+2  A: 

I think your problem is LINQ's lazy loading.

The first thing I would do is run SQL Profiler and find out whether that is the case: thousands of connections being opened and closed as each level of data is accessed.

If that is the case, I would scrap the LINQ and use a SqlDataReader, which will probably take seconds to do this job.

If it comes through as one big query in the profiler and the SQL side is slow, I would look at optimising the database indexes.
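A minimal SqlDataReader sketch of that approach (hedged: the connection string, the JOIN, and the streaming logic below are placeholders illustrating the idea, not code from the question):

    Imports System.Data.SqlClient

    ' One set-based query replaces a round trip per member.
    Using conn As New SqlConnection(connectionString)
        conn.Open()
        Dim sql As String = _
            "SELECT m.member_id, m.Address, m.City, m.State, c.name, c.address " & _
            "FROM members m LEFT JOIN contacts c ON c.member_id = m.member_id " & _
            "ORDER BY m.member_id"
        Using cmd As New SqlCommand(sql, conn)
            Using reader As SqlDataReader = cmd.ExecuteReader()
                While reader.Read()
                    ' Rows arrive sorted by member_id, so a change in that value
                    ' marks a member boundary: close the previous <member>
                    ' element and open a new one, writing <contact> children
                    ' as the joined rows stream past.
                End While
            End Using
        End Using
    End Using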

Aliostad
I didn't want to lose the flexibility of LINQ, so I compromised - I pulled the entire contents of each table into in-memory lists, then manually joined the lists. Now it only hits the database for a couple of minutes, then does everything else in memory, which cut my processing time in half.
gfrizzle
Thanks for making it the answer. I certainly did not find a LINQ way to do it, but I see the trend towards doing everything LINQ-wise as similar to the over-use of XSLT back in the day. Sometimes it does not even make the code more readable, so I am not ashamed of using a loop when I think it is superior.
Aliostad
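The compromise described in the comments above - pull each table whole, then join in memory with LINQ to Objects - might look like this (a sketch; as before, it assumes `contacts` exposes a `member_id` column):

    ' One query per table; everything after these two lines runs in memory.
    Dim allMembers = dc.members.ToList()
    Dim allContacts = dc.contacts.ToList()

    ' Index contacts by member id so the join is a hash lookup, not a scan.
    Dim contactLookup = allContacts.ToLookup(Function(c) c.member_id)

    Dim output = _
        From m In allMembers _
        Select New XElement("member", _
            New XElement("id", m.member_id), _
            New XElement("address", m.Address), _
            New XElement("contacts", _
                From c In contactLookup(m.member_id) _
                Select New XElement("contact", _
                    New XElement("contact_name", c.name), _
                    New XElement("contact_address", c.address))))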
+1  A: 

This should be significantly faster, for the reasons mentioned above:

Dim output =
   From m In dc.members.**Include("contacts")**
   Select New XElement("member", _
      New XElement("id", m.member_id), _
      New XElement("address", m.Address), _
      New XElement("city", m.City), _
      New XElement("state", m.State), _
      New XElement("contacts", _
          From c in m.contacts
          Select New XElement("contact", _
              New XElement("contact_name", c.name), _
              New XElement("contact_address", c.address)
             .....)**.ToList()**

It will select everything in one database call instead of N+1.

Also, you could try Parallel LINQ... something like this (I don't have an editor available):

Dim output = dc.members.AsParallel().Select(Function(m) New XElement("member", New XElement("id", m.member_id), etc..))
Poker Villain
I was just typing something like this +1.
wcm
Is this .NET 4.0? I'm on 3.5, and don't see either Include() or asparallel() as options.
gfrizzle