Hello All,

I'm trying to find a better and faster way to insert a pretty massive amount of data (~50K rows) than the LINQ I'm using now. The data I'm trying to write to a local db is a list of ORM-mapped objects serialized and received over WCF. I'm keen on using SqlBulkCopy, but the problem is that the tables are normalized: they're actually a sequence of interconnected tables with one-to-many relationships.

Here's some code that illustrates my point:

foreach (var meeting in meetingsList)
{
    int meetingId = dbAccess.InsertM(value1, value2...);
    foreach (var competition in meeting.COMPETITIONs)
    {
        int competitionId = dbAccess.InsertC(meetingId, value1, value2...);
        foreach (var competitor in competition.COMPETITORs)
        {
            int competitorId = dbAccess.InsertCO(competitionId, value1, ...);
            // and so on
        }
    }
}

where dbAccess.InsertM looks something like this:

// check if the meeting already exists
int meetingId = GetMeeting(meeting, date);

if (meetingId == 0)
{
    // if the meeting doesn't exist, insert a new one
    var m = new MEETING
    {
        NAME = name,
        DATE = date
    };
    _db.InsertOnSubmit(m);
    _db.SubmitChanges();
}

Thanks in advance for any answers. Bojan

+3  A: 

I would still use SqlBulkCopy to quickly copy your data into a staging table that has the same (flat) structure as the incoming data (you'll need to create that table ahead of time).

Once it's loaded, you can split the data out across the normalized tables using e.g. a stored procedure - that should be pretty fast since everything is already on the server.
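
To illustrate, here's a minimal sketch of the client side of that idea: flatten the nested objects into one row per competitor, then bulk-copy the whole lot in a single call. The dbo.MeetingStaging table and the column/property names below are placeholders - adjust them to your actual schema and ORM classes.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class StagingLoader
{
    public static void BulkLoadStaging(IEnumerable<MEETING> meetingsList,
                                       string connectionString)
    {
        // one flat row per competitor, carrying the parent meeting/competition values
        var table = new DataTable();
        table.Columns.Add("MeetingName", typeof(string));
        table.Columns.Add("MeetingDate", typeof(DateTime));
        table.Columns.Add("CompetitionName", typeof(string));
        table.Columns.Add("CompetitorName", typeof(string));

        foreach (var meeting in meetingsList)
            foreach (var competition in meeting.COMPETITIONs)
                foreach (var competitor in competition.COMPETITORs)
                    table.Rows.Add(meeting.NAME, meeting.DATE,
                                   competition.NAME, competitor.NAME);

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.MeetingStaging";
                bulkCopy.WriteToServer(table);   // all ~50K rows in one round trip
            }
        }
    }
}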

marc_s
+1 I always use a staging table first... there is always a problem with 3rd-party data, and it needs cleansing
SQLMenace
that's an interesting idea...I'll give it a try, thanks
bojanskr
You can get the ids back from the parent table insert using the OUTPUT clause; read how in Books Online.
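
For example, the parent-table step could look roughly like this - a sketch only, where the MeetingId identity column and the staging column names are assumptions:

using System.Data.SqlClient;

public static class MeetingDistributor
{
    public static void InsertMeetingsFromStaging(SqlConnection connection)
    {
        const string sql = @"
            INSERT INTO dbo.MEETING (NAME, DATE)
            OUTPUT INSERTED.MeetingId, INSERTED.NAME, INSERTED.DATE
            SELECT DISTINCT s.MeetingName, s.MeetingDate
            FROM dbo.MeetingStaging AS s
            WHERE NOT EXISTS (SELECT 1 FROM dbo.MEETING AS m
                              WHERE m.NAME = s.MeetingName
                                AND m.DATE = s.MeetingDate);";

        using (var command = new SqlCommand(sql, connection))
        using (var reader = command.ExecuteReader())   // OUTPUT rows come back as a result set
        {
            while (reader.Read())
            {
                int meetingId = reader.GetInt32(0);    // identity value generated by the insert
                // map meetingId back on the client here if you need it
            }
        }
    }
}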
HLGEM
Isn't a "staging table" going to produce a multiplication of the number of records, since they are connected trough several PK-FK relationships with other tables.The overall number of records will be numOfMeetingRows * numOfCompetitionRows * numOfCompetitorRows * ...Will this approach still be faster then the one I'm using now?
bojanskr
@Bojanskr: you load your data into the staging table, then move it out into the "real" production tables, and truncate (or even drop) the staging table once you're done. So no - this doesn't produce duplication for any longer than the time it takes to load and distribute the data.
marc_s
@marc_s: I think you misunderstood the multiplication issue... What I meant was the multiplication when creating the "staging table" itself in the "code part" of the program. Take a look at the question above: what I'm doing there is going through several foreach loops and writing on each iteration. I suppose the approach now would be to write to a DataTable object (instead of using LINQ to write) and SqlBulkCopy that to SQL Server. What I was trying to ask is: isn't that DataTable going to be pretty big and still take a lot of time to be written, and then the stored procedure, and so on?
bojanskr
@Bojanskr: sure - the more data you want to load, the longer it'll take. But for large amounts, it's typically a lot faster to bulk load first using SqlBulkCopy (since that's really REALLY fast) and then distribute the data **on the server** using a sproc. That's potentially several orders of magnitude faster than nested loops that insert a little bit of data from the client to the server, then move on to the next little bit, and so on. TRY IT!
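
To make the "distribute on the server" step concrete, it might look roughly like this (all names are placeholders, and in practice the SQL would live in a stored procedure rather than a client-side string):

using System.Data.SqlClient;

public static class StagingDistributor
{
    public static void DistributeChildren(SqlConnection connection)
    {
        const string sql = @"
            -- children join back to the parent rows inserted in the previous step
            INSERT INTO dbo.COMPETITION (MeetingId, NAME)
            SELECT DISTINCT m.MeetingId, s.CompetitionName
            FROM dbo.MeetingStaging AS s
            JOIN dbo.MEETING AS m
              ON m.NAME = s.MeetingName AND m.DATE = s.MeetingDate;

            -- ...repeat the same pattern for COMPETITOR, then clear the staging table
            TRUNCATE TABLE dbo.MeetingStaging;";

        using (var command = new SqlCommand(sql, connection))
        {
            command.ExecuteNonQuery();   // set-based inserts, no per-row round trips
        }
    }
}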
marc_s