views: 561

answers: 4
Hi,

I need to write some code to insert around 3 million rows of data.
At the same time I need to insert the same number of companion rows.

I.e. the schema looks like this:

Item
  - Id
  - Title

Property
  - Id
  - FK_Item
  - Value

My first attempt was something vaguely like this:

BaseDataContext db = new BaseDataContext();
foreach (var value in values)
{
    Item i = new Item() { Title = value["title"]};
    ItemProperty ip = new ItemProperty() { Item = i, Value = value["value"]};
    db.Items.InsertOnSubmit(i);
    db.ItemProperties.InsertOnSubmit(ip);
}
db.SubmitChanges();

Obviously this was terribly slow, so I'm now using something like this:

BaseDataContext db = new BaseDataContext();
DataTable dt = new DataTable("Item");
dt.Columns.Add("Title", typeof(string));
foreach (var value in values)
{
    DataRow item = dt.NewRow();
    item["Title"] = value["title"];
    dt.Rows.Add(item);
}
using (System.Data.SqlClient.SqlBulkCopy sb = new System.Data.SqlClient.SqlBulkCopy(db.Connection.ConnectionString))
{
    sb.DestinationTableName = "dbo.Item";
    sb.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Title", "Title"));
    sb.WriteToServer(dt);
}

But this doesn't allow me to add the corresponding 'Property' rows.

I'm thinking the best solution might be to add a Stored Procedure like this one that generically lets me do a bulk insert (or at least multiple inserts, but I can probably disable logging in the stored procedure somehow for performance) and then returns the corresponding ids.

Can anyone think of a better (i.e. more succinct, near equal performance) solution?

+3  A: 

The best way to move that much data into SQL Server is bcp. Assuming that the data starts in some sort of file, you'll need to write a small script to funnel the data into the two tables. Alternatively, you could use bcp to funnel the data into a single table and then use an SP to INSERT the data into the two tables.

Jeff Hornby
The problem is, I can't "funnel" the second lot of data into a table without having a corresponding id to work with in the first table. I guess one solution is to assign a fake id and later remove it, but that doesn't seem like a very good way to do this in the long term.
Graphain
Thanks for pointing me to bcp though, looks useful.
Graphain
+1 for pointing to BCP
RBarryYoung
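As a concrete sketch of the bcp suggestion above (the database, server, file name, and staging-table name are assumptions, not from the original post):

```shell
# Hypothetical example: load items.dat (comma-delimited character data)
# into a staging table over a trusted (Windows auth) connection.
#   -c  character mode   -t ","  field terminator   -T  trusted connection
bcp MyDb.dbo.stage in items.dat -S myserver -T -c -t ","
```

From there an INSERT ... SELECT (or a stored procedure) can split the staged rows into the two destination tables.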
+2  A: 

Bulk copy the data into a temporary table, and then call a stored proc that splits the data into the two tables you need to populate.

Joe
Thanks, probably the best approach.
Graphain
+1: Add to that: use BCP unless you have a good reason not to (it's about 30% faster than SqlBulkCopy most of the time). This is the SOP answer.
RBarryYoung
Yes, use BCP over bulk copy if you can.
Joe
+2  A: 

To combine the previous best two answers and add in the missing piece for the IDs:

1) Use BCP to Load the data into a temporary "staging" table defined like this

CREATE TABLE stage(Title VARCHAR(??), Value {whatever});

and you'll need the appropriate index for performance later:

CREATE INDEX ix_stage ON stage(Title);

2) Use SQL INSERT to load the Item table:

INSERT INTO Item(Title) SELECT Title FROM stage;

3) Finally load the Property table by joining stage with Item:

INSERT INTO Property(FK_Item, Value)
SELECT Item.Id, stage.Value
FROM stage
JOIN Item ON Item.Title = stage.Title;
RBarryYoung
Sure, but this assumes title is unique (which it may be, but I wouldn't rely on it). However, this is probably the best direction.
Graphain
I had to assume something, since you didn't provide the source table's information. You must have some way to uniquely correlate the information: relying on record order is invalid in SQL.
RBarryYoung
That said, you can instead put an ID on the staging table for the correlation (usually works even with order dependencies), but it gets more complicated.
RBarryYoung
Yeah I figured that because it was a single transaction I could rely on record order. Assigning the ID or relying on unique title is probably a better way to go though.
Graphain
+2  A: 

You can bulk copy in code as well, using the .NET SqlBulkCopy class.

David Andres
I am using the .NET SqlBulkCopy class, can you point to where I should look?
Graphain
If you have total control over the data, perhaps you can leverage the KeepIdentity option of the SqlBulkCopyOptions parameter of the constructor. This way, you can pull the highest identity value currently available, and work from there.
David Andres
Thanks David ... this just helped me out.
John at CashCommons
Glad to hear it.
David Andres
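To make the KeepIdentity idea concrete, here is a hedged C# sketch. It assumes exclusive access to the Item table while the load runs, an open SqlConnection named connection, and that values and connectionString are in scope as in the question:

```csharp
// Pre-assign Item ids client-side, then bulk copy both tables.
// KeepIdentity tells SqlBulkCopy to keep our ids rather than
// letting the IDENTITY column generate new ones.
int nextId;
using (var cmd = new SqlCommand(
    "SELECT ISNULL(MAX(Id), 0) + 1 FROM dbo.Item", connection))
{
    nextId = (int)cmd.ExecuteScalar();
}

var items = new DataTable("Item");
items.Columns.Add("Id", typeof(int));
items.Columns.Add("Title", typeof(string));

var props = new DataTable("Property");
props.Columns.Add("FK_Item", typeof(int));
props.Columns.Add("Value", typeof(string));

foreach (var value in values)
{
    int id = nextId++;
    items.Rows.Add(id, value["title"]);
    props.Rows.Add(id, value["value"]);
}

using (var sb = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
{
    sb.DestinationTableName = "dbo.Item";
    sb.ColumnMappings.Add("Id", "Id");
    sb.ColumnMappings.Add("Title", "Title");
    sb.WriteToServer(items);

    // Reuse the same SqlBulkCopy for the second table; the old
    // mappings must be cleared first.
    sb.ColumnMappings.Clear();
    sb.DestinationTableName = "dbo.Property";
    sb.ColumnMappings.Add("FK_Item", "FK_Item");
    sb.ColumnMappings.Add("Value", "Value");
    sb.WriteToServer(props);
}
```

The explicit column mappings matter: without them SqlBulkCopy maps by ordinal, which would misalign the Property columns against the table's leading Id column. Note this approach is only safe if nothing else inserts into Item concurrently.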