tags:

views:

500

answers:

5

We use strongly typed DataSets in our application. When importing data we use the convenient DataSet.Merge() operation to copy DataRows from one DataSet to another.

StringCollection lines = ReadFromFile(fileName);
foreach (string line in lines)
{
    DataRow dr = ImportRow(line);
    dataSet1.Merge( new DataRow[] { dr } );
}
DoAdditionalCalculationsWith(dataset1);
SaveToDatabase(dataSet1);

Unfortunately this doesn't scale. For larger imports the Merge takes up 80% of our total import time (according to our profiler).

Is there a faster way to do this?

Edit: We can't just add the rows because they might already be in the DataSet and doing it in the database is also not an option, because our import logic is quite complex.

+2  A: 

You've probably already tried this, but just in case:

DataSet.Merge takes an array or DataRows as a parameter.

Have you tried batching the merge, i.e. doing the following?

dataSet1.Merge(lines.Select(line=>ImportRow(line)).ToArray());

However, it's quite possibly the case that you cannot improve performance - maybe you can avoid the need to Merge in the first place somehow - such as by doing the merge in the DB, as Sklivvz suggests.

Eamon Nerbonne
My tests show that batching would help, but to implement it in our application is not trivial...
Stefan Haubold
In that case, it looks like you probably cannot improve Merge performance - it's a well know problem, and the best you can hope for is to achieve similar functionality by some other (custom, self-written) means.
Eamon Nerbonne
A: 

The obvious answer is 'do it in the DB' -- I'll assume it's not applicable in your case.

You should try to use a row loop. This can be quite performant if the tables you are merging are sorted.

http://en.wikipedia.org/wiki/Merge_algorithm

Sklivvz
A: 

Why not just add rows? Or do it in the DB as 'Skliwz' suggests?

Noon Silk
A: 

Can't you just add or update the row depending the row exists or not in the table (using the not typed method "table.Rows.Find(primaryKeyValues)")?

Note that you can have a lot of scalability problems with the DataSet (compared to a DB):
- no transaction => no concurreny.
- slow load from xml (maybe it is faster/linear from DB).
- missing index (except primary key).
- do not cache as a DB, it can be a problem in ram limited system (in 32b system).

Olivier de Rivoyre
A: 

The best merging algorithm I know is Sort-Merge if your input datasets are sorted by the same attribute. But I do not know C# so well to say if it is possible to force ADO object to use this algorithm.

Skarab