Hello,
I have written an importer that copies data from a flat table into several other tables, using a mapping defined in an XML file. This is for a shop database in which each product can have several properties and each property can exist in several languages, so the amount of data adds up quickly.
There are over 50,000 rows as of right now. My current import code looks like this:
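```csharp
using System;
using System.Collections;
using System.Data;

// db, mapping, productIdToSn, tableDataProducts, tableObjPropertyValue and
// productIdField are defined elsewhere in the importer (not shown here).
string query = "SELECT * FROM " + tableDataProducts + " ORDER BY " + productIdField;
DataSet importData = new DataSet();
Hashtable data = new Hashtable();
db.DoSelectQuery(query, ref importData, tableDataProducts);

// One iteration per product row in the flat import table
foreach (DataRow row in importData.Tables[0].Rows)
{
    // One iteration per mapped property/language column
    foreach (MapEntry e in mapping[tableObjPropertyValue])
    {
        string value = row[e.ImportXmlAttributeName].ToString();

        // Skip empty values and the literal string "null"
        if (value.Equals("null", StringComparison.OrdinalIgnoreCase) || value.Length < 1)
            continue;

        data.Clear();
        data.Add("ProductSN", productIdToSn[row[productIdField].ToString()]);
        data.Add("ObjPropertyGroupID", "0");
        data.Add("ObjPropertyID", e.ObjPropertyID);
        data.Add("LanguageID", e.LanguageID);
        data.Add("Value", value);

        // One INSERT per non-empty property value
        db.DoPreparedInsertQuery(tableObjPropertyValue, data);
    }
}
```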
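As can be seen, I first read the data from the flat import table and then iterate over its rows, one row per product. For each product I iterate over the property mapping and, for every property, fill a Hashtable called `data` with the target columns (ProductSN, ObjPropertyGroupID, ObjPropertyID, LanguageID and Value). Empty values and the string "null" are skipped.
Once the Hashtable is filled, I insert it as one row into the property value table, so every property of every product results in a separate insert.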
Currently this approach only processes around 700 rows per minute, so the whole import takes roughly an hour. How can I optimize this?
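To make the shape of that loop explicit, here is a minimal, database-free C# sketch of the same per-row transformation. It is not the original code: the tuple list stands in for `MapEntry`, a dictionary stands in for a `DataRow`, the column names, IDs and values are made up, and `Unpivot` is a hypothetical helper. It shows how one flat product row fans out into one output row (and therefore one insert) per non-empty property value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mapping: source column -> (ObjPropertyID, LanguageID).
// Stands in for the MapEntry objects in mapping[tableObjPropertyValue].
var mapping = new List<(string Column, string ObjPropertyID, string LanguageID)>
{
    ("name_de", "1", "de"),
    ("name_en", "1", "en"),
    ("color_en", "2", "en"),
};

// One flat product row (column -> value); made-up sample data.
var productRow = new Dictionary<string, string>
{
    ["name_de"] = "Tisch",
    ["name_en"] = "Table",
    ["color_en"] = "NULL",
};

var rows = Unpivot("SN-100", productRow, mapping).ToList();
foreach (var r in rows)
    Console.WriteLine($"{r.ProductSN} {r.ObjPropertyID} {r.LanguageID} {r.Value}");

// Mirrors the inner loop of the importer: one output row per mapped
// property, skipping empty values and the string "null" (any casing).
static IEnumerable<(string ProductSN, string ObjPropertyGroupID, string ObjPropertyID,
                    string LanguageID, string Value)>
    Unpivot(string productSn,
            IReadOnlyDictionary<string, string> row,
            IEnumerable<(string Column, string ObjPropertyID, string LanguageID)> map)
{
    foreach (var e in map)
    {
        string value = row[e.Column] ?? string.Empty;
        if (value.Length < 1 || value.Equals("null", StringComparison.OrdinalIgnoreCase))
            continue;
        yield return (productSn, "0", e.ObjPropertyID, e.LanguageID, value);
    }
}
```

With this sample data the `color_en` value `NULL` is skipped, so the single product row yields two property rows.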
[EDIT]
Here is a simplified version of the XML, since the actual file is far too large to post here:
```xml
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<DATAPACKET Version="2.0">
  <METADATA>
    <FIELDS>
      <FIELD FieldName="source_id" DisplayLabel="source_id" FieldType="String" FieldClass="TField"/>
      <FIELD FieldName="data_field" DisplayLabel="data_field" FieldType="Unknown" FieldClass="TField"/>
    </FIELDS>
  </METADATA>
  <ROWDATA>
    <ROW source_id="data_1" data_field="some string"/>
    <ROW source_id="data_2" data_field="another string"/>
  </ROWDATA>
</DATAPACKET>
```
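A minimal C# sketch of loading a DATAPACKET like the one above into a single `DataTable` with LINQ to XML, one column per FIELD and one row per ROW. It is not the author's importer: `LoadPacket` is a hypothetical helper, every column is created as a string (the `FieldType` values are ignored), missing attributes become `DBNull`, and the XML string is a trimmed copy of the sample.

```csharp
using System;
using System.Data;
using System.Xml.Linq;

// Trimmed copy of the sample data packet.
const string packetXml = @"<DATAPACKET Version=""2.0"">
  <METADATA>
    <FIELDS>
      <FIELD FieldName=""source_id"" FieldType=""String""/>
      <FIELD FieldName=""data_field"" FieldType=""Unknown""/>
    </FIELDS>
  </METADATA>
  <ROWDATA>
    <ROW source_id=""data_1"" data_field=""some string""/>
    <ROW source_id=""data_2"" data_field=""another string""/>
  </ROWDATA>
</DATAPACKET>";

DataTable table = LoadPacket(XDocument.Parse(packetXml));
foreach (DataRow r in table.Rows)
    Console.WriteLine($"{r["source_id"]} = {r["data_field"]}");

// Each FIELD becomes a string column; each ROW's attributes fill one row.
static DataTable LoadPacket(XDocument doc)
{
    var result = new DataTable("import");
    foreach (XElement field in doc.Descendants("FIELD"))
        result.Columns.Add((string)field.Attribute("FieldName"), typeof(string));

    foreach (XElement rowEl in doc.Descendants("ROW"))
    {
        DataRow row = result.NewRow();
        foreach (DataColumn col in result.Columns)
        {
            XAttribute attr = rowEl.Attribute(col.ColumnName);
            // Missing attributes become DBNull rather than an empty string.
            row[col] = attr != null ? attr.Value : (object)DBNull.Value;
        }
        result.Rows.Add(row);
    }
    return result;
}
```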
The data XML is imported into a single table, with each FIELD becoming a column. The mapping is defined in a second XML file that looks like this:
```xml
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<DATAPACKET Version="2.0">
  <METADATA>
    <FIELDS>
      <FIELD FieldName="source_id" DisplayLabel="source_id" FieldType="String" FieldClass="TField"/>
      <FIELD FieldName="target" DisplayLabel="target" FieldType="Unknown" FieldClass="TField"/>
    </FIELDS>
  </METADATA>
  <ROWDATA>
    <ROW source_id="data_1" target="products::id"/>
    <ROW source_id="data_2" target="products::name"/>
  </ROWDATA>
</DATAPACKET>
```
The `target` attribute contains the destination table and column in the format `table::column`.
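A minimal C# sketch of reading such a mapping file and splitting each `target` into its table and column parts. `ParseMapping` is a hypothetical helper, and both the policy of skipping rows without `source_id` or `target` and throwing on a target without exactly one `::` are assumptions; the XML string is a trimmed copy of the sample mapping.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Trimmed copy of the sample mapping (ROWDATA part only).
const string mappingXml = @"<DATAPACKET Version=""2.0"">
  <ROWDATA>
    <ROW source_id=""data_1"" target=""products::id""/>
    <ROW source_id=""data_2"" target=""products::name""/>
  </ROWDATA>
</DATAPACKET>";

var map = ParseMapping(XDocument.Parse(mappingXml));
foreach (var kv in map)
    Console.WriteLine($"{kv.Key} -> {kv.Value.Table}.{kv.Value.Column}");

// Reads each ROW's source_id and splits target="table::column" into two parts.
static Dictionary<string, (string Table, string Column)> ParseMapping(XDocument doc)
{
    var result = new Dictionary<string, (string Table, string Column)>();
    foreach (XElement row in doc.Descendants("ROW"))
    {
        string sourceId = (string)row.Attribute("source_id");
        string target = (string)row.Attribute("target");
        if (sourceId == null || target == null)
            continue; // assumed policy: ignore incomplete mapping rows

        string[] parts = target.Split(new[] { "::" }, StringSplitOptions.None);
        if (parts.Length != 2)
            throw new FormatException($"Expected 'table::column', got '{target}'");

        result[sourceId] = (parts[0], parts[1]);
    }
    return result;
}
```

Keeping the mapping in a dictionary means each lookup during the import is a hash lookup rather than a scan of the mapping rows.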