Hi,

I have implemented import functionality that takes data from a CSV file in an ASP.NET application. The file size can vary from a few KB up to a maximum of 10 MB.

However, when an import runs on a file with more than 50,000 records it takes around 20 minutes, which is far too long. I need to import around 300,000 records within 2-3 minutes.

I know that import speed also depends on the physical memory of the database server. Currently I build INSERT scripts in bulk and execute them. I also know SqlBulkCopy would be another option, but in my case it is not just inserts of products that take place but also updates and deletes: a field called "FUNCTION CODE" decides whether each record should be inserted, updated or deleted.

Any suggestions on how to go about this would be greatly appreciated.

One approach would be to use multiple threads that carry out the processing simultaneously, but I have never implemented threading before, so I am not aware of the complications it would introduce.

Thanks & Regards, Francis P.

A: 

I guess you are using SQL Server...

If you are using 2005/2008, consider using SSIS to process the file. See TechNet.

Importing a huge amount of data inside the ASP.NET process is not the best thing you can do. You could upload the file and then start a separate process that does the heavy lifting for you.
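
A minimal sketch of that hand-off, assuming the import logic lives in a hypothetical RunImport method; queuing the work to the ThreadPool keeps the request fast, although a Windows service or an SSIS job would survive worker-process recycles better:

using System.IO;
using System.Threading;
using System.Web;

public static class ImportStarter
{
    // Called from the upload page's button handler with the posted file.
    public static void BeginImport(HttpPostedFile upload, string appDataPath)
    {
        // Persist the upload first so the HTTP request can return quickly.
        string path = Path.Combine(appDataPath, Path.GetFileName(upload.FileName));
        upload.SaveAs(path);

        // Hand the heavy lifting to a background thread instead of blocking the request.
        ThreadPool.QueueUserWorkItem(_ => RunImport(path));
    }

    static void RunImport(string path)
    {
        // Parse the CSV here and apply the inserts/updates/deletes.
    }
}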

Yves M.
A: 

If this is a repeated process, the file is uploaded via ASP.NET, and you are making per-row decisions about insert/update/delete, then try this fast CSV reader: http://www.codeproject.com/KB/database/CsvReader.aspx. It is quite quick and economical with memory.

Gboyega s
No need to use external code. .NET has a (little-known) CSV reader built in: http://stackoverflow.com/questions/2405787/csv-parser-in-one-routine-function
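
For reference, the built-in reader usually meant here is Microsoft.VisualBasic.FileIO.TextFieldParser, usable from C# by referencing Microsoft.VisualBasic.dll. A minimal sketch, with the file path just a placeholder:

using System;
using Microsoft.VisualBasic.FileIO; // reference Microsoft.VisualBasic.dll

class CsvDump
{
    static void Main()
    {
        using (var parser = new TextFieldParser(@"C:\temp\import.csv")) // placeholder path
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields(); // one CSV record per call
                Console.WriteLine(string.Join("|", fields));
            }
        }
    }
}
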
spender
A: 

You are doing all your database queries sequentially over one connection. So for every insert/update/delete you send the command over the wire, wait for the database to do its thing, and then wake up again when something comes back.

Databases are optimized for heavy parallel access, so there are two easy routes to a significant speedup:

Open X connections to the database (you will have to tune X, but start with 5) and either:

  • spin up X threads that each work through a chunk of the rows (sketched below), or
  • use asynchronous calls and, as each callback arrives, fire off the next query.
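
A rough sketch of the threaded route, assuming the rows are already parsed into memory and that a hypothetical ApplyRow method holds the per-row insert/update/delete logic; each thread gets its own connection because a SqlConnection must not be shared across threads:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Threading;

static class ParallelImporter
{
    // rows: already-parsed CSV records; connectionCount: the "X" to tune.
    public static void Import(List<string[]> rows, string connectionString, int connectionCount)
    {
        int chunkSize = (rows.Count + connectionCount - 1) / connectionCount;
        var threads = new List<Thread>();

        for (int i = 0; i < connectionCount; i++)
        {
            int start = i * chunkSize;
            int end = Math.Min(start + chunkSize, rows.Count);

            var t = new Thread(() =>
            {
                // One connection per thread.
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open();
                    for (int r = start; r < end; r++)
                        ApplyRow(conn, rows[r]);
                }
            });
            threads.Add(t);
            t.Start();
        }

        threads.ForEach(t => t.Join()); // wait for all chunks to finish
    }

    static void ApplyRow(SqlConnection conn, string[] fields)
    {
        // Inspect the FUNCTION CODE field and run the matching INSERT/UPDATE/DELETE here.
    }
}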

Toad
I don't think this is a good solution, since the question mentions ASP.NET. Your solution would be fine in a separate process started from the ASP.NET process.
Yves M.
@Yves - that a file is *uploaded* via ASP.NET has nothing to do with how it ought to be processed.
Winston Smith
@Winston Smith: true, but it is most likely processed within the ASP.NET process, since nothing is stated otherwise, and that is bad, period.
Yves M.
+4  A: 

SqlBulkCopy is definitely going to be fastest. I would approach this by inserting the data into a temporary table on the database. Once the data is in the temp table, you could use SQL to merge/insert/delete accordingly.
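
A rough sketch of that flow, assuming the parsed rows sit in a DataTable whose columns match a hypothetical Products table plus the function-code column (all table and column names here are placeholders):

using System.Data;
using System.Data.SqlClient;

static class BulkStager
{
    public static void Import(DataTable rows, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1. Session-scoped staging table; it lives as long as this connection stays open.
            using (var create = new SqlCommand(
                "CREATE TABLE #Staging (ProductId INT, Price DECIMAL(18,2), FunctionCode CHAR(1))", conn))
            {
                create.ExecuteNonQuery();
            }

            // 2. Stream every row into the staging table in one shot.
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "#Staging";
                foreach (DataColumn c in rows.Columns)
                    bulk.ColumnMappings.Add(c.ColumnName, c.ColumnName); // map by name, not ordinal
                bulk.WriteToServer(rows);
            }

            // 3. Apply the changes on the server, driven by the function code.
            using (var apply = new SqlCommand(@"
                DELETE p FROM Products p JOIN #Staging s ON s.ProductId = p.ProductId WHERE s.FunctionCode = 'D';
                UPDATE p SET p.Price = s.Price FROM Products p JOIN #Staging s ON s.ProductId = p.ProductId WHERE s.FunctionCode = 'U';
                INSERT INTO Products (ProductId, Price)
                    SELECT s.ProductId, s.Price FROM #Staging s WHERE s.FunctionCode = 'I';", conn))
            {
                apply.ExecuteNonQuery();
            }
        }
    }
}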

spender
Did you only read my first sentence? Can you elaborate as to why this won't work? It's perfectly possible to represent inserts and updates in some way such that they can be interpreted by SQL statements run on the DB. My answer is a reasonable means of conveying the data *quickly* to SQL Server. I'm not suggesting that the bulk insert is directly into the required table. I'm suggesting transmitting the data via bulk insert and using logic on the DB to perform the correct action. Of course this will work.
spender
Still the answer is right. Load the data into a temp table, then use the MERGE statement (2008 upward) to merge the changes into the main data table.
TomTom
A: 

I suggest using the XML functionality in SQL Server 2005/2008, which will allow you to bulk insert and bulk update. I'd take the following approach:

  • Process the entire file into an in-memory data structure.
  • Create a single XML document from this structure to pass to a stored proc.
  • Create a stored proc to load data from the XML document into a temporary table, then perform the inserts and updates. See below for guidance on creating the stored proc.

There are numerous advantages to this approach:

  • The entire operation is completed in one database call, although if your dataset is really large you may want to batch it.
  • You can wrap all the database writes into a single transaction easily, and roll back if anything fails.
  • You are not using any dynamic SQL, which could have posed a security risk.
  • You can return the IDs of the inserted, updated and/or deleted records using the OUTPUT clause.

In terms of the stored proc you will need, something like the following should work:

CREATE PROCEDURE MyBulkUpdater
(
   @p_XmlData VARCHAR(MAX)
)

AS
DECLARE @hDoc INT
EXEC sp_xml_preparedocument @hDoc OUTPUT, @p_XmlData

-- Temporary table, should contain the same schema as the table you want to update
CREATE TABLE #MyTempTable
(
   -- ...
)

INSERT INTO #MyTempTable
(
        [Field1],
        [Field2]
)
SELECT
        XMLData.Field1,
        XMLData.Field2
FROM OPENXML (@hDoc, 'ROOT/MyRealTable', 1)
WITH
(
        [Field1] int,
        [Field2] varchar(50),
        [__ORDERBY] int
) AS XMLData

EXEC sp_xml_removedocument @hDoc

Now you can simply insert into, update and delete from your real table using your temporary table as required, e.g.

INSERT INTO MyRealTable (Field1, Field2)
SELECT Field1, Field2
FROM #MyTempTable
WHERE ...

UPDATE rt
SET rt.Field2 = tt.Field2
FROM MyRealTable rt
JOIN #MyTempTable tt ON tt.Field1 = rt.Field1
WHERE ...

For an example of the XML you need to pass in, you can do:

SELECT TOP 1 *, 0 AS __ORDERBY FROM MyRealTable AS MyRealTable FOR XML AUTO, ROOT('ROOT') 

For more info, see OPENXML, sp_xml_preparedocument and sp_xml_removedocument.
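
For completeness, a rough sketch of the calling side in C#, assuming the parsed rows are in a DataTable whose columns line up with the Field1/Field2 schema above; switching each column to attribute mapping makes DataSet.GetXml produce the attribute-centric ROOT/MyRealTable shape that the OPENXML call expects:

using System.Data;
using System.Data.SqlClient;

static class XmlUpdaterClient
{
    public static void Send(DataTable rows, string connectionString)
    {
        // Shape the XML: <ROOT><MyRealTable Field1="..." Field2="..." /> ... </ROOT>
        var ds = new DataSet("ROOT");
        DataTable table = rows.Copy();
        table.TableName = "MyRealTable";
        foreach (DataColumn col in table.Columns)
            col.ColumnMapping = MappingType.Attribute;
        ds.Tables.Add(table);
        string xml = ds.GetXml();

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("MyBulkUpdater", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@p_XmlData", SqlDbType.VarChar, -1).Value = xml; // -1 = VARCHAR(MAX)
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}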

Winston Smith