All,

I'm working on an application that allows authorized users to upload Excel spreadsheets to our server.

Once they're uploaded, there's a classic ASP script that reads the data in the Excel spreadsheet, does a bunch of validation, parsing, and manipulation, then writes the modified data to a tab-delimited text file.

Then, it kicks off a stored procedure in SQL Server that uses 'Bulk Insert' to load the data into a database.

The whole process works reasonably well when the files are small. But as they get larger (15,000+ rows, 5+ MB), the script starts to take a really long time to run (60+ seconds).

So, I'm looking for ways to make it more efficient/faster/robust.

Roughly, here's how the code looks now:

'' # read the uploaded Excel file

set objConnection = Server.CreateObject("ADODB.Connection") 
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & xlsfile & ";Extended Properties=""Excel 8.0;"""
set objRS = objConnection.execute("Select * from [Sheet1$]")
rows = ""
while (not objRS.EOF and Err.Number = 0) ' the Err.Number check assumes On Error Resume Next earlier in the script
    row = objRS("col1") & vbTab & objRS("col2") & vbTab & ... & objRS("coln") & vbCrLf ' (remaining columns elided)
    rows = rows & row
    objRS.MoveNext
wend
objRS.close

'' # Write the file
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(txtFile, 2, true)
objFile.WriteLine rows
objFile.Close
set objFSO = nothing
set objFile = nothing

As you can see, the entire file is being read into a variable named "rows", which then gets dumped into the text file.

Are there better alternatives to this? E.g., should I write the text file row by row as I read it from Excel?

Another possibility I've considered is reading the Excel file into a temporary table in SQL Server and doing all the validation/manipulation there. The only problem is that I can't find an equivalent to 'Bulk Insert' for Excel data, so I think I might have to read the data row by row from Excel and then write it row by row into the temp table in SQL Server. The only alternative I've run across (untested) is OPENROWSET; see the sketch below.
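Roughly like this (completely untested; the connection string and StagingTable name are made up, OPENROWSET apparently needs the 'Ad Hoc Distributed Queries' option enabled, and the file path has to be one the SQL Server service itself can reach):

'' # have SQL Server pull the sheet itself via OPENROWSET
set objSQL = Server.CreateObject("ADODB.Connection")
objSQL.Open "Provider=SQLOLEDB;Data Source=myServer;Initial Catalog=myDb;Integrated Security=SSPI;"
sql = "INSERT INTO StagingTable " & _
      "SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', " & _
      "'Excel 8.0;Database=" & xlsfile & "', 'SELECT * FROM [Sheet1$]')"
objSQL.Execute sql, , 128 ' 128 = adExecuteNoRecords, skips building a recordset
objSQL.Close

(I'd use a permanent staging table rather than a #temp table, since a #temp table vanishes when its connection closes.)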

Many thanks in advance for any advice and insight!

+2  A: 

What's killing you is all the string concatenation. A simple step would be to interleave the row reading and the calls to WriteLine on the text file, so you only ever hold a string the size of a single row at any one time.

Something like this:-

set objConnection = Server.CreateObject("ADODB.Connection") 
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & xlsfile & ";Extended Properties=""Excel 8.0;"""
set objRS = objConnection.execute("Select * from [Sheet1$]")
set objFSO = CreateObject("Scripting.FileSystemObject")
set objFile = objFSO.OpenTextFile(txtFile, 2, true) ' 2 = ForWriting, create the file if it doesn't exist

do until objRS.EOF
    tab = "" ' no separator before the first field in each row
    for each fld in objRS.Fields
        objFile.Write tab
        objFile.Write fld.Value
        tab = vbTab ' every field after the first gets a tab in front of it
    next
    objFile.Write vbCrLf ' end the row, then move on
    objRS.MoveNext
loop
objRS.close
objConnection.Close
objFile.Close
AnthonyWJones
Anthony - Many thanks! You probably knew this already, but this reduced the processing time by about 98%.
mattstuehler
+1  A: 

Agree with Anthony: work on tidying up that string concatenation. Every time you append to the string (rows = rows & row) it's expensive, because a whole new string has to be allocated and the old one thrown away, over and over and over. And the space needed grows each time, so it struggles to find room; hence all the time. If you really do need the whole lot in one string, build up an array and Join it once at the end, as in the sketch below.
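Just a sketch, reusing the objRS loop from the question (the initial 1024 slots are arbitrary; the point is that Join does one big allocation instead of thousands of little ones):

dim parts(), i
redim parts(1023)
i = 0
do until objRS.EOF
    if i > UBound(parts) then redim preserve parts(UBound(parts) * 2 + 1) ' double when full
    parts(i) = objRS("col1") & vbTab & objRS("col2") ' ...remaining columns as before
    i = i + 1
    objRS.MoveNext
loop
if i > 0 then redim preserve parts(i - 1) ' trim the unused slots
rows = Join(parts, vbCrLf) ' then write rows out once, as before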

You might want to look into using GetRows on your recordset. It pulls the whole result set into a two-dimensional array in one call, which you can then walk (or Join) to build the tab-delimited output.

http://www.aspdev.org/articles/asp-getrows/

It was designed to render out HTML tables and the like of data easily (and quickly). Nothing says you can't use it to render a tab-delimited list too. Should work a treat, although I've never used it with a spreadsheet.

This doc seems to say it's possible though:

http://www.fontstuff.com/ebooks/free/fsADOConnectExcel.pdf
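Off the top of my head it would be something like this (untested; it assumes the objRS and objFile objects from the snippets above, plus at least one row in the recordset, and note GetRows loads everything into memory at once, so it's a trade-off against Anthony's streaming version):

data = objRS.GetRows() ' 2-D array, indexed data(column, row)
redim fields(UBound(data, 1))
for r = 0 to UBound(data, 2)
    for c = 0 to UBound(data, 1)
        fields(c) = data(c, r) & "" ' & "" turns Null cells into empty strings
    next
    objFile.WriteLine Join(fields, vbTab)
next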

Pete Duncanson