I'm trying to upload an Excel (2003) file into SQL Server 2005.

At present we use SSIS to upload the file into a temporary table (with a defined schema) and then move the data to its intended location using a stored procedure. This produces many errors due to datatype mismatches, and sometimes null values are loaded into the table even though the Excel file contains values.

Now I have the following options

Option 1 : (Suggested here)

Import the data into a table whose columns are all of type varchar(255). Then have a stored procedure validate the datatype of each column before moving the data to the live table... Here again, will it be a performance problem to use IsNumeric, Len, IsDate, etc.?
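For reference, a minimal sketch of such a validation pass over an all-varchar staging table (the table and column names here are hypothetical, not from the question):

```sql
-- Hypothetical pre-staging table StagingOrders: every column is varchar(255).
-- List the rows whose values will not convert to the intended target types.
SELECT RowId,
       CASE WHEN ISNUMERIC(Quantity) = 0  THEN 'Quantity: not numeric'
            WHEN ISDATE(OrderDate)   = 0  THEN 'OrderDate: not a valid date'
            WHEN LEN(CustomerName)  > 50  THEN 'CustomerName: longer than 50'
       END AS ValidationError
FROM   StagingOrders
WHERE  ISNUMERIC(Quantity) = 0
   OR  ISDATE(OrderDate)   = 0
   OR  LEN(CustomerName)  > 50;
```

One caveat worth knowing: ISNUMERIC is only a rough filter. It returns 1 for strings such as '$' and '1e5' that will still fail a CONVERT to int, so tighter per-column checks may be needed.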

Option 2 : (XML / XSD)

Generate an XML file from the Excel file and validate it against a predefined XSD before importing the XML through .NET or through SSIS. Will this be slower than option 1? Will I be able to validate the length of the data too? Will I be able to report exactly which row and column caused the error?
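As a side note, SQL Server 2005 itself can enforce an XSD through an XML schema collection, so validation does not necessarily have to happen in .NET. A hedged sketch (the schema and names are purely illustrative):

```sql
-- Register the XSD once (contents abbreviated; illustrative only).
CREATE XML SCHEMA COLLECTION OrdersSchema AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Order">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Quantity" type="xs:int"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>';

-- Assigning to a typed xml variable validates the document;
-- a schema violation raises an error at this point.
DECLARE @doc xml(OrdersSchema);
SET @doc = N'<Order><Quantity>5</Quantity></Order>';
```

The trade-off: the errors raised by typed XML are less friendly about pinpointing the offending row and column than validating with an XmlReader in .NET, which reports line numbers.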

In your opinion, which is better in terms of performance and other aspects? In particular, when there is an error, I should be able to report to the user exactly which row and column caused it.

If there is a better way to handle this, please let me know...

Thanks

+1  A: 

Definitely separate validation from import. Personally, I would use MS Access for the validation, as it's more forgiving than SQL Server of Excel's data storage issues, but you can use a table on SQL Server with all varchar(255) fields if you want.

The import is trivial, the validation is a project all by itself.

Data can fail validation tests in so many ways, each way needing its own custom code.

Beth
A: 

I personally use the former method: import into a "schemaless" pre-staging table, then scrub the data to clean it up, moving the good rows into a "proper" staging table (with a schema that matches the expected datatypes of the columns) and the bad rows into an exception table that matches the pre-staging table, with an added column describing which validation failed (either a text description or, if you really want to normalize it, an error code with a related error table).
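A sketch of that split in T-SQL, assuming hypothetical table and column names (this shows the pattern, not a drop-in script):

```sql
-- Good rows move to the typed staging table...
INSERT INTO Staging (OrderDate, Quantity)
SELECT CONVERT(datetime, OrderDate), CONVERT(int, Quantity)
FROM   PreStaging
WHERE  ISDATE(OrderDate) = 1 AND ISNUMERIC(Quantity) = 1;

-- ...and bad rows to an exception table that mirrors the
-- pre-staging table, plus a column naming the failed validation.
INSERT INTO PreStagingErrors (OrderDate, Quantity, FailedValidation)
SELECT OrderDate, Quantity,
       CASE WHEN ISDATE(OrderDate) = 0 THEN 'OrderDate is not a date'
            ELSE 'Quantity is not numeric' END
FROM   PreStaging
WHERE  ISDATE(OrderDate) = 0 OR ISNUMERIC(Quantity) = 0;
```

Because the exception table keeps the original varchar values, the offending rows can be shown back to the user exactly as they arrived.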

technomalogical
A: 

Though the previous two answers are perfectly valid, the way I did it was a little different, so I wanted to describe it here...

Step 1 : I generated an XML file from the Excel file, using an Excel macro.

Step 2 : Validated it against a predefined XSD in .NET (a Windows app). The XSD contains all the schema information, such as the data type and even the required length, scale, and precision of the data.
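An XSD can indeed express length, precision, and scale through restriction facets; a fragment along these lines (the element names are illustrative, not from my actual schema):

```xml
<xs:element name="CustomerName">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:maxLength value="50"/>      <!-- length -->
    </xs:restriction>
  </xs:simpleType>
</xs:element>

<xs:element name="UnitPrice">
  <xs:simpleType>
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="10"/>    <!-- precision -->
      <xs:fractionDigits value="2"/>  <!-- scale -->
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```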

One more advantage is that I can show the precise row and column in which validation fails, so the user can correct it easily.

Step 3 : Uploaded the XML file using SSIS into a table that conforms to the given schema.

The disadvantages I have found so far:

  1. The upload process is a little slower, since more operations are involved.
  2. Changing the datatype or length of even a single field requires updates in three places: the XSD, the SSIS package, and the destination table.

This is the first time I've tried this approach. It works fine so far, and I was able to achieve what I wanted. There may be other disadvantages that surface over time; I will try to keep this post updated when I run into them.

Thanks for all your help.

The King