I'm doing a one-time import routine from one system to another. The first step is to import a large amount of data from many CSV files. The first line in each CSV file contains the field names, tab-separated. The rest is the data, also tab-separated, with rows separated by a CR.

I have written a 20-line routine that creates the tables and reads the data into a SQL database. I set every field that contains only numeric data and has "ID" in its field name to INT, and the rest to NVARCHAR(255).

Now I want to refine it so that it looks at the data and tries to work out what kind of data each column contains. There are time, date, ID, text, and numeric fields.

I know how to do this, and it's fairly easy, but I'd like to know whether any work has already been done in this area and whether there are existing classes out there that handle it.

Finding out the preferred size is harder. I think I have to:

1. Look through all rows and find the longest value in each column.
2. Use a lookup table that maps, for example, lengths 0-50 to 100, lengths 51-255 to 255, and so on.
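The two steps above are straightforward to sketch. Here is a minimal illustration in Python (the bucket sizes and the size-mapping rule are assumptions for the example; a real routine would use whatever lookup table fits the target schema):

```python
import csv

# Illustrative size buckets: a measured maximum length is rounded up
# to the next bucket so columns get some headroom.
SIZE_BUCKETS = [50, 100, 255, 1000, 4000]

def suggested_size(max_len):
    """Map an observed maximum length to a bucketed NVARCHAR capacity."""
    for bucket in SIZE_BUCKETS:
        if max_len <= bucket:
            return bucket
    return -1  # longer than the largest bucket: fall back to NVARCHAR(MAX)

def column_max_lengths(path):
    """Step 1: scan every row and record the longest value per column."""
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f, delimiter='\t')
        header = next(reader)           # first line holds the field names
        maxima = [0] * len(header)
        for row in reader:
            for i, value in enumerate(row):
                maxima[i] = max(maxima[i], len(value))
    return dict(zip(header, maxima))
```

One pass over each file is enough for step 1, and step 2 is then just a lookup per column.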

Does anyone have any smart input on this? As I said, it's not particularly hard to do, but it would be nice to build a smart library out of it, to reuse in other import scenarios later on.

+1  A: 

I would run a pre-processing stage: analyse the data in the CSV files to work out sensible capacities for NVARCHAR, e.g. nvarchar(20) or nvarchar(2000). I would have it print a report with an example entry for each column, and even generate the "CREATE TABLE" SQL statement, which you can edit manually before running it. If every entry in a column parses as a number, make it an INT, and so on.
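That pre-processing stage could be sketched like this (Python for brevity; the all-digits numeric test and the sizing are illustrative assumptions, not a fixed rule):

```python
import csv

def infer_sql_type(values, max_len):
    """Guess a column type: INT if every non-empty value is an integer,
    otherwise NVARCHAR sized from the longest observed value."""
    non_empty = [v for v in values if v != ""]
    if non_empty and all(v.lstrip("-").isdigit() for v in non_empty):
        return "INT"
    return "NVARCHAR({})".format(max(max_len, 1))

def create_table_sql(table, path):
    """Scan one tab-separated file and emit a CREATE TABLE statement
    that can be reviewed and hand-edited before it is run."""
    with open(path, newline='', encoding='utf-8') as f:
        rows = list(csv.reader(f, delimiter='\t'))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        values = [row[i] for row in data if i < len(row)]
        max_len = max((len(v) for v in values), default=1)
        columns.append("    [{}] {}".format(name, infer_sql_type(values, max_len)))
    return "CREATE TABLE [{}] (\n{}\n)".format(table, ",\n".join(columns))
```

Printing the generated statement (plus a sample value per column) gives exactly the kind of report you can eyeball before committing to the import.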

As a one-time thing, I would fully expect you to visually check each table you would be creating before running the data import.

JeeBee
+1  A: 

Have you considered ADO.NET? You can read the data into a DataSet.

The connection string would look like: "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\TxtFile\;Extended Properties='text;HDR=Yes;FMT=Delimited'". Note the HDR=Yes, which you will need.

Then, once you have a connection to a directory, you can use a SELECT statement like:

"SELECT * FROM data.txt"

There are additional settings to tweak, such as how many rows to examine to determine each type. You might look into a schema.ini file to adjust the types manually. You can also set the registry value ImportMixedTypes=Text (the default is 'Majority Type') under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text.
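For reference, a schema.ini entry for the file above might look like this (the column names and types here are made-up examples; MaxScanRows=0 tells the Jet text driver to scan every row when inferring types):

```ini
[data.txt]
Format=TabDelimited
ColNameHeader=True
MaxScanRows=0
Col1=CustomerID Long
Col2=Name Text Width 255
Col3=OrderDate DateTime
```

The file lives in the same directory as the text files, and an explicit ColN line overrides whatever the driver would have guessed for that column.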

I have also used the SQL Server import on an Excel file (after opening the CSV file in Excel and saving it). That works OK too. Nothing has worked perfectly, though; sometimes I still find it easier to write some manual code and check the column data myself.

aSkywalker