views:

38

answers:

2

I have a table called Animals. I pull data from this table to populate another system.

I get Excel data with lists of animals that need to go in the Animals table.

The Excel data will also have other identifiers, like Breed, Color, Age, Favorite Toy, Veterinarian, etc.

These identifiers will change with each new excel file. Some may repeat, others are brand new.

Because the fields change, and I never know what new fields will come with each new excel file, my Animals table only has Animal Id and Animal Name.

I've created a Values table to hold all the other identifier fields. That table is structured like this:

AnimalId
Value
FieldId
DataFileId

And then I have a Fields table that holds the key to each FieldId in the Values table.

I do this because the alternative is to keep a big table with fields that may not even be used each time I need to add data. A big table with a lot of null columns.

I'm not sure my way is a good way either. It can seem overly complex.

But, assuming it is a good way, what is the best way to get this excel data into my Values table? The list of animals is easy to add to my Animals table. But for each identifier (Breed, Color, etc.) I have to copy or import the values and then update the table to assign a matching FieldId (or create a new FieldId in the Fields table if it doesn't exist yet).

It's a huge pain to load new data if there are a lot of identifiers. I'm really struggling and could use a better system.

Any advice, help, or just pointing me in a better direction would be really appreciated.

Thanks.

A: 

Depending on your client (eg, I use SequelPro on a Mac), you might be able to import CSVs. This is generally pretty shaky, but you can also export your Excel document as a CSV... how convenient.

However, this doesn't really help with your database structure. Granted, using foreign keys is a good idea, but importing that data unobtrusively (and easily) is something that will need to likely be done a row at a time.

However, you could try modifying something like this to suit your needs, by first exporting your Excel document as a CSV, removing the header row (the first one), and then using regular expressions on it to change it into a big chunk of SQL. For example:

Your CSV:

myval1.1,myval1.2,myval1.3,myval1.4
myval2.1,myval2.2,myval2.3,myval2.4
...

At which point, you could do something like:

myCsvText.replace(/^(.+),(.+),(.+)$/mg, 'INSERT INTO table_name(col1, col2, col3) VALUES($1, $2, $3)')

where you know the number of columns, their names, and how their values are organized (via the regular expression & replacement).

Might be a good place to start.

mway
A: 

Your table looks OK. Since you have a variable number of fields, it seems logical to expand vertically. Although you might want to make it easier on yourself by changing DataFileID and FieldID into FieldName and DataFileName, unless you will use them in a lot of other tables too.

Getting data from Excel into SQL Server is unfortunately not so easy as you would expect from two Microsoft products interacting with eachother. There are several routes that I know of that you can take:

  1. Work with CSV files instead of Excel files. Excel can edit CSV files just as easily as Excel files, but CSV is an infinitely more reliable datasource when it comes to importing. You don't get problems with different file formats for different Excel versions, Excel having to be installed on the computer that will run the script or quirks with automatic datatype recognition. A CSV can be read with the BCP commandline tool, the BULK INSERT command or with SSIS. Then use stored procedures to convert the data from a horizontal bulk of columns into a pure vertical format.

  2. Use SSIS to read the data directly from the Excel file(s). It is possible to make a package that loops over several Excel files. A drawback is that the column format and the sheet name of the Excel file has to be known beforehand, and so a different template (with a separate loop) has to be made each time a new Excel format arrives. There exist third-party SSIS components that claim to be more flexible, but I haven't tested them out yet.

  3. Write a Visual C# program or PowerShell script that grabs the Excel file, extracts the data and outputs into your SQL table. Visual C# is a pretty easy language with powerful interfaces into Office and SQL Server. I don't know how big the learning curve is to get started, but once you do, it will be a pretty easy program to write. I have also heard good things about Powershell.

  4. Create an Excel Macro that uses VB code to open other Excel files, loop through their data and write the results either in a predefined sheet or as CSV to disk. Once everything is in a standard format it will be easy to import the data using one of the above methods.

Since I have had headaches with 1) and 2) before, I would advise on either 3) or 4). Because of my greater experience with VBA than Visual C# or Powershell, I´d go for 4) if I was in a hurry. But I think 3) is the better investment for the long term.

(You could also go adventurous and use another script language, such as Python as I once did because Python is cool, unfortunately Python offers pretty slow and limited interfaces to SQL server and Excel)

Good luck!

littlegreen