views:

162

answers:

4

I need to import tabular data into my database. The data is supplied via spreadsheets (mostly Excel files) from multiple parties. The format of each of these files is similar but not the same and various transformations will be necessary to massage the data into the final format suitable for import. Furthermore the input formats are likely to change in the future. I am looking for a tool that can be run and administered by regular users to transform the input files.

Now let me list some of the transformations I am looking to do:

  • swap columns:

Input is:

|Name|Category|Price|
|data|data    |data |

Output is

|Name|Price|Category|
|data|data |data    |
  • rename columns

Input is:

|PRODUCTNAME|CAT |PRICE|
|data       |data|data |

Output is

|Name|Category|Price|
|data|data    |data |
  • map columns according to a lookup table, like in the above examples: replace every occurrence of the string "Car" by "automobile" in the column Category
  • basic maths: multiply the price column by some factor
  • basic string manipulations Lets say that the format of the Price column is "3 x $45", I would want to split that into two columns of amount and price
  • filtering of rows by value: exclude all rows containing the word "expensive"
  • etc.

I have the following requirements:

  • it can run on any of these platform: Windows, Mac, Linux
  • Open Source, Freeware, Shareware or commercial
  • the transformations need to be editable via a GUI
  • if the tool requires end user training to use that is not an issue
  • it can handle on the order of 1000-50000 rows

Basically I am looking for a graphical tool that will help the users normalize the data so it can be imported, without me having to write a bunch of adapters.

What tools do you use to solve this?

+2  A: 
JRL
A: 

The Mule Data Integrator will do all of this from a csv file. So you can export your spreadsheet to a CSV file, and load the CSV file ito the MDI. It can even load the data directly to the database. And the user can specify all of the transformations you requested. The MDI will work fine in non-Mule environments. You can find it here mulesoft.com (disclaimer, my company developed the transformation technology that this product is based on).

Francis Upton
A: 

You didn't say which database you're importing into, or what tool you use. If you were using SQL Server, then I'd recommend using SQL Server Integration Services (SSIS) to manipulate the spreadsheets during the import process.

John Saunders
A: 

I tend to use MS Access as a pipeline between multiple data sources and destinations - but you're looking for something a little more automated. You can use macros and VB script with Access to help through a lot of the basics.

However, you're always going to have data consistency problems with users mis-interpreting how to normalize their information. Good luck!

Nick