I need to import tabular data into my database. The data is supplied via spreadsheets (mostly Excel files) from multiple parties. The format of each of these files is similar but not the same and various transformations will be necessary to massage the data into the final format suitable for import. Furthermore the input formats are likely to change in the future. I am looking for a tool that can be run and administered by regular users to transform the input files.
Now let me list some of the transformations I am looking to do:
- swap columns:
Input is:
|Name|Category|Price|
|data|data |data |
Output is
|Name|Price|Category|
|data|data |data |
- rename columns
Input is:
|PRODUCTNAME|CAT |PRICE|
|data |data|data |
Output is
|Name|Category|Price|
|data|data |data |
- map columns according to a lookup table, like in the above examples: replace every occurrence of the string "Car" by "automobile" in the column Category
- basic maths: multiply the price column by some factor
- basic string manipulations Lets say that the format of the Price column is "3 x $45", I would want to split that into two columns of amount and price
- filtering of rows by value: exclude all rows containing the word "expensive"
- etc.
I have the following requirements:
- it can run on any of these platform: Windows, Mac, Linux
- Open Source, Freeware, Shareware or commercial
- the transformations need to be editable via a GUI
- if the tool requires end user training to use that is not an issue
- it can handle on the order of 1000-50000 rows
Basically I am looking for a graphical tool that will help the users normalize the data so it can be imported, without me having to write a bunch of adapters.
What tools do you use to solve this?