We have 4 data sources. Two of them are internal, and we can connect to their databases directly. For the third we receive a flat file (.csv) and have to load the data from it. The fourth is external, and we cannot access it directly.

We need to pull data from all 4 data sources, run business rules on it, and store the result in our database. We have a web application that runs on top of this database. Every month we also have to pull the data again and apply any updates, deletes, and adds to the existing data.
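The monthly update/delete/add step can be thought of as a diff between what is already stored and the new extract. Below is a minimal in-memory sketch of that idea in Java, assuming every record has a unique key and a comparable value. The names `MonthlySync`, `ChangeSet`, and `diff` are hypothetical and only for illustration; in a real setup the comparison would more likely happen in the database against a staging table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: classifies keys into adds, updates, and deletes.
class MonthlySync {

    /** Keys grouped by the action needed to bring the stored data in line with the new extract. */
    static final class ChangeSet {
        final Set<String> adds = new TreeSet<>();
        final Set<String> updates = new TreeSet<>();
        final Set<String> deletes = new TreeSet<>();
    }

    /** Compares the stored records with the incoming extract, key by key. */
    static ChangeSet diff(Map<String, String> existing, Map<String, String> incoming) {
        ChangeSet changes = new ChangeSet();
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            String key = entry.getKey();
            if (!existing.containsKey(key)) {
                changes.adds.add(key);        // new in this month's extract
            } else if (!Objects.equals(existing.get(key), entry.getValue())) {
                changes.updates.add(key);     // present in both, value changed
            }
        }
        for (String key : existing.keySet()) {
            if (!incoming.containsKey(key)) {
                changes.deletes.add(key);     // no longer present in the source
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("1", "alice");
        existing.put("2", "bob");
        existing.put("3", "carol");

        Map<String, String> incoming = new HashMap<>();
        incoming.put("2", "bob");
        incoming.put("3", "caroline");
        incoming.put("4", "dave");

        ChangeSet changes = diff(existing, incoming);
        System.out.println("adds=" + changes.adds
                + " updates=" + changes.updates
                + " deletes=" + changes.deletes);
    }
}
```

Whether a key missing from the extract should really be deleted depends on the source: a partial extract would make this rule unsafe, so that part is an assumption.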

I am pretty much new to this process. Can you also point me to some good books on the topic?

These are the approaches I am currently considering:

  • Write an internal web service that talks to the internal data sources and pulls the data. Create a handler for the external data source using middleware (MQSeries is already set up for this in another existing project, and I plan to reuse it). Pull the data from the CSV file using Java as well. Run the business rules on this data in Java, then use it. This approach might run on my dev box, but I am not sure what problems could occur in prod (especially around synchronization).
  • Pull data from the internal sources using a plain Java JDBC connection. For the remaining 2, get flat files and load them using SQL*Loader. All the data goes into temporary tables first. Run the business rules in PL/SQL and use the result.
  • Use an ETL tool like Informatica to pull the data, and write the business rules in Perl (invoked by Informatica).
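The "stage first, then run business rules" idea that the first two approaches share can be sketched in plain Java. This is only an illustration under several assumptions: the CSV has a header line and two columns (`id,amount`), fields contain no quoted commas, and the single rule shown (non-empty id, non-negative numeric amount) stands in for the real business rules. `StagingPipeline` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: load raw rows into a staging list, then split them by a rule.
class StagingPipeline {

    /** Stages raw CSV lines without validating them; skips the header and blank lines. */
    static List<String[]> stage(List<String> csvLines) {
        List<String[]> staged = new ArrayList<>();
        for (int i = 1; i < csvLines.size(); i++) {
            String line = csvLines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            // Naive split: assumes no quoted fields containing commas.
            staged.add(line.split(",", -1));
        }
        return staged;
    }

    /** Placeholder business rule: id must be present, amount must be a non-negative number. */
    static boolean isValid(String[] row) {
        if (row.length != 2 || row[0].trim().isEmpty()) {
            return false;
        }
        try {
            return Double.parseDouble(row[1].trim()) >= 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Returns the rows that pass the rule and collects the others into rejected. */
    static List<String[]> applyRules(List<String[]> staged, List<String[]> rejected) {
        List<String[]> accepted = new ArrayList<>();
        for (String[] row : staged) {
            if (isValid(row)) {
                accepted.add(row);
            } else {
                rejected.add(row);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("id,amount", "1,10.5", "2,-3", "", ",7", "3,abc");
        List<String[]> rejected = new ArrayList<>();
        List<String[]> accepted = applyRules(stage(lines), rejected);
        System.out.println("accepted=" + accepted.size() + " rejected=" + rejected.size());
    }
}
```

Keeping rejected rows instead of dropping them makes it possible to report bad input back to whoever supplies the file; the same separation applies if the staging area is a set of temporary tables and the rules live in PL/SQL.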

Thanks.

+1  A: 

A book like "The Data Warehouse ETL Toolkit" by Ralph Kimball is a good resource for learning techniques/architectures to bring data from different sources into one place.

DetectiveEric