
I'm looking for recommendations for an ETL system that can pull from 200+ distributed systems (Windows, AS400, Linux, etc.).

Each month we collect data from all of our customers (regardless of system type), bring it back, process it all together, and send the aggregated solutions back to them. I've been tasked with automating this. Any suggestions on how to do it robustly? I really don't want to reinvent the wheel. I don't own any of the systems I'm pulling data from, which makes the task harder, but I can install a client on them.
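With 200+ heterogeneous sites, one thing that tends to matter more than the choice of tool is that a single unreachable customer doesn't abort the whole monthly run. Below is a minimal sketch of that idea, assuming each site's existing extract/transform code can be wrapped in a `java.util.concurrent.Callable` that returns rows in a common format. The class name, site IDs, and row format are all hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

public final class CollectionRun {

    /** Outcome of one run: rows from sites that succeeded, error text for sites that failed. */
    static final class Result {
        final Map<String, List<String>> rowsBySite = new LinkedHashMap<>();
        final Map<String, String> failuresBySite = new LinkedHashMap<>();
    }

    /**
     * Runs every site's extractor in turn. Each Callable is a hypothetical adapter
     * around the existing per-platform extract/transform code.
     */
    static Result runAll(Map<String, Callable<List<String>>> extractorsBySite) {
        Result result = new Result();
        for (Map.Entry<String, Callable<List<String>>> entry : extractorsBySite.entrySet()) {
            try {
                result.rowsBySite.put(entry.getKey(), entry.getValue().call());
            } catch (Exception e) {
                // Record the failure and move on: one bad site must not sink the whole run.
                result.failuresBySite.put(entry.getKey(), e.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Callable<List<String>>> sites = new LinkedHashMap<>();
        sites.put("win-01", () -> Arrays.asList("acct1,100", "acct2,250"));
        sites.put("as400-07", () -> { throw new IOException("connection refused"); });
        sites.put("linux-12", () -> Arrays.asList("acct9,75"));

        Result run = runAll(sites);
        System.out.println("succeeded: " + run.rowsBySite.keySet());
        System.out.println("failed:    " + run.failuresBySite);
    }
}
```

The failure map then becomes the list of sites to chase or retry, while the aggregation step works with whatever did arrive.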

I've built a prototype client/server architecture in Java that uses FTP for transport, but it feels brittle. I should note that all of the extract/transform code for the different systems already exists in Java (albeit legacy).
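Much of the brittleness of a plain FTP hand-off usually comes from truncated or half-written files being picked up. One common mitigation, sketched here under the assumption that the client sends a SHA-256 digest along with each file, is to verify the digest on the server and only then move the file into the processing inbox in a single atomic step. `VerifiedTransfer`, `acceptUpload`, and the file names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class VerifiedTransfer {

    /** Streams the file through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Accepts a staged upload only if its digest matches the one the client sent,
     * then moves it into the inbox in one step so the processing job never sees a
     * half-written file. On mismatch, deletes the file and returns false so the
     * client can resend.
     */
    static boolean acceptUpload(Path staged, String expectedSha256, Path inbox) throws IOException {
        if (!sha256Hex(staged).equalsIgnoreCase(expectedSha256)) {
            Files.delete(staged);
            return false;
        }
        Files.createDirectories(inbox);
        // ATOMIC_MOVE requires the staging area and the inbox to be on the same file system.
        Files.move(staged, inbox.resolve(staged.getFileName()), StandardCopyOption.ATOMIC_MOVE);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("etl-demo");
        Path staged = work.resolve("site-042.csv"); // hypothetical file name
        Files.write(staged, "acct1,100\n".getBytes(StandardCharsets.UTF_8));
        String digestFromClient = sha256Hex(staged); // in practice the client computes this before sending
        System.out.println(acceptUpload(staged, digestFromClient, work.resolve("inbox")) ? "accepted" : "rejected");
    }
}
```

The same check works whatever the transport is (FTP, SFTP, HTTPS), so it can be kept even if FTP is later replaced.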

I should mention that we currently pull data once a month, but we're working towards weekly.
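Moving from monthly to weekly is easier if the cadence is a setting rather than something baked into the scheduler. A fixed-rate timer of "30 days" drifts because months differ in length, so a sketch like the one below computes the next calendar-based run instead. It assumes, hypothetically, that runs start at 02:00 on the 1st of the month or on Mondays; the names are illustrative only.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.TemporalAdjusters;

public final class CollectionSchedule {

    enum Cadence { MONTHLY, WEEKLY }

    /** Hypothetical start time for every collection run. */
    static final LocalTime RUN_TIME = LocalTime.of(2, 0);

    /**
     * Returns the first run time strictly after {@code now}: 02:00 on the first of
     * the month for MONTHLY, 02:00 on Monday for WEEKLY.
     */
    static LocalDateTime nextRun(LocalDateTime now, Cadence cadence) {
        LocalDate today = now.toLocalDate();
        // Start of the current period (this month's 1st, or this week's Monday).
        LocalDate periodStart = cadence == Cadence.MONTHLY
                ? today.with(TemporalAdjusters.firstDayOfMonth())
                : today.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
        LocalDateTime candidate = periodStart.atTime(RUN_TIME);
        if (!candidate.isAfter(now)) {
            // This period's run time has already passed; move to the next period.
            LocalDate nextStart = cadence == Cadence.MONTHLY
                    ? periodStart.plusMonths(1)
                    : periodStart.plusWeeks(1);
            candidate = nextStart.atTime(RUN_TIME);
        }
        return candidate;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2024, 5, 15, 10, 0);
        System.out.println("monthly: " + nextRun(now, Cadence.MONTHLY)); // 2024-06-01T02:00
        System.out.println("weekly:  " + nextRun(now, Cadence.WEEKLY));  // 2024-05-20T02:00 (a Monday)
    }
}
```

Whatever actually triggers the job (cron, Windows Task Scheduler, or a long-running Java service) can call `nextRun`, so switching to weekly becomes a one-value change.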

Any insight is appreciated.

A: 

I think it depends on where the project is headed. If it will keep gaining requirements and there is budget for it, an off-the-shelf ETL tool might be a good idea.

However, if your output (the report) is fixed and isn't expected to change, a custom ETL might be worth it. Most ETL tools offer various output formats (charts, text files, etc.) and conveniences, but the data-moving part is nearly the same across all of them. With any ETL tool you still have to implement the same queries you have now, plus learn the tool. And who knows? Some tools might require an installation at each of your 200+ sites.

Recently, our company spent a lot of money on reporting tools, servers, and staff to build a proper ETL, because our in-house ETL had been criticized for being slow and unprofessional-looking (it doesn't use a popular ETL tool; it's a bunch of script commands). Despite all that spending, the project has nearly reached a dead end.

One more thing: I don't understand how Java and FTP fit into this process. Can you connect directly to the databases on your customers' systems using SQL? If you can, using SQL and stored procedures is always a better idea than using Java and FTP.
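Since the existing extract code is already Java, a direct SQL connection would most likely go through JDBC. Here is a minimal sketch of calling a stored procedure with the standard JDBC `{call ...}` escape syntax. It assumes each customer database exposes a procedure named `extract_period` taking one period parameter; that name is hypothetical, and the matching JDBC driver for each platform must be on the classpath.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public final class DirectPull {

    /** JDBC escape syntax for a stored-procedure call with {@code paramCount} parameters. */
    static String callSyntax(String procedure, int paramCount) {
        StringBuilder sb = new StringBuilder("{call ").append(procedure).append('(');
        for (int i = 0; i < paramCount; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('?');
        }
        return sb.append(")}").toString();
    }

    /** Runs the hypothetical extract procedure for one period on a customer's database. */
    static void runExtract(Connection conn, String period) throws SQLException {
        // The procedure name must be a trusted constant: identifiers cannot be bind parameters.
        try (CallableStatement cs = conn.prepareCall(callSyntax("extract_period", 1))) {
            cs.setString(1, period); // values, unlike identifiers, are bound safely
            cs.execute();
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.out.println("usage: DirectPull <jdbc-url> <period>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            runExtract(conn, args[1]);
        }
    }
}
```

This only applies if inbound database access to the customer systems is allowed; otherwise a client-side push like the one in the question is still needed.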

Hope this helps.

exiter2000
BG4: Let me rephrase: *in the past* we didn't connect directly.