Having spent some time working on data warehousing, I have built both ETL (extract, transform, load) and ELT (extract, load, transform) processes. ELT seems to be the newer approach to populating data warehouses, and it can more easily take advantage of cluster computing resources. What do other people see as the advantages of each approach over the other, and when should you use one rather than the other?

A: 

I use both. It's simply a matter of convenience and functionality, and it depends on the case. Sometimes I do TEL, i.e. the transform is done in the source database (in a stored procedure or view) and the result is then extracted and loaded directly.
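
As a rough illustration of the TEL idea, here is a minimal sketch using Python's built-in sqlite3 module. The table, view, and function names (orders, v_customer_totals, customer_totals, tel_copy) are made up for the example, and two in-memory SQLite databases stand in for a real source system and warehouse. The transform lives in a view on the source side, so the copy step only selects and inserts.

```python
import sqlite3


def tel_copy(source, target):
    """Transform in the source DB (a view), then extract and load as-is."""
    # Transform: defined inside the source database as a view (hypothetical name).
    source.execute(
        """
        CREATE VIEW IF NOT EXISTS v_customer_totals AS
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        """
    )
    # Extract: read the already-transformed rows.
    rows = source.execute(
        "SELECT customer, total FROM v_customer_totals ORDER BY customer"
    ).fetchall()
    # Load: write them to the target without further processing.
    target.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    target.executemany("INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", rows)
    target.commit()
    return len(rows)


def demo():
    # Two in-memory databases stand in for the source system and the warehouse.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)],
    )
    target = sqlite3.connect(":memory:")
    tel_copy(source, target)
    return target.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
```

With a real source database the view could equally be a stored procedure; the point is only that the extracting side never sees untransformed rows.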

Cade Roux
+1  A: 

Which is better is hard to answer; it depends on the problem.

I prefer multi-step ETL, i.e. ECCD (Extract, Clean, Conform, Deliver), whenever possible. I also keep intermediate CSV files after each extract, clean, and conform step; this takes some disk space, but is quite useful. Whenever the DW has to be reloaded because of bugs in the ETL or changes to the DW schema, there is no need to query the source systems again -- the data is already in flat files. It is also quite convenient to be able to grep, sed, and awk through the flat files in the staging area when needed. When several source systems feed into the same DW, only the extract step has to be developed (and maintained) for each source system -- the clean, conform, and deliver steps are all common.
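
A minimal sketch of that ECCD layout, using only Python's standard library, might look like the following. The file names (1_extract.csv and so on), the column names, and the cleaning and conforming rules are all assumptions made up for the example, and deliver simply returns rows instead of loading a real warehouse. Each step reads the previous step's CSV and writes its own, so any later step can be rerun from the staging files without touching the source system.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def extract(source_rows, staging):
    # The only source-specific step: dump raw rows to a flat file.
    out = staging / "1_extract.csv"
    write_csv(out, ["id", "name", "amount"], source_rows)
    return out


def clean(extract_path, staging):
    # Example rule (assumed): trim names, drop rows with no amount.
    out = staging / "2_clean.csv"
    rows = [
        (r["id"], r["name"].strip(), r["amount"])
        for r in read_csv(extract_path)
        if r["amount"].strip()
    ]
    write_csv(out, ["id", "name", "amount"], rows)
    return out


def conform(clean_path, staging):
    # Example rule (assumed): map to warehouse column names and formats.
    out = staging / "3_conform.csv"
    rows = [
        (int(r["id"]), r["name"].upper(), f"{float(r['amount']):.2f}")
        for r in read_csv(clean_path)
    ]
    write_csv(out, ["customer_id", "customer_name", "amount"], rows)
    return out


def deliver(conform_path):
    # Stand-in for the warehouse load: returns the typed rows.
    return [
        (int(r["customer_id"]), r["customer_name"], float(r["amount"]))
        for r in read_csv(conform_path)
    ]


def demo():
    source_rows = [("1", " alice ", "10"), ("2", "bob", ""), ("3", "carol", "7.5")]
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp)
        conformed = conform(clean(extract(source_rows, staging), staging), staging)
        first_load = deliver(conformed)
        # A reload reuses the staged file; the source is not queried again.
        reload = deliver(staging / "3_conform.csv")
        files = sorted(p.name for p in staging.iterdir())
    return first_load, reload, files
```

Adding a second source system in this layout would mean writing another extract function that produces the same 1_extract.csv columns; clean, conform, and deliver stay unchanged.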

Damir Sudarevic