My question concerns ETL scenarios where the transformation is performed entirely outside the database. If you were to Extract, Transform, and Load huge volumes of data (20+ million records or more), and the databases involved are Oracle and MSSQL Server, what would be the best way to:

  1. Effectively read from the source database: Is there a way I could avoid all the querying over the network? I have heard good things about the Direct Path Extract / bulk unload method, but I'm not quite sure how it works. I presume I would need a dump file of sorts for any kind of non-network-based data read/import? (See the first sketch after this list for roughly what I have in mind.)
  2. Effectively write the transformed data to the target database: Should I consider Apache Hadoop? Would it let me start the transformation and load all my data to the destination database in parallel? Would it be faster than, say, Oracle's bulk load utility? If not, is there a way to remotely invoke the bulk load utilities on Oracle/MSSQL Server? (See the second sketch after this list.)
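
To make item 1 concrete, here is a minimal sketch of the kind of chunked unload to a local flat file I have in mind, assuming the python-oracledb driver; the DSN, credentials, and the table name `src_table` are all placeholders:

```python
# Sketch: chunked unload from Oracle to a local CSV file.
# Assumes the python-oracledb driver; src_table and the DSN are placeholders.
import csv
import oracledb

def unload_to_csv(dsn, user, password, out_path):
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        cur = conn.cursor()
        cur.arraysize = 10_000  # fetch in large batches to cut network round trips
        cur.execute("SELECT * FROM src_table")
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(col[0] for col in cur.description)  # header row
            while True:
                rows = cur.fetchmany()  # returns up to cur.arraysize rows per call
                if not rows:
                    break
                writer.writerows(rows)

unload_to_csv("dbhost/orclpdb1", "etl_user", "secret", "src_table.csv")
```

I realize this still goes over the wire, batching aside, which is why I'm asking whether Direct Path Extract avoids that entirely.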
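For item 2, this is roughly how I imagine kicking off Oracle's bulk load utility (SQL*Loader) from the ETL host, assuming `sqlldr` is installed there and a control file `load.ctl` (hypothetical name) already maps the CSV onto the target table:

```python
# Sketch: invoking SQL*Loader with direct-path loading from Python.
# Assumes sqlldr is on the PATH of this host and load.ctl (hypothetical)
# maps src_table.csv onto the target table.
import subprocess

def direct_path_load(user, password, dsn, control_file):
    cmd = [
        "sqlldr",
        f"userid={user}/{password}@{dsn}",
        f"control={control_file}",
        "direct=true",  # direct path load: bypasses conventional INSERT processing
        "errors=0",     # abort on the first rejected row
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"sqlldr failed: {result.stderr or result.stdout}")

direct_path_load("etl_user", "secret", "dbhost/orclpdb1", "load.ctl")
```

On the SQL Server side I would presumably swap in `bcp` with its analogous arguments, but I haven't verified the details. The catch is that both utilities seem to assume they run on (or near) the database host, hence my question about remote invocation.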

Appreciate your thoughts/suggestions.