My question applies to ETL scenarios where the transformation is performed entirely outside the database. If you were to Extract, Transform, and Load huge volumes of data (20+ million records) and the databases involved are Oracle and MSSQL Server, what would be the best way to:
- Effectively read from the source database: Is there a way I could avoid all the querying over the network? I have heard good things about the Direct Path Extract / bulk unload method, but I'm not quite sure how it works. I presume I would need a dump file of some sort for any kind of non-network-based data read/import?
- Effectively write the transformed data to the target database: Should I consider Apache Hadoop? Would it help me start my transformation and load all my data to the destination database in parallel? Would it be faster than, say, Oracle's bulk load utility? If not, is there a way to remotely invoke the bulk load utilities on Oracle/MSSQL Server?
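To make the first bullet concrete, the non-network read I am imagining is: stream the source table to a flat file in fixed-size chunks on the source side, then move the file. A minimal sketch of the chunked unload step (sqlite3 stands in for the real Oracle connection here, and the table/column names are made up):

```python
import csv
import sqlite3

def unload_to_csv(conn, query, out_path, chunk_size=50_000):
    """Stream query results into a CSV file chunk by chunk,
    so the full 20M+ row result set never sits in memory."""
    cur = conn.cursor()
    cur.execute(query)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([d[0] for d in cur.description])  # header row
        while True:
            rows = cur.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)

# Demo with an in-memory stand-in database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1000)])
unload_to_csv(conn, "SELECT id, amount FROM orders", "orders.csv")
```

The same fetch-in-chunks pattern applies with a real Oracle driver; the chunk size would just be tuned to the network and memory budget.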
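And for the second bullet, the kind of remote bulk-load invocation I have in mind would build a SQL*Loader direct-path command like the one below and fire it over, say, SSH on the database host (the user, connect string, and file names are placeholder values):

```python
import shlex

def sqlldr_command(user, control_file, data_file, log_file,
                   direct=True, parallel=True):
    """Assemble a SQL*Loader direct-path invocation as a shell string.
    All paths and credentials here are hypothetical placeholders."""
    args = [
        "sqlldr",
        f"userid={user}",
        f"control={control_file}",
        f"data={data_file}",
        f"log={log_file}",
        f"direct={'true' if direct else 'false'}",
        f"parallel={'true' if parallel else 'false'}",
    ]
    return shlex.join(args)

cmd = sqlldr_command("etl_user/secret@ORCLPDB", "orders.ctl",
                     "orders.csv", "orders.log")
print(cmd)
```

Whether something like this is reasonable to trigger remotely, or whether there is a better-supported path (external tables, SSIS bulk insert on the MSSQL side, etc.), is exactly what I'm unsure about.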
Appreciate your thoughts/suggestions.