views:

32

answers:

2

Hi,

I am working on an in-house ETL solution, from db1 (Oracle) to db2 (Sybase). We needs to transfer data incrementally (Change Data Capture?) into db2.

I have only read access to tables, so I can't create any table or trigger in Oracle db1.

The challenge I am facing is, how to detect record deletion in Oracle?

The solution which I can think of, is by using additional standalone/embedded db (e.g. derby, h2 etc). This db contains 2 tables, namely old_data, new_data.

old_data contains primary key field from tahle of interest in Oracle.

Every time ETL process runs, new_data table will be populated with primary key field from Oracle table. After that, I will run the following sql command to get the deleted rows:

SELECT old_data.id FROM old_data WHERE old_data.id NOT IN (SELECT new_data.id FROM new_data)

I think this will be a very expensive operation when the volume of data become very large. Do you have any better idea of doing this?

Thanks.

+4  A: 

Which edition of Oracle ? If you have Enterprise Edition, look into Oracle Streams. You can grab the deletes out of the REDO log rather than the database itself

Gary
+1 Streams Change Data Capture is definitely the lightest touch on the source database, but you'll have to make a one-time alteration of the tables to add supplemental logging.
dpbradley
I am not allowed to modify/add the source table in anyway.The ownership of the source db is belong to another company.
janetsmith
A: 

One approach you could take is using the Oracle flashback capability (if you're using version 9i or later):

http://forums.oracle.com/forums/thread.jspa?messageID=2608773

This will allow you to select from a prior database state.

If there may not always be deleted records, you could be more efficient by:

  • Storing a row count with each query iteration.
  • Comparing that row count to the previous row count.
  • If they are different, you know you have a delete and you have to compare the current set with the historical data set from flashback. If not, then don't bother and you've saved a lot of cycles.

A quick note on your solution if flashback isn't an option: I don't think your select query is a big deal - it's all those inserts to populate those side tables that will really take a lot of time. Why not just run that query against the sybase production server before doing your update?

Greg Harman
The direction of communication between the servers is only one way which is from Oracle to Sybase. This is due to some sort of security policy of my client's company.
janetsmith