tags:
views: 437
answers: 5

We are designing a fairly large brownfield application, and have run into a bit of an issue.

We have a fairly large amount of information in a DB2 database from a legacy application that is still loading data. We also have information in an Oracle database that we control.

We have to do a 'JOIN' type of operation on the tables. Right now, I was thinking of pulling the information out of the DB2 table into a List<> and then feeding those values into a SQL statement against the Oracle database, such as:

select * from accounts where accountnum in (...)

Is there any easier way to interact between the databases, or at least, what is the best practice for this sort of action?
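The chunking I have in mind looks roughly like this (a Java sketch; all names are illustrative). Note that Oracle caps an IN list at 1000 literals (ORA-01795), hence the batching:

```java
import java.util.ArrayList;
import java.util.List;

public class InListBatcher {
    // Oracle rejects IN lists longer than 1000 literals (ORA-01795),
    // so split the keys into chunks and run one query per chunk.
    static final int MAX_IN_LIST = 1000;

    static List<String> buildQueries(List<String> accountNums) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < accountNums.size(); i += MAX_IN_LIST) {
            List<String> chunk = accountNums.subList(
                    i, Math.min(i + MAX_IN_LIST, accountNums.size()));
            // In real code, use bind variables instead of string concatenation.
            queries.add("select * from accounts where accountnum in ('"
                        + String.join("','", chunk) + "')");
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> nums = new ArrayList<>();
        for (int i = 0; i < 2500; i++) nums.add("A" + i);
        // 2500 keys in chunks of 1000 -> 3 queries
        System.out.println(buildQueries(nums).size());
    }
}
```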

+1  A: 

I've done this two ways.

With two Sybase databases on different boxes, I set up stored procedures and called them like functions, to send data back and forth. This additionally allowed the sprocs to audit/log, to convince the customer no data was being lost in the process.

For a one-way Oracle-to-Sybase transfer, I used a view to marshal the data, and each vendor's C libraries called from a C++ program that gave the C APIs a common interface.

On a MySQL and DB2 setup where, as in your situation, the DB2 was "legacy but live", I employed a setup similar to what you're describing: pulling the data out into a (Java) client program.

If the join is always one-to-one, and each box's resultset has the same key, you can pull them both with the same ordering and trivially connect them in the client. Even if they're one-to-many, stitching them together is just a one-way iteration of both of your lists.
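A rough sketch of that one-to-one stitch, assuming both sides were pulled ORDER BY the shared key (Java; the record shapes and key values are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClientMergeJoin {
    // Inner-join two result sets that were both fetched sorted by the key.
    // Each row is modeled as {key, value}; both lists are walked once.
    static List<String> mergeJoin(List<String[]> db2Rows, List<String[]> oraRows) {
        List<String> joined = new ArrayList<>();
        int i = 0, j = 0;
        while (i < db2Rows.size() && j < oraRows.size()) {
            int cmp = db2Rows.get(i)[0].compareTo(oraRows.get(j)[0]);
            if (cmp == 0) {
                joined.add(db2Rows.get(i)[0] + ":" + db2Rows.get(i)[1]
                           + "/" + oraRows.get(j)[1]);
                i++; j++;
            } else if (cmp < 0) {
                i++;  // key only on the DB2 side: skip it
            } else {
                j++;  // key only on the Oracle side: skip it
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> db2 = Arrays.asList(
            new String[]{"1001", "legacy-a"},
            new String[]{"1002", "legacy-b"},
            new String[]{"1004", "legacy-d"});
        List<String[]> ora = Arrays.asList(
            new String[]{"1001", "ora-a"},
            new String[]{"1003", "ora-c"},
            new String[]{"1004", "ora-d"});
        System.out.println(mergeJoin(db2, ora));
    }
}
```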

If it gets to be many-to-many, then I might fall back to processing one item at a time (though you could use a hash-based lookup).
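The hash-based variant, which handles repeated keys on either side, could look something like this (again a Java sketch with illustrative record shapes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClientHashJoin {
    // Build a hash index over one result set, then probe it while
    // scanning the other; duplicate keys on both sides are fine.
    static List<String> hashJoin(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> index = new HashMap<>();
        for (String[] r : right)
            index.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);

        List<String> joined = new ArrayList<>();
        for (String[] l : left)
            for (String v : index.getOrDefault(l[0], Collections.emptyList()))
                joined.add(l[0] + ":" + l[1] + "/" + v);
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> db2 = Arrays.asList(
            new String[]{"1001", "x"}, new String[]{"1001", "y"});
        List<String[]> ora = Arrays.asList(
            new String[]{"1001", "p"}, new String[]{"1001", "q"},
            new String[]{"1002", "r"});
        System.out.println(hashJoin(db2, ora));
    }
}
```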

Basically, though, your choices are sprocs (for which you'd need to add a client layer anyway), or just doing it in the client.

tpdi
A: 

If it is possible to copy the data from the legacy database into the database you control, you can consider a data extraction job that copies the new records from the legacy DB to the Oracle DB once per day (or as often as needed). It might not be so simple if you can't identify which records have been added to the legacy database since the last load.

Then, you can do the joins in your Oracle instance.

Cătălin Pitiș
A: 

If you ask the vendors, probably the best practice would be to buy another product.

From the IBM side, there is IBM Federation Server, which can "Combine data from disparate sources such as DB2, Oracle, and SQL Server into a single virtual view." I imagine there is also one from Oracle but I'm less familiar with their products.

Michael Sharek
+1  A: 
  1. You can export the data from DB2 in flat-file format and use the flat file as an external table, or load it with SQL*Loader; this is a batch process.

  2. There is also something called heterogeneous connectivity. Here you create a database link from Oracle to DB2. This makes it possible to query your DB2 database in real time, and you can join an Oracle table with a DB2 table.

You can also use this database link in combination with materialized views.

There are different kinds of heterogeneous connectivity so read the documentation carefully.
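Sketches of both options follow. All object names are illustrative, and for option 2 the gateway/TNS configuration toward DB2 is assumed to already exist:

```sql
-- Option 1: expose a DB2 flat-file extract as an external table.
-- Assumes the extract was dumped to accounts.csv in a directory
-- readable by the Oracle server.
CREATE DIRECTORY ext_dir AS '/data/db2_extracts';

CREATE TABLE db2_accounts_ext (
  accountnum  VARCHAR2(20),
  balance     NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('accounts.csv')
);

SELECT a.*, e.balance
FROM   accounts a
JOIN   db2_accounts_ext e ON e.accountnum = a.accountnum;

-- Option 2: a database link through a heterogeneous-services gateway
-- (the TNS alias DB2GW and remote table name are made up).
CREATE DATABASE LINK db2link
  CONNECT TO "db2user" IDENTIFIED BY "secret"
  USING 'DB2GW';

-- Real-time join across the link:
SELECT a.*, l.balance
FROM   accounts a
JOIN   legacy_accounts@db2link l ON l.accountnum = a.accountnum;

-- Or snapshot the remote table locally and refresh hourly:
CREATE MATERIALIZED VIEW legacy_accounts_mv
  REFRESH COMPLETE START WITH SYSDATE NEXT SYSDATE + 1/24
AS SELECT * FROM legacy_accounts@db2link;
```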

tuinstoel
+1  A: 

Does it have to be real-time data? If so, there are products available for heterogeneous connectivity, notably DB2 Relational Connect, which is part of Federation Server. If some lag is acceptable, you can set up scripts to replicate the data to Oracle, where you can then do a native join. You will get poor performance pulling the data into a client application. If that is the only option, try creating a DB2 stored procedure to return the data, which will make the performance slightly better.

kishore