Hi

We have a requirement in which we need to query data across 2 different databases (one in SQL Server and the other in Oracle).

Here are the scenarios that need to be implemented:

  1. Query: get the data from one database and match it against values in the other
  2. Update: get the data from one database and update the corresponding records in the other

Technology we are using: ASP.NET, C#

The options that we have thought about:

  1. Staging area in one database
  2. Linked Server (we can't go with this approach as it is not allowed by an organization-wide policy)
  3. Create web services
  4. Create 2 different DALs and perform list operations in the DAL on the data from the 2 sources

I would like to know the best design strategy to deal with this kind of scenario, and what the pros and cons of that approach are.

A: 

Will the results from one of the databases be small enough to efficiently pass around?

If so, I would suggest treating the databases as two independent datasources.

If the datasets are large, then you may have to consider some form of ETL into a staging area on one of the databases. You may have issues if you need the queries to return up-to-date data from both databases, because then you will need to do real-time ETL.

PenFold
The result set will typically be in the thousands
asyou007
Will you be doing the lookups and updates one record at a time, or in batches (thousands at a time)?
hminaya
A: 

There is an article here about performing distributed transactions between Microsoft SQL Server and Oracle:

I don't know how well this works; however, if it does, it will probably be the best solution for you:

  • It will almost certainly be the fastest method of querying across multiple database servers.
  • It should also allow for true transactional support even when writing to both databases.
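
If you do go down the distributed transaction route, a minimal sketch of what it could look like from C# using System.Transactions is below. The connection strings, table/column names, and the Oracle provider are assumptions, and distributed transactions against Oracle additionally require the appropriate Oracle DTC/MTS support to be installed:

```csharp
using System.Data.SqlClient;
using System.Transactions;
using Oracle.DataAccess.Client;   // ODP.NET (assumed provider)

int orderId = 42;   // example key, purely illustrative

// Both updates commit together or roll back together via MSDTC.
using (var scope = new TransactionScope())
{
    using (var sqlConn = new SqlConnection(sqlServerConnectionString))
    using (var oraConn = new OracleConnection(oracleConnectionString))
    {
        sqlConn.Open();   // enlists in the ambient transaction
        oraConn.Open();   // a second connection promotes it to a distributed transaction

        using (var sqlCmd = new SqlCommand(
            "UPDATE dbo.Orders SET Status = @status WHERE OrderId = @id", sqlConn))
        {
            sqlCmd.Parameters.AddWithValue("@status", "SHIPPED");
            sqlCmd.Parameters.AddWithValue("@id", orderId);
            sqlCmd.ExecuteNonQuery();
        }

        using (var oraCmd = new OracleCommand(
            "UPDATE orders SET status = :status WHERE order_id = :id", oraConn))
        {
            oraCmd.Parameters.Add("status", "SHIPPED");
            oraCmd.Parameters.Add("id", orderId);
            oraCmd.ExecuteNonQuery();
        }

        scope.Complete();   // without this, both updates roll back
    }
}
```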
Kragen
A: 

The best strategy for this would be to use a Linked Server, as it is designed for querying and writing to heterogeneous databases as you described above. But obviously, due to the policy constraint you mentioned, that is not an option.

Therefore, to achieve the result you want with optimal performance, here is what I suggest (a rough sketch follows below):

  • Decide which database contains only the lookup data (the minimal dataset) that you will need to query to pull the info out
  • Insert the lookup data into a temp/dummy table in the main database (the one containing most of the data you want to retrieve and return to the caller) using bulk copy
  • Use a stored procedure or query to join the temp table with the other tables in your main database to retrieve the desired dataset

The decision whether to expose this as a web service or not isn't going to change the data retrieval process, but consideration should be given to reducing data transfer time by keeping the process as close to your DB server as possible, either on the same machine or within a LAN/high-speed connection.

Data updates will be quite straightforward: just the standard two-phase operation of pulling the data out of one database and updating the other.
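
As a rough sketch of the pull / bulk-copy / join sequence above (connection strings, table and column names are placeholders, and ODP.NET is assumed as the Oracle provider):

```csharp
using System.Data;
using System.Data.SqlClient;
using Oracle.DataAccess.Client;   // ODP.NET (assumed provider)

using (var oraConn = new OracleConnection(oracleConnectionString))
using (var sqlConn = new SqlConnection(sqlServerConnectionString))
{
    oraConn.Open();
    sqlConn.Open();

    // 1. Pull the small lookup set out of Oracle
    var oraCmd = new OracleCommand("SELECT emp_id, emp_code FROM employees", oraConn);
    using (OracleDataReader lookup = oraCmd.ExecuteReader())
    {
        // 2. Create a temp table on this SQL Server session and bulk copy into it
        new SqlCommand("CREATE TABLE #OracleLookup (EmpId INT, EmpCode VARCHAR(20))", sqlConn)
            .ExecuteNonQuery();

        using (var bulk = new SqlBulkCopy(sqlConn) { DestinationTableName = "#OracleLookup" })
        {
            bulk.WriteToServer(lookup);   // accepts any IDataReader
        }
    }

    // 3. Join the temp table against the main tables to get the final result
    var joinCmd = new SqlCommand(
        @"SELECT o.OrderId, o.Amount, l.EmpCode
          FROM dbo.Orders o
          JOIN #OracleLookup l ON l.EmpId = o.EmpId", sqlConn);
    using (SqlDataReader rows = joinCmd.ExecuteReader())
    {
        while (rows.Read())
        {
            // map rows into your result objects here
        }
    }
}
```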

Fadrian Sudaman
A: 

It's hard to tell what the best solution is, but we have a scenario that's nearly the same.

Real-time data:

For real-time data updates we use web services, since in our case the two databases belong to distinct projects. Every project offers a web service that can be used for data retrieval and data updates. This has the advantage that a project does not need to care about database structure changes in the other project as long as the web service interface does not change.

Static Data:

Static data (e.g. employees) is mirrored for faster access. For that large amount of data we use flat files for the nightly update.

In the case of static data, I think it's important to explicitly define data owners. For every piece of data it should be clear which database holds the original and which database only holds shadow copies for faster access.

So static data is read-only in the shadow database, or only updatable through designated web services.
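
As an illustration of the web-service boundary described above, a minimal WCF-style contract might look like the following (the service name, operations, and Employee type are purely hypothetical):

```csharp
using System.Runtime.Serialization;
using System.ServiceModel;

// Callers work only against this contract, so the owning project can change
// its database schema freely as long as the interface stays stable.
[ServiceContract]
public interface IEmployeeService
{
    [OperationContract]
    Employee GetEmployee(int employeeId);

    [OperationContract]
    void UpdateEmployee(Employee employee);
}

[DataContract]
public class Employee
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
    [DataMember] public string Department { get; set; }
}
```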

BitKFu
+1  A: 

Is it not possible to use an SSIS package to do the data transformation between the 2 servers and invoke it either from the ASP.NET/C# project or via a scheduled job run on demand?
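
If SSIS is allowed by the policy, invoking a package from C# is straightforward; a sketch using the SSIS runtime API is below (the package path is a placeholder, and the package itself would contain the Oracle-to-SQL Server data flow):

```csharp
using System;
using Microsoft.SqlServer.Dts.Runtime;

// Load and run a deployed package on demand (path is hypothetical)
var app = new Application();
Package package = app.LoadPackage(@"C:\ETL\SyncOracleToSqlServer.dtsx", null);

DTSExecResult result = package.Execute();
if (result != DTSExecResult.Success)
{
    foreach (DtsError error in package.Errors)
        Console.WriteLine(error.Description);
}
```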

TheITGuy
A: 

The problem with using multiple data sources in your .NET code is that you run the risk of your CRUD operations failing ACID tests and ending up with data inconsistencies.

I would be most inclined to pursue @Will A's comment on your question...

Set up replication to a remote server, then link the two remote servers.

Matthew PK
A: 

Have multiple DALs and handle it in the application. Thousands is not a big number; you only need to worry if you are into the hundreds of thousands or millions, in which case your application will hang.

Use LINQ to perform the matching and update operations on the datasets that are generated, rather than looping through them by hand.
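
For example, assuming each DAL returns a plain list (all type and member names below are made up for illustration), the match step can be expressed as a LINQ join instead of nested loops:

```csharp
using System.Collections.Generic;
using System.Linq;

// Results from the two independent DALs
List<SqlCustomer> sqlCustomers = sqlServerDal.GetCustomers();   // from SQL Server
List<OraAccount>  oraAccounts  = oracleDal.GetAccounts();       // from Oracle

// Match records from both sources in memory
var matches =
    from c in sqlCustomers
    join a in oraAccounts on c.CustomerCode equals a.CustomerCode
    select new { c.CustomerId, a.Balance };

// Push the updates back through the SQL Server DAL
foreach (var m in matches)
    sqlServerDal.UpdateBalance(m.CustomerId, m.Balance);
```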

Roopesh Shenoy