Hi guys,

We have a normalized SQL Server 2008 database designed using generic tables. So, instead of having a separate table for each entity (e.g. Products, Orders, OrderItems, etc), we have generic tables (Entities, Instances, Relationships, Attributes, etc).

We have decided to have a separate denormalized database for quick retrieval of data. Could you please advise me of various technologies out there to synchronize these 2 databases, assuming they have different schemas?

Cheers, Mosh

A: 

When two databases have such radically different schemas, you should be looking at techniques for data migration or replication, not synchronization. SQL Server provides two technologies for this, SSIS and Replication, or you can write your own scripts to do it.

Replication will take new or modified data from a source database and copy it to a target database. It provides mechanisms for scheduling, packaging and distributing changes, and it can handle both real-time and batch updates. To work, it needs to add enough information to both databases to track modifications and match rows. In your case it would be hard to identify which "Products" have changed, as you would have to identify all relevant modified rows in four or more different tables. It can be done, but it will require some effort. In any case, you would have to create views that match the target schema, as replication doesn't allow any transformation of the source data.
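
To make the "views that match the target schema" idea concrete, here is a rough sketch of a view that flattens generic tables into a Product-shaped row set. It is only an illustration: it uses Python's built-in sqlite3 module in place of SQL Server so it can run anywhere, and the table layout, the column names and the `ProductsFlat` view are my assumptions, not your actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.executescript("""
-- Hypothetical, simplified generic tables
CREATE TABLE Instances (
    InstanceId INTEGER PRIMARY KEY,
    EntityName TEXT NOT NULL
);
CREATE TABLE Attributes (
    InstanceId INTEGER NOT NULL REFERENCES Instances(InstanceId),
    Name       TEXT NOT NULL,
    Value      TEXT,
    PRIMARY KEY (InstanceId, Name)
);

-- One row per product, one column per attribute (conditional aggregation)
CREATE VIEW ProductsFlat AS
SELECT i.InstanceId AS ProductId,
       MAX(CASE WHEN a.Name = 'Name'  THEN a.Value END) AS Name,
       MAX(CASE WHEN a.Name = 'Price' THEN a.Value END) AS Price
FROM Instances i
LEFT JOIN Attributes a ON a.InstanceId = i.InstanceId
WHERE i.EntityName = 'Product'
GROUP BY i.InstanceId;
""")

conn.execute("INSERT INTO Instances VALUES (1, 'Product')")
conn.executemany(
    "INSERT INTO Attributes VALUES (?, ?, ?)",
    [(1, "Name", "Widget"), (1, "Price", "9.99")],
)

# Reading the view yields rows shaped like the denormalized Products table
rows = conn.execute("SELECT ProductId, Name, Price FROM ProductsFlat").fetchall()
print(rows)
```

The `MAX(CASE ...)` pivot is plain SQL and should carry over to T-SQL more or less unchanged, but every new attribute means another column in the view.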

SSIS will pull data from one source, transform it and push it to a target. It has no built-in mechanisms for tracking changes, so you will have to add fields to your tables to track them. It is strictly a batch process that can run according to a schedule. The main benefit is that you can perform a wide variety of transformations, while replication allows almost none (apart from drawing the data from a view). You could create dataflows that modify only the relevant Product field when a Product-related Attribute record changes, or simply reconstitute an entire Product record and overwrite the target record.
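
The batch approach can be sketched roughly as follows: a tracking field marks which rows changed since the last run, and each affected Product is rebuilt in full and written over the target record. This is not SSIS itself, just the same logic in Python with sqlite3 standing in for both databases. The `Version` column, the table layouts and the `setup`/`sync` functions are all hypothetical.

```python
import sqlite3

def setup():
    """Create a hypothetical generic source and a denormalized target."""
    src = sqlite3.connect(":memory:")
    src.execute("""
        CREATE TABLE Attributes (
            InstanceId INTEGER NOT NULL,
            Name       TEXT NOT NULL,
            Value      TEXT,
            Version    INTEGER NOT NULL,  -- assumed change-tracking field
            PRIMARY KEY (InstanceId, Name)
        )""")
    dst = sqlite3.connect(":memory:")
    dst.execute(
        "CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, Name TEXT, Price TEXT)"
    )
    return src, dst

def sync(src, dst, last_version):
    """Copy every product touched after last_version; return the new high-water mark."""
    # Fix the upper bound first so rows written during the run are left for the next one
    high = src.execute(
        "SELECT COALESCE(MAX(Version), ?) FROM Attributes", (last_version,)
    ).fetchone()[0]
    changed = [r[0] for r in src.execute(
        "SELECT DISTINCT InstanceId FROM Attributes WHERE Version > ? AND Version <= ?",
        (last_version, high),
    )]
    for product_id in changed:
        # Reconstitute the entire product record from its attribute rows
        attrs = dict(src.execute(
            "SELECT Name, Value FROM Attributes WHERE InstanceId = ?", (product_id,)
        ))
        # Overwrite the target record
        dst.execute(
            "INSERT OR REPLACE INTO Products (ProductId, Name, Price) VALUES (?, ?, ?)",
            (product_id, attrs.get("Name"), attrs.get("Price")),
        )
    dst.commit()
    return high
```

Each scheduled run would call `sync` with the value returned by the previous run, so the delay the target sees is bounded by how often the job is scheduled.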

Finally, you can create your own triggers or stored procedures that will run when the data changes and copy it from one database to the other.
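
As a rough illustration of the trigger route, the sketch below keeps a denormalized `Products` table up to date whenever an attribute row is inserted or updated. It again uses sqlite3 so that it is runnable, which means both tables live in one database; in SQL Server the trigger body would be T-SQL and would presumably reference the target through a three-part name. The table names and the `make_db` helper are assumptions for the example.

```python
import sqlite3

def make_db():
    """Build a hypothetical source table, target table and the copy triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Attributes (
        InstanceId INTEGER NOT NULL,
        Name       TEXT NOT NULL,
        Value      TEXT,
        PRIMARY KEY (InstanceId, Name)
    );
    CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, Name TEXT, Price TEXT);
    """)
    # SQLite needs one trigger per event, so the same body is created twice
    for event in ("INSERT", "UPDATE"):
        conn.executescript(f"""
        CREATE TRIGGER trg_attributes_{event.lower()} AFTER {event} ON Attributes
        BEGIN
            -- Make sure the target row exists, then patch only the changed field
            INSERT OR IGNORE INTO Products (ProductId) VALUES (NEW.InstanceId);
            UPDATE Products
            SET Name  = CASE WHEN NEW.Name = 'Name'  THEN NEW.Value ELSE Name  END,
                Price = CASE WHEN NEW.Name = 'Price' THEN NEW.Value ELSE Price END
            WHERE ProductId = NEW.InstanceId;
        END;
        """)
    return conn
```

The copy runs inside the same statement that modifies the source, so the writer pays for both updates before getting a response.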

I should also point out that you have probably over-normalized your database. In all three cases you will pay a performance penalty when you join all the tables to reconstitute an entity, resulting in more locking than is necessary and inefficient use of indexes. You are sacrificing performance and scalability for the sake of ease of change.

Perhaps you should take a look at the Sparse Column feature of SQL Server 2008 for a way to support flexible schemas while maintaining performance and scalability.

Panagiotis Kanavos
Thank you very much for your long detailed answer. Just a few thoughts: I thought about creating my own stored procedures to update the target database. However, this blocks the initial request (that was for modifying the data). So, in other words, the user has to wait till the data is updated in 2 places. I read some about nServiceBus and thought I can perhaps implement a separate denormalizer that I can call asynchronously. This way, the user gets a response quickly without having to wait for the Read database(s) to be updated. What are your thoughts about this?
Mosh
Also, as for SSIS, you mentioned it runs according to a schedule. Wouldn't that be an issue for me in this case? Imagine the user inserts a new Product but SSIS doesn't run until 2 hours later. Assuming all queries to read the data are handled by the Read database, the user may insert a new Product and not see it for 2 hours!
Mosh