views: 254 · answers: 8
We have a client that needs to set up N local databases, each one containing one site's data, and then have a master corporate database containing the union of all N databases. Changes in an individual site database need to be propagated to the master database, and changes in the master database need to be propagated to the appropriate individual site database.

We've been using MySQL replication for a client that needs two databases that are kept simultaneously up to date. That's a bidirectional replication. If we tried exactly the same approach here we would wind up with all N local databases equivalent to the master database, and that's not what we want. Not only should each individual site not be able to see data from the other sites, sending that data N times from the master instead of just once is probably a huge waste.

What are my options for accomplishing this new star pattern with MySQL? I know we can replicate only certain tables, but is there a way to filter the replication by records?

Are there any tools that would help or competing RDBMSes that would be better to look at?

+1  A: 

Short answer: no, you should redesign.

Long answer: yes, but it's pretty crazy and will be a real pain to set up and manage.

One way would be to round-robin the main database's replication among the sites. Use a script to replicate from one site for, say, 30 seconds, record how far it got, and then move on to the next site. You may wish to look at replicate-do-db and friends to limit what is replicated.
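The round-robin pass described above might look roughly like this on the central server (host names, log file, and positions are illustrative; SHOW SLAVE STATUS is how you would record how far each pass got):

```sql
-- Point the central slave at site1 for one pass.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST     = 'site1.example.com',
  MASTER_LOG_FILE = 'mysql-bin.000042',  -- resume where the last site1 pass ended
  MASTER_LOG_POS  = 107;
START SLAVE;

-- ...wait ~30 seconds, then record progress for the next pass:
SHOW SLAVE STATUS\G  -- note Relay_Master_Log_File / Exec_Master_Log_Pos
STOP SLAVE;
-- repeat with site2, site3, ...
```

Note that replicate-do-db and its relatives filter at the database (schema) level, not at the row level, so this only helps if each site's data lives in its own schema.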

Another option that I'm unsure would work is to have N MySQL instances in the main office, each replicating from one of the site offices, and then use the FEDERATED storage engine to provide a common view from the main database into the per-site slaves. The site slaves can replicate from the main database and pick up whichever changes they need.
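A sketch of the FEDERATED half of that idea; the host, credentials, and the orders table are made up for illustration:

```sql
-- On the main database: a FEDERATED table pointing at the site1 slave.
CREATE TABLE orders_site1 (
  id          INT NOT NULL,
  customer_id INT,
  amount      DECIMAL(10,2),
  PRIMARY KEY (id)
) ENGINE=FEDERATED
  CONNECTION='mysql://repl_user:secret@site1-slave.local:3306/site1_db/orders';

-- A common view across the per-site slaves:
CREATE VIEW orders_all AS
  SELECT 'site1' AS site, o.* FROM orders_site1 o
  UNION ALL
  SELECT 'site2' AS site, o.* FROM orders_site2 o;
```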

A: 

Sounds like you need some specialist assistance - and I'm probably not it.

How 'real-time' does this replication need to be? Some sort of ETL process (or processes) is possibly an option. We use MS SSIS and Oracle in-house; SSIS seems to be fairly good for ETL-type work (but I don't work at that specific coal face, so I can't really say).

How volatile is the data? Would you say the data is mostly operational / transactional? What sort of data volumes are you talking about?

Is the central master also used as a local DB for the office where it is located? If it is, you might want to change that: have head office work just like a remote office, so that you can treat all offices the same. You'll often run into problems and anomalies if different sites are treated differently.

Adrian K
+2  A: 

I've done this before, and AFAIK this is the easiest way: look into Microsoft SQL Server merge replication with row filtering. Your row filter would be set up on a column that states which individual site each row should go to.

For example, your tables might look like this:

ID_column | column2 | destination
----------|---------|------------
12345     | 'data'  | 'site1'

You would then set up your merge replication "subscriber" for site1 to filter on the 'destination' column with the value 'site1'.
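In T-SQL terms, the row filter is supplied when the table is added to the merge publication; a sketch using SQL Server's sp_addmergearticle stored procedure (the publication and table names here are made up):

```sql
-- Add the orders table to a merge publication, keeping only site1's rows.
EXEC sp_addmergearticle
    @publication         = N'SitePublication',
    @article             = N'orders',
    @source_owner        = N'dbo',
    @source_object       = N'orders',
    @subset_filterclause = N'[destination] = ''site1''';
```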

The article "Filtering Published Data for Merge Replication" will probably help. There is also an MSDN article called "Enhancing Merge Replication Performance" which may help. You will also need to learn the basics of setting up publishers and subscribers in SQL Server merge replication.

Good luck!

vdoogs
A: 

It sounds like you would be better served by stepping outside of a direct database replication structure for this.

I don't have a detailed answer for you, but this is the high level of what I would do:

I would select from each database a list of changes made during the past (reasonable) time frame, construct the INSERT and DELETE statements that would unify all of the data in the 'big' database, and then construct separate, smaller sets of INSERT and DELETE statements for each of the site-specific databases.

I would then run these.
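One pass of that process might look like this in MySQL terms, assuming every table carries a last_modified timestamp and both schemas are reachable from one connection (the table, column, and @last_sync_time variable are all hypothetical):

```sql
SET @last_sync_time = '2009-06-01 00:00:00';

-- Changes at the site since the last run, destined for the master:
SELECT id, customer_id, amount, last_modified
FROM   site_db.orders
WHERE  last_modified > @last_sync_time;

-- Apply them as upserts to the master copy:
REPLACE INTO master_db.orders (id, customer_id, amount, last_modified)
SELECT id, customer_id, amount, last_modified
FROM   site_db.orders
WHERE  last_modified > @last_sync_time;
```

Deletes are the tricky part of this sketch: a deleted row leaves nothing behind to SELECT, so you would need a tombstone table or triggers to record them.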

There is a potential for merge conflicts with this setup if there is any overlap between the data coming in and going out.

There is also the risk of data being lost or duplicated if your time frames are not constructed properly.

MikeEL
+1  A: 

If you need unidirectional replication, then use multiple copies of the databases, replicated at the center of the star, and a custom "bridge" application to move the data on to the final, combined database.

RocketSurgeon
+1  A: 

Just a random pointer: Oracle Lite supports this. I evaluated it once for a similar task; however, it needs something installed on all clients, which was not an option. A rough architecture overview can be found here.

wwerner
+1  A: 

It might be worth a look at mysql-table-sync from Maatkit, which lets you sync tables with an optional --where clause.
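An illustrative invocation, assuming Maatkit's h=host,D=database,t=table DSN syntax; the hosts, schema names, and site_id column are made up, so check the tool's documentation before running anything:

```
# Sync only site1's rows from the corporate master to the site1 database.
mysql-table-sync --execute \
    --where "site_id = 'site1'" \
    h=corp-master,D=corp,t=orders \
    h=site1-db,D=site1,t=orders
```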

rjp
There's a good chance that will do it for us. We will definitely be checking it out. Thank you!
skiphoppy
+2  A: 

SymmetricDS would work for this. It is web-enabled, database-independent data synchronization/replication software. It uses web and database technologies to replicate tables between relational databases in near real time. The software was designed to scale to a large number of databases, work across low-bandwidth connections, and withstand periods of network outage.

We have used it to synchronize 1000+ MySQL retail store databases to an Oracle corporate database.

chenson42