What's the best way of synchronizing data between decoupled systems?

+2 Q:

I have let's say 2 (but they'll become more in the future) fully decoupled systems: system A and system B.

Let's say every piece of information on each system has an informationID. Nothing stops the same informationID from appearing on different systems. What uniquely identifies a piece of information across all systems is a Source-informationID pair.

Let's say I need to export a piece of information from System A to system B. I then want to export the same piece of information from System B and re-import it into System A and I need to be able to recognize that's the same piece of information.
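
To make the round trip above concrete, here is a minimal sketch in Python. It assumes each record keeps the Source-informationID pair it was created with, and that a system looks up incoming records by that pair rather than by its own local ID. The names `SystemStore`, `export_record` and `import_record` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Origin = Tuple[str, int]  # (source system name, informationID on that system)


@dataclass
class SystemStore:
    """Hypothetical in-memory stand-in for one decoupled system."""
    name: str
    next_id: int = 1
    # local informationID -> (origin pair, payload)
    records: Dict[int, Tuple[Origin, str]] = field(default_factory=dict)
    # origin pair -> local informationID, used to recognise re-imports
    by_origin: Dict[Origin, int] = field(default_factory=dict)

    def _allocate_id(self) -> int:
        local_id = self.next_id
        self.next_id += 1
        return local_id

    def create(self, payload: str) -> int:
        """Create a record locally; its origin is this system plus the new ID."""
        local_id = self._allocate_id()
        origin = (self.name, local_id)
        self.records[local_id] = (origin, payload)
        self.by_origin[origin] = local_id
        return local_id

    def export_record(self, local_id: int) -> Tuple[Origin, str]:
        """Export the payload together with its original Source-ID pair."""
        return self.records[local_id]

    def import_record(self, origin: Origin, payload: str) -> int:
        """Update the record if the origin pair is known, otherwise insert it."""
        local_id = self.by_origin.get(origin)
        if local_id is None:
            local_id = self._allocate_id()
            self.by_origin[origin] = local_id
        self.records[local_id] = (origin, payload)
        return local_id
```

With this shape, a record created on A, imported into B, and exported from B again still carries A's pair, so A's `import_record` lands on the existing row instead of creating a duplicate.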

What's the best way of doing this in people's experience?

This is what I am thinking to do:

Set up a message bus between the systems with message queues.
Set up endpoints for each system that will monitor changes and generate commands wrapped into messages that will be pumped into queues (for example when a piece of information is created/deleted/updated).
Assign ranks to the endpoints relative to create/delete/update commands in order not to rely on system names but only on a general hierarchy, so that each system doesn't need to know about the others.
Assign a threshold on update/delete/create commands to each endpoint so that commands not meeting the threshold requirement will be filtered out and not processed.
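
The four steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: ranks and thresholds are plain integers, "meeting the threshold" is taken to mean that the sender's rank for that action is greater than or equal to the receiver's threshold, and a Python list stands in for a real message queue. `Command`, `Endpoint` and `deliver` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Command:
    action: str         # "create", "update" or "delete"
    origin_source: str  # system where the information was first created
    origin_id: int      # informationID on that system
    sender_rank: int    # rank of the sending endpoint for this action
    payload: str = ""


@dataclass
class Endpoint:
    name: str
    ranks: Dict[str, int]       # rank stamped on outgoing commands, per action
    thresholds: Dict[str, int]  # minimum sender rank accepted, per action

    def emit(self, action: str, origin_source: str, origin_id: int,
             payload: str = "") -> Command:
        """Wrap a local change into a command carrying this endpoint's rank."""
        return Command(action, origin_source, origin_id,
                       self.ranks[action], payload)

    def accepts(self, command: Command) -> bool:
        """Assumed rule: accept when the sender's rank meets our threshold."""
        return command.sender_rank >= self.thresholds[command.action]


def deliver(queue: List[Command], endpoint: Endpoint) -> List[Command]:
    """Return the commands from the queue that the endpoint would process."""
    return [command for command in queue if endpoint.accepts(command)]
```

Note that the receiver only compares numbers and never inspects the sender's name, which is the point of the hierarchy. The command still has to carry `origin_source` and `origin_id`.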

This won't solve the fact that I still need to carry around originalSource+originalSourceID though.

Any help appreciated.

+1 A:

Unless there is some specific limitation in the system design preventing this, I'd suggest factoring out the shared/sharable information into a separate DB that the other two can either reference or just replicate locally. Then you don't need the dual-element key nor any elaborate ESB contraption...

Jeff Kotula 2008-12-15 20:15:38

This is the Big-DB approach, which is an option I am looking into. Its drawback is that it could get messy soon enough.

JohnIdol 2008-12-15 20:50:32

+1 A:

This problem has been addressed by EAI (Enterprise Application Integration) vendors like Tibco and webMethods (now part of Software AG). I've never used Tibco, but I've used webMethods to solve this kind of problem, so I'll focus on webMethods.

For example, in an enterprise, data about employees could reside in both Active Directory and PeopleSoft. webMethods could be used to ensure that changes, additions and deletions in one system (application) are reflected in the other in real time. In some other organization, data about employees could also be in an Oracle or SQL Server database. Again, not a problem: EAI tools like webMethods can talk to a wide variety of back-ends.

webMethods is not limited to a single source and a single target. Because it has a publish-subscribe architecture, data from a single source can flow to multiple interested targets that subscribe to a particular piece of information. Guaranteed delivery and many other features can be found in these products. Back to the employee example: if one does it right, at any given time all systems and applications in an enterprise can contain the same information about the employees, without any discrepancy.
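
The publish-subscribe idea can be illustrated with a tiny in-memory broker. This is a generic sketch and not the webMethods API: `Broker`, the topic name and the handlers are all made up for the example, and nothing here provides guaranteed delivery.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class Broker:
    """Hypothetical in-memory broker: one source, many subscribed targets."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a target that is interested in a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Send the message to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(dict(message))  # each target gets its own copy
        return len(handlers)


# Two targets keep their own copy of employee data, keyed by employee ID.
directory: dict = {}
hr_database: dict = {}

broker = Broker()
broker.subscribe("employee.changed", lambda m: directory.update({m["id"]: m}))
broker.subscribe("employee.changed", lambda m: hr_database.update({m["id"]: m}))
broker.publish("employee.changed", {"id": 17, "name": "Ada"})
```

The publisher never names its targets. Adding a third system means adding one more `subscribe` call, with no change to the source.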

So instead of programming in C# or Java, you'll be doing webMethods programming, which is very much like a 4GL. I call it programming because there is still logic involved (loops, if-then-else, branches, variables, packages, etc.), but it's very procedure-oriented, i.e. there is no concept of OOP at all.

These EAI tools are built with limited purposes in mind and one of the purposes is to synchronize data between disparate systems in an enterprise easily. And they do their job very well.

The drawback is these tools cost a lot of money. Companies often have a long-term strategy before investing in these tools.

Khnle 2008-12-15 20:42:13

+1 A:

We're doing pretty much exactly the sort of A -> B -> A thing that you describe. We initially considered trying to have all the A,B,C etcs being peers, but that was too hard, so we now designate one as the master, and the others the slaves. It's still easy enough to get stuff from one slave to another, but via the master.

It's all done over web services - datasets go up and down from slave to master and vice versa, and the slave runs the export on itself, and calls the import on the master. It then tells the master to do an export, and runs the import on itself.

So the code is identical on each system. It's only the slaves that call home.
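
A rough sketch of that exchange, with plain method calls standing in for the web services. The `Node` class, the per-row version counter and the "higher version wins" rule are assumptions added for the example; the answer does not say how conflicts are resolved.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (source, id) pair identifying a piece of information
Row = Tuple[int, str]  # (version, payload); the version counter is an assumption


class Node:
    """The same class runs on the master and on every slave."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[Key, Row] = {}

    def save(self, key: Key, payload: str) -> None:
        """Create or update a row locally, bumping its version."""
        version = self.rows.get(key, (0, ""))[0] + 1
        self.rows[key] = (version, payload)

    def export_rows(self) -> Dict[Key, Row]:
        """Export a snapshot of everything this node holds."""
        return dict(self.rows)

    def import_rows(self, incoming: Dict[Key, Row]) -> None:
        """Keep an incoming row only if it is new or has a higher version."""
        for key, row in incoming.items():
            current = self.rows.get(key)
            if current is None or row[0] > current[0]:
                self.rows[key] = row


def sync(slave: Node, master: Node) -> None:
    """The slave calls home: push its export up, then pull the master's export."""
    master.import_rows(slave.export_rows())
    slave.import_rows(master.export_rows())
```

Data travels from one slave to another in two hops: the first slave syncs with the master, then the second slave syncs and pulls the same rows down.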

The export and import processes tell the relevant business objects to do all their listing and saving stuff, since they already know how to instantiate and persist themselves from DataRows.

It's not a many-tens-of-transactions-per-second architecture, but it works, and can achieve nearly real time synchronisation.

We haven't improved on the Source/Id uniqueness, by the way :)

ChrisA 2008-12-15 21:00:34

sounds like a good option - one of my main worries is the Source-Id uniqueness though!

JohnIdol 2008-12-15 21:11:55

+1 A:

As somebody already wrote, this sounds like a typical EAI problem. Even if EAI tools used to be expensive, there is now a wide choice of free, open-source tools. Below is a list of the ones I like most.

My favorite is OpenESB. I know it best, and it has a full IDE (NetBeans), optional support from a big vendor and a huge number of additional components. For its simplicity and effectiveness I also love Apache Camel, but you can try some of these and decide which one works better for you. You can also buy support services for any of them.

Maurizio 2009-01-31 02:38:04

+1 A:

This is hugely simplified if you assign each piece of information a GUID. If you need to keep track of source and other IDs, that's fine, but the information should always travel with its assigned GUID.

When a machine sees that piece of information again, it'll see the GUID and associate it with the existing data, and then you can decide what to do. But you already know it's the same data piece - just better traveled.

Keep in mind that GUIDs are created in such a manner that each machine will create its own and they won't conflict (for all practical intents and purposes) with the GUIDs created on another machine, or the same machine at a different time.

This is one of the bigger reasons GUIDs were created.
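
As a minimal sketch of the idea, using Python's standard `uuid` module: the record layout and the helper names `new_record` and `receive` are assumptions made for the example.

```python
import uuid
from typing import Dict


def new_record(source: str, source_id: int, payload: str) -> dict:
    """Create a record and assign it a random (version 4) GUID once, at birth."""
    return {
        "guid": str(uuid.uuid4()),
        "source": source,        # still kept, but no longer needed for identity
        "source_id": source_id,
        "payload": payload,
    }


def receive(store: Dict[str, dict], record: dict) -> bool:
    """Store the record by GUID; return True if this GUID was already known."""
    already_known = record["guid"] in store
    store[record["guid"]] = dict(record)
    return already_known


system_a: Dict[str, dict] = {}
system_b: Dict[str, dict] = {}

record = new_record("A", 42, "hello")
receive(system_a, record)               # created on A
receive(system_b, record)               # exported to B
came_back = receive(system_a, record)   # re-imported into A: GUID is recognised
```

What to do when `receive` reports a known GUID (overwrite, merge, ignore) is a separate decision. The sketch simply overwrites.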

Adam Davis 2009-01-31 02:50:47

sounds like my GUID could be source + sourceID

JohnIdol 2009-01-31 15:46:25
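
If the GUID is meant to be derived from source + sourceID, as the comment above suggests, one option is a name-based (version 5) UUID, which gives the same GUID on any machine for the same pair. This is a sketch under assumptions: the namespace below is built from a made-up domain name, and the `source:source_id` string format is just one possible convention.

```python
import uuid

# Hypothetical namespace; every system must use the same one.
SYNC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "sync.example.invalid")


def guid_for(source: str, source_id: int) -> uuid.UUID:
    """Derive a deterministic GUID from the original Source-ID pair."""
    return uuid.uuid5(SYNC_NAMESPACE, f"{source}:{source_id}")
```

Unlike a random GUID, this one does not need to be stored to be reproduced, but it is only as unique as the source names are.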
