Short version:

We have multiple teams each developing multiple apps. They need to share some data. Should we combine these apps into one larger one to simplify data integration or should we keep them separate and utilize some data exchange/caching mechanism?

Longer version:

We have a number of teams each working on a set of applications. Many of these applications need to share data. One option is to use asynchronous messaging to have one system of record - where all writes occur - and to broadcast that data out to any other systems that need it. These systems would store the bits of data they need in a read-only cache (in their database).
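To make the first option concrete, here is a minimal in-memory sketch, not a definitive implementation. It assumes a hypothetical customer record; the names `CustomerChanged`, `Topic`, `CustomerSystemOfRecord` and `CustomerReadCache` are invented for illustration, and `Topic` is only a stand-in for a real broker (for example a durable JMS topic), which would also deliver messages to consumers that were down at publish time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical change message; the fields are illustrative only.
final class CustomerChanged {
    final long id;
    final String name;

    CustomerChanged(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// In-memory stand-in for a broker topic (assumption: a real system
// would use durable, transactional messaging instead).
final class Topic<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(T message) {
        subscribers.forEach(s -> s.accept(message));
    }
}

// The system of record: the only place where writes happen.
final class CustomerSystemOfRecord {
    private final Map<Long, String> store = new ConcurrentHashMap<>();
    private final Topic<CustomerChanged> topic;

    CustomerSystemOfRecord(Topic<CustomerChanged> topic) {
        this.topic = topic;
    }

    void rename(long id, String name) {
        store.put(id, name);                          // write locally
        topic.publish(new CustomerChanged(id, name)); // then broadcast
    }
}

// A consuming app's local copy: it exposes reads only and is
// updated solely by incoming messages.
final class CustomerReadCache {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    CustomerReadCache(Topic<CustomerChanged> topic) {
        topic.subscribe(change -> cache.put(change.id, change.name));
    }

    Optional<String> nameOf(long id) {
        return Optional.ofNullable(cache.get(id));
    }
}
```

The point of the sketch is that `CustomerReadCache` has no public write method, so the consuming app cannot drift from the system of record except by lagging behind it.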

The benefit of this layout is that one system can blow up without affecting the other systems. It also makes it easier for individual teams to work on their individual applications. It makes scheduling of releases easier, smaller code base to navigate, etc.

Another option is to decide that these apps share way too much data, and that the overhead from the messaging/caching is too high. In this case you could decide to merge these three applications into one larger app. You would then completely eliminate the data integration problem, since you'd move the integration into the service/transactional layer of the app's individual modules. In other words, MyGiantApp could still be split (jars, app contexts, etc.) into various modules that talk to each other via each other's transactional service APIs. In our case we'd be using Spring almost like a service bus, with method invocation instead of web services or async messaging.
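A rough sketch of what that module boundary could look like in plain Java follows. The module names and the `CustomerService`, `InMemoryCustomerService` and `InvoiceService` types are hypothetical; in the setup described above, Spring would presumably do the wiring and wrap the service methods in transactions, which this sketch leaves out to stay self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Public API of a hypothetical "customer" module. Other modules
// compile against this interface only, never the implementation.
interface CustomerService {
    String findName(long customerId);

    void rename(long customerId, String newName);
}

// Implementation detail of the customer module. It owns the data
// model, so every read and write goes through the interface above.
final class InMemoryCustomerService implements CustomerService {
    private final Map<Long, String> names = new HashMap<>();

    @Override
    public String findName(long customerId) {
        String name = names.get(customerId);
        if (name == null) {
            throw new IllegalArgumentException("Unknown customer " + customerId);
        }
        return name;
    }

    @Override
    public void rename(long customerId, String newName) {
        names.put(customerId, newName);
    }
}

// A hypothetical "billing" module. Integration is an ordinary
// in-process method call: no messaging and no local cache.
final class InvoiceService {
    private final CustomerService customers;

    InvoiceService(CustomerService customers) {
        this.customers = customers;
    }

    String invoiceHeader(long customerId) {
        return "Invoice for " + customers.findName(customerId);
    }
}
```

The trade-off is visible in the constructor of `InvoiceService`: the billing code always sees current data, but both modules now live and fail in the same process.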

While this second option simplifies data integration, it complicates development. Now X teams have to work on the same code base. This can be eased somewhat by using branches, continuous integration, separate libraries/contexts, etc., but at the end of the day it's still one deployable artifact we're all building. Also, the mistakes of one team can now spread more easily to the entire app; one module blowing out the heap could take them all down.

How would you decide when to use solution #1 and when to use solution #2?

A: 

Have you thought about concurrency, transactions, ...

I don't know your requirements but it feels like it's time to tackle this problem the right way and use a central "repository" to manage your shared data.

Gerrie Schenck
Transactions: JTA would be used for JMS messages. Concurrency is less of a problem in our case because we wouldn't have concurrent writes; there is one system of record responsible for managing writes. Data would then be broadcast via a simple Producer/Consumer model. Consumers wouldn't be exchanging that data with each other, etc. I don't understand your last comment, because in both scenarios there _is_ a central repository for managing the shared data. The question is how that data is accessed.
rcampbell
+1  A: 

Hi - not sure if I'm covering the exact points you want answered - but it's a start; ask for clarification if you want, and I'll update my answer accordingly.

In terms of data management you might want to start thinking along Master Data Management lines. The first thought that leaps to mind is that you don't want inconsistent Reference Data; this would suggest you use a single shared instance of the data - alternatively you could have multiple copies, assuming the management and custodial processes around that data are crystal clear.

In terms of integration, reuse depends on the Non-Functional Requirements of the various systems. A single shared service makes it easy to manage the data (in that there's only a single copy to worry about) but it means that it can become a bottleneck; furthermore, the "weakest link in the chain" rule applies - what's the impact if the shared service goes down?

Possible Solution Options

It's hard to know what to recommend without more info, but my thoughts are:

  • Keep your three apps separate.
  • Use a single shared service for shared data.
  • Access to the shared data can be via a "service" - this would give you the most control and options.
  • The shared service might exist as a single instance - or - you might have multiple instances of the service deployed (but only one instance of the DB).

The key point is that you should be able to abstract out the shared service without affecting the other apps - specifically, you shouldn't be thinking of making an uber app.

One more thing - thinking of there being one "service" in a business sense doesn't preclude having multiple ways of technically exposing and consuming it.

Adrian K
Hi Adrian, thank you for the thoughtful answer. You are correct in thinking that we wish to avoid inconsistent reference data. In the "one giant app" model this is ensured by having one module control all access to a data model, and all read/writes going through that module's API. In the distributed messaging model, this is ensured by making messages transactional and ensuring that local caches of the data are read-only. Any writes must be done through the "owner"/system of record app.
rcampbell
Your answer seems to suggest the distributed app model, but instead of implementing it w/async messaging + local data caches, to implement it with sync services (RMI, web services, etc). This synchronous model has the benefit of avoiding the need for local data caches, but as you correctly point out it's more brittle. The purpose of async messaging was that when the system of record app goes down, it doesn't take down all of the consumers of that data as well since they have their own local cache. Local caches also solve the performance problem of large data models being sent over the wire.
rcampbell
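The brittleness difference described in that comment can be illustrated with a small simulation. Everything here is hypothetical: `CustomerLookup`, `SynchronousLookup` and `CachedLookup` are invented names, and the "remote call" is just a flag that mimics the system of record being down, not real RMI or web service code.

```java
import java.util.HashMap;
import java.util.Map;

// What a consuming app needs, regardless of how the data arrives.
interface CustomerLookup {
    String nameOf(long id);
}

// Synchronous model: every read is a call to the system of record.
// The ownerUp flag simulates that system being reachable or not.
final class SynchronousLookup implements CustomerLookup {
    private final Map<Long, String> ownerData = new HashMap<>();
    private boolean ownerUp = true;

    void ownerStores(long id, String name) {
        ownerData.put(id, name);
    }

    void setOwnerUp(boolean up) {
        ownerUp = up;
    }

    @Override
    public String nameOf(long id) {
        if (!ownerUp) {
            throw new IllegalStateException("system of record is unavailable");
        }
        return ownerData.get(id);
    }
}

// Async model: reads hit a local copy that is filled by messages,
// so they keep working while the system of record is down.
final class CachedLookup implements CustomerLookup {
    private final Map<Long, String> localCopy = new HashMap<>();

    // Called by the (omitted) message listener for each change.
    void onMessage(long id, String name) {
        localCopy.put(id, name);
    }

    @Override
    public String nameOf(long id) {
        return localCopy.get(id);
    }
}
```

With the owner marked as down, `SynchronousLookup.nameOf` throws while `CachedLookup.nameOf` still answers from its local copy, at the cost of possibly serving stale data until messaging resumes.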
+1  A: 

I've put this in a separate answer as its focus is quite different from my first one.

It sounds like you've thought things through and have a good grasp of both your problem and solution. Let me have another crack at (what I think is) your key question...

How would you decide...

Go back to basics: start with what you know to be true.
Have a whiteboard workshop with relevant people from your team(s), in which you need to:

  • Draw up a list of the relevant constraints.
  • List outcomes you don't want (single big unmanageable application, etc).
  • Also recommended but perhaps not essential: Identify risks and potential impacts (as part of the constraints or undesirable outcomes).
  • List the outcomes you want (sharing data, easy deployment, etc).
  • Prioritise the desirable and undesirable outcomes (very important), and don't be surprised if they change during the course of the debate.

This information should be enough to start establishing a framework for discussion and decision making.
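One possible way to record the result of such a workshop is a simple weighted scoring of each option against the prioritised outcomes. The sketch below is only an illustration of that idea: the `DecisionMatrix` and `Option` classes are hypothetical, and any weights or ratings you feed in are your team's judgement calls, not measurements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// One candidate architecture with a rating per outcome
// (for example 1 = poor, 5 = good).
final class Option {
    final String name;
    private final Map<String, Integer> ratings = new LinkedHashMap<>();

    Option(String name) {
        this.name = name;
    }

    Option rate(String outcome, int rating) {
        ratings.put(outcome, rating);
        return this;
    }

    int ratingFor(String outcome) {
        // Outcomes nobody rated count as 0.
        return ratings.getOrDefault(outcome, 0);
    }
}

// The prioritised outcomes: a higher weight means more important.
final class DecisionMatrix {
    private final Map<String, Integer> weights = new LinkedHashMap<>();

    DecisionMatrix prioritise(String outcome, int weight) {
        weights.put(outcome, weight);
        return this;
    }

    // Sum of weight * rating over all prioritised outcomes.
    int score(Option option) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            total += entry.getValue() * option.ratingFor(entry.getKey());
        }
        return total;
    }
}
```

The totals matter less than the discussion needed to agree on the weights; if priorities shift during the debate, changing a weight and re-scoring shows quickly whether the preferred option changes.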

There are also some extra things you can do ahead of this workshop (or as part of it, depending on the political and social dynamics of your situation).

  • Identify the goals of the system - both in a "business" sense and an "architectural" sense. (hint: the two should align - or at least not conflict).
  • If there's a well-defined Vision for the system or your business, that should help (but bear in mind that these don't always exist, and good ones are even rarer).
  • Refer to business / strategic plans for the systems you're working on - and consider the market, trends, etc. What do you think is likely to happen to the applications in the future? Thinking ahead architecturally isn't the same as YAGNI at a code level, and careful consideration of the future may influence decisions you're making now.
Adrian K