views: 2984
answers: 10

I've been working on this for a few days now, and I've found several solutions, but none of them is incredibly simple or lightweight. The problem is basically this: we have a cluster of 10 machines, each of which is running the same software on a multithreaded ESB platform. I can deal with concurrency issues between threads on the same machine fairly easily, but what about concurrency on the same data across different machines?

Essentially the software receives requests to feed a customer's data from one business to another via web services. However, the customer may or may not exist yet on the other system. If it does not, we create it via a web service method. So it requires a sort of test-and-set, but I need a semaphore of some sort to lock out the other machines from causing race conditions. I've had situations before where a remote customer was created twice for a single local customer, which isn't really desirable.

Solutions I've toyed with conceptually are:

  1. Using our fault-tolerant shared file system to create "lock" files, which each machine checks for based on the customer

  2. Using a special table in our database, and locking the whole table in order to do a "test-and-set" for a lock record.

  3. Using Terracotta, open-source server software that assists in scaling but uses a hub-and-spoke model.

  4. Using EHCache for synchronous replication of my in-memory "locks."

I can't imagine that I'm the only person who's ever had this kind of problem. How did you solve it? Did you cook something up in-house or do you have a favorite 3rd-party product?

A: 

Back in the day, we'd use a specific "lock server" on the network to handle this. Bleh.

Your database server might have resources specifically for doing this kind of thing. MS-SQL Server has application locks usable through the sp_getapplock/sp_releaseapplock procedures.
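Roughly, calling those from Java over JDBC might look like the following; the connection string and the per-customer lock-name scheme are just placeholders, and the parameters are the ones documented for sp_getapplock/sp_releaseapplock.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class AppLockExample {
    public static void main(String[] args) throws Exception {
        // Connection details are hypothetical -- adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://dbhost;databaseName=esb", "user", "password")) {
            String resource = "customer:" + args[0];  // one lock name per customer

            // Acquire an exclusive, session-owned application lock, waiting up to 5 seconds.
            CallableStatement acquire = conn.prepareCall("{? = call sp_getapplock(?, ?, ?, ?)}");
            acquire.registerOutParameter(1, Types.INTEGER);
            acquire.setString(2, resource);     // @Resource
            acquire.setString(3, "Exclusive");  // @LockMode
            acquire.setString(4, "Session");    // @LockOwner
            acquire.setInt(5, 5000);            // @LockTimeout in milliseconds
            acquire.execute();
            if (acquire.getInt(1) < 0) {
                throw new IllegalStateException("Could not acquire lock on " + resource);
            }
            try {
                // test-and-set: check whether the remote customer exists, create it if not
            } finally {
                CallableStatement release = conn.prepareCall("{? = call sp_releaseapplock(?, ?)}");
                release.registerOutParameter(1, Types.INTEGER);
                release.setString(2, resource);   // @Resource
                release.setString(3, "Session");  // @LockOwner
                release.execute();
            }
        }
    }
}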

clintp
A: 

I made a simple RMI service with two methods: lock and release. Both methods take a key (my data model used UUIDs as primary keys, so those were also the locking keys).
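The remote interface was something along these lines (the names here are illustrative, not my actual code):

import java.rmi.Remote;
import java.rmi.RemoteException;

// Centralized lock service; the keys are the same UUIDs used as primary keys.
public interface LockService extends Remote {
    // Blocks until the lock for the given key is held by the caller.
    void lock(String key) throws RemoteException;

    // Releases a previously acquired lock.
    void release(String key) throws RemoteException;
}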

RMI is a good solution for this because it's centralized. You can't do this with EJBs (especially in a cluster, as you don't know on which machine your call will land). Plus, it's easy.

It worked for me.

entzik
+1  A: 

I have done a lot of work with Coherence, which allowed several approaches to implementing a distributed lock. The naive approach was to request to lock the same logical object on all participating nodes. In Coherence terms this was locking a key on a Replicated Cache. This approach doesn't scale that well because the network traffic increases linearly as you add nodes. A smarter way was to use a Distributed Cache, where each node in the cluster is naturally responsible for a portion of the key space, so locking a key in such a cache always involved communication with at most one node. You could roll your own approach based on this idea, or better still, get Coherence. It really is the scalability toolkit of your dreams.
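To make the Distributed Cache idea concrete, a sketch along these lines (the cache name and key are placeholders; lock/unlock are the standard Coherence ConcurrentMap calls on a NamedCache):

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class CoherenceLockExample {
    public static void main(String[] args) {
        // In a distributed (partitioned) cache each key is owned by exactly one node,
        // so locking a key only involves talking to that node.
        NamedCache locks = CacheFactory.getCache("customer-locks");

        Object key = "customer:12345";  // placeholder lock key
        locks.lock(key, -1);            // -1 means wait indefinitely for the lock
        try {
            // cluster-wide critical section: test for the remote customer, create if missing
        } finally {
            locks.unlock(key);
        }
    }
}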

I would add that any half-decent, multi-node, network-based locking mechanism has to be reasonably sophisticated to act correctly in the event of any network failure.

Craig
+1  A: 

Not sure if I understand the entire context, but it sounds like you have a single database backing this? Why not make use of the database's locking: if creating the customer is a single INSERT, then that statement alone can serve as a lock, since the database will reject a second INSERT that would violate one of your constraints (e.g. a unique constraint on the customer name).

If the "inserting of a customer" operation is not atomic and is a batch of statements then I would introduce (or use) an initial INSERT that creates some simple basic record identifying your customer (with the necessary UNIQUEness constraints) and then do all the other inserts/updates in the same transaction. Again the database will take care of consistency and any concurrent modifications will result in one of them failing.

Boris Terzic
I apologize, I know my question wasn't the clearest in the world. The ESB is interacting via services, so we have no control over constraints. I have to ask via services whether their customer exists, and then make a separate request to create one if it does not.
firebird84
Can you update the question?
Boris Terzic
+10  A: 

You might want to consider using Hazelcast distributed locks. Super light and easy.

java.util.concurrent.locks.Lock lock = Hazelcast.getLock("mymonitor");
lock.lock();
try {
    // do your stuff
} finally {
    lock.unlock();
}

Regards,

-talip

Hazelcast - Distributed Queue, Map, Set, List, Lock

That is EXACTLY the kind of semantics I was attempting to use with my (failed) home-grown solution. I'm curious why I wasn't able to find it on my own, but thanks for the tip!
firebird84
+4  A: 

Terracotta is closer to a "tiered" model - all client applications talk to a Terracotta Server Array (and more importantly for scale they don't talk to one another). The Terracotta Server Array is capable of being clustered for both scale and availability (mirrored, for availability, and striped, for scale).

In any case as you probably know Terracotta gives you the ability to express concurrency across the cluster the same way you do in a single JVM by using POJO synchronized/wait/notify or by using any of the java.util.concurrent primitives such as ReentrantReadWriteLock, CyclicBarrier, AtomicLong, FutureTask and so on.

There are a lot of simple recipes demonstrating the use of these primitives in the Terracotta Cookbook.

As an example, I will post the ReentrantReadWriteLock example (note there is no "Terracotta" version of the lock - you just use normal Java ReentrantReadWriteLock)

import java.util.concurrent.locks.*;

public class Main
{
    public static final Main instance = new Main();
    private int counter = 0;
    private ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(true);

    public void read()
    {
        while (true) {
            rwl.readLock().lock();
            try {
                System.out.println("Counter is " + counter);
            } finally {
                rwl.readLock().unlock();
            }
            try { Thread.currentThread().sleep(1000); } catch (InterruptedException ie) {  }
        }
    }

    public void write()
    {
        while (true) {
            rwl.writeLock().lock();
            try {
                counter++;
                System.out.println("Incrementing counter.  Counter is " + counter);
            } finally {
                rwl.writeLock().unlock();
            }
            try { Thread.currentThread().sleep(3000); } catch (InterruptedException ie) {  }
        }
    }

    public static void main(String[] args)
    {
        if (args.length > 0)  {
            // args --> Writer
            instance.write();
        } else {
            // no args --> Reader
            instance.read();
        }
    }
}
Taylor Gautier
You should use `Thread.sleep(...)` instead of `Thread.currentThread().sleep(...)` because it's a static method; your code might tempt you to do `someOtherThread.sleep(...)` which does not sleep `someOtherThread`.
newacct
+3  A: 

We use Terracotta, so I would like to vote for that.

I've been following Hazelcast and it looks like another promising technology, but I can't vote for it since I've not used it, and knowing that it uses a P2P-based system at its heart, I really would not trust it for large-scale needs.

But I have also heard of ZooKeeper, which came out of Yahoo and is moving under the Hadoop umbrella. If you're adventurous about trying out new technology, this one really has a lot of promise, since it's very lean and mean, focusing on just coordination. I like the vision and promise, though it might still be too green.

fern
+1  A: 

I was going to advise using memcached as a very fast, distributed RAM store for keeping locks, but it seems that EHCache is a similar, more Java-centric project.

Either one is the way to go, as long as you're sure to use atomic updates (memcached supports them; I don't know about EHCache). It's by far the most scalable solution.
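For the memcached route, the atomic add operation can double as a lock; a rough sketch with the spymemcached client (host, key naming, and TTL are placeholders):

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MemcachedLockExample {
    public static void main(String[] args) throws Exception {
        MemcachedClient client =
                new MemcachedClient(new InetSocketAddress("memcached-host", 11211));

        String lockKey = "lock:customer:12345";  // placeholder key scheme
        // add() only succeeds if the key does not exist yet, so it acts as an atomic test-and-set.
        boolean acquired = client.add(lockKey, 30, "locked").get();  // 30s expiry as a safety net
        if (acquired) {
            try {
                // create the remote customer
            } finally {
                client.delete(lockKey);
            }
        } else {
            // another node holds the lock: retry, wait, or skip
        }
        client.shutdown();
    }
}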

As a related datapoint, Google uses 'Chubby', a fast, RAM-based distributed lock service, as the root of several systems, among them BigTable.

Javier
A: 

If you can set up your load balancing so that requests for a single customer always get mapped to the same server then you can handle this via local synchronization. For example, take your customer ID mod 10 to find which of the 10 nodes to use.

Even if you don't want to do this in the general case your nodes could proxy to each other for this specific type of request.

Assuming your users are uniform enough (i.e. you have a ton of them) that you don't expect hot spots to pop up where one node gets overloaded, this should still scale pretty well.
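A sketch of the idea (the node names and the modulo scheme are just placeholders):

public class CustomerRouter {
    private final String[] nodes = {
        "esb01", "esb02", "esb03", "esb04", "esb05",
        "esb06", "esb07", "esb08", "esb09", "esb10"
    };

    // Every request for a given customer lands on the same node,
    // so ordinary local synchronization on that node is sufficient.
    public String nodeFor(long customerId) {
        return nodes[(int) (customerId % nodes.length)];
    }
}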

Jonathan
A: 

We have been developing an open-source distributed synchronization framework; currently DistributedReentrantLock and DistributedReentrantReadWriteLock have been implemented, but they are still in the testing and refactoring phase. In our architecture, lock keys are divided into buckets and each node is responsible for a certain number of buckets. So effectively, a successful lock request involves only one network request. We also use the AbstractQueuedSynchronizer class as the local lock state, so all failed lock requests are handled locally, which drastically reduces network traffic. We use JGroups (http://jgroups.org) for group communication and Hessian for serialization.

For details, please check out http://code.google.com/p/vitrit/.

Please send me your valuable feedback.

Kamran