views:

322

answers:

9

What would be a simplest algorithm one or more people could use to decide who of them should perform some task? There is one task, which needs to be done only once, and one or more people. People can speak, that is, send messages one to another. Communication must be minimal, and all people use the exact same algorithm.

One person saying "I'm doing it" is not good enough since two persons may say it at a same time.

Simplest that comes to my mind is that each person says a number and waits a bit. If somebody responds in that time, the person with lower number "wins" and does the task. If nobody responds, person says that she's doing it and does it. When she says that she does it, everybody else backs off. This should be enough to avoid two persons doing the task in the same time (since there is wait/handhake period), but might need a "second round" if both persons say the same number.

Is there something simpler?

For those curious, I'm trying to synchronize several copies of SecondLife LSL script to do something only once.

A: 

Make everyone stand in line. Next person in line gets the job.

jm
I actually like this one, it really is the simplest way (as I wrote in comment to nik). Creating a single line is a problem though.
Domchi
+1  A: 

Most languages have an atomic increment command. Initialize a variable to 0. Everyone increments the variable by 1. Then the thread can use it's personal return value of the increment to find which one it is. So if you have just one action to perform, perhaps number 0 does it.

The whole ordering is still usable for more complex actions like having half of your threads do one thing and half another, etc.

Edit: you say "people" but I assume you mean threads, since your last sentence says this is done by scripts.

SPWorley
I say people and mean clients with no shared memory. :)
Domchi
+2  A: 

This really depends on what communication primitives one has. If there's shared memory, a compare-and-swap is a nice, easy way - everyone just tries to replace 0 with 1, and whoever succeeds does the task.

If all you have is message-sending, you may need to implement a Paxos protocol or something along those lines. In this case, be very careful that you can prove correctness for your protocol, as it's more subtle than it looks!

Edit: Since you're saying you're using LSL, why not just have them query an external server using LlHTTPRequest to arbitrate? If you're worried about where to host the external server, you could use App Engine or something easily enough.

bdonlan
No shared memory. It's more like P2P, with clients which can exchange messages. I'm trying to avoid llHTTPRequest since I don't want external dependancy and complexity (especially error handling) it brings. I'll look into Paxos, thanks.
Domchi
+1  A: 

Ok this is a long shot. Maybe, it will help formulate some method.

Assumptions.

  • you have a way to communicate between the machines (the LSL instances)
  • you have a point of task-generation that can publish the task as a to-be-done across all instances

Election.

  • If you can create a list of some sort available to all instances
  • The task-generator can create a list instance (or enter requirement entry in the list)
  • Other instances detect the list (or the new entry in it)
  • There is a time out for the requirement within which instances wanting to pick it up have to respond
  • the instances can put their id on the list to indicate their interest in completing the task
  • after the timeout, the last one to answer is the one selected (or, depending on your dynamics, you can choose the first one on the list); i am assuming that the generator posts an acceptance for theat instance at this time
  • If all the instances can see the list the right one knows when the election is complete

The reaction time of individual instances and their availability should do your job

nik
No point of task generation, that is, clients poll server for tasks, but server serves simple web page and is not smart enough to assign tasks - clients have to negotiate who does it. If I had intelligent task generator, it would be simple enough - the first client who calls in gets the job. I could have first client that becomes aware of the task "reserve" a task, but there is a delay in client communication so I need some way to resolve conflicts if more than one client does that at the same time.
Domchi
Ok, in that case, I guess the server does have the ability to accept task reservations from clients. I am still thinking about how this can be used more efficiently. Meanwhile, do you think using a timeout to resolve conflicts would help? The conflict resolution delay cannot be bad enough to worry about introducing latencies in task start off.
nik
I hope that the conflicts can be easily resolved and the problem is limited to the effort in coding this and the short latency seen when this is invoked -- and, maybe the conflicts will not be so often.
nik
"Server" is maybe much too strong word. "Server" is basically a HTML file with tasklist which clients read from time to time. Clients can communicate though (send messages to each other and to all of them), and can use any kind of agreement to allocate work. I could write an intelligent server, but that would mean that I have to make it reliable/redundant as well, and I'm trying to avoid that. One of the clients can become "server", but choosing which one is essentialy the same problem as which one does the task... at least I think so.
Domchi
Or maybe not - when instance of client is first run, it could query if there is a server. If nobody replies, it becomes a server. So the "oldest" one is the decision-maker...
Domchi
And, periodically all the other clients recheck the servers' existence. If the server does not respond, a re-election is done for another one. This recheck could also be a pushed broadcast from the current server -- when the clients timeout they restart election.
nik
But, if your network breaks such that all clients can see your HTML page but the client-server communications are severed along a line, you could land up with two servers across this line. Just a theoretical case to remember.
nik
Please note that my comments use the word 'election' in the context of server election where as in the answer text I am talking about election of the client for completing a task.
nik
A: 

OK, since these are real people in a room the answer becomes easy, and it's a common solution.

Rock paper scissors.

Everyone pairs up and plays RPS. To make it easier, allow 2 and 3 players per group (even though this makes the odds not completely even). After every round, the winners pair up with winners and repeat until you have just one winner.

This actually scales decently well! I was once at a goofy team-building icebreaker and this worked for a big conference room with over 500 people in it within 5 minutes.

SPWorley
+1  A: 

A raffle.

Everyone gets a number.. it may be a seat number, it may be a ticket stub number.

Put the numbers in a hat, pull out one and have them stand up.

This also scales, even to huge numbers, and is faster. The weakness is that it does need the number-assignment prestep.

SPWorley
A: 

In the end, I came up with a solution that works pretty well. I started thinking - why not try to minimize collisions (that is, try to have only one volunteer) instead of negotiating who does the task?

The key principles are:

  • if the volunteer is the only volunteer, he's the one who does it
  • if two (or more) people want to do the task at the same time, make both wait for a random period to minimize collisions

The algorithm is:

  1. if you hear "Finished!" at any time, even while waiting, restart
  2. wait for random amount of time (between 30m and 1h 30m)
  3. wait for constant predefined amount of time (one cycle, 24h)
  4. if there's a task to be done
    1. say "I'm alive!"
    2. wait 5 seconds (assuming that task is always done in less than 5 seconds!)
    3. if you hear "I'm alive!" during that time
      1. repeat "I'm alive!"
      2. goto 2
    4. else (if you don't hear anything)
      1. do the task
      2. say "Finished!"
  5. restart

That basically means that there is a grammar of two possible signals/messages ("I'm alive!" and "Finished!").

Say n is number of clients/people. In ideal conditions, that also means 2 messages per cycle when no collisions for any n. When there are collisions, this means +2 messages per cycle per collision. There are low chances for it, but in the worst case scenario there are n messages plus the task doesn't get done in this cycle. In the worst case scenario where task does get done, there are n+1 messages.

Possible simplifications

Since I can afford to skip a cycle (task doesn't get done in this cycle), but I can't afford to have another task done before next cycle, I make everyone wait for one cycle in beginning. If you don't have "one task per cycle" requirement, you can skip step 3. There is still no guarantee that the task will be done in one cycle, but with large values in step 2, small values in step 4.2, and small number of clients/people we'd have pretty good chances.

Algorithm could work even with only one signal/message. We could skip step 1 and say "I'm alive!" in 4.4.2 as well, but only if 4.4.1 would make check in step 4 fail immediately.

Domchi
A: 

This is LSL prototype implementation (public domain, though you'll probably need to adapt it for your use):

// user configuration parameters
integer CHANNEL = -635;

// ------------------------------------------------

// scripter configuration parameters (in seconds)
float POLL_PERIOD = 60.0;               // 1 minute
// minimum length of suspend period
float CORE_SUSPEND_PERIOD = 15.0;       // 15 seconds 
// maximum length added to core suspend period (minimum is 0)
float MAX_RANDOM_SUSPEND_PERIOD = 30.0; // 30 seconds
float LOCK_PERIOD = 5.0;                // 5 seconds

// variables
integer lock = FALSE;


// mock poll method, assumes there are always tasks
integer poll()
{
    llSay(0, "Polling for tasks...");
    return TRUE;
}

// mock work method
work()
{
    llSay(0, "*** Executing task... ***");
}

default
{
    state_entry()
    {
        llSay(0, "Entering default state.");
        lock = FALSE;
        llListen(CHANNEL, "", NULL_KEY, "");
        llSetTimerEvent(POLL_PERIOD);
    }

    timer()
    {
        if (lock)
        {
            // step 4 - do some work
            work();
            // step 5 - make everybody go into suspend state
            // to make sure run times are randomized AND not sooner
            // than POLL_PERIOD
            llRegionSay(CHANNEL, "suspend");
            lock = FALSE;
            state suspended;
        }
        else
        {
            if (poll())
            {
                // step 1 - acquire lock
                llRegionSay(CHANNEL, "lock");
                lock = TRUE;
                // step 2 - wait and listen for others
                llSetTimerEvent(LOCK_PERIOD);
            }
        }
    }

    listen(integer channel, string name, key id, string message) 
    {
        // step 3 - did someone reply?
        if (message == "lock" && lock)
        {
            // other script woke up at the same time - signal
            // that you're here and suspend until next round,
            // where there will hopefully be a winner
            llSay(0, "Collision!");
            llRegionSay(CHANNEL, "lock");
            state suspended;
        }
        else if (message == "suspend")
            state suspended;
    }
}

state suspended
{
    state_entry()
    {
        // this gives random number between 0 and MAX_RANDOM_SUSPEND_PERIOD
        float random = llFrand(MAX_RANDOM_SUSPEND_PERIOD);
        float total = CORE_SUSPEND_PERIOD + random;
        llSetTimerEvent(total);
        llSay(0, "Entering suspended state for " + (string) ((integer) total)
            + " seconds.");
    }

    timer()
    {
        state default;
    }
}
Domchi
+1  A: 

I would say the answer depends on whether you are able to send broadcast messages or not.

if you have the ability to send broadcast, the you just wait a small (10ms) random period of time, send broadcast, wait for a reasonable period of time to allow for network delay, and then have the person who sent the earliest message do the task.

If your network only knows its neighbors, you do the same but in rounds; in each round you eliminate some nodes who go on to do a different thins.

Practically, I advise you to go with the 'earliest time' rather then 'smallest number' for the trivial reason that being the first should correlate with being a faster machine / having good network connection / being idle, that is the qualities that you would like for the machine selected to perform a task.

ilya n.