tags:

views:

406

answers:

5

I currently have a system where the server tells all client applications when to next connect to the server between a server configured time window (say 12 to 6 am client time).

The current algorithm does a mod of the client's 10 digit ID number(fairly distributed) by the number of seconds in the time window and gives a pretty evenly distributed, predictable time for each client to connect to the server. The problem now, is that clients are in different time zones dis-proportionately, and certain time zones overlap for the given window, so the net effect is that the load is not distributed evenly on the server. What I would like is to devise an algorithm that I could configure with a percentage of clients we currently have for each time zone, and have it distribute the client's next connect time between the window that results in an even load on the server in a manner that is predictable (non-random).

here is a simple graphical representation:

                            12AM 1AM 2AM 3AM 4AM 5AM 6AM GMT
GMT -4  40% of the clients  ||||||||||||||||||||||||||||||            
GMT -5  10% of the clients       ||||||||||||||||||||||||||||||        
GMT -6  20% of the clients           ||||||||||||||||||||||||||||||    
GMT -7  30% of the clients               ||||||||||||||||||||||||||||||
A: 

How about this for something simple:

  • If the load on the server is OK, send the client the same number of seconds you sent last time.

  • If the load on the server is too high, instead send the client some other random number in the time window.

Over a few days things should sort themselves out.

(This assumes you have some way of measuring the quantity you're trying to optimize, which doesn't seem too unreasonable.)

Jason Orendorff
I stated it needs to be Non-Random (IE deterministic)
duckworth
A: 

Why not generate your reconnect-window times in GMT on the server and convert to client local time before you send the time to the client?

jilles de wit
Regardless of local or GMT, the window applies to the client, so if it is 12-6 AM that is each clients specific time window based on their local time. It also doesn't solve the problem of being able to deterministically distribute the times based on the proportion of clients in a each time zone.
duckworth
+3  A: 

You could take into account the time zone of the user in addition to its ID.

One example solution that uses this would be the following:

There are 24 time zones. Calculate which relative load there is for each of the time zones. You can do this by summing the total number of clients from each time zone from your static data. Now you have "weighted time zones". Each time zone will get a time share proportional to its weight.

For example, if you have the following data (for simplicity, lets assume that there are only three time zones):

Time Zone | Clients num
------------------------
    0     |     20
    1     |     30
    2     |     10

Then you would divide your time interval size by 60, and give each of the time zones its share of time: the first time zone will get (20/60 * #time), the second will get the following (30/60 * #time) etc.

Once you have the smaller time frames, you can tell each client of its time according to your previous function (i.e. mod for example), using the smaller interval according to what you calculated for its specific time zone.

Notes:

  1. Obviously, you will need some minimum clients num value for time zones that are very low on traffic, but this is simple - you just edit the original table.
  2. This is one example of "time division", you can modify this example to your need, for example you could have mutual time frames for several time zones.

EDIT:

Given the example you added to your question, you could apply this method the following way:

If I understand you correctly, you have 10 hours in which your server is active, and you would like for the load to be more or less equal for each of these hours. Meaning: in each of these hours you would like that 10% of the clients would access the server. Using the idea explained above, it is possible to divide the users non uniformly, so that for each time zone, there are hours with "more probability", and hours with "less probability". In your example, in the GMT-4 group, 10%/40% of the clients will access the server in the first hour: 12AM-01AM GMT. It is possible to calculate the load for each of the time zones, so that the total load for the server in every hour is 10%. There are many methods to do this - a greedy one will do. Once you have this, you know the weights for each of the time zones, and it should be clearer how to use the time sharing method described above.

Anna
I have been thinking along similar lines to your suggestion, however I am trying to figure out an implementation using the weighting approach that is deterministic.
duckworth
This approach is deterministic. You just have several deterministic functions instead of one. Unless your users can change time zones, which is an option I haven't thought of.
Anna
note that there are rather more than 24 timezones, though simply rounding them to the nearest hour will likely be ok.
ShuggyCoUk
+4  A: 

Break the problem into two parts: (1) determining what distribution you want each set of clients to have; and (2) deterministically assigning reconnect times that fit that distribution.

For problem (1), consider a two-dimensional array of numbers, much like the diagram you've drawn: each row represents a time zone and each column represents an equal period of time (an hour, perhaps) during the day. The problem you have to solve is to fill in the grid with numbers such that

  • the total of each row is the number of clients in that time zone;
  • for each row, all the numbers outside that time zone's reconnect window are zero;
  • the sums of the columns do not exceed some predetermined maximum (and are as evenly balanced as possible).

This kind of problem has lots of solutions. You can find one by simulation without doing any hard math. Write a program that fills the grid in so that each time zone's clients are evenly distributed (that is, the way you're distributing them now) and then repeatedly moves clients horizontally from crowded times-of-day to less crowded ones.

For problem (2), you want a function that takes a ten-digit ID and a desired distribution (that is, one row of the matrix from problem 1 above), and deterministically produces a reconnect time. This is easily done by linear interpolation. Suppose the desired distribution is:

12:00    1:00   2:00   3:00   4:00   5:00   6:00 ...
  +------+------+------+------+------+------+----
  |    0 |    0 |  100 |   70 |   30 |    0 |   ...
  +------+------+------+------+------+------+----

First find the sum of the whole row, and scale the numbers up to the range of IDs. That is, divide by the sum and multiply by 1010.

12:00    1:00   2:00        3:00       4:00        5:00   6:00 ...
  +------+------+-----------+-----------+-----------+------+----
  |    0 |   0  | 500000000 | 350000000 | 150000000 |    0 |   ...
  +------+------+-----------+-----------+-----------+------+----

Now let x = the ten-digit ID, and read the row from left to right. At each box, subtract the value in that box from x. Keep going until the number in the box is greater than what's left in x. Return the time

(start time for this box) + (duration of this box) * x / (number in box)

Note that once you calculate the solution to problem (1), the reconnect times will be deterministic until the next time you recalculate the matrix. Then everyone's reconnect time will shift around a little--but not much, unless the matrix changes dramatically.

Jason Orendorff
This is good, particularly pointing out that you can only obtain "as evenly balanced as possible". Completely balanced may or may not be possible, depending on the distribution. This solution will only work if the time zone is known for a given client.
Zac Thompson
+1  A: 

I would define a helper class for each of the Timezones you are looking at:

class Timezone
{
  DateTime start;
  int hourlyWeights[6]; //assuming you have 6 hour long timeslot for every timezone

  DateTime GetStartTime(long clientId)
  {
    long allTicks = 3600*sum(hourlyWeights);
    long clientTicks = clientId%allTicks;
    int i = 0;
    while(clientTicks>hourlyWeights[i])
    {
      clientTicks -= hourlyWeights[i]*3600;
      i++;
    }
    long seconds = clientTicks/hourlyWeights[i];
    return start.AddHours(i).AddSeconds(seconds);
  }
}

You now use the method GetStartTime to get the start time for a client from this timezone. The idea here is that we have this hourlyWeights table with the distribution you want to get for the given timezone, e.g. [40, 20, 0, 0, 0, 0] would mean that these clients will be served only during the first 2 hours, and we want twice as many clients during the first hour. Note: I assume that ids are uniformly distributed among clients from a given timezone.

The tricky bit is to get these classes created. If you have fairly stable structure of customers, than you can figure out the distributions manually and put them in the config file. If it changes often, let me know and I will post some code to figure it out dynamically.

Grzenio