A device contains an array of locations, some of which contain values that we want to read periodically.

Our list of locations that we want to read periodically also specifies how often we want to read them. It is permitted to read a value more frequently than specified, but not less frequently.

A single read operation can read a contiguous sequence of locations from the array, so it is possible to return a group of multiple values from one read operation. The maximum number of contiguous locations that can be read in a single operation is M.

The goal is to group locations so as to minimize the time-averaged number of read operations. In the event that there is more than one way to do this, the tie-breaker is to minimize the time-averaged number of locations read.

(Bonus points are awarded if the algorithm to do this allows incremental changes to the list of locations - i.e. adding or removing one location to/from the list doesn't require the groupings to be recalculated from scratch!)
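To make the cost function concrete, here is a minimal Python sketch (the representation and names are illustrative, not part of the problem: `locations` maps array index to desired period in seconds, and a grouping is a list of inclusive (start, end) ranges):

    def group_cost(locations, grouping, M):
        """Return (reads_per_sec, locations_per_sec) for a grouping,
        or None if the grouping is invalid."""
        covered = set()
        reads = locs = 0.0
        for start, end in grouping:
            size = end - start + 1
            if size > M:
                return None  # one read spans at most M contiguous locations
            members = [p for i, p in locations.items() if start <= i <= end]
            if not members:
                return None  # a group containing nothing we want is pointless
            period = min(members)  # read at the rate of the most demanding member
            reads += 1.0 / period
            locs += size / period
            covered.update(i for i in locations if start <= i <= end)
        return (reads, locs) if covered == set(locations) else None

Comparing the returned tuples lexicographically captures the main objective and the tie-breaker at once.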

I'll try to clarify this with some examples where M=6.

The following diagram shows the array of locations. The numbers represent the desired read period for that location.

| 1 | 1 |   |   | 1 |   |   |   |   |   | 5 |   | 2 |
\-------------------/                   \-----------/
     group A                               group B

In this first example group A is read every second and group B every 2 seconds. Note that the location that should be read every 5 seconds is actually read every 2 seconds - which is fine.

| 1 |   |   |   |   | 1 | 1 |   | 1 |
\-----------------------/\----------/
         group A            group B         (non-optimal!)

This example shows a failure of my initial simple-minded algorithm, which was to fill up the first group until it was full and then start another. The following grouping is better because, although the number of group reads per second is the same, the number of locations read in those groups is smaller:

| 1 |   |   |   |   | 1 | 1 |   | 1 |
\---/               \---------------/
group A                  group B            (optimal)

Finally, an example where three groups is better than two:

| 5 |   |   |   |   | 1 | 1 |   |   |   |   | 5 |
\-----------------------/\----------------------/
        group A                  group B    (non-optimal)

This solution requires two group reads per second. A better solution is as follows:

| 5 |   |   |   |   | 1 | 1 |   |   |   |   | 5 |
\---/               \-------/               \---/
group A              group B               group C

This requires two reads every 5 seconds (groups A and C) plus one every second (group B): 1.4 group reads per second.
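For this example, the `group_cost` sketch from above compares the two candidate groupings as follows (printed values approximate, up to float rounding):

    locations = {0: 5, 5: 1, 6: 1, 11: 5}
    print(group_cost(locations, [(0, 5), (6, 11)], M=6))           # (2.0, 12.0)
    print(group_cost(locations, [(0, 0), (5, 6), (11, 11)], M=6))  # (1.4, 2.4)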

Edit: (There is an even better solution to this example if you allow reads to be non-periodic. On the 1st second, read both groups of the first solution. On seconds 2, 3, 4 and 5 read group B of the second solution. Repeat. This results in 1.2 group reads per second. But I'm going to disallow this because it would make the code responsible for scheduling the reads much more complicated.)

I looked up clustering algorithms but this isn't a clustering problem. I also found "Algorithm to allocate a list of numbers to N groups under certain condition", which pointed to the 'Bin packing' problem, but I don't think this is it either.

By the way, sorry for the vague title. I can't think of a concise description, or even relevant search keywords!

New examples added 28 September 2010:

This is like the previous example, but with all items updating at the same rate. Now two groups is better than three:

| 1 |   |   |   |   | 1 | 1 |   |   |   |   | 1 |
\-----------------------/\----------------------/
        group A                  group B          (optimal)

I've started trying to see how iterative improvements might be implemented. Suppose a grouping algorithm came up with:

| 1 |   |   |   |   | 1 | 1 |   |   |   |   | 1 | 1 |   |   |   |   | 1 |
\---/               \-------/               \-------/               \---/
group A              group B                 group C               group D  (non-optimal)
\-----------------------/\----------------------/\----------------------/
        group A                  group B                  group C           (optimal)

This can be improved to three adjacent groups of six locations each. Rex suggested (comment below) that I could try combining triplets into pairs. But in this case I would have to combine a quartet into a triplet, because there is no legal intermediate arrangement in which A+B+C (or B+C+D) can be rearranged into a pair while leaving D as it is.

I originally thought that this was an indication that, in the general case, there is no guarantee that a new valid solution can be created from an existing valid solution by making a local modification. This would have meant that algorithms such as simulated annealing, genetic algorithms, etc., could not be used to refine a suboptimal solution.

But Rex pointed out (comment below) that you can always split an existing group into two. Despite the fact that this always increases the cost function, all that means is that the solution needs to get out of its local minimum in order to reach the global minimum.
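A sketch of what such a move set might look like for simulated annealing, using the (start, end) representation from the earlier sketch (the move choices here are illustrative, not a worked-out algorithm):

    import random

    def neighbor(grouping, M):
        """Return a random neighbouring grouping: split one group in two,
        or merge two adjacent groups when the merged span fits within M.
        A split may leave a half containing no wanted locations; the cost
        function can simply reject such neighbours as invalid."""
        g = sorted(grouping)
        moves = []
        for k, (s, e) in enumerate(g):
            if e > s:
                moves.append(("split", k))
            if k + 1 < len(g) and g[k + 1][1] - s + 1 <= M:
                moves.append(("merge", k))
        if not moves:
            return g  # no legal move; return unchanged
        kind, k = random.choice(moves)
        if kind == "split":
            s, e = g[k]
            cut = random.randint(s, e - 1)
            return g[:k] + [(s, cut), (cut + 1, e)] + g[k + 1:]
        return g[:k] + [(g[k][0], g[k + 1][1])] + g[k + 2:]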

Answer:

This problem has the same property of instability on addition of new items that similar NP-complete problems do, so I assume it is one also. Since I suspect that you want something that works reasonably well instead of a proof of why it's hard, I'll focus on an algorithm to give an approximate solution.

I would solve this problem by converting it into a graph where each bin is valued at 1/N if it must be read every N seconds, then blurring the graph with a width of M (e.g. 6), peaked at the original. (For M = 6, I might use the weighting (1/6 1/5 1/4 1/3 1/2 1 1/2 1/3 1/4 1/5 1/6).) Then place reads at all the local maxima (sorting pairs of maxima by the distance between them and covering close pairs first where you can). Now most of your most important values will be covered. Then catch any missing locations by extending the existing reads, or by adding new reads if necessary. Depending on the structure, you may want to refine the result by shifting locations between reads, but if you're lucky that won't even be necessary.

Since this is essentially a local algorithm, if you keep track of the blurred graph, you can fairly easily add new items and re-do the peak-covering locally (and the refinement locally).
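A minimal sketch of the blur and peak-finding steps (function names illustrative; the pairing of close maxima and the gap-filling described above are left as prose):

    def blurred_demand(locations, n, M):
        """Blur each location's rate 1/period across the array, with weight
        1/(distance+1) out to M-1 cells away (the 1/6..1..1/6 kernel for M=6)."""
        out = [0.0] * n
        for i, period in locations.items():
            for j in range(max(0, i - (M - 1)), min(n, i + M)):
                out[j] += (1.0 / period) / (abs(i - j) + 1)
        return out

    def local_maxima(values):
        """Indices at least as large as both neighbours (plateaus count)."""
        n = len(values)
        return [i for i in range(n)
                if (i == 0 or values[i] >= values[i - 1])
                and (i == n - 1 or values[i] >= values[i + 1])]

    # Reproduces the column sums in the two-group table below (x60):
    b = blurred_demand({0: 1, 5: 1, 6: 1, 8: 1}, n=9, M=6)
    print([round(60 * v) for v in b])  # [70, 52, 47, 60, 74, 115, 110, 80, 95]
    print(local_maxima(b))             # [0, 5, 8]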

Just to see how this would work on your data, the two-group case would look like (multiplying by 60 so I don't have to keep track of fractional weights)

 60 30 20 15 12 10 00 00 00   <- contribution from left-most location
 10 12 15 20 30 60 30 20 15   <- second
 00 10 12 15 20 30 60 30 20   <- third
 00 00 00 10 12 15 20 30 60   <- rightmost
 --------------------------
 70 52 47 60 74 B5 B0 80 95   (using "B" to represent 11)
 ^^             ^^       ^^   Local maxima
   -------------  -------
     dist=6        dist=4
               |===========|  <- Hit closely-spaced peaks first
|==|                          <- Then remaining

So we're done, and the solution is optimal.

For the three-group example, weighting "5" as "1/5" and multiplying everything by 300 so that again there are no fractions,

060 030 020 015 012 010 000 000 000 000 000 000   <- from 5 on left
050 060 075 100 150 300 150 100 075 060 050 000   <- 1 on left
000 050 060 075 100 150 300 150 100 075 060 050   <- on right
000 000 000 000 000 000 010 012 015 020 030 060   <- 5 on right
-----------------------------------------------
110 140 155 190 262 460 460 262 190 155 140 110
                   |=======|                      <- only one peak, grab it
===                                         ===   <- missed some, so pick them back up
Rex Kerr
OK, I'm back on this now. Unfortunately, the above algorithm fails on the later example where all items update at the same rate - you get the same three groups, whereas you should get only two. I'm trying to see if I can tweak the algorithm to correct this, but it seems fundamentally difficult to get it to prefer a bigger group in order to satisfy what is essentially a non-local criterion (minimizing the number of groups).
Ian Goldby
You could repeatedly try to convert triples into pairs until no triple would be improved by rewriting it as a pair.
Rex Kerr
I've added a bit more to the original question. I should perhaps explain why I am so concerned about this. I expect that in most practical cases the number of groups in the optimal case will be small: just one, two or three maybe. Adding just one extra group therefore is a very large additional cost, relatively speaking. Makes me wonder if a brute-force attack might be the answer…
Ian Goldby
I think the point about there being no legal **partial** modifications of a solution is key. What other NP-Complete/Hard problems are there with this property, and how are they solved?
Ian Goldby
@Ian - The property that any local change can propagate to the entire rest of the solution is something that all NP-type problems have. There are two ways to go: (1) exhaustive search (cleverly, of course; e.g. if there is a stretch of 0 that is too wide to be spanned by a single read, you split the problem into two subproblems), and (2) approximate heuristics that work reasonably well on average but necessarily fail in specific problematic cases. My solution was of type (2).
Rex Kerr
What I was getting at was that in all NP-type problems I've looked at while researching this problem you can make a small local change to a solution and it will still be a valid solution, albeit probably less optimal, e.g. swapping two cities in the travelling salesman problem. But in the final example here there is no local modification I can make that results in a still-valid solution. I'm re-evaluating this though - if I can find a way to notate a solution such that invalid solutions have no representation, then a local modification would always be possible. Not sure if this is feasible.
Ian Goldby
@Ian - You're just thinking about the wrong type of local modification. You can always split one read into two, which is a local modification. But that local modification won't lead you to the global solution easily--you're trapped in a local minimum. And other problems--e.g. traveling salesman--are exactly the same way. You can be stuck on a good but sub-optimal route and have to rework everything to get on the genuinely shortest path. Anyway, if the problem is small and optimality is critical, do a brute-force search. Otherwise, perhaps you can live with a few inefficiencies.
Rex Kerr
Thanks for that really helpful comment. I have been looking at this problem for too long and couldn't see the wood for the trees. I realised that a modification can legitimately **increase** the cost function en route to a better solution, but somehow missed the idea of splitting a group. That does indeed remove the blockage, and means I could use an algorithm such as simulated annealing to see if a refinement can be found to the solution from your smoothed inverse-rates algorithm.
Ian Goldby
As it happens, I've also hit upon a reasonably efficient brute-force search that avoids considering invalid groupings. Since there are vastly more invalid groupings than valid ones, this reduces the search space enormously. It also has the nice feature that the first grouping it tries is an acceptable fallback, so I can safely stop the algorithm after, say, 50 ms if it hasn't finished by then. Thanks for all your help.
Ian Goldby
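The brute-force search isn't spelled out here, but one way such an enumeration might look, reusing the `group_cost` sketch from the question (the recursion and pruning below are illustrative guesses, not Ian's actual code):

    def enumerate_groupings(idxs, M):
        """Yield valid groupings over the sorted list of wanted indices.
        Each group starts at the first uncovered wanted index and ends at
        a wanted index within reach, so nothing invalid (or obviously
        wasteful) is ever generated. The first grouping yielded is the
        greedy fill-each-group fallback, so the search can be cut off
        early and still return something usable."""
        if not idxs:
            yield []
            return
        start = idxs[0]
        reachable = [i for i in idxs if i - start + 1 <= M]
        for end in reversed(reachable):  # biggest group first = greedy fallback
            rest = [i for i in idxs if i > end]
            for tail in enumerate_groupings(rest, M):
                yield [(start, end)] + tail

    locations = {0: 5, 5: 1, 6: 1, 11: 5}
    best = min(enumerate_groupings(sorted(locations), 6),
               key=lambda g: group_cost(locations, g, 6))
    print(best)  # [(0, 0), (5, 6), (11, 11)]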