A device contains an array of locations, some of which contain values that we want to read periodically.

Our list of locations that we want to read periodically also specifies how often we want to read them. It is permitted to read a value more frequently than specified, but not less frequently.

A single read operation can read a contiguous sequence of locations from the array, so it is possible to return a group of multiple values from one read operation. The maximum number of contiguous locations that can be read in a single operation is M.

The goal is to group locations so as to minimize the time-averaged number of read operations. In the event that there is more than one way to do this, the tie-breaker is to minimize the time-averaged number of locations read.

(Bonus points are awarded if the algorithm to do this allows incremental changes to the list of locations - i.e. adding or removing one location to/from the list doesn't require the groupings to be recalculated from scratch!)
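To make the cost function concrete, here is a minimal Python sketch (the representation and names are illustrative, not part of the problem: `locations` maps array index to desired period in seconds, and a grouping is a list of inclusive (start, end) ranges):

    def group_cost(locations, grouping, M):
        """Return (reads_per_sec, locations_per_sec) for a grouping,
        or None if the grouping is invalid."""
        covered = set()
        reads = locs = 0.0
        for start, end in grouping:
            size = end - start + 1
            if size > M:
                return None  # one read spans at most M contiguous locations
            members = [p for i, p in locations.items() if start <= i <= end]
            if not members:
                return None  # a group containing nothing we want is pointless
            period = min(members)  # read at the rate of the most demanding member
            reads += 1.0 / period
            locs += size / period
            covered.update(i for i in locations if start <= i <= end)
        return (reads, locs) if covered == set(locations) else None

Comparing the returned tuples lexicographically captures the main objective and the tie-breaker at once.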

I'll try to clarify this with some examples where M=6.

The following diagram shows the array of locations. The numbers represent the desired read period for that location.

| 1 | 1 |   |   | 1 |   |   |   |   |   | 5 |   | 2 |
\-------------------/                   \-----------/
     group A                               group B

In this first example group A is read every second and group B every 2 seconds. Note that the location that should be read every 5 seconds is actually read every 2 seconds - which is fine.

| 1 |   |   |   |   | 1 | 1 |   | 1 |
\-----------------------/\----------/
         group A            group B         (non-optimal!)

This example shows a failure of my initial simple-minded algorithm, which was to fill up the first group until it was full and then start another. The following grouping is better because, although the number of group reads per second is the same, the number of locations read in those groups is smaller:

| 1 |   |   |   |   | 1 | 1 |   | 1 |
\---/               \---------------/
group A                  group B            (optimal)

Finally, an example where three groups is better than two:

| 5 |   |   |   |   | 1 | 1 |   |   |   |   | 5 |
\-----------------------/\----------------------/
        group A                  group B    (non-optimal)

This solution requires two group reads per second. A better solution is as follows:

| 5 |   |   |   |   | 1 | 1 |   |   |   |   | 5 |
\---/               \-------/               \---/
group A              group B               group C

This requires two reads every 5 seconds (groups A and C) plus one every second (group B): 1.4 group reads per second.
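For this example, the `group_cost` sketch from above compares the two candidate groupings as follows (printed values approximate, up to float rounding):

    locations = {0: 5, 5: 1, 6: 1, 11: 5}
    print(group_cost(locations, [(0, 5), (6, 11)], M=6))           # (2.0, 12.0)
    print(group_cost(locations, [(0, 0), (5, 6), (11, 11)], M=6))  # (1.4, 2.4)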

Edit: (There is an even better solution to this example if you allow reads to be non-periodic. On the 1st second, read both groups of the first solution. On seconds 2, 3, 4 and 5 read group B of the second solution. Repeat. This results in 1.2 group reads per second. But I'm going to disallow this because it would make the code responsible for scheduling the reads much more complicated.)

I looked up clustering algorithms but this isn't a clustering problem. I also found "Algorithm to allocate a list of numbers to N groups under certain condition", which pointed to the 'Bin packing' problem, but I don't think this is it either.

By the way, sorry for the vague title. I can't think of a concise description, or even relevant search keywords!

New examples added 28 September 2010:

This is like the previous example, but with all items updating at the same rate. Now two groups is better than three:

| 1 |   |   |   |   | 1 | 1 |   |   |   |   | 1 |
\-----------------------/\----------------------/
        group A                  group B          (optimal)

I've started trying to see how iterative improvements might be implemented. Suppose a grouping algorithm came up with:

| 1 |   |   |   |   | 1 | 1 |   |   |   |   | 1 | 1 |   |   |   |   | 1 |
\---/               \-------/               \-------/               \---/
group A              group B                 group C               group D  (non-optimal)
\-----------------------/\----------------------/\----------------------/
        group A                  group B                  group C           (optimal)

This can be improved to three adjacent groups of six locations each. Rex suggested (comment below) that I could try combining triplets into pairs. But in this case I would have to combine a quartet into a triplet, because there is no legal intermediate arrangement in which A+B+C (or B+C+D) can be rearranged into a pair while leaving D as it is.

I originally thought that this was an indication that, in the general case, there is no guarantee that a new valid solution can be created from an existing valid solution by making a local modification. This would have meant that algorithms such as simulated annealing, genetic algorithms, etc., could not be used to refine a suboptimal solution.

But Rex pointed out (comment below) that you can always split an existing group into two. Despite the fact that this always increases the cost function, all that means is that the solution needs to get out of its local minimum in order to reach the global minimum.
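A sketch of what such a move set might look like for simulated annealing, using the (start, end) representation from the earlier sketch (the move choices here are illustrative, not a worked-out algorithm):

    import random

    def neighbor(grouping, M):
        """Return a random neighbouring grouping: split one group in two,
        or merge two adjacent groups when the merged span fits within M.
        A split may leave a half containing no wanted locations; the cost
        function can simply reject such neighbours as invalid."""
        g = sorted(grouping)
        moves = []
        for k, (s, e) in enumerate(g):
            if e > s:
                moves.append(("split", k))
            if k + 1 < len(g) and g[k + 1][1] - s + 1 <= M:
                moves.append(("merge", k))
        if not moves:
            return g  # no legal move; return unchanged
        kind, k = random.choice(moves)
        if kind == "split":
            s, e = g[k]
            cut = random.randint(s, e - 1)
            return g[:k] + [(s, cut), (cut + 1, e)] + g[k + 1:]
        return g[:k] + [(g[k][0], g[k + 1][1])] + g[k + 2:]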

Answer:

This problem has the same property of instability on addition of new items that similar NP-complete problems do, so I assume it is one also. Since I suspect that you want something that works reasonably well instead of a proof of why it's hard, I'll focus on an algorithm to give an approximate solution.

I would solve this problem by converting it into a graph where each bin is valued at 1/N if it must be read every N seconds, then blurring the graph with a width of M (e.g. 6), peaked at the original. (For M = 6, I might use the weighting (1/6 1/5 1/4 1/3 1/2 1 1/2 1/3 1/4 1/5 1/6).) Then place reads at all the local maxima (sorting pairs of maxima by the distance between them and covering close pairs first where you can). Now most of your most important values will be covered. Then catch any missing locations by extending the existing reads, or by adding new reads if necessary. Depending on the structure, you may want to refine the result by shifting locations between reads, but if you're lucky that won't even be necessary.

Since this is essentially a local algorithm, if you keep track of the blurred graph, you can fairly easily add new items and re-do the peak-covering locally (and the refinement locally).
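A minimal sketch of the blur and peak-finding steps (function names illustrative; the pairing of close maxima and the gap-filling described above are left as prose):

    def blurred_demand(locations, n, M):
        """Blur each location's rate 1/period across the array, with weight
        1/(distance+1) out to M-1 cells away (the 1/6..1..1/6 kernel for M=6)."""
        out = [0.0] * n
        for i, period in locations.items():
            for j in range(max(0, i - (M - 1)), min(n, i + M)):
                out[j] += (1.0 / period) / (abs(i - j) + 1)
        return out

    def local_maxima(values):
        """Indices at least as large as both neighbours (plateaus count)."""
        n = len(values)
        return [i for i in range(n)
                if (i == 0 or values[i] >= values[i - 1])
                and (i == n - 1 or values[i] >= values[i + 1])]

    # Reproduces the column sums in the two-group table below (x60):
    b = blurred_demand({0: 1, 5: 1, 6: 1, 8: 1}, n=9, M=6)
    print([round(60 * v) for v in b])  # [70, 52, 47, 60, 74, 115, 110, 80, 95]
    print(local_maxima(b))             # [0, 5, 8]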

Just to see how this would work on your data, the two-group case would look like (multiplying by 60 so I don't have to keep track of fractional weights)

 60 30 20 15 12 10 00 00 00   <- contribution from left-most location
 10 12 15 20 30 60 30 20 15   <- second
 00 10 12 15 20 30 60 30 20   <- third
 00 00 00 10 12 15 20 30 60   <- rightmost
 --------------------------
 70 52 47 60 74 B5 B0 80 95   (using "B" to represent 11)
 ^^             ^^       ^^   Local maxima
   -------------  -------
     dist=6        dist=4
               |===========|  <- Hit closely-spaced peaks first
|==|                          <- Then remaining

So we're done, and the solution is optimal.

For the three-group example, weighting "5" as "1/5" and multiplying everything by 300 so that again there are no fractions,

060 030 020 015 012 010 000 000 000 000 000 000   <- from 5 on left
050 060 075 100 150 300 150 100 075 060 050 000   <- 1 on left
000 050 060 075 100 150 300 150 100 075 060 050   <- on right
000 000 000 000 000 000 010 012 015 020 030 060   <- 5 on right
-----------------------------------------------
110 140 155 190 262 460 460 262 190 155 140 110
                   |=======|                      <- only one peak, grab it
===                                         ===   <- missed some, so pick them back up
Rex Kerr
OK, I'm back on this now. Unfortunately, the above algorithm fails on the later example where all items update at the same rate - you get the same three groups, whereas you should get only two. I'm trying to see if I can tweak the algorithm to correct this, but it seems fundamentally difficult to get it to prefer a bigger group in order to satisfy what is essentially a non-local criterion (minimizing the number of groups).
Ian Goldby
You could repeatedly try to convert triples into pairs until no triple would be improved by rewriting it as a pair.
Rex Kerr
I've added a bit more to the original question. I should perhaps explain why I am so concerned about this. I expect that in most practical cases the number of groups in the optimal case will be small: just one, two or three maybe. Adding just one extra group therefore is a very large additional cost, relatively speaking. Makes me wonder if a brute-force attack might be the answer…
Ian Goldby
I think the point about there being no legal **partial** modifications of a solution is key. What other NP-Complete/Hard problems are there with this property, and how are they solved?
Ian Goldby
@Ian - The property that any local change can propagate to the entire rest of the solution is something that all NP-type problems have. There are two ways to go: (1) exhaustive search (cleverly, of course; e.g. if there is a stretch of 0 that is too wide to be spanned by a single read, you split the problem into two subproblems), and (2) approximate heuristics that work reasonably well on average but necessarily fail in specific problematic cases. My solution was of type (2).
Rex Kerr
What I was getting at was that in all NP-type problems I've looked at while researching this problem you can make a small local change to a solution and it will still be a valid solution, albeit probably less optimal, e.g. swapping two cities in the travelling salesman problem. But in the final example here there is no local modification I can make that results in a still-valid solution. I'm re-evaluating this though - if I can find a way to notate a solution such that invalid solutions have no representation, then a local modification would always be possible. Not sure if this is feasible.
Ian Goldby
@Ian - You're just thinking about the wrong type of local modification. You can always split one read into two, which is a local modification. But that local modification won't lead you to the global solution easily--you're trapped in a local minimum. And other problems--e.g. traveling salesman--are exactly the same way. You can be stuck on a good but sub-optimal route and have to rework everything to get on the genuinely shortest path. Anyway, if the problem is small and optimality is critical, do a brute-force search. Otherwise, perhaps you can live with a few inefficiencies.
Rex Kerr
Thanks for that really helpful comment. I have been looking at this problem for too long and couldn't see the wood for the trees. I realised that a modification can legitimately **increase** the cost function en route to a better solution, but somehow missed the idea of splitting a group. That does indeed remove the blockage, and means I could use an algorithm such as simulated annealing to see if a refinement can be found to the solution from your smoothed inverse-rates algorithm.
Ian Goldby
As it happens, I've also hit upon a reasonably efficient brute-force search that avoids considering invalid groupings. Since there are vastly more invalid groupings than valid ones, this reduces the search space enormously. It also has the nice feature that the first grouping it tries is an acceptable fallback, so I can safely stop the algorithm after, say, 50 ms if it hasn't finished by then. Thanks for all your help.
Ian Goldby
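The brute-force search isn't spelled out here, but one way such an enumeration might look, reusing the `group_cost` sketch from the question (the recursion and pruning below are illustrative guesses, not Ian's actual code):

    def enumerate_groupings(idxs, M):
        """Yield valid groupings over the sorted list of wanted indices.
        Each group starts at the first uncovered wanted index and ends at
        a wanted index within reach, so nothing invalid (or obviously
        wasteful) is ever generated. The first grouping yielded is the
        greedy fill-each-group fallback, so the search can be cut off
        early and still return something usable."""
        if not idxs:
            yield []
            return
        start = idxs[0]
        reachable = [i for i in idxs if i - start + 1 <= M]
        for end in reversed(reachable):  # biggest group first = greedy fallback
            rest = [i for i in idxs if i > end]
            for tail in enumerate_groupings(rest, M):
                yield [(start, end)] + tail

    locations = {0: 5, 5: 1, 6: 1, 11: 5}
    best = min(enumerate_groupings(sorted(locations), 6),
               key=lambda g: group_cost(locations, g, 6))
    print(best)  # [(0, 0), (5, 6), (11, 11)]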