tags:

views:

206

answers:

5

It's been awhile since my algorithms class in school, so forgive me if my terminology is not exact.

I have a series of actions that, when run, produces some desired state (it's basically a set of steps to reproduce a bug, but that doesn't matter for the sake of this question).

My goal is to find the shortest series of steps that still produces the desired state. Any given step might be unnecessary, so I'm trying to remove those as efficiently as possible.

I want to preserve the order of the steps (so I can remove steps, but not rearrange them).

The naive approach I'm taking is to take the entire series and try removing each action. If I successfully can remove one action (without altering the final state), I start back at the beginning of the series. This should be O(n^2) in the worst case.

I'm starting to play around with ways to make this more efficient, but I'm pretty sure this is a solved problem. Unfortunately, I'm not sure exactly what to Google - the series isn't really a "path," so I can't use path-shortening algorithms. Any help - even just giving me some terms to search - would be helpful.

Update: Several people have pointed out that even my naive algorithm won't find the shortest solution. This is a good point, so let me revise my question slightly: any ideas about approximate algorithms for the same problem? I'd rather have a short solution that's near the shortest solution quickly than take a very long time to guarantee the absolute shortest series. Thanks!

+1  A: 

The most obvious thing that comes to mind is a binary search-inspired recursive division into halves, where you alternately leave out each half. If leaving out a half at any stage of the recursion still reproduces the end state, then leave it out; otherwise, put it back in and recurse on both halves of that half, etc.

Recursing on both halves means that it tries to eliminate large chunks before giving up and trying smaller chunks of those chunks. The running time will be O(n log(n)) in the worst case, but if you have a large n with a high likelihood of many irrelevant steps, it ought to win ahead of the O(n) approach of trying leaving out each step one at a time (but not restarting).

This algorithm will only find some minimal paths, though, it can't find smaller paths that may exist due to combinatorial inter-step effects (if the steps are indeed of that nature). Finding all of those will result in combinatorial explosion, though, unless you have more information about the steps with which to reason (such as dependencies).

Barry Kelly
+2  A: 

If you are making strickly no assumptions about the effect of each action and you want to strickly find the smallest subset, then you will need to try all possible subets of actions to find the shortest seuence.

The binary search method stated, would only be sufficient if a single step caused your desired state.

For the more general state, even removing a single action at a time would not necessarily give you the shortest sequence. This is the case if you consider pathological examples where actions may together cause no problem, but individually trigger your desired state.

Your problem seem reducable to a more general search problem, and the more assumptions you can create the smaller your search space will become.

Akusete
No, you didn't read my answer. It was binary-search *inspired*; it does *not* discard half the search-space each time, as I explicitly say that your recurse on *both* halves.
Barry Kelly
My appologies, no offence intended, I was speaking from the point of view that by keeping both halves you loose the advantage over the scanning method.
Akusete
+4  A: 

Your naive n^2 approach is not exactly correct; in the worst case you might have to look at all subsets (well actually the more accurate thing to say is that this problem might be NP-hard, which doesn't mean "might have to look at all subsets", but anyway...)

For example, suppose you are currently running steps 12345, and you start trying to remove each of them individually. Then you might find that you can't remove 1, you can remove 2 (so you remove it), then you look at 1345 and find that each of them is essential -- none can be removed. But it might turn out that actually, if you keep 2, then just "125" suffice.

If your family of sets that produce the given outcome is not monotone (i.e. if it doesn't have the property that if a certain set of actions work, then so will any superset), then you can prove that there is no way of finding the shortest sequence without looking at all subsets.

ShreevatsaR
Very good point. OK, any ideas about approximation algorithms for the same problem? I don't necessarily need the shortest series: quickly finding a very short series is better than slowly finding the absolute shortest series in my application. Thanks!
Ben Brinckerhoff
A: 

You problem domain can be mapped to directional graph where you have states as nodes and steps as links , you want to find the shortest path in a graph , to do this a number of well known algorithms exists for example Dijkstra's or A*

Updated:

Let's think about simple case you have one step what leads from state A to state B this can be drawn as 2 nodes conected by a link. Now you have another step what leads from A to C and from C you havel step what leads to B. With this you have graph with 3 nodes and 3 links, a cost of reaching B from A it eather 2 (A-C-B) or 1 (A-B). So you can see that cost function is actualy very simple you add 1 for every step you take to reach the goal.

MichaelT
I'm afraid that I don't how such a mapping would work - each step doesn't have a cost like in a graph. It's either necessary or not and whether or not it's necessary may depend on other steps.
Ben Brinckerhoff
I updated my answer with more details please take a look
MichaelT
+2  A: 

Delta Debugging, A method for minimizing a set of failure inducing input, might be a good fit.
I've previously used Delta(minimizes "interesting" files, based on test for interestingness) to reduce a ~1000 line file to around 10 lines, for a bug report.

Hasturkun
+1 promised (reached vote limit, I must wait).
MaD70