views:

240

answers:

6

There are two data types: tasks and actions. An action costs a certain time to complete, and a set of tasks this actions consists of. A task has a set of actions, and our job is to choose one of them. So:

class Task { Set<Action> choices; }
class Action { float time; Set<Task> dependencies; }

For example the primary task could be "Get a house". The possible actions for this task: "Buy a house" or "Build a house". The action "Build a house" costs 10 hours and has the dependencies "Get bricks" (which may cost 6 hours) and "Get cement" (which costs 9 hours), etcetera.

The total time is the sum of all the times of the actions required to perform (in this case 10+6+9 hours). We want to choose actions such that the total time is minimal.

Note that the dependencies can be diamond shaped. For example "Get bricks" could require "Get a car" (to transport the bricks) and "Get cement" would also require a car. Even if you do "Get bricks" and "Get cement" you only have to count the time it takes to get a car once.

Note also that the dependencies can be circular. For example "Money" -> "Job" -> "Car" -> "Money". This is no problem for us, we simply select all of "Money", "Job" and "Car". The total time is simply the sum of the time of these 3 things.

Mathematical description:

Let actions be the chosen actions.

valid(task) = ∃action ∈ task.choices. (action ∈ actions ∧ ∀tasks ∈ action.dependencies. valid(task))
time = sum {action.time | action ∈ actions}
minimize time subject to valid(primaryTask)

I'm interested in an optimal solution but also in an approximate solution. Perhaps some kind of dynamic programming can help there? If the problem is tree structured then dynamic programming can give an optimal solution in polynomial time, but diamond structures seem to make the problem much more difficult. If you have an algorithm but it doesn't work if there are cycles, do post it! I can probably still learn a lot from it.

alt text

The boxes represent tasks and the circles represent actions (the time to perform the action is in the circle). An action has a line to a task if that task is a dependency for the action. Here's the description of the problem again in terms of pictures: if a rectangle (=task) is chosen, then one of the circles (=actions) inside must be chosen. If a circle is chosen, then all of the connected rectangles must be chosen. The goal is to minimize the sum of the numbers in the chosen circles.

An optimal solution in this case is to choose the action with time 2 in the top task, and the actions with time 1 in the bottom tasks. The total time is 2+1+1=4. In this case there are 2 optimal solutions. The second solution is to choose the action with time 3 in the top task, and the action with time 1 in the bottom right task. The total time is 3+1=4 again. If we choose the action with time 3 in the top task we do not have to perform the left bottom task, because there is no line between the action with time 3 and the left bottom task.

I apologize for the crappy drawing ;) And two more examples (the optimal solution for each has been indicated with blue, and the primary task has been indicated with grey): alt text

+3  A: 

You could model this as a graph and use a shortest path algorithm to find the solution. Each of the Tasks would be a node and the Actions would be edges in the graph.

In fact it will probably be easier to represent nodes as states and edges as the actions needed to as edges to transition between states.

If you consider a task as simply an aggregate of actions and you model the nodes as states and the actions as transition between those actions. Instead of thinking of "Get a house" as the primary task, consider "Start" and "Have a house" as 2 nodes and "Get a house" as a transition between them. The "Get a house" transition action can be decomposed into a graph that represents the intermediate states and actions (i.e. nodes and edges). You should be able to decompose the problem down as far as necessary and calculate the shortest path from the resulting graph.

Michael Barker
Thanks. The shortest path between which nodes? I assume between the primary task and ...?
Jules
I don't see how this could work because in general you need to select a subgraph of the graph, not just a path.
Jules
This does sound like a problem that can be modeled with a graph, but the model must be more complex than this. Note that you have to take into account that there are "and" requirements (to build a house you need bricks AND cement) as well as "or" requirements (to have house you need to build a house OR by a house), which need to be represented differently.
sepp2k
Exactly. The graph is a kind of "two level" graph, alternating a level of actions with a level of tasks.
Jules
@sepp2k - I believe the semantics are: each Action in Task.choices is an "OR" requirement, and each Task in Action.dependencies is an "AND" requirement.
mbeckish
mbeckish: yes, that's right: a *task* could be "Get a house" with options "Build a house" OR "Buy a house". The *action* "Build a house" would have dependencies on *tasks* "Get bricks" AND "Get cement".
Jules
A: 

Topological Sorting

tsinik
Please elaborate. I don't think that topological sorting is not really related to this. For one the graphs can be cyclic, and topological sorting doesn't optimize anything.
Jules
Hi. If you have a cyclic graph then there is a task (1) that can not be competed, because it needs another task (2) to be complited before that depands on (1) to be complited...If your graph is acyclic than you will get an optimum result since you alwase chose to preform a task that doesn't depand on any other and thous the order of "critical tasks" is the best.If you want to make the algoritm better you can always choose the least expansive action from the Set of all nodes with no incoming edges.Hope it helps.
tsinik
+1  A: 

I think I did this ages ago in the context of PERT charts.

How big are these networks, and how frequently do you need to solve them? You don't actually have a performance problem until you see one.

I would use dynamic programming. I wouldn't assume shared subtasks would be a problem until I found out from experience that they were.

Mike Dunlavey
The structures are sometimes small and sometimes large thousands and sometimes millions of nodes, but the structure might be "easy" (e.g. mostly tree like). Shared subtasks are still common, though, so an algorithm certainly needs to take those into account. I would still like to know an optimal algorithm that I can use on small instances.
Jules
Jules
@Jules: Then I would start by generating a worst-case dataset, and coding it any way at all, and work forward from there. I've learned not to deal with problems in the abstract.
Mike Dunlavey
That would not work without already knowing a good algorithm, and I already know that I can do a bad algorithm. Just iterate over all possible action choices, of which there are 2^n for n tasks. For n = a million this will definitely not work.
Jules
Also, the problem is not abstract in my view. The conditions are quite well defined, there simply isn't more to it.
Jules
@Jules: I understand. An example of what I mean is, as a former mechanical engineer, there is no comparison between working with paper designs and having actual material you can get your fingers on. In programming, I may not know in advance the best way to do something, but in "working with the clay" ideas come to me.
Mike Dunlavey
A: 

I believe that each possible execution path must end with a Task whose set of choices consists entirely of Actions with no dependencies.

If that is true, then you can easily determine the minimal time for each such Task.

Then work backwards to the Actions that only depend upon Tasks that you have "solved", and calculate their total times.

Work backwards through all of the possible paths until you get to the beginning.

The run time should be O(number of nodes in graph), which should be the same run time that it took to generate the graph.

mbeckish
The problem description allows for cycles.
Nabb
Yes, and even if cycles are not allowed I still don't see how to do it. Diamond shapes are the problem. Give me a few minutes.
Jules
@Nabb I think any execution path that doesn't go into an infinite loop must have an end.
mbeckish
mbeckish, you're thinking in terms of execution paths. The problem doesn't specify execution order. The dependent tasks may well be executed before or after the action, it doesn't matter. You should think about it as selecting a subset of actions, the order doesn't matter.
Jules
Here is an example where your approach doesn't work: http://i.imgur.com/DjhuO.jpg Sorry graphviz rendered that upside down...The optimal solution is to choose the 5 and the two 0 and 0. However if you look at one of the two tasks connected to primary in isolation the optimal solution is to choose the 4 instead of 0+5.
Jules
Here is a better picture: http://i.imgur.com/6T6r8.png
Jules
If we use your method we start at the bottom task. We determine that the time is 5. Then we go on to the left task. We determine that the time is either 4, or 0+5. Because 4 is smaller, we see that the time here is 4. The same for the right task: takes time 4. Now the top task: we have to execute left and right, both cost 4 so the total is 8. Note that this is *not* the optimal solution!
Jules
@Jules - Even if you model it as a graph of how Tasks and Actions depend upon each other, rather than a graph of all possible execution paths, I think there must exists dead ends (Actions with no dependencies) to avoid infinite loops.
mbeckish
No, there cannot be infinite loops, just cycles. It is perfectly fine if two tasks are mutually dependent, for example the task "Walk" and "Move": to move you need to walk and to walk you need to move. You cannot do one without doing the other. The optimal solution is still well defined even if there are cycles.
Jules
I have put a new picture at the bottom of my post. It shows an example where the dynamic programming method you describe fails.
Jules
@Jules - When talking about cycles, I was thinking of a directed graph, where the edges are directed from each Action to its dependent Tasks. In that case, your examples would not be cycles.
mbeckish
Correct, but even without cycles your algorithm produces an incorrect answer.
Jules
@Jules - I believe my method works if you graph it is an execution path. Graphing it your way, it doesn't work.
mbeckish
Well, yeah, but I already mentioned that in the original post: if the graph is a tree then dynamic programming will trivially solve the problem.
Jules
@Jules - I don't mean your graph connecting Task.choices to Action.dependencies. I mean if you take your graph, and construct all of the possible execution paths. However, that might not be feasible, given the number of tasks and actions you quoted in another comment.
mbeckish
+1  A: 

You may be looking for the Partial Order Planning algorithm: http://blackcat.brynmawr.edu/~dkumar/UGAI/planning.html#algorithms

dreeves
This is not exactly the same problem but there are good ideas there that I can use. People of mathoverflow pointed out that my problem is NP-hard, so a polynomial time solution is probably too much to ask ;)
Jules
A: 

At risk of being overly picky, this would appear to be a classic Critical Path Method (CPM) problem rather than PERT. PERT assumes a worst, best, and average completion time for each task while Jules has specified only one time for each task. That said, you can use linear programming to find the critical path and produce earliest start times, latest finish times, etc for each activity. Here's a useful one page description of the approach.

Grembo
The CTM is different: the goal of that is to *schedule* (i.e. choose an ordering) the tasks whereas this problem is choosing which actions to take (not in which order). Second, the CTM tries to minimize the critical path whereas I would like to minimize total time if the tasks are to be solved by 1 worker.
Jules
CPM=CTM? At the link to CPM:"a) The critical path is the longest path through the network– it is the minimum time in which the network can be completed."Isn't this what you want - to minimize the total time required to complete the entire set of activities?"CPM scheduling" is really the process of using CPM as an aid to scheduling. Where the scheduling part comes in is the "process" of using CPM on an ongoing basis to monitor and revise the schedule.
Grembo