(This is not exactly the problem that I have, but it's isomorphic, and I think that this explanation will be easiest for others to understand.)

Suppose that I have a set of points in an n-dimensional space. Using 3 dimensions for example:

A : [1,2,3]
B : [4,5,6]
C : [7,8,9]

I also have a set of vectors that describe possible movements in this space:

V1 : [+1,0,-1]
V2 : [+2,0,0]

Now, given a point dest, I need to find a starting point p and a set of vectors moves that will bring me to dest in the most efficient manner. Efficiency is defined as "fewest number of moves", not necessarily "least linear distance": it's permissible to select a p that's further from dest than other candidates if the move set is such that you can get there in fewer moves. The vectors in moves must be a subset of the available vectors, respecting multiplicity: you can't use the same vector more than once unless it appears more than once in the input set.

My input contains ~100 starting points and maybe ~10 vectors, and my number of dimensions is ~20. The starting points and available vectors will be fixed for the lifetime of the app, but I'll be finding paths for many, many different dest points. I want to optimize for speed, not memory. It's acceptable for the algorithm to fail (to find no possible paths to dest).
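To make the setup concrete, here is a tiny worked example in Python (the names are just illustrations of the problem statement): a "move" is component-wise vector addition, so a route is simply the sum of the chosen vectors added to the starting point.

```python
A = (1, 2, 3)
V1 = (1, 0, -1)
V2 = (2, 0, 0)

def apply_moves(p, moves):
    """Return the point reached from p after applying each vector once."""
    for v in moves:
        p = tuple(a + b for a, b in zip(p, v))
    return p

apply_moves(A, [V1, V2])  # → (4, 2, 2)
```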

Update w/ Accepted Solution

I adopted a solution very similar to the one marked below as "accepted". I iterate over all points and vectors and build a list of all reachable points with the routes to reach them. I convert this list into a hash of <dest, p+vectors>, selecting the shortest set of vectors for each destination point. (There is also a little bit of optimization for hash size, which isn't relevant here.) Subsequent dest lookups happen in constant time.
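The precomputation step described above can be sketched as follows (a minimal version; the names are illustrative, and the hash-size optimization mentioned is omitted). Enumerating subsets of vector *indices* rather than vectors makes duplicate vectors in the input each usable once:

```python
from itertools import combinations

def build_table(points, vectors):
    table = {}  # dest -> (start, tuple of vector indices)
    for p in points:
        for r in range(len(vectors) + 1):
            for idx in combinations(range(len(vectors)), r):
                dest = tuple(c + sum(vectors[i][d] for i in idx)
                             for d, c in enumerate(p))
                # keep only the shortest route to each destination
                if dest not in table or len(table[dest][1]) > r:
                    table[dest] = (p, idx)
    return table

table = build_table([(0, 0)], [(1, 0), (0, 1)])
table[(1, 1)]  # → ((0, 0), (0, 1))
```

Subsequent dest lookups are then a single dictionary access.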

+2  A: 

Given that you have the starting points and a fixed set of vectors, can you calculate the list of all reachable destinations and then just look up a given destination?

Chris Hulan
+4  A: 

I believe you would be able to make a generalized application of the A* (aka A star) path-finding algorithm. There is no reason why it can't be done in N-dimensional space. It guarantees an optimal path, provided you can describe the cost of each move.

http://en.wikipedia.org/wiki/A*_search_algorithm

Matt
There is [a reason] -- considering you have a fixed set of moves, distance is **NOT** a good approximation.
Kornel Kisielewicz
@Kornel I'm not sure I follow
Matt
@Matt -- The A* heuristic is distance-based -- that works well if you take single steps -- here, the movement can be more like that of a chessboard knight -- worse, you have only a few moves and then they run out. In short -- "hitting" the target is more of a problem than "getting close". It would be perfectly possible for the first move in the solution to take you *away* from the target -- and that could be the only solution. A* would then traverse the whole possibility tree. Not to mention that to decide whether a path is impossible, you'd have to traverse the tree anyway -- the heuristic is not much help here.
Kornel Kisielewicz
@Kornel A* heuristic is cost based, it just so happens the most common implementation of cost is distance. That doesn't have to be the case - it can be anything you want it to be. And you don't have to traverse the whole tree if there is a solution. A* is (sort of) a breadth first search, so you stop when you reach your destination, and it is guaranteed to be the most efficient. You can discard all unvisited paths because you already know they are more costly. You would end up traversing the whole tree if there was no solution, but I can't think of an algorithm that can avoid that.
Matt
@Matt : The algorithm presented by Strilanc is already much better, because it reduces the space to 2^10 possibilities.
Kornel Kisielewicz
@Kornel I wouldn't be convinced it's better without a mathematical proof comparing the two, or a real-life demonstration. Just because that approach reduces the search space to 2^10 doesn't mean anything - A*, as a heuristic, can eliminate dead/unimportant branches of the search space. Since we don't know what his cost evaluation is, or what his *real* problem is (since this is a simplified/isomorphic example), it will ultimately come down to which is better in the given use case. We don't have enough information to determine what that is though.
Matt
(cont) In other words, Strilanc's solution may work just fine. But you have to examine all 2^10 possibilities, and even then do some optimization after that (though I'm not sure what he meant there). A* could possibly find the solution without looking at 2^10 possibilities. It may also take longer, but only JS Bangs knows the answer to that.
Matt
A* is a very good heuristic, on the assumptions that you can reasonably calculate a distance, and getting close is good. I don't think this holds. The goal is not to get close to a destination point, but hit it, and that means that farther away is better if there's one big vector that will hit the destination. If you think A* would work, please suggest a heuristic function.
David Thornley
It reduces to breadth-first search with the heuristic being number of vectors added.
klochner
@David What is this business about "getting close", both you and Kornel mentioned it. A* will get you *exactly* at your destination, if a solution is available.
Matt
They mean that for any given point, you have no good heuristic for estimating total path length to the goal. Your heuristic will be existing path length, which is breadth first search.
klochner
Note that as usual in breadth-first search, you can search from both ends (from the target point *dest* and from the starting points) and try to meet in the middle. At first it is cheaper to search from *dest*, but once you get to 8 or 9 moves and still haven't reached some starting points, it is cheaper to do a round going in the opposite direction.
Jason Orendorff
A: 
  1. Begin at the start.
  2. Do while not at the destination:
  3. Get the distance to the destination.
  4. Test all possible moves and pick the one that gets you closest to the destination in one move.
  5. End do.

This may oscillate around the destination, but it will get you close.
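A minimal Python sketch of this greedy heuristic (names illustrative; each used vector is consumed, per the problem's one-use rule). To avoid oscillating, this version stops as soon as no remaining move brings it closer, so it can stop short of dest:

```python
import math

def greedy(start, vectors, dest, max_steps=100):
    p, remaining = list(start), list(vectors)
    for _ in range(max_steps):
        if tuple(p) == tuple(dest) or not remaining:
            break
        def dist_after(v):
            return math.dist([a + b for a, b in zip(p, v)], dest)
        best = min(remaining, key=dist_after)
        if dist_after(best) >= math.dist(p, dest):
            break  # no move improves things; stop instead of oscillating
        remaining.remove(best)
        p = [a + b for a, b in zip(p, best)]
    return tuple(p)

greedy((1, 2, 3), [(1, 0, -1), (2, 0, 0)], (4, 2, 2))  # → (4, 2, 2)
```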

xpda
+5  A: 

Actually, considering that you have around 10 vectors, you can, for a given dest point, calculate only the 1024 "targets" from the subsets of vectors -- i.e. every reachable space, along with the information about which set of moves gets there. That might be "slow" or "fast" depending on context (it would be absurdly fast if implemented on a parallel computing device like a GPU).

Having all the move sets and the offsets they produce, you can calculate the paths a lot quicker: for a given dest, subtract each subset sum from dest and check whether the result is one of your starting points, then pick the combination that reaches dest in the fewest moves.

(thanks to Strilanc)

Kornel Kisielewicz
You're going to want a dynamic programming approach with this solution.
klochner
@klochner Please explain your reasoning. If the dimensions and ranges are large, that's going to chow memory (and therefore time). 1024 destinations is sweet nothing.
marcog
Gave it some more thought and you're right - it's faster to just calculate all the sums (1024*20) and check for matches (1024*20*100 in the worst case).
klochner
The only problem that I see with this is that search time increases exponentially with every vector that I add. The # of vectors is _about_ 10, but it could maybe get as high as 20 or 30, which would put the performance of this algo through the floor.
JSBangs
One billion (2^30) is still easily computable, especially in parallel. CUDA would be a great tool for this task :>
Kornel Kisielewicz
Also, your problem is NP-complete, hence you can either go with an exponential algorithm, or try approximations that won't necessarily work.
Kornel Kisielewicz
It's not the search time that increases exponentially - it's the generation time for the reachability table, which can be done upfront (since it depends only on the vectors and starting points, which are fixed). The search time through that table can even be made O(1), since you can build a perfect hash (because the table contents are fixed).
caf
+4  A: 

So you want to find a subset of your set of vectors such that the subset sums to a given value (dest minus one of your starting points). In one dimension this is called the subset sum problem, and it is NP-complete.

Luckily you only have ~10 vectors, so your problem size is actually quite small and tractable. Start by just trying all 2^10 move combinations for each starting point and picking the best one. Then work from there looking for simple optimizations.
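The brute-force baseline described above could look something like this (a sketch, names illustrative): try every subset of moves from every starting point and keep the shortest route that lands exactly on dest, returning None when dest is unreachable.

```python
from itertools import combinations

def fewest_moves(points, vectors, dest):
    dest, best = tuple(dest), None
    for p in points:
        for r in range(len(vectors) + 1):
            if best is not None and r >= len(best[1]):
                break  # can't beat the best route found so far
            for idx in combinations(range(len(vectors)), r):
                reached = tuple(c + sum(vectors[i][d] for i in idx)
                                for d, c in enumerate(p))
                if reached == dest:
                    best = (p, idx)  # (start point, vector indices)
                    break
    return best
```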

Some easy optimizations that might work:

  • Prioritize searching subsets including vectors pointed in the right direction.
  • Meet-in-the-middle. Use a hash table to store all points reachable using subsets of the first half of your move set, and see if you can hit any using the second half of your move set starting from the end.
  • Go backwards. You only have one endpoint, so hash all reachable start points from there then check against all possible start points.
  • Concurrency
Strilanc
2^10? You've got your math mixed up ;> -- remember that the vectors can be applied in different orders, which gives us 10! = 3628800
Kornel Kisielewicz
Vector addition is commutative. The order doesn't affect where you end up.
Strilanc
@Strilanc: How will you know whether a given path takes you there in 3 or 8 moves, then?
Kornel Kisielewicz
(however, if the solution set is small then it's a big improvement anyway)
Kornel Kisielewicz
You keep track of that information during the computation, of course. Instead of storing just reachable points you store tuples of reachable points and their cost.
Strilanc
You're solving a linear program essentially by brute force (or with some simple heuristics). This would get blown away by any LP solver.
klochner
@klochner: Why? LP solvers are good for really big systems. When I first used one I was amazed at how big a system it could take and spit out an answer fast. This is a really small system.
David Thornley
it's fast for small problems as well, so long as they are LPs.
klochner
+1  A: 

As Kornel states, you have at most 2^10 = 1024 reachable destinations. So you could just generate all reachable destinations in 2^N time (where N is the number of vectors) by a simple recursive generation. This is going to be fast enough, of course. However, let's assume you wanted to stretch it.

You could optimise it to O(2^(N/2+1)) time by using a meet-in-the-middle solution. You split the vector set into two subsets and generate all reachable destinations for each subset independently. You then iterate through one set of reachable destinations, and for each location find the difference between it and the target destination. If that difference vector is in the other set of reachable destinations, you have a solution: combine the two and you're done. The difficulty here is in efficiently querying if you have the required vector in the other set: this can be done in O(1) time using a hash table.

That's O(2^(N/2)) time per subset, and two subsets give O(2^(N/2+1)). Joining the two also takes O(2^(N/2)) time, so the overall running time is O(2^(N/2+1)).
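A minimal sketch of this meet-in-the-middle scheme (names illustrative): split the vectors in half, enumerate each half's subset sums, and join the halves through a hash table, so each membership check is O(1).

```python
from itertools import combinations

def subset_sums(vecs, dim):
    out = {}  # offset -> cheapest subset of indices producing it
    for r in range(len(vecs) + 1):
        for idx in combinations(range(len(vecs)), r):
            s = tuple(sum(vecs[i][d] for i in idx) for d in range(dim))
            if s not in out:  # r only grows, so the first hit is cheapest
                out[s] = idx
    return out

def mitm(start, vectors, dest):
    dim, half = len(start), len(vectors) // 2
    a = subset_sums(vectors[:half], dim)
    b = subset_sums(vectors[half:], dim)
    need = tuple(d - s for d, s in zip(dest, start))  # required total offset
    best = None
    for sa, ia in a.items():
        ib = b.get(tuple(n - x for n, x in zip(need, sa)))
        if ib is not None and (best is None or len(ia) + len(ib) < best[0]):
            best = (len(ia) + len(ib), ia, tuple(half + i for i in ib))
    return best  # (move count, first-half indices, second-half indices) or None
```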

marcog