views:

1288

answers:

13

Finding routes for a car is pretty easy: you store a weighted graph of all the roads and you could use Djikstra's algorithm [1]. A bus route is less obvious. With a bus you have to represent things like "wait 10 minutes for the next bus" or "walk one block to another bus stop" and feed those into your pathfinding algorithm.

It's not even always simple for cars. In some cities, some roads are one-way-only into the city in the morning, and one-way-only out of the city in the evening. Some advanced GPSs know how to avoid busy routes during rush hour.

How would you efficiently represent this kind of time-dependent graph and find a route? There is no need for a provably optimal solution; if the traveler wanted to be on time, they would buy a car. ;-)

[1] A wonderful algorithm to mention in an example because everyone's heard of it, though A* is a likelier choice for this application.

A: 

Basically, a node in your graph should not only represent a location, but also the earliest time you can get there. You can think of it as graph exploration in the (place,time) space. Additionally, if you have (place, t1) and (place,t2) where t1<t2, discard (place,t2).

Theoretically, this will get the earliest arrival time for all possible destinations from your starting node. In practice, you need some heuristic to prune roads that take you too far away from your destination.

You also need some heuristic to consider the promising routes before the less promising ones - if a route leads away from your destination, it is less likely (but not totally unlikely) to be good.

Rafał Dowgird
Please expand the acronym 'BFS'
joeforker
Breadth-first search. For practical purposes, simple BFS is probably the wrong way to go here, but it's good fit conceptually.
David Thornley
Agreed, replaced 'BFS' with the more generic 'graph exploration'.
Rafał Dowgird
I think you certainly want routes "going the wrong way" to be considered. It's very common to go one stop out of your way to be able to pickup express trains/buses.
Tall Jeff
A: 

I dont think there is any other special data structure that would cater for these specific needs but you can still use the normal data structures like a linked list and then make route calculations per given factor-you are going to need some kind of input into your app of the variables that affect the result and then make calculations accordingly i.e depending on the input.

As for the waiting and stuff, these are factors that are associated with a particular node right? You can translate this factor into a route node for each of the branches attached to the node. For example you can say for every branch from Node X, if there is a wait for say m minutes on Node X, then scale up the weight of the branch by [m/Some base value*100]% (just an example). In this way, you have factored in the other factors uniformly but at the same time maintaining a simple representation of the problem you want to solve.

Lonzo
+8  A: 

What you're talking about is more complicated than something like the mathematical models that can be described with simple data structures like graphs and with "simple" algorithms like Djikstra's. What you are asking for is a more complex problem like those encountered in the world of automated logistics management.

One way to think about it is that you are asking a multi-dimensional problem, you need to be able to calculate:

  1. Distance optimization
  2. Time optimization
  3. Route optimization
  4. "Time horizon" optimization (if it's 5:25 and the bus only shows up at 7:00, pick another route.)

Given all of these circumstances you can attempt to do deterministic modeling using complex multi-layered data structures. For example, you could still use a weighted di-graph to represent the existing potential routes, wherein each node also contained a finite state automata which added a weight bias to a route depending on time values (so by crossing a node at 5:25 you get a different value than if your simulation crossed it at 7:00.)

However, I think that at this point you are going to find yourself with a simulation that is more and more complex, which most likely does not provide "great" approximation of optimal routes when the advice is transfered into the real world. It turns out that software and mathematical modeling and simulation is at best a weak tool when encountering real world chaotic behaviors and dynamism.

My suggestion would go to use an alternate strategy. I would attempt to use a genetic algorithm in which the DNA for an individual calculated a potential route, I would then create a fitness function which encoded costs and weights in a more "easy to maintain" lookup table fashion. Then I would let the Genetic Algorithm attempt to converge on a near optimal solution for a public transport route finder. On modern computers a GA such as this is probably going to perform reasonably well, and it should be at least relatively robust to real world dynamism.

I think that most systems that do this sort of thing take the "easy way out" and simply do something like an A* search algorithm, or something similar to a greedy costed weighted digraph walk. The thing to remember is that the users of the public transport don't themselves know what the optimal route would be, so a 90% optimal solution is still going to be a great solution for the average case.

earino
+1 A* is the obvious way to go
annakata
That's awesome. You have this amazing model of the transport system and its riders, and you can show the user "Let's have Sally, Phil, and Cuthbert try to get where they're going." The user watches the simulation, rooting for their favorite character. Sally and Phil meet on the bus and have kids.
joeforker
As an aside, the route finder for my local bus system has different algorithms to minimize your choice of travel time, walking time, or bus transfers.
Kibbee
It is demonstratably incorrect to say mathematical models cannot handle this situation: see Steven A. Lowe's solution.
Joe Soul-bringer
I hope you don't mind that I find your DNA+GA solution hilarious. +1 for entertainment, -1 for practical. Of course systems take the "easy way out", that's how they provide the result in an acceptable amount of time.
joeforker
A: 

If I was tackling this problem, I'd probably start with an annotated graph. Each node on the graph would represent every intersection in the city, whether or not the public transit system stops there - this helps account for the need to walk, etc. On intersections with transit service, you annotate these with stop labels - the labels allowing you to lookup the service schedule for the stop.

Then you have a choice to make. Do you need the best possible route, or merely a route? Are you displaying the routes in real time, or can solutions be calculated and cached?

If you need "real time" calculation, you'll probably want to go with a greedy algorithm of sorts, I think an A* algorithm would probably fit this problem domain fairly nicely.

If you need optimal solutions, you should look at dynamic programming solutions to the graph... optimal solutions will likely take much longer to calculate, but you only need to find them once, then they can be cached. Perhaps your A* algorithm could use pre-calculated optimal paths to inform its decisions about "similar" routes.

ceretullis
Would you implement an intersection.getEdges(time_of_day) method?
joeforker
@Daniel Holth: hmmm... not sure I would, though you could. If you're using the A* algorithm, then it will continually be "scoring" adjacent nodes based on how well they move us toward the solution. There is also a bit of state, you need to track whether we are currently on public transit or walking.
ceretullis
A: 

A horribly inefficient way that might work would be to store a copy of each intersection in the city for each minute of the day. A bus route from Elm St. and 2nd to Main St. and 25th would be represented as, say,

elm_st_and_2nd[12][30].edges :
 elm_st_and_1st[12][35] # 5 minute walk to the next intersection
   time = 5 minutes
   transport = foot
 main_st_and_25th[1][15] # 40 minute bus ride
   time = 40 minutes
   transport = bus
 elm_st_and_1st[12][36] # stay in one place for one minute
   time = 1 minute
   transport = stand still

Run your favorite pathfinding algorithm on this graph and pray for a good virtual memory implementation.

joeforker
I don't think you need to store a separate copy for each minute. It is sufficient to create a copy for each discovered arrival time. If you can be at Elm/2nd at 12:05, you don't need to explicitly store the fact that you can also be there at 12:06.
Rafał Dowgird
My own answer to my own question, downvoted! Nyarrgh.
joeforker
+2  A: 

Finding routes for a car is pretty easy: you store a weighted graph of all the roads and you could use Djikstra's algorithm. A bus route is less obvious.

It may be less obvious, but the reality is that it's merely another dimension to the car problem, with the addition of infinite cost calculation.

For instance, you mark the buses whose time is past as having infinite cost - they then aren't included in the calculation.

You then get to decide how to weight each aspect.

Transit Time might get weighted by 1 Waiting time might get weighted by 1 Transfers might get weighted by 0.5 (since I'd rather get there sooner and have an extra transfer)

Then you calculate all the routes in the graph using any usual cost algorithm with the addition of infinite cost:

Each time you move along an edge you have to keep track of 'current' time (add up the transit time) and if you arrive at a vector you have to assign infinite cost to any buses that are prior to your current time. The current time is incremented by the waiting time at that vector until the next bus leaves, then you're free to move along another edge and find the new cost.

In other words, there's a new constraint, "current time" which is the time of the first bus starting, summed with all the transit and waiting times of buses and stops traveled.

It complicates the algorithm only a little bit, but the algorithm is still the same. You can see that most algorithms can be applied to this, some might require multiple passes, and a few won't work because you can't add the time-->infinite cost calculation inline. But most should work just fine.

You can simplify it further by simply assuming that the buses are on a schedule, and there's ALWAYS another bus, but it increases the waiting time. Do the algorithm only adding up the transit costs, then go through the tree again and add waiting costs depending on when the next bus is coming. It will sometimes result in less efficient versions, but the total graph of even a large city is actually pretty small, so it's not really an issue. In most cases one or two routes will be the obvious winners.

Google has this, but also includes additional edges for walking from one bus stop to another so you might find a slightly more optimal route if you're willing to walk in cities with large bus systems.

Adam Davis
+1  A: 

The way I think of this problem is that ultimately you are trying to optimize your average speed from your starting point to your ending point. In particular, you don't care at all about total distance traveled if going [well] out of your way saves time. So, a basic part of the solution space is going to need to be identifying efficient routes available that cover non-trivial parts of the total distance at relatively high speeds between start and finish.

To your original point, the typical automotive route algorithms used by GPS navigation units to make the trip by car is a good bound for a target optimal total time and optimal route evaluations. In other words, your bus based trip would be doing really good to approach a car based solution. Clearly, the bus route based system is going to have many more constraints than the car based solutions, but having the car solution as a reference (time and distance) gives the bus algorithm a framework to optimize against*. So, put loosely, you want to morph the car solution towards the set of possible bus solutions in an iterative fashion or perhaps more likely take possible bus solutions and score them against your car based solution to know if you are doing "good" or not.

Making this somewhat more concrete, for a specific departure time there are only going to be a limited number of buses available within any reasonable period of time that can cover a significant percentage of your total distance. Based on the straight automotive analysis reasonable period of time and significant percentage of distance become quantifiable using some mildly subjective metrics. Certainly, it becomes easier to score each possibility relative to the other in a more absolute sense.

Once you have a set of possible major segment(s) available as possible answers within the solution, you then need to hook them together with other possible walking and waiting paths....or if sufficiently far apart recursive selection of additional short bus runs. Intuitively, it doesn't seem that there is really going to be a prohibitive set of choices here because of the Constraints Paradox (see footnote below). Even if you can't brute force all possible combinations from there, what remains should be able to be optimized using a simulated annealing (SA) type algorithm. A Monte Carlo method would be another option.

The way we've broken the problem down to this point leaves us something that is quite analogous to how SA algorithms are applied to the automated layout and routing of ASIC chips, FPGA's and also the placement and routing of printed circuit boards of which there is quite a bit of published work on optimizing that type of problem form.

* Note: I usually refer to this as "The Constraints Paradox" - my term. While people can naturally think of more constrained problems as harder to solve, the constraints reduce choices and less choices means easier to brute force. When you can brute force, then even the optimal solution is available.

Tall Jeff
A: 

You're answering the question yourself. Using A* or Dijkstra's algorithm, all you need to do is decide on a good cost per part of each route.

For the bus route, you're implying that you don't want the shortest, but the fastest route. The cost of each part of the route must therefore include the average travel speed of a bus in that part, and any waits at bus stops.

The algorithm for finding the most suitable route is then still the same as before. With A*, all the magic happens in the cost function...

+3  A: 

if the cost of each leg of the trip is measured in time, then the only complication is factoring in the schedule - which just changes the cost at each node to a function of the current time t, where t is just the total trip time so far (assuming schedules are normalized to start at t=0).

so instead of Node A having a cost of 10 minutes, it has a cost of f(t) defined as:

  • t1 = nextScheduledStop(t); //to get the next stop time at or after time t
  • baseTime for leg = 10 //for example, a 10-minute trip
  • return (t1-t)+baseTime

wait-time is thus included dynamically in the cost of each leg, and walks between bus stops are just arcs with a constant time cost

with this representation you should be able to apply A* or Dijkstra's algorithm directly

Steven A. Lowe
A: 

You need to weight the legs differently. For example - on a rainy day I think someone might prefer to travel longer in a vehicle than walk in the rain. Additionally, someone who detests walking or is unable to walk might make a different/longer trip than someone who would not mind walking.

These edges are costs, but I think you can expand the notion/concept of costs and they can have different relative values.

Tim
A: 

The algorithm remains the same, you just increase the weight of each graph edge according to different scenarios (Bus schedules etc).

I put together a subway route finder as an exercise in graph path finding some time ago:

http://gfilter.net/code/pathfinderDemo.aspx

FlySwat
+3  A: 

Some data points to be aware of from the public transportation arena:

  1. Each transfer incurs a 10 minute penalty (unless it is a timed transfer) in the riders mind. That is to say mentally a trip involving a single bus that takes 40 minutes is roughly equivalent to a 30minute trip that requires a transfer.
  2. Maximum distance that most people are willing to walk to a bus stop is 1/4 mile. Train station / Light rail about 1/2 mile.
  3. Distance is irrelevant to the public transportation rider. (Only time is important)
  4. Frequency matters (if a connection is missed how long until the next bus). Riders will prefer more frequent service options if the alternative is being stranded for an hour for the next express.
  5. Rail has a higher preference than bus ( more confidence that the train will come and be going in the right direction)
  6. Having to pay a new fare is a big hit. (add about a 15-20min penalty)
  7. Total trip time matters as well (with above penalties)
  8. How seamless is the connect? Does the rider have to exist a train station cross a busy street? Or is it just step off a train and walk 4 steps to a bus?
  9. Crossing busy streets -- another big penalty on transfers -- may miss connection because can't get across street fast enough.
Pat
Yes, I agree (but the values may be discussed). I remember that we had different weights for transport with rail (1.0 x travel time) respectively bus (1.1 x travel time) in the system we built.To calculate the cost of transfer we a had table with nodes (see my answer) with arrival and departure time and error margin. The margin is larger for bus than train and increased over time. A transfer is possible when arrival+margin+linktime<departure-margin. The cost was calculated as (departure-arrival)x2. Link time between stops includes time for walking and penalty for passing fare control.
Fredrik Haglund
+8  A: 

I have been involved in development of one journy planner system for Stockholm Public Transportation in Sweden. It was based on Djikstra's algorithm but with termination before every node was visited in the system. Today when there are reliable coordinates available for each stop, I guess the A* algorithm would be the choise.

Data about upcoming trafic was extracted from several databases regularly and compiled into large tables loaded into memory of our search server cluster.

One key to a sucessfull algorith was using a path cost function based on travel and waiting time multiplied by diffrent weightes. Known in Swedish as “kresu"-time these weighted times reflect the fact that, for example, one minute’s waiting time is typically equivalent in “inconvenience” to two minutes of travelling time.

KRESU Weight table

  • x1 - Travel time
  • x2 - Walking between stops
  • x2 - Waiting at a stop during the journey. Stops under roof, with shops, etc can get a slightly lower weight and crowded stations a higher to tune the algorithm.
  • The weight for the waiting time at the first stop is a function of trafic intensity and can be between 0.5 to 3.

Data structure

Area A named area where you journey can start or end. A Bus Stop could be an area with two Stops. A larger Station with several platforms could be one area with one stop for each platform. Data: Name, Stops in area

Stops An array with all bus stops, train and underground stations. Note that you usually need two stops, one for each direction, because it takes some time to cross the street or walk to the other platform. Data: Name, Links, Nodes

Links A list with other stops you can reach by walking from this stop. Data: Other Stop, Time to walk to other Stop

Lines/Tours You have a number on the bus and a destination. The bus starts at one stop and passes several stops on its way to the destination. Data: Number, Name, Destination

Nodes Usually you have a timetable with the least the time for when it should be at the first and last stop in a Tour. Each time a bus/train passes a stop you add a new node to the array. This table can have millions of values per day. Data: Line/Tour, Stop, Arrival Time, Departure Time, Error margin, Next Node in Tour

Search Array with the same size as the Nodes array used to store how you got there and the path cost. Data: Back-link with Previous Node (not set if Node is unvisited), Path Cost (infinit for unvisited)

Fredrik Haglund
That is very nice, I hadn't thought of not including the coordinates for the route, it makes things much simpler.
joeforker
Yes, creating a heuristic function with the distance from current node to destination simplifies optimization a lot since A* search algoritm automatically will truncate the tree for you. Without coordinates we had to be very creative using zones and analyzing trafic to optimize the search.
Fredrik Haglund
Hopefully (distance from destination * average speed) would be a good enough metric for A*, but that might not always work for public transit where the rider doesn't care about distance traveled.
joeforker
This is a wonderful answer. Thank you.
joeforker