Calculating trip travel times using available geo APIs for 5k+ addresses

views:

169

answers:

+4 Q:

I'm working on a transportation model and am about to build a travel time matrix between 5,000 points. Is there a free, semi-reliable way to calculate the travel times between all my nodes?

I think Google Maps limits the number of queries I can make.

EDIT

I'd like to use an API such as Google Maps or similar, since these include data such as road direction, number of lanes, posted speed, road type, etc.

EDIT 2

Please note that OpenStreetMap data is incomplete and not available for all jurisdictions outside the US.

+1 A:

As that's 12,502,500 total connections, I'm pretty sure you'll hit some sort of limit if you attempt to use Google Maps for all of them. How accurate do your results need to be, and how far are you travelling?

I might try to generate a crude map with travel speeds on it (e.g. mark off interstates as fast, yadda yadda), then use some software to calculate how long it would take to get from point to point. One could visualize it as an electromagnetic fields problem, where you're trying to calculate the resistance from point to point over a plane with varying resistance (interstates are wires, lakes are open circuits...).

Nick T 2010-07-13 19:11:42
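The "crude map with travel speeds" idea above is essentially a cost-distance (friction surface) calculation. A minimal sketch, assuming a small raster where each cell holds a hypothetical travel speed in km/h (0 meaning impassable, like a lake) and a fixed cell size; it runs Dijkstra's algorithm over 4-connected cells. GIS packages typically offer this as a built-in cost-distance tool; this is only an illustration of the idea.

```python
import heapq
import math


def travel_time_grid(speed, start, cell_km=1.0):
    """Dijkstra over a 4-connected raster ("cost surface").

    speed[r][c] is a hypothetical travel speed in km/h for that cell
    (0 = impassable, e.g. a lake). Crossing from one cell to a neighbour
    costs half a cell at each cell's speed. Returns a grid of hours from
    `start` to every cell (math.inf where unreachable).
    """
    rows, cols = len(speed), len(speed[0])
    sr, sc = start
    if speed[sr][sc] <= 0:
        raise ValueError("start cell is impassable")
    times = [[math.inf] * cols for _ in range(rows)]
    times[sr][sc] = 0.0
    heap = [(0.0, sr, sc)]
    while heap:
        t, r, c = heapq.heappop(heap)
        if t > times[r][c]:
            continue  # stale heap entry, a faster route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or speed[nr][nc] <= 0:
                continue  # off the map or impassable
            step = 0.5 * cell_km / speed[r][c] + 0.5 * cell_km / speed[nr][nc]
            if t + step < times[nr][nc]:
                times[nr][nc] = t + step
                heapq.heappush(heap, (t + step, nr, nc))
    return times


# Hypothetical 3x4 map: slow roads (20 km/h), an "interstate" row, a lake (0).
speeds = [
    [20, 20, 20, 20],
    [100, 100, 100, 100],
    [20, 0, 0, 20],
]
hours = travel_time_grid(speeds, (0, 0))
```

From the top-left to the top-right corner, the cheapest route drops onto the fast row and comes back up (about 0.09 h) instead of staying on the 20 km/h row (0.15 h), which is the "interstates are wires" behaviour described in the answer.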

What makes sites such as Google Maps desirable is their data rather than the method itself. I have access to GIS software and ways to do the analysis; however, their road directionality, lengths, travel times, speeds, yada yada, are far more detailed than any data I have.

dassouki 2010-07-13 19:13:35

Why 12,502,500 total connections?

gauteh 2010-07-21 13:05:13

@gauteh - I think it's twice that, well, 5,000^2, since the trip time to a destination might differ from the way back.

dassouki 2010-07-21 14:09:28

If the time going back is different, it's just 5000^2 (an ordered selection). The 12,502,500 comes from a Python sum over a range, and I think it should be 12,497,500, i.e. sum(range(0, 5000)) .. but I guess it doesn't really matter so much in this case :)

gauteh 2010-07-22 06:51:04

@gauteh Actually 4999*5000 :P My first answer used `sum(range(5001))`, but I guess time from A to A is somewhat irrelevant.

Nick T 2010-07-23 15:49:00

@Nick T - Transportation-wise, there is a time (A,A) called the intrazonal travel time. There are a few ways to calculate it, but that's out of the scope of the question.

dassouki 2010-07-25 00:10:04
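For reference, the figures in this comment thread can be checked in a few lines of Python, with n = 5,000 nodes. In short: the first answer's 12,502,500 counts unordered pairs including the self-pairs (A, A), i.e. the intrazonal trips mentioned in the last comment.

```python
n = 5000  # number of nodes

full_matrix = n * n                # ordered pairs incl. A->A: 25,000,000
ordered = n * (n - 1)              # ordered pairs, no A->A: 24,995,000
unordered = n * (n - 1) // 2       # unordered pairs, no A->A: 12,497,500
first_answer = sum(range(n + 1))   # the first answer's figure: 12,502,500

# sum(range(n)) is the same as n * (n - 1) // 2
assert sum(range(n)) == unordered
# 12,502,500 = unordered pairs plus the n self-pairs (A, A)
assert first_answer == unordered + n

print(f"{full_matrix:,} {ordered:,} {unordered:,} {first_answer:,}")
```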

+6 A:

The Google Directions API restricts you to 2,500 calls per day. Additionally, its terms of service stipulate that you must only use the service "in conjunction with displaying the results on a Google map".

You may be interested in OpenTripPlanner, an in-development project which can do multi-modal routing, and Graphserver on which OpenTripPlanner is built.

One approach would be to use OpenStreetMap data with Graphserver to generate Shortest Path Trees from each node.

tcarobruce 2010-07-13 19:17:44
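To put the 2,500-calls-per-day limit in perspective, here is a back-of-envelope estimate. It assumes one travel time per request and symmetric travel times (so only unordered pairs are fetched); both are simplifying assumptions, and `days_needed` is just an illustrative helper.

```python
import math


def days_needed(pairs, calls_per_day, pairs_per_call=1):
    """Days needed to fetch `pairs` travel times under a daily call quota."""
    calls = math.ceil(pairs / pairs_per_call)
    return math.ceil(calls / calls_per_day)


n = 5000
unordered_pairs = n * (n - 1) // 2           # 12,497,500
days = days_needed(unordered_pairs, 2500)    # 4,999 days
print(days, "days, or about", round(days / 365, 1), "years")
```

The `pairs_per_call` parameter is there because some services can return several origin/destination results per request; even then, a quota of this size keeps a full 5,000-node matrix out of reach.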

OpenStreetMap data for my jurisdiction is extremely lacking. I have a semi-working pgRouting implementation; however, I don't have road directionality or most of the speed data.

dassouki 2010-07-13 19:19:30

And thanks for all the amazing links

dassouki 2010-07-13 19:20:15
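Generating a shortest path tree from each node, as suggested in the answer above, amounts to running single-source Dijkstra once per origin, which yields a whole row of the travel time matrix at a time instead of one pair per query. A minimal sketch, assuming a directed road graph stored as a hypothetical adjacency dict `{node: [(neighbour, minutes), ...]}`; one-way streets are modelled by omitting the reverse edge, which is where road directionality matters. This is not the Graphserver or pgRouting API.

```python
import heapq


def shortest_path_tree(graph, source):
    """Single-source Dijkstra: minutes from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, minutes in graph.get(u, []):
            nd = d + minutes
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def travel_time_matrix(graph, nodes):
    """One shortest path tree per origin gives the full (asymmetric) matrix."""
    inf = float("inf")
    matrix = {}
    for origin in nodes:
        tree = shortest_path_tree(graph, origin)
        matrix[origin] = {dest: tree.get(dest, inf) for dest in nodes}
    return matrix


# Hypothetical network: B->C is one-way, so C->B has to go around via A.
graph = {
    "A": [("B", 5.0)],
    "B": [("A", 5.0), ("C", 3.0)],
    "C": [("A", 4.0)],
}
m = travel_time_matrix(graph, ["A", "B", "C"])
```

Here `m["B"]["C"]` is 3 minutes but `m["C"]["B"]` is 9 (C to A to B), showing why a directed graph gives an asymmetric matrix.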

Many GIS software packages have routing algorithms, if you have the data... Transportation data can be fairly spendy.

There are other sources you could use for route planning. Is this something to be done repeatedly, or a one-time process? Can this be broken up into smaller subsets of points? Perhaps you can use multiple routing sources and split the data points into segments small enough for each routing engine.

Here are some other choices from a quick Google search: Wikipedia, Route66, Truck Miles.

Ruz 2010-07-26 16:21:24
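Splitting the work across several routing sources, as suggested above, can be sketched as a simple day-by-day scheduler. The engine names and daily limits below are hypothetical placeholders, not the real quotas of any service, and a provider's terms of service may forbid this kind of bulk use.

```python
from itertools import combinations


def assign_batches(nodes, engine_limits):
    """Split all unordered origin/destination pairs across routing engines.

    engine_limits maps a (hypothetical) engine name to the number of
    requests it allows per day. Returns one dict per day, mapping each
    engine to the list of pairs it should handle that day.
    """
    if any(limit < 0 for limit in engine_limits.values()):
        raise ValueError("limits must be non-negative")
    if sum(engine_limits.values()) == 0:
        raise ValueError("need at least one engine with a positive limit")
    pairs = list(combinations(nodes, 2))
    days, i = [], 0
    while i < len(pairs):
        day = {}
        for engine, limit in engine_limits.items():
            day[engine] = pairs[i:i + limit]   # take up to `limit` pairs
            i += len(day[engine])
        days.append(day)
    return days


# Hypothetical: 10 points, two engines allowing 10 and 5 requests per day.
schedule = assign_batches(range(10), {"engine_a": 10, "engine_b": 5})
print(len(schedule), "days")  # 45 pairs at 15 per day -> 3 days
```

If travel times differ by direction, `itertools.permutations(nodes, 2)` gives the ordered pairs instead of `combinations`.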

+1 A:

If you really need all these routes accurately calculated and stored in your database, it sounds like (and I would believe) you are going to have to spend the money to obtain them. As you can imagine, this data is expensive to develop, and there should be remuneration.

I would, however, probe a bit about your problem:

Do you really need all 5000 × 4999 distances in a database? What if you asked Google for them as you needed them, and then cached them (if allowed)? I've built web applications like this where, because traffic ramped up slowly, I was able to use free services early on to vet the idea.
Do you really need all 5000 points? Or could you pick the top 100 and have a more tractable problem?
Perhaps there is some hybrid where you store distances between big cities and estimate shorter distances.

Again, I really don't know what your problem is, but maybe thinking a bit outside the box will help you find an easier solution.

ndp 2010-07-27 18:31:16
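Fetching travel times only when they are needed and caching them, as suggested in the answer above, can be sketched as a small memoizing wrapper. `fetch` is a hypothetical stand-in for whatever routing service is used (not a real Google function), and caching results is only acceptable where the provider's terms allow it.

```python
class TravelTimeCache:
    """Fetch each origin/destination travel time at most once."""

    def __init__(self, fetch, symmetric=False):
        self._fetch = fetch          # hypothetical callable: (origin, dest) -> minutes
        self._symmetric = symmetric  # True if A->B time can stand in for B->A
        self._store = {}
        self.calls = 0               # number of real (uncached) lookups

    def _key(self, origin, dest):
        # With symmetric times, (B, A) and (A, B) share one cache entry.
        if self._symmetric and dest < origin:
            return (dest, origin)
        return (origin, dest)

    def get(self, origin, dest):
        key = self._key(origin, dest)
        if key not in self._store:
            self.calls += 1
            self._store[key] = self._fetch(*key)
        return self._store[key]


def fake_fetch(origin, dest):
    # Placeholder for a real routing request.
    return abs(origin - dest) * 2.0


cache = TravelTimeCache(fake_fetch, symmetric=True)
cache.get(1, 4)
cache.get(4, 1)
cache.get(1, 4)
print(cache.calls)  # 1
```

The in-memory dict could be swapped for a persistent store such as Python's `sqlite3` or `shelve` module so cached values survive between runs.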

I'm trying to build a travel demand model for transportation, where we require the travel time between all nodes of a system.

dassouki 2010-07-27 19:59:48

+1 A:

You might have to resort to heuristics here. Maybe you can estimate travel time from a few factors, such as geometric distance and some features of the start and end points (urban vs rural area, country, ...). You could retrieve a few travel times, fit your parameters on that subset, and see how well you can predict the others. My guess, for example, is that in many cases travel time becomes roughly linear in distance as distance grows.

I know it's messy, but hey, you're trying to estimate 12.5 million data points (or whatever the exact number is :)
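The fitting idea above can be prototyped with an ordinary least-squares fit of travel time against straight-line distance on a sample of "real" travel times, then checked on held-out pairs. A minimal sketch; the sample values are placeholders rather than real measurements, and a real model would add the other features mentioned (urban vs rural, country, ...).

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("geometric") distance in km between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mean_abs_error(xs, ys, a, b):
    """Average |observed - predicted| on pairs not used for fitting."""
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys)) / len(xs)


# Placeholder sample: replace with (distance_km, minutes) pairs from the API.
train_km, train_min = [5, 20, 50, 120, 300], [12, 30, 55, 95, 210]
test_km, test_min = [80, 200], [70, 150]
a, b = fit_line(train_km, train_min)
print(f"minutes ~ {a:.1f} + {b:.2f} * km, "
      f"held-out MAE {mean_abs_error(test_km, test_min, a, b):.1f} min")
```

If the held-out error is comparable to the uncertainty you can tolerate, the fitted line can fill in the pairs you never query.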

You might also be able to incrementally build on already-retrieved "real" travel times by finding points close to the ones you're looking for:

get the closest points StartApprox, EndApprox to the start and end positions such that you already have a travel time between StartApprox and EndApprox
compute the distances StartError (between start and StartApprox) and EndError (between end and EndApprox)
if StartError + EndError > Distance(StartApprox, EndApprox) * 0.10 (or whatever your threshold is) -> compute the travel time via the API (and store it), else use the known travel time plus an overhead based on StartError + EndError

(If you have 100 addresses in NY and 100 in SF, all the NY-SF travel times are going to be more or less the same, i.e. the differences between them are probably smaller than the uncertainty involved in these predictions, and such an approach would keep you from issuing 10,000 queries where 1 would do.)
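The three steps above could look roughly like this. It is a sketch under several assumptions: points are plain (x, y) coordinates in km with Euclidean distance, known travel times live in a dict keyed by (start, end) point pairs, `query_api` is a hypothetical callable standing in for the real service, and the overhead for the detour is a guessed constant speed (`detour_kmh`). The linear scan over known pairs is fine for a demo; at 5,000 points a spatial index would be needed.

```python
import math


def dist(p, q):
    """Euclidean distance between two (x, y) points in km."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def estimate_travel_time(start, end, known, query_api,
                         threshold=0.10, detour_kmh=30.0):
    """Reuse a nearby known travel time when it is close enough.

    known: dict {(start_point, end_point): minutes}, updated in place.
    query_api: hypothetical callable (start, end) -> minutes.
    """
    # Step 1 and 2: known pair minimising StartError + EndError.
    best = None
    for (s_approx, e_approx), minutes in known.items():
        err = dist(start, s_approx) + dist(end, e_approx)
        if best is None or err < best[0]:
            best = (err, s_approx, e_approx, minutes)

    # Step 3: reuse if the error is within the threshold.
    if best is not None:
        err, s_approx, e_approx, minutes = best
        if err <= dist(s_approx, e_approx) * threshold:
            # Known time plus overhead for the extra StartError + EndError km
            return minutes + err / detour_kmh * 60.0

    minutes = query_api(start, end)
    known[(start, end)] = minutes   # store the real value for later reuse
    return minutes


calls = []


def fake_api(s, e):
    # Placeholder service: pretends travel is 60 km/h, so minutes == km.
    calls.append((s, e))
    return dist(s, e)


known = {}
t1 = estimate_travel_time((0, 0), (100, 0), known, fake_api)  # API call
t2 = estimate_travel_time((1, 0), (100, 1), known, fake_api)  # reused
t3 = estimate_travel_time((0, 0), (0, 50), known, fake_api)   # API call
```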

Nicolas78 2010-07-28 10:38:14

ansaurus
