views:

1156

answers:

7

I have a video file recorded from the front of a moving vehicle. I am going to use OpenCV for object detection and recognition, but I'm stuck on one aspect: how can I determine the distance to a recognized object?

I can know my current speed and real-world GPS position but that is all. I can't make any assumptions about the object I'm tracking. I am planning to use this to track and follow objects without colliding with them. Ideally I would like to use this data to derive the object's real-world position, which I could do if I could determine the distance from the camera to the object.

+3  A: 

Use two cameras so you can detect parallax. It's what humans do.

edit

Please see ravenspoint's answer for more detail. Also, keep in mind that a single camera with a splitter would probably suffice.

Steven Sudit
When the camera is moving, you can get "two views" by comparing two successive frames (frames taken from a slightly different position): http://stackoverflow.com/questions/2135116/how-can-i-determine-distance-from-an-object-in-a-video/2135469#2135469
Robert Cartaino
@Robert: don't you have to know the location of the two different positions?
John Saunders
@John Saunders - Sure. Two camera views give you a *relative* size and distance between the objects. To take the calculations further, you need to know your speed and frame rate (and possibly the angle of the camera). That gives you the distance between your views.
Robert Cartaino
A: 

Someone please correct me if I'm wrong, but it seems to me that if you're using a single camera and relying solely on a software solution, any processing you do would be prone to false positives. I highly doubt there is any processing that could tell the difference between objects that really are at the perceived distance and those which only appear to be at that distance (like the "forced perspective" used in movies).

Any chance you could add an ultrasonic sensor?

Pontiac6000fan
Unless the scene is completely homogeneous (think driving in a completely white arctic landscape), it is possible to get a displacement map for each pixel and from there derive a distance.
kigurai
A: 

Put an object of known size in the camera's field of view. That way you have a more objective metric for measuring angular distances. Without a second viewpoint/camera you'll be limited to estimating size/distance, but at least it won't be a complete guess.

Kelly French
+2  A: 

You need to identify the same points on the same object in two different frames taken a known distance apart. Since you know the location of the camera in each frame, you have a baseline (the vector between the two camera positions). Construct a triangle from the known baseline and the angles to the identified points. Trigonometry gives you the lengths of the unknown sides of the triangle from the known length of the baseline and the known angles between the baseline and the unknown sides.

You can use two cameras, or one camera taking successive shots. So, if your vehicle is moving at 1 m/s and you take frames every second, then successive frames will give you a 1 m baseline, which should be good for measuring the distance of objects up to, say, 5 m away. If you need to range objects further away, the frames used need to be further apart; fortunately, more distant objects stay in view longer.

Observer at F1 sees the target T at angle a1 to the velocity vector. The observer moves distance b to F2 and sees the target at angle a2.

Required: find r1, the range to the target from F1.

The trigonometric identities for cosine give

cos( 90 - a1 ) = x / r1 = c1

cos( 90 - a2 ) = x / r2 = c2

cos( a1 ) = (b + z) / r1 = c3

cos( a2 ) = z / r2 = c4

where x is the distance to the target orthogonal to the observer's velocity vector, and z is the distance from F2 to the intersection with x.

Solving for r1:

r1 = b / ( c3 - c1·c4 / c2 )
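A minimal Python sketch of this calculation (substituting c1 = sin(a1), c2 = sin(a2), c3 = cos(a1), c4 = cos(a2) reduces the expression above to the closed form below):

```python
import math

def range_from_parallax(b, a1, a2):
    """Range r1 from observer position F1 to the target T.

    b  -- baseline travelled between the two observations (metres)
    a1 -- angle between velocity vector and line of sight at F1 (radians)
    a2 -- same angle at F2
    """
    # Algebraically equivalent to r1 = b / (c3 - c1*c4/c2) with
    # c1 = sin(a1), c2 = sin(a2), c3 = cos(a1), c4 = cos(a2).
    return b * math.sin(a2) / math.sin(a2 - a1)
```

For example, a target at x = 10 m lateral offset whose foot of perpendicular lies 6 m ahead of F1 (so 5 m ahead of F2 with a 1 m baseline) ranges to sqrt(136) ≈ 11.66 m.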

ravenspoint
The range I'm looking at is much greater, possibly on the order of kilometers. Also, the objects I'm looking at could be moving. This sounds like it would work great for short distances and stationary objects, but unfortunately I don't think it'll work in this situation. +1 anyway :)
Ryan R.
Distant objects require a longer baseline, so use frames further apart for objects that calculate to be at "infinity". For moving objects, use two pairs of frames. The difference in calculated location between the two pairs, minus the difference in your own location, gives the movement of the observed object.
ravenspoint
+7  A: 

When you have moving video, you can use temporal parallax to determine the relative distance of objects (see the definition of parallax).

The effect is the same one that gives our eyes depth perception by looking at the same object from slightly different angles. Since you are moving, you can use two successive video frames to get your slightly different angle.

Using parallax calculations, you can determine the relative size and distance of objects (relative to one another). But, if you want the absolute size and distance, you will need a known point of reference.

You will also need to know the speed and direction being traveled (as well as the video frame rate) in order to do the calculations. You might be able to derive the speed of the vehicle using the visual data but that adds another dimension of complexity.
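As a trivial sketch of that bookkeeping (assuming the speed is constant between the frames being compared):

```python
def baseline_from_motion(speed_mps, frame_rate_hz, frames_apart=1):
    """Distance the camera travels between two frames used for parallax."""
    return speed_mps * frames_apart / frame_rate_hz
```

At 20 m/s and 25 fps, comparing frames five apart gives a 4 m baseline.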

The technology already exists. Satellites determine topographic prominence (height) by comparing multiple images taken over a short period of time. We use parallax to determine the distance of stars by taking photos of the night sky at different points in Earth's orbit around the sun. I was able to create 3-D images out of an airplane window by taking two photographs in short succession.

The exact technology and calculations (even if I knew them off the top of my head) are well outside the scope of this discussion. If I can find a decent reference, I will post it here.

Robert Cartaino
I thought of that, but I had a serious concern, which is that this would only work if the items didn't move much between frames. This is a reasonable assumption if you're looking at a landscape from a plane, but a bad one when dealing with other vehicles.
Steven Sudit
Moving objects make it *way* more complicated. It could be done but this project already sounds *way* outside the scope of most programmers.
Robert Cartaino
I suppose you're right about it being possible in principle, but you're also right about the scope. Frankly, even spatial parallax doesn't sound all that easy to pull off in practice. Pontiac6000fan's suggestion about using a distance sensor (ultrasonic or radar or whatever) is starting to sound good to me. I'll go toss them an upvote.
Steven Sudit
If the camera is calibrated, I think getting the actual distance should be possible, not just a relative distance.
kigurai
I know both the exact speed and direction of the vehicle to which the camera is attached. It is probable that the other vehicles would be moving, so based on your solution, if there were a stationary landmark (e.g., a building) then I could perform the parallax calculations. Unfortunately there is no guarantee that there would be a distinguishable landmark at any given point. Thanks for the excellent answer! Parallax sounds like a very interesting concept and I might play around with it outside of this project just for fun. :) edit: Also, correct me if I'm wrong on any of this.
Ryan R.
Robert, please take a look at my answer, as there was a question posed to you there by John Saunders.
Steven Sudit
A: 

Use stereo disparity maps. Lots of implementations are around; here are some links: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT11/node4.html

http://www.ece.ucsb.edu/~manj/ece181bS04/L14(morestereo).pdf

In your case you don't have a stereo camera, but depth can still be estimated from video: http://www.springerlink.com/content/g0n11713444148l2/

I think the link above is what might help you the most.

Research has progressed to the point that depth can be estimated (though not to a satisfactory extent) even from a single monocular image: http://www.cs.cornell.edu/~asaxena/learningdepth/
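Whichever route produces the disparity map (OpenCV ships block-matching implementations such as cv2.StereoBM), converting disparity to metric depth is the standard pinhole-stereo relation; a minimal sketch, assuming the focal length in pixels and the baseline in metres are known from calibration:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation: depth Z = f * B / d.

    disparity_px -- pixel disparity of a point between the two views
    focal_px     -- focal length in pixels (from camera calibration)
    baseline_m   -- distance between the two viewpoints in metres
    """
    return focal_px * baseline_m / disparity_px
```

A 64 px disparity with an 800 px focal length and a 0.5 m baseline corresponds to 6.25 m of depth.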

Egon
I think he has a monocular sequence, and the paper you cited will not give the *actual* depth, only depth up to a scale factor.
Jacob
You are right, but you can always point it at something on the vehicle itself, which is a known distance away; then you have the scale factor!
Egon
+13  A: 

Your problem's quite standard in the field.

Firstly,

you need to calibrate your camera. This can be done offline (makes life much simpler) or online through self-calibration.

Calibrate it offline - please.

Secondly,

Once you have the calibration matrix of the camera K, determine the projection matrix of the camera in a successive scene (you need to use parallax as mentioned by others). This is described well in this OpenCV tutorial.

You'll have to use the GPS information to find the relative orientation between the cameras in the successive scenes (that might be problematic due to noise inherent in most GPS units), i.e. the R and t mentioned in the tutorial or the rotation and translation between the two cameras.

Once you've resolved all that, you'll have two projection matrices --- representations of the cameras at those successive scenes. Using one of these so-called camera matrices, you can "project" a 3D point M in the scene onto the camera's 2D image at pixel coordinate m (as in the tutorial).

We will use this to triangulate the real 3D point from 2D points found in your video.
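As a small illustration of that projection, m ~ K [R|t] M (the intrinsics and the 1 m baseline below are made-up demo values, with R taken as the identity for simplicity):

```python
import numpy as np

# Made-up demo intrinsics: 800 px focal length, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Camera 1 at the world origin; camera 2 translated 1 m along x.
# In practice R and t come from your GPS-derived relative pose.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

def project(P, M):
    """Project 3D point M (metres) to pixel coordinates via m ~ P [M;1]."""
    m = P @ np.append(M, 1.0)
    return m[:2] / m[2]
```

A point 5 m straight ahead of camera 1 lands exactly on its principal point: project(P1, [0, 0, 5]) gives (320, 240).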

Thirdly,

use an interest point detector to track the same point in your video which lies on the object of interest. Several detectors are available; I recommend SURF since you have OpenCV, which also offers several other detectors like Shi-Tomasi corners, Harris, etc.

Fourthly,

Once you've tracked points of your object across the sequence and obtained the corresponding 2D pixel coordinates, you must triangulate for the best-fitting 3D point given your projection matrices and 2D points.

[image: Triangulation]

The above image nicely captures the uncertainty and how a best-fitting 3D point is computed. Of course in your case, the two camera positions are probably one in front of the other!

Finally,

Once you've obtained the 3D points on the object, you can easily compute the Euclidean distance between the camera center (which is the origin in most cases) and the point.
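Tying the last two steps together, a minimal linear (DLT) triangulation in plain NumPy; cv2.triangulatePoints implements the same idea. K, the 1 m baseline, and the test point are made-up demo values:

```python
import numpy as np

def triangulate(P1, P2, m1, m2):
    """Linear (DLT) triangulation of one point seen at pixels m1 and m2."""
    A = np.vstack([m1[0] * P1[2] - P1[0],
                   m1[1] * P1[2] - P1[1],
                   m2[0] * P2[2] - P2[0],
                   m2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)   # null vector of A is the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]

# Demo camera pair: made-up intrinsics, camera 2 shifted 1 m along x.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Synthesise pixel observations of a known 3D point, then recover it.
M_true = np.array([2.0, 0.5, 10.0])
h1 = P1 @ np.append(M_true, 1.0)
h2 = P2 @ np.append(M_true, 1.0)
M = triangulate(P1, P2, h1[:2] / h1[2], h2[:2] / h2[2])

# Camera 1's centre is the world origin, so the range is just the norm.
distance = np.linalg.norm(M)
```

With exact correspondences the recovered M equals M_true; with noisy tracked points the SVD returns the least-squares best fit.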

Note

This is obviously not easy stuff but it's not that hard either. I recommend Hartley and Zisserman's excellent book Multiple View Geometry which has described everything above in explicit detail with MATLAB code to boot.

Have fun and keep asking questions!

Jacob
+1 for being the only good answer. Mentioning MVG/Zisserman is almost worth an upvote in itself.
kigurai
+1 definitely a good answer
Amro
Hell, any decent vision forum would do ...
Jacob
@Jacob +1 for computervisionoverflow
overrider
http://area51.stackexchange.com/proposals/11036/computer-vision
Jacob