views: 340
answers: 2
I have a situation as shown in the image, where I need to determine the (x, y) positions of the red and blue squares in a rectangular space, but under perspective. The length and width of the rectangle are known. The two views show the same rectangle, just seen from different locations; the difference is that one is viewed with a 90-degree offset to the right of the rectangle.

Thanks...

[image: the two camera views of the rectangle with the red and blue squares]

A: 

Do you have basic linear algebra knowledge? If yes, then it's easy.

1) You need the projection matrices of both projections (call them P1 and P2; they are 2x3 matrices).
2) You have the equations [x1, y1]^t = P1 [x, y, z]^t and [x2, y2]^t = P2 [x, y, z]^t (where ^t denotes transposition of the vector).
3) You get a system of 3 unknowns and 4 equations.

Maybe you don't know the projection matrices; then you first have to find them.

You can most likely make up something fancier (just one 4-by-3 matrix that should have a pseudo-inverse on its left).
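For illustration, here is a minimal NumPy sketch of that stacked formulation, under this answer's simplified affine model. The P1, P2 matrices and the test point are made-up placeholders; np.linalg.lstsq does the least-squares solve, which amounts to applying the left pseudo-inverse of the stacked matrix.

    import numpy as np

    # Hypothetical 2x3 affine projections -- you must supply the real ones.
    P1 = np.array([[1.0, 0.0, 0.3],
                   [0.0, 1.0, 0.5]])
    P2 = np.array([[0.3, 0.0, 1.0],   # second view, roughly 90 degrees offset
                   [0.0, 1.0, 0.5]])

    # Generate consistent screen points from a known 3D point, for checking.
    X_true = np.array([2.0, 3.0, 4.0])
    p1 = P1 @ X_true                   # (x1, y1)
    p2 = P2 @ X_true                   # (x2, y2)

    # Stack into one 4x3 system A @ [x, y, z]^t = b and solve it by
    # least squares (equivalent to the left pseudo-inverse of A).
    A = np.vstack([P1, P2])            # 4 equations, 3 unknowns
    b = np.concatenate([p1, p2])
    X, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(X)                           # recovers [2.0, 3.0, 4.0]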

If you don't know anything about linear algebra, well... just ask and I'll elaborate.

PS: sorry if the math stuff is in bad English.

Tristram Gräbener
I would be very grateful for any help, at least some pseudocode with the math explained.
NccWarp9
Given the 3D coordinates, do you know how to draw the two images? Or is the size the only thing you know about the gray plane?
Tristram Gräbener
Only the size of the plane is known. The centers of the two squares are used as the measure-from/to points, so their actual size is irrelevant.
NccWarp9
Oh! Then things get complicated... You need to find out where the camera is located. Sorry, I guess you'll have to study the whole projection problem a bit more...
Tristram Gräbener
The camera location is known. Let's say the offset from the border (for both cameras) is 20, and they are 20 in the air, focused on the middle of the plane.
NccWarp9
I don't think 2x3 matrices are enough here; you can only do an affine transformation with them, not a general projective transformation (which is needed for a camera). Actually, I recommend looking through a book on 3D computer graphics for the basic math (sorry, I don't know of a good online tutorial).
comingstorm
You're right, I was a bit quick on it. However, if you consider that the camera is centered on (0,0,0), then it works ;) Homogeneous coordinates are needed to model the translation (projection and rotation are no problem). It's quite a strong assumption, though: it works here, but not in general computer graphics (where the camera moves through space).
Tristram Gräbener
A: 

Note that the following is a general solution. I think I've worked the math out right; anybody who knows better should comment (or edit, if you're sure...)

First, you need to calibrate your cameras: where they are, where they're pointing, and what their field of view is. This will come as (or else needs to be reduced to) a projection matrix for each camera, which transforms homogeneous worldspace points into homogeneous view points. If you don't know this a priori, you might be able to figure it out from known common features (say, the projected gray rectangles in your diagram).
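If the matrices aren't known a priori, here is one hedged sketch of building them directly, assuming a pinhole model M = K @ [R | t] and the camera placement from the comments above (20 from the border, 20 up, aimed at the plane's center). The focal length and image center below are made-up values you'd replace with real calibration data.

    import numpy as np

    def look_at(cam_pos, target, up=np.array([0.0, 0.0, 1.0])):
        """World-to-camera rotation; camera z axis points at the target."""
        z = target - cam_pos
        z = z / np.linalg.norm(z)
        x = np.cross(z, up)            # degenerate if z is parallel to up
        x = x / np.linalg.norm(x)
        y = np.cross(z, x)             # image y points "down", OpenCV-style
        return np.stack([x, y, z])     # rows are the camera axes

    def projection_matrix(cam_pos, target, f=800.0, cx=320.0, cy=240.0):
        """3x4 pinhole projection M = K @ [R | t]; f, cx, cy are assumed."""
        c = np.asarray(cam_pos, dtype=float)
        R = look_at(c, np.asarray(target, dtype=float))
        t = -R @ c
        K = np.array([[f, 0.0, cx],
                      [0.0, f, cy],
                      [0.0, 0.0, 1.0]])
        return K @ np.hstack([R, t.reshape(3, 1)])

    # Hypothetical setup from the comments: plane centered at the origin,
    # each camera 20 outside the border and 20 up, aimed at the center.
    M1 = projection_matrix([-20.0, 0.0, 20.0], [0.0, 0.0, 0.0])
    M2 = projection_matrix([0.0, -20.0, 20.0], [0.0, 0.0, 0.0])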

Then, you need to identify your objects and their positions in each image. Say, find the centers of the red and blue rectangles.

This will give you a linear system for each object or feature:

P=[x,y,z,w]^t = world point (homog. 4-vector)
p1=[x1,y1,w1]^t = point on screen 1;  M1= 3x4 projection matrix 1:  p1=M1*P
p2=[x2,y2,w2]^t = point on screen 2;  M2= 3x4 projection matrix 2:  p2=M2*P

Your known data are: p1/w1=(u1,v1,1) and p2/w2=(u2,v2,1); multiply these out by variables w1 and w2, to get:

(u1,v1) are constant  ->  p1=[u1*w1,v1*w1,w1]^t
(u2,v2) are constant  ->  p2=[u2*w2,v2*w2,w2]^t
assume that w=1       ->  P=[x,y,z,1]^t

Finally, you have a system of 6 equations in 5 variables: (x,y,z,w1,w2). Since this system is overdetermined, you can solve it using least squares methods.

One way to understand the overdetermined bit: given a pair of cameras (as described by their matrices), you expect them to show the scene consistently. If one of the cameras is misaligned (i.e., if the camera matrix does not reflect the actual camera perfectly), it may show objects in a location higher or lower than it ought to (say), so that the result of that camera is inconsistent with the other one.

Since you are likely using floating point (and possibly even real-world data), your values will never be perfectly accurate anyway, so you will always need to deal with this problem. Least squares allows you to solve this kind of overdetermined system; it also provides error values that may help in diagnosing and solving data problems.
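To make this concrete, here is a minimal NumPy sketch of the system above. It assumes M1 and M2 are already known (e.g. built as in the calibration sketch) and rearranges the equations into A @ [x, y, z, w1, w2]^t = b, which np.linalg.lstsq solves by least squares; the residual it returns is the error value just mentioned.

    import numpy as np

    def triangulate(M1, M2, uv1, uv2):
        """Solve the 6-equation, 5-unknown system by least squares.

        M1 @ [x, y, z, 1]^t = w1 * [u1, v1, 1]^t
        M2 @ [x, y, z, 1]^t = w2 * [u2, v2, 1]^t
        """
        q1 = np.array([uv1[0], uv1[1], 1.0])
        q2 = np.array([uv2[0], uv2[1], 1.0])
        A = np.zeros((6, 5))
        A[:3, :3] = M1[:, :3]
        A[:3, 3] = -q1                 # column for unknown w1
        A[3:, :3] = M2[:, :3]
        A[3:, 4] = -q2                 # column for unknown w2
        b = np.concatenate([-M1[:, 3], -M2[:, 3]])
        sol, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
        return sol[:3], residual       # (x, y, z) and the error value

Calling triangulate(M1, M2, (u1, v1), (u2, v2)) with the measured centers of, say, the red square gives its world position; an unusually large residual points at a calibration or measurement problem.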

comingstorm
What are w, u, and v? I have no background in graphics programming.
NccWarp9
(u,v) is the actual point on the "screen". The "w" coordinates are extra coordinates that make the system homogeneous. Basically, they're a scaling factor: after the transformation, you divide the homogeneous vector by its "w" coordinate to get your (u,v) coordinates. (this is what allows you to do the 3D camera thing at all -- you need to divide by the distance from the camera)
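A tiny made-up example of that divide (the matrix and point are arbitrary):

    import numpy as np

    M = np.array([[800.0, 0.0, 320.0, 0.0],   # made-up 3x4 projection
                  [0.0, 800.0, 240.0, 0.0],
                  [0.0,   0.0,   1.0, 0.0]])
    P = np.array([1.0, 0.5, 4.0, 1.0])        # homogeneous world point

    p = M @ P                  # homogeneous screen point [x, y, w]
    u, v = p[:2] / p[2]        # divide by w -> (520.0, 340.0)

Here w comes out as 4.0, the point's depth along the camera axis.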
comingstorm
Er, clarifying to avoid possible confusion: you don't divide by the exact literal distance from the point to the camera -- you divide by a value proportional to the "depth", which is the distance along the central axis of the camera. Actually, if you have no background in graphics programming, you should probably read up on it. I recommend Foley and Van Dam's _Computer Graphics: Principles and Practice_ -- an ancient but still useful text.
comingstorm