Let me say this up front: this problem is hard. There is a reason Dan Story's linked question has not been answered. Let me provide an explanation for people who want to take a stab at it. I hope I'm wrong about how hard it is, though.
I will assume that the 2D screen coordinates and the projection/perspective matrix are known to you. You need to know at least this much (if you don't know the projection matrix, essentially you are using a different camera to look at the world). Let's call each pair of 2D screen coordinates (a_i, b_i), and I will assume the projection matrix is of the form
P = [ px  0   0   0  ]
    [ 0   py  0   0  ]
    [ 0   0   pz  pw ]
    [ 0   0   s   0  ],   s = +/-1
Almost any reasonable projection has this form. Working through the rendering pipeline, you find that
a_i = px x_i / (s z_i)
b_i = py y_i / (s z_i)
where (x_i, y_i, z_i) are the original 3D coordinates of the point.
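To make the pipeline step concrete, here is a minimal sketch in plain Python that applies P to a homogeneous point and performs the perspective divide. The function name `project` and the sample numbers are made up for illustration; only the matrix form comes from the text above.

```python
def project(point, px, py, pz, pw, s):
    """Apply P to (x, y, z, 1), then divide by the w component."""
    x, y, z = point
    # Clip-space coordinates: rows of P times (x, y, z, 1)
    clip = (px * x, py * y, pz * z + pw, s * z)
    w = clip[3]
    if w == 0:
        raise ValueError("point lies in the camera plane (z = 0)")
    # Perspective divide by w = s z gives (a, b) plus a depth value
    return clip[0] / w, clip[1] / w, clip[2] / w


# Illustrative values only; s = -1 corresponds to a camera looking down -z
a, b, depth = project((1.0, 2.0, -4.0), px=2.0, py=3.0, pz=-1.0, pw=-0.2, s=-1)
print(a, b)  # 0.5 1.5
```

The first two outputs match a_i = px x_i / (s z_i) and b_i = py y_i / (s z_i); the third is the depth-buffer value, which is not used in the rest of this answer.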
Now, let's assume you know your shape in a set of canonical coordinates (whatever you want), so that the vertices are (x0_i, y0_i, z0_i). We can arrange these as columns of a matrix C. The actual coordinates of the shape are a rigid transformation of these coordinates. Let's similarly organize the actual coordinates as columns of a matrix V. Then these are related by
V = R C + v 1^T (*)
where 1^T is a row vector of ones with the right length, R is the orthogonal rotation matrix of the rigid transformation, and v is the offset vector of the transformation.
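As a sketch of what (*) does, the snippet below builds V from a canonical unit square, a rotation about the y axis, and an offset. Everything here (the helper names `rigid_transform` and `rotation_about_y`, the square, the 30 degree angle, the offset) is an assumption chosen for illustration. The matrices C and V are stored as lists of points, one per column.

```python
import math


def rigid_transform(R, C, v):
    """Return V = R C + v 1^T, with C and V given as lists of columns."""
    V = []
    for c in C:
        # One column of V: rotate the canonical vertex, then add the offset
        V.append(tuple(sum(R[i][k] * c[k] for k in range(3)) + v[i]
                       for i in range(3)))
    return V


def rotation_about_y(theta):
    """Rotation matrix for an angle theta (radians) about the y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


# Canonical coordinates: a unit square in the z = 0 plane
C = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]
R = rotation_about_y(math.radians(30.0))
v = (0.0, 0.0, -5.0)
V = rigid_transform(R, C, v)
```

Because R is orthogonal, edge lengths are preserved: the distance between `V[0]` and `V[1]` is still 1, up to rounding.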
Now, you have an expression for each column of V from above: the first column is { s a_1 z_1 / px, s b_1 z_1 / py, z_1 }, and so on.
You must solve the set of equations (*) for the set of scalars z_i and the rigid transformation defined by R and v.
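One way to state the problem in code is as a residual: given a candidate R, v, and depths z_i, how far is (*) from holding? The sketch below is only that measurement, not a solver. The name `residual` and the test data are hypothetical, and the function does not enforce that R is orthogonal, which any real attempt would have to handle separately.

```python
def residual(R, v, z, screen, C, px, py, s):
    """Sum of squared entries of V - (R C + v 1^T) for candidate R, v, z."""
    total = 0.0
    for (a, b), zi, c in zip(screen, z, C):
        # Column of V implied by the screen point and the candidate depth
        col = (s * a * zi / px, s * b * zi / py, zi)
        for i in range(3):
            model = sum(R[i][k] * c[k] for k in range(3)) + v[i]
            total += (col[i] - model) ** 2
    return total


# Synthetic example: unit square, no rotation, pushed 5 units down -z
px, py, s = 2.0, 3.0, -1
C = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]
R_true = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
v_true = (0.0, 0.0, -5.0)
z_true = [-5.0] * 4
# Screen coordinates from a_i = px x_i / (s z_i), b_i = py y_i / (s z_i)
screen = [(px * x / (s * -5.0), py * y / (s * -5.0)) for x, y, _ in C]

r_good = residual(R_true, v_true, z_true, screen, C, px, py, s)
r_bad = residual(R_true, v_true, [-4.0] * 4, screen, C, px, py, s)
```

Here `r_good` is zero up to rounding and `r_bad` is clearly positive. Driving this quantity to zero over all rotations, offsets, and depths is the hard part.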
Difficulties
- The equation is nonlinear in the unknowns, involving quotients of R and z_i.
- We have assumed up to now that you know which 2D coordinates correspond to which vertices of the original shape (if your shape is a square, this is slightly less of a problem).
- We assume there is even a solution at all; if there are errors in the 2D data, then it's hard to say how well equation (*) will be satisfied; the transformation will be nonrigid or nonlinear.