Hey,

I am trying to get the 2D screen coordinates of a point in 3D space, i.e. I know the location of the camera and its pan, tilt and roll, and I have the 3D x, y, z coordinates of a point I wish to project.

I am having difficulty understanding transformation/projection matrices and I was hoping some intelligent people here could help me along ;)

Here is my test code I have thrown together thus far:

public class TransformTest {

public static void main(String[] args) {

    // set up a world point (Point to Project)
    double[] wp = {100, 100, 1};
    // set up the projection centre (Camera Location)
    double[] pc = {90, 90, 1};

    double roll = 0;
    double tilt = 0;
    double pan = 0;

    // translate the point
    vSub(wp, pc, wp);

    // create roll matrix
    double[][] rollMat = {
            {1, 0, 0},
            {0, Math.cos(roll), -Math.sin(roll)},
            {0, Math.sin(roll), Math.cos(roll)},
    };
    // create tilt matrix
    double[][] tiltMat = {
            {Math.cos(tilt), 0, Math.sin(tilt)},
            {0, 1, 0},
            {-Math.sin(tilt), 0, Math.cos(tilt)},
    };
    // create pan matrix
    double[][] panMat = {
            {Math.cos(pan), -Math.sin(pan), 0},
            {Math.sin(pan), Math.cos(pan), 0},
            {0, 0, 1},
    };

    // roll it
    mvMul(rollMat, wp, wp);
    // tilt it
    mvMul(tiltMat, wp, wp);
    // pan it
    mvMul(panMat, wp, wp);

}

public static void vAdd(double[] a, double[] b, double[] c) {
    for (int i=0; i<a.length; i++) {
        c[i] = a[i] + b[i];
    }
}

public static void vSub(double[] a, double[] b, double[] c) {
    for (int i=0; i<a.length; i++) {
        c[i] = a[i] - b[i];
    }      
}

public static void mvMul(double[][] m, double[] v, double[] w) {

    // How to multiply matrices?
} }

Basically, what I need is to get the 2D XY coordinates for a given screen where the 3D point intersects. I am not sure how to use the roll, tilt and pan matrices to transform the world point (wp).

Any help with this is greatly appreciated!

+3  A: 

The scope of this is way too large to get a good answer here: I'd recommend reading a good reference on the topic. I've always liked the Foley and VanDam...

McWafflestix
Is there a lot more to it than just applying the pan, tilt and roll matrices to the world point? I was given the impression this was a relatively straightforward procedure; what more is needed to achieve what I need? Thanks
It is in fact a relatively straightforward procedure, but there's a lot of underlying theory that I would really recommend reading up on in the process of doing this. I learned the hard way, and I can only say that my experience showed me that it's really important to pick up on the underlying theory.
McWafflestix
+13  A: 

This is complicated stuff. Please read a book about this topic to get all the math and nitty gritty details. If you plan on playing with this stuff at length, you need to know these things. This answer is just so you can get your feet wet and hack around.

Multiplying matrices

First things first. Multiplying matrices is a reasonably simple affair.

Let's say you have matrices A, B, and C, where AB = C. Let's say you want to figure out the value of matrix C at row 3, column 2.

  • Take the third row of A and the second column of B. You should have the same number of values from A and B now. (If you don't, matrix multiplication isn't defined for those two matrices. You can't do it.) If both are 4×4 matrices, you should have 4 values from A (row 3) and 4 values from B (column 2).
  • Multiply each value from A with the value at the same position from B (first with first, second with second, and so on). You should end up with 4 new values.
  • Add these values.

You now have the value of matrix C at row 3, column 2. The challenge is, of course, to do this programmatically.

/* AB = C

Row-major ordering
a[0][0] a[0][1] a[0][2]...
a[1][0] a[1][1] ...
a[2][0] ...
...*/
// Note: c must be a different array from a and b, or inputs get overwritten mid-calculation
public static void mmMul(double[][] a, double[][] b, double[][] c) {
    int c_height = a.length;       // Height of a
    int c_width = b[0].length;     // Width of b
    int common_side = a[0].length; // Width of a, height of b

    for (int i = 0; i < c_height; i++) {
        for (int j = 0; j < c_width; j++) {
            // Ready to calculate value of c[i][j]
            c[i][j] = 0;

            // Iterate through ith row of a, jth col of b in lockstep
            for (int k = 0; k < common_side; k++) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
}
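The same row-times-column rule covers the `mvMul(m, v, w)` stub from the question, with a plain vector in place of the second matrix. The following is only a minimal sketch, assuming the width of `m` matches `v.length`. It writes into a temporary array first, because the question calls it with the same array as input and output (`mvMul(rollMat, wp, wp)`). The class name `MatVec` is made up for illustration.

```java
final class MatVec {
    /** Computes w = m * v. Safe to call with v and w being the same array. */
    static void mvMul(double[][] m, double[] v, double[] w) {
        double[] tmp = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            double sum = 0;
            // Row i of m times the vector v
            for (int k = 0; k < v.length; k++) {
                sum += m[i][k] * v[k];
            }
            tmp[i] = sum;
        }
        // Copy only after every row is done, so v is never read after being overwritten
        System.arraycopy(tmp, 0, w, 0, tmp.length);
    }

    public static void main(String[] args) {
        // 90-degree rotation about the z axis, laid out like the question's pan matrix
        double pan = Math.PI / 2;
        double[][] panMat = {
            {Math.cos(pan), -Math.sin(pan), 0},
            {Math.sin(pan),  Math.cos(pan), 0},
            {0, 0, 1},
        };
        double[] p = {1, 0, 0};
        mvMul(panMat, p, p); // in-place use, as in the question
        System.out.println(java.util.Arrays.toString(p));
    }
}
```

With this in place, the three `mvMul` calls in the question's `main` would rotate `wp` step by step, as long as the matrices stay 3×3 and the point stays a 3-vector.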


Homogeneous coordinates

You have 3D coordinates. Let's say you have (5, 2, 1). These are Cartesian coordinates. Let's call them (x, y, z).

Homogeneous coordinates mean that you write an extra 1 at the end of your Cartesian coordinates. (5, 2, 1) becomes (5, 2, 1, 1). Let's call them (x, y, z, w).

Whenever you do a transformation that makes w ≠ 1, you divide every component of your coordinates by w. This changes your x, y, and z, and it makes w = 1 again. (There is no harm in doing this even when your transformation doesn't change w. It just divides everything by 1, which does nothing.)
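As a rough illustration of the two conversions just described, here is a small sketch. The class and method names (`Homogeneous`, `toHomogeneous`, `toCartesian`) are invented for this example, and it assumes w is never zero.

```java
final class Homogeneous {
    /** (x, y, z) -> (x, y, z, 1) */
    static double[] toHomogeneous(double[] p) {
        return new double[] {p[0], p[1], p[2], 1.0};
    }

    /** (x, y, z, w) -> (x/w, y/w, z/w); assumes w != 0 */
    static double[] toCartesian(double[] h) {
        return new double[] {h[0] / h[3], h[1] / h[3], h[2] / h[3]};
    }
}
```

For example, (10, 4, 2, 2) and (5, 2, 1, 1) describe the same point, because dividing the first by its w = 2 gives the second.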

There is some majorly cool stuff you can do with homogeneous coordinates, even if the math behind them doesn't make total sense. It is at this point that I ask you to look again at the advice at the top of this answer.


Transforming a point

I'll be using OpenGL terminology and approaches in this and following sections. If anything is unclear or seems to conflict with your goals (because this seems vaguely homework-like to me :P), please leave a comment.

I'll also start by assuming that your roll, tilt, and pan matrices are correct.

When you want to transform a point using a transformation matrix, you right-multiply that matrix with a column vector representing your point. Say you want to transform (5, 2, 1) by some transformation matrix A. You first define v = [5, 2, 1, 1]T. (I write [x, y, z, w]T with the little T to mean that you should write it as a column vector.)

// Your point in 3D, as a 4×1 column vector
double[][] v = {{5}, {2}, {1}, {1}};

In this case, Av = v1, where v1 is your transformed point. Do this multiplication like a matrix multiplication, where A is 4×4 and v is 4×1. You will end up with a 4×1 matrix (which is another column vector).

// Transforming a single point with a roll
// (rollMat must be 4×4 here, not the 3×3 from the question)
double[][] v_1 = new double[4][1];
mmMul(rollMat, v, v_1);

Now, if you have several transformation matrices to apply, first combine them into one transformation matrix. Do this by multiplying the matrices together in the order that you want them applied.

Programmatically, you should start with the identity matrix and left-multiply each transformation matrix in turn, so that the first transformation you want applied ends up closest to the vector. Let I4 be the 4×4 identity matrix, and let A1, A2, A3, ... be your transformation matrices in the order you want them applied. Let your final transformation matrix be Afinal.

Afinal ← I4
Afinal ← A1 Afinal
Afinal ← A2 Afinal
Afinal ← A3 Afinal

Note that I'm using that arrow to represent assignment. When you implement this, make sure not to overwrite Afinal while you're still using it in the matrix multiplication calculation! Make a copy.

// A composite transformation matrix (roll, then tilt)
// rollMat and tiltMat must be 4×4 here

double[][] a_final =
{
    {1, 0, 0, 0},
    {0, 1, 0, 0},
    {0, 0, 1, 0},
    {0, 0, 0, 1}
}; // the 4 x 4 identity matrix

double[][] a_final_copy = new double[4][4];
mCopy(a_final, a_final_copy); // make a copy of a_final (mCopy: element-wise copy helper, not shown)
mmMul(rollMat, a_final_copy, a_final); // a_final = roll * identity
mCopy(a_final, a_final_copy); // update the copy
mmMul(tiltMat, a_final_copy, a_final); // a_final = tilt * roll

Finally, do the same multiplication as above: Afinal v = v1

// Use the above matrix to transform v
mmMul(a_final, v, v_1);
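To tie the pieces of this section together, here is a self-contained sketch that pads the question's roll and tilt matrices to 4×4, adds a translation by the camera position, and combines them into one matrix. Everything here (the class `Compose`, the helpers `mul`, `rotX`, `rotY`, `translate`, and the sample angles) is illustrative and not from the original code. Whether a real camera needs these rotations or their inverses depends on axis conventions the thread does not state.

```java
final class Compose {
    /** Returns a * b as a new matrix, so neither input is overwritten. */
    static double[][] mul(double[][] a, double[][] b) {
        int rows = a.length, cols = b[0].length, inner = b.length;
        double[][] c = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                for (int k = 0; k < inner; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    /** 4x4 rotation about the x axis (the question's roll matrix, padded). */
    static double[][] rotX(double t) {
        double c = Math.cos(t), s = Math.sin(t);
        return new double[][] {
            {1, 0, 0, 0}, {0, c, -s, 0}, {0, s, c, 0}, {0, 0, 0, 1}};
    }

    /** 4x4 rotation about the y axis (the question's tilt matrix, padded). */
    static double[][] rotY(double t) {
        double c = Math.cos(t), s = Math.sin(t);
        return new double[][] {
            {c, 0, s, 0}, {0, 1, 0, 0}, {-s, 0, c, 0}, {0, 0, 0, 1}};
    }

    /** 4x4 translation by (tx, ty, tz); only possible thanks to the extra w row/column. */
    static double[][] translate(double tx, double ty, double tz) {
        return new double[][] {
            {1, 0, 0, tx}, {0, 1, 0, ty}, {0, 0, 1, tz}, {0, 0, 0, 1}};
    }

    public static void main(String[] args) {
        double[] pc = {90, 90, 1};                 // camera position from the question
        double[][] wp = {{100}, {100}, {1}, {1}};  // world point as a 4x1 column
        double roll = Math.toRadians(10), tilt = Math.toRadians(5); // sample angles

        // Read right to left: translate first, then roll, then tilt.
        double[][] view = mul(rotY(tilt), mul(rotX(roll), translate(-pc[0], -pc[1], -pc[2])));
        double[][] eye = mul(view, wp);
        System.out.printf("%.3f %.3f %.3f%n", eye[0][0], eye[1][0], eye[2][0]);
    }
}
```

Returning a fresh matrix from `mul` sidesteps the copy step entirely, at the cost of a small allocation per multiplication.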


From start to finish

Camera transformations should be represented as a view matrix. Perform your Aview v = v1 operation here. (v represents your world coordinates as a 4×1 column vector, Afinal is your Aview.)

// World coordinates to eye coordinates
// A_view is a_final from above
mmMul(a_view, v_world, v_view);

Projection transformations describe a perspective transform. This is what makes nearer objects bigger and farther objects smaller. This is performed after the camera transformation. If you don't want perspective yet, just use the identity matrix for the projection matrix. Anyway, perform Aprojection v1 = v2 here.

// Eye coordinates to clip coordinates
// If you don't care about perspective, SKIP THIS STEP
mmMul(a_projection, v_view, v_clip);

Next, you need to do a perspective divide. This relies on the homogeneous coordinates introduced earlier. Divide every component of v2 by the last component of v2. If v2 = [x, y, z, w]T, then divide each component by w (including w itself). You should end up with w = 1. (If your projection matrix is the identity matrix, like I described earlier, this step should do nothing.)

// Clip coordinates to normalized device coordinates
// If you skipped the previous step, SKIP THIS STEP
for (int i = 0; i < 4; i++) {
    v_ndc[i] = v_clip[i][0] / v_clip[3][0]; // divide by w of the clip-space vector
}
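For a concrete projection matrix to plug into the two steps above, one common choice is the matrix documented for OpenGL's gluPerspective(). The sketch below builds it and then does the divide. It assumes the OpenGL convention that the camera looks down the negative z axis in eye space, and the class and method names (`Perspective`, `perspective`, `project`) are invented for this example.

```java
final class Perspective {
    /** gluPerspective-style matrix; fovyDeg is the vertical field of view in degrees. */
    static double[][] perspective(double fovyDeg, double aspect, double zNear, double zFar) {
        double f = 1.0 / Math.tan(Math.toRadians(fovyDeg) / 2.0);
        return new double[][] {
            {f / aspect, 0, 0, 0},
            {0, f, 0, 0},
            {0, 0, (zFar + zNear) / (zNear - zFar), 2 * zFar * zNear / (zNear - zFar)},
            {0, 0, -1, 0}};
    }

    /** Applies m to an eye-space point (x, y, z, 1) and divides by the resulting w. */
    static double[] project(double[][] m, double[] eye) {
        double[] clip = new double[4];
        for (int i = 0; i < 4; i++)
            for (int k = 0; k < 4; k++)
                clip[i] += m[i][k] * eye[k];
        // Perspective divide: clip coordinates to normalized device coordinates
        return new double[] {clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]};
    }
}
```

The bottom row (0, 0, -1, 0) is what makes w equal to the distance in front of the camera, so the divide shrinks far-away points. With this matrix, visible x and y values land in the range -1 to 1.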

Finally, take your v2. The first two coordinates are your x and y coordinates. The third is z, which you can throw away. (Later, once you get very advanced, you can use this z value to figure out which point is in front of or behind some other point.) And at this point, the last component is w = 1, so you don't need that at all anymore.

x = v_ndc[0]
y = v_ndc[1]
z = v_ndc[2]  // unused; your screen is 2D

If you skipped the perspective and perspective divide steps, use v_view instead of v_ndc above.

This is very similar to the set of OpenGL coordinate systems. The differences are as follows:

  • You start with world coordinates
    • OpenGL starts with object coordinates
  • You use the view matrix to transform world coordinates to eye coordinates
    • OpenGL uses the ModelView matrix to transform object coordinates to eye coordinates

From there on, everything is the same.

Wesley
Thanks, that makes more sense now. Although how would you apply the roll transformation matrix for example which has three rows and three columns to the world point which has one row and three columns? Does this need special treatment?
Are you familiar with homogeneous coordinates? 4×4 matrices and 4-vectors representing 3D transformations?
Wesley
No I am completely new to this particular area.
It's a good answer; but thank you also for pointing out that for this domain, "reasonably simple" is not exactly an overlap with the normal definition of "simple" (and I'm not trying to denigrate your answer in any way; simply pointing out that the length alone makes this less than completely "simple").
McWafflestix
Thanks for your detailed reply. It's a lot to take in for a complete beginner such as myself. This is not actually homework, rather a small element within a much larger system I am working on, so I am hoping to find a quick solution and move on. However, I am finding your description difficult to understand as I have little knowledge of matrices. Ideally, I would spend time reading up on them and I'm sure it would appear much clearer. I don't suppose, when you have time, you could explain it in a pseudocode form? I find understanding things much easier if presented in a code-like fashion!
Yes, this answer is far, far too complicated for getting a simple system working. And yes, it reads like a math textbook. Unfortunately, trying to explain it in detail creates a muddled mess. I'll try bulletizing some things and adding a little pseudocode. Which part most badly explodes your brain at this point?
Wesley
The main area which confuses me is how to apply the three pan, tilt and roll matrices to a world point. You mention "right multiplying" and multiplying all three of them together. How does one do this? If this part could be explained in pseudocode, I think it would become a lot clearer. Again, thanks for your help with this...
If it makes any difference I am trying to implement a system similar to the one linked to in the PDF document of my previous post. Given a GPS coordinate and a photograph of that coordinate, I want to find the 2D X,Y pixel of the photograph where the GPS point lies...
Having read your last comment and looked at the PDF, I think I'm answering a question different from the one you're asking. In what form is the GPS coordinate? (Latitude, longitude, altitude? x, y, z with 0, 0, 0 being the center of the earth? Something else?) Is the photograph "wrapped" around the entire earth? (If not, what area does it cover? Is it curved?)
Wesley
The GPS coordinate (wp) is in the form Latitude, Longitude, Altitude, and similarly the camera location is Latitude, Longitude, Altitude with additional information such as roll, tilt and pan. I have all this information available. Now, imagine you have a camera and are pointing it at the subject (the GPS world point). After taking the photograph I wish to determine the 2D X,Y pixel on the photograph where the GPS point lies. I.e. if you are pointing the camera directly at the GPS point, it would lie somewhere in the centre of the photograph. I hope this makes it clearer ;)
Just to confirm, the photograph is not wrapped around the entire earth; it is just a normal photograph which will be taken by a smartphone device. For example's sake, let's say it is 480x320 pixels.
Alright, it's very obvious that my answer isn't what you're looking for. Having said that, it may not be worth your while to continue clarifying, especially if you would instead be seeking out other resources to solve your problem. [comment split]
Wesley
However, if you're still interested, take a look at <http://i40.tinypic.com/2yucapc.jpg>. Is this what you mean? The camera is the start of the arrow, and the end of the arrow is the point on Earth. Doesn't the point on Earth always correspond to the center of the photo then? And how far away from Earth is the photo? Does it touch the surface of Earth? And do you actually see Earth behind the photo, or does the photo take up the entire "screen" (if the camera isn't rolled/tilted/panned)?
Wesley
The GPS point refers to a location on the earth's surface and the elevation specifies a distance above the surface. That's the 3D point I am trying to project. The camera points at this point; however, it may have a degree of tilt/roll/pan, so the point will not always be at the center of the photograph. Imagine you took a photo of a mountain: you know the GPS location of the summit and its elevation, and likewise you know your position (camera) and its roll/tilt/pan. What I want to do is find the 2D XY coordinates on the photograph where the summit appears.
Hey, sorry it took me so long since my last post to reply again. After reading your comment, I still didn't quite understand what you meant. After reading your discussion with Alnitak, it seems to make a lot more sense; also, it looks like you've got your question sufficiently answered that you can get coding again. In any case, I'll jump in on that comment thread instead if there's anything I can add.
Wesley
A: 

I would just add, I am trying to achieve something similar to what has been described here: http://www.mics.ch/SumIntF05/CyntiaDuc.pdf

However, my camera is not cylindrical, it is normal.

+1  A: 

I've posted some code here that does much of what you need.

It contains Java implementations of the OpenGL gluPerspective() and gluLookAt() functions:

Camera camera = new Camera();

Point3d eye = new Point3d(3, 4, 8);
Point3d center = new Point3d(0, 0, 0);
Vector3d up = new Vector3d(0, 1, 0);

camera.perspective(60.0, 1.6, 0.1, 20); // vertical fov, aspect ratio, znear, zfar
camera.lookAt(eye, center, up);

To use the project() function therein, use:

void plot(Camera camera, Point4d p) {
    Point4d q = camera.project(p);
    double x = q.x / q.w; // Point4d fields are doubles
    double y = q.y / q.w;
    ...
}

The x and y values returned fall in the range -0.5 ... 0.5

Alnitak
Thanks Alnitak, I appreciate you helping me with this. A few questions: The x, y values returned, how do I convert these to screen pixels? I.e. if I have a screen with resolution 480x360, what is the method of converting the x, y values to actual pixels? Also, when I convert GPS spherical coordinates to Cartesian coordinates for use here, they are very large; does this make a difference in the transformation? Lastly, where are the Ray, Matrix4d, Point3d and Vector3d classes? Also, the code you posted calls a Camera class with a different constructor? I cannot get it working... Thanks!
Sorry, ignore the last comment about the Camera class having a different constructor; I misread it. Oh, I am assuming these classes I need, i.e. Point3d, Point4d etc., are available to download somewhere?
Okay, I found the needed classes in the Java 3D javax.vecmath package. I now have one final problem: the plot() method takes a Camera parameter and a Point4d parameter. Where does this Point4d object come from?
The Point4d parameter is just one of those homogeneous vectors that Wesley described. Just use <x, y, z, 1>
Alnitak
Thanks, that seems to be working. How do you translate the returned X,Y values into actual screen coordinates though?
sx = (x + 0.5) * 480
Alnitak
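Spelled out for both axes, the mapping in the comment above might look like the sketch below. It assumes x and y lie in the -0.5 ... 0.5 range mentioned in the answer, that +y points up in the projected coordinates, and that pixel rows count downward from the top of the image, which is why y is flipped. The class and method names are invented for this example.

```java
final class ScreenMap {
    /** Maps x in [-0.5, 0.5] to a horizontal pixel position in [0, width]. */
    static double toPixelX(double x, int width) {
        return (x + 0.5) * width;
    }

    /** Maps y in [-0.5, 0.5] to a vertical pixel position, with y = 0.5 at the top row. */
    static double toPixelY(double y, int height) {
        return (0.5 - y) * height;
    }
}
```

For the 480x360 screen from the question, a point with x = 0 and y = 0 lands in the middle of the image at (240, 180). Values outside the -0.5 ... 0.5 range fall off screen.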
Thanks Alnitak, I feel like I'm getting somewhere now ;) One last question: where do you specify the roll, tilt and pan values of the camera? Is it in the 'eye' vector parameter to the lookAt method? Oh, and could you tell me what the 'up' vector does?
tilt and pan is equivalent to setting the eye direction vector which in my code is the normalised unit vector "center - eye" - you'll need to use matrix math to reconstruct those. "roll" is equivalent to the "up" vector - it defines how much the camera is rotated around its own axis.
Alnitak
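One possible way to reconstruct those lookAt() arguments from pan, tilt and roll is sketched below, using plain arrays rather than the vecmath classes. The axis conventions are assumptions, not something the thread specifies: +y is up, pan = 0 and tilt = 0 looks down the negative z axis, pan rotates about the y axis, tilt raises the view above the horizon, and the sign of roll is arbitrary. It also breaks down when the camera looks straight up or down. The names `LookAtArgs`, `direction` and `up` are invented for this example.

```java
final class LookAtArgs {
    /** Unit view direction for the given pan and tilt, both in radians. */
    static double[] direction(double pan, double tilt) {
        return new double[] {
            Math.cos(tilt) * Math.sin(pan),
            Math.sin(tilt),
            -Math.cos(tilt) * Math.cos(pan)};
    }

    /** World up (0, 1, 0) rotated about the unit vector d by roll radians (Rodrigues' formula). */
    static double[] up(double[] d, double roll) {
        double[] u = {0, 1, 0};
        double c = Math.cos(roll), s = Math.sin(roll);
        double dot = d[0] * u[0] + d[1] * u[1] + d[2] * u[2];
        double[] cross = {
            d[1] * u[2] - d[2] * u[1],
            d[2] * u[0] - d[0] * u[2],
            d[0] * u[1] - d[1] * u[0]};
        double[] r = new double[3];
        for (int i = 0; i < 3; i++) {
            r[i] = u[i] * c + cross[i] * s + d[i] * dot * (1 - c);
        }
        return r;
    }
}
```

The `center` argument would then be the eye position plus the direction vector, and the result of `up` would be passed as the up vector.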