I see this question is a bit old, but I decided to give an answer anyway for those who find this question by searching.
The standard way to represent 2D/3D transformations nowadays is by using homogeneous coordinates. [x,y,w] for 2D, and [x,y,z,w] for 3D. Since you have three axes in 3D as well as translation, that information fits perfectly in a 4x4 transformation matrix. I will use column-major matrix notation in this explanation. All matrices are 4x4 unless noted otherwise.
The stages from 3D points and to a rasterized point, line or polygon looks like this:
- Transform your 3D points with the inverse camera matrix, followed with whatever transformations they need. If you have surface normals, transform them as well but with w set to zero, as you don't want to translate normals. The matrix you transform normals with must be isotropic; scaling and shearing makes the normals malformed.
- Transform the point with a clip space matrix. This matrix scales x and y with the field-of-view and aspect ratio, scales z by the near and far clipping planes, and plugs the 'old' z into w. After the transformation, you should divide x, y and z by w. This is called the perspective divide.
- Now your vertices are in clip space, and you want to perform clipping so you don't render any pixels outside the viewport bounds. Sutherland-Hodgeman clipping is the most widespread clipping algorithm in use.
- Transform x and y with respect to w and the half-width and half-height. Your x and y coordinates are now in viewport coordinates. Z is discarded, but w and 1/z is usually saved because they are needed to do perspective correct interpolation across the polygon surface. z is not linear across the surface, but 1/z is. This stage is the actual projection, because z isn't used as a component in the position any more.
The algorithms:
Calculation of field-of-view
This calculates the field-of view. Whether tan takes radians or degrees is irrelevant, but angle must match. Notice that the result reaches infinity as angle nears 180 degrees. This is a singularity, as it is impossible to have a focal point that wide. If you want numerical stability, keep angle less or equal to 179 degrees.
fov = 1.0 / tan(angle/2.0)
Also notice that 1.0 / tan(45) = 1. Someone else here suggested to just divide by z. The result here is clear. You would get a 90 degree FOV and an aspect ratio of 1:1. Using homogeneous coordinates like this has several other advantages as well; we can for example perform clipping against the near and far planes without treating it as a special case.
Calculation of the clip matrix
This is the layout of the clip matrix. aspectRatio is Width/Height. So the FOV for the x component is scaled based on FOV for y. Far and near are coefficients which are the distances for the near and far clipping planes.
[fov * aspectRatio][ 0 ][ 0 ][ 0 ]
[ 0 ][ fov ][ 0 ][ 0 ]
[ 0 ][ 0 ][(far+near)/(far-near) ][ 1 ]
[ 0 ][ 0 ][(2*near*far)/(near-far)][ 0 ]
Screen Projection
After clipping, this is the final transformation to get our screen coordinates.
x' = (x * Width ) / (2.0 * w) + halfWidth;
y' = (y * Height) / (2.0 * w) + halfHeight;
Trivial example implementation in C++
#include <vector>
#include <cmath>
#include <stdexcept>
#include <algorithm>
struct Vector
{
Vector() : x(0),y(0),z(0),w(1){}
Vector(float a, float b, float c) : x(a),y(b),z(c),w(1){}
/* Assume proper operator overloads here, with vectors and scalars */
float Length() const
{
return std::sqrt(x*x + y*y + z*z);
}
Vector Unit() const
{
const float epsilon = 1e-6;
float mag = Length();
if(mag < epsilon){
std::out_of_range e("");
throw e;
}
return *this / mag;
}
};
inline float Dot(const Vector& v1, const Vector& v2)
{
return v1.x*v2.x + v1.y*v2.y + v1.z*v2.z;
}
class Matrix
{
public:
Matrix() : data(16)
{
Identity();
}
void Identity()
{
std::fill(data.begin(), data.end(), float(0));
data[0] = data[5] = data[10] = data[15] = 1.0f;
}
float& operator[](size_t index)
{
if(index >= 16){
std::out_of_range e("");
throw e;
}
return data[index];
}
Matrix operator*(const Matrix& m) const
{
Matrix dst;
int col;
for(int y=0; y<4; ++y){
col = y*4;
for(int x=0; x<4; ++x){
for(int i=0; i<4; ++i){
dst[x+col] += m[i+col]*data[x+i*4];
}
}
}
return dst;
}
Matrix& operator*=(const Matrix& m)
{
*this = (*this) * m;
return *this;
}
/* The interesting stuff */
void SetupClipMatrix(float fov, float aspectRatio, float near, float far)
{
Identity();
float f = 1.0f / std::tan(fov * 0.5f);
data[0] = f*aspectRatio;
data[5] = f;
data[10] = (far+near) / (far-near);
data[11] = 1.0f; /* this 'plugs' the old z into w */
data[14] = (2.0f*near*far) / (near-far);
data[15] = 0.0f;
}
std::vector<float> data;
};
inline Vector operator*(const Vector& v, const Matrix& m)
{
Vector dst;
dst.x = v.x*m[0] + v.y*m[4] + v.z*m[8 ] + v.w*m[12];
dst.y = v.x*m[1] + v.y*m[5] + v.z*m[9 ] + v.w*m[13];
dst.z = v.x*m[2] + v.y*m[6] + v.z*m[10] + v.w*m[14];
dst.w = v.x*m[3] + v.y*m[7] + v.z*m[11] + v.w*m[15];
return dst;
}
typedef std::vector<Vector> VecArr;
VecArr ProjectAndClip(int width, int height, float near, float far, const VecArr& vertex)
{
float halfWidth = (float)width * 0.5f;
float halfHeight = (float)height * 0.5f;
float aspect = (float)width / (float)height;
Vector v;
Matrix clipMatrix;
VecArr dst;
clipMatrix.SetupClipMatrix(60.0f * (M_PI / 180.0f), aspect, near, far);
/* Here, after the perspective divide, you perform Sutherland-Hodgeman clipping
by checking if the x, y and z components are inside the range of [-w, w].
One checks each vector component seperately against each plane. Per-vertex
data like colours, normals and texture coordinates need to be linearly
interpolated for clipped edges to reflect the change. If the edge (v0,v1)
is tested against the positive x plane, and v1 is outside, the interpolant
becomes: (v1.x - w) / (v1.x - v0.x)
I skip this stage all together to be brief.
*/
for(VecArr::iterator i=vertex.begin(); i!=vertex.end(); ++i){
v = (*i) * clipMatrix;
v /= v.w; /* Don't get confused here. I assume the divide leaves v.w alone.*/
dst.push_back(v);
}
/* TODO: Clipping here */
for(VecArr::iterator i=dst.begin(); i!=dst.end(); ++i){
i->x = (i->x * (float)width) / (2.0f * i->w) + halfWidth;
i->y = (i->y * (float)height) / (2.0f * i->w) + halfHeight;
}
return dst;
}
If you still ponder about this, the OpenGL specification is a really nice reference for the maths involved.
The DevMaster forums at http://www.devmaster.net/ have a lot of nice articles related to software rasterizers as well.