Let's start with the basics.
Usually, you want to transform your local triangle vertices through the following steps:
local-space coords -> world-space coords -> view-space coords -> clip-space coords
In standard (fixed-function) GL, the first 2 transforms are done through GL_MODELVIEW_MATRIX, and the 3rd through GL_PROJECTION_MATRIX.
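For reference, populating those two matrix stacks with fixed-function calls looked roughly like this (a sketch; the camera placement and object transform are made-up values):

    // Model-view stack: local -> world -> view.
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(0.0, 0.0, 5.0,    // eye: camera placed at z = 5
              0.0, 0.0, 0.0,    // center: looking at the origin
              0.0, 1.0, 0.0);   // up vector
    glTranslatef(1.0f, 0.0f, 0.0f);  // local -> world: move the object

    // Projection stack: view -> clip.
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(60.0, 4.0 / 3.0, 0.1, 100.0);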
These model-view transformations, for the many interesting transforms we usually want to apply (translate, scale and rotate, for example), happen to be expressible as vector-matrix multiplication when we represent vertices in homogeneous coordinates. Typically, the vertex V = (x, y, z) is represented in this system as (x, y, z, 1).
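That extra 1 is what lets a translation, which is not a linear map of (x, y, z), be written as a plain matrix multiply. A minimal sketch, using the GLM library for illustration:

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    // With w = 1, translation becomes an ordinary matrix multiply:
    glm::vec4 V(2.0f, 3.0f, 4.0f, 1.0f);  // (x, y, z, 1)
    glm::mat4 T = glm::translate(glm::mat4(1.0f),
                                 glm::vec3(10.0f, 0.0f, 0.0f));
    glm::vec4 Vt = T * V;  // (12, 3, 4, 1): moved by +10 along x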
Ok. Say we want to transform a vertex V_local through a translation, then a rotation, then a translation. Each transform can be represented as a matrix*; let's call them T1, R1 and T2.
We want to apply the transform to each vertex. With OpenGL's column-vector convention (the one the shader code below uses), that is V_view = T2 * R1 * T1 * V_local: the matrix closest to the vector is applied first. Matrix multiplication being associative, we can compute once and for all M = T2 * R1 * T1.
That way, we only need to pass M down to the vertex program and compute V_view = M * V_local. In the end, a typical vertex shader multiplies the vertex position by a single matrix. All the work of computing that one matrix is how you move your object from local space to clip space.
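On the CPU side, that composition might look like the following sketch (GLM again; the shader program handle and the transform values are placeholders):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>
    #include <glm/gtc/type_ptr.hpp>

    // Compose once; with column vectors the order reads right to left,
    // so T1 is applied first.
    glm::mat4 T1 = glm::translate(glm::mat4(1.0f), glm::vec3(1.0f, 0.0f, 0.0f));
    glm::mat4 R1 = glm::rotate(glm::mat4(1.0f), glm::radians(45.0f),
                               glm::vec3(0.0f, 1.0f, 0.0f));
    glm::mat4 T2 = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 0.0f, -5.0f));
    glm::mat4 M  = T2 * R1 * T1;

    // Hand the single matrix to the vertex program ("M" is an assumed
    // uniform name in the shader).
    GLint loc = glGetUniformLocation(program, "M");
    glUniformMatrix4fv(loc, 1, GL_FALSE, glm::value_ptr(M));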
Ok... I glossed over a number of important details.
First, what I described so far only really covers the transformations we usually want up to view space, not clip space. However, the hardware expects the output position of the vertex shader to be expressed in that special clip space. It's hard to explain clip-space coordinates without significant math, so I will leave that out, but the important bit is that the transformation that brings the vertices to clip space can usually be expressed as the same type of matrix multiplication. This is what the old gluPerspective, glFrustum and glOrtho compute.
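Those helpers have direct modern equivalents; for instance, GLM provides the same projections (a sketch, reusing M from the composition above):

    #include <glm/gtc/matrix_transform.hpp>

    // view -> clip, the part gluPerspective / glOrtho used to compute:
    glm::mat4 P = glm::perspective(glm::radians(60.0f),  // vertical FOV
                                   4.0f / 3.0f,          // aspect ratio
                                   0.1f, 100.0f);        // near / far planes
    // glm::ortho(l, r, b, t, near, far) and glm::frustum(...) cover the
    // other two.

    // Full local -> clip matrix:
    glm::mat4 MVP = P * M;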
Second, this is what you apply to vertex positions. The math to transform normals is somewhat different. That's because you want the normal to stay perpendicular to the surface after transformation (for reference, it requires a multiplication by the inverse-transpose of the model-view matrix in the general case, but that can be simplified in many cases).
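In GLM terms, the general case can be sketched as follows (assuming a modelView matrix built as above):

    #include <glm/glm.hpp>

    // Inverse-transpose of the upper-left 3x3 of the model-view:
    glm::mat3 normalMatrix = glm::transpose(glm::inverse(glm::mat3(modelView)));
    // If the model-view contains only rotations and translations, the
    // plain mat3(modelView) already works, since it is then orthogonal.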
Third, you never send 4-D coordinates to the vertex shader; in general you pass 3-D ones. OpenGL expands those 3-D coordinates (or 2-D, by the way) to 4-D ones so that the vertex shader does not have to add the extra coordinate: it appends 1 as the w coordinate of each vertex.
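Concretely, on the legacy client side that looks like this sketch, where 'vertices' is an assumed array of tightly packed floats:

    // Each vertex is submitted as 3 floats...
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, vertices);
    // ...yet gl_Vertex in the shader is a vec4: GL fills in w = 1
    // (and z = 0 if you only submit 2 components).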
So... to put all that back together: for each object, you compute one of those magic M matrices from all the transforms you want to apply to it. Inside the vertex shader, you then multiply each vertex position by that matrix and write the result to the shader's position output. Typical code is more or less (this is using the old nomenclature):
    uniform mat4 MVP;  // the single, precomputed local-to-clip matrix

    void main() {
        gl_Position = MVP * gl_Vertex;
    }
* The actual matrices can be found on the web, notably in the man pages for each of those functions: glRotate, glTranslate, glScale, gluPerspective, glOrtho.