Here's an explicit example based on shape functions.
Consider the functions:
u1(x,z) = (x-x_b)/(x_c-x_b)
One has u1(x_b,z_b) = u1(x_a,z_a) = 0 (because x_a = x_b) and u1(x_c,z_c) = u1(x_d,z_d) = 1
u2(x,z) = 1 - u1(x,z)
Now we have u2(x_b,z_b) = u2(x_a,z_a) = 1 and u2(x_c,z_c) = u2(x_d,z_d) = 0
v1(x,z) = (z-z_b)/(z_a-z_b)
This function satisfies v1(x_a,z_a) = v1(x_d,z_d) = 1 and v1(x_b,z_b) = v1(x_c,z_c) = 0
v2(x,z) = 1 - v1(x,z)
We have v2(x_a,z_a) = v2(x_d,z_d) = 0 and v2(x_b,z_b) = v2(x_c,z_c) = 1
Now let's build new functions as follows:
S_D(x,z) = u1(x,z) * v1(x,z)
We get S_D(x_d, z_d) = 1 and S_D(x_a,z_a) = S_D(x_b,z_b) = S_D(x_c,z_c) = 0
S_C(x,z) = u1(x,z) * v2(x,z)
We get S_C(x_c, z_c) = 1 and S_C(x_a,z_a) = S_C(x_b,z_b) = S_C(x_d,z_d) = 0
S_A(x,z) = u2(x,z) * v1(x,z)
We get S_A(x_a, z_a) = 1 and S_A(x_b,z_b) = S_A(x_c,z_c) = S_A(x_d,z_d) = 0
S_B(x,z) = u2(x,z) * v2(x,z)
We get S_B(x_b, z_b) = 1 and S_B(x_a,z_a) = S_B(x_c,z_c) = S_B(x_d,z_d) = 0
Now define your interpolating function as
H(x,z) = h_a * S_A(x,z) + h_b * S_B(x,z) + h_c * S_C(x,z) + h_d * S_D(x,z),
where h_a is the heigh at point A, h_b is the height at point B, and so on.
You can easily verify that H is indeed an interpolating function:
H(x_a,z_a) = h_a, H(x_b,z_b) = h_b, H(x_c,z_c) = h_c and H(x_d,z_d) = h_d.
Now, in order to approximate the height at P, all you need to do is evaluate H at this point:
h_p = H(x_p, z_p)
The functions S are normally referred to as "shape functions". There's one such function for each node you want your interpolated value to depend on, and in this case they all satisfy Kronecker's delta property (they take the value one at one node and zero at all other nodes).
There are many ways to build shape functions for a given set of nodes. If I remember correctly, the construction of 2D shape functions by multiplication of 1D shape functions (as we've done in this case) is called "tensor product of functions" (easy in this case because the grid is rectangular). We have ended up with four functions (one per node), all of them linear combinations of {1, x, z, xz}.
If you want to use only three points for your interpolation, then you should be able to easily build three shape functions as linear combinations of {1, x, z} only, but you will loose a 25% of the height information provided by the grid and your interpolant will not be smooth inside the rectangle when h_b != h_d.