Note that the following is a general solution. I think I've worked the math out right; anybody who knows better should comment (or edit, if you're sure...)
First, you need to calibrate your cameras: where they are, where they're pointing, and what their field of view is. This will come as (or else needs to be reduced to) a projection matrix for each camera, which transforms homogeneous world-space points into homogeneous screen points. If you don't know this a priori, you might be able to figure it out from known common features (say, the projected gray rectangles in your diagram).
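To make "projection matrix" concrete, here is a minimal sketch assuming an ideal pinhole camera (no lens distortion, square pixels, zero skew), where M = K [R | t]. The function names `projection_matrix` and `project` and all the numbers are made up for illustration. This only shows what the matrix does once you have it; it does not estimate it from features.

```python
def projection_matrix(f, cx, cy, R, t):
    """Build the 3x4 matrix M = K [R | t] of an ideal pinhole camera.

    f        focal length in pixels (same in x and y, zero skew: an assumption)
    cx, cy   principal point in pixels
    R, t     3x3 rotation and 3-vector translation taking world coordinates
             to camera coordinates (for a camera centred at C, t = -R C)
    """
    K = [[f, 0.0, cx],
         [0.0, f, cy],
         [0.0, 0.0, 1.0]]
    Rt = [list(R[i]) + [t[i]] for i in range(3)]          # 3x4 [R | t]
    return [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
            for i in range(3)]


def project(M, X):
    """Map world point X = (x, y, z) to screen coordinates (u, v)."""
    P = list(X) + [1.0]                                   # homogeneous, w = 1
    p = [sum(M[i][j] * P[j] for j in range(4)) for i in range(3)]
    return p[0] / p[2], p[1] / p[2]                       # divide out w


# Made-up camera at the origin looking down +z, 800 px focal length,
# principal point at the centre of a 640x480 image.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M1 = projection_matrix(800.0, 320.0, 240.0, I3, [0.0, 0.0, 0.0])
print(project(M1, (0.5, -0.25, 5.0)))   # (400.0, 200.0)
```

A real calibration would give you K, R and t (or M directly) from measurements rather than from guesses like these.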
Then, you need to identify your objects and their positions in each image. For example, find the centers of the red and blue rectangles.
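One simple way to get a feature's screen position, assuming the feature is a blob of a distinctive color, is to average the coordinates of its pixels. The image format (nested lists of RGB tuples), the color thresholds and the name `color_centroid` are illustrative assumptions; a real program would read pixels with an imaging library and might need a more robust detector. Also check that your pixel convention (row/column, where pixel centers sit) matches the one your camera matrices assume.

```python
def color_centroid(image, is_target):
    """Mean (u, v) pixel position of all pixels where is_target(pixel) is true.

    image      list of rows, each row a list of (r, g, b) tuples
    is_target  predicate selecting the feature's pixels, e.g. "is red"
    Returns None when no pixel matches.
    """
    su = sv = n = 0
    for v, row in enumerate(image):          # v: row index (down the image)
        for u, pixel in enumerate(row):      # u: column index (across)
            if is_target(pixel):
                su += u
                sv += v
                n += 1
    if n == 0:
        return None
    return su / n, sv / n


def is_red(pixel):
    """Crude, hypothetical color test for the red rectangle."""
    r, g, b = pixel
    return r > 200 and g < 50 and b < 50


# Tiny 5-wide, 4-high black image with a 2x2 red block in columns 2-3, rows 1-2.
img = [[(0, 0, 0)] * 5 for _ in range(4)]
for v in (1, 2):
    for u in (2, 3):
        img[v][u] = (255, 0, 0)
print(color_centroid(img, is_red))   # (2.5, 1.5)
```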
Each object or feature you locate in both images gives you a linear system:
P=[x,y,z,w]^t = world point (homog. 4-vector)
p1=[x1,y1,w1]^t = point on screen 1; M1= 3x4 projection matrix 1: p1=M1*P
p2=[x2,y2,w2]^t = point on screen 2; M2= 3x4 projection matrix 2: p2=M2*P
Your known data are: p1/w1=(u1,v1,1) and p2/w2=(u2,v2,1); multiply these back out by the unknown scale factors w1 and w2 to get:
(u1,v1) are known constants -> p1=[u1*w1,v1*w1,w1]^t
(u2,v2) are known constants -> p2=[u2*w2,v2*w2,w2]^t
assume that w=1 (any finite world point can be rescaled this way) -> P=[x,y,z,1]^t
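Written out in matrix form, the two cameras' equations stack into one linear system. The notation here is introduced just for this sketch: split each M_k into its left 3x3 block A_k and its last column c_k, and let \hat{p}_k = (u_k, v_k, 1)^t, so that p_k = w_k \hat{p}_k. Moving the unknowns to the left gives a 6x5 system:

```latex
M_k P = A_k \begin{bmatrix} x \\ y \\ z \end{bmatrix} + c_k = w_k \hat{p}_k ,
\qquad
M_k = \begin{bmatrix} A_k & c_k \end{bmatrix},
\quad
\hat{p}_k = \begin{bmatrix} u_k \\ v_k \\ 1 \end{bmatrix},
\quad k = 1, 2

\begin{bmatrix}
A_1 & -\hat{p}_1 & \mathbf{0} \\
A_2 & \mathbf{0} & -\hat{p}_2
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ w_1 \\ w_2 \end{bmatrix}
=
\begin{bmatrix} -c_1 \\ -c_2 \end{bmatrix}
```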
Finally, you have a system of 6 equations (3 from p1=M1*P and 3 from p2=M2*P) in 5 unknowns: (x,y,z,w1,w2). The equations are linear in these unknowns, and since the system is overdetermined, you can solve it with linear least squares.
One way to understand the overdetermined bit: given a pair of cameras (as described by their matrices), you expect them to show the scene consistently. If one of the cameras is misaligned (i.e., if its camera matrix does not describe the actual camera perfectly), it may show an object somewhat higher or lower than it ought to (say), so that this camera's measurement is inconsistent with the other one's. Geometrically, the two lines of sight through (u1,v1) and (u2,v2) then fail to meet at a single point, and no (x,y,z) satisfies all 6 equations exactly.
Since you are likely using floating point (and possibly real-world measurements), your values will never be perfectly accurate anyway, so you will always need to deal with this problem. Least squares finds the solution that comes closest to satisfying all the equations at once (in the sense of minimizing the sum of squared errors); its residual, i.e. how far that best solution still misses each equation, is an error value that can help in diagnosing and fixing data problems.
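Besides the raw least-squares residual, a diagnostic that is often easier to read is the reprojection error: push a 3D point back through each camera matrix and measure, in pixels, how far it lands from what that camera actually saw. Note that this is a related check, not the quantity the linear least squares above minimizes. The names `reproject` and `reprojection_errors` and the demo numbers are made up for illustration.

```python
import math


def reproject(M, X):
    """Project world point X = (x, y, z) through 3x4 matrix M; return (u, v)."""
    P = list(X) + [1.0]                                   # homogeneous, w = 1
    p = [sum(M[i][j] * P[j] for j in range(4)) for i in range(3)]
    return p[0] / p[2], p[1] / p[2]


def reprojection_errors(X, observations):
    """Pixel distance between each camera's measured (u, v) and X reprojected.

    observations: list of (M, (u, v)) pairs, one per camera.
    """
    errors = []
    for M, (u, v) in observations:
        pu, pv = reproject(M, X)
        errors.append(math.hypot(pu - u, pv - v))
    return errors


# Two made-up pinhole cameras, the second shifted 1 unit along world x.
# Camera 2's measurement of the point (0.5, -0.25, 5) is 3 px off in u.
M1 = [[800.0, 0.0, 320.0, 0.0], [0.0, 800.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
M2 = [[800.0, 0.0, 320.0, -800.0], [0.0, 800.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
obs = [(M1, (400.0, 200.0)), (M2, (243.0, 200.0))]
print(reprojection_errors((0.5, -0.25, 5.0), obs))   # [0.0, 3.0]
```

Here X is the true point, so the whole 3 px shows up on camera 2. A least-squares estimate from the same data would typically spread the disagreement over both cameras; still, if one camera shows consistently larger errors across many features, its matrix (or its feature detection) is the first thing to check.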