I'm trying to reconstruct 3D points from 2D image correspondences. My camera is calibrated. The test images are of a checkered cube, and the correspondences are hand-picked. Radial distortion has been removed. After triangulation, however, the reconstruction seems to be wrong. The X and Y values seem to be correct, but the Z values are all about the same and do not vary along the cube. The 3D points look as if they had been flattened along the Z-axis.

What is going wrong in the Z values? Do the points need to be normalized or changed from image coordinates at any point, say before the fundamental matrix is computed? (If this is too vague I can explain my general process or elaborate on parts)
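On the normalization question: the usual advice (the normalized 8-point algorithm in Hartley and Zisserman) is to translate and scale each image's pixel coordinates before estimating F, then undo the transform on the result. A minimal Python/NumPy sketch, assuming each image's points are an N x 2 array of pixel coordinates; the helper names are hypothetical. The last helper applies the change of coordinates used in step 2 of Algorithm 12.1, F -> T2^-T F T1^-1 (there T1 and T2 are pure translations, but the formula is the same).

```python
import numpy as np

def normalize_points(pts):
    """Hartley-style normalization of (N, 2) pixel coordinates.

    Translates the centroid to the origin and scales so the mean distance from
    it is sqrt(2). Returns the (N, 3) homogeneous normalized points and the 3x3
    transform T such that x_normalized = T @ x_pixel.
    """
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def denormalize_F(F_hat, T1, T2):
    """F_hat fits the normalized points (x2n^T F_hat x1n = 0); map it back to pixels."""
    return T2.T @ F_hat @ T1

def F_in_transformed_coords(F, T1, T2):
    """Inverse mapping, F -> T2^-T F T1^-1. Note the transpose applies to inv(T2)."""
    return np.linalg.inv(T2).T @ F @ np.linalg.inv(T1)
```

F_hat itself would come from the usual SVD-based linear solve on the normalized points (with the rank-2 constraint enforced) before calling denormalize_F.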

Update

Given: x1 = P1 * X and x2 = P2 * X

where x1 and x2 are the first and second image points and X is the 3D point.

However, I have found that the computed x1 is not close to the actual hand-picked point, whereas x2 is.

How I compute the projection matrices:

P1 = [eye(3), zeros(3,1)];
P2 = K * [R, t];
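A quick way to see why x1 does not reproject onto the hand-picked pixel: P1 = [eye(3), zeros(3,1)] maps a 3D point to normalized coordinates (X/Z, Y/Z), not to pixels, while P2 = K * [R, t] does produce pixels. A minimal Python/NumPy sketch using the K reported further down; the 3D point is hypothetical.

```python
import numpy as np

def project(P, X):
    """Project a homogeneous 3D point X (4,) with a 3x4 camera P; return (u, v)."""
    x = P @ X
    return x[:2] / x[2]

K = np.array([[699.1346, 1.2584, 393.5180],
              [0.0, 701.1120, 304.0591],
              [0.0, 0.0, 1.0]])
I0 = np.hstack([np.eye(3), np.zeros((3, 1))])

P1_without_K = I0      # as in the question
P1_with_K = K @ I0     # camera 1 in the same pixel frame as P2 = K [R | t]

X = np.array([0.1, -0.05, 2.0, 1.0])  # hypothetical point 2 units in front of camera 1
print(project(P1_without_K, X))  # (X/Z, Y/Z): small normalized coordinates
print(project(P1_with_K, X))     # pixel coordinates, near the principal point
```

If pixel coordinates of a few hundred are forced through P1 = [I | 0], the triangulated points need X/Z and Y/Z of that size, which would be consistent with Z coming out hundreds of times smaller than X and Y.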

Update II

Calibration results after optimization (with uncertainties)

% Focal Length:          fc = [ 699.13458   701.11196 ] ± [ 1.05092   1.08272 ]
% Principal point:       cc = [ 393.51797   304.05914 ] ± [ 1.61832   1.27604 ]
% Skew:             alpha_c = [ 0.00180 ] ± [ 0.00042  ]   => angle of pixel axes = 89.89661 ± 0.02379 degrees
% Distortion:            kc = [ 0.05867   -0.28214   0.00131   0.00244  0.35651 ] ± [ 0.01228   0.09805   0.00060   0.00083  0.22340 ]
% Pixel error:          err = [ 0.19975   0.23023 ]
% 
% Note: The numerical errors are approximately three times the standard
% deviations (for reference).
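For reference, the K below can be rebuilt from these calibration results with the convention quoted later in the comments, K = [fc(1), alpha_c*fc(1), cc(1); 0, fc(2), cc(2); 0, 0, 1]; the skew entry 1.2584 is just alpha_c*fc(1). A small Python/NumPy sketch (the function name is hypothetical; alpha_c is the rounded value printed above):

```python
import numpy as np

def bouguet_K(fc, cc, alpha_c):
    """Intrinsic matrix in the convention quoted in the comments:
    K = [fc(1), alpha_c*fc(1), cc(1); 0, fc(2), cc(2); 0, 0, 1]."""
    return np.array([[fc[0], alpha_c * fc[0], cc[0]],
                     [0.0, fc[1], cc[1]],
                     [0.0, 0.0, 1.0]])

K = bouguet_K(fc=[699.13458, 701.11196], cc=[393.51797, 304.05914], alpha_c=0.00180)
print(K)  # skew entry K[0, 1] = alpha_c * fc(1), about 1.2584

# Zero-skew variant (alpha_c = 0), as tried in the comments.
K_no_skew = bouguet_K(fc=[699.13458, 701.11196], cc=[393.51797, 304.05914], alpha_c=0.0)
```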

K =

  699.1346    1.2584  393.5180
         0  701.1120  304.0591
         0         0    1.0000


E =

    0.3692   -0.8351   -4.0017
    0.3881   -1.6743   -6.5774
    4.5508    6.3663    0.2764


R =

   -0.9852    0.0712   -0.1561
   -0.0967   -0.9820    0.1624
    0.1417   -0.1751   -0.9743


t =

    0.7942
   -0.5761
    0.1935


P1 =

     1     0     0     0
     0     1     0     0
     0     0     1     0


P2 =

 -633.1409  -20.3941 -492.3047  630.6410
  -24.6964 -741.7198 -182.3506 -345.0670
    0.1417   -0.1751   -0.9743    0.1935


C1 =

     0
     0
     0
     1


C2 =

    0.6993
   -0.5883
    0.4060
    1.0000
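The C2 printed above is consistent with the camera centre of P2 = K * [R, t], namely C2 = -R' * t, which is also the right null vector of P2 (K does not move the centre). A Python/NumPy check using the printed R, t and P2; since those values are rounded as printed, agreement is only to about three decimals.

```python
import numpy as np

R = np.array([[-0.9852, 0.0712, -0.1561],
              [-0.0967, -0.9820, 0.1624],
              [0.1417, -0.1751, -0.9743]])
t = np.array([0.7942, -0.5761, 0.1935])
P2 = np.array([[-633.1409, -20.3941, -492.3047, 630.6410],
               [-24.6964, -741.7198, -182.3506, -345.0670],
               [0.1417, -0.1751, -0.9743, 0.1935]])

# Camera centre from the pose: C = -R^T t.
C2_from_pose = -R.T @ t

# Camera centre as the right null vector of P2, i.e. P2 @ [C; 1] = 0.
_, _, Vt = np.linalg.svd(P2)
C2_h = Vt[-1]
C2_from_P = C2_h[:3] / C2_h[3]

print(C2_from_pose)  # compare with C2 above
print(C2_from_P)
```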

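A valid essential matrix has two equal non-zero singular values and one zero singular value, and it relates to F by E = K'^T F K (here K' = K, since one camera took both images). When E is estimated from noisy data, it is common to project it onto that form before decomposing it into R and t. A Python/NumPy sketch with hypothetical helper names, applied to the E printed above:

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    """E = K2^T F K1, where K1 and K2 are the intrinsics of the first and second image."""
    return K2.T @ F @ K1

def project_to_essential(E):
    """Closest matrix (Frobenius norm) whose singular values have the form (s, s, 0)."""
    U, s, Vt = np.linalg.svd(E)
    sigma = (s[0] + s[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

E = np.array([[0.3692, -0.8351, -4.0017],
              [0.3881, -1.6743, -6.5774],
              [4.5508, 6.3663, 0.2764]])
print(np.linalg.svd(E, compute_uv=False))                        # as estimated
print(np.linalg.svd(project_to_essential(E), compute_uv=False))  # (s, s, 0)
```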

% new points using cpselect

%x1
input_points =

  422.7500  260.2500
  384.2500  238.7500
  339.7500  211.7500
  298.7500  186.7500
  452.7500  236.2500
  412.2500  214.2500
  368.7500  191.2500
  329.7500  165.2500
  482.7500  210.2500
  443.2500  189.2500
  402.2500  166.2500
  362.7500  143.2500
  510.7500  186.7500
  466.7500  165.7500
  425.7500  144.2500
  392.2500  125.7500
  403.2500  369.7500
  367.7500  345.2500
  330.2500  319.7500
  296.2500  297.7500
  406.7500  341.2500
  365.7500  316.2500
  331.2500  293.2500
  295.2500  270.2500
  414.2500  306.7500
  370.2500  281.2500
  333.2500  257.7500
  296.7500  232.7500
  434.7500  341.2500
  441.7500  312.7500
  446.2500  282.2500
  462.7500  311.2500
  466.7500  286.2500
  475.2500  252.2500
  481.7500  292.7500
  490.2500  262.7500
  498.2500  232.7500

%x2
base_points =

  393.2500  311.7500
  358.7500  282.7500
  319.7500  249.2500
  284.2500  216.2500
  431.7500  285.2500
  395.7500  256.2500
  356.7500  223.7500
  320.2500  194.2500
  474.7500  254.7500
  437.7500  226.2500
  398.7500  197.2500
  362.7500  168.7500
  511.2500  227.7500
  471.2500  196.7500
  432.7500  169.7500
  400.2500  145.7500
  388.2500  404.2500
  357.2500  373.2500
  326.7500  343.2500
  297.2500  318.7500
  387.7500  381.7500
  356.2500  351.7500
  323.2500  321.7500
  291.7500  292.7500
  390.7500  352.7500
  357.2500  323.2500
  320.2500  291.2500
  287.2500  258.7500
  427.7500  376.7500
  429.7500  351.7500
  431.7500  324.2500
  462.7500  345.7500
  463.7500  325.2500
  470.7500  295.2500
  491.7500  325.2500
  497.7500  298.2500
  504.7500  270.2500

Update III

See my answer below for the corrections. The values shown above were computed using the wrong variables/values.

A:

It may be that your points are in a degenerate configuration. Try to add a couple of points from the scene that don't belong to the cube and see how it goes.

jmbr
I added about 9 more points that are not on the cube. The cube still looks flat, although the newly added points do vary in Z relative to the cube. However, the Z values themselves do not scale with the X and Y values (the Z values are about 10^3 smaller than the X or Y values).
srand
Have you tried using the new points (and these points only) to compute the fundamental matrix? If that works then you could replace non-cube points with cube points, recompute F, and see how this affects the reconstruction. I insist on this approach because I've been bitten in the past by the degeneracy of points on the cube while debugging calibration algorithms. Hope this helps.
jmbr
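To illustrate the degeneracy jmbr describes: if (nearly) all correspondences lie on one plane, the linear system used by the 8-point algorithm loses rank and F is no longer unique. A self-contained Python/NumPy sketch on synthetic data; the cameras, points and helper names are hypothetical, and coordinates are already normalized (K = I).

```python
import numpy as np

def design_matrix(x1, x2):
    """Linear system A @ F.ravel() = 0 from x2^T F x1 = 0 (one row per correspondence)."""
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    return np.column_stack([u2 * u1, u2 * v1, u2,
                            v2 * u1, v2 * v1, v2,
                            u1, v1, np.ones(len(x1))])

def project(P, X):
    """Project (N, 3) points with a 3x4 camera; return (N, 2) coordinates."""
    x = (P @ np.column_stack([X, np.ones(len(X))]).T).T
    return x[:, :2] / x[:, 2:3]

# Hypothetical cameras in normalized coordinates (K = I).
theta = 0.2
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([[-1.0], [0.0], [0.1]])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, t])

def relative_singular_values(X):
    """Singular values of the design matrix, divided by the largest one."""
    A = design_matrix(project(P1, X), project(P2, X))
    s = np.linalg.svd(A, compute_uv=False)
    return s / s[0]

rng = np.random.default_rng(0)
n = 30
planar = np.column_stack([rng.uniform(-1, 1, n), rng.uniform(-1, 1, n), np.full(n, 5.0)])
general = np.column_stack([rng.uniform(-1, 1, n), rng.uniform(-1, 1, n), rng.uniform(4, 6, n)])

print(relative_singular_values(planar))   # last three ~0: a whole family of F fits
print(relative_singular_values(general))  # only the last one ~0: F unique up to scale
```

With real, noisy points a near-planar set shows up as small rather than exactly zero singular values, which makes the estimate of F unstable.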
Hey jmbr, what do you think about subpixel correspondence errors in triangulation?
Jacob
Hi Jacob, see my comment in the thread on your answer. The high skew bothers me too.
jmbr
A:

More information required:

  • What is t? The baseline might be too small for parallax.
  • What is the disparity between x1 and x2?
  • Are you confident about the accuracy of the calibration (I'm assuming you used the Stereo part of the Bouguet Toolbox)?
  • When you say the correspondences are hand-picked, do you mean you selected the corresponding points on the image, or did you use an interest point detector on the two images and then set the correspondences?

I'm sure we can resolve this problem :)

Jacob
t = [-0.8754; 0.2249; -0.4279]. The image points differ by an average of 65 px in x and 89 px in y. I'm fairly confident in the calibration; I've updated the question with my calibration information, and the pixel error is about 0.2 pixels.
srand
What units are those? Could you post *K*?
Jacob
In fact, could you post *K*, *R*, *t* and as many `x1` and `x2` points as you can?
Jacob
Ah thanks, didn't see the calib info while commenting!
Jacob
*Why* is the skew 1.2584? It's normally (and is often assumed to be) **0**. Are you using a normal camera?
Jacob
Poor-quality camera? :) I'm using a Microsoft LifeCam NX-6000 webcam, with K = [fc(1), alpha_c*fc(1), cc(1); 0, fc(2), cc(2); 0, 0, 1]. I tried it with alpha_c = 0... still the same results.
srand
Hmm .. I see your problem, are these faces supposed to be orthogonal?
Jacob
Yes, the faces should be orthogonal
srand
What's your baseline, i.e. |t|? I see you've normalized it, so what's the actual length?
Jacob
I'm not sure. How do I compute the baseline? And I didn't know it was normalized :)
srand
What's the distance between the two cameras in your stereo images?
Jacob
They were moved by hand (two snapshots), so I don't know the exact distance between them.
srand
Try increasing the baseline, I'll keep you updated if I think of anything
Jacob
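Why the baseline matters: in an idealized rectified stereo pair, depth is Z = f*B/d (focal length f in pixels, baseline B, disparity d), so a disparity error Δd produces a depth error of roughly Z^2/(f*B)*Δd. The hand-held setup here is not rectified, so this is only rough intuition. A small Python sketch with hypothetical numbers:

```python
def stereo_depth_error(f_px, baseline, depth, disparity_error_px=1.0):
    """First-order depth error for an idealized rectified stereo pair.

    From Z = f*B/d, dZ/dd = -Z**2/(f*B), so |dZ| ~= Z**2/(f*B) * |dd|.
    f_px is in pixels; baseline and depth share the same length unit.
    """
    return depth ** 2 / (f_px * baseline) * disparity_error_px

# Hypothetical numbers: f ~ 700 px (close to the calibrated fc), object 0.5 m away.
for B in (0.02, 0.05, 0.10):
    error_mm = stereo_depth_error(700.0, B, 0.5) * 1000
    print(f"baseline {B:.2f} m -> {error_mm:.1f} mm depth error per pixel of disparity error")
```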
Doesn't K*[R,t] take care of the normalization?
srand
I was referring to *t* being normalized thus not giving me an idea of the length of the baseline.
Jacob
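On the baseline: the t recovered from an essential matrix is only defined up to scale, and the usual decomposition returns it with unit length, which matches the t posted above. Because the camera centre is C2 = -R' * t and R is a rotation, the baseline in reconstruction units is just |t|. If the real distance between the two shots were measured, a single scale factor would bring the reconstruction to metric units. A Python/NumPy sketch; true_baseline is a hypothetical measured value.

```python
import numpy as np

t = np.array([-0.8754, 0.2249, -0.4279])  # t quoted in the comments
print(np.linalg.norm(t))                   # ~1.0: unit length, says nothing about the real baseline

def to_metric(X, t, true_baseline):
    """Rescale reconstructed points (N, 3), expressed in the frame of P1 = [I | 0],
    so that the camera separation |t| becomes true_baseline (a measured length)."""
    return np.asarray(X, dtype=float) * (true_baseline / np.linalg.norm(t))

X_metric = to_metric([[1.0, 2.0, 3.0]], t, true_baseline=0.10)  # hypothetical 10 cm baseline
```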
Also, how did you get these correspondences?
Jacob
The correspondences were hand picked
srand
As in, you clicked them on the screen? Hmm... maybe you should use an interest point detector like SIFT (try the VLFeat toolbox); subpixel estimation is important in triangulation.
Jacob
If he clicked them on the screen using some tool like cpselect (which is part of the Image Processing Toolbox) using sub-pixel accuracy then it's fine.
jmbr
I've tried VLFeat's SIFT library, but some false positives get through my RANSAC algorithm, so it seems more trouble than it's worth for now. I do use the optimal triangulation method (Algorithm 12.1 in Hartley/Zisserman's book), so that might compensate for some inaccuracies in my hand-picked points.
srand
@jmbr, I just read them off the screen using a paint application. If accuracy is that important, I'll take a look at cpselect.
srand
If you want to hand-pick them, zoom in on the section so you can click them as accurately as possible.
Jacob
I've done manual correspondence matching in the past and it's OK as long as you're careful and use optimization afterwards.
jmbr
Zooming in on the image is something cpselect lets you do easily.
jmbr
I would suggest using VLFeat to generate the interest points and then hand-picking the correspondences. Then you get the benefit of subpixel corners! But if not, do zoom in on the patches in question.
Jacob
So did subpixel corners and/or increasing the baseline work?
Jacob
So, I made an observation. The Microsoft LifeCam NX-6000 Webcam has a 2 megapixel (1.3 effective??) CMOS sensor. My calibration images are 800 x 600 pixels. However, a 2MP camera can take 1600 x 1200 pixel images. Does this affect the calibration results if everything is kept at 800x600 ?
srand
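The calibration applies to the image size it was computed at, so with calibration and test images both at 800x600, K can be used as-is regardless of the sensor's native resolution. Only if images were captured at a different size would K need rescaling: focal lengths, skew term and principal point scale with the image. That assumes the resize is a pure scaling of the same field of view; if a different capture mode crops the sensor, recalibrating is safer. A Python/NumPy sketch under a simple continuous-coordinate convention (the helper name is hypothetical):

```python
import numpy as np

def scale_intrinsics(K, sx, sy):
    """Intrinsics for images resized by (sx, sy), e.g. 800x600 -> 1600x1200 gives 2, 2.

    Treats pixel coordinates as continuous with the origin at the image corner.
    Conventions that put the origin at the centre of the first pixel need an
    extra half-pixel correction on the principal point, which this ignores.
    """
    return np.diag([sx, sy, 1.0]) @ K

K_800x600 = np.array([[699.1346, 1.2584, 393.5180],
                      [0.0, 701.1120, 304.0591],
                      [0.0, 0.0, 1.0]])
K_1600x1200 = scale_intrinsics(K_800x600, 2.0, 2.0)
```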
Also, I gathered a new set of points from new images. This time, I picked points on three sides of the cube (top, left and right). Again, X and Y look good, but Z is near zero (which makes the 3D points look flat). Maybe the calibration is bad?
srand
Also, sub-pixel accuracy shouldn't be necessary, because X and Y are obviously (visually) accurate. A lack of accuracy shouldn't affect only the Z axis.
srand
Setting the skew to zero doesn't affect the reconstruction by much and still exhibits the same problem
srand
Did you obtain points with subpixel accuracy using `cpselect`?
Jacob
I used Paint and the zoom tool :) cpselect doesn't seem to guide the matching, unless I missed something
srand
`cpselect` allows you to zoom and click points on two images, resulting in vectors representing the correspondences; you have to decide the matching yourself.
Jacob
I'm still getting the same results. Is there anything else that could be the culprit?
srand
Could you post the new points selected with cpselect?
Jacob
Updated with the new cpselect points
srand
So, I now triangulate using the essential matrix rather than the fundamental matrix. I also multiply every image correspondence in the triangulation algorithm by K^-1, and my projection matrix for P2 is [R, t]. Now I get varying Z values, and the reconstruction looks semi-decent. What is going on? I thought the optimal triangulation method (Hartley/Zisserman Algorithm 12.1) required F, not E, and it makes no mention of multiplying the image correspondences by K^-1.
srand
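This works because, with the same K for both shots, pixel points with P1 = K * [I | 0] and P2 = K * [R | t] describe the same geometry as the normalized points K^-1 * x with P1 = [I | 0] and P2 = [R | t]; E = K'^T F K is the fundamental matrix of those normalized points. What breaks is mixing the two (pixel points with P1 = [I | 0]). A small Python/NumPy check with a hypothetical pose and points:

```python
import numpy as np

K = np.array([[699.1346, 1.2584, 393.5180],
              [0.0, 701.1120, 304.0591],
              [0.0, 0.0, 1.0]])

def to_normalized(pts_px, K):
    """Apply K^-1 to (N, 2) pixel coordinates; return (N, 2) normalized coordinates."""
    pts_h = np.column_stack([pts_px, np.ones(len(pts_px))])
    xn = np.linalg.solve(K, pts_h.T).T
    return xn[:, :2] / xn[:, 2:3]

def project(P, X):
    """Project (N, 3) points with a 3x4 camera P; return (N, 2) coordinates."""
    x = (P @ np.column_stack([X, np.ones(len(X))]).T).T
    return x[:, :2] / x[:, 2:3]

# Hypothetical pose and points, for illustration only.
Rt = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
X = np.array([[0.1, 0.2, 3.0], [-0.3, 0.1, 4.0], [0.25, -0.2, 3.5]])

pixels_cam2 = project(K @ Rt, X)  # what you would click in image 2
normalized_cam2 = project(Rt, X)  # what P2 = [R | t] predicts
print(np.allclose(to_normalized(pixels_cam2, K), normalized_cam2))
```

One caveat: Algorithm 12.1 minimizes geometric error in whatever coordinates it is given, so running it on normalized points minimizes a slightly different cost than reprojection error in pixels.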
A: 

Note: all references are to Multiple View Geometry in Computer Vision by Hartley and Zisserman.

OK, so there were a few bugs:

  1. When computing the essential matrix (pp. 257-259), the authors state that the correct R, t pair out of the four possible pairs (Result 9.19) is the one for which the 3D points lie in front of both cameras (Fig. 9.12a), but they don't say how to compute this. By chance I was re-reading chapter 6 and found that Section 6.2.3 (p. 162) discusses the depth of points; Result 6.1 gives the equation needed to pick the correct R and t.

  2. In my implementation of the optimal triangulation method (Algorithm 12.1, p. 318), in step 2 I had T2^-1' * F * T1^-1 where I needed (T2^-1)' * F * T1^-1. The former applies the transpose to the -1 rather than to the matrix; the latter transposes the inverted T2 matrix, which is what I wanted (foiled again by MATLAB!).

  3. Finally, I wasn't computing P1 correctly: it should have been P1 = K * [eye(3), zeros(3,1)];. I forgot to multiply by the calibration matrix K.
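Item 1 can be sketched in Python/NumPy (it translates directly to MATLAB): decompose E into the four (R, t) candidates of Result 9.19, triangulate each correspondence, and keep the candidate that puts the points in front of both cameras according to the depth formula of Result 6.1. This sketch swaps in plain linear (DLT) triangulation instead of the optimal method of Algorithm 12.1, assumes the points have already been multiplied by K^-1, uses hypothetical function names, and counts votes over all points rather than trusting a single one.

```python
import numpy as np

# W as defined alongside Result 9.19.
W = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

def candidate_poses(E):
    """The four (R, t) pairs compatible with E (Result 9.19); t has unit length."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so flip U / Vt if needed to get proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    u3 = U[:, 2]
    Ra, Rb = U @ W @ Vt, U @ W.T @ Vt
    return [(Ra, u3), (Ra, -u3), (Rb, u3), (Rb, -u3)]

def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation of one correspondence; returns homogeneous X (4,)."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

def depth(P, X):
    """Result 6.1: depth = sign(det M) * w / (T * ||m3||), with P = [M | p4], PX = w (x, y, 1)."""
    M = P[:, :3]
    w = P[2] @ X
    return np.sign(np.linalg.det(M)) * w / (X[3] * np.linalg.norm(M[2]))

def select_pose(E, x1n, x2n):
    """Pick the (R, t) that puts the most triangulated points in front of both cameras.
    x1n, x2n: (N, 2) normalized coordinates (K^-1 already applied)."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    best, best_count = None, -1
    for R, t in candidate_poses(E):
        P2 = np.hstack([R, t[:, None]])
        count = 0
        for a, b in zip(x1n, x2n):
            X = triangulate_dlt(P1, P2, a, b)
            if depth(P1, X) > 0 and depth(P2, X) > 0:
                count += 1
        if count > best_count:
            best, best_count = (R, t), count
    return best
```

For pixel coordinates, the corresponding cameras are then P1 = K * [I | 0] and P2 = K * [R | t], as in item 3.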

Hope this helps future passersby!

srand
Also thanks Jacob and jmbr :)
srand