The Screen-to-world problem on the iPhone

I have a 3D model (a cube) rendered in an EAGLView and I want to be able to detect when I am touching the center of a given face (from any orientation angle) of the cube. Sounds pretty easy but it is not...

The problem:
How do I accurately relate screen coordinates (the touch point) to world coordinates (a location in OpenGL 3D space)? Sure, converting a given point into a 'percentage' of the screen/world axis might seem the logical fix, but problems arise when I need to zoom or rotate the 3D space. Note: rotating and zooming in and out of the 3D space will change the relationship of the 2D screen coords to the 3D world coords... Also, you'd have to allow for the 'distance' between the viewpoint and objects in 3D space. At first this might seem like an 'easy task', but that changes when you actually examine the requirements. And I've found no examples of people doing this on the iPhone. How is this normally done?

An 'easy' task?:
Sure, one might undertake the task of writing an API to act as a go-between for screen and world coordinates, but creating such a framework would require some serious design and would likely take 'time' to do -- NOT something that can be one-manned in 4 hours... And 4 hours happens to be my deadline.

The question:

  • What are some of the simplest ways to know if I touched specific locations in 3D space in the iPhone OpenGL ES world?
A: 

Google for opengl screen to world (for example, there’s a thread on GameDev.net where somebody wants to do exactly what you are looking for). There is a gluUnProject function that does precisely this, but it’s not available on the iPhone, so you have to port it (see this source from the Mesa project). Or maybe there’s already some publicly available source somewhere?

zoul
Is associating touches with areas on 3D models in an EAGLView really this complicated? There seems to be little coverage on this topic. That link you provided really did not give me any clues for achieving this in Xcode. Thanks for the reply, however.
RexOnRoids
I think that it really is that complicated and that the link answers your question, but let’s wait – maybe somebody will come up with a better answer.
zoul
EAGLView and OpenGL are only tools for drawing your models. They don't take care of hit detection or anything like that. You have to do that.
Rhythmic Fistman
+1  A: 

Imagine a line that extends from the viewer's eye through the screen touch point into your 3D model space.

If that line intersects any of the cube's faces, then the user has touched the cube.
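
To make that concrete, here is a minimal sketch (plain C++; it assumes the cube is a unit cube centered at the origin and that the ray has already been brought into the cube's model space) that tests one face and whether the hit lands near its center:

struct Vec3 { float x, y, z; };

// Hedged sketch: does the ray hit the cube's +Z face (the plane z = 0.5,
// bounded by |x| <= 0.5 and |y| <= 0.5), and if so, is the hit near the face center?
bool hitsFrontFace(Vec3 rayOrigin, Vec3 rayDir, float centerTolerance, bool* nearCenter)
{
    if (rayDir.z == 0.0f) return false;            // ray is parallel to the face plane
    float t = (0.5f - rayOrigin.z) / rayDir.z;     // parametric distance to the plane z = 0.5
    if (t < 0.0f) return false;                    // the face is behind the ray origin
    float hx = rayOrigin.x + t * rayDir.x;         // hit point on the plane
    float hy = rayOrigin.y + t * rayDir.y;
    if (hx < -0.5f || hx > 0.5f || hy < -0.5f || hy > 0.5f) return false;
    *nearCenter = (hx * hx + hy * hy) <= centerTolerance * centerTolerance;
    return true;
}

// The other five faces are the same test with the axes and signs swapped.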

Rhythmic Fistman
That would work if it was just a simple cube with 8 vertices. Mine is actually a 3D-Lego cube with 500+ vertices (from the cylinders on top) and the data is automatically loaded from a blender export file -- i.e.: I have no idea which of the 500+ vertices to track as my 8 corners and thus cannot track my faces.
RexOnRoids
Why don't you calculate/guesstimate a bounding rectangular prism for your lego block, then check that for intersection. Go on, you know you want to.
Rhythmic Fistman
How would I 'check' a bounding box for an intersection with respect to a touch that is a 2D point on the screen?
RexOnRoids
Just as you would with the cube: check for intersection against its 6 faces.
Rhythmic Fistman
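
Concretely, a ray/box "slab" test against an axis-aligned bounding box might look like the sketch below (plain C++, no particular math library; the ray origin, direction and box corners are x, y, z triples assumed to be in the same coordinate space):

// Hedged sketch: returns true if the ray (origin o, direction d) hits the
// axis-aligned box [bmin, bmax].
bool rayHitsAABB(const float o[3], const float d[3],
                 const float bmin[3], const float bmax[3])
{
    float tMin = -1e30f, tMax = 1e30f;
    for (int axis = 0; axis < 3; ++axis) {
        if (d[axis] != 0.0f) {
            float t1 = (bmin[axis] - o[axis]) / d[axis];
            float t2 = (bmax[axis] - o[axis]) / d[axis];
            if (t1 > t2) { float tmp = t1; t1 = t2; t2 = tmp; }
            if (t1 > tMin) tMin = t1;
            if (t2 < tMax) tMax = t2;
            if (tMin > tMax) return false;      // the slabs no longer overlap
        } else if (o[axis] < bmin[axis] || o[axis] > bmax[axis]) {
            return false;                       // parallel to this slab and outside it
        }
    }
    return tMax >= 0.0f;                        // box is not entirely behind the ray origin
}
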
Google for "line mesh intersection". It is not a trivial task to do it efficiently.
Marco Mustapic
If he were ray-tracing the lego block, then, no, a bounding box alone would not suffice. Luckily, he's just trying to detect a "touch" in the centre of one of its faces. Sounds pretty simple to me.
Rhythmic Fistman
+1  A: 

Two solutions present themselves. Both of them should achieve the end goal, albeit by different means: rather than answering "what world coordinate is under the mouse?", they answer the question "what object is rendered under the mouse?".

One is to draw a simplified version of your model to an off-screen buffer, rendering the center of each face using a distinct color (and adjusting the lighting so each color is preserved exactly). You can then detect those colors in the buffer (e.g. a pixmap) and map mouse locations to them.
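
For the first approach, a hedged sketch in OpenGL ES 1.1 style (drawCubeFace, touchX, touchY and viewportHeight are placeholders for whatever the app already has; the important parts are rendering with flat ID colors and reading back one pixel before the frame is presented):

// Disable anything that would alter the flat ID colors.
glDisable(GL_LIGHTING);
glDisable(GL_TEXTURE_2D);
glDisable(GL_BLEND);
glDisableClientState(GL_COLOR_ARRAY);      // per-vertex colors would override glColor

for (int face = 0; face < 6; ++face) {
    // Encode the face index in the red channel (face 0 -> red 1, etc.).
    glColor4ub(face + 1, 0, 0, 255);
    drawCubeFace(face);                    // placeholder: draws one face of the cube
}

// Read the pixel under the touch; the GL window origin is the lower-left corner,
// so the touch's y coordinate has to be flipped.
GLubyte pixel[4];
glReadPixels(touchX, viewportHeight - touchY, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixel);

int touchedFace = pixel[0] - 1;            // -1 means the background was touched
// Do this into a buffer that is never presented (or before presenting the frame),
// so the ID colors never appear on screen.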

The other is to use OpenGL picking. There's a decent-looking tutorial here. The basic idea is to put OpenGL in select mode, restrict the viewport to a small (perhaps 3x3 or 5x5) window around the point of interest, and then render the scene (or a simplified version of it) using OpenGL "names" (integer identifiers) to identify the components making up each face. At the end of this process, OpenGL can give you a list of the names that were rendered in the selection viewport. Mapping these identifiers back to original objects will let you determine what object is under the mouse cursor.
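
And for the second approach, classic select-mode picking looks roughly like the sketch below. Note this is desktop OpenGL only: as the comment under this answer points out, select mode is not available in OpenGL ES 1.1, so on the iPhone you would fall back to the color-buffer or ray-casting techniques. drawCubeFace and the gluPerspective parameters are placeholders.

GLuint selectBuf[64];
GLint viewport[4];

glGetIntegerv(GL_VIEWPORT, viewport);
glSelectBuffer(64, selectBuf);             // where hit records will be written
glRenderMode(GL_SELECT);

glInitNames();
glPushName(0);                             // name stack must be non-empty before glLoadName

glMatrixMode(GL_PROJECTION);
glPushMatrix();
glLoadIdentity();
// Restrict rendering to a 5x5 pixel region around the touch point (window y is flipped).
gluPickMatrix((GLdouble)touchX, (GLdouble)(viewport[3] - touchY), 5.0, 5.0, viewport);
gluPerspective(45.0, (GLdouble)viewport[2] / viewport[3], 0.1, 100.0); // same projection as the normal pass

glMatrixMode(GL_MODELVIEW);
for (int face = 0; face < 6; ++face) {
    glLoadName(face + 1);                  // tag the geometry that follows with this "name"
    drawCubeFace(face);                    // placeholder: draws one face
}

glMatrixMode(GL_PROJECTION);
glPopMatrix();
glMatrixMode(GL_MODELVIEW);

// Returns the number of hit records; each record is {name count, min depth, max depth, names...}.
GLint hits = glRenderMode(GL_RENDER);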

Michael E
Just a note for the iPhone: there is no OpenGL picking (select mode) in OpenGL ES 1.1, so you're out of luck.
jv42
+1  A: 

You need the OpenGL projection and modelview matrices. Multiply them to get the modelview-projection matrix. Invert this matrix to get a matrix that transforms clip-space coordinates into world coordinates. Transform your touch point so it corresponds to clip coordinates: the center of the screen should be zero, while the edges should be +1/-1 for X and Y respectively.

Construct two points, one at (touch_x, touch_y, -1) and one at (touch_x, touch_y, 1), and transform both by the inverse modelview-projection matrix.

Divide each result by its w component (the inverse of the perspective divide).

You should get two points, on the near and far planes, describing a line from the camera through your touch point into "the far distance" (the far plane).
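
As a hedged sketch of these steps, using GLM purely for the matrix math (the code at the end of this answer uses its own vec4/camera types instead); ndcX/ndcY stand for the touch point already mapped to the -1..1 range:

#include <glm/glm.hpp>
#include <glm/gtc/type_ptr.hpp>

// modelview and projection are the float[16] matrices used to render the scene
// (on OpenGL ES 1.1 they can be read back with glGetFloatv(GL_MODELVIEW_MATRIX, ...)).
glm::mat4 mv     = glm::make_mat4(modelview);
glm::mat4 proj   = glm::make_mat4(projection);
glm::mat4 invMVP = glm::inverse(proj * mv);

glm::vec4 nearPoint = invMVP * glm::vec4(ndcX, ndcY, -1.0f, 1.0f);  // near plane
glm::vec4 farPoint  = invMVP * glm::vec4(ndcX, ndcY,  1.0f, 1.0f);  // far plane

// Homogeneous divide -- the "inverse of the perspective divide".
nearPoint /= nearPoint.w;
farPoint  /= farPoint.w;

// The pick ray in world space runs from nearPoint towards farPoint.
glm::vec3 rayOrigin = glm::vec3(nearPoint);
glm::vec3 rayDir    = glm::normalize(glm::vec3(farPoint) - glm::vec3(nearPoint));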

Do picking based on simplified bounding boxes of your models. You should be able to find ray/box intersection algorithms aplenty on the web.

Another solution is to paint each of the models in a slightly different color into an offscreen buffer and read the color at the touch point from there, which tells you which brick was touched.

Here's source for a cursor I wrote for a little project using bullet physics:

// Map the touch/mouse position to normalized device coordinates (-1..1, y pointing up).
float x = ((float)mpos.x / screensize.x) *  2.0f - 1.0f;
float y = ((float)mpos.y / screensize.y) * -2.0f + 1.0f;

// Unproject the touch point on the far plane (NDC z = 1) and do the homogeneous divide.
p2 = renderer->camera.unProject(vec4(x, y, 1.0f, 1));
p2 /= p2.w;

// Ray start: near the camera (translation column of the camera transform),
// nudged slightly towards the far point.
vec4 pos = activecam.GetView().col_t;
p1 = pos + (((vec3)p2 - (vec3)pos) / 2048.0f * 0.1f);
p1.w = 1.0f;

// Cast the ray through the Bullet world and keep the closest hit.
btCollisionWorld::ClosestRayResultCallback rayCallback(btVector3(p1.x, p1.y, p1.z), btVector3(p2.x, p2.y, p2.z));
game.dynamicsWorld->rayTest(btVector3(p1.x, p1.y, p1.z), btVector3(p2.x, p2.y, p2.z), rayCallback);

if (rayCallback.hasHit())
{
    btRigidBody* body = btRigidBody::upcast(rayCallback.m_collisionObject);
    if (body == game.worldBody)
    {
        // Hit the static world geometry: clear any highlight.
        renderer->setHighlight(0);
    }
    else if (body)
    {
        // Hit a dynamic body: look up the game entity attached to it and highlight it.
        Entity* ent = (Entity*)body->getUserPointer();
        if (ent)
        {
            renderer->setHighlight(dynamic_cast<ModelEntity*>(ent));
            //cerr << "hit " << ent->getName() << endl;
        }
    }
}
heeen
+2  A: 

You can now find gluUnProject in http://code.google.com/p/iphone-glu/. I've no association with the iphone-glu project and haven't tried it myself yet; I just wanted to share the link.

How would you use such a function? This PDF mentions that:

The Utility Library routine gluUnProject() performs this reversal of the transformations. Given the three-dimensional window coordinates for a location and all the transformations that affected them, gluUnProject() returns the world coordinates from where it originated.

int gluUnProject(GLdouble winx, GLdouble winy, GLdouble winz, 
const GLdouble modelMatrix[16], const GLdouble projMatrix[16], 
const GLint viewport[4], GLdouble *objx, GLdouble *objy, GLdouble *objz);

Map the specified window coordinates (winx, winy, winz) into object coordinates, using transformations defined by a modelview matrix (modelMatrix), projection matrix (projMatrix), and viewport (viewport). The resulting object coordinates are returned in objx, objy, and objz. The function returns GL_TRUE, indicating success, or GL_FALSE, indicating failure (such as a noninvertible matrix). This operation does not attempt to clip the coordinates to the viewport or eliminate depth values that fall outside of glDepthRange().

There are inherent difficulties in trying to reverse the transformation process. A two-dimensional screen location could have originated from anywhere on an entire line in three-dimensional space. To disambiguate the result, gluUnProject() requires that a window depth coordinate (winz) be provided and that winz be specified in terms of glDepthRange(). For the default values of glDepthRange(), winz at 0.0 will request the world coordinates of the transformed point at the near clipping plane, while winz at 1.0 will request the point at the far clipping plane.

Example 3-8 (again, see the PDF) demonstrates gluUnProject() by reading the mouse position and determining the three-dimensional points at the near and far clipping planes from which it was transformed. The computed world coordinates are printed to standard output, but the rendered window itself is just black.
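
Putting that together, a typical desktop-style usage would be something like the sketch below; it is only an illustration, with touchX/touchY standing for the touch location in view coordinates, and on the iPhone you would call the equivalent function from a port such as iphone-glu (and read back float rather than double matrices):

GLdouble model[16], proj[16];
GLint viewport[4];
glGetDoublev(GL_MODELVIEW_MATRIX, model);
glGetDoublev(GL_PROJECTION_MATRIX, proj);
glGetIntegerv(GL_VIEWPORT, viewport);

// GL window coordinates have their origin at the lower-left corner, so flip y.
GLdouble winX = touchX;
GLdouble winY = viewport[3] - touchY;

GLdouble nearX, nearY, nearZ, farX, farY, farZ;
gluUnProject(winX, winY, 0.0, model, proj, viewport, &nearX, &nearY, &nearZ); // near plane
gluUnProject(winX, winY, 1.0, model, proj, viewport, &farX,  &farY,  &farZ);  // far plane

// The segment (nearX, nearY, nearZ) -> (farX, farY, farZ) is the pick ray to test
// against the cube's faces or its bounding box.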

In terms of performance, I found this quickly via Google as an example of what you might not want to do using gluUnProject, with a link to what might lead to a better alternative. I have absolutely no idea how applicable it is to the iPhone, as I'm still a newb with OpenGL ES. Ask me again in a month. ;-)

Louis St-Amour