Hi all,

For my job I've been using a Java port of ARToolkit (NyARToolkit). So far it has proven good enough for our needs, but my boss is starting to want the framework ported to other platforms such as the web (Flash, etc.) and mobile. While I suppose I could use other ports, I'm increasingly annoyed by not knowing how the kit works, and beyond that, by some of its limitations. Later I'll also need to extend the kit's abilities to add things like interaction (virtual buttons on cards, etc.), which, as far as I've seen, NyARToolkit doesn't support.

So basically, I need to replace ARToolkit with a custom marker detector (and, in the case of NyARToolkit, try to get rid of JMF and use a better solution via JNI). However, I don't know how these detectors work. I know about 3D graphics and I've built a nice framework around it, but I need to know how to build the underlying tech :-).

Does anyone know any sources on how to implement a marker-based augmented reality application from scratch? When searching on Google I only find "applications" of AR, not the underlying algorithms :-/.

+1  A: 

'From scratch' is a relative term. Truly doing it from scratch, without using any pre-existing vision code, would be very painful, and you wouldn't do a better job of it than the entire computer vision community has.

However, if you want to do AR with existing vision code, this is more reasonable. The essential sub-tasks are:

  1. Find the markers in your image or video.
  2. Make sure they are the ones you want.
  3. Figure out how they are oriented relative to the camera.

The first task is keypoint localization. Techniques for this include SIFT keypoint detection, the Harris corner detector, and others. Some of these have open-source implementations; I think OpenCV exposes the Harris corner detector through its GoodFeaturesToTrack function.
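Since you're in Java, here's a minimal sketch of that step using OpenCV's Java bindings. This assumes the native library is already loaded (System.loadLibrary) and that frame is one camera frame as a Mat; the threshold values are arbitrary starting points, not tuned ones:

    import org.opencv.core.Mat;
    import org.opencv.core.MatOfPoint;
    import org.opencv.imgproc.Imgproc;

    public class CornerDemo {
        // Finds strong Harris corners in one camera frame.
        static MatOfPoint findCorners(Mat frame) {
            // Corner detectors work on a single-channel image.
            Mat gray = new Mat();
            Imgproc.cvtColor(frame, gray, Imgproc.COLOR_BGR2GRAY);

            MatOfPoint corners = new MatOfPoint();
            Imgproc.goodFeaturesToTrack(
                gray, corners,
                100,        // max corners to return
                0.01,       // quality level relative to the strongest corner
                10.0,       // min distance between corners, in pixels
                new Mat(),  // empty mask: search the whole image
                3,          // neighbourhood size for the gradient covariance
                true,       // use the Harris measure instead of Shi-Tomasi
                0.04);      // Harris free parameter k
            return corners;
        }
    }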

The second task is computing region descriptors. Techniques for this include SIFT descriptors, HOG descriptors, and many, many others. There should be an open-source implementation of one of these somewhere.
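As one illustration (not the only choice), HOG is available in OpenCV's Java bindings through the HOGDescriptor class. A hedged sketch, assuming patch is an 8-bit candidate marker region you've already cut out of the frame:

    import org.opencv.core.Mat;
    import org.opencv.core.MatOfFloat;
    import org.opencv.core.Size;
    import org.opencv.imgproc.Imgproc;
    import org.opencv.objdetect.HOGDescriptor;

    public class DescriptorDemo {
        // Turns a candidate marker patch into a HOG feature vector.
        static MatOfFloat describe(Mat patch) {
            // The default HOG window is 64x128, so normalise the patch size first.
            Mat resized = new Mat();
            Imgproc.resize(patch, resized, new Size(64, 128));

            HOGDescriptor hog = new HOGDescriptor();
            MatOfFloat descriptor = new MatOfFloat();
            hog.compute(resized, descriptor);
            return descriptor; // compare this against your stored marker descriptors
        }
    }

Matching then boils down to comparing this vector against the descriptors of your known markers, e.g. by Euclidean distance.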

The third task is also handled by keypoint localizers. Ideally you want an affine transformation, since this will tell you how the marker is sitting in 3-space. The Harris affine detector should work for this; for more details, see: http://en.wikipedia.org/wiki/Harris_affine_region_detector
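One note for square fiducial markers specifically (the ARToolkit case): instead of an affine region detector, the usual route to the transformation matrix is to take the four detected corner points of the marker and solve the perspective-n-point problem. A hedged sketch with OpenCV's Java bindings; imageCorners and cameraMatrix (your calibrated camera intrinsics) are assumed to come from earlier steps, and the marker size here is a made-up value:

    import org.opencv.calib3d.Calib3d;
    import org.opencv.core.Mat;
    import org.opencv.core.MatOfDouble;
    import org.opencv.core.MatOfPoint2f;
    import org.opencv.core.MatOfPoint3f;
    import org.opencv.core.Point3;

    public class PoseDemo {
        // Recovers the marker's rotation and translation relative to the camera.
        static void estimatePose(MatOfPoint2f imageCorners, Mat cameraMatrix) {
            double s = 0.04; // half the marker's side length, in metres (assumed)
            // The marker's corners in its own coordinate frame (the z = 0 plane).
            MatOfPoint3f objectCorners = new MatOfPoint3f(
                new Point3(-s,  s, 0), new Point3( s,  s, 0),
                new Point3( s, -s, 0), new Point3(-s, -s, 0));

            Mat rvec = new Mat();
            Mat tvec = new Mat();
            Calib3d.solvePnP(objectCorners, imageCorners, cameraMatrix,
                             new MatOfDouble(), // no lens distortion modelled
                             rvec, tvec);

            // Expand the Rodrigues rotation vector into a 3x3 rotation matrix;
            // [R|t] is the model-view transform you hand to the 3D renderer.
            Mat rotation = new Mat();
            Calib3d.Rodrigues(rvec, rotation);
        }
    }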

forefinger
Thanks for the references. I'm not looking to do a better job than the entire computer vision community, I'm just looking to implement the subset required for the particular task of finding the markers :-). However, what I need to know is how every part works, from the step of having a bitmap with the frame to the construction of the transformation matrix used to place 3D objects. In detail :-) Basically: 1. Get the image from the camera. 2. Convert it to RGB or some other processable format. 3. ??? N. Use the transformation matrix. I need to know exactly the steps 3..N-1 :-)
Bad Sector
I suggest buying the O'Reilly OpenCV book.
forefinger
Does it explain the algorithms or just the API?
Bad Sector
It explains both.
forefinger