views: 1867
answers: 6
+7  A: 

Gesture recognition, as I've seen it anyway, is usually implemented using machine learning techniques similar to image recognition software. Here's a cool project on CodeProject about doing mouse gesture recognition in C#. I'm sure the concepts are quite similar, since you can likely reduce the problem down to 2D space. If you get something working with this, I'd love to see it. Great project idea!

dustyburwell
@ascalonx, thanks for the link! I'm sure it will be useful.
Simucal
Mouse Gesture Recognition in ActionScript: http://www.bytearray.org/?p=91
jleedev
+4  A: 

One way to look at it is as a compression / recognition problem. Basically, you want to take a whole bunch of data, throw out most of it, and categorize the remainder. If I were doing this (from scratch) I'd probably proceed as follows:

  • work with a rolling history window
  • take the center of gravity of the four points in the start frame, save it, and subtract it out of all the positions in all frames.
  • factor each frame into two components: the shape of the constellation and the movement of its CofG relative to the last frame's.
  • save the absolute CofG for the last frame too
  • the series of CofG changes gives you swipes, waves, etc.
  • the series of constellation morphing gives you pinches, etc.
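A minimal sketch of that decomposition in Python (all names and data shapes here are illustrative, not from any real library; frames are lists of (x, y) points):

```python
def center_of_gravity(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def decompose(frames):
    """For each frame, produce (shape, cog_delta):
    shape     -- points relative to the frame's own CofG (the "constellation")
    cog_delta -- movement of the CofG since the previous frame."""
    origin = center_of_gravity(frames[0])  # start-frame CofG, subtracted everywhere
    prev_cog = (0.0, 0.0)
    out = []
    for frame in frames:
        pts = [(x - origin[0], y - origin[1]) for x, y in frame]
        cog = center_of_gravity(pts)
        shape = [(x - cog[0], y - cog[1]) for x, y in pts]
        out.append((shape, (cog[0] - prev_cog[0], cog[1] - prev_cog[1])))
        prev_cog = cog
    return out
```

The series of `cog_delta` values is what you'd feed to swipe/wave recognizers, and the series of `shape` values to pinch-style recognizers.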

After seeing your photo (two points on each hand, not four points on one, doh!) I'd modify the above as follows:

  • Do the CofG calculation on pairs, with the caveats that:
    • If there are four points visible, pairs are chosen to minimize the product of the intrapair distances
    • If there are three points visible, the closest two are one pair, the other one is the other
    • Use prior / following frames to override when needed
  • Instead of a constellation, you've got a nested structure of distance / orientation pairs (i.e., one D/O between the hands, and one more for each hand).
  • Pass the full reduced data to recognizers for each gesture, and let them sort out what they care about.
  • If you want to get cute, do a little DSL to recognize the patterns, and write things like:

    fire when
        in frame.final: rectangle(points) 
      and
        over frames.final(5): points.all (p => p.jerk)
    

    or

    fire when
        over frames.final(3): hands.all (h => h.click)
    
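The pairing rules in the list above could be sketched roughly like this (pure illustration; the function name is mine, and the prior/following-frame override is left out):

```python
from itertools import combinations
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def pair_points(points):
    """Group visible IR points into hand pairs.
    4 points: choose the pairing minimizing the product of intrapair distances.
    3 points: the closest two are one pair; the leftover stands alone."""
    if len(points) == 4:
        best = None
        for i in (1, 2, 3):  # three possible partners for points[0]
            rest = [points[j] for j in (1, 2, 3) if j != i]
            cost = dist(points[0], points[i]) * dist(rest[0], rest[1])
            if best is None or cost < best[0]:
                best = (cost, [(points[0], points[i]), (rest[0], rest[1])])
        return best[1]
    if len(points) == 3:
        pair = min(combinations(points, 2), key=lambda ab: dist(*ab))
        lone = next(p for p in points if p not in pair)
        return [pair, (lone,)]
    return [tuple(points)]
```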
MarkusQ
@MarkusQ, thanks for the comments. Just for reference, the Netherlands students' clicking algorithm works as follows: if the wiimote loses track of a point and the other point in its pair was within a closeness threshold, then it is a "click".
Simucal
As the fingers come together, the wiimote sees both fingers as one blob rather than two, so it loses one of its points. This can also happen when your hands are no longer visible, so the closeness threshold is used to prevent false positives.
Simucal
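The clicking heuristic described in the comments above might look something like this (the threshold value is illustrative and would need tuning for your camera and distances):

```python
import math

CLICK_THRESHOLD = 30.0  # illustrative units; tune for your setup

def detect_click(prev_pair, curr_points):
    """prev_pair: the two points of one hand in the previous frame.
    curr_points: the points still visible for that hand this frame.
    A point that vanishes while its partner was close counts as a click."""
    if len(prev_pair) == 2 and len(curr_points) == 1:
        a, b = prev_pair
        return math.hypot(a[0] - b[0], a[1] - b[1]) <= CLICK_THRESHOLD
    return False
```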
A: 

I'm not very well versed in this type of mathematics, but I have read somewhere that people sometimes use Markov Chains or Hidden Markov Models to do Gesture Recognition.

Perhaps someone with a little more background in this side of Computer Science can illuminate it further and provide some more details.
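For what it's worth, the usual shape of the idea is: quantize the motion into discrete observation symbols, then score the sequence under a small HMM per gesture and pick the most likely one. A toy illustration (these models are made up for the example, not trained):

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) via the forward algorithm.
    start[i]: P(state i at t=0); trans[i][j]: P(i -> j); emit[i][o]: P(o | i)."""
    alpha = [start[i] * emit[i][obs[0]] for i in range(len(start))]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(len(alpha))) * emit[j][o]
                 for j in range(len(start))]
    return sum(alpha)

# Toy single-state models: "right-swipe" mostly emits symbol 0, "left-swipe" symbol 1.
MODELS = {
    "right-swipe": ([1.0], [[1.0]], [{0: 0.9, 1: 0.1}]),
    "left-swipe":  ([1.0], [[1.0]], [{0: 0.1, 1: 0.9}]),
}

def classify(obs):
    return max(MODELS, key=lambda m: forward_likelihood(obs, *MODELS[m]))
```

A real recognizer would train the transition and emission tables from recorded gestures (e.g. with Baum-Welch) rather than hand-writing them.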

A: 

Err.. I've been working on gesture recognition for the past year or so now, but I don't want to say too much because I'm trying to patent my technology :) But... we've had some luck with adaptive boosting, although what you're doing looks fundamentally different. You only have 4 points of data to process, so I don't think you really need to "reduce" anything.

What I would investigate is how programs like Flash turn a freehand-drawn circle into an actual circle. It seems like you could track the points for a duration of about a second, then "smooth" the path in some fashion, and then you could probably get away with hardcoding your gestures (if you make them simple enough). Otherwise, yes, you're going to want to use a learning algorithm. Neural nets might work... I don't know. Just tossing out ideas :) Maybe look at how OCR is done too... or even Hough transforms. It looks to me like this is a problem of recognizing shapes more than it is of recognizing gestures.
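The "smooth the path first" step could be as simple as a moving average over the tracked positions (the window size here is an arbitrary choice):

```python
def smooth_path(path, window=3):
    """Moving average over a list of (x, y) samples."""
    out = []
    for i in range(len(path)):
        chunk = path[max(0, i - window + 1):i + 1]
        n = len(chunk)
        out.append((sum(p[0] for p in chunk) / n,
                    sum(p[1] for p in chunk) / n))
    return out
```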

Mark
Well, fundamentally, drawing a circle, an x, or swiping all 4 points across in different directions ~are~ gestures. In my 2d world my gestures are shapes. I'll have to look further into the learning algorithms though.
Simucal
Well, yes, they *are* gestures, I just mean that if you can figure out what shape it makes, you can also figure out what gesture it was. i.e., I think the gesture recognition is reducible to shape recognition (which may be an easier problem to solve -- less probabilistic).
Mark
+1  A: 

Here's a video of what has been done with this sort of technology, if anyone is interested:

Pattie Maes demos the Sixth Sense - TED 2009

Elijah Glover
A: 

Most simple gesture-recognition tools I've looked at use a vector-based template to recognize them. For example, you can define right-swipe as "0", a checkmark as "-45, 45, 45", a clockwise circle as "0, -45, -90, -135, 180, 135, 90, 45, 0", and so on.
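A minimal sketch of that template scheme (the angle convention and templates here are illustrative, and a real matcher would tolerate noise rather than require exact equality):

```python
import math

TEMPLATES = {
    "right-swipe": [0],
    "checkmark": [-45, 45, 45],
}

def quantize(path):
    """Snap each stroke segment to the nearest 45-degree direction."""
    dirs = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        ang = math.degrees(math.atan2(y1 - y0, x1 - x0))
        dirs.append(int(round(ang / 45)) * 45)
    return dirs

def match(path):
    dirs = quantize(path)
    for name, template in TEMPLATES.items():
        if dirs == template:
            return name
    return None
```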

Ignacio Vazquez-Abrams