Gesture recognition, as I've seen it anyway, is usually implemented using machine learning techniques similar to image recognition software. Here's a cool project on codeproject about doing mouse gesture recognition in c#. I'm sure the concepts are quite similar since you can likely reduce the problem down to 2D space. If you get something working with this, I'd love to see it. Great project idea!
One way to look at it is as a compression / recognition problem. Basically, you want to take a whole bunch of data, throw out most of it, and categorize the remainder. If I were doing this (from scratch) I'd probably proceed as follows:
- work with a rolling history window
- take the center of gravity of the four points in the start frame, save it, and subtract it out of all the positions in all frames.
- factor each frame into two components: the shape of the constellation and the movement of it's CofG relative to the last frame's.
- save the absolute CofG for the last frame too
- the series of CofG changes gives you swipes, waves, etc.
- the series of constellation morphing gives you pinches, etc.
After seeing your photo (two points on each hand, not four points on one, doh!) I'd modify the above as follows:
- Do the CofG calculation on pairs, with the caveats that:
- If there are four points visible, pairs are chosen to minimize the product of the intrapair distances
- If there are three points visible, the closest two are one pair, the other one is the other
- Use prior / following frames to override when needed
- Instead of a constellation, you've got a nested structure of distance / orientation pairs (i.e., one D/O between the hands, and one more for each hand).
- Pass the full reduced data to recognizers for each gesture, and let them sort out what they care about.
If you want to get cute, do a little DSL to recognize the patterns, and write things like:
fire when in frame.final: rectangle(points) and over frames.final(5): points.all (p => p.jerk)
or
fire when over frames.final(3): hands.all (h => h.click)
I'm not very well versed in this type of mathematics, but I have read somewhere that people sometimes use Markov Chains or Hidden Markov Models to do Gesture Recognition.
Perhaps someone with a little more background in this side of Computer Science can illuminate it further and provide some more details.
Err.. I've been working on gesture recognition for the past year or so now, but I don't want to say too much because I'm trying to patent my technology :) But... we've had some luck with adaptive boosting, although what you're doing looks fundamentally different. You only have 4 points of data to process, so I don't think you really need to "reduce" anything.
What I would investigate is how programs like Flash turn a freehand drawn circle into an actual circle. It seems like you could track the points for duration of about a second, and then "smooth" the path in some fashion, and then you could probably get away with hardcoding your gestures (if you make them simple enough). Otherwise, yes, you're going to want to use a learning algorithm. Neural nets might work... I don't know. Just tossing out ideas :) Maybe look at how OCR is done too... or even Hough transforms. It looks to me like this is a problem of recognizing shapes more than it is of recognizing gestures.
A video of what has been done with this sort of technology, if anyone is interested?
Most simple gesture-recognition tools I've looked at use a vector-based template to recognize them. For example, you can define right-swipe as "0", a checkmark as "-45, 45, 45", a clockwise circle as "0, -45, -90, -135, 180, 135, 90, 45, 0", and so on.