I'm implementing a face tracker on Android, and as a literature study, would like to identify the underlying technique of Android's FaceDetector.
A brief Google search didn't yield anything informative, so I thought I'd take a look at the code.
By looking at the Java source code, FaceDetector.java
, there isn't much to be learned: FaceDetector
is simply a class that is provided the image dimensions and number of faces, then returns an array of faces.
The Android source contains the JNI code for this class. I followed through the function calls, where, simply put, I learned:
- The "FaceFinder" is created in
FaceFinder.c:75
- On line 90,
bbs_MemSeg_alloc
returns abtk_HFaceFinder
object (which contains the function to actually find faces), essentially copying it thehsdkA->contextE.memTblE.espArrE
array of the originalbtk_HSDK
object initialized within initialize() (FaceDetector_jni.cpp:145
) bybtk_SDK_create()
- It appears that a maze of functions provide each other with pointers and instances of
btk_HSDK
, but nowhere can I find a concrete instantiation ofsdk->contextE.memTblE.espArrE[0]
that supposedly contains the magic.
What I have discovered, is a little clue: the JNI code references a FFTEm library that I can't find the source code for. By the looks of it, however, FFT is Fast Fourier Transform, which is probably used together with a pre-trained neural network. The only literature I can find that aligns with this theory is a paper by Ben-Yacoub et al.
I don't even really know if I'm set on the right path, so any suggestions at all would undoubtedly help.
Edit: I've added a +100 bounty for anybody who can give any insight.