a simple/practical example of fuzzy c-means algorithm

i am writing my master's thesis on the subject of dynamic keystroke authentication. to support ongoing research, i am writing code to test out different methods of feature extraction and feature matching.

my current simple approach just checks whether the typed keycodes match the reference password keycodes, and whether the keypress times (dwell) and the key-to-key times (flight) are within +/- 100ms (tolerance) of the reference times. this is of course very limited and i want to extend it with some sort of fuzzy c-means pattern matching.
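for reference, the current check could be sketched roughly like this. it is only an illustration: the function name, the tuple layout (keycode, dwell_ms, flight_ms) and the constant name are assumptions made for this example, not the actual thesis code.

```python
TOLERANCE_MS = 100  # the +/- 100ms tolerance mentioned above


def matches_reference(reference, attempt, tolerance_ms=TOLERANCE_MS):
    """Compare two samples given as lists of (keycode, dwell_ms, flight_ms)."""
    # A different number of keystrokes can never match.
    if len(reference) != len(attempt):
        return False
    for (ref_key, ref_dwell, ref_flight), (key, dwell, flight) in zip(reference, attempt):
        # Keycodes have to be exactly the same.
        if key != ref_key:
            return False
        # Dwell and flight times must each lie within the tolerance.
        if abs(dwell - ref_dwell) > tolerance_ms:
            return False
        if abs(flight - ref_flight) > tolerance_ms:
            return False
    return True
```

every timing is judged on its own with a hard cutoff, which is the limitation a fuzzy membership value is meant to soften.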

for each key the features look like: keycode, dwelltime, flighttime (first flighttime is always 0).

obviously the keycodes can be taken out of the fuzzy algorithm because they have to be exactly the same. in this context, what would a practical implementation of fuzzy c-means look like?

Generally, you would do the following:

1. Determine how many clusters you want (2? "Authentic" and "Fake"?)
2. Determine what elements you want to cluster (individual keystrokes? login attempts?)
3. Determine what your feature vectors will look like (dwell time, flight time?)
4. Determine what distance metric you will be using (how will you measure the distance of each sample from each cluster?)
5. Create exemplar training data for each cluster type (what does an authentic login look like?)
6. Run the FCM algorithm on the training data to generate the clusters
7. To create the membership vector for any given login attempt sample, run it through the FCM algorithm using the clusters you found in step 6
8. Use the resulting membership vector to determine (based on some threshold criteria) whether the login attempt is authentic
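
The steps above could be sketched in plain Python roughly as follows. This is a minimal illustration, not a tuned implementation, and it rests on several assumptions that are not from the question: each login attempt is one feature vector of dwell and flight times (keycodes are checked separately), distance is Euclidean, the fuzzifier is m = 2, the initial centers are picked with a simple farthest-point rule instead of random memberships, and the training values and the 0.7 threshold are made up for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships(sample, centers, m=2.0):
    """Fuzzy membership of one sample in each cluster (values sum to 1)."""
    dists = [distance(sample, c) for c in centers]
    # A sample lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [d == 0.0 for d in dists]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def fuzzy_c_means(samples, n_clusters=2, m=2.0, max_iter=100, tol=1e-6):
    """Return the cluster centers found by alternating FCM updates."""
    # Deterministic start (a choice made for this sketch): first sample,
    # then repeatedly the sample farthest from the centers chosen so far.
    centers = [list(samples[0])]
    while len(centers) < n_clusters:
        far = max(samples, key=lambda s: min(distance(s, c) for c in centers))
        centers.append(list(far))
    for _ in range(max_iter):
        u = [memberships(s, centers, m) for s in samples]
        new_centers = []
        for k in range(n_clusters):
            # Each center is the mean of all samples weighted by u^m.
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            new_centers.append([
                sum(w * s[j] for w, s in zip(weights, samples)) / total
                for j in range(len(samples[0]))
            ])
        shift = max(distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Hypothetical training data for a 3-key password:
# [dwell1, dwell2, dwell3, flight2, flight3] in milliseconds.
authentic = [[90, 100, 95, 150, 160], [95, 105, 90, 155, 150], [92, 98, 100, 148, 165]]
impostor = [[200, 180, 210, 300, 320], [210, 190, 205, 310, 330], [205, 185, 215, 295, 310]]
centers = fuzzy_c_means(authentic + impostor)

# Find out which cluster is the "authentic" one using a known authentic sample.
known = memberships(authentic[0], centers)
authentic_idx = known.index(max(known))


def is_authentic(attempt, threshold=0.7):
    """Accept the attempt if its 'authentic' membership reaches the threshold."""
    return memberships(attempt, centers)[authentic_idx] >= threshold
```

Calling `is_authentic([93, 101, 96, 149, 161])` then scores a new attempt against the learned centers. Note that memberships always sum to 1, so an attempt that resembles neither cluster still gets split between them rather than being rejected outright.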

I'm not an expert, but this seems like an odd approach to determining whether a login attempt is authentic or not. I've seen FCM used for pattern recognition (e.g. which facial expression am I making?), which makes sense because you're dealing with several categories (e.g. happy, sad, angry, etc.), each with defining characteristics. In your case, you really only have one category (authentic) with defining characteristics. Non-authentic keystrokes are simply "not like" authentic keystrokes, so they won't cluster.

Perhaps I am missing something?

thank you, but you left out the context. the problem i'm facing is how to apply the algorithm to the specific problem of having, for example, three training samples and one actual login attempt. it isn't clear how to use the algorithm to match the training data against the login attempt.

fin 2009-10-18 15:41:33
