I'm working on a project and would like your ideas and advice.

First, let me describe my problem.

A machine has a power button and some other keys, and only one user is authorised to use it. There are no other authentication methods, and the machine is in a public area of a company.

The machine is operated by pressing the power button and some other keys in combination. The order of the key presses is secret, but we don't trust that alone: anybody can learn the password and gain access to the machine.

I can measure how long each key is held, and also some other metrics such as the time differences between key presses (for example between horizontal or vertical key presses).

All this means I have some inputs.

Now I'm trying to build a user profile by analysing these inputs.

My idea is to have the authorised user enter the password n times and derive a threshold, or something similar, from those entries.

This method could also be called biometrics: anyone else who knows the button combination can try the password, but if their timing falls outside this range they are denied access.
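One way to read the "enter it n times and derive a threshold" idea is a per-feature tolerance band. The sketch below is only an illustration under assumptions: the feature layout, the sample values, the multiplier `k`, and the `min_std` floor are all made up, and real data may need a different rule.

```python
from statistics import mean, stdev

def build_profile(samples):
    """Return a (mean, standard deviation) pair per feature.

    samples: list of equal-length lists, one list per enrollment entry.
    """
    columns = list(zip(*samples))
    return [(mean(col), stdev(col)) for col in columns]

def is_accepted(profile, attempt, k=2.0, min_std=0.01):
    """Accept only if every feature lies within k standard deviations of its mean."""
    for (mu, sigma), value in zip(profile, attempt):
        tolerance = k * max(sigma, min_std)  # floor avoids a zero-width band
        if abs(value - mu) > tolerance:
            return False
    return True

# Hypothetical enrollment entries, in seconds:
# [hold time of key 1, gap between key 1 and key 2, hold time of key 2]
enrollment = [
    [0.12, 0.30, 0.10],
    [0.14, 0.28, 0.11],
    [0.13, 0.33, 0.09],
    [0.11, 0.31, 0.10],
    [0.13, 0.29, 0.12],
]
profile = build_profile(enrollment)

print(is_accepted(profile, [0.13, 0.30, 0.10]))  # timing close to the enrolled pattern
print(is_accepted(profile, [0.25, 0.60, 0.20]))  # same keys, much slower timing
```

Raising `k` makes the check more tolerant (fewer false rejections, more false acceptances), and lowering it does the opposite.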

How can I build this into my algorithms? Where should I start?

I don't want to delve deep into machine learning. I can also see that on my first try the false positive and false negative rates may be really high, but I can manage that by changing my inputs.

Thanks.

+1  A: 

Hey, cool idea.

To me this seems like a good candidate for a classification problem. You have two classes (correct password input / incorrect), and your data could be the times (measured from time 0) at which the buttons were pressed. You could teach a learning algorithm by giving it several examples of correct password data and incorrect password data. Once your classifier is trained and working satisfactorily, you could use it to predict whether new password input attempts are correct.

You could try out several classifiers from Weka, a GUI-based machine learning tool: http://www.cs.waikato.ac.nz/ml/weka/

What you need is your data in a simple table format for experimenting in Weka, something like the following:

Attempt No | 1st button time | 2nd button time | 3rd button time | is_correct
-----------|-----------------|-----------------|-----------------|------------
     1     |    1.2          |    1.5          |  2.4            | YES
     2     |    1.3          |    1.8          |  2.2            | YES
     3     |    1.1          |    1.9          |  2.0            | YES
     4     |    0.8          |    2.1          |  2.9            | NO
     5     |    1.2          |    1.9          |  2.2            | YES
     6     |    1.1          |    1.8          |  2.1            | NO

This would be a training set. The outcome (which is known) is the class is_correct. You would run this data through Weka, selecting a classifier (Naive Bayes, for example). This would produce a classifier (for example a set of rules) which could be used to predict future entries.
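Weka's native input is the ARFF text format, so the table above could be written out roughly as follows. This is a sketch: the relation name and attribute names are placeholders chosen here, and the "Attempt No" column is left out on the assumption that it is only a row identifier, not a feature.

```python
# Rows copied from the table above: three button times and the known class.
rows = [
    (1.2, 1.5, 2.4, "YES"),
    (1.3, 1.8, 2.2, "YES"),
    (1.1, 1.9, 2.0, "YES"),
    (0.8, 2.1, 2.9, "NO"),
    (1.2, 1.9, 2.2, "YES"),
    (1.1, 1.8, 2.1, "NO"),
]

def to_arff(rows, relation="password_attempts"):
    """Render the training rows as ARFF text (attribute names are placeholders)."""
    lines = [f"@relation {relation}", ""]
    for name in ("button1_time", "button2_time", "button3_time"):
        lines.append(f"@attribute {name} numeric")
    lines.append("@attribute is_correct {YES,NO}")  # nominal class attribute
    lines.append("")
    lines.append("@data")
    for t1, t2, t3, label in rows:
        lines.append(f"{t1},{t2},{t3},{label}")
    return "\n".join(lines) + "\n"

arff_text = to_arff(rows)
print(arff_text)
```

Saving `arff_text` to a file ending in `.arff` should give something the Weka Explorer can open; the last attribute is the one to select as the class.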

darren
I have a doubt about training the classifier. Let's say I put my data in a table and apply one of the algorithms. What do I get as a result? I'm planning to write my code in C++, so do I get some kind of classifier library to include in my algorithm, or ...?
berkay
Good classifier, C4 algorithm, just check it out!
The Elite Gentleman
The classifier you get is essentially a set of rules, or a decision tree, etc. Basically, it is a way for the program to predict the outcome for new samples. In this example, you would run your training set through an algorithm (C4 is a good suggestion) and get back rules that may say something like "if button one < 1.2 sec AND button two < 1.6 sec THEN class is_correct = YES".
darren
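To make "what you get back" concrete: a learned rule of that shape can be carried into your own program as a plain function. The sketch below just hard-codes the illustrative thresholds quoted in the comment above; they are not the output of any real training run, and the function name is made up.

```python
def rule_is_correct(button1_time, button2_time):
    """Illustrative rule: button one < 1.2 s AND button two < 1.6 s."""
    return button1_time < 1.2 and button2_time < 1.6

print(rule_is_correct(1.1, 1.5))  # both conditions hold
print(rule_is_correct(1.3, 1.8))  # both conditions fail
```

The same few comparisons translate directly to C++ or any other language, so no classifier library is needed at run time for a rule-based model.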
Good idea. However, I would not use the absolute values but their differences (i.e. the time from the 1st button to the 2nd button). This reduces the mutual information between the features, so something like naive Bayes (which assumes independence) will be more sound.
bayer
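Converting absolute press times into differences takes only a few lines. A minimal sketch, assuming each attempt is a list of timestamps in seconds measured from time 0 (the function name is made up):

```python
def to_intervals(press_times):
    """Turn absolute press times into the gaps between consecutive presses."""
    return [later - earlier for earlier, later in zip(press_times, press_times[1:])]

# First row of the example table: presses at 1.2 s, 1.5 s and 2.4 s.
intervals = to_intervals([1.2, 1.5, 2.4])
print(intervals)  # two gaps, roughly 0.3 s and 0.9 s
```

Note that n timestamps give n - 1 intervals, so the feature vector gets one element shorter.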
That's a great point, bayer. I was being very general in my suggestion.
darren
Thanks to all. That's a long way to walk; let's start and see what happens.
berkay
By the way: it is generally believed that you should try these 4 classifiers before anything else: naive Bayes, Gaussian classifier, logistic regression, Fisher's linear discriminant.
bayer
Bayer, I appreciate your interest and help. By the way, does every classifier result in a set of rules? If so, that's great: as Darren said, "if button one < 1.2 sec AND button two < 1.6 sec THEN class is_correct = YES", so I could take the classifier's result and use it. But as far as I understand so far, not every classifier works like that.
berkay
No, actually none of the above works like that. Decision trees, for example, do. The problem is that rule-based classifiers may struggle with uncertainty (since uncertainty is not modelled, as it is in probabilistic frameworks). But you should just try and see what works best.
bayer
I came across this programmatic tutorial that shows how to work the classifier into your software. It is a Java / Weka example, but the approach is correct. http://weka.wikispaces.com/Programmatic+Use
darren
+1  A: 

The key to this sort of problem is devising good metrics. Once you have a vector of input values, you can use one of a number of machine learning algorithms to classify it as authorised or declined. So the first step should be to determine which metrics (of those you mention) will be the most useful and pick a small number of them (5-10). You can probably benefit from collapsing some of them by averaging (for example, the average length of any key press, rather than a separate value for every key).

Then you will need to pick an algorithm. A good one for classifying vectors of real numbers is the support vector machine (SVM). At this point you should read up on it, particularly on what the "kernel" function is, so you can choose one to use.

Then you will need to gather a set of learning examples (vectors with a known result), train the algorithm with them, and test the trained SVM on a fresh set of examples to see how it performs. If the performance is poor with a simple kernel (e.g. linear), you may choose to use a higher-dimensional one. Good luck!
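As a rough illustration of collapsing per-key measurements by averaging, the sketch below turns one password attempt into a short fixed-length vector. The function name, the sample values, and the choice of the three summary values are assumptions for illustration only; which metrics actually work has to be found by experiment.

```python
from statistics import mean

def feature_vector(hold_times, gaps):
    """Collapse per-key measurements into a short, fixed-length vector.

    hold_times: how long each key was held, in seconds.
    gaps: time between consecutive key presses, in seconds.
    """
    return [
        mean(hold_times),              # average key hold time
        mean(gaps),                    # average gap between presses
        sum(hold_times) + sum(gaps),   # simple total of all measured times
    ]

# Hypothetical measurements for a three-key password.
vector = feature_vector([0.10, 0.20, 0.30], [0.40, 0.60])
print(vector)
```

Vectors like this, one per attempt and each with a known label, are the learning examples that would be fed to the SVM or any other classifier.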

abc
Yes, you are right, metrics are really important, because I will try to minimise FAR and FRR. Can you suggest any web pages where I can get some experience?
berkay
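For reference, FAR (false acceptance rate) and FRR (false rejection rate) can be computed from a labelled test run along these lines. A minimal sketch; the function name and the decision lists are made-up examples, not measurements.

```python
def far_frr(genuine_accepted, impostor_accepted):
    """Compute (FAR, FRR) from lists of accept/reject decisions.

    genuine_accepted: one bool per attempt by the authorised user.
    impostor_accepted: one bool per attempt by anyone else.
    """
    frr = genuine_accepted.count(False) / len(genuine_accepted)  # real user rejected
    far = impostor_accepted.count(True) / len(impostor_accepted)  # impostor accepted
    return far, frr

# Hypothetical decisions: 4 genuine attempts, 5 impostor attempts.
far, frr = far_frr(
    [True, True, False, True],
    [False, True, False, False, False],
)
print(far, frr)  # 0.2 0.25
```

Tightening the acceptance threshold usually lowers FAR and raises FRR, so the two have to be balanced against each other rather than minimised independently.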
This is a research area, so I am not aware of tutorials or simplified accounts. Your best bet is to read papers and the Wikipedia articles on your favorite machine learning algorithm and on similar problems that have been solved (classification based on multi-dimensional data). If after that you still haven't got any ideas, you should ask your project supervisor how to approach it; he should be helpful, especially if you show that you already know something.
abc