ansaurus

Question

Adapting Machine Learning Algorithms to my Problem

Answer 1

+1 A:

Hey cool idea.

To me this seems like a good candidate for a classification problem. You have two classes (correct password input / incorrect), and your data could be the times (measured from time 0) at which the buttons were pressed. You could train a learning algorithm by giving it several examples of correct password data and incorrect password data. Once your classifier is trained and working satisfactorily, you could use it to predict whether new password input attempts are correct.

You could try out several classifiers from Weka, a GUI-based machine learning tool: http://www.cs.waikato.ac.nz/ml/weka/

What you need is for your data to be in a simple table format for experimenting in Weka, something like the following:

Attempt No | 1st button time (s) | 2nd button time (s) | 3rd button time (s) | is_correct
-----------|---------------------|---------------------|---------------------|-----------
     1     |         1.2         |         1.5         |         2.4         | YES
     2     |         1.3         |         1.8         |         2.2         | YES
     3     |         1.1         |         1.9         |         2.0         | YES
     4     |         0.8         |         2.1         |         2.9         | NO
     5     |         1.2         |         1.9         |         2.2         | YES
     6     |         1.1         |         1.8         |         2.1         | NO

This would be a training set. The outcome (which is known) is the class is_correct. You would run this data through Weka, selecting a classifier (Naive Bayes, for example). This would produce a classifier (for example, a set of rules) that could be used to predict future entries.

darren 2010-03-22 22:50:26
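
For reference, Weka reads tabular data from its ARFF text format. Below is a minimal sketch in Python that writes the training table from the answer above as ARFF. The relation and attribute names (`password_timing`, `t1`, `t2`, `t3`) are made up for illustration, and the attempt number is left out on the assumption that it says nothing about correctness.

```python
# Rows from the training table above: three press times (seconds) and the class.
ROWS = [
    (1.2, 1.5, 2.4, "YES"),
    (1.3, 1.8, 2.2, "YES"),
    (1.1, 1.9, 2.0, "YES"),
    (0.8, 2.1, 2.9, "NO"),
    (1.2, 1.9, 2.2, "YES"),
    (1.1, 1.8, 2.1, "NO"),
]


def to_arff(rows, relation="password_timing"):
    """Render rows as ARFF text. Relation and attribute names are illustrative."""
    lines = [f"@relation {relation}", ""]
    for name in ("t1", "t2", "t3"):
        lines.append(f"@attribute {name} numeric")
    # The class is a nominal attribute with two allowed values.
    lines.append("@attribute is_correct {YES,NO}")
    lines += ["", "@data"]
    for t1, t2, t3, label in rows:
        lines.append(f"{t1},{t2},{t3},{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff(ROWS), end="")
```

Saving the printed text to a `.arff` file should give something the Weka Explorer can open directly.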

I have a doubt about training the classifier. Let's say that I put my data in a table and applied one of the algorithms. What do I get as a result? I'm thinking of writing my code in C++, so do I get some kind of classifier library to include in my program, or ...?

berkay 2010-03-22 23:01:25

C4.5 is a good classifier, just check it out!

The Elite Gentleman 2010-03-22 23:14:01

The classifier you get is essentially a set of rules, a decision tree, etc. Basically, it is a way for the program to predict the outcome for new samples. In this example, you would run your training set through an algorithm (C4.5 is a good suggestion) and get back rules that may say something like "if button one < 1.2 sec AND button two < 1.6 sec THEN class is_correct = YES".

darren 2010-03-22 23:31:34
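
To make the "what do I get?" question concrete: a learned rule set can be carried into a program as a plain function. The sketch below is in Python, but the same shape ports directly to C++. The thresholds are just the made-up ones from the example rule in the comment above, not values learned from real data, and `is_correct` is a hypothetical function name.

```python
def is_correct(t1, t2):
    """Apply the example rule: button one < 1.2 s AND button two < 1.6 s."""
    return t1 < 1.2 and t2 < 1.6


print(is_correct(1.1, 1.5))  # both conditions hold
print(is_correct(1.3, 1.5))  # first condition fails
```

A real rule learner would typically emit several such rules, so the function would grow into a chain of conditions or a nested if/else tree.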

Good idea. However, I would not use the absolute values but their differences (so, the time from the 1st button to the 2nd button). This reduces the mutual information between the features, and then something like naive Bayes (which assumes independence) will be more sound.

bayer 2010-03-22 23:48:08
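
The differencing step suggested above could look like this in Python. This is only a sketch: `press_intervals` is a hypothetical helper name, and rounding to three decimals is an assumption made to hide floating-point noise.

```python
def press_intervals(times):
    """Turn absolute press times into gaps between consecutive presses."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


# First row of the training table: presses at 1.2 s, 1.5 s and 2.4 s.
print(press_intervals([1.2, 1.5, 2.4]))  # prints [0.3, 0.9]
```

Whether to also keep the time of the first press as a separate feature is a judgment call that depends on how the clock is started.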

That's a great point, bayer. I was being very general in my suggestion.

darren 2010-03-22 23:54:16

Thanks to all. That's a long way to walk. Let's start and see what happens.

berkay 2010-03-23 00:18:52

By the way: it is generally believed that you should try these 4 classifiers before anything else: naive Bayes, Gaussian classifier, logistic regression, Fisher's linear discriminant.

bayer 2010-03-23 11:31:54

Bayer, I appreciate your interest and help. By the way, does every classifier result in a set of rules? If so, that's great: as Darren said, "if button one < 1.2 sec AND button two < 1.6 sec THEN class is_correct = YES", and I can take the classifier's result and use it. But as far as I understand so far, not every classifier works like that.

berkay 2010-03-23 17:47:59

No, actually none of the above work like that. Decision trees, for example, do. The problem is that rule-based classifiers may struggle with uncertainty (because uncertainty is not modelled, as it is in probabilistic frameworks). But you should just try and see what works best.

bayer 2010-03-24 11:04:26
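
To illustrate the difference from a rule set, here is a small Gaussian naive Bayes sketch in plain Python. Instead of a hard YES/NO rule it returns a probability per class, which can then be thresholded. It is trained on the six rows of the table in the answer (raw press times, not differences). The function names and the variance floor `min_var` are assumptions made for this example, and six rows are far too few for a usable model.

```python
import math

# Training rows from the table in the answer: press times (s) and class label.
TRAIN = [
    ((1.2, 1.5, 2.4), "YES"),
    ((1.3, 1.8, 2.2), "YES"),
    ((1.1, 1.9, 2.0), "YES"),
    ((0.8, 2.1, 2.9), "NO"),
    ((1.2, 1.9, 2.2), "YES"),
    ((1.1, 1.8, 2.1), "NO"),
]


def fit(rows, min_var=1e-3):
    """Estimate the prior, feature means and feature variances per class."""
    model = {}
    for label in {lab for _, lab in rows}:
        xs = [x for x, lab in rows if lab == label]
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        # Population variance, floored so tiny samples never give zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, min_var)
            for col, m in zip(zip(*xs), means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_proba(model, x):
    """Return P(class | x), treating the features as independent."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gauss(v, m, s) for v, m, s in zip(x, means, variances)
        )
    # Normalise in log space to avoid underflow.
    top = max(scores.values())
    exp = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exp.values())
    return {label: e / total for label, e in exp.items()}


model = fit(TRAIN)
print(predict_proba(model, (1.2, 1.8, 2.2)))
```

Accepting an attempt only when the YES probability exceeds some chosen threshold is one way to trade false accepts against false rejects.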

I came across this programmatic tutorial that shows how to work the classifier into your software. It is a Java / Weka example, but the approach is correct. http://weka.wikispaces.com/Programmatic+Use

darren 2010-03-24 22:37:31

Answer 2

+1 A:

The key to this sort of problem is devising good metrics. Once you have a vector of input values, you can use one of a number of machine learning algorithms to classify it as authorised or declined. So the first step should be to determine which of the metrics you mention will be the most useful and pick a small number of them (5-10). You can probably benefit from collapsing some of them by averaging (for example, the average length of any key press, rather than a separate value for every key).

Then you will need to pick an algorithm. A good one for classifying vectors of real numbers is the support vector machine (SVM). At this point you should read up on it, particularly on what the "kernel" function is, so you can choose one to use.

Then you will need to gather a set of learning examples (vectors with a known result), train the algorithm with them, and test the trained SVM on a fresh set of examples to see how it performs. If the performance is poor with a simple kernel (e.g. linear), you may choose to use a higher-dimensional one. Good luck!

abc 2010-03-22 22:51:38
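
As a rough illustration of the first two steps in the answer above, the Python sketch below collapses per-key timings into a short feature vector by averaging, and shows the two kernels mentioned (linear, and the higher-dimensional RBF kernel). The input layout (a list of press/release time pairs), the choice of three features, and the `gamma` default are all assumptions made for this example. An actual SVM would come from a library rather than from these helpers.

```python
import math


def feature_vector(events):
    """events: list of (press_time, release_time) pairs in seconds, one per key."""
    holds = [release - press for press, release in events]
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(events, events[1:])]
    mean_hold = sum(holds) / len(holds)
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    total = events[-1][1] - events[0][0]
    return [mean_hold, mean_gap, total]


def linear_kernel(x, y):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * squared distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# A made-up attempt with three key presses.
attempt = [(0.0, 0.1), (0.3, 0.5), (0.9, 1.0)]
print(feature_vector(attempt))
```

Averaging keeps the vector the same length regardless of password length, at the cost of discarding per-key detail.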

Yes, you are right, metrics are really important because I will try to minimize both FAR and FRR (the false acceptance and false rejection rates). Can you point me to some web pages where I can get some experience?

berkay 2010-03-22 23:03:46
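
For readers unfamiliar with the two error rates mentioned above: FAR is the fraction of impostor attempts that were accepted, and FRR is the fraction of genuine attempts that were rejected. A minimal Python sketch for computing both from test results follows. The input layout (a list of `(is_genuine, accepted)` pairs) and the sample data are assumptions made for this example.

```python
def far_frr(results):
    """results: list of (is_genuine, accepted) booleans, one per attempt."""
    impostor = [accepted for genuine, accepted in results if not genuine]
    genuine = [accepted for is_gen, accepted in results if is_gen]
    # FAR: share of impostor attempts that got in.
    far = sum(impostor) / len(impostor) if impostor else 0.0
    # FRR: share of genuine attempts that were turned away.
    frr = sum(not a for a in genuine) / len(genuine) if genuine else 0.0
    return far, frr


# Made-up results: four genuine attempts (one rejected), two impostor attempts (one accepted).
results = [
    (True, True), (True, False), (True, True), (True, True),
    (False, False), (False, True),
]
print(far_frr(results))  # prints (0.5, 0.25)
```

Tightening the classifier's acceptance threshold usually lowers FAR and raises FRR, so the two have to be balanced against each other.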

This is a research area, so I am not aware of tutorials or simplified accounts. Your best bet is to read a paper and Wikipedia on your favorite machine learning algorithm and on similar problems that have been solved (classification based on multi-dimensional data). If you still have no ideas after that, you should ask your project supervisor how to approach it - he should be helpful, especially if you show that you already know something.

abc 2010-03-24 23:02:17
