machine-learning

Get recall (sensitivity) and precision (PPV) values of a multi-class problem in PyML

I am using PyML for SVM classification. However, I noticed that when I evaluate a multi-class classifier using LOO, the results object does not report the sensitivity and PPV values. Instead they are 0.0: from PyML import * from PyML.classifiers import multi mc = multi.OneAgainstRest(SVM()) data = VectorDataSet('iris.data', labelsColum...
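A minimal sketch of how per-class recall (sensitivity) and precision (PPV) can be computed from true and predicted labels by treating each class one-vs-rest. It does not use PyML; the function name `per_class_precision_recall` and the label lists are hypothetical, made up for illustration.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)}, treating each class one-vs-rest."""
    tp = Counter()  # predicted as the class and truly the class
    fp = Counter()  # predicted as the class but truly another class
    fn = Counter()  # truly the class but predicted as another class
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1
            fn[truth] += 1

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Made-up labels, for illustration only (not real iris results).
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
scores = per_class_precision_recall(y_true, y_pred)
```

In a multi-class setting there is no single "positive" class, so these values are only defined per class or as an average over classes.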

A question about classifiers in Machine Learning

Hi all, I am taking a class on intro to AI, and the teacher mentioned that the accuracy of the ZeroR classifier is a helpful baseline for interpreting other classifiers. I searched online about this but still couldn't get my head around it. Could anyone explain what that means? Thanks in advance. ...
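A minimal sketch of the ZeroR idea: ignore all features, always predict the most frequent class seen in training, and measure how often that is right. The function name `zero_r_accuracy` and the label counts are hypothetical, chosen only for illustration.

```python
from collections import Counter


def zero_r_accuracy(train_labels, test_labels):
    """Predict the majority training class for every test item; return (class, accuracy)."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority)
    return majority, hits / len(test_labels)


# Made-up, imbalanced labels for illustration.
train = ["spam"] * 3 + ["ham"] * 7
test = ["spam"] * 2 + ["ham"] * 8
majority, baseline = zero_r_accuracy(train, test)
```

With these labels the baseline is 0.8, so a classifier that reports 80% accuracy on the same data has learned nothing beyond the class distribution.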

Alternatives (or ways to speed up) Acts_As_Recommendable plugin for Ruby on Rails

Hi all, I am currently using the Acts_as_recommendable plugin available here. It uses the Pearson correlation coefficient to find recommendations, which is pretty much exactly what I want. The problem, however, is scale. With more than 2000 or so items, the plugin slows considerably (with 5000 items, I see load times of about a min...

Using R and Weka: how can I use meta-algorithms along with the n-fold evaluation method?

Here is an example of my problem library(RWeka) iris <- read.arff("iris.arff") Perform nfolds to obtain the proper accuracy of the classifier. m<-J48(class~., data=iris) e<-evaluate_Weka_classifier(m,numFolds = 5) summary(e) The results provided here are obtained by building the model with a part of the dataset and testing it with ...

Algorithm for text classification

Hello. I have millions of short (up to 30 words) documents which I need to split into several known categories. It's possible that a document matches several of the categories (seldom, but possible). It's also possible that a document doesn't match any of the categories (also seldom). I also have millions of documents which have already...

Please help me choose the right classifier

Hi all, I am facing a problem selecting the correct classifier for my data-mining task. I am labeling webpages using a statistical method on a 1-4 scale, 1 being the poorest and 4 being the best. Previously, I used an SVM to train the system, since I was using a binary (1, 0) label then. But now since I switched to this 4-class ...

Image classification in python

I'm looking for a method of classifying scanned pages that consist largely of text. Here are the particulars of my problem. I have a large collection of scanned documents and need to detect the presence of certain kinds of pages within these documents. I plan to "burst" the documents into their component pages (each of which is an ind...

Is this classification result acceptable?

Hi all, I have a very simple problem, which is to work out a linear classifier for the following three classes in coordinates: Class 1: points (0,1) (1,0) Class 2: points (-1,0) (1,0) Class 3: points (0,-1) (1,-1) I manually used a random initial weight [1 0, 0 1] (2*2 matrix) and a random initial bias...

Untrained Sentiment Analysis

Hi, I've been reading a lot of articles that explain the need for an initial set of texts that are classified as either 'positive' or 'negative' before a sentiment analysis system will really work. My question is: Has anyone attempted just doing a rudimentary check of 'positive' adjectives vs 'negative' adjectives, taking into account a...

Are there any open source Hierarchical Temporal Memory libraries?

I'm potentially interested in using the Hierarchical Temporal Memory heuristic to solve a research problem I am working on. Some more details about it can be found here: http://en.wikipedia.org/wiki/Hierarchical_temporal_memory Are there any open source libraries for this? (I'm fairly open to languages, although C++, Java or Haskell is ...

Is there a Java alternative to the Bayesian Belief Network framework "Infer.NET"?

Is there a Java alternative to the Bayesian Belief Network framework Infer.NET? Preferably it would be scalable (online learning for large datasets), well supported (updated since 2010), open source, and make it easy to write the network structure; in short, all the features of Infer.NET. ...

Can an SVM learn incrementally?

I am using a multi-dimensional SVM classifier (SVM.NET, a wrapper for libSVM) to classify a set of features. Given an SVM model, is it possible to incorporate new training data without having to recalculate on all previous data? I guess another way of putting it would be: is an SVM mutable? ...

Is there some .NET machine learning library that could, for example, suggest tags for a question?

Just to use it as an example, StackOverflow users have already associated tags with a lot of questions. Is there a .NET machine learning library that could use this historical data to 'learn' how to associate tags with newly created questions and suggest them to the user? ...

The effect of Decision Tree Pruning

Hi all, I want to know: suppose I build a decision tree A, like ID3, from a training and validation set, but A is unpruned. At the same time, I have another decision tree B, also ID3, generated from the same training and validation set, but B is pruned. Now I test both A and B on a future unlabeled test set. Is it always the case that pruned tree w...

Help me understand linear separability in a binary SVM

I'm cross-posting this from math.stackexchange.com because I'm not getting any feedback and it's a time-sensitive question for me. My question pertains to linear separability with hyperplanes in a support vector machine. According to Wikipedia: ...formally, a support vector machine constructs a hyperplane or set of hyperplane...

Laplacian smoothing to Biopython

Hi, I am trying to add Laplacian smoothing support to Biopython's Naive Bayes code for my bioinformatics project. I have read many documents about the Naive Bayes algorithm and Laplacian smoothing, and I think I got the basic idea, but I just can't integrate this with that code (actually I cannot see which part I will add 1 -laplacian num...
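A minimal sketch of add-k (Laplace) smoothing for one feature within one class, independent of Biopython's code. The function name `smoothed_conditional_probs`, the parameter `k`, and the nucleotide data are hypothetical, used only to show where the constant is added.

```python
from collections import Counter


def smoothed_conditional_probs(feature_values, vocabulary, k=1):
    """Estimate P(value | class) with add-k smoothing.

    feature_values: observed values of one feature within a single class.
    vocabulary: every value the feature can take, including unseen ones.
    """
    counts = Counter(feature_values)
    # k is added to every count, so the denominator grows by k per possible value.
    total = len(feature_values) + k * len(vocabulary)
    return {value: (counts[value] + k) / total for value in vocabulary}


# Made-up observations for one class: "G" and "T" never occur.
probs = smoothed_conditional_probs(["A", "A", "C"], vocabulary=["A", "C", "G", "T"])
```

Unseen values such as "G" get a small non-zero probability (1/7 here) instead of zero, so one unseen feature value no longer forces the whole class likelihood to zero.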

Medical information extraction using Python

Hello there, I am a nurse and I know Python, but I am not an expert; I have just used it to process DNA sequences. We have hospital records written in human languages, and I am supposed to insert these data into a database or CSV file, but there are more than 5000 lines and this can be so hard. All the data are written in a consistent format, let me s...

Mass Point, Dirac Delta in Dirichlet Processes

When dealing with Dirichlet Processes, according to [Teh, 2007], a DP is defined by a base probability distribution H and a scale factor "alpha". According to the stick-breaking construction, the random draws G from a DP, G~DP(alpha,H), are given by: G=sum(pi_k*delta_theta_k) over k from 1 to infinity. pi_k are ordered draws from a Beta Distribu...
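A minimal sketch of a truncated stick-breaking draw, under the standard construction in which beta_k ~ Beta(1, alpha) and pi_k = beta_k * prod over j<k of (1 - beta_j). The truncation at a finite number of atoms, the function name `stick_breaking`, and the standard normal used as H are assumptions made for illustration.

```python
import random


def stick_breaking(alpha, num_atoms, base_sampler, rng):
    """Return (weights pi_k, atoms theta_k, leftover stick mass) for a truncated draw."""
    remaining = 1.0  # length of the stick not yet broken off
    weights, atoms = [], []
    for _ in range(num_atoms):
        beta_k = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(remaining * beta_k)    # pi_k
        atoms.append(base_sampler(rng))       # theta_k drawn from H: location of the point mass
        remaining *= 1.0 - beta_k
    return weights, atoms, remaining


rng = random.Random(0)
# Assumption for illustration: H is a standard normal distribution.
weights, atoms, leftover = stick_breaking(2.0, 50, lambda r: r.gauss(0.0, 1.0), rng)
```

Each pair (pi_k, theta_k) is one point mass: G places probability pi_k exactly at theta_k, which is why G is discrete even when H is continuous.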

MLE for Naive Bayes in R

I am using the naiveBayes function of the e1071 library in R as below: model <- naiveBayes(Species ~ ., data = iris) pred <- predict(model, iris[,]) My question is: how can I get the maximum likelihood estimate for the conditional probability distribution of this model? ...

C++ decision tree with pruning

Hello. Can you recommend a good decision tree C++ class with support for continuous features and pruning (it's very important)? I'm writing a simple classifier (two classes) using 9 features. I've been using Waffles recently, but it looks like the tree is overfitting, so I get precision around 82% but recall around 51%, which is unacceptable. Wa...