machine-learning

Simple feed-forward (newff) network in MATLAB

I have used the newff function many times, but now I am trying to create a simple feed-forward network where the input vector is P=[1;2;3;4] and the desired output is T=[1;0;0;1], so I only have one sample input vector. The code is net = newff(P,T,[4 1],{'tansig','tansig'}); net = train(net,P,T); When I run the last line I get: ??? Err...

What are the advantages/disadvantages between R and MATLAB with respect to Machine Learning?

I am beginning some studies into machine learning and it seems these two are often used in this field. They seem very similar, so how would one decide which is best to use? ...

Fitting with a neural network in MATLAB

I want to fit a function using neural networks, with 0/1 as outputs. Please help me find the best way to do it. In fact, I want to know which fitting function to use in MATLAB, specifically in the Neural Network Toolbox. I don't know which method is good for modeling a function with binary output. Also, is there any way in MATLAB that I can gain weig...

Fastest general machine learning library?

Weka is probably the most popular general purpose machine learning library. But it can be quite slow in my experience. I have been looking at Shark, Waffles, dlib, Plearn, and MLC++ as alternatives. Of these, Shark and dlib look the most promising. Does anyone have any experience when it comes to performance testing of these librarie...

numpy convert categorical string arrays to an integer array

I'm trying to convert a string array of categorical variables to an integer array of categorical variables. For example:

    import numpy as np
    a = np.array(['a', 'b', 'c', 'a', 'b', 'c'])
    print a.dtype
    >>> |S1
    b = np.unique(a)
    print b
    >>> ['a' 'b' 'c']
    c = a.desired_function(b)
    print c, c.dtype
    >>> [1,2,3,1,2,3] int32

I realize this can be d...
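A minimal sketch, assuming NumPy is available: np.unique with return_inverse=True returns the sorted distinct values together with, for each element, its index into those values. The question asks for 1-based int32 codes, so the sketch adds 1 and casts; the names categories and codes are illustrative.

```python
import numpy as np

a = np.array(['a', 'b', 'c', 'a', 'b', 'c'])

# categories: sorted distinct values; codes: index of each element of `a`
# within `categories` (0-based).
categories, codes = np.unique(a, return_inverse=True)

# Shift to 1-based codes and cast to int32, matching the desired output.
c = (codes + 1).astype(np.int32)

print(categories)  # ['a' 'b' 'c']
print(c, c.dtype)  # [1 2 3 1 2 3] int32
```

Because np.unique sorts the categories, the code assigned to each string follows sorted order rather than order of first appearance.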

What is a good first-implementation for learning machine learning?

I find I learn new topics best by coding an easy implementation to get the idea. This is how I learned genetic algorithms and genetic programming. What would be some good introductory programs to write to get started with machine learning? Preferably, let any referenced resources be accessible online so the community can ben...
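One classic first implementation (a suggestion, not something from the question) is the perceptron: it fits in a few lines and shows the core "update the weights when you make a mistake" loop. Below is a minimal pure-Python sketch for labels in {-1, +1}; the function names and the AND-gate usage are hypothetical illustrations.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron for labels in {-1, +1}; returns (weights, bias)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:
            break  # every training sample is classified correctly
    return w, b


def predict(w, b, x):
    """Return +1 if x falls on the positive side of the hyperplane, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Usage: learn the AND function.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [-1, -1, -1, 1]
w, b = train_perceptron(samples, labels)
print([predict(w, b, x) for x in samples])
```

The perceptron only converges when the classes are linearly separable, so trying it on XOR afterwards is a useful way to see its limits.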

AdaBoost ML algorithm Python implementation

Hi guys, does anyone have any ideas on how to implement the AdaBoost (BoosTexter) algorithm in Python? Cheers! ...
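BoosTexter extends AdaBoost to multi-class, multi-label text categorization. As a starting point, here is a minimal pure-Python sketch of plain binary (discrete) AdaBoost with decision stumps as the weak learner, not BoosTexter itself. Labels are assumed to be in {-1, +1}, and all function names are hypothetical.

```python
import math


def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) of the best weighted stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = 0.0
                for row, label, wi in zip(X, y, w):
                    pred = polarity if row[j] >= thr else -polarity
                    if pred != label:
                        err += wi
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(row, j, thr, polarity):
    return polarity if row[j] >= thr else -polarity


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; returns a list of (alpha, (feature, threshold, polarity))."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = train_stump(X, y, w)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break  # weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, (j, thr, pol)))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * label * stump_predict(row, j, thr, pol))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, *stump) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Usage: the AND function needs more than one stump.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, -1, -1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

Exhaustive stump search is O(features × distinct values × samples) per round, which is fine for learning the algorithm but slow on large data.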

How to use random forests in R to reduce attributes when there are no discrete classes?

I want to use random forests for attribute reduction. One problem with my data is that I don't have a discrete class, only a continuous one, which indicates how much an example differs from 'normal'. This class attribute is a kind of distance from zero to infinity. Is there any way to use random forests for such data? ...

Large scale Machine Learning

I need to run various machine learning techniques on a big dataset (10-100 billion records). The problems are mostly around text mining/information extraction and include various kernel techniques but are not restricted to them (we use some Bayesian methods, bootstrapping, gradient boosting, regression trees; many different problems an...

Scalable Classifier For Finding Missing Attributes

I have a large sparse matrix representing attributes for millions of entities. For example, one record, representing an entity, might have attributes "has(fur)", "has(tail)", "makesSound(meow)", and "is(cat)". However, this data is incomplete. For example, another entity might have all the attributes of a typical "is(cat)" entity, but i...

Probabilistic Generation of Semantic Networks

I've studied some simple semantic network implementations and basic techniques for parsing natural language. However, I haven't seen many projects that try to bridge the gap between the two. For example, consider the dialog: "the man has a hat" "he has a coat" "what does he have?" => "a hat and coat" A simple semantic network, based...

Algorithms and methods for attribute/feature selection?

I have data with a continuous class and I'm searching for good methods to reduce the number of attributes. Right now I'm using correlation-based filters, random forests and the Gram–Schmidt algorithm. What I want to achieve is an answer to which attributes are more important/relevant to the class attribute than others. By using methods that I mentioned befor...
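A correlation-based filter for a continuous class can be sketched as ranking each attribute by the absolute Pearson correlation with the class. This is a minimal pure-Python illustration, assuming numeric attributes stored as a dict of columns; the function names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no measurable linear relationship
    return cov / (sx * sy)


def rank_attributes(columns, target):
    """Return (name, |r|) pairs sorted from most to least correlated with target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage with toy data: x1 and x2 are perfectly (anti)correlated, x3 is not.
columns = {
    'x1': [2, 4, 6, 8, 10],
    'x2': [5, 4, 3, 2, 1],
    'x3': [1, 0, 1, 0, 1],
}
target = [1, 2, 3, 4, 5]
ranked = rank_attributes(columns, target)
print(ranked)
```

Pearson correlation only captures linear, one-attribute-at-a-time relationships, which is one reason to complement it with model-based importance such as random forests.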

How to purposely overfit Weka tree classifiers?

I have a binary class dataset (0 / 1) with a large skew towards the "0" class (about 30000 vs 1500). There are 7 features for each instance, no missing values. When I use the J48 or any other tree classifier, I get almost all of the "1" instances misclassified as "0". Setting the classifier to "unpruned", setting minimum number of inst...

Interactive Decision Tree Classifier

Can anyone recommend a decision tree classifier implementation, in either Python or Java, that can be used incrementally? All the implementations I've found require you to provide all the features to the classifier at once in order to get a classification. However, in my application, I have hundreds of features, and some of the features...

Using Python Functions From the Clips Expert System

Using PyClips, I'm trying to build rules in Clips that dynamically retrieve data from the Python interpreter. To do this, I register an external function as outlined in the manual. The code below is a toy example of the problem. I'm doing this because I have an application with a large corpus of data, in the form of a SQL database, whic...

Trained Spam Machine Learning Classifier/Model

I have a list of about 17 million sentences. I need to classify each sentence as spam/ham/unsure. Are there trained models available on the internet to which I could just feed my data as a "test" set and have the system classify each sentence as spam/ham? Note: The sentences aren't e-mails. ...

Good source for machine learning datasets in computer vision

What is a good source for datasets that can be used to train machine learning algorithms, specifically image sets for computer vision projects? The best source I've found so far is: http://archive.ics.uci.edu/ml/index.html ...

Can the value of information gain be negative?

Hi everyone, is it possible for the value of information gain to be negative? It is calculated according to the formula in the following paper. I cannot write the formula here because it includes some hard notation. http://citeseerx.ist.psu.edu Thanks! ...
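The paper's formula isn't visible here, so this sketch assumes the standard entropy-based definition IG(Y; X) = H(Y) - H(Y | X). Under that definition information gain equals the mutual information of the (empirical) joint distribution, so it cannot be negative apart from floating-point rounding; a clearly negative value would point to a different formula or an implementation error. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(labels) - sum over v of P(feature = v) * H(labels | feature = v)."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


# A feature that determines the class gives the full 1 bit;
# an irrelevant feature gives 0 bits.
print(information_gain(['a', 'a', 'b', 'b'], [0, 0, 1, 1]))
print(information_gain(['a', 'b', 'a', 'b'], [0, 0, 1, 1]))
```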

Language Modelling toolkit

Hi, I would like to build a language model for a text corpus. Are there good out-of-the-box toolkits which will simplify this task? The only toolkit I know of is the Statistical Language Modelling (SLM) Toolkit by CMU. Regards, ...
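To show what such toolkits compute at their core (n-gram counts plus smoothing), here is a toy bigram model with add-one (Laplace) smoothing. It is only an illustration, not a substitute for a toolkit: real toolkits use better smoothing methods and scale to large corpora. The class name and the <s>/</s> sentence markers are assumptions of this sketch.

```python
import math
from collections import Counter


class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.history = Counter()  # how often each token appears as a history
        self.bigrams = Counter()  # (previous token, token) counts
        for tokens in sentences:
            padded = ['<s>'] + tokens + ['</s>']
            self.history.update(padded[:-1])
            self.bigrams.update(zip(padded[:-1], padded[1:]))
        # Every token the model can predict, including the end marker.
        self.vocab = {w for s in sentences for w in s} | {'</s>'}

    def prob(self, prev, word):
        """Smoothed P(word | prev)."""
        return (self.bigrams[(prev, word)] + 1) / (self.history[prev] + len(self.vocab))

    def log_prob(self, tokens):
        """Natural-log probability of a whole sentence."""
        padded = ['<s>'] + tokens + ['</s>']
        return sum(math.log(self.prob(p, w)) for p, w in zip(padded[:-1], padded[1:]))


# Usage: a seen word order scores higher than an unseen one.
lm = BigramLM([['the', 'cat'], ['the', 'dog']])
print(lm.log_prob(['the', 'cat']) > lm.log_prob(['cat', 'the']))  # True
```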

Adding documents to a scored TF-IDF collection?

I have a large collection of documents that already have their TF-IDF computed. I'm getting ready to add some more documents to the collection, and I am wondering if there is a way to add TF-IDF scores to the new documents without re-processing the entire database? ...
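The IDF part of TF-IDF depends on the total number of documents and on each term's document frequency, so adding documents changes IDF (and therefore the scores) of existing documents too. One common approach, sketched here under that assumption, is to store raw term counts per document plus global document frequencies, and compute IDF on demand; adding a document then only touches its own counts. The class, the method names and the plain log(N / df) IDF variant are hypothetical choices for illustration.

```python
import math
from collections import Counter


class TfIdfIndex:
    """Keeps raw term counts and document frequencies; TF-IDF is computed on demand."""

    def __init__(self):
        self.tf = {}         # doc_id -> Counter of term counts in that document
        self.df = Counter()  # term -> number of documents containing the term

    def add_document(self, doc_id, tokens):
        if doc_id in self.tf:
            raise ValueError(f"document {doc_id!r} already indexed")
        counts = Counter(tokens)
        self.tf[doc_id] = counts
        self.df.update(counts.keys())  # each term counted once per document

    def idf(self, term):
        if not self.df[term]:
            return 0.0  # term never seen
        return math.log(len(self.tf) / self.df[term])

    def tfidf(self, doc_id, term):
        return self.tf[doc_id][term] * self.idf(term)


# Usage: the score of an old document shifts when new documents arrive,
# but nothing has to be re-tokenized.
index = TfIdfIndex()
index.add_document('d1', ['cat', 'sat'])
index.add_document('d2', ['dog', 'sat'])
before = index.tfidf('d1', 'cat')  # log(2 / 1)
index.add_document('d3', ['cat'])
after = index.tfidf('d1', 'cat')   # log(3 / 2)
print(before, after)
```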