classification

Bag of words Classification

I need to find training words and their classifications. Simple classifications such as Sports, Entertainment, and Politics, things like that. Where can I find the words and their classifications? I know many universities have done bag-of-words classification. Is there any repository of training examples? ...

Classifying captured data in unknown format?

I've got a large set of captured data (potentially hundreds of thousands of records), and I need to be able to break it down so I can both classify it and also produce "typical" data myself. Let me explain further... If I have the following strings of data: 132T339G1P112S 164T897F5A498S 144T989B9B223T 155T928X9Z554T ... you might sta...

string categorization strategies

I'm the one-man dev team on a fledgling military history website. One aspect of the site is a catalog of ~1,200 individual battles, including the nations & formations (regiments, divisions, etc) which took part. The formation information (as well as the other battle info) was manually imported from a series of books by a 10-man voluntee...

Please help me interpret the Naive Bayes result in Weka

Can anybody please help me interpret the following result generated in Weka for classification using Naive Bayes? Please explain clearly what Normal Distribution, Mean, StandardDev, WeightSum, and Precision mean. I am new to Weka. ** Naive Bayes Classifier Class Normal: Prior probability = 0.5 1374195_at: Nor...

Machine learning algorithm for data classification

Hi all, I'm looking for some guidance about which techniques/algorithms I should research to solve the following problem. I've currently got an algorithm that clusters similar-sounding mp3s using acoustic fingerprinting. In each cluster, I have all the different metadata (song/artist/album) for each file. For that cluster, I'd like t...

Can you suggest a good Java library to perform text classification with the Vector Space Model?

I need to extract the vector space representation of several documents and then compute the cosine distance between them. I'd like to use that distance to classify some new documents using a k-Nearest-Neighbor approach. Do you have any suggestions on the libraries I could use? So far I saw that both Weka and Apache Lucene should supp...
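The approach described in this excerpt (term vectors, cosine distance, k-nearest-neighbour vote) can be sketched in a few lines. This is a minimal Python illustration rather than the Java library the question asks for. The tokenizer, the `TRAINING` documents, and the function names are all hypothetical, and raw term counts stand in for whatever weighting (for example TF-IDF) a real system would use.

```python
import math
from collections import Counter

def vectorize(text):
    """Term-frequency vector over lowercase whitespace tokens (assumed tokenizer)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 when either vector is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def knn_classify(text, labelled_docs, k=3):
    """Majority label among the k labelled documents closest to `text`."""
    query = vectorize(text)
    nearest = sorted(
        labelled_docs,
        key=lambda pair: cosine_distance(query, vectorize(pair[0])),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled documents, for illustration only.
TRAINING = [
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("parliament passed the budget", "politics"),
    ("the senate vote on the bill", "politics"),
]

print(knn_classify("the team scored a late goal", TRAINING, k=3))
```

With raw counts, common words such as "the" dominate the distance, which is one reason weighted vectors are usually preferred in practice.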

What's the correct terminology for something that is neither quite classification nor regression?

Let's say that I have a problem that is basically classification. That is, given some input and a number of possible output classes, find the correct class for the given input. Neural networks and decision trees are some of the algorithms that may be used to solve such problems. These algorithms typically only emit a single result, however:...

Finding the closest match

I have an object with a set of parameters like: var obj = new {Param1 = 100; Param2 = 212; Param3 = 311; param4 = 11; Param5 = 290;} On the other side I have a list of objects: var obj1 = new {Param1 = 1221; Param2 = 212; Param3 = 311; param4 = 11; Param5 = 290;} var obj3 = new {Param1 = 35; Param2 = 11; Param3 = 319; param4 = 211; Pa...

Rare Event Detection

Is there any good reference on algorithms that people use for rare event detection? Also, how is the time factor taken into account? If I have a case where successive data points tell something (t_1 to t_n), how can one factor this into a normal machine learning scenario? Any pointers will be appreciated. ...

Feature Selection methods in MATLAB?

Hi, I am trying to do some text classification with SVMs in MATLAB and would really like to know if MATLAB has any methods for feature selection (Chi Sq., MI, ...). I want to try various methods and keep the best one, and I don't have time to implement all of them. That's why I am looking for such methods in MATLAB. Does any...

Decision Trees For Document Classification

Hi, I wanted to know whether it is possible to use decision trees for document classification and, if so, how the data should be represented. I know how to use the R package party for decision trees. ...

Advice for classifying symbols/images

I am working on a project that requires classification of characters and symbols (basically OCR that needs to handle single ASCII characters and symbols such as music notation). I am working with vector graphics (Paths and Glyphs in WPF) so the images can be of any resolution and rotation will be negligible. It will need to classify (and...

Measuring rectangles at odd angles with a low resolution input matrix (Linear regression classification?)

I'm trying to solve the following problem: Given an input of, say, 0000000000000000 0011111111110000 0011111111110000 0011111111110000 0000000000000000 0000000111111110 0000000111111110 0000000000000000 I need to find the width and height of all rectangles in the field. The input is actually a single column at a time (think like a sc...

AI / Statistical methods for determining the name of a colour

I'm thinking about writing a little library to make a guess at the name of an (RGB value) colour, from a predetermined list of candidates. My first attempt was based purely on Pythagorean distance within the three-dimensional RGB colour space - this wasn't massively successful as most of the named colour points were at the edges of the s...
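The "first attempt" mentioned in this excerpt, picking the candidate with the smallest Pythagorean (Euclidean) distance in RGB space, might look roughly like the Python sketch below. The `NAMED_COLOURS` table and the function name are hypothetical stand-ins for the asker's predetermined list, and `math.dist` assumes Python 3.8 or later.

```python
import math

# Hypothetical candidate list; the asker's actual list is not shown.
NAMED_COLOURS = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "lime": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}

def nearest_colour_name(rgb, candidates=NAMED_COLOURS):
    """Return the candidate name whose RGB point is closest to `rgb`."""
    return min(candidates, key=lambda name: math.dist(rgb, candidates[name]))

print(nearest_colour_name((250, 10, 10)))
print(nearest_colour_name((240, 240, 10)))
```

Distances in raw RGB do not track perceived colour difference well, which may be part of why the asker found this baseline unsatisfying.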

Multilabel AdaBoost for MATLAB

Hi, I am currently looking for a multilabel AdaBoost implementation for MATLAB or a technique for efficiently using a two-label implementation for the multilabel case. Any help in that matter would be appreciated. ...

Ordinal classification packages and algorithms

I'm attempting to make a classifier that chooses a rating (1-5) for an item i. For each item i, I have a vector x containing about 40 different quantities pertaining to i. I also have a gold standard rating for each item. Based on some function of x, I want to train a classifier to give me a rating 1-5 that closely matches the gold sta...

Parsing HTML: Adult Classification Systems

I'm researching the different (and sometimes obsolete) ratings/classification standards used on the web, i.e. PICS, POWDER, ICRA. Which standard is the most popular (number of sites using it)? Is there a C# library which will handle any (or all) of these? ...

Detecting an online poker cheat

It recently emerged on a large poker site that some players were possibly able to see all opponents' cards as they played by exploiting a security vulnerability. A naïve cheater would win at an incredibly fast rate; these cheats are usually caught very quickly, and if not caught quickly they are easy to detec...

interpreting Naive Bayes results

I started using the NaiveBayes/Simple classifier for classification (Weka); however, I have some trouble understanding the results when training on the data. The data set I'm using is weather.nominal.arff. When I use the training set from the test options, the classifier result is: Correctly Classified Instances 13 - 92.8571 % Incorrectly Classif...

Classifier performance on subset of data

I'm using Weka to perform classification on a set of labelled web pages, and measuring classifier performance with AUC. I have a separate six-level factor that is not used in classification, and I'd like to know how well classifiers perform on each level of the factor. What techniques or measures should I use to test classifier performa...
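One simple way to look at this, offered only as a sketch and not as the accepted answer, is to compute AUC separately within each level of the factor from the classifier's exported scores. The Python below uses the pairwise (Mann-Whitney) definition of AUC. The function names and the sample data are hypothetical, and it assumes binary labels coded 0/1.

```python
from collections import defaultdict

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Returns None if a class is missing.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def auc_by_level(levels, labels, scores):
    """Group instances by factor level and compute AUC within each group."""
    groups = defaultdict(lambda: ([], []))
    for level, y, s in zip(levels, labels, scores):
        groups[level][0].append(y)
        groups[level][1].append(s)
    return {level: auc(ys, ss) for level, (ys, ss) in groups.items()}

# Hypothetical data: two factor levels with four instances each.
levels = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.6, 0.7, 0.4, 0.3]
print(auc_by_level(levels, labels, scores))
```

Per-level AUCs computed on small groups can be noisy, so differences between levels should be read with the group sizes in mind.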