classification

Algorithm for best-effort classification of a vector

Given four binary vectors which represent "classes": [1,0,0,0,0,0,0,0,0,0] [0,0,0,0,0,0,0,0,0,1] [0,1,1,1,1,1,1,1,1,0] [0,1,0,0,0,0,0,0,0,0] What methods are available for classifying a vector of floating point values into one of these "classes"? Basic rounding works in most cases: round([0.8,0,0,0,0.3,0,0.1,0,0,0]) = [1 0 0 0 0 0 0...
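One option beyond rounding is nearest-prototype classification: compare the input against each class vector and pick the closest. The following is a minimal Python sketch, assuming squared Euclidean distance is an acceptable measure; the names `CLASSES`, `squared_distance`, and `classify` are hypothetical and not part of the question.

```python
# The four class vectors from the question.
CLASSES = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(vector, classes=CLASSES):
    """Return the index of the class vector closest to `vector`.

    Assumption: ties are broken in favour of the lowest index.
    """
    return min(range(len(classes)),
               key=lambda i: squared_distance(vector, classes[i]))


sample = [0.8, 0, 0, 0, 0.3, 0, 0.1, 0, 0, 0]
print(classify(sample))  # prints 0, i.e. the first class vector
```

Unlike rounding, this always returns one of the four classes, even when the rounded vector would not match any of them exactly.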

Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm?

Is Latent Semantic Indexing (LSI) a Statistical Classification algorithm? Why or why not? Basically, I'm trying to figure out why the Wikipedia page for Statistical Classification does not mention LSI. I'm just getting into this stuff and I'm trying to see how all the different approaches for classifying something relate to one another...

Know any good C++ support vector machine (SVM) libraries?

Hey everyone, do you know of any good C++ SVM libraries out there? I tried libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) but so far I'm not impressed (no documentation, or close to none). I have also heard of SVMLight and TinySVM. Have you tried them? Any new players? Thanks! JC ...

Question About Using Weka, the machine learning tool

I'm using the Explorer feature of Weka for classification. So I have my .arff file, with 2 features of NUMERIC value, and my class is a binary 0 or 1 (e.g. {0,1}). Sample: @RELATION summary @ATTRIBUTE feature1 NUMERIC @ATTRIBUTE feature2 NUMERIC @ATTRIBUTE class {1,0} @DATA 23,11,0 20,100,1 2,36,0 98,8,1 ..... I load this .arff file,...

Read the URL entered in a browser's address bar using Java

I want to write a Java application that will classify URLs as malicious or benign, i.e. when the user types a URL in the address bar, my program should read that URL, classify it, and block it if it is malicious. How do I read the URL from the address bar of a browser once the user has entered it? Please help. Thanks ...

Online URL classifier

I want to write an online application that:
1. reads the URL from the address bar of the browser
2. extracts its lexical features (like n-grams)
3. extracts its host-based features (fetches DNS records online, its A, PTR, TTL fields)
4. classifies the URL as malicious or benign (using machine learning)
Can anyone help me with 1 and 3? ...
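The lexical-feature step mentioned above can be illustrated with character n-grams. This is a rough Python sketch under stated assumptions: the function names `char_ngrams` and `lexical_features` and the particular features chosen (lengths, dot count, digit count) are hypothetical examples, not a feature set from the question.

```python
from collections import Counter
from urllib.parse import urlparse


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def lexical_features(url, n=3):
    """Build a small dictionary of lexical features for a URL.

    Assumption: these features are illustrative only; a real
    classifier would choose its own feature set.
    """
    host = urlparse(url).hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(c.isdigit() for c in url),
        # n-gram counts over the lowercased URL
        "ngrams": Counter(char_ngrams(url.lower(), n)),
    }


features = lexical_features("http://example.com/login")
print(features["host_length"], features["num_dots_in_host"])
```

The resulting dictionary could then be turned into a numeric vector for whichever learning algorithm is used in the classification step.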

Does anyone know of any standards for the classification of software?

I have, of course, tried Google/Bing and have found one or two classifications for specific industries, but nothing general. The sort of thing I'm looking for is: General Office Tools -> Word processing -> Word; Utilities -> File Management -> Compression -> WinZip. What I am after is a standard that has been issued by some organization, or...

Image Classification - Detecting Floor Plans

I am working on a real estate website and I would like to write a program that can figure out (classify) whether an image is a floor plan or a company logo. Since I am writing in PHP I would prefer a PHP solution, but any C++ or OpenCV solution will be fine as well. Floor Plan Sample: Logo Sample: ...

Ruby: why does using FeedNormalizer break Classifier::CRM114?

Hi, I'm just learning Ruby and found something bizarre (at least for an ANSI C programmer). I have Mac OS X 10.6.2, ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0], feed-normalizer 1.5.1 and crm114 1.0.3. require 'rubygems' require 'crm114' require 'feed-normalizer' #FeedNormalizer::FeedNormalizer.parse open("http://news.google.c...

AdaBoost algorithm and its usage in face detection

I am trying to understand the AdaBoost algorithm but I am having some trouble. After reading about AdaBoost I realized that it is a classification algorithm (somewhat like a neural network). But I could not work out how the weak classifiers are chosen (I think they are Haar-like features for face detection) and how finally the H result which is the fin...

How does music fingerprinting work (for sites such as Shazam and Lala.com)?

My large (120 GB) music collection contains many duplicate songs, and I've been trying to fingerprint tracks in the hope of detecting duplicates. And since I'm a CS major, I'm very curious as to what is done out there. Nothing I do has nearly the accuracy of something like Shazam or Lala.com. How do they "hash" tracks? I have run a standa...

C++ library/framework/API for mixture models in machine learning

I want to use Gaussian mixture models for data clustering (using an expectation maximization (EM) algorithm, which assigns posterior probabilities to each component density with respect to each observation). Is there a C++ library which has Gaussian mixture models implemented along with a sample dataset and examples? ...

Aggregating automatically-generated feature vectors

Hi all, I've got a classification system, which I will unfortunately need to be vague about for work reasons. Say we have 5 features to consider; it is basically a set of rules:

A  B  C  D  E  Result
1  2  b  5  3  X
1  2  c  5  4  X
1  2  e  5  2  X

We take a subject and get its values for A-E, then try matching the rules in sequenc...

How to let users specify multiple-level categories in Excel?

I'm developing a kind of template with Excel 2007. Users will use it to create records which fall in 3-level categories. A user should be able to create a new category, specify an existing category or not specify one. A record may belong to multiple categories. I'm wondering what my best choice is to present the category structure to us...

C#: Is there a way to classify enums?

Given the following enum: public enum Position { Quarterback, Runningback, DefensiveEnd, Linebacker }; Is it possible to classify the named constants, such that I could mark 'Quarterback' and 'Runningback' as offensive positions and 'DefensiveEnd' and 'Linebacker' as defensive positions? ...

List of proper names?

I'm trying to filter names out of text blobs. Currently I'm just generating a word list and filtering it by hand, but I've got ~8k words to go so I'm looking for a better way. I could grab a dictionary and filter them out, but that would cull names like smith and cliff. What I need is either of the following: a list of common names (I'...

Blindly classifying new trends in incoming data

How do news outlets like Google News automatically classify and rank documents about emerging topics, like "Obama's 2011 budget"? I've got a pile of articles tagged with baseball data like player names and relevance to the article (thanks, OpenCalais), and would love to create a Google News-style interface that ranks and displays new po...

In Java - Grouping similar values

Hi all, first of all, thanks for reading my question. I used TF/IDF, then on those values I calculated cosine similarity to see which documents are more similar. You can see the following matrix. Column names are doc1, doc2, doc3, and row names are the same (doc1, doc2, doc3, etc.). With the help of the following matrix, I can see that...
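For illustration, here is a minimal sketch of grouping documents by pairwise cosine similarity. It is written in Python rather than Java, and it rests on assumptions not in the question: a similarity threshold of 0.8, single-link grouping (two documents join a group if either is similar enough to any member), and made-up TF-IDF vectors. The names `cosine_similarity` and `group_similar` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar(vectors, threshold=0.8):
    """Group document indices whose similarity meets `threshold`.

    Uses a union-find structure, so grouping is transitive (single-link).
    """
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up TF-IDF vectors for four documents (illustrative only).
docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.2]]
print(group_similar(docs))  # prints [[0, 1], [2, 3]]
```

The threshold controls how tight the groups are; a lower value merges more documents into the same group.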

How to approach machine learning problems with high dimensional input space?

How should I approach a situation when I try to apply some ML algorithm (classification, to be more specific, SVM in particular) over some high dimensional input, and the results I get are not quite satisfactory? 1, 2 or 3 dimensional data can be visualized, along with the algorithm's results, so you can get the hang of what's going on...

Classifying type samples from image files

Which approach would you suggest for automatically classifying type found in images? The samples are likely large, with black text on a white background. The categories are defined here, with some examples on each (Google Books link): http://bit.ly/9Mnu7P This is an extended version of the VOX-ATypI classification system. My initial t...