machine-learning

What machine learning benchmarks are out there?

What repositories for machine learning benchmarks do you know? ...

How do I group objects in a set by proximity?

I have a set containing thousands of addresses. Assuming I can get the longitude and latitude of each address, how would I go about splitting the set into groups by proximity? Further, I may want to retry the 'clustering' according to different rules: N groups; M addresses per group; maximum distance between any address in a group; Ad...
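One of the rules listed above (a maximum distance within a group) can be sketched as a greedy pass over the coordinates. This is a minimal illustration, not a recommended algorithm: `haversine_km` and `group_by_proximity` are hypothetical names, the Earth radius of 6371 km is an approximation, and the sample coordinates are made up. Note that it bounds each address's distance to the group's first member, so two members of the same group can be up to twice `max_km` apart.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: approximate Earth radius

def group_by_proximity(points, max_km):
    """Greedy grouping: a point joins the first group whose seed is within max_km."""
    groups = []  # each group is a list of points; group[0] is its seed
    for p in points:
        for g in groups:
            if haversine_km(g[0], p) <= max_km:
                g.append(p)
                break
        else:
            groups.append([p])  # no seed was close enough, so start a new group
    return groups

# Made-up sample: two points near London, two near Paris.
addresses = [(51.5074, -0.1278), (48.8566, 2.3522), (51.51, -0.12), (48.86, 2.35)]
groups = group_by_proximity(addresses, max_km=50)
```

The other rules (a fixed number of groups, or a cap on group size) would need a different method, such as k-means for a fixed N; the greedy pass is only the simplest thing that respects a distance threshold.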

What's the best open-source Java Bayesian spam filter library?

In other answers on Stack Overflow it's been suggested that Weka is good, but there are others (Classifier4j, jBNC, Naiban). Does anyone have actual experience with these? ...

Do People Actually Use Machine Learning?

I'm playing around with machine learning in an academic setting, and it's really fun. I'm wondering how machine learning algorithms such as Support Vector Machines make it into software applications. Do people actually use machine learning algorithms? Do you use them because it's part of a spec written by someone else, or are they more o...

Best approach to what I think is a machine learning problem

Hello. I would like some expert guidance on the best approach to solving a problem. I have investigated some machine learning, neural networks, and the like. I've looked at Weka, some sort of Bayesian solution, R, and several other things. I'm not sure how to proceed, though. Here's my problem. ...

What are some economically important applications of machine learning?

Apologies in advance if this is too vague. My list so far: statistical arbitrage; actuarial science; manufacturing process control; image processing (security, manufacturing, medical imaging); computational biology/drug design; sabermetrics; yield management; operations research/logistics (I'll include business intelligence wit...

How to get started on Information Extraction?

Could you recommend a training path to get started and become very good at Information Extraction? I started reading about it for one of my hobby projects and soon realized that I would have to be good at math (Algebra, Stats, Prob). I have read some of the introductory books on different math topics (and it's so much fun). Looking for some gu...

Binarization in Natural Language Processing

Binarization is the act of transforming colorful features of an entity into vectors of numbers, most often binary vectors, to make good examples for classifier algorithms. If we were to binarize the sentence "The cat ate the dog", we could start by assigning every word an ID (for example cat-1, ate-2, the-3, dog-4) and then simply r...
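The excerpt is cut off before it says what happens after the IDs are assigned, so the following is only one plausible reading: a presence vector with one slot per word ID. The function name `binarize` and the choice of a 0/1 presence encoding are assumptions; the vocabulary IDs are the ones given in the question.

```python
def binarize(sentence, vocab):
    """Return a binary vector with a 1 in slot (id - 1) for each known word present."""
    vec = [0] * len(vocab)
    for word in sentence.lower().split():
        idx = vocab.get(word)
        if idx is not None:  # words outside the vocabulary are ignored
            vec[idx - 1] = 1
    return vec

# IDs taken from the example in the question: cat-1, ate-2, the-3, dog-4.
vocab = {"cat": 1, "ate": 2, "the": 3, "dog": 4}
full = binarize("The cat ate the dog", vocab)
partial = binarize("the dog", vocab)
```

A presence vector discards word order and repetition, so "The cat ate the dog" and "The dog ate the cat" map to the same vector.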

Overwhelmed by Machine Learning---is there an ML101 book?

It seems like there are so many subfields linked to Machine Learning. Is there a book or a blog that gives an overview of those different fields and what each of them does, maybe how to get started, and what background knowledge is required? ...

Are evolutionary algorithms and neural networks used in the same problem domains?

I am trying to get a feel for the difference between the various classes of machine-learning algorithms. I understand that the implementations of evolutionary algorithms are quite different from the implementations of neural networks. However, they both seem to be geared at determining a correlation between inputs and outputs from a...

Shared user-choice ruleset application

A large chunk of the questions on this site are something like, "Is there a tool to convert jQuery to Prototype?" or, "Is there a way to find dead methods in my PHP codebase?". Aside from the typical answer of, "You shouldn't do that" which is entirely unhelpful, I'm looking for a framework to support these types of tasks. Here is how ...

Machine learning challenge: learn English pronunciation

Say you want to take CMU's phonetic data set input that looks like this:
ABERRATION AE2 B ER0 EY1 SH AH0 N
ABERRATIONAL AE2 B ER0 EY1 SH AH0 N AH0 L
ABERRATIONS AE2 B ER0 EY1 SH AH0 N Z
ABERT AE1 B ER0 T
ABET AH0 B EH1 T
ABETTED AH0 B EH1 T IH0 D
ABETTING AH0 B EH1 T IH0 NG
ABEX EY1 B EH0 K S
ABEYANCE AH0 B EY1 AH0 N S
(The w...
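As a starting point for working with input of this shape, each entry can be split into a word and its phoneme list. This is a sketch that assumes one entry per line with whitespace-separated fields, as in the sample; `parse_entry` and `strip_stress` are hypothetical helper names, and reading the trailing digits as stress markers is an assumption about the data set rather than something stated in the question.

```python
def parse_entry(line):
    """Split 'WORD PH1 PH2 ...' into (word, [phonemes])."""
    word, *phonemes = line.split()
    return word, phonemes

def strip_stress(phonemes):
    """Drop trailing digits (assumed to be stress markers) from each phoneme."""
    return [p.rstrip("012") for p in phonemes]

sample = """ABERT AE1 B ER0 T
ABET AH0 B EH1 T
ABEX EY1 B EH0 K S"""

# Build a word -> phoneme list lookup from the sample lines.
pronunciations = dict(parse_entry(line) for line in sample.splitlines())
```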

Neural networks - obsolete?

According to an answer from here, artificial neural networks are obsoleted by Support Vector Machines, Gaussian Processes, generative and descriptive models. What is your opinion? ...

Realistic time estimates for progress bars etc.

I know I am not the only one who does not like progress bars or time estimates which give unrealistic estimates in software. Best examples are installers which jump from 0% to 90% in 10 seconds and then take an hour to complete the final 10%. Most of the time programmers just estimate the steps to complete a task and then display curren...
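One common alternative to counting steps is to estimate the remaining time from the measured rate of work, smoothed so that a single fast or slow interval does not dominate. The sketch below uses an exponential moving average; the class name `EtaEstimator`, the smoothing factor `alpha=0.3`, and the idea of measuring progress in abstract "units" (bytes, files, rows) are all assumptions for illustration.

```python
class EtaEstimator:
    """Estimate remaining time from an exponentially smoothed work rate."""

    def __init__(self, total_units, alpha=0.3):
        self.total = total_units
        self.alpha = alpha   # weight given to the most recent interval
        self.rate = None     # smoothed units per second
        self.done = 0
        self.last_t = None

    def update(self, done_units, now_s):
        """Record progress at time now_s (seconds on any monotonic clock)."""
        if self.last_t is not None and now_s > self.last_t:
            inst = (done_units - self.done) / (now_s - self.last_t)
            if self.rate is None:
                self.rate = inst
            else:
                self.rate = self.alpha * inst + (1 - self.alpha) * self.rate
        self.done, self.last_t = done_units, now_s

    def remaining_s(self):
        """Seconds left at the smoothed rate, or None if no rate is known yet."""
        if not self.rate:
            return None
        return (self.total - self.done) / self.rate

eta = EtaEstimator(total_units=100)
eta.update(0, 0.0)
eta.update(10, 1.0)   # 10 units/s so far
first = eta.remaining_s()
eta.update(20, 3.0)   # the last interval ran at 5 units/s
second = eta.remaining_s()
```

This only helps if the units reflect real work: an installer that spends most of its time in a final step still needs that step broken into measurable units.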

Can I use arbitrary metrics to search KD-Trees?

I just finished implementing a kd-tree for doing fast nearest neighbor searches. I'm interested in playing around with different distance metrics other than the Euclidean distance. My understanding of the kd-tree is that the speedy kd-tree search is not guaranteed to give exact searches if the metric is non-Euclidean, which means that I ...

Intelligent code-completion? Is there AI to write code by learning?

I am asking this question because I know there are a lot of well-read CS types on here who can give a clear answer. I am wondering if there is an AI (existing or being researched/developed) that writes programs by generating and compiling code all on its own and then progresses by learning from former iterations. I am talking about wo...

What are some ways to have fun with a large amount of data? (i.e., the Twitter, del.icio.us etc. APIs)

Twitter, Google, Amazon, del.icio.us etc. all give you a lot of data to play with, all for free. There's also a lot of textual data available through initiatives like Project Gutenberg. And that, it seems, is just the tip of the iceberg. I have been wondering how you could use this data for fun. I'm a first year IT student, so I have no...

Correcting a known bias in collected data

Ok, so here is a problem analogous to my problem (I'll elaborate on the real problem below, but I think this analogy will be easier to understand). I have a strange two-sided coin that only comes up heads (randomly) 1 in every 1,001 tosses (the remainder being tails). In other words, for every 1,000 tails I see, there will be 1 heads. ...

Good implementations of reinforcement learning?

For an AI class project I need to implement a reinforcement learning algorithm which beats a simple game of Tetris. The game is written in Java and we have the source code. I know the basics of reinforcement learning theory but was wondering if anyone in the SO community had hands-on experience with this type of thing. What would your ...

Naive Bayes calculation in SQL

I want to use naive Bayes to classify documents into a relatively large number of classes. I'm looking to confirm whether a mention of an entity name in an article really is that entity, on the basis of whether that article is similar to articles where that entity has been correctly verified. Say, we find the text "General Motors" in a...
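The calculation itself can be sketched outside SQL to show which counts are needed: documents per class, word occurrences per class, and the vocabulary size. This is a minimal multinomial naive Bayes with add-one (Laplace) smoothing, written in Python rather than SQL; the function names, the class labels `company` and `other`, and the training sentences are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: iterable of (label, text). Returns the counts needed for scoring."""
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # word occurrences per class
    vocab = set()
    for label, text in docs:
        class_docs[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_docs, word_counts, vocab

def classify(text, model):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for label, n in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n / total_docs)
        for w in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out a class.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Made-up training data: is "General Motors" the company or something else?
model = train([
    ("company", "general motors builds cars and trucks"),
    ("company", "general motors reports quarterly earnings"),
    ("other", "the general ordered motors for the army tanks"),
])
```

Summing logarithms instead of multiplying probabilities avoids underflow on long documents, and the same per-class counts map naturally onto grouped aggregate tables in a database.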