Good source for machine learning datasets in computer vision

views:

151

answers:

+8 Q:

What is a good source for datasets that can be used to train machine learning algorithms, specifically image sets for computer vision projects?

The best source I've found so far is: http://archive.ics.uci.edu/ml/index.html

+5 A:

Here's my list, which I've maintained for the past five years or so.

Many of the data sources listed here are specifically intended for use in ML algorithms (i.e., the data is divided into a training set and a much smaller test set). Others are more general but are certainly suitable for ML.
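For the more general sources that don't ship with a ready-made split, you can carve out a training set and a smaller test set yourself. Here is a minimal sketch using only the standard library; the function name `train_test_split`, the 20% test fraction, and the file names are my own illustrative assumptions, not something any of the sources below prescribe.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle samples and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (filename, label) pairs standing in for a real image set.
samples = [(f"img_{i:03d}.png", i % 10) for i in range(100)]
train, test = train_test_split(samples, test_fraction=0.2)
print(len(train), len(test))
```

Fixing the seed keeps the test set stable between runs, so results stay comparable as you change the model.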

I. A few meta-sources:

Swivel, a data search engine
StatLib, a widely used meta-directory of data sources maintained by Carnegie Mellon University
A meta-directory from a leading ML blog, Inductio Ex Machina
A directory of 30 data meta-directories from Flowing Data

II. Some domain-specific databases:

MNIST, the primary resource for handwriting-recognition data
Labeled Faces in the Wild, the primary resource for facial-recognition data
GroupLens, a massive dataset of viewer film preferences
Spambase, a dataset prepared from more than 4,500 different email messages, each message parsed against over 50 different attributes
databaseSports.com, (AFAIK) the most comprehensive directory of sports data

III. Some collections directed to particular Machine Learning techniques:

A collection of data by the author of LIBSVM, the leading SVM engine, intended for use in SVM and support-vector regression
Various datasets specifically intended for training/testing Neural Networks

doug 2010-07-18 09:38:52

+2 A:

MNIST handwritten digit dataset: scores for many different algorithms are included.

http://www.cs.nyu.edu/~ylclab/data/norb-v1.0/index.html

And the much harder NORB 3D object recognition dataset. Both come with plenty of documentation and papers.

http://www.cs.nyu.edu/~ylclab/data/norb-v1.0/index.html
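If you want to read the MNIST files directly, here is a rough sketch of a parser, assuming the IDX layout documented on the MNIST page: two zero bytes, a type code (0x08 for unsigned bytes), a dimension count, then one big-endian 32-bit size per dimension, followed by the raw data. The function name `parse_idx` is hypothetical, and the example runs on a synthetic buffer rather than a real download. It is not meant for NORB, which uses a different binary format.

```python
import struct


def parse_idx(raw: bytes):
    """Parse an unsigned-byte IDX buffer; return (dims, data)."""
    zero, dtype, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # One big-endian 32-bit unsigned integer per dimension.
    dims = struct.unpack(f">{ndim}I", raw[4:4 + 4 * ndim])
    data = raw[4 + 4 * ndim:]
    expected = 1
    for d in dims:
        expected *= d
    if len(data) != expected:
        raise ValueError("payload size does not match the header")
    return dims, data


# Synthetic stand-in for an image file: two 2x3 "images".
fake = struct.pack(">HBB3I", 0, 0x08, 3, 2, 2, 3) + bytes(range(12))
dims, data = parse_idx(fake)
print(dims, len(data))
```

For a real file you would decompress it first (the downloads are gzipped) and pass the resulting bytes to the same function.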

Julian de Wit 2010-07-18 11:56:27

+2 A:

kaggle.com - it's also a good way to benchmark yourself against others.

Anthony Goldbloom 2010-07-18 23:25:29

+1 A:

Caltech's datasets are useful:

http://www.vision.caltech.edu/archive.html

jeff7 2010-07-19 16:23:04

+1 A:

There is a large dataset of bird photos that Berkeley has collected:

http://cone.berkeley.edu/app/webroot/dataset/

Mark 2010-07-20 18:06:11
