What is a good source for datasets that can be used to train machine learning algorithms, specifically image sets for computer vision projects?

The best source I've found so far is: http://archive.ics.uci.edu/ml/index.html

+5  A: 

Here's my list, which I've maintained for the past five years or so.

Many of the data sources listed here are specifically intended for use in ML algorithms (i.e., the data is divided into a training set and a much smaller test set); others are more general but are certainly suitable for ML.

I. A few meta-sources:

  • Swivel, a data search engine

  • StatLib, a widely used meta-directory of data sources maintained by Carnegie Mellon University

  • A meta-directory from a leading ML blog, Inductio Ex Machina

  • A directory of 30 data meta-directories from Flowing Data


II. Some domain-specific databases:

  • MNIST, the primary resource for handwritten-digit recognition data

  • Labeled Faces in the Wild, the primary resource for facial-recognition data

  • GroupLens, a massive dataset of viewer film preferences

  • Spambase, a dataset prepared from more than 4,500 different email messages, each message parsed against over 50 different attributes

  • databaseSports.com (AFAIK), the most comprehensive directory of sports data

III. Some collections directed to particular Machine Learning techniques:

  • A collection of data by the author of LIBSVM, the leading SVM engine, intended for use in SVM classification and support-vector regression

  • Various datasets specifically intended for training/testing neural networks
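
For the more general sources that do not ship with a predefined split, the training set / smaller test set division mentioned above can be produced by hand. This is only a minimal sketch using the Python standard library; the function name `train_test_split`, the 20% test fraction, and the `img_NNN.png` file names are illustrative assumptions, not part of any dataset listed here.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one sample for testing.
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for a labeled image set: (filename, label) pairs.
samples = [(f"img_{i:03d}.png", i % 10) for i in range(100)]
train, test = train_test_split(samples)
print(len(train), len(test))
```

Shuffling before splitting matters when the files are stored grouped by class; otherwise the test set could end up containing only one or two labels.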

doug
+2  A: 

The MNIST handwritten digit dataset: scores for many different algorithms are included.

And the much harder NORB 3D object recognition dataset. Both come with plenty of documentation and papers.

http://www.cs.nyu.edu/~ylclab/data/norb-v1.0/index.html

Julian de Wit
+2  A: 

kaggle.com - it's also a good way to benchmark yourself against others.

Anthony Goldbloom
+1  A: 

Caltech's datasets are useful:

http://www.vision.caltech.edu/archive.html

jeff7
+1  A: 

There is a large dataset of bird photos that Berkeley has collected:

http://cone.berkeley.edu/app/webroot/dataset/

Mark