What is a good source for datasets that can be used to train machine learning algorithms, specifically image sets for computer vision projects?
The best source I've found so far is: http://archive.ics.uci.edu/ml/index.html
Here's my list, which I've maintained for the past five years or so.
Many of the data sources listed here are specifically intended for use in ML algorithms (i.e., the data is divided into a training set and a much smaller test set); others are more general but are certainly suitable for ML.
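For the more general sources that do not ship with a ready-made split, you have to carve out the test set yourself. Here is a minimal sketch of one way to do that in plain Python; the function name `train_test_split`, the 20% default, and the fixed seed are my own choices for illustration, not something any of the sources below prescribe.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and return (train, test) lists.

    test_fraction is the share of rows held out for testing;
    seed makes the shuffle reproducible.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one row so the test set is never empty.
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Calling `train_test_split(my_rows)` gives back two lists with no rows in common. Shuffling before splitting matters when the source file is sorted by class, which is fairly common.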
I. A few meta-sources:
Swivel, a data search engine
StatLib, a widely used meta-directory of data sources maintained by Carnegie Mellon University
A meta-directory from a leading ML blog, Inductio Ex Machina
A directory of 30 data meta-directories from Flowing Data
II. Some domain-specific databases:
MNIST, the primary resource for handwriting-recognition data
Labeled Faces in the Wild, the primary resource for facial-recognition data
GroupLens, a massive dataset of viewer film preferences
Spambase, a dataset prepared from more than 4,500 different email messages, each message parsed against over 50 different attributes
III. Some collections directed to particular Machine Learning techniques:
A collection of data by the author of LIBSVM, the leading SVM engine, intended for use in SVM and support-vector regression
Various datasets specifically intended for training/testing Neural Networks
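The LIBSVM collection is, as far as I know, distributed in LIBSVM's sparse text format, where each line is a label followed by `index:value` pairs and omitted indices are taken to be zero. A rough sketch of a parser for one such line, in case you want to read the files without installing anything (the function name `parse_libsvm_line` is mine, and it ignores extras such as ranking `qid:` fields):

```python
def parse_libsvm_line(line):
    """Parse one 'label index:value index:value ...' line.

    Returns (label, features), where features maps the feature
    index to its value. Indices missing from the line are simply
    absent from the dict and should be treated as zero.
    """
    parts = line.split()
    if not parts:
        raise ValueError("empty line")
    label = float(parts[0])
    features = {}
    for token in parts[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features
```

Keeping the features in a dict preserves the sparsity of the file; convert to a dense vector only if your learner needs one.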
MNIST handwritten digit dataset:
Scores for many different algorithms are included.
http://www.cs.nyu.edu/~ylclab/data/norb-v1.0/index.html
And the much harder 3D object recognition dataset. Both come with plenty of documentation and papers.
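One practical note: the MNIST files are, to my knowledge, distributed in the simple IDX binary format (a 4-byte magic number, then one big-endian 32-bit size per dimension, then the raw data). Below is a small sketch of a reader for the unsigned-byte variant. The function name `parse_idx` is my own, and it assumes you have already decompressed the file and read it into a `bytes` object.

```python
import struct


def parse_idx(data):
    """Parse an IDX buffer holding unsigned bytes.

    Returns (dims, payload): dims is a tuple of dimension sizes,
    payload is the raw bytes in row-major order.
    """
    # Magic number: two zero bytes, a type code, the number of dimensions.
    zero, type_code, ndim = struct.unpack_from(">HBB", data, 0)
    if zero != 0:
        raise ValueError("not an IDX buffer")
    if type_code != 0x08:
        raise ValueError("only the unsigned-byte type (0x08) is handled here")
    # One big-endian unsigned 32-bit integer per dimension.
    dims = struct.unpack_from(">" + "I" * ndim, data, 4)
    offset = 4 + 4 * ndim
    count = 1
    for size in dims:
        count *= size
    payload = data[offset:offset + count]
    if len(payload) != count:
        raise ValueError("buffer is shorter than its header claims")
    return dims, payload
```

For an image file you would expect `dims` to hold the image count, rows, and columns, so image `i` is the slice `payload[i * rows * cols:(i + 1) * rows * cols]`.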
kaggle.com - it's also a good way to benchmark yourself against others.