Twitter, Google, Amazon, del.icio.us etc. all give you a lot of data to play with, all for free. There's also a lot of textual data available through initiatives like Project Gutenberg. And that, it seems, is just the tip of the iceberg.

I have been wondering how you could use this data for fun. I'm a first-year IT student, so I have no knowledge of statistics, machine learning, collaborative filtering, etc. My interest in this area was piqued by the book Programming Collective Intelligence by Toby Segaran, and now I want to take a deeper look at what you can do with data. I don't know where to start. Any ideas?

I have also been pondering whether I should go and buy something like Paradigms of Artificial Intelligence Programming. Is it worth the trip across the city?

+6  A: 

Try running books in different styles from Project Gutenberg through a Markov chain generator; there's one in Perl here to get you started.
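For anyone who wants to see how such a generator works, here is a minimal word-level sketch in Python (not the Perl one mentioned above). It assumes plain whitespace tokenization and an order of 2, meaning each pair of consecutive words predicts the next word; `build_chain` and `generate` are hypothetical names. The same code could just as well be trained on a file of Tweets instead of a Gutenberg book.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to every word observed right after it."""
    words = text.split()  # naive whitespace tokenization (assumption)
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Random walk over the chain: start at a random key, then repeatedly pick a random successor."""
    rng = random.Random(seed)
    key = rng.choice(list(chain.keys()))
    output = list(key)
    for _ in range(length - len(key)):
        followers = chain.get(key)
        if not followers:  # dead end: this word pair was never followed by anything
            break
        output.append(rng.choice(followers))
        key = tuple(output[-len(key):])  # slide the window forward by one word
    return " ".join(output)


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat ran off the mat and the cat sat down"
    print(generate(build_chain(sample), length=12, seed=42))
```

To try it on a real book, read a plain-text Gutenberg file into a string (for example `open("book.txt", encoding="utf-8").read()`, where `book.txt` is a placeholder) and pass it to `build_chain`. Raising `order` makes the output more coherent, but it also copies longer runs of the original text verbatim.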

anon
Beat me to it. Markov machines are great fun :)
Robert Gould
Post the results to a Twitter account through its API and see how many followers you can get with your machine-generated Tweets.
Joe Holloway
I like it! But most twits seem to be poorly programmed AIs anyway. I have grave doubts about their ability to pass the Turing Test.
anon
Well, you won't pass the Turing Test, but you could still have some fun by seeing how many "social media entrepreneurs" you can get to follow your nonsensical bot. What if instead of using Project Gutenberg you just used other Tweets to train your bot?
Joe Holloway
+3  A: 

Visualizations, do them, share them.

jfar
A: 

You can make puzzles such as hangman games, build a mashup, or try Yahoo Pipes to combine information from different sources.

Robert Gould
+1  A: 

You can even use some of that data to make money (if you're really good!): http://www.netflixprize.com/ Netflix has released an anonymized dataset of movie ratings and is asking for better algorithms to predict customer choices.
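One simple way to attack this kind of prediction, in the spirit of the collaborative filtering covered in Programming Collective Intelligence, is a user-based approach: estimate how a user would rate a movie by averaging other users' ratings of it, weighted by how similar their past ratings are. The sketch below assumes a similarity of 1 / (1 + Euclidean distance) over co-rated movies; the users, titles, and ratings are invented, and serious Netflix Prize entries used far more sophisticated models than this.

```python
from math import sqrt


def similarity(ratings, a, b):
    """Score in (0, 1] from the Euclidean distance over movies both users rated; 0.0 if none are shared."""
    shared = [m for m in ratings[a] if m in ratings[b]]
    if not shared:
        return 0.0
    dist = sqrt(sum((ratings[a][m] - ratings[b][m]) ** 2 for m in shared))
    return 1.0 / (1.0 + dist)


def predict(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`; None if no comparable user rated it."""
    total = weight = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = similarity(ratings, user, other)
        if sim <= 0:
            continue  # no overlap in taste data, so this user tells us nothing
        total += sim * their[movie]
        weight += sim
    return total / weight if weight else None


# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}
ratings = {
    "alice": {"Alien": 5, "Heat": 3, "Up": 1},
    "bob":   {"Alien": 5, "Heat": 3, "Up": 1, "Brazil": 5},
    "carol": {"Alien": 1, "Heat": 2, "Up": 5, "Brazil": 1},
}

if __name__ == "__main__":
    print(predict(ratings, "alice", "Brazil"))
```

Because Alice's ratings match Bob's exactly and differ sharply from Carol's, the prediction for "Brazil" lands close to Bob's 5 rather than Carol's 1.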

sep332
Yeah, I was going to suggest this. I'm working on it right now.
James Van Boxtel
A: 

Predict future stock market trends from the data. Profit!

timday
Easier said than done.
Chris S
+1  A: 

If you're familiar with Python, try playing around with NLTK (the Natural Language Toolkit). It has lots of modules for text mining and even for machine learning in general. Try working your way through the NLTK book.

theycallmemorty
+1  A: 

If you want to start with an easy AI problem, you might try clustering.

http://en.wikipedia.org/wiki/Data_clustering

You could use it to group Flickr images together by tag, or something cool like that.
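As a sketch of the Flickr idea, assume each photo is represented only by its set of tags. Agglomerative (bottom-up) clustering starts with every photo in its own cluster and keeps merging the two most similar clusters. Here similarity is the Jaccard index (shared tags divided by all tags), clusters are compared by single linkage (their most similar pair of photos), and merging stops below an arbitrary threshold of 0.3. The photo IDs and tags are made up; k-means is a common alternative when items are numeric vectors rather than tag sets.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag sets: |a & b| / |a | b|, or 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(items, threshold=0.3):
    """Single-linkage agglomerative clustering of {name: tag_set}; stops when no pair reaches `threshold`."""
    clusters = [[name] for name in items]

    def linkage(c1, c2):
        # Single linkage: similarity of the closest pair across the two clusters.
        return max(jaccard(items[x], items[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = linkage(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break  # nothing left is similar enough to merge
        i, j = pair
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is still valid
    return clusters


# Hypothetical photos and tags
photos = {
    "p1": {"beach", "sunset", "sea"},
    "p2": {"beach", "sea", "surf"},
    "p3": {"cat", "kitten", "pet"},
    "p4": {"cat", "pet", "sofa"},
    "p5": {"sunset", "sea", "boat"},
}

if __name__ == "__main__":
    print(cluster(photos))
```

The beach/sea photos and the cat photos end up in separate groups because no tag is shared between them. This brute-force version recomputes every pair on each pass, which is fine for a few hundred photos but would need a smarter approach for a large Flickr crawl.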

James Van Boxtel