views:

99

answers:

2

Hello all,

I wish to search twitter for a word (let's say #google), and then be able to generate a tag cloud of the words used in twitts, but according to dates (for example, having a moving window of an hour, that moves by 10 minutes each time, and shows me how different words gotten more often used throughout the day).

I would appreciate any help on how to go about doing this regarding: resources for the information, code for the programming (R is the only language I am apt in using) and ideas on visualization. Questions:

  1. How do I get the information?

    In R, I found that the twitteR package has the searchTwitter command. But I don't know how big an "n" I can get from it. Also, It doesn't return the dates in which the twitt originated from.

    I see here that I could get until 1500 twitts, but this requires me to do the parsing manually (which leads me to step 2). Also, for my purposes, I would need tens of thousands of twitts. Is it even possible to get them in retrospect?? (for example, asking older posts each time through the API URL ?) If not, there is the more general question of how to create a personal storage of twitts on your home computer? (a question which might be better left to another SO thread - although any insights from people here would be very interesting for me to read)

  2. How to parse the information (in R)? I know that R has functions that could help from the rcurl and twitteR packages. But I don't know which, or how to use them. Any suggestions would be of help.

  3. How to analyse? how to remove all the "not interesting" words? I found that the "tm" package in R has this example:

    reuters <- tm_map(reuters, removeWords, stopwords("english"))

    Would this do the trick? I should I do something else/more ?

    Also, I imagine I would like to do that after cutting my dataset according to time (which will require some posix-like functions (which I am not exactly sure which would be needed here, or how to use it).

  4. And lastly, there is the question of visualization. How do I create a tag cloud of the words? I found a solution for this here, any other suggestion/recommendations?

I believe I am asking a huge question here but I tried to break it to as many straightforward questions as possible. Any help will be welcomed!

Best,

Tal

+1  A: 

As for the plotting piece: I did a word cloud here: http://trends.techcrunch.com/2009/09/25/describe-yourself-in-3-or-4-words/ using the snippets package, my code is in there. I manually pulled out certain words. Check it out and let me know if you have more specific questions.

Dan
+4  A: 
harshsinghal
Wow, wonderful answer Harshsinghal. Thank you!
Tal Galili