Hey guys, I've got a couple of issues with my code.

  • I suspect I am plotting the results very inefficiently, since the grouping by hour takes ages.
  • The DB is very simple: it contains the tweets, the created date and the username. It is fed by the Twitter gardenhose.

Thanks for your help!

require 'rubygems'
require 'sequel'
require 'gnuplot'

DB = Sequel.sqlite("volcano.sqlite")
tweets = DB[:tweets]

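# Returns [hours, counts]: hour offsets from the first matching tweet
# and the number of matching tweets in each of those hours.
def get_values(keyword, tweets)
  my_tweets = tweets.filter(:text.like("%#{keyword}%"))
  first_tweet = my_tweets.first
  return [[], []] if first_tweet.nil? # no tweet matches the keyword

  start = first_tweet[:created_at]
  counts = Hash.new(0) # missing hours start at 0
  my_tweets.each do |t|
    hour = ((t[:created_at] - start) / 3600).round
    counts[hour] += 1
  end

  # Sort by hour, then split the [hour, count] pairs into x and y arrays.
  counts.sort.transpose
end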

keywords = ["iceland", "island", "vulkan", "volcano"]
values = {}

keywords.each do |k|
  values[k] = get_values(k, tweets)
end

Gnuplot.open do |gp|
  Gnuplot::Plot.new(gp) do |plot|
    plot.terminal "png"
    plot.output "volcano.png"
    plot.data = []
    values.each do |k, v|
      plot.data << Gnuplot::DataSet.new(v) do |ds|
        ds.with = "linespoints"
        ds.title = k
      end
    end
  end
end
A: 

This is one of those cases where it makes more sense to use SQL, so that the database does the grouping instead of Ruby looping over every row. I'd recommend doing something like what is described in this other grouping question, just modified to use SQLite date functions instead of MySQL ones.

Mike Buckbee