
I want to create sparklines that illustrate the number of posts created on my blog in the last 2 weeks. To do this, I need to first generate an array that contains the number of posts created on each day during the period in question.

For example, this array:

[40, 18, 0, 2, 39, 37, 22, 25, 30, 60, 36, 5, 2, 2]

generates a sparkline of those daily counts (I'm using the Googlecharts wrapper around the Google Charts API).

My question is how to create these arrays. Here's what I'm doing now (I'm using Searchlogic for the queries, but the code should be understandable even if you've never used it):

  history = []
  14.downto(1) do |days_ago|
    # one query per day: from the start of that day to the start of the next day
    history.push(Post.created_at_after(days_ago.days.ago.beginning_of_day).created_at_before((days_ago - 1).days.ago.beginning_of_day).size)
  end

This approach is ugly and slow -- there must be a better way!

+1  A: 

You need to have your data indexed properly or this will never work efficiently. If you're using a granularity of "day" then it pays to have a Date column. You can then use a standard SQL GROUP BY operation to get the values you need directly.

For example, a migration could look like this:

def self.up
  add_column :posts, :created_on_date, :date
  add_index :posts, :created_on_date

  # backfill existing rows with the date part of created_at
  execute "UPDATE posts SET created_on_date = DATE(created_at)"
end

Remember to also set `created_on_date` whenever a new post is created. Retrieval is then fast, since the query can use the index:

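def sparkline_data
  # select_rows returns both columns as [[date, count], ...];
  # select_values would return only the first column (the dates)
  Post.connection.select_rows("
    SELECT created_on_date, COUNT(id) FROM posts
      WHERE created_on_date>DATE_SUB(UTC_TIMESTAMP(), INTERVAL 14 DAY)
      GROUP BY created_on_date
      ORDER BY created_on_date
  ").collect { |date, count| [date, count.to_i] }
end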

Keep in mind that days with no posts will be missing from the results, so you'll have to insert a zero for each of them. Because the date is returned with each count, you can work out which days are missing and fill them in; typically this is done by iterating over the range of days with collect.
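A minimal sketch of that fill-in step in plain Ruby, assuming the query results have already been fetched as `[date, count]` pairs (for example with `connection.select_rows`). Depending on the database adapter the date may arrive as a string like "2010-03-15" or as a `Date`; both are normalized with `to_s`. `fill_missing_days` is a hypothetical helper name, not part of Rails.

```ruby
require 'date'

# Hypothetical helper: turn sparse [[date, count], ...] rows (days with no
# posts are simply absent) into a dense array of counts, oldest day first.
def fill_missing_days(rows, last_day, days = 14)
  counts = {}
  rows.each { |date, count| counts[date.to_s] = count.to_i }

  first_day = last_day - (days - 1)
  # Date#to_s gives "YYYY-MM-DD", matching the format of a DATE column
  (first_day..last_day).collect { |day| counts[day.to_s] || 0 }
end
```

Passing `last_day` explicitly (e.g. `Date.today` in production) keeps the helper deterministic, which makes it easy to test independently of the database.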

When you need to retrieve a thin slice of data quickly, instantiating model objects is usually a huge bottleneck. Often you need to go directly to SQL if there's no simple way to fetch what you need.

tadman
A: 

In addition to tadman's answer, if you have the required administrator access, you may want to investigate partitioning based on date, especially if you receive an extremely high volume of posts per day.

ElectricDialect
+1  A: 

Try this:

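days_ago = 13  # today plus the 13 days before it = 14 days
n_days_ago, today = (Date.today - days_ago), Date.today

# get the count by date from the database
# (the upper bound is tomorrow, so posts created today are included)
post_count_hash = Post.count(:group => "DATE(created_at)",
             :conditions => ["created_at >= ? AND created_at < ?", n_days_ago, today + 1])

# now fill the missing dates with 0
(n_days_ago..today).each { |date| post_count_hash[date.to_s] ||= 0 }

# sort by date and keep only the counts
post_count_hash.sort.collect { |kv| kv[1] }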

Note 1: If you add an index on created_at, this method should scale well. If you run into millions of records each day, you are better off storing the post count by day in another table.

Note 2: You can cache the results and let them expire after a while to improve performance. In my system I typically set the TTL to 10-15 minutes.
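In Rails the usual tool for this is `Rails.cache.fetch(key, :expires_in => 15.minutes) { ... }`, with a cache store that honors `:expires_in` (memcached, for example). To show the mechanism itself, here is a minimal in-process sketch; `TtlCache` is a hypothetical class, and the injectable clock exists only so the expiry can be tested.

```ruby
# Hypothetical in-process cache whose entries expire after `ttl` seconds.
class TtlCache
  def initialize(ttl, clock = lambda { Time.now })
    @ttl = ttl
    @clock = clock
    @entries = {}  # key => [value, time it was stored]
  end

  # Return the cached value for `key`, recomputing it with the block
  # when it is missing or at least `ttl` seconds old.
  def fetch(key)
    value, stored_at = @entries[key]
    now = @clock.call
    if stored_at.nil? || now - stored_at >= @ttl
      value = yield
      @entries[key] = [value, now]
    end
    value
  end
end
```

With a 15-minute TTL (`TtlCache.new(900)`), the expensive count query runs at most once per key every 15 minutes; the trade-off is that the sparkline can lag behind new posts by up to that long.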

KandadaBoggu
+5  A: 

This will give you a hash mapping dates to post counts:

counts = Post.count(
  :conditions => ["created_at >= ?", 14.days.ago.beginning_of_day],
  :group => "DATE(created_at)"
)

You can then turn this into an array:

counts_array = []
14.downto(1) do |d|
  counts_array << (counts[d.days.ago.to_date.to_s] || 0)
end
Alex Reisner
+1 for brevity.
KandadaBoggu
The array-creation part is a bit ugly though -- if it weren't for the possibility of 0s, you could do the whole thing in a single line: `Post.count(:conditions => ['created_at >= ?', 14.days.ago], :group => "DATE(created_at)").sort_by{|i| i[0]}.map{|i| i[1]}`. Is there any way to get the `Post.count` call to map days to zero when no posts were created on that day?
Horace Loeb
There's no way to do this without telling the database about the sequence of dates you're interested in, and for sake of simplicity and speed I don't think it's wise to go that route. I was trying to preserve some of your code, but you can clean up the array-ification thusly: `counts_array = (1..14).to_a.reverse.map{ |d| counts[d.days.ago.to_date.to_s] || 0 }`.
Alex Reisner
A: 

Most of the time is spent on the 14 database queries, each of which has to scan every row in the table to check the date (assuming you are not indexing created_at).

To minimize this, we can do a single database query to grab the relevant rows, and then count them in Ruby.

history = Array.new(14, 0)  # index 0 = today, index 13 = 13 days ago
recent_posts = Post.created_at_after(13.days.ago.beginning_of_day)
recent_posts.each do |post|
  history[(Date.today - post.created_at.to_date).to_i] += 1
end
history.reverse!  # oldest day first, as the sparkline expects

I also recommend adding an index, as tadman suggested, but in this case on the created_at column of the posts table.
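The tallying step can be kept separate from the query so that it is easy to test without a database. A minimal sketch, assuming you pass in the posts' dates (for example `post.created_at.to_date` for each post); `daily_histogram` is a hypothetical name, and `today` is a parameter so tests can pin it. Dates outside the window are ignored rather than indexing past the end of the array.

```ruby
require 'date'

# Hypothetical helper: bucket dates into per-day counts, oldest day first.
def daily_histogram(dates, today, days = 14)
  history = Array.new(days, 0)
  dates.each do |date|
    offset = (today - date).to_i  # 0 = today, days - 1 = oldest day
    history[offset] += 1 if offset >= 0 && offset < days
  end
  history.reverse  # oldest first, as a sparkline expects
end
```

In the model code this would be called as something like `daily_histogram(recent_posts.map { |p| p.created_at.to_date }, Date.today)`.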

nilbus