In my project I would like to select records from my database and group them by time: starting from the latest record, each group should contain the records that occurred within a certain time range of the latest record in that group.

For example, with a 1-hour time range: if a user created 3 posts between 4:30pm and 5:15pm, 2 posts between 1:15pm and 1:30pm, and 1 post at 10:00am, I would like a structure like the following to be created:

user.posts.find(:all).group_by do |post|
  # (posts have a created_at column containing both a date and time)
  # Algorithm here
end

Result:

[
 [Tue March 31 5:15pm, [post6,post5,post4]]
 [Tue March 31 1:30pm, [post3,post2]]
 [Tue March 31 10:00am, [post1]]
]

Any thoughts on the best algorithm to do this? Pseudocode is fine too if you don't know Ruby.

Edit: Thanks Joel. Here's the code I ended up using (feeds instead of posts):

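  # Groups feeds into [bin_time, feeds] pairs, newest bin first.
  # time_limit is in seconds; blank? comes from Rails (ActiveSupport).
  def aggregate(feeds, time_limit)
    return [] if feeds.blank?
    result = []
    bin = []
    # Visit feeds newest first so each bin is keyed by its latest feed
    feeds = feeds.sort_by { |f| -f.created_at.to_i }
    bin_time = feeds.first.created_at
    feeds.each do |feed|
      if (bin_time - feed.created_at) < time_limit
        bin << feed
      else
        # Outside the window: close the current bin and start a new one
        result << [bin_time, bin]
        bin_time = feed.created_at
        bin = [feed]
      end
    end
    result << [bin_time, bin]
    result
  end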
A: 
for each post:
    if post.created_at - group_start > limit
        output current group if non-empty
        set group to current post
        set group_start to post.created_at
    else
        add post to current group

Then, outside the loop, output the current group if it is non-empty. As written, the condition assumes the posts are visited oldest first; adjust the if condition depending on the order in which you visit the posts.
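For what it's worth, here is one way the pseudocode could look in plain Ruby. It is only a sketch under a few assumptions: Post is a hypothetical Struct standing in for the real model, group_by_window is a made-up name, limit is in seconds, and the posts are visited newest first, so the subtraction is flipped relative to the pseudocode.

```ruby
# Hypothetical stand-in for a post record; only created_at matters here.
Post = Struct.new(:name, :created_at)

# Groups posts into [group_start, posts] pairs, newest group first.
# limit is in seconds (subtracting two Time objects yields seconds).
def group_by_window(posts, limit)
  groups = []
  group = []
  group_start = nil
  # Visit posts newest first so each group is keyed by its latest post.
  posts.sort_by { |p| -p.created_at.to_f }.each do |post|
    if group_start.nil? || group_start - post.created_at > limit
      # Outside the window: output the current group and start a new one.
      groups << [group_start, group] unless group.empty?
      group = [post]
      group_start = post.created_at
    else
      group << post
    end
  end
  # Outside the loop: output the last group if non-empty.
  groups << [group_start, group] unless group.empty?
  groups
end
```

Calling group_by_window(posts, 3600) would give the 1-hour grouping from the question. Note that a post exactly limit seconds older than group_start stays in the group here; use >= if it should start a new one.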

dwc
+1  A: 

The basic concept is pretty simple: accumulate posts into bins, and when a post's time falls outside the range, start a new bin. Here is a Python version:

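# Sample data: (name, time as an HHMM integer), newest first
posts = [('post6', 1715), ('post5', 1645), ('post4', 1630),
         ('post3', 1330), ('post2', 1315), ('post1', 1000)]

rslt = []
bin = []
binTime = 1 << 31  # sentinel so the first post always starts a new bin
for postData, postTime in posts:
    # 100 is one hour in HHMM arithmetic (e.g. 1715 - 100 = 1615)
    if postTime >= binTime - 100:
        bin.append(postData)
    else:
        if bin:
            rslt.append([binTime, bin])
        binTime = postTime
        bin = [postData]

# Flush the last bin
if bin:
    rslt.append([binTime, bin])

print(rslt)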
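One caveat: subtracting 100 from an HHMM integer steps back exactly one hour, but other window sizes do not work that way (1715 - 30 is 1685, which is not a valid time). Below is a rough Ruby sketch of the same binning with the window given in minutes. The names hhmm_to_minutes and bin_posts are made up for illustration, and the input is assumed to be [name, hhmm] pairs like the sample data above.

```ruby
# Converts an HHMM integer such as 1715 to minutes since midnight (1035).
def hhmm_to_minutes(hhmm)
  (hhmm / 100) * 60 + (hhmm % 100)
end

# Bins [name, hhmm] pairs into [bin_time, names] pairs, newest bin first.
# window is in minutes; a post exactly window minutes older stays in the bin.
def bin_posts(posts, window)
  result = []
  bin = []
  bin_time = nil
  # Visit posts newest first so each bin is keyed by its latest post.
  posts.sort_by { |_, hhmm| -hhmm }.each do |name, hhmm|
    if bin_time && hhmm_to_minutes(hhmm) >= hhmm_to_minutes(bin_time) - window
      bin << name
    else
      # Outside the window: close the current bin and start a new one.
      result << [bin_time, bin] unless bin.empty?
      bin_time = hhmm
      bin = [name]
    end
  end
  # Flush the last bin.
  result << [bin_time, bin] unless bin.empty?
  result
end
```

With real timestamps (such as created_at) the conversion step is unnecessary, since subtracting two Time objects already gives the difference in seconds.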
Joel