Hello,

Our aim is to build timelines showing the periods of time when a user was online. (It doesn't really matter which users we are talking about or where they were online.) To find out who is online, we can call an API method: someservice.com/api/?call=whoIsOnline

The whoIsOnline method returns a list of the users who are currently online. However, there is no API method that tells us who is NOT online.

So we have to build our timelines from the information returned by whoIsOnline. Of course, there will be some measurement error, since we can't track status in real time. Let's suppose that we call whoIsOnline every 2 minutes (yes, our script will be run by cron every 2 minutes), so the start and end of each online period are only known to within about 2 minutes.

For example, calling whoIsOnline at 08:00 returns:

Peter_id
Michael_id
Andy_id

Calling whoIsOnline at 08:02 returns:

Michael_id
Andy_id
George_id

As you can see, Peter has gone offline, but we have a new online user: George.
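The difference between two consecutive snapshots can be computed with plain set operations. A minimal Python sketch using the two example results above (diff_snapshots is a hypothetical helper name). Note that "went offline" here can only mean "missing from this poll", since the API never reports offline users:

```python
def diff_snapshots(previous: set, current: set):
    """Compare two consecutive whoIsOnline results.

    Returns (came_online, went_offline, still_online) as sets of user ids.
    """
    came_online = current - previous
    went_offline = previous - current  # only means "absent from this poll"
    still_online = previous & current
    return came_online, went_offline, still_online


at_0800 = {"Peter_id", "Michael_id", "Andy_id"}
at_0802 = {"Michael_id", "Andy_id", "George_id"}
came, gone, stayed = diff_snapshots(at_0800, at_0802)
print(sorted(came), sorted(gone), sorted(stayed))
# ['George_id'] ['Peter_id'] ['Andy_id', 'Michael_id']
```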

The available tools are a database (MySQL), text files, and key-value storage (Redis/memcache); feel free to choose any of them (or even all of them).

So we need to produce information like this:

George_id was online...
12 May: 08:02-08:30, 12:40-12:46, 20:14-22:36 
11 May: 09:10-12:30, 21:45-23:00
10 May: was not online

And now the questions:

  1. How would you store the information needed to build such timelines?
  2. How would you query/calculate the periods of time when a user was online?

Additional information:

  1. You cannot update information about offline users, only about users who are "currently" online.
  2. The solution should be flexible: it must be possible to present the timeline information in any time zone.
  3. We only need to keep information for the last 7 days.
  4. Every user seen online automatically gets their own identifier in our database.
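One way to produce the per-day output shown earlier while meeting the time-zone and 7-day requirements is to store every interval in UTC and convert only when rendering. A minimal Python sketch, assuming intervals are available as timezone-aware (start, end) UTC datetime pairs; timeline is a hypothetical name, the dates are made up, and intervals that cross local midnight are split into 23:59 and 00:00 pieces:

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def timeline(intervals, tz: tzinfo, today: date, days: int = 7) -> list[str]:
    """Render (start, end) UTC intervals as one line per local day in `tz`.

    Only the last `days` days (counting back from `today`) are shown.
    """
    per_day = defaultdict(list)
    for start_utc, end_utc in intervals:
        start, end = start_utc.astimezone(tz), end_utc.astimezone(tz)
        while start.date() < end.date():
            # Split at local midnight so each piece belongs to a single day.
            midnight = datetime.combine(start.date() + timedelta(days=1), time(0), tz)
            per_day[start.date()].append((start, midnight - timedelta(minutes=1)))
            start = midnight
        # Skip the empty remainder of an interval ending exactly at midnight,
        # but keep single-poll sightings where start == end.
        if start < end or start_utc == end_utc:
            per_day[start.date()].append((start, end))

    lines = []
    for offset in range(days):
        day = today - timedelta(days=offset)
        label = f"{day.day} {day:%B}"
        pieces = sorted(per_day.get(day, []))
        if pieces:
            lines.append(f"{label}: " + ", ".join(f"{s:%H:%M}-{e:%H:%M}" for s, e in pieces))
        else:
            lines.append(f"{label}: was not online")
    return lines


utc = timezone.utc
stored = [
    (datetime(2010, 5, 12, 6, 2, tzinfo=utc), datetime(2010, 5, 12, 6, 30, tzinfo=utc)),
    (datetime(2010, 5, 11, 21, 30, tzinfo=utc), datetime(2010, 5, 11, 22, 30, tzinfo=utc)),
]
for line in timeline(stored, timezone(timedelta(hours=2)), today=date(2010, 5, 12), days=3):
    print(line)
# 12 May: 00:00-00:30, 08:02-08:30
# 11 May: 23:30-23:59
# 10 May: was not online
```

A fixed UTC offset is used here only to keep the example self-contained; passing a `zoneinfo.ZoneInfo` object (Python 3.9+) instead would also follow daylight-saving changes.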

Uff... it was really hard for me to write this because my English is pretty bad, but I hope my question is clear.

Thank you.

A:

There are two distinct ways of measuring "online" status:

  1. Assume that when someone clicks a page, they are online for some notional interval afterwards, e.g. 5 minutes. So if they click on a page at 4:03, 4:05 and 4:09, they will have an online interval from 4:03 to somewhere between 4:09 and 4:14 (depending on what algorithms/assumptions you apply to the final click); or

  2. Use a "heartbeat" JavaScript and/or Flash process to keep track of how long pages are open.
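Option (1) amounts to grouping click times into sessions: a click within the notional interval of the previous one extends the current session, otherwise a new one starts. A minimal Python sketch, assuming the 5-minute example interval; sessions_from_clicks and its pad_end flag (whether to add the interval after the final click) are hypothetical, and the 4:30 click is made up to show a second session:

```python
from datetime import datetime, timedelta


def sessions_from_clicks(clicks, timeout=timedelta(minutes=5), pad_end=False):
    """Group click times into (start, end) online intervals.

    A click more than `timeout` after the previous click starts a new session.
    With pad_end=True the user is assumed to stay online for `timeout` after
    the last click of each session (4:09 -> 4:14); otherwise it ends at 4:09.
    """
    sessions = []
    for click in sorted(clicks):
        if sessions and click - sessions[-1][1] <= timeout:
            sessions[-1][1] = click          # extend the current session
        else:
            sessions.append([click, click])  # start a new session
    pad = timeout if pad_end else timedelta(0)
    return [(start, last + pad) for start, last in sessions]


day = datetime(2010, 5, 12)
clicks = [day.replace(hour=4, minute=m) for m in (3, 5, 9, 30)]
for start, end in sessions_from_clicks(clicks, pad_end=True):
    print(f"{start:%H:%M}-{end:%H:%M}")
# 04:03-04:14
# 04:30-04:35
```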

(1) is more common. (2) often makes people leery of the privacy implications. Another way of viewing this is that (1) is passive monitoring, whereas (2) is active monitoring.

There are many variations of (1). The interval may not be fixed; it may vary depending on which page the user is on, and it can be derived from assumptions, statistical sampling or even probabilistic models. In the simplest case, "who is online?" is just the list of users who have clicked something in the last 5 minutes (for example), which is easy to figure out since you're recording every page view.
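The simplest case, together with a per-page interval, can be sketched in Python as below, assuming you keep each user's most recent page view. who_is_online, the page paths and the 20-minute window for a video page are illustrative assumptions, not values from the answer:

```python
from datetime import datetime, timedelta

DEFAULT_WINDOW = timedelta(minutes=5)  # "the last 5 (for example) minutes"


def who_is_online(last_view, now, page_windows=None):
    """Return the sorted ids of users considered online at `now`.

    last_view maps user id -> (time of last page view, page path).
    page_windows optionally overrides the window for specific pages.
    """
    page_windows = page_windows or {}
    online = []
    for user, (seen, page) in last_view.items():
        window = page_windows.get(page, DEFAULT_WINDOW)
        if now - seen <= window:
            online.append(user)
    return sorted(online)


now = datetime(2010, 5, 12, 16, 10)
last_view = {
    "Peter_id": (datetime(2010, 5, 12, 16, 1), "/video"),   # 9 minutes ago
    "Andy_id": (datetime(2010, 5, 12, 16, 7), "/forum"),    # 3 minutes ago
    "George_id": (datetime(2010, 5, 12, 16, 0), "/forum"),  # 10 minutes ago
}
print(who_is_online(last_view, now))                                     # ['Andy_id']
print(who_is_online(last_view, now, {"/video": timedelta(minutes=20)}))  # ['Andy_id', 'Peter_id']
```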

cletus
A: 

I would do something like this (MySQL-style pseudocode):

UPDATE session_log SET last_online = NOW()
    WHERE user = ... AND last_online >= NOW() - INTERVAL 10 MINUTE
IF NOT UPDATED:  -- i.e. the UPDATE affected 0 rows
    INSERT INTO session_log (last_online, user) VALUES (NOW(), user)

There should also be a created_at (or similar) column, of course, but this way you can easily keep track of sessions: each row is one session running from created_at to last_online.
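The same update-or-insert logic can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: timestamps are stored as UTC Unix seconds (which leaves time zones to the presentation layer), the user column is named user_id, the 10-minute gap is taken from the pseudocode, the 7-day cleanup reflects the question's retention rule, and open_db and record_poll are hypothetical names:

```python
import sqlite3

POLL_GAP = 10 * 60         # seconds: a session is extended if seen in the last 10 minutes
RETENTION = 7 * 24 * 3600  # seconds: keep one week of history


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS session_log ("
        " id INTEGER PRIMARY KEY,"
        " user_id TEXT NOT NULL,"
        " created_at INTEGER NOT NULL,"   # session start, UTC Unix time
        " last_online INTEGER NOT NULL)"  # last poll that saw the user
    )
    return conn


def record_poll(conn, online_users, now):
    """Extend or open a session for each user returned by one whoIsOnline call.

    Users missing from this poll are not touched; their session simply
    stops being extended, which is how it ends.
    """
    for user in online_users:
        cur = conn.execute(
            "UPDATE session_log SET last_online = ? "
            "WHERE user_id = ? AND last_online >= ?",
            (now, user, now - POLL_GAP),
        )
        if cur.rowcount == 0:  # no recent session: start a new one
            conn.execute(
                "INSERT INTO session_log (user_id, created_at, last_online) "
                "VALUES (?, ?, ?)",
                (user, now, now),
            )
    # Forget sessions that ended more than a week ago.
    conn.execute("DELETE FROM session_log WHERE last_online < ?", (now - RETENTION,))
    conn.commit()


conn = open_db()
t0 = 1_000_000  # arbitrary UTC timestamp for the 08:00 poll
record_poll(conn, ["Peter_id", "Michael_id", "Andy_id"], t0)
record_poll(conn, ["Michael_id", "Andy_id", "George_id"], t0 + 120)
record_poll(conn, ["Andy_id"], t0 + 1020)  # 15 minutes after his last sighting: new session
for row in conn.execute("SELECT user_id, created_at, last_online FROM session_log ORDER BY id"):
    print(row)
```

Each (created_at, last_online) pair is then one interval for the timeline. With 2-minute polling, the real end of a session may be up to about 2 minutes after the recorded last_online.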

WoLpH