I'm building a toplist for my site, and instead of recording in and out hits and emptying the database every 24 hours, I would like to record in and out hits "within the last 24 hours". This way the toplist will always look busy no matter what time you look at it, while still accurately ranking sites.

So basically I'll have to add a record to the database logging an 'in' hit (for example) and then remove it around 24 hours later, or simply not consider it when ranking the sites.

I want to know the most efficient way of doing this; I have some ideas, but they all seem really resource-heavy. I hope you understand what I'm trying to achieve :) Any help would be appreciated.

A: 

Write a Stored Procedure that deletes records older than 24 hours. Then write a trigger that runs on every INSERT statement and calls the SP.
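As a minimal sketch (table and column names `hits` / `hit_datetime` are assumptions, not from the question), the stored procedure could look like this. One caveat worth knowing: MySQL does not allow a trigger on a table to modify that same table, so a trigger on the hits table could not call an SP that deletes from it; MySQL 5.1's event scheduler is one way around that.

```sql
-- Sketch only: assumes a `hits` table with a `hit_datetime` column.
DELIMITER //
CREATE PROCEDURE purge_old_hits()
BEGIN
  -- Remove records older than 24 hours
  DELETE FROM hits WHERE hit_datetime < NOW() - INTERVAL 24 HOUR;
END //
DELIMITER ;

-- Instead of a trigger, schedule the cleanup inside MySQL (5.1+):
CREATE EVENT purge_hits_hourly
  ON SCHEDULE EVERY 1 HOUR
  DO CALL purge_old_hits();
```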

jao
A: 

You could store the timestamp with each "hit" and then run a query like

 $time = time()-86400;
 mysql_query("DELETE FROM xxx WHERE timestamp < $time");

or you could do the same thing within the SELECT statement; it depends on whether you still need the hits afterwards, etc.

Flo
+3  A: 

You don't necessarily have to delete "old" data all the time. You can also limit the data set to records within the last 24 hours when querying data.

WHERE
  site='xyz'
  AND hit_datetime > NOW() - INTERVAL 24 HOUR

see http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html
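Assuming a hypothetical `hits` table with `site`, `direction` ('in'/'out') and `hit_datetime` columns, the whole toplist could then be a single query over the last 24 hours, with no deletes needed:

```sql
-- Sketch only: schema names are illustrative, not from the question.
SELECT site,
       SUM(direction = 'in')  AS hits_in,   -- MySQL sums booleans as 0/1
       SUM(direction = 'out') AS hits_out
FROM hits
WHERE hit_datetime > NOW() - INTERVAL 24 HOUR
GROUP BY site
ORDER BY hits_in DESC;
```

An index on `hit_datetime` keeps the range condition cheap as the table grows.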

VolkerK
That's obviously the way to go here, imho.
n3rd
It is a good solution, but if you have a lot of traffic I would consider the one from Vilx-, as this approach might build a huge table over time
Flo
Define "a lot of traffic". If you always delete old records you hit the index and data file a lot, which isn't exactly a performance boost. If you delete in intervals you need something like this WHERE clause anyway to get accurate results, and the gain in INSERT/SELECT speed from the decreased amount of data is most likely negligible, probably even "eaten" by the time the DELETE operation took. If storage space is an issue (if!) you can delete old records from time to time, every n days, when the oil price is falling, when your neighbour is abducted by aliens (again)...
VolkerK
Yeah, performance will make no difference, and after thinking more about it you might be right: if you have so many hits that storage could become a problem, you will most likely need a real server (= lots of space) anyway, because a small webspace or the like could not take the load, so it doesn't matter. Keeping the data might also be nice for a stats history or something.
Flo
A: 

If the time constraint is not really hard (e.g. you'll lose money or really annoy your users if the data is kept in the db longer than 24 hours), I'd use PHP's register_shutdown_function like this:


function cleanup() {
  // open db-connection etc.
  $query = 'DELETE FROM <yourtable> ' .
           'WHERE UNIX_TIMESTAMP(<timestampfield>) < ' . (time() - 86400);
  mysql_query($query);
  // close connection
}

register_shutdown_function('cleanup');

The above code assumes <timestampfield> is one of the MySQL date datatypes (TIMESTAMP, DATE, DATETIME).

Argelbargel
+86400 must be -86400 otherwise everything will be gone
Flo
flo, of course! fixed it :-/
Argelbargel
+1  A: 

I would delete the data that is older than 24 hours with a simple

DELETE ... WHERE hit_time < NOW() - INTERVAL 24 HOUR

The other question is - when to call it? The tradeoff is between performance and stale data. The more often you call it, the less "stale" data there will be, but the server load will grow.

I see several approaches, pick one that suits your needs most:

  • Call it at the start of every script. This can be optimized by calling it only if the script will do something with the hit data. That way every script will always run with "correct" data. However, this will have the maximum load.
  • Schedule a cron job and call it once every 1h/2h/24h/etc. This way there will be a little bit of "stale" data, but the overhead will be reduced to a minimum.
  • Do it like PHP does it with sessions - on every script startup give it a x% (x is configurable) chance of being run. That is, take a value from 0 to 100, and if it is smaller than x, execute the DELETE.

You can also invent some other schemes - for example, run it once per user session; or run it only if the execution time is evenly divisible by, say, 7; or something else. Either way you trade off performance for correctness.
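The session-style "x% chance" option above can be sketched in a few lines of PHP (table and column names are illustrative, and the deprecated-era mysql_query matches the other answers here, not a recommendation):

```php
<?php
// Sketch only: probabilistic cleanup, like PHP's session GC.
$gc_probability = 5; // x = 5: roughly 1 in 20 requests runs the cleanup

if (mt_rand(1, 100) <= $gc_probability) {
    // Assumes a `hits` table with a MySQL datetime column `hit_datetime`.
    mysql_query('DELETE FROM hits WHERE hit_datetime < NOW() - INTERVAL 24 HOUR');
}
```

Tuning $gc_probability is exactly the performance/staleness trade-off described above: higher values mean fresher data but more frequent DELETEs.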

Vilx-