views:

143

answers:

4

To keep this simple, let's say I'm making a basic pageview counter in php that stores the counts for each page in a mysql table. The table had 2 colums: PAGE_ID and COUNT.

I added the following code to every page:

$query = "INSERT INTO table VALUES ('$page_id', '1')
         ON duplicate KEY UPDATE COUNT=COUNT+1";

$result = mysqli_query($cxn, $query);

To make sure each person viewing the page only triggers the counter once I added PHP sessions. Basically if you view a page, the page_id gets stored in a session and the counter php code checks that session before triggering the counter. Worked fine in my own testing.

Some pages were getting too many views, I was suspecting duplicates, so I started logging IPs and User agents. It turns out in about 10% of the cases, the IP triggers the counter for the same page 2-3 times in a few minutes.

First question What could be causing the duplicates? The problem seems to happen mostly with IE8 and Safari but I also have at least one instance of it happening with IE7 and IE6. Any known problem with php sessions? Should I use cookies instead?

Part 2: I modified my table so that it now stores the last unix time stamp and the last IP that triggered the counter.

I want to modify my query so that before it runs the "COUNT=COUNT+1" it checks for the following:

If the current IP is the same as the last stored IP for this page {

     check that it's been at least 5 minutes before doing COUNT=COUNT+1

} else { COUNT=COUNT+1; }

Second question How do I write that in a mysql query, while keeping my "ON duplicate KEY" statement?

I understand this statement wouldn't be 100% accurate but until I can figure out why the session thing doesn't seem to be working, this would work. My website is low traffic and I rarely get more than 1 visitors on the same page within a 5 minutes time frame.

Thanks

A: 

What you're looking for is a CASE statement: http://dev.mysql.com/doc/refman/5.0/en/case-statement.html

An example CASE statement:

SELECT name,
       (CASE WHEN is_happy THEN "Happy!"
        ELSE "sad." END) as happiness 
FROM user_state;

They can be used in UPDATE and INSERT as well.

I suspect that you want to look at the IE stuff, instead. I had similar problems in the past and I ended up having to use javascript to de-bounce clicks. I'm not sure if it was IE users double clicking on things or if it was IE being weird.

The Doctor What
Thanks for your reply. I ran across that CASE statement documentation, but I don't really understand how to apply it. Do you have any tutorials or easy well explained examples of how to use cases in php sql queries?As for the IE click things, in this case the counter code is part of the page. Users don't have to click anything to trigger it, just load the page. In my testings when you load a page for the first time it counts as 1 but if you refresh, it doesn't since the session sotred that page_id the first time around. I doubt people are closing and opening their browser 3 times in a row!
Enkay
Does the example I added above help?
The Doctor What
+1  A: 

1. What could be causing the duplicates?

Some pages were getting too many views, I was suspecting duplicates, so I started logging IPs and User agents. It turns out in about 10% of the cases, the IP triggers the counter for the same page 2-3 times in a few minutes.

There's no way to know if the host requesting the page sits behind a NAT router - to you the request would have the same IP but in reality is a different host. A cookie or session would help you isolate on a per workstation basis, though I wonder when you would write the info to the database.

2. How do I write that in a mysql query, while keeping my "ON duplicate KEY" statement?

I don't see the need for the COUNT = COUNT + 1, because SQL has a COUNT function:

  SELECT page_id,
         COUNT(*) 'num_hits'
    FROM ZZZ_NETWORK
   WHERE page_id = ?
GROUP BY page_id

If you used the following structure for ZZZ_NETWORK:

  • page_id, primary key
  • ip_address, primary key
  • timestamp, primary key

...you wouldn't have to be concerned with duplicate key handling, and it would allow you to know how many hits you got in a specific day, week, month/etc. Including the timestamp would ensure there could never be duplicates.

After a month I'd have thousands of rows in my table if I use one row per page view.

Drive space is dirt cheap, and having a timestamp so you can datamine would be invaluable for reporting. But you don't have to keep all the data either - you could archive it by dumping the data to file so you could retrieve it if necessary.

I don't understand how it would ensure I would never get a duplicate.

A timestamp includes both date (Dec 25/09) and time (07:00:00 AM). Some datetime data types get down to fractions of a second. This makes it close to impossible to have the same date & time for a given page_id with a given IP address - I can't click the refresh button fast enough, even if I wanted to. So records could never be duplicates, because the last of the three columns would have a different value every time (no pun intended).

OMG Ponies
Thanks for your reply. You have to keep in mind this is a low traffic website. 30 pageviews per page a day tops. Same IP, same exact User agent 3 times within 4 minutes makes me believe those are the same person on the same machine triggering the counter more than once. This happening in about now 12% of the cases.My idea was to have 1 row per page_id instead of 1 row per pageview. After a month I'd have thousands of rows in my table if I use one row per page view. What do you think?
Enkay
My duplicates are not happening at the exact same time. They happen within a span of a few seconds to a few minutes. If I did set up the table the way you suggest, I would still have to check that the same page_id and ip combination are not getting registered at a time interval I judge to be too close. I also don't want to make it so that it only counts a pageview once per IP. If you come back tomorrow or in a few hours, I want it to count as another page view. I just don't want to get multiple counts for the same 5 minutes "session".
Enkay
Thanks for the quick update. You're right about drive space being cheap. I was mostly worried about slowing down mysql with thousands of rows. I'm not sure if that's a valid concern or not as I'm not too familar with mysql.Can you elaborate on the timestamp? I don't understand how it would ensure I would never get a duplicate. By duplicate I mean the same IP viewing the same page more than once in a 5 minutes time frame. The way I understand, it would only ensure i don't get duplicates happening at the exact same second, but my duplicates happen over a few minutes. Am I missing something?
Enkay
+2  A: 

I'd suggest you try to see if you can fix the PHP, but ignoring that. You could store the unixtimestamp divided by 300 (IE, in 5 minute intervals)

$query = "INSERT INTO OtherTable VALUES ('$page_id', '$IP_ADDRESS', (UNIX_TIMESTAMP(NOW())/300))";

If records updated is 0 you don't need to update the page count.

MindStalker
Dividing the timestamp into 5 minutes intervals is a goodidea. It's not completely full proof since it depends on when in that 5 minutes time frame the first count it. The way I see it, this would however also require me to use one row per pageview. How could I make it work using one row per page id and a single mysql query?
Enkay
If you made the key Unique per page_id, ip_address that would be significantly smaller. The only real solution if you didn't do this would be to fix your session handling which is another story.
MindStalker
A: 

I think I may found a way to fix my session problem. Actually, I switched from sessions to cookies.

A lot of my pages are getting views through an iframe. The iframe was causing a problem with IE browsers. IE was not able to read cookie data from within an iframe unless I add a p3p header.

I added the p3p header and now it seems to be working correctly. I'm so tired of microsoft and their idea of security. I also have no idea what a p3p is but here's what it looks like in php.

header ( "p3p:CP=\"IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT\"");
Enkay