Assume I have an open source web server or proxy that I can enhance, say Apache or Squid.

Is there a way to determine the time each client spends on a web page?

HTTP is of course stateless, so it's not trivial, but maybe someone has an idea on how to approach this problem?

Thanks.

+4  A: 

Not without having some JavaScript on the client side constantly hit your server and then checking when the hits stop (which of course assumes the user has JavaScript enabled). There are also various (ugly) ways to detect windows being closed with JavaScript, but these won't always trigger, e.g. when the browser crashes.

I sort of wonder why you want this anyway. What if a person looks at the web page for 3 seconds, gets distracted by another tab/window but leaves your page open for 2 hours? The answer you get is 2 hours, the answer you (probably) want is 3 seconds.

Chris Young
+4  A: 

With Apache or Squid alone you can hardly detect how much time a user spends on your page.

But with some additional sugar on your webpage you can:

It's free and has a lot of functions.

But you'll also invite Google to watch the stats of your site ... (but maybe that helps them decide if they wanna buy you :-))

Andre Bossard
We use Urchin, which was bought by Google. It's essentially Google Analytics that you can install on your own server. It's quite pricey, though, but good if you can't send data to Google for some reason.
Thomas Owens
+2  A: 

You could count the time between when the page was requested and when the next page is requested. However, this only works if the user does browse to another page, and it is only correct if he stayed on the first page the whole time until he requested the next one. Even then he may still be on the original page (e.g. he opened the new one in a tab).

The only way to know for sure would be to use Javascript to ping the server from the open page every ten seconds or so, just to say "I'm still being read!"
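A minimal server-side sketch of that idea, assuming the pings have already been collected as (client_id, page, timestamp) tuples. The function name, the tuple layout and the grace period are all made up for illustration; how you identify a client and receive the pings is up to you.

```python
from collections import defaultdict

PING_INTERVAL = 10         # seconds between pings (assumed: "every ten seconds or so")
GRACE = 2 * PING_INTERVAL  # a longer gap is treated as "stopped reading" (assumed)


def time_on_page(pings, grace=GRACE):
    """Estimate seconds spent per (client_id, page) from heartbeat pings.

    pings: iterable of (client_id, page, unix_timestamp) tuples.
    """
    by_view = defaultdict(list)
    for client_id, page, ts in pings:
        by_view[(client_id, page)].append(ts)

    totals = {}
    for key, stamps in by_view.items():
        stamps.sort()
        total = 0
        for prev, cur in zip(stamps, stamps[1:]):
            gap = cur - prev
            # Count only gaps short enough to be consecutive pings;
            # a long silence means the page was closed or reopened later.
            if gap <= grace:
                total += gap
        totals[key] = total
    return totals
```

The estimate is only as fine-grained as the ping interval, and it still says nothing about whether the tab was actually in front of the user.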

Vincent McNabb
+2  A: 

I've actually seen JavaScript analytics packages that not only tracked how long you were on the page, by pinging the server every so often, but also kept track of exactly what was on the screen. By measuring the size of your browser window, along with the scroll position of the document, they were able to determine exactly how long each element was on the screen. By tracking the location of the mouse, they can probably get a good guess at what you are looking at too. I can't find the link right now, but here's the short story: if you are really interested in what people are looking at, and for how long, you can do it. There's not much of a limit to how much you can track.
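To give a rough idea of the on-screen calculation, here is a sketch that assumes the page reports samples of (timestamp, scroll_y, viewport_height) and that you know each element's top and bottom offsets in the document. The names and the sample format are hypothetical, not taken from any particular package.

```python
def seconds_on_screen(samples, elem_top, elem_bottom):
    """Estimate how long an element was at least partly visible.

    samples: list of (timestamp, scroll_y, viewport_height), sorted by time.
    Each sample is assumed to hold until the next one arrives, so the
    last sample contributes no time of its own.
    """
    visible = 0.0
    for (t0, scroll_y, height), (t1, _, _) in zip(samples, samples[1:]):
        view_top = scroll_y
        view_bottom = scroll_y + height
        # The element is on screen if its vertical range overlaps the viewport.
        if elem_top < view_bottom and elem_bottom > view_top:
            visible += t1 - t0
    return visible
```

For example, with samples `[(0, 0, 600), (5, 500, 600), (12, 1500, 600), (20, 1500, 600)]`, an element spanning offsets 700 to 900 is only in view during the second sample, i.e. for 7 seconds.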

Also, just a thought: if you don't want to ping the server too much, you could keep the data buffered in memory on the client and only send it to the server once you have a sufficient amount, or right before the page closes.

Kibbee
A: 

This kind of metric was actually pretty popular several years ago, before PCs got more powerful and tabbed browsers became popular, and it became harder to measure as accurately. The standard way to do it in the past was to assume people are usually just loading one page at a time, and just use server log data to determine the time between page views. Your standard analytics vendors like Omniture and Urchin (now Google Analytics) calculate this.

Normally, you set a tracking cookie to be able to identify a specific person/browser over time, but in the short term you can just use an IP address/user-agent combo.

So, basically you just crunch the log data and count the delta between two consecutive page views as how long the person was on the first page. You set some rules (or your analytics vendor does this behind the curtain) like discarding or truncating times beyond some cutoff (say 10 minutes), where you assume the person wasn't actually reading but left the page open in a window/tab.
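As a sketch of that crunching step, assume the log has already been parsed into (visitor_key, timestamp, url) records, where visitor_key is whatever you use to identify a browser (a cookie value, or an IP address/user-agent pair). The function name and record layout are invented for the example, and it discards over-cutoff deltas rather than truncating them.

```python
from collections import defaultdict

CUTOFF = 10 * 60  # seconds; the "say 10 minutes" rule from above


def page_durations(hits, cutoff=CUTOFF):
    """Turn page views into (visitor_key, url, seconds_on_page) records.

    hits: iterable of (visitor_key, unix_timestamp, url) tuples.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, url in hits:
        by_visitor[visitor].append((ts, url))

    durations = []
    for visitor, views in by_visitor.items():
        views.sort()  # chronological order per visitor
        for (ts, url), (next_ts, _) in zip(views, views[1:]):
            delta = next_ts - ts
            # Discard deltas beyond the cutoff: the page was probably
            # left open rather than read.
            if delta <= cutoff:
                durations.append((visitor, url, delta))
    return durations
```

Note that the last page view of every visitor never gets a duration, since there is no following request to measure against.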

Is this data perfect? Obviously not. But you just need enough "good enough" data to do statistical analysis and draw some conclusions.

It's still useful for longitudinal analysis (readers' habits over time) and for qualitative comparison between different pages on your site. (E.g. between two 700-word articles, if one has a mean reading time twice as long as the other, then more people are actually reading the first article.) Of course, your site has to be busy enough to have enough data points for statistically sound analysis after you throw out all the "bad" outlier data points.

Yes, you could use JavaScript to send keep-alives to improve the data. You could just poll at given intervals after the page's load event (window.onload), or set mouseover events on sections of your pages.

Another technique is to use Javascript to add an onclick event to every <a href> that hits your server. Not only do you then know when someone clicks a link to take them off your site, really sophisticated "hotspot" analysis looks at the fact that if someone clicked a link 6 paragraphs down a page, then they must have read that far.

joelhardi