We're trying to figure out how common web tracking software works, like Google Analytics.

We've noticed that much of the tracking JavaScript code around the web creates multiple cookies on the visitor's system, usually three: one that expires at the end of the day, one that expires at the end of the week, and one that expires at the end of the month.

Our team has been debating why this is common and we have been tearing our hair out to figure out why one might do this.

The only thing we can think of is performance: this way you can calculate whether a visitor is a repeat visitor per day, week, or month without having to run heavy queries on the OLTP database all the time. But we can conceive of ways to make it work without multiple cookies anyway.

What are the advantages to creating the tracking cookies in this way, and how do you think they're being used by others?

+1  A: 

They are likely using cookies in this manner to determine the frequency of visits to the domain. If you visit the site and it sees that you still have the day-expiring cookie, that tells it you have already visited today. If all you have are the weekly and monthly cookies, then it is clear that you haven't visited the site today, but that you did visit earlier this week.
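To make that check concrete, here is a minimal sketch of how a server might read recency from whichever cookies came back. The cookie names `visit_day`, `visit_week`, and `visit_month` are made up for illustration; real trackers use their own names and may do this differently.

```python
def classify_visitor(cookie_names):
    """Infer how recently a visitor was last seen from the cookies they still hold.

    The names visit_day / visit_week / visit_month are hypothetical.
    """
    names = set(cookie_names)
    if "visit_day" in names:
        return "seen today"
    if "visit_week" in names:
        return "seen this week"
    if "visit_month" in names:
        return "seen this month"
    return "new or lapsed"


# The day cookie has expired, the weekly and monthly ones have not.
print(classify_visitor(["visit_week", "visit_month"]))  # seen this week
```

The shortest-lived cookie that survives wins, so no server-side lookup is needed to answer the question.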

There is no rule that says that this is the only way to do this. One could track with a single cookie and store statistics on the server.

Demi
Yeah that's what we thought. It must be just for performance.
RibaldEddie
A: 

Very interesting question. I think that this is the solution to the hotel problem. Let's take a look from a DB query perspective. If a single cookie is sent to the user (with an expiry of, say, one year), the query for the number of daily unique visitors to a site would be something like this:

SELECT COUNT(DISTINCT CookieId) FROM Visits 
WHERE VisitDate = '2009-01-01' AND SiteId = 548

With a multiple-cookie system, you only have to store the number of cookies issued per day per site, and increment it every time you send a new cookie:

SELECT NoOfVisits FROM Visits 
WHERE VisitDate = '2009-01-01' AND SiteId = 548

This is a clear performance advantage when hundreds of millions of cookies are issued each year.
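A rough sketch of the counter idea, using Python's built-in `sqlite3` so it runs anywhere. The `DailyCounts` table and the `has_day_cookie` flag are assumptions made for the example, not how any particular tracker is known to work: the counter is bumped only when a request arrives without a day cookie, which is also the moment a new day cookie would be issued.

```python
import sqlite3


def make_counter_db():
    """Create an in-memory database with one counter row per site per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DailyCounts (SiteId INTEGER, VisitDate TEXT, NoOfVisits INTEGER,"
        " PRIMARY KEY (SiteId, VisitDate))"
    )
    return conn


def record_request(conn, site_id, visit_date, has_day_cookie):
    """Count a visitor once per day; return True if a new day cookie should be set."""
    if has_day_cookie:
        return False  # this visitor was already counted today
    # Make sure the counter row exists, then increment it.
    conn.execute(
        "INSERT OR IGNORE INTO DailyCounts (SiteId, VisitDate, NoOfVisits) VALUES (?, ?, 0)",
        (site_id, visit_date),
    )
    conn.execute(
        "UPDATE DailyCounts SET NoOfVisits = NoOfVisits + 1 WHERE SiteId = ? AND VisitDate = ?",
        (site_id, visit_date),
    )
    return True


def daily_visitors(conn, site_id, visit_date):
    """Read the stored counter; no COUNT(DISTINCT ...) scan is needed."""
    row = conn.execute(
        "SELECT NoOfVisits FROM DailyCounts WHERE SiteId = ? AND VisitDate = ?",
        (site_id, visit_date),
    ).fetchone()
    return row[0] if row else 0


conn = make_counter_db()
record_request(conn, 548, "2009-01-01", has_day_cookie=False)  # first visitor
record_request(conn, 548, "2009-01-01", has_day_cookie=True)   # same visitor again
record_request(conn, 548, "2009-01-01", has_day_cookie=False)  # second visitor
print(daily_visitors(conn, 548, "2009-01-01"))  # 2
```

The same pattern would repeat for the weekly and monthly cookies, each with its own counter.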

bbmud
A: 

Using three cookies, one each for day, week, and month, means that the client sends you back three cookies on every request, which is clearly bad from a network bandwidth and latency perspective.

So there is a trade-off: use only one cookie if you care more about user latency than about server CPU usage in the database, and use the three-cookie method if you care more about server CPU usage than about user latency.

The real solution is a hybrid: use only one cookie, and at the end of every day run the COUNT(DISTINCT CookieId) query once and store the result in a separate table or column, so that your stats interface only has to query that stored result.
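Something like the following could serve as the end-of-day job. It is only a sketch using Python's `sqlite3`; the `DailyStats` table and the function names are invented for the example, and the `Visits` columns are taken from the query in the earlier answer.

```python
import sqlite3


def make_stats_db():
    """Create the raw Visits table and a hypothetical DailyStats summary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Visits (CookieId TEXT, SiteId INTEGER, VisitDate TEXT)")
    conn.execute(
        "CREATE TABLE DailyStats (SiteId INTEGER, VisitDate TEXT, UniqueVisitors INTEGER,"
        " PRIMARY KEY (SiteId, VisitDate))"
    )
    return conn


def rollup_day(conn, visit_date):
    """Run the expensive distinct count once per day and store one row per site."""
    # Delete first so that re-running the job for the same day is harmless.
    conn.execute("DELETE FROM DailyStats WHERE VisitDate = ?", (visit_date,))
    conn.execute(
        "INSERT INTO DailyStats (SiteId, VisitDate, UniqueVisitors) "
        "SELECT SiteId, VisitDate, COUNT(DISTINCT CookieId) FROM Visits "
        "WHERE VisitDate = ? GROUP BY SiteId, VisitDate",
        (visit_date,),
    )


def unique_visitors(conn, site_id, visit_date):
    """What the stats interface would call: a cheap primary-key lookup."""
    row = conn.execute(
        "SELECT UniqueVisitors FROM DailyStats WHERE SiteId = ? AND VisitDate = ?",
        (site_id, visit_date),
    ).fetchone()
    return row[0] if row else 0


conn = make_stats_db()
conn.executemany(
    "INSERT INTO Visits (CookieId, SiteId, VisitDate) VALUES (?, ?, ?)",
    [("a", 548, "2009-01-01"), ("a", 548, "2009-01-01"), ("b", 548, "2009-01-01")],
)
rollup_day(conn, "2009-01-01")
print(unique_visitors(conn, 548, "2009-01-01"))  # 2
```

The visitor still carries a single cookie, and the heavy query runs once per day instead of on every stats page load.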

winden