ansaurus

Question

Optimizing SQL to determine unique page views per user

Answer 1

A:

Just some random thoughts:

Can I verify that the thinking behind the unique visit types is:

1. pageid + userid = user has logged in
2. pageid + sessionid = user not identified but has cookies enabled
3. pageid + ip / useragent = user not identified and no cookies enabled
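
If that reading is right, the precedence between the three cases could be sketched as below. This is only an illustration: the function name `visitor_key` and the tuple layout are my own, not something from the question.

```python
def visitor_key(page_id, user_id=None, session_id=None, ip=None, user_agent=None):
    """Return the tuple that identifies a unique view of page_id.

    Hypothetical helper: picks the strongest identifier available,
    in the order userid, then sessionid, then ip / useragent.
    """
    if user_id is not None:
        # The user has logged in.
        return ("user", page_id, user_id)
    if session_id is not None:
        # Not identified, but cookies are enabled.
        return ("session", page_id, session_id)
    # Not identified and no cookies: fall back to ip / useragent.
    return ("ip_ua", page_id, ip, user_agent)
```

Two requests count as the same unique view when they produce the same key.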

For raw performance, you might consider #2 redundant, since #3 will probably cover #2 in most conditions. Or is #2 important, e.g. if the user later registers and #2 can then be mapped to a #1? (By redundant I mean that the session id might still be logged, but not used in any visit determination.)

IMHO the IP will always be present (even if spoofed) and will be a good candidate for an index. The user agent can be hidden and will only have a limited range of values (not very selective).

I would use a surrogate primary key in this instance, due to the nullable fields and since none of the fields is unique by itself.

IMHO your idea of storing ALL the visits and then trimming the duplicates in a batch job is a good one to weigh up (rather than checking whether a row exists in order to update it vs. inserting a new one).
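
A rough sketch of that "store everything, trim later" approach, using Python's `sqlite3` module so it can be run as-is. The table names (`raw_visits`, `unique_views`), the columns, and the `'u:'` / `'s:'` / `'i:'` visitor prefixes are assumptions for the example; your real schema and database will differ.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE raw_visits (
            id         INTEGER PRIMARY KEY,  -- surrogate key
            page_id    INTEGER NOT NULL,
            user_id    INTEGER,              -- NULL when not logged in
            session_id TEXT,                 -- NULL when cookies are disabled
            ip         TEXT NOT NULL,
            user_agent TEXT
        );
        CREATE TABLE unique_views (
            page_id INTEGER NOT NULL,
            visitor TEXT NOT NULL,
            UNIQUE (page_id, visitor)
        );
    """)
    return conn

def log_visit(conn, page_id, ip, user_id=None, session_id=None, user_agent=None):
    # Request path: a plain insert, no existence check.
    conn.execute(
        "INSERT INTO raw_visits (page_id, user_id, session_id, ip, user_agent) "
        "VALUES (?, ?, ?, ?, ?)",
        (page_id, user_id, session_id, ip, user_agent),
    )

def trim_duplicates(conn):
    # Batch path: fold raw rows into unique_views, then drop what was processed.
    with conn:  # one transaction for the whole batch
        max_id = conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM raw_visits").fetchone()[0]
        conn.execute("""
            INSERT OR IGNORE INTO unique_views (page_id, visitor)
            SELECT DISTINCT page_id,
                   CASE
                       WHEN user_id IS NOT NULL THEN 'u:' || user_id
                       WHEN session_id IS NOT NULL THEN 's:' || session_id
                       ELSE 'i:' || ip || '|' || COALESCE(user_agent, '')
                   END
            FROM raw_visits
            WHERE id <= ?
        """, (max_id,))
        conn.execute("DELETE FROM raw_visits WHERE id <= ?", (max_id,))
```

The `UNIQUE` constraint together with `INSERT OR IGNORE` (SQLite syntax) makes the batch safe to re-run; other databases would need their own equivalent.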

So PK = Surrogate
Clustering = Not sure - another query / requirement might drive this better.
NonClustered Index = IP Address, Page Id (assuming more distinct IP addresses than page ids)
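
To make that concrete, here is a small sketch in SQLite via Python. The table and index names are made up for the example, and SQLite has no clustered / nonclustered distinction, so the index below is just an ordinary secondary index standing in for the nonclustered one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (
        visit_id   INTEGER PRIMARY KEY,  -- surrogate key
        page_id    INTEGER NOT NULL,
        user_id    INTEGER,              -- NULL when not logged in
        session_id TEXT,                 -- NULL when cookies are disabled
        ip         TEXT NOT NULL,
        user_agent TEXT
    );
    -- IP first, on the assumption that it is the more selective column.
    CREATE INDEX ix_visits_ip_page ON visits (ip, page_id);
""")

def has_visited(conn, ip, page_id):
    """True if this ip has already been logged against this page."""
    row = conn.execute(
        "SELECT 1 FROM visits WHERE ip = ? AND page_id = ? LIMIT 1",
        (ip, page_id),
    ).fetchone()
    return row is not None

def query_plan(conn):
    """Return the planner's description of the lookup, to check index use."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM visits WHERE ip = ? AND page_id = ?",
        ("x", 0),
    ).fetchall()
    return " ".join(str(r[-1]) for r in rows)
```

Printing `query_plan(conn)` is a quick way to confirm the lookup is served by the index rather than a table scan.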

nonnb 2010-08-28 13:05:28

Answer 2

+1 A:

Personally, I would not put this in the request-response path. I would log the raw data in a table (or push it on a queue) and let a background task/thread/cron job deal with it.

The queue (or the message passing table) should then just contain pageid, userid, sessionid, useragent, ip.

Absolute timings are less important now, as long as the background task can keep up. Since a single thread will now do the heavy lifting, it will not create conflicting locks when updating the unique pageviews tables.
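
A minimal in-process sketch of that idea, assuming Python's standard `queue` and `threading` modules. The names (`record_visit`, `start_worker`, `stop_worker`) are invented for the example, and the in-memory set stands in for the unique pageviews table that a real worker would update.

```python
import queue
import threading

visits = queue.Queue()   # stands in for the queue / message passing table
unique_views = set()     # stands in for the unique pageviews table
_STOP = object()         # sentinel that tells the worker to exit

def record_visit(page_id, user_id, session_id, user_agent, ip):
    # Request path: enqueue the raw data and return immediately.
    visits.put((page_id, user_id, session_id, user_agent, ip))

def _worker():
    # The only writer to unique_views, so writers never contend with each other.
    while True:
        item = visits.get()
        if item is _STOP:
            break
        page_id, user_id, session_id, user_agent, ip = item
        if user_id is not None:
            visitor = ("user", user_id)
        elif session_id is not None:
            visitor = ("session", session_id)
        else:
            visitor = ("ip_ua", ip, user_agent)
        unique_views.add((page_id, visitor))

def start_worker():
    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    return thread

def stop_worker(thread):
    # Items queued before the sentinel are processed first (FIFO).
    visits.put(_STOP)
    thread.join()
```

In production the queue would more likely be a table or a message broker and the worker a separate process or cron job, but the shape is the same: many cheap producers, one consumer doing the deduplication.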

Peter Tillemans 2010-08-28 13:06:00
