Hi guys,

I have been charged with analysing the log table of my company's website. This table contains each user's click path through the website for a given session. My company wants to spot trends in our users' click paths and, in doing so, identify groups of users who follow a particular click path based on age, geography and so on.

As you can tell from the title, I am completely new to BI and its capabilities so I was wondering:

  1. Are our objectives attainable?
  2. How should I go about doing this?

I am currently reading books and e-books I have found online. All signs suggest this is possible via sequence clustering, although the exact implementation and tweaks involved are currently lost on me. If anyone has first-hand experience with such an undertaking, it would be awesome if you could share it here.

Cheers!

A: 

Seems that you can use neural networks for that task. Possibly perceptrons.

I have some experience with neural networks but I'm not an expert.
I strongly recommend the book Programming Collective Intelligence: Building Smart Web 2.0 Applications. Check it out even if you don't know Python.

Nick D
A: 

First off, start with an open source or commercial web analytics package (search Google for one), as reading web server log files is non-trivial.

Some allow mapping data to other tables (your user table with age etc.), or you can put together your own solution to map web session logs to other data.

Other than that, normal SQL queries will solve your analytics problem, e.g.

-- Users who hit all three steps within a single session.
-- Replace the quoted values with your own step names.
select user.id
 from user, log l1, log l2, log l3
 where user.id = l1.userid and l1.type = 'first step'
  and user.id = l2.userid and l2.type = 'next step'
  and user.id = l3.userid and l3.type = 'last step'
  and l1.sessionid = l2.sessionid and l2.sessionid = l3.sessionid
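
To try the query above end to end, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the sample rows and the step names ('home', 'product', 'checkout') are all made up for illustration, and the user table is called users here because user is a reserved word in some databases. Like the query above, it only checks that all three steps occur in the same session, not the order in which they occur.

```python
import sqlite3

# In-memory database with a hypothetical schema and sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE log (userid INTEGER, sessionid TEXT, type TEXT);
    INSERT INTO users VALUES (1, 25), (2, 40);
    INSERT INTO log VALUES
        (1, 's1', 'home'), (1, 's1', 'product'), (1, 's1', 'checkout'),
        (2, 's2', 'home'), (2, 's2', 'product');
""")

# Same three-way self-join on the log table, written with explicit JOINs
# and parameter placeholders for the step names.
FUNNEL_SQL = """
    SELECT DISTINCT u.id
    FROM users u
    JOIN log l1 ON l1.userid = u.id AND l1.type = ?
    JOIN log l2 ON l2.userid = u.id AND l2.type = ?
                AND l2.sessionid = l1.sessionid
    JOIN log l3 ON l3.userid = u.id AND l3.type = ?
                AND l3.sessionid = l2.sessionid
    ORDER BY u.id
"""


def users_completing(conn, first, middle, last):
    """Return ids of users with all three step types in one session."""
    rows = conn.execute(FUNNEL_SQL, (first, middle, last)).fetchall()
    return [row[0] for row in rows]


completed = users_completing(conn, "home", "product", "checkout")
print(completed)
```

Joining the result back to users.age (or a geography column, if you have one) would then give the breakdown by user group.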

Loading the raw data into a BI framework may not make it much easier. Loading the results of queries like this into a BI framework would make sense.

Depending on your web application, you may have trouble identifying actual sessions if they have long-running session IDs, or changing session IDs. If that is an issue, you need to roll your web analytics into the actual web server code so you can simulate long-running state and record that instead.
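
If changing the web server code is not an option, one common offline workaround (a different technique from the one described above) is to ignore the stored session ID and rebuild sessions from timestamps, starting a new session whenever a user has been inactive for longer than some cutoff. A rough sketch, assuming the log can be read as (user_id, timestamp, page) tuples and that a 30-minute gap is a reasonable cutoff; both are assumptions, not something taken from the question:

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Group (user_id, timestamp, page) events into per-user click paths.

    A new session starts on a user's first event, or when more than
    `gap` has passed since that user's previous event.
    """
    sessions = []    # list of (user_id, [pages]) in user/time order
    last_seen = {}   # user_id -> timestamp of that user's previous event
    current = {}     # user_id -> page list of that user's open session
    for user_id, ts, page in sorted(events, key=lambda e: (e[0], e[1])):
        if user_id not in last_seen or ts - last_seen[user_id] > gap:
            current[user_id] = []
            sessions.append((user_id, current[user_id]))
        current[user_id].append(page)
        last_seen[user_id] = ts
    return sessions


# Hypothetical sample events.
events = [
    ("u1", datetime(2024, 1, 1, 9, 0), "home"),
    ("u1", datetime(2024, 1, 1, 9, 5), "product"),
    ("u1", datetime(2024, 1, 1, 11, 0), "home"),
    ("u2", datetime(2024, 1, 1, 9, 1), "home"),
]
paths = sessionize(events)
print(paths)
```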

TFD
+2  A: 

What you're looking for is called Association Rule Mining. I'm not particularly familiar with BI, but I suggest you take a look at Weka which contains several implementations of the Apriori algorithm and its variations.
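
To give a feel for what association rules measure, here is a small pure-Python sketch that computes support and confidence for page pairs. It is not Apriori itself and not Weka's API, just the two core quantities on made-up sessions; the page names and the 0.5 support threshold are assumptions for illustration. Note that rules of this kind ignore the order in which pages were clicked.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of pages visited in each session.
sessions = [
    {"home", "product", "checkout"},
    {"home", "product"},
    {"home", "about"},
    {"product", "checkout"},
]


def pair_rules(sessions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for page pairs.

    support    = fraction of sessions containing both pages
    confidence = sessions with both pages / sessions with the antecedent
    """
    n = len(sessions)
    page_counts = Counter(page for s in sessions for page in s)
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules.append((a, b, support, count / page_counts[a]))
        rules.append((b, a, support, count / page_counts[b]))
    return sorted(rules)


rules = pair_rules(sessions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

On this sample, every session that reaches checkout also contains product, so the rule checkout -> product has confidence 1.0.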

Amro
+2  A: 

This won't help you with your existing log files (but it is an alternative if your search for an answer fails).

Google Analytics is free, and you can set up several custom variables (age, etc.) and see where the traffic goes (you won't be able to see what an individual user does). It's not exactly what you are trying to do, but it's free and can be made close to what you're looking for.

If you want really good analytics, look into Omniture. It is expensive, but it's top notch for building complex website reporting. It is used in many e-commerce scenarios for tracking how a user comes in and interacts with the site, and much more.

There are plenty of website analytics packages out there. Before "rolling" your own, look into some of them; they might help you focus in on your own goals.

BigBlondeViking
I'd second this. I looked into some of the BI offerings last year (we didn't wind up using them) and there is a fairly steep learning curve if you want to do it 'properly'.
Paddy