
This is probably a pretty high-level question that requires a lot of explaining, but I'm in need of a lot of explaining.

Basically, I'm developing a PHP application that requires a lot of logging and tracking: clicks, interactions, performance, and so on. Anything under the sun. Facebook's Scribe and Yahoo's Chukwa are both great implementations of this. I know little about log4php.

What I want is a high-level overview of how this kind of logging works, specifically in conjunction with a PHP application. You can stop at the point where the log gets processed; I already know that I want to use Hadoop/Hive for processing and storage.

I'd also like some fairly low-level looks at what happens within the application itself. For example, how does one take the behavior of a click and send that to the logger? I'd appreciate any reading that can help get me started, as well.

A:

You can buy/get the tools to do this for you or build in-house.

Buy/get:

1 - Tag your pages with Google/Yahoo analytics - This will track pageviews, page flow performance, SEO ranking for keywords, etc.

2 - For tracking and logging user behavior, which includes clicks, interactions and performance, I found nothing better than ClickTale - http://www.clicktale.com/default_e.aspx - It records user sessions as video and stores these "log files" on a server.

In-house:

1 - Creating hidden fields in your forms that submit to a logging database also works. You assign unique IDs to your forms and keep track of their actions during submits.
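A minimal sketch of the server side of the hidden-field approach, written in Python rather than PHP for illustration. It assumes each tracked form carries hidden inputs such as <input type="hidden" name="form_id" value="signup">; the field names form_id and form_action, the form_events table, and both helper functions are hypothetical, and SQLite stands in for whatever logging database you actually use.

```python
import sqlite3
import time

# Hypothetical logging table: one row per tracked form submission.
SCHEMA = """
CREATE TABLE IF NOT EXISTS form_events (
    form_id     TEXT NOT NULL,
    form_action TEXT NOT NULL,
    ts          REAL NOT NULL
)
"""


def log_form_submission(conn, fields, now=None):
    """Store the hidden tracking fields from one submitted form.

    `fields` is the decoded form body (a dict of POST parameters). The
    hidden field names `form_id` and `form_action` are assumptions.
    Returns True if a row was written.
    """
    form_id = fields.get("form_id")
    if not form_id:
        return False  # form was not tagged, so there is nothing to log
    form_action = fields.get("form_action", "submit")
    ts = time.time() if now is None else now
    conn.execute(
        "INSERT INTO form_events (form_id, form_action, ts) VALUES (?, ?, ?)",
        (form_id, form_action, ts),
    )
    conn.commit()
    return True


def submissions_per_form(conn):
    """Count logged submissions per form ID."""
    rows = conn.execute(
        "SELECT form_id, COUNT(*) FROM form_events GROUP BY form_id"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    log_form_submission(conn, {"form_id": "signup", "email": "a@example.com"})
    log_form_submission(conn, {"form_id": "signup", "form_action": "retry"})
    log_form_submission(conn, {"email": "untracked@example.com"})
    print(submissions_per_form(conn))  # {'signup': 2}
```

Untagged forms are simply skipped, so you can roll the hidden fields out one form at a time.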

I'm sure there's lots more, but these are the basics. These are not PHP-specific, though.

HTH

EDIT #1 :

This may be beyond the scope of your question, but tracking doesn't necessarily mean data that you collect in-house. An example would be adding a "like it" or "digg it" button to articles or pages. This will "log" popularity for you: you can go to Facebook or digg.com to see how your site is doing, and it'll also help with SEO. Basically, it's a tracking system, and it's easy to use. There are PHP snippets out there that you can copy and paste into your code. If you use WordPress, there is a plugin; just search for "digg" or "like it" in the plugin search section.

Going back to Google Analytics, if you want to go beyond tracking clicks, go ahead and set up goals/funnels. They'll track user behavior and answer questions such as "What were my most valuable keywords?", "Where are all my users dropping off?", "What is the bounce rate for each page?" and "What are the top 3 entry points to my site, and from what traffic medium?" These are the questions SEO/SEM managers are most concerned about, and they're definitely worth tracking and understanding.
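If you build the tracking in-house instead, the "where are my users dropping off?" question reduces to counting how many sessions reach each step of an ordered funnel. A simplified sketch, assuming your logs give you (session_id, page) pairs in chronological order; the function names and example pages are hypothetical, and this is not how Google Analytics itself computes funnels.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count sessions that reach each funnel step, in order.

    events: iterable of (session_id, page) tuples in chronological order.
    steps:  ordered list of pages that make up the funnel (the "goal").
    Returns a list of (step, sessions_reaching_it).
    """
    progress = defaultdict(int)  # session_id -> number of steps completed so far
    for session_id, page in events:
        done = progress[session_id]
        # A session only advances when it hits the *next* expected step.
        if done < len(steps) and page == steps[done]:
            progress[session_id] = done + 1
    counts = [0] * len(steps)
    for done in progress.values():
        for i in range(done):
            counts[i] += 1
    return list(zip(steps, counts))


def drop_off(counts):
    """Sessions lost between each pair of consecutive funnel steps."""
    return [
        (counts[i][0], counts[i + 1][0], counts[i][1] - counts[i + 1][1])
        for i in range(len(counts) - 1)
    ]
```

For example, with steps ["/cart", "/checkout", "/thanks"], three sessions that reach the cart, two that reach checkout and one that finishes give [("/cart", 3), ("/checkout", 2), ("/thanks", 1)], and drop_off shows one session lost at each transition.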

ClickTale starts where Google Analytics ends. GA describes user behavior at the page level, but not at the field level. ClickTale, which has heat maps, will answer questions such as "I know this page has a high bounce rate, but why? Which field is a problem field for my customers?", "In what area of the page do users spend most of their time?" and "How do I prove to the graphics guys that a particular section needs to be redesigned?"
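A rough in-house approximation of the "problem field" question: for every session that abandoned a form, note the last field the user touched, then count which fields come up most often. This is a hypothetical sketch, not ClickTale's method; it assumes you already log (session_id, field_name) focus events and know which sessions eventually submitted.

```python
from collections import Counter


def abandonment_fields(field_events, submitted_sessions):
    """Count the last field touched in each session that never submitted.

    field_events: iterable of (session_id, field_name) in chronological order
                  (hypothetical focus/change events logged from the page).
    submitted_sessions: set of session IDs that completed the form.
    """
    last_field = {}
    for session_id, field in field_events:
        last_field[session_id] = field  # later events overwrite earlier ones
    return Counter(
        field
        for session_id, field in last_field.items()
        if session_id not in submitted_sessions
    )
```

A field that dominates abandonment_fields(...).most_common() is a good candidate for the "problem field" you would then look at more closely, for example with a session recording.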

EDIT #2

For high-traffic sites, you will need to scale your logging DB; it really helps when it comes to reporting. What I suggest is a 3-tier database reporting structure: tier 1 = last 7 days, tier 2 = last 6 months, tier 3 = everything. You can modify these according to the business. The point is that data moves from one tier to another, keeping fresh data readily available so you can generate reports ASAP. A single huge DB just doesn't scale.
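The tier movement can be sketched as a periodic job that checks each event's age and pushes it forward once it crosses a boundary. This sketch assumes the tiers are disjoint (tier 3 holds whatever has aged out of tier 2), treats "6 months" as 183 days, and uses in-memory lists as stand-ins for the three databases; all names are hypothetical.

```python
from datetime import datetime, timedelta

# Boundaries from the answer; adjust them to the business.
TIER_1_MAX_AGE = timedelta(days=7)
TIER_2_MAX_AGE = timedelta(days=183)  # roughly 6 months (assumption)


def tier_for(event_time, now):
    """Return the tier (1, 2 or 3) an event of this age belongs in."""
    age = now - event_time
    if age <= TIER_1_MAX_AGE:
        return 1
    if age <= TIER_2_MAX_AGE:
        return 2
    return 3


def rebalance(tiers, now):
    """Move events whose age has crossed a boundary into a later tier.

    tiers: dict mapping tier number -> list of event datetimes
           (a stand-in for three separate databases).
    Mutates `tiers` in place and returns the number of events moved.
    """
    moved = 0
    new_tiers = {1: [], 2: [], 3: []}
    for current, events in tiers.items():
        for t in events:
            # Data only ever moves forward, never back to a fresher tier.
            target = max(current, tier_for(t, now))
            if target != current:
                moved += 1
            new_tiers[target].append(t)
    tiers.clear()
    tiers.update(new_tiers)
    return moved


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    tiers = {1: [datetime(2024, 1, 30), datetime(2024, 1, 20)],
             2: [datetime(2023, 6, 1)], 3: []}
    print(rebalance(tiers, now))  # 2
```

Running this nightly keeps tier 1 small enough for fast "last week" reports, while long-range reports go to the larger, slower tiers.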

rxn
I'd like to see you add some more to this answer (particularly since you've done a great job so far). I awarded the bounty since time was running out, but I'm holding out on a vote-up/accept because I fully believe you can improve upon this answer.
Josh Smith
First, thank you for the bounty. Second, I'll add some now and some more later (it's getting late where I am). Please see my edits.
rxn
A: 

You can monitor user clicks by logging the path the user takes (referrer --> new URI), assuming both URIs are verbose and descriptive enough. For example, if a user clicks on one of his friends, you should log the URIs:

Referrer: /users/41251
Target: /users/66257

Store them properly for easy querying and reporting. Here, a direct click like that implies the target was linked from the referrer's page, so the target is a friend. If you have more complicated scenarios, be sure to describe them with distinct URIs, e.g. /users/suggestion/14152 for a suggested connection.
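A minimal sketch of turning one referrer/target pair into a structured log record, using the URI conventions from the example above. In PHP the referrer is typically read from $_SERVER['HTTP_REFERER'] (which the browser may omit); the function names, the record fields and the JSON-lines output format are assumptions, chosen because one JSON object per line is easy to ship to a collector and load into Hive later.

```python
import json
import re
import time

# URI conventions from the example: /users/<id> is a friend's profile,
# /users/suggestion/<id> is a suggested connection. Order matters:
# the more specific pattern is checked first.
PATTERNS = [
    ("suggestion", re.compile(r"^/users/suggestion/(\d+)$")),
    ("friend", re.compile(r"^/users/(\d+)$")),
]


def classify(target):
    """Return (kind, user_id) for a target URI, or ("other", None)."""
    for kind, pattern in PATTERNS:
        m = pattern.match(target)
        if m:
            return kind, int(m.group(1))
    return "other", None


def click_record(referrer, target, ts=None):
    """Build one JSON log line describing a click from referrer to target."""
    kind, target_id = classify(target)
    return json.dumps(
        {
            "ts": time.time() if ts is None else ts,
            "referrer": referrer,
            "target": target,
            "kind": kind,
            "target_id": target_id,
        },
        sort_keys=True,
    )
```

For instance, click_record("/users/41251", "/users/66257") yields a record with kind "friend" and target_id 66257, which makes the "is a friend" inference explicit in the log instead of leaving it to later queries.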

Add timestamps to that and you have a very rough estimate of how long users stayed on each page, although users tend to lose focus, switch tabs/applications and come back, etc. Google Analytics, for one, handles this well.
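The rough time-on-page estimate is just the gap between consecutive page views in the same session. A hypothetical sketch, assuming (session_id, unix_timestamp, uri) records; the 30-minute idle cap is an arbitrary assumption to discard gaps where the user probably wandered off.

```python
from collections import defaultdict

IDLE_CAP = 30 * 60  # seconds; longer gaps count as "unknown" (assumption)


def time_on_page(views):
    """Estimate seconds spent on each page view.

    views: iterable of (session_id, timestamp_seconds, uri).
    Returns a list of (session_id, uri, seconds_or_None).
    """
    by_session = defaultdict(list)
    for session_id, ts, uri in views:
        by_session[session_id].append((ts, uri))
    result = []
    for session_id, hits in by_session.items():
        hits.sort()  # chronological order within the session
        for (ts, uri), (next_ts, _) in zip(hits, hits[1:]):
            gap = next_ts - ts
            result.append((session_id, uri, gap if gap <= IDLE_CAP else None))
        # The last page has no following view, so its duration is unknown.
        _, last_uri = hits[-1]
        result.append((session_id, last_uri, None))
    return result
```

The None entries are exactly the cases the answer warns about: the last page of a visit, and long gaps where the user likely switched tabs or left.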

For a heatmap summary of where users click most on your site, I like the free (GPL) Clickheat.
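Under the hood, a click heatmap is mostly a histogram over page coordinates. A minimal sketch, assuming you already log (x, y) click positions in pixels for a fixed-width layout; the 50-pixel cell size and the function names are assumptions, and this is not how Clickheat itself is implemented.

```python
from collections import Counter


def heatmap(clicks, cell=50):
    """Bucket (x, y) pixel coordinates into cell-by-cell squares.

    Returns a Counter mapping (column, row) -> number of clicks.
    """
    return Counter((x // cell, y // cell) for x, y in clicks)


def hottest(grid, n=3):
    """Return the n grid cells with the most clicks."""
    return grid.most_common(n)
```

Rendering the counts as colors over a screenshot of the page gives the familiar heatmap; pages with fluid layouts would need coordinates normalized to page width first.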

Fanis
A: 

Check out Splunk

philfreo