I am in the process of gathering information about web analytics tools (like Google Analytics) for my next assignment, but I am not able to find any good information. I am looking for:

1. Key terms used

2. What media are available for data collection, and how they work

3. Any reference books, white papers, etc. (both technical and non-technical)

4. Any open source implementations (especially in .NET).

+3  A: 

Here are the key terms used:

  • Hit
  • Page view
  • Visit / Session
  • First Visit / First Session
  • Visitor / Unique Visitor / Unique User
  • Repeat Visitor
  • New Visitor
  • Impression
  • Singletons
  • Bounce Rate
  • % Exit
  • Visibility time
  • Session Duration
  • Page View Duration / Time on Page
  • Page Depth / Page Views per Session
  • Frequency / Session per Unique
  • Click path
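
To tie two of those together: a singleton is a visit in which only one page is viewed, and Bounce Rate = singletons / total visits; for example, 400 single-page visits out of 1,000 total gives a 40% bounce rate.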

Methods used:

  • Web server logfile analysis
  • Page tagging

Web server logfile analysis

In this method you write a script that scrapes details out of your web server's log files and writes them to your database. This method will not give you real-time statistics. You can read more about web log analysis software on Wikipedia.
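
As a rough illustration, here is a minimal sketch of such a script in C#. The regex and field names are my own assumptions, tuned to the Apache "combined" log format; adjust them to whatever your server actually writes:

// Minimal sketch: pull fields out of one Apache "combined" format log line.
using System;
using System.Text.RegularExpressions;

class LogLineParser
{
    static readonly Regex LogLine = new Regex(
        @"^(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] ""(?<method>\S+) (?<url>\S+) [^""]*"" (?<status>\d{3}) \S+ ""(?<referrer>[^""]*)"" ""(?<agent>[^""]*)""");

    static void Main()
    {
        string line = "127.0.0.1 - - [10/Oct/2008:13:55:36 -0700] \"GET /index.html HTTP/1.1\" 200 2326 \"http://www.google.com/search?q=web+analytics\" \"Mozilla/4.08\"";
        Match m = LogLine.Match(line);
        if (m.Success)
        {
            // In a real collector these values would be inserted into your database.
            Console.WriteLine("IP:       " + m.Groups["ip"].Value);
            Console.WriteLine("Time:     " + m.Groups["time"].Value);
            Console.WriteLine("URL:      " + m.Groups["url"].Value);
            Console.WriteLine("Status:   " + m.Groups["status"].Value);
            Console.WriteLine("Referrer: " + m.Groups["referrer"].Value);
            Console.WriteLine("Agent:    " + m.Groups["agent"].Value);
        }
    }
}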

Page tagging

Add a snippet of JavaScript (or just an image) to each page, and use that code to collect details about the page, referrer, visitor, etc.

...these were images included in a web page that showed the number of times the image had been requested, which was an estimate of the number of visits to that page. In the late 1990s this concept evolved to include a small invisible image instead of a visible one, and, by using JavaScript, to pass along with the image request certain information about the page and the visitor. This information can then be processed remotely by a web analytics company, and extensive statistics generated...
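
To sketch the server side of that invisible-image idea in .NET: the endpoint can be as small as an ASP.NET IHttpHandler that logs whatever the page's JavaScript appends to the image URL and returns a 1x1 transparent GIF. The class name, log file location, and the t.gif URL here are illustrative assumptions, not any particular product's API:

// Minimal sketch of the "invisible image" endpoint as an ASP.NET IHttpHandler.
using System;
using System.IO;
using System.Web;

public class TrackingPixel : IHttpHandler
{
    // A 1x1 transparent GIF, served as the invisible image.
    private static readonly byte[] Gif = Convert.FromBase64String(
        "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7");

    public void ProcessRequest(HttpContext context)
    {
        // The page's JavaScript appends details as query-string parameters,
        // e.g. /t.gif?page=/index.html&ref=http://www.google.com/...
        string line = DateTime.UtcNow.ToString("yyyy-MM-dd HH:mm:ss")
            + " " + context.Request.UserHostAddress
            + " " + context.Request.QueryString;
        File.AppendAllText(context.Server.MapPath("~/App_Data/hits.log"),
                           line + Environment.NewLine);

        context.Response.ContentType = "image/gif";
        context.Response.OutputStream.Write(Gif, 0, Gif.Length);
    }

    public bool IsReusable { get { return true; } }
}

You would map the handler to a URL such as t.gif in web.config and request it from each page with an img tag or a JavaScript-generated image request.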

If you are adding analytics to your own website, you can use the code provided by Eytan Levit below.

Credit: Wikipedia. More information can be found there.

Niyaz
+2  A: 

Well,

I'm no expert, but here is some common data you can retrieve to build your own analytics:

// Build up a simple report of request details. UrlReferrer is null on
// direct visits, so it needs a null check.
string str = "";
if (Request.UrlReferrer != null)
{
    str += "Referrer: " + Request.UrlReferrer.AbsolutePath + "<br>";
}
str += "Form data: " + Request.Form + "<br>";
str += "User Agent: " + Request.ServerVariables["HTTP_USER_AGENT"] + "<br>";
str += "IP Address: " + Request.UserHostAddress + "<br>";
str += "Browser: " + Request.Browser.Browser + " Version: " + Request.Browser.Version + " Platform: " + Request.Browser.Platform + "<br>";
str += "Is Crawler: " + Request.Browser.Crawler + "<br>";
str += "Query String: " + Request.QueryString + "<br>";

You can also parse the search keyword the visitor used to reach your website, like this:

// Requires: using System.Text.RegularExpressions;
// Extracts the search terms from a referrer URL (most engines pass them in
// a "q" or "p" query parameter), falling back to the referring domain.
protected string GetKeywordFromReferrer(string url)
{
    if (url == null || url.Trim() == "")
    {
        return "no url";
    }

    // Decode the URL and turn '+' back into spaces.
    string urlEscaped = Uri.UnescapeDataString(url).Replace('+', ' ');

    // Search terms, e.g. ?q=foo (Google) or ?p=foo (Yahoo).
    Match searchQuery = Regex.Match(urlEscaped, @"[\&\?][qp]\=([^\&]*)");
    if (searchQuery.Success)
    {
        return searchQuery.Groups[1].Value;
    }

    // Not a search engine: report the referring site's domain instead.
    Match siteDomain = Regex.Match(urlEscaped, @"http\:\/\/(.+?)\/");
    if (siteDomain.Success)
    {
        return siteDomain.Groups[1].Value;
    }

    return "Direct Access";
}
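
For example, assuming this sits in a page or handler, you could feed it the current request's referrer:

string referrer = Request.UrlReferrer == null ? "" : Request.UrlReferrer.ToString();
string keywordOrSite = GetKeywordFromReferrer(referrer);
// e.g. "web analytics" for http://www.google.com/search?q=web+analytics,
// "example.com" for a non-search http:// referrer, or "no url" when empty.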

Hope this has helped a bit.

Eytan Levit
Good answer Eytan!!!
Niyaz
+1  A: 

1. Key terms used
As with answer 1

2. What media are available for data collection, and how they work
Log files from Apache or IIS; HTTP handlers for ASP.NET (or your actual page); and JavaScript includes (the objects available to JavaScript give you most of the information you need about the client).

3. Any reference books, white papers, etc. (both technical and non-technical)
The HTTP RFC (RFC 2616) is useful; it documents most of the request headers you can capture.
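
For instance, a quick way in ASP.NET to see what a given client actually sends is to enumerate the request headers (writing them to the response here is just for inspection):

foreach (string key in Request.Headers.AllKeys)
{
    // Typical keys: "User-Agent", "Referer", "Accept-Language", "Cookie"
    Response.Write(Server.HtmlEncode(key + ": " + Request.Headers[key]) + "<br>");
}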

4. Any open source implementations (especially in .NET)

I wrote one, Statmagic, that has the parsing part of the analysis done (in my view the hardest part). It needs a bit of tweaking in certain areas, as it's 4 years old.

It's missing a DAL, which is harder than it sounds - the main hurdle is making sure you don't just replicate the exact data each log row already contains, because then you may as well use the log files directly. The other part is displaying the aggregated data in a nice format. My goal was to store it in SQL Server, and also in db4o format to cater for smaller websites.
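
As a sketch of that hurdle (the LogEntry type and the entries collection are hypothetical, and this assumes .NET 3.5+ for LINQ): collapse the raw rows into aggregates, such as daily per-page counts, before anything hits the database:

// Requires: using System; using System.Linq;
// Hypothetical parsed-log record; only the aggregates below go to the DB.
class LogEntry { public DateTime Time; public string Url; public string VisitorId; }

// Collapse raw entries (IEnumerable<LogEntry>) into one row per (day, page).
var dailyPageStats =
    from e in entries
    group e by new { Day = e.Time.Date, e.Url } into g
    select new
    {
        g.Key.Day,
        g.Key.Url,
        PageViews = g.Count(),
        UniqueVisitors = g.Select(x => x.VisitorId).Distinct().Count()
    };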

The 'sad' part of the Statmagic project is that Google came along and completely wiped out the competition, and with it any point in me finishing it.

Chris S