1. Key terms used
As with answer 1
2. What mediums are available for data collection, and how do they work?
Log files from Apache or IIS. HTTP handlers for ASP.NET, or code in your actual page. JavaScript includes (the objects available to JavaScript give you most of the information you need about the client).
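As a rough illustration of the log file route, here is a minimal sketch in Python (not .NET, and not taken from any existing project) that parses one line of Apache's common or combined log format. The names `COMBINED` and `parse_line` are made up for this example, and the regex assumes the default field layout with no escaped quotes inside the quoted fields.

```python
import re
from datetime import datetime

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
# The trailing referer/user-agent pair is optional so "common" lines match too.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    hit = m.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    hit["status"] = int(hit["status"])
    # Apache writes "-" when no body bytes were sent
    hit["size"] = 0 if hit["size"] == "-" else int(hit["size"])
    # The request field is normally "METHOD path PROTOCOL"
    parts = hit["request"].split()
    if len(parts) >= 2:
        hit["method"], hit["path"] = parts[0], parts[1]
    else:
        hit["method"], hit["path"] = None, None
    return hit
```

Real logs are messier than this (custom formats, escaped quotes in user agents, IIS's W3C format with its own field header), so treat it as a starting point rather than a complete parser.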
3. Any reference books, white papers etc (technical and non technical both)
The RFC on HTTP is useful, as it covers most of the request headers you can capture.
4. Any open source implementation (especially in .NET).
I wrote one that has the parsing part of the analysis done (in my view the hardest part). It needs a bit of tweaking in certain areas as it's 4 years old:
It's missing a DAL, which is harder than it sounds - the main hurdle is making sure you don't simply replicate the data that each row of the log already holds, because at that point you may as well just use the log files. The other part is displaying the aggregated data in a nice format. My goal was to store it in SQL Server, and also in db4o format to cater for smaller websites.
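To make the "don't replicate each row" point concrete, here is a small hypothetical sketch in Python (again, not the Statmagic code). It assumes hits have already been parsed into `(day, path)` pairs and rolls them up into one counter per day and path, which is the kind of summary row you would store instead of the raw log lines. The `aggregate` function and the tuple shape are assumptions for illustration only.

```python
from collections import Counter
from datetime import date

def aggregate(hits):
    """Roll individual hits up into a (day, path) -> hit count mapping.

    `hits` is any iterable of (day, path) tuples. Only one entry per
    distinct (day, path) pair is kept, however many raw hits there were.
    """
    totals = Counter()
    for day, path in hits:
        totals[(day, path)] += 1
    return totals

# Four raw hits collapse into three summary rows.
sample = [
    (date(2008, 1, 1), "/"),
    (date(2008, 1, 1), "/"),
    (date(2008, 1, 1), "/about"),
    (date(2008, 1, 2), "/"),
]
summary = aggregate(sample)
```

In a real DAL each `(day, path, count)` entry would become one row in whatever store you use, and you would likely keep similar rollups for referrers, user agents and status codes.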
The 'sad' part of the Statmagic project is that Google came along and completely wiped out the competition, and with it any point in me finishing it.