I have a file with a sequence of event timestamps corresponding to the times at which someone visits a website:
02.02.2010 09:00:00
02.02.2010 09:00:00
02.02.2010 09:00:00
02.02.2010 09:00:01
02.02.2010 09:00:03
02.02.2010 09:00:05
02.02.2010 09:00:06
02.02.2010 09:00:06
02.02.2010 09:00:09
02.02.2010 09:00:11
02.02.2010 09:00:11
02.02.2010 09:00:11
etc., for several thousand rows.
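For reference, this is roughly how I load the file (a minimal sketch; the filename `events.txt` is hypothetical, and I'm assuming the dates are DD.MM.YYYY):

    from datetime import datetime

    # One timestamp per line, e.g. "02.02.2010 09:00:00".
    # The filename is hypothetical; the format string assumes DD.MM.YYYY.
    with open("events.txt") as f:
        timestamps = [datetime.strptime(line.strip(), "%d.%m.%Y %H:%M:%S")
                      for line in f if line.strip()]

    timestamps.sort()  # make sure the events are in chronological order
    print(len(timestamps), "events from", timestamps[0], "to", timestamps[-1])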
I'd like to get an idea of how the web hits are distributed over the day, over the week, and so on. I need to know how to scale the (future) web servers in order to guarantee service availability with a given number of nines. In particular, I need upper bounds on the number of almost-concurrent visits.
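As a first cut at that bound, I can slide a fixed window over the sorted timestamps and record the maximum number of events it ever contains. A minimal sketch (the 5-second window width is an arbitrary assumption of mine, and `timestamps` is the sorted list from the snippet above):

    from datetime import timedelta

    def max_events_in_window(timestamps, width):
        """Largest number of events inside any window of the given width.
        Assumes `timestamps` is sorted in increasing order."""
        best, lo = 0, 0
        for hi in range(len(timestamps)):
            # shrink the window from the left until it spans less than `width`
            while timestamps[hi] - timestamps[lo] >= width:
                lo += 1
            best = max(best, hi - lo + 1)
        return best

    # busiest 5-second burst (the window width is an arbitrary choice)
    print(max_events_in_window(timestamps, timedelta(seconds=5)))

But this still leaves me with the choice of window width, which is essentially the same problem as choosing a bin width.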
Are there any resources out there that explain how to do this? I'm fluent in mathematics and statistics, and I've looked at queuing theory, but that theory seems to assume the arrival rate is independent of the time of day, which is clearly wrong in my case. And no, histograms are not the right answer, since the result depends heavily on bin width and placement.
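To illustrate the bin-placement problem on this very data (a quick sketch with numpy; the 60-second bin width is an arbitrary choice, and `timestamps` is the list loaded above):

    import numpy as np

    # seconds since the first event
    t = np.array([(ts - timestamps[0]).total_seconds() for ts in timestamps])

    width = 60.0  # bin width in seconds; an arbitrary choice
    counts_a, _ = np.histogram(t, bins=np.arange(0.0, t.max() + width, width))
    counts_b, _ = np.histogram(t, bins=np.arange(-width / 2, t.max() + width, width))

    # Same data, same bin width, edges shifted by half a bin:
    # the apparent peak rate can differ noticeably.
    print("peak per-bin counts:", counts_a.max(), counts_b.max())

On several thousand rows the two peak counts differ noticeably, even though nothing changed except where the bin edges fall, which is why I'm looking for something more principled.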