We keep track of user agent strings in our website. I want to do some statistics on them, to see how many IE6 users we have ( so we know what we have to develop against), and also how many mobile users we have.
So we have log entires like this:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; FunWebProducts)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; FunWebProducts; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0; .NET CLR 2.0.50727)
And ideally, it would be pretty neat to see all the 'meaningful' strings, which would just mean probably strings longer than a certain length. For instance, IU might like to see how many entries have FunWebProducts
in it, or .NET CLR
, or .NET CLR 1.0.3705
-- but I don't want to see how many have a semi-colon. So I'm not necessarily looking for unique strings, but all strings, even sub-sets. So, I would want to see the count of all Mozilla
, knowing that this includes the counts for Mozilla/5.0
and Mozilla/4.0
. It would be nice if there were a nested display for this, starting with the shortest strings, and working its way down. Something perhaps like
4,2093 Mozilla
1,093 Mozilla/5.0
468 Mozilla/5.0 (Windows;
47 Mozilla/5.0 (Windows; U
2,398 Mozilla/4.0
This sounds like a computer science homework. What would this be called? Does something like this exist out there, or do I write my own?