I need to implement a custom-developed web analytics service for large number of websites. The key entities here are:
- Website
- Visitor
Each unique visitor will have have a single row in the database with information like landing page, time of day, OS, Browser, referrer, IP, etc.
I will need to do aggregated queries on this database such as 'COUNT all visitors who have Windows as OS and came from Bing.com'
I have hundreds of websites to track and the number of visitors for those websites range from a few hundred a day to few million a day. In total, I expect this database to grow by about a million rows per day.
My questions are:
1) Is MySQL a good database for this purpose?
2) What could be a good architecture? I am thinking of creating a new table for each website. Or perhaps start with a single table and then spawn a new table (daily) if number of rows in an existing table exceed 1 million (is my assumption correct). My only worry is that if a table grows too big, the SQL queries can get dramatically slow. So, what is the maximum number of rows I should store per table? Moreover, is there a limit on number of tables that MySQL can handle.
3) Is it advisable to do aggregate queries over millions of rows? I'm ready to wait for a couple of seconds to get results for such queries. Is it a good practice or is there any other way to do aggregate queries?
In a nutshell, I am trying a design a large scale data-warehouse kind of setup which will be write heavy. If you know about any published case studies or reports, that'll be great!