Hi, I have a separate table for every day's data, which is basically webstats-type data: keywords, visits, duration, IP, sale, etc. (maybe 100 bytes total per record). Each table will have around a couple of million records.

What I need to do is have a web admin interface so that the user/admin can view reports for different date periods AND have them sorted by certain calculated values. For example, the user may want the results for the 15th of last month to the 12th of this month, sorted by SALE/VISIT in descending order.

The admin/user only needs to view (say) the top 200 records at a time, and will probably not view more than a few hundred in total in any one session.

Because of the arbitrary date period involved, I need to sum up the relevant columns for each record and only then can the selection be done.
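
To make the requirement concrete, the report boils down to something like the query below. This is only a sketch against a hypothetical single table - daily_stats, stat_date, keyword, visits and sale are placeholder names, not my actual schema:

    -- Sum the metrics per keyword over an arbitrary date range,
    -- then rank by the calculated sale-per-visit ratio.
    SELECT keyword,
           SUM(visits)             AS total_visits,
           SUM(sale)               AS total_sales,
           SUM(sale) / SUM(visits) AS sale_per_visit
    FROM   daily_stats
    WHERE  stat_date BETWEEN '2009-05-15' AND '2009-06-12'
    GROUP  BY keyword
    ORDER  BY sale_per_visit DESC
    LIMIT  200;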

My question is whether it will be possible to produce the reports in real time, or whether they would be too slow (the tables are rarely - if ever - updated after the day's data has been inserted).

Is such a scenario better suited to indexes or table scans?

Also, would one massive table for all dates be better than having separate tables for each date? (There are almost no joins.)

thanks in advance!

A: 

You may want to try a different approach. I think Splunk will work for you. It was designed for this - they even run ads on this site. They have a free version you can try.

Ryan Oberoi
Ryan: thanks for the suggestion, but it has to be MySQL in this case.
Splunk can connect to your database - check out: http://webaj.com/how-connect-splunk-mysql-odbc-database-source.htm Splunk provides the framework for web reports of the style you are interested in. No use rewriting what is already available. But Splunk is a bit heavy on maintenance and configuration, so make the call appropriately.
Ryan Oberoi
+1  A: 

With a separate table for each day's data, summarizing across a month is going to involve doing the same analysis on each of 30-odd tables. Over a year, you will have to do the analysis on 365 or so tables. That's going to be a nightmare.

It would almost certainly be better to have a soundly indexed single table than that huge number of tables. Some DBMSs support fragmented (partitioned) tables - if MySQL does, fragment the single big table by date. I would be inclined to fragment by month, especially if the normal queries are for one month or less and do not cross month boundaries. (Even if a query involves two months, with decent fragment elimination the query engine won't have to read most of the data - just the two fragments for the two months. It might even be able to do those scans in parallel, again depending on the DBMS.)
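
MySQL 5.1 and later do support this, under the name partitioning. A minimal sketch of the single big table range-partitioned by month - the column names are only illustrative placeholders:

    -- One table for all days, range-partitioned by month so that a
    -- date-bounded query only touches the relevant partitions.
    CREATE TABLE daily_stats (
        stat_date DATE         NOT NULL,
        keyword   VARCHAR(100) NOT NULL,
        ip        VARCHAR(15),
        visits    INT,
        duration  INT,
        sale      DECIMAL(10,2),
        KEY idx_date_keyword (stat_date, keyword)
    )
    PARTITION BY RANGE (TO_DAYS(stat_date)) (
        PARTITION p200905 VALUES LESS THAN (TO_DAYS('2009-06-01')),
        PARTITION p200906 VALUES LESS THAN (TO_DAYS('2009-07-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );

With a WHERE clause on stat_date, the optimizer can prune down to just the partitions covering the requested period.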

Sometimes it is quicker to do a sequential scan of a table than to do indexed lookups - don't simply assume that because the query plan involves a table scan, the query will automatically perform badly.
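
If in doubt, ask the optimizer rather than guessing - EXPLAIN shows which access path it intends to use. A quick sketch against the hypothetical daily_stats table above:

    -- Check whether the optimizer plans an index range scan or a full
    -- table scan for the date-bounded report query.
    EXPLAIN
    SELECT keyword, SUM(sale) / SUM(visits) AS sale_per_visit
    FROM   daily_stats
    WHERE  stat_date BETWEEN '2009-05-15' AND '2009-06-12'
    GROUP  BY keyword
    ORDER  BY sale_per_visit DESC;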

Jonathan Leffler
Jonathan: thanks very much for your reply. How about if I create summary tables for each month's data, each year's data, etc.? If done well, that would mean having to deal with no more than 20 or so of these tables at a time. Am also considering the fragmentation part. Thanks.
The difficulty with monthly summary tables is your query from 15th May to 12th June - which is not aligned with month ends. If you can define appropriate summary tables, then use them. Otherwise, you are looking at data warehousing, and need to read up on snowflake and star schemas, and fact tables and dimension tables, etc.
Jonathan Leffler
Thanks again, Jonathan. The scenario you describe (a period from 15th May to 12th June) would mean calculations across 30 or so tables. That is pretty much the worst-case scenario, and it still seems much better than having to deal with more tables every month. Is a 30-table maximum, under the circumstances I described earlier, going to be very difficult/slow? Will look up snowflake and star schemas, etc. Thanks.
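
For what it's worth, one way to combine monthly summary tables with daily detail for a range that is not month-aligned - purely an illustration, with placeholder names (monthly_summary, daily_stats) and a made-up range of 15 May to 12 August so that two full months can come from the summary table and only the ragged ends from the daily table:

    -- Full months come from the pre-aggregated monthly summary, the
    -- partial months at either end from the daily table, and the two
    -- pieces are re-aggregated together.
    SELECT keyword,
           SUM(visits)             AS total_visits,
           SUM(sale)               AS total_sales,
           SUM(sale) / SUM(visits) AS sale_per_visit
    FROM (
        SELECT keyword, visits, sale
        FROM   monthly_summary                  -- one row per keyword per month
        WHERE  summary_month IN ('2009-06-01', '2009-07-01')
        UNION ALL
        SELECT keyword, visits, sale
        FROM   daily_stats
        WHERE  stat_date BETWEEN '2009-05-15' AND '2009-05-31'
           OR  stat_date BETWEEN '2009-08-01' AND '2009-08-12'
    ) AS combined
    GROUP  BY keyword
    ORDER  BY sale_per_visit DESC
    LIMIT  200;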