I'm working on a project which is similar in nature to website visitor analysis. It will be used by hundreds of websites, each averaging tens of thousands to hundreds of thousands of page views a day, so the volume of data will be very large.

Should I use a single table with a websiteid column, or a separate table for each website?

Making changes to a live service with hundreds of websites, each with its own table, seems like a big problem. On the other hand, performance and scalability will probably be a problem if all that data goes into one table. Any suggestions, comments or advice are most welcome.

+7  A: 

How about one table partitioned by website FK?
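A minimal sketch of what that could look like (assuming MySQL 5.1+, which added native partitioning; the table and column names here are hypothetical):

    -- Hypothetical page-view table, hash-partitioned on website_id.
    -- MySQL requires the partitioning column to be part of every
    -- unique key, hence the composite primary key.
    CREATE TABLE page_views (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        website_id INT UNSIGNED    NOT NULL,
        url        VARCHAR(255)    NOT NULL,
        viewed_at  DATETIME        NOT NULL,
        PRIMARY KEY (id, website_id)
    ) ENGINE=InnoDB
    PARTITION BY HASH (website_id)
    PARTITIONS 32;

Queries that filter on website_id then touch only one partition instead of the whole table.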

vartec
Just saying I agree with this: horizontal partitioning on the PK/FK.
thr
Thanks, I'm checking this option
Nir
+1  A: 

I would say use the design that makes the most sense given your data - in this case, one large table.

The records will all be of the same type, with the same columns, so from a database normalization standpoint it makes sense to keep them in the same table. An index makes selecting particular rows easy, especially when a whole query can be satisfied from the data in a single index (which can often be the case).
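For example, a sketch against the hypothetical page_views table above (the index name is made up):

    -- A composite index on (website_id, viewed_at).
    ALTER TABLE page_views ADD INDEX idx_site_time (website_id, viewed_at);

    -- This query is "covered": both the filter and the count can be
    -- answered from the index alone, without reading the row data.
    SELECT COUNT(*)
    FROM page_views
    WHERE website_id = 42
      AND viewed_at >= '2009-06-01';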

Note that visitor analysis will necessarily involve a lot of operations - counts, sums, and averages, for instance - where there is no easy optimisation other than operating on a large number of rows at once. It is typical for resource-intensive statistics like these to be pre-calculated and stored rather than computed live; that is something you will want to think about.
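A sketch of the pre-calculation idea, assuming a hypothetical daily roll-up table:

    -- One row per website per day, updated as events arrive (or by a
    -- nightly batch job), so reports read a handful of summary rows
    -- instead of scanning millions of raw page views.
    CREATE TABLE daily_stats (
        website_id INT UNSIGNED    NOT NULL,
        day        DATE            NOT NULL,
        views      BIGINT UNSIGNED NOT NULL DEFAULT 0,
        PRIMARY KEY (website_id, day)
    );

    INSERT INTO daily_stats (website_id, day, views)
    VALUES (42, CURRENT_DATE, 1)
    ON DUPLICATE KEY UPDATE views = views + 1;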

thomasrutter
Thanks! Does anyone know of a good place to read about such systems and their architecture?
Nir
Well, Stack Overflow can be pretty good if you're willing to search around a bit. mysqlperformanceblog.com is also good, I think, though again you may have to dig. It's hard to recommend one thing; you could try asking another question, I guess.
thomasrutter
+1  A: 

If the data is uniform, go with one table. If you ever need to SELECT across all websites, having multiple tables is a pain - although with enough scripting you can make it work.
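To illustrate the pain, a cross-site query over per-website tables ends up looking something like this (hypothetical table names):

    -- Every per-site table has to be named explicitly.
    SELECT 'site1' AS site, COUNT(*) AS views FROM views_site1
    UNION ALL
    SELECT 'site2', COUNT(*) FROM views_site2;
    -- ...and so on, repeated for each of the hundreds of sites.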

You could use MySQL's MERGE storage engine to SELECT across the tables (but don't expect good performance, and watch out for operating-system limits on the number of open files: on Linux you may have to raise the limit with ulimit, while on Windows the hard limit cannot be raised).
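A minimal sketch of the MERGE approach (the underlying tables must be identical MyISAM tables; names here are hypothetical):

    CREATE TABLE views_site1 (
        url       VARCHAR(255) NOT NULL,
        viewed_at DATETIME     NOT NULL
    ) ENGINE=MyISAM;

    CREATE TABLE views_site2 LIKE views_site1;

    -- The MERGE table declares the same columns and unions the
    -- per-site tables; a SELECT against it scans all of them.
    CREATE TABLE views_all (
        url       VARCHAR(255) NOT NULL,
        viewed_at DATETIME     NOT NULL
    ) ENGINE=MERGE UNION=(views_site1, views_site2) INSERT_METHOD=LAST;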

I have broken a huge table into many (hundreds of) tables and used MERGE to SELECT across them. I did this so that I could perform off-line creation and optimization of each of the small tables (e.g. OPTIMIZE or ALTER TABLE ... ORDER BY). However, the performance of SELECT through MERGE eventually led me to write my own custom storage engine (described at http://blog.coldlogic.com/categories/coldstore/).

Dave Pullin
A: 

Use one table unless you have performance problems with MySQL.

Nobody here can answer performance questions for you; you should run performance tests yourself to find out whether one big table is sufficient.
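A cheap first check before full load testing is to look at the query plan (using the hypothetical page_views table sketched earlier):

    -- EXPLAIN shows whether the query uses an index and roughly how
    -- many rows it will examine - a quick proxy before benchmarking
    -- against production-sized data.
    EXPLAIN SELECT COUNT(*)
    FROM page_views
    WHERE website_id = 42;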

stepancheg
+1  A: 

Use a single table. Once you start encountering performance problems there are many solutions: you can partition your table by website id (horizontal partitioning), or you can use replication. Which approach fits depends on the ratio of reads to writes.

But to start, keep things simple and use one table with proper indexing. You should also determine whether you need transactions. You can take advantage of MySQL's different storage engines, such as MyISAM or NDB (in-memory clustering), to boost performance. Caching also plays a big role in offloading work from the database: data that is mostly read-only and easy to compute is usually put in a cache, so the cache serves those requests and only the necessary queries go to the database.

Faisal Feroz