I'm considering a design for a private messaging system and would like some input, as I have several questions about it. I've read most of the related questions, and they've already given me some ideas.

All of the basic messaging systems I've looked into so far use a single table for all of the users' messages. With indexes etc., this approach would seem fine.

What I want to know is whether there would be any benefit to splitting the user messages into separate tables, so that when a new user is created a new table is created (either in the same database or in a dedicated message database) which stores all of the messages - sent and received - for that user.

What are the pitfalls/benefits of approaching things that way? I'm writing in PHP; would the code be noticeably more cumbersome to write than for the single large table option? Would the eventual result, with a large number of smaller tables, be a more robust, trouble-free design than one large table? With large numbers of concurrent users, how would the performance of the server compare when dealing with one large table versus many small ones?

Any help with those questions or other input would be appreciated. I'm currently working through a smaller-scale design for my test site before rewriting the PM module, and I would like to optimise it. My poor human brain handles separate tables far more easily, but the same isn't necessarily true for a computer.

Many thanks!

+1  A: 

Creating one table per user certainly won't scale well when there are a large number of users with a small number of messages each. Because of the way MySQL handles table opening/closing, very large numbers of tables (> 10k, say) become quite inefficient, especially at server startup and shutdown, and when trying to back up non-transactional tables.

However, the way you've worded your question sounds like a case of premature optimisation. Make it work first, then fix performance problems. This is always the right way to do things.

Partitioning / sharding will become necessary once your scale gets high enough. But there are a lot of other things to worry about in the meantime. Sort them out first :)

One table is the right way to go from an RDBMS PoV. I recommend you use it until you know better.
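
For illustration, here is a minimal sketch of the single-table approach. It assumes Python with the standard sqlite3 module rather than PHP/MySQL, purely so that it runs anywhere; the table, column and function names are made up and not taken from the question.

```python
import sqlite3


def create_schema(conn):
    # One table holds every user's messages (hypothetical column names).
    conn.execute(
        """
        CREATE TABLE messages (
            id           INTEGER PRIMARY KEY,
            sender_id    INTEGER NOT NULL,
            recipient_id INTEGER NOT NULL,
            body         TEXT    NOT NULL,
            sent_at      TEXT    NOT NULL
        )
        """
    )
    # Indexes so that "inbox" and "sent" lookups do not scan the whole table.
    conn.execute(
        "CREATE INDEX idx_messages_recipient ON messages (recipient_id, sent_at)"
    )
    conn.execute(
        "CREATE INDEX idx_messages_sender ON messages (sender_id, sent_at)"
    )


def send_message(conn, sender_id, recipient_id, body, sent_at):
    conn.execute(
        "INSERT INTO messages (sender_id, recipient_id, body, sent_at) "
        "VALUES (?, ?, ?, ?)",
        (sender_id, recipient_id, body, sent_at),
    )


def inbox(conn, user_id):
    # Newest first; returns (sender_id, body) tuples for one recipient.
    rows = conn.execute(
        "SELECT sender_id, body FROM messages "
        "WHERE recipient_id = ? ORDER BY sent_at DESC",
        (user_id,),
    )
    return rows.fetchall()
```

Fetching one user's inbox only touches that user's rows through the index, however many other users share the table.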

MarkR
Hmm, again an interesting point, as it's large numbers of users that I'm concerned about. So it would perhaps seem a little silly of me to consider thousands of open tables a better approach than 1 (or several) large ones.
TooManyCooks
A: 

Splitting large amounts of data into smaller sets makes sense if you're trying to avoid locking issues, for example when big selects or updates of huge amounts of data lock the messages table. In that case long-running queries could block the whole table and everyone has to wait... You should ask yourself whether this is going to happen in your case. To me, at least, it looks like a messaging system is not going to have such problems, because all information is pushed into the table or retrieved from it in rather small sets. If this is a user-centric application, then getting all messages for a single user is quite easy and fast to do, and the same goes for creating new messages for a particular user... unless you have really huge numbers of users/messages in your system.

Splitting data into multiple tables also has some drawbacks: you will need some kind of management system or logic for how you split everything. Giving each user a separate table could soon grow into hundreds or thousands of tables, which is, in my opinion, not that nice. So you would probably need some other criterion for splitting the data. If you want the splitting logic to be dynamic and easily adjustable, you would probably also need to store it in the DB somehow. As you can see, the complexity grows...

An advantage of such data sharding could be scalability: you could easily put different sets of data on different machines once a single machine is no longer able to handle the whole load.
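
As a rough sketch of what "some other criterion" might look like, the snippet below routes each user to one of a fixed list of database hosts by hashing the user id. It is in Python for brevity; the host names and the function name are invented for illustration, and a real setup might instead store the mapping in the DB as described above.

```python
import zlib

# Hypothetical database hosts; a real system would load these from config.
SHARDS = ["db-host-0", "db-host-1", "db-host-2"]


def shard_for_user(user_id, shards=SHARDS):
    """Pick the shard that stores all messages for the given user."""
    # crc32 is stable across runs and processes, unlike Python's built-in
    # hash() for strings, so a user always maps to the same shard.
    key = str(user_id).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]
```

Note that with plain modulo hashing, changing the number of shards remaps most users to a different host, which is part of the management overhead mentioned above.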

Laimoncijus
A: 

It depends on how your message system works. Are there concurrency issues? Does it need to scale as the application accommodates more customers?

A single table will work perfectly well for a small, single-user system that handles one message at a time. However, if you are considering a multi-user, concurrent messaging system, the tables should be split.

The data model for a real-time application is recommended to be "normalized" (splitting tables) because of "locking & latching" and data redundancy issues.

  1. Locking policy varies by database vendor. If you have tables that the application updates and selects from concurrently, "locking" issues arise (page level, row level or table level, depending on the vendor). Some bad DB and app designs lock the table completely, so messages never go through.

  2. The redundancy issue is clearer. If you use only one table, some information is redundant (such as the user, since I guess one user could send multiple messages).

Try googling "normalization" and "locking".
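
One way to read the redundancy point is that user details live in their own table and messages refer to users only by id. A minimal sketch of that reading, assuming Python with the standard sqlite3 module instead of PHP and the questioner's database; all table, column and function names here are hypothetical.

```python
import sqlite3


def create_schema(conn):
    # User details are stored once; messages only carry user ids.
    conn.executescript(
        """
        CREATE TABLE users (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE messages (
            id           INTEGER PRIMARY KEY,
            sender_id    INTEGER NOT NULL REFERENCES users(id),
            recipient_id INTEGER NOT NULL REFERENCES users(id),
            body         TEXT NOT NULL
        );
        """
    )


def inbox_with_sender_names(conn, recipient_id):
    # The sender's name is looked up via a join rather than copied into
    # every message row, so renaming a user touches exactly one row.
    rows = conn.execute(
        """
        SELECT u.name, m.body
        FROM messages AS m
        JOIN users AS u ON u.id = m.sender_id
        WHERE m.recipient_id = ?
        ORDER BY m.id
        """,
        (recipient_id,),
    )
    return rows.fetchall()
```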

exiter2000
A: 
Richard Harrison
+2  A: 

You'll just get headaches from moving to numerous small tables. Databases are made for handling lots of data; let them do their thing.

  • You'll likely end up using dynamic table names in queries (SELECT * FROM $username WHERE ...), making smart features like stored procedures and possibly parameterized queries a lot trickier if not outright impossible. Usually a really bad idea.

  • Try rewriting SELECT * FROM messages WHERE authorID = 1 ORDER BY date_posted DESC, but where "messages" is anywhere between 1 and 30,000 different tables. Keeping your table relations monogamous will keep them bidirectional, way more useful.
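
To make the first point concrete, here is a small sketch, assuming Python's standard sqlite3 module in place of PHP/MySQL. The column names follow the query above (plus an assumed id and body), and the function names are invented. Values such as the author id can be bound as parameters, but a table name cannot, which is why per-user tables push you towards building SQL strings by hand.

```python
import sqlite3


def messages_by_author(conn, author_id):
    # With one table, the only varying part is a value, which can be bound
    # safely as a parameter.
    rows = conn.execute(
        "SELECT id, body FROM messages "
        "WHERE authorID = ? ORDER BY date_posted DESC",
        (author_id,),
    )
    return rows.fetchall()


def table_name_as_parameter_fails(conn):
    # Placeholders stand for values, not identifiers: SQLite rejects this
    # statement with a syntax error, so a per-user table name would have to
    # be spliced into the SQL string instead.
    try:
        conn.execute("SELECT * FROM ? WHERE authorID = 1", ("messages",))
    except sqlite3.OperationalError:
        return True
    return False
```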

If you think table size will really be a problem, set up an "archived messages" clone table and periodically move old, already-read messages there, where they won't get in the way. Also note how most forum software with private messaging allows for limiting user inbox sizes. There are a few ways to solve the problem while keeping things sane.
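
The archiving idea could look roughly like this. It is a sketch only, assuming Python's standard sqlite3 module; the table names, the is_read flag and the cutoff parameter are all assumptions made for the example, not anything from the question.

```python
import sqlite3

# Both tables share one layout so rows can be copied across unchanged.
_COLUMNS = """(
    id           INTEGER PRIMARY KEY,
    recipient_id INTEGER NOT NULL,
    body         TEXT    NOT NULL,
    is_read      INTEGER NOT NULL DEFAULT 0,
    sent_at      TEXT    NOT NULL
)"""


def create_schema(conn):
    conn.execute("CREATE TABLE messages " + _COLUMNS)
    conn.execute("CREATE TABLE messages_archive " + _COLUMNS)


def archive_old_read_messages(conn, cutoff):
    """Move read messages sent before `cutoff` into the archive table.

    Returns the number of rows moved. Unread messages stay where they are.
    """
    condition = "is_read = 1 AND sent_at < ?"
    # "with conn" commits both statements together, or rolls both back if
    # either fails, so a message is never lost or duplicated.
    with conn:
        conn.execute(
            "INSERT INTO messages_archive SELECT * FROM messages WHERE " + condition,
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM messages WHERE " + condition, (cutoff,))
    return cur.rowcount
```

Something like this would typically be run from a scheduled job, with the cutoff set to whatever age counts as "old" for the site.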

tadamson
Thanks, that's a very good point regarding archived messages. What I don't want to do is significantly restrict users' abilities regarding the number of messages they have, but I can see how moving older messages might help me.
TooManyCooks