I'm designing a very simple (in terms of functionality) but difficult (in terms of scalability) system where users can message each other. Think of it as a very simple chat service. A user can insert a message through a PHP page. The message is short and has a recipient name.

On another PHP page, the user can view all the messages that were sent to him at once, after which they are deleted from the database. That's it. That's all the functionality needed for this system. How should I go about designing this (from a database/PHP point of view)?

So far I have the table like this:

  • field1 -> message (varchar)
  • field2 -> recipient (varchar)
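
In SQL, a minimal sketch of that table (the name, column sizes, and the index mentioned below are assumptions) might be:

-- minimal sketch; table name, sizes, and index name are placeholders
CREATE TABLE messages (
    message   VARCHAR(255) NOT NULL,
    recipient VARCHAR(64)  NOT NULL,
    INDEX idx_recipient (recipient)   -- the index mentioned below
);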

Now, for the SQL insert, I find that the time it takes is constant regardless of the number of rows in the database, so my send.php has a guaranteed return time, which is good.

But for pulling down messages, my pull.php takes longer as the number of rows increases! I find the SQL select (and delete) takes longer as the table grows, and this is true even after I have added an index on the recipient field.

Now, if it were simply the case that users have to wait longer before their messages are pulled down, it would be OK. But what I am worried about is that when each pull.php request takes a really long time, the PHP server will start refusing connections to some requests, or worse, the server might just die.

So the question is, how to design this such that it scales? Any tips/hints?

PS. Some estimates on the numbers:

  • the number of users starts at 50,000 and goes up.
  • each user on average has around 10 messages stored before the other end pulls them down.
  • each user sends around 10-20 messages a day.


UPDATE from reading the answers so far:

I just want to clarify that pulling down fewer messages in pull.php does not help. Even pulling just one message takes a long time when the table is huge, because the table holds all the messages, so you have to do a select like this:

select message from DB where recipient = 'John'

and even if you change it to this, it doesn't help much:

select top 1 message from DB where recipient = 'John'
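
(If this is MySQL, note that it has no TOP keyword; the equivalent is LIMIT:)

-- MySQL equivalent of the TOP 1 query above
select message from DB where recipient = 'John' limit 1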

So far, from the answers, it seems the larger the table, the slower the select will be, O(n) or slightly better, with no way around it. If that is the case, how should I handle this from the PHP side? I don't want the PHP page to fail at the HTTP level, because the user will be confused and end up refreshing like mad, which makes it even worse.

+1  A: 

So the question is, how to design this such that it scales? Any tips/hints?

Yes, you don't want to use a relational database for message queuing. What you are trying to do is not what a relational database is best designed for, and while you can do it, it's kind of like driving in a nail with a screwdriver.

Instead, look at one of the many open-source message queues out there; the guys at Second Life have a neat wiki where they reviewed a lot of them:

http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes

FlySwat
+3  A: 

The database design for this is simple, as you suggest. As for queries taking longer once a user has more messages, what you can do is paginate the results: show the first 10/50/100 or whatever makes sense, and only pull those records. Generally speaking, your times shouldn't increase very much unless the volume of messages grows by an order of magnitude or more. You should be able to pull back 1,000 short messages in well under a second. The page may take longer to display at that point, but that's where the pagination helps.
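
A rough sketch of such a paginated pull in MySQL, assuming a messages table with an auto-incrementing id column (both names are assumptions):

-- pull one page of 50 messages for a recipient;
-- id is an assumed auto-incrementing column used for stable ordering
SELECT id, message
FROM messages
WHERE recipient = 'John'
ORDER BY id
LIMIT 50 OFFSET 0;   -- next page: OFFSET 50, then OFFSET 100, ...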

I would suggest, though, thinking through future features and building your database out a little more based on that. Adding features to the software is easy; changing the database is comparatively harder.

Ryan Guill
I wish I knew why everyone on this question is getting mass down-voted. There may be some controversial advice here, but I don't believe any of it is inaccurate. If you are down-voting, please explain why, especially if you are the one asking the question. Let us clarify if necessary.
Ryan Guill
I agree, I'm upvoting this as well. This is a legitimate question for Stack Overflow and it should get the chance to have a good answer.
St. John Johnson
I agree with both of your sentiments. I don't have the points to down-vote, but if I ever do, I am making a mental note to be sure to leave a comment explaining why.
thinkhard
A: 

You could always have only one row per user and just concatenate messages together into one long record. If you're keeping messages for a long period of time, that isn't the best way to go, but it reduces your problem to a single find-and-concatenate at storage time and a single find at retrieve time. It's hard to say without more detail; part of what makes DB design hard is meeting all the goals of the system in a well-compromised way, and without all the details it's hard to advise on the best compromise.
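
A minimal sketch of that idea in MySQL, assuming an inbox table keyed by recipient (all names are hypothetical):

-- one row per user; all pending messages concatenated into one column
CREATE TABLE inbox (
    recipient VARCHAR(64) PRIMARY KEY,
    messages  TEXT NOT NULL
);

-- storage: a single indexed find-and-concatenate
-- (assumes the row exists; otherwise use INSERT ... ON DUPLICATE KEY UPDATE)
UPDATE inbox
SET messages = CONCAT(messages, '\n', 'hello John')
WHERE recipient = 'John';

-- retrieval: a single indexed find, then blank the queue
SELECT messages FROM inbox WHERE recipient = 'John';
UPDATE inbox SET messages = '' WHERE recipient = 'John';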

EDIT: I thought I was fairly clear on this, but evidently not: you would not do this unless you were blanking a reader's queue when he reads it. This is why I prompted for clarification.

The only criticism I have to offer is that you should build a test case and see exactly how the performance differs. I think you have a fundamental misunderstanding of how good databases are at what they're designed to do.
Frank Crook
You should not add repeating groups to columns. This is a poor design decision.
thinkhard
A: 

Limit the number of rows that your pull.php will display at any one time.

The more data you transfer, the longer it will take to display the page, regardless of how great your DB is.

You must limit your data in the SQL; return only the most recent N rows.

EDIT: Put an index on recipient and it will speed things up. You'll need another column to distinguish rows if you want to take the top 50 or so, possibly a SendDate or an auto-incrementing field. A clustered index will slow down inserts, so use a regular index there.
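
In MySQL terms that might look like the following (messages, senddate, and the index name are all assumed):

-- regular (non-clustered) index covering both the lookup and the ordering
CREATE INDEX idx_recipient_senddate ON messages (recipient, senddate);

-- return only the most recent 50 messages for a user
SELECT message
FROM messages
WHERE recipient = 'John'
ORDER BY senddate DESC
LIMIT 50;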

KM
"A Clustered index will slow down inserts, so use a regular index there." Don't do this. http://www.sql-server-performance.com/tips/clustered_indexes_p1.aspx
DBAndrew
Hello? He is using MySQL, not SQL Server.
KM
A: 

This is an unavoidable problem: more messages, more time to find the requested ones. The only thing you can do is what you already did: add an index, turning the O(n) lookup time of a complete table scan into O(log(u) + m) for a clustered index lookup, where n is the total number of messages, u the number of users, and m the number of messages per user.

Daniel Brückner
+3  A: 
  1. Follow the rules of normalization and try to reach third normal form; going further for this type of application probably isn't worth it. Keep your tables thin.
  2. Don't actually delete rows; just mark them as deleted with a bit flag. If you really need to remove them to reduce size, create a cleanup process that archives or removes the flagged records during low-usage hours (a sketch follows this list).
  3. Integer values are easier for the SQL server to deal with than character values, so instead of WHERE recipient = 'John' use WHERE recipient_id = 23. You get this for free when you normalize your database.
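
A sketch of points 2 and 3 together, with assumed table and column names:

-- point 3: integer key instead of a varchar comparison
SELECT message FROM message WHERE recipient_id = 23 AND is_deleted = 0;

-- point 2: mark as deleted rather than deleting
UPDATE message SET is_deleted = 1 WHERE recipient_id = 23;

-- cleanup process run during low-usage hours: archive or remove flagged rows
DELETE FROM message WHERE is_deleted = 1;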
DBAndrew
Adding joins in this situation would only exacerbate his problem. I'm a huge believer in third normal form, but if he is already seeing performance problems on a single-table query with proper indexes, normalizing won't help.
FlySwat
+3  A: 

Don't use a VARCHAR for your recipient. It's best to make a recipient table with a primary key that is an integer (or BIGINT if you are expecting an extremely large number of people).

Then when you do your select statements:

SELECT message FROM DB WHERE recipient = 52;

Retrieving rows will be much faster.
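
A minimal sketch of that two-table layout (all names and sizes are assumptions):

-- lookup table mapping names to integer keys
CREATE TABLE recipient (
    id   INT AUTO_INCREMENT PRIMARY KEY,  -- BIGINT for very large user counts
    name VARCHAR(64) NOT NULL UNIQUE
);

-- messages store the integer key, matching the query above
CREATE TABLE message (
    recipient INT NOT NULL,          -- id from the recipient table
    message   VARCHAR(255) NOT NULL,
    INDEX idx_recipient (recipient)
);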

Plus, I believe MySQL indexes are B-trees, which gives O(log n) lookups in most cases.

St. John Johnson
+2  A: 

A database table without an index is called a heap. Querying a heap results in each row of the table being evaluated, even with a WHERE clause, so the big-O complexity of a heap lookup is O(n), n being the number of rows in the table. Adding an index (the details depend on the underlying database engine) reduces the complexity of finding a matching row to O(log(n)), because the index is almost certainly implemented as some kind of B-tree. Appending rows to the heap itself remains an O(1) operation even with an index present, plus a small O(log(n)) cost to maintain the index.
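
One way to verify which case a query falls into is MySQL's EXPLAIN (messages and idx_recipient are assumed names):

-- "type: ref" with "key: idx_recipient" means the O(log n) index path;
-- "type: ALL" means the O(n) full heap scan
EXPLAIN SELECT message FROM messages WHERE recipient = 'John';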

 > But for pulling down messages, my pull.php takes longer as the number of rows increases! I find the SQL select (and delete) takes longer as the table grows, and this is true even after I have added an index on the recipient field.

UNLESS you are inserting into the middle of an index, at which point the database engine needs to shift rows to accommodate the new one; the same occurs when you delete from the index. Remember there is more than one kind of index: be sure the index you are using is not a clustered index, as more data must be sifted through and moved on inserts and deletes.

FlySwat has given the best option available to you... do not use an RDBMS because your messages are not relational in a formal sense. You will get much better performance from a file system.

dbarker has also given correct answers. I do not know why he has been voted down 3 times, but I will vote him up at the risk that I may lose points. dbarker is referring to "Vertical Partitioning" and his suggestion is both acceptable and good. This isn't rocket surgery, people.

My suggestion is not to implement this kind of functionality in your RDBMS; if you do, remember that SELECT, UPDATE, INSERT, and DELETE all place locks on pages in your table. If you do go forward with putting this functionality into a database, run your selects with a NOLOCK locking hint, if it is available on your platform, to increase concurrency. Additionally, if you have that many concurrent users, partition your tables vertically as dbarker suggested and place the database files on separate drives (not just separate volumes but separate hardware) to increase I/O concurrency.
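
For example, on SQL Server the hint is written as WITH (NOLOCK); MySQL has no direct equivalent, but the closest analogue is lowering the session isolation level (messages is an assumed table name):

-- SQL Server: read without taking shared locks
SELECT message FROM messages WITH (NOLOCK) WHERE recipient = 'John';

-- rough MySQL analogue: dirty reads for the current session
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT message FROM messages WHERE recipient = 'John';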

thinkhard