Hello, I am currently planning a new system in PHP/MySQL and want to make sure my database can handle the amount of data that I am planning to store. One of the features of my new project is a "messages" feature like Facebook. I want to make sure I create the best possible experience for the end user. The website will eventually handle 1000's of users with potentially millions of messages collectively. What would be the best approach for the database design? Is MySQL even the right database to use?
I've not worked on MySQL but I've worked professionally on SQL Server and Oracle. I would recommend Oracle for any high performance system.
If you are planning to handle large amounts of data (of course millions doesn't even come close to qualifying as large), then hire a datbase professional. Efficient and effective database design for large data sets is a complex issue and requires a specialist.
In answer to your question yes mysql can handle millions of records easily if the design is good and will be a nightmare if the design is bad, pretty much like any other modern datbase.
As long as you setup your tables to be relational and set the relationships between tables, MySQL should be fine.
Might I also suggest Postgres?
Facebook started with MySQL and they only moved to Cassandra when they had 7TB of inbox data for over 100 million users.
Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.
If you are on a budget, start with MySQL and use a system like Zend::DB or on a higher level Doctrine.
It's more important to make it easy to switch DMBS then to choose your DBMS at the beginning.
MySQL has no problem with millions or hundreds of millions of records as long as you design you database correctly.
That being said, a "messages feature like Facebook" is a pretty broad definition. Generally, you would define a messages
table that links each message to the user that created it (ie, have a userId
column in the messages table). If you want messages to go to multiple users, you have a message_recipients
table defining the 1-to-many relationship by storing multiple records consisting of the messageId
and a recipientId
. Add the proper indexes to these tables and you're 80% of the way there.
That being said, that remaining 20% can be a killer. Unfortunately, how you use your database is going to determine what else you need to do, and you'd have to provide a lot more detail about your application before those judgments can be made. For example, you might wish to consider having auto-archiving solution which keeps the main table relatively small, and moves old data to backup tables that can be accessed if necessary. You probably won't need this right away, but it could help in the future.
You are not very precise on what you want to learn. Okay. I'll try to give you some advice.
- Normalization
- Indexes
- MyISAM for tables under high load
- Denormalization (sic!), but you should understand what are you doing
- Sharding
- Minimalistic DB layer for flexibility
I wanted to thank everyone for your valuable input. I left the question a bit broad intentionally. I will continue to use MySQL for now with the proper tables and relationships. Thanks again!
Sharding is certainly not necessary for your "broadly" based requirements...I have dealt with a fair amount of data and didn't even consider partitioned tables and shard implementation until there were numerous tables housing over a billion records (then joining those could get a little slow). Index your tables with smart keys, and you may even consider using an eav type structure to keep the tables narrow and relieve yourself of null returns on queries.
Above was written while half asleep so ignore typos ;)
If you mean "what should my mysql table look like for a message system", I use the following columns in my message system:
message_id
fromuser
fromview
fromstatus
touser
toview
tostatus
title
text
poston
thread
Message_id is auto_increment, obviously. Fromuser and touser are obvious. Fromstatus and tostatus is active, deleted, purge, draft, and likewise. Fromview and toview are set to 'yes' and 'no'. Title, text, and 'poston' date are obvious. Thread might take a little effort on your part depending on your html forms and message display scripts.
For your form, create a foreach loop based on the "to:" field and save a copy for each recipient.
I expect this message system to hold millions, but that millions is probably a couple of years away. I'm keeping it small and simple.