views:

636

answers:

10

Hello, I am currently planning a new system in PHP/MySQL and want to make sure my database can handle the amount of data that I am planning to store. One of the features of my new project is a "messages" feature like Facebook. I want to make sure I create the best possible experience for the end user. The website will eventually handle 1000's of users with potentially millions of messages collectively. What would be the best approach for the database design? Is MySQL even the right database to use?

A: 

I've not worked on MySQL but I've worked professionally on SQL Server and Oracle. I would recommend Oracle for any high performance system.

Vivek
I'd never recommend Oracle unless you are rich enough to waste your money for an extrasupermega Oracle DBA. Oracle is overweight and full of legacy rudiments. MS SQL Server is as bad as everything Microsoft does. And you won't be able to run it on an ordinary machine, you'll need to buy bunch of other Microsoft's borken software. Use MySQL or even better PostgreSQL. The key to success is to design your database well, having performance and scalability in mind.
codeholic
codeholic: My preferred RDBMS as well. Though, would you care to back-up your bashing of Oracle and MS SQL Server? I was a MS SQL developer for several years before transitioning over to PGSQL. Some days I long for MS SQL: much better replication, larger community of resources, wider availability of commercial support, better GUI tools, and sprocs are more intuitive to write in T-SQL than PL/pgSQL.
jamieb
+3  A: 

If you are planning to handle large amounts of data (of course millions doesn't even come close to qualifying as large), then hire a datbase professional. Efficient and effective database design for large data sets is a complex issue and requires a specialist.

In answer to your question yes mysql can handle millions of records easily if the design is good and will be a nightmare if the design is bad, pretty much like any other modern datbase.

HLGEM
+1  A: 

As long as you setup your tables to be relational and set the relationships between tables, MySQL should be fine.

Might I also suggest Postgres?

I've had equal experience with MySQL, PostGres and MS SQL... My preference is MS SQL but since start up costs are very important in new projects, PostGres would be my preference for this or any project.
TravisO
+5  A: 

Facebook started with MySQL and they only moved to Cassandra when they had 7TB of inbox data for over 100 million users.

Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.

Daniel Vassallo
Exactly, start small, keep your costs low. Just because you want to be the next Facebook doesn't mean you need to spend anywhere near the amount of money or time to design a system that broad. Every successful site started simple, quick and cheap. Over designing of your system reeks of "premature optimization".
TravisO
+1  A: 

If you are on a budget, start with MySQL and use a system like Zend::DB or on a higher level Doctrine.

It's more important to make it easy to switch DMBS then to choose your DBMS at the beginning.

douwe
+4  A: 

MySQL has no problem with millions or hundreds of millions of records as long as you design you database correctly.

That being said, a "messages feature like Facebook" is a pretty broad definition. Generally, you would define a messages table that links each message to the user that created it (ie, have a userId column in the messages table). If you want messages to go to multiple users, you have a message_recipients table defining the 1-to-many relationship by storing multiple records consisting of the messageId and a recipientId. Add the proper indexes to these tables and you're 80% of the way there.

That being said, that remaining 20% can be a killer. Unfortunately, how you use your database is going to determine what else you need to do, and you'd have to provide a lot more detail about your application before those judgments can be made. For example, you might wish to consider having auto-archiving solution which keeps the main table relatively small, and moves old data to backup tables that can be accessed if necessary. You probably won't need this right away, but it could help in the future.

zombat
From my experience, pretty much every person or company exaggerates their requirements 10x to 100x the reality when they are planning a system. When in doubt, start simple, buy 1 server and run the web server and database from it. Don't worry about multiple servers until you need them. The only reason to have multiple servers from day 1 is because you want fail over, and even then you might find the initial costs exceed your want of it.
TravisO
@TravisO - 100% agree.
zombat
A: 

You are not very precise on what you want to learn. Okay. I'll try to give you some advice.

  1. Normalization
  2. Indexes
  3. MyISAM for tables under high load
  4. Denormalization (sic!), but you should understand what are you doing
  5. Sharding
  6. Minimalistic DB layer for flexibility
codeholic
A: 

I wanted to thank everyone for your valuable input. I left the question a bit broad intentionally. I will continue to use MySQL for now with the proper tables and relationships. Thanks again!

Randy Gonzalez
A: 

Sharding is certainly not necessary for your "broadly" based requirements...I have dealt with a fair amount of data and didn't even consider partitioned tables and shard implementation until there were numerous tables housing over a billion records (then joining those could get a little slow). Index your tables with smart keys, and you may even consider using an eav type structure to keep the tables narrow and relieve yourself of null returns on queries.

Above was written while half asleep so ignore typos ;)

Matt
A: 

If you mean "what should my mysql table look like for a message system", I use the following columns in my message system:

message_id
fromuser
fromview
fromstatus
touser
toview
tostatus
title
text
poston
thread

Message_id is auto_increment, obviously. Fromuser and touser are obvious. Fromstatus and tostatus is active, deleted, purge, draft, and likewise. Fromview and toview are set to 'yes' and 'no'. Title, text, and 'poston' date are obvious. Thread might take a little effort on your part depending on your html forms and message display scripts.

For your form, create a foreach loop based on the "to:" field and save a copy for each recipient.

I expect this message system to hold millions, but that millions is probably a couple of years away. I'm keeping it small and simple.

Bryan