Suppose there is a messaging system. It has millions of entries to be sent and reported on, and the count is growing by 100K every hour. Two services access the db: one is the sender, the other is the reporter. What would you suggest in order to get maximum performance? How should the db be designed?

Also, which open source database would you suggest among MySQL, PostgreSQL, MongoDB etc. to handle this high volume?

Thanks

+1  A: 

You've not really provided much information on your requirements other than a few comments about expected data volumes. Simply storing large volumes of data has no real intrinsic value; it's the ability to access that data which gives the real value. So knowing how you expect to retrieve information from the database is more important than how much data you want to store.

Do these messages really require a document db like MongoDB, or are they structured enough to use a straight RDBMS like PostgreSQL or MySQL? Do you need full text search capability? How often are queries executed against this message data, and what type of queries are they? Are you trying to write your own Twitter?

If those are your current data volumes, look to using db replication for resilience. Consider partitioning your message table, perhaps by date posted. Use master/slave (or even multi-master/multi-slave) as Konerak has suggested. Look at the possibilities of an archive table for older messages that are less likely to be queried, but which are then still available. Look at what a commercial database like Oracle can offer you. Get in a professional to help tune the db for performance, rather than simply asking for free advice on sites like SO.
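To make the archive-table idea concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module purely so that it runs anywhere; the same INSERT ... SELECT followed by DELETE pattern applies to MySQL or PostgreSQL. The table names (messages, messages_archive), the columns, and the 30-day cutoff are all assumptions for illustration, not anything taken from the question.

```python
import sqlite3
from datetime import datetime, timedelta


def archive_old_messages(conn, cutoff):
    """Move messages posted before `cutoff` (ISO-8601 string) into the archive table."""
    with conn:  # single transaction: copy the rows, then delete them
        conn.execute(
            "INSERT INTO messages_archive (id, body, posted_at) "
            "SELECT id, body, posted_at FROM messages WHERE posted_at < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM messages WHERE posted_at < ?", (cutoff,))
    return cur.rowcount  # number of rows moved out of the hot table


# Hypothetical schema: a hot table, an archive table, and an index on the date column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT, posted_at TEXT);
CREATE TABLE messages_archive (id INTEGER PRIMARY KEY, body TEXT, posted_at TEXT);
CREATE INDEX idx_messages_posted_at ON messages (posted_at);
""")

# Sample data: one message per day going back 60 days from a fixed date.
now = datetime(2024, 1, 31)
rows = [(i, f"msg {i}", (now - timedelta(days=i)).isoformat()) for i in range(60)]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
conn.commit()

# Archive everything strictly older than 30 days.
moved = archive_old_messages(conn, (now - timedelta(days=30)).isoformat())
hot = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM messages_archive").fetchone()[0]
```

The hot table stays small for the sender and reporter services, while older rows remain queryable in the archive. With date-based partitioning, the same cutoff could instead be handled by detaching or dropping whole partitions, which avoids the row-by-row DELETE.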

Consider your hardware as well... multiple load-balanced servers to help with the volumes (we have 14 dedicated servers purely for accepting new messages, and three high-performance servers tuned for querying the data).

Mark Baker
I think your answer is tending towards Col. Shrapnel's advice in the comments.
Brian Hooper
@Brian - It would be my first piece of advice. I've tried to add more than Col. Shrapnel said, to justify it as an answer, but given the vagueness of the question... I work with data volumes at that sort of level, and highlighted a few of the techniques that I use for dealing with large databases. But if the OP is still debating whether to go for a document database or an RDBMS, and is more concerned with data volumes than with how to access that data, then I suspect he really needs professional help.
Mark Baker
Quite. I wasn't intending to knock your answer, and I'm sorry if I gave that impression; I even upvoted it, as there are some useful suggestions.
Brian Hooper