views:

42

answers:

1

Hey guys,

I suppose this is partially subjective in that it's probably dependent on everyone's interpretation of "high volume", but for the sake of discussion, I'd like to approach this in a hypothetical way. Also, if this is something that should be exclusive to ServerFault, let me know and I'll happily repost there.

Obviously there are numerous well-known database servers - the most lauded of which likely being MySQL. Many people swear by SQLite, PostgreSQL, or even MSSQL (I've admittedly only used MySQL and SQLite). I've had plenty of success dealing with MySQL for low-medium (<= 1,000,000 hits/month) traffic where database interaction was either minimal or moderate (eg, no complex subqueries, wide joins, etc), and MySQL clusters for medium-high traffic. That said, I'm wondering about the validity of filesystem-based systems for extremely high traffic (say 100,000 concurrent connections, hypothetically).

There's always the approach of "build something solid, optimize it, and then scale it by throwing more CPUs at it" which isn't unreasonable given the cloud, and I'm not necessarily afraid of spawning slaves to keep things well distributed. But from a minimalist (and efficiency) standpoint, for something with that many concurrent requests, it seems like adding more gears to the machine is just adding unnecessary complexity.

I know that using something like MySQL Cluster has support for redistributing queries across working slaves should one fail, but if you had a single application such that logically breaking usage into separate servers was not possible, is there a solution that is more efficient than just increasing CPUs? Possibly using filesystem storage across N mount points? I'd love to get some thoughts about pros and cons.

+1  A: 

See Wikipedia on the subject of the C10K problem - or references from that page since the Wikipedia page is rather light on material. Suffice to say, C10K refers to the problem of having 10,000 concurrent clients. You are asking about a problem an order of magnitude larger - which is correspondingly harder and less achievable in practice. You are rapidly encroaching on Google's search territory, and require Google-sized infrastructure to cope.

Jonathan Leffler