This is more of a general brainstorming question: what is the state of the art in designing a (relational) database that scales to very large amounts of data? And given today's technology trends, how do we expect to design such databases in 5-10 years?
By scalability, I mean in particular the ability to increase capacity at linear cost by adding hardware.
These are the two approaches I'm currently aware of:
Commercial RDBMS (Oracle, MS-SQL) + SAN
- Positives:
- Mature technology, developed and optimized over several decades (see the JDBC sketch below)
- Negatives:
- Expensive, non-commodity hardware
- Scalability limited by the maximum SAN capacity
- The DB server is a single point of failure (mitigation: a fail-over instance)
- CPU/RAM bottlenecks can occur on the DB server
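To make the "mature technology" point concrete, here is a minimal JDBC sketch of what a classical RDBMS gives you out of the box: ad-hoc joins and multi-statement ACID transactions. The connection string and the table/column names (orders, customers, accounts) are made up for illustration; any JDBC-compliant RDBMS works the same way.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ClassicRdbmsDemo {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string; swap in your own RDBMS/driver.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521/orcl", "app", "secret")) {

            // Ad-hoc join across tables -- the query optimizer picks the plan.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT o.id, c.name FROM orders o " +
                    "JOIN customers c ON o.customer_id = c.id WHERE o.total > ?")) {
                ps.setBigDecimal(1, new BigDecimal("100.00"));
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getLong("id") + " " + rs.getString("name"));
                    }
                }
            }

            // Multi-statement ACID transaction: both updates commit, or neither does.
            conn.setAutoCommit(false);
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setBigDecimal(1, new BigDecimal("50.00"));
                debit.setLong(2, 1L);
                debit.executeUpdate();
                credit.setBigDecimal(1, new BigDecimal("50.00"));
                credit.setLong(2, 2L);
                credit.executeUpdate();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```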
Distributed databases (HBase, Google's BigTable)
- Positives:
- Based on commodity hardware => inexpensive
- Predictable, linear scalability with virtually no capacity limitation
- Negatives:
- Currently no (full) transaction support
- Other limitations in functionality (indexes, joins, triggers, stored procedures, ...)
- Optimized for specific access patterns (e.g., lookups and scans by row key); poor performance for other kinds of queries
- Currently no support for standardized DDL/DML, in particular no SQL (see the HBase sketch after this list)
- Emerging technology, currently not as mature as classical RDBMSes
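For contrast, here is a minimal sketch of the access model a BigTable-style store exposes, using the HBase Java client API (the table name "users", the column family "info", and the row keys are my own assumptions): everything is a get/put/scan keyed by row key, and joins or secondary indexes have to be handled in application code.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseAccessDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table users = conn.getTable(TableName.valueOf("users"))) {

            // Write: a single-row put keyed by the row key; atomicity is per row only.
            Put put = new Put(Bytes.toBytes("user#42"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            users.put(put);

            // Read: lookup by row key -- fast, because data is sorted and sharded by key.
            Result row = users.get(new Get(Bytes.toBytes("user#42")));
            String name = Bytes.toString(row.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
            System.out.println(name);

            // No SQL, no joins, no secondary indexes here: any other access pattern
            // means a full scan or maintaining your own index tables.
        }
    }
}
```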
So, what's the future? Will distributed databases mature over the next couple of years so that they can be used in much the same way as today's RDBMSes? Are there any other approaches?