This is more of a general brainstorming question: what is the state of the art in designing a (relational) database that scales to very large amounts of data? And given today's technology trends, how do we expect to design such databases in 5-10 years?
By scalability, I mean in particular the ability to increase capacity at linear cost by adding hardware.
These are the two approaches I'm currently aware of:
Commercial RDBMS (Oracle, MS-SQL) + SAN
- Positives:
- Mature technology, developed and optimized over several decades (see the JDBC sketch below)
- Negatives:
- Expensive, non-commodity hardware
- Scalability limited by the maximum SAN capacity
- The DB server is a single point of failure (mitigation: a fail-over instance)
- CPU/RAM bottlenecks can occur on the DB server
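To make the "mature technology" point concrete, here is a minimal JDBC sketch of what a classical RDBMS gives you out of the box: ad-hoc joins and multi-statement ACID transactions. The connection string and the table/column names (orders, customers, accounts) are made up for illustration; any JDBC-compliant RDBMS works the same way.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ClassicRdbmsDemo {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string; swap in your own RDBMS/driver.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521/orcl", "app", "secret")) {

            // Ad-hoc join across tables -- the query optimizer picks the plan.
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT o.id, c.name FROM orders o " +
                    "JOIN customers c ON o.customer_id = c.id WHERE o.total > ?")) {
                ps.setBigDecimal(1, new BigDecimal("100.00"));
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getLong("id") + " " + rs.getString("name"));
                    }
                }
            }

            // Multi-statement ACID transaction: both updates commit, or neither does.
            conn.setAutoCommit(false);
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setBigDecimal(1, new BigDecimal("50.00"));
                debit.setLong(2, 1L);
                debit.executeUpdate();
                credit.setBigDecimal(1, new BigDecimal("50.00"));
                credit.setLong(2, 2L);
                credit.executeUpdate();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```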
Distributed databases (HBase, Google's BigTable)
- Positives:
- Based on commodity hardware => inexpensive
- Predictable, linear scalability with virtually no capacity limitation
- Negatives:
- Currently no (full) transaction support
- Other limitations in functionality (indexes, joins, triggers, stored procedures, ...)
- Optimized for specific access patterns (e.g., lookups and scans by row key); poor performance for other kinds of queries
- Currently no support for standardized DDL/DML, in particular no SQL (see the HBase sketch after this list)
- Emerging technology, currently not as mature as classical RDBMSes
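For contrast, here is a minimal sketch of the access model a BigTable-style store exposes, using the HBase Java client API (the table name "users", the column family "info", and the row keys are my own assumptions): everything is a get/put/scan keyed by row key, and joins or secondary indexes have to be handled in application code.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseAccessDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table users = conn.getTable(TableName.valueOf("users"))) {

            // Write: a single-row put keyed by the row key; atomicity is per row only.
            Put put = new Put(Bytes.toBytes("user#42"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            users.put(put);

            // Read: lookup by row key -- fast, because data is sorted and sharded by key.
            Result row = users.get(new Get(Bytes.toBytes("user#42")));
            String name = Bytes.toString(row.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
            System.out.println(name);

            // No SQL, no joins, no secondary indexes here: any other access pattern
            // means a full scan or maintaining your own index tables.
        }
    }
}
```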
So, what's the future? Will distributed databases mature over the next couple of years so that they can be used in much the same way as today's RDBMSes? Are there any other approaches?