This is more of a general brainstorming question: what is the state of the art in designing a (relational) database that scales to very large amounts of data? And given today's technology trends, how do we expect to design such databases in 5-10 years?

By scalability, I mean in particular the ability to increase capacity at linear cost by adding hardware.

These are the two approaches I'm currently aware of:

  1. Commercial RDBMS (Oracle, MS-SQL) + SAN

    • Positives:
      • Mature technology, developed/optimized over several decades
    • Negatives:
      • Expensive, non-commodity hardware
      • Scalability is limited by the maximum SAN capacity
      • The DB server is a single point of failure (mitigation: a fail-over instance)
      • CPU/RAM bottlenecks can occur on the DB server
  2. Distributed databases (HBase, Google's BigTable)

    • Positives:
      • Based on commodity hardware => inexpensive
      • Predictable, linear scalability with virtually no capacity limitation
    • Negatives:
      • Currently no (full) transaction support
      • Other limitations in functionality (secondary indexes, joins, triggers, stored procedures, ...)
      • Optimized for specific access patterns (key lookups, range scans), poor performance for others (see the sketch after this list)
      • Currently no support for standardized DDL/DML, in particular SQL
      • Emerging technologies, currently not as mature as classical RDBMSes

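To make the "optimized for specific access patterns, no SQL" point concrete, here is a minimal sketch using the HBase Java client API (Connection/Table API as in HBase 1.x/2.x; the table name "orders", the column family "d", the column "total", and the customerId#date#orderId row-key scheme are all made up for illustration). Where an RDBMS would answer something like SELECT * FROM orders WHERE customer_id = 42 using an index, HBase essentially only offers get-by-row-key and range scans over the row key, so the row key has to be designed around the queries you want to run.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class OrderLookup {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table orders = conn.getTable(TableName.valueOf("orders"))) {

                // Point lookup by row key, roughly "SELECT total FROM orders WHERE <key> = ...".
                // The row key "42#2010-06-01#1001" (customerId#date#orderId) is a hypothetical scheme.
                Result one = orders.get(new Get(Bytes.toBytes("42#2010-06-01#1001")));
                System.out.println(Bytes.toString(one.getValue(Bytes.toBytes("d"), Bytes.toBytes("total"))));

                // Range scan over the row key: all orders of customer 42.
                // This only works because the customer id is the row-key prefix; the "reverse"
                // query (e.g. all customers who bought a product) would need a second table.
                Scan byCustomer = new Scan()
                        .withStartRow(Bytes.toBytes("42#"))
                        .withStopRow(Bytes.toBytes("42$")); // '$' is the next byte after '#', so this covers the prefix
                try (ResultScanner scanner = orders.getScanner(byCustomer)) {
                    for (Result r : scanner) {
                        System.out.println(Bytes.toString(r.getRow()));
                    }
                }
            }
        }
    }

The point of the sketch is that the row-key design does the work that indexes and the query optimizer do in an RDBMS: every query shape has to be anticipated when the key (or a duplicate table) is designed, which is exactly the functional limitation listed above.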
So, what does the future look like? Will distributed databases mature over the next few years to the point where they can be used in much the same way as today's RDBMSes? Are there other approaches?