
Apparently the reason for the BigTable architecture is the difficulty of scaling relational databases across the massive number of servers that Google operates.

But technically speaking, what exactly makes it difficult for relational databases to scale?

In the enterprise data centers of large corporations, relational databases seem to scale successfully, so I'm wondering why it isn't possible to simply do the same at a greater order of magnitude on Google's servers.

+3  A: 

When you perform a query that involves relationships which are physically distributed, you have to pull the data for each relationship into a central place. That obviously won't scale well for large volumes of data.
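A minimal sketch of the point above, with hypothetical table and function names: when `orders` and `customers` live on different servers, the coordinator must fetch rows from both relations over the network before it can join them (the shards are simulated with in-memory lists here).

```python
# Two relations living on different (simulated) shards.
orders_shard = [
    {"order_id": 1, "customer_id": 10, "total": 25.0},
    {"order_id": 2, "customer_id": 11, "total": 40.0},
]
customers_shard = [
    {"customer_id": 10, "name": "Alice"},
    {"customer_id": 11, "name": "Bob"},
]

def fetch_all(shard):
    # In a real system this is network I/O; here it is just a copy.
    return list(shard)

def distributed_join():
    # The coordinator must gather BOTH relations centrally before joining.
    orders = fetch_all(orders_shard)
    customers = {c["customer_id"]: c for c in fetch_all(customers_shard)}
    return [
        {"order_id": o["order_id"], "name": customers[o["customer_id"]]["name"]}
        for o in orders
    ]

print(distributed_join())
```

The cost of `fetch_all` is what dominates as data volume grows: the join itself is cheap once everything is in one place, but getting it there is not.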

A well set-up RDBMS server will perform the majority of its queries on hot pages in RAM, with little physical disk or network I/O.

If you are constrained by network I/O, then the benefits of relational data become lessened.

Mitch Wheat
@Mitch THANKS! Much clearer. Original comment deleted.
David Lively
A: 

The main reasons, as stated, are physical location and network I/O. Additionally, even large corporations deal with only a fraction of the data that search engines handle.

Think about the indexes on a standard database: maybe a few fields. Search engines need fast full-text search on large text fields.

Nate Bross
+2  A: 

In addition to Mitch's answer, there's another facet: webapps are generally poorly suited to relational databases. Relational databases put emphasis on normalization - essentially, making writes easier, but reads harder (in terms of work done, not necessarily for you). This works very well for OLAP, ad-hoc query type situations, but not so well for webapps, which are generally massively weighted in favor of reads over writes.

The strategy taken by non-relational databases such as Bigtable is the reverse: denormalize, to make reads much easier, at the cost of making writes more expensive.
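The trade-off described above can be sketched with a hypothetical example (the store, field names, and functions are illustrative, not Bigtable's actual schema): embedding the author's name in every post makes a read a single key lookup with no join, while a write that changes the name must touch every denormalized copy.

```python
# A dict standing in for a denormalized key-value store: each post
# carries a copy of its author's name.
posts = {
    "post-1": {"title": "Hello", "author_id": "u1", "author_name": "Alice"},
    "post-2": {"title": "World", "author_id": "u1", "author_name": "Alice"},
}

def read_post(post_id):
    # One lookup, no join needed: cheap read.
    return posts[post_id]

def rename_author(author_id, new_name):
    # Write amplification: every denormalized copy must be rewritten.
    for post in posts.values():
        if post["author_id"] == author_id:
            post["author_name"] = new_name
```

In a normalized schema the rename would be a single-row update and the read would require a join; denormalization reverses which operation does the work.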

Nick Johnson
I agree that most web apps involve more reading than user-inputting or app-updating of data. But I don't understand what you mean when you say that writes are "easier (in terms of work done)" in a normalized RDBMS? I would think the App Engine datastore is easier in terms of work done since a unique key identifies every entity and an update is equivalent to an insert because of the dictionary-like character of the datastore. Putting and fetching from a dictionary is about as easy as it gets as far as work done, I would think.
pacman
@pacman: You're forgetting all the work that's actually done. The index is the big king of the datastore. When you add an entity to the datastore, it does a huge amount of work replicating data so that if you want to get a property you can do so quickly. It basically writes indexes for each property, on each entity, twice (asc and desc), for all data that you store (perhaps not the new big Blobs, not sure). This is what takes so long for writes, but it also allows for fast reads at a mind-boggling scale. I'd suggest getting a good App Engine book, as this is important when designing for GAE.
Lee Olayvar
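The index-writing behavior described in the comment above can be sketched roughly as follows. This is a simplified model, not the real datastore API: a single entity write fans out into one ascending and one descending index entry per property (real descending indexes store an inverted encoding of the value; a direction tag stands in for that here).

```python
# A flat list standing in for the datastore's sorted index tables.
index = []  # (kind, property, value, direction, key) tuples

def put_entity(kind, key, props):
    entries = []
    for name, value in props.items():
        entries.append((kind, name, value, "asc", key))
        entries.append((kind, name, value, "desc", key))
    index.extend(entries)          # several index writes...
    return {"key": key, **props}   # ...for one logical entity write

entity = put_entity("Person", "p1", {"age": 30, "city": "Paris"})
# 2 properties fan out into 4 index rows for a single put.
```

This fan-out is why writes are slow and reads are fast: every property query can be answered by a range scan over an already-sorted index, because the work was paid up front at write time.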