views: 169

answers: 4

As a programmer I make revolutionary findings every few years. I'm either ahead of the curve, or behind it by about π in phase. One hard lesson I learned was that scaling OUT is not always better; quite often the biggest performance gains came when we regrouped and scaled up.

What reasons do you have for scaling out vs. up? Price, performance, vision, projected usage? If so, how did this work for you?

We once scaled out to several hundred nodes that would serialize and cache necessary data out to each node and run maths processes on the records. Many, many billions of records needed to be (cross-)analyzed. It was the perfect business and technical case to employ scale-out. We kept optimizing until we processed about 24 hours of data in 26 hours wallclock. Really long story short, we leased a gigantic (for the time) IBM pSeries, put Oracle Enterprise on it, indexed our data and ended up processing the same 24 hours of data in about 6 hours. Revolution for me.

So many enterprise systems are OLTP and the data are not sharded, but the desire by many is to cluster or scale out. Is this a reaction to new techniques or to perceived performance?

Do applications in general today, or our programming mantras, lend themselves better to scale-out? Should we always take this trend into account going forward?

+5  A: 

Because scaling up

  • is ultimately limited by the size of box you can actually buy
  • can become extremely cost-ineffective, e.g. a machine with 128 cores and 128G of RAM is vastly more expensive than 16 machines with 8 cores and 8G of RAM each (rough numbers are sketched below)
  • doesn't help everything: some things, such as IO read operations, don't scale up well
  • doesn't buy you high availability, whereas scaling out, if your architecture is right, can: a 128-core, 128G RAM machine is very expensive, but having a second redundant one is extortionate

And also to some extent, because that's what Google do.
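
A minimal back-of-envelope sketch of the cost argument, in Python. The prices below are placeholders, not quotes; real numbers vary a lot by vendor and configuration, so the comparison can swing either way:

    # Hypothetical prices purely for illustration -- substitute real quotes.
    def cost_per_core(price_usd, cores):
        return price_usd / cores

    big_box   = {"price_usd": 150_000, "cores": 128}  # one scale-up machine (assumed price)
    small_box = {"price_usd":   5_000, "cores":   8}  # one commodity machine (assumed price)

    print(cost_per_core(**big_box))    # $/core for the big box
    print(cost_per_core(**small_box))  # $/core for a commodity box

    # The high-availability angle: doubling the big box vs. adding one spare node.
    print(big_box["price_usd"] * 2)                  # big box + redundant twin
    print(small_box["price_usd"] * (128 // 8 + 1))   # 16 small boxes + one spare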

MarkR
I agree, but the sad thing is that all too often people apply brute force (read: more hardware) where a better design would do miracles. Building an app to be stateless, so that you do not have to do sticky or distributed sessions, can dramatically decrease hardware requirements.
mfeingold
Scaling up is the easy solution - for a while; developer time is expensive and your developers probably have better things to do - so to a point, it's compelling to just buy bigger boxes; eventually it becomes uneconomic.
MarkR
Cost effective? 6x Dell 4c24g = $36,168 ; 1x Dell 24c128g = $20,571
Xepoch
Scaling up on a single machine still constitutes a SPOF (Single Point Of Failure) albeit with a bigger impact when a failure occurs.
jldupont
Yes - having several machines can mean that you need less redundant tin (hopefully - depending on your architecture) - which is also good.
MarkR
I separate scaling and high availability. I can put an HA landscape in place that doesn't scale, and vice versa.
Xepoch
+3  A: 

Not surprisingly, it all depends on your problem. If you can easily partition it into subproblems that don't communicate much, scaling out gives trivial speedups. For instance, searching for a word in 1B web pages can be done by one machine searching 1B pages, or by 1M machines doing 1000 pages each, without a significant loss in efficiency (so with a 1,000,000x speedup). This is called "embarrassingly parallel".
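
To make the embarrassingly parallel case concrete, here is a minimal sketch (assuming the "pages" are just strings held locally; a real system would shard the corpus across machines rather than across local processes):

    from multiprocessing import Pool

    def count_matches(pages_chunk, word="example"):
        # Each worker scans only its own chunk; the workers never talk to
        # each other -- the only "communication" is the tiny final sum.
        return sum(word in page for page in pages_chunk)

    if __name__ == "__main__":
        pages = ["an example page", "another page", "more example text"] * 1000
        n_workers = 4
        chunks = [pages[i::n_workers] for i in range(n_workers)]
        with Pool(n_workers) as pool:
            partial_counts = pool.map(count_matches, chunks)
        print(sum(partial_counts))  # combine the per-chunk results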

Other algorithms, however, do require much more intensive communication between the subparts. Your example requiring cross-analysis is the perfect example of where communication can often drown out the performance gains of adding more boxes. In these cases, you'll want to keep communication inside a (bigger) box, going over high-speed interconnects, rather than something as 'common' as (10-)Gig-E.

Of course, this is a fairly theoretical point of view. Other factors, such as I/O, reliability, and ease of programming (one big shared-memory machine usually gives a lot fewer headaches than a cluster) can also have a big influence.

Finally, due to the (often extreme) cost benefits of scaling out using cheap commodity hardware, the cluster/grid approach has recently attracted much more (algorithmic) research. As a result, new ways of parallelizing have been developed that minimize communication and thus do much better on a cluster -- whereas common knowledge used to dictate that these types of algorithms could only run effectively on big iron machines...

Wim
Yes, in my example communication and latency end up being the problem. Interestingly it was *not* because of cross-talk, but rather the simple flat data representation that was shipped down with the processing so as to avoid DB hits.
Xepoch
+4  A: 

Scaling out is best for embarrassingly parallel problems. It takes some work, but a number of web services fit that category (thus the current popularity). Otherwise you run into Amdahl's law, which means that to gain speed you have to scale up, not out. I suspect you ran into that problem. IO-bound operations also tend to do well with scaling out, largely because waiting for IO increases the fraction that is parallelizable.
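
For a sense of how hard Amdahl's law bites, here is a small worked example (the parallel fractions below are made up purely to show the shape of the curve):

    # speedup = 1 / ((1 - p) + p / n), where p is the parallelizable
    # fraction of the work and n is the number of nodes.
    def amdahl_speedup(p, n):
        return 1.0 / ((1.0 - p) + p / n)

    for p in (0.50, 0.90, 0.99):
        print(f"p={p:.2f}: " + ", ".join(
            f"{n} nodes -> {amdahl_speedup(p, n):.1f}x" for n in (2, 16, 128)))

    # Even at 99% parallelizable, 128 nodes yield only ~56x; at 90% the
    # speedup can never exceed 10x no matter how far you scale out.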

Kathy Van Stone
+1 on Amdahl's law.
bajafresh4life
Amdahl's law (i.e. what fraction of your application is actually parallelizable, versus what needs to be done sequentially) is indeed an important component. But it's often too theoretical a view; in a lot of cases it's the communication cost that kills you long before you run out of things to do in parallel...
Wim
+4  A: 

The blog post Scaling Up vs. Scaling Out: Hidden Costs by Jeff Atwood has some interesting points to consider, such as software licensing and power costs.

Daniel Ballinger