views:

322

answers:

8

The guys who wrote Bespin (a cloud-based, canvas-based code editor [and more]) recently spoke about how they refactored and optimized a portion of the Bespin code because of a misconception that JavaScript was slow. It turned out that when all was said and done, their optimization produced no significant improvement.

I'm sure many of us go out of our way to write "optimized" code based on misconceptions similar to that of the Bespin team.

What are some common performance bottleneck misconceptions developers commonly subscribe to?

+12  A: 

And this is what happens when one optimizes without a valid profile in hand. All you are doing without a profile is guessing and probably wasting time and money. I could list a bunch of other misconceptions, but many come down to the fact that if the code in question isn't a top resource consumer, it is probably fine as is. Kinda like unrolling a for loop that is doing disk I/O...
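
To ground that point, here is a minimal sketch (in Python, with made-up function names) of getting an actual profile before guessing. The "disk I/O" is simulated with busywork; the idea is that the profiler's output, not your hunch, tells you where the time goes:

```python
# Hedged sketch: profile first. slow_io() and fancy_math() are invented
# stand-ins; slow_io() plays the role of the loop doing "disk I/O".
import cProfile, pstats, io

def slow_io():
    total = 0
    for _ in range(200):
        total += sum(range(10_000))   # pretend each iteration is an I/O call
    return total

def fancy_math():
    return sum(i * i for i in range(10_000))

def report():
    slow_io()
    fancy_math()

pr = cProfile.Profile()
pr.enable()
report()
pr.disable()
stats = pstats.Stats(pr, stream=io.StringIO()).sort_stats("cumulative")
stats.print_stats(5)   # the top entries, not your hunch, say what to optimize
```

Unrolling the loop in `slow_io()` would not touch the dominant cost, which is exactly the joke above.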

Michael Dorgan
Definitely the biggest misconception - you think you know why it's slow. Profile, profile, profile!
jpeacock
The keyword here is _valid_. Make sure that the performance test you use to identify bottlenecks resembles the production load.
binil
+3  A: 
  • Relational databases are slow.
  • I'm smarter than the optimizer.
  • This should be optimized.
  • Java is slow.

And, unrelated:

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - jwz

alex
Hell, this answer wasn't here when I wrote mine! LOL. :DDD
slacker
And the funniest thing is that both of our answers got equally upvoted :D.
slacker
+4  A: 

If I convert the whole code base over to [Insert xxx latest technology here], it'll be much faster.

DancesWithBamboo
That misconception is much more commonly held by the PHBs than by the programmers themselves. Guess who makes the decision...
slacker
+2  A: 
  • Optimizing the WRONG part of the code (people, use your profiler!).
  • The optimizer in my compiler is smart, so I don't have to help it.
  • Java is fast (LOL)
  • Relational databases are fast (ROTFL LOL LMAO)
slacker
can you expand ROTFLLOLLMAO for us?
David Murdoch
@David Murdoch: http://en.wikipedia.org/wiki/LOL
slacker
sigh. i know ROTFLMAO. what is the "LLOL" part in the middle? Or is it "LOLL"? :-)
David Murdoch
I split it to make the intent clearer.
slacker
R.olling O.n T.he F.loor L.aughing L.aughing O.ut L.oud L.aughing M.y A.ss O.ff. I think ROTFLMAO would have made more sense. :-)
David Murdoch
+1  A: 

My rules of optimization.

  • Don't optimize.
  • Don't optimize now.
  • Profile to identify the problem.
  • Optimize the component that is taking at least 80% of the time.
  • Find an optimization that is 10 times faster.
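
The "80% of the time" and "10 times faster" rules above can be checked with Amdahl's-law arithmetic. A hedged sketch (the fractions are the illustrative ones from the rules):

```python
# Amdahl's law: overall speedup when `fraction` of runtime gets `local_speedup`.
def overall_speedup(fraction, local_speedup):
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# A 10x speedup on a component taking 80% of the time:
print(round(overall_speedup(0.80, 10), 2))   # 3.57x overall
# The same 10x speedup on a 5% component barely registers:
print(round(overall_speedup(0.05, 10), 2))   # 1.05x overall
```

This is why optimizing anything other than the dominant component is mostly wasted effort.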

My best optimization reduced a report from three days to nine minutes; the specific code I optimized went from three days down to three minutes.

In my career I have met three people who had been tasked with producing a faster sort on the VAX than the native sort. They invariably produced sorts that took only three times longer.

BillThor
+9  A: 

In no particular order:

"Ready, Fire, Aim" - thinking you know what needs to be optimized without proving it (i.e. guessing) and then acting on that, and since it doesn't help much, therefore assuming the code must have been optimal to begin with.

"Penny Wise, Pound Foolish" - thinking that optimization is all about compiler optimization, fussing about ++i vs. i++ while mountains of time are being spent needlessly in overblown designs, especially of data structures and databases.

"Swat Flies With a Bazooka" - being so enamored of the fanciest ideas heard in classrooms that they are just used for everything, regardless of scale.

"Fuzzy Thinking about Performance" - throwing around terms like "hotspot" and "bottleneck" and "profiler" and "measure" as if these things were well understood and / or relevant. (I bet I get clobbered for that!) OK, one at a time:

  • hotspot - What's the definition? I have one: it is a region of physical addresses where the PC register is found a significant fraction of time. It is the kind of thing PC samplers are good at finding. Many performance problems exhibit hotspots, but only in the simplest programs is the problem in the same place as the hotspot is.

  • bottleneck - A catch-all term used for performance problems, it implies a limited channel constraining how fast work can be accomplished. The unstated assumption is that the work is necessary. In my decades of performance tuning, I have in fact found a few problems like that - very few. Nearly all are of a very different nature. Rather than taking the shortest route from point A to B, little detours are taken, in the form of function calls which take little code, but not little time. Then those detours take further nested detours, sometimes 30 levels deep. The more the detours are nested, the more likely it is that some of them are less than necessary - wasteful, in fact - and it nearly always arises from galloping generality - unquestioning over-indulgence in "abstraction".

  • profiler - a universal good thing, right? All you gotta do is get a profiler and do some profiling, right? Ever think about how easy it is to fool a profiler into telling you a lot of nothing, when your goal is to find out what you need to fix to get better performance? Somewhere in your call tree, tuck a little file I/O, or a little call to some system routine, or have your evil twin do it without your knowledge. At some point that will be your problem, and most profilers will miss it completely, because the only inefficiency they contemplate is algorithmic inefficiency. Or, not all your routines will be little, and they may not call another routine in a small number of places, so your call graph says there's a link between the two routines, but which call? Or suppose you can figure out that some big percent of time is spent in a path where A calls B calls C. You can look at that and think there's not much you can do about it, when if you could also look at the data being passed in those calls, you could see whether it's necessary. Here's a fun project - pick your favorite profiler, and then see how many ways you can fool it. That is, find ways to make the program take longer without the profiler being able to tell what you did, because if you can do it intentionally, you can also do it without intending to.

  • measure - (i.e. measure time) that's what profilers have done for decades, and they take pride in the accuracy and precision with which they measure. But measure what time? and why with accuracy? Remember, the goal is to precisely locate performance problems, such that you could fruitfully optimize them to gain a speedup. When you get that speedup, it is what it is, regardless of how precisely you estimated it beforehand. If that precision of measurement is bought at the expense of precision of location, then you've just bought apples when what you needed was oranges.
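
One cheap way to see the "measure what time?" problem is to compare wall-clock time with CPU time. In this hedged sketch the hidden blocking I/O is simulated with `sleep()`: the routine burns essentially no CPU, so any tool measuring only CPU time reports nothing, while the real elapsed cost is plainly there:

```python
# Hedged sketch: wall-clock vs CPU time. sleep() stands in for blocking I/O
# buried somewhere down the call tree.
import time

def looks_cheap():
    # A nested "detour": no CPU work, but real elapsed time.
    time.sleep(0.2)

wall_start, cpu_start = time.perf_counter(), time.process_time()
looks_cheap()
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# A CPU-time-only measurement reports ~0 here and misses the real cost.
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```
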

Here's a list of myths about performance.

Mike Dunlavey
Hmmm, profilers have been very useful to me, and I've made improvements thanks to them both measurable and perceivable... I don't really get your point.
alex
@alex: Yes, they find *some* problems. Do you know you didn't have more opportunities for speedup? Absence of evidence is not evidence of absence. Here's an example of what I mean, where over several iterations, a speedup of over 40 times was demonstrated: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773
Mike Dunlavey
+1, though I disagree with (or misunderstand) the "bottleneck" part. My view: bottleneck describes the hardware component that's limiting - e.g. CPU, cache, disk. When running a computationally heavy program, rarely all components are at their limits - usually one will be the bottleneck. You can reduce the overall load, you can remove "pressure" from one component so that it doesn't bottleneck so early, and you can move "pressure" between components.
peterchen
@peterchen: You're right. My beef is that the term is generally applied to anything that makes software take longer than it should, and as a metaphor, it is misleading. It implies that the racehorse is limited by how fast its legs move (say). Well sure, but the overwhelmingly typical problem in non-toy software is that rather than running straight, the horse is following a fractal coastline of nested detours, many of them not truly necessary. I wish there were a better metaphor to capture this idea, because people aren't really aware of it.
Mike Dunlavey
@peterchen: Maybe another way to say it is that our call trees can get way overgrown and need pruning. What's a word for this? "Kudzu Code" maybe?
Mike Dunlavey
Thanks for your feedback - I agree. Compilers and especially hardware have made *amazing* progress in dealing with Kudzu (had to look that up!). However, growth always seems to run at a slightly higher power than the gains. Still: more and more to prune.
peterchen
A: 

The rules are simple:

  1. Try to use standard library functions first.
  2. Try to use brute-force and ignorance second.
  3. Prove you've got a problem before trying to do any optimization.
Donal Fellows
+3  A: 

"This has to be as fast as possible."

If you don't have a performance problem, you don't need to worry about optimizing performance (beyond just paying attention to using good algorithms).

This misconception also manifests in attempts to optimize the performance of every single aspect of your program. I see this most often with people trying to shave every last millisecond off a low-volume web application's execution time, while failing to take into account that network latency will take orders of magnitude longer than their code's execution time, making any reduction in execution time irrelevant anyhow.
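
The arithmetic behind that point is worth spelling out. A hedged sketch with illustrative numbers (the latency and server times are assumed, not measured):

```python
# Why shaving server milliseconds can be irrelevant when latency dominates.
# All numbers below are illustrative assumptions.
network_latency_ms = 100.0          # assumed round-trip latency
server_time_ms = 5.0                # current server-side execution time
optimized_server_time_ms = 1.0      # after heroic micro-optimization (5x!)

before = network_latency_ms + server_time_ms
after = network_latency_ms + optimized_server_time_ms
improvement = 1 - after / before
print(f"{improvement:.1%} faster end to end")   # 3.8% for a 5x code speedup
```

A 5x speedup of the code shows up as a change the user cannot perceive.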

Dave Sherohman