views: 1391
answers: 18

Anyone who reads a lot of SO questions knows that the need to preach against premature optimization is not going away anytime soon - but what about the other extreme, the projects that fail or struggle because they did not consider performance early enough?

Should you just try and get a working system as quickly as possible so you can do performance tests, or are there some optimizations so essential that they should always be considered right from the beginning?

A: 

Loop optimizations should always be made early. That is when the algorithm is most clear in your mind.

Steve
What exactly do you mean by "loop optimization"?
Michael Borgwardt
When someone first writes a for loop that does something complicated, the kitchen sink usually goes in: things that don't need to be recalculated every iteration get recalculated anyway. Often only a portion of the work inside the loop actually needs to be redone each time.
Steve
Sometimes a value can be calculated once outside the loop instead, etc. In a tight loop with a lot of math, this can get out of hand quickly.
Steve
Sorry, but to me that definitely sounds like premature micro-optimization.
Michael Borgwardt
Sorry, I didn't explain myself well. In my work I do a lot of math, and the heavy calculations tend to occur in a couple of well-defined places. In general we ignore the GUI and most of the I/O and other non-math code, make sure the math is right, and make sure it's fast. We know in advance that the bottleneck will be trig/exp/ln functions.
Steve
Ah, that makes a lot more sense - interesting though, that it runs counter to what makes sense in most "normal" applications.
Michael Borgwardt
I don't know what math you do, but scientific computing is much more commonly limited by memory bandwidth. Loop optimizations can always be done later since they are purely local transformations, but designing data structures for cache locality is worth doing early since it can have a significant impact on the architecture.
Jed
I deal with extremely small amounts of data (most of which can fit in the cache), but then it's all complex-number mathematics.
Steve
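
To make the kind of loop optimization Steve describes concrete, here is a minimal sketch (the function and the frequency parameter are made up) of hoisting a calculation that does not change between iterations out of a tight math loop:

```python
import math

def scale_samples_naive(samples, freq):
    # The scale factor is recomputed on every iteration even though
    # it only depends on freq, which never changes inside the loop.
    return [s * math.exp(-freq) * math.cos(freq) for s in samples]

def scale_samples_hoisted(samples, freq):
    # Hoist the loop-invariant part out: compute it once, reuse it.
    scale = math.exp(-freq) * math.cos(freq)
    return [s * scale for s in samples]
```
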
+3  A: 

It may be best to view this in terms of things you should avoid: universal performance killers that are hard to change once implemented, because they become part of the application design.

Off the top of my head I can think of:

  • Avoid network round trips (see the sketch after this list)
  • Avoid nonsequential HD access
  • Use efficient algorithms and data structures whenever dealing with large volumes of data
  • Design your database carefully by normalizing enough to allow sensible queries but not so much that anything requires a dozen joins.
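
The first point, avoiding network round trips, can be sketched roughly as follows; the HTTP client calls are standard, but the service and its bulk endpoint are purely hypothetical:

```python
import requests  # assumed available; any HTTP client works

BASE_URL = "https://api.example.com"  # hypothetical service

def fetch_users_chatty(user_ids):
    # One round trip per user: latency adds up linearly.
    return [requests.get(f"{BASE_URL}/users/{uid}").json() for uid in user_ids]

def fetch_users_batched(user_ids):
    # One round trip for the whole batch, assuming the service
    # exposes a bulk endpoint that accepts a list of ids.
    response = requests.post(f"{BASE_URL}/users/batch", json={"ids": list(user_ids)})
    return response.json()
```
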
Michael Borgwardt
These can be ordered in importance. Algorithms/data structures come first. Database design *is* the same thing, so it's also first. Network round trips and disk optimizations are a distant second; often they can't be changed anyway because of the OS or the protocol.
S.Lott
A: 

We build the code, and if performance is an issue, we fix it using the appropriate tools.

The big problem with optimized code is that it is often more complex. And complex code is hard to maintain, so you should only optimize what needs to be optimized.

Gamecat
+3  A: 

Any optimization that you know with 100% certainty ahead of time will make a huge difference and won't complicate your code hugely. This actually includes more than you might think, but there still aren't a lot of optimizations that fall into this category.

Update: One area that may need some planning is interaction with the database. In particular, you need to plan to query as much as you can as soon as you can if you're going to be doing a lot of queries. Making several small queries when you could instead be using one big query can be a huge performance drag, and it can force a major architectural redesign if you don't keep this in mind from the start. As always, your true mileage may vary.
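
A minimal sketch of that difference, using sqlite3 purely for illustration (the table and column names are made up): one query per item versus a single query for the whole set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (id, name) VALUES (?, ?)",
                 [(i, f"product-{i}") for i in range(100)])

wanted = list(range(10))

# Many small queries: one round trip to the database per id.
rows_slow = [conn.execute("SELECT id, name FROM products WHERE id = ?", (i,)).fetchone()
             for i in wanted]

# One bigger query: fetch everything that is needed in a single call.
placeholders = ",".join("?" for _ in wanted)
rows_fast = conn.execute(
    f"SELECT id, name FROM products WHERE id IN ({placeholders})", wanted
).fetchall()
```
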

Jason Baker
+43  A: 

Algorithm and data structure must align with the problem. This isn't "optimization". This is "design".

Getting the right algorithm/data structure is central, essential and first.

Generally, you want to avoid or minimize search. This isn't optimization; this is simple design of a data structure that avoids search. Hash tables, trees, indexes, and what-not are design issues, not optimizations.

You want to avoid or minimize sort. This isn't optimization, either. This is simple design of the right algorithm to prepare the data in a way that doesn't require sorting. Trees, linked lists, priority queues, heaps, and what-not are design issues.

After you have prevented searching and sorting, there isn't much left that really eats up mountains of time. If you still have performance problems after preventing all searching and sorting, feel free to optimize what's left.
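
A small sketch of what "designing to avoid search" can look like in practice (the names are illustrative): choosing a dictionary keyed by the lookup field instead of repeatedly scanning a list.

```python
# Repeatedly scanning a list is O(n) per lookup.
def find_order_by_scanning(orders, order_id):
    for order in orders:
        if order["id"] == order_id:
            return order
    return None

# Designing the data structure around the access pattern avoids the
# search entirely: build a dict once, look up in O(1) afterwards.
def index_orders(orders):
    return {order["id"]: order for order in orders}

orders = [{"id": i, "total": i * 10.0} for i in range(1000)]
orders_by_id = index_orders(orders)
order = orders_by_id.get(42)
```
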

S.Lott
I generally agree but, IMHO, it depends on which data structures and algorithms. Those that are local within some *short* function can easily be changed later; those that affect how parts of the system interact must be considered beforehand.
Asaf R
So, basically you're saying: optimization you can do up front is not optimization, it's a good design. I think I can live with that. A good design should be optimized for the problem at hand ;-)
Erik van Brakel
@Asaf R: "Changed" doesn't really enter into it. Preventing search is simply fundamental, essential, core -- it's what good design is about.
S.Lott
@Erik van Brakel: Actually, I'm making a stronger statement. Design (the stuff you do up front) is NOT "optimization". Optimization is something you do later to tweak up an appropriate design for the last bits of speed.
S.Lott
Nice answer. But there are some other things that eat up mountains of time, for instance unnecessary network traffic or nonsequential disk access, as mentioned in other answers.
MarkJ
File seek times are often minor compared to a really rotten O(n**2) algorithm. Seeks ("non-sequential access") are usually a symptom of a rotten algorithm, not the cause.
S.Lott
I would add "design data structures for memory locality" to the list above. A lot of applications are memory bandwidth limited and keeping values that are used together close in memory can have a huge impact on cache utilization. This frequently makes vector instructions more attractive and reduces conflict misses. If changing the layout in memory necessarily affects other components, it is *design* and the decision needs to be made early. If it is (or can be made to be) easy to change later, then it is an *optimization* and can wait.
Jed
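
A rough sketch of the layout choice Jed describes, keeping values that are used together contiguous in memory (NumPy is used here only to get contiguous storage; the fields and sizes are invented):

```python
import numpy as np

# Array-of-structs style: each particle is a separate object, so the x
# values used together in a hot loop are scattered across memory.
particles = [{"x": float(i), "y": 0.0, "mass": 1.0} for i in range(100_000)]
total_aos = sum(p["x"] for p in particles)

# Struct-of-arrays style: each field is stored contiguously, so a pass
# over all x values touches memory sequentially and uses the cache well.
xs = np.arange(100_000, dtype=np.float64)
ys = np.zeros(100_000)
masses = np.ones(100_000)
total_soa = xs.sum()
```
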
I would add that following best practices doesn't mean you are doing premature optimization, as I wrote here: http://satukubik.com/2009/08/10/premature-optimization-vs-best-practice/
nanda
+1  A: 

I would say if you are storing/searching any type of data, do the following:

  1. Determine the largest data set that your target customer/audience will ever use.
  2. Build a test data set that is twice as large as the theoretical maximum
  3. Make sure your application performs well on the test data set
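
A sketch of what that might look like in practice, with made-up numbers: generate a test set twice the expected maximum and time the operation you care about against it.

```python
import random
import time

EXPECTED_MAX_RECORDS = 50_000          # hypothetical: largest real-world data set
test_records = [random.random() for _ in range(EXPECTED_MAX_RECORDS * 2)]

start = time.perf_counter()
result = sorted(test_records)          # stand-in for the operation under test
elapsed = time.perf_counter() - start

print(f"{len(test_records)} records processed in {elapsed:.3f}s")
```
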
mjmarsh
+11  A: 

Optimize immediately for readability and maintainability. Measure performance. Don't optimize for performance unless you have measurements and can tell in hard numbers what the effect of your optimizations is.

Edit: To add clarification:

  • There's nothing wrong with early performance optimization if it comes with early measurements. If you immediately see the need for performance optimization, it's part of the spec and testable (e.g. "must present result in 100ms for some given input")
  • If you have the choice of two equally readable/maintainable implementations: choose the one that you like best, even on the basis of expected, as yet unjustified, performance gains.
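
A minimal sketch of the kind of early measurement described here, checking a hypothetical operation against the 100 ms budget from the example spec above:

```python
import time

BUDGET_SECONDS = 0.100  # from the spec: "must present result in 100ms"

def operation_under_test(data):
    return sorted(data)  # placeholder for the real work

data = list(range(200_000, 0, -1))
start = time.perf_counter()
operation_under_test(data)
elapsed = time.perf_counter() - start

assert elapsed <= BUDGET_SECONDS, f"took {elapsed * 1000:.1f} ms, budget is 100 ms"
```
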
Olaf
+2  A: 

Anything that affects high-level design. In practice, a big one tends to be reducing memory allocations, because without the ability to freely heap allocate, a lot of programming techniques are out the window at a design level. Also, if a data structure that's passed around a lot is hard to encapsulate in a way that hides the implementation, you might want to make sure it's efficient early on.
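
A small sketch of the allocation-reduction idea, reusing one preallocated buffer instead of building a new list on every call (the names are illustrative):

```python
class Smoother:
    def __init__(self, size):
        # Allocate the working buffer once, up front.
        self._buffer = [0.0] * size

    def smooth_into_buffer(self, samples):
        # Reuse the same buffer on every call instead of allocating a
        # new output list each time this runs in a hot path
        # (assumes len(samples) fits in the buffer).
        buf = self._buffer
        for i, value in enumerate(samples):
            buf[i] = 0.5 * value
        return buf

def smooth_allocating(samples):
    # Allocates a fresh list on every call.
    return [0.5 * value for value in samples]
```
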

dsimcha
+8  A: 

Read the "C++ Optimizations You Can Do 'As You Go'" chapter on this useful reminder website.

And don't forget that you can have a good early optimization by choosing a good enough algorithm from the start. Make sure your algorithms are well encapsulated, to make a potential refactoring or algorithm change later easy.
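
A sketch of the kind of encapsulation recommended here: hide the algorithm behind a small interface so that swapping it later does not ripple through the callers (the names are invented):

```python
from typing import Iterable, Protocol

class NearestNeighborSearch(Protocol):
    def nearest(self, point: float) -> float: ...

class LinearSearch:
    # Good enough to start with; easy to replace later.
    def __init__(self, points: Iterable[float]):
        self._points = list(points)

    def nearest(self, point: float) -> float:
        return min(self._points, key=lambda p: abs(p - point))

def closest_station(search: NearestNeighborSearch, position: float) -> float:
    # Callers only depend on the interface, not on the concrete algorithm,
    # so a k-d tree or grid implementation can be dropped in later.
    return search.nearest(position)
```
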

Klaim
Great article! Writing fast code is often more about good habits than about active optimizations; at the very least it makes the optimization phase much shorter.
Laserallan
+6  A: 

the "getting-it-to-work" optimization

+2  A: 

Don't write a database access layer that causes thousands of database calls for every click in the UI.

Greg
don't couple your UI to your db queries in the first place... cache the data.
Anonymous Type
In many cases I could read your answer as "don't use ORMs", with which I would mostly agree...
kriss
One real-world example I ran into: a business model object that eager-loaded a state object, which eager-loaded a counties collection, which eager-loaded a cities collection... basically the entire geography of the United States was loaded from the database with every call, then serialized across a web service. At least use lazy loading.
Greg
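
A tiny sketch of the lazy loading mentioned in that comment: the related collection is only queried when it is actually accessed (sqlite3 and the schema are just for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE counties (id INTEGER PRIMARY KEY, state_id INTEGER, name TEXT)")
conn.execute("INSERT INTO states VALUES (1, 'Ohio')")
conn.execute("INSERT INTO counties VALUES (1, 1, 'Franklin')")

class State:
    def __init__(self, conn, state_id, name):
        self._conn = conn
        self.id = state_id
        self.name = name
        self._counties = None  # not loaded yet

    @property
    def counties(self):
        # The counties query only runs if someone actually asks for them.
        if self._counties is None:
            rows = self._conn.execute(
                "SELECT name FROM counties WHERE state_id = ?", (self.id,)
            ).fetchall()
            self._counties = [name for (name,) in rows]
        return self._counties

row = conn.execute("SELECT id, name FROM states WHERE id = 1").fetchone()
state = State(conn, *row)   # no counties loaded here
print(state.counties)       # the extra query happens only now
```
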
+1  A: 

There are several trends that occur:

  • the more complex software is, the more complex it is to maintain and to restructure w/o introducing bugs
  • the more complete software is, the more constraints there are on its design (since you've made design choices that have consequences)
  • the more software is used by real customers, the more constraints there are on its design
  • the more constraints there are on software design, the harder it is to optimize
  • the more complete software is & the more use it gets by real customers, the more information you get on whether its performance is sufficient
  • the more software is used by real customers, the more money you/your company has (hopefully) to optimize performance later
  • the more time you take writing commercial software, the less likely it is to be profitable (because someone else has got there first, and/or because your labor costs have gone up)

If you look at these trends individually they tell you different things about whether/when you should optimize. So I don't think there's a general answer.

But in many cases it makes sense to rapidly prototype selected functions of the software and forget about optimization unless it's an obvious necessity for the thing to function at all. That gives you the ability to try it out, measure its performance, and study it, and then throw the prototype away so you can do a better job architecting your software for real.

Jason S
A: 

First rule: all data/tables in the database model should be (at least) in Boyce-Codd normal form. The later you start optimizing your database, the closer you'll be to hell.

Philippe Grondier
A: 

Immutability in objects.

I call it the best optimisation there is.
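
A minimal sketch of what "immutability in objects" can look like; a frozen dataclass is just one way to express it:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Price:
    amount: float
    currency: str

    def converted(self, rate: float, currency: str) -> "Price":
        # Instead of mutating in place, return a new value object.
        return Price(self.amount * rate, currency)

p = Price(10.0, "EUR")
q = p.converted(1.1, "USD")
# p.amount = 12.0  # would raise dataclasses.FrozenInstanceError
```
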

Fortyrunner
+3  A: 

There is probably no single optimization trick that should always be done early. As is always pointed out, specific tricks should wait until you have analyzed your performance.

You should, however, as S.Lott pointed out, consider performance in your design from the first minute.

That does not mean applying any specific optimization, but planning for performance. Specifically, identify:

  • What are the performance requirements? (time per transaction, throughput etc.)
  • What is the expected problem / data size?
  • What hardware will we run on?

Then you should calculate a rough performance estimate. If that looks good, you're fine. If it looks like you will miss your performance goals, you should not continue to the next stage until you have a design that has at least a fighting chance of meeting the goals. Otherwise, you're setting yourself up for failure.

The idea is not to get a design that is guaranteed to perform well, but to avoid a design that cannot perform properly (as in "we'll just do everything in RAM", but the database has 2 TB). Any performance problems that you were unable to foresee that way can then be dealt with when they occur.

sleske
+9  A: 

I personally agree with sleske's answer; a lot of software suffers from bad performance because nobody on the team sets performance goals (e.g., when the project starts, or as new features are conceived). For example, how fast do you want a particular action to take (from, say, the user's perspective)? 5 seconds? 1 second? Less? If you can't answer that question, you probably aren't going to end up with good performance in the end. For one thing, it's hard to set up systematic performance tests without any goals for them to measure against. At best, you'll have to engage in costly refactoring to finally achieve adequate performance (after, say, your customers complain enough, or your competitors come out with faster software).

More generally, there is a great article about optimization in the ACM's Ubiquity publication, The Fallacy of Premature Optimization. It is the best article I've read on this subject; the author makes great points and provides historical context. Here is an excerpt from the final summary:

Sir Tony Hoare's statement "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" has been responsible for a fundamental change in the way software engineers develop applications. A misinterpretation of this statement, particularly the clause "premature optimization is the root of all evil" has led many software engineers (and schools teaching new engineers) to believe that optimization and, indeed, concern about software efficiency, is unimportant. To reverse the decline in application performance, software engineers must first reject the prevailing attitude that performance is not important. Once they decide that writing efficient software is worthwhile, the next step is to learn how high-level language compilers and interpreters process those high-level constructs; once the engineer understands this process, choosing appropriate high-level constructs will become second nature and won't incur additional development cost. Hoare's statement, in its original context, certainly still applies. Premature optimization, at the microscopic level, is usually a bad idea. This does not suggest, however, that engineers should not be concerned about application performance. The fallacy that "premature optimization" is the same thing as "concern about performance" should not guide software development.

Jacob Gabrielson
I'd upvote a million times if I could
HLGEM
A: 

Based on my various, mostly negative, experiences with optimization as a software developer and architect:

Do (and make sure the team does):

  • know the data
  • know the programming language(s)
  • appreciate the customer's hardware and network setup
  • keep things immutable
  • have a profiler and know how to use it (see the sketch after these lists)
  • address issues when they occur

Do not:

  • cache
  • make everything configurable
  • make a framework
  • roll your own concurrency scheme
  • plan too far ahead
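
For the "have a profiler and know how to use it" point above, a minimal sketch using Python's built-in cProfile (the workload is a placeholder):

```python
import cProfile
import pstats

def hot_path():
    # Placeholder workload standing in for real application code.
    return sum(i * i for i in range(1_000_000))

profiler = cProfile.Profile()
profiler.enable()
hot_path()
profiler.disable()

# Show the ten entries with the largest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```
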
stili
A: 

From a database perspective, there are a lot of things that are known to be faster most of the time in querying. For instance, avoiding cursors is not premature optimization; it is good design. There are three things that are critical to database development: security, data integrity, and performance. If you are not considering the performance of every query you write, then you are doing a bad job. You don't have to optimize to the nth degree, but every time you write a query you should ask whether it is the best-performing way to do it. Once you learn the techniques that perform better, they are no harder to maintain or understand. If you don't consider performance when writing queries, your application will work badly and will ultimately fail. Users consider performance one of their highest priorities, and developers should do the same.
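
A sketch of the cursor-versus-set-based contrast, using sqlite3 purely for illustration (the schema is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)",
                 [(float(i),) for i in range(10_000)])

# Cursor-style thinking: touch one row at a time, one statement per row.
for (order_id,) in conn.execute("SELECT id FROM orders").fetchall():
    conn.execute("UPDATE orders SET total = total * 1.1 WHERE id = ?", (order_id,))

# Set-based thinking: express the whole change as a single statement
# and let the database engine do the work.
conn.execute("UPDATE orders SET total = total * 1.1")
conn.commit()
```
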

And I'm with mjmarsh: if you aren't testing against a dataset as large as or larger than you expect the application to have, you are not doing your job. It isn't premature to consider what normal performance will be. What works OK on a small dataset immediately shows itself to be a problem on a large one.

HLGEM