views: 822
answers: 21

I come from the DBA world, where performance has always been an obsession. Now I am moving into development, and I still think about performance constantly, all the time.

Reading SO, it sometimes seems that performance does not matter; for instance, to evangelists of Hibernate (or any other ORM).

As a developer, when do I have to think about performance and when not?

+13  A: 

The Knuth quote ("We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil") probably applies.

When you drive your car, do you constantly and consciously check how close your car is to the curb? If you have enough experience driving a car you learn to know where its edges are and roughly how to drive and park it without hitting something close by.

The analogous kind of intuition/experience for programming performance is important to gain through trial/error and questions, but you shouldn't have to spend your time constantly double-checking yourself.

Jason S
The car thing is a weak analogy. Consider this: Premature optimization is like maintaining the perfect tire pressure to optimize gas mileage while driving twice as far because of a wrong route in the first place.
S.Lott
Remind me to get off the road when you are around; I agree with S. Lott.
Software Monkey
Programmer intuition about the root cause of a software problem is often wrong, so use a profiler to isolate slow spots. Then write a program that checks the run times of unit tests, and have your continuous integration platform notify you if (important) tests take too long.
Robert Paulson
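The kind of guard Robert Paulson describes could be as small as this (a hypothetical sketch, assuming JUnit 5 running on the CI server; the workload and budget are made up):

    import static org.junit.jupiter.api.Assertions.assertTimeout;

    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    import org.junit.jupiter.api.Test;

    class PerformanceGuardTest {

        @Test
        void sortingAMillionRowsStaysWithinBudget() {
            List<Integer> rows = new ArrayList<>();
            Random random = new Random(42);
            for (int i = 0; i < 1_000_000; i++) {
                rows.add(random.nextInt());
            }
            // If this starts exceeding the agreed budget, the CI server flags
            // the build and someone profiles whatever change caused it.
            assertTimeout(Duration.ofSeconds(2), () -> Collections.sort(rows));
        }
    }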
don't take me literally! I'm just tired of everyone quoting "Premature optimization is bad bad bad!" when a lot of people's questions on the subject are attempts to develop some intuition for what are good ways to improve performance. YES it should be measured if it matters.
Jason S
Thank you for clarifying, Jason. It wasn't clear that this is what you were getting at.
Robert Paulson
@S.Lott - nice one.
ldigas
+17  A: 

Generally speaking, obsessing about performance or optimization is the route to much evil in software development. Usually only about 5% (or less!) of your code has any impact on the performance of the overall system. Your primary goals as a software developer on most projects are, first, correct and reliable functionality, and of course maintainability of the system. Then, once the system is implemented and working correctly, you evaluate performance, find out where the bottlenecks are, and optimize them as needed to meet your overall goals.

One caveat: doing O(n)-style evaluations of the approaches you take is a reasonable thing to consider in advance, as part of the original system design and algorithm selection, just to feel confident the performance will be "in the ballpark". But beyond that, most attempts to optimize in advance of actually measuring where the bottlenecks are will end up optimizing things that don't matter, and will usually make the system less maintainable, harder to understand, etc.
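For example, the kind of back-of-the-envelope choice meant here might look like this (a minimal, hypothetical Java sketch; the names are made up):

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class DuplicateCheck {

        // O(n^2): fine for a handful of items, hopeless for millions.
        static boolean hasDuplicateQuadratic(List<String> items) {
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    if (items.get(i).equals(items.get(j))) {
                        return true;
                    }
                }
            }
            return false;
        }

        // O(n): the "ballpark" check at design time is simply asking which
        // of these two shapes the expected data size calls for.
        static boolean hasDuplicateLinear(List<String> items) {
            Set<String> seen = new HashSet<>();
            for (String item : items) {
                if (!seen.add(item)) {
                    return true;
                }
            }
            return false;
        }
    }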

Tall Jeff
That's one big caveat though - for 1000 items, the difference between O(n) processing and O(n!) processing can be the difference between 10 ms and the lifetime of the universe.
Eclipse
@Josh .. spot on. Any code where n < 10 will probably be fast enough. Know what n is expected to be. We need to think when we write code.
Robert Paulson
Think? Did we pick the wrong profession?
Justice
+5  A: 

When it does?

No, seriously. There are some applications that will never have enough users to warrant more than the basic indexes and key relationships in the database. They won't require tuning inner loops of the code. Workgroup size applications, for example.

As things scale, so does the demand for optimized paths in code, data access, and communication. If you are working on limited hardware (embedded systems), you care a lot about performance. But there are many, many applications that will never see enough users to make the system's resources even notice you are there.

In those cases, all that extra work is wasted money and effort. In some cases your specification makes it clear you need that extra effort. In some cases it makes it clear you never will.

Godeke
+1  A: 

You should optimize for performance after you've built your application, and only if you have proof of where your bottlenecks are (e.g. from profiling). You may find out that it's not necessary to optimize at all, especially because the compiler is usually much better at it than you are.

Georg
Only if performance is not a requirement, or it's a really small and trivial program, should you wait until it's complete. In most other cases it's too late and harder to change.
Robert Paulson
+4  A: 

I had a phase where I was absolutely paranoid about performance. I spent so much time trying to improve performance that my code never actually progressed. Don't get into that habit :-)

Code, THEN optimise, not both at the same time.

Adam Gibbins
A: 

When do I have to think about performance and when not?

You can think about performance, whenever you spare any cycles from thinking about correctness. :-)

Most of the time, though, your thought will be "the performance of this bit doesn't matter: either because the performance of the whole application is still alright even with this bit, or because the cost of this bit is negligible or infinitesimal compared to the cost of that other bit."

ChrisW
This is why so many programs perform at the just barely tolerable level. How nice it would be if I clicked my Open Office shortcut on my state of the art quad core system with 15K disks and it was just instantly there.
Software Monkey
http://en.wikipedia.org/wiki/OpenOffice.org says, "Critics have pointed to excessive code bloat and OpenOffice.org's loading of the Java Runtime Environment as possible reasons for the slow speeds and excessive memory usage."
ChrisW
A: 

For me, when I am working with image manipulation, multiple large SQL queries, or loops that pass the 100k mark, I think about optimization. Otherwise I don't, unless I see it's slow when it's working.

Ólafur Waage
A: 

I am working on building a search engine. Optimization is what makes the difference between a user continuing to search or leaving the website. I think it is true for many applications, and unfortunately many of them do not care enough about it. Sometimes it is much cheaper than throwing more hardware at the problem; unfortunately, the latter is easier. To summarize, I'd say you have to optimize whenever you have to process a LOT of data and/or process it very quickly.

A: 

I find my typical pattern for working through a unit of development is:

  1. Put together the smallest version that works through the primary use case from beginning to end. In this step I mainly focus on bare-bones simplicity and a good implementation of whatever patterns apply.

  2. Step 2 is to refactor step 1, mainly with an eye toward simplifying. Since I've been doing OOP, the one thing I seem to be able to count on is that this step always presents lots of obvious simplifications and an actual reduction in code. It is also the point where the obvious abstractions fall out (which is another simplification). IMPORTANT NOTE: This has a strong secondary effect of addressing performance, especially once you have done it a few times and know where the performance antipatterns are.

  3. Often, when #2 is done, things are satisfactorily performant; the tests will confirm whether that is so, and also point out the (usually very few) locations where optimizations need to be addressed.

Increasingly I see that any time I spend thinking about efficient design in phases 1 and 2 messes up the simplicity, which at that point is primary.

le dorfier
+2  A: 

Don't think about performance until after you've got it working correctly. If it works correctly and it doesn't have any user-noticeable performance problems, don't optimize.

If it works correctly and it has significant and noticeable delays, don't optimize. Profile instead. Most of an application's time is going to be spent in a single "hot" loop, and which loop it is is seldom intuitive. You need real measurements and science to tell you what's happening. Once you have your profile data, your optimization work should progress from big to small:

  1. Architecture optimizations. Is the overall structure of the application the source of the inefficiency?

  2. Algorithm optimizations: Are you using the right data structures? Are you accessing them in the right way? Does your application spend most of its time writing, or most of its time reading? Optimize for the answer to that question (a sketch of this level follows the list).

  3. Last resort: micro-optimization. Streamlining the hot loops, or unrolling some loops. Duff's Device. Don't optimize at this level until you've determined that you can make no further improvements at the other two levels and you still haven't met your performance goals. This level of optimization has a high likelihood of breaking shit and making your application more difficult to grow, more brittle, so don't do it unless you really, really have to.
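A classic (and purely hypothetical) illustration of the algorithm level: a hot loop a profiler often flags because it rebuilds a String on every iteration, and the usual fix.

    class ReportBuilder {

        // What a profiler typically flags: each += copies the whole string,
        // so building the report is O(n^2) in the number of lines.
        static String buildSlow(String[] lines) {
            String report = "";
            for (String line : lines) {
                report += line + "\n";
            }
            return report;
        }

        // The algorithm-level fix: an amortised O(n) accumulator.
        static String buildFast(String[] lines) {
            StringBuilder report = new StringBuilder();
            for (String line : lines) {
                report.append(line).append('\n');
            }
            return report.toString();
        }
    }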

Again I will emphasize, don't waste your time on optimizing just any random bit of code that looks inefficient. Optimization time is a significant investment. You should have evidence to back you up before you gamble your time on a loser.

Breton
There is definitely a difference between thinking about performance and thinking about optimization before correct operation. The former is not only fine but is a good idea; the latter is debatable. Building inherently poorly-performing code is foolish.
DocMax
@DocMax: Totally agree.
Software Monkey
By the time you've done the full profiling, it's too late for architecture optimizations, and a pain for some algorithmic optimizations.
David Thornley
@David - only some very small percentage of the programs you write will need optimizing. The time saved not optimizing the others more than makes up for the occasional overhaul, IME. On the other hand, I generally assume that folks on SO are not making the obvious mistakes...I could be wrong. :)
Sarah Mei
A: 

Like other people have said, optimizing before you know the problems is a waste of time. I recently worked on some code that had a lot of optimization, and most of my time was spent removing the optimization! The problem was that it made adding critical functionality impossible.

Another example...in my previous job I worked in Data Processing, writing scripts around executable programs. If I started a program at 5 PM and it finished by 8 AM the next morning, it was fast enough. Of course, in an emergency it was much better for it to take one hour instead of ten, and faster code made my job easier, but as long as it ran correctly, 30 minutes was equivalent to 16 hours.

It depends totally on your project...and should be considered in your project requirements.

Remember also that making a program more efficient takes longer...you're trading off speed of development for speed of execution.

Mark Krenitsky
+5  A: 

I think there are two contradictory proverbs that are relevant here.

1: Premature optimization is the root of all evil.

2: Look before you leap.

In my personal experience, when code is first written, the magic 3% of the code that is using 90% of the resources is very easy to find. This is where the first proverb is relevant, and it seems to produce great results. As the code base matures, however, instead of 3% using 90% of the resources, you suddenly have 50% using 90% of the resources. If you imagine the analogy of a water pipe: instead of a few big leaks, you now have the problem of many small leaks all over the place. This gives the overall application slow performance, even if it's hard to pin down to any one individual function.

This is where proverb 2 seems relevant. Don't rely on the first proverb as an excuse to skip performance planning; have an overall plan, even if it's an evolving one. Try to work out some acceptable performance metrics and time your program. Consider the later performance implications of design choices. As an example, one might plan in advance to use a tuple store rather than a database if all that is needed is a tuple store. Starting with an SQL database and then changing to a tuple store later is quite difficult.

Above all, do try to optimize where it's easy, and make notes about cases where further optimization is possible. If you don't, as time goes on, programs tend to suffer the death of a thousand cuts, as the effects of functions that are 5-20% slower than they need to be add up and indeed multiply.

ErgoSum
A: 

It usually comes down to the requirements. In some cases you have very strict non-functional requirements for response times, etc. In those cases, you should put extra effort into tweaking your code, procedures, etc.

But as a rule of thumb (IMHO), you should build your code based on best practices around reliability and maintainability, and then have a specific round of tests around performance. This way, you will know that you are tweaking only the bits of code that actually impact performance.

Wagner Silveira
A: 

Answer 1:

Many people rightly say "Don't think about performance, optimize later", but remember that they have unit tests. They can rewrite large portions of their codebase without fear of introducing bugs. If you rewrite without any unit tests, you have to manually test everything again, and this time around it's harder because the algorithms are more complex.

You can put off optimization until later but you have to make sure you're prepared for it. If you don't have a workable plan for optimizing later (unit tests, tools to profile your code), then you'd best think about performance now because it will hurt you much more later on.
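For instance, a minimal sketch (assuming JUnit 5; the class and numbers are made up) of pinning behaviour down before optimizing:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    class PriceCalculatorTest {

        // The naive-but-correct implementation we might later optimize.
        static int totalCents(int[] lineItemCents) {
            int total = 0;
            for (int cents : lineItemCents) {
                total += cents;
            }
            return total;
        }

        @Test
        void totalIsSumOfLineItems() {
            assertEquals(0, totalCents(new int[] {}));
            assertEquals(600, totalCents(new int[] {100, 200, 300}));
            // With behaviour pinned down, the body of totalCents can be
            // rewritten (cached, parallelised, vectorised) without fear
            // of silently breaking it.
        }
    }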

Answer 2:

Sometimes the simple working solution you first come up with runs in O(n^n) time. If you know you'll have large data sets, then go ahead and optimize it now.

Answer 3:

Some time ago, I got sick of the blemishes in PHP and tried to fix some of them. I wrote a framework which involved a base class that everything had to inherit from. It used method and property overloading, reflection, and just about every other advanced feature to make it work. Then I went ahead and used it in a massive project, using my own framework features instead of the basic language features like isset() and static. The project's code was a little tidier, but the magic slowed down every method call and property access by about 50x.

If you're going to try to extend the language itself, you need to think about performance now, because you'll have to rewrite everything if you can't optimize it later. C has a zero-cost macro system, so go for your life. JavaScript has no such system, so be very careful about writing a new object inheritance system that you want to use everywhere.

too much php
A: 

Think about performance for a particular piece of code roughly in proportion to how likely you are to have to deal with performance issues in that code one day.

Meaning -- if it isn't going to get a ton of use, worry accordingly. Don't be incredibly lazy or inefficient, but don't obsess over reaching algorithmic nirvana every time either. It's often best to write the simplest code and make it faster/optimized as needs arise. In the meantime, simple code can be worked on by any developer, and that's worth considering.

If you see its importance increasing, you now know to think about it some more, because otherwise it will come back to bite you in the rear.

As developers we have a problem of wanting a perfect v1.0. A working v1.0 is something that works, not something perfect for every situation the future may bring.

A good example for me was when I started playing with databases many years ago. I didn't know what additional indexes were, or about the great performance boost they give when queries slow down unimaginably.

We can't predict every problem. I try to do good design, and let problems fight for my attention.

Hope something was of use.

Jas Panesar
+4  A: 

Citing Knuth's 'premature optimization .. evil' is a poor argument for writing sloppy and slow code (correct or otherwise).

  1. You need metrics to optimize.
  2. You need to think about code when you're coding it.
  3. You only optimize the subset of code that matters.

If you're writing a simple form to update some details, it's probably not worth optimizing.

If you're writing a Google search engine replacement, and you expect a lot of traffic, then you figure out a way to make that search as fast as possible.

You only ever need to optimize code that counts, and there's a lot of code in a program for doing one-off things, or events that happen rarely.


Given that we satisfy 1, 2 and 3 above:

There's little point waiting until the app is 90% complete before doing any performance testing and optimization. Performance is often an unwritten non-functional requirement. You need to identify and write down what some of these unwritten requirements are and commit to them.

It's also probably too late at 90% complete if you need to make architectural or other major changes. The more code that has been written, the harder it gets to change things, if only because there's more code to think about. You need to continually make sure that your app will perform when and where it needs to.

Then again, if you have well written unit tests you should be able to have performance tests as a part of those tests.

My 2 shillings at least.

Robert Paulson
For this answer, I would give you my entire daily allotment of votes, if I could... I suspect the general community has been too far damaged by years of misinterpreting Knuth to give this answer the votes it deserves.
Software Monkey
Thanks @Software Monkey .. I agree with your assessment.
Robert Paulson
very good thoughts! +1
Jason S
A: 

It is best to write code and then identify critical areas that would benefit most from optimization.

Sometimes code gets replaced, removed or refactored. Optimizing too much code can be a waste of valuable time.

Tom Conder
+3  A: 

Performance is not something that can be pasted on at the end of a project.

cherouvim
So what, you're not allowed to edit your code after writing it? Sure the design shouldn't be idiotic but you can optimise code six months after writing it.
Quibblesome
@Quarrelsome: Of course you are allowed. It's an iterative process. But what the quote means is that performance is not a "feature" that you can start designing and implementing at some point in the project's lifecycle.
cherouvim
A: 

Nope, if you're going to be thinking of something then think about delivering value to your employer/client and customers. Think about what counts for them.

That said, performance is important but it can lead to ruin.

Dawn of War 2 was released in February with a game-breaking bug that destroys multiplayer. The issue? The population cap. When a squad is reinforced, the last unit takes up double the cap due to a coding error. This means you can be in a situation where you have a very small army, and when you try to create a new unit the game tells you that you have too many units on the field. Most frustrating.

Why should this be an issue? How can this be an issue that only occurs with reinforcement? If it were an issue with just buying the unit, then surely it would have been discovered in testing!

Well, my guess is that it is due to premature optimisation. The developer didn't want to loop over all the units when you click the "buy unit" button, and instead made the pop cap work like a bank: when a unit is created it takes cap out of the bank, and when it dies the cap is put back in. Sure, that's more performant, but one little mistake throws the whole bank out of step.
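A rough sketch of the trade-off being described (hypothetical code, certainly not the game's actual implementation):

    import java.util.ArrayList;
    import java.util.List;

    class PopulationCap {
        static final int LIMIT = 100;

        private final List<Integer> squadSizes = new ArrayList<>();
        private int bank = LIMIT;   // the "optimised" running balance

        // Safe but slower: recompute from the squads every time the player
        // presses "buy unit". It cannot get out of sync.
        boolean canAddUnitByRecomputing(int unitCost) {
            int used = 0;
            for (int size : squadSizes) {
                used += size;
            }
            return used + unitCost <= LIMIT;
        }

        // Faster "bank" version: every path that creates, reinforces or kills
        // a unit must adjust the balance by exactly the right amount. One path
        // that deducts double (the reinforcement bug described above) and the
        // balance silently drifts.
        boolean canAddUnitFromBank(int unitCost) {
            return bank >= unitCost;
        }

        void addSquad(int size) {
            squadSizes.add(size);
            bank -= size;
        }
    }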

So what's worse: a small perf hit when you press that "buy unit" button, or a lot of flaming on the DoW2 forum, angry customers, and MS dragging their heels with certifying the fix, meaning that it isn't fixed yet?

In a lot of cases you're better off marking it with a

// todo: performance could be better
// try doing xyz if we need to improve it

because the performant version takes more time and adds a maintenance cost to the code.

The performance you should be worrying about is delivering a solution to your client that is satisfactory and fulfills their need. The speed of getting to the release date is usually more important.

There are scenarios where general performance is important, such as embedded systems, but this should be known as a restriction up front; it is a special context you should be aware of before you hit the code.

Quibblesome
A: 

I may be disagreeing with commonly accepted wisdom, but I think that you do have to think about performance all the time. The important thing though is how you think about performance.

Often if you start talking about performance, other people will start talking about optimisation, rather prematurely you might say, as performance and optimisation are not the same thing.

Paying too little attention to performance is prematurely optimising for the kudos that accrues from not optimising prematurely.

Optimise is a verb that takes an object. Until you have that object, you cannot optimise it. You can optimise code once it has been written (though it may only be a single method). You can optimise a class model or a functional spec once they have been written.

Working on performance is a matter of aiming for the optimal, whether a priori or a posteriori. Some kinds of performance work are only appropriate a posteriori; this is what should be considered optimisation, and it is premature to do it a priori. Other kinds of performance work are as appropriate, or more appropriate, a priori.

The main thing to try to get right a priori is whether your basic approach is reasonable. One example often given here is the time complexity of algorithms, but I have to disagree. It is not always better to do something in O(1) or O(n log n) than in O(n). O(n) time is the same as O(1) time when n is 1, and faster when n is 0, and datasets with 0 or 1 items are common in a lot of cases. What's more, time-complexity notation deliberately ignores lower-order terms and constants. Really, O(n) time means kn + c (and possibly other lower-order terms) while O(1) time means k + c, but for different values of k and c. If this algorithm is itself inside a loop, it could be that the O(n) version will massively beat the O(1) version.

So time complexity isn't the thing that needs to be considered here. The thing that needs to be considered here is whether time complexity is the thing to be considered here. If it is, then it's time to look at whether the case where O(n) beats O(1) because of lower overhead applies, whether one should go with the more common case where O(1) beats O(n), or whether one should ignore time complexity here and just do what reads more naturally. E.g. a common case of time-complexity competition is whether to use a list and search it, or a hash-based set and query it. With most libraries the code for each will look different, so one of them will better describe the intent, and that's the one to go for when it isn't performance critical.
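For example, the two shapes of that competition might look like this (a hypothetical Java sketch with made-up names):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    class AllowedExtensions {

        // Plain scan: kn + c per lookup. For the three items here the constant
        // factors are tiny and the code states its intent literally.
        private static final List<String> LIST = Arrays.asList("jpg", "png", "gif");

        static boolean allowedViaList(String ext) {
            return LIST.contains(ext);
        }

        // Hash lookup: k + c per lookup, with a larger k (hashing, bucket probe).
        // It only pays off once the collection, or the number of lookups, grows.
        private static final Set<String> SET = new HashSet<>(LIST);

        static boolean allowedViaSet(String ext) {
            return SET.contains(ext);
        }
    }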

The important a priori thought about performance here was whether it was worth thinking about performance at this stage at all.

Another case of basic approach is how remote resources are handled. If you are going to access the same rarely-changing remote resource multiple times a second, you need to make sure that your access code either has some degree of caching, or at least makes it easy for you to put that caching in later. Locking down to a particular approach to caching may or may not be premature, but tightly mixing your access in with other concerns, so that it's hard to add that caching later, is almost certainly premature pessimisation!
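A minimal sketch of that separation (hypothetical names; the remote call is imagined): keep the access behind one small interface so a cache can be slotted in later without touching the callers.

    import java.util.HashMap;
    import java.util.Map;

    interface ExchangeRateSource {
        double rateFor(String currency);
    }

    class RemoteExchangeRateSource implements ExchangeRateSource {
        @Override
        public double rateFor(String currency) {
            // imagine an HTTP call to the rarely-changing remote resource here
            return 1.0;
        }
    }

    // Adding caching later becomes a one-line change where the source is
    // constructed, not a rewrite of every place that needs a rate.
    class CachingExchangeRateSource implements ExchangeRateSource {
        private final ExchangeRateSource delegate;
        private final Map<String, Double> cache = new HashMap<>();

        CachingExchangeRateSource(ExchangeRateSource delegate) {
            this.delegate = delegate;
        }

        @Override
        public double rateFor(String currency) {
            return cache.computeIfAbsent(currency, delegate::rateFor);
        }
    }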

So we need some thought from the beginning, though we don't need to solve everything at this stage.

Another reasonable thought right at the beginning is "I don't need to think about the performance of this part right now". This agrees with the people who say not to pay attention to performance a priori, but at a finer-grained level: it is putting a small amount of thought into being reasonably confident that it is indeed okay not to think about performance in more detail.

Another reasonable thought is "I am pretty sure the performance of this section will be critical, but I'm not yet able to measure the impact of different approaches". Here you've decided that optimisation will probably be necessary, but that aiming for the optimal now would indeed be premature. However, you can lay the groundwork by drawing your functional boundaries so that you can easily change the implementation of the suspected-critical part, or perhaps by putting time-logging code into that function, especially in debug builds, and especially if it's quite far from the calling public method (so it isn't equivalent to doing the time logging in an external test). Here you haven't done anything a priori to make the code faster, but you have worked to aid later optimisation.
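That time-logging groundwork could be as simple as this (a hypothetical sketch; the property name and the work inside the loop are made up):

    class MatchScorer {
        // Enable with -Dapp.debugTiming=true; costs almost nothing when disabled.
        private static final boolean DEBUG_TIMING =
                Boolean.getBoolean("app.debugTiming");

        double score(int[] candidate) {
            long start = DEBUG_TIMING ? System.nanoTime() : 0L;
            double result = 0.0;
            for (int value : candidate) {
                result += Math.log1p(Math.abs(value));   // stand-in for the real work
            }
            if (DEBUG_TIMING) {
                System.err.printf("score() took %d us%n",
                        (System.nanoTime() - start) / 1_000);
            }
            return result;
        }
    }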

Another thing it is reasonable to think about is whether something should be done in a multi-threaded way, but note that there are three reasonable thoughts here: as well as "this will need to be multi-threaded" and "this will not need to be multi-threaded", there is also "this may need to be multi-threaded". Again, you can define functional boundaries in such a way as to make it easier to later rewrite the code so that that particular method is called in parallel with other work.

Finally, you need to think about how able you are going to be to measure performance after the fact, and how often you are going to have to.

One important case here is where the underlying data will change over the course of the lifetime of the project. With your DBA background you will be well aware that the optimal solution for a particular set of data, with a particular balance of operation frequencies, is not the same as for different data with different operation frequencies, even if the data fits the same schema (e.g. a small table with heavy reads and writes will benefit less from heavy indexing than the same table schema with many more rows, few writes, and many reads). The same applies to applications, so you need to consider whether you will just do optimisation in a later optimisation phase, or whether you will have to return to optimisation frequently as conditions change. In the latter case it's worth making sure now that it's easy to change the parts likely to change in the future.

Another important case is where you will not easily be able to obtain information about how the code is being used. If you are writing an application that is used at a single site, you will be able to measure this very well. If you are writing an application that will be distributed, this becomes harder. If you are writing a library that will be used in several different applications, it is harder still. In the latter case the YAGNI argument becomes much weaker; maybe someone out there really does need that slow method to be much improved and you don't know about it. Again, though, the approach you need to take isn't always the same: one approach is to put in work up front to make it more performant (not quite a priori, as it comes after your library is written but before it is used), while another is simply to document that the particular method is by necessity expensive, and that the calling code should memoise or otherwise optimise the way it uses it where appropriate.

Constantly, if mostly subliminally, thinking about performance is important; it's just that the appropriate response to that thinking isn't always "make it go faster now".

Jon Hanna
A: 

I'm going to disagree with the pack here to some degree.

You always consider performance--but that consideration is usually to dismiss it. Look at how often the code will run. If the answer is "once" or "rarely" performance is basically a non-issue.

Only when a piece of code is going to execute frequently do you need to pay attention to performance. Even in this case you generally should only look at the O() class of the routine until profiling shows a problem.

The only time I will consider anything like detailed optimization when I'm originally writing is code inside an O(scary) routine. (Example: A test to ensure a set of data contained no dead ends. O(n^8), although with substantial pruning. I was careful of performance from the first line.)

Loren Pechtel