views:

117

answers:

8

hi,

To be honest, I don't really know what the "small green men" in my CPU and compiler do, so sometimes I would like to find out :).

Currently I would like to know which is faster, so that I can design my code more efficiently. For example, say I want to calculate something at several different points in my source code: when is it faster to calculate it once, store it in a variable, and read that variable at each point where it's needed, and when is it faster to recalculate it every time?

I think it depends on how "complex" and "long" the calculation is, and on how fast the cache where variables are stored is, but I have no clue which is faster :).

Thanks for any reply to my tiny but important question!

Andreas

PS: Perhaps it's important to know that I code in Java, but it's more of a general question.

+7  A: 

It will almost always be faster to store something calculated once, rather than calculate it each time.

But depending on the complexity of the calculation, it may not matter. For example, if you want to calculate the sum of 3, 2 and 1, the speed difference between calculating it once and calculating it each time will be minimal and not worth the trouble unless you're doing it millions of times.
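To make the "doing it millions of times" case concrete, here is a minimal sketch (all names are illustrative, and `expensive` is just a stand-in for a costlier calculation) of hoisting a loop-invariant calculation into a variable:

```java
// Illustrative sketch: reusing a stored result instead of recalculating
// it inside a loop. The "expensive" method is a hypothetical stand-in.
public class HoistExample {
    static double expensive(double x) {
        // stand-in for a calculation that is costly to repeat
        return Math.sqrt(x) * Math.log(x + 1);
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        java.util.Arrays.fill(data, 2.0);

        // Version A: recalculated on every iteration
        double sumA = 0;
        for (double d : data) {
            sumA += d * expensive(42.0); // expensive(42.0) never changes
        }

        // Version B: calculated once, stored, then reused
        double factor = expensive(42.0);
        double sumB = 0;
        for (double d : data) {
            sumB += d * factor;
        }

        System.out.println(sumA == sumB); // prints "true": same result, fewer calculations
    }
}
```

For a trivial calculation the JIT may well do this hoisting for you; the stored-variable version mainly pays off when the calculation is genuinely costly.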

However, as a counterpoint, I have a list of the first 50 million primes in a file and it's definitely faster to search that file than to check if one of the big numbers is prime using mathematical methods.
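For what it's worth, Java has a built-in probabilistic primality test on `BigInteger`, which is the kind of "mathematical method" being compared against here (the constant below is assumed to be the 50 millionth prime):

```java
import java.math.BigInteger;

public class PrimeCheck {
    public static void main(String[] args) {
        // isProbablePrime(certainty): the chance of a false positive
        // is at most 2^(-certainty).
        BigInteger n = new BigInteger("982451653");
        System.out.println(n.isProbablePrime(50)); // prints "true"
    }
}
```

Whether this beats a file lookup depends on disk speed, file structure, and number size; as the comment below suggests, it may well win.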

You should write your code for functionality and readability first and only worry about performance when you strike a problem.

paxdiablo
Actually I'm not sure... it wouldn't surprise me if probabilistic primality testing were faster than file IO ;)
+1  A: 

There's no real answer to this question. It's quicker to store things in the cache if cache accesses are quicker than calculation. It's quicker to calculate things if cache accesses are slower than your calculation.

As with all "which is faster" questions, the answer is to profile, then profile again, and then profile again. Is your calculation really the bottleneck, or are you waiting on I/O somewhere? Try things and see!

Dominic Rodger
+1  A: 

In general, you should code whatever is most readable and maintainable, and only start looking at performance once you've got it working AND found that it's too slow. There's no point optimising something that runs just fine.

Since calculating something once and saving the result is more maintainable (if the calculation changes, you only need to change it one place) you should do that. Repeating yourself in code is bad.

You'll find that it's faster to do this as well. I can't think of any sensible situation in which retrieving the pre-calculated result wouldn't be at least as fast as re-calculating it.

Airsource Ltd
You can find code to be too slow before it's working, and you can see some (a lot) of performance bottlenecks before you get too far into a project. For example, if you have the option of batch-processing requests over a remote process (or connection), that will be faster than making each request separately, typically due to protocol overhead, latency, bandwidth, and serialization. You can even have these issues when reading from files, ports, and devices.
Matthew Whited
@Matthew - beware statements like "this will be faster". Who knows? Profile, profile, profile.
Dominic Rodger
Feel free to yell at me for "pre-optimizing", but I have been working with electronics, networks, and computers long enough to identify some bottlenecks before they become an issue. Just looking at some code screams "I am a problem". After you get the results back from the profiler you still need to find the solution to the problem, and the earlier you find it, the less it costs.
Matthew Whited
@Matthew - absolutely - but I'd argue that failure to batch requests over a remote connection is a failure in the design rather than the code?
Airsource Ltd
And a design feature that lots of developers miss.
Matthew Whited
+2  A: 

The only time I can think of when it would automatically be slower is if you stored the value in a file or in a table (e.g. a database); then it could be slower to retrieve it than to recalculate it. That does not seem to apply in your case, but you said it was a general question.

But you seem to be dancing around another question -- whether it is better to use lots of globally stored values. If that is part of your question, then my answer does not address it at all.

MJB
+1  A: 

Retrieving should be faster than re-calculating, but you risk slipping into premature micro-optimization by choosing between techniques without any instrumentation or measurement showing that it would make any difference.

Andrew
+1  A: 

Generally speaking, in situations where this matters, the compiler is smart enough to handle it by itself. I am talking about the JIT compiler which lurks in the JVM. That compiler is known to be able to inline method calls and the like.

Also, the general advice is to write clear code and to worry about such low-level optimizations only when:

  • the code is correct (it yields the expected result);
  • there is an actual, measurable problem of performance;
  • algorithms are fine (e.g. you are not using a quadratic sort or something like that).

At that point, you have all that you need to measure the impact of using an intermediate variable, and that's good, because the answer to your question is, in all generality: "well, it depends."
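One rough way to measure that impact yourself is a timing sketch like the one below (not a rigorous benchmark -- JIT warm-up and dead-code elimination make dedicated tools like JMH a better choice; all names here are illustrative):

```java
// Crude measurement sketch: compare recalculating a value every
// iteration against calculating it once and reusing the stored result.
public class MeasureSketch {
    public static void main(String[] args) {
        int n = 1_000_000;
        double resultA = 0, resultB = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            resultA += Math.sin(0.5); // recalculated each time
        }
        long t1 = System.nanoTime();

        double cached = Math.sin(0.5); // calculated once
        for (int i = 0; i < n; i++) {
            resultB += cached; // stored value reused
        }
        long t2 = System.nanoTime();

        System.out.printf("recalculate: %d ns, reuse: %d ns%n",
                t1 - t0, t2 - t1);
        System.out.println(resultA == resultB); // prints "true"
    }
}
```

Timings from a single run like this are noisy; the point is only that the measurement is cheap to set up, so there is little excuse for guessing.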

Thomas Pornin
A: 

I would say calculate it once and store it.

In the example above, you are not likely to be adding static values together all of the time. The values you will be calculating are most likely other variables stored in memory or on disk.

So if you

  1. gather all your variables
  2. calculate the result
  3. store it in memory

From here on, you will only need to retrieve the value, rather than repeat steps 1 and 2.
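The three steps above might be sketched in Java like this (the class and its fields are hypothetical, chosen just to illustrate the pattern):

```java
// Illustrative gather/calculate/store pattern. A null cache field
// marks the value as "not yet calculated".
public class Invoice {
    private final double price;
    private final int quantity;
    private final double taxRate;
    private Double cachedTotal; // step 3's storage; null until first use

    public Invoice(double price, int quantity, double taxRate) {
        this.price = price;       // step 1: gather the variables
        this.quantity = quantity;
        this.taxRate = taxRate;
    }

    public double getTotal() {
        if (cachedTotal == null) {
            // step 2: calculate the result (only happens once)
            cachedTotal = price * quantity * (1 + taxRate);
        }
        // later calls just retrieve the stored value
        return cachedTotal;
    }
}
```

This only works as written if the inputs never change after construction; mutable inputs bring in the cache-invalidation issue discussed in the next answer.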

A: 

Caching is generally faster for any computation that is even somewhat complex, but there's a trade-off involved. By caching the results of your computation, you have to take some responsibility for invalidating the cache when the result of the computation is no longer accurate. Take the following example:

public class Rectangle {
    private int _height;
    public int getHeight() {
        return _height;
    }
    public void setHeight(int value) {
        _height = value;
    }

    private int _width;
    public int getWidth() {
        return _width;
    }
    public void setWidth(int value) {
        _width = value;
    }

    private int _area;
    public int getArea() {
        if (_area == 0) {
            _area = _width * _height;
        }
        return _area;
    }
}

Here, the area calculation is stored so it doesn't need to be recomputed (this is a bit of a trivial example, and multiplications aren't really costly enough to justify caching, but you get the drift). The following code works fine:

Rectangle r = new Rectangle();
r.setHeight(5);
r.setWidth(5);

r.getArea(); // returns 25.

But if we modify any of the information used to calculate the cached item, the code has a bug:

r.setHeight(3);
r.getArea(); // returns 25 - should be 15!

In this case the solution is easy - reset the area whenever height or width is set. But if your calculations become more complex, depend on something outside a single class, or your caching mechanism is more complex than a simple member variable, it can become troublesome to manage your caches.

In general, whenever you decide you want to use caching, you should proceed with caution and unit test heavily to ensure you're not introducing bugs.

Ryan Brunner