Hi, I'm writing an interesting program, and every performance hit is very painful for me. So I'm wondering which is better: adding an extra "if" statement to reduce the number of function calls, or avoiding the "if" and making more function calls. The function is a virtual method that overrides IEqualityComparer's Equals method; all it does is compare the size and the hash of two files. The if statement compares the sizes of these two files. I think you get the point of this logic. As you can see, I'm writing this program in C#. Maybe someone can answer this, because this is not the first time I've wondered which to choose. Thanks.

A: 

My guess would be that an if statement is better, but with today's advanced compilers you can never really tell. Your best bet is to try both and compare the performance.

Greg Bray
+3  A: 

Have you tried profiling to find out? Are you sure that either of these is the bottleneck in your application?

Oli Charlesworth
No, of course not. These spots are bottlenecks more in my own view of the program; everything is fine. I just can't decide which to choose. Profiling behaves very trickily: when I compare all the files in a directory and its subdirectories, the next run does the same work about 40 times faster, even if I change a lot of stuff. So I can't compare results for the same folders =(
Mark
@Mark: there's no point making this kind of decision if you have no idea whether it will make any difference or not!
Oli Charlesworth
A: 

In the old days, back in the 486 and older days, when CPUs were "dumb", branching logic (e.g. an if()) would cause a pipeline and/or cache flush, which would slow things down. These days, with modern compilers and out-of-order branch-predicting wash-your-dishes-for-you CPUs, such overhead is minimal.

The only proper way to answer your question is: benchmark both methods and see which is faster.
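A minimal sketch of such a benchmark, assuming a hypothetical helper named `MicroBenchmark.Measure` (not part of any library) built on `System.Diagnostics.Stopwatch`. It calls the action once before timing so that JIT compilation is not counted:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; the class and method names are assumptions.
public static class MicroBenchmark
{
    // Runs the action once as a warm-up, then times `iterations` further calls.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action(); // warm-up call, so JIT compilation is excluded from the timing

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed;
    }
}
```

You would pass each variant as a lambda, for example `MicroBenchmark.Measure(() => CompareWithIf(a, b), 100000)` and the same for the version without the if, where `CompareWithIf` stands in for your own method. Numbers from a loop like this are only a rough guide, and for file comparisons the operating system's file cache can make later runs much faster than the first.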

Marc B
+1  A: 

Is the pain caused by the actual performance you observe while testing, or just by the thought that you might be wasting a few cycles? If it's the second case, the only sane way to fix the problem is by working on your attitude.

The cost of a branch is very hard to predict, because modern processors use some very clever techniques to speed up execution. They keep special data structures that are used to predict the branch target. The branch is very cheap if the prediction is correct and pretty costly otherwise. The rate of incorrect predictions is low, but of course not zero. I don't think you can get a definitive answer to your question.

Maciej Hehl
+5  A: 
  1. If you really need that much performance so badly, why don't you program in assembly language?

  2. If you are still sure you absolutely need to worry about this, first check for other optimization opportunities that have more potential (a better algorithm can make orders of magnitude more difference than any microoptimization).

  3. If you optimized the living shit out of everything else, the only way to be sure is to profile. Really. No matter how hard any of us tries to guess, we will likely underestimate the JIT.

  4. Still I have an opinion on this: Generally speaking, branch misprediction can hurt much more than a function call, since it screws the cache. But who says it compiled down to code that is likely to blow the cache? Edit: But since it seems like you're comparing file contents for strict equality, short-circuiting in case the length differs can save much time (Consider: how long does it take the filesystem to tell you the length? It likely already knows, so nearly none. How long does it take you to hash a 10 MB file? VERY long, in comparison). So if I guessed that correctly, then go for the short-circuiting, for crying out loud.
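The short-circuit from point 4 could look roughly like this. It is only a sketch: the class name `FileContentComparer`, the choice of `FileInfo` as the element type, and the use of SHA-256 are all assumptions, since the question does not show the actual comparer or say which hash it uses.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical comparer for illustration; not the asker's actual code.
public sealed class FileContentComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;

        // Cheap check first: the length comes from file metadata, no content is read.
        if (x.Length != y.Length) return false;

        // Expensive check only when the sizes match.
        return ComputeHash(x).SequenceEqual(ComputeHash(y));
    }

    // Bucketing by length means Equals (and the hashing) only runs
    // for candidates that already have the same size.
    public int GetHashCode(FileInfo file) => file?.Length.GetHashCode() ?? 0;

    // SHA-256 is an assumption; any content hash would fit here.
    private static byte[] ComputeHash(FileInfo file)
    {
        using (var sha = SHA256.Create())
        using (var stream = file.OpenRead())
        {
            return sha.ComputeHash(stream);
        }
    }
}
```

With this shape, two files of different sizes are rejected without reading either of them, so the cost of the extra if is tiny next to the hashing it avoids.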

delnan
OK, I see your point. Well, the problem is not so much optimizing the program as knowing the answer, because "this is not the first time i'm wondering what to choose".
Mark
If you see the point, then the answer is: Do what results in the clearest code and expresses the program's logic best ;-)
delnan
A: 

It's really hard to know without profiling. But either way, I can tell you that your algorithms are generally going to be much more important than if vs. function call, and going with functions usually makes it easier, faster, and safer to swap out and update implementations, which ultimately lets you do more to improve the more important parts of your algorithms. And, again, the way to know how you're doing there is to profile.

Joel Coehoorn
OK, thank you guys. I got the answer I was looking for: "no one can really tell me". That's really what I was expecting to hear. So thank you.
Mark
A: 

The answer depends on one thing: "am I using a completely braindead compiler?"

And since you're not, the answer is "it doesn't matter". The compiler and the JIT'er heavily transform your code, so what is actually executed looks nothing like the code you wrote.

For example, function calls can be inlined, eliminating all the overhead of the function call.

Therefore: write code that is easy to understand for yourself, and as a side bonus, it also becomes easier to understand for the compiler when it optimizes your code.

jalf
+2  A: 

Keep the if; it will run much faster.

Creating a hash of a file clearly takes considerably more time than an if.

dmajkic
A: 

An if can have a cost due to branching. The cost depends on the code run in the if case, the code run in the else case, the size of the CPU cache, and compiler decisions.

A function call can have a cost due to, well, the cost of calling a function. This can be much bigger than for an if, or it can be zero (because the call was inlined, and inlining can even happen with virtual calls when the compiler could "see" at compile time which form was going to be called), or it can be in between.

Because of this, there really is no general answer to this question. Even if you profile, there is nothing to say that it won't be different on a different architecture even with a binary copy of the assembly (because the jitter will be different) or with a different version of the .NET environment (and here "different version" includes service packs, hot-fixes and patches that touch on it).

Jon Hanna