If there is no cost to the optimization, do it. When writing the code, ++i
is just as easy to write as i++
, so prefer the former. There is no cost to it.
On the other hand, going back and making this change afterwards takes time, and it most likely won't make a noticeable difference, so you probably shouldn't bother with it.
But yes, it can make a difference. On built-in types, probably not, but for complex classes, the compiler is unlikely to be able to optimize it away. The reason for this is that the increment operation no is no longer an intrinsic operation, built into the compiler, but a function defined in the class. The compiler may be able to optimize it like any other function, but it can not, in general, assume that pre-increment can be used instead of post-increment. The two functions may do entirely different things.
So when determining which optimizations can be done by the compiler, consider whether it has enough information to perform it. In this case, the compiler doesn't know that post-increment and pre-increment perform the same modifications to the object, so it can not assume that one can be replaced with the other. But you have this knowledge, so you can safely perform the optimization.
Many of the others you mention can usually be done very efficiently by the compiler:
Inlining can be done by the compiler, and it's usually better at it than you. All it needs to know is how large a proportion of the function consists of function call over head, and how often is it called? A big function that is called often probably shouldn't be inlined, because you end up copying a lot of code, resulting in a larger executable, and more instruction cache misses. Inlining is always a tradeoff, and often, the compiler is better at weighing all the factors than you.
Loop unrolling is a purely mechanic operation, and the compiler can do that easily. Same goes for strength reduction. Swapping inner and outer loops is trickier, because the compiler has to prove that the changed order of traversal won't affect the result, which is difficult to do automatically. So here is an optimization you should do yourself.
But even in the simple ones that the compiler is able to do, you sometimes have information your compiler doesn't. If you know that a function is going to be called extremely often, even if it's only called from one place, it may be worth checking whether the compiler automatically inlines it, and do it manually if not.
Sometimes you may know more about a loop than the compiler as well (for example, that the number of iterations will always be a multiple of 4, so you can safely unroll it 4 times). The compiler may not have this information, so if it were to inline the loop, it would have to insert an epilog to ensure that the last few iterations get performed correctly.
So such "small-scale" optimizations can still be necessary, if 1) you actually need the performance, and 2) you have information that the compiler doesn't.
You can't outperform the compiler on purely mechanical optimizations. But you may be able to make assumptions that the compiler can't, and that is when you're able to optimize better than the compiler.