tags:
views: 439
answers: 15

What is the most unorthodox way to improve the performance of C code? This is no-holds-barred! Everything goes including changing loop structures to gotos, hardcoding anything and everything, using case statements in bizarre ways etc. Don't worry at all about maintainability, readability etc.

p.s. This is practical... and I'm well aware of how to improve the performance of code in reasonable ways (improve algorithms, profile before you optimize etc)

+16  A: 

In my experience the most unorthodox way of optimizing C code is to profile the application, identify the slow-performing structures and/or DB hits, and then design reasonable solutions around them using Big-O analysis.

JeffreyABecker
+1 wow. I have never *ever* heard of such a technique.
rascher
This doesn't really deserve +6, since it's not at all unorthodox and goes against the question... but it is reasonable so I can't downvote you either :\
Mark
@Mark: it's a joke, saying that the perfectly logical way to optimize is "unorthodox" because few people actually do it that way.
Andrew Medico
more to the point, people tend to look at you like you have an irregular number of heads if you suggest it.
JeffreyABecker
straightforward solution - simply statistically strange
Andy Dent
+6  A: 

Duff's Device is the canonical example. It's so weird that Tom Duff admitted, "This code forms some sort of argument in [the debate about fall-through in case statements], but I'm not sure whether it's for or against".
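For reference, here is a sketch of the device adapted to a plain memory-to-memory copy (Duff's original wrote to a fixed output register). The unrolled body and the interleaved `switch`/`do` are the point; the zero-count guard is an addition, since the device misbehaves when `count` is 0:

```c
#include <stddef.h>

/* Duff's Device: an 8x-unrolled copy where the switch's fall-through
 * jumps into the middle of the loop to handle the leftover count. */
void duff_copy(char *to, const char *from, size_t count)
{
    if (count == 0)
        return;                    /* the device misbehaves on zero */
    size_t n = (count + 7) / 8;    /* number of passes through the loop */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```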

Grandpa
+4  A: 

Profile your code, find the slow spots, and use inline assembly to optimize them.
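As a small illustration of the inline-assembly route (this is my sketch, not part of the answer), GCC's extended asm lets you drop a single instruction into a hot spot; here, x86's BSWAP reversing a register's byte order, with a portable built-in as the fallback:

```c
#include <stdint.h>

/* Byte-swap a 32-bit value with one BSWAP instruction on x86,
 * falling back to the compiler built-in elsewhere. GCC/Clang only. */
static uint32_t bswap32_asm(uint32_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("bswap %0" : "+r"(x));   /* read-write register operand */
    return x;
#else
    return __builtin_bswap32(x);
#endif
}
```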

Robert Harvey
When I worked at a game company, we did that. But eventually you hit diminishing returns and you have to look at the big picture. We often found that rearranging the layout of data structures affected overall performance to a large degree.
Nosredna
You forgot step 4: profile again to be sure your inline assembly didn't actually slow the code. I've seen that happen.
Dour High Arch
+1  A: 

Duff's Device & Carmack's Fast InvSqrt.

Jason
And Carmack's isn't Carmack's.
Nosredna
Right, but it's known as such.
Jason
It's also typically a performance *penalty* on most modern hardware (since most architectures have a hardware reciprocal square root instruction that stays in the FP domain).
Stephen Canon
+3  A: 

You're looking for an unorthodox, no-holds-barred, yet general-purpose way to optimize C?

Rewrite it in assembly language.

RickNZ
+2  A: 

1) Loop unrolling. You save a jump, a comparison, and an increment on every iteration you don't actually loop.
2) Avoid double indirection. Arithmetic is usually faster than an extra load, so a[y*width + x] on a flat array is usually faster than a[y][x] on an array of row pointers. A one-dimensional array of size MxN also saves M (or N) words' worth of pointers compared to a pointer-to-pointer matrix of the same dimensions.
3) Use ridiculous assembly optimizations whenever possible. For instance, on the x86 architecture you can use the BSWAP instruction to reverse a register's byte order in one instruction instead of a sequence of shifts, masks, and a temporary.

And of course, don't forget:
4) Don't do bounds checking or error handling.

That having been said, I'd avoid all of these except (2) in practice.
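Point (2) can be sketched like this (names here are mine, for illustration): one flat array, with the 2-D index computed by arithmetic instead of chasing a row pointer.

```c
#define WIDTH  4
#define HEIGHT 3

/* One flat array replaces an array of row pointers; a[y][x]
 * becomes grid[y * WIDTH + x] -- a multiply-add, no extra load. */
static int grid[HEIGHT * WIDTH];

static void set_cell(int y, int x, int v) { grid[y * WIDTH + x] = v; }
static int  get_cell(int y, int x)        { return grid[y * WIDTH + x]; }
```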

Dathan
Except that most of this is useless because the compiler will do it.
Zan Lynx
In most cases "unorthodox optimizations" are pointless - pointing out that answers to a pointless question are themselves pointless is kinda... pointless. (c:
Dathan
Can't compilers automatically do 1 and 2? And shouldn't there be a library full of assembly hacks for these sorts of things?
Mark
Darn. Thought I was being clever - you actually said "useless." Doh!
Dathan
I haven't looked at the machine code output of a C compiler lately, but since 1 and 3 were covered in my undergrad compilers course, I assume they're pretty widely implemented.
Dathan
+5  A: 

Abusing the constant 0x5f3759df to compute inverse square roots quickly has to rank pretty high...
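For the curious, the trick looks roughly like this; the `memcpy` type-punning is my substitution for the original's pointer cast, which is undefined behavior in ISO C:

```c
#include <stdint.h>
#include <string.h>

/* The famous fast inverse square root: treat the float's bits as an
 * integer, subtract from a magic constant to get an initial guess,
 * then refine with one Newton-Raphson step. */
float fast_rsqrt(float x)
{
    float half = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);        /* reinterpret float bits as int */
    i = 0x5f3759df - (i >> 1);       /* magic initial approximation */
    memcpy(&x, &i, sizeof x);
    return x * (1.5f - half * x * x); /* one Newton-Raphson iteration */
}
```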

Grandpa
+1  A: 

Your compiler is almost certainly better at optimizing than your ugly hand-tweaks are. Most of the historical little tricks are now pointless, and people who ignore readability and maintainability tend to write code that ends up less efficient, because the real optimizations become harder to make.

When the code has been optimized in every other way and still needs a performance gain, rewriting critical portions in assembly is the best remaining hope.

Mike Graham
+4  A: 

Use inline assembly?

Seriously though, if by merely changing the C code you can improve performance, chances are you can do it cleanly.

A few exceptions:

1) If you rely on the alignment semantics of pointers of different types, you can often perform block operations that technically expose your application to a bounds-overrun condition, but in practice do not because of your system's alignment characteristics. A memory copy, for example, can align the initial chars one at a time and then do the inner block through a long* pointer.

2) It may be possible to copy stack frames in clever ways if you know the memory order in which your compiler assigns local variables. This may allow you to implement co-routines which the language doesn't otherwise support. Coroutines are often a simpler and faster way of implementing some kinds of loop control.

3) Unions are always a bit "hacky" however you use them. It's a way of implementing polymorphism with fairly loose type checking.

4) Use of the C preprocessor as a way of auto-generating code is usually very difficult to debug and read. As such people tend to avoid this.
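Point (1) might look like the sketch below. It is deliberately non-portable: the word-sized stores violate strict aliasing in ISO C and lean on the platform's alignment behavior, which is exactly the "hack" being described.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the unaligned head byte-by-byte, then move the aligned middle
 * in word-sized chunks. Works on common platforms; not strict ISO C. */
void block_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* align the destination to the word size */
    while (n > 0 && ((uintptr_t)d % sizeof(unsigned long)) != 0) {
        *d++ = *s++;
        n--;
    }
    /* bulk copy in word-sized chunks if the source lines up too */
    if (((uintptr_t)s % sizeof(unsigned long)) == 0) {
        while (n >= sizeof(unsigned long)) {
            *(unsigned long *)d = *(const unsigned long *)s;
            d += sizeof(unsigned long);
            s += sizeof(unsigned long);
            n -= sizeof(unsigned long);
        }
    }
    /* copy whatever tail remains */
    while (n-- > 0)
        *d++ = *s++;
}
```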

Paul Hsieh
+1  A: 

In DSP applications, it's still worth going to assembly language to get the best performance out of the SIMD instructions that C compilers don't do very well with. But that's not really a "C" solution.

Something I do pretty often is use curve-fitting software to replace functions with approximations that are faster to calculate. Sometimes LUTs are still faster than doing a bunch of calculations, but not as often as they used to be.

Nosredna
+1  A: 

See this chapter, It’s a plain Wonderful Life by Abrash (it's about 5 pages: click 'Next' at the bottom of each screen).

Summary (some quotes from the article):

  • Table-driven magic (huge lookup table and incredible state machine)
  • An approach to performance programming that operates at a more efficient, tightly integrated level than you may ever see again
  • Astonishing economy of effort
ChrisW
+1  A: 

There is nothing unorthodox left to do for C code performance. All of the effective techniques have been "orthodoxed".

The best I've found is to use a profiler with access to CPU performance counters and pay special attention to cache and branch misses. Add cache prefetches wherever you can and remove unpredictable branches wherever you can.

Don't bother with loop unrolling. If the branch is predictable it is almost free. Let the compiler worry about it.

On some very parallel architectures like IA64 it can be faster to unroll a loop all the way to the end. One example of this is avoiding the C string functions. Use memset to zero a string array, memcpy to set the string and memcmp to compare the entire array against another similar array. This can use 64-bit loads, never has to check for the zero terminator and can be optimized to not loop or branch at all if using a "small" array size of 64 or 128. The memxxx() functions are usually compiler built-ins and very optimized.
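The fixed-size-array idea can be sketched like this (the helper names and the 64-byte size are mine, matching the "small array" suggestion):

```c
#include <string.h>

#define NAME_LEN 64   /* fixed "small" array size */

/* Fixed-size "string" handling via the mem* built-ins: no scan for
 * the zero terminator, and the compiler can expand these into
 * straight-line loads and stores. */
static void name_set(char dst[NAME_LEN], const char *src)
{
    memset(dst, 0, NAME_LEN);            /* zero the whole array */
    memcpy(dst, src, strlen(src) + 1);   /* set string incl. '\0' */
}

static int name_eq(const char a[NAME_LEN], const char b[NAME_LEN])
{
    return memcmp(a, b, NAME_LEN) == 0;  /* compare entire arrays */
}
```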

Zan Lynx
+1  A: 

I hear a lot of answers of the form "Try doing X, Y, or Z", but that's like saying "Here, have a fish, and eat well for a day".

I would rather teach you how to fish - for performance problems. People who say "Profile First" are on the right track but (IMHO) are far too timid.

Here's an example of aggressive performance tuning.

Here's a short explanation of why it works.

Here's a long explanation of why it works.

That will teach you to fish by showing you how to find out where the fish are and how big they are. Once you find them, you can cook them up (fix them) in many wonderful ways. The great thing is, once you find and dispose of one fish (performance problem), the others get bigger and easier to catch.

Mike Dunlavey
+1  A: 

For point 3 within Dathan's answer: as another unconventional way of swapping, you can swap two variables using xor.

int x = 3, y = 4;
x = x ^ y;
y = y ^ x;
x = x ^ y;

Now x and y are swapped! :)

Another thing: when you are dividing by 2, it can be better to use the shift-right operator; the same goes for multiplying by 2 and shift-left. (This holds for unsigned values, and compilers do the substitution themselves these days.)
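A minimal sketch of the shift trick (helper names are mine), valid for unsigned operands:

```c
/* Shift-for-divide and shift-for-multiply on unsigned values.
 * Modern compilers perform this strength reduction automatically. */
static unsigned half(unsigned n)  { return n >> 1; }  /* n / 2 */
static unsigned twice(unsigned n) { return n << 1; }  /* n * 2 */
```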

In the old Borland C compiler, there was a _stklen variable you could assign in order to reduce stack size and code. I haven't seen anything like that nowadays; compiler technology has advanced since then.

When you would use malloc, consider calloc instead, as it also initializes the memory to zero.

Using the ternary operator instead of an if/else statement was apparently faster; I guess compiler writers have gotten smarter about machine code generation since. I can't provide proof of that, but it held true back when Borland C 3.01 ruled the roost.

Inlining assembly routines into the code.

I like this question topic, as it reminds me of the old days when memory was precious and we had to squeeze a quart into a pint pot using the hocus-pocus tricks of x86 code. Thanks for posting this question, Mr. Database.

Take care, Tom.

tommieb75
I should also mention that when dealing with arrays, *(some_array + n) was said to be faster than some_array[n] when you have declared char some_array[50]... but the two are equivalent by definition, and today's compilers generate identical code for them. ;)
tommieb75
+1  A: 

Inline Assembly.

s1n