I have a legacy firmware application that requires new functionality. The size of the application was already near the limited flash capacity of the device, and the few new functions and variables pushed it over the edge. Turning on compiler optimization does the trick, but the customer is wary of doing so because optimizations have caused failures in the past. So, what are some common things to look for when refactoring C code to produce smaller output?

+2  A: 

Refactoring out duplicate code should have the biggest impact on your program's memory footprint.

Bill the Lizard
+10  A: 
  • Use generation functions instead of data tables where possible
  • Disable inline functions
  • Turn frequently used macros into functions
  • Reduce resolution for variables larger than the native machine size (e.g., on an 8-bit micro, try to get rid of 16- and 32-bit variables; they double or quadruple some code sequences)
  • If the micro has a smaller instruction set (e.g., ARM Thumb), enable it in the compiler
  • If the memory is segmented (i.e., paged or nonlinear) then
    • Rearrange code so that fewer global calls (which use larger call instructions) are needed
    • Rearrange code and variable usage to eliminate global memory calls
    • Re-evaluate global memory usage - if it can be placed on the stack then so much the better
  • Make sure you're compiling with debug turned off - on some processors it makes a big difference
  • Compress data that can't be generated on the fly - then decompress into ram at startup for fast access
  • Delve into the compiler options - it may be that every call is automatically global, but you might be able to safely disable that on a file by file basis to reduce size (sometimes significantly)
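As a sketch of the first bullet above (generation functions instead of data tables): if a table's contents follow a formula, a tiny function can replace the whole flash-resident array. The names and sizes here are hypothetical, not from the question.

```c
#include <stdint.h>

/* Data-table approach (hypothetical): 512 bytes of flash holding the
   squares of 0..255.
   static const uint16_t square_table[256] = { 0, 1, 4, 9, ... }; */

/* Generation-function approach: a few instructions of code instead of
   the table, traded against a few cycles per lookup. */
static uint16_t square(uint8_t x)
{
    return (uint16_t)((unsigned)x * (unsigned)x);
}
```

The trade only pays off when the computation is cheap relative to the table size; for irregular data, compression plus decompression into RAM (as suggested above) may fit better.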

If you still need more space, compile with optimizations turned on and compare the generated assembly against the unoptimized output. Then re-write the C where the biggest changes took place, so that the compiler produces the same savings, via your tricky C re-writes, with optimization turned off.

For instance, you may have several 'if' statements that make similar comparisons:

if(A && B && (C || D)){}
if(A && !B && (C || D)){}
if(!A && B && (C || D)){}

Creating a new variable and performing the common comparison in advance saves the compiler from duplicating code:

E = (C || D);

if(A && B && E){}
if(A && !B && E){}
if(!A && B && E){}

This is one of the optimizations the compiler does for you automatically if you turn it on. There are many, many others, and you might consider reading a bit of compiler theory if you want to learn how to do this by hand in the C code.

Adam Davis
The only caveat would be that applying any of these techniques in time-critical sections will warrant additional testing.
Harper Shelby
I suppose I should add a disclaimer, but the reality is that any change along these lines is trading size for performance.
Adam Davis
If you disable inline functions, and turn macros into functions, aren't you increasing the runtime memory use (more function calls = new stackframes). I'm not sure about this stuff though.
Using register variables (if available) for your inline and macro replacement functions can reduce your stackframe size down to just the return address.
S.Lott
RAM is not currently as tight as flash code space in my specific case, but I appreciate generic answers as well.
Judge Maygarden
Still worth keeping in mind, as devinb points out some of these will increase RAM usage, and in the case of generation functions you're trading flash for ram (generate once at beginning, leave in memory for fast access)
Adam Davis
In regards to variable size, a lot of the floating point math being changed to fixed-point would probably help a lot, right?
Judge Maygarden
monjardin - Only if you get rid of all the floating point, so the linker won't link any floating-point code. Once you add one floating-point variable with a multiply, though, the FP multiply routine is included, and using it later doesn't incur as much of a hit as the first one.
Adam Davis
That's good to know as it would have been a lot of work for naught!
Judge Maygarden
The code fits now. Thanks for all the ideas!
Judge Maygarden
Glad to have helped!
Adam Davis
@Adam: Nice work. I would only add that performance concerns are often misplaced, so one shouldn't worry about it unless it becomes a problem.
Mike Dunlavey
A: 

Pay attention to macros. A single macro expansion can produce a lot of code. If you find such macros, try to rewrite them so that their size is minimized and their functionality is moved into functions.
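A hedged illustration of the point: a multi-statement macro is expanded in full at every use site, so N uses cost N copies of the generated code, while a function exists once in flash. The names here are made up for the example.

```c
#include <stdint.h>

/* Macro: the full expression is duplicated at every use site. */
#define CLAMP_ADD(a, b, max) (((a) + (b) > (max)) ? (max) : ((a) + (b)))

/* Function: one copy lives in flash; each use site is just a call
   instruction (at the cost of call overhead per use). */
static uint16_t clamp_add(uint16_t a, uint16_t b, uint16_t max)
{
    uint32_t sum = (uint32_t)a + b;   /* widen to avoid overflow */
    return (sum > max) ? max : (uint16_t)sum;
}
```

As noted elsewhere in this thread, this trades flash for stack and cycles, so it belongs outside time-critical paths.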

Pay attention to duplicate code - both copy-pasted and logically duplicate. Try to separate duplicate code into functions.

Check whether the compiler supports inlining and whether it can be turned off.

sharptooth
A: 

Compiler optimization that triggers bugs? That's strange. Get a map of your program and see whether you should target data or code. Look for duplicated code. Look for code with a similar goal. One example is the busybox code, which aims for a small memory footprint.

It favors size over readability, so it sometimes gets quite ugly, with gotos and so on.

shodanex
It's not at all uncommon for embedded compilers to have bugs. The less widely used the chip, the more likely the compiler is to have issues. This is less common today than in times past (especially if gcc is your compiler), but it's still a worry for uncommon (less well tested) platforms.
Michael Kohne
A: 

See http://stackoverflow.com/questions/404615/485660#485660

Mike Dunlavey
This question is in regards to static code size, not runtime memory usage.
Judge Maygarden
Well, one way to reduce static code size is to replace it with byte code and a small interpreter.
Mike Dunlavey
Another way is code generation, if you have configuration info that is constant over the life of an installation.
Mike Dunlavey
I'm removing the -1 because my question wasn't specific enough. Writing an interpreter for an MCU with 2 kB of RAM is not a task I'm going to pursue. Especially since the modification is essentially done at this point!
Judge Maygarden
I once had to program a 68K-based display processor for a lottery terminal. I did it in C with a byte-code interpreter.
Mike Dunlavey
I'm curious, how was that an advantageous approach (e.g. development time/cost, reduced application complexity, etc)?
Judge Maygarden
1) It saves space, because the interpreter can be simple and the byte code optimal for the app. 2) Factoring the app into those two pieces can simplify development of the whole. 3) If the app generates the code, it can simplify the app. 4) If byte code is in RAM, it can be dynamically downloaded.
Mike Dunlavey
... for point 3 (which is not really relevant to your problem), I took over a project and reduced it by an order of magnitude by having C generate C. Of course execution was fast, but so was development.
Mike Dunlavey
+5  A: 

Generally: make use of your linker map or tools to figure out what your largest/most numerous symbols are, and then possibly take a look at them using a disassembler. You'd be surprised at what you find this way.

With a bit of perl or the like, you can make short work of a .xMAP file or the results of "objdump" or "nm", and re-sort them various ways for pertinent info.


Specific to small instruction sets: Watch for literal pool usage. While changing from e.g. the ARM (32 bits per instruction) instruction set to the THUMB (16 bits per instruction) instruction set can be useful on some ARM processors, it reduces the size of the "immediate" field.

Suddenly something that would be a direct load from a global or static becomes very indirect; it must first load the address of the global/static into a register, then load from that, rather than just encoding the address directly in the instruction. So you get a few extra instructions and an extra entry in the literal pool for something that normally would have been one instruction.

A strategy to fight this is to group globals and statics together into structures; this way you only store one literal (the address of your global structure) and compute offsets from that, rather than storing many different literals when you're accessing multiple statics/globals.
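A minimal sketch of that grouping, with hypothetical names: three separate globals need three literal-pool entries on Thumb, while one struct needs a single base address plus small immediate offsets.

```c
#include <stdint.h>

/* Before (hypothetical): three separate globals, so Thumb code loads a
   separate 32-bit address from the literal pool for each access.
   static uint32_t tick_count;
   static uint16_t adc_reading;
   static uint8_t  mode; */

/* After: one literal-pool entry (the struct's base address); members
   are reached through small offsets from a base register. */
struct GlobalTable {
    uint32_t tick_count;   /* base + 0 */
    uint16_t adc_reading;  /* base + 4 */
    uint8_t  mode;         /* base + 6 */
};

static struct GlobalTable g;

static void tick(uint16_t adc)
{
    g.tick_count++;        /* all three accesses share one base load */
    g.adc_reading = adc;
    g.mode = 1;
}
```

Whether the compiler actually shares the base load depends on register pressure, so checking the disassembly (as this answer suggests) is the way to confirm the win.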

We converted our "singleton" classes from managing their own instance pointers to just being members in a large "struct GlobalTable", and it made a noticeable difference in code size (a few percent) as well as performance in some cases.


Otherwise: keep an eye out for static structures and arrays of non-trivially-constructed data. Each one of these typically generates huge amounts of .sinit code ("invisible functions", if you will) that are run before main() to populate these arrays properly. If you can use only trivial data types in your statics, you'll be far better off.

This is again something that can be easily identified by using a tool over the results of "nm" or "objdump" or the like. If you have a ton of .sinit stuff, you'll want to investigate!


Oh, and -- if your compiler/linker supports it, don't be afraid to selectively enable optimization or smaller instruction sets for just certain files or functions!

leander
+1 A good rubber-meets-the-road approach.
Mike Dunlavey
+1 The linker map file is the place to start. It will show you where the space is being used.
Conor OG
A: 

The above answers claim that turning on compiler optimization reduced the code size. Given all the documentation and experience I have had with embedded-systems TI DSP programming, I know for a fact that turning on optimization will INCREASE your code size (for TI DSP chips)!


Let me explain:

The TI TMS320C6416 DSP has 9 compiler flags that will affect your code size.

  1. 3 different flags for Optimization
  2. 3 different flags for Debugging
  3. 3 different flags for Code size

For my compiler, when you turn on optimization level three the documentation states:

  1. Auto-inlining for certain functions will occur --> will increase code size
  2. Software pipelining is enabled --> will increase code size

What is software pipelining?

That is where the compiler does things in assembly that make for loops execute significantly faster (up to a couple of times faster), but at the cost of greater code size. I suggest reading about software pipelining on Wikipedia (look for loop unrolling, prologue, and epilogue).
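To make the size cost concrete, here is a rough sketch of loop unrolling, one of the transformations involved. The functions and the factor of 4 are illustrative, not what any particular TI compiler emits.

```c
#include <stdint.h>

/* Rolled loop: minimal code size, one compare-and-branch per element. */
static uint32_t sum_rolled(const uint8_t *p, int n)
{
    uint32_t s = 0;
    for (int i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Unrolled by 4, roughly what an optimizer might produce: fewer
   branches per element, so it runs faster, but the loop body (and the
   flash it occupies) is about four times larger, plus an epilogue loop
   for the leftover elements. */
static uint32_t sum_unrolled(const uint8_t *p, int n)
{
    uint32_t s = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4)
        s += (uint32_t)p[i] + p[i + 1] + p[i + 2] + p[i + 3];
    for (; i < n; i++)      /* epilogue: handle n % 4 leftovers */
        s += p[i];
    return s;
}
```

Both functions compute the same result; the difference is purely the speed/size trade the answer describes.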

So check your documentation to make sure the optimization isn't making your code larger.


Another suggestion is to look for compiler flags that relate to code size. If you have code-size compiler flags, make sure to crank them up to the highest setting. Usually compiling for code size means your code will execute slower... but you may have to do that.

Trevor Boyd Smith
Using the high level optimization for size of IAR EW430 for a TI MSP430 target did reduce compiled code size, but that is not the solution I chose.
Judge Maygarden
Looking at the compiler flags that option generates, disabling inlining and loop unrolling are among them. Those are actions already noted in current answers.
Judge Maygarden
Sounds like your problem is already solved. For my own curiosity, in your compiler turning on optimization resulted in disabling inline/loop unrolling/etc ?
Trevor Boyd Smith
Yes, it disabled inline and loop unrolling when set to optimize for size. When set to optimize balanced or for speed they were enabled.
Judge Maygarden