I have a legacy firmware application that requires new functionality. The application was already near the device's limited flash capacity, and the few new functions and variables pushed it over the edge. Turning on compiler optimization does the trick, but the customer is wary of it because optimizations have caused failures in the past. So, what are some common things to look for when refactoring C code to produce smaller output?
Refactoring out duplicate code should have the biggest impact on your program's memory footprint.
- Use generation functions instead of data tables where possible
- Disable inline functions
- Turn frequently used macros into functions
- Reduce resolution for variables larger than the native machine size (e.g., on an 8-bit micro, try to get rid of 16- and 32-bit variables, which double or quadruple some code sequences)
- If the micro has a smaller instruction set (e.g., ARM Thumb), enable it in the compiler
- If the memory is segmented (i.e., paged or nonlinear) then
- Rearrange code so that fewer global calls (larger call instructions) need to be used
- Rearrange code and variable usage to eliminate global memory calls
- Re-evaluate global memory usage - if it can be placed on the stack then so much the better
- Make sure you're compiling with debug turned off - on some processors it makes a big difference
- Compress data that can't be generated on the fly - then decompress into ram at startup for fast access
- Delve into the compiler options - it may be that every call is automatically global, but you might be able to safely disable that on a file by file basis to reduce size (sometimes significantly)
If you still need more space than you can get with optimizations turned on, then look at the generated assembly versus the unoptimized code. Then re-write the code where the biggest changes took place, so that the compiler generates the same optimizations from tricky C re-writes even with optimization turned off.
For instance, you may have several 'if' statements that make similar comparisons:
```c
if (A && B && (C || D)) {}
if (A && !B && (C || D)) {}
if (!A && B && (C || D)) {}
```
Then creating a new variable and making some comparisons in advance will save the compiler from duplicating code:
```c
E = (C || D);
if (A && B && E) {}
if (A && !B && E) {}
if (!A && B && E) {}
```
This is one of the optimizations the compiler does for you automatically if you turn it on. There are many, many others, and you might consider reading a bit of compiler theory if you want to learn how to do this by hand in the C code.
Pay attention to macros. They can produce a lot of code from just one macro expansion. If you find such macros, try to rewrite them so that their size is minimized and their functionality is moved into functions.
Pay attention to duplicate code - both copy-pasted and logically duplicate. Try to separate duplicate code into functions.
Check whether the compiler supports inlining and whether it can be turned off.
Compiler optimisation that triggers bugs? That's strange. Get a map of your program, and see whether you should target data or code. Look for duplicated code. Look for code with a similar goal. One example of this is the busybox code, which aims for a small memory footprint.
It favours size over readability, so it sometimes gets quite ugly, with gotos and so on.
Generally: make use of your linker map or tools to figure out what your largest/most numerous symbols are, and then possibly take a look at them using a disassembler. You'd be surprised at what you find this way.
With a bit of perl or the like, you can make short work of a .xMAP file or the results of "objdump" or "nm", and re-sort them various ways for pertinent info.
Specific to small instruction sets: Watch for literal pool usage. While changing from e.g. the ARM (32 bits per instruction) instruction set to the THUMB (16 bits per instruction) instruction set can be useful on some ARM processors, it reduces the size of the "immediate" field.
Suddenly something that would be a direct load from a global or static becomes very indirect; it must first load the address of the global/static into a register, then load from that, rather than just encoding the address directly in the instruction. So you get a few extra instructions and an extra entry in the literal pool for something that normally would have been one instruction.
A strategy to fight this is to group globals and statics together into structures; this way you only store one literal (the address of your global structure) and compute offsets from that, rather than storing many different literals when you're accessing multiple statics/globals.
We converted our "singleton" classes from managing their own instance pointers to just being members in a large "struct GlobalTable", and it made a noticeable difference in code size (a few percent) as well as performance in some cases.
Otherwise: keep an eye out for static structures and arrays of non-trivially-constructed data. Each one of these typically generates huge amounts of .sinit code ("invisible functions", if you will) that are run before main() to populate these arrays properly. If you can use only trivial data types in your statics, you'll be far better off.
This is again something that can be easily identified by using a tool over the results of "nm" or "objdump" or the like. If you have a ton of .sinit stuff, you'll want to investigate!
Oh, and -- if your compiler/linker supports it, don't be afraid to selectively enable optimization or smaller instruction sets for just certain files or functions!
The above answers claim "turning on compiler optimization [reduced the code size]". Given all the documentation and experience I have had with embedded TI DSP programming, I know for a fact that turning on optimization will INCREASE your code size (for TI DSP chips)!
Let me explain:
The TI TMS320C6416 DSP has 9 compiler flags that will affect your code size.
- 3 different flags for Optimization
- 3 different flags for Debugging
- 3 different flags for Code size
For my compiler, when you turn on optimization level three the documentation states:
- Auto-inlining for certain functions will occur --> will increase code size
- Software pipelining is enabled --> will increase code size
What is software pipelining?
That is where the compiler does things in assembly that make for loops execute significantly faster (up to a couple of times faster), but at the cost of greater code size. I suggest reading about software pipelining on Wikipedia (look for loop unrolling, prolog, and epilog).
So check your documentation to make sure the optimization isn't making your code larger.
Another suggestion is to look for compiler flags that relate to code size. If you have code-size compiler flags, make sure to crank them up to the highest setting. Usually compiling for code size means your code will execute slower... but you may have to do that.