views:

1002

answers:

16

When you say "optimization", people tend to think "speed". But what about embedded systems where speed isn't all that critical, but memory is a major constraint? What are some guidelines, techniques, and tricks that can be used for shaving off those extra kilobytes in ROM and RAM? How does one "profile" code to see where the memory bloat is?

P.S. One could argue that "prematurely" optimizing for space in embedded systems isn't all that evil, because you leave yourself more room for data storage and feature creep. It also allows you to cut hardware production costs because your code can run on smaller ROM/RAM.

P.P.S. References to articles and books are welcome too!

P.P.P.S. These questions are closely related: 404615, 1561629

+1  A: 

Firstly, tell your compiler to optimize for code size. GCC has the -Os flag for this.

Everything else is at the algorithmic level - use similar tools to those you would use for finding memory leaks, but instead look for allocations and frees that you could avoid.

Also take a look at commonly used data structure packing - if you can shave a byte or two off them, you can cut down memory use substantially.

Anon.
+11  A: 

There are many things you can do to reduce your memory footprint; I'm sure people have written books on the subject, but a few of the major ones are:

  • Compiler options to reduce code size (including -Os and packing/alignment options)

  • Linker options to strip dead code

  • If you're loading from flash (or ROM) to ram to execute (rather than executing from flash), then use a compressed flash image, and decompress it with your bootloader.

  • Use static allocation: a heap is an inefficient way to allocate limited memory, and it might fail due to fragmentation if memory is constrained.

  • Tools to find the stack high-watermark (typically they fill the stack with a pattern, execute the program, then see where the pattern remains), so you can set the stack size(s) optimally

  • And of course, optimising the algorithms you use for memory footprint (often at expense of speed)
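The stack high-watermark technique is simple enough to sketch. Here the "stack" is just a plain array so the idea can be shown portably; on a real target the startup code would paint the actual stack region, and all names are illustrative:

```c
#include <stddef.h>
#include <string.h>

/* Stack high-watermark sketch: paint a region with a known pattern,
 * run the workload, then scan for how much of the pattern survived.
 * (A used byte that happens to equal the fill pattern will slightly
 * under-report usage.) */
#define STACK_SIZE 256u
#define FILL_BYTE  0xAAu

static unsigned char fake_stack[STACK_SIZE];

void paint_stack(void) {
    memset(fake_stack, FILL_BYTE, STACK_SIZE);
}

/* Assuming a downward-growing stack: untouched pattern sits at the
 * low addresses, so count it from the bottom up. */
size_t stack_high_watermark(void) {
    size_t untouched = 0;
    while (untouched < STACK_SIZE && fake_stack[untouched] == FILL_BYTE)
        ++untouched;
    return STACK_SIZE - untouched;     /* bytes actually used */
}
```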

Autopulated
On the other hand, a heap offers possibilities for reuse of memory that static allocation does not.
anon
Well, a heap makes it _easier_ to re-use memory, without doing so explicitly.
Autopulated
Static allocation (typically an array of structs) sometimes centralizes things and breaks encapsulation for me. It's hard to explain. At the moment, I can't think of a good example I could type in a few sentences.
Emile Cormier
Right on about the fragmentation angle: a major reason many embedded systems that have to run for years refuse to use dynamic allocation.
Edan Maor
That, and by not having to handle failure everywhere, you save about 30% of your code size ;-)
Steve Jessop
@Emile: In very limited environments you often have to break "good" programming practices because of the tight limitations.
rstevens
+4  A: 

Compile in VS with /Os. Oftentimes this is even faster than optimizing for speed anyway, because smaller code size == less paging.

Comdat folding should be enabled in the linker (it is by default in release builds)

Be careful about data structure packing; often this results in the compiler generating more code (== more memory) to access unaligned memory. Using 1 bit for a boolean flag is a classic example.

Also, be careful when choosing a memory efficient algorithm over an algorithm with a better runtime. This is where premature optimizations come in.

Terry Mahaffey
+24  A: 

My experience from an extremely constrained embedded memory environment:

  • Use fixed size buffers. Don't use pointers or dynamic allocation because they have too much overhead.
  • Use the smallest int data type that works.
  • Don't ever use recursion. Always use looping.
  • Don't pass lots of function parameters. Use globals instead. :)
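A minimal sketch of the fixed-buffer and smallest-type points above (all names and sizes are made up):

```c
#include <stdint.h>
#include <string.h>

/* Fixed-size buffer instead of dynamic allocation: the maximum is
 * known at compile time, so there is no heap and no failure path
 * beyond an explicit overflow check. */
#define MSG_MAX 32u

static uint8_t msg_buf[MSG_MAX];
static uint8_t msg_len;                /* smallest type that holds 0..32 */

int msg_store(const uint8_t *data, uint8_t len) {
    if (len > MSG_MAX)
        return -1;                     /* caller handles overflow */
    memcpy(msg_buf, data, len);
    msg_len = len;
    return 0;
}

/* Looping instead of recursion: stack usage stays constant. */
uint32_t sum_first_n(uint8_t n) {
    uint32_t acc = 0;
    for (uint16_t i = 1; i <= n; ++i)  /* 16-bit i avoids wrap at n == 255 */
        acc += i;
    return acc;
}
```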
Tim Lovell-Smith
You're gonna get flamed on that last one, lol. :)
Emile Cormier
+1 for personal experience
stacker
I presumed everyone was talking from experience... what other qualification would they have?! :D
Autopulated
Actually, if you think about how people used to program on memory constrained systems (and the subsequent two-digit year issues, but that's a different story) this makes perfect sense. This type of program architecture will be much smaller. You would really be quite surprised at what people managed to fit onto really small computer systems (back in the days of real programmers ;-).
ConcernedOfTunbridgeWells
One alternative to globals or lots of function parameters is to use parameter blocks. Basically, you create a `struct` that can be used by several functions, each using whatever parameters they need from the PB. Then the calling code can set up the PB and pass it to one or more functions. The low-level filesystem calls in the old Mac OS did this from the start to help pack everything into the 128K of the original Macintosh. It's like ghetto classes, except that (unlike class methods), you could pass two PBs to some functions.
Mike DeSimone
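The parameter-block idea could be sketched like this; everything here (IoParamBlock, dev_read, fake_device) is invented for illustration, not from any real API:

```c
#include <stdint.h>

/* Parameter-block sketch: one struct carries the arguments that a
 * family of related functions needs, so each call passes a single
 * pointer and the caller sets the block up once. */
typedef struct {
    uint16_t offset;        /* where to start reading */
    uint16_t count;         /* how many bytes */
    uint8_t  *buffer;       /* caller-supplied storage */
    int16_t  result;        /* filled in by the callee */
} IoParamBlock;

static const uint8_t fake_device[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

void dev_read(IoParamBlock *pb) {
    if ((uint32_t)pb->offset + pb->count > sizeof fake_device) {
        pb->result = -1;    /* error reported through the block too */
        return;
    }
    for (uint16_t i = 0; i < pb->count; ++i)
        pb->buffer[i] = fake_device[pb->offset + i];
    pb->result = 0;
}
```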
Yes to all of those, and: don't (ever) use floating point math, make sure your stucts are packed tight, use bit-fields with abandon, think hard before you create another variable; if you can get the info you need from an existing one, do that.
ArielP
If you have 256 bytes of RAM that already hold the C stack, globals aren't flame material at all. @Ariel: doesn't FP math depend on the actual platform?
peterchen
@Autopulated: There's this thing called "thinking".
peterchen
@peterchen Yes, I would expect so, but for every platform some code has to be included (beyond what you write) to be able to perform the FP operations. E.g. from a previous answer on SO, here's the code included on my platform: mine starts at line number 223 and memory address 001BC with _floatsisf, continues through several additional labels (_fpack, _divsf3, etc.) and ends in _funpack, last line at 535 and memory address 0042C. If you can handle (42C - 1BC = 0x270 =) 624 bytes of lost program space, great, but some chips have just 2k of space and that's not an option.
ArielP
Globals are really good for embedded, as are fixed structs, never malloc, keeping local variables to a minimum, etc. - basically everything you said. To add to that, a good speed optimizer can often produce a good size optimization as well, as far as code generation goes; the above rules manage the things the compiler cannot, for either type of optimization.
dwelch
+2  A: 

Profiling code or data bloat can be done via map files: for gcc see here, for VS see here.
I have yet to see a useful tool for size profiling though (and don't have time to fix my VS AddIn hack).

Georg Fritzsche
Map files can also help with data bloat - it's easy to see where you've allocated large chunks of memory to determine where you might be able to most effectively target your reduction efforts.
Michael Burr
Thanks, that should have been in there - added.
Georg Fritzsche
+6  A: 

Generate a map file from your linker. It will show how the memory is allocated. This is a good start when optimizing for memory usage. It also will show all the functions and how the code-space is laid out.

Thomas Matthews
+4  A: 

As with all optimization, first optimize algorithms, second optimize the code and data, finally optimize the compiler.

I don't know what your program does, so I can't advise on algorithms. Many others have written about the compiler. So, here's some advice on code and data:

  • Eliminate redundancy in your code. Any repeated code that's three or more lines long, repeated three times in your code, should be changed to a function call.
  • Eliminate redundancy in your data. Find the most compact representation: merge read-only data, and consider using compression codes.
  • Run the code through a regular profiler; eliminate all code that isn't used.
Chip Uni
PLEASE follow this advice - I'm working on a system where the original developers (20 yrs ago) were so concerned about stack that they duplicated code everywhere! It's a nightmare of epic proportions.
Michael Kohne
+2  A: 

If you're looking for a good way to profile your application's heap usage, check out valgrind's massif tool. It will let you take snapshots of your app's memory usage profile over time, and you can then use that information to better see where the "low hanging fruit" is, and aim your optimizations accordingly.

Jeremy Friesner
+9  A: 

A few obvious ones

  • If speed isn't critical, execute the code directly from flash.
  • Declare constant data tables using const. This will avoid the data being copied from flash to RAM
  • Pack large data tables tightly using the smallest data types, and in the correct order to avoid padding.
  • Use compression for large sets of data (as long as the compression code doesn't outweigh the data)
  • Turn off exception handling and RTTI.
  • Did anybody mention using -Os? ;-)

Folding knowledge into data

One of the rules of Unix philosophy can help make code more compact:

Rule of Representation: Fold knowledge into data so program logic can be stupid and robust.

I can't count how many times I've seen elaborate branching logic, spanning many pages, that could've been folded into a nice compact table of rules, constants, and function pointers. State machines can often be represented this way (State Pattern). The Command Pattern also applies. It's all about the declarative vs imperative styles of programming.
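A table-driven state machine along these lines might look like the following sketch; the states and events are invented for illustration, and per-transition actions could be added as function pointers in the same table:

```c
/* Knowledge folded into data: the transition logic lives in a const
 * table, so the dispatch code is one line with no branching. */
typedef enum { ST_IDLE, ST_RUN, ST_DONE, ST_COUNT } State;
typedef enum { EV_START, EV_STOP, EV_COUNT } Event;

static const State next_state[ST_COUNT][EV_COUNT] = {
    /* EV_START   EV_STOP  */
    {  ST_RUN,    ST_IDLE  },   /* from ST_IDLE */
    {  ST_RUN,    ST_DONE  },   /* from ST_RUN  */
    {  ST_DONE,   ST_DONE  },   /* from ST_DONE (terminal) */
};

State step(State s, Event e) {
    return next_state[s][e];    /* the "stupid and robust" logic */
}
```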

Log codes + binary data instead of text

Instead of logging plain text, log event codes and binary data. Then use a "phrasebook" to reconstitute the event messages. The messages in the phrasebook can even contain printf-style format specifiers, so that the event data values are displayed neatly within the text.
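A sketch of the phrasebook scheme (event codes, format strings, and the record layout are all illustrative):

```c
#include <stdint.h>
#include <stdio.h>

/* The target stores only a small event code plus raw values; the
 * host side looks the code up in a phrasebook of printf-style
 * formats to reconstitute the text. */
enum { EV_TEMP = 0, EV_VOLT = 1 };

static const char *phrasebook[] = {
    "temperature %d C on sensor %d",
    "voltage dropped to %d mV",
};

typedef struct { uint8_t code; int16_t a, b; } LogRecord;

/* Host-side rendering; on the target, only the LogRecord is kept. */
int render(const LogRecord *r, char *out, size_t n) {
    return snprintf(out, n, phrasebook[r->code], r->a, r->b);
}
```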

Minimize the number of threads

Each thread needs its own memory block for a stack and TSS. Where you don't need preemption, consider making your tasks execute co-operatively within the same thread (cooperative multi-tasking).

Use memory pools instead of hoarding

To avoid heap fragmentation, I've often seen separate modules hoard large static memory buffers for their own use, even when the memory is only occasionally required. A memory pool could be used instead so that the memory is only used "on demand". However, this approach may require careful analysis and instrumentation to make sure pools are not depleted at runtime.

Dynamic allocation only at initialization

In embedded systems where only one application runs indefinitely, you can use dynamic allocation in a sensible way that doesn't lead to fragmentation: Just dynamically allocate once in your various initialization routines, and never free the memory. reserve() your containers to the correct capacity and don't let them auto-grow. If you need to frequently allocate/free buffers of data (say, for communication packets), then use memory pools. I once even extended the C/C++ runtimes so that it would abort my program if anything tried to dynamically allocate memory after the initialization sequence.
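A portable sketch of the "lock the heap after init" idea: hooking the real C/C++ runtime, as described above, is toolchain-specific, so this version just funnels allocation through one function that code opts into. Names are illustrative:

```c
#include <stdlib.h>

/* All allocation goes through init_alloc(), which is locked once
 * startup completes - after that, any allocation attempt fails. */
static int init_done = 0;

void *init_alloc(size_t n) {
    if (init_done)
        return NULL;        /* or abort() on a real target */
    return malloc(n);
}

void end_of_init(void) {
    init_done = 1;          /* no dynamic allocation past this point */
}
```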

Emile Cormier
"Log codes + binary data instead of text" - we used to run `strings` on the binaries, sort the result by length, shoot the longest string in the image, repeat until so bored you have to go and do something more interesting instead. That wasn't C++, although we did have mangled function names to ignore.
Steve Jessop
+5  A: 

Here's a book on the subject Small Memory Software: Patterns for systems with limited memory.

Nikolai N Fetissov
+2  A: 

Check this answer.

Mike Dunlavey
A: 

Bear in mind the implementation cost of some C++ features, such as virtual function tables and overloaded operators that create temporary objects.

PhilMY
+1  A: 

On top of what others suggest:

Limit use of C++ features; write as in ANSI C with minor extensions. NO templates.

Align your structures by hand, or use #pragma pack

{char a; long b; char c; long d; char e; char f; } //is 20 bytes with 4-byte longs (padding after each char, plus tail padding),
{char a; char c; char e; char f; long b; long d; } //is 12 bytes.

For the same reason, use a centralized global data storage structure instead of scattered local static variables.

Intelligently balance usage of malloc()/new and static structures.

If you need a subset of functionality of given library, consider writing your own.

Unroll short loops.

for (i = 0; i < 3; i++) { transform_vector(i); }

is longer than

transform_vector(0);
transform_vector(1);
transform_vector(2);

Don't do that for longer ones.

Pack multiple files together to let the compiler inline short functions and perform various optimizations the linker can't.

SF.
Linkers *for these platforms* can't. Also, banning templates completely is ignorant, I'd say NO templates unless you know what you do.
peterchen
You can definitely use templates where you'd otherwise use function-like macros. It shouldn't generate more bloat, and you get the extra type safety.
Emile Cormier
If you specify -Os, shouldn't the compiler know when to unroll loops for smaller space?
Emile Cormier
+2  A: 

OK, most were mentioned already, but here is my list anyway:

  • Learn what your compiler can do. Read compiler documentation, experiment with code examples. Check settings.
  • Check generated code at the target optimization level. Sometimes the results are surprising, and often it turns out optimization actually slows things down (or just takes too much space).
  • Choose a suitable memory model. If you target a really small, tight system, the large or huge memory model might not be the best choice (but is usually the easiest to program for...)
  • Prefer static allocation. Use dynamic allocation only at startup, or over a statically allocated buffer (a pool, or a static buffer sized for the maximum number of instances).
  • Use C99-style data types from <stdint.h>. Use the smallest sufficient data type for storage; local variables like loop variables are sometimes more efficient with "fast" data types.
  • Select inline candidates. Some parameter-heavy functions with relatively simple bodies are better off inlined. Or consider passing a structure of parameters. Globals are also an option, but be careful - tests and maintenance can become difficult if anyone using them isn't disciplined enough.
  • Use the const keyword well, and be aware of array initialization implications.
  • Generate a map file, ideally also with module sizes. Check also what is included from the CRT (is it really necessary?).
  • Recursion: just say no (limited stack space).
  • Floating point numbers - prefer fixed-point math. FP tends to pull in and call a lot of code (even for simple addition or multiplication).
  • C++: you should know C++ VERY WELL. If you don't, program constrained embedded systems in C, please. Those who dare must be careful with all advanced C++ constructs (inheritance, templates, exceptions, overloading, etc.). Consider close-to-HW code to be rather Super-C, and use C++ where it counts: in high-level logic, GUI, etc.
  • Disable whatever you don't need in compiler settings (be it parts of libraries, language constructs, etc.)

Last but not least - while hunting for smallest possible code size - don't overdo it. Watch out also for performance and maintainability. Over-optimized code tends to decay very quickly.
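To illustrate the fixed-point bullet above, a minimal sketch using Q16.16 values in a 32-bit integer; the 16.16 format choice is an assumption - pick the split that fits your value ranges:

```c
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits, 16 fractional bits.
 * Multiplication needs a 64-bit intermediate to avoid overflow. */
typedef int32_t q16_16;

#define Q_ONE 65536             /* 1.0 in Q16.16 */

static inline q16_16 q_from_int(int32_t i) { return i * Q_ONE; }
static inline int32_t q_to_int(q16_16 q)   { return q / Q_ONE; }

static inline q16_16 q_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * b) >> 16);
}
```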

MaR
A: 

Along with what everyone else said, I'd just like to add: don't use virtual functions, because with virtual functions a vtable must be created, which can take up who knows how much space.

Also watch out for exceptions. With gcc, I don't believe each try-catch block adds to the size (except for 2 function calls per try-catch), but there is a fixed-size function which must be linked in, which could be wasting precious bytes.

Earlz
There is only one vtable for the ancestry of classes, not per object (not sure for multiple-inheritance, though). The space for a vtable is one function pointer per virtual method, per class. A polymorphic object only holds one extra pointer to that common vtable. IMHO, the vtable + vtable-pointers is no bigger than the hand-written alternative using "type codes", switch statements, and dispatch tables (except maybe for trivial cases).
Emile Cormier
Re virtual functions, I humbly think that a better guideline would be to not use virtual functions needlessly. Only use them where you need polymorphism.
Emile Cormier
+1  A: 

Don't be afraid to write 'little languages' inside your program. Sometimes a table of strings and an interpreter can get a LOT done. For instance, in a system I've worked on, we have a lot of internal tables, which have to be accessed in various ways (loop through, whatever). We've got an internal system of commands for referencing the tables that forms a sort of half-way language that's quite compact for what it gets done.

But, BE CAREFUL! Know that you are writing such things (I wrote one accidentally, myself), and DOCUMENT what you are doing. The original developers do NOT seem to have been conscious of what they were doing, so it's much harder to manage than it should be.

Michael Kohne
I agree with Michael: Documentation takes NO space in the final, compiled program. Use lots.
Chip Uni
I don't even need lots. ANY would be nice some days.
Michael Kohne