What are some best practices for "memory-efficient C programming"? Mostly for embedded/mobile devices: what should the guidelines be for keeping memory consumption low?
I guess there should be separate guidelines for a) code memory and b) data memory.
Pre-allocating all memory upfront (i.e. no malloc calls except for startup initialization) is definitely helpful for deterministic memory usage. Otherwise, different architectures provide techniques to help out. For example, certain ARM processors provide an alternative instruction set (Thumb) that almost halves code size by using 16-bit instructions instead of the normal 32 bits. Of course, speed is sacrificed in doing so...
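A minimal sketch of the pre-allocation idea: a fixed pool of equal-sized blocks reserved at compile time, so the application never calls malloc after startup. The pool dimensions and function names are illustrative, not from any particular library.

```c
#include <stddef.h>
#include <stdint.h>

/* All memory is reserved statically, so usage is deterministic
 * and there is no heap fragmentation. Sizes are illustrative. */
#define POOL_BLOCKS 32
#define BLOCK_SIZE  64

static uint8_t pool[POOL_BLOCKS][BLOCK_SIZE];
static uint8_t in_use[POOL_BLOCKS];

void *pool_alloc(void)
{
    for (int i = 0; i < POOL_BLOCKS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return pool[i];
        }
    }
    return NULL; /* pool exhausted -- caller must handle this */
}

void pool_free(void *p)
{
    ptrdiff_t i = ((uint8_t (*)[BLOCK_SIZE])p) - pool;
    if (i >= 0 && i < POOL_BLOCKS)
        in_use[i] = 0;
}
```

Because every block is the same size, allocation and free are O(n) scans at worst and the total footprint is known at link time.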
Most likely you will need to choose your algorithms carefully. Aim for algorithms which have O(1) or O(log n) memory usage (i.e. low). For example, contiguous resizable arrays (e.g. std::vector) in most cases require less memory than linked lists.
Sometimes, using look-up tables may be more beneficial to both code size and speed. If you only need 64 entries in a LUT, that's 16*4 bytes for sin/cos/tan (use symmetry, so you only store a quarter of the wave!) compared to a large sin/cos/tan function.
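To make the symmetry point concrete, here is a sketch of a quarter-wave sine table in Q15 fixed point: roughly 16 stored entries cover a full 64-step circle. The table values are rounded approximations and the angle encoding (1/64ths of a turn) is just one possible convention.

```c
#include <stdint.h>

/* Quarter-wave sine LUT in Q15: entries are approximately
 * 32767 * sin(i * pi / 32) for i = 0..16. Symmetry recovers
 * the other three quadrants, so we store ~16 values, not 64. */
static const int16_t sin_q15[17] = {
        0,  3212,  6393,  9512, 12539, 15446, 18204, 20787,
    23170, 25329, 27245, 28898, 30273, 31356, 32137, 32609, 32767
};

/* angle is in 1/64ths of a full turn (0..63) */
int16_t sin_lut(unsigned angle)
{
    unsigned a = angle & 63;
    unsigned quadrant = a >> 4;   /* 16 steps per quadrant */
    unsigned idx = a & 15;
    switch (quadrant) {
    case 0:  return  sin_q15[idx];        /* rising,  positive */
    case 1:  return  sin_q15[16 - idx];   /* falling, positive */
    case 2:  return -sin_q15[idx];        /* rising,  negative */
    default: return -sin_q15[16 - idx];   /* falling, negative */
    }
}
```

cos falls out for free as sin_lut(angle + 16), so one small table serves both functions.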
Compression sometimes helps. Simple algorithms like RLE are easy to compress/decompress when read sequentially.
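As an illustration, a minimal (count, value) RLE codec. Real formats usually add an escape mechanism for incompressible literal runs, which this sketch omits, and the caller is assumed to provide large enough output buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode each run as a (count, value) byte pair.
 * Returns the number of output bytes written. */
size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        uint8_t v = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == v && run < 255)
            run++;
        out[o++] = (uint8_t)run;
        out[o++] = v;
        i += run;
    }
    return o;
}

/* Decode sequentially -- well suited to streaming reads. */
size_t rle_decode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        for (uint8_t c = 0; c < in[i]; c++)
            out[o++] = in[i + 1];
    return o;
}
```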
If you're dealing with graphics or audio, consider different formats. Paletted or bitpacked* graphics may be a good tradeoff for quality, and palettes can be shared across many images, further reducing data size. Audio may be reduced from 16-bit to 8-bit or even 4-bit, and stereo can be converted to mono. Sampling rates can be reduced from 44.1 kHz to 22 kHz or 11 kHz. These audio transformations greatly reduce data size (and, sadly, quality) and are trivial (except resampling, but that's what audio software is for =]).
* I guess you could put this under compression. Bitpacking for graphics usually refers to reducing the number of bits per channel so each pixel can fit in two bytes (RGB565 or ARGB1555, for example) or one (ARGB232 or RGB332) instead of the original three or four (RGB888 or ARGB8888, respectively).
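A sketch of the RGB565 packing described above. The bit layout (5 red, 6 green, 5 blue) is the standard one, but the helper names are made up; the unpack replicates high bits into the low bits so full-intensity channels map back to 255.

```c
#include <stdint.h>

/* Pack 24-bit RGB888 into 16-bit RGB565: keep the top 5/6/5 bits
 * of each channel and drop the rest (halves the pixel size). */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Unpack, replicating high bits into low bits so 0x1F -> 255, etc. */
void unpack_rgb565(uint16_t p, uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint8_t r5 = (p >> 11) & 0x1F;
    uint8_t g6 = (p >> 5)  & 0x3F;
    uint8_t b5 =  p        & 0x1F;
    *r = (uint8_t)((r5 << 3) | (r5 >> 2));
    *g = (uint8_t)((g6 << 2) | (g6 >> 4));
    *b = (uint8_t)((b5 << 3) | (b5 >> 2));
}
```

The same trick applied to a whole framebuffer halves its memory footprint at a modest quality cost, mostly visible in smooth gradients.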
In C, at a much simpler level, consider the following: some parsing operations can be performed on streams as the bytes arrive, rather than copying the input into a buffer first and parsing it there. That way you never need to hold the whole input in memory at once.
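For instance, here is a hypothetical incremental number parser that consumes one byte at a time, so the input never needs to be buffered. The state struct and function names are invented for illustration.

```c
#include <stdint.h>

/* Accumulates an unsigned decimal number byte by byte.
 * No input buffer: state is just 8 bytes, however long the stream. */
typedef struct {
    uint32_t value;
    int have_digits;
} num_parser;

/* Feed one byte; returns 1 when a non-digit terminates a number,
 * in which case *out receives the completed value. */
int feed_byte(num_parser *p, char c, uint32_t *out)
{
    if (c >= '0' && c <= '9') {
        p->value = p->value * 10 + (uint32_t)(c - '0');
        p->have_digits = 1;
        return 0;
    }
    if (p->have_digits) {
        *out = p->value;
        p->value = 0;
        p->have_digits = 0;
        return 1; /* a complete number is available */
    }
    return 0;
}
```

The same pattern scales up to tokenizers and protocol decoders: the parser carries a small fixed-size state instead of the data itself.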
1) Before you start the project, build in a way of measuring how much memory you're using, preferably on a per-component basis. That way, each time you make a change you can see its effects on memory use. You can't optimise what you can't measure.
2) If the project is already mature and hits the memory limits, (or is ported to a device with less memory), find out what you're using the memory for already.
My experience has been that almost all the significant optimisation when fixing an over-size application comes from a small number of changes: reduce cache sizes, strip out some textures (of course this is a functional change requiring stakeholder agreement, i.e. meetings, so may not be efficient in terms of your time), resample audio, reduce the upfront size of custom-allocated heaps, find ways to free resources that are only used temporarily and reload them when required again. Occasionally you'll find some structure which is 64 bytes that could be reduced to 16, or whatever, but that's rarely the lowest-hanging fruit. If you know what the biggest lists and arrays in the app are, though, then you know which structs to look at first.
Oh yes: find and fix memory leaks. Any memory you can regain without sacrificing performance is a great start.
I've spent a lot of time in the past worrying about code size. Main considerations (aside from: make sure you measure it at build time so that you can see it change), are:
1) Find out what code is referenced, and by what. If you discover that an entire XML library is being linked into your app just to parse a two-element config file, consider changing the config file format and/or writing your own trivial parser. If you can, use either source or binary analysis to draw a big dependency graph, and look for large components with only a small number of users: it may be possible to snip these out with only minor code rewrites. Be prepared to play diplomat: if two different components in your app use XML, and you want to cut it, then that's two people you have to convince of the benefits of hand-rolling something that's currently a trusted, off-the-shelf library.
2) Mess around with the compiler options. Consult your platform-specific documentation. For instance, you may want to reduce the default acceptable code size increase due to inlining, and on GCC at least you can tell the compiler only to apply optimisations which don't typically increase code size.
3) Take advantage of libraries already on the target platform(s) where possible, even if it means writing an adaptor layer. In the XML example above, you may find that on your target platform there's an XML library in memory at all times anyway, because the OS uses it, in which case link to it dynamically.
4) As someone else mentioned, Thumb mode can help on ARM. If you only use it for the code which is not performance critical, and leave the critical routines in ARM, then you won't notice the difference.
Finally, there may be clever tricks you can play if you have sufficient control over the device. The UI only allows one application to run at a time? Unload all the drivers and services your app doesn't need. Screen is double buffered, but your app is synched to the refresh cycle? You may be able to reclaim an entire screen buffer.
One trick that is useful in applications is to create a rainy day fund of memory. Allocate a single block on startup that is big enough that it will suffice for cleanup tasks. If malloc/new fail, release the rainy day fund and post a message letting the user know that resources are tight and they should save and quit soon. This was a technique used in many Mac applications circa 1990.
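A sketch of the rainy-day-fund technique as described above. The reserve size and function names are illustrative, and the user warning is left as an application-specific hook.

```c
#include <stdlib.h>

/* Reserve a block at startup; release it only when allocation fails,
 * giving the app enough headroom to warn the user and save cleanly. */
#define RESERVE_SIZE (64 * 1024)  /* illustrative */

static void *rainy_day_fund;

void reserve_init(void)
{
    rainy_day_fund = malloc(RESERVE_SIZE);
}

void *safe_alloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL && rainy_day_fund != NULL) {
        free(rainy_day_fund);   /* release the reserve... */
        rainy_day_fund = NULL;
        p = malloc(n);          /* ...and retry once */
        /* warn_user_low_memory();  -- application-specific hook */
    }
    return p;
}
```

Once the fund is gone the application is in "save and quit" mode; it should not try to continue normal operation.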
A few suggestions that I've found useful in working with embedded systems:
Ensure any lookup tables or other constant data are actually declared using const. If const is used, the data can be stored in read-only memory (e.g., flash or EEPROM); otherwise the data has to be copied to RAM at start-up, and this takes up both flash and RAM space. Set the linker options so that it generates a map file, and study this file to see exactly where your data is being allocated in the memory map.
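To illustrate: with a typical embedded toolchain, the const table below can stay in flash (.rodata), while the non-const version is duplicated into RAM at startup as initialized data. The table contents are placeholders.

```c
#include <stdint.h>

/* Lives in read-only memory (flash) on most embedded toolchains:
 * costs flash only. */
static const int16_t sine_table[4] = { 0, 100, 200, 300 };

/* Without const this would be initialized data: the values occupy
 * flash AND get copied into RAM by the startup code. */
static int16_t scratch_table[4] = { 0, 100, 200, 300 };

int16_t lookup(unsigned i)
{
    return sine_table[i & 3];
}

int16_t scratch_lookup(unsigned i)
{
    return scratch_table[i & 3];
}
```

Checking the map file after a build like this shows sine_table in the read-only section and scratch_table in the initialized-data section.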
Make sure you're using all the memory regions available to you. For example, microcontrollers often have onboard memory that you can make use of (which may also be faster to access than external RAM). You should be able to control the memory regions to which code and data are allocated using compiler and linker option settings.
To reduce code size, check your compiler's optimisation settings. Most compilers have switches to optimise for speed or code size. It can be worth experimenting with these options to see whether the size of the compiled code can be reduced. And obviously, eliminate duplicate code wherever possible.
Check how much stack memory your system needs and adjust the linker memory allocation accordingly (see the answers to this question). To reduce stack usage, avoid placing large data structures on the stack (for whatever value of "large" is relevant to you).
Make sure you're using fixed-point / integer math wherever possible. Lots of developers use floating-point math (with its sluggish performance, large libraries, and heavy memory usage) when simple scaled integer math will suffice.
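As a sketch of scaled integer math, a minimal Q16.16 fixed-point type: the integer holds value * 65536. Note the 64-bit intermediate in the multiply to avoid overflow; the names and format choice are illustrative.

```c
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits, 16 fractional bits. */
typedef int32_t q16;

#define Q16_ONE 65536

static inline q16 q16_from_int(int x)
{
    return (q16)(x * Q16_ONE);
}

static inline q16 q16_mul(q16 a, q16 b)
{
    /* 64-bit intermediate: the raw product carries 32 fractional
     * bits, so shift 16 back off. */
    return (q16)(((int64_t)a * b) >> 16);
}

static inline int q16_to_int(q16 x)
{
    return x >> 16; /* truncates toward negative infinity */
}
```

On a microcontroller without an FPU this avoids pulling in the software floating-point library entirely, which can save several kilobytes of code.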
A great way of constraining memory requirements is to rely as much as possible on libc or other standard libraries that can be linked dynamically, since those are typically already resident. Every additional DLL or shared object you have to bundle into your project is a significant chunk of memory you may be able to avoid burning.
Also, make use of unions and bit fields, where applicable, only load the part of your data your program is working on in memory, and make sure you're compiling with the -Os (in gcc; or your compiler's equivalent) switch to optimize for program size.
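A small illustration of both ideas: bit fields pack several small values into one word, and a union lets mutually exclusive payloads share storage. Note that bit-field layout is compiler-dependent, so this is for internal storage, not wire formats; the field names are invented.

```c
#include <stdint.h>

/* Three values packed into 8 bits of one unsigned int instead of
 * three separate ints. */
struct flags {
    unsigned enabled : 1;
    unsigned mode    : 3;  /* 0..7  */
    unsigned retries : 4;  /* 0..15 */
};

/* A message carries either a temperature or an error code, never
 * both, so the fields can overlap in memory. */
union payload {
    int32_t  temperature;  /* valid when the message type says TEMP  */
    uint32_t error_code;   /* valid when the message type says ERROR */
};
```

The caller must track which union member is live (typically via a separate type tag); reading the wrong member is a bug the compiler will not catch.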
I have a presentation from the Embedded Systems Conference available on this topic. It is from 2001, but it is still very applicable. See the paper.
Also, if you can choose the architecture of the target device, going with something like a modern ARM with Thumb V2, or a PowerPC with VLE, or MIPS with MIPS16, or selecting known-compact targets like Infineon TriCore or the SH family is a very good option. Not to mention the NEC V850E family that is nicely compact. Or move to an AVR which has excellent code compactness (but is an 8-bit machine). Anything but a fixed-length 32-bit RISC is a good choice!
All good recommendations. Here are some design approaches that I've found useful.
Write an interpreter for a special-purpose byte-code instruction set, and write as much of the program as possible in that instruction set. If certain operations require high performance, make them native-code and call them from the interpreter.
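A toy sketch of the idea: a stack-machine interpreter where each operation costs one byte of bytecode instead of several native instructions. The opcodes are invented for illustration, and a real interpreter would add bounds checking and more operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented one-byte opcodes for a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Run a bytecode program; returns the value on top of the stack
 * when OP_HALT is reached. */
int run(const uint8_t *prog)
{
    int stack[16];
    size_t sp = 0, pc = 0;
    for (;;) {
        switch (prog[pc++]) {
        case OP_PUSH: stack[sp++] = prog[pc++]; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        }
    }
}
```

Here (2 + 3) * 4 costs nine bytes of bytecode; the interpreter itself is paid for once, then every additional program written in bytecode is far denser than native code.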
If part of the input data changes very infrequently, you could have an external code generator that creates an ad-hoc program. That will be smaller than a more general program, as well as running faster and not having to allocate storage for the seldom-changing input.
Be willing to waste lots of cycles if it will let you store the absolute minimum of data. Usually you will find that performance suffers very little.
I recommend the book Small Memory Software: Patterns for Systems with Limited Memory.
Reduce the length of and eliminate as many string constants as possible to reduce code space
Carefully consider the tradeoffs of algorithms versus lookup tables where needed
Be aware of how different kinds of variables are allocated.
In addition to the suggestions others have given, remember that local variables declared in your functions will normally be allocated on the stack.
If stack memory is limited or you want to reduce the size of the stack to make room for more heap or global RAM, then consider the following:
Convert large local variables to globals (decreases the amount of stack used, but increases the amount of global RAM used). Variables can also be declared static inside a function to get them off the stack. Either way, this causes problems for reentrant code if you have a preemptive environment.
Many embedded systems do not have a stack monitor diagnostic to ensure that a stack overflow is caught, so some analysis is required.
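For example, a large buffer moved off the stack by declaring it static: the 512 bytes now live in global RAM instead of the stack frame, but the function is no longer reentrant, which matters in a preemptive environment. The function itself is a made-up illustration.

```c
#include <stddef.h>

/* The static buffer is allocated in global RAM, not on the stack,
 * so calling this function costs only a few bytes of stack.
 * Tradeoff: concurrent or recursive calls would share (and
 * corrupt) the same buffer -- this function is NOT reentrant. */
size_t format_report(char sep)
{
    static char buf[512];  /* off the stack, but not reentrant */
    size_t n = 0;
    for (; n < sizeof buf - 1; n++)
        buf[n] = (n % 8 == 7) ? sep : '#';
    buf[n] = '\0';
    return n;
}
```

If the function may be called from multiple tasks or interrupt context, guard it with a lock or keep the buffer on the stack and budget for it in your stack analysis instead.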
PS: Bonus for appropriate use of Stack Overflow?