What are some best practices for "memory-efficient C programming"? Mostly for embedded/mobile devices: what should the guidelines be for keeping memory consumption low?
I guess there should be separate guidelines for a) code memory and b) data memory.
Pre-allocating all memory upfront (i.e. no malloc calls except for startup initialization) is definitely helpful for deterministic memory usage. Otherwise, different architectures provide techniques to help out. For example, certain ARM processors provide an alternative instruction set (Thumb) that almost halves code size by using 16-bit instructions instead of the normal 32 bits. Of course, speed is sacrificed in doing so...
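A minimal sketch of the pre-allocation idea: a fixed pool of equal-sized blocks reserved at compile time, so the application never calls malloc after startup. The pool dimensions and function names are illustrative, not from any particular library.

```c
#include <stddef.h>
#include <stdint.h>

/* All memory is reserved statically, so usage is deterministic
 * and there is no heap fragmentation. Sizes are illustrative. */
#define POOL_BLOCKS 32
#define BLOCK_SIZE  64

static uint8_t pool[POOL_BLOCKS][BLOCK_SIZE];
static uint8_t in_use[POOL_BLOCKS];

void *pool_alloc(void)
{
    for (int i = 0; i < POOL_BLOCKS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return pool[i];
        }
    }
    return NULL; /* pool exhausted -- caller must handle this */
}

void pool_free(void *p)
{
    ptrdiff_t i = ((uint8_t (*)[BLOCK_SIZE])p) - pool;
    if (i >= 0 && i < POOL_BLOCKS)
        in_use[i] = 0;
}
```

Because every block is the same size, allocation and free are O(n) scans at worst and the total footprint is known at link time.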
Most likely you will need to choose your algorithms carefully. Aim for algorithms which have O(1) or O(log n) memory usage (i.e. low). For example, contiguous resizable arrays (e.g. std::vector) in most cases require less memory than linked lists.
Sometimes, using look-up tables may be more beneficial to both code size and speed. If you only need 64 entries in a LUT, that's 16*4 bytes for sin/cos/tan (use symmetry, so you only store a quarter of the wave!) compared to a large sin/cos/tan function.
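To make the symmetry point concrete, here is a sketch of a quarter-wave sine table in Q15 fixed point: roughly 16 stored entries cover a full 64-step circle. The table values are rounded approximations and the angle encoding (1/64ths of a turn) is just one possible convention.

```c
#include <stdint.h>

/* Quarter-wave sine LUT in Q15: entries are approximately
 * 32767 * sin(i * pi / 32) for i = 0..16. Symmetry recovers
 * the other three quadrants, so we store ~16 values, not 64. */
static const int16_t sin_q15[17] = {
        0,  3212,  6393,  9512, 12539, 15446, 18204, 20787,
    23170, 25329, 27245, 28898, 30273, 31356, 32137, 32609, 32767
};

/* angle is in 1/64ths of a full turn (0..63) */
int16_t sin_lut(unsigned angle)
{
    unsigned a = angle & 63;
    unsigned quadrant = a >> 4;   /* 16 steps per quadrant */
    unsigned idx = a & 15;
    switch (quadrant) {
    case 0:  return  sin_q15[idx];        /* rising,  positive */
    case 1:  return  sin_q15[16 - idx];   /* falling, positive */
    case 2:  return -sin_q15[idx];        /* rising,  negative */
    default: return -sin_q15[16 - idx];   /* falling, negative */
    }
}
```

cos falls out for free as sin_lut(angle + 16), so one small table serves both functions.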
Compression sometimes helps. Simple algorithms like RLE are easy to compress/decompress when read sequentially.
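As an illustration, a minimal (count, value) RLE codec. Real formats usually add an escape mechanism for incompressible literal runs, which this sketch omits, and the caller is assumed to provide large enough output buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode each run as a (count, value) byte pair.
 * Returns the number of output bytes written. */
size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        uint8_t v = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == v && run < 255)
            run++;
        out[o++] = (uint8_t)run;
        out[o++] = v;
        i += run;
    }
    return o;
}

/* Decode sequentially -- well suited to streaming reads. */
size_t rle_decode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i + 1 < n; i += 2)
        for (uint8_t c = 0; c < in[i]; c++)
            out[o++] = in[i + 1];
    return o;
}
```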
If you're dealing with graphics or audio, consider different formats. Paletted or bitpacked* graphics may be a good tradeoff for quality, and palettes can be shared across many images, further reducing data size. Audio may be reduced from 16-bit to 8-bit or even 4-bit, and stereo can be converted to mono. Sampling rates can be reduced from 44.1 kHz to 22 kHz or 11 kHz. These audio transformations greatly reduce data size (and, sadly, quality) and are trivial (except resampling, but that's what audio software is for =]).
* I guess you could put this under compression. Bitpacking for graphics usually refers to reducing the number of bits per channel so each pixel can fit in two bytes (RGB565 or ARGB1555, for example) or one (ARGB232 or RGB332) instead of the original three or four (RGB888 or ARGB8888, respectively).
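A sketch of the RGB565 packing described above. The bit layout (5 red, 6 green, 5 blue) is the standard one, but the helper names are made up; the unpack replicates high bits into the low bits so full-intensity channels map back to 255.

```c
#include <stdint.h>

/* Pack 24-bit RGB888 into 16-bit RGB565: keep the top 5/6/5 bits
 * of each channel and drop the rest (halves the pixel size). */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Unpack, replicating high bits into low bits so 0x1F -> 255, etc. */
void unpack_rgb565(uint16_t p, uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint8_t r5 = (p >> 11) & 0x1F;
    uint8_t g6 = (p >> 5)  & 0x3F;
    uint8_t b5 =  p        & 0x1F;
    *r = (uint8_t)((r5 << 3) | (r5 >> 2));
    *g = (uint8_t)((g6 << 2) | (g6 >> 4));
    *b = (uint8_t)((b5 << 3) | (b5 >> 2));
}
```

The same trick applied to a whole framebuffer halves its memory footprint at a modest quality cost, mostly visible in smooth gradients.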
In C, at a much simpler level, consider the following: some parsing operations can be performed on streams as the bytes arrive, rather than copying the input into a buffer first and parsing it there. That way you never need to hold the whole input in memory at once.
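For instance, here is a hypothetical incremental number parser that consumes one byte at a time, so the input never needs to be buffered. The state struct and function names are invented for illustration.

```c
#include <stdint.h>

/* Accumulates an unsigned decimal number byte by byte.
 * No input buffer: state is just 8 bytes, however long the stream. */
typedef struct {
    uint32_t value;
    int have_digits;
} num_parser;

/* Feed one byte; returns 1 when a non-digit terminates a number,
 * in which case *out receives the completed value. */
int feed_byte(num_parser *p, char c, uint32_t *out)
{
    if (c >= '0' && c <= '9') {
        p->value = p->value * 10 + (uint32_t)(c - '0');
        p->have_digits = 1;
        return 0;
    }
    if (p->have_digits) {
        *out = p->value;
        p->value = 0;
        p->have_digits = 0;
        return 1; /* a complete number is available */
    }
    return 0;
}
```

The same pattern scales up to tokenizers and protocol decoders: the parser carries a small fixed-size state instead of the data itself.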
1) Before you start the project, build in a way of measuring how much memory you're using, preferably on a per-component basis. That way, each time you make a change you can see its effects on memory use. You can't optimise what you can't measure.
2) If the project is already mature and hits the memory limits, (or is ported to a device with less memory), find out what you're using the memory for already.
My experience has been that almost all the significant optimisation when fixing an over-size application comes from a small number of changes: reduce cache sizes, strip out some textures (of course this is a functional change requiring stakeholder agreement, i.e. meetings, so may not be efficient in terms of your time), resample audio, reduce the upfront size of custom-allocated heaps, find ways to free resources that are only used temporarily and reload them when required again. Occasionally you'll find some structure which is 64 bytes that could be reduced to 16, or whatever, but that's rarely the lowest-hanging fruit. If you know what the biggest lists and arrays in the app are, though, then you know which structs to look at first.
Oh yes: find and fix memory leaks. Any memory you can regain without sacrificing performance is a great start.
I've spent a lot of time in the past worrying about code size. Main considerations (aside from: make sure you measure it at build time so that you can see it change), are:
1) Find out what code is referenced, and by what. If you discover that an entire XML library is being linked into your app just to parse a two-element config file, consider changing the config file format and/or writing your own trivial parser. If you can, use either source or binary analysis to draw a big dependency graph, and look for large components with only a small number of users: it may be possible to snip these out with only minor code rewrites. Be prepared to play diplomat: if two different components in your app use XML, and you want to cut it, then that's two people you have to convince of the benefits of hand-rolling something that's currently a trusted, off-the-shelf library.
2) Mess around with the compiler options. Consult your platform-specific documentation. For instance, you may want to reduce the default acceptable code size increase due to inlining, and on GCC at least you can tell the compiler only to apply optimisations which don't typically increase code size.
3) Take advantage of libraries already on the target platform(s) where possible, even if it means writing an adaptor layer. In the XML example above, you may find that on your target platform there's an XML library in memory at all times anyway, because the OS uses it, in which case link to it dynamically.
4) As someone else mentioned, Thumb mode can help on ARM. If you only use it for the code which is not performance critical, and leave the critical routines in ARM, then you won't notice the difference.
Finally, there may be clever tricks you can play if you have sufficient control over the device. The UI only allows one application to run at a time? Unload all the drivers and services your app doesn't need. Screen is double buffered, but your app is synched to the refresh cycle? You may be able to reclaim an entire screen buffer.
One trick that is useful in applications is to create a rainy day fund of memory. Allocate a single block on startup that is big enough that it will suffice for cleanup tasks. If malloc/new fail, release the rainy day fund and post a message letting the user know that resources are tight and they should save and quit soon. This was a technique used in many Mac applications circa 1990.
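A sketch of the rainy-day-fund technique as described above. The reserve size and function names are illustrative, and the user warning is left as an application-specific hook.

```c
#include <stdlib.h>

/* Reserve a block at startup; release it only when allocation fails,
 * giving the app enough headroom to warn the user and save cleanly. */
#define RESERVE_SIZE (64 * 1024)  /* illustrative */

static void *rainy_day_fund;

void reserve_init(void)
{
    rainy_day_fund = malloc(RESERVE_SIZE);
}

void *safe_alloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL && rainy_day_fund != NULL) {
        free(rainy_day_fund);   /* release the reserve... */
        rainy_day_fund = NULL;
        p = malloc(n);          /* ...and retry once */
        /* warn_user_low_memory();  -- application-specific hook */
    }
    return p;
}
```

Once the fund is gone the application is in "save and quit" mode; it should not try to continue normal operation.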
A few suggestions that I've found useful in working with embedded systems:
Ensure any lookup tables or other constant data are actually declared using const. If const is used, the data can be stored in read-only memory (e.g., flash or EEPROM); otherwise the data has to be copied to RAM at start-up, and this takes up both flash and RAM space. Set the linker options so that it generates a map file, and study this file to see exactly where your data is being allocated in the memory map.
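To illustrate: with a typical embedded toolchain, the const table below can stay in flash (.rodata), while the non-const version is duplicated into RAM at startup as initialized data. The table contents are placeholders.

```c
#include <stdint.h>

/* Lives in read-only memory (flash) on most embedded toolchains:
 * costs flash only. */
static const int16_t sine_table[4] = { 0, 100, 200, 300 };

/* Without const this would be initialized data: the values occupy
 * flash AND get copied into RAM by the startup code. */
static int16_t scratch_table[4] = { 0, 100, 200, 300 };

int16_t lookup(unsigned i)
{
    return sine_table[i & 3];
}

int16_t scratch_lookup(unsigned i)
{
    return scratch_table[i & 3];
}
```

Checking the map file after a build like this shows sine_table in the read-only section and scratch_table in the initialized-data section.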
Make sure you're using all the memory regions available to you. For example, microcontrollers often have onboard memory that you can make use of (which may also be faster to access than external RAM). You should be able to control the memory regions to which code and data are allocated using compiler and linker option settings.
To reduce code size, check your compiler's optimisation settings. Most compilers have switches to optimise for speed or code size. It can be worth experimenting with these options to see whether the size of the compiled code can be reduced. And obviously, eliminate duplicate code wherever possible.
Check how much stack memory your system needs and adjust the linker memory allocation accordingly (see the answers to this question). To reduce stack usage, avoid placing large data structures on the stack (for whatever value of "large" is relevant to you).
Make sure you're using fixed-point / integer math wherever possible. Lots of developers use floating-point math (with its sluggish performance, large libraries, and heavy memory usage) when simple scaled integer math will suffice.
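As a sketch of scaled integer math, a minimal Q16.16 fixed-point type: the integer holds value * 65536. Note the 64-bit intermediate in the multiply to avoid overflow; the names and format choice are illustrative.

```c
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits, 16 fractional bits. */
typedef int32_t q16;

#define Q16_ONE 65536

static inline q16 q16_from_int(int x)
{
    return (q16)(x * Q16_ONE);
}

static inline q16 q16_mul(q16 a, q16 b)
{
    /* 64-bit intermediate: the raw product carries 32 fractional
     * bits, so shift 16 back off. */
    return (q16)(((int64_t)a * b) >> 16);
}

static inline int q16_to_int(q16 x)
{
    return x >> 16; /* truncates toward negative infinity */
}
```

On a microcontroller without an FPU this avoids pulling in the software floating-point library entirely, which can save several kilobytes of code.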
A great way of constraining memory requirements is to rely as much as possible on libc or other standard libraries that can be linked dynamically, since those are typically already resident. Every additional DLL or shared object you have to bundle into your project is a significant chunk of memory you may be able to avoid burning.
Also, make use of unions and bit fields, where applicable, only load the part of your data your program is working on in memory, and make sure you're compiling with the -Os (in gcc; or your compiler's equivalent) switch to optimize for program size.
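A small illustration of both ideas: bit fields pack several small values into one word, and a union lets mutually exclusive payloads share storage. Note that bit-field layout is compiler-dependent, so this is for internal storage, not wire formats; the field names are invented.

```c
#include <stdint.h>

/* Three values packed into 8 bits of one unsigned int instead of
 * three separate ints. */
struct flags {
    unsigned enabled : 1;
    unsigned mode    : 3;  /* 0..7  */
    unsigned retries : 4;  /* 0..15 */
};

/* A message carries either a temperature or an error code, never
 * both, so the fields can overlap in memory. */
union payload {
    int32_t  temperature;  /* valid when the message type says TEMP  */
    uint32_t error_code;   /* valid when the message type says ERROR */
};
```

The caller must track which union member is live (typically via a separate type tag); reading the wrong member is a bug the compiler will not catch.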
I have a presentation from the Embedded Systems Conference available on this topic. It is from 2001, but it is still very applicable. See the paper.
Also, if you can choose the architecture of the target device, going with something like a modern ARM with Thumb V2, or a PowerPC with VLE, or MIPS with MIPS16, or selecting known-compact targets like Infineon TriCore or the SH family is a very good option. Not to mention the NEC V850E family that is nicely compact. Or move to an AVR which has excellent code compactness (but is an 8-bit machine). Anything but a fixed-length 32-bit RISC is a good choice!
All good recommendations. Here are some design approaches that I've found useful.
Write an interpreter for a special-purpose byte-code instruction set, and write as much of the program as possible in that instruction set. If certain operations require high performance, make them native-code and call them from the interpreter.
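A toy sketch of the idea: a stack-machine interpreter where each operation costs one byte of bytecode instead of several native instructions. The opcodes are invented for illustration, and a real interpreter would add bounds checking and more operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented one-byte opcodes for a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Run a bytecode program; returns the value on top of the stack
 * when OP_HALT is reached. */
int run(const uint8_t *prog)
{
    int stack[16];
    size_t sp = 0, pc = 0;
    for (;;) {
        switch (prog[pc++]) {
        case OP_PUSH: stack[sp++] = prog[pc++]; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        }
    }
}
```

Here (2 + 3) * 4 costs nine bytes of bytecode; the interpreter itself is paid for once, then every additional program written in bytecode is far denser than native code.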
If part of the input data changes very infrequently, you could have an external code generator that creates an ad-hoc program. That will be smaller than a more general program, as well as running faster and not having to allocate storage for the seldom-changing input.
Be willing to waste lots of cycles if it will let you store the absolute minimum of data. Usually you will find that performance suffers very little.
I recommend the book Small Memory Software: Patterns for Systems with Limited Memory.
Reduce the length of and eliminate as many string constants as possible to reduce code space
Carefully consider the tradeoffs of algorithms versus lookup tables where needed
Be aware of how different kinds of variables are allocated.
In addition to the suggestions others have given, remember that local variables declared in your functions will normally be allocated on the stack.
If stack memory is limited or you want to reduce the size of the stack to make room for more heap or global RAM, then consider the following:
Convert large local variables to globals (decreases the amount of stack used, but increases the amount of global RAM used). Variables can also be declared static inside a function to get them off the stack. Either way, this causes problems for reentrant code if you have a preemptive environment.
Many embedded systems do not have a stack monitor diagnostic to ensure that a stack overflow is caught, so some analysis is required.
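For example, a large buffer moved off the stack by declaring it static: the 512 bytes now live in global RAM instead of the stack frame, but the function is no longer reentrant, which matters in a preemptive environment. The function itself is a made-up illustration.

```c
#include <stddef.h>

/* The static buffer is allocated in global RAM, not on the stack,
 * so calling this function costs only a few bytes of stack.
 * Tradeoff: concurrent or recursive calls would share (and
 * corrupt) the same buffer -- this function is NOT reentrant. */
size_t format_report(char sep)
{
    static char buf[512];  /* off the stack, but not reentrant */
    size_t n = 0;
    for (; n < sizeof buf - 1; n++)
        buf[n] = (n % 8 == 7) ? sep : '#';
    buf[n] = '\0';
    return n;
}
```

If the function may be called from multiple tasks or interrupt context, guard it with a lock or keep the buffer on the stack and budget for it in your stack analysis instead.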
PS: Bonus for appropriate use of Stack Overflow?