Hello,

I am planning to use an Arduino programmable board. Those have quite limited flash memories ranging between 16 and 128 kB to store compiled C or C++ code.

Are there ways to estimate how much (standard) source code that will represent?

I suppose this is very vague, but I'm only looking for an order of magnitude.

+1  A: 

You can't really say there. The length of the uncompiled code has little to do with the length of the compiled code. For example:

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>  // std::ostream_iterator

int main()
{
  std::vector<std::string> strings;
  strings.push_back("Hello");
  strings.push_back("World");
  std::sort(strings.begin(), strings.end());
  std::copy(strings.begin(), strings.end(), std::ostream_iterator<std::string>(std::cout, ""));
}

vs

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>

int main()
{
  std::vector<std::string> strings;
  strings.push_back("Hello");
  strings.push_back("World");
  for ( std::vector<std::string>::size_type idx = 0; idx < strings.size(); idx++ )
    std::cout << strings[idx];
}

Both are essentially the same length and produce the same output, but the first example involves an instantiation of std::sort, which is probably an order of magnitude more code than the rest of the code here.

If you absolutely need to count the number of bytes used in the program, use assembler.

Billy ONeal
All good except that last statement! People happily code in C (and C++) in limited-space environments.
Keith Nicholas
Even using assembler you would then need to know how many bytes each instruction takes, since (at least on some platforms) there is no fixed relationship between assembly mnemonics and bytes. So either way you'd be compiling/assembling the code and looking at how big the result is.
Pete Kirkham
@Keith: I never said that you had to use assembler in limited space environments. I said that if you needed to be *absolutely sure* as to how code size would affect binary size, that's what you'd have to do. The amount of code generated is not directly proportional to the input code size, even in C and C++. @Pete: This is true -- every platform I've ever written assembler for used fixed-width instructions (PIC and MIPS). Of course, if you're writing something like x86, then even assembler cannot help you, but at least it'd be more directly related to program size than C or C++ source code.
Billy ONeal
A: 

At a linux system you can do some experiments with statically compiled example programs. E.g.

$ size `which busybox`
   text    data     bss     dec     hex filename
1830468    4448   25650 1860566  1c63d6 /bin/busybox

The sizes are given in bytes. This output is independent of the executable file format, since it reports the sizes of the individual sections inside the file rather than the size of the file itself. The text section contains the machine code and constant data. The data section contains the initial values of statically initialized variables. The bss size is the size of the uninitialized data (of course, uninitialized data does not need to be stored in the executable file).

Well, busybox contains a lot of functionality (like all common shell commands, a shell etc.).

If you link your own examples with gcc -static, keep in mind that the libc you use may dramatically increase the program size, and that using an embedded libc may be much more space-efficient.

To test that you can check out the diet-libc or uclibc and link against that. Actually, busybox is usually linked against uclibc.

Note that the sizes you get this way give you only an order of magnitude. For example, your workstation probably uses a different CPU architecture than the Arduino board, and the machine code of different architectures may differ, more or less, in size (because of operand sizes, available instructions, opcode encoding and so on).

To go on with rough order of magnitude reasoning, busybox contains roughly 309 tools (including an ftp daemon and such stuff), i.e. the average code size of a busybox tool is roughly 6 kB (1830468 bytes / 309 tools ≈ 5900 bytes).
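
The arithmetic behind that average can be written down as a small sketch. The function names are made up for illustration, and the numbers come from an x86 busybox build, so they say little about AVR code density; treat the result as a very rough order of magnitude only.

```cpp
#include <cstdint>

// Average bytes of machine code per tool: text-section size / number of tools.
// (Hypothetical helper, not part of busybox or any toolchain.)
constexpr std::uint32_t average_tool_size(std::uint32_t text_bytes,
                                          std::uint32_t tool_count) {
    return tool_count == 0 ? 0 : text_bytes / tool_count;
}

// How many "average" tools would fit into a flash of the given size.
constexpr std::uint32_t tools_that_fit(std::uint32_t flash_bytes,
                                       std::uint32_t avg_tool_bytes) {
    return avg_tool_bytes == 0 ? 0 : flash_bytes / avg_tool_bytes;
}

// Values from the size output above: 1830468 bytes of text, 309 tools.
static_assert(average_tool_size(1830468, 309) == 5923,
              "about 5.9 kB per tool");
```

With these figures, tools_that_fit(32 * 1024, 5923) gives 5, i.e. a handful of busybox-sized tools in a 32 kB flash, if the x86 numbers carried over at all.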

maxschlepzig
I don't see in any way how this is linux specific. Windows has similar utilities, i.e. FileAlyzer. Other unixen are perfectly able to use size as well, not just linux.
Billy ONeal
Here the file size is 183 kB? I'm not familiar with the `size` command (shame on me).
Klaus
This won't be representative: you'll pull in (relatively) huge libraries and the overhead of the ELF format itself, which will be huge compared to a binary on the Arduino.
nos
@Billy ONeal: I did not even write that this is linux specific. This is kind of linux specific, because busybox is quite linux specific. Sure, the size utility is available on other unices, too. But again, I did not write the opposite.
maxschlepzig
@nos: I had already mentioned the impact of different libraries. Looking at the text value is ELF-independent (it is the size of the section), or is it? Besides, please note that the poster asked 'for an order of magnitude'.
maxschlepzig
@Klaus: The code size is the text + data section (~ 1.8 MB; the data section contains stuff like data for static initializations). You get the actual file size via ls, which also depends on the file format. E.g. ls -l `which busybox` prints a file size of 1841392 bytes on my system.
maxschlepzig
@maxschlepzig: I believe the exact words you used were "At a linux system"
Billy ONeal
Well, it's the example that is linux-specific, not the concept, so it would have been better to move "at a linux system" to immediately following "e.g.". The issue here is not linux vs other desktops, but that library size on desktops is not at all representative of Arduino code size
Ben Voigt
@Billy ONeal: Which does not exclude other systems.
maxschlepzig
@Ben Voigt: I explicitly mentioned the diet-libc, which is code size optimized for embedded systems. And busybox links to an embedded libc as well. If you use some libc function in your arduino code, then the libc-code statically linked is probably comparable with diet-libc or uclibc one.
maxschlepzig
@max: No, it's not at all comparable. They're not just different compilers with different libraries, they're completely different architectures. Size of x86 machine code tells you almost nothing about size of AVR machine instructions used by the Arduino.
Ben Voigt
Thank you for the additions, anyway interesting :)
Klaus
+1  A: 

It's quite a bit of space for a reasonably complex piece of software, but you will start bumping into the limit if you want it to have a lot of different functionality. Also, if you want to store quite a lot of static strings / data, that can eat into it quite quickly. But 32k is a decent amount for embedded applications. It tends to be RAM that you have problems with first!

Also, quite often the C++ compilers for embedded systems are a lot worse than the C compilers,
i.e. they are nowhere near as good as C++ compilers for the common desktop OSes (in terms of producing efficient machine code for the target platform).

Keith Nicholas
Thank you for the feedback :) We'll see if we lose much space/RAM with C++/OOP.
Klaus
+2  A: 

Download the arduino IDE and 'verify' some of your existing code, or look at the sample sketches. It will tell you how many bytes that code is, which will give you an idea of how much more you can fit into a given device. Picking a couple of the examples at random, the web server example is 5816 bytes, and the LCD hello world is 2616. Both use external libraries.

Pete Kirkham
I did that, but the default examples are all quite small. I found a bigger program at http://www.fisherinnovation.com/?p=fi-apartmentbot that gave me a better idea. Compiled, it is about 6 kB.
Klaus
+4  A: 

The output of the size command is a good starting place, but does not give you all of the information you need.

$ avr-size program.elf
   text    data     bss     dec     hex filename

The size of your image is usually a little bit more than the sum of the text and the data sections. The bss section is essentially compressed because it is all 0s. There may be other sections which are relevant which aren't listed by size.
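
As a rough sketch of how to read those columns on AVR: the flash image holds text plus the initial values of data, while static RAM use is data plus bss. The struct, the function names and the sample numbers below are made up for illustration; they do not come from a real build.

```cpp
#include <cstdint>

// One row of size/avr-size output (hypothetical type, for illustration).
struct SectionSizes {
    std::uint32_t text;  // machine code and constants kept in flash
    std::uint32_t data;  // initialized variables: stored in flash, copied to RAM at startup
    std::uint32_t bss;   // zero-initialized variables: RAM only, nothing stored in flash
};

// Approximate flash footprint of the image.
constexpr std::uint32_t flash_bytes(const SectionSizes& s) {
    return s.text + s.data;
}

// Static RAM footprint (stack and heap come on top of this).
constexpr std::uint32_t static_ram_bytes(const SectionSizes& s) {
    return s.data + s.bss;
}
```

For example, a made-up row of text = 5000, data = 120, bss = 300 would mean roughly 5120 bytes of flash and 420 bytes of RAM before the stack is counted.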

If your build system is set up like the ones I've used before for AVR microcontrollers, then you will end up with an *.elf file as well as a *.bin file, and possibly a *.hex file. The *.bin file is the actual image that would be stored in the program flash of the processor, so you can examine its size to see how your program grows as you make edits to it. The *.bin file is extracted from the *.elf file with the objcopy command (avr-objcopy with -O binary for AVR; I can't remember the remaining flags right now).

If you want to know how to guesstimate how much machine code your C or C++ code will produce when compiled, this is a lot more difficult. I have observed a 10x blowup in a function when I tried to use a uint64_t rather than a uint32_t when all I was doing was incrementing it (this was about 5 times more code than I thought it would be). This was mostly to do with gcc's avr optimizations not being the best, but smaller changes in code size can creep in from seemingly innocent code.
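
A minimal way to try this yourself, as a sketch: put two otherwise identical functions in one file and compare what the compiler emits for each. The names counter32, counter64, tick32 and tick64 are invented for this example; the actual size ratio depends on your compiler version and flags, so no numbers are claimed here.

```cpp
#include <cstdint>

// Two global counters that differ only in width.
std::uint32_t counter32 = 0;
std::uint64_t counter64 = 0;

// On an 8-bit AVR this needs several add-with-carry steps over 4 bytes.
void tick32() {
    ++counter32;
}

// Same source-level operation over 8 bytes; compare the generated code.
void tick64() {
    ++counter64;
}
```

Compiling this with something like avr-gcc -Os -mmcu=atmega328p -c (the MCU is just an assumption) and then running avr-size or avr-objdump -d on the object file shows how much each function costs on your toolchain.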

This will likely be amplified with the use of C++, which tends to hide more things that turn into code than C does. Chief among the things C++ hides are destructor calls and lots of pointer dereferencing which has to do with the this pointer in objects as well as a secret pointer many objects have to their virtual function table and class static variables.

On AVR all of this pointer stuff is likely to really add up because pointers are twice as big as registers and take multiple instructions to load. Also AVR has only a few register pairs that can be used as pointers, which results in lots of moving things into and out of those registers.
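
The hidden virtual-table pointer is easy to make visible with sizeof. This is only a sketch with made-up class names; the exact sizes are implementation-defined, but on common ABIs (including avr-gcc) adding the first virtual function adds one pointer to every object.

```cpp
#include <cstdint>

// No virtual functions: the object is just its one byte of data.
struct PlainSensor {
    std::uint8_t pin;
    std::uint8_t read() const { return pin; }
};

// One virtual function: every object also carries a hidden vtable pointer,
// and the vtable itself takes up space in the image.
struct VirtualSensor {
    std::uint8_t pin;
    virtual std::uint8_t read() const { return pin; }
};

static_assert(sizeof(PlainSensor) == 1, "one data byte, nothing hidden");
static_assert(sizeof(VirtualSensor) > sizeof(PlainSensor),
              "the virtual version carries extra hidden data");
```

On AVR a pointer is two bytes, so an object like this would be expected to grow from 1 to about 3 bytes; on a desktop it will be larger because of wider pointers and padding.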

Some tips for small programs on AVR:

  • Use uint8_t and int8_t instead of int whenever you can. You could also use uint_fast8_t and int_fast8_t if you want your code to be portable. This can lead to many operations taking up only half as much code, because int is two bytes.

  • Be very aware of things like string and struct constants and literals and how/where they are stored.

  • If you're not scared of it, read the AVR assembly manual. You can get an idea of the types of instructions, and from that the type of C code that easily maps to those instructions. Use that kind of C code.
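
To illustrate the first tip, here is a small sketch that keeps the loop counter and the element count in 8 bits. The function name sum_bytes is invented for this example; whether it really saves code over the int version is something to check with avr-size on your own build.

```cpp
#include <cstdint>

// Sums up to 255 byte values. The counter and the count are uint8_t, so on
// AVR each fits in a single register. The total is 16 bits wide, which is
// enough here because 255 * 255 = 65025 fits into a uint16_t.
std::uint16_t sum_bytes(const std::uint8_t* values, std::uint8_t count) {
    std::uint16_t total = 0;
    for (std::uint8_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

Writing the same loop with int for i and count, and comparing the two object files, gives a feel for how much the narrower types are worth in a given case.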

nategoose
Thank you for your answer, particularly for all the practical advice.
Klaus
@Klaus: If you think the answer is useful, you should mark it as the accepted answer. :)
Sedate Alien
All of the answers have been useful, unfortunately, I can't mark them all ;)
Klaus
A: 

In my experience, it isn't your code as much as it is the libraries that your code drags in. For your consideration: The line

sprintf( string, "Uptime: %d:%02d:%02d", uptime.hours, uptime.minutes, uptime.seconds );

probably consumes about fifty or sixty bytes. However, including sprintf() and friends will increase the size of your memory image by at least 2 kB.

Bob Kaufman
A: 

Try creating a simplified version of your app, focusing on the most valuable feature first, then start adding up the 'nice (and cool) stuff to have'. Keep an eye on the byte usage shown in the Arduino IDE when you verify your code.

As a rough indication, my first app (an LED flasher controlled by a push button) requires 1092 bytes. That's roughly 1K out of 32k. Pretty small footprint for C++ code!

What worries me most is the limited amount of RAM (1 kB). If the CPU stack takes some of it, then there isn't much left for creating any data structures.

I have only had my Arduino for 48 hrs, so there is still a lot to learn before I can use it effectively ;-) But it's a lot of fun to use :).

Yves
Thank you for the feedback and advice :) Have fun with your Arduino.
Klaus