I want to concatenate a piece of text, for example "The answer is ", with a signed integer, to give the output "The answer is 42".

I know how long the piece of text is (14 characters) but I don't know how many characters the string representation of the number will be.

Assuming the worst case, the largest signed 16-bit integer has 5 digits, plus one extra character in case it is negative. Is the following code the correct way to do it?

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *message;

    message = malloc(14*sizeof(char)+(sizeof(int)*5)+1);

    sprintf(message, "The answer is %d", 42);

    puts(message);

    free(message);
}
+6  A: 

Use:

malloc(14*sizeof(char)   /* for the 14-char text */
       +(sizeof(char)*5) /* for the magnitude of the largest number */
       +1                /* for the sign of the number */
       +1                /* for the terminating '\0' */
      );

Since the digits will be represented as chars, you have to use sizeof(char) instead of sizeof(int).
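For completeness, the questioner's program with this corrected allocation might look like the following (a minimal sketch; the malloc error check is an addition, and the sizes are spelled out as plain numbers since sizeof(char) is 1 by definition):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* 14 chars of text + 5 digits + 1 sign + 1 terminating '\0' */
    char *message = malloc(14 + 5 + 1 + 1);

    if (message == NULL) /* added check: malloc can fail */
        return 1;

    sprintf(message, "The answer is %d", 42);
    puts(message);
    free(message);
    return 0;
}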

codaddict
I'd add that since `sizeof(char)` is 1 by the C standard, just `14+5+1+1` would be ok
qrdl
@qrdl: Ya that would work, but using sizeof(char) makes it clear what it is being allocated to hold.
codaddict
You must apply sizeof(char) to the sum of the number of chars you need. On a system with 2 bytes per char your solution won't work.
Niklaos
@Niklaos, a byte is _not_ an octet. The ISO standard defines a byte as the same size as a char, however many bits that is. You _never_ need sizeof(char) for malloc.
paxdiablo
@Niklaos, can you give an example of a 2 bytes/char system? As far as I know the C standard defines char as one byte.
mcl
That's true, @mcl, just keep in mind that a byte is _not_ necessarily 8 bits.
paxdiablo
A: 

malloc((14 + 6 + 1) * sizeof(char));

  • 14 char for the string
  • 6 for the digits + sign
  • 1 for the '\0'

Note: sizeof(int) gives you the size of the type in bytes. sizeof(int) == 4 if the int is 32 bits, 8 if it's 64 bits.

Niklaos
There is a typo in "4 char for the string"; it should be 14.
gameover
I fixed the typo.
Oddthinking
Please stop perpetuating the myth that a byte is 8 bits. A byte and a char in the ISO C/C++ standard is defined by CHAR_BITS in limits.h - sizeof(int) will be 1 for a 32-bit int if CHAR_BITS is 32 (i.e., a 32-bit byte/char).
paxdiablo
@paxdiablo Amazing :p I just read the wikipedia byte page. So yes, it looks like it's a myth.
Niklaos
It's a little known fact to those who aren't "language lawyers" like myself. I actually thought I'd found a bug in the standard once because of this misunderstanding. But alas I, like so many others before me (and after), was wrong :-)
paxdiablo
Ultra-language-lawyer: sizeof(int)*CHAR_BITS only gives you the storage requirements. The range of possible values is no bigger than `1<<bits` of course, but may be smaller. E.g. in systems that store integers as floats with exponent set to 0 (Crays), where the number of digits needed to represent INT_MIN is non-trivial.
MSalters
Nitpick to the above comments - it's `CHAR_BIT`, not `CHAR_BITS`.
caf
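For anyone who wants to check these values on their own platform, here is a quick sketch (the output is implementation-defined, and %zu assumes a C99 printf):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_BIT is the number of bits in a byte/char: at least 8,
       but possibly more on unusual platforms (e.g. some DSPs) */
    printf("CHAR_BIT    = %d\n", CHAR_BIT);
    printf("sizeof(int) = %zu\n", sizeof(int));
    printf("INT_MAX     = %d\n", INT_MAX);
    return 0;
}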
A: 

I think the correct formula to get the maximum length of the decimal representation of an integer would be (floor(log10(INT_MAX))+1); you could also abuse the preprocessor this way:

#include <limits.h>
#include <stdio.h>
#define TOSTRING_(x) #x
#define TOSTRING(x) TOSTRING_(x)
/* ... */
#define YOUR_MESSAGE "The answer is "
/* the "+" reserves one char for a possible sign; TOSTRING(INT_MAX)
   reserves enough chars for the widest value */
char message[]=YOUR_MESSAGE "+" TOSTRING(INT_MAX);
/* -1 because sizeof includes the '\0': start writing at the '+' placeholder */
sprintf(message+sizeof(YOUR_MESSAGE)-1,"%d", 42);

This also avoids the heap allocation. You may want to use snprintf for better security, although with this method it shouldn't be necessary.

Another trick like that would be to create a function like this:

size_t GetIntMaxLength(void)
{
    const char dummy[]=TOSTRING(INT_MAX);
    /* sizeof(dummy) covers the digits plus the '\0';
       +1 adds room for a possible '-' sign */
    return sizeof(dummy)+1;
}

If the compiler is smart enough, it could completely sweep the dummy variable away from the compiled code; otherwise it may be wise to declare that variable static to avoid reinitializing it every time the function is called.
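A hypothetical caller, purely for illustration (the 14 is the length of "The answer is "):

char *buf = malloc(14 + GetIntMaxLength()); /* text + sign + digits + '\0' */
sprintf(buf, "The answer is %d", 42);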

Matteo Italia
don't forget a byte for an optional sign if it's negative.
plinth
And the fact that it's INT_MAX rather than MAX_INT :-)
paxdiablo
Corrected, thank you. I also added a const in the second function that may help the compiler to optimize the var away (IIRC the compiler is not required to allocate memory for const objects if they aren't used).
Matteo Italia
This doesn't work for two reasons. Firstly, unless you add another level of macros, `TOSTRING(INT_MAX)` will just give you the string `"INT_MAX"`. Secondly, `INT_MAX` is not necessarily defined to be a number - it just has to be a compile-time constant. An implementation would be well within its rights to have something like `#define INT_MAX __imax`.
caf
Huh, you're right, I always forget about the double macro evaluation trick (actually I usually use C++, so I rarely need macro hacks). For your second point, well, I've never seen any compiler do that; still, theoretically it could happen.
Matteo Italia
+3  A: 

Not quite: you are allocating a number of characters, so sizeof(int) is not required.

However, for easily maintainable and portable code, you should have something like:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define TEXT "The answer is "
#undef CHARS_PER_INT
#if INT_MAX == 32767
    #define CHARS_PER_INT 6
#endif
#if INT_MAX == 2147483647
    #define CHARS_PER_INT 11
#endif
#ifndef CHARS_PER_INT
    #error Suspect system, I have no idea how many chars to allocate for an int.
#endif

int main (void) {
    char *message;

    message = malloc(sizeof(TEXT)+CHARS_PER_INT+1);
    sprintf(message, TEXT "%d", 42);
    puts(message);
    free(message);
    return 0;
}

This has a number of advantages:

  • If you change the string, you change one thing and one thing only. The argument to malloc adjusts automatically.
  • The expression sizeof(TEXT)+CHARS_PER_INT+1 is calculated at compile time. A solution involving strlen would have a runtime cost (see the sketch just below this list).
  • If you try to compile your code on a system with an int size these tests don't cover, you'll be told about it (go fix the code).
  • You should actually allocate an extra character for the number, since the biggest 16-bit number (in terms of character count) is -32768 (six characters long). You'll notice I still have a +1 on the end: that's space for the string's null terminator (strictly speaking, sizeof(TEXT) already includes it, so this leaves one spare byte).
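For comparison, the strlen-based variant mentioned in the list above might look like this (a sketch only; it recomputes the length at run time):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *text = "The answer is ";

    /* strlen() walks the string at run time on every call, whereas
       sizeof(TEXT) above is folded to a constant by the compiler */
    char *message = malloc(strlen(text) + 11 + 1); /* 11 chars covers a 32-bit int */
    if (message == NULL)
        return 1;

    sprintf(message, "%s%d", text, 42);
    puts(message);
    free(message);
    return 0;
}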
paxdiablo
Thanks for the answer - you say the expression is calculated at compile time, but I don't see any difference between this and the answer above - they are surely calculated at the same point, i.e. inside the malloc() expression. What am I missing?
SlappyTheFish
@Slappy, they are both compile-time expressions. I was comparing it to a "strlen()" solution which would have a runtime cost, updating to make that clear.
paxdiablo
+1  A: 

One way of doing it (not necessarily recommended) that gives you the exact size of the number in characters is using the stdio functions themselves.

For example, if you print the number (somewhere, for whatever reason) before you allocate your memory, you can use the %n conversion specifier with printf. %n doesn't print anything; rather, you supply it with a pointer to int, and printf fills that with how many characters have been written so far.
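A minimal sketch of the %n trick (note that some modern C libraries restrict %n for security reasons):

#include <stdio.h>

int main(void)
{
    int count = 0;

    /* %n prints nothing; it stores the number of characters
       written so far into count */
    printf("The answer is %d%n", 42, &count);
    printf("\nthat took %d characters\n", count);
    return 0;
}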

Another example is snprintf, if you have it available. You pass it the maximum number of characters you want it to write to your string, and it returns the number of characters it should have written, not counting the final nul. (Or -1 on error.) So, using a 1-byte dummy string, snprintf can tell you exactly how many characters your number is.
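A sketch of the snprintf variant, assuming a C99-conforming snprintf (the 1-byte dummy keeps older implementations happy):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n = -32768;
    char dummy[1];

    /* returns the characters the full result would need, not counting the NUL */
    int needed = snprintf(dummy, sizeof dummy, "The answer is %d", n);
    if (needed < 0)
        return 1;

    char *message = malloc((size_t)needed + 1); /* +1 for the NUL */
    if (message == NULL)
        return 1;

    sprintf(message, "The answer is %d", n);
    puts(message);
    free(message);
    return 0;
}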

A big advantage to using these functions is that if you decide to change the format of your number (leading 0's, padding spaces, octal output, long longs, whatever) you will not overrun your memory.

If you have GNU extensions to stdio, you may want to consider using asprintf. This is exactly like sprintf, except it does the memory allocation for you! No assembly required. (Although you do need to free it yourself.) But you shouldn't rely on it to be portable.
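A sketch using asprintf, assuming glibc (_GNU_SOURCE must be defined before the includes):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *message = NULL;

    /* asprintf allocates exactly the memory the result needs */
    if (asprintf(&message, "The answer is %d", 42) == -1)
        return 1;

    puts(message);
    free(message); /* the caller still frees */
    return 0;
}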

JXG
A: 

A safe approximation for a signed int is (digits, including the potential '-' sign):

(CHAR_BIT * sizeof(int) - 1) / 3 + 1

The equivalent for unsigned is:

(CHAR_BIT * sizeof(unsigned)) / 3

This will slightly overestimate the space required for very long types (and will also overestimate in the unusual case where int has padding bits), but is a good approximation and has the advantage that it is a compile-time constant. CHAR_BIT is provided by <limits.h>.
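Used in the questioner's program, this might look like the following (a sketch; INT_CHARS is a name chosen here for illustration, not part of the answer):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* compile-time upper bound on the characters an int can need,
   including a possible '-' sign but not the '\0' */
#define INT_CHARS ((CHAR_BIT * sizeof(int) - 1) / 3 + 1)

int main(void)
{
    char *message = malloc(14 + INT_CHARS + 1); /* text + number + '\0' */
    if (message == NULL)
        return 1;

    sprintf(message, "The answer is %d", 42);
    puts(message);
    free(message);
    return 0;
}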

caf