I am converting double values to string like this:

std::string conv(double x) {
    char buf[30];
    sprintf(buf, "%.20g", x);
    return buf;
}

I have hardcoded the buffer size to 30, but am not sure if this is large enough for all cases.

  • How can I find out the maximum buffer size I need?
  • Does the precision get higher (and therefore buffer needs to increase) when switching from 32bit to 64?

PS: I cannot use ostringstream or boost::lexical_cast for performance reasons (see this)

+2  A: 

printf("%.20g", 1.79769e+308); is 1.7976900000000000632e+308, 27 bytes including the trailing \0. I would choose 64 or 128 just to be sure.

(Since it's on the stack and released right after you could also go with big buffers, even 2048 bytes, without running into problems for non embedded applications)

Also, are you sure the bottleneck of your program is lexical_cast..? Doing what you're doing seems very silly to me

Andreas Bonini
We write large CSV files with double values in a multithreaded application, and when using `lexical_cast` it gets terribly slow on some machines for some reason. When using sprintf instead it is about 20 times faster.
martinus
Well, if it's a normal desktop application it will have a huge stack, so go with buf[2048] and don't worry about it!
Andreas Bonini
32 bytes is fine - you showed it doesn't need to be quite that big, but even with a 5-digit exponent, 32 has some spare space.
Jonathan Leffler
I don't like seeing 2k buffers like that. In this case it's big enough, but it just makes me think, "if the programmer wasn't confident that 1k is enough, or 1.5k, then (a) why is he confident that 2k is?, and (b) what else is he not confident about?"
Steve Jessop
Better safe than sorry? There is nothing wrong with not being confident that you can jump 2 yards while being absolutely sure that you can jump half a yard. The second is obvious, the first is not. That 2k is enough is obvious (or even 1k; I said 2k only because stack space in this case is "free", so the number doesn't really matter); that 32 is enough is not. And what about the programmer who reads the code? If I see 32 I'll worry about it overflowing; if I see 1k/2k/10k I won't.
Andreas Bonini
Sure, but I actually have fixed the buffer overrun bugs in code where someone said "X is obviously large enough", didn't bother proving that it really is large enough, didn't bother measuring at runtime, and turned out to be wrong in some cases. If you can prove that the "%.20g" format fits in 2k, then you can prove it fits in 64 bytes. If you can't prove it, then you're not saying "better safe than sorry", you're saying "better quite safe than actually safe". Since you can prove it, I say stick with a small number and a comment which shows future readers that you have thought about it.
Steve Jessop
That said, I've also fixed bugs where someone (probably me) calculated the max length and forgot something, like Alok did with the minus sign. Result is either an overrun or truncation which causes problems later, depending whether it was strcpy or strncpy. I think "calculate a limit and add a few" is a fairly good idea for this reason - you don't necessarily trust yourself to be correct to the nearest byte. But IMO if I can't trust myself to be correct within the nearest 1024 bytes, I'm in trouble ;-)
Steve Jessop
A: 

Here's a program to print the number of digits required for maximum and minimum values double can take for any system:

#include <float.h>
#include <stdio.h>

int main(void)
{
    double m = DBL_MAX;
    double n = DBL_MIN;
    int i;
    i = printf("%.20g\n", m);
    printf("%d\n", i);
    i = printf("%.20g\n", n);
    printf("%d\n", i);
    return 0;
}

For me, it prints:

1.7976931348623157081e+308
27
2.2250738585072013831e-308
27

Since 27 includes a newline but doesn't include the terminating 0 for the strings, I would say that on this system, 27 should suffice. For long double, the answer seems to be 27 and 28 for LDBL_MAX and LDBL_MIN respectively on my system.

The man page for sprintf (on my system) says this about %g:

The double argument is converted in style f or e (or F or E for G conversions). The precision specifies the number of significant digits. If the precision is missing, 6 digits are given; if the precision is zero, it is treated as 1. Style e is used if the exponent from its conversion is less than -4 or greater than or equal to the precision. Trailing zeros are removed from the fractional part of the result; a decimal point appears only if it is followed by at least one digit.

Similar wording is in the C standard.

So I think you will be safe if you used the output from the above program as your array size.
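As the comments below point out, a negative value needs one more character. Here is a minimal sketch that measures a few extreme values, negatives and the smallest denormal included, without printing them. The names formatted_length and longest_probe_length are made up for illustration, and it assumes a C99/C++11 snprintf that accepts a null buffer with size 0. It is a spot check over hand-picked values, not a proof over every double.

```cpp
#include <algorithm>
#include <cfloat>
#include <cstdio>
#include <limits>

// Length of the "%.20g" rendering of x, excluding the terminating null.
// With a null buffer and size 0, snprintf only counts and writes nothing.
int formatted_length(double x) {
    return std::snprintf(nullptr, 0, "%.20g", x);
}

// Longest rendering among a few hand-picked extreme values.
int longest_probe_length() {
    const double probes[] = {
        DBL_MAX, -DBL_MAX,                           // largest magnitude
        DBL_MIN, -DBL_MIN,                           // smallest normalized magnitude
        -std::numeric_limits<double>::denorm_min()   // smallest denormal, negated
    };
    int longest = 0;
    for (double p : probes) {
        longest = std::max(longest, formatted_length(p));
    }
    return longest;
}
```

On a system with IEEE 754 doubles I would expect longest_probe_length() to come out one higher than the unsigned case above; add 1 to whatever it returns for the terminating null when sizing the array.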

Alok
I used DBL_MAX in my answer too, but because of the exponential notation it may not be the longest (or it may be, I don't know).
Andreas Bonini
keep in mind that you need one additional character for negative numbers :-)
martinus
Nice. I should've thought about that.
Alok
@Andreas: Since the precision is specified as 20, anything >= 1e20 will be printed with the exponential notation.
Alok
+3  A: 

I seem to remember that if you call snprintf with a NULL destination and a size of 0, it doesn't write anything. It does, however, return the number of chars that it would have written. (That is the C99 behavior; plain sprintf has no size argument, and passing it a NULL destination is undefined.) So you can do:

// find the length of the string
int len = snprintf(NULL, 0, fmt, var1, var2,...);
// allocate the necessary memory.
char *output = malloc(sizeof(char) * (len + 1)); // yes I know that sizeof(char) is defined as 1 but this seems nicer.
// now sprintf it after checking for errors (len < 0, output == NULL)
sprintf(output, fmt, var1, var2,...);


Another option is to use snprintf which allows you to limit the length of the output:

#define MAX 20 /* or whatever length you want */
char output[MAX];
snprintf(output, MAX, fmt, var1, var2,...);

snprintf takes the size of the buffer as an argument, and doesn't allow the output string to exceed that size.

Nathan Fellman
+1, `snprintf()` is the way to go.
Bastien Léonard
-1: the OP is looking for performance, solution 1 runs printf() twice and calls malloc() which has a big overhead: that's very bad. Solution 2 uses snprintf(), which is slower than printf(). Using printf() with a big enough buffer is the best solution for performance, which (silly or not) is what the OP asked!
Andreas Bonini
+1, for `snprintf`, even though it isn't standard (yet). The performance difference between `snprintf` and `sprintf` is negligible, especially when formatting doubles.
avakar
@Andreas, you're right. I missed the performance requirement. Note, however, that he's not using `printf`, but rather `sprintf`.
Nathan Fellman
A: 

If you are on a platform that supports POSIX or C99, you should be able to use snprintf to compute the size of the buffer you will need. snprintf takes a parameter indicating the size of the buffer you are passing in; if the size of the string would exceed the size of that buffer, it truncates the output to fit into the buffer, and returns the amount of space it would have needed to fit the entire output. You can use the output of this to allocate a buffer that's the exact right size. If you just want to compute the size of the buffer you need, you can pass in NULL as the buffer and a size of 0 to compute how much space you need.

int size = snprintf(NULL, 0, "%.20g", x);
char *buf = malloc(size + 1); // Need the + 1 for a terminating null character
snprintf(buf, size + 1, "%.20g", x);

Remember to free(buf) after you've used it to avoid memory leaks.
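Since the question returns a std::string anyway, the same two-pass idea can be sketched without malloc/free by letting the string own the storage. The name conv_exact is hypothetical, and this assumes a C++11 library (contiguous std::string storage and a C99-conforming std::snprintf). It still formats twice, so it does not address the performance concern.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Two-pass conversion: measure first, then format into a std::string
// that owns its storage, so there is nothing to free().
std::string conv_exact(double x) {
    // Pass 1: count the characters needed (terminating null not included).
    const int len = std::snprintf(nullptr, 0, "%.20g", x);
    if (len < 0) {
        return std::string();  // snprintf reported an error
    }
    std::string out(static_cast<std::size_t>(len), '\0');
    // Pass 2: write len characters plus the null. Since C++11 the string
    // keeps room for its own terminator right after the last character.
    std::snprintf(&out[0], out.size() + 1, "%.20g", x);
    return out;
}
```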

The problem with this is that it won't work in Visual Studio, which still does not support C99. While they have something like snprintf, if the buffer passed in is too small, it does not return the size needed, but returns -1 instead, which is completely useless (and it does not accept NULL as a buffer, even with a 0 length).

If you don't mind truncating, you can simply use snprintf with a fixed size buffer, and be assured that you won't overflow it:

char buf[30];
snprintf(buf, sizeof(buf), "%.20g", x);

Make sure you check your platform docs on snprintf; in particular, some platforms may not add a terminating null at the end of the string if the string is truncated, so you may need to do that yourself.
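If truncation would silently corrupt your output, you can compare the return value against the buffer size. A rough sketch, assuming a C99-conforming snprintf that returns the untruncated length and always null-terminates (which, as noted above, not every platform guarantees); conv_checked and its template parameter N are names invented for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats x into a stack buffer of N bytes and reports whether it fit.
// Returns false on truncation or on a formatting error; in that case
// `out` is left unchanged.
template <std::size_t N>
bool conv_checked(double x, std::string& out) {
    char buf[N];
    const int n = std::snprintf(buf, sizeof buf, "%.20g", x);
    // n is the length the full text needs; n >= N means it was cut off.
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof buf) {
        return false;
    }
    out.assign(buf, static_cast<std::size_t>(n));
    return true;
}
```

For example, conv_checked<30>(x, s) mirrors the 30-byte buffer above, while a deliberately tiny size such as conv_checked<4>(0.1, s) reports failure instead of returning a cut-off number.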

Brian Campbell
+1  A: 

I have hardcoded the buffer size to 30, but am not sure if this is large enough for all cases.

It is. %.20g specifies 20 significant digits in the mantissa. Add 1 for the decimal point, 1 for a (possible) sign, 5 for "e+308" or "e-308" (the worst-case exponent), and 1 for the terminating null.

20 + 1 + 1 + 5 + 1 = 28.
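The same arithmetic can be written down in the code so the next reader sees where the number comes from. This is only a sketch: the constant names and worst_case_text are made up here, and it assumes IEEE 754 doubles with at most a 3-digit exponent.

```cpp
#include <cfloat>
#include <cstdio>
#include <string>

// Worst case for "%.20g" on an IEEE 754 double, term by term.
constexpr int kSign     = 1;   // leading '-'
constexpr int kDigits   = 20;  // significant digits requested by the format
constexpr int kPoint    = 1;   // decimal point
constexpr int kExponent = 5;   // "e+308" or "e-308"
constexpr int kNull     = 1;   // terminating '\0'
constexpr int kNeeded   = kSign + kDigits + kPoint + kExponent + kNull;

static_assert(kNeeded == 28, "sum should match the count worked out by hand");
static_assert(kNeeded <= 30, "the 30-byte buffer from the question must be large enough");

// Formats -DBL_MAX, which uses every term above, into a buffer of
// exactly kNeeded bytes.
std::string worst_case_text() {
    char buf[kNeeded];
    std::snprintf(buf, sizeof buf, "%.20g", -DBL_MAX);
    return buf;
}
```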

Does the precision get higher (and therefore buffer needs to increase) when switching from 32bit to 64?

No.

A double is the same size in both architectures. If you declare your variables as long double, then you possibly have 1 more digit in the exponent ("e+4932"), which still fits in a 30-character buffer. But only on x86, and only on older processors.

The long double is an obsolete 80-bit form of floating point value that was the native format of the 486 FPU. That FPU architecture didn't scale well and has since been discarded in favor of SSE-style instructions, where the largest possible floating point value is a 64-bit double.

Which is a long way of saying a buffer of 30 characters will always be sufficient as long as you keep limiting the mantissa in your printout to 20 digits.

John Knoeller