views:

422

answers:

9

The Daily WTF for 2008-11-28 pillories the following code:

static char *nice_num(long n)
{
    int neg = 0, d = 3;
    char *buffer = prtbuf;
    int bufsize = 20;

    if (n < 0)
    {
        neg = 1;
        n = -n;
    }
    buffer += bufsize;
    *--buffer = '\0';

    do
    {
        *--buffer = '0' + (n % 10);
        n /= 10;
        if (--d == 0)
        {
            d = 3;
            *--buffer = ',';
        }
    }
    while (n);

    if (*buffer == ',') ++buffer;
    if (neg) *--buffer = '-';
    return buffer;
}

How would you write it?

+12  A: 

If you're a seasoned C programmer, you'll realize this code isn't actually that bad. It's relatively straightforward (for C), and it's blazingly fast. It has three problems:

  1. It fails on the edge case of LONG_MIN (-2,147,483,648), since negating this number produces itself in twos-complement
  2. It assumes 32-bit integers - for 64-bit longs, a 20-byte buffer is not big enough
  3. It's not thread-safe - it uses a global static buffer, so multiple threads calling it at the same time will result in a race condition

Problem #1 is easily solved with a special case. To address #2, I'd separate the code into two functions, one for 32-bit integers and one for 64-bit integers. #3 is a little harder - we have to change the interface to make completely thread-safe.

Here is my solution, based on this code but modified to address these problems:

static int nice_num(char *buffer, size_t len, int32_t n)
{
  int neg = 0, d = 3;
  char buf[16];
  size_t bufsize = sizeof(buf);
  char *pbuf = buf + bufsize;

  if(n < 0)
  {
    if(n == INT32_MIN)
    {
      strncpy(buffer, "-2,147,483,648", len);
      return len <= 14;
    }

    neg = 1;
    n = -n;
  }

  *--pbuf = '\0';

  do
  {
    *--pbuf = '0' + (n % 10);
    n /= 10;
    if(--d == 0)
    {
      d = 3;
      *--pbuf = ',';
    }
  }
  while(n > 0);

  if(*pbuf == ',') ++pbuf;
  if(neg) *--pbuf = '-';

  strncpy(buffer, pbuf, len);
  return len <= strlen(pbuf);
}

Explanation: it creates a local buffer on the stack and then fills that in in the same method as the initial code. Then, it copies it into a parameter passed into the function, making sure not to overflow the buffer. It also has a special case for INT32_MIN. The return value is 0 if the original buffer was large enough, or 1 if the buffer was too small and the resulting string was truncated.

Adam Rosenfield
When I read your first paragraph, I decided to actually try and read the code. I was surpised that I very easily understood how it worked. I hope this is a good thing. =]
strager
The only gripe I might have about this code is that strncpy() does not guarantee null termination if the string to be copied is longer than the buffer to receive it. The return value does signal that, but it is usually a good idea to ensure that the output is null terminated.
Jonathan Leffler
A: 

In pure C:

#include <stdio.h>
#include <limits.h>

static char *prettyNumber(long num, int base, char separator)
{
#define bufferSize      (sizeof(long) * CHAR_BIT)
        static char buffer[bufferSize + 1];
        unsigned int pos = 0;

        /* We're walking backwards because numbers are right to left. */
        char *p = buffer + bufferSize;
        *p = '\0';

        int negative = num < 0;

        do
        {
                char digit = num % base;
                digit += '0';

                *(--p) = digit;
                ++pos;

                num /= base;

                /* This the last of a digit group? */
                if(pos % 3 == 0)
                {
/* TODO Make this a user setting. */
#ifndef IM_AMERICAN
#       define IM_AMERICAN_BOOL 0
#else
#       define IM_AMERICAN_BOOL 1
#endif
                        /* Handle special thousands case. */
                        if(!IM_AMERICAN_BOOL && pos == 3 && num < base)
                        {
                                /* DO NOTHING */
                        }
                        else
                        {
                                *(--p) = separator;
                        }
                }
        } while(num);

        if(negative)
                *(--p) = '-';

        return p;
#undef bufferSize
}

int main(int argc, char **argv)
{
        while(argc > 1)
        {
                long num = 0;

                if(sscanf(argv[1], "%ld", &num) != 1)
                        continue;

                printf("%ld = %s\n", num, prettyNumber(num, 10, ' '));

                --argc;
                ++argv;
        };

        return 0;
}

Normally I'd return an alloc'd buffer, which would need to be free'd by the user. This addition is trivial.

strager
Instead of using "IM_AMERICAN", why not use localeconv()->thousands_sep? It'll be the appropriate value for your locale.
Evan Teran
+1  A: 

That's probably pretty close to the way I would write it actually. The only thing I can immediately see that is wrong with the solution is that is doesn't work for LONG_MIN on machines where LONG_MIN is -(LONG_MAX + 1), which is most machines nowadays. I might use localeconv to get the thousands separator instead of assuming comma, and I might more carefully calculate the buffer size, but the algorithm and implementation seem pretty straight-forward to me, not really much of a WTF for C (there are much better solutions for C++).

Robert Gamble
+2  A: 

Hmm... I guess I shouldn't admit this, but my int to string routine for an embedded system work in pretty much exactly the same way (but without putting in the commas).

It's not particularly straightforward, but I wouldn't call it a WTF if you're working on a system that you can't use snprintf() on.

The guy who wrote the above probably noted that the printf() family of routines can't do comma grouping, so he came up with his own.

Footnote: there are some libraries where the printf() style formatting does support grouping, but they are not standard. And I know that the posted code doesn't support other locales that group using '.'. But that's hardly a WTF, just a bug possibly.

Michael Burr
+1  A: 

Lisp:

(defun pretty-number (x) (format t "~:D" x))

I'm suprised how easily I could do this. I'm not even past the first chapter in my Lisp book. xD (Or should I say, ~:D)

strager
A: 

I got bored and made this naive implementation in Perl. Works.


sub pretify {
    my $num = $_[0];
    my $numstring = sprintf( "%f", $num );

    # Split into whole/decimal
    my ( $whole, $decimal ) = ( $numstring =~ /(^\d*)(.\d+)?/ );
    my @chunks;
    my $output = '';

    # Pad whole into multiples of 3
    $whole = q{ } x ( 3 - ( length $whole ) % 3 ) . $whole;

    # Create an array of all 3 parts.
    @chunks = $whole =~ /(.{3})/g;

    # Reassemble with commas
    $output = join ',', @chunks;
    if ($decimal) {
        $output .= $decimal;
    }

    # Strip Padding ( and spurious commas )
    $output =~ s/^[ ,]+//;

    # Strip excess tailing zeros
    $output =~ s/0+$//;

    # Ending with . is ugly
    $output =~ s/\.$//;
    return $output;
}

print "\n", pretify 100000000000000000000000000.0000;
print "\n", pretify 10_202_030.45;
print "\n", pretify 10_101;
print "\n", pretify 0;
print "\n", pretify 0.1;
print "\n", pretify 0.0001;
print "\n";
Kent Fredric
pretify, i like it!
alex
A: 

Thank God I don't work with C.

Ben Aston
A: 

Yeah, we thank God for you not working with C too!

@Ferneu: your answer does not provide any useful information. I realize that as a newcomer, you cannot comment on other people's postings. I assume your comment applies to @sasserstyl's answer. But many people would down-vote your answer as not helpful. Sometimes, jocularity is OK, but be cautious.
Jonathan Leffler
A: 
size_t
signed_as_text_grouped_on_powers_of_1000(char *s, ssize_t max, int n)
{
    if (max <= 0)
        return 0;

    size_t r=0;
    bool more_groups = n/1000 != 0;
    if (more_groups)
    {
       r = signed_as_text_grouped_on_powers_of_1000(s, max, n/1000);
       r += snprintf(s+r, max-r, ",");
       n = abs(n%1000);
       r += snprintf(s+r, max-r, "%03d",n);
    } else
       r += snprintf(s+r, max-r, "% 3d", n);

    return r;
}

Unfortunately, this is about 10x slower than the original.

Adrian Panasiuk