views:

977

answers:

5

I have an array of unsigned chars in c I am trying to print in base 10, and I am stuck. I think this will be better explained in code, so, given:

unsigned char n[3];
char[0] = 1;
char[1] = 2;
char[2] = 3;

I would like to print 197121.

This is trivial with small base 256 arrays. One can simply 1 * 256 ^ 0 + 2 * 256 ^ 1 + 3 * 256 ^ 2.

However, if my array was 100 bytes large, then this quickly becomes a problem. There is no integral type in C that is 100 bytes large, which is why I'm storing numbers in unsigned char arrays to begin with.

How am I supposed to efficiently print this number out in base 10?

I am a bit lost.

+7  A: 

There's no easy way to do it using only the standard C library. You'll either have to write the function yourself (not recommended), or use an external library such as GMP.

For example, using GMP, you could do:

unsigned char n[100];  // number to print

mpz_t num;
mpz_import(num, 100, -1, 1, 0, 0, n);  // convert byte array into GMP format
mpz_out_str(stdout, 10, num);  // print num to stdout in base 10
mpz_clear(num);  // free memory for num
Adam Rosenfield
overflowed :) :)
Tom
There is a reason why scientific notion was invented. ;-)
Nick Presta
Jesus, I broke StackOverflow! Take two: It may not be perfect, but you can get away with using doubles (or long doubles) for a while. You may lose some precision, but it really won't matter for 80% of cases, since on my machine DBL_MAX is 179769313486231570814527423731704356798070567525844996598917476803 157260780028538760589558632766878171540458953514382464234321326889 464182768467546703537516986049910576551282076245490090389328944075 868508455133942304583236903222948165808559332123348274797826204144 723168738177180919299881250404026184124858368 (spaces added to not break SO).
Chris Lutz
@Nick - I felt the full number would make more of an impact.
Chris Lutz
in line 3, why would the size (4th argument) be 0?
Mike Caron
@Mike Caron, I forgot the fourth argument, that's been fixed now
Adam Rosenfield
+1  A: 

Here is a function that does what you want:

#include <math.h>
#include <stddef.h> // for size_t

double getval(unsigned char *arr, size_t len)
{
    double ret = 0;
    size_t cur;
    for(cur = 0; cur < len; cur++)
        ret += arr[cur] * pow(256, cur);
    return ret;
}

That looks perfectly readable to me. Just pass the unsigned char * array you want to convert and the size. Note that it won't be perfect - for arbitrary precision, I suggest looking into the GNU MP BigNum library, as has been suggested already.

As a bonus, I don't like your storing your numbers in little-endian order, so here's a version if you want to store base-256 numbers in big-endian order:

#include <stddef.h> // for size_t

double getval_big_endian(unsigned char *arr, size_t len)
{
    double ret = 0;
    size_t cur;
    for(cur = 0; cur < len; cur++)
      {
        ret *= 256;
        ret += arr[cur];
      }
    return ret;
}

Just things to consider.

Chris Lutz
A: 

something like

#define MAX_ARRAY_SIZE 100
unsigned char myArray[MAX_ARRAY_SIZE];
unsigned int base10 = 0;
for (unsigned int i = 0; i < MAX_ARRAY_SIZE; ++i)
{
    base10 += myArray[i] * pow(256, i);
}

obviously initialize the array with whatever values you want.

ThePosey
This is limited to 4 bytes of input; the OP specifically requested how to do it with 100 bytes of input.
Adam Rosenfield
+3  A: 

When I saw this question, I purpose to solve it, but at that moment I was very busy. This last weekend I've could gain some prize hours of free time so I considered my pending challenge.

First of all, I suggest you to considered above response. I never use GMP library but I'm sure that it's better solution than a handmade code. Also, you could be interest to analyze code of bc calculator; it can works with big numbers and I used to test my own code.

Ok, if you are still interested in a code do it by yourself (only with support C language and Standard C library) may be I can give you something.

Before all, a little bit theory. In basic numeric theory (modular arithmetic level) theres is an algorithm that inspire me to arrive at one solution; Multiply and Power algorithm to solve a^N module m:

Result := 1;
for i := k until i = 0
    if n_i = 1 then Result := (Result * a) mod m;
    if i != 0 then Result := (Result * Result) mod m;
end for;

Where k is number of digits less one of N in binary representation, and n_i is i binary digit. For instance (N is exponent):

N = 44 -> 1 0 1 1 0 0

k = 5
n_5 = 1
n_4 = 0
n_3 = 1
n_2 = 1
n_1 = 0
n_0 = 0

When we make a module operation, as an integer division, we can lose part of the number, so we only have to modify algorithm to don't miss relevant data.

Here is my code (take care that it is an adhoc code, strong dependency of may computer arch. Basically I play with data length of C language so, be carefully because my data length could not be the same):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>


enum { SHF = 31, BMASK = 0x1 << SHF, MODULE = 1000000000UL, LIMIT = 1024 };


unsigned int scaleBigNum(const unsigned short scale, const unsigned int lim, unsigned int *num);   
unsigned int pow2BigNum(const unsigned int lim, unsigned int *nsrc, unsigned int *ndst);
unsigned int addBigNum(const unsigned int lim1, unsigned int *num1, const unsigned int lim2, unsigned int *num2);

unsigned int bigNum(const unsigned short int base, const unsigned int exp, unsigned int **num);


int main(void)
{
  unsigned int *num, lim;
  unsigned int *np, nplim;
  int i, j;


  for(i = 1; i < LIMIT; ++i)
  {
    lim = bigNum(i, i, &num);

    printf("%i^%i == ", i, i);
    for(j = lim - 1; j > -1; --j)
      printf("%09u", num[j]);
    printf("\n");

    free(num);
  } 

  return 0;
}


/*
  bigNum: Compute number base^exp and store it in num array
  @base: Base number
  @exp: Exponent number
  @num: Pointer to array where it stores big number

  Return: Array length of result number
*/
unsigned int bigNum(const unsigned short int base, const unsigned int exp, unsigned int **num)
{
  unsigned int m, lim, mem; 
  unsigned int *v, *w, *k;


  //Note: mem has the exactly amount memory to allocate (dinamic memory version) 
  mem = ( (unsigned int) (exp * log10( (float) base ) / 9 ) ) + 3;
  v = (unsigned int *) malloc( mem * sizeof(unsigned int) );
  w = (unsigned int *) malloc( mem * sizeof(unsigned int) );

  for(m = BMASK; ( (m & exp) == 0 ) && m;  m >>= 1 ) ;

  v[0] = (m) ? 1 : 0;
  for(lim = 1; m > 1; m >>= 1)
  { 
    if( exp & m )
      lim = scaleBigNum(base, lim, v);

    lim = pow2BigNum(lim, v, w);

    k = v;
    v = w;
    w = k;
  }

  if(exp & 0x1)
    lim = scaleBigNum(base, lim, v);

  free(w);

  *num = v;  
  return lim;
}

/*
  scaleBigNum: Make an (num[] <- scale*num[]) big number operation
  @scale: Scalar that multiply big number
  @lim: Length of source big number
  @num: Source big number (array of unsigned int). Update it with new big number value

  Return: Array length of operation result
  Warning: This method can write in an incorrect position if we don't previous reallocate num (if it's necessary). bigNum method do it for us
*/
unsigned int scaleBigNum(const unsigned short scale, const unsigned int lim, unsigned int *num)
{
  unsigned int i;
  unsigned long long int n, t;


  for(n = 0, t = 0, i = 0; i < lim; ++i)
  {
    t = (n / MODULE);
    n = ( (unsigned long long int) scale * num[i]  );

    num[i] =  (n % MODULE) + t;  // (n % MODULE) + t always will be smaller than MODULE  
  }

  num[i] = (n / MODULE);

  return ( (num[i]) ? lim + 1 : lim );
}


/*
  pow2BigNum: Make a (dst[] <- src[] * src[]) big number operation  
  @lim: Length of source big number
  @src: Source big number (array of unsigned int)
  @dst: Destination big number (array of unsigned int)

  Return: Array length of operation result
  Warning: This method can write in an incorrect position if we don't previous reallocate num (if it's necessary). bigNum method do it for us
*/
unsigned int pow2BigNum(const unsigned int lim, unsigned int *src, unsigned int *dst)
{
  unsigned int i, j;
  unsigned long long int n, t;
  unsigned int k, c;


  for(c = 0, dst[0] = 0, i = 0; i < lim; ++i)
  {
    for(j = i, n = 0; j < lim; ++j)
    {
      n = ( (unsigned long long int) src[i] * src[j] );
      k = i + j;

      if(i != j)
      {
        t = 2 * (n % MODULE);
        n = 2 * (n / MODULE);

        // (i + j)
        dst[k] = ( (k > c) ? ((c = k), 0) : dst[k] ) + (t % MODULE); 
        ++k; // (i + j + 1)
        dst[k] = ( (k > c) ? ((c = k), 0) : dst[k] ) + ( (t / MODULE) + (n % MODULE) ); 
        ++k; // (i + j + 2)
        dst[k] = ( (k > c) ? ((c = k), 0) : dst[k] ) + (n / MODULE);
      }
      else
      {
        dst[k] = ( (k > c) ? ((c = k), 0) : dst[k] ) + (n % MODULE);
        ++k; // (i + j)
        dst[k] = ( (k > c) ? ((c = k), 0) : dst[k] ) + (n / MODULE);
      }

      for(k = i + j; k < (lim + j); ++k)
      {
        dst[k + 1] += (dst[k] / MODULE);
        dst[k] %= MODULE;
      }

    }
  }

  i = lim << 1;
  return ((dst[i - 1]) ? i : i - 1);
}


/*
  addBigNum: Make a (num2[] <- num1[] + num2[]) big number operation
  @lim1: Length of source num1 big number
  @num1: First source operand big number (array of unsigned int). Should be smaller than second
  @lim2: Length of source num2 big number
  @num2: Second source operand big number (array of unsigned int). Should be equal or greater than first

  Return: Array length of operation result or 0 if num1[] > num2[] (dosen't do any op)
  Warning: This method can write in an incorrect position if we don't previous reallocate num2  
*/
unsigned int  addBigNum(const unsigned int lim1, unsigned int *num1, const unsigned int lim2, unsigned int *num2)
{
  unsigned long long int n;
  unsigned int i;

  if(lim1 > lim2)
    return 0;

  for(num2[lim2] = 0, n = 0, i = 0; i < lim1; ++i)
  {
    n = num2[i] + num1[i] + (n / MODULE); 
    num2[i] = n % MODULE;
  }

  for(n /= MODULE; n; ++i)
  {
    num2[i] += n;
    n = (num2[i] / MODULE);
  }

  return (lim2 > i) ? lim2 : i;
}

To compile:

gcc -o bgn <name>.c -Wall -O3 -lm     //Math library if you wants to use log func

To check result, use direct output as and input to bc. Easy shell script:

#!/bin/bash


select S in ` awk -F '==' '{print $1 " == " $2 }' | bc`;
do
    0;
done;

echo "Test Finished!";


We have and array of unsigned int (4 bytes) where we store at each int of array a number of 9 digits ( % 1000000000UL ); hence num[0] we will have the first 9 digits, num[1] we will have digit 10 to 18, num[2]... I use convencional memory to work but an improvement can do it with dinamic memory. Ok, but how length It could be the array? (or how many memory we need to allocate?). Using bc calculator (bc -l with mathlib) we can determine how many digits has a number:

l(a^N) / l(10)     // Natural logarith to Logarithm base 10

If we know digits, we know amount integers we needed:

( l(a^N) / (9 * l(10)) ) + 1     // Truncate result

If you work with value such as (2^k)^N you can resolve it logarithm with this expression:

( k*N*l(2)/(9*l(10)) ) + 1    // Truncate result

to determine the exactly length of integer array. Example:

256^800 = 2^(8*800) ---> l(2^(8*800))/(9*l(10)) + 1 = 8*800*l(2)/(9*l(10)) + 1

The value 1000000000UL (10^9) constant is very important. A constant like 10000000000UL (10^10) dosen't work because can produce and indetected overflow (try what's happens with number 16^16 and 10^10 constant) and a constant more little such as 1000000000UL (10^8) are correct but we need to reserve more memory and do more steps. 10^9 is key constant for unsigned int of 32 bits and unsigned long long int of 64 bits.

The code has two parts, Multiply (easy) and Power by 2 (more hard). Multiply is just multiplication and scale and propagate the integer overflow. It take the principle of associative property in math to do exactly the inverse principle, so if k(A + B + C) we want kA + kB + kC where number will be k*A*10^18 + k*B*10^9 + k*C. Obiously, k*C operation can generate a number bigger than 999 999 999, but never more bigger than 0xFF FF FF FF FF FF FF FF. A number bigger than 64 bits can never occur in a multiplication because C is an unsigned integer of 32 bits and k is a unsigned short of 16 bits. In worts case, we will have this number:

k = 0x FF FF;
C = 0x 3B 9A C9 FF;    // 999999999
n = k*C = 0x 3B 9A | 8E 64 36 01;

n % 1000000000 = 0x 3B 99 CA 01;
n / 1000000000 = 0x FF FE;

After Mul k*B we need to add 0x FF FE from last multiplication of C ( B = k*B + (C / module) ), and so on (we have 18 bits arithmetic offset, enough to guarantee correct values).

Power is more complex but is in essencial, the same problem (multiplication and add), so I give some tricks about code power:

  • Data types are important, very important
  • If you try to multiplication an unsigned integer with unsigned integer, you get another unsigned integer. Use explicit cast to get unsigned long long int and don't lose data.
  • Always use unsigned modifier, dont forget it!
  • Power by 2 can directly modify 2 index ahead of current index
  • gdb is your friend

I've developed another method that add big numbers. These last I don't prove so much but I think it works well. Don't be cruels with me if it has a bug.

...and that's all!

PD1: Developed in a

Intel(R) Pentium(R) 4 CPU 1.70GHz

Data length: 
    unsigned short: 2 
    unsigned int: 4 
    unsigned long int: 4 
    unsigned long long int: 8

Numbers such as 256^1024 it spend:

real    0m0.059s
user    0m0.033s
sys    0m0.000s

A bucle that's compute i^i where i goes to i = 1 ... 1024:

real    0m40.716s
user    0m14.952s
sys    0m0.067s

For numbers such as 65355^65355, spent time is insane.

PD2: My response is so late but I hope my code it will be usefull.

PD3: Sorry, explain me in english is one of my worst handicaps!

Last update: I just have had an idea that with same algorithm but other implementation, improve response and reduce amount memory to use (we can use the completely bits of unsigned int). The secret: n^2 = n * n = n * (n - 1 + 1) = n * (n - 1) + n. (I will not do this new code, but if someone are interested, may be after exams... )

saxi
tldr, but what a nice answer!
toto
@toto: How do you know it's a nice answer if it's too long for you to read?
kitchen
A: 

It may be too late or too irrelevant to make this suggestion, but could you store each byte as two base 10 digits (or one base 100) instead of one base 256? If you haven't implemented division yet, then that implies all you have is addition, subtraction, and maybe multiplication; those shouldn't be too hard to convert. Once you've done that, printing it would be trivial.

Mark Ransom