I have a binary string that I am encoding in Base64. I need to know beforehand what the size of the final Base64-encoded string will be.

Is there any way to calculate that?

Something like:

BinaryStringSize is 64Kb; EncodedBinaryStringSize will be 127Kb after encoding.

Oh, the code is in C.

Thanks.

+1  A: 

The actual length of MIME-compliant base64-encoded binary data is usually about 137% of the original data length, though for very short messages the relative overhead can be much higher because of the headers. Very roughly, the final size of base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers).

In other words, you can approximate the size of the encoded and decoded data with these formulas:

BytesNeededForEncoding = (string_length(base_string) * 1.37) + 814;
BytesNeededForDecoding = (string_length(encoded_string) - 814) / 1.37;

Source: http://en.wikipedia.org/wiki/Base64
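The two formulas above could be written in C roughly as follows. This is only a sketch: the function names and the MIME_HEADER_OVERHEAD macro are made up for illustration, the 1.37 factor and the 814-byte header figure are the rough estimates quoted above, and overflow for very large inputs is not checked.

```c
#include <stddef.h>

/* Rough header overhead quoted above; an estimate, not a fixed constant. */
#define MIME_HEADER_OVERHEAD 814u

/* Estimated size of a MIME base64 message for raw_len bytes of input:
 * raw_len * 1.37 (rounded up, in integer arithmetic) plus the headers. */
size_t mime_base64_estimate(size_t raw_len)
{
    return (raw_len * 137 + 99) / 100 + MIME_HEADER_OVERHEAD;
}

/* Inverse estimate: approximate raw size for a MIME message of encoded_len
 * bytes. Returns 0 if the message is no larger than the assumed headers. */
size_t mime_raw_estimate(size_t encoded_len)
{
    if (encoded_len <= MIME_HEADER_OVERHEAD)
        return 0;
    return (encoded_len - MIME_HEADER_OVERHEAD) * 100 / 137;
}
```

These numbers only make sense for complete MIME messages; for bare Base64 output the estimate is far too generous for small inputs.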

Gary Willoughby
Thanks. How do you put that in code?
Uri
great, let me go check it.
Uri
thanks! That was it.
Uri
Just edited the code, a small error. ;)
Gary Willoughby
This is only for MIME-encoded messages, and not necessarily unadorned Base64.
geocar
+2  A: 

Base 64 transforms 3 bytes into 4.

If your set of bits does not happen to be a multiple of 24 bits, you must pad it out so that it has a multiple of 24 bits (3 bytes).
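As a minimal sketch of that rule in C (the function name is hypothetical, and line breaks and the trailing \0 are deliberately left out), the padded output length is four characters per group of three input bytes, rounding the number of groups up:

```c
#include <stddef.h>

/* Length of padded Base64 output for input_len bytes, with no line breaks
 * and no terminating NUL. Overflow for huge inputs is not checked here. */
size_t base64_padded_len(size_t input_len)
{
    /* ceil(input_len / 3), written so the addition cannot overflow */
    size_t groups = input_len / 3 + (input_len % 3 != 0);
    return groups * 4;
}
```

For the 64 KB example in the question, base64_padded_len(65536) gives 87384, so roughly 85 KB rather than 127 KB.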

S.Lott
So, length(Base-64) = 4 * ((length(Binary) + 2) / 3), using integer division. Then newlines and a trailing null might need to be considered.
Jonathan Leffler
@Jonathan Leffler: Correct. And if it's a MIME attachment, all the MIME overheads need to be factored in.
S.Lott
+6  A: 

If you do Base64 exactly right, and that includes padding the end with = characters, and you break it up with a CR LF every 72 characters, the answer can be found with:

code_size    = ((input_size * 4) / 3);
padding_size = (input_size % 3) ? (3 - (input_size % 3)) : 0;
crlfs_size   = 2 + (2 * (code_size + padding_size) / 72);
total_size   = code_size + padding_size + crlfs_size;

In C, you may also terminate the output with a \0 byte, which adds one more byte, and you may want to check the remaining length after each code you write. So if you're just looking for a value to pass to malloc(), you might actually prefer a version that wastes a few bytes in order to make the coding simpler:

output_size = ((input_size * 4) / 3) + (input_size / 27) + 7;
geocar
The CRLF-per-72 characters is not raw Base 64 encoding; it is just a common variant.
Jonathan Leffler
If I put emphasis on "and" in "and you break it up with ...", would that make it more clear?
geocar
This CRLF calculation should be: crlfs_size = 2 + (2 * ((code_size + padding_size) / 72)); The total size must be divided by the line width before doubling to properly account for the last line. Otherwise the number of CRLF characters will be overestimated. Also the 2 + may not be necessary if there is no final CRLF pair.
adzm
Additionally, I noticed the code_size calculation itself has some flaws. See my answer for a well-tested alternative.
adzm
+2  A: 

Check out the b64 library. The function b64_encode2() can give a maximum estimate of the required size if you pass NULL, so you can allocate memory with certainty, and then call again passing the buffer and have it do the conversion.

dcw
+3  A: 

geocar's answer was close, but could sometimes be off slightly.

There are 4 bytes of output for every 3 bytes of input. If the input size is not a multiple of three, we must round it up to the next multiple of three. Otherwise we leave it alone.

input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0) 

Divide this by 3, then multiply by 4. That is our total output size, including padding.

code_padded_size = ((input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0) ) / 3) * 4

As I said in my comment, the total size must be divided by the line width before doubling to properly account for the last line. Otherwise the number of CRLF characters will be overestimated. I am also assuming there will only be a CRLF pair if the line is 72 characters. This includes the last line, but not if it is under 72 characters.

newline_size = ((code_padded_size) / 72) * 2

So put it all together:

unsigned int code_padded_size = ((input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0) ) / 3) * 4;
unsigned int newline_size = ((code_padded_size) / 72) * 2;

unsigned int total_size = code_padded_size + newline_size;

Or to make it a bit more readable:

unsigned int adjustment = ( (input_size % 3) ? (3 - (input_size % 3)) : 0);
unsigned int code_padded_size = ( (input_size + adjustment) / 3) * 4;
unsigned int newline_size = ((code_padded_size) / 72) * 2;

unsigned int total_size = code_padded_size + newline_size;
adzm