ansaurus

Question

Answer 1

+2 A:

Fundamentally, it's a mistake to talk about "base64 encoding a string" where "string" is meant in terms of text.

Base64 encoding is applied to binary data (a sequence of bytes, or octets if you want to be even more picky), and the result is text. Every character in the output is printable ASCII text. The whole point of base64 is to provide a safe way of converting arbitrary binary data into a text format which can be reliably embedded in other text, transported, etc. ASCII is compatible with almost all character sets, so the output can nearly always be embedded in whatever text format you need to put it in.

When someone talks about "base64 encoding a string" they're really talking about encoding text as binary using some existing encoding (e.g. UTF-8), then applying a base64 encoding to the result. When decoding, you'd need to decode the base64 back to binary, and then decode that binary data with the original encoding, to get the original text.

Jon Skeet 2010-08-17 19:17:10
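The round trip described in the answer above (text to bytes via a chosen encoding, bytes to base64, and back again) can be sketched roughly as follows. This is only an illustration, assuming Node.js, where `Buffer` is a global; the function names `textToBase64` and `base64ToText` are made up for this example, and UTF-8 is just one possible choice of encoding.

```javascript
// Minimal sketch, assuming Node.js (Buffer is a Node global, not a browser API).
// textToBase64 / base64ToText are hypothetical helper names for illustration.

function textToBase64(text) {
  // Step 1: text -> bytes, using an explicit encoding (UTF-8 here).
  const bytes = Buffer.from(text, "utf8");
  // Step 2: bytes -> base64 text.
  return bytes.toString("base64");
}

function base64ToText(b64) {
  // Step 1: base64 text -> bytes.
  const bytes = Buffer.from(b64, "base64");
  // Step 2: bytes -> text, using the SAME encoding that was used to encode.
  return bytes.toString("utf8");
}

const original = "h\u00e9llo"; // "héllo", contains one non-ASCII character
const encoded = textToBase64(original);
const decoded = base64ToText(encoded);
console.log(encoded, decoded === original);
```

If the decoding side assumed a different text encoding than the encoding side, the base64 step would still succeed, but the recovered text could differ from the original.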

Answer 2

+2 A:

For me the (first) linked article has a fundamental problem:

"Before even attempting to base64 encode a string, you should check to see if the string contains only ASCII characters"

You don't base64 encode strings. You base64 encode byte sequences. And when you're dealing with any kind of encoding work, it's extremely important to keep in mind this difference.

Also, his check for 'ASCII' actually lets through everything from 0x80 to 0xFF, which isn't ASCII: ASCII covers only 0x00 to 0x7F.

Now, if you have a string which you have checked is pure ASCII, you can then safely treat it as a byte sequence of the ASCII values of the characters in it - but this is a separate earlier step, nothing strictly to do with the act of base64 encoding.

(I should say that I do like his repeated urging for the reader to note that base64 encoding is not in any shape or form encryption)

AakashM 2010-08-17 19:19:00
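As a rough illustration of the two points in the answer above, a stricter ASCII check (0x00 to 0x7F only) and the separate step of turning a pure-ASCII string into a byte sequence might look like this. The names `isAscii` and `asciiToBytes` are hypothetical, and this is not the linked article's code.

```javascript
// Minimal sketch; isAscii and asciiToBytes are hypothetical names.

// True only if every UTF-16 code unit is in the ASCII range 0x00..0x7F.
function isAscii(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0x7f) {
      return false;
    }
  }
  return true;
}

// Separate, earlier step: treat a verified-ASCII string as a byte sequence.
// Base64 encoding would then be applied to the returned bytes.
function asciiToBytes(str) {
  if (!isAscii(str)) {
    throw new RangeError("input contains non-ASCII characters");
  }
  return Uint8Array.from(str, (ch) => ch.charCodeAt(0));
}

console.log(isAscii("plain text"));   // ASCII only
console.log(isAscii("caf\u00e9"));    // 0xE9 is in 0x80..0xFF, so not ASCII
console.log(asciiToBytes("Hi"));
```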

Thanks for the reply. Are there any prerequisites to these byte sequences or can any arbitrary sequence of bytes be base64 encoded?

Rob 2010-08-17 22:52:01

@Rob any byte sequence at all. The article is actually quite good on explaining how the 24 bits in any 3 bytes are split into 4 groups of 6 bits, which are then mapped to characters in the base64 alphabet.

AakashM 2010-08-18 09:21:09
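The splitting of 3 bytes (24 bits) into 4 groups of 6 bits mentioned in the comment above can be sketched as below. This is an illustrative encoder over an arbitrary byte array, using the standard base64 alphabet and `=` padding; `base64FromBytes` is a made-up name, and this is not the article's implementation.

```javascript
// Minimal sketch of base64 over raw bytes; base64FromBytes is a hypothetical name.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function base64FromBytes(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;

    // Pack up to 3 bytes into one 24-bit value (missing bytes count as 0).
    const triple =
      (bytes[i] << 16) | ((has1 ? bytes[i + 1] : 0) << 8) | (has2 ? bytes[i + 2] : 0);

    // Pull out four 6-bit groups and map each to an alphabet character.
    out += ALPHABET[(triple >> 18) & 0x3f];
    out += ALPHABET[(triple >> 12) & 0x3f];
    // Groups that contain no input bits at all are replaced by '=' padding.
    out += has1 ? ALPHABET[(triple >> 6) & 0x3f] : "=";
    out += has2 ? ALPHABET[triple & 0x3f] : "=";
  }
  return out;
}

console.log(base64FromBytes([77, 97, 110]));   // the bytes of "Man" in ASCII
console.log(base64FromBytes([0xff, 0x00, 0x80])); // arbitrary non-text bytes work too
```

Nothing in the function looks at what the bytes mean, which is why any byte sequence at all can be encoded.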

@AakashM Ok thanks. So then the test from his code could/should be omitted?

Rob 2010-08-18 13:42:05

@Rob that starts to get into a different issue. His code uses `charCodeAt` to pull characters from the string - now, my JavaScript is really not good enough to be able to say how that will handle character encoding issues. I *think* JavaScript strings are always internally UTF-16, but don't trust that. He needs this test because his subsequent code treats `cur` as a byte; but that's the only reason for the test, I think. If you code for `cur` to be anything from 0 to 65535, then the test can go.

AakashM 2010-08-18 19:32:19
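To make the concern in the comment above concrete: `charCodeAt` returns a UTF-16 code unit in the range 0 to 65535, which is not a byte once the character is outside the 0 to 255 range. The sketch below contrasts that with explicitly encoding the text to UTF-8 bytes. It assumes an environment that provides the `TextEncoder` global (current browsers and Node.js do); `utf8BytesOf` is a hypothetical helper name.

```javascript
// Minimal sketch, assuming the TextEncoder global is available.
// utf8BytesOf is a hypothetical helper name.

function utf8BytesOf(text) {
  // TextEncoder always produces UTF-8; the result is a Uint8Array of real bytes.
  return Array.from(new TextEncoder().encode(text));
}

const euro = "\u20ac"; // the euro sign

// One UTF-16 code unit, far too large to treat as a single byte.
const codeUnit = euro.charCodeAt(0);
console.log(codeUnit.toString(16));

// Three UTF-8 bytes, each of which fits in 0..255 and can be base64 encoded.
const bytes = utf8BytesOf(euro);
console.log(bytes.map((b) => b.toString(16)));
```

Encoding to bytes first avoids needing an ASCII-only check before the base64 step, at the cost of having to agree on the text encoding when decoding.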

@AakashM Yeah, looking at the code again, he's definitely counting on it being a byte and then using bitwise operations to regroup three 8-bit groups into four 6-bit groups. So it looks like this code only works for the limited character set. I don't even think you can manipulate bits directly in JavaScript (besides the bitwise operations he's using). Perhaps this is the limitation. I need to look for some other examples.

Rob 2010-08-18 20:30:01


base64 encoding: input character
