views:

3566

answers:

13

I've heard people talking about "base 64 encoding" here and there. What is it used for?

Cheers!

+12  A: 

Base-64 encoding is a way of taking binary data and turning it into text so that it's more easily transmitted in things like e-mail and HTML form data.

http://en.wikipedia.org/wiki/Base64
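
As a rough illustration of the point above, here is a minimal round trip using Python's standard `base64` module. The sample bytes are made up for the example; any binary data would do.

```python
import base64

# Arbitrary binary data (an assumption for the example); it is not valid text.
raw = bytes([0x00, 0xFF, 0x10, 0x80])

# Encode to plain ASCII text that is safe to paste into an e-mail or form field.
text = base64.b64encode(raw).decode("ascii")

# Decoding gives back exactly the original bytes.
restored = base64.b64decode(text)

print(text)
print(restored == raw)
```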

Brad Wilson
+1  A: 

Mostly, I've seen it used to encode binary data in contexts that can only handle ASCII, or some other simple character set.

Eric Tuttleman
+2  A: 

It's used for converting arbitrary binary data to ASCII text.

For example, e-mail attachments are sent this way.

Can Berk Güder
+11  A: 

It's basically a way of encoding arbitrary binary data in ASCII text. It takes 4 characters per 3 bytes of data, plus potentially a bit of padding at the end.

Essentially each 6 bits of the input is encoded in a 64-character alphabet. The "standard" alphabet uses A-Z, a-z, 0-9 and + and /, with = as a padding character. There are URL-safe variants.

Wikipedia is a reasonably good source of more information.
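
A small sketch of the padding and the URL-safe alphabet, using Python's standard `base64` module. The inputs are chosen for the example only; `tricky` is picked so that the standard encoding contains both + and /.

```python
import base64

# 3 bytes fill a 4-character group exactly; 2 bytes need one "=", 1 byte needs two.
padded = {data: base64.b64encode(data).decode("ascii")
          for data in (b"abc", b"ab", b"a")}
for data, encoded in padded.items():
    print(len(data), "byte(s) ->", encoded)

# These two bytes happen to produce "+" and "/" in the standard alphabet.
tricky = bytes([0xFB, 0xFF])
standard = base64.b64encode(tricky).decode("ascii")
url_safe = base64.urlsafe_b64encode(tricky).decode("ascii")  # uses "-" and "_" instead

print(standard, url_safe)
```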

Jon Skeet
+2  A: 

Some transport protocols only allow alphanumeric characters to be transmitted. Just imagine a protocol where control characters are used to trigger special actions, and/or one that only supports a limited bit width per character. Base64 transforms any input into an encoding that only uses alphanumeric characters, +, / and = as a padding character.

Konrad Rudolph
+1  A: 

To expand a bit on what Brad is saying: many transport mechanisms for email and Usenet and other ways of moving data are not "8 bit clean", which means that characters outside the standard ASCII character set might be mangled in transit - for instance, 0x0D might be seen as a carriage return, and turned into a carriage return and line feed. Base 64 maps all the binary characters into several standard ASCII letters and numbers and punctuation so they won't be mangled this way.
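
To make that concrete, here is a quick check (a sketch, using Python's standard `base64` module) that every possible byte value, including 0x0D, ends up as one of the 65 "safe" characters after encoding.

```python
import base64
import string

# Every byte value from 0x00 to 0xFF, control characters included.
all_bytes = bytes(range(256))
encoded = base64.b64encode(all_bytes).decode("ascii")

# The standard alphabet plus the padding character.
allowed = set(string.ascii_letters + string.digits + "+/=")
only_safe = set(encoded) <= allowed

print(only_safe)           # no control characters survive into the output
print("\r" in encoded)     # the raw carriage return is gone
```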

Paul Tomblin
+6  A: 

From http://en.wikipedia.org/wiki/Base64

"The term Base64 refers to a specific MIME content transfer encoding. It is also used as a generic term for any similar encoding scheme that encodes binary data by treating it numerically and translating it into a base 64 representation. The particular choice of base is due to the history of character set encoding: one can choose a set of 64 characters that is both part of the subset common to most encodings, and also printable. This combination leaves the data unlikely to be modified in transit through systems, such as email, which were traditionally not 8-bit clean."

"Base64 can be used in a variety of contexts:

  • Evolution and Thunderbird use Base64 to obfuscate e-mail passwords[1]
  • Base64 can be used to transmit and store text that might otherwise cause delimiter collision
  • Base64 is often used as a quick but insecure shortcut to obscure secrets without incurring the overhead of cryptographic key management
  • Spammers use Base64 to evade basic anti-spamming tools, which often do not decode Base64 and therefore cannot detect keywords in encoded messages.
  • Base64 is used to encode character strings in LDIF files
  • Base64 is sometimes used to embed binary data in an XML file, using a syntax similar to ...... e.g. Firefox's bookmarks.html.
  • Base64 is also used when communicating with government Fiscal Signature printing devices (usually, over serial or parallel ports) to minimize the delay when transferring receipt characters for signing.
  • Base64 is used to encode binary files such as images within scripts, to avoid depending on external files.
  • Can be used to embed raw image data into a CSS property such as background-image."
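
For the last item in that list, a hedged sketch of building a `data:` URI for a CSS property. The helper name `to_data_uri` is made up for this example, and the payload is only the 6-byte GIF signature, not a complete image.

```python
import base64

def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Hypothetical helper: wrap raw bytes in a base64 data: URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

# Placeholder bytes: just the GIF signature, standing in for real image data.
gif_header = b"GIF89a"
uri = to_data_uri(gif_header, "image/gif")

css = f"background-image: url({uri});"
print(css)
```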
warren
+19  A: 

When you have some binary data that you want to ship across a network, you generally don't do it by just streaming the bits and bytes over the wire in a raw format. Why? Because some media are made for streaming text. You never know -- some protocols may interpret your binary data as control characters (like a modem), or your binary data could be screwed up because the underlying protocol might think that you've entered a special character combination (like how FTP translates line endings).

So to get around this, people encode the binary data into characters. Base64 is one of these types of encodings. Why 64? Because you can generally rely on the same 64 characters being present in many character sets, and you can be reasonably confident that your data's going to end up on the other side of the wire uncorrupted.

Dave Markle
That's not really a good reason for using 64. There are a few more than 64 characters which could be safely used. The real reason for using 64 is that there *are* 64 safe characters, but there *aren't* 128 safe characters, and it's helpful to split the binary data up into 6 bit sections.
Jon Skeet
(In theory you could do base-80 encoding or something similar, but it would be significantly harder. Powers of two are natural bases for binary.)
Jon Skeet
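
The 6-bit split mentioned in the comments above can be sketched by hand. This is only an illustration for full 3-byte groups (no padding handling); `encode_group` is a made-up name, and the result is compared against Python's standard `base64` module.

```python
import base64

# The standard 64-character alphabet: index 0-63 maps to one character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 characters of 6 bits each."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    n = int.from_bytes(three_bytes, "big")            # one 24-bit number
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

by_hand = encode_group(b"Man")
print(by_hand, by_hand == base64.b64encode(b"Man").decode("ascii"))
```

Because 64 is a power of two, each character is a plain shift-and-mask away; a base like 80 would need real division across the whole input.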
+2  A: 

It's a textual encoding of binary data where the resultant text has nothing but letters, numbers and the symbols "+", "/" and "=". It's a convenient way to store/transmit binary data over media that is specifically used for textual data.

But why Base-64? The two alternatives for converting binary data into text that immediately spring to mind are:

  1. Decimal: store the decimal value of each byte as three digits: 045 112 101 037 etc., where each byte is represented by 3 bytes. The data bloats three-fold.
  2. Hexadecimal: store the bytes as hex pairs: AC 47 0D 1A etc. where each byte is represented by 2 bytes. The data bloats two-fold.

Base-64 maps 3 bytes (8 x 3 = 24 bits) onto 4 characters that each carry 6 bits (6 x 4 = 24 bits). The result looks something like "TWFuIGlzIGRpc3Rpb...". Therefore the bloating is a mere 4/3 ≈ 1.33 times the original.
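
The three growth factors are easy to check. A rough sketch follows; the sample data is arbitrary, and the decimal and hex forms are written without separating spaces, so they match the three-fold and two-fold figures exactly.

```python
import base64

# 768 arbitrary bytes (a multiple of 3, so Base64 needs no padding).
data = bytes(range(256)) * 3

decimal_text = "".join(f"{b:03d}" for b in data)   # 3 digits per byte
hex_text = data.hex()                              # 2 hex digits per byte
b64_text = base64.b64encode(data).decode("ascii")  # 4 characters per 3 bytes

for name, text in (("decimal", decimal_text), ("hex", hex_text), ("base64", b64_text)):
    print(f"{name:8} {len(text):5} chars  x{len(text) / len(data):.2f}")
```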

Ates Goral
+2  A: 

In the early days of computers, when telephone line inter-system communication was not particularly reliable, a quick & dirty method of verifying data integrity was used: "bit parity". In this method, every byte transmitted would have 7 bits of data, and the 8th would be 1 or 0, to force the total number of 1 bits in the byte to be even.

Hence 0x01 would be transmitted as 0x81; 0x02 would be 0x82; 0x03 would remain 0x03 etc.
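
A minimal sketch of that even-parity rule, reproducing the three values above. The function name `add_even_parity` is made up for the example.

```python
def add_even_parity(value: int) -> int:
    """Set the 8th bit so that the total number of 1 bits in the byte is even."""
    if not 0 <= value <= 0x7F:
        raise ValueError("only 7 bits of data fit alongside the parity bit")
    ones = bin(value).count("1")
    parity_bit = ones % 2          # 1 only when the data has an odd number of 1s
    return value | (parity_bit << 7)

for v in (0x01, 0x02, 0x03):
    print(f"0x{v:02X} -> 0x{add_even_parity(v):02X}")
```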

To further this system, when the ASCII character set was defined, only 00-7F were assigned characters. (Even today, all character assignments in the range 80-FF are non-standard.)

Many routers of the day put the parity check and byte translation into hardware, forcing the computers attached to them to deal strictly with 7-bit data. This forced email attachments (and all other data, which is why the HTTP & SMTP protocols are text-based) to be converted into a text-only format.

Few of the routers survived into the 90's. I severely doubt any of them are in use today.

James Curran
+1  A: 

I use it in a practical sense when we transfer large binary objects (images) via web services. So when I am testing a C# web service using a Python script, the binary object can be recreated with a little magic.

[In Python]

import base64

# dataFromWS is the Base64 text returned by the web service
imageAsBytes = base64.b64decode(dataFromWS)

Andrew Cox
A: 

And it's used by some hackers to obscure code running on your site :p

Chris