Simple yes or no question, and I'm 90% sure that it is no... but I'm not sure.

Can a Base64 string contain tabs?

A: 

Convert.FromBase64String() in the .NET framework does not seem to mind them. I believe all whitespace in the string is ignored.

string xxx = "ABCD\tDEFG";   //simulated Base64 encoded string w/added tab
Console.WriteLine(xxx);
byte[] xx = Convert.FromBase64String(xxx); // convert string back to binary
Console.WriteLine(BitConverter.ToString(xx));

output:

ABCD    DEFG
00-10-83-0C-41-46

The relevant clause of RFC 2045 (Section 6.8):

The encoded output stream must be represented in lines of no more than 76 characters each. All line breaks or other characters not found in Table 1 must be ignored by decoding software. In base64 data, characters other than those in Table 1, line breaks, and other white space probably indicate a transmission error, about which a warning message or even a message rejection might be appropriate under some circumstances.

James Curran
I always try to leave a comment when I downvote something, unless it's a pure and utter crap answer/question.
Jason Coco
+2  A: 

Sure. Tab is just ASCII character 9, and that has a base64 representation just like any other integer.

Joel Coehoorn
After converting a string to Base64, can the result contain tabs?
Jason
The encoded string is whatever the encoder created; you can't just insert a tab (or anything else) without breaking the encoding. A string that already contains a tab can be encoded, and the tab should still be there after decoding.
Joel Coehoorn
+7  A: 

The short answer is no, but Base64 cannot contain carriage returns either.

That is why, if you have multiple lines of Base64, you strip out any carriage returns, line feeds, and anything else that is not in the Base64 alphabet.

That includes tabs.
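A minimal sketch of that stripping step, in C# to match the other snippets in this thread. The class and method names (Base64Cleaner, StripNonAlphabet) are made up for illustration, and the alphabet is the standard one with '=' kept as padding:

```csharp
using System;
using System.Linq;

public static class Base64Cleaner
{
    // Standard Base64 alphabet; '=' is handled separately as padding.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Hypothetical helper: keep only alphabet characters and '=' padding.
    public static string StripNonAlphabet(string input)
    {
        return new string(
            input.Where(c => Alphabet.IndexOf(c) >= 0 || c == '=').ToArray());
    }

    public static void Main()
    {
        string wrapped = "ABCD\r\n\tDEFG";        // line break plus tab in the middle
        string cleaned = StripNonAlphabet(wrapped);
        Console.WriteLine(cleaned);                // ABCDDEFG

        byte[] bytes = Convert.FromBase64String(cleaned);
        Console.WriteLine(BitConverter.ToString(bytes)); // 00-10-83-0C-41-46
    }
}
```

Note that silently dropping everything outside the alphabet also hides real corruption, so this is only appropriate where the surrounding format allows it.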

Ian Boyd
+13  A: 

It depends on what you're asking. If you are asking whether or not tabs can be base-64 encoded, then the answer is "yes" since they can be treated the same as any other ASCII character.

However, if you are asking whether or not base-64 output can contain tabs, then the answer is no. The following link is for an article detailing base-64, including which characters are considered valid:

http://en.wikipedia.org/wiki/Base64
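To make the two readings concrete, here is a small sketch (the class name TabEncodingDemo is just for illustration) that encodes a single tab byte and checks that the encoded text itself contains no tab:

```csharp
using System;

public static class TabEncodingDemo
{
    // Encode one byte with value 9 (ASCII TAB).
    public static string EncodeTab()
    {
        return Convert.ToBase64String(new byte[] { 0x09 });
    }

    public static void Main()
    {
        string encoded = EncodeTab();
        Console.WriteLine(encoded);                 // CQ==
        Console.WriteLine(encoded.IndexOf('\t'));   // -1: no tab in the output

        byte[] decoded = Convert.FromBase64String(encoded);
        Console.WriteLine(decoded[0]);              // 9: the tab comes back
    }
}
```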

Brian
+2  A: 

From Wikipedia:

The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols. The "=" symbol is also used as a special suffix code. The original specification, RFC 989, additionally used the "*" symbol to delimit encoded but unencrypted data within the output stream.

As you can see, tab characters are not included. However, you can of course encode a tab character into a base64 string.

Judge Maygarden
A: 

Base64 specification (RFC 4648) states in Section 3.3 that any encountered non-alphabet characters should be rejected unless explicitly allowed by another specification:

Implementations MUST reject the encoded data if it contains characters outside the base alphabet when interpreting base-encoded data, unless the specification referring to this document explicitly states otherwise. Such specifications may instead state, as MIME does, that characters outside the base encoding alphabet should simply be ignored when interpreting data ("be liberal in what you accept"). Note that this means that any adjacent carriage return/line feed (CRLF) characters constitute "non-alphabet characters" and are ignored.

Specs such as PEM (RFC 1421) and MIME (RFC 2045) specify that Base64 strings can be broken up by whitespace. Per the referenced RFC 822, a tab (HTAB) is considered a whitespace character.

So, when Base64 is used in the context of either MIME or PEM (and probably other similar specifications), whitespace, including tabs, should be handled (stripped out) while decoding the encoded content.
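The two policies can be sketched side by side. This is only an illustration under my own assumptions: the names Base64Policy, DecodeStrict and DecodeLenient are hypothetical, and "whitespace" is taken to mean space, tab, CR and LF. The strict check is done by hand because Convert.FromBase64String already skips whitespace on its own:

```csharp
using System;
using System.Linq;

public static class Base64Policy
{
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Plain RFC 4648 reading: anything outside the alphabet (or '=') is an error.
    public static byte[] DecodeStrict(string input)
    {
        foreach (char c in input)
        {
            if (Alphabet.IndexOf(c) < 0 && c != '=')
                throw new FormatException(
                    "Character U+" + ((int)c).ToString("X4") + " is outside the Base64 alphabet.");
        }
        return Convert.FromBase64String(input);
    }

    // MIME/PEM-style reading: drop whitespace first, then apply the strict check.
    public static byte[] DecodeLenient(string input)
    {
        string stripped = new string(
            input.Where(c => " \t\r\n".IndexOf(c) < 0).ToArray());
        return DecodeStrict(stripped);
    }

    public static void Main()
    {
        string withTab = "ABCD\tDEFG";
        Console.WriteLine(BitConverter.ToString(DecodeLenient(withTab))); // 00-10-83-0C-41-46

        try
        {
            DecodeStrict(withTab);
        }
        catch (FormatException ex)
        {
            Console.WriteLine(ex.Message); // Character U+0009 is outside the Base64 alphabet.
        }
    }
}
```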

ykaganovich
+1  A: 

Haha, as you can see from the responses, this is actually not such a simple yes/no question.

A Base64 string produced by conversion cannot contain a tab character. But it seems to me that you are not asking that; you are asking whether a string containing a tab (before conversion) can be represented in Base64, and the answer to that is yes.

I would add, though, that what you should really do is take care to preserve the encoding of your string, i.e. convert it to an array of bytes using the correct encoding (Unicode, UTF-8, whatever), then convert that array of bytes to Base64.

EDIT: A simple test.

private void button2_Click(object sender, EventArgs e)
{
  StringBuilder sb = new StringBuilder();
  string test = "The rain in spain falls \t mainly on the plain";
  sb.AppendLine(test);
  UTF8Encoding enc = new UTF8Encoding();
  byte[] b = enc.GetBytes(test);
  string cvtd = Convert.ToBase64String(b);
  sb.AppendLine(cvtd);
  byte[] c = Convert.FromBase64String(cvtd);
  string backAgain = enc.GetString(c);
  sb.AppendLine(backAgain);
  MessageBox.Show(sb.ToString());
}
Tim Jarvis
A: 

YES!

Base64 is used to encode ANY 8-bit value (decimal 0 to 255) into a string using a set of safe characters. TAB is decimal 9.

Base 64 uses one of the following character sets:

Data: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
URLs: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_

Binary attachments embedded in text (e.g. email) are also encoded using this system.
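The two character sets differ only in the last two symbols, so converting between them is a pair of character replacements. A rough sketch, with hypothetical helper names (ToUrlSafe, FromUrlSafe) and with '=' padding left untouched:

```csharp
using System;

public static class Base64Alphabets
{
    // Standard ("Data") alphabet to URL alphabet: '+' becomes '-', '/' becomes '_'.
    public static string ToUrlSafe(string standard)
    {
        return standard.Replace('+', '-').Replace('/', '_');
    }

    // URL alphabet back to the standard one.
    public static string FromUrlSafe(string urlSafe)
    {
        return urlSafe.Replace('-', '+').Replace('_', '/');
    }

    public static void Main()
    {
        // These two bytes happen to use both of the symbols that differ.
        string standard = Convert.ToBase64String(new byte[] { 0xFB, 0xFF });
        Console.WriteLine(standard);             // +/8=

        string urlSafe = ToUrlSafe(standard);
        Console.WriteLine(urlSafe);              // -_8=

        byte[] back = Convert.FromBase64String(FromUrlSafe(urlSafe));
        Console.WriteLine(BitConverter.ToString(back)); // FB-FF
    }
}
```

Neither set contains a tab, which is the point made in the comment below.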

John
Or, by the other interpretation of the question, "NO", since tab is not one of the 64 (26 upper case + 26 lower case + 10 digits + 2 symbols) "digits" used to express a number in base 64.
dreeves
A: 

It seems that there is a lot of confusion here, and surprisingly most answers are of the "No" variety. I don't think that is a good canonical answer. The reason for the confusion is probably that Base64 is not strictly specified; multiple practical implementations and interpretations exist. You can check out link text for more discussion on this.

In general, however, conforming base64 codecs SHOULD understand linefeeds, as they are mandated by some base64 definitions (76 character segments, then linefeed etc). Because of this, most decoders also allow for indentation whitespace, and quite commonly any whitespace between 4-character "triplets" (so named since they encode 3 bytes).

So there's a good chance that in practice you can use tabs and other white space.

But I would not add tabs myself when generating base64 content to send to a service: be conservative in what you send, (more) liberal in what you receive.
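As one data point, .NET can produce the 76-character line format itself via Base64FormattingOptions.InsertLineBreaks, and its decoder accepts the result even after tab indentation is added. A small sketch (the class name, helper name and the 100-byte zero buffer are arbitrary choices for the demo):

```csharp
using System;

public static class LineWrappedBase64
{
    // Encode with a CRLF inserted after every 76 output characters.
    public static string EncodeWrapped(byte[] data)
    {
        return Convert.ToBase64String(data, Base64FormattingOptions.InsertLineBreaks);
    }

    public static void Main()
    {
        byte[] data = new byte[100];             // 100 zero bytes -> 136 Base64 characters
        string wrapped = EncodeWrapped(data);

        foreach (string line in wrapped.Split(new[] { "\r\n" }, StringSplitOptions.None))
            Console.WriteLine(line.Length);      // 76, then 60

        // Indent the continuation line with a tab; the .NET decoder skips it.
        string indented = wrapped.Replace("\r\n", "\r\n\t");
        byte[] back = Convert.FromBase64String(indented);
        Console.WriteLine(back.Length);          // 100
    }
}
```

Other decoders may be stricter, so this shows what one implementation tolerates rather than what every service will accept.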

StaxMan