Can Unicode characters be encoded and decoded with Base64?
I tried to encode the string الله, but when I decoded it, all I got was '????'.

+2  A: 

Of course they can. It depends on how your language or Base64 routine handles Unicode input. For example, Python's base64 routines expect an already-encoded byte string, because Base64 encodes binary to text, not Unicode code points to text:

   
    Python 2.5.1 (r251:54863, Jul 31 2008, 22:53:39)
    [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> a = 'ûñö'
    >>> import base64
    >>> base64.b64encode(a)
    'w7vDscO2'
    >>> base64.b64decode('w7vDscO2')
    '\xc3\xbb\xc3\xb1\xc3\xb6'
    >>> print '\xc3\xbb\xc3\xb1\xc3\xb6'
    ûñö
    >>>     
    >>> u'üñô'
    u'\xfc\xf1\xf4'
    >>> base64.b64encode(u'\xfc\xf1\xf4')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/base64.py", line 53, in b64encode
        encoded = binascii.b2a_base64(s)[:-1]
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
    >>> base64.b64encode(u'\xfc\xf1\xf4'.encode('utf-8'))
    'w7zDscO0'
    >>> base64.b64decode('w7zDscO0')
    '\xc3\xbc\xc3\xb1\xc3\xb4'
    >>> print base64.b64decode('w7zDscO0')
    üñô
    >>> a = 'الله'
    >>> a
    '\xd8\xa7\xd9\x84\xd9\x84\xd9\x87'
    >>> base64.b64encode(a)
    '2KfZhNmE2Yc='
    >>> b = base64.b64encode(a)
    >>> print base64.b64decode(b)
    الله
Vinko Vrsalovic
+1 for examples
I'd just note that the returned string is not a unicode object. It should be decoded as follows: c = base64.b64decode(b).decode('utf-8')
DanJ
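
For readers on Python 3, where `str` and `bytes` are separate types, the round trip from the session above (including the final `.decode('utf-8')` step from the comment) might look roughly like this. The helper names `text_to_b64` and `b64_to_text` are hypothetical, and UTF-8 is an assumed choice; any encoding should work as long as both directions agree on it.

```python
import base64

def text_to_b64(text, encoding="utf-8"):
    """Turn text into bytes first, then Base64-encode those bytes."""
    raw = text.encode(encoding)                   # str -> bytes
    return base64.b64encode(raw).decode("ascii")  # Base64 output is plain ASCII

def b64_to_text(b64, encoding="utf-8"):
    """Base64-decode to bytes, then decode the bytes back into text."""
    raw = base64.b64decode(b64)                   # Base64 -> bytes
    return raw.decode(encoding)                   # bytes -> str

encoded = text_to_b64("الله")
decoded = b64_to_text(encoded)
print(encoded)
print(decoded)
```

The explicit `encode` and `decode` calls are the point of the sketch: Base64 itself only ever sees bytes.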
+4  A: 

Base64 converts binary to text. If you want to convert text to a base64 format, you'll need to convert the text to binary using some appropriate encoding (e.g. UTF-8, UTF-16) first.

Jon Skeet
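
As a rough illustration of why the choice of encoding matters, the sketch below (Python 3, using the asker's string as sample input) encodes the same text as UTF-8 and as UTF-16 little-endian before applying Base64. The variable names are only for illustration.

```python
import base64

text = "الله"

# Same text, two different byte sequences, hence two different Base64 strings.
utf8_b64 = base64.b64encode(text.encode("utf-8"))
utf16_b64 = base64.b64encode(text.encode("utf-16-le"))
print(utf8_b64 != utf16_b64)

# Decoding with the encoding that was used to produce the bytes restores the text.
restored = base64.b64decode(utf16_b64).decode("utf-16-le")

# Decoding with a different encoding yields other characters (or may raise an error).
mismatched = base64.b64decode(utf8_b64).decode("utf-16-le")

print(restored == text)
print(mismatched == text)
```

Whichever encoding is picked, the decoding side needs to know it; Base64 carries no record of how the bytes were produced.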
+1  A: 

You didn't specify which language(s) you're using, but try converting the string to a byte array (however that's done in your language of choice) and then base64 encoding that byte array.

joel.neely
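
One plausible cause of the '????' result (an assumption, since the question does not show the code or name the language) is that the text was converted to bytes with an encoding that cannot represent Arabic characters, so each character was replaced by a question mark before Base64 ever ran. A minimal Python 3 sketch of that failure mode next to a lossless conversion:

```python
import base64

text = "الله"

# Lossy: ASCII cannot represent these characters, so each becomes "?".
lossy_bytes = text.encode("ascii", errors="replace")

# Lossless: UTF-8 can represent every Unicode character.
safe_bytes = text.encode("utf-8")

# Base64 faithfully round-trips whatever bytes it is given, damaged or not.
lossy_round_trip = base64.b64decode(base64.b64encode(lossy_bytes)).decode("ascii")
safe_round_trip = base64.b64decode(base64.b64encode(safe_bytes)).decode("utf-8")

print(lossy_round_trip)
print(safe_round_trip)
```

If that is what happened, the damage is done in the text-to-bytes step, not in the Base64 step.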
A: 

In .NET you can try this (encode):

    // Encoding.Unicode is UTF-16 (little-endian); "input" is the string to encode.
    byte[] encbuf = System.Text.Encoding.Unicode.GetBytes(input);
    string encoded = Convert.ToBase64String(encbuf);

...and to decode:

    // Decode with the same encoding that was used to produce the bytes.
    byte[] decbuf = Convert.FromBase64String(encoded);
    string decoded = System.Text.Encoding.Unicode.GetString(decbuf);
Scott Whitlock