The decode method of unicode strings really doesn't have any applications at all (unless you have some non-text data in a unicode string for some reason -- see below). It is mainly there for historical reasons, I think. In Python 3 it is completely gone.
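You can check that it's gone under any Python 3.x (the exact message may vary slightly between versions, e.g. newer ones add a "Did you mean: encode?" hint):

>>> 'ö'.decode
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'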
unicode().decode() will perform an implicit encoding of s using the default (ascii) codec. Verify this like so:
>>> s = u'ö'
>>> s.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
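To make the hidden first step explicit, you can spell it out by hand -- a sketch for Python 2, assuming the default encoding hasn't been changed from ascii; it fails at exactly the same point:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> s.encode(sys.getdefaultencoding())  # the implicit step that decode() performs first
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)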
For str().encode() it's the other way around -- it attempts an implicit decoding of s with the default encoding:
>>> s = 'ö'
>>> s.decode('utf-8')
u'\xf6'
>>> s.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
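For contrast, being explicit on both sides round-trips cleanly (assuming the byte string really is UTF-8, as it will be when typed in a UTF-8 terminal):

>>> s.decode('utf-8').encode('utf-8') == s
True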
Used like this, str().encode() is also superfluous.
But there is another application of the latter method that is useful: there are encodings that have nothing to do with character sets, and thus can be applied to 8-bit strings in a meaningful way:
>>> s.encode('zip')
'x\x9c;\xbc\r\x00\x02>\x01z'
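The other bytes-to-bytes codecs that ship with Python 2 behave the same way, and they round-trip as you'd expect; a few examples:

>>> s.encode('zip').decode('zip') == s
True
>>> 'foo'.encode('hex')
'666f6f'
>>> 'foo'.encode('base64')
'Zm9v\n'
>>> 'foo'.encode('rot13')
'sbb'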
You are right, though: the ambiguous usage of "encoding" for both these applications is... awkward. Again, with separate bytes and str types in Python 3, this is no longer an issue.
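A quick illustration of that split under Python 3: str only has encode, bytes only has decode, and the bytes-to-bytes codecs are reachable solely through the codecs module (via codecs.encode()/codecs.decode(), available from 3.2 on):

>>> 'ö'.encode('utf-8')
b'\xc3\xb6'
>>> b'\xc3\xb6'.decode('utf-8')
'ö'
>>> b'\xc3\xb6'.encode
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
>>> import codecs
>>> codecs.decode(codecs.encode(b'foo', 'zlib_codec'), 'zlib_codec')
b'foo'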