In a nutshell, the way text and data is handled in Py3k may arguably be the most "breaking" change in the language. By knowing and avoiding,when possible, the situations where some Python 2.6 logic will work differently than in 3.x, we can facilitate the migration when it happens. Yet we should expect that some parts of the 2.6 logic may require special attention and modifications for example to deal with distinct encodings etc.
The idea behind BDFL's suggestion on slide 14 is probably to start "using" the same types which Py3k supports (and only these), namely unicode strings for strings (str
type) and 8-bits byte sequences for "data" (bytes
type).
The term "using" in the previous sentence is used rather loosely since the semantics and associated storage/encoding for these types differs between the 2.6 and 3.x versions. In Python 2.6, the bytes type and the associated literal syntax (b'xyz') simply map to the str type. Therefore
# in Py2.6
>>'mykey' == b'mykey'
True
b'mykey'.__class__
<class 'str'>
# in Py3k
>>>'mykey' == b'mykey'
False
b'mykey'.__class__
<class 'bytes'>
To answer your question [in the remarks below], in 2.6 whether you use b'xyz' or 'xyz', Python understands it as the same and one thing : an str. What is important is that you understand these as [potentially/in-the-future] two distinct types with a distinct purpose:
- str for text-like info, and
- bytes for sequences of octets storing whatever data at hand.
For example, again speaking close to your example/question, in Py3k you'll be able to have a dictionary with two elements which have a similar keys, one with b'mykey' and the other with 'mykey', however under 2.6 this is not possible, since these two keys are really the same; what matters is that you know this kind of things and avoid (or explicitly mark in a special fashion in the code) the situations where the 2.6 code will not work in 3.x.
In Py3k, str is an abstract unicode string, a sequence of unicode code points (characters) and Python deals with converting this to/from its encoded form whatever the encoding might be (as a programmer you do have a say about the encoding but at the time you deal with string operations and such you do not need to worry about these details). In contrast, bytes is a sequence of 8-bits "things" which semantics and encoding are totally left to the programmer.
So, even though Python 2.6 doesn't see a difference, by explicitly using bytes() / b'...' or str() / u'...', you...
- ... prepare yourself and your program to the upcoming types and semantics of Py3k
- ... make it easier for the automatic conversion (2to3 tool or other) of the source code, whereby the b in b'...' will remain and the u of u'...' will be removed (since the only string type will be unicode).
For more info:
Python 2.6 What's new (see PEP 3112 Bytes Literals)
Python 3.0 What's New (see Text Vs. Data Instead Of Unicode Vs. 8-bit
near the top)