The ASCII encoding only includes the bytes with values <= 127. The range of characters represented by these bytes is identical in most encodings; in other words, "A" is chr(65) in ASCII, in latin-1, in UTF-8, and so on.
The one half symbol, however, is not part of the ASCII character set, so when Python tries to encode this symbol into ASCII, it can do nothing but fail.
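To make that concrete, here is a small sketch of both points: "A" encodes to the same byte in all three encodings, while the one half symbol cannot be encoded to ASCII at all. The variable names are mine, and the u'' prefix is kept only to match the snippets in this answer.

```python
# "A" maps to the same single byte (65) in ASCII, latin-1 and UTF-8.
a_bytes = u'A'.encode('ascii')
same_everywhere = a_bytes == u'A'.encode('latin-1') == u'A'.encode('utf-8')

# The one half symbol has codepoint 189, which is outside ASCII's 0..127 range.
half = u'\xbd'
encode_error = None
try:
    half.encode('ascii')
except UnicodeEncodeError as exc:
    # Keep a reference; the name bound by "as" is cleared after the block on Python 3.
    encode_error = exc
```

After this runs, same_everywhere is True and encode_error holds the UnicodeEncodeError raised for the one half symbol.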
Update: Here's what happens (I assume we're talking CPython):
float(u'\xbd')
leads to PyFloat_FromString
in floatobject.c being called. This function, given a unicode object, in turn calls PyUnicode_EncodeDecimal in unicodeobject.c. From skimming over the code, I gather that this function turns the unicode object into a string by replacing every character with a unicode codepoint < 256 with the byte of that value, i.e. the one half character, having the codepoint 189, is turned into chr(189).
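The codepoint-to-byte step can be sketched in pure Python as below. This is not the C function itself, only an illustration of the mapping described above: encode_below_256 is a hypothetical helper name, and the sketch deliberately ignores anything else the real function may do, such as special handling of digits or whitespace.

```python
def encode_below_256(text):
    """Hypothetical helper: map each codepoint < 256 to the byte of that value."""
    out = bytearray()
    for ch in text:
        codepoint = ord(ch)
        if codepoint >= 256:
            # Assumption for this sketch: anything larger is simply rejected.
            raise ValueError("codepoint %d does not fit in one byte" % codepoint)
        out.append(codepoint)
    return bytes(out)


half_as_bytes = encode_below_256(u'\xbd')
# For codepoints < 256 this is the same mapping latin-1 uses.
matches_latin1 = half_as_bytes == u'\xbd'.encode('latin-1')
```

The result is a one-byte string holding the value 189, which is exactly the non-ASCII byte that later ends up in the exception message.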
Then, PyFloat_FromString does its work as usual. At this point, it's working with a regular string that happens to contain a byte outside the ASCII range. It doesn't care about this; it just finds a byte that's not a digit, a period or the like, so it raises a ValueError.
The argument to this exception is a string
"invalid literal for float(): " + evil_string
That's fine; an exception message is, after all, a string. It's only when you try to decode this string, using the default encoding ASCII, that this turns into a problem.
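The last step can be reproduced without float() at all. The sketch below builds a byte string shaped like the exception message and decodes it as ASCII; message and decode_error are my own names, and evil_string stands in for the single byte 189 produced earlier.

```python
# A byte string shaped like the exception argument described above.
prefix = b"invalid literal for float(): "
evil_string = b"\xbd"  # the one half character after the codepoint-to-byte step
message = prefix + evil_string

# Holding the bytes is harmless; decoding them as ASCII is what fails.
decode_error = None
try:
    message.decode('ascii')
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding with an encoding that covers byte 189 works fine.
decoded_ok = message.decode('latin-1')
```

Here decode_error.start points at the position of the offending byte, right after the prefix, while the latin-1 decode succeeds and ends with the one half character.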