Is there any reason to prefer unicode(somestring, 'utf8') as opposed to somestring.decode('utf8')?
My only thought is that .decode() is a bound method, so Python may be able to resolve it more efficiently, but correct me if I'm wrong.
It's easy to benchmark it:
>>> from timeit import Timer
>>> ts = Timer("s.decode('utf-8')", "s = 'ééé'")
>>> ts.timeit()
8.9185450077056885
>>> tu = Timer("unicode(s, 'utf-8')", "s = 'ééé'")
>>> tu.timeit()
2.7656929492950439
In this run, unicode() is roughly three times faster (about 2.8 s versus 8.9 s for timeit's default one million iterations).
FWIW, I don't know where you get the impression that methods would be faster - it's quite the contrary.
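For anyone repeating the measurement on Python 3, here is a rough sketch of the same comparison. It assumes str(data, 'utf-8') as the closest stand-in for unicode(s, 'utf-8'), and the bench helper and iteration count are my own choices, not part of the session above. Timings depend on the interpreter version and machine, so the Python 2 result above may not carry over.

```python
from timeit import timeit

# Assumption: UTF-8 bytes for the same three-character sample used above.
data = ("\u00e9" * 3).encode("utf-8")

# Both spellings should produce the same text object.
via_method = data.decode("utf-8")
via_constructor = str(data, "utf-8")

def bench(stmt, number=200000):
    """Hypothetical helper: time one statement with `data` in scope."""
    return timeit(stmt, globals={"data": data}, number=number)

t_method = bench("data.decode('utf-8')")
t_constructor = bench("str(data, 'utf-8')")

print("decode(): %.4f s" % t_method)
print("str():    %.4f s" % t_constructor)
```

Run it a few times before drawing conclusions; a single timeit run is noisy.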
I'd prefer 'something'.decode(...) since the unicode type is no longer there in Python 3.0, while text = b'binarydata'.decode(encoding) is still valid.
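A minimal sketch of that portability point, assuming UTF-8 input. The to_text helper and the sample bytes are hypothetical names for illustration only. The .decode() call works on both Python 2 and Python 3, while the check at the end shows whether the unicode builtin exists on the interpreter running it.

```python
def to_text(data, encoding="utf-8"):
    """Hypothetical helper: decode a byte string to text via .decode()."""
    return data.decode(encoding)

# Sample UTF-8 bytes (an assumption for illustration).
text = to_text(b"caf\xc3\xa9")

# The name `unicode` is a builtin on Python 2 but is not defined on Python 3.
try:
    unicode
    has_unicode_builtin = True
except NameError:
    has_unicode_builtin = False

print(repr(text))
print("unicode builtin available:", has_unicode_builtin)
```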