Is there any reason to prefer unicode(somestring, 'utf8') as opposed to somestring.decode('utf8')?
My only thought is that .decode() is a bound method, so Python may be able to resolve it more efficiently, but correct me if I'm wrong.
It's easy to benchmark it:
>>> from timeit import Timer
>>> ts = Timer("s.decode('utf-8')", "s = 'ééé'")
>>> ts.timeit()
8.9185450077056885
>>> tu = Timer("unicode(s, 'utf-8')", "s = 'ééé'")
>>> tu.timeit()
2.7656929492950439
>>>
On this run, unicode() is roughly three times faster.
FWIW, I don't know where you got the impression that methods would be faster; it's quite the contrary.
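For anyone wanting to repeat the measurement on Python 3, here is a rough sketch of the same comparison. It assumes that str(s, 'utf-8') is the closest Python 3 counterpart of unicode(s, 'utf-8'); the iteration count and variable names are arbitrary choices for illustration, and timings will differ between machines and interpreter versions, so no particular result is implied.

```python
from timeit import timeit

# Same three-character sample as the benchmark above, written as UTF-8 bytes.
setup = r"s = b'\xc3\xa9' * 3"
n = 100000  # arbitrary iteration count, chosen only to keep the run short

# Method call on the bytes object.
t_method = timeit("s.decode('utf-8')", setup=setup, number=n)
# Constructor call: assumed here to be the Python 3 analogue of unicode(s, 'utf-8').
t_builtin = timeit("str(s, 'utf-8')", setup=setup, number=n)

print("s.decode('utf-8'): %.4f s" % t_method)
print("str(s, 'utf-8'):   %.4f s" % t_builtin)
```

Both spellings produce the same text, so any difference is purely in call overhead.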
I'd prefer 'something'.decode(...) since the unicode type is no longer there in Python 3.0, while text = b'binarydata'.decode(encoding) is still valid.
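A small sketch of what that looks like in practice. The helper name to_text is hypothetical and only for illustration; the point is that the .decode() spelling runs unchanged on Python 2 and Python 3, because bytes is an alias for str on Python 2.6+.

```python
def to_text(data, encoding='utf-8'):
    """Return data as text, decoding it first if it is a byte string.

    to_text is a hypothetical helper name, not part of any library.
    """
    if isinstance(data, bytes):
        # bytes.decode exists on Python 3; on Python 2, bytes is str,
        # so this is the same str.decode call discussed above.
        return data.decode(encoding)
    return data  # already text, leave it alone


raw = b'caf\xc3\xa9'   # UTF-8 encoded bytes
text = to_text(raw)    # text string with four characters
```

By contrast, unicode(raw, 'utf-8') raises NameError on Python 3, since the unicode name no longer exists there.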