Is there any reason to prefer unicode(somestring, 'utf8') as opposed to somestring.decode('utf8')?

My only thought is that .decode() is a bound method, so Python may be able to resolve it more efficiently, but correct me if I'm wrong.

+4  A: 

It's easy to benchmark it:

>>> from timeit import Timer
>>> ts = Timer("s.decode('utf-8')", "s = 'ééé'")
>>> ts.timeit()
8.9185450077056885
>>> tu = Timer("unicode(s, 'utf-8')", "s = 'ééé'") 
>>> tu.timeit()
2.7656929492950439
>>>

In this run, unicode() is clearly faster, by a factor of roughly three.

FWIW, I don't know where you got the impression that methods would be faster; it's quite the contrary.
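
For anyone repeating the measurement on Python 3, here is a rough sketch of the same comparison. It assumes str(data, 'utf-8') as the stand-in for unicode(s, 'utf-8') and a bytes object in place of the Python 2 str. The helper name compare and the loop count are made up for illustration, and the timings depend on the interpreter version and the machine, so no expected numbers are given.

```python
from timeit import timeit

# Assumption: on Python 3, bytes plays the role of the Python 2 str,
# and str(data, encoding) plays the role of unicode(data, encoding).
data = "ééé".encode("utf-8")


def compare(number=100000):
    """Return (method_seconds, constructor_seconds) for `number` runs each."""
    namespace = {"data": data}
    t_method = timeit("data.decode('utf-8')", globals=namespace, number=number)
    t_ctor = timeit("str(data, 'utf-8')", globals=namespace, number=number)
    return t_method, t_ctor


t_method, t_ctor = compare()
print("data.decode('utf-8'): %.4f s" % t_method)
print("str(data, 'utf-8'):   %.4f s" % t_ctor)
```

Both statements produce the same text, so only the call overhead differs; it is worth running this on the interpreter you actually target rather than relying on the Python 2 numbers above.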

bruno desthuilliers
Fixed the example output.
J.F. Sebastian
Python 2.5: 3.0 vs. 0.9; Python 2.6: 2.6 vs. 0.6. That is, `unicode()` is roughly 3 to 4 times faster than `s.decode()`.
J.F. Sebastian
+5  A: 

I'd prefer 'something'.decode(...) since the unicode type is no longer there in Python 3.0, while text = b'binarydata'.decode(encoding) is still valid.
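
A small sketch of that portability point, meant to be run on Python 3. The variable names are illustrative only, and the byte string is just the UTF-8 encoding of the sample text from the question.

```python
import builtins

raw = b"\xc3\xa9\xc3\xa9\xc3\xa9"  # UTF-8 bytes for three "é" characters

# The .decode() spelling is valid on both Python 2 and Python 3.
text = raw.decode("utf-8")

# On Python 3 the closest counterpart of unicode(raw, "utf-8") is str(raw, "utf-8").
same_text = str(raw, "utf-8")

# The unicode built-in itself is absent on Python 3.
has_unicode_builtin = hasattr(builtins, "unicode")

print(text == same_text, has_unicode_builtin)
```

Code written with .decode() therefore needs no change when moving to Python 3, while every unicode(...) call has to be rewritten.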

dF