ansaurus

Question

unicode() vs. str.decode() for a utf8 encoded byte string (python 2.x)

Answer 1

+4 A:

It's easy to benchmark it:

>>> from timeit import Timer
>>> ts = Timer("s.decode('utf-8')", "s = 'ééé'")
>>> ts.timeit()
8.9185450077056885
>>> tu = Timer("unicode(s, 'utf-8')", "s = 'ééé'") 
>>> tu.timeit()
2.7656929492950439
>>>

Obviously, unicode() is faster (about 3 times faster in this run).

FWIW, I don't know where you get the impression that methods would be faster - it's quite the contrary.
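
To repeat the measurement on your own interpreter, here is a minimal sketch. The helper name `bench`, the iteration count, and the fallback to `str()` on Python 3 (where `unicode()` does not exist) are my own choices for illustration, not part of the original benchmark, and the numbers will differ by machine and Python version.

```python
import sys
from timeit import Timer

# Name of the text-type constructor: unicode() on Python 2, str() on Python 3.
# (Assumption for illustration: the original benchmark was Python 2 only.)
ctor = "unicode" if sys.version_info[0] == 2 else "str"

# UTF-8 bytes for 'ééé', written with escapes so the result does not
# depend on the encoding of the source file or terminal.
setup = r"s = b'\xc3\xa9\xc3\xa9\xc3\xa9'"


def bench(stmt, number=100000, repeat=3):
    """Return the best total time in seconds for `number` runs of stmt."""
    return min(Timer(stmt, setup).repeat(repeat=repeat, number=number))


t_method = bench("s.decode('utf-8')")
t_ctor = bench("%s(s, 'utf-8')" % ctor)

print("s.decode('utf-8'): %.4f s" % t_method)
print("%s(s, 'utf-8'): %.4f s" % (ctor, t_ctor))
```

Taking the minimum of several repeats reduces noise from other processes; it does not change which call is being timed.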

bruno desthuilliers 2009-01-13 19:32:49

Fixed the example output.

J.F. Sebastian 2009-01-17 00:10:19

Python 2.5: 3.0 vs. 0.9; Python 2.6: 2.6 vs. 0.6. That is, `unicode()` is about 4 times faster than `s.decode()`.

J.F. Sebastian 2009-01-17 00:12:33

Answer 2

+5 A:

I'd prefer 'something'.decode(...) since the unicode type is no longer there in Python 3.0, while text = b'binarydata'.decode(encoding) is still valid.
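
A small sketch of the portability point, assuming Python 2.6+ or Python 3.3+ (the `b'...'` and `u'...'` literals used here need those versions; the variable names are only illustrative):

```python
# UTF-8 encoded bytes for 'ééé'. The b prefix is accepted by Python 2.6+
# (where it is simply a str) and by Python 3 (where it is bytes).
data = b'\xc3\xa9\xc3\xa9\xc3\xa9'

# The same method call decodes the bytes to text on both major versions:
# the result is a unicode object on Python 2 and a str on Python 3.
text = data.decode('utf-8')

assert len(text) == 3            # three characters, six bytes
assert text == u'\xe9\xe9\xe9'   # U+00E9 is 'é'
```

The equivalent `unicode(data, 'utf-8')` call raises NameError on Python 3, so it would have to be rewritten when porting.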

dF 2009-01-13 19:36:52
