ansaurus

Question

In python, what is more efficient? Modifying lists or strings?

Answer 1

+4 A:

In general, modifying lists is more efficient than modifying strings, because strings are immutable.

Hank Gay 2010-02-12 16:25:19

but you'll have to convert the strings to lists!

SilentGhost 2010-02-12 16:29:18

As always, you need to profile to see which is fastest in a specific case, hence the "In general". At a guess, I'd say getting a list of the characters in a string is a fairly optimized process, but that's just a guess; if you need to *know*, you need to profile.

Hank Gay 2010-02-12 16:35:12

Answer 2

+7 A:

>>> a = "11011"
>>> b = "01001"
>>> import timeit
>>> def strslice():
    return a[:3] + b[3:]

>>> def lstslice():
    return list(a)[:3] + list(b)[3:]
>>> c = list(a)
>>> d = list(b)
>>> def lsts():
    return c[:3] + d[3:]

>>> timeit.timeit(strslice)
0.5103488475836432
>>> timeit.timeit(lstslice)
2.4350100538824613
>>> timeit.timeit(lsts)
1.0648406858527295

SilentGhost 2010-02-12 16:29:01

Answer 3

+5 A:

timeit is a good tool for micro-benchmarking, but it needs to be used with the utmost care when the operations you want to compare may involve in-place alterations -- in this case, you need to include extra operations designed to make needed copies. Then, first time just the "extra" overhead:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b)'
100000 loops, best of 3: 5.01 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b)'
100000 loops, best of 3: 5.06 usec per loop

So making the two brand-new lists we need (to avoid alteration) costs a tad more than 5 microseconds (when focused on small differences, run things at least 2-3 times to eyeball the uncertainty range). After which:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);x=a[:3]+b[3:]'
100000 loops, best of 3: 5.5 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);x=a[:3]+b[3:]'
100000 loops, best of 3: 5.47 usec per loop

string slicing and concatenation in this case can be seen to cost another 410-490 nanoseconds. And:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);la[3:]=lb[3:]'
100000 loops, best of 3: 5.99 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);la[3:]=lb[3:]'
100000 loops, best of 3: 5.99 usec per loop

in-place list splicing can be seen to cost 930-980 nanoseconds. The difference is safely above the noise/uncertainty levels, so you can reliably state that for this use case working with strings is going to take roughly half as much time as working in-place with lists. Of course, it's also crucial to measure a range of use cases that are relevant and representative of your typical bottleneck tasks!

Alex Martelli 2010-02-12 16:40:23

Answer 4

A:

It really depends on actual use cases, and as others have said, profile it, but in general, appending to lists will be better, because it can be done in place, whereas "appending to strings" actually creates a new string that concatenates the old strings. This can rapidly eat up memory. (Which is a different issue from computational efficiency, really).

Edit: If you want computational efficiency with binary values, don't use strings or lists. Use integers and bitwise operations. With recent versions of python, you can use binary representations when you need them:

>>> bin(42)
'0b101010'
>>> 0b101010
42
>>> int('101010')
101010
>>> int('101010', 2)
42
>>> int('0b101010')
...
ValueError: invalid literal for int() with base 10: '0b101010'
>>> int('0b101010', 2)
42

Edit 2:

def strslice(a, b):
    return a[:3] + b[3:]

might be better written something like:

def binspice(a, b):
    mask = 0b11100
    return (a & mask) + (b & ~mask)

>>> a = 0b11011
>>> b = 0b1001
>>> bin(binsplice(a, b))
'0b11001
>>>

It might need to be modified if your binary numbers are different sizes.

jcdyer 2010-02-12 18:28:08

ansaurus

tags:

views:

answers:

In python, what is more efficient? Modifying lists or strings?

related questions