views:

142

answers:

4

Regardless of ease of use, which is more computationally efficient? Constantly slicing lists and appending to them? Or taking substrings and doing the same?

As an example, let's say I have two binary strings "11011" and "01001". If I represent these as lists, I'll be choosing a random "slice" point. Let's say I get 3. I'll Take the first 3 characters of the first string and the remaining characters of the second string (so I'd have to slice both) and create a new string out of it.

Would this be more efficiently done by cutting the substrings or by representing it as a list ( [1, 1, 0, 1, 1] ) rather than a string?

+4  A: 

In general, modifying lists is more efficient than modifying strings, because strings are immutable.

Hank Gay
but you'll have to convert the strings to lists!
SilentGhost
As always, you need to profile to see which is fastest in a specific case, hence the "In general". At a guess, I'd say getting a list of the characters in a string is a fairly optimized process, but that's just a guess; if you need to *know*, you need to profile.
Hank Gay
+7  A: 
>>> a = "11011"
>>> b = "01001"
>>> import timeit
>>> def strslice():
    return a[:3] + b[3:]

>>> def lstslice():
    return list(a)[:3] + list(b)[3:]
>>> c = list(a)
>>> d = list(b)
>>> def lsts():
    return c[:3] + d[3:]

>>> timeit.timeit(strslice)
0.5103488475836432
>>> timeit.timeit(lstslice)
2.4350100538824613
>>> timeit.timeit(lsts)
1.0648406858527295
SilentGhost
+5  A: 

timeit is a good tool for micro-benchmarking, but it needs to be used with the utmost care when the operations you want to compare may involve in-place alterations -- in this case, you need to include extra operations designed to make needed copies. Then, first time just the "extra" overhead:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b)'
100000 loops, best of 3: 5.01 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b)'
100000 loops, best of 3: 5.06 usec per loop

So making the two brand-new lists we need (to avoid alteration) costs a tad more than 5 microseconds (when focused on small differences, run things at least 2-3 times to eyeball the uncertainty range). After which:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);x=a[:3]+b[3:]'
100000 loops, best of 3: 5.5 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);x=a[:3]+b[3:]'
100000 loops, best of 3: 5.47 usec per loop

string slicing and concatenation in this case can be seen to cost another 410-490 nanoseconds. And:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);la[3:]=lb[3:]'
100000 loops, best of 3: 5.99 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);la[3:]=lb[3:]'
100000 loops, best of 3: 5.99 usec per loop

in-place list splicing can be seen to cost 930-980 nanoseconds. The difference is safely above the noise/uncertainty levels, so you can reliably state that for this use case working with strings is going to take roughly half as much time as working in-place with lists. Of course, it's also crucial to measure a range of use cases that are relevant and representative of your typical bottleneck tasks!

Alex Martelli
A: 

It really depends on actual use cases, and as others have said, profile it, but in general, appending to lists will be better, because it can be done in place, whereas "appending to strings" actually creates a new string that concatenates the old strings. This can rapidly eat up memory. (Which is a different issue from computational efficiency, really).

Edit: If you want computational efficiency with binary values, don't use strings or lists. Use integers and bitwise operations. With recent versions of python, you can use binary representations when you need them:

>>> bin(42)
'0b101010'
>>> 0b101010
42
>>> int('101010')
101010
>>> int('101010', 2)
42
>>> int('0b101010')
...
ValueError: invalid literal for int() with base 10: '0b101010'
>>> int('0b101010', 2)
42

Edit 2:

def strslice(a, b):
    return a[:3] + b[3:]

might be better written something like:

def binspice(a, b):
    mask = 0b11100
    return (a & mask) + (b & ~mask)

>>> a = 0b11011
>>> b = 0b1001
>>> bin(binsplice(a, b))
'0b11001
>>> 

It might need to be modified if your binary numbers are different sizes.

jcdyer