Is there a common idiom for avoiding pointless slice copying for cases like this:

>>> a = bytearray(b'hello')
>>> b = bytearray(b'goodbye, cruel world.')
>>> a.extend(b[14:20])
>>> a
bytearray(b'hello world')

It seems to me that an unnecessary copy happens when the slice b[14:20] is created. Rather than allocating a new slice in memory just to pass it to extend, I want a way to say "use only this range of the existing object".

Some methods will help you out with slice parameters, for example count:

>>> a = bytearray(1000000)       # a million zero bytes
>>> a[0:900000].count(b'\x00')   # expensive temporary slice
900000
>>> a.count(b'\x00', 0, 900000)  # helpful start and end parameters
900000

but many, like extend in my first example, don't have this feature.

I realise that for many applications what I'm talking about would be a micro-optimisation, so before anyone asks - yes, I have profiled my application, and it is something worth worrying about for my case.

I have one 'solution' below, but any better ideas are most welcome.

+2  A: 

Creating a buffer object avoids copying the slice, but for short slices it's more efficient to just make the copy:

>>> a.extend(buffer(b, 14, 6))
>>> a
bytearray(b'hello world')

Here only one copy of the memory is made, but for a slice this short the cost of creating the buffer object outweighs the saving. It should pay off for larger slices, though I'm not sure how large the slice has to be before this method is more efficient overall.
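One way to find that crossover is simply to measure it. A rough timing sketch with timeit, using Python 3's memoryview in place of buffer (the sizes and repeat count here are arbitrary choices, not from the answer):

```python
import timeit

setup = "b = bytearray(1000000)"

# Compare copying a temporary slice vs. extending through a zero-copy view
for size in (10, 1000, 100000):
    t_slice = timeit.timeit(
        "a = bytearray(); a.extend(b[:%d])" % size,
        setup=setup, number=200)
    t_view = timeit.timeit(
        "a = bytearray(); a.extend(memoryview(b)[:%d])" % size,
        setup=setup, number=200)
    print("size %6d: slice %.5fs  memoryview %.5fs" % (size, t_slice, t_view))
```

Both paths produce identical results; only the timings differ, so run it on your own data sizes before deciding.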

Note that for Python 3 (and optionally in Python 2.7) you'd need a memoryview object instead:

>>> a.extend(memoryview(b)[14:20])
Scott Griffiths
buffer is a good choice for objects that support the buffer interface. It's usually not worth special-casing small slices (unless most of your use cases are small), because 50% more than a tiny amount is still a tiny amount.
gnibbler
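Putting the memoryview version together end-to-end in Python 3 (a minimal sketch; a memoryview slice shares b's storage rather than copying it):

```python
a = bytearray(b'hello')
b = bytearray(b'goodbye, cruel world.')

# memoryview(b)[14:20] is a zero-copy view onto bytes 14..19 of b
view = memoryview(b)[14:20]
a.extend(view)   # extend reads through the view; no temporary slice is built
view.release()   # release the view so b can be resized again

print(a)         # bytearray(b'hello world')
```

Note that while an un-released view exists, the bytearray it wraps cannot be resized.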
+2  A: 

itertools has islice. An islice has no count method, but it is useful in other cases where you want to avoid copying a slice - and as you pointed out, count already has start and end parameters for that anyway:

>>> from itertools import islice
>>> a = bytearray(1000000)
>>> sum(1 for x in islice(a,0,900000) if x==0)
900000
>>> len(filter(lambda x: x == 0, islice(a, 0, 900000)))  # iterating a bytearray yields ints
900000
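In Python 3 the len(filter(...)) form no longer works, because filter returns a lazy iterator rather than a list; a sketch of the equivalent (iterating a bytearray yields ints, so the predicate compares against 0):

```python
from itertools import islice

a = bytearray(1000000)  # a million zero bytes

# filter() is lazy in Python 3, so count its hits with sum() instead of len()
n = sum(1 for _ in filter((0).__eq__, islice(a, 0, 900000)))
print(n)  # 900000
```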

>>> a=bytearray(b"hello")
>>> b = bytearray(b'goodbye, cruel world.')
>>> a.extend(islice(b,14,20))
>>> a
bytearray(b'hello world')
gnibbler
`islice` is a nice alternative. I just did some quick tests and it seems about as fast as `buffer` when used with `extend`; however, both are *much* slower than just using a slice, even for half a million elements...
Scott Griffiths