ansaurus

Question

Efficient reordering of coordinate pairs (2-tuples) in a list of pairs in Python

Answer 1

A:

Most recent version:

def zipij7(m=mem, ms=mems):
  cpy = sorted(ms + [m])
  loc = cpy.index(m)

  return zip(ms[:(loc)], repeat(m)) + zip(repeat(m), ms[(loc):])

Benches slightly faster for me than truppo's, slower by 30% than Michal's. (Looking into that now)

I may have found my answer (for now). It seems I forgot about making a list comp version for `zipij()``:

def zipij1(m=mem, ms=mems, f=ij):
  return [f(i, m) for i in ms]

It still relies on my silly ij() helper function, so it doesn't win the award for brevity, certainly, but timings have improved:

# 10000
1.27s
# 50000
6.74s

So it is now my current "winner", and also does not need to generate more than one list, or use a lot of function calls, other than the ij() helper, so I believe it would also be the most efficient.

However, I think this could still be improved... I think that making N ij() function calls (where N is the length of the resultant list) is not needed:

Find at what index mem would fit into mems when ordered
Split mems at that index into two parts
Do zip(part1, repeat(mem))
Add zip(repeat(mem), part2) to it

It'd basically be an improvement on zipij4(), and this avoids N extra function calls, but I am not sure of the speed/memory benefits over the cost of brevity. I will maybe add that version to this answer if I figure it out.

jonwd7 2010-01-28 11:12:40

Exactly how do you measure? You latest solution is slower than the inlined version I suggested. Even with an already sorted list and low "m" it is still slower. With a large "m" and a shuffled list it is 10 times slower.

truppo 2010-01-28 12:29:18

My latest solution? Definitely not.. Both Michal and I are getting twice as fast as the previous best solutions. Check his answer. I am timing differently than he is, but we are still getting the same results. Our latest implementations differ slightly, though. Mine is slightly faster still.

jonwd7 2010-01-28 12:36:10

I've updated my answer now with the exact benchmark I'm running

truppo 2010-01-28 12:45:22

Answer 2

+2 A:

Current version:

(Fastest at the time of posting with Python 2.6.4 on my machine.)

Update 3: Since we're going all out, let's do a binary search -- in a way which doesn't require injecting m into mems:

def binsearch(x, lst):
    low, high = -1, len(lst)
    while low < high:                                                           
        i = (high - low) // 2
        if i > 0:
            i += low
            if lst[i] < x:
                low = i
            else:
                high = i
        else:
            i = high
            high = low
    return i

def zipij(m=mem, ms=mems):
    i = binsearch(m, ms)
    return zip(ms[:i], repeat(m)) + zip(repeat(m), ms[i:])

This runs in 828 µs = 0.828 ms on my machine vs the OP's current solution's 1.14 ms. Input list assumed sorted (and the test case is the usual one, of course).

This binary search implementation returns the index of the first element in the given list which is not smaller than the object being searched for. Thus there's no need to inject m into mems and sort the whole thing (like in the OP's current solution with .index(m)) or walk through the beginning of the list step by step (like I did previously) to find the offset at which it should be divided.

Earlier attempts:

How about this? (Proposed solution next to In [25] below, 2.42 ms to zipij5's 3.13 ms.)

In [24]: timeit zipij5(m = mem, ms = mems)
100 loops, best of 3: 3.13 ms per loop

In [25]: timeit [(i, j) if i < j else (j, i) for (i, j) in zip(mems, repeat(mem))]
100 loops, best of 3: 2.42 ms per loop

In [27]: [(i, j) if i < j else (j, i) for (i, j) in zip(mems, repeat(mem))] == zipij5(m=mem, ms=mems)
Out[27]: True

Update: This appears to be just about exactly as fast as the OP's self-answer. Seems more straighforward, though.

Update 2: An implementation of the OP's proposed simplified solution:

def zipij(m=mem, ms=mems):
    split_at = 0
    for item in ms:
        if item < m:
            split_at += 1
        else:
            break
    return [(item, m) for item in mems[:split_at]] + [(m, item) for item in mems[split_at:]]

In [54]: timeit zipij()
1000 loops, best of 3: 1.15 ms per loop

Also, truppo's solution runs in 1.36 ms on my machine. I guess the above is the fastest so far. Note you need to sort mems before passing them into this function! If you're generating it with range, it is of course already sorted, though.

Michał Marczyk 2010-01-28 11:19:27

Had no idea if/else could be used that way in list comprehensions... *sigh* Thank you, though. I'll be looking into this, but I still want to figure out my proposed method at the bottom of my answer.

jonwd7 2010-01-28 11:33:13

@jonwd7: See my second update. Would that be alright?

Michał Marczyk 2010-01-28 11:44:31

This implementation runs slightly slower than my own, check the top of my answer, I've edited it in. Comment on it if you'd like. I'll see if I can tweak your implementation to best mine.

jonwd7 2010-01-28 12:09:30

Well, I went foraging around my collection of implementations of basic algorithms and came up with this nice binary search. Using it seems to shave off almost 30% of the running time of your zipij7 for a pre-sorted list. Even when having to sort the input list (with a call like `zipij(ms=sorted(mems))`), the timing seems to be lower (although only very slightly). The gains would certainly be more pronounced with a longer input list.

Michał Marczyk 2010-01-28 15:06:50

I myself was wanting to replace my `index()` with a "binary search", but that was hard without knowing the name. I will be comparing this to truppo's, which (for me) is actually benching slower than my own answer, but not by very much. So my current solution is actually truppo's based on brevity. I can make `binsearch()` a common utility function, though, so this new solution is still very brief. :)

jonwd7 2010-01-28 16:11:41

Python allows one to control the behaviour of `<` and similar operators with instances of custom classes through the `__cmp__(self, other)` method. That makes functions such as this `binsearch` quite generic and therefore all the more useful as a library utilities. :-) It's also nice to have a version going the other way (that is, one which finds a lower, rather than an upper bound) -- or roll the two into one function accepting an optional argument to determine the behaviour.

Michał Marczyk 2010-01-28 16:59:24

Answer 3

+2 A:

Why not just inline your ij()-function?

def zipij(m=mem, ms=mems):
  return [(i, m) if i < m else (m, i) for i in ms]

(This runs in 0.64 ms instead of 2.12 ms on my computer)

Some benchmarks:

zipit.py:

from itertools import repeat

mems = range(1, 50000, 2)
mem = 8

def zipij7(m=mem, ms=mems):
  cpy = sorted(ms + [m])
  loc = cpy.index(m)

  return zip(ms[:(loc)], repeat(m)) + zip(repeat(m), ms[(loc):])

def zipinline(m=mem, ms=mems):
  return [(i, m) if i < m else (m, i) for i in ms]

Sorted:

>python -m timeit -s "import zipit" "zipit.zipinline()"
100 loops, best of 3: 4.44 msec per loop

>python -m timeit -s "import zipit" "zipit.zipij7()"
100 loops, best of 3: 4.8 msec per loop

Unsorted:

>python -m timeit -s "import zipit, random; random.shuffle(zipit.mems)" "zipit.zipinline()"
100 loops, best of 3: 4.65 msec per loop

p>python -m timeit -s "import zipit, random; random.shuffle(zipit.mems)" "zipit.zipij7()"
100 loops, best of 3: 17.1 msec per loop

truppo 2010-01-28 11:30:30

OK, I didn't notice the slight difference in your answer, but it still is not running 10x faster than mine and Michal's as you claim... Mine runs in ~0.5s, his runs in ~0.6s, and yours runs in about ~0.7s on mine.

jonwd7 2010-01-28 12:42:31

Also, the list will have already been sorted anyway.. Mine is sorting inside the function and is still faster. I upped the list by 10x, mine runs in ~6s, Michal's in ~7s, and yours in ~8s.

jonwd7 2010-01-28 12:45:27

I will still end up choosing your method, most likely, as I appreciate the brevity and simplicity... But it's still running slower on MY specific machine. I will test on other hardware sometime. :)

jonwd7 2010-01-28 12:52:25

Are you using Python 3.0+ or something (where zip returns an iterator instead of a list)?Have you tried the exact benchmark in this post?

truppo 2010-01-28 12:54:41

Python 2.7a2 currently. Exact bench as yours: Yours = 3.9 msec per loop, mine = 2.94 msec per loop ... Shuffled: Yours = 4.01 msec per loop, mine = 11 msec per loop. But again I'm not arguing the problem with shuffling. But I've already said that sorting of `mems` won't be factoring into my decision as it will be sorted already.

jonwd7 2010-01-28 13:04:11

On Python 2.6.1 on my Mac mine is doing 2.92 msec per loop, and yours is doing 4.57. But indeed, even though the shuffling problem doesn't matter for my use, I can't pass up a one-liner.

jonwd7 2010-01-28 13:17:04

ansaurus

tags:

views:

answers:

Efficient reordering of coordinate pairs (2-tuples) in a list of pairs in Python

Current Best Solutions

Timings

Most recent version:

Current version:

Earlier attempts:

related questions