Hello,
I am looking for advice on how to process a single list, using two nested loops, in the fastest way, avoiding len(list)^2 comparisons and avoiding duplicate files across groups.
More precisely: I have a list of 'file' objects, each of which has a timestamp. I want to group the files by their timestamp and a time offset. E.g. starting from a file X, I want to create a group with all the files whose timestamp < (timestamp(X) + offset).
For this, I did:
for file_a in list:
    temp_group = group()
    temp_group.add(file_a)
    list.remove(file_a)
    for file_b in list:
        if file_b.timestamp < (file_a.timestamp + offset):
            temp_group.add(file_b)
            list.remove(file_b)
    groups.add(temp_group)
(ok, the code is more complicated, but this is the main idea)
This obviously doesn't work, because I am modifying the list while iterating over it, so elements get skipped :)
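To illustrate the skipping, here is a minimal standalone sketch (plain integers instead of my file objects):

items = [1, 2, 3, 4]
for x in items:
    items.remove(x)     #the rest shifts left, and the iterator skips the element that moved into x's slot
print(items)            #leaves [2, 4], not [] as one might expect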
I thought I had to use a copy of 'list' for the loops, but that doesn't work either:
for file_a in list[:]:
    temp_group = group()
    temp_group.add(file_a)
    list.remove(file_a)
    for file_b in list[:]:
        if file_b.timestamp < (file_a.timestamp + offset):
            temp_group.add(file_b)
            list.remove(file_b)
    groups.add(temp_group)
Well... I know I can do this without removing elements from the list, but then I need to mark the ones that have already been 'processed', and check that mark on every pass, which is a speed penalty.
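For reference, that 'mark as processed' variant would look roughly like this (a sketch with a made-up name, group_by_offset; the repeated 'already seen?' checks are the overhead I mean):

def group_by_offset(files, offset):
    groups = []
    seen = set()                    #indexes that already landed in a group
    for i, file_a in enumerate(files):
        if i in seen:
            continue                #already grouped on an earlier pass
        seen.add(i)
        current = [file_a]
        for j in range(i + 1, len(files)):
            if j in seen:
                continue            #the extra check done on every iteration
            if files[j].timestamp < file_a.timestamp + offset:
                current.append(files[j])
                seen.add(j)
        groups.append(current)
    return groups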
Can anyone give me some advice about how this can be done in the fastest/best way?
Thank you,
Alex
EDIT: I have found another solution, which doesn't exactly answer the question, but it is what I actually needed (my mistake for asking the question that way). I am posting it here because it may help someone looking into related issues with loops over lists in Python.
It may not be the fastest (considering the number of 'passes' through the list), but it was quite simple to understand and implement, and it does not require the list to be sorted.
The reason I avoid sorting is that it takes some extra time, and because after I make the first set of groups, some of them will be 'locked' while the unlocked ones will be 'dissolved' and regrouped using a different time offset. (And when groups are dissolved, the file order may change, so the list would need sorting again.)
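For comparison only, a sort-then-sweep version would look roughly like this (a sketch with a made-up name, group_sorted; not what I use, for the reasons above):

def group_sorted(files, offset):
    ordered = sorted(files, key=lambda f: f.timestamp)   #O(n log n) up front
    groups = []
    for f in ordered:
        #compare against the first file of the current group (the anchor X)
        if groups and f.timestamp < groups[-1][0].timestamp + offset:
            groups[-1].append(f)    #still inside the anchor's window
        else:
            groups.append([f])      #start a new group anchored at f
    return groups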
Anyway, the solution was to control the loop index myself. If I delete a file from the list, I skip incrementing the index (e.g. when I delete index 3, the element that was at index 4 is now at index 3, and I don't want to advance the loop counter, because I would skip over it). If I don't delete any item during that iteration, the index is incremented normally. Here's the code (with some extras; ignore all that 'bucket' stuff):
def regroup(self, time_offset):
    #create the list of files to be used for regrouping
    regroup_files_list = []
    if len(self.groups) == 0:
        #on the first 'regroup' we start from a copy of jpeg_list, so that we do not change it further on
        regroup_files_list = copy.copy(self.jpeg_list)   #requires 'import copy' at module level
    else:
        #dissolve the unlocked groups and collect their files
        i = 0
        while True:
            try:
                group = self.groups[i]
            except IndexError:
                break
            if not group.is_locked:
                regroup_files_list.extend(group)
                self.groups.remove(group)
                continue    #the next group shifted into position i, so don't advance
            else:
                i += 1

    bucket_group = FilesGroup()
    bucket_group.name = c_bucket_group_name

    while len(regroup_files_list) > 0:   #we create groups until there are no files left
        file_a = regroup_files_list[0]
        regroup_files_list.remove(file_a)

        temp_group = FilesGroup()
        temp_group.start_time = file_a._iso_time
        temp_group.add(file_a)

        #manually manage the list index when iterating for file_b, because we're removing files
        i = 0
        while True:
            try:
                file_b = regroup_files_list[i]
            except IndexError:
                break

            #absolute time difference between the two files
            timediff = file_a._iso_time - file_b._iso_time
            if timediff.days < 0 or timediff.seconds < 0:
                timediff = file_b._iso_time - file_a._iso_time

            if timediff < time_offset:
                temp_group.add(file_b)
                regroup_files_list.remove(file_b)
                continue    # :D we reuse the old position, because all elements were shifted to the left
            else:
                i += 1      #the index is increased normally

        #move the files to the bucket group if the temp group is too small,
        #otherwise keep it as a regular group
        if c_bucket_group_enabled and len(temp_group) < c_bucket_group_min_count:
            for f in temp_group[:]:   #iterate over a copy, since we remove while looping
                bucket_group.add(f)
                temp_group.remove(f)
        else:
            self.groups.append(temp_group)

    if len(bucket_group) > 0:
        self.groups.append(bucket_group)
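Stripped of the file/group details, the core pattern is just this (a minimal sketch, written with a length check instead of the try/except above):

items = [1, 5, 2, 8, 3]
i = 0
while i < len(items):
    if items[i] % 2 == 0:   #condition under which the element is removed
        items.pop(i)        #the next element shifts into position i
    else:
        i += 1              #nothing was removed, advance normally
#items is now [1, 5, 3]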