I have a Python list with a number of entries, which I need to downsample using either:

  • A maximum number of rows. For example, limiting a list of 1234 entries to 1000.
  • A proportion of the original rows. For example, making the list 1/3 its original length.

(I need to support both, but only one is used at a time.)

I believe that for the maximum number of rows I can just calculate the proportion needed and pass that to the proportional downsizer:

def downsample_to_max(self, rows, max_rows):
    # Both functions are methods, so the call has to go through self.
    return self.downsample_to_proportion(rows, max_rows / float(len(rows)))

...so I really only need one downsampling function. Any hints, please?

EDIT: The list contains objects, not numeric values so I do not need to interpolate. Dropping objects is fine.

SOLUTION:

def downsample_to_proportion(self, rows, proportion):

    counter = 0.0
    last_counter = None
    results = []

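    for row in rows:

        # Keep a row each time the integer part of the counter changes.
        if int(counter) != last_counter:
            results.append(row)
            last_counter = int(counter)

        # Advance after the check; advancing first can keep one row too many
        # (e.g. 3 of 4 rows for proportion=0.5).
        counter += proportion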

    return results
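
Because the accumulator above adds a float repeatedly, rounding drift could in principle shift which rows are kept on long lists. A minimal sketch of an alternative that uses integer arithmetic only is shown here. The names downsample_to_count and downsample_to_fraction are hypothetical (they are not from the code above), and the sketch reverses the delegation: the proportional version is expressed in terms of a target count rather than the other way round.

```python
def downsample_to_count(rows, target):
    """Return at most `target` evenly spaced items from `rows` (hypothetical helper)."""
    n = len(rows)
    if target <= 0:
        return []
    if target >= n:
        return list(rows)  # already short enough; nothing is dropped
    # Integer arithmetic only: item i is taken from index floor(i * n / target).
    return [rows[i * n // target] for i in range(target)]


def downsample_to_fraction(rows, proportion):
    """Keep roughly `proportion` of the rows (hypothetical helper)."""
    return downsample_to_count(rows, int(round(len(rows) * proportion)))
```

Under these assumptions, downsample_to_count(rows, 1000) on a list of 1234 entries returns exactly 1000 of them, and a list that is already shorter than the limit comes back unchanged.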

Thanks.

A: 

Keep a counter and increment it by the step size each time (the inverse of the proportion, e.g. 3 to keep one row in three). Floor the counter and yield the element at that index.

Ignacio Vazquez-Abrams
Please can you elaborate a little? Thanks.
Dave
Start with a counter at 0. While the counter is less than the length of the list: yield the element of the list whose index is the value of the counter, floored, then increment the counter.
Ignacio Vazquez-Abrams
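
The counter approach described in this answer could be sketched roughly as follows. This is one reading of it, not the answerer's own code: it assumes the step is 1/proportion, and the name downsample_by_step is made up for illustration.

```python
import math


def downsample_by_step(rows, proportion):
    """Yield evenly spaced items from `rows` (hypothetical name)."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    step = 1.0 / proportion  # assumption: the step is the inverse of the proportion
    counter = 0.0
    while counter < len(rows):
        # Floor the counter to get a valid list index, then advance it.
        yield rows[int(math.floor(counter))]
        counter += step
```

Since this is a generator, wrap the call in list(...) to get a list back. Unlike an integer stride, the float counter can follow fractional steps such as 1.5.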
+1  A: 

You can use islice from itertools:

from itertools import islice

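def downsample_to_proportion(rows, proportion=1):
    # islice needs an integer step >= 1, so any proportion above 0.5 keeps every row.
    step = max(1, int(1.0 / proportion))
    return list(islice(rows, 0, len(rows), step))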

Usage:

x = range(1,10)
print(downsample_to_proportion(x, 0.3))
# [1, 4, 7]
tzaman
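
For a plain list, extended slicing gives the same result as the islice call with less machinery. The sketch below assumes rows is a sequence that supports slicing; downsample_by_slice is a hypothetical name. It also shows the limit of an integer step: a proportion such as 0.75 cannot be expressed and every row is kept.

```python
from itertools import islice


def downsample_by_slice(rows, proportion=1.0):
    """Keep every step-th row using slicing (hypothetical helper)."""
    # Same integer step as the islice version; clamp to 1 so it is never 0.
    step = max(1, int(1.0 / proportion))
    return rows[::step]


rows = list(range(1, 10))
step = int(1.0 / 0.3)  # truncates 3.33... to 3

# Slicing and islice pick the same elements for a list.
same = list(islice(rows, 0, len(rows), step)) == rows[::step]

third = downsample_by_slice(rows, 0.3)       # every third row
unchanged = downsample_by_slice(rows, 0.75)  # step truncates to 1, nothing dropped
```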