ansaurus

Question

Answer 1

+1 A:

I guess filter() is as fast as you can possibly get without having to code the filtering function in C (and in that case, you better code the whole filtering process in C).

Why don't you paste the function you are filtering on? That might lead to easier optimizations.

Read this about optimization in Python. And this about the Python/C API.

Vinko Vrsalovic 2008-10-14 08:39:16

Answer 2

+8 A:

Maybe your lists are too large and do not fit in memory, and you experience thrashing. If the sources are in files, you do not need the whole list in memory all at once. Try using itertools, e.g.:

from itertools import ifilter

def is_important(s):
   return len(s)>10

filtered_list = ifilter(is_important, open('mylist.txt'))

Note that ifilter returns an iterator that is fast and memory efficient.

Generator Tricks is a tutorial by David M. Beazley that teaches some interesting uses for generators.

gimel 2008-10-14 10:18:45

Answer 3

+1 A:

Before doing it in C, you could try numpy. Perhaps you can turn your filtering into number crunching.

Cheery 2008-10-14 10:19:48

Answer 4

+1 A:

Filter will create a new list, so if your original is very big, you could end up using up to twice as much memory. If you only need to process the results iteratively, rather than use it as a real random-access list, you are probably better off using ifilter instead. ie.

for x in itertools.ifilter(condition_func, my_really_big_list):
    do_something_with(x)

Other speed tips are to use a python builtin, rather than a function you write yourself. There's a itertools.ifilterfalse specifically for the case where you would otherwise need to introduce a lambda to negate your check. (eg "ifilter(lambda x: not x.isalpha(), l)" should be written "ifilterfalse(str.isalpha, l)")

Brian 2008-10-14 10:24:46

Answer 5

+1 A:

It may be useful to know that generally a conditional list comprehension is much faster than the corresponding lambda:

>>> import timeit
>>> timeit.Timer('[x for x in xrange(10) if (x**2 % 4) == 1]').timeit()
2.0544309616088867
>>> timeit.f = lambda x: (x**2 % 4) == 1
timeit.Timer('[x for x in xrange(10) if f(x)]').timeit()
>>> 
3.4280929565429688

(Not sure why I needed to put f in the timeit namespace, there. Haven't really used the module much.)

fivebells 2008-10-14 12:49:35

Answer 6

+3 A:

If you can avoid creating the lists in the first place, you'll be happier.

Rather than

aBigList = someListMakingFunction()
filter( lambda x:x>10, aBigList )

You might want to look at your function that makes the list.

def someListMakingGenerator( ):
    for x in some source:
        yield x

Then your filter doesn't involve a giant tract of memory

def myFilter( aGenerator ):
    for x in aGenerator:
        if x > 10: 
            yield x

By using generators, you don't keep much stuff in memory.

S.Lott 2008-10-14 12:50:25

ansaurus

tags:

views:

answers:

fast filter method in python

related questions