tags:

views: 154

answers: 4

I have a large table with 15k entries in MySQL from which I need to select a few items, many times. For example, I might want all entries with a number field between 1 and 10.

In SQL this would be easy:

SELECT text FROM table WHERE number >= 1 AND number < 10;

If I extract the entire table to a Python list:

PyList = [[text1, number1], [text2, number2], ...]

I could then extract those same text values by running through the entire list:

for item in PyList:
    if item[1] >= 1 and item[1] < 10:
        result.append(item[0])

Now, the performance question: I have to do this for a sliding window. I want all entries between 1 and 10, then 2 and 11, then 3 and 12, and so on up to 14990 and 15000. Which approach is faster for a list this big?

An improvement in Python I'm thinking about is to pre-sort the Python list by number. When the window moves I could remove the lowest values from result and append all elements satisfying the new condition to get the new result. I would also keep track of my index into PyList so I would know where to start from in the next iteration. This would spare me from running through the entire list again.
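
A rough sketch of that idea (assuming PyList is pre-sorted by number and using the same half-open bounds as above; the window width is just illustrative):

start_i = end_i = 0
n = len(PyList)
for low in range(1, 14991):
    high = low + 9                     # windows [1, 10), [2, 11), ...
    # both indices only ever move forward, so the whole sweep
    # touches each element a constant number of times
    while start_i < n and PyList[start_i][1] < low:
        start_i += 1
    while end_i < n and PyList[end_i][1] < high:
        end_i += 1
    result = [item[0] for item in PyList[start_i:end_i]]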

I don't know how to speed up MySQL for successive SELECTs that are very similar, and I don't know how it works internally well enough to understand the performance differences between the two approaches.

How would you implement this?

A: 

Read all the data into Python (from the numbers you mention, it should handily fit in memory) into a variable pylist as you suggest, then prep an auxiliary data structure as follows:

import collections
d = collections.defaultdict(list)
for text, number in pylist:
    d[number].append(text)

Now, to get all texts for numbers between low (included) and high (excluded):

def slidingwindow(d, low, high):
    result = []
    for x in xrange(low, high):
        result.extend(d.get(x, ()))
    return result
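
For example, with made-up data (the names and values here are just placeholders):

pylist = [['a', 1], ['b', 3], ['c', 3], ['d', 12]]
# build d as above, then:
print slidingwindow(d, 1, 10)    # ['a', 'b', 'c']
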
Alex Martelli
A: 

It is difficult to answer without actual performance numbers, but my gut feeling is that it would be better to go for SQL with bind variables (I am not a MySQL expert, but in this case the placeholder syntax should be something like %varname).

The reason is that you would return data only when needed (thus the user interface would become responsive much sooner) and you would rely on a system highly optimized for that kind of operation. On the other hand, retrieving one larger chunk of data is usually faster than retrieving many smaller ones, so the "full Python" approach could have its edge.

However, unless you have serious performance issues, I would still stick with SQL, because it leads to much simpler code that is easier to read and understand.
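
A sketch of what that could look like with the MySQLdb driver (the connection details and the table name mytable are assumptions; MySQLdb uses %s as its placeholder style):

import MySQLdb

conn = MySQLdb.connect(host='localhost', user='me', passwd='secret', db='mydb')
cur = conn.cursor()
for low in range(1, 14991):
    # the statement text stays identical; only the bound values change
    cur.execute("SELECT text FROM mytable WHERE number >= %s AND number < %s",
                (low, low + 9))
    texts = [row[0] for row in cur.fetchall()]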

Roberto Liffredo
+1  A: 

Simply define an index on number in your database; then the database can generate each result set almost instantly. Plus, it can do some calculations on these sets too, if that is your next step.
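
Creating the index is a single statement (the table name here is a placeholder; the question's literal name table is a reserved word in MySQL and would need quoting):

CREATE INDEX number_idx ON mytable (number);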

Databases are actually great at such queries; I'd let the database do its job before trying something else.

THC4k
Thanks, I didn't know about indexes in MySQL, so I learned that bit from your post.
greye
+1  A: 

It's certainly going to be much faster to pull the data into memory than run ~15,000 queries.

My advice is to make sure the SQL query sorts the data by number. If the data is sorted, you can use the very fast lookup functions in the bisect standard library module to find indices.
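
A sketch of that approach (assuming rows holds (number, text) pairs already sorted by number, e.g. fetched with ORDER BY number; the names are illustrative):

import bisect

numbers = [number for number, text in rows]    # parallel list of the sorted keys

def window(rows, numbers, low, high):
    # texts with low <= number < high, located in O(log n) time
    i = bisect.bisect_left(numbers, low)
    j = bisect.bisect_left(numbers, high)
    return [text for number, text in rows[i:j]]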

Triptych