views: 133

answers: 4
What would be faster? Querying MySQL to see if the piece of information I need is there, or loading a Python dictionary with all the information and then just checking if the ID is there?

If Python is faster, then what's the best way to check if the ID exists?

I'm using Python 2.4.3.

I'm searching for data which is tagged to a square on a board; I'm searching by x and y. There's only one entry per square, the information won't change, and it needs to be recalled several times a second.

Thank you!

Finished

I worked out it was Python. I ran the code below: MySQL did it in 0.0003 seconds, but Python did it in 0.000006 seconds, and MySQL had far, far less to search; the test was run the way the code would be running in real life. Which one had less overhead in terms of CPU and RAM I will never know, but if speed is anything to go by, Python did much better.

And thank you for your answers!

import time

def speedtest():
    # `search` is assumed to be an already-connected MySQLdb cursor;
    # `data` is rebuilt below and kept around for the rest of the program.
    global search
    global data

    # Time one MySQL lookup.
    qb = time.time()
    search.execute("SELECT * FROM `nogo` WHERE `1`='11' AND `2`='13&3'")
    qa = search.fetchall()
    print qa[0]
    qc = time.time()
    print "mysql"
    print qb
    print qc
    print qc - qb

    # Build the nested lookup dictionary: 15 columns of 300 rows each.
    data = {}
    for qa in range(15):
        data[qa] = {}
        for qb in range(300):
            data[qa][str(qb)] = 'nogo'

    # Time one dictionary lookup.
    qw = 5
    qe = '50'
    qb = time.time()
    print data[qw][qe]
    qc = time.time()
    print "dictionary"
    print qb
    print qc
    print qc - qb
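
A note on the measurement: a single time.time() difference around one lookup mostly captures timer resolution plus the cost of the print calls. Below is a sketch of re-timing just the dictionary lookup with the standard-library timeit module, which repeats it many times and averages; the setup rebuilds the same dictionary as above, and timeit.Timer is available on 2.4.

import timeit

# Rebuild the same nested dictionary inside timeit's setup string, then time
# a single data[5]['50'] lookup averaged over a million repetitions.
setup = """
data = {}
for col in range(15):
    data[col] = {}
    for row in range(300):
        data[col][str(row)] = 'nogo'
"""
timer = timeit.Timer("data[5]['50']", setup=setup)
per_lookup = timer.timeit(number=1000000) / 1000000
print "seconds per dictionary lookup:", per_lookup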
+1  A: 

Python ought to be much faster, but this largely depends on your specific scenario.

my_dict.has_key('foobar')

You may want to check out Bloom filters as well.

Deniz Dogan
`has_key` is ancient (discouraged since forever in Python 2, removed in Python 3). Use `'foobar' in my_dict`!
delnan
Hardly ancient just because it is removed in Python 3. It doesn't give me any DeprecationWarning in 2.6 and the syntax makes the intent much clearer in my opinion.
Deniz Dogan
http://docs.python.org/library/stdtypes.html#dict.has_key It is definitely deprecated.
DasIch
I know it says so in the docs (for 2.7), but it's hardly ancient. In fact, the Python 2.4 documentation says "To check whether a single key is in the dictionary, use the has_key() method of the dictionary."
Deniz Dogan
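
As for the Bloom filter pointer in this answer, here is a minimal sketch of the idea; the class and names are made up for illustration, not taken from any library. A Bloom filter hashes each key to several positions in a bit array, so membership tests can report false positives but never false negatives; for the exact one-entry-per-square lookups in the question, a plain dict is the better fit.

class BloomFilter(object):
    def __init__(self, size=8192, num_hashes=4):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive num_hashes pseudo-independent positions from the built-in hash().
        for seed in range(self.num_hashes):
            yield hash((seed, key)) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # True means "possibly present", False means "definitely absent".
        for pos in self._positions(key):
            if not self.bits[pos]:
                return False
        return True

bloom = BloomFilter()
bloom.add((5, 50))
print bloom.might_contain((5, 50))   # True
print bloom.might_contain((9, 99))   # False (barring a false positive)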
+2  A: 

Generally speaking, if you want information from a database, ask the database for what you need. MySQL (and other database engines) are designed to retrieve data as efficiently as possible.

Trying to write your own procedures for retrieving data is trying to outsmart the talented people who have already imbued MySQL with so much data processing power.

This isn't to say it's never appropriate to load data into Python, but you should be sure that a database query isn't the right way to go first.

VoteyDisciple
I'm not trying to outsmart the clever people who make MySQL; I know it's been perfected. But I'm wondering whether the time it takes to pass information to MySQL and back, and the way MySQL works because of what it was built for, means a dictionary would work with less overhead, since I'm not trying to search, compare and index 3,000 records.
kizzie33
Presumably you're leaning toward a dictionary because you don't need ALL the data in a particular table, and you're thinking that transferring what you need to Python and building a lookup table will be fast. But so would creating an `INDEX` on just the data you need in MySQL.
VoteyDisciple
Definitely don't transfer data back and forth from MySQL that you don't need. Select only the fields you want matching the criteria you care about.
VoteyDisciple
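
A sketch of the database-side tuning these comments describe, against the question's `nogo` table; the `idx_square` index name and the `state` column are made up, and `search` is assumed to be the same MySQLdb cursor as in the question.

# Index the two lookup columns once, so the WHERE clause below is a key lookup
# rather than a full table scan.
search.execute("CREATE INDEX idx_square ON `nogo` (`1`, `2`)")

# Ask only for the field you need, for the one square you care about.
search.execute("SELECT `state` FROM `nogo` WHERE `1`=%s AND `2`=%s", ('11', '13&3'))
row = search.fetchone()
if row is not None:
    print row[0]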
A: 

In general I think Python is faster, but it depends on 1) how big the table you want to load is (if it's too big, it will not be efficient with Python), and 2) how many queries you are about to execute (so sometimes it is better to load the table into a dict once and do all your lookups against that).

ScienceFriction
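
A sketch of that load-it-once approach, reusing the question's `nogo` table and cursor; the `state` column and the (x, y) key format are assumptions for illustration.

def load_board(cursor):
    # One query up front: pull every square into a dict keyed by (x, y).
    cursor.execute("SELECT `1`, `2`, `state` FROM `nogo`")
    board = {}
    for x, y, state in cursor.fetchall():
        board[(x, y)] = state
    return board

board = load_board(search)

# From here on, lookups never touch MySQL again.
if ('11', '13&3') in board:
    print board[('11', '13&3')]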
A: 

I can't speak for how fast MySQL would be (I lack the know-how to benchmark it equitably), but Python dicts have pretty much optimal performance too and don't require any I/O (as opposed to database queries). Assuming (x_pos, y_pos) tuples as keys and a 55 x 55 field (you mentioned 3,000 records, and 55^2 is roughly 3000):

>>> the_dict = { (x, y) : None for x in range(55) for y in range (55) }
>>> len(the_dict)
3025
>>> import random
>>> xs = [random.randrange(0,110) for _ in range(55)]
>>> ys = [random.randrange(0,110) for _ in range(55)]
>>> import timeit
>>> total_secs = timeit.timeit("for x,y in zip(xs, ys): (x,y) in the_dict",
...     setup="from __main__ import xs, ys, the_dict", number=100000)
>>> each_secs = total_secs / 100000
>>> each_secs
1.1723998441142385e-05
>>> each_usecs = 1000000 * each_secs
>>> each_usecs
11.723998441142385
>>> usecs_per_lookup = each_usecs / 55   # zip(xs, ys) tests 55 pairs per pass
>>> usecs_per_lookup
0.2131636080207706

Roughly 0.2 microseconds per lookup (zip only pairs up 55 coordinates per pass, so the total divides by 55, not 55*55) - good luck beating that, DBMS of choice ;) But since you use 2.4, YMMV slightly. Admittedly, tuples of ints make very efficient keys (integers that fit into the hash datatype hash to themselves, and tuples just hash and xor their members). Also, this doesn't say anything about how fast loading the data would be (although you can use the pickle module for efficient serialization). But your question reads like you load the data once and then process it a million times.
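
A sketch of that pickle suggestion, assuming Python 2's cPickle and a made-up file name:

import cPickle as pickle  # the fast C implementation on Python 2

# Save the lookup dict once...
out = open('board.pkl', 'wb')
pickle.dump(the_dict, out, pickle.HIGHEST_PROTOCOL)
out.close()

# ...and reload it at startup instead of rebuilding or re-querying it.
inp = open('board.pkl', 'rb')
the_dict = pickle.load(inp)
inp.close()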

delnan
Dang, (way) too late!
delnan