views:

290

answers:

8

I've got a huge tuple of strings, which are being returned from a program. An example tuple being returned might look like this:

('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')

I can convert these strings to real tuples (with integers inside), but i am hoping someone knows a nice trick to speed this up. Anything i've come up with feels like i am doing it a relatively "slow" way. And as i have mentioned, these lists can be big, so a fast way would be much appreciated!

Thanks

edit one Alright, so its seeming that eval is a slower method of doing this. But so far i've got 4 methods tested, thanks for any comments and submissions! :)

Also, someone asked on the size of my tuples. It will range anywhere from a few, to hopefully no more than a few million. Not "too" big, but big enough that speed is an important factor. I'm not here to micro-optimize, just learn any new nifty tricks i might not be aware of. Eg, eval() is something i often forget about, even though it doesn't seem to do so well in this case.

edit two I also wanted to note that the string format shouldn't change. So no need to check the format. Also, this is an embedded Python v2.6.2, so anything requiring 2.6 is fine. 3.0 on the other hand, not so much ;)

Looking great guys, again, thanks for all the input :)

edit 3 Yet another note. I noticed i had been returning code that didn't result in a "tuple", this is ok, and sorry if anyone thought the end result "had" to be a tuple. Something of like format is fine.

import timeit

test_tuple = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)', '(7,0)',)

def timeit_a():
    ''''''
    def convert_tup_strings(tup_string):
        first_int, last_int = tup_string[1:-1].split(',')
        return (int(first_int), int(last_int))

    return map(convert_tup_strings, test_tuple)

def timeit_a_1():
    ''''''
    def convert_tup_strings(tup_string):
        return map(int, tup_string[1:-1].split(','))

    return map(convert_tup_strings, test_tuple)

def timeit_b():
    converted = []

    for tup_string in test_tuple:
        first_int, last_int = tup_string[1:-1].split(',')
        converted.append((int(first_int), int(last_int)))

    return converted

def timeit_b_1():
    converted = []

    for tup_string in test_tuple:
        converted.append(map(int, tup_string[1:-1].split(',')))

    return converted

def timeit_c():
    ''''''
    return [eval(t) for t in test_tuple]

def timeit_d():
    ''''''
    return map(eval, test_tuple)

def timeit_e():
    ''''''
    return map(lambda a: tuple(map(int, a[1:-1].split(','))), test_tuple)

print 'Timeit timeit_a: %s' % timeit.timeit(timeit_a)
print 'Timeit timeit_a_1: %s' % timeit.timeit(timeit_a_1)
print 'Timeit timeit_b: %s' % timeit.timeit(timeit_b)
print 'Timeit timeit_b_1: %s' % timeit.timeit(timeit_b_1)
print 'Timeit timeit_c: %s' % timeit.timeit(timeit_c)
print 'Timeit timeit_d: %s' % timeit.timeit(timeit_d)
print 'Timeit timeit_e: %s' % timeit.timeit(timeit_e)

Results in:

Timeit timeit_a: 15.8954099772
Timeit timeit_a_1: 18.5484214589
Timeit timeit_b: 15.3137666465
Timeit timeit_b_1: 17.8405181116
Timeit timeit_c: 91.9587832802
Timeit timeit_d: 89.8858157489
Timeit timeit_e: 20.1564312947
+1  A: 

If you're sure the input is well formed:

tuples = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')
result = [eval(t) for t in tuples]
RexE
+1. In this case you must be sure, that the `tuples` string comes from a trusted source. Otherwise it may be a security threat.
Andrey Vlasovskikh
-1 for not using `map`. Gotta use those builtins. :)
Jed Smith
+1  A: 

You can get a parser up and running pretty quickly with YAPPS.

Azeem.Butt
I'll have to look into YAPPS, never even knew it existed. Thanks for the info :)
Lee Olayvar
+3  A: 
map(eval, tuples)

This won't account for the case where one of the tuples isn't syntactically correct. For that, I'd recommend something like:

def do(tup):
    try: return eval(tup)
    except: return None

map(do, tuples)

Both methods tested for speed:

>>> tuples = ["(1,0)"] * 1000000

>>> # map eval
>>> st = time.time(); parsed = map(eval, tuples); print "%.2f s" % (time.time() - st)
16.02 s

>>> # map do
>>> >>> st = time.time(); parsed = map(do, tuples); print "%.2f s" % (time.time() - st)
18.46 s

For 1,000,000 tuples that's not bad (but isn't great either). The overhead, presumably, is in parsing Python one million times by using eval. However, it is the easiest way to do what you're after.

The answer using list comprehension instead of map is about as slow as my try/except case (interesting in itself):

>>> st = time.time(); parsed = [eval(t) for t in tuples]; print "%.2f s" % (time.time() - st)
18.13 s

All that being said, I'm going to venture premature optimization is at work here -- parsing strings is always slow. How many tuples are you expecting?

Jed Smith
+1  A: 

you can just use yaml or json to parse it into tuples for you.

Autoplectic
He'd have to add `{}` to the string for `json` to work (and limit himself to installing simplejson as an egg or requiring Python 2.6 for his application).
Jed Smith
+1. To speed this up, you can switch to `cjson` that is terribly fast.
Andrey Vlasovskikh
+2  A: 

I'd do string parsing if you know the format. Faster than eval().

>>> tuples = ["(1,0)"] * 1000000
>>> import time
>>> st = time.time(); parsed = map(eval, tuples); print "%.2f s" % (time.time() - st)
32.71 s
>>> def parse(s) :
...   return s[1:-1].split(",")
...
>>> parse("(1,0)")
['1', '0']
>>> st = time.time(); parsed = map(parse, tuples); print "%.2f s" % (time.time() - st)
5.05 s

if you need ints

>>> def parse(s) :
...   return map(int, s[1:-1].split(","))
...
>>> parse("(1,0)")
[1, 0]
>>> st = time.time(); parsed = map(parse, tuples); print "%.2f s" % (time.time() - st)
9.62 s
Paul Tarjan
Hehe, you did exactly what i did, except you sped it up with a map to int. Love it, i'll be adding that to the list :)
Lee Olayvar
+10  A: 

I don't advice you to use eval at all. It is slow and insecure. You can do this:

result = map(lambda a: tuple(map(int, a[1:-1].split(','))), s)

The numbers speak for themselves:

timeit.Timer("map(lambda a: tuple(map(int, a[1:-1].split(','))), s)", "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)

1.8787779808044434

timeit.Timer("map(eval, s)", "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)

11.571426868438721
Nadia Alramli
While your method is definitely faster, your proportions seem off due to timeit's overhead. Your method takes 6.08 seconds against my million-tuple list, and eval takes 16.02 seconds, so the difference in methodology is important here.
Jed Smith
@Jed, I repeated the test several times before posting the numbers. But the difference in the proportions might be because of the difference in the target list length.
Nadia Alramli
I'd suspect the same, and I feel like operating on a big list of length 'n' is better for a question like this instead of repeating the test on a small list 'n' times
Jed Smith
Good point, perhaps i should redo the test makeup?Though i would prefer something middle ground between handling huge lists well, and tiny lists well. The average list size will probably be around a few thousand, though it's hard to say.If something handled million item tuples **far** superior than something else, it may be worth it to put a list size check in there to handle the sizez two different ways.
Lee Olayvar
The trouble with running the test 'n' times on a tiny list is that you're spending a truckload of time initializing the testing framework. The less the testing framework interferes with your numbers, the better.
Jed Smith
True, but shouldn't the overhead of the testing framework be the same for each test? As long as they all bear the same overhead, relatively speaking, the tests should be accurate, no?
Lee Olayvar
+1  A: 

My computer is slower than Nadia's, however this runs faster

>>> timeit.Timer(
    "list((int(a),int(c)) for a,b,c in (x[1:-1].partition(',') for x in s))", 
    "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
3.2250211238861084

than this

>>> timeit.Timer(
    "map(lambda a: tuple(map(int, a[1:-1].split(','))), s)", 
    "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
3.8979239463806152

using a list comprehension is faster still

>>> timeit.Timer(
    "[(int(a),int(c)) for a,b,c in (x[1:-1].partition(',') for x in s)]", 
    "s = ('(-1,0)', '(1,0)', '(2,0)', '(3,0)', '(4,0)', '(5,0)', '(6,0)')").timeit(100000)
2.452484130859375
gnibbler
Your functions (especially the first one) seem to be the fastest of all functions presented here. In case anyone wonders, I'm using python-2.6-9.fc11.x86_64 on an Intel Core 2 Duo E6400.
Cristian Ciupitu
A: 
import ast

list_of_tuples = map(ast.literal_eval, tuple_of_strings)
J.F. Sebastian