ansaurus

Question

What is the fastest (to access) struct-like object in Python?

Answer 1

+1 A:

I would be tempted to either (a) invent some kind of workload specific caching, and offload the storage and retrieval of my data to a memcachedb-like process, to improve scalability rather than performance alone or (b) rewrite as a C extension, with native data storage. An ordered-dictionary type perhaps.

You could start with this: http://www.xs4all.nl/~anthon/Python/ordereddict/

Warren P 2010-04-15 14:52:18

Answer 2

+1 A:

You can make your classes sequence like by adding __iter__, and __getitem__ methods, to make them sequence like (indexable and iterable.)

Would an OrderedDict work? There are several implementations available, and it is included in the Python31 collections module.

mikerobi 2010-04-15 15:21:23

Answer 3

+2 A:

A couple points and ideas:

1) You're timing accessing the same index many times in a row. Your actual program probably uses random or linear access, which will have different behavior. In particular, there will be more CPU cache misses. You might get slightly different results using your actual program.

2) OrderedDictionary is written as a wrapper around dict, ergo it will be slower than dict. That's a non-solution.

3) Have you tried both new-style and old-style classes? (new-style classes inherit from object; old-style classes do not)

4) Have you tried using psyco or Unladen Swallow?

5) Does your inner loop to modify the data or just access it? It might be possible to transform the data into the most efficient possible form before entering the loop, but use the most convenient form elsewhere in the program.

Daniel Stutzbach 2010-04-15 16:59:20

Answer 4

+1 A:

One thing to bear in mind is that namedtuples are optimised for access as tuples. If you change your accessor to be a[2] instead of a.c, you'll see similar performance to the tuples. The reason is that the name accessors are effectively translating into calls to self[idx], so pay both the indexing and the name lookup price.

If your usage pattern is such that access by name is common, but access as tuple isn't, you could write a quick equivalent to namedtuple that does things the opposite way: defers index lookups to access by-name. However, you'll pay the price on the index lookups then. Eg here's a quick implementation:

def makestruct(name, fields):
    fields = fields.split()
    template="""\
class {name}(object):
    __slots__={fields!r}
    def __init__(self, {args}):{self_fields} = {args}
    def __getitem__(self, idx): return getattr(self, fields[idx])
""".format(name=name, fields=fields, 
       args=','.join(fields), 
       self_fields=','.join('self.'+f for f in fields))
    d = {'fields':fields}
    exec template in d
    return d[name]

But the timings are very bad when __getitem__ must be called:

namedtuple.a  :  0.473686933517 
namedtuple[0] :  0.180409193039
struct.a      :  0.180846214294
struct[0]     :  1.32191514969

ie, the same performance as a __slots__ class for attribute access (unsurprisingly - that's what it is), but huge penalties due to the double lookup in index-based accesses. (Noteworthy is that __slots__ doesn't actually help much speed-wise. It saves memory, but the access time is about the same without them.)

One third option would be to duplicate the data, eg. subclass from list and store the values both in the attributes and listdata. However you don't actually get list-equivalent performance. There's a big speed hit just in having subclassed (bringing in checks for pure-python overloads). Thus struct[0] still takes around 0.5s (compared with 0.18 for raw list) in this case, and you do double the memory usage, so this may not be worth it.

Brian 2010-04-15 19:04:20

ansaurus

tags:

views:

answers:

What is the fastest (to access) struct-like object in Python?

related questions