Now how do I do lookup if 'column1'
has 'value11'
any(arow['column1'] == 'value11' for arow in table.iteritems())
Is this method of forming tables
wrong?
No, it's just very "exposed", perhaps too much -- it could usefully be encapsulated in a class which exposes the methods you need, then the issue of how best to implement them does not affect all the rest of your application.
Is there a better way to implement
such tables with easier lookups?
Once you have designed a class whose interface you'd like to use, you can experiment with very different implementation approaches and benchmark them on a workload that's representative of your usage pattern, so you can find out what's best for you (assuming table manipulation and lookup are a big part of your application's runtime, of course -- to find out, profile your app).
I had similar but not identical needs in a large internal app I maintain at work, except that the row indices are integer (only the column names are strings), the column order is important, and the workload is more about "editing" the table (adding, removing, reordering rows or columns, renaming columns, etc). I started with a table exposing the functionality I needed, with the simplest rough-and-ready implementation internally (a list of dicts, plus a list of column names for the column ordering); and by now I have evolved it (independently of the actual "application-level" parts, but based on profiling and benchmarking thereof) to completely different implementations (currently based on numpy
).
I think you should proceed along similar lines: "clothe" your current implementation into a nice "interface" with all the methods you need, profile your app -- unless this table object is a performance bottleneck, you're done; if it is a bottleneck, you can optimize the implementation (experiment, measure, repeat;-) without disturbing any of the rest of your application.
Inheriting from dict
is not a good idea because you probably don't want to expose all of dict
's rich functionality; plus, what you're doing is, roughly, an inefficient implementation of collections.defaultdict(dict)
. So, encapsulate the latter:
import collections
class Table(object):
def __init__(self):
self.d = collections.defaultdict(dict)
def add(self, row, col, val):
self.d[row][col] = val
def get(self, row, col, default=None):
return self.d[row].get(col, default)
def inrow(self, row, col):
return col in self.d[row]
def incol(self, col, val):
return any(x[col]==val for x in self.d.iteritems())
etc, etc -- write all the methods your app needs, with useful, short names, then maybe see if you can alias some of them as special methods if they're often used that way, e.g maybe (assuming Python 2.* -- requires slightly different syntax in 3.*):
def __setitem__(self, (row, col), val):
self.add(row, col, val)
and so forth. Once you have the code working, then comes the right time for profiling, benchmarking, and -- just perhaps -- internal optimization of the implementation.