views:

165

answers:

4

What is the right way to form an in-memory table in Python, with direct lookups for rows and columns?
I thought of using a dict of dicts, this way:

class Table(dict):
    def __getitem__(self, key):
        if key not in self:
            self[key] = {}
        return dict.__getitem__(self, key)
table = Table()
table['row1']['column1'] = 'value11'
table['row1']['column2'] = 'value12'
table['row2']['column1'] = 'value21'
table['row2']['column2'] = 'value22'
>>> table
{'row1': {'column1': 'value11', 'column2': 'value12'}, 'row2': {'column1': 'value21', 'column2': 'value22'}}

I had difficulty looking up values in columns.

>>> 'row1' in table
True
>>> 'value11' in table['row1'].values()
True

Now, how do I look up whether 'column1' has 'value11'?
Is this method of forming tables wrong?
Is there a better way to implement such tables, with easier lookups?
Thanks

+7  A: 

I'd use an in-memory database with SQLite for this. The sqlite module is even in the standard library since Python 2.5, which means this doesn't even add much to your requirements.
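As a rough sketch of what this looks like (the one-triple-per-cell schema below is just one possible layout, not something from the question):

```python
import sqlite3

# An in-memory database; one (row, col, value) triple per cell.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cells (row TEXT, col TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO cells VALUES (?, ?, ?)",
    [("row1", "column1", "value11"),
     ("row1", "column2", "value12"),
     ("row2", "column1", "value21"),
     ("row2", "column2", "value22")])

# Direct column lookup: which rows have 'value11' in 'column1'?
hits = conn.execute(
    "SELECT row FROM cells WHERE col = ? AND value = ?",
    ("column1", "value11")).fetchall()
print(hits)  # [('row1',)]
```

An index on (col, value) would make such lookups fast even for large tables.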

keturn
What if I had thousands of rows? Is sqlite better than using lists or dictionaries?
asdfg
I agree with keturn: sqlite is the way to go, and besides, it does not look very Pythonic to try to build a table in Python.
gath
How is this better than using nested types? The table I use is constantly changing. Aren't too many DB writes costly? Thanks for the answer.
asdfg
+1 for the sqlite :memory: recommendation. I've used it to 'mock' a database while testing a web app and it performs very well.
Noufal Ibrahim
If by "the table I use is constantly changing", you mean you are constantly changing the number of *columns*, then SQLite is perhaps not the right choice, because it doesn't handle that very well. But a few thousand rows isn't terribly many for SQLite, it's built for just this sort of thing, and it's a very widely deployed and tested way of doing it.
keturn
A: 

A nested list should be able to do the job here. I would only use nested dictionaries if elements are spread thin across the grid.

grid = []
for row in xrange(height):
  grid.append([])
  for cell in xrange(width):
    grid[-1].append(value)

Checking rows is easy:

def valueInRow(value, row):
  return value in grid[row]

Checking columns takes a little more work, because the grid is a list of rows, not a list of columns:

def columnIterator(column):
  height = len(grid)
  for row in xrange(height):
    yield grid[row][column]

def valueInColumn(value, column):
  return value in columnIterator(column)
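A quick usage sketch of this approach (the sample grid, marker value, and function names are made up for illustration; `any()` expresses the column scan compactly):

```python
# Build a 2x3 grid of zeros and place one marker value.
height, width = 2, 3
grid = [[0] * width for _ in range(height)]
grid[0][1] = 'x'

def value_in_row(value, row):
    return value in grid[row]

def value_in_column(value, column):
    # Scan one cell per row; any() stops at the first match.
    return any(grid[r][column] == value for r in range(height))

print(value_in_row('x', 0))     # True
print(value_in_column('x', 1))  # True
print(value_in_column('x', 0))  # False
```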
Pieter Witvoet
The rows are not just integers, and the height is not fixed. Also, I can iterate over the row dicts to check values in columns; I just wanted to know if this can be done while avoiding the iteration.
asdfg
A: 

Now how do I do lookup if 'column1' has 'value11'

Are you asking about this?

found = False
for r in table:
    if table[r]['column1'] == 'value11':
        found = True
        break

Is this what you're trying to do?

S.Lott
Yes, but I wanted to do it without iterating over the table, just like 'row1' in table.
asdfg
"iterating over table"? What can that possibly mean?
S.Lott
+7  A: 

Now how do I do lookup if 'column1' has 'value11'

any(arow['column1'] == 'value11' for arow in table.itervalues())
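For example, with the dict-of-dicts from the question (this sketch uses .values() and .get, which work in both Python 2 and 3; .get avoids a KeyError for rows missing the column):

```python
table = {'row1': {'column1': 'value11', 'column2': 'value12'},
         'row2': {'column1': 'value21', 'column2': 'value22'}}

# True as soon as one row matches; any() short-circuits.
found = any(row.get('column1') == 'value11' for row in table.values())
print(found)  # True
```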

Is this method of forming tables wrong?

No, it's just very "exposed", perhaps too much -- it could usefully be encapsulated in a class which exposes the methods you need, then the issue of how best to implement them does not affect all the rest of your application.

Is there a better way to implement such tables with easier lookups?

Once you have designed a class whose interface you'd like to use, you can experiment with very different implementation approaches and benchmark them on a workload that's representative of your usage pattern, so you can find out what's best for you (assuming table manipulation and lookup are a big part of your application's runtime, of course -- to find out, profile your app).
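For instance, a minimal benchmarking sketch with timeit (the workload here, a linear column scan over 1000 rows, is only illustrative; a real benchmark should mirror your app's actual access pattern):

```python
import timeit

# Build a 1000-row dict-of-dicts in the setup, outside the timed code.
setup = "table = {'row%d' % i: {'column1': 'value%d' % i} for i in range(1000)}"
stmt = "any(r['column1'] == 'value999' for r in table.values())"

# Time 1000 full scans of the candidate lookup.
print(timeit.timeit(stmt, setup=setup, number=1000))
```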

I had similar but not identical needs in a large internal app I maintain at work, except that the row indices are integer (only the column names are strings), the column order is important, and the workload is more about "editing" the table (adding, removing, reordering rows or columns, renaming columns, etc). I started with a table exposing the functionality I needed, with the simplest rough-and-ready implementation internally (a list of dicts, plus a list of column names for the column ordering); and by now I have evolved it (independently of the actual "application-level" parts, but based on profiling and benchmarking thereof) to completely different implementations (currently based on numpy).

I think you should proceed along similar lines: "clothe" your current implementation into a nice "interface" with all the methods you need, profile your app -- unless this table object is a performance bottleneck, you're done; if it is a bottleneck, you can optimize the implementation (experiment, measure, repeat;-) without disturbing any of the rest of your application.

Inheriting from dict is not a good idea because you probably don't want to expose all of dict's rich functionality; plus, what you're doing is, roughly, an inefficient implementation of collections.defaultdict(dict). So, encapsulate the latter:

import collections

class Table(object):
    def __init__(self):
        self.d = collections.defaultdict(dict)
    def add(self, row, col, val):
        self.d[row][col] = val
    def get(self, row, col, default=None):
        return self.d[row].get(col, default)
    def inrow(self, row, col):
        return col in self.d[row]
    def incol(self, col, val):
        return any(x.get(col) == val for x in self.d.itervalues())

etc., etc. -- write all the methods your app needs, with useful, short names, then maybe see if you can alias some of them as special methods if they're often used that way, e.g. maybe (assuming Python 2.*; this requires slightly different syntax in 3.*):

    def __setitem__(self, (row, col), val):
        self.add(row, col, val)

and so forth. Once you have the code working, then comes the right time for profiling, benchmarking, and -- just perhaps -- internal optimization of the implementation.
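For reference, a sketch of the Python-3 spelling of that alias (tuple parameters were removed in 3.0, so the pair must be unpacked inside the method body); the class is trimmed to the relevant methods so the snippet is self-contained:

```python
import collections

class Table(object):
    def __init__(self):
        self.d = collections.defaultdict(dict)

    def add(self, row, col, val):
        self.d[row][col] = val

    def __setitem__(self, key, val):
        # Python 3: unpack the (row, col) pair inside the method.
        row, col = key
        self.add(row, col, val)

t = Table()
t['row1', 'column1'] = 'value11'
print(t.d['row1']['column1'])  # value11
```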

Alex Martelli
Thanks for the guidelines
asdfg