Hi All,

I have a data structure like this:

"ID  NAME  BIRTH     AGE    SEX"
=================================
1   Joe    01011980  30     M
2   Rose   12111986  24     F
3   Tom    31121965  35     M
4   Joe    15091990  20     M  

I want to use Python + sqlite to store and query this data in an easy way. I am trying to design a dict-like object to store and retrieve the information. The database should also be easy to share with other applications (to them it should be just a plain database table, so pickle and ySerial-like objects are not a good fit).

For example:

d = mysqlitedict.open('student_table')  
 d['1'] = ["Joe","01011980","30","M"]    
 d['2'] = ["Rose","12111986","24","F"]

This seems reasonable, because I can use __setitem__() to handle it, with "ID" as the key and the remaining fields as the value of the dict-like object.

The problem arises if I want to use another field as the key, "NAME" for example:

 d['Joe'] = ["1","01011980","30","M"] 

That is a problem, because a dict-like object should have key/value pair semantics: "ID" is already the key, so "NAME" cannot take over as the key here.

So my question is: can I design my class so that I can do something like this?

 d[key="NAME", "Joe"] = ["1","01011980","30","M"] 
 d[key="ID",'1'] = ["Joe","01011980","30","M"]  

 d.update(key = "ID", {'1':["Joe","01011980","30","M"]})

>>>d[key="NAME", 'Joe']
["1","Joe","01011980","30","M"]
["4","Joe","15091990","20","M"]

>>>d.has_key(key="NAME", 'Joe')
True

I would appreciate any reply!

KC

A: 

sqlite is a SQL database and works by far best when used as such (wrapped in SQLAlchemy or whatever if you really insist;-).

Syntax such as d[key="NAME", 'Joe'] is simply illegal Python, no matter how much wrapping and huffing and puffing you may do. A simple class wrapper around the DB connection is easy, but it will never give you that syntax -- something like d.fetch('Joe', key='Name') is reasonably easy to achieve, but indexing has very different syntax from function calls, and even in the latter named arguments must come after positional ones.
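As a rough illustration of the d.fetch('Joe', key='Name') style mentioned above, here is a minimal sketch written for Python 3. The class name StudentTable, the table name students and the COLUMNS whitelist are assumptions made up for this example, not part of the question or of any library.

```python
import sqlite3


class StudentTable(object):
    """Hypothetical wrapper; class, table and column names are illustrative."""

    COLUMNS = ('ID', 'NAME', 'BIRTH', 'AGE', 'SEX')

    def __init__(self, conn):
        self.conn = conn

    def fetch(self, value, key='ID'):
        # Column names cannot be bound as SQL parameters, so check the
        # name against a whitelist before interpolating it into the query.
        if key not in self.COLUMNS:
            raise KeyError(key)
        sql = 'SELECT * FROM students WHERE {0} = ?'.format(key)
        return [list(row) for row in self.conn.execute(sql, (value,))]


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE students (ID, NAME, BIRTH, AGE, SEX)')
conn.executemany('INSERT INTO students VALUES (?, ?, ?, ?, ?)', [
    ('1', 'Joe', '01011980', '30', 'M'),
    ('2', 'Rose', '12111986', '24', 'F'),
    ('4', 'Joe', '15091990', '20', 'M'),
])
table = StudentTable(conn)

# The positional argument comes first, the named argument after it.
joes = table.fetch('Joe', key='NAME')
rose = table.fetch('2')
```

Here joes holds both rows named Joe and rose holds the single row with ID 2. The call is an ordinary method call, so the keyword argument is legal as long as it follows the positional one.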

If you're willing to renounce your ambitious syntax dreams in favor of sensible Python syntax, and need help designing a class to implement the latter, feel free to ask, of course (I'm off to bed pretty soon, but I'm sure other, later-sleepers will be eager to help;-).

Edit: given the OP's clarifications (in a comment), it looks like a set_key method is acceptable to maintain Python-acceptable syntax (though the semantics of course will still be a tad off, since the OP wants a "dict-like" object which may have non-unique keys -- no such thing in Python, really... but, we can approximate it a bit, at least).

So, here's a very first sketch (requires Python 2.6 or better -- just because I've used collections.MutableMapping to get other dict-like methods and .format to format strings; if you're stuck in 2.5, %-formatting of strings and UserDict.DictMixin will work instead):

import collections
import sqlite3

class SqliteDict(collections.MutableMapping):
  @classmethod
  def create(cls, path, columns):
    conn = sqlite3.connect(path)
    conn.execute('DROP TABLE IF EXISTS SqliteDict')
    conn.execute('CREATE TABLE SqliteDict ({0})'.format(','.join(columns.split())))
    conn.commit()
    return cls(conn)

  @classmethod
  def open(cls, path):
    conn = sqlite3.connect(path)
    return cls(conn)

  def __init__(self, conn):
    # looks like for some weird reason you want str, not unicode, when feasible, so...:
    conn.text_factory = sqlite3.OptimizedUnicode
    c = conn.cursor()
    c.execute('SELECT * FROM SqliteDict LIMIT 0')
    self.cols = [x[0] for x in c.description]
    self.conn = conn
    # start with a keyname (==column name) of `ID`
    self.set_key('ID')

  def set_key(self, key):
    self.i = self.cols.index(key)
    self.kn = key

  def __len__(self):
    c = self.conn.cursor()
    c.execute('SELECT COUNT(*) FROM SqliteDict')
    return c.fetchone()[0]

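  def __iter__(self):
    # a mapping's iterator yields its keys, i.e. the current key column's values
    c = self.conn.cursor()
    c.execute('SELECT * FROM SqliteDict')
    while True:
      result = c.fetchone()
      if result is None: break
      # rows are tuples (no .pop), and it takes `yield`, not `return`, to iterate
      yield result[self.i]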

  def __getitem__(self, k):
    c = self.conn.cursor()
    # print 'doing:', 'SELECT * FROM SqliteDict WHERE {0}=?'.format(self.kn)
    # print ' with:', repr(k)
    c.execute('SELECT * FROM SqliteDict WHERE {0}=?'.format(self.kn), (k,))
    result = [list(r) for r in c.fetchall()]
    # print ' resu:', repr(result)
    for r in result: del r[self.i]
    return result

  def __contains__(self, k):
    c = self.conn.cursor()
    c.execute('SELECT * FROM SqliteDict WHERE {0}=?'.format(self.kn), (k,))
    return c.fetchone() is not None

  def __delitem__(self, k):
    c = self.conn.cursor()
    c.execute('DELETE FROM SqliteDict WHERE {0}=?'.format(self.kn), (k,))
    self.conn.commit()

  def __setitem__(self, k, v):
    r = list(v)
    r.insert(self.i, k)
    if len(r) != len(self.cols):
      raise ValueError('len({0}) is {1}, must be {2} instead'.format(r, len(r), len(self.cols)))
    c = self.conn.cursor()
    # print 'doing:', 'REPLACE INTO SqliteDict VALUES({0})'.format(','.join(['?']*len(r)))
    # print ' with:', r
    c.execute('REPLACE INTO SqliteDict VALUES({0})'.format(','.join(['?']*len(r))), r)
    self.conn.commit()

  def close(self):
    self.conn.close()


def main():
  d = SqliteDict.create('student_table', 'ID NAME BIRTH AGE SEX')
  d['1'] = ["Joe", "01011980", "30", "M"]    
  d['2'] = ["Rose", "12111986", "24", "F"]
  print len(d), 'items in table created.'
  print d['2']
  print d['1']
  d.close()

  d = SqliteDict.open('student_table')
  d.set_key('NAME')
  print len(d), 'items in table opened.'
  print d['Joe']


if __name__ == '__main__':
  main()

The class is not meant to be instantiated directly (though it's OK to do so by passing an open sqlite3 connection to a DB with an appropriate SqliteDict table) but through the two class methods create (to make a new DB or wipe out an existing one) and open, which seems to match the OP's desires better than the alternative (have __init__ take a DB file path and an option string describing how to open it, just like modules such as gdbm take -- 'r' to open read-only, 'c' to create or wipe out, 'w' to open read-write -- easy to adjust of course). Among the columns passed (as a whitespace-separated string) to create, there must be one named ID (I haven't given much care to raising "the right" errors for any of the many, many user errors that can occur on building and using instances of this class; errors will occur on all incorrect usage, but not necessarily ones obvious to the user).
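For the gdbm-style alternative mentioned above, a flag-based opener might look roughly like the following sketch (Python 3, since it relies on the uri=True argument of sqlite3.connect). The helper name open_table is hypothetical, the flag meanings follow the description above, and the read-only branch assumes a POSIX-style path that needs no URI escaping.

```python
import os
import sqlite3
import tempfile


def open_table(path, flag='w', columns=None):
    """Hypothetical helper: return a connection opened according to `flag`."""
    if flag == 'c':
        # create the table, wiping out any existing one
        conn = sqlite3.connect(path)
        conn.execute('DROP TABLE IF EXISTS SqliteDict')
        conn.execute('CREATE TABLE SqliteDict ({0})'.format(
            ','.join(columns.split())))
        conn.commit()
        return conn
    if flag == 'w':
        # read-write; note that sqlite3 creates the file if it is missing
        return sqlite3.connect(path)
    if flag == 'r':
        # read-only via an SQLite URI (assumes `path` needs no escaping)
        return sqlite3.connect('file:{0}?mode=ro'.format(path), uri=True)
    raise ValueError('flag must be r, w or c, not {0!r}'.format(flag))


path = os.path.join(tempfile.mkdtemp(), 'student_table')
conn = open_table(path, 'c', 'ID NAME BIRTH AGE SEX')
conn.execute('INSERT INTO SqliteDict VALUES (?, ?, ?, ?, ?)',
             ('1', 'Joe', '01011980', '30', 'M'))
conn.commit()
conn.close()

reader = open_table(path, 'r')
rows = reader.execute('SELECT * FROM SqliteDict').fetchall()
try:
    reader.execute('DELETE FROM SqliteDict')
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite refuses writes on a connection opened with mode=ro
    write_blocked = True
reader.close()
```

The connection returned by such a helper could then be handed to the class constructor, which already accepts an open connection.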

Once an instance is opened (or created), it behaves as closely to a dict as possible, except that all values set must be lists of exactly the right length, while the values returned are lists of lists (due to the weird "non-unique key" issue). For example, the above code, when run, prints

2 items in table created.
[['Rose', '12111986', '24', 'F']]
[['Joe', '01011980', '30', 'M']]
2 items in table opened.
[['1', '01011980', '30', 'M']]

The "Pythonically absurd" behavior is that d[x] = d[x] will fail -- because the right hand side is a list e.g. with a single item (which is a list of the column values) while the item assignment absolutely requires a list with e.g. four items (the column values). This absurdity is in the OP's requested semantics, and could be altered only by drastically changing such absurd required semantics again (e.g., forcing item assignment to have a list of lists on the RHS, and using executemany in lieu of plain execute).
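A minimal sketch of that alternative, with a list of lists on the right-hand side and executemany, is shown below as two standalone Python 3 functions rather than as methods of the class. The names get_rows and set_rows are made up for this example.

```python
import sqlite3


def get_rows(conn, key_name, key_index, key):
    """Return all rows matching `key`, each with the key column removed."""
    sql = 'SELECT * FROM SqliteDict WHERE {0}=?'.format(key_name)
    rows = [list(r) for r in conn.execute(sql, (key,))]
    for r in rows:
        del r[key_index]
    return rows


def set_rows(conn, key_index, key, rows):
    """Store every row in `rows`, re-inserting `key` at `key_index`."""
    full = []
    for row in rows:
        r = list(row)
        r.insert(key_index, key)
        full.append(r)
    if not full:
        return
    marks = ','.join(['?'] * len(full[0]))
    conn.executemany('REPLACE INTO SqliteDict VALUES({0})'.format(marks), full)
    conn.commit()


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE SqliteDict (ID,NAME,BIRTH,AGE,SEX)')
# NAME is column 1, so it plays the role of the current key
set_rows(conn, 1, 'Joe', [['1', '01011980', '30', 'M'],
                          ['4', '15091990', '20', 'M']])
before = get_rows(conn, 'NAME', 1, 'Joe')
set_rows(conn, 1, 'Joe', before)   # the analogue of d[x] = d[x]
after = get_rows(conn, 'NAME', 1, 'Joe')
```

With this shape the round trip no longer raises, but because the table has no uniqueness constraint, after ends up with four rows where before had two: the assignment adds entries instead of replacing them.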

Non-uniqueness of keys also makes it impossible to guess if d[x] = v, for a key x which corresponds to some number n of table entries, is meant to replace one (and if so, which one?!) or all of those entries, or add another new entry instead. In the code above I've taken the "add another entry" interpretation, but with a SQL statement REPLACE that, should the CREATE TABLE be changed to specify some uniqueness constraints, will change some semantics from "add entry" to "replace entries" if and when uniqueness constraints would otherwise be violated.
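The effect of a uniqueness constraint on REPLACE can be seen in a small Python 3 experiment. The three-column table and the helper name count_after_two_replaces are simplifications invented for this illustration.

```python
import sqlite3


def count_after_two_replaces(create_sql):
    """Run two REPLACEs with the same ID and return the resulting row count."""
    conn = sqlite3.connect(':memory:')
    conn.execute(create_sql)
    for birth in ('01011980', '15091990'):
        conn.execute('REPLACE INTO SqliteDict VALUES (?, ?, ?)',
                     ('1', 'Joe', birth))
    n = conn.execute('SELECT COUNT(*) FROM SqliteDict').fetchone()[0]
    conn.close()
    return n


# No constraint: REPLACE behaves like a plain INSERT ("add another entry").
plain = count_after_two_replaces(
    'CREATE TABLE SqliteDict (ID, NAME, BIRTH)')

# ID declared unique: the second REPLACE removes the conflicting row first.
unique = count_after_two_replaces(
    'CREATE TABLE SqliteDict (ID PRIMARY KEY, NAME, BIRTH)')
```

Here plain is 2 and unique is 1, which is the switch from "add entry" to "replace entries" described above.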

I'll let you all play with this code, and reflect on how huge the semantic gap is between Python mappings and relational tables, that the OP is desperately keen to bridge (apparently as a side effect of his urge to "use nicer syntax" than SQL affords -- I wonder if he has looked at SQLAlchemy as I recommended).

I think, in the end, the important lesson is what I stated right at the start, in the first paragraph of the part of the answer I wrote yesterday, and I self-quote...:

sqlite is a SQL database and works by far best when used as such (wrapped in SQLAlchemy or whatever if you really insist;-).

Alex Martelli
Thanks Alex, I am just trying to fit Python's syntax to keep the story simple. And I think adding a key_field property and a set_key(key="NAME") method to the class may make sense, because in the background the SQL can do something like this: SELECT * FROM table WHERE self.key_field = value.
K. C