views:

649

answers:

4

Let's say I don't have more than one or two dozen objects with different properties, such as the following:

UID, Name, Value, Color, Type, Location

I want to be able to call up all objects with Location = "Boston", or Type = "Primary". Classic database query type stuff.

Most table solutions (pytables, *sql) are really overkill for such a small set of data. Should I simply iterate over all the objects and create a separate dictionary for each data column (adding values to dictionaries as I add new objects)?

This would create dicts like this:

{'Boston' : [234, 654, 234], 'Chicago' : [324, 765, 342] } - where those 3 digit entries represent things like UID's.

As you can see, querying this would be a bit of a pain.

Any thoughts of an alternative?

+5  A: 

For small relational problems I love using Python's builtin sets.

For the example of location = 'Boston' OR type = 'Primary', if you had this data:

users = {
   1: dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
   2: dict(Name="Mr. Bar", Location="New York", Type="Primary"),
   3: dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
   #...
}

You can do the WHERE ... OR ... query like this:

set1 = set(u for u in users if users[u]['Location'] == 'Boston')
set2 = set(u for u in users if users[u]['Type'] == 'Primary')
result = set1.union(set2)

Or with just one expression:

result = set(u for u in users if users[u]['Location'] == 'Boston'
                              or users[u]['Type'] == 'Primary')

You can also use the functions in itertools to create fairly efficient queries of the data. For example if you want to do something similar to a GROUP BY city:

cities = ('Boston', 'New York', 'Chicago')
cities_users = dict(map(lambda city: (city, ifilter(lambda u: users[u]['Location'] == city, users)), cities))

You could also build indexes manually (build a dict mapping Location to User ID) to speed things up. If this becomes too slow or unwieldy then I would probably switch to sqlite, which is now included in the Python (2.5) standard library.

lost-theory
Thank you, I've never used the builtin sets before. This should at least make it more explicit what's happening in the code.
akoumjian
A: 

How about a list of objects and comprehensions?:

class Row():
    def __init__(self,uid,name,value,color,type,location):
        self.uid = uid
        self.name = name
        #etc...

#snip...

table = [
    Row(0,'Name0','Value0','Color0','Primary','Boston'),
    Row(1,'Name1','Value1','Color1','Primary','Walla Walla'),
    Row(2,'Name2','Value2','Color2','Secondary','Boston'),
    Row(3,'Name3','Value3','Color3','Secondary','Walla Walla'),
    #etc...
    ]

primary_entries = [row for row in table if row.type == 'Primary']
boston_entries = [row for row in table if row.location == 'Boston']

This won't be bad for a dozen objects. If they change a lot, load the data from a file.

Corbin March
Read up on namedtuple - it was specifically intended to simplify constructs such as your Row class.
Paul McGuire
Nice. Thanks Paul.
Corbin March
+1  A: 

If it's really a small amount of data, I'd not bother with an index and probably just write a helper function:

users = [
   dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
   dict(Name="Mr. Bar", Location="New York", Type="Primary"),
   dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
   ]

def search(dictlist, **kwargs):
   def match(d):
      for k,v in kwargs.iteritems():
         try: 
            if d[k] != v: 
               return False
         except KeyError:
            return False
      return True

   return [d for d in dictlist if match(d)]

Which will allow nice looking queries like this:

result = search(users, Type="Secondary")
Triptych
This is also very helpful. The aesthetic of this code keeps more with what people expect with Python. However, I like the little bit of flexibility that sets seem to offer with unions/intersections.
akoumjian
+2  A: 

I do not think sqlite would be "overkill" -- it comes with standard Python since 2.5, so no need to install stuff, and it can make and handle databases in either memory or local disk files. Really, how could it be simpler...? If you want everything in-memory including the initial values, and want to use dicts to express those initial values, for example...:

import sqlite3

db = sqlite3.connect(':memory:')
db.execute('Create table Users (Name, Location, Type)')
db.executemany('Insert into Users values(:Name, :Location, :Type)', [
   dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
   dict(Name="Mr. Bar", Location="New York", Type="Primary"),
   dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
   ])
db.commit()
db.row_factory = sqlite3.Row

and now your in-memory tiny "db" is ready to go. It's no harder to make a DB in a disk file and/or read the initial values from a text file, a CSV, and so forth, of course.

Querying is especially flexible, easy and sweet, e.g., you can mix string insertion and parameter substitution at will...:

def where(w, *a):
  c = db.cursor()
  c.execute('Select * From Users where %s' % w, *a)
  return c.fetchall()

print [r["Name"] for r in where('Type="Secondary"')]

emits [u'Mr. Foo', u'Mr. Quux'], just like the more elegant but equivalent

print [r["Name"] for r in where('Type=?', ["Secondary"])]

and your desired query's just:

print [r["Name"] for r in where('Location="Boston" or Type="Primary"')]

etc. Seriously -- what's not to like?

Alex Martelli
The advantage seems to be even more flexibility in querying, and the option of easily moving a db from memory to and from files, exporting, etc.The disadvantage, as I see it, is that you've had to import an additional module (not a huge deal), and someone else reading the code needs to learn what all those object methods are.It's a fine solution, but not really the simplest imho.
akoumjian
Wanna bet that more people already know about the Python DB API than about sets, genexps, `.union`, etc?-)
Alex Martelli
I just see a lot more elegance in using one dictionary, one set, and a loop than in using a database object, a cursor object, and six methods.
akoumjian
I have a follow up question:If I wanted to store more complex objects, such as Timeseries (http://pytseries.sourceforge.net/), how would that work? It seems like I would need to adapt it to one of the sqlite data types.
akoumjian
Alex Martelli