Hello everybody!

I have a python dictionary whose keys are strings and the values are objects.

For instance, an object with one string and one int:

class DictItem:
   def __init__(self, field1, field2):
      self.field1 = str(field1)
      self.field2 = int(field2)

and the dictionary:

myDict = dict()
myDict["sampleKey1"] = DictItem("test1", 1)
myDict["sampleKey2"] = DictItem("test2", 2)
myDict["sampleKey3"] = DictItem("test3", 3)

What is the best/most efficient way to get the dictionary entries whose "field2" field is >= 2?

The idea is creating a "sub-dictionary" (a list would do too) with only the entries in which field2 >= 2. In the example, that would be:

{
    "sampleKey2": {
        "field1" : "test2",
        "field2": 2 
    },
    "sampleKey3": {
        "field1" : "test3",
        "field2": 3 
    }
}

Is there a better way than walking through all the dictionary elements and checking the condition? Maybe using itemgetters and lambda functions?

Thank you!

P.S.: I am using Python 2.4, just in case it's relevant.

+2  A: 
mySubList = [{k: v} for k, v in myDict.iteritems() if v.field2 >= 2]

Documentation:

list-comprehensions, iteritems()

Adam Bernier
except he wanted a dict, not a list...
Ned Batchelder
"a list would do too"
Adam Bernier
+1  A: 

You should keep your various records (that is, the "DictItem" instances) inside a list. A generator or list expression can then filter your desired results with ease.

data = [
   DictItem("test1", 1), 
   DictItem("test2", 2),
   DictItem("test3", 3),
   DictItem("test4", 4),
]

and then:

results = [item for item in data if item.field2 >= 2]

This is, of course, a linear filter. If you need better than linear speed for some of your queries, the container object for the records (in this case a "list") should be a specialized class able to create indexes of the data therein, much like a DBMS does with its table indexes. This can be done by deriving a class from "list" and overriding the "append", "insert", "__getitem__", "__delitem__" and "pop" methods.

If you need this for a high-profile application, I'd suggest taking a look at some of the object-oriented DB systems available for Python, such as ZODB.
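As a rough illustration of the indexing idea, here is a minimal sketch. It does not subclass "list" as described above; it swaps in a simpler standalone class (the name Field2Index and its methods are made up for this example) that keeps the items sorted by field2 with the standard bisect module, so a ">= threshold" query becomes a binary search plus a slice.

```python
import bisect

class DictItem(object):
    def __init__(self, field1, field2):
        self.field1 = str(field1)
        self.field2 = int(field2)

class Field2Index(object):
    """Hypothetical helper: keeps items ordered by field2 for range queries."""

    def __init__(self):
        self._keys = []    # sorted field2 values
        self._items = []   # items, in the same order as self._keys

    def add(self, item):
        # Find the position that keeps both lists sorted by field2.
        pos = bisect.bisect_right(self._keys, item.field2)
        self._keys.insert(pos, item.field2)
        self._items.insert(pos, item)

    def at_least(self, threshold):
        # Binary search for the first key >= threshold, then slice from there.
        pos = bisect.bisect_left(self._keys, threshold)
        return self._items[pos:]

index = Field2Index()
for name, number in [("test3", 3), ("test1", 1), ("test4", 4), ("test2", 2)]:
    index.add(DictItem(name, number))

matches = [item.field1 for item in index.at_least(2)]
```

The trade-off is that every add() pays for a list insertion, so this only helps when queries are much more frequent than updates.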

jsbueno
A: 

The idea is creating a "sub-dictionary" (a list would do too)

If you want a list you could use filter (or itertools.ifilter):

result_list = filter(lambda x: x.field2 >= 2, myDict.values())
ChristopheD
A: 

'Most efficient' is going to depend on how often the dictionary contents change compared to how often you are doing the lookup.

If the dictionary changes often and you do the lookup less often, then the most efficient method will be walking through iteritems and selecting the objects that match the criteria, using the code that Adam Bernier posted.

If the dictionary does not change much and you do lots of lookups then it may be faster to make one or more inverse dictionaries, e.g. one mapping the "field2" values to a list of objects that have that value.
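A minimal sketch of such an inverse dictionary, assuming the DictItem class and myDict from the question. The names by_field2 and keys_with_field2_at_least are made up for this example. It uses setdefault so that it also works on old Python versions.

```python
class DictItem(object):
    def __init__(self, field1, field2):
        self.field1 = str(field1)
        self.field2 = int(field2)

myDict = dict()
myDict["sampleKey1"] = DictItem("test1", 1)
myDict["sampleKey2"] = DictItem("test2", 2)
myDict["sampleKey3"] = DictItem("test3", 3)

# Build the inverse mapping once: field2 value -> list of keys with that value.
by_field2 = {}
for key, item in myDict.items():
    by_field2.setdefault(item.field2, []).append(key)

def keys_with_field2_at_least(threshold):
    # Hypothetical helper: scans the distinct field2 values, not every entry.
    result = []
    for value, keys in by_field2.items():
        if value >= threshold:
            result.extend(keys)
    return result

exact = by_field2.get(2, [])                      # exact-value lookup
selected = sorted(keys_with_field2_at_least(2))   # range lookup
```

An exact-value lookup is a single dictionary access. A range query like ">= 2" still has to look at every distinct field2 value, so the gain depends on how many entries share each value. The inverse dictionary also has to be updated whenever myDict changes.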

Alternatively, if you are going to be doing complex queries, you could put all the data into an in-memory SQLite database and let SQL sort it out, perhaps via an ORM such as SQLAlchemy.
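A minimal sketch of the in-memory database route, without an ORM. It assumes the standard sqlite3 module, which ships with Python 2.5 and later (on 2.4 a separately installed SQLite binding would be needed). The table and column names are made up for this example.

```python
import sqlite3

# ":memory:" creates a database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_key TEXT PRIMARY KEY, field1 TEXT, field2 INTEGER)"
)

rows = [
    ("sampleKey1", "test1", 1),
    ("sampleKey2", "test2", 2),
    ("sampleKey3", "test3", 3),
]
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)

# Let SQL do the filtering; the threshold is passed as a bound parameter.
cursor = conn.execute(
    "SELECT item_key, field1, field2 FROM items "
    "WHERE field2 >= ? ORDER BY item_key",
    (2,),
)
result = cursor.fetchall()
conn.close()
```

For three entries this is overkill, but it starts to pay off once you need several different query conditions or an index on field2.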

Dave Kirby
+1  A: 

To make a dict from your dict,

subdict = dict((k, v) for k, v in myDict.iteritems() if v.field2 >= 2)
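If you also want the plain nested form shown in the question, rather than a dict of DictItem instances, one possible follow-up step is sketched below. It uses items() instead of iteritems() so that it runs on both Python 2 and Python 3; the name as_plain is made up for this example.

```python
class DictItem(object):
    def __init__(self, field1, field2):
        self.field1 = str(field1)
        self.field2 = int(field2)

myDict = dict()
myDict["sampleKey1"] = DictItem("test1", 1)
myDict["sampleKey2"] = DictItem("test2", 2)
myDict["sampleKey3"] = DictItem("test3", 3)

# Same filter as above: keep only the entries whose field2 is >= 2.
subdict = dict((k, v) for k, v in myDict.items() if v.field2 >= 2)

# Optional: convert each DictItem into a plain dict of its two fields.
as_plain = dict(
    (k, {"field1": v.field1, "field2": v.field2}) for k, v in subdict.items()
)
```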
Alex Martelli
I finally chose this solution :) Thank you!!
BorrajaX
@BorrajaX, so why not accept the answer (with the checkmark-shaped icon under the big number)? That's SO's fundamental etiquette: thanks are nice, but acceptance is what matters!
Alex Martelli
@Alex Martelli... Ohhh... I didn't know how this worked! (It was my first post here)
BorrajaX