I have a particular case where using compound dictionary keys would make a task easier. I have a working solution, but feel it is inelegant. How would you do it?

context = {
    'database': {
        'port': 9990,
        'users': ['number2', 'dr_evil']
    },
    'admins': ['[email protected]', '[email protected]'],
    'domain.name': 'virtucon.com'
}

def getitem(key, context):
    if hasattr(key, 'upper') and key in context:
        return context[key]

    keys = key if hasattr(key, 'pop') else key.split('.')

    k = keys.pop(0)
    if keys:
        try:
            return getitem(keys, context[k])
        except KeyError, e:
            raise KeyError(key)
    if hasattr(context, 'count'):
        k = int(k)
    return context[k]

if __name__ == "__main__":
    print getitem('database', context)
    print getitem('database.port', context)
    print getitem('database.users.0', context)
    print getitem('admins', context)
    print getitem('domain.name', context)
    try:
        getitem('database.nosuchkey', context)
    except KeyError, e:
        print "Error:", e

Thanks.

+1  A: 

I'm leaving my original solution for posterity:

CONTEXT = {
    "database": {
        "port": 9990,
        "users": ["number2", "dr_evil"]},
    "admins": ["[email protected]", "[email protected]"],
    "domain": {"name": "virtucon.com"}}


def getitem(context, *keys):
    node = context
    for key in keys:
        node = node[key]
    return node


if __name__ == "__main__":
    print getitem(CONTEXT, "database")
    print getitem(CONTEXT, "database", "port")
    print getitem(CONTEXT, "database", "users", 0)
    print getitem(CONTEXT, "admins")
    print getitem(CONTEXT, "domain", "name")
    try:
        getitem(CONTEXT, "database", "nosuchkey")
    except KeyError, e:
        print "Error:", e

But here's a version that implements an approach similar to the getitem interface suggested by doublep. I am specifically not handling dotted keys, but rather forcing the keys into separate nested structures because that seems cleaner to me:

CONTEXT = {
    "database": {
        "port": 9990,
        "users": ["number2", "dr_evil"]},
    "admins": ["[email protected]", "[email protected]"],
    "domain": {"name": "virtucon.com"}}


if __name__ == "__main__":
    print CONTEXT["database"]
    print CONTEXT["database"]["port"]
    print CONTEXT["database"]["users"][0]
    print CONTEXT["admins"]
    print CONTEXT["domain"]["name"]
    try:
        CONTEXT["database"]["nosuchkey"]
    except KeyError, e:
        print "Error:", e

You might notice that what I've really done here is eliminate all ceremony regarding accessing the data structure. The output of this script is the same as the original except that it does not contain a dotted key. This seems like a more natural approach to me but if you really wanted to be able to handle dotted keys, you could do something like this I suppose:

CONTEXT = {
    "database": {
        "port": 9990,
        "users": ["number2", "dr_evil"]},
    "admins": ["[email protected]", "[email protected]"],
    "domain": {"name": "virtucon.com"}}


def getitem(context, dotted_key):
    keys = dotted_key.split(".")
    value = context
    for key in keys:
        try:
            value = value[key]
        except TypeError:
            value = value[int(key)]
    return value


if __name__ == "__main__":
    print getitem(CONTEXT, "database")
    print getitem(CONTEXT, "database.port")
    print getitem(CONTEXT, "database.users.0")
    print getitem(CONTEXT, "admins")
    print getitem(CONTEXT, "domain.name")
    try:
        getitem(CONTEXT, "database.nosuchkey")
    except KeyError, e:
        print "Error:", e

I'm not sure what the advantage of this type of approach would be though.

John
That's even less pythonic than the code in the original question.
Nick Bastin
I want to use a single string key, otherwise I would just use `context["database"]["port"]`.
John Keyes
You could write a subclass of `dict` or a `dict`-like class that would special-handle tuples in `__getitem__`. So that e.g. code like `context['database']` and `context['database', 'port']` would work like the original `getitem()`. Though maybe people would still find that less Pythonic, dunno.
doublep
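A minimal sketch of the `dict` subclass idea from the comment above, written for Python 3. The class name `PathDict` is hypothetical, and treating every tuple as a path is an assumption: it means a tuple can no longer be used as an ordinary key.

```python
class PathDict(dict):
    """Hypothetical dict subclass: a tuple key is walked as a path."""

    def __getitem__(self, key):
        if isinstance(key, tuple):
            node = self
            for part in key:
                # Plain indexing, so nested dicts and lists both work.
                node = node[part]
            return node
        return super().__getitem__(key)


context = PathDict({
    'database': {
        'port': 9990,
        'users': ['number2', 'dr_evil'],
    },
})

port = context['database', 'port']
first_user = context['database', 'users', 0]
```

Only the outermost mapping needs to be a `PathDict`; the nested values stay ordinary dicts and lists.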
@John Keyes: Well, if you want to use strings, why `pop()` at all?
doublep
I'm using the `pop` to change the context for the recursion. I didn't want to subclass, just to be able to support any `dict`.
John Keyes
Nick, can you explain why you think it's less pythonic? While I wasn't really aiming for improving pythonicity (rather I was just trying to achieve the same effect with less code), I don't see what's necessarily so unpythonic about that approach.
John
I want to use dotted keys because they may appear in template files and get token-replaced, e.g. `${database.port}`. Your new code doesn't handle the `domain.name` case.
John Keyes
@John just read your answer and see your comment about the `domain.name` case. You can ignore the last sentence of my last comment. Thanks.
John Keyes
A: 

The following code works. It checks for the special case of a single key having a period in it. Then, it splits the key apart. For each subkey, it tries to fetch the value from a list-like context, then it tries from a dictionary-type context, then it gives up.

This code also shows how to use unittest/nose, which is highly recommended. Test with "nosetests mysource.py".

Lastly, consider using Python's built-in ConfigParser class, which is really useful for this type of configuration task: http://docs.python.org/library/configparser.html
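For reference, a rough sketch of what the ConfigParser route might look like in Python 3, where the module is named `configparser`. The INI text and section names below are assumptions that mirror the question's data. Note that the format only has two levels (section and option) and every value is a string, so the user list has to be split by hand.

```python
import configparser

# Hypothetical INI text mirroring the question's data.
INI_TEXT = """
[database]
port = 9990
users = number2, dr_evil

[domain]
name = virtucon.com
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

port = parser.getint("database", "port")
# There is no list type; split the comma-separated string manually.
users = [u.strip() for u in parser.get("database", "users").split(",")]
domain_name = parser.get("domain", "name")
```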

#!/usr/bin/env python

from nose.tools import eq_, raises

context = {
    'database': {
        'port': 9990,
        'users': ['number2', 'dr_evil']
    },
    'admins': ['[email protected]', '[email protected]'],
    'domain.name': 'virtucon.com'
}

def getitem(key, context):
    if isinstance(context, dict) and context.has_key(key):
        return context[key]
    for key in key.split('.'):
        try:
            context = context[int(key)]
            continue
        except ValueError:
            pass
        if isinstance(context, dict) and context.has_key(key):
            context = context[key]
            continue
        raise KeyError, key
    return context

def test_getitem():
    eq_( getitem('database', context), {'port': 9990, 'users': ['number2', 'dr_evil']} )
    eq_( getitem('database.port', context), 9990 )
    eq_( getitem('database.users.0', context), 'number2' )
    eq_( getitem('admins', context), ['[email protected]', '[email protected]'] )
    eq_( getitem('domain.name', context), 'virtucon.com' )

@raises(KeyError)
def test_getitem_error():
    getitem('database.nosuchkey', context)
shavenwarthog
Thanks for the answer, I don't think this is more elegant code. My code above works fine for the special case you outline too.
John Keyes
I've re-read my comment and I think it could be interpreted as rude. Apologies if you think so. Thanks for the effort you put into your answer, I just think there is a very nice way to solve this and it will smack me in the face when I see it.
John Keyes
A: 

As the key to getitem must be a string (or a list which is passed in the recursive call) I've come up with the following:

def getitem(key, context, first=True):
    if not isinstance(key, basestring) and not isinstance(key, list) and first:
        raise TypeError("Compound key must be a string.")

    if isinstance(key, basestring):
        if key in context:
            return context[key]
        else:
            keys = key.split('.')
    else:
        keys = key

    k = keys.pop(0)
    if keys:
        try:
            return getitem(keys, context[k], False)
        except KeyError, e:
            raise KeyError(key)
    # is it a sequence type
    if hasattr(context, '__getitem__') and not hasattr(context, 'keys'):
        # then the index must be an integer
        k = int(k)
    return context[k]

I am on the fence as to whether this is an improvement.

John Keyes
After a night's sleep I am no longer on the fence: this sucks :)
John Keyes
+2  A: 
>>> def getitem(context, key):
    try:
        return context[key]
    except KeyError:
        pass
    cur, _, rest = key.partition('.')
    rest = int(rest) if rest.isdigit() else rest
    return getitem(context[cur], rest)


>>> getitem(context, 'admins.0')
'[email protected]'
>>> getitem(context, 'database.users.0')
'number2'
>>> getitem(context, 'database.users.1')
'dr_evil'

I've changed the order of the arguments, because that's how most of Python's functions work, cf. getattr, operator.getitem, etc.

SilentGhost
Here it comes: *smack in face*. That is sweet. The only problem I see is when the key is not present, e.g. `database.nosuchkey`. With the above code the error message is "nosuchkey". If I wrap the `return getitem()` in a try/except and `raise KeyError(key)`, the error message is `database.nosuchkey`.
John Keyes
@John: that's true, however traceback also shows the call that caused the exception. And this way you'll know, that `'database'` is present, while its subkey `'nosuchkey'` is not.
SilentGhost
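The two comments above can be reconciled in Python 3, at least as a sketch: wrap the recursive call as John suggests, and chain the inner exception with `raise ... from` so the failing subkey stays visible as well. The sample `context` is a trimmed copy of the question's data; the chaining is an addition, not part of the original answer.

```python
context = {
    'database': {
        'port': 9990,
        'users': ['number2', 'dr_evil'],
    },
    'domain.name': 'virtucon.com',
}


def getitem(context, key):
    try:
        return context[key]
    except KeyError:
        pass
    cur, _, rest = key.partition('.')
    rest = int(rest) if rest.isdigit() else rest
    try:
        return getitem(context[cur], rest)
    except KeyError as exc:
        # Report the full dotted key, keeping the inner error as __cause__.
        raise KeyError(key) from exc


try:
    getitem(context, 'database.nosuchkey')
except KeyError as exc:
    full_key = exc.args[0]
    missing_part = exc.__cause__.args[0]
```

Here `full_key` holds the whole dotted key and `missing_part` the subkey that was actually absent.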
+2  A: 

The accepted solution (as well as my first attempt) failed due to the ambiguity inherent in the specs: '.' may be "just a separator" or a part of the actual key string. Consider, for example, that key may be 'a.b.c.d.e.f' and the actual key to use at the current level is 'a.b.c.d' with 'e.f' left over for the next-most-indented level. Also, the spec is ambiguous in another sense: if more than one dot-joined prefix of 'key' is present, which one to use?

Assume the intention is to try every such feasible prefix: this would possibly produce multiple solutions but we can arbitrarily return the first solution found in this case.

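def getitem(key, context):
    # Stack of (remaining key parts, context reached so far) pairs.
    stk = [(key.split('.'), context)]
    while stk:
        kl, ctx = stk.pop()
        if not kl:
            return ctx
        # A numeric part may be an int index/key at this level.
        if kl[0].isdigit():
            ik = int(kl[0])
            try:
                stk.append((kl[1:], ctx[ik]))
            except LookupError:
                pass
        # Try every dot-joined prefix as a key at this level.
        for i in range(1, len(kl) + 1):
            k = '.'.join(kl[:i])
            if k in ctx:
                stk.append((kl[i:], ctx[k]))
    raise KeyError(key)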

I was originally trying to avoid all try/excepts (as well as recursion and introspection via hasattr, isinstance, etc), but one snuck back in: it's hard to check if an integer is an acceptable index/key into what might be either a dict or a list, without either some introspection to distinguish the cases, or (and it looks simpler here) a try/except, so I went for the latter, simplicity being always near the top of my concerns. Anyway...

I believe variants on this approach (where all the "possible continuation-context pairs" that might still be feasible at any point are kept around) are the only working way to deal with the ambiguities I've explained above (of course, one might choose to collect all possible solutions, arbitrarily pick one of them according to whatever heuristic criterion is desired, or maybe raise if the ambiguity is biting so there are multiple solutions, etc, etc, but these are minor variants of this general idea).
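One of those variants, sketched in Python 3 under some assumptions: the data is JSON-like (dicts with string keys, plus lists), and an ambiguous key should raise rather than silently pick a reading. The names `find_all` and `getitem_strict` are hypothetical, and unlike the function above this sketch uses `isinstance` instead of try/except.

```python
def find_all(key, context):
    """Collect every value reachable by some reading of the dotted key."""
    results = []
    stack = [(key.split('.'), context)]
    while stack:
        parts, ctx = stack.pop()
        if not parts:
            results.append(ctx)
        elif isinstance(ctx, list):
            # A list can only consume one part, as an in-range integer index.
            if parts[0].isdigit() and int(parts[0]) < len(ctx):
                stack.append((parts[1:], ctx[int(parts[0])]))
        elif isinstance(ctx, dict):
            # Every dot-joined prefix is a candidate key at this level.
            for i in range(1, len(parts) + 1):
                candidate = '.'.join(parts[:i])
                if candidate in ctx:
                    stack.append((parts[i:], ctx[candidate]))
    return results


def getitem_strict(key, context):
    results = find_all(key, context)
    if not results:
        raise KeyError(key)
    if len(results) > 1:
        raise ValueError('ambiguous key: %r' % key)
    return results[0]


context = {
    'database': {'port': 9990, 'users': ['number2', 'dr_evil']},
    'domain.name': 'virtucon.com',
}
# 'a.b.c' can be read as 'a.b' then 'c', or as 'a' then 'b.c'.
ambiguous = {'a.b': {'c': 1}, 'a': {'b.c': 2}}
```

With this data, `getitem_strict('a.b.c', ambiguous)` raises `ValueError`, while the unambiguous keys in `context` resolve as before.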

Alex Martelli
This will not work for two of the cases in the original code sample. The first is the case where the key contains a period `domain.name`. The second is the case where a part of the key is an index into a sequence e.g. `database.users.0`. There are no reasons for my complications, other than that was the way I thought of resolving the problem. I knew my solution smelled badly, that's why I asked how would other people solve it.
John Keyes
The `.0` as index into a sequence will work just fine (that's what the nested `try`/`except` is all about). You're right that it won't work if some or all of the dots are meant to be kept: neither will the answer you've selected in slightly more complex cases, e.g. when `key` is `'a.b.c.d'` and the leading `'a.b'` is to be taken as key at the current level, `'c.d'` as the one into the once-nested subdict thus chosen. The notation where `'.'` can stand for two completely different things is also ambiguous -- why not use a compound-key joiner char that can't appear in actual keys?
Alex Martelli
I tried the .0 as index and it didn't work `KeyError: 'database.users.0'`. Of course you are correct regarding the `'a.b.c.d'` case. My only reason for including it was a 'just in case' approach. My config files won't include keys like this.
John Keyes
If I change `except LookupError: ...` to `except TypeError: ...` the `.0` case works.
John Keyes
@John, you're right: my first solution works if a _dict_ has an int key, but not if a _list_ appears (you'd need to catch both exceptions to work in both cases). So anyway, I've edited the A to show a solution that I think **does** work in the "keys like this" cases -- and, "look ma, no try/except";-).
Alex Martelli
@Alex, Alas 'tis me again. I've tried that function and it doesn't work for me. I tried `getitem('database', context)` and others but get a KeyError for each one. I like the no try/except goal :)
John Keyes
@John, fixed this time;-). However one try/except snuck back in, as I mention in my latest edit: checking if an int is an acceptable key/index into either a dict or a list requires either a try/except, or an `isinstance` check, which is more complicated and WAY more restrictive (plus checks on whether the int is too large, which would further complicate the code), so I went that way for simplicity (and similarly slimmed down the A which was getting far too long and complicated after a succession of edits).
Alex Martelli
@Alex nice! That works well. Thanks for all the time you've put into this. Funny how this was a simple piece of bad code (not what I put into the question), which I tried to make better (what I did put into the question), and then it became a bigger problem through me adding complexity (for the sake of it). As the configuration files I am using are JSON based, the keys in each `dict` will always be strings. This eliminates the requirement for `LookupError` of an int key. I also don't plan on using `.` in key names, just thought I should cover the possibility. I've learned a bunch though :)
John Keyes
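Under the constraints in that last comment (JSON data, string keys, no `.` inside key names) the lookup gets much simpler. The following is a Python 3 sketch only; `render` is a hypothetical helper for the `${database.port}` style of template token mentioned earlier in the thread, and the JSON text is made up to match the question's data.

```python
import json
import re
from functools import reduce


def getitem(context, dotted_key):
    def step(node, part):
        # JSON arrays take integer indices; JSON objects take string keys.
        return node[int(part)] if isinstance(node, list) else node[part]
    return reduce(step, dotted_key.split('.'), context)


def render(template, context):
    """Replace each ${dotted.key} token with the value it names."""
    return re.sub(r'\$\{([^}]+)\}',
                  lambda m: str(getitem(context, m.group(1))),
                  template)


context = json.loads(
    '{"database": {"port": 9990, "users": ["number2", "dr_evil"]}}')

line = render('port=${database.port} user=${database.users.1}', context)
```

A missing key still surfaces as a `KeyError` from the plain indexing inside `step`.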