views:

212

answers:

3

Hello,

He're an interesting problem that looks for the most Pythonic solution. Suppose I have a list of mappings {'id': id, 'url': url}. Some ids in the list are duplicate, and I want to create a new list, with all the duplicates removed. I came up with the following function:

def unique_mapping(map):
    d = {}
    for res in map:
        d[res['id']] = res['url']

    return [{'id': id, 'url': d[id]} for id in d]

I suppose it's quite efficient. But is there a "more Pythonic" way ? Or perhaps a more efficient way ?

+4  A: 

Your example can be rewritten slightly to construct the first dictionary using a generator expression and to remove necessity of construction of another mappings. Just reuse the old ones:

def unique_mapping(mappings):
    return dict((m['id'], m) for m in mappings).values()

Although this came out as a one-liner, I still think it's quite readable.

There are two things you have to keep in mind when using your original solution and mine:

  • the items will not always be returned in the same order they were originally
  • the later entry will overwrite previous entries with the same id

If you don't mind, then I suggest the solution above. In other case, this function preserves order and treats first-encountered ids with priority:

def unique_mapping(mappings):
    addedIds = set()
    for m in mappings:
        mId = m['id']
        if mId not in addedIds:
            addedIds.add(mId)
            yield m

You might need to call it with list(unique_mappings(mappings)) if you need a list and not a generator.

DzinX
+2  A: 

There are a couple of things you could improve.

  • You're performing two loops, one over the original dict, and then again over the result dict. You could build up your results in one step instead.

  • You could change to use a generator, to avoid constructing the whole list up-front. (Use list(unique_mapping(items)) to convert to a full list if you need it)

  • There's no need to store the value when just checking for duplicates, you can use a set instead.

  • You're recreating a dictionary for each element, rather than returning the original. This may actually be needed (eg. you're modifying them, and don't want to touch the original), but if not, its more efficient to use the dictionaries already created.

Here's an implementation:

def unique_mapping(items):
    s = set()
    for res in items:
        if res['id'] not in s:
            yield res
            s.add(res['id'])
Brian
+1  A: 

I think this can be made simpler still. Dictionaries don't tolerate duplicate keys. Make your list of mappings into a dictionary of mappings. This will remove duplicates.

>>> someListOfDicts= [
    {'url': 'http://a', 'id': 'a'}, 
    {'url': 'http://b', 'id': 'b'}, 
    {'url': 'http://c', 'id': 'a'}]

>>> dict( [(x['id'],x) for x in someListOfDicts ] ).values()

[{'url': 'http://c', 'id': 'a'}, {'url': 'http://b', 'id': 'b'}]
S.Lott