Consider this short Python list of dictionaries (in each dictionary, 'src' is a string and 'widget' is a Widget object):

raw_results =  
     [{'src': 'tag', 'widget': <Widget: to complete a form today>},   # dupe 1a
      {'src': 'tag', 'widget': <Widget: a newspaper>},                # dupe 2a
      {'src': 'zip', 'widget': <Widget: to complete a form today>},   # dupe 1b
      {'src': 'zip', 'widget': <Widget: the new Jack Johnson album>},
      {'src': 'zip', 'widget': <Widget: a newspaper>},                # dupe 2b
      {'src': 'zip', 'widget': <Widget: premium dog food >}]

I want to go through that list and remove the duplicates, which this SO question answered for me:

http://stackoverflow.com/questions/1549509/remove-duplicates-in-a-list-while-keeping-its-order-python

    known_widgets= set()
    processed_results = []

    for x in raw_results:
        widget = x['widget']
        if widget in known_widgets: 
            continue
        else:
            processed_results.append(x)
            known_widgets.add(widget)

However, after I remove a duplicate row (e.g. dupe 1b), I want to change the "src" data of the row that is kept (e.g. dupe 1a): I would like to append the removed duplicate's "src" to the original's. This is what I'd like to end up with:

processed_results =  
    [{'src': 'tag-zip', 'widget': <Widget: to complete a form today>},  # dupe 1a
     {'src': 'tag-zip', 'widget': <Widget: a newspaper>},               # dupe 2a
     {'src': 'zip', 'widget': <Widget: the new Jack Johnson album>},
     {'src': 'zip', 'widget': <Widget: premium dog food >}]

I'm sure this is easy to do, but my head is spinning after too much coffee and many hours circling this problem. I'd love and really appreciate the help of an expert. Thank you!

+2  A: 
def find_widget(widget, L):
    # Return the index of the entry in L that holds this widget.
    for i, v in enumerate(L):
        if v['widget'] == widget:
            return i

known_widgets = set()
processed_results = []

for x in raw_results:
    widget = x['widget']
    if widget in known_widgets:
        # Duplicate: append its src to the entry that was kept.
        processed_results[find_widget(widget, processed_results)]['src'] += '-%s' % x['src']
    else:
        processed_results.append(x)
        known_widgets.add(widget)

This could probably be done better, since every duplicate widget triggers a second scan over processed_results.
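
One possible way to avoid that extra scan, sketched under the assumption that Widget objects are hashable and that duplicates compare equal (the known_widgets set above already relies on this): keep a dict that maps each widget to the entry already stored in processed_results. The Widget class and the merge_sources name below are hypothetical stand-ins for illustration, not the real code.

```python
class Widget(object):
    """Hypothetical stand-in for the real Widget; equality and hash use desc."""
    def __init__(self, desc):
        self.desc = desc
    def __eq__(self, other):
        return isinstance(other, Widget) and self.desc == other.desc
    def __ne__(self, other):
        return not self == other
    def __hash__(self):
        return hash(self.desc)
    def __repr__(self):
        return "<Widget: %s>" % self.desc


def merge_sources(raw_results):
    """Drop duplicate widgets, appending each duplicate's src to the first hit."""
    first_seen = {}          # widget -> entry already in processed_results
    processed_results = []
    for x in raw_results:
        entry = first_seen.get(x['widget'])
        if entry is None:
            entry = dict(x)  # shallow copy so raw_results is left untouched
            first_seen[x['widget']] = entry
            processed_results.append(entry)
        else:
            entry['src'] += '-%s' % x['src']
    return processed_results


raw_results = [
    {'src': 'tag', 'widget': Widget('to complete a form today')},
    {'src': 'tag', 'widget': Widget('a newspaper')},
    {'src': 'zip', 'widget': Widget('to complete a form today')},
    {'src': 'zip', 'widget': Widget('the new Jack Johnson album')},
    {'src': 'zip', 'widget': Widget('a newspaper')},
    {'src': 'zip', 'widget': Widget('premium dog food')},
]

processed_results = merge_sources(raw_results)
for row in processed_results:
    print("%s: %s" % (row['src'], row['widget']))
```

The dict lookup replaces both the set membership test and find_widget, so each row is handled in a single pass and the original order is kept.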

ikanobori
Thanks for the help ikanobori, I appreciate it!
mitchf
If it works could you accept my post as the answer by clicking the V thingy to the left of it?
ikanobori
+1  A: 

Assuming you want to have a list of widgets keyed on the repeated src values, this is what you want:

class Widget(object):
    def __init__(self, desc):
        self.desc = desc
    def __str__(self):
        return "Widget(%s)" % self.desc

raw_results = [
    {'src':'tag-zip', 'widget':Widget('to complete a form today')},
    {'src':'tag-zip', 'widget':Widget('a newspaper')},
    {'src':'zip', 'widget':Widget('the new Jack Johnson album')},
    {'src':'zip', 'widget':Widget('premium dog food')}
]

from collections import defaultdict
known_widgets = defaultdict(list)
for x in raw_results:
    k, v = x['src'], x['widget']
    known_widgets[k].append(v)

for k, v in known_widgets.items():
    print("%s: %s" % (k, ",".join(str(w) for w in v)))

And if you want the duplicate widgets eliminated, do this:

class Widget(object):
    def __init__(self, desc):
        self.desc = desc
    def __str__(self):
        return "Widget(%s)" % self.desc
    def __hash__(self):
        return hash(self.desc)
    def __eq__(self, other):
        return self.desc == other.desc

raw_results = [
    {'src':'tag-zip', 'widget':Widget('to complete a form today')},
    {'src':'tag-zip', 'widget':Widget('a newspaper')},
    {'src':'zip', 'widget':Widget('the new Jack Johnson album')},
    {'src':'zip', 'widget':Widget('premium dog food')},
    {'src':'tag-zip', 'widget':Widget('to complete a form today')},
    {'src':'tag-zip', 'widget':Widget('a newspaper')},
    {'src':'zip', 'widget':Widget('the new Jack Johnson album')},
    {'src':'zip', 'widget':Widget('premium dog food')},
]

from collections import defaultdict
known_widgets = defaultdict(set)
for x in raw_results:
    k, v = x['src'], x['widget']
    known_widgets[k].add(v)

for k, v in known_widgets.items():
    print("%s: %s" % (k, ",".join(str(w) for w in v)))
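
The set only collapses duplicates when two Widget objects with the same description compare equal and hash to the same value. A minimal check of that behaviour, assuming that "same description" is the intended meaning of "duplicate" (PlainWidget is a hypothetical class added only for contrast):

```python
class Widget(object):
    """Widget whose equality and hash are both based on desc."""
    def __init__(self, desc):
        self.desc = desc
    def __hash__(self):
        return hash(self.desc)
    def __eq__(self, other):
        return isinstance(other, Widget) and self.desc == other.desc
    def __ne__(self, other):
        return not self == other


class PlainWidget(object):
    """Same data, but with the default identity-based equality and hash."""
    def __init__(self, desc):
        self.desc = desc


# Two separate objects with the same description in each set.
same = set([Widget('a newspaper'), Widget('a newspaper')])
distinct = set([PlainWidget('a newspaper'), PlainWidget('a newspaper')])

print(len(same))      # 1: equal and same hash, so the set keeps one
print(len(distinct))  # 2: different objects, so the set keeps both
```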
hughdbrown