ansaurus

Question

Concatenate multiple values in one record without duplication

Answer 1

A:

Hi mikehjun,

Here is a quick Python script that may suit your needs with minimal tweaking.

import collections

# Python 2 code. Map each key (first column) to the values seen for it (third column).
d = collections.defaultdict(list)

with open("input_file.txt") as f:
    for line in f:
        parsed = line.strip().split()
        k = parsed[0]
        v = parsed[2]
        d[k].append(v)

# Print one line per key, followed by its distinct values in sorted order.
s = " ----- "
for k, v in sorted(d.iteritems()):
    v = list(set(v))  # drop duplicate values
    v.sort()
    print k, s,
    for j in v:
        print j,
    print

Hope this helps

Morlock 2010-03-17 16:51:35

Use `collections.defaultdict` and save a bit of mess. `d=defaultdict(list)` and the last three lines of the loop simplify to `d[k].append( v )`

S.Lott 2010-03-17 17:40:15
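The simplification S.Lott describes can be sketched roughly as follows. The explicit-initialisation version is only a guess at the usual pattern that `defaultdict` replaces, and the `pairs` sample data is made up for illustration. The snippet is written to run on both Python 2.7 and Python 3.

```python
import collections

pairs = [("a", "1"), ("b", "2"), ("a", "3")]  # made-up sample data

# Plain dict: each key has to be initialised before the first append.
plain = {}
for k, v in pairs:
    if k not in plain:
        plain[k] = []
    plain[k].append(v)

# defaultdict(list): a missing key gets an empty list on first access,
# so the body of the loop shrinks to a single append.
grouped = collections.defaultdict(list)
for k, v in pairs:
    grouped[k].append(v)

print(plain == grouped)  # both hold the same keys and lists
```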

Also, use `for k in d: v= items[k]` in the second loop to make it a hair simpler and more pythonic.

S.Lott 2010-03-17 17:41:08

@S.Lott Thank you. I'll implement this in my code! Cheers

Morlock 2010-03-17 17:42:21

Note, when working with files: 1) **Always** use a context manager, that is to say `with open("input_file.txt") as f:` to ensure that the file gets closed no matter what and 2) Never use `for line in f.readlines():`, which is wasteful. Use `for line in f:`. (Similarly, you can loop over a dict's items with `for k, v in d.iteritems()`, using `iteritems` to avoid making a needless list.) Also, `some_string.split(" ")` is typically spelled `some_string.split()`.

Mike Graham 2010-03-17 17:49:15
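To illustrate the last point in Mike Graham's note, here is a small sketch of how `split(" ")` and `split()` differ on a line with repeated spaces. The sample `line` is an assumption, since the question's input format is not shown here. It runs on both Python 2.7 and Python 3.

```python
line = "key1   x  value1\n"  # made-up line with runs of spaces

# split(" ") cuts at every single space, so runs of spaces yield empty strings.
by_space = line.strip().split(" ")

# split() with no argument cuts on any run of whitespace and also ignores
# leading and trailing whitespace, including the newline.
by_whitespace = line.split()

print(by_space)       # ['key1', '', '', 'x', '', 'value1']
print(by_whitespace)  # ['key1', 'x', 'value1']
```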

@Mike Graham Thank you. This is great input. I'll be sure to implement this starting right now in the Python code I am doing this afternoon. Cheers

Morlock 2010-03-17 18:01:30

@S.Lott I'm currently working. If you have time, be our guest :) I'll also get a chance to see how exactly you use `collections.defaultdict`. Cheers

Morlock 2010-03-17 18:13:36

I edited in these things and additionally changed the loop to be over `sorted(d.iteritems())` so that the results come out in the expected order. Note that this makes the solution O(n log n) and that there are plenty of O(n) solutions available.

Mike Graham 2010-03-17 18:22:02

Answer 2

A:

I think Morlock's answer does not satisfy the requirement of dropping duplicates. I would use a defaultdict(set), which will automatically omit dups, instead of defaultdict(list), and thus .add() instead of .append().

Vicki Laidler 2010-03-18 02:06:24
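A minimal sketch of the `defaultdict(set)` approach suggested above. The helper name `group_unique`, its column parameters, and the sample lines are hypothetical; the key in the first column and the value in the third mirror the indices used in Answer 1. It runs on both Python 2.7 and Python 3.

```python
import collections

def group_unique(lines, key_col=0, value_col=2):
    """Map each key to the set of distinct values seen for it."""
    groups = collections.defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) <= max(key_col, value_col):
            continue  # skip blank or too-short lines
        groups[fields[key_col]].add(fields[value_col])  # a set ignores repeats
    return groups

sample = ["a x 1", "a x 2", "a x 1", "b x 3"]  # made-up sample data
groups = group_unique(sample)
for key in sorted(groups):
    print(key + " ----- " + " ".join(sorted(groups[key])))
```

Because the values are collected in a set, no separate de-duplication step is needed before printing; only the final sort remains.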

It did :) But it got lost in the edits. I added the line `v = list(set(v))` to fix that. Cheers

Morlock 2010-03-18 12:27:38
