ansaurus

Question

Python Working with lists based on indexes

Answer 1

+1 A:

I would split this into two tasks.

First, divide thedata into groups, each consisting of an LTYPE=N row and the LTYPE=A rows that follow it.

def group_name_and_attributes(thedata):
    group = []
    for row in thedata:
        if row['LTYPE'] == 'N':
            if group:
                yield group
            group = [row]
        else:
            group.append(row)
    if group:
        yield group

Next, take each group in isolation and join the NAME values of its LTYPE=A rows into a single attribute string; it's then easy to add that combined string to each row in the group as desired.

def join_person_attributes(thedata):
    for group in group_name_and_attributes(thedata):
        attributes = ' '.join(row['NAME'] for row in group if row['LTYPE'] == 'A')
        for row in group:
            new_row = row.copy()
            new_row['PERSON_ATTRIBUTES'] = attributes
            yield new_row

new_data = list(join_person_attributes(thedata))

Of course you could make this modify the rows in-place, or only return one row per group, or ...
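As one illustration of the "only return one row per group" variant, here is a minimal sketch. The name summarize_people and the sample rows are made up for this example, and it assumes the data starts with an LTYPE=N row so that the first row of every group is the name row.

```python
def group_name_and_attributes(rows):
    """Yield lists that start with an LTYPE == 'N' row, followed by its 'A' rows."""
    group = []
    for row in rows:
        if row['LTYPE'] == 'N':
            if group:
                yield group
            group = [row]
        else:
            group.append(row)
    if group:
        yield group


def summarize_people(rows):
    """Hypothetical helper: yield one new dict per group (name row + joined attributes)."""
    for group in group_name_and_attributes(rows):
        # Assumption: group[0] is the LTYPE == 'N' row for this person.
        summary = group[0].copy()
        summary['PERSON_ATTRIBUTES'] = ' '.join(
            row['NAME'] for row in group if row['LTYPE'] == 'A')
        yield summary


# Made-up sample rows in the shape the question describes.
sample = [
    {'LTYPE': 'N', 'RID': '1', 'NAME': 'Jason Smith'},
    {'LTYPE': 'A', 'RID': '2', 'NAME': 'DA'},
    {'LTYPE': 'A', 'RID': '3', 'NAME': 'B'},
    {'LTYPE': 'N', 'RID': '4', 'NAME': 'John Smith'},
    {'LTYPE': 'A', 'RID': '5', 'NAME': 'BC'},
]

summaries = list(summarize_people(sample))
for row in summaries:
    print(row)
```

Because each summary is a copy, the original rows are left untouched; dropping the .copy() call would give the in-place variant instead.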

ephemient 2010-01-02 20:46:56

I appreciate your help a lot, and I learned quite a bit from playing with the code you provided. I upvoted your answer but accepted Alex's, because I had to add two lines to yours to get what I was looking for: pname = ' '.join(row['NAME'] for row in group if row['LTYPE'] == 'N') after the attributes = assignment in the join_person_attributes function, and new_row['PERSON_NAME'] = pname after the new_row assignment statement. Thanks again.
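For reference, here is a sketch of join_person_attributes with those two lines added, as I read the description above. The sample rows are taken from the other answer, and group_name_and_attributes is repeated only so the snippet runs on its own.

```python
def group_name_and_attributes(thedata):
    group = []
    for row in thedata:
        if row['LTYPE'] == 'N':
            if group:
                yield group
            group = [row]
        else:
            group.append(row)
    if group:
        yield group


def join_person_attributes(thedata):
    for group in group_name_and_attributes(thedata):
        attributes = ' '.join(row['NAME'] for row in group if row['LTYPE'] == 'A')
        # First added line: the name taken from the group's LTYPE == 'N' row.
        pname = ' '.join(row['NAME'] for row in group if row['LTYPE'] == 'N')
        for row in group:
            new_row = row.copy()
            # Second added line: attach the person's name to every row.
            new_row['PERSON_NAME'] = pname
            new_row['PERSON_ATTRIBUTES'] = attributes
            yield new_row


thedata = [
    {'LTYPE': 'N', 'RID': '1', 'NAME': 'Jason Smith'},
    {'LTYPE': 'A', 'RID': '2', 'NAME': 'DA'},
    {'LTYPE': 'A', 'RID': '3', 'NAME': 'B'},
    {'LTYPE': 'N', 'RID': '4', 'NAME': 'John Smith'},
    {'LTYPE': 'A', 'RID': '5', 'NAME': 'BC'},
    {'LTYPE': 'A', 'RID': '6', 'NAME': 'CB'},
    {'LTYPE': 'A', 'RID': '7', 'NAME': 'DB'},
    {'LTYPE': 'A', 'RID': '8', 'NAME': 'DA'},
]

new_data = list(join_person_attributes(thedata))
for row in new_data:
    print(row)
```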

PyNEwbie 2010-01-02 21:31:35

Answer 2

+2 A:

I suggest a different, index-free approach based on itertools.groupby:

import itertools, operator

data = [
{'LTYPE': 'N', 'RID': '1', 'NAME': 'Jason Smith'},
{'LTYPE': 'A', 'RID': '2', 'NAME': 'DA'},
{'LTYPE': 'A', 'RID': '3', 'NAME': 'B'},
{'LTYPE': 'N', 'RID': '4', 'NAME': 'John Smith'},
{'LTYPE': 'A', 'RID': '5', 'NAME': 'BC'},
{'LTYPE': 'A', 'RID': '6', 'NAME': 'CB'},
{'LTYPE': 'A', 'RID': '7', 'NAME': 'DB'},
{'LTYPE': 'A', 'RID': '8', 'NAME': 'DA'},
]

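# groupby yields consecutive runs of rows that share the same LTYPE.
for k, g in itertools.groupby(data, operator.itemgetter('LTYPE')):
    if k == 'N':
        # Remember the name row; the 'A' run that follows belongs to it.
        person_name_record = next(g)
    else:
        attribute_records = list(g)
        person_attributes = ' '.join(r['NAME'] for r in attribute_records)
        addfields = dict(PERSON_ATTRIBUTES=person_attributes,
                         PERSON_NAME=person_name_record['NAME'])
        # Add the combined fields to the name row and to each attribute row.
        person_name_record.update(addfields)
        for r in attribute_records:
            r.update(addfields)

for r in data:
    print(r)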

This prints your desired results for the first couple people (and each person is treated separately, so it should work just the same for a few hundred thousand people;-).
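To see why this works, it may help to look at what groupby actually produces. A small sketch, using a shortened, made-up subset of the rows above: groupby only merges adjacent rows with the same key, so the data comes out as alternating 'N' and 'A' runs.

```python
import itertools
import operator

# Shortened, made-up subset of the sample rows.
rows = [
    {'LTYPE': 'N', 'NAME': 'Jason Smith'},
    {'LTYPE': 'A', 'NAME': 'DA'},
    {'LTYPE': 'A', 'NAME': 'B'},
    {'LTYPE': 'N', 'NAME': 'John Smith'},
    {'LTYPE': 'A', 'NAME': 'BC'},
]

# Collect each consecutive run as (key, list of names in that run).
runs = [(key, [r['NAME'] for r in group])
        for key, group in itertools.groupby(rows, operator.itemgetter('LTYPE'))]

for key, names in runs:
    print('%s %s' % (key, names))
```

One consequence worth keeping in mind: if two 'N' rows are adjacent (a person with no attributes), they land in the same run, so the loop above would use only the first of them and neither would get the extra fields. If your data can contain such rows, that case would need separate handling.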

Alex Martelli 2010-01-02 20:53:47

Thanks. I have been playing with your answer and learning a lot about how itertools works. I also learned from the other answer; I marked yours as accepted because I had to make a slight modification to the other answer to get what I needed.

PyNEwbie 2010-01-02 21:28:26
