I have a list whose entries have the form [start position, stop position, [sample names with those positions]].

My goal is to remove entries that duplicate another entry's exact start and stop positions, adding their samples to the sample names of the entry that is kept. The problem I'm encountering is that when I delete from the list, I end up with an index out of range error, because len(list) is not recalculated within the loops.

for g in range(len(list)):
    for n in range(len(list)):
        # compare the start and stop positions of one entry with those of another
        if list[g][0] == list[n+1][0] and list[g][1] == list[n+1][1]:
            # add the duplicate's sample names to the first entry with these positions
            labels1 = list[g][2]
            labels2 = list[n+1][2]
            labels = labels1 + labels2
            list[g][2] = labels
            # now delete the extra entry
            del list[n+1]
+3  A: 

I'm not sure I understand what you want, but it might be this:

from collections import defaultdict
d = defaultdict(list)
for start, stop, samples in L1:
    d[start, stop].extend(samples)
L2 = [[start, stop, samples] for (start, stop), samples in d.items()]

Which will take L1:

L1 = [ [1, 5, ["a", "b", "c"]], [3, 4, ["d", "e"]], [1, 5, ["f"]] ]

and make L2:

L2 = [ [1, 5, ["a", "b", "c", "f"]], [3, 4, ["d", "e"]] ]

Please note that on Python versions before 3.7 this does not guarantee the same order of the elements in L2 as in L1 (from 3.7 on, dicts keep insertion order), but from the looks of your question, that doesn't matter.

truppo
If the order of elements were important, it would be easy to make a new list of (start, stop) tuples that recorded the order; then your list comprehension that builds L2 could loop over the order-preserving list and thus L2 would be built in the same order.
steveha
By the way, beautiful solution. I wanted to post an answer but the best I could hope to do would be to re-invent this.
steveha
+2  A: 

Your loops should not be for loops; they should be while loops with an explicit increment step. I guess you could manually check the condition inside your for loop (and continue if it's not met), but a while loop makes more sense, imo.
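
A minimal sketch of that idea, assuming the [start, stop, [samples]] layout from the question. The name records and the sample values are made up for illustration; the point is only that len(records) is re-checked on every pass, so deleting entries cannot push an index out of range.

```python
# Hypothetical sample data in the question's [start, stop, [samples]] layout.
records = [[1, 5, ["a", "b", "c"]], [3, 4, ["d", "e"]], [1, 5, ["f"]]]

g = 0
while g < len(records):              # len() is re-evaluated on every pass
    n = g + 1                        # only compare against later entries
    while n < len(records):
        same_start = records[g][0] == records[n][0]
        same_stop = records[g][1] == records[n][1]
        if same_start and same_stop:
            # merge the duplicate's sample names into the first entry
            records[g][2] = records[g][2] + records[n][2]
            # delete the duplicate; n stays put because the next entry
            # has just moved into slot n
            del records[n]
        else:
            n += 1
    g += 1

print(records)
# prints: [[1, 5, ['a', 'b', 'c', 'f']], [3, 4, ['d', 'e']]]
```

This stays quadratic in the number of entries, so for long lists a dictionary keyed on (start, stop) is likely the better choice.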

Brian
Just to clarify - "while g < len(list) :" will react to changes in both g and len(list).
Steve314
A: 

I've just put together a nice little list comprehension that does pretty much what you did, except without the nasty dels. Note that it sorts the data first, because groupby only merges adjacent entries.

from functools import reduce
from operator import add
from itertools import groupby

data = [
    [1, 1, [2, 3, 4]],
    [1, 1, [5, 7, 8]],
    [1, 3, [2, 8, 5]],
    [2, 3, [1, 7, 9]],
    [2, 3, [3, 8, 5]],
]

data.sort()
print(
    [[key[0], key[1], reduce(add, (i[2] for i in iterator))]
     for key, iterator in groupby(data, lambda item: item[:2])
    ]
)
sykora
+1  A: 

Here is truppo's answer, re-written to preserve the order of entries from L1. It has a few other small changes, such as using a plain dict instead of a defaultdict, and using explicit tuples instead of packing and unpacking them on the fly.

L1 = [ [1, 5, ["a", "b", "c"]], [3, 4, ["d", "e"]], [1, 5, ["f"]] ]


d = {}
oplist = []  # order-preserving list

for start, stop, samples in L1:
    tup = (start, stop)  # make a tuple out of start/stop pair
    if tup in d:
        d[tup].extend(samples)
    else:
        d[tup] = list(samples)  # copy, so extend() never modifies the lists inside L1
        oplist.append(tup)

L2 = [[tup[0], tup[1], d[tup]] for tup in oplist]

print(L2)
# prints: [[1, 5, ['a', 'b', 'c', 'f']], [3, 4, ['d', 'e']]]
steveha
This is the solution I used...you guys are amazing. I was stuck on this problem for three days and this was a great cut and paste solution. Thanks to everyone for the input!
Jill Jo
@Jill, remember to accept the answer that has helped you most (click on the checkmark-shaped icon below the big number counting the answer's upvotes) -- that's fundamental SO etiquette!
Alex Martelli