views:

87

answers:

2

Hi,

i have a huge list of two-element tuples which are coordinates of segments (start, end). In this way in a list below

list = [ (1,4), (2, 3), (10, 20), (18, 45) ] 

there are 4 segments with their start and end localization. I would like to remove segments that overlap. I expect to have a list like this as a result:

list = [ (1,4), (10,20) ]. 

I've already written a function which takes as input a pair of segments and returns 1 if their coordinates overlap:

def test_overlap(s1,e1,s2,e2):
    if (s1 <= e2 and e1 >= s2) or (e1 >= s2 and s1 <= e2):
        return 1
    if (s1 <= s2 and e1 >= e2) or (s1 >= s2 and e1 <= e2):
        return 1

But I have no idea how to compare efficiently each pair in a huge list of segments. Any help will be highly appreciated!

+3  A: 

First, sort the list (for comparison first use start point end then end point). Then go through list and remove all the tuples which are overlapping the previous element in the list. It is O(nlog(n)) for sorting and O(n) to go through the list.

Hope it helps.

Klark
+2  A: 

There's a data structure designed for this exact purpose (efficient identification of interval overlaps), it's called interval tree.

Bolo