tags:

views:

91

answers:

2

I have a website that allows multiple users to highlight multiple parts of a html document (mostly text).

Each highlight is marked by the start, character position, length and user id.

I need to find out the most highlighted sections (most overlapped sections), however different users might not have the same start and end position. What's the best way to implement such feature? preferably in c#

+5  A: 

Put all the users' begin and end selections in a list, sorted. Start at the top of the list and increment a counter for each begin point you find, decrement for each end point you find. The highest value of the counter is the start of your most highlighted / most overlapped section of text. The next item in the list is the end of that selection.

dthorpe
what if two selections don't have the same start position and length but are overlapped?
Comma
+1: This looks correct.
Moron
James Curran
+1  A: 

This should convert your data into the structure needed for dthorpe algorithm. If I had more time, I could probably Linq-ify the rest.

UPDATE: Ok, Now complete (if still untested....) UPDATE2: Now, actually tested & Working!

// Existing data structure
class StartLen
{
    public int Start {get; set;}
    public int Len   {get; set;}
    public string UserId {get; set;}
}

// Needed data struct
class StartEnd
{
    public int Pos {get; set;}
    public bool IsStart {get; set;}
}

class Segment
{
    public int Start { get; set; }
    public int End { get; set; }
    public int Count { get; set; }
}    

int count = 0, lastStart = -1;   // next rev, I figure a way to get rid of these. 

 // this can't be a lambda, it has to be a real function
IEnumerable<StartEnd> SplitSelection(StartLen sl)
 {  
    yield return new StartEnd() {Pos = sl.Start, IsStart = true} ; 
    yield return new StartEnd() {Pos = sl.Start+sl.Len -1 , IsStart = false} ; 
 }

List<StartLen> startLen = new List<StartLen>();
// we fill it with data for testing
// pretending to be the real data
startLen.Add(new StartLen() { Start=10, Len=10, UserId="abc123" });
startLen.Add(new StartLen() { Start=15, Len=10, UserId="xyz321" });

 var mostSelected  =  
    startLen.SelectMany<StartLen, StartEnd>(SplitSelection) 
        .OrderBy(se=>se.Pos) 
        .Select(se=>
        { 
            if (se.IsStart) 
            { 
                lastStart = se.Pos; 
                count++; 
            } 
            else 
            { 
                count--; 
                if (lastStart > 0) 
                { 
                    var seg = new Segment  
                        { Start = lastStart, End = se.Pos, Count = count }; 
                    lastStart = -1; 
                    return seg;
                } 
            } 
            // Garbage, cuz I need to return something 
            return new Segment { Start = 0, End = 0, Count = -1 };  
        }) 
       .OrderByDescending(seg => seg.Count) 
       .First(); 

// mostSelected  holds Start & End positions
}
James Curran
Comma
James Curran
thORpe! thORpe! :P
dthorpe
@dthorpe: I is a programmer, not a speller. (Feel free to reverse the middle letters of my last name... ;-)
James Curran
Ok if I understand this correctly the next step is to find the highest isStart = true and the lowest isStart = false?
Comma
James Curran
Comma
@James: Fair enough!;>
dthorpe