views:

369

answers:

7
NumberLine line = new NumberLine();
line.AddRange(1, 5);
line.AddRange(20, 30);

line.CheckRange(10, 25);

NumberLine is a class which represents a number line. I want to mark on it different ranges of numbers. The CheckRange method should return which parts from 10-25 I marked and which I did not. In this case it should return that 10-20 is not marked, and that 20-25 is marked.

How can I implement an effective implementation of this, which wouldn't do o(n)?

Thank you.

NOTE: This is NOT homework. I need this for my custom database implementation transactions. I'm learning programming alone.

+3  A: 

The solution is simple: Add all highlighted values to an AVL or Red-black tree. I mean when you do AddRange(1,3), insert the integer values 1,2 and 3 in the tree.

When checking ranges, simply lookup the endpoint values. That takes O(log n) which is considerably faster than O(n).

Note: Insertion and delete all take O(log n).

_NT
Sidequestion: What does that O(1) stand for? I never got that one...
Shaharyar
I didn't really understand what markers are you talking about...?
TTT
For adding new ranges you still have to do a union of your ranges. I don't see an immediate way of doing this in O(1) (every new range will *change* at most two other ranges and possible delete an arbitrary number of other ranges). And you still have to find all ranges that apply somehow. Essentially this is an implementation of a set but I can't think of a suitable implementation off the top of my head.
Joey
@Shaharyar,O(1) is a form of Big-O notation. It's used to mark the efficiency of an algorithm based on the size of data it is executing on. A notation of O(1) for example, says that the algorithm will perform at the same speed, regardless of the size of data it is executing on. More on it here http://en.wikipedia.org/wiki/Big-oh
jlech
I'm not sure this answer makes sense. In the worst case (no matter what you add), you'll still have to check every entry. How does adding the endpoints reduce this complexity?
Andrew Song
n is the ranges count (and not the numbers count in the range), and therefore the o(1) that you're talking about is my o(n).
TTT
But the question asks for an 'implementation' of the this system, which will require retrieval. Having an O(1) insert isn't especially enlightening here.
Andrew Song
If you're going to edit your post, at least mention it so that the comment thread still makes sense if people look at this later on.
Andrew Song
And AVL tree insertion is O(logn), as I recall.
Andrew Song
A: 

O(n) means varies as the number of elements O(1) means constant time

I can't think of an O(1) way of implementing this, either.

Henry Troup
+1  A: 

OK, I see where you're going with this.

Lucene does this with very large bit fields.

Say your possible range of numbers goes from 1 to 64, each of these numbers corresponds to the bit in that position on a 64 bit int. (No 1 is bit 0, No 2 is bit 1).

If you add a number to a range, you switch on that bit (in your example you'd switch on bits 0 through 4 and 19 through 29).

Now to check a range of numbers, you create another 64 bit int with that range of bits switched on, and perform a bitwise And (&) on the two bit fields. The 1 bits in the result is the overlapping range.

For more numbers than 64, just scale up the number of bits (possibly by working with arrays or lists of integers)

Hope this helps :)

Update: Scalability

Lets say you're working on 64 bit architecture and you can AND 64 bit integers in a single operation. Ideally you'd use 64 bit ints.

Now, lets say your possible range of numbers runs from 1 to 64,000, for this you need 1000 64 bit ints.

Now lets look at a couple use cases

  1. I want to check the range of 70 - 80. To do this we don't need another 1000 ints for the check, just one int, and we know we're checking it against the 2nd element in our array.

  2. I want to check the range of 2000 - 10,000 Again, we only need one int, calculate it's position in the array 31st (I think) and set the bits accordingly and compare. Then you iterate through the list until you hit 10,000 (position 156?), comparing along the way, and building you're list of integers to return.

Update 2: That's not O(1)

Depending of the size of the range to check, you can implement this as O(1)

However using this algorithm the general case is still O(n)

Binary Worrier
Won't this scale poorly though? If my range was 1-128 (which is really not that big of a range), I would suddenly require one (well, two) 128-bit numbers?
Andrew Song
Thank you, AND is a nice idea, but I think there's a much more better way to save only the ranges, without referring to each member in the range, because I going to use big ranges, 1000000 for example.
TTT
Has anyone any idea why this was downvoted?
Binary Worrier
Re case 2: Basically, instead of checking points one at a time, you're checking them 64 at a time and iterating. I'm not sure this is o(n)...
Andrew Song
I'm with Andrew, your algorithm run time in the worst case is n / [size of bit array], which is still o(n)
OedipusPrime
OK, a thousand apologies, I never thought case 2 was O(1), I thought was was less than O(n), but now see the error of my ways. Thanks guys :)
Binary Worrier
(Though I'm not the one who downvoted it. In fact, I upvoted it because it's a somewhat clever idea.)
Andrew Song
A: 

I'm not sure about the specifics of the app, but my intuition tells me this would be much better handled in the database, since it is a set based operation.

I.E.

Select
*
from numberlines
where 
number_group = @group_id
marked = 1
and number >= @min_range
and number <= @max_range
OG
I'm implementing database transactions, and your idea is to use a database?
TTT
not sure about the efficinency !
Toto
+1  A: 

Use a HashSet<T>:

public class NumberLine : HashSet<int>
{
    public void AddRange(int start, int end)
    {
     int count = (end-start)+1;

     UnionWith(Enumerable.Range(start, count));
    }

    public IEnumerable<int> CheckRange(int start, int end)
    {
     NumberLine other = new NumberLine();

     other.AddRange(start, end);

     other.IntersectWith(this); // marked
     // other.ExceptWith(this); // not marked

     return other;
    }
}

Not sure what you wanted to return from CheckRange, or if you just wanted it to print a string. For something simple like the ranges you specified, you could use:

public string CheckRange(int start, int end)
{
    NumberLine other = new NumberLine();

    other.AddRange(start, end);

    IEnumerable<int> marked = other.Intersect(this);
    IEnumerable<int> notMarked = other.Except(this);

    int markedMin = marked.Min();
    int markedMax = marked.Max();
    int notMarkedMin = notMarked.Min();
    int notMarkedMax = notMarked.Max();

    string markedString = (markedMin == markedMax)
      ? markedMin.ToString()
      : string.Format("{0} - {1}", markedMin, markedMax);

    string notMarkedString = (notMarkedMin == notMarkedMax)
      ? notMarkedMin.ToString()
      : string.Format("{0} - {1}", notMarkedMin, notMarkedMax);

    return string.Format("Marked: {0}\r\nNot Marked: {1}", markedString, notMarkedString);
}

It won't handle split ranges like:

Marked: 10-15, 20-25
Not Marked: 16-19

But it should get you on the right track.

Chris Doggett
A: 

If you tried to tackle this problem in iteratively that may help. For instance load your LineNumber class up with a list of Ranges, Those ranges have a start and end int in them. Then instead of a 'checkrange(a,b)' method, just implement a 'hasNumber(a)' method. Do this simply by looping through your list of Ranges and call a method 'isInRange(a) on the Range class So your Data model might be:

LineNumber {
 List<Range> ranges;
 aadRange(a,b);
 // Loops through all ranges and calls isInRange on each method
 isInRange(a);

 //just iterates over isInRange from a to b
 checkRange(a,b)

}

Range {
 Range(a,b)
 isInRange(a);
}

This will get you some working code, and an interface. It may not be fast enough, but you don't know that yet. Leave the lucene implementation for later. :)

This is not a full solution, but perhaps a different approach could help yield better results.

Nathan Feger
+1  A: 

What about if you store the ranges themselves within the NumberLine. You can do a merge when overlapping ranges are added. CheckRange then can query the ranges that are stored inside NumberLine instead of the individual elements. This then becomes O(N) in the number of ranges, instead of O(N) in the number of elements. If you do merge ranges when possible then the number of ranges becomes smaller than the number of calls to AddRange.

See the code sample below. I'm not an expert on .Net collections so a more efficient implementation might be possible by selecting better collection types. _NT suggested storing values in a tree structure. You can apply this also to ranges and store them by the start number. This makes searching the ranges faster, both when adding and checking. In my current implementation adding Ranges to the end is slower than adding ranges at the beginning. When storing this in an efficient tree the complexity becomes O(log N) in the number of ranges.

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

namespace NumberLine
{
    class Program
    {
        static void Main(string[] args)
        {
            NumberLine line = new NumberLine();
            line.AddRange(1, 5);
            line.AddRange(10, 12);
            line.AddRange(20, 30);

            List<Range> ranges = line.CheckRange(10, 25);
            foreach (Range r in ranges)
            {
                for (int i = r.Start; i <= r.End; i++)
                {
                    Console.WriteLine(i);
                }
            }
        }
    }

    class Range
    {
        public int Start;
        public int End;
    }

    class NumberLine
    {
        private SortedList<int, Range> Ranges = new SortedList<int, Range>();

        public void AddRange(int start, int end)
        {
            if (Ranges.Count == 0)
            {
                 Ranges.Add(start, new Range() { Start = start, End = end });
            }
            else
            {
                foreach (Range currentRange in Ranges.Values)
                {
                    if (start <= currentRange.Start) 
                    {
                        if (end >= currentRange.End)
                        {
                            currentRange.Start = start;
                            currentRange.End = end;
                        }
                        else
                        {
                            currentRange.Start = start;
                        }
                        Ranges.RemoveAt(start);
                        Ranges.Add(start, currentRange);
                        break;
                    } 
                    else
                    {
                        if (start <= currentRange.End)
                        {
                            currentRange.End = end;
                            break;
                        }
                        else
                        {
                            Ranges.Add(start, new Range(){ Start = start, End = end });
                            break;
                        }
                    }
                }           
            }
        }

        public List<Range> CheckRange(int start, int end)
        {
            List<Range> result = new List<Range>();
            foreach (Range currentRange in Ranges.Values)
            {
                if (start <= currentRange.End)
                {
                    if (end <= currentRange.End)
                    {
                        result.Add(new Range() { Start = currentRange.Start, End = end });
                        break;
                    }
                    else
                    {
                        if (start <= currentRange.Start)
                        {
                            result.Add(new Range() { Start = currentRange.Start, End = currentRange.End });
                        }
                        else
                        {
                            result.Add(new Range() { Start = start, End = currentRange.End });
                        }
                    }
                }
            }
            return result;
        }
    }

}
Jeroen Huinink