tags:

views:

293

answers:

10

Working in C++ in a Linux environment, I have a situation where a number of integer ranges are defined, and integer inputs map to different arbitrary integers based on which range they fall into. None of the ranges overlap, and they aren't always contiguous.

The "simplest" way to solve this problem is with a bunch of if-statements for each range, but the number of ranges, their bounds, and the target values can all vary, so if-statements aren't maintainable.

For example, the ranges might be [0, 70], called r_a, [101, 250], call it r_b, and [201, 400], call it r_c. Inputs in r_a map to 1, in r_b map to 2, and r_c map to 3. Anything not in r_a, r_b, r_c maps to 0.

I can come up with a data structure & algorithm that stores tuples of (bounds, map target) and iterates through them, so finding the target value takes linear time in the number of bounds pairs. I can also imagine a scheme that keeps the pairs ordered and uses a binary sort-ish algorithm against all the lower bounds (or upper bounds), finds the closest to the input, then compares against the opposing bound.

Is there a better way to accomplish the mapping than a binary-search based algorithm? Even better, is there some C++ library out that does this already?

A: 

A simple Linked List containing the range entries should be quick enough, even for say 50-100 ranges. Moreover, you could implement a Skip List, on say the upper bounds, to speed up these range queries. Yet another possibility is an Interval Tree.

Ultimately I'd choose the simplest: binary search.

sixlettervariables
A: 

Your example ranges overlap, but the question says they wont. I'll assume the range is a typo. You could, could, store the destinations in an array and use the indices as the ranges. It's pretty easy, but ugly and not very maintainable. You'd need to initialize the array to 0, then for each range, iterate over those indices and set each index to the destination value. Very ugly, but constant lookup time so maybe useful if the numbers don't get too high and the ranges don't change very often.

David Kanarek
A: 

Record the limits into a set (or map). When you call insert you will have a return value which is a pair. An iterator and a boolean. If the boolean is true then a new element is created which you have to remove later. After that step one with the iterator and look at what you have found.

http://www.cplusplus.com/reference/stl/set/insert/ See Return value

Notinlist
:-D http://www.cplusplus.com/reference/stl/set/lower_bound/ You should check that too.
Notinlist
If you don't want to insert the item, why would you call `insert`? Why not just call `map::find` (or `set::find`) and be done with it? The return value from `find` is easier to deal with as well: an iterator pointing at the element (if found) or `container.end()` if not found.
Jerry Coffin
Jeffy: You misunderstood the question. We do not search for exact numbers we search for numbers nearby. Secondly: My comment has more clue than the proposed solution, but I don't like to edit.
Notinlist
+2  A: 

Wouldn't a simple array be enough? You're not saying how many items you have, but by far the fastest data structure is a simple array.

If the ranges are:

  • 0..9 --> 25
  • 10..19 --> 42

Then the array would simply be like this:

[25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42]
Lasse V. Karlsen
And what is this array going to look like if the ranges are `[10, 20] -> 25` and `[10000, 30000] -> 42`?
AndreyT
Well, it's going to be bigger, but still instantly fast. Hence my comment about "not saying how many items you have".
Lasse V. Karlsen
That's the solution I immediately thought about. A classic speed / memory trade-off, but who can beat `O(1)` complexity ?
Matthieu M.
Even if you have to go to a sparse array, you should be able to make it faster than a binary tree.
Jason Orendorff
+2  A: 

You can have two sorted arrays: one for lower bounds, one for upper bounds. Use std::lower_bound(lower_bound_array, value) and std::upper_bound(upper_bound_array, value). If the index of both results is the same, return index + 1. Otherwise, return 0.

If the indices returned match, it means that the value is >= the lower bound and < the upper bound. If they don't, then you are in between ranges.

MSN
A: 

It's 1-dimensional spatial index. Quadtree-style binary tree will do, for example - and there are several other widely used methods.

ima
+11  A: 

The best approach here is indeed a binary search, but any efficient order-based search will do perfectly well. You don't really have to implement the search and the data structure explicitly. You can use it indirectly by employing a standard associative container instead.

Since your ranges don't overlap, the solution is very simple. You can immediately use a std::map for this problem to solve it in just a few lines of code.

For example, this is one possible approach. Let's assume that we are mapping an [ int, int ] range to an int value. Let's represent our ranges as closed-open ranges, i.e. if the original range is [0, 70], let's consider a [0, 71) range instead. Also, let's use the value of 0 as a "reserved" value that means "no mapping" (as you requested in your question)

const int EMPTY = 0;

All you need to do is to declare a map from int to int:

typedef std::map<int, int> Map;
Map map;

and fill it with each end of your closed-open ranges. The left (closed) end should be mapped to the desired value the entire range is mapped to, while the right (open) end should be mapped to our EMPTY value. For your example, it will look as follows

map[0] = r_a;
map[71] = EMPTY;

map[101] = r_b;
map[251] = EMPTY;

map[260] = r_c; // 260 adjusted from 201
map[401] = EMPTY;

(I adjusted your last range, since in your original example it overlapped the previous range, and you said that your ranges don't overlap).

That's it for initialization.

Now, in order to determine where a given value of i maps to all you need to do is

Map::iterator it = map.upper_bound(i);

If it == map.begin(), then i is not in any range. Otherwise, do

--it;

If the it->second (for the decremented it) is EMPTY, then i is not in any range.

The combined "miss" check might look as follows

Map::iterator it = map.upper_bound(i);
if (it == map.begin() || (--it)->second == EMPTY)
  /* Missed all ranges */;

Otherwise, it->second (for the decremented it) is your mapped value

int mapped_to = it->second;

Note that if the original ranges were "touching", as in [40, 60] and [61, 100], then the closed-open ranges will look as [40, 61) and [61, 101) meaning that the value of 61 will be mapped twice during map initialization. In this case it is important to make sure that the value of 61 is mapped to the proper destination value and not to the value of EMPTY. If you map the ranges as shown above in the left-to-right (i.e. increasing) order it will work correctly by itself.

Note, that only the endpoints of the ranges are inserted into the map, meaning that the memory consumption and the performance of the search depends only on the total number of ranges and completely independent of their total length.


If you wish, you can add a "guard" element to the map during the initialization

map[INT_MIN] = EMPTY;

(it corresponds to "negative infinity") and the "miss" check will become simpler

Map::iterator it = map.upper_bound(i);

assert(it != map.begin());
if ((--it)->second == EMPTY)
  /* Missed all ranges */;

but that's just a matter of personal preference.

Of course, if you just want to return 0 for non-mapped values, you don't need to carry out any checking at all. Just take the it->second from the decremented iterator and you are done.

AndreyT
Very helpful explanation of how to use the upper_bound operation, good catch on the intent with the ranges.
Ian Durkan
+1  A: 

The ideal is an interval tree (specialized binary tree). Wikipedia describes the method completely. Better than I. You won't get much more optimal than this, without sacrificing space for performance.

Pestilence
Appreciate the detailed info.
Ian Durkan
A: 

You may find Minimal Perfect Hashing Function useful, http://cmph.sourceforge.net/.

ZelluX
+2  A: 

I would use a very simple thing: a std::map.

class Range
{
public:
  explicit Range(int item);  // [item,item]
  Range(int low, int high);  // [low,high]

  bool operator<(const Range& rhs) const
  {
    if (mLow < rhs.mLow)
    {
      assert(mHigh < rhs.mLow); // sanity check
      return true;
    }
    return false;
  } // operator<

  int low() const { return mLow; }
  int high() const { return mHigh; }

private:
  int mLow;
  int mHigh;
}; // class Range

Then, let's have a map:

typedef std::map<Range, int> ranges_type;

And write a function that search in this map:

int find(int item, const ranges_type& ranges)
{
  ranges_type::const_iterator it = ranges.lower_bound(Range(r));
  if (it != ranges.end() && it->first.low() <= item)
    return it->second;
  else
    return 0; // No mapping ?
}

Main benefits:

  • Will check that the ranges effectively don't overlap during insertion in the set (you can make it so that it's only in debug mode)
  • Supports edition of the Ranges on the fly
  • Finding is fast (binary search)

If the ranges are frozen (even if their values are not), you may wish to use Loki::AssocVector to reduce the memory overhead and improve performance a bit (basically, it's a sorted vector with the interface of a map).

Matthieu M.
Great solution! Very compact!
Ian Durkan