views:

179

answers:

3

What's the most idiomatic way to convert a set of integers into a set of ranges?

E.g. given the set {0, 1, 2, 3, 4, 7, 8, 9, 11} I want to get { {0,4}, {7,9}, {11,11} }.

Let's say we are converting from std::set<int> into std::vector<std::pair<int, int>>. I treat Ranges as inclusive on both sides, since it's more convenient in my case, but I can work with open-ended ranges too if necessary.

I've written the following function, but I feel like reinventing the wheel. Please tell maybe there's something in STL or boost for this.

typedef std::pair<int, int> Range;

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
    Range r = std::make_pair(-INT_MAX, -INT_MAX);

    BOOST_FOREACH(int i, indices)
    {
           if (i != r.second + 1)
           {
            if (r.second >= 0) ranges.push_back(r);
            r.first = i;                    
           }

           r.second = i;
    }

    ranges.push_back(r);
}
+3  A: 

I don't think there's anything in the STL or Boost that does this.

One thing you can do is to make your algorithm a little bit more general:

template<class InputIterator, class OutputIterator>
void setToRanges(InputIterator first, InputIterator last, OutputIterator dest)
{
    typedef std::iterator_traits<InputIterator>::value_type item_type;
    typedef typename std::pair<item_type, item_type> pair_type;
    pair_type r(-std::numeric_limits<item_type>::max(), 
                -std::numeric_limits<item_type>::max());

    for(; first != last; ++first)
    {
        item_type i = *first;
        if (i != r.second + 1)
        {
            if (r.second >= 0) 
                *dest = r;
            r.first = i;                    
        }
        r.second = i;
    }
    *dest = r;
}

Usage:

std::set<int> set;
// insert items

typedef std::pair<int, int> Range;
std::vector<Range> ranges;

setToRanges(set.begin(), set.end(), std::back_inserter(ranges));

You should also consider using the term interval instead of range, because the latter in STL parlance means "any sequence of objects that can be accessed through iterators or pointers" (source).

Finally, you should probably take at look at the Boost Interval Arithmetic Library, which is currently under review for Boost inclusion.

Manuel
I'm afraid the current algorithm can't be used for general version: for example, it won't work with the unsigned types. The initial version at least explicitly demanded ints.Second, +1 for "interval". This was actually what I've used in my code, but posting here I decided to change it to "range" considering the term to be more familiar. Third, Boost Interval sure is powerful, but not of much use for me since all I do with the ranges is put them into the WHERE SQL clause like "DELETE FROM X WHERE id >= interval_start AND id <= interval_end"
Alex Jenter
+1  A: 

No shrinkwrapped solution I'm afraid, but an alternative algorithm.

Store your items in a bitvector - O(n) if you know the maximum item at the start and preallocate the vector.

Translate that vector into a vector of transition point flags - exclusive-or the bitvector with a bitshifted version of itself. Slightly fiddly at the word boundaries, but still O(n). Logically, you get a new key at the old max + 1 (the transition back to zeros after all your keys are exhausted), so it's a good idea to allow for that in the preallocation of the vector.

Then, iterate through the bitvector finding the set bits. The first set bit indicates the start of a range, the second the end, the third the start of the next range and so on. The following bit-fiddling function (assuming 32 bit int) may be useful...

int Low_Bit_No (unsigned int p)
{
  if (p == 0)  return -1;  //  No bits set

  int           l_Result = 31;
  unsigned int  l_Range  = 0xffffffff;
  unsigned int  l_Mask   = 0x0000ffff;

  if (p & l_Mask)  {  l_Result -= 16;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x00ff00ff;
  if (p & l_Mask)  {  l_Result -=  8;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x0f0f0f0f;
  if (p & l_Mask)  {  l_Result -=  4;  }  else  {  l_Mask ^= l_Range;  }
  l_Range &= l_Mask;
  l_Mask  &= 0x33333333;
  if (p & l_Mask)  {  l_Result -=  2;  }  else  {  l_Mask ^= l_Range;  }
  l_Mask  &= 0x55555555;
  if (p & l_Mask)  {  l_Result -=  1;  }

  return l_Result;
}
Steve314
Interesting idea, the problem is 1) I don't know the number of items upfront and it can be very large; 2) this subroutine is not performance bottleneck, saying "idiomatic" I was aiming at more clarity and being able to reuse library code; and 3) this algorithm is the same O(n) as the initial solution but much less maintainable.
Alex Jenter
I figured as much - it wasn't really intended to be serious, just me being bored and wasting some time, though the bit-fiddling is old code so not *that* much time. Someone searching for similar keywords may find it useful - that's my excuse, and I'm sticking to it.
Steve314
+1  A: 

I'd use adjacent_find with a predicate that defines "adjacency" as two elements that are not sequential. This solution doesn't depend on INT_MAX. Still feels kinda clunky.

bool notSequential(int a, int b) { return (a + 1) != b; }

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
  std::set<int>::iterator iter = indices.begin();
  std::set<int>::iterator end = indices.end();
  int first;
  while (iter != end)
  {
    first = *iter;
    iter = std::adjacent_find(iter, end, notSequential);
    if (iter != end)
    {
      ranges.push_back(std::make_pair(first, *iter));
      ++iter;
    }
  }
  ranges.push_back(std::make_pair(first, *--iter));
}

That tests against end more than necessary. adjacent_find can never return the last element of a list, so the incremented iterator will never be end and thus can still be dereferenced. It could be rewritten as:

void setToRanges(const std::set<int>& indices, std::vector<Range>& ranges)
{
  std::set<int>::iterator iter = indices.begin();
  std::set<int>::iterator end = indices.end();
  if (iter == end) return; // empty set has no ranges
  int first;
  while (true)
  {
    first = *iter;
    iter = std::adjacent_find(iter, end, notSequential);
    if (iter == end) break;
    ranges.push_back(std::make_pair(first, *iter++));
  }
  ranges.push_back(std::make_pair(first, *--iter));
}
Dan
Very nice idea, it's interesting to learn about the adjacent_find algorithm, haven't used it before!It's good that this code doesn't depend on INT_MAX and works on all integers and not just the positive ones. On the other hand, the code is longer and more obscure, uses two loops (one while and one inside of adjacent_find) instead of just one, and an additional free function needs to be defined.
Alex Jenter