views:

307

answers:

4
+4  Q: 

Rank Tree in C++

We need ADT having search and rank features. That is, in addition to the interface of STL map, a function 'int get_rank(key)' is required.

Standard implementation of such function requires supporting and updating an extra integer field in every node of self-balanced searching tree (e.g., in black-red tree, used in STL map/set). But it seems, STL map/set do not do this.

We're looking for a solution based on standard containers (STL, Boost) having the best possible Time Complexity: finding/adding/erasing an element take O(log n) (like in STL map/set), computing the rank by a key takes also O(log n).

By the rank of an element, we mean the position of the element in the sorted sequence of all the elements of the map/set.

Example. set = {0, 4, 6, 7, 8} rank(0)=1, rank(4)=2, rank(6)=3, rank(7)=4, rank(8)=5.

In our opinion, under Time Complexity constrains above, the problem cannot be solved by a combination of two maps one sorting by key and other sorting by rank.

Thanks.

A: 

I would suppose that by rank you actually mean the distance from the root, since if it could be stored contiguously with the value you would not have to go to such length.

I think you could do it "externally", since in this case the rank can be extrapolated from the number of times the comparison predicate is used...

namespace detail
{
  template <class Comparator>
  class CounterComparator: Comparator
  {
  public:
    CounterComparator(size_t& counter):
        Comparator(), mCounter(&counter) {}
    CounterComparator(Comparator comp, size_t& counter):
        Comparator(comp), mCounter(&counter) {}

    template <class T, class U>
    bool operator()(T& lhs, U& rhs) const
    { 
      ++(*mCounter);
      return this->Comparator::operator()(lhs,rhs);
    }
  private:
    size_t* mCounter;
  };
} // namespace detail

template <
  class Key, 
  class Value, 
  class Cmp = std::less<Key>, 
  class Allocator = std::allocator< std::pair<const Key,Value> >
>
class SuperMap
{
  typedef detail::CounterComparator<Cmp> Comparator;
public:
  SuperMap(): mCounter(0), mData(Comparator(mCounter)) {}

  Value& operator[](const Key& key) { return mData[key]; }

  size_t rank(const Key& key) const
  { 
    mCounter = 0; mData.find(key); return mCounter;
  }

private:
  typedef std::map<Key,Value, Comparator, Allocator> data_type;

  mutable size_t mCounter;
  data_type mData;
}; // class SuperMap

int main(int argc, char* argv[])
{
  SuperMap<int,int> superMap;
  superMap[1] = 42;
  std::cout << superMap.rank(1) << std::endl;
}

// outputs
// 2

It counts the number of tests, but since std::map stops testing as soon as it gets the right key... it should be alright :) Though there is probably some offset to deduce there (1 or 2) to get the rank instead.

If you gave a better definition of rank I may work a bit more but I don't want to spend too much time in the wrong direction.

Matthieu M.
A: 

I suggest using two maps of pointers. The first map will have [key, pointer to data] pairs and the other would have [rank, pointer to data] pairs. This is also known as multiple indices.

The rank index allows the data to be sorted without any relationship to the first, key, index. Also, having an extra, isolated index means that the data structure and methods of the index don't need to be modified; standard structures and methods can be used.

The data pointer allows one datum to be shared between the two indices. I highly suggest using shared pointers to the datum. The Boost library provides tested mechanics for shared pointers: Boost Smart Pointers.

Thomas Matthews
+1  A: 

The rank of the given key K is the number of keys which are less or equal to K.

E.g., let set s = {1, 3, 4, 6, 9}. Then rank(1) = 1, rank(4) = 3, rank(9) = 5.

The STL function distance() can be used for computing the rank of an element x appearing in the set s.

rank = distance(s.begin(), s.find(x));

The problem is that its time complexity is O(n).

Note that proposed two maps (or sets) indexed by key and by rank is not correct solution. The problem is that a change of one element affects ranks of many others. E.g., adding element 0 to the set s above change the ranks of all existing elements: s' = {0, 1, 3, 4, 6, 9}. rank(1) = 2, rank(4) = 4, rank(9) = 6.

Thanks.

Slava
A: 

You can probably do what you want with intrusive containers (boost). You'll need to maintain your extra summary information yourself - I'm not certain that the needed hooks are built into the boost library, but I'd hope so.

If "rank" means "subscript" or "index" (imagining you data represented as an ordered array) the method isn't too difficult. The number in each node is just the number of items in the subtree descended from that node. I've implemented this in a multiway tree library, the useful operations being...

  • Find subscript
  • Determine subscript from cursor/iterator
  • Step iterator by offset
  • Distance between two iterators
  • Find first unused key >= minimum value, for unsigned integer keys

That last one is kind of a surprise bonus - I won't guarantee O(log n) because I never really checked, but it exploits both key and size summaries in nodes, and in practice I've used it with large containers with no problems. And it's amazing how useful it is - probably more frequently useful than the obvious operations.

The library isn't released, I'm afraid, but this kind of thing is do-able and useful.

Steve314