Assume you have many elements, and you need to keep track of the equivalence relations between them. If element A is equivalent to element B, it is equivalent to all the other elements B is equivalent to.
I am looking for an efficient data structure to encode this information. It should be possible to dynamically add new elements through an equivalence with an existing element, and from that information it should be possible to efficiently compute all the elements the new element is equivalent to.
For example, consider the following equivalence sets of the elements [0,1,2,3,4]:
0 = 1 = 2
3 = 4
where the equality sign denotes equivalence. Now we add a new element 5
0 = 1 = 2
3 = 4
5
and enforcing the equivalence 5=3
, the data structure becomes
0 = 1 = 2
3 = 4 = 5
From this, one should be able to iterate efficiently through the equivalence set for any element. For 5, this set would be [3,4,5].
Boost already provides a convenient data structure called disjoint_sets
that seems to meet most of my requirements. Consider this simple program that illustates how to implement the above example:
#include <cstdio>
#include <vector>
#include <boost/pending/disjoint_sets.hpp>
#include <boost/unordered/unordered_set.hpp>
/*
Equivalence relations
0 = 1 = 2
3 = 4
*/
int main(int , char* [])
{
typedef std::vector<int> VecInt;
typedef boost::unordered_set<int> SetInt;
VecInt rank (100);
VecInt parent (100);
boost::disjoint_sets<int*,int*> ds(&rank[0], &parent[0]);
SetInt elements;
for (int i=0; i<5; ++i) {
ds.make_set(i);
elements.insert(i);
}
ds.union_set(0,1);
ds.union_set(1,2);
ds.union_set(3,4);
printf("Number of sets:\n\t%d\n", (int)ds.count_sets(elements.begin(), elements.end()));
// normalize set so that parent is always the smallest number
ds.normalize_sets(elements.begin(), elements.end());
for (SetInt::const_iterator i = elements.begin(); i != elements.end(); ++i) {
printf("%d %d\n", *i, ds.find_set(*i));
}
return 0;
}
As seen above one can efficiently add elements, and dynamically expand the disjoint sets. How can one efficiently iterate over the elements of a single disjoint set, without having to iterate over all the elements?