Given a list of sets:
- S_1 : [ 1, 2, 3, 4 ]
- S_2 : [ 3, 4, 5, 6, 7 ]
- S_3 : [ 8, 9, 10, 11 ]
- S_4 : [ 1, 8, 12, 13 ]
- S_5 : [ 6, 7, 14, 15, 16, 17 ]
What the most efficient way to merge all sets that share at least 2 elements? I suppose this is similar to a connected components problem. So the result would be:
- [ 1, 2, 3, 4, 5, 6, 7, 14, 15, 16, 17] (S_1 UNION S_2 UNION S_5)
- [ 8, 9, 10, 11 ]
- [ 1, 8, 12, 13 ] (S_4 shares 1 with S_1, and 8 with S_3, but not merged because they only share one element in each)
The naive implementation is O(N^2), where N is the number of sets, which is unworkable for us. This would need to be efficient for millions of sets.