One example doesn't make a complete specification. For example, how would your answer be different if the collection of sets also included
set E: 1 2 3
set F: 1 3
which would make 3 the most frequently occurring value among sets that have a non-empty intersection with `D`?

So here are my assumptions:
Given a target set (`D` in your original example):
1. Values in "overlapping sets" (sets that have a non-empty intersection with the target set) are more relevant than values not in those overlapping sets.
2. Under the constraint of statement 1, relevance is determined by frequency of occurrence.
In your original example, `A` overlaps with `D`, so the universe {1, 2, 3, 4, 5, 6, 7} is partitioned into overlapping {1, 2, 3, 4} and non-overlapping {5, 6, 7}. The value frequencies are {1:2, 2:1, 3:2, 4:3, 5:2, 6:2, 7:1}. Combining these facts gives overlapping frequencies {1:2, 2:1, 3:2, 4:3} and non-overlapping frequencies {5:2, 6:2, 7:1}, which produces the order 4, 3, 1, 2 followed by 5, 6, 7. (I notice that you didn't assign a relevance to 1. If that was deliberate, it can be handled by a final step that removes the values of the target set from the ordering.)
In my adjusted example, the frequencies become {1:4, 2:2, 3:4, 4:3, 5:2, 6:2, 7:1} (`E` adds one occurrence each of 1, 2, and 3; `F` adds one each of 1 and 3). That gives overlapping frequencies {1:4, 2:2, 3:4, 4:3} and non-overlapping frequencies {5:2, 6:2, 7:1}, which produces the order 1, 3, 4, 2 followed by 5, 6, 7.
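As a rough illustration of how the two assumptions combine into one ordering, here is a small Python sketch that sorts the frequencies quoted for your original example. The function name `order_by_relevance` is made up for this illustration, and breaking ties by ascending value is my own assumption (the assumptions above say nothing about ties), so tied values such as 1 and 3 may come out in a different order than I listed them.

```python
def order_by_relevance(frequency, overlapping):
    """Overlapping values first; within each group, highest frequency first.

    Ties are broken by ascending value (an assumption, not part of the spec).
    """
    return sorted(
        frequency,
        key=lambda v: (v not in overlapping, -frequency[v], v),
    )


# Frequencies and partition quoted for the original example.
frequency = {1: 2, 2: 1, 3: 2, 4: 3, 5: 2, 6: 2, 7: 1}
overlapping = {1, 2, 3, 4}

print(order_by_relevance(frequency, overlapping))
# [4, 1, 3, 2, 5, 6, 7]
```

The sort key is a tuple, so membership in the overlapping group always dominates and frequency only decides the order within a group.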
Pseudo-code for this algorithm is:
1. Initialize `overlapping` and `universe` to be empty sets and `frequency` to be an empty hash.
2. For each set `s` in the collection of sets (other than `t`, the target set):
   2.1. Set `universe` to the union of `s` and `universe`
   2.2. If `s` intersected with `t` has at least one element:
      2.2.1. Set `overlapping` to the union of `overlapping` and `s`
   2.3. For each element `e` in `s`:
      2.3.1. If `e` is a key in `frequency`
         2.3.1.1. Then increase the value (count) for `e` in `frequency` by 1
         2.3.1.2. Else initialize the value (count) for `e` in `frequency` to 1
3. Set `nonOverlapping` to the difference of `universe` and `overlapping`
4. Sort the elements of `overlapping` by their values in `frequency`, highest first, as the first part of the result. Append to the result the elements of `nonOverlapping`, also sorted by their values in `frequency`, highest first.
(If you did intend for elements of `t` to be eliminated, I'd do that as a post-processing step in 4.)
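For what it's worth, the pseudo-code could be sketched in Python roughly as follows. This is only one possible reading: the function name `rank_values`, the `drop_target_values` flag, and the sample sets are all invented for illustration (they are not the sets from your question), and ties are broken by ascending value, which is my own assumption. Note that the first part of the result is built from the overlapping values only, since the non-overlapping values are appended separately.

```python
from collections import Counter


def rank_values(sets, target_name, drop_target_values=False):
    """Rank values by relevance to the target set.

    sets: dict mapping a set name to a Python set of values.
    Values from sets that overlap the target come first; within each
    group, values are sorted by descending frequency.
    """
    target = sets[target_name]
    universe = set()
    overlapping = set()
    frequency = Counter()

    for name, s in sets.items():
        if name == target_name:
            continue  # the target set itself is not counted
        universe |= s
        if s & target:  # non-empty intersection with the target
            overlapping |= s
        frequency.update(s)  # each element of s is counted once

    non_overlapping = universe - overlapping

    def by_frequency(values):
        # Highest count first; ascending value breaks ties (assumption).
        return sorted(values, key=lambda v: (-frequency[v], v))

    result = by_frequency(overlapping) + by_frequency(non_overlapping)
    if drop_target_values:
        # Optional post-processing: remove the target's own values.
        result = [v for v in result if v not in target]
    return result


# Hypothetical data, made up for this sketch.
sets = {
    "A": {1, 2, 3, 4},
    "B": {4, 5, 6},
    "C": {4, 5, 6, 7},
    "D": {1},
}

print(rank_values(sets, "D"))
# [4, 1, 2, 3, 5, 6, 7]
print(rank_values(sets, "D", drop_target_values=True))
# [4, 2, 3, 5, 6, 7]
```

Using `Counter` replaces the explicit "is `e` already a key" check in 2.3.1, since missing keys count as zero.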