Given a list of objects with multiple attributes I need to find the list of sets created by a union of all intersecting subsets.
Specifically these are Person objects, each with many attributes. I need to create a list of 'master' sets based on a handful of unique identifiers such as SSN, DLN, etc.
For instance, if Person A and Person B have the same SSN they create a set i. Then if Person B and C have the same DLN, they create a set ii. Person D and E have the same SSN but it (and all other identifiers) does not match any of the identifiers of Persons A, B or C. After merging all intersecting subsets I would end up with one set with Persons A,B,C and another set with Persons D, E.
Here is the psuedo-code for my solution. I am curious if anyone has already come up with a more efficient way of merging all possible intersecting sets. Keep in mind that the links between sets could be X Persons long (i.e. A matches B by SSN and B matches C by DLN and C matches D by SSN and D matches E by some other identifier would result in Persons A-E in one set). Also assume that the language this will be implemented in supports set operations.
bigSetList = array of all of the uniq Sets
fullyTested = false
while (bigSetList.size() > 1) or (fullyTested is false)
foreach thisSet in bigSetList order by size desc
if count(sets that intersect with thisSet) > 0
newThisSet = thisSet
intersectingSets = []
bigSetList.delete(thisSet)
foreach testSet in bigSetList
if thisSet.intersects(testSet)
newThisSet.addAll(testSet)
intersectingSets.push(testSetID)
end if
end
bigSetList.delete(intersectingSets)
bigSetList.push(newThisSet)
bigSetList.sort()
break
end if
end foreach
fullyTested = true // have looped through every set in the list and found 0 intersect partners
end