views:

91

answers:

2

I have a list of sets (a,b,c,d,e in below example). Each of the sets contains a list of nodes in that set (1-6 below). I was wondering that there probably is a general known algorithm for achieving the below, and I just do not know about it.

sets[
 a[1,2,5,6],
 b[1,4,5],
 c[1,2,5],
 d[2,5],
 e[1,6],
]

I would like to generate a new structure, a list of groups, with each group having

  • all the (sub)sets of nodes that appear in multiple sets
  • references to the original sets those nodes belong to

So the above data would become (order of groups irrelevant).

group1{nodes[2,5],sets[a,c,e]}
group2{nodes[1,2,5],sets[a,c]}
group3{nodes[1,6],sets[a,e]}
group4{nodes[1,5],sets[a,b,c]}

I am assuming I can get the data in as an array/object structure and manipulate that, and then spit the resulting structure out in whatever format needed.

It would be a plus if:

  • all groups had a minimum of 2 nodes and 2 sets.
  • when a subset of nodes is contained in a bigger set that forms a group, then only the bigger set gets a group: in this example, nodes 1,2 do not have a group of their own since all the sets they have in common already appear in group2.

(The sets are stored in XML, which I have also managed to convert to JSON so far, but this is irrelevant. I can understand procedural (pseudo)code but also something like a skeleton in XSLT or Scala could help to get started, I guess.)

A: 
Beta
Thank you! I had not noticed this until now. I sorta made a solution myself already but will go through this ASAP.
visu
This puzzles me. "If S cannot be a member of G but the intersection of S ang G's set contains more than one node, make a new group for that intersection and add it to the list."When S *can* be a member of G, doesn't the intersection contain a number of nodes that is the equivalent of the size of G - which is often more than one? So how does the intersection containing more than one node mean that S can not be a member of G?
visu
Ah, now I seem to get it. I thought "but" was saying that the part before it and after it are equivalent, whereas now I understand that they are not. For future reference, I suggest changing the sentence to"If S cannot be a member of G /and also/ the intersection of S ang G's set contains more than one node, make a new group for that intersection and add it to the list."
visu
It seems that when you give S a group of its own, you may be creating a duplicate, i.e. if there is already a group with the exact same nodes. If G has the same nodes as S, then we should not create a new group for S since S was already added to G in 1.1.
visu
Having a conversation with myself here, I am not sure if you have to do the combining of 1.3 anyway though, so my previous remark may be moot.
visu
Sorry, I've been away from the net. If you remove 1.3 you'll get groups with the same set. You can avoid this problem by adding a conditional to 1.2: "if you have not added S to any G". I figured 1.3 was less elegant but simpler.
Beta
A: 
/*
Pseudocode algorithm for creating groups data from a set dataset, further explained in the project documentation. This is based on 
http://stackoverflow.com/questions/1644387/create-groups-from-sets-of-nodes

I am assuming 
- Group is a structure (class) the objects of which contain two lists: a list of sets and a list of nodes (group.nodes). Its constructor accepts a list of nodes and a reference to a Set object
- Set is a list structure (class), the objects (set)  of which contain the nodes of the list in set.nodes
- groups and sets are both list structures that can contain arbitrary objects which can be iterated with foreach(). 
- you can get the objects two lists have in common as a new list with intersection()
- you can count the number of objects in a list with length()
*/

//Create groups, going through the original sets
foreach(sets as set){
    if(groups.nodes.length==0){
        groups.addGroup(new Group(set.nodes, set));
    }
    else{
        foreach (groups as group){
                if(group.nodes.length() == intersection(group.nodes,set.nodes).length()){
                    // the group is a subset of the set, so just add the set as a member the group
                    group.addset(set);
                    if (group.nodes.length() < set.nodes.length()){
                    // if the set has more nodes than the group that already exists, 
                    // create a new group for the nodes of the set, with set as a member of that group
                    groups.addGroup(new Group(set.nodes, set));
                    }
                }

                // If group is not a subset of set, and the intersection of the nodes of the group 
                // and the nodes of the set
                // is greater than one (they have more than one person in common), create a new group with 
                // those nodes they have in common, with set as a member of that group
                else if(group.nodes.length() > intersection(group.nodes,set.nodes).length() 
                    && intersection(group.nodes,set.nodes).length()>1){
                    groups.addGroup(new Group(intersection(group.nodes,set.nodes), set);
                }
        }
    }

}

// Cleanup time!
foreach(groups as group){
    //delete any group with only one member set (for it is not really a group then)
    if (group.sets.length<2){
        groups.remove(group);
    }
    // combine any groups that have the same set of nodes. Is this really needed? 
    foreach(groups2 as group2){
        //if the size of the intersection of the groups is the same size as either of the 
        //groups, then the groups have the same nodes.
        if (intersection(group.nodes,group2.nodes).length == group.nodes.length){
            foreach(group2.sets as set2){
                if(!group.hasset(set)){
                    group.addset(set2);
                }
            }
            groups.remove(group2);
        }
        }

}
visu