I have a SQL table with three columns X, Y, Z. I need to split its rows into groups such that all records sharing a value of X, or of Y, or of Z are assigned to the same group. Records with the same value of X, Y, or Z must never be split across multiple groups.

If you think of records as nodes and values of X, Y, Z as edges, this problem is the same as finding all subgraphs in which the nodes are connected, directly or indirectly, via X-, Y-, or Z-edges, and which have no edges in common with any other subgraph (otherwise they would be part of the same subgraph).

A few years ago I knew what this was called and even remembered the algorithm, but now it escapes me. Please tell me what this problem is called so I can Google for a solution. If you know a good algorithm, please point me to it. If you have a SQL implementation, I will marry you :)

Example:

    X     Y     Z     BUCKET
    ---   ---   ---   ------
    1     34    56    1
    54    43    45    2
    1     12    22    1
    2     34    11    1

The last row is in bucket 1 because its value Y=34 is the same as in the first row, which is in bucket 1.

A: 

To find how many nodes are in each group x:

    select x, count(x)
    from mytable
    group by x

Or to find the list of sets x:

    select distinct x from mytable;
Randy
All the values of X do not represent the complete group. The group also includes all the values of Y that match any of the values of Y in the records with the same value of X. And so on, recursively for all other values of X, Y and Z.
zvolkov
+2  A: 

It looks less like a graph and more like a simplicial complex. But if we treat this complex as its skeletal graph (the numbers are vertices, and a row in the table means that all three of its vertices are pairwise connected by edges), then we can use any algorithm that finds the connected components of this graph. I'm not sure whether there is a feasible way to do this in SQL, though. Perhaps it would be more prudent to use a graph database.

However, this specific problem may have some easy solution attainable in SQL alone, which I didn't look for.
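For what it's worth, here is a minimal sketch of the connected-components idea using a union-find (disjoint-set) structure, in Python rather than SQL. It assumes a value only matches within its own column (so 34 in X is not the same vertex as 34 in Y), which is how I read the question. The function name `assign_buckets` and the bucket numbering (order of first appearance) are my own choices, not anything from the original post.

```python
def assign_buckets(rows):
    """Return one bucket id per (x, y, z) row.

    Rows that share an X, a Y, or a Z value, directly or through a
    chain of other rows, end up with the same bucket id.
    """
    parent = {}

    def find(v):
        # Walk up to the root, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for x, y, z in rows:
        # Assumption: values are tagged with their column, so they
        # only match other values from the same column.
        keys = [("X", x), ("Y", y), ("Z", z)]
        for k in keys:
            parent.setdefault(k, k)
        # A row ties its three values into one component.
        union(keys[0], keys[1])
        union(keys[0], keys[2])

    # Number the components in order of first appearance.
    bucket_of_root = {}
    buckets = []
    for x, _, _ in rows:
        root = find(("X", x))
        buckets.append(bucket_of_root.setdefault(root, len(bucket_of_root) + 1))
    return buckets


rows = [(1, 34, 56), (54, 43, 45), (1, 12, 22), (2, 34, 11)]
print(assign_buckets(rows))  # [1, 2, 1, 1]
```

On the example from the question this gives the same BUCKET column as in the table there. The result could be written back to the table with an ordinary UPDATE per row.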

kohomologie
Connected component is the keyword! Thanks!
zvolkov
A: 

Why don't you initially GROUP BY one of the columns (say X) and make buckets, then do the same for Y and Z, each time merging the buckets from the previous step whenever you find new groups?

Repeat the process for X, Y, and Z until the buckets stop changing.
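A rough sketch of that loop, with the SQL run through Python's built-in `sqlite3` module so it can be tried without a server. The `id` and `bucket` columns, the table name `mytable`, and the function name `bucket_by_iteration` are assumptions for illustration. Each row starts in its own bucket, and each pass pulls a row's bucket down to the smallest bucket among rows sharing its X, then Y, then Z, until nothing changes. The bucket ids that come out are the smallest row id in each group, so they are not necessarily consecutive.

```python
import sqlite3


def bucket_by_iteration(rows):
    """Label (x, y, z) rows by repeated per-column merging until stable."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: id and bucket are helper columns added for this sketch.
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, x, y, z, bucket INTEGER)"
    )
    conn.executemany("INSERT INTO mytable (x, y, z) VALUES (?, ?, ?)", rows)
    # Start with every row in its own bucket.
    conn.execute("UPDATE mytable SET bucket = id")

    changed = True
    while changed:
        changed = False
        for col in ("x", "y", "z"):  # fixed column names, not user input
            cur = conn.execute(
                f"""
                UPDATE mytable
                SET bucket = (SELECT MIN(m.bucket) FROM mytable AS m
                              WHERE m.{col} = mytable.{col})
                WHERE bucket > (SELECT MIN(m.bucket) FROM mytable AS m
                                WHERE m.{col} = mytable.{col})
                """
            )
            # rowcount is the number of rows this UPDATE modified.
            if cur.rowcount > 0:
                changed = True

    result = [b for (b,) in conn.execute("SELECT bucket FROM mytable ORDER BY id")]
    conn.close()
    return result


rows = [(1, 34, 56), (54, 43, 45), (1, 12, 22), (2, 34, 11)]
print(bucket_by_iteration(rows))  # [1, 2, 1, 1]
```

Long chains of rows need more passes, so on a big table this may be slow without indexes on X, Y, and Z. The UPDATE statement itself is plain SQL and should port to other databases with little change, though I have only reasoned about it against SQLite.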

Are you working for LinkedIn or Facebook? :)

jdv