This is a problem I've just run into, or rather it's a simplification that captures the core problem.
Imagine I have a spreadsheet containing a number of columns, each of them labeled, and a number of rows.
I want to determine when the value in one column can be inferred from the value in another. For example, we might find that every time a '1' appears in column a, a '5' always appears in column d, but whenever a '2' appears in column a, a '3' always appears in column d. We observe that the value in column a reliably predicts the value in column d.
The goal is to identify all such relationships between columns.
The naive solution is to start with a list of all pairs of columns: (a, b), (a, c), (a, d)... (b, c), (b, d)... and so on, in both orders, since a predicting b does not mean that b predicts a. We call this the "eligible" list.
For each of these pairs, we keep track of each value seen in the first column of the pair and the corresponding value in the second. If we see the same value again in the first column but a different value in the second, then the pair is no longer eligible.
Whatever is left at the end of this process is the set of valid relationships.
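To make the naive approach concrete, here is a minimal Python sketch of it. The function name `find_dependencies` and the representation of each row as a dict keyed by column label are assumptions made for illustration, not part of the original problem. It treats the pairs as ordered, so (a, d) and (d, a) are checked separately.

```python
from itertools import permutations


def find_dependencies(columns, rows):
    """Return the set of ordered pairs (x, y) where x's value determines y's.

    `columns` is a list of column labels; `rows` is an iterable of dicts
    mapping each label to that row's value (an assumed representation).
    """
    # Start with every ordered pair of distinct columns as "eligible".
    eligible = set(permutations(columns, 2))
    # For each eligible pair, remember which second-column value
    # was seen alongside each first-column value.
    seen = {pair: {} for pair in eligible}

    for row in rows:
        # Iterate over a copy, because pairs may be discarded as we go.
        for pair in list(eligible):
            first, second = pair
            previous = seen[pair].setdefault(row[first], row[second])
            if previous != row[second]:
                # Same value in the first column, different value in the
                # second: the pair is no longer eligible.
                eligible.discard(pair)
                del seen[pair]

    return eligible


rows = [
    {"a": 1, "b": 7, "d": 5},
    {"a": 2, "b": 7, "d": 3},
    {"a": 1, "b": 8, "d": 5},
]
result = find_dependencies(["a", "b", "d"], rows)
```

With this sample data, only (a, d) and (d, a) survive. The `seen` dictionary is the part that grows with the number of column pairs.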
Unfortunately, this rapidly becomes impractical as the number of columns increases, since the amount of data we must store is on the order of the square of the number of columns.
Can anyone think of an efficient way to do this?