views:

118

answers:

2

I have a problem where I have to find multiple combinations of subsets within nested hashsets. Basically I have a "master" nested HashSet, and from a collection of "possible" nested HashSets I have to programmatically find the "possibles" that could be simultaneous subsets of the "master".

Lets say I have the following:

            var master = new HashSet<HashSet<string>>(new HashSet<string>[] {
                        new HashSet<string>( new string[] { "A", "B", "C"}),
                        new HashSet<string>( new string[] { "D", "E"}),
                        new HashSet<string>( new string[] { "F"})
                    }
                ); 
            var possible1  = new HashSet<HashSet<string>>(new HashSet<string>[] {
                        new HashSet<string>( new string[] { "A", "B", "C"}),
                        new HashSet<string>( new string[] { "F"})
                    }
                 );
            var possible2 = new HashSet<HashSet<string>>(new HashSet<string>[] {
                        new HashSet<string>( new string[] { "D", "E"})
                    }
                 ); 
            var possible3 = new HashSet<HashSet<string>>(new HashSet<string>[] {
                        new HashSet<string>( new string[] { "F"})
                    }
                 ); 
            var possible4 = new HashSet<HashSet<string>>(new HashSet<string>[] {
                        new HashSet<string>( new string[] { "X", "Y", "Z"})
                    }
                ); 
            var possible5 = new HashSet<HashSet<string>>(new HashSet<string>[] {
                        new HashSet<string>( new string[] { "A", "B" }),
                        new HashSet<string>( new string[] { "D", "E"})
                    }
                ); 

The output I should get from my algorithm should be as follows:

All possible combination subsets:

possible1 and possible2
possible3 and possible5
possible2 and possible3
possible1
possible2
possible3
possible5

I'm trying to figure out the best way to approach this. There is, of course, the brute force option, but I'm trying to avoid that if I can.

I just hope my question was clear enough.

EDIT

To further elaborate on what constitutes a subset, here are some examples, given the master {{"A","B","C"},{"C","D","E",F"},{"X","Y","Z"}} :

  • {{"A","B"}{"C","D"}} would be a subset of
  • {{"A","B","C"},{"X","Y"}} would be a subset
  • {{"A","B"},{"A","B"}} would NOT be a subset
  • {{"A","B","C","D"}} would NOT be a subset
  • {{"A","B","C"},{"C","D","X"}} would NOT be a subset

Basically each child set needs to be a subset of a corresponding child in the master.

A: 

Use the simplest solution possible.

Keep in mind that if someone else has to look at your code they should be able to understand what it's doing with as little effort as possible. I already found it hard to understand from your description what you want to do and I haven't had to read code yet.

If you find that it's too slow after it's working optimize it then.

If possible write unit tests. Unit tests will ensure that your optimized solution is also working correctly and will help others ensure their changes don't break anything.

Jay
+1  A: 

Use bruteforce:

public static int IsCsInMaster(HashSet<string> childSubset, List<HashSet<string>> master, int startIndex)
{
    for (int i = startIndex; i < master.Count; i++)
        if (childSubset.IsSubsetOf(master[i])) return i;

    return -1;
}

public static bool IsChildInMaster(List<HashSet<string>> child, List<HashSet<string>> master)
{
    foreach (var childSubset in child) if (IsCsInMaster(childSubset, master, 0) == -1) return false;

    return true;
}

public static bool IsChildInMasterMulti(List<HashSet<string>> child, List<HashSet<string>> master)
{
    Dictionary<int, int> subsetChecker = new Dictionary<int, int>();
    List<IEnumerable<int>> multiMatches = new List<IEnumerable<int>>();
    int subsetIndex;

    // Check for matching subsets.
    for (int i = 0; i < child.Count; i++)
    {
        subsetIndex = 0;
        List<int> indexes = new List<int>();

        while ((subsetIndex = IsCsInMaster(child[i], master, subsetIndex)) != -1)
        {
            indexes.Add(subsetIndex++);
        }

        if (indexes.Count == 1)
        {
            subsetIndex = indexes[0];
            if (subsetChecker.ContainsKey(subsetIndex)) return false;
            else subsetChecker[subsetIndex] = subsetIndex;
        }
        else
        {
            multiMatches.Add(indexes);
        }
    }

    /*** Check for multi-matching subsets. ***/ //got lazy ;)
    var union = multiMatches.Aggregate((aggr, indexes) => aggr.Union(indexes));

    // Filter the union so only unmatched subset indexes remain.
    List<int> filteredUion = new List<int>();
    foreach (int index in union)
    {
        if (!subsetChecker.ContainsKey(index)) filteredUion.Add(index);
    }

    return (filteredUion.Count >= multiMatches.Count);
}

And in code:

IsChildInMasterMulti(possible2, master)

The code does not handle the {{"A","B"},{"A","B"}} case, though. That is a LOT more difficult (flagging used subsets in master, maybe even individual elements - recursively).

Edit2: The third method handles the {{"A","B"},{"A","B"}} case as well (and more).

Jaroslav Jandek
yeah, unfortunately, the {{"A","B"},{"A","B"}} case is a requirement for me.
blesh
If you match `X:{A, B}` set with the `Y:{A, B, C}` set, should the algorithm flag the `Y` set as used (`Y:{}`) or a `Z:{C}` set can still match the `Y:{C}` set?Or alternative question: would `{{"A","B"},{"A","B"}}` match `{A,A,B,B}` (same question asken in two ways)?
Jaroslav Jandek
No, a set that is matched is all done. Its contents don't spill over into something for the next match. In the case of `{{"A","B"},{"A","B"}}` vs `{{"A","A","B","B"}}`, only one `{"A","B"}` child set would match up, leaving another `{"A","B"}` child remaining to be matched.
blesh
Well, all edited, now you only need to do the 2 simplest things, combine sets and use the methods check if they are subsets.
Jaroslav Jandek
In truth, your solution is very similar to the solution I'm already using. The main difference is I created a NestedHashSet<T> class that has IsSupersetOf and IsSubsetOf methods, set to return true or false at the earliest possible moment. It also has a UnionWith() method to allow joining of multiple NestedHashSets into one to be checked against the master.
blesh
It is the same algorithm then just wrapped in a class. Considering your algorithm is not a normal subset, it is the best approach, since you have to use a flagging algorithm. It is more like a matching algorithm, which are expensive. You can't design much more effective algorithm than the one provided (except the obvious minor stuff - like an `if` to check whether `indexes.Add` is necessary, etc.). You could compare performance between the algorithms using the `StopWatch` class. Btw. you should use `IsChildInMaster` first on all possibles and only then use `IsChildInMasterMulti`.
Jaroslav Jandek
`IsCsInMaster` is a `~O(m*n)` operation. `IsChildInMaster` is a `~O(m*n)` operation at best and `O(c*m*n)` operation at worst. `IsChildInMasterMulti` is a `~O(c*m*n+c)` + the aggregate part which could be `O(1)` to `O(c*m*n*k^2)` where k is the average possible.Count.
Jaroslav Jandek