I have some configuration data that I'd like to model in code like so:

Key1,  Key2,  Key3,  Value
null,  null,  null,  1
1,     null,  null,  2
9,     null,  null,  21
1,     null,  3,     3
null,  2,     3,     4
1,     2,     3,     5

With this configuration set, I then need to do lookups on a bazillion (give or take) {Key1, Key2, Key3} tuples to get the "effective" value. The effective value is based on a Key/Priority sum, where in this example:

Key1 - Priority 10
Key2 - Priority 7
Key3 - Priority 5

So a specific query which matches a config entry on Key1=null, Key2=match, and Key3=match beats out one that matches on Key1=match, Key2=null, and Key3=null, since the Key2+Key3 priority > Key1 priority... Does that make sense?!

given a key of {1, 2, 3} the value should be 5.
given a key of {3, 2, 3} the value should be 4.
given a key of {8, 10, 11} the value should be 1.
given a key of {1, 10, 11} the value should be 2.
given a key of {9, 2, 3} the value should be 4.
given a key of {8, 2, 3} the value should be 4.
given a key of {9, 3, 3} the value should be 21.
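
To spell out the scoring on a couple of those: for {9, 2, 3}, the entry {9, null, null} matches with priority 10, but {null, 2, 3} matches with priority 7 + 5 = 12, so the higher score wins and the value is 4. For {9, 3, 3}, only {null, null, null} (score 0) and {9, null, null} (score 10) match, so the value is 21.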

Is there an easy way to model this data structure and lookup algorithm that is generic, in that the number and types of keys are variable and the "truth table" (the ordering of the lookups) can be defined dynamically? The types being generic instead of ints would be perfect (floats, doubles, ushorts, etc.), and making it easy to expand to n keys is important too!

Estimated "config" table size: 1,000 rows, Estimated "data" being looked up: 1e14

That gives an idea about the type of performance that would be expected.

I'm looking for ideas in C# or something that can translate to C# easily.

+2  A: 

EDIT: This code is apparently not what's required, but I'm leaving it as it's interesting anyway. It basically treats Key1 as taking priority, then Key2, then Key3, etc. I don't really understand the intended priority system yet, but when I do I'll add an answer for that.

I would suggest a triple layer of Dictionaries - each layer has:

Dictionary<int, NextLevel> matches;
NextLevel nonMatch;

So at the first level you'd look up Key1 - if that matches, that gives you the next level of lookup. Otherwise, use the next level which corresponds with "non-match".

Does that make any sense? Here's some sample code (including the example you gave). I'm not entirely happy with the actual implementation, but the idea behind the data structure is sound, I think:

using System;
using System.Collections;
using System.Collections.Generic;

public class Test
{
    static void Main()
    {
        Config config = new Config
        {
            { null,  null,  null,  1 },
            { 1,     null,  null,  2 },
            { 1,     null,  3,     3 },
            { null,  2,     3,     4 },
            { 1,     2,     3,     5 }
        };

        Console.WriteLine(config[1, 2, 3]);
        Console.WriteLine(config[3, 2, 3]);
        Console.WriteLine(config[9, 10, 11]);
        Console.WriteLine(config[1, 10, 11]);
    }
}

// Only implement IEnumerable to allow the collection initializer
// Not really implemented yet - consider how you might want to implement :)
public class Config : IEnumerable
{
    // Aargh - death by generics :)
    private readonly DefaultingMap<int, 
                         DefaultingMap<int, DefaultingMap<int, int>>> map
        = new DefaultingMap<int, DefaultingMap<int, DefaultingMap<int, int>>>();

    public int this[int key1, int key2, int key3]
    {
        get
        {
            return map[key1][key2][key3];
        }
    }

    public void Add(int? key1, int? key2, int? key3, int value)
    {
        map.GetOrAddNew(key1).GetOrAddNew(key2)[key3] = value;
    }

    public IEnumerator GetEnumerator()
    {
        throw new NotSupportedException();
    }
}

internal class DefaultingMap<TKey, TValue>
    where TKey : struct 
    where TValue : new()
{
    private readonly Dictionary<TKey, TValue> mapped = new Dictionary<TKey, TValue>();
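    // The "wildcard" entry: used for config rows with null in this key component,
    // and as the fallback when a lookup doesn't match any specific key.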
    private TValue unmapped = new TValue();

    public TValue GetOrAddNew(TKey? key)
    {
        if (key == null)
        {
            return unmapped;
        }
        TValue ret;
        if (mapped.TryGetValue(key.Value, out ret))
        {
            return ret;
        }
        ret = new TValue();
        mapped[key.Value] = ret;
        return ret;
    }

    public TValue this[TKey key]
    {
        get
        {
            TValue ret;
            if (mapped.TryGetValue(key, out ret))
            {
                return ret;
            }
            return unmapped;
        }
    }

    public TValue this[TKey? key]
    {
        set
        {
            if (key != null)
            {
                mapped[key.Value] = value;
            }
            else
            {
                unmapped = value;
            }
        }
    }
}
Jon Skeet
If I'm understanding correctly, make sure that you use some sort of 'null object' for the key value, or int?. I'm not sure you can use a dictionary where the key value can be a 'real' null, which seems to be called for. i.e., if I'm reading right, then {9,2,3} should have a '4'.
GWLlosa
Yeah, I think this falls short due to the non-hierarchical nature of the priorities - but perhaps more code could help?
TheSoftwareJedi
Yes it does. Did you try it? I did: Console.WriteLine(config[9, 2, 3]); // Prints 4
Jon Skeet
@Jon Sorry, I edited the question to show how Key1 null, Key2/Key3 non-null overrides Key1 non-null, Key2/Key3 null.
TheSoftwareJedi
@Jon So 9, 2, 3 in this solution matches on the 9 in Key1, but could also match on the 2, 3 - which is considered "more specific" based on the truth table I'm providing.
TheSoftwareJedi
Jon you are awesome. I tried doing this, and I failed. I like this solution better than mine. +1.
David Basarab
@TheSoftwareJedi: Ah. Okay. In that case this approach certainly won't work. I'm not sure I understand your priority model then. Are you saying that Key3 takes priority? Or is it based on how many matches there are? If it's just a case of Key3 taking priority over Key2 etc, then it's easy.
Jon Skeet
Key/Priority pairs that would work for this: Key1=10, Key2=7, Key3=5. To determine the most specific, you add up the priorities of the non-null columns. This is how I did it in Oracle.
TheSoftwareJedi
Okay, that's easy enough to do. I'll do that in one of the other answers. ("Reverse order of the matrix created" is a bit ambiguous to me - I still don't really understand what that means.)
Jon Skeet
Cool - thanks Jon! Just ignore the "reverse order" thing - I'll edit the question as the Key priority (assuming no combinations of adding are equal) works fine.
TheSoftwareJedi
@Jon question edited.
TheSoftwareJedi
I've just finished implementing the version with the priorities (10, 7, 5).
Jon Skeet
A: 

I've done something similar with a tree. My solution does assume priority on the concrete key segments; i.e., (9,2,3) matches (9,x,x) instead of (x,2,3).

My implementation, given the newer data, returns 21 when (9,2,3) is looked up because it matches the (9,any,any) case first. Without the 'c' entry in my code, (9,2,3) returns 4 because (-1, 2, 3) is the best match.

Implementation:

class Program
{
    static void Main(string[] args)
    {
        DecisionTree<Leaf, int> tree = new DecisionTree<Leaf, int>();

        Leaf a, b, c, d, e, f;

        //-1 == default/any/wildcard

        a = new Leaf(new int[3] { -1, -1, -1 }, 1);
        b = new Leaf(new int[3] { 1, -1, -1 }, 2);
        c = new Leaf(new int[3] { 9, -1, -1 }, 21);
        d = new Leaf(new int[3] { 1, -1, 3 }, 3);
        e = new Leaf(new int[3] { -1, 2, 3 }, 4);
        f = new Leaf(new int[3] { 1, 2, 3 }, 5);

        tree.AddProperty(a);
        tree.AddProperty(b);
        tree.AddProperty(c);
        tree.AddProperty(d);
        tree.AddProperty(e);
        tree.AddProperty(f);

        System.Diagnostics.Debug.WriteLine(tree.GetProperty(new int[3] { 9, 2, 3 }));
    }
}

public class Leaf : IDecisionLeaf<int>
{
    public Leaf(int[] key, int res)
    {
        this.key = key;
        this.result = res;
    }

    int result;

    public int Result
    {
        get { return result; }
        set { result = value; }
    }

    public override string ToString()
    {
        return "Leaf = " + result.ToString();
    }

    #region IDecisionLeaf<int> Members

    int[] key;

    public int[] Key
    {
        get { return key; }
    }

    #endregion
}

Base code:

using System;
using System.Collections.Generic;

public class DecisionTree<T, K> where T : IDecisionLeaf<K> where K : IComparable
{
    Node tree = new BranchNode();
    K defaultValue;

    public void AddProperty(T t)
    {            
        tree.AddChild(t, t.Key);
    }

    public T GetProperty(params K[] ids)
    {
        List<K> key = new List<K>(ids.Length);

        foreach (K id in ids)
        {
            key.Add(id);
        }

        return tree.GetChild(key.ToArray());
    }

    public K DefaultValue
    {
        get { return defaultValue; }
        set { defaultValue = value; }
    }

    #region Node Classes

    abstract class Node
    {
        K key;

        public K Key
        {
            get { return key; }
            set { key = value; }
        }

        public virtual T GetChild(K[] keys)
        {
            throw new InvalidOperationException("GetChild needs to be overridden.");
        }

        public virtual void AddChild(T property, K[] keys)
        {
            throw new InvalidOperationException("AddChild needs to be overridden.");
        }

        public K DefaultKey
        {
            get
            {
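                // Pick the wildcard key for this key type: -1 for ints, "" for strings,
                // or whatever an IDecisionKey implementation supplies as its DefaultKey.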
                IComparable def = null;

                if (key is int)
                {
                    def = -1;
                }
                else if (key is string)
                {
                    def = "";
                }

                if (def == null)
                {
                    IDecisionKey k = key as IDecisionKey;

                    if (k != null)
                    {
                        def = k.DefaultKey;
                    }
                }

                return (K)def;
            }
        }
    }

    class LeafNode : Node
    {
        T property;

        public T Property
        {
            get { return property; }
            set { property = value; }
        }

        public override T GetChild(K[] keys)
        {
            return property;
        }

        public override void AddChild(T property, K[] keys)
        {
            this.Property = property;
        }
    }

    class BranchNode : Node
    {
        Dictionary<K, Node> nodes = new Dictionary<K, Node>();

        public override T GetChild(K[] keys)
        {
            T child = default(T); 

            if (keys.Length == 0)
                return default(T);                

            List<K> keyList = new List<K>(keys);
            K key = keyList[0];

            keyList.RemoveAt(0);

            if (nodes.ContainsKey(key))
            {
                child = nodes[key].GetChild(keyList.ToArray());
            }
            else if (nodes.ContainsKey(this.DefaultKey))
            {
                child = nodes[this.DefaultKey].GetChild(keyList.ToArray());
            }

            /*
             * Though this seems to be a redundant check of the "else if" above, it
             * really isn't due to recursion.  This will be hit when the if's above 
             * are both false or when either are true and the call to GetChild returns
             * null.
             * 
             * The idea here is that the recursive call resulted in no children, so 
             * we will use this node's default case instead.
             */
            if (child == null && nodes.ContainsKey(this.DefaultKey))
            {
                child = nodes[this.DefaultKey].GetChild(keyList.ToArray());
            }

            return child;
        }

        public override void AddChild(T property, K[] keys)
        {
            Node n;
            List<K> k = new List<K>(keys);
            K firstKey = k[0];
            k.RemoveAt(0);

            if (k.Count == 0)
            {
                n = new LeafNode();
            }
            else
            {
                if (nodes.ContainsKey(firstKey))
                {
                    n = nodes[firstKey];
                }
                else
                {
                    n = new BranchNode();
                }
            }

            n.AddChild(property, k.ToArray());

            if (!nodes.ContainsKey(firstKey))
            {
                nodes.Add(firstKey, n);
            }
        }
    }

    #endregion
}

public interface IDecisionLeaf<T> where T: IComparable
{
    T[] Key { get;}
}

public interface IDecisionKey
{
    IComparable DefaultKey { get;}
}
Austin Salonen
Yeah, but like Jon's answer, that's ordered priority based on Key1 always winning. In this case, Key2/Key3 specific beat out Key1 specific. Your tree wouldn't find that case for 9, 2, 3 above.
TheSoftwareJedi
We did see issues with the key priority and this implementation that we solved by adding more data; i.e., we would add a (9,2,3)=4 node to cover that case.
Austin Salonen
+2  A: 

To answer your question about something which is generic in the number and type of keys - you can't make the number and type of keys dynamic and use generics - generics are all about providing compile-time information. Of course, you could ignore static typing and make it dynamic - let me know if you want me to implement that instead.

How many entries will there be, and how often do you need to look them up? You may well be best just keeping all the entries as a list and iterating through them giving a certain "score" to each match (and keeping the best match and its score as you go). Here's an implementation, including your test data - but this uses the keys having priorities (and then summing the matches), as per a previous comment...

using System;
using System.Collections;
using System.Collections.Generic;

public class Test
{
    static void Main()
    {
        Config config = new Config(10, 7, 5)
        {
            { new int?[]{null,  null,  null},  1},
            { new int?[]{1,     null,  null},  2},
            { new int?[]{9,     null,  null},  21},
            { new int?[]{1,     null,  3},     3 },
            { new int?[]{null,  2,     3},     4 },
            { new int?[]{1,     2,     3},     5 }
        };

        Console.WriteLine(config[1, 2, 3]);
        Console.WriteLine(config[3, 2, 3]);
        Console.WriteLine(config[8, 10, 11]);
        Console.WriteLine(config[1, 10, 11]);
        Console.WriteLine(config[9, 2, 3]);
        Console.WriteLine(config[9, 3, 3]);
    }
}

public class Config : IEnumerable
{
    private readonly int[] priorities;
    private readonly List<KeyValuePair<int?[],int>> entries = 
        new List<KeyValuePair<int?[], int>>();

    public Config(params int[] priorities)
    {
        // In production code, copy the array to prevent tampering
        this.priorities = priorities;
    }

    public int this[params int[] keys]
    {
        get
        {
            if (keys.Length != priorities.Length)
            {
                throw new ArgumentException("Invalid entry - wrong number of keys");
            }
            int bestValue = 0;
            int bestScore = -1;
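            // Score each entry against the lookup key: a null component is a wildcard,
            // a matching component adds its priority, and any mismatch disqualifies
            // the entry (score = -1).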
            foreach (KeyValuePair<int?[], int> pair in entries)
            {
                int?[] key = pair.Key;
                int score = 0;
                for (int i=0; i < priorities.Length; i++)
                {
                    if (key[i]==null)
                    {
                        continue;
                    }
                    if (key[i].Value == keys[i])
                    {
                        score += priorities[i];
                    }
                    else
                    {
                        score = -1;
                        break;
                    }
                }
                if (score > bestScore)
                {
                    bestScore = score;
                    bestValue = pair.Value;
                }
            }
            return bestValue;
        }
    }

    public void Add(int?[] keys, int value)
    {
        if (keys.Length != priorities.Length)
        {
            throw new ArgumentException("Invalid entry - wrong number of keys");
        }
        // Again, copy the array in production code
        entries.Add(new KeyValuePair<int?[],int>(keys, value));
    }

    public IEnumerator GetEnumerator()
    {
        throw new NotSupportedException();
    }
}

The above allows a variable number of keys, but only ints (or null). To be honest the API is easier to use if you fix the number of keys...
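
For instance, a rough sketch of a fixed-at-3-keys wrapper (Config3 is just an illustrative name) that forwards to the variable-length Config above might look like this:

// Illustrative sketch only: a strongly-typed 3-key facade over the Config class above.
public sealed class Config3
{
    private readonly Config inner;

    public Config3(int priority1, int priority2, int priority3)
    {
        inner = new Config(priority1, priority2, priority3);
    }

    public void Add(int? key1, int? key2, int? key3, int value)
    {
        inner.Add(new[] { key1, key2, key3 }, value);
    }

    public int this[int key1, int key2, int key3]
    {
        get { return inner[key1, key2, key3]; }
    }
}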

Jon Skeet
We can fix the number of keys, and I can reimplement each time. Usually I have between 3 and 6 keys. I implemented this same sort of thing with the scoring system, but it checks every config entry for every lookup, so the performance is O(n), which really blows when the configuration table gets big.
TheSoftwareJedi
Do you look up the configuration particularly often? And with different keys? If you look it up frequently, but with a small range of key combinations, you could always cache the results of previous searches in a Dictionary.
Jon Skeet
I can do you a fixed-at-3-keys caching implementation in about an hour, if you're interested. (It's dinner-and-putting-kids-to-bed time right now.)
Jon Skeet
(I've also just thought of an alternative way of doing it using dictionaries which could be interesting... but tricky.)
Jon Skeet
@Jon - straight caching won't work. I'm talking bazillions of values. Imagine 4 keys with 10,000 unique values in each (1e16 rows). Duplicate lookups are quite rare anyhow, so that wouldn't help.
TheSoftwareJedi
Okay, I'll have a think. It would have been nice to know all this information to start with :) (You'll have a hard time fitting that many keys into memory, mind you.) I have to say that it doesn't sound much like what I understand by "configuration values".
Jon Skeet
That is, 1e16 rows that I need to look up - but rarely over a couple hundred configuration defaulting rows.
TheSoftwareJedi
I'm thinking a dictionary of distinct key values for each key, then upon a lookup determining the most specific combination by looking up each key in its respective distinct list, computing the best possible match, doing some lookups.... something like that...
TheSoftwareJedi
Can be made constant by splitting every column combination into its own dictionary, then looking up the key in each. In this example it would be O(6). At least that scales linearly. Combined with something like memcached it could probably be pretty fast...
TheSoftwareJedi
If you only have 100 actual entries, then O(n) is actually pretty fast - especially because the constant and the linear factors are very small. I do have another idea though...
Jon Skeet
But by testing the 6 combinations of possible matches in 6 distinct dictionaries, each storing one distinct priority pattern from the config, we can make it worst case O(6), best case O(1). Note, I already implemented the O(n) solution - it's not nearly fast enough.
TheSoftwareJedi
Yes - I'm about to add an answer with a multiple-dictionary solution too. Btw, the list solution can be optimised somewhat by ordering the entries in descending score order - then you just need to look through them, and the first one that matches is what you return.
Jon Skeet
+1  A: 

Yet another solution - imagine that the entries are grouped by their bit pattern of null/non-null components. You have one dictionary per bit pattern (i.e. { 1, null, null } and { 9, null, null } go in the same dictionary, but { 1, 2, 3 } goes in a different one). Each dictionary effectively has a score as well - the sum of the priorities for the non-null parts of the key. You will end up with 2^n dictionaries, where n is the number of elements in the key.

You order the dictionaries in reverse score order, and then just look up the given key in each of them. Each dictionary needs to ignore the values in the key which aren't in its bit pattern, which is done easily enough with a custom IEqualityComparer<TKey[]>.

Okay, here's the implementation:

------------ Test.cs -----------------
using System;

sealed class Test
{
    static void Main()
    {
        Config config = new Config(10, 7, 5)
        {
            {null,  null,  null,  1},
            {1,     null,  null,  2},
            {9,     null,  null,  21},
            {1,     null,  3,     3 },
            {null,  2,     3,     4 },
            {1,     2,     3,     5 }
        };

        Console.WriteLine(config[1, 2, 3]);
        Console.WriteLine(config[3, 2, 3]);
        Console.WriteLine(config[8, 10, 11]);
        Console.WriteLine(config[1, 10, 11]);
        Console.WriteLine(config[9, 2, 3]);
        Console.WriteLine(config[9, 3, 3]);
    }
}

--------------- Config.cs -------------------
using System;
using System.Collections;

sealed class Config : IEnumerable
{
    private readonly PartialMatchDictionary<int, int> dictionary;

    public Config(int priority1, int priority2, int priority3)
    {
        dictionary = new PartialMatchDictionary<int, int>(priority1, priority2, priority3);
    }

    public void Add(int? key1, int? key2, int? key3, int value)
    {
        dictionary[new[] { key1, key2, key3 }] = value;
    }

    public int this[int key1, int key2, int key3]
    {
        get
        {
            return dictionary[new[] { key1, key2, key3 }];
        }
    }

    // Just a fake implementation to allow the collection initializer
    public IEnumerator GetEnumerator()
    {
        throw new NotSupportedException();
    }
}

-------------- PartialMatchDictionary.cs -------------------
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class PartialMatchDictionary<TKey, TValue> where TKey : struct
{
    private readonly List<Dictionary<TKey[], TValue>> dictionaries;
    private readonly int keyComponentCount;

    public PartialMatchDictionary(params int[] priorities)
    {
        keyComponentCount = priorities.Length;
        dictionaries = new List<Dictionary<TKey[], TValue>>(1 << keyComponentCount);
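        // Build one dictionary per null/non-null bit pattern of the key components;
        // bit i of the pattern indicates whether component i participates in matching.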
        for (int i = 0; i < 1 << keyComponentCount; i++)
        {
            PartialComparer comparer = new PartialComparer(keyComponentCount, i);
            dictionaries.Add(new Dictionary<TKey[], TValue>(comparer));
        }
        dictionaries = dictionaries.OrderByDescending(dict => ((PartialComparer)dict.Comparer).Score(priorities))
                                   .ToList();
    }

    public TValue this[TKey[] key]
    {
        get
        {
            if (key.Length != keyComponentCount)
            {
                throw new ArgumentException("Invalid key component count");
            }
            foreach (Dictionary<TKey[], TValue> dictionary in dictionaries)
            {
                TValue value;
                if (dictionary.TryGetValue(key, out value))
                {
                    return value;
                }
            }
            throw new KeyNotFoundException("No match for this key");
        }
    }

    public TValue this[TKey?[] key]
    {
        set
        {
            if (key.Length != keyComponentCount)
            {
                throw new ArgumentException("Invalid key component count");
            }
            // This could be optimised (a dictionary of dictionaries), but there
            // won't be many additions to the dictionary compared with accesses
            foreach (Dictionary<TKey[], TValue> dictionary in dictionaries)
            {
                PartialComparer comparer = (PartialComparer)dictionary.Comparer;
                if (comparer.IsValidForPartialKey(key))
                {
                    TKey[] maskedKey = key.Select(x => x ?? default(TKey)).ToArray();
                    dictionary[maskedKey] = value;
                    return;
                }
            }
            throw new InvalidOperationException("We should never get here");
        }
    }

    private sealed class PartialComparer : IEqualityComparer<TKey[]>
    {
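        // Compares and hashes key arrays using only the components selected by this
        // dictionary's null/non-null bit pattern, ignoring all other components.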
        private readonly int keyComponentCount;
        private readonly bool[] usedKeyComponents;
        private static readonly EqualityComparer<TKey> Comparer = EqualityComparer<TKey>.Default;

        internal PartialComparer(int keyComponentCount, int usedComponentBits)
        {
            this.keyComponentCount = keyComponentCount;
            usedKeyComponents = new bool[keyComponentCount];
            for (int i = 0; i < keyComponentCount; i++)
            {
                usedKeyComponents[i] = ((usedComponentBits & (1 << i)) != 0);
            }
        }

        internal int Score(int[] priorities)
        {
            return priorities.Where((value, index) => usedKeyComponents[index]).Sum();
        }

        internal bool IsValidForPartialKey(TKey?[] key)
        {
            for (int i = 0; i < keyComponentCount; i++)
            {
                if ((key[i] != null) != usedKeyComponents[i])
                {
                    return false;
                }
            }
            return true;
        }

        public bool Equals(TKey[] x, TKey[] y)
        {
            for (int i = 0; i < keyComponentCount; i++)
            {
                if (!usedKeyComponents[i])
                {
                    continue;
                }
                if (!Comparer.Equals(x[i], y[i]))
                {
                    return false;
                }
            }
            return true;
        }

        public int GetHashCode(TKey[] obj)
        {
            int hash = 23;
            for (int i = 0; i < keyComponentCount; i++)
            {
                if (!usedKeyComponents[i])
                {
                    continue;
                }
                hash = hash * 37 + Comparer.GetHashCode(obj[i]);
            }
            return hash;
        }
    }
}

It gives the right results for the samples that you gave. I don't know what the performance is like - it should be O(1), but it could probably be optimised a bit further.

Jon Skeet
I doubt this is anywhere near O(1) (I think you intend O(nk)). In degenerate cases, it could be O(nkr), where n = values, k = number of keys, and r = number of rules. To see this, imagine that it matches against the last sorted rule (10, 10, 10). It would have to compare against every rule.
FryGuy
I was only taking the number of rules into account as a factor. For a given number of keys, there will be a fixed number of dictionaries. A lookup in a dictionary is O(1), so overall it should be O(1). It doesn't need to compare against each rule - just each potential combination of null/non-null.
Jon Skeet
Feel free to try it though - it's easy enough to generate 10, 100, 1000, 10000 random rules and benchmark each.
Jon Skeet
(If we were to consider keys as a variable (e.g. 3 in this example), it would be O(2^k), as that's the number of dictionaries we have to look through each time.)
Jon Skeet
Oh, I misunderstood how your solution worked. I thought it ... well I'm not sure what I thought. It's a clever solution, but O(n*2^k + r*2^k*lg(r)) might be beaten by the O(n*k + r*k^2) tree method I proposed below.
FryGuy
Where do you get the O(n*2^k) bit from? My O(1) was for *lookup* only, and it should be constant (in n) at that point.
Jon Skeet
Well, I'm giving the cost for n lookups. The question was for doing n lookups (on the order of millions) against r rules (on the order of thousands), with k keys (3-10).
FryGuy
No, it's not for n lookups - it's the cost of a lookup against the number of rules. That was the O(n) that TheSoftwareJedi was worried about in a previous solution. My solution is O(1) for a given number of keys. I haven't looked up what yours is yet.
Jon Skeet
Just looked a little bit at your code, which also appears to be O(1). So both should be okay from a lookup-time perspective. What's yours like in terms of space? I can't say I've paid enough attention to it to understand the exact details yet.
Jon Skeet
Just noticed another downvote for this. Anyone care to comment why, given that it: a) works, b) does it in constant time as requested? Sorry if I sound annoyed, but I've put a lot of time into this question... I'm not that bothered by rep in general, but this just feels wrong.
Jon Skeet
I think worst case memory for mine is O(r^k) but normally more along the lines of O(r*k), whereas yours is O(r + 2^k). I haven't downvoted any of yours besides the currently top answer that seems to be incorrect. I've spent a lot of time on mine too.. it's an interesting problem.
FryGuy
Yes, it's certainly been interesting - no regrets there :) Later on I must try to actually understand your solution. Last night I didn't have time to do anything more than sort out my own!
Jon Skeet
+1  A: 

I'm assuming that there are few rules, and a large number of items that you're going to check against the rules. In this case, it might be worth the expense of memory and up-front time to pre-compute a structure that would help you find the object faster.

The basic idea for this structure would be a tree such that at depth i, you would follow the ith element of the rule, or the null branch if it's not found in the dictionary.

I would build the tree recursively. Start with the root node containing all possible rules in its pool. The process:

  • Define the current value of each rule in the pool as the score of the current rule given the path taken to get to the node, or -infinity if it is impossible to take the path. For example, if the current node is at the '1' branch of the root, then the rule {null, null, null, 1} would have a score of 0, and the rule {1, null, null, 2} would have a score of 10
  • Define the maximal value of each rule in the pool as its current score, plus the remaining keys' scores. For example, if the current node is at the '1' branch of the root, then the rule {null, 1, 2, 1} would have a score of 12 (0 + 7 + 5), and the rule {1, null, null, 2} would have a score of 10 (10 + 0 + 0).
  • Remove the elements from the pool that have a maximal value lower than the highest current value in the pool
  • If there is only one rule, then make a leaf with the rule.
  • If there are multiple rules left in the pool, and there are no more keys then ??? (this isn't clear from the problem description. I'm assuming taking the highest one)
  • For each unique value of the (i+1)th key in the current pool, and null, construct a new tree from the current node using the current pool.

As a final optimization, I would check whether all children of a node are leaves, and if they all contain the same rule, make the node a leaf with that value.

given the following rules:

null,  null,  null = 1
1,     null,  null = 2
9,     null,  null = 21
1,     null,  3    = 3
null,  2,     3    = 4
1,     2,     3    = 5

an example tree:

       key1   key2   key3
root:
 |----- 1
 |      |----- 2          = 5
 |      |-----null
 |             |----- 3   = 3
 |             |-----null = 2
 |----- 9
 |      |----- 2
 |      |      |----- 3   = 4
 |      |      |-----null = 21
 |      |-----null        = 21
 |-----null
        |----- 2          = 4
        |-----null        = 1

If you build the tree up in this fashion, starting from the highest value key first, then you can possibly prune out a lot of checks against later keys.

Edit to add code:

using System;
using System.Collections;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        Config config = new Config(10, 7, 5)
        {
            { new int?[]{null,  null,  null},  1},
            { new int?[]{1,     null,  null},  2},
            { new int?[]{9,     null,  null},  21},
            { new int?[]{1,     null,  3},     3 },
            { new int?[]{null,  2,     3},     4 },
            { new int?[]{1,     2,     3},     5 }
        };

        Console.WriteLine("5 == {0}", config[1, 2, 3]);
        Console.WriteLine("4 == {0}", config[3, 2, 3]);
        Console.WriteLine("1 == {0}", config[8, 10, 11]);
        Console.WriteLine("2 == {0}", config[1, 10, 11]);
        Console.WriteLine("4 == {0}", config[9, 2, 3]);
        Console.WriteLine("21 == {0}", config[9, 3, 3]);            
        Console.ReadKey();
    }
}


public class Config : IEnumerable
{
    private readonly int[] priorities;
    private readonly List<KeyValuePair<int?[], int>> rules =
        new List<KeyValuePair<int?[], int>>();
    private DefaultMapNode rootNode = null;

    public Config(params int[] priorities)
    {
        // In production code, copy the array to prevent tampering
        this.priorities = priorities;
    }

    public int this[params int[] keys]
    {
        get
        {
            if (keys.Length != priorities.Length)
            {
                throw new ArgumentException("Invalid entry - wrong number of keys");
            }

            if (rootNode == null)
            {
                rootNode = BuildTree();
                //rootNode.PrintTree(0);
            }

            DefaultMapNode curNode = rootNode;
            for (int i = 0; i < keys.Length; i++)
            {
                // if we're at a leaf, then we're done
                if (curNode.value != null)
                    return (int)curNode.value;

                if (curNode.children.ContainsKey(keys[i]))
                    curNode = curNode.children[keys[i]];
                else
                    curNode = curNode.defaultChild;
            }

            return (int)curNode.value;
        }
    }

    private DefaultMapNode BuildTree()
    {
        return new DefaultMapNode(new int?[]{}, rules, priorities);
    }

    public void Add(int?[] keys, int value)
    {
        if (keys.Length != priorities.Length)
        {
            throw new ArgumentException("Invalid entry - wrong number of keys");
        }
        // Again, copy the array in production code
        rules.Add(new KeyValuePair<int?[], int>(keys, value));

        // reset the tree to know to regenerate it.
        rootNode = null;
    }

    public IEnumerator GetEnumerator()
    {
        throw new NotSupportedException();
    }

}


public class DefaultMapNode
{
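    // One node of the pre-computed decision tree: a leaf when 'value' is non-null,
    // otherwise a branch keyed on the next key component, with 'defaultChild'
    // covering the null/wildcard branch.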
    public Dictionary<int, DefaultMapNode> children = new Dictionary<int,DefaultMapNode>();
    public DefaultMapNode defaultChild = null; // done this way to workaround dict not handling null
    public int? value = null;

    public DefaultMapNode(IList<int?> usedValues, IEnumerable<KeyValuePair<int?[], int>> pool, int[] priorities)
    {
        int bestScore = Int32.MinValue;

        // get best current score
        foreach (KeyValuePair<int?[], int> rule in pool)
        {
            int currentScore = GetCurrentScore(usedValues, priorities, rule);
            bestScore = Math.Max(bestScore, currentScore);
        }

        // get pruned pool
        List<KeyValuePair<int?[], int>> prunedPool = new List<KeyValuePair<int?[], int>>();
        foreach (KeyValuePair<int?[], int> rule in pool)
        {
            int maxScore = GetCurrentScore(usedValues, priorities, rule);
            if (maxScore == Int32.MinValue)
                continue;

            for (int i = usedValues.Count; i < rule.Key.Length; i++)
                if (rule.Key[i] != null)
                    maxScore += priorities[i];

            if (maxScore >= bestScore)
                prunedPool.Add(rule);
        }

        // base optimization case, return leaf node
        // base case, always return same answer
        if ((prunedPool.Count == 1) || (usedValues.Count == prunedPool[0].Key.Length))
        {
            value = prunedPool[0].Value;
            return;
        }

        // add null base case
        AddChild(usedValues, priorities, prunedPool, null);
        foreach (KeyValuePair<int?[], int> rule in pool)
        {
            int? branch = rule.Key[usedValues.Count];
            if (branch != null && !children.ContainsKey((int)branch))
            {
                AddChild(usedValues, priorities, prunedPool, branch);
            }
        }


        // if all children are the same, then make a leaf
        int? maybeOnlyValue = null;
        foreach (int v in GetAllValues())
        {
            if (maybeOnlyValue != null && v != maybeOnlyValue)
                return;
            maybeOnlyValue = v;
        }
        if (maybeOnlyValue != null)
            value = maybeOnlyValue;

    }

    private static int GetCurrentScore(IList<int?> usedValues, int[] priorities, KeyValuePair<int?[], int> rule)
    {
        int currentScore = 0;
        for (int i = 0; i < usedValues.Count; i++)
        {
            if (rule.Key[i] != null)
            {
                if (rule.Key[i] == usedValues[i])
                    currentScore += priorities[i];
                else
                    return Int32.MinValue;
            }
        }
        return currentScore;
    }

    private void AddChild(IList<int?> usedValues, int[] priorities, List<KeyValuePair<int?[], int>> prunedPool, Nullable<int> nextValue)
    {
        List<int?> chainedValues = new List<int?>();
        chainedValues.AddRange(usedValues);
        chainedValues.Add(nextValue);            
        DefaultMapNode node = new DefaultMapNode(chainedValues, prunedPool, priorities);
        if (nextValue == null)
            defaultChild = node;
        else
            children[(int)nextValue] = node;
    }

    public IEnumerable<int> GetAllValues()
    {
        foreach (DefaultMapNode child in children.Values)
            foreach (int v in child.GetAllValues())
                yield return v;
        if (defaultChild != null)
            foreach (int v in defaultChild.GetAllValues())
                yield return v;
        if (value != null)
            yield return (int)value;
    }

    public void PrintTree(int depth)
    {
        if (value == null)
            Console.WriteLine();
        else
        {
            Console.WriteLine(" = {0}", (int)value);
            return;
        }

        foreach (KeyValuePair<int, DefaultMapNode> child in children)
        {
            for (int i=0; i<depth; i++)
                Console.Write("    ");
            Console.Write(" {0}  ", child.Key);                
            child.Value.PrintTree(depth + 1);
        }
        for (int i = 0; i < depth; i++)
            Console.Write("    ");
        Console.Write("null");
        defaultChild.PrintTree(depth + 1);
    }
}
FryGuy
I think this is a great solution. It's fairly complex though - I'll have to tinker with it later. Would love it to take TKey1, TKey2, TKey3, TValue instead to make it generic. Bonus points for that deliverable. :)
TheSoftwareJedi