views:

109

answers:

2

In comments to "How to implement List, Set, and Map in null free design?", Steven Sudit and I got into a discussion about using a callback, with handlers for "found" and "not found" situations, vs. a tryGet() method, taking an out parameter and returning a boolean indicating whether the out parameter had been populated. Steven maintained that the callback approach was more complex and almost certain to be slower; I maintained that the complexity was no greater and the performance at worst the same.

But code speaks louder than words, so I thought I'd implement both and see what I got. The original question was fairly theoretical with regard to language ("And for argument sake, let's say this language don't even have null") -- I've used Java here because that's what I've got handy. Java doesn't have out parameters, but it doesn't have first-class functions either, so style-wise, it should suck equally for both approaches.

(Digression: As far as complexity goes: I like the callback design because it inherently forces the user of the API to handle both cases, whereas the tryGet() design requires callers to perform their own boilerplate conditional check, which they could forget or get wrong. But having now implemented both, I can see why the tryGet() design looks simpler, at least in the short term.)

First, the callback example:

class CallbackMap<K, V> {
    private final Map<K, V> backingMap;

    public CallbackMap(Map<K, V> backingMap) {
        this.backingMap = backingMap;
    }

    void lookup(K key, Callback<K, V> handler) {
        V val = backingMap.get(key);
        if (val == null) {
            handler.handleMissing(key);
        } else {
            handler.handleFound(key, val);
        }
    }
}

interface Callback<K, V> {
    void handleFound(K key, V value);
    void handleMissing(K key);
}

class CallbackExample {
    private final Map<String, String> map;
    private final List<String> found;
    private final List<String> missing;
    private Callback<String, String> handler;

    public CallbackExample(Map<String, String> map) {
        this.map = map;
        found = new ArrayList<String>(map.size());
        missing = new ArrayList<String>(map.size());
        handler = new Callback<String, String>() {
            public void handleFound(String key, String value) {
                found.add(key + ": " + value);
            }

            public void handleMissing(String key) {
                missing.add(key);
            }
        };
    }

    void test() {
        CallbackMap<String, String> cbMap = new CallbackMap<String, String>(map);
        for (int i = 0, count = map.size(); i < count; i++) {
            String key = "key" + i;
            cbMap.lookup(key, handler);
        }
        System.out.println(found.size() + " found");
        System.out.println(missing.size() + " missing");
    }
}

Now, the tryGet() example -- as best I understand the pattern (and I might well be wrong):

class TryGetMap<K, V> {
    private final Map<K, V> backingMap;

    public TryGetMap(Map<K, V> backingMap) {
        this.backingMap = backingMap;
    }

    boolean tryGet(K key, OutParameter<V> valueParam) {
        V val = backingMap.get(key);
        if (val == null) {
            return false;
        }
        valueParam.value = val;
        return true;
    }
}

class OutParameter<V> {
    V value;
}

class TryGetExample {
    private final Map<String, String> map;
    private final List<String> found;
    private final List<String> missing;

    private final OutParameter<String> out = new OutParameter<String>();

    public TryGetExample(Map<String, String> map) {
        this.map = map;
        found = new ArrayList<String>(map.size());
        missing = new ArrayList<String>(map.size());
    }

    void test() {
        TryGetMap<String, String> tgMap = new TryGetMap<String, String>(map);

        for (int i = 0, count = map.size(); i < count; i++) {
            String key = "key" + i;
            if (tgMap.tryGet(key, out)) {
                found.add(key + ": " + out.value);
            } else {
                missing.add(key);
            }
        }
        System.out.println(found.size() + " found");
        System.out.println(missing.size() + " missing");
    }
}

And finally, the performance test code:

public static void main(String[] args) {
    int size = 200000;
    Map<String, String> map = new HashMap<String, String>();
    for (int i = 0; i < size; i++) {
        String val = (i % 5 == 0) ? null : "value" + i;
        map.put("key" + i, val);
    }

    long totalCallback = 0;
    long totalTryGet = 0;

    int iterations = 20;
    for (int i = 0; i < iterations; i++) {
        {
            TryGetExample tryGet = new TryGetExample(map);
            long tryGetStart = System.currentTimeMillis();
            tryGet.test();
            totalTryGet += (System.currentTimeMillis() - tryGetStart);
        }
        System.gc();

        {
            CallbackExample callback = new CallbackExample(map);
            long callbackStart = System.currentTimeMillis();
            callback.test();
            totalCallback += (System.currentTimeMillis() - callbackStart);
        }
        System.gc();
    }

    System.out.println("Avg. callback: " + (totalCallback / iterations));
    System.out.println("Avg. tryGet(): " + (totalTryGet / iterations));
}

On my first attempt, I got 50% worse performance for callback than for tryGet(), which really surprised me. But, on a hunch, I added some garbage collection, and the performance penalty vanished.

This fits with my instinct, which is that we're basically talking about taking the same number of method calls, conditional checks, etc. and rearranging them. But then, I wrote the code, so I might well have written a suboptimal or subconsicously penalized tryGet() implementation. Thoughts?


Updated: Per comment from Michael Aaron Safyan, fixed TryGetExample to reuse OutParameter.

+2  A: 

I would say that neither design makes sense in practice, regardless of the performance. I would argue that both mechanisms are overly complicated and, more importantly, don't take into account actual usage.

Actual Usage
If a user looks up a value in a map and it isn't there, most likely the user wants one of the following:

  • To insert some value with that key into the map
  • To get back some default value
  • To be informed that the value isn't there

Thus I would argue that a better, null-free API would be:

  • has(key) which indicates if the key is present (if one only wishes to check for the key's existence).
  • get(key) which reports the value if the key is present; otherwise, throws NoSuchElementException.
  • get(key,defaultval) which reports the value for the key, or defaultval if the key isn't present.
  • setdefault(key,defaultval) which inserts (key,defaultval) if key isn't present, and returns the value associated with key (which is defaultval if there is no previous mapping, otherwise prev mapping).

The only way to get back null is if you explicity ask for it as in get(key,null). This API is incredibly simple, and yet is able to handle the most common map-related tasks (in most use cases that I have encountered).

I should also add that in Java, has() would be called containsKey() while setdefault() would be called putIfAbsent(). Because get() signals an object's absence via a NoSuchElementException, it is then possible to associate a key with null and treat it as a legitimate association.... if get() returns null, it means the key has been associated with the value null, not that the key is absent (although you can define your API to disallow a value of null if you so choose, in which case you would throw an IllegalArgumentException from the functions that are used to add associations if the value given is null). Another advantage to this API, is that setdefault() only needs to perform the lookup procedure once instead of twice, which would be the case if you used if( ! dict.has(key) ){ dict.set(key,val); }. Another advantage is that you do not surprise developers who write something like dict.get(key).doSomething() who assume that get() will always return a non-null object (because they have never inserted a null value into the dictionary)... instead, they get a NoSuchElementException if there is no value for that key, which is more consistent with the rest of the error checking in Java and which is also a much easier to understand and debug than NullPointerException.

Answer To Question
To answer original question, yes, you are unfairly penalizing the tryGet version.... in your callback based mechanism you construct the callback object only once and use it in all subsequent calls; whereas in your tryGet example, you construct your out parameter object in every single iteration. Try taking the line:

OutParameter out = new OutParameter();

Take the line above out of the for-loop and see if that improves the performance of the tryGet example. In other words, place the line above the for-loop, and re-use the out parameter in each iteration.

Michael Aaron Safyan
In C#, we have `map.ContainsKey` to do what your `has` does and can emulate the defaulting `get` with `map[key] ?? defaultVal`, which collapses nulls. If we wanted to throw on failure, we could say `if (!map.TryGetValue(key,out val)) throw new ArgumentException("Key not found")` or something like that. I don't think there's any *efficient* way to do `setdefault`, though, in either language. In a C++ STL map, however, it's possible.
Steven Sudit
@Steven, the defaulting get() is easy to do, but it assumes that you are already aware that get() may return null... which I think is a bad design... if you try to get something that isn't there and you haven't specified what you want it to give you when it isn't there, then it should blow up in your face. As for setdefault(), the efficiency reason is a major motivation for providing such a function; it takes 2 lookups without it. Atomicity and thread-safety is another; you will find that Java's concurrent map implementation provides an equivalent putIfAbsent() function.
Michael Aaron Safyan
@Michael -- If that's the behavior you need, `get(k, default)` / `putIfAbsent(k, v)` is a sensible pattern -- I've used it myself. When you need behavior, and not just a value, I like the way the callback pattern forces you to handle both cases. We can argue till the cows come home about how often it's likely to occur in "actual usage". :)
David Moles
@Michael And good point about repeatedly declaring `OutParameter` -- thanks for catching that. Doesn't seem to make a difference to the performance numbers, though (I guess object creation really has gotten cheap).
David Moles
BTW, `ConcurrentMap` in Java has `putIfAbsent()`, but it has the annoying behavior of returning the previous value (which can be null), rather than the current value, which means if you don't read the documentation, you can get bitten trying to do something like `Value v = map.putIfAbsent(key, default)`.
David Moles
@David Moles, I already mentioned that Java's concurrent map has a putIfAbsent, but thank you for pointing out that it has different semantics than those I gave for setdefault(). I agree, it would be nice if it returned the value actually associated with the key.
Michael Aaron Safyan
@Michael: Like Dave, I agree that a defaulting get is useful, and have made good use of it. But I'm not sure I can agree with your statement that returning null is an undesirable behavior. Perhaps my assumptions are influenced by SQL, but it does make sense to me that requesting a value for a nonexistent key should return a nonexistent -- *null* -- value. I guess it depends on whether you think null is inherently bad or just a complication that is sometimes unwelcome; I lean towards the latter. In any case, this has certainly been an educational experience. Thank you.
Steven Sudit
+1  A: 

David, thanks for taking the time to write this up. I'm a C# programmer, so my Java skills are a bit vague these days. Because of this, I decided to port your code over and test it myself. I found some interesting differences and similarities, which are pretty much worth the price of admission as far as I'm concerned. Among the major differences are:

  1. I didn't have to implement TryGet because it's built into Dictionary.
  2. In order to use the native TryGet, instead of inserting nulls to simulate misses, I simply omitted those values. This still means that v = map[k] would have set v to null, so I think it's a proper porting. In hindsight, I could have inserted the nulls and changed (_map.TryGetValue(key, out value)) to (_map.TryGetValue(key, out value) && value != null)), but I'm glad I didn't.
  3. I want to be exceedingly fair. So, to keep the code as compact and maintainable as possible, I used lambda calculus notation, which let me define the callbacks painlessly. This hides much of the complexity of setting up anonymous delegates, and allows me to use closures seamlessly. Ironically, the implementation of Lookup uses TryGet internally.
  4. Instead of declaring a new type of Dictionary, I used an extension method to graft Lookup onto the standard dictionary, much simplifying the code.

With apologies for the less-than-professional quality of the code, here it is:

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication1
{
    static class CallbackDictionary
    {
        public static void Lookup<K, V>(this Dictionary<K, V> map, K key, Action<K, V> found, Action<K> missed)
        {
            V v;
            if (map.TryGetValue(key, out v))
                found(key, v);
            else
                missed(key);
        }
    }

    class TryGetExample
    {
        private Dictionary<string, string> _map;
        private List<string> _found;
        private List<string> _missing;

        public TryGetExample(Dictionary<string, string> map)
        {
            _map = map;
            _found = new List<string>(_map.Count);
            _missing = new List<string>(_map.Count);
        }

        public void TestTryGet()
        {
            for (int i = 0; i < _map.Count; i++)
            {
                string key = "key" + i;
                string value;
                if (_map.TryGetValue(key, out value))
                    _found.Add(key + ": " + value);
                else
                    _missing.Add(key);
            }

            Console.WriteLine(_found.Count() + " found");
            Console.WriteLine(_missing.Count() + " missing");
        }

        public void TestCallback()
        {
            for (int i = 0; i < _map.Count; i++)
                _map.Lookup("key" + i, (k, v) => _found.Add(k + ": " + v), k => _missing.Add(k));

            Console.WriteLine(_found.Count() + " found");
            Console.WriteLine(_missing.Count() + " missing");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            int size = 2000000;
            var map = new Dictionary<string, string>(size);
            for (int i = 0; i < size; i++)
                if (i % 5 != 0)
                    map.Add("key" + i, "value" + i);

            long totalCallback = 0;
            long totalTryGet = 0;

            int iterations = 20;
            TryGetExample tryGet;
            for (int i = 0; i < iterations; i++)
            {
                tryGet = new TryGetExample(map);
                long tryGetStart = DateTime.UtcNow.Ticks;
                tryGet.TestTryGet();
                totalTryGet += (DateTime.UtcNow.Ticks - tryGetStart);
                GC.Collect();

                tryGet = new TryGetExample(map);
                long callbackStart = DateTime.UtcNow.Ticks;
                tryGet.TestCallback();
                totalCallback += (DateTime.UtcNow.Ticks - callbackStart);
                GC.Collect();
            }

            Console.WriteLine("Avg. callback: " + (totalCallback / iterations));
            Console.WriteLine("Avg. tryGet(): " + (totalTryGet / iterations));
        }
    }
}

My performance expectations, as I said in the article that inspired this one, would be that neither one is much faster or slower than the other. After all, most of the work is in the searching and adding, not in the simple logic that structures it. In fact, it varied a bit among runs, but I was unable to detect any consistent advantage.

Part of the problem is that I used a low-precision timer and the test was short, so I increased the count by 10x to 2000000 and that helped. Now callbacks are about 3% slower, which I do not consider significant. On my fairly slow machine, callbacks took 17773437 while tryget took 17234375.

Now, as for code complexity, it's a bit unfair because TryGet is native, so let's just ignore the fact that I had to add a callback interface. At the calling spot, lambda notation did a great job of hiding the complexity. If anything, it's actually shorter than the if/then/else used in the TryGet version, although I suppose I could have used a ternary operator to make it equally compact.

On the whole, I found the C# to be more elegant, and only some of that is due to my bias as a C# programmer. Mainly, I didn't have to define and implement interfaces, which cut down on the plumbing overhead. I also used pretty standard .NET conventions, which seem to be a bit more streamlined than the sort of style favored in Java.

Steven Sudit
Thanks, Steven! Good point about there being no need to actually put nulls into the map -- I must have spaced out there. And yeah, I think C# definitely has better idioms for these things these days.
David Moles
Oh, one point -- I left the extra curly braces in the loop for a reason: to make sure the `Examples` drop out of scope and become available for garbage collection. Though I imagine the compiler might be smart enough to figure out that they're never accessed after the `test()` call anyway.
David Moles
@David: Ah, that makes sense. Rather than using scoping, I would probably null out these variables to explicitly show my intent to release them, while making them available for GC. The net effect would be the same, of course. My take on C# is that it's MS Java, but with some of the ugly parts sanded off. The first-class support for properties and delegates, along with the heaping tablespoon of syntactic sugar, makes it go down more easily than its Java equivalent.
Steven Sudit