tags:

views:

893

answers:

3

When using a Guid as an index for a Dictionary, is it better to use the Guid object, or the string representation of the Guid?

I just refactored some code which was using string to use the object, because there were new Guid() calls all over the place. But that left me wondering what the performance issues might be. (The collections are fairly small, but they get iterated lots of times.)

+16  A: 

The Guid should be quicker, as the comparison is simpler - just a few direct bytes. The string involves a dereference and lots more work.

Of course - you could profile ;-p

Evidence:

Searching for 7f9b349f-f36f-94de-ad96-04279ddf6ecf
As guid: 466; -1018643328
As string: 512; -1018643328
Searching for 870ba465-08f2-c872-cfc9-b3cc1ffa09de
As guid: 470; 1047183104
As string: 589; 1047183104
Searching for d2376f8a-b8c9-4633-ee8e-9679bb30f918
As guid: 423; 1841649088
As string: 493; 1841649088
Searching for 599889e8-d5fd-3618-4c4f-cb620e6f81bb
As guid: 488; -589561792
As string: 493; -589561792
Searching for fb64821e-c541-45f4-0fd6-1c772189dadf
As guid: 450; 1389733504
As string: 511; 1389733504
Searching for 798b9fe5-ba15-2753-357a-7637161ee48a
As guid: 415; 779298176
As string: 504; 779298176
Searching for 12ba292e-8e59-e5d0-7d04-e811a237dc21
As guid: 457; 558250944
As string: 564; 558250944
Searching for 05b3ce14-dfbf-4d3a-1503-ced515decb81
As guid: 413; 1658205056
As string: 504; 1658205056
Searching for 8db4a556-0a65-d8cb-4d0d-0104245d18b8
As guid: 415; 696231936
As string: 506; 696231936
Searching for c49cf80c-5537-fba5-eebd-8ad21bba09c4
As guid: 459; 2100976384
As string: 557; 2100976384

based on:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
static class Program
{

    static void Main()
    {
        Random rand = new Random(123456);
        int COUNT = 1000;
        Dictionary<Guid, int> guids = new Dictionary<Guid, int>(COUNT);
        Dictionary<string, int> strings = new Dictionary<string, int>(
            COUNT, StringComparer.Ordinal);

        byte[] buffer = new byte[16];
        for (int i = 0; i < COUNT; i++)
        {
            rand.NextBytes(buffer);
            Guid guid = new Guid(buffer);
            int val = rand.Next();
            guids.Add(guid, val);
            strings.Add(guid.ToString(), val);
        }

        for(int i = 0 ; i < 10 ; i++) {
            int index = rand.Next(COUNT);
            Guid guid = guids.Keys.Skip(index).First();
            Console.WriteLine("Searching for " + guid);
            int chk = 0;
            const int LOOP = 5000000;
            Stopwatch watch = Stopwatch.StartNew();
            for (int j = 0; j < LOOP; j++)
            {
                chk += guids[guid];
            }
            watch.Stop();
            Console.WriteLine("As guid: " + watch.ElapsedMilliseconds
                   + "; " + chk);
            string key = guid.ToString();
            chk = 0;
            watch = Stopwatch.StartNew();
            for (int j = 0; j < LOOP; j++)
            {
                chk += strings[key];
            }
            watch.Stop();
            Console.WriteLine("As string: " + watch.ElapsedMilliseconds
                   + "; " + chk);
        }
        Console.ReadLine();

    }
}
Marc Gravell
Oh, you won't do it for me? ;)
Benjol
Wow, you did! The answer is yours, sir!
Benjol
Service with a smile ;-p
Marc Gravell
So, string is about 20% faster on those figures (but they include more than just the add operations). Would be interesting to see difference in lookup times.
Richard
Actually, the figures *only* cover the lookup times. The add is not profiled.
Marc Gravell
+1  A: 

The collections are fairly small, but they get iterated lots of times

If you are iterating, there are no key to key comparisons. If you are adding/modifying or looking up by key, then keys will be hashed and the hashes compared; only if the hashes are equal will the keys be compared.

Therefore, unless you are performing a lot of key based operations on huge dictionaries with many hash collisions the speed of key to key comparisons will not be a major factor.

Richard
Yeah, bad wording on my part. There's not much point having a dictionary if there are no lookups!
Benjol
A Dictionary ensures keys are unique and O(log n) insertion; this can be very useful even if you are only going to iterate.
Richard
(see reply to your comment on my post)
Marc Gravell
A: 

My first thought would have been, that Guid objects are faster, but if you get some input as string and have to search it in a small collection (hashset) of GUIDs (which aren't changing often), it might be faster to store them as strings, because:

  • For searching a string in a GUID-Dictionary, you have to parse the string (including error checking etc.), create the Guid structure, get the hash code, do the hash lookup and one final comparison of the GUID bytes.

  • For searching a string in a String-Dictionary, you have to build the hash of the string (possibly faster than building the Guid struct), lookup the hash and do one string comparison. If, for instance, you expect many GUIDs not to be in the collections, the hash comparison will fail often an you don't even have to do the string comparison (which takes slightly more time than the GUID-comparison from point 1 above)

If you already have Guid-structures as input (e.g. because you did some validity-checking on the input strings) of course it's far better to reuse them as index in the dictionary.

BUT: From point of view of design clarity (which is far more important than performance in 99% of all code) you should use Guid structures and only change that, if you really run into performance troubles (and profiling shows that you get an advantage out of the string solution).

MartinStettner