I have key/value pairs mapped by hash (ID) in a Dictionary<string, Dictionary<string, string>> called id2key_value. You can think of this as a way to represent a database-like table with rows.

I added some helper functions to ease the use of some basic datatypes by doing a cast, like

public int GetInt(string id, string key)
{
    int value = 0;
    bool success = int.TryParse(map[id][key], out value);

    if (success)
        return value;
    else
        throw new InvalidOperationException(
            "Trying to obtain a non-integer value with GetInt().");
}

Well, I thought I was being clever when I came up with an idea of a "cast-cache", which basically holds already parsed objects, so I could skip the parsing of the string for int, bool, DateTime, etc., and simply cast them to appropriate datatype from the cache. Like,

public int GetInt(string id, string key)
{
    if (cast_cache.ContainsKey(id) && cast_cache[id].ContainsKey(key))
        return (int) cast_cache[id][key];

    int value = 0;
    bool success = int.TryParse(map[id][key], out value);

    if (success)
    {
        this.AddToCache(id, key, value);

        return value;
    }
    else
        throw new InvalidOperationException(
            "Trying to obtain a non-integer value with GetInt().");
}

The "cast-cache" is simply Dictionary<string, Dictionary<string, object>>.
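For completeness, the AddToCache helper referenced above is along these lines (a sketch; the original isn't shown):

```csharp
// Hypothetical sketch of the AddToCache helper referenced above.
private void AddToCache(string id, string key, object value)
{
    Dictionary<string, object> row;
    if (!cast_cache.TryGetValue(id, out row))
    {
        row = new Dictionary<string, object>();
        cast_cache[id] = row;
    }
    row[key] = value;  // the value is boxed here if it is a value type
}
```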

So, I did some performance testing with 10000 ints added to the map. Then I did one million random retrievals, with and without "cast-caching".

It took 495 ms without caching and 490 ms with caching. I also ran a test with DateTime; there the difference was more significant, but smaller than I expected (~750 ms non-cached vs. ~500 ms cached).
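The retrieval loop was along these lines (a sketch, not the exact test code; names hypothetical):

```csharp
// Sketch of the timing loop (hypothetical; not the exact test code).
var rng = new Random();
var ids = new List<string>(map.Keys);

var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000000; i++)
{
    string id = ids[rng.Next(ids.Count)];
    int v = GetInt(id, "value");  // measured with and without the cast-cache
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds + " ms");
```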

I obviously don't understand what a cast really does. How costly is this operation, and why is its performance so close to the version that "deserializes" the value from a string every time?

+8  A: 

Casting is much faster than most people seem to think, since you are not touching the object itself (you are simply changing the type of the reference that points to that object).

One of the main reasons that you ought to avoid casting is that when you cast you are eschewing type-safety and introducing potential execution-time errors into your application. I would rarely consider casting to be a performance concern.

As a side note I would make sure that you test both reference types and value types in your cache to make sure that you are not incurring a performance penalty due to the boxing and unboxing of any value types.

The performance penalty of boxing comes from the fact that casting a value type to object involves more than changing a reference: the value must be copied to the heap. Likewise, using a boxed value type later requires unboxing it, which copies the value from the heap back to the stack.
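To make the distinction concrete, here's a minimal sketch (not from the question's code):

```csharp
int n = 42;

object boxed = n;         // boxing: n is copied into a new object on the heap
int unboxed = (int)boxed; // unboxing: the value is copied back to the stack

// A reference-type cast, by contrast, copies nothing:
object o = "hello";
string s = (string)o;     // only the reference is reinterpreted
```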

Andrew Hare
Well, in this case it's actually unboxing rather than anything else... but I agree in general.
Jon Skeet
Nice - I edited just before you commented :)
Andrew Hare
The problem in the question isn't boxing/unboxing, it's re-parsing the data from a string. Boxing/unboxing is very fast compared to string.TryParse().
Daniel Pryden
Edit: of course I meant int.TryParse(string)... sorry, it's Monday.
Daniel Pryden
Right, but after the first parse the value is cast to the correct type.
Andrew Hare
A: 

Your cached version works faster because of the classic trade-off between time and space: you are spending memory to save time. But how much memory will your program need if you keep caching every parsed value?

mcandre
+1  A: 

Hang on, what you're calling a "cast" in your code is not a cast.

If you're doing this:

bool success = int.TryParse(map[id][key], out value);

That's a conversion, not a cast. A cast would be:

value = (int) map[id][key];

or

int? maybe = map[id][key] as int?;

Both of which would fail in this case because the object really is a string, not an int. (Note that "as" only works with reference and nullable types, which is why the second form needs int? rather than int.)

If you have a Dictionary<string, string> but really want to store arbitrary objects in it, I would make it a Dictionary<string, object>. You'll then be able to use casts instead of conversions, which, as Andrew Hare points out, is going to be a lot faster.
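For example (a sketch, assuming the values are parsed and stored as objects up front; names hypothetical):

```csharp
// Hypothetical sketch: parse each value once, store it as an object.
var row = new Dictionary<string, object>();
row["age"] = 30;                       // boxed int, assigned once
row["born"] = new DateTime(1980, 1, 1);

int age = (int)row["age"];             // unboxing cast: no string parsing
DateTime born = (DateTime)row["born"]; // likewise
```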

Daniel Pryden
"Well, I thought I was being clever when I came up with an idea of a "cast-cache", which basically holds already parsed objects, so I could skip the parsing of the string for int, bool, DateTime, etc., and simply cast them to appropriate datatype from the cache." I obviously understand that; "cast-cache" is just an unfortunate name. Your solution would limit the serialization of the objects to ToString() or reflection, which isn't adequate. It would also require storing the type of each object alongside it, which takes more space and is something I'm trying to avoid.
A: 

Your code may perform quicker with the cache if you reduce the number of dictionary lookups. Additionally, making the inner cache a Dictionary<string, int> instead of Dictionary<string, object> will avoid boxing and unboxing, which is also costly.

public int GetInt(string id, string key)
{
    int value;

    // Note: cast_cache is a Dictionary<string, Dictionary<string, int>> here.
    Dictionary<string, int> cache;

    if (cast_cache.TryGetValue(id, out cache) 
          && cache.TryGetValue(key, out value))
    {
        return value;
    }

    if (int.TryParse(map[id][key], out value))
    {
        this.AddToCache(id, key, value);
        return value;
    }

    throw new InvalidOperationException("Trying to obtain a non-integer value with GetInt().");
}
adrianbanks
I could use a dictionary for every different type, but that kinda defeats the purpose of what I am doing here. Replacing with Dictionary<string, object>, we obtain a 50% speedup, so thanks! I believe Hounshell's proposal will gain an even bigger speedup, so the answer will go to him once I confirm.
A: 

There are a couple of things to consider here. First, just so you have the right terminology: this is actually unboxing (you've taken a value type and stored it as a reference type, i.e. boxed it; converting back to the value type is unboxing).

Second, I'd wager that the majority of your cost isn't in the unboxing; instead it's in the multiple lookups into the cache dictionary:

if (cast_cache.ContainsKey(id) && cast_cache[id].ContainsKey(key))            
    return (int)cast_cache[id][key];

I count 5 dictionary traversals in there: cast_cache.ContainsKey(id), cast_cache[id], .ContainsKey(key), cast_cache[id], and [key].

That's pretty harsh. You could cut down on a lot of these by using an aggregated key: instead of looking up [id][key], combine the two into a single key object. That would drastically cut the number of dictionaries and reduce those lookups to 2, or even 1 if you skip the ContainsKey() in favor of a try/catch (check the speed on that).

Here's a class that would allow you to combine those:

public class Vector
{
    private object[] _Data;

    public object this[int index]
    {
        get
        {
            return _Data[index];
        }
    }

    public Vector(params object[] data)
    {
        _Data = (object[])data.Clone();
    }

    public override bool Equals(object obj)
    {
        Vector OtherVector = obj as Vector;

        if (OtherVector == null)
            return false;

        if (OtherVector._Data.Length != _Data.Length)
            return false;

        // Two Vectors are equal when all of their elements are equal.
        for (int I = 0; I < _Data.Length; I++)
            if (!_Data[I].Equals(OtherVector._Data[I]))
                return false;

        return true;
    }

    public override int GetHashCode()
    {
        int Result = 0;

        // Combine the element hashes. Multiply by (I + 1) rather than I,
        // otherwise the first element's hash would be multiplied by zero
        // and dropped entirely.
        for (int I = 0; I < _Data.Length; I++)
            Result = Result ^ (_Data[I].GetHashCode() * (I + 1));

        return Result;
    }
}

Try that out and see how your speed improves.
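A usage sketch (hypothetical names), with the cache keyed by Vector so each retrieval is a single lookup:

```csharp
// Hypothetical sketch of the aggregated-key cache.
Dictionary<Vector, object> cast_cache = new Dictionary<Vector, object>();

cast_cache[new Vector("some_id", "some_key")] = 42;  // boxed int

object cached;
if (cast_cache.TryGetValue(new Vector("some_id", "some_key"), out cached))
{
    int value = (int)cached;  // one lookup, then one unboxing cast
}
```

TryGetValue finds the entry because Vector overrides Equals() and GetHashCode().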

Hounshell
This was what my intuition was telling me. I will try it out and report back, thanks!
I'm trying to figure out how to use this. If the cache is changed to Dictionary<int, object>, how are the collisions produced by Vector handled? int hash = (new Vector(id, key)).GetHashCode();
It should be a Dictionary<Vector<IDType, KeyType>, object>. The Dictionary looks things up using GetHashCode() and Equals() on the key; the Vector class just acts as a proxy for those two calls. It makes sure that hash codes are reasonably unique and that two instances that are Equal() produce the same hash code (the two requirements when overriding GetHashCode()), and it does so by calling GetHashCode() and Equals() on each of its constituents. It's essentially a proxy for an n-value key, but you can use it directly.
Hounshell
Sorry, re-read my own class. It'd be Dictionary<Vector, object>, and you'd use it like this: Vector MyKey = new Vector(id, key); if (cast_cache.ContainsKey(MyKey)) return (int)cast_cache[MyKey]; Also, as an aside, have you thought about using different caches for different types? Say you ask for a value as a string, then ask for it as an int: the routine would barf because it's already been cached as a string. Plus, by using more strongly typed dictionaries you could avoid having to explicitly unbox the ints: Dictionary<Vector, int> cast_cache_int;
Hounshell