views: 624
answers: 6

To interact with an external data feed I need to pass a rolling security key which has been MD5 hashed (every day we need to generate a new MD5-hashed key).

I'm weighing up whether or not to generate it every time we call the external feed. I need to hash a string of about 10 characters for the feed.

It's for an ASP.NET (C#/.NET 3.5) site and the feed is used on pretty much every page. Would I be best off generating the hash once a day, storing it in the application cache and taking the memory hit, or generating it on each request?

+2  A: 

Generate some sample data. Well, a lot of it. Compute the MD5 of the sample data. Measure the time it takes. Decide for yourself.

doppelfish
A: 

How much data is being hashed? Is there some easy trigger for noticing that the data has changed? Could you store the current hash in the same place that you're storing the data that gets hashed?

Jon Skeet
+6  A: 

The only acceptable basis for optimizations is data. Measure generating this inline and measure caching it.

My high-end workstation can calculate well over 100k MD5 hashes of a 10-byte data segment in a second. There would be zero benefit from caching this for me and I bet it's the same for you.

Sander
For me, "how many seconds does X take to execute?" is never really that interesting on its own. I'd rather see a time complexity analysis of the algorithm first; then, of course, you need to measure how much time each constant takes.
Filip Ekberg
The complexity analysis is only interesting if the amount of data changes. We don't know enough about the data here - if it's a constant (or near-constant) amount then Sander's answer is precisely the way to go.
Jon Skeet
Did you calculate the MD5 100k times from the same 10 bytes, or did you generate 100k segments of 10 bytes? The CPU cache might play a role here. Still: hard data - good to have.
doppelfish
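Sander's throughput claim is easy to check for yourself. Below is a rough benchmark sketch (the class and method names, the iteration count, and the slightly varying inputs are all arbitrary choices for illustration, not anything from the question):

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;

public static class Md5Benchmark
{
    // Hex-encode the MD5 of a short string ("x2" yields lowercase hex digits).
    public static string ComputeMd5Hex(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    // Hash `iterations` short, slightly varying inputs and report the rate.
    public static double HashesPerSecond(int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            ComputeMd5Hex("key" + (i % 10)); // vary the input a little
        sw.Stop();
        return iterations / Math.Max(sw.Elapsed.TotalSeconds, 1e-9);
    }
}
```

Call it with something like `Console.WriteLine(Md5Benchmark.HashesPerSecond(100000));` and decide from the number you actually get on your hardware.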
A: 

If it'll be the same for a given day, caching it might be an idea. You could even set the cache expiry to 24 hours and write code to regenerate the hash when the cache expires.

Conrad
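One way to sketch the regenerate-on-expiry idea without involving the ASP.NET cache API is a static field that is refreshed when the date rolls over. Everything here (the `DailyKeyCache` name, the `secret + date` key scheme, passing `DateTime` in explicitly) is a hypothetical illustration, not the questioner's actual key scheme:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: caches the day's MD5-hashed key in a static field
// and regenerates it when the date changes.
public static class DailyKeyCache
{
    private static string cachedKey;
    private static DateTime cachedDay;
    private static readonly object sync = new object();

    public static string GetKey(string secret, DateTime now)
    {
        lock (sync) // only one regeneration per day, even under concurrent requests
        {
            if (cachedKey == null || cachedDay != now.Date)
            {
                cachedKey = Md5Hex(secret + now.ToString("yyyy-MM-dd"));
                cachedDay = now.Date;
            }
            return cachedKey;
        }
    }

    private static string Md5Hex(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(input));
            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                sb.Append(b.ToString("x2")); // lowercase hex
            return sb.ToString();
        }
    }
}
```

In a real ASP.NET site you would more likely put the value in the application cache with an absolute expiration, but the effect is the same: one hash per day instead of one per request.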
A: 

Calculate the time complexity of the algorithm!

Look at the following code:

    public string GetMD5Hash(string input)
    {
        // MD5CryptoServiceProvider is IDisposable, so dispose it when done.
        using (var x = new System.Security.Cryptography.MD5CryptoServiceProvider())
        {
            byte[] bs = System.Text.Encoding.UTF8.GetBytes(input);
            bs = x.ComputeHash(bs);
            var s = new System.Text.StringBuilder();
            foreach (byte b in bs)
            {
                // Note: "x2" already emits lowercase hex, so ToLower() is redundant.
                s.Append(b.ToString("x2").ToLower());
            }
            return s.ToString();
        }
    }

If we were to calculate the time complexity we would get T = 11 + 2n. However, this is just "what we see" - i.e. ToLower might do some heavy work that we don't know about. But from this we can see that the algorithm is O(n) in all cases, meaning time grows as the data grows.

Also, to address the cache issue, I'd rather keep my "heavy" work in memory, since memory is less expensive compared to CPU usage.

Filip Ekberg
Certainly time will grow as data grows - but it's the cost of ComputeHash which is most relevant to this question.
Jon Skeet
Also, the cost of your string operations doesn't depend on the size of the input - because ComputeHash will always give the same size of hash, regardless of input size.
Jon Skeet
(Sorry, I meant the string operations *after* ComputeHash - the call to GetBytes depends on the input size, certainly. There's no evidence (yet) that the input data to the real problem is text yet though.)
Jon Skeet
Good point Jon. However ComputeHash will take longer for a bigger amount of data, which is understandable of course. But that is a "constant" cost which depends on n. And besides ComputeHash you have a cost of n anyway. So doing MD5 might as well end up as n^2, depending on the structure of ComputeHash.
Filip Ekberg
Constant cost which depends on n? How can it be constant if it depends on n? This whole question is basically asking "What is the cost of ComputeHash" so I don't see what value your answer adds, I'm afraid.
Jon Skeet
Ok, I was unclear on the "variable constant", hehe. Basically what I mean is that when calculating the time complexity from my example, it's a constant. But when we look inside it, n may very much be increased. I see why you don't see the value. But time complexity training is never bad!
Filip Ekberg
No, n won't be increased - but the overall complexity of the algorithm may well be. That's why you can never judge overall complexity without knowing the complexity of all the calls made in the code. Otherwise you'd just wrap the whole thing in a "DoIt" method of O(1) :)
Jon Skeet
My point about this answer not adding value is that the question was: "How expensive is it to compute a hash" and your answer, after the irrelevant string building stuff is stripped away, is "It depends on how expensive the ComputeHash method is."
Jon Skeet
A: 

Using the ASP.NET cache is very easy, so I don't see why you shouldn't cache the key.

Storing the key in the cache may even save some memory, since you can reuse it instead of creating a new one for each request.

Rune Grimstad