I'm not sure the title makes sense, but basically I'd like people's opinions on whether public members should be updated eagerly, every time the underlying data changes, or calculated lazily, at the moment they are accessed.

Say you have a class like CustomCollection that has a property called Count. Should Count be updated on each Add, Remove, etc. operation, or should it just be calculated at the time it's accessed?

Keeping it up to date seems intuitive, but then you have to wonder how often people call Add, Remove, etc. compared with .Count.

Also, is there a hybrid version where you cache the value when the property is accessed? I think that would require another variable to be kept up to date, right?

+1  A: 

The answer depends on the type of application and on the specific situation.

For a count variable, I would typically update it on the fly, because the cost to calculate may be prohibitive, and the cost to store is very low.

In other circumstances, the cost to calculate may be very low, but the cost of storage could be high (for example, in terms of the number of places where code would need to be maintained to keep the stored value accurate).
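To make the trade-off concrete, here is a minimal sketch in Java. Everything in it is made up for illustration (the class EagerCountList and the methods addFirst, removeFirst, getCount and countByWalking are not from any library). It assumes a hand-rolled linked list, where counting on demand means walking every node, while the stored count costs one int and one increment or decrement per change.

```java
import java.util.NoSuchElementException;

// Hypothetical linked list that keeps its count up to date on every change.
class EagerCountList<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node<T> head;
    private int count; // cheap to store, updated on every add/remove

    void addFirst(T value) {
        head = new Node<>(value, head);
        count++;
    }

    T removeFirst() {
        if (head == null) {
            throw new NoSuchElementException("list is empty");
        }
        T value = head.value;
        head = head.next;
        count--;
        return value;
    }

    // Up-to-date strategy: O(1), just returns the stored field.
    int getCount() {
        return count;
    }

    // On-demand strategy: O(n), walks the whole list on every call.
    int countByWalking() {
        int n = 0;
        for (Node<T> p = head; p != null; p = p.next) {
            n++;
        }
        return n;
    }
}
```

Both methods return the same number; the only difference is where the work happens. The price of the stored count is that every mutating method has to remember to update it.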

Jason Coyne
+4  A: 

You're right when you say you have to consider how often these functions are accessed. If count is accessed all the time, it shouldn't be on-demand as that would be slower than necessary. If the other functions are accessed more, then recalculating count every time would be a waste as well.

A middle ground would be having something that calculates count on demand if a flag is set to false, and then sets the flag to true. Calls to add, remove, etc would set the flag to false.

Something like this:

class CachedCount
    int count = 0;
    boolean count_is_valid = false;

    int getCount()
        if count_is_valid
            return count;
        else
            count = calculate_count();
            count_is_valid = true;
            return count;

    void Add(item)
        count_is_valid = false;
        ...

    ...

Note that this only provides a benefit if you access count several times in a row without calling add, remove, etc. in between. If the calls are interleaved, every access to count triggers a recalculation and the benefit is lost. The biggest benefit comes from sequences like: add, add, add, remove, remove, add, count, count, count, count, count, count rather than add, count, add, count, remove, count, remove, count, add, count.
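For anyone who wants something runnable, here is one possible Java rendering of the pseudocode above. It is only a sketch: the linked-list storage, the class name CachedCountList and the getRecalculations counter are assumptions added for illustration, the last one purely so the effect of the two call patterns can be observed.

```java
// Hypothetical collection that recalculates its count lazily and caches it.
class CachedCountList<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private Node<T> head;
    private int count = 0;
    private boolean countIsValid = false;
    private int recalculations = 0; // instrumentation only, not part of the technique

    void add(T value) {
        head = new Node<>(value, head);
        countIsValid = false; // cached count is now stale
    }

    boolean removeFirst() {
        if (head == null) {
            return false;
        }
        head = head.next;
        countIsValid = false; // cached count is now stale
        return true;
    }

    int getCount() {
        if (!countIsValid) {
            count = calculateCount();
            countIsValid = true;
        }
        return count;
    }

    // The expensive part: walks every node.
    private int calculateCount() {
        recalculations++;
        int n = 0;
        for (Node<T> p = head; p != null; p = p.next) {
            n++;
        }
        return n;
    }

    int getRecalculations() {
        return recalculations;
    }
}
```

With this sketch, the batched sequence (add, add, add, remove, remove, add, then six calls to getCount) walks the list once, while the interleaved sequence (add, count, add, count, remove, count, remove, count, add, count) walks it five times, once per getCount call.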

Welbog
Thanks Welbog, can you please give a small example for your middle ground?
Joan Venge
Faster than me :)
Joan Venge
There it is in pseudocode. I'll leave it as an exercise to convert it to C#. Or anything else, for that matter.
Welbog
I think it really depends on the situation. For instance, when looking at your example, I don't see a benefit, because incrementing (decrementing) a count on add (remove) operations is not more expensive than storing and updating count_is_valid.
0xA3
Well, the derived field might be a lot more complicated than a simple increment. What if it's a standard deviation or something else that is awkward to maintain incrementally?
Welbog
+1  A: 

You might want to gather some metrics on how often those public members are accessed versus the time required to calculate them. If keeping the value up to date is a trivial operation, do it the more intuitive way. If updating it is complicated and it's rarely accessed, it might make sense to provide it on demand.
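One low-tech way to get the access half of those numbers is to count calls during a representative run. The Java sketch below is only an illustration: InstrumentedCollection and its counters are hypothetical names, and it measures how often members are used, not how long the calculation takes (that would need separate timing or a profiler).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that counts how often the collection is changed
// versus how often its count is read.
class InstrumentedCollection<T> {
    private final List<T> items = new ArrayList<>();
    private long mutations;  // successful add/remove calls
    private long countReads; // calls to getCount

    void add(T item) {
        items.add(item);
        mutations++;
    }

    boolean remove(T item) {
        boolean removed = items.remove(item);
        if (removed) {
            mutations++;
        }
        return removed;
    }

    int getCount() {
        countReads++;
        return items.size();
    }

    long getMutations() {
        return mutations;
    }

    long getCountReads() {
        return countReads;
    }
}
```

If reads vastly outnumber mutations, keeping the value up to date looks attractive; if it's the other way around and the calculation is expensive, on-demand is the more likely winner.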

SnOrfus
+2  A: 

It depends on the time it takes to update count. If it's a time-consuming process and you call Count infrequently, I would opt for calculating count when the property is accessed rather than every time you modify the collection.

RA
+1  A: 

On-demand is arguably simpler, and could therefore generally be considered better.

As for performance, most applications probably won't notice the difference. Both approaches have their drawbacks: the up-to-date strategy can cause unnecessary calculations, while on-demand can cause you to perform the same calculation more than once (but on the other hand, clients can store the results of calculations themselves and thus avoid double work). On-demand also creates smaller objects, theoretically meaning fewer cache misses and fewer garbage collections, both of which can harm performance greatly. But again, most apps wouldn't notice, I'd reckon.

By the way, if the calculation is very expensive, it shouldn't be in a property, if I remember the .NET coding guidelines correctly.

gustafc
+4  A: 

Most of the information about the state of the class should always be up to date as a side effect of manipulating the data. For example, the Count property is based on the internal storage of the data (i.e. the array length).

Other properties that depend on certain conditions of the state of the class may need to be calculated. For example, a ContainsValidOrder property might depend on the orders held by the class. For those properties you have to evaluate how the class is used and decide whether the cost of calculating the value as you add and remove items from the collection is cheaper than scanning the entire collection each time the property is accessed.
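As a rough Java sketch of those two options for something like ContainsValidOrder: the Order and OrderBook classes below are invented for illustration, and "valid" is assumed to mean nothing more than a positive total. The sketch also assumes orders are immutable, so an order's validity cannot change while it sits in the collection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable order; "valid" here just means a positive total.
final class Order {
    final long totalCents;

    Order(long totalCents) {
        this.totalCents = totalCents;
    }

    boolean isValid() {
        return totalCents > 0;
    }
}

// Hypothetical collection offering both strategies side by side.
class OrderBook {
    private final List<Order> orders = new ArrayList<>();
    private int validOrders; // maintained as a side effect of add/remove

    void add(Order order) {
        orders.add(order);
        if (order.isValid()) {
            validOrders++;
        }
    }

    boolean remove(Order order) {
        boolean removed = orders.remove(order);
        if (removed && order.isValid()) {
            validOrders--;
        }
        return removed;
    }

    // Maintained incrementally: O(1) on access, small extra cost per add/remove.
    boolean containsValidOrder() {
        return validOrders > 0;
    }

    // Calculated on access: no bookkeeping, but scans the collection each time.
    boolean scanForValidOrder() {
        for (Order order : orders) {
            if (order.isValid()) {
                return true;
            }
        }
        return false;
    }
}
```

Keeping a counter rather than a single boolean is a deliberate choice here: a boolean could not be updated on remove without rescanning, whereas the counter can simply be decremented.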

The .NET guidelines do suggest, however, that properties should not execute complex code and that repeated access of a property should not have any side effects or performance implications. So for properties that represent calculated data it might be better to use a GetXXX method. This indicates to the developer using your library that 1. the calculation might take some time and 2. they should hold on to the value for the duration of their task.

Paul Alexander