views:

66

answers:

2

I need to store in my object which values have already been handlerd, I am doubting what would cost more performance, should I create an array that stores:

  • The instance references (they are not structs, only ref classes)
  • The hashcode of the items
  • The name of the name of the properties (string) that have been handled

Update
my aim is that the collection of the data on the handled references should cost the less of memory the possible since I am going to have a tone of the parent instance type.
I care less about the retrieval time (i.e. collection.Contains(reference)).

So my question is then what of the above array what cost less memory.

+3  A: 

Storing references to the object seems like the easiest and lowest memory cost option.

If you're using this for a "has this been handled" check, the best option (for fastest checking) is probably to implement Object.Equals and Object.GetHashCode on your class, and then use a HashSet<T>. HashSet<T> is nice for this because it provides an O(1) Contains() method.

If you can't change the class to allow for hashing, you can alternatively implement IEqualityComparer for the object.

Reed Copsey
My question is basically, sine I want the collection containing the data about the handled refs to be as less as possible in memory weight since I am going to have a gazillion of these (parent) class.I care less about the time of the retrieval (i.e. collection.Contains)
Shimmy
Well, using the object reference will always be one of the cheapest options - if the object exists elsewhere. This way, you're only storing one object reference per object, where using strings is ref+string content, etc...
Reed Copsey
@Shimmy: Are the "parent" objects organized in a specific manner before you do the processing? (ie: in a list) If so, you might want to store an array of bool values (one per object) as this will be even lighter weight on memory, and very fast to access if you're accessing the parent classes by index already.
Reed Copsey
Lemme try to make things clearer I have an object (this is what I call the 'parent') that has to load values (I don't have control of that part, it's a subclass). I need to flag certain objects and hold this data on the parent. how should I store this data, even I want to store a list/dictionary of bools I'd anyway have to store the key as well then why store the value if the key is enough.I've updated my question. please take a look.
Shimmy
In short, my question is what weights less: an array of objects (it's actually references) or array of hashcodes / hashset.
Shimmy
Shimmy: Reference is 32 or 64bits per ref - hashcodes/hashset depends on size of hash ;) If you use a string, it'll be 32/64bits + strlen bytes.
Reed Copsey
@Shimmy: Remember - strings are reference types, so using a string will mean adding a reference + the string.
Reed Copsey
Right. so I am gonna stick to an array of references (rather than hashset)
Shimmy
I updated you answer based on your comment. Please correct me if I did wrong.
Shimmy
A: 

.NET style hashcodes aren't an option unless the possible range of different values for your objects is less than 2^32, as otherwise you will get false-positives (and given the birthday paradox, this can happen more often than you might think even with a great hash fuction). Hashcodes give a quick link to a bucket of zero-or-more items, which are then checked for equality. Hence a hashcode-base solution would require you to store a reference to each object as well anyway, and hence can't be smaller in memory to storing just the references.

If the objects couldn't be garbage collected (i.e. they are still "alive" to another part of the application) then the cost of storing the reference will be 4 or 8 bytes depending on architecture. If they could be GC'd then the cost is dependant on the size of the graph of that object.

Now, if you can create your own lossless hash object of the objects that is smaller than that, you can get a memory saving. E.g.:

public class ObjectOfInterest
{// all fields public for sake of simplicity in example
    public int ID; // this is important diff id - diff object.
    public int ParID; // this is unimportant, as same for all objects processed here.
    public ParentType Parent; // this is just memoised based on _parID;
    public decimal Val; // this is important.
    public string Name; // unimportant for our purposes.
    public RelatedType Stuff; // memoised based on _id
}

Then we can produce a related:

public struct HashObject
{
    private readonly int _id;
    private readonly decimal _val;
    public HashObject(ObjectOfInterest ooi)
    {
        _id = ooi.ID;
        _val = ooi.Val;
    }
    public bool Matches(ObjectOfInterest ooi)
    {
        return _id == ooi.ID && _val == ooi.Val;
    }
    // because one of the options as to how to store *this* is hashing
    public bool Equals(HashObject ho)
    {
        return _id == ho._id && _val == ooi._val;
    }
    public override bool Equals(object obj)
    {
        return Equals(obj as HashObject);
    }
    public int GetHashCode()
    {
        unchecked
        {
            return _val.GetHashCode() ^ (_id << 16) ^ (_id >> 16);
        }
    }
}

Now, we store the HashObjects and use these to note what we've done with. In this case we're going to take up at least 20 bytes storing this struct, plus overhead of whatever means we have to store it. Smaller if the ObjectOfInterest can now be GC'd, pointless if they are still in memory anyway.

There's a hash and equality approach (knowledge of likely values can improve how good the hash is) if you decided to store these in a HashSet themselves. HashSet won't be the most memory efficient collection, though it might be that given the extra strain you are putting on this in all of these comparisons, that you do need the faster look up. This is area for experimentation over theory (esp. since the details change according to your objects). If you can take the look-up time complexity of constantly scanning arrays, then that's your best bet.

If there isn't a possible smaller object than you original type that allows for a full relevantly-equal comparison, then this approach can't work.

Jon Hanna
The reference objects are only references, never birthdays, they return the original value of Object.GetHashCode. anyway, I am not going to have many references on each collection (0-20).
Shimmy
Just store the reference.
Jon Hanna
BTW, you do know what the "birthday paradox" is and how it relates to things other than birthdays, yes? If not look it up, it's important to know.
Jon Hanna
I know it. thanks. yes. but as I said, what was important for me is the memory cost, not the performing speed.Thanks for your effort.
Shimmy