ansaurus

Question

.NET should I store references or values?

Answer 1

+3 A:

Storing references to the object seems like the easiest and lowest memory cost option.

If you're using this for a "has this been handled" check, the best option (for fastest checking) is probably to implement Object.Equals and Object.GetHashCode on your class, and then use a HashSet<T>. HashSet<T> is nice for this because it provides an O(1) Contains() method.

If you can't change the class to allow for hashing, you can alternatively implement IEqualityComparer for the object.

Reed Copsey 2010-08-10 00:35:29

My question is basically, sine I want the collection containing the data about the handled refs to be as less as possible in memory weight since I am going to have a gazillion of these (parent) class.I care less about the time of the retrieval (i.e. collection.Contains)

Shimmy 2010-08-10 00:43:28

Well, using the object reference will always be one of the cheapest options - if the object exists elsewhere. This way, you're only storing one object reference per object, where using strings is ref+string content, etc...

Reed Copsey 2010-08-10 00:45:28

@Shimmy: Are the "parent" objects organized in a specific manner before you do the processing? (ie: in a list) If so, you might want to store an array of bool values (one per object) as this will be even lighter weight on memory, and very fast to access if you're accessing the parent classes by index already.

Reed Copsey 2010-08-10 00:46:26

Lemme try to make things clearer I have an object (this is what I call the 'parent') that has to load values (I don't have control of that part, it's a subclass). I need to flag certain objects and hold this data on the parent. how should I store this data, even I want to store a list/dictionary of bools I'd anyway have to store the key as well then why store the value if the key is enough.I've updated my question. please take a look.

Shimmy 2010-08-10 00:56:25

In short, my question is what weights less: an array of objects (it's actually references) or array of hashcodes / hashset.

Shimmy 2010-08-10 00:59:16

Shimmy: Reference is 32 or 64bits per ref - hashcodes/hashset depends on size of hash ;) If you use a string, it'll be 32/64bits + strlen bytes.

Reed Copsey 2010-08-10 01:05:56

@Shimmy: Remember - strings are reference types, so using a string will mean adding a reference + the string.

Reed Copsey 2010-08-10 01:06:31

Right. so I am gonna stick to an array of references (rather than hashset)

Shimmy 2010-08-10 01:32:33

I updated you answer based on your comment. Please correct me if I did wrong.

Shimmy 2010-08-10 01:36:29

Answer 2

A:

.NET style hashcodes aren't an option unless the possible range of different values for your objects is less than 2^32, as otherwise you will get false-positives (and given the birthday paradox, this can happen more often than you might think even with a great hash fuction). Hashcodes give a quick link to a bucket of zero-or-more items, which are then checked for equality. Hence a hashcode-base solution would require you to store a reference to each object as well anyway, and hence can't be smaller in memory to storing just the references.

If the objects couldn't be garbage collected (i.e. they are still "alive" to another part of the application) then the cost of storing the reference will be 4 or 8 bytes depending on architecture. If they could be GC'd then the cost is dependant on the size of the graph of that object.

Now, if you can create your own lossless hash object of the objects that is smaller than that, you can get a memory saving. E.g.:

public class ObjectOfInterest
{// all fields public for sake of simplicity in example
    public int ID; // this is important diff id - diff object.
    public int ParID; // this is unimportant, as same for all objects processed here.
    public ParentType Parent; // this is just memoised based on _parID;
    public decimal Val; // this is important.
    public string Name; // unimportant for our purposes.
    public RelatedType Stuff; // memoised based on _id
}

Then we can produce a related:

public struct HashObject
{
    private readonly int _id;
    private readonly decimal _val;
    public HashObject(ObjectOfInterest ooi)
    {
        _id = ooi.ID;
        _val = ooi.Val;
    }
    public bool Matches(ObjectOfInterest ooi)
    {
        return _id == ooi.ID && _val == ooi.Val;
    }
    // because one of the options as to how to store *this* is hashing
    public bool Equals(HashObject ho)
    {
        return _id == ho._id && _val == ooi._val;
    }
    public override bool Equals(object obj)
    {
        return Equals(obj as HashObject);
    }
    public int GetHashCode()
    {
        unchecked
        {
            return _val.GetHashCode() ^ (_id << 16) ^ (_id >> 16);
        }
    }
}

Now, we store the HashObjects and use these to note what we've done with. In this case we're going to take up at least 20 bytes storing this struct, plus overhead of whatever means we have to store it. Smaller if the ObjectOfInterest can now be GC'd, pointless if they are still in memory anyway.

There's a hash and equality approach (knowledge of likely values can improve how good the hash is) if you decided to store these in a HashSet themselves. HashSet won't be the most memory efficient collection, though it might be that given the extra strain you are putting on this in all of these comparisons, that you do need the faster look up. This is area for experimentation over theory (esp. since the details change according to your objects). If you can take the look-up time complexity of constantly scanning arrays, then that's your best bet.

If there isn't a possible smaller object than you original type that allows for a full relevantly-equal comparison, then this approach can't work.

Jon Hanna 2010-08-10 01:08:34

The reference objects are only references, never birthdays, they return the original value of Object.GetHashCode. anyway, I am not going to have many references on each collection (0-20).

Shimmy 2010-08-10 01:34:14

Just store the reference.

Jon Hanna 2010-08-10 07:48:15

BTW, you do know what the "birthday paradox" is and how it relates to things other than birthdays, yes? If not look it up, it's important to know.

Jon Hanna 2010-08-10 09:15:51

I know it. thanks. yes. but as I said, what was important for me is the memory cost, not the performing speed.Thanks for your effort.

Shimmy 2010-08-11 02:29:18

ansaurus

tags:

views:

answers:

.NET should I store references or values?

related questions