ansaurus

Question

Answer 1

+5 A:

For strings, you could perhaps look at interning; either with the inbuilt interner, or (preferably) a custom interner - basically a Dictionary<string,string>. What this means is that each identical string uses the same reference - and the duplicates can be collected.

Don't do anything with the int; that is already optimal.

For example:

using System;
using System.Collections.Generic;
class StringInterner {
    private readonly Dictionary<string, string> lookup
        = new Dictionary<string, string>();
    public string this[string value] {
        get {
            if(value == null) return null;
            if(value == "") return string.Empty;
            string result;
            lock (lookup) { // remove if not needed to be thread-safe     
                if (!lookup.TryGetValue(value, out result)) {
                    lookup.Add(value, value);
                    result = value;
                }
            }
            return result;
        }
    }
    public void Clear() {
        lock (lookup) { lookup.Clear(); }
    }
}
static class Program {
    static void Main() {
        // this line is to defeat the inbuilt compiler interner
        char[] test = { 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' };

        string a = new string(test), b = new string(test);
        Console.WriteLine(ReferenceEquals(a, b)); // false
        StringInterner cache = new StringInterner();
        string c = cache[a], d = cache[b];
        Console.WriteLine(ReferenceEquals(c, d)); // true
    }
}

You could take this further with WeakReference if desired.

Note importantly that you don't need to change your design - you just change the code that populates the object to use the interner/cache.

Marc Gravell 2009-11-13 06:33:47

+1 Nice example :)

Andrew Hare 2009-11-13 06:42:30

thanks. I guess I need to determine whether the CLR is already doing this for me.

Chris Herring 2009-11-13 06:57:07

The docs specify that the memory allocated for CLR interned strings are not likely to be releeased until the CLR itself terminates. Is this why you prefer the custom interner?

Chris Herring 2009-11-13 07:05:33

Correct; by having my own cache I control the life-time. If your strings are related to domain-specific data (perhaps from a db) this can be important. And clearing it is as simple as just letting the cache get collected (the existing string uses won't care).

Marc Gravell 2009-11-13 07:08:18

Note that the CLR will only (by default) intern strings in a few specific scenarios; string literals from your source code are interned, for example.

Marc Gravell 2009-11-13 07:10:05

Answer 2

+4 A:

Have you actually determined whether or not this is actually a problem? The CLR does a lot of string interning on your behalf so it is possible (depending on CLR version and how your code was compiled) that you are not using as much memory as you think you are.

I would highly recommend that you validate your suspicions about memory utilization before you change your design.

Andrew Hare 2009-11-13 06:35:30

Thanks, I didn't know that. The beginning example is a simplification of an already existing design which does take up a lot of memory which I assumed the numerous strings to be contributing to. Hard to pin down exactly where the memory is going...

Chris Herring 2009-11-13 06:53:29

Answer 3

A:

As others have mentioned you first want to see if this is a problem before changing your design etc. If it is a problem and / or you dealing with a lot of sparse data then a sparse data structure might be more applicable for the problem. I will post a very simple naive implementation later (as I am unable to at the moment) but a two dimensional sparse matrix would do what you desire.

The concept is that a single format (either string or class) will be stored for a given cell range e.g. 1-1000. To benefit from this you have to make some design changes... The format property will need to be removed from the cell class. Instead the format should be registered either in the table class or preferably with another class. For example

public class CellFormats 
{ ....
public void Register(int start, int finish, string format);
}

The cell formats class will contain the sparse matrix which will contain a format for a range.

The Table class will then utilise the CellFormats class. Instead of having a method called GetCellFormat it would have a method with the following signature

void ApplyCellFormat(Position cellPosition)

This would retrieve the cell format from the CellFormats class (sparse matrix) and apply it to the cell.

This technique can dramatically reduce memory usage if the data is common and applied to large ranges. But as I have already said, you need to ensure this is cause of the problem before making your design more complex by adding code like this.

Ian Gibson 2009-11-13 11:35:14

ansaurus

tags:

views:

answers:

Reducing memory of similar objects

related questions