views:

96

answers:

3

I'm looking at reducing the memory consumption of a table like collection object.

Given a class structure like

Class Cell
{
    public property int Data;
    public property string Format;
}

Class Table
{
    public property Dictionary<Position, Cell> Cells;
}

When there are a large number of cells the Data property of the Cell class may be variable but the Format property may be repeated many times, e.g. the header cells may have an empty format string for titles and the data cells may all be "0.00".

One idea is to something like the following

Class Cell
{
    public property int Data;
    public property int FormatId;
}
Class Table
{
    public property Dictionary<Position, Cell> Cells;
    private property Dictionary<Position, string> Formats;

    public string GetCellFormat(Position);
}

This would save memory on strings however the FormatId integer value would still be repeated many times.

Is there a better implementation than this? I've looked at the flyweight pattern but am unsure if it matches this.

A more complex implementation I am considering is removing the Format property from the Cell class altogether and instead storing the Formats in a dictionary that groups adjacent cells together
e.g. there may be 2 entries like this
<item rowFrom=1 rowTo=1 format="" />
<item romFrom=2 rowTo=1000 format="0.00" />

+5  A: 

For strings, you could perhaps look at interning; either with the inbuilt interner, or (preferably) a custom interner - basically a Dictionary<string,string>. What this means is that each identical string uses the same reference - and the duplicates can be collected.

Don't do anything with the int; that is already optimal.

For example:

using System;
using System.Collections.Generic;
class StringInterner {
    private readonly Dictionary<string, string> lookup
        = new Dictionary<string, string>();
    public string this[string value] {
        get {
            if(value == null) return null;
            if(value == "") return string.Empty;
            string result;
            lock (lookup) { // remove if not needed to be thread-safe     
                if (!lookup.TryGetValue(value, out result)) {
                    lookup.Add(value, value);
                    result = value;
                }
            }
            return result;
        }
    }
    public void Clear() {
        lock (lookup) { lookup.Clear(); }
    }
}
static class Program {
    static void Main() {
        // this line is to defeat the inbuilt compiler interner
        char[] test = { 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' };

        string a = new string(test), b = new string(test);
        Console.WriteLine(ReferenceEquals(a, b)); // false
        StringInterner cache = new StringInterner();
        string c = cache[a], d = cache[b];
        Console.WriteLine(ReferenceEquals(c, d)); // true
    }
}

You could take this further with WeakReference if desired.

Note importantly that you don't need to change your design - you just change the code that populates the object to use the interner/cache.

Marc Gravell
+1 Nice example :)
Andrew Hare
thanks. I guess I need to determine whether the CLR is already doing this for me.
Chris Herring
The docs specify that the memory allocated for CLR interned strings are not likely to be releeased until the CLR itself terminates. Is this why you prefer the custom interner?
Chris Herring
Correct; by having my own cache I control the life-time. If your strings are related to domain-specific data (perhaps from a db) this can be important. And clearing it is as simple as just letting the cache get collected (the existing string uses won't care).
Marc Gravell
Note that the CLR will only (by default) intern strings in a few specific scenarios; string literals from your source code are interned, for example.
Marc Gravell
+4  A: 

Have you actually determined whether or not this is actually a problem? The CLR does a lot of string interning on your behalf so it is possible (depending on CLR version and how your code was compiled) that you are not using as much memory as you think you are.

I would highly recommend that you validate your suspicions about memory utilization before you change your design.

Andrew Hare
Thanks, I didn't know that. The beginning example is a simplification of an already existing design which does take up a lot of memory which I assumed the numerous strings to be contributing to. Hard to pin down exactly where the memory is going...
Chris Herring
A: 

As others have mentioned you first want to see if this is a problem before changing your design etc. If it is a problem and / or you dealing with a lot of sparse data then a sparse data structure might be more applicable for the problem. I will post a very simple naive implementation later (as I am unable to at the moment) but a two dimensional sparse matrix would do what you desire.

The concept is that a single format (either string or class) will be stored for a given cell range e.g. 1-1000. To benefit from this you have to make some design changes... The format property will need to be removed from the cell class. Instead the format should be registered either in the table class or preferably with another class. For example

public class CellFormats 
{ ....
public void Register(int start, int finish, string format);
}

The cell formats class will contain the sparse matrix which will contain a format for a range.

The Table class will then utilise the CellFormats class. Instead of having a method called GetCellFormat it would have a method with the following signature

void ApplyCellFormat(Position cellPosition)

This would retrieve the cell format from the CellFormats class (sparse matrix) and apply it to the cell.

This technique can dramatically reduce memory usage if the data is common and applied to large ranges. But as I have already said, you need to ensure this is cause of the problem before making your design more complex by adding code like this.

Ian Gibson