views:

236

answers:

6

I'm writing a game that has a huge 2D array of "cells". A cell takes only 3 bytes. I also have a class called CellMap, which contains the 2D array as a private field, and provides access to it via a public indexer.

Profiling showed that a performance problem is caused by garbage collection of too many Cell objects. So I decided to make Cell a struct (it was a class).

But now code like this doesn't work:

cellMap[x, y].Population++;

I can think of many options, but I don't really like any of them.

  1. Make the array public, and write cellMap.Data[x, y].Population = 5;
  2. Stop using a CellMap class, and just use a 2D array directly. But CellMap is very convenient because it implements its own optimized serialization, and it exposes Width and Height properties that are more convenient than writing cellMap.GetUpperBound(0)+1
  3. Make Cell immutable. But then how would the code look? cellMap[x, y] = IncrementCellPopulation(cellMap[x, y])? Very verbose.
  4. A couple of utility functions like cellMap.SetPopulationAt(x, y, 5)
  5. In every class that owns a CellMap, add a utility property like private Cell[,] CellData { get { return this.CellMap.GetInternalArray(); } }, so then my code can look like CellData[x, y].Population++

How is this problem traditionally solved?

+4  A: 

6 . Use a ref parameter in a method that mutates the value, call it as IncrementCellPopulation(ref cellMap[x, y])

leppie
That's possible? Wow, I could not have told you that! (Am I correct in saying this only works for elements in arrays, and not in e.g. `List<T>` or other collection types?)
Dan Tao
@Dan Tao: Just works on arrays and fields.
leppie
Actually, now I am not so sure about multidimensional arrays...
leppie
@Dan or leppie: Could either of you post a link to a reference on this. Not doubting you, just want to understand all of the restrictions.
Steven Sudit
Works on multidimensional arrays too :)
leppie
@Steven: "ref" works on *variables* because *ref makes an alias to a variable*. Variables can *vary*, that is why they are called *variables*, so to change an immutable struct, you have to change the value in a variable. (Integers are immutable, but you can change which integer is in a variable.) So now the question is "what is a variable?" locals, fields, array elements and pointer dereferences are variables. Properties and indexers are not; those are actually method calls to getters and setters, not variable accesses.
Eric Lippert
@Steven Sudit: I tried to look for it, but cannot find the reference in the C# spec now. Edit: I might have picked this up from CLI reference and the fact that there are MSIL instructions for it.
leppie
@leppie: section 1.6.6.1: "The argument passed for a reference parameter must be a variable", section 1.3: "There are several kinds of variables in C#, including fields, array elements, local variables, and parameters."
Eric Lippert
@Eric: Thanks for the clarification. I do understand that passing a reference to a value type does allow us to change it, but I'm unclear on the restrictions implied by Dan and leppie. For example, I'm not sure why it wouldn't work on a multidimensional array. That's why I asked about a reference.
Steven Sudit
@Steven: *multidimensional array elements* are *array elements*. Array elements are *variables*. I note that it is not as *cheap* to take the managed address of a multidimensional array element as it is to take the managed address of a single dimensional array element, but you can, and the compiler will generate code to do so on your behalf.
Eric Lippert
@Eric: Ok, that makes more sense. I was thinking that, if it works on single-dimensional array, it has to work for multidimensional. Thanks for confirming this and thereby correcting my initial confusion. My equilibrium is restored.
Steven Sudit
+1  A: 

Encapsulate what you want the CellMap to do, and allow access to the actual array only through appropriate methods like IncrementPopupation(int x, int y). Making an array (or any variable, for that matter) public in most cases is a serious code smell, as is returning an array in .NET.

For performance reasons, consider using a single dimensional array; those are way faster in .NET.

lbruder
If the performance problem is the garbage collector then changing the array layout is unlikely to help.
Eric Lippert
+9  A: 

If you want to make Cell immutable - as you should if it is a struct - then a good technique is to make a factory that is an instance method on the Cell:

struct C
{
    public int Foo { get; private set; }
    public int Bar { get; private set; }
    private C (int foo, int bar) : this()
    {
        this.Foo = foo;
        this.Bar = bar;
    }
    public static C Empty = default(C);
    public C WithFoo(int foo)
    {
        return new C(foo, this.Bar);
    }
    public C WithBar(int bar)
    {
        return new C(this.Foo, bar);
    }
    public C IncrementFoo()
    {
        return new C(this.Foo + 1, bar);
    }
    // etc
}
...
C c = C.Empty;
c = c.WithFoo(10);
c = c.WithBar(20);
c = c.IncrementFoo();
// c is now 11, 20

So your code would be something like

map[x,y] = map[x,y].IncrementPopulation();

However, I think this is possibly a blind alley; it might be better to simply not have so many Cells around in the first place, rather than trying to optimize a world where there are thousands of them. I'll write up another answer on that.

Eric Lippert
In addition to the efficiency concern you mention, I find this approach a bit nonintuitive compared to the approach used in your other answer. I feel more comfortable with a mental model of a `Cell` as basically an immutable pair of coordinates pointing to `Population` / `Color` data, rather than as an immutable container which holds this data. Both mental models are valid, but for some reason I like the first one better.
Brian
+17  A: 

So there are actually two problems here. There's the question you actually asked: what are techniques to deal with the fact that structs ought to be immutable because they are copied by value, but you want to mutate one. And then there's the question which is motivating this one, which is "how can I make the performance of my program acceptable?"

My other answer addresses the first question, but the second question is interesting as well.

First off, if the profiler has actually identified that the performance problem is due to garbage collection of cells, then it is possible that making cell into a struct will help. It is also possible that it will not help at all, and it is possible that doing so will make it worse.

Your cells do not contain any reference types; we know this because you've said they are only three bytes. If someone else reading this is thinking that they could make a performance optimization by turning a class into a struct then it might not help at all because the class might contain a field of reference type, in which case the garbage collector still has to collect every instance, even if it is turned into a value type. The reference types in it need to be collected too! I would only recommend attempting this for performance reasons if Cell contains only value types, which apparently it does.

It might make it worse because value types are not a panacea; they have costs too. Value types are often more expensive to copy than reference types (which are pretty much always the size of a register, almost always aligned on the appropriate memory boundary, and therefore the chip is highly optimized for copying them). And value types are copied all the time.

Now, in your case you have a struct which is smaller than a reference; references are four or eight bytes typically. And you're putting them in an array, which means that you are packing the array down; if you have a thousand of them, it'll take three thousand bytes. Which means that three out of every four structs in there are misaligned, meaning more time (on many chip architectures) to get the value out of the array. You might consider measuring the impact of padding your struct out to four bytes to see if that makes a difference, provided you're still going to keep them in an array, which brings me to my next point...

The Cell abstraction might simply be a bad abstraction for the purpose of storing data about lots of cells. If the problem is that Cells are classes, you're keeping an array of thousands of Cells, and collecting them is expensive, then there are solutions other than making Cell into a struct. Suppose for example that a Cell contains two bytes of Population and one byte of Color. That is the mechanism of Cell, but surely that is not the interface you want to expose to the users. There is no reason why your mechanism has to use the same type as the interface. And therefore you could manufacture instances of the Cell class on demand:

interface ICell
{
   public int Population { get; set; }
   public Color Color { get; set; }
}
private class CellMap
{
    private ushort[,] populationData; // Profile the memory burden vs speed cost of ushort vs int
    private byte[,] colorData; // Same here. 
    public ICell this[int x, int y] 
    {
        get { return new Cell(this, x, y); }
    }

    private sealed class Cell : ICell
    {
        private CellMap map;
        private int x;
        private int y;
        public Cell(CellMap map, int x, int y)
        {
            this.map = map; // etc
        }
        public int Population  
        {
            get { return this.map.populationData[this.x, this.y]; } 
            set { this.map.populationData[this.x, this.y] = (ushort) value; } 
        }

and so on. Manufacture the cells on demand. They will almost immediately be collected if they are short-lived. CellMap is an abstraction, so use the abstraction to hide the messy implementation details.

With this architecture you don't have any garbage collection problems because you have almost no live Cell instances, but you can still say

map[x,y].Population++;

no problem, because the first indexer manufactures an immutable object which knows how to update the state of the map. The Cell doesn't need to be mutable; notice that the Cell class is completely immutable. (Heck, the Cell could be a struct here, though of course casting it to ICell would just box it anyway.) It is the map which is mutable, and the cell mutates the map for the user.

Eric Lippert
+1 Excellent solution; I implemented almost the exact same idiom with triangle meshes and it worked great.
Ron Warholic
+1 that definitely the right answer !
digEmAll
+1, solve the real underlying problem
jvenema
+2  A: 

If your cell map is in fact "sparse", that is, if there are a lot of adjacent cells that have either no value or some default value, I would suggest that you do not create a cell object for these. Only create objects for cells that actually have some non-default state. (This might reduce the total number of cells by a significant amount, thereby taking pressure off the garbage collector.)

This approach would of course require you to find a new way to store your cell map. You would have to get away from storing your cells in an array (since they are not sparse), and embrace a different kind of data structure, probably a tree.

For example, you could subdivide your map into a number of uniform regions such that you can translate any cell coordinates into a corresponding region. (You could further subdivide each region into sub-regions according to the same idea.) You could then have a search tree per region where the cell coordinates act as key into the tree.

Such a scheme would allow you to only store the cells you need, while still offering fast access to any cell in your map. If no cell at some specified coordinates is found in the trees, it can be assumed that it's a default cell.

stakx
This is a great idea. If the map is square then an immutable quad tree is an excellent data structure for this sort of thing. I have been planning a series of blog posts on using immutable quad trees for efficiently storing data where big portions of it are empty, but I haven't gotten around to it yet.
Eric Lippert
A: 

Eric Lippert's approach is good, but I would suggest using a base class rather than an interface for the indirect accessor. The following program demonstrates a class which acts like a sparse array of points. Provided that one never persists any item of type PointRef(*), things should work beautifully. Saying:

  MyPointHolder(123) = somePoint

or

  MyPointHolder(123).thePoint = somePoint

will both create a temporary pointRef object (a pointRef.onePoint in one case; a pointHolder.IndexedPointRef in the other) but the widening typecasts work to maintain value semantics. Of course, things would have been much easier if (1) methods on value types could be marked as mutators, and (2) writing a field of a structure accessed via property would could automatically read the property, edit the temporary structure, and write it back. The approach used here works, though alas I don't know any way to make it generic.

(*) Items of type PointRef should only be returned by properties, and should never be stored in a variable or used as parameters to anything other than a setter property which will convert to a Point.

MustInherit Class PointRef
    Public MustOverride Property thePoint() As Point
    Public Property X() As Integer
        Get
            Return thePoint.X
        End Get
        Set(ByVal value As Integer)
            Dim mypoint As Point = thePoint
            mypoint.X = value
            thePoint = mypoint
        End Set
    End Property
    Public Property Y() As Integer
        Get
            Return thePoint.X
        End Get
        Set(ByVal value As Integer)
            Dim mypoint As Point = thePoint
            mypoint.Y = value
            thePoint = mypoint
        End Set
    End Property
    Public Shared Widening Operator CType(ByVal val As Point) As PointRef
        Return New onePoint(val)
    End Operator
    Public Shared Widening Operator CType(ByVal val As PointRef) As Point
        Return val.thePoint
    End Operator
    Private Class onePoint
        Inherits PointRef

        Dim myPoint As Point

        Sub New(ByVal pt As Point)
            myPoint = pt
        End Sub

        Public Overrides Property thePoint() As System.Drawing.Point
            Get
                Return myPoint
            End Get
            Set(ByVal value As System.Drawing.Point)
                myPoint = value
            End Set
        End Property
    End Class
End Class


Class pointHolder
    Dim myPoints As New Dictionary(Of Integer, Point)
    Private Class IndexedPointRef
        Inherits PointRef

        Dim ref As pointHolder
        Dim index As Integer
        Sub New(ByVal ref As pointHolder, ByVal index As Integer)
            Me.ref = ref
            Me.index = index
        End Sub
        Public Overrides Property thePoint() As System.Drawing.Point
            Get
                Dim mypoint As New Point(0, 0)
                ref.myPoints.TryGetValue(index, mypoint)
                Return mypoint
            End Get
            Set(ByVal value As System.Drawing.Point)
                ref.myPoints(index) = value
            End Set
        End Property
    End Class

    Default Public Property item(ByVal index As Integer) As PointRef
        Get
            Return New IndexedPointRef(Me, index)
        End Get
        Set(ByVal value As PointRef)
            myPoints(index) = value.thePoint
        End Set
    End Property

    Shared Sub test()
        Dim theH1, theH2 As New pointHolder
        theH1(5).X = 9
        theH1(9).Y = 20
        theH2(12).X = theH1(9).Y
        theH1(20) = theH2(12)
        theH2(12).Y = 6
        Dim h5, h9, h12, h20 As Point
        h5 = theH1(5)
        h9 = theH1(9)
        h12 = theH2(12)
        h20 = theH1(20)
    End Sub
End Class
supercat