ansaurus

Question

Best way to store a sparse matrix in .NET

Answer 1

+2 A:

I haven't used it, but NMath Matrix handles these (not free).

Also, the Extreme Optimization Numerical Libraries for .NET handle them (not free).

Here's a free one: the Math.NET Project (specifically the MathNet.Numerics.LinearAlgebra.Sparse namespace).

Mitch Wheat 2009-04-16 14:25:12

Answer 2

+2 A:

I guess a Dictionary<int, Dictionary<int, object>> (outer key = row, inner key = column) should suffice.
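A minimal sketch of the nested-dictionary idea. It assumes double values instead of object (which avoids boxing), and the class name NestedDictMatrix is hypothetical. Cells that were never set read as 0:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: outer key = row, inner key = column.
// Only cells that have been assigned are stored (assigning 0 still stores an entry).
public class NestedDictMatrix
{
    private readonly Dictionary<int, Dictionary<int, double>> _rows =
        new Dictionary<int, Dictionary<int, double>>();

    public double this[int row, int col]
    {
        get
        {
            Dictionary<int, double> cols;
            double value;
            // A missing row or column means the cell was never set: treat it as 0.
            if (_rows.TryGetValue(row, out cols) && cols.TryGetValue(col, out value))
                return value;
            return 0.0;
        }
        set
        {
            Dictionary<int, double> cols;
            if (!_rows.TryGetValue(row, out cols))
            {
                // First entry in this row: create its inner dictionary.
                cols = new Dictionary<int, double>();
                _rows[row] = cols;
            }
            cols[col] = value;
        }
    }

    // Number of explicitly stored cells.
    public int Count
    {
        get
        {
            int n = 0;
            foreach (var cols in _rows.Values) n += cols.Count;
            return n;
        }
    }
}
```

Usage: `var m = new NestedDictMatrix(); m[3, 4] = 2.5;` after which `m[3, 4]` is 2.5, `m[4, 3]` is 0 and `m.Count` is 1.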

SDX2000 2009-04-16 14:25:47

Answer 3

+1 A:

I think this could be done with a class that holds a plain array, stores the horizontal offset applied between consecutive matrix rows, and defines the stripe width of a row, i.e. the number of valid entries per row. So for a large matrix where only the diagonal and its two neighboring elements are defined, you'd create an array of 3 * number of rows and store 3 as the stripe width. The offset depends on the size of the matrix.

I'm not aware of anything free which already does this.
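A minimal sketch of that plain-array layout, assuming a square matrix whose band shifts one column to the right per row. The class BandedMatrix and the names Lower (number of sub-diagonals stored) and StripeWidth are hypothetical. For the tridiagonal case above, Lower = 1 and StripeWidth = 3, so the backing array has 3 * rows entries:

```csharp
using System;

// Hypothetical sketch: row i stores StripeWidth entries starting at column i - Lower.
public class BandedMatrix
{
    private readonly double[] _data;
    public int Rows { get; }
    public int Lower { get; }        // number of sub-diagonals stored
    public int StripeWidth { get; }  // entries stored per row

    public BandedMatrix(int rows, int lower, int stripeWidth)
    {
        Rows = rows;
        Lower = lower;
        StripeWidth = stripeWidth;
        _data = new double[rows * stripeWidth]; // e.g. 3 * rows for a tridiagonal matrix
    }

    // Position of (row, col) inside the row's stripe, or -1 if it lies outside the band.
    private int Slot(int row, int col)
    {
        if (row < 0 || row >= Rows) throw new ArgumentOutOfRangeException(nameof(row));
        if (col < 0 || col >= Rows) return -1;        // outside the square matrix
        int k = col - row + Lower;                     // offset within the stripe
        return (k >= 0 && k < StripeWidth) ? k : -1;
    }

    public double this[int row, int col]
    {
        get
        {
            int k = Slot(row, col);
            return k < 0 ? 0.0 : _data[row * StripeWidth + k]; // outside the band reads as 0
        }
        set
        {
            int k = Slot(row, col);
            if (k < 0)
                throw new ArgumentOutOfRangeException(nameof(col), "Cell lies outside the stored band.");
            _data[row * StripeWidth + k] = value;
        }
    }
}
```

A few slots are wasted (for example the one before column 0 in row 0), which is the price of a fixed stripe width per row.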

__grover 2009-04-16 14:30:01

Good idea. I might implement it like this: assuming the matrix holds no negative values, we could use a negative number -k to stand for a run of k zero entries between stored entries. So the following...

[1,2,-30,0,1,2,-29]

expands into

[1,2,0,0...]
[0,1,2,0...]

To offset, array[n*row + column] is element (row, column) of an m x n matrix (m rows, n columns) stored row by row.
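The run-length scheme in the comment can be sketched as follows. The class ZeroRunCodec and its method names are hypothetical, and like the comment it assumes the real data contains no negative numbers. In the example above each row has 32 columns (2 values plus 30 zeros, or 3 values plus 29 zeros):

```csharp
using System.Collections.Generic;

// Hypothetical sketch: a negative value -k encodes k consecutive zeros;
// any other value (including an explicit 0) is copied through unchanged.
public static class ZeroRunCodec
{
    public static int[] Expand(IEnumerable<int> encoded)
    {
        var result = new List<int>();
        foreach (int v in encoded)
        {
            if (v < 0)
            {
                for (int i = 0; i < -v; i++) result.Add(0); // run of -v zeros
            }
            else
            {
                result.Add(v);
            }
        }
        return result.ToArray();
    }

    // Row-major lookup into the expanded array of a matrix with `columns` columns.
    public static int At(int[] expanded, int columns, int row, int col)
    {
        return expanded[row * columns + col];
    }
}
```

Usage: `ZeroRunCodec.Expand(new[] { 1, 2, -30, 0, 1, 2, -29 })` yields 64 values, and `At(expanded, 32, 1, 1)` reads the 1 on the second row's diagonal.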

Stefan Kendall 2009-04-16 14:40:39

Answer 4

A:

Here's a list of general data structure schemas. Each has its advantages and disadvantages, and each is suitable for slightly different kinds of problems where sparse matrices arise. You'd probably want to implement them on top of existing data structures, such as List<> and Dictionary<>.
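As one concrete example of such a schema, here is a sketch of compressed sparse row (CSR) storage, chosen here for illustration. The class CsrMatrix and its field names are hypothetical, and it is built from a dense array only to keep the demo short. Row r's non-zeros are Values[RowStart[r]] through Values[RowStart[r + 1] - 1], with their column numbers in ColIndex:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical CSR sketch: three flat arrays instead of a dictionary.
public class CsrMatrix
{
    public readonly int Rows, Cols;
    public readonly double[] Values;   // non-zero values, row by row
    public readonly int[] ColIndex;    // column of each value
    public readonly int[] RowStart;    // length Rows + 1; row r spans [RowStart[r], RowStart[r + 1])

    // Builds CSR from a dense array, skipping zeros.
    public CsrMatrix(double[,] dense)
    {
        Rows = dense.GetLength(0);
        Cols = dense.GetLength(1);
        var values = new List<double>();
        var cols = new List<int>();
        RowStart = new int[Rows + 1];
        for (int r = 0; r < Rows; r++)
        {
            RowStart[r] = values.Count;
            for (int c = 0; c < Cols; c++)
            {
                if (dense[r, c] != 0.0)
                {
                    values.Add(dense[r, c]);
                    cols.Add(c);
                }
            }
        }
        RowStart[Rows] = values.Count;
        Values = values.ToArray();
        ColIndex = cols.ToArray();
    }

    // y = A * x, touching only the stored non-zeros.
    public double[] Multiply(double[] x)
    {
        if (x.Length != Cols) throw new ArgumentException("Dimension mismatch.", nameof(x));
        var y = new double[Rows];
        for (int r = 0; r < Rows; r++)
            for (int k = RowStart[r]; k < RowStart[r + 1]; k++)
                y[r] += Values[k] * x[ColIndex[k]];
        return y;
    }
}
```

CSR gives fast row traversal and matrix*vector products, but inserting a new non-zero means shifting the arrays, so it suits matrices that are built once and then used many times.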

Pontus Gagge 2009-04-16 14:33:47

Answer 5

+1 A:

There are two questions here:

1. "Mostly around the main diagonal" is too vague. If the elements lie in bands, then use banded storage of the bands themselves, as vectors offset from the main diagonal. If the elements are scattered randomly in the vicinity of the main diagonal, then either use a banded form that may include some zeros in the bands, or use a pure sparse form that stores only the elements and their positions in the array.
2. What will you do with the matrix? If your goal is merely efficient storage, then a banded form will be efficient, with fast access to any element. If you will do linear algebra with the matrix, but never more than matrix*vector multiplies, then the banded form will still work splendidly. If you work with matrix*matrix multiplies or matrix factorizations, where fill-in becomes a problem, then a pure sparse form may be more appropriate. For example, the product of two banded matrices will have additional bands, so the product of two tridiagonal matrices will be pentadiagonal. For a factorization, reorderings will sometimes be useful to minimize fill-in. (AMD, the Approximate Minimum Degree permutation, is one choice, but there are other schemes.)
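A minimal sketch of banded storage with each band kept as a vector offset from the main diagonal (often called diagonal or DIA storage). The class DiagonalBandMatrix, the use of double, and the helper ProductBandOffsets are assumptions for illustration. The helper only tracks which bands can be non-zero in a product, which is where the tridiagonal-times-tridiagonal-is-pentadiagonal observation comes from:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: band d holds A[i, i + d]; d = 0 is the main diagonal,
// d < 0 lies below it, d > 0 above it. Entry k of a band is indexed by min(row, col).
public class DiagonalBandMatrix
{
    public int N { get; }
    private readonly Dictionary<int, double[]> _bands = new Dictionary<int, double[]>();

    public DiagonalBandMatrix(int n) { N = n; }

    // Band d of an N x N matrix has N - |d| entries.
    public void SetBand(int d, double[] values)
    {
        if (values.Length != N - Math.Abs(d))
            throw new ArgumentException("Band " + d + " needs " + (N - Math.Abs(d)) + " entries.");
        _bands[d] = values;
    }

    public double this[int row, int col]
    {
        get
        {
            double[] band;
            // Cells on bands that were never stored are zero.
            return _bands.TryGetValue(col - row, out band) ? band[Math.Min(row, col)] : 0.0;
        }
    }

    // y = A * x, one pass per stored band.
    public double[] Multiply(double[] x)
    {
        var y = new double[N];
        foreach (var kv in _bands)
        {
            int d = kv.Key;
            double[] band = kv.Value;
            for (int k = 0; k < band.Length; k++)
            {
                int row = d >= 0 ? k : k - d; // below the diagonal, rows start at |d|
                y[row] += band[k] * x[row + d];
            }
        }
        return y;
    }

    // Bands that can be non-zero in A * B: A[i, i + a] * B[i + a, i + a + b] lands on band a + b.
    public static SortedSet<int> ProductBandOffsets(IEnumerable<int> a, IEnumerable<int> b)
    {
        var result = new SortedSet<int>();
        foreach (int da in a)
            foreach (int db in b)
                result.Add(da + db);
        return result;
    }
}
```

For two tridiagonal matrices, `ProductBandOffsets(new[] { -1, 0, 1 }, new[] { -1, 0, 1 })` gives the five offsets -2 through 2, i.e. a pentadiagonal product.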

woodchips 2009-04-16 14:48:18

Answer 6

+2 A:

You could use an index based on the [row,col] of the cell. Since the data is on a diagonal, the typical approach of storing the row index and the associated column indices with data is not optimal. Here is some code you could use to do it:

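    using System;
    using System.Collections.Generic;

    public class SparseMatrix<T>
    {
        public int Width { get; private set; }
        public int Height { get; private set; }
        public long Size { get; private set; }

        // Only assigned cells are stored, keyed by row * Width + col.
        private Dictionary<long, T> _cells = new Dictionary<long, T>();

        public SparseMatrix(int w, int h)
        {
            this.Width = w;
            this.Height = h;
            this.Size = (long)w * h; // widen first so large matrices don't overflow int
        }

        // Maps (row, col) to a single long key; throws if the cell is outside the matrix.
        private long GetIndex(int row, int col)
        {
            if (row < 0 || row >= Height) throw new ArgumentOutOfRangeException("row");
            if (col < 0 || col >= Width) throw new ArgumentOutOfRangeException("col");
            return (long)row * Width + col;
        }

        // True when nothing has been stored at (row, col).
        public bool IsCellEmpty(int row, int col)
        {
            return !_cells.ContainsKey(GetIndex(row, col));
        }

        public T this[int row, int col]
        {
            get
            {
                T result;
                // Unset cells come back as default(T): 0 for int, null for reference types.
                _cells.TryGetValue(GetIndex(row, col), out result);
                return result;
            }
            set
            {
                _cells[GetIndex(row, col)] = value;
            }
        }
    }

    class Program
    {
        static void Main()
        {
            var sm = new SparseMatrix<int>(512, 512);
            sm[42, 42] = 42;
            int val1 = sm[13, 13];
            int val2 = sm[42, 42];

            Console.WriteLine("VAL1 = " + val1); // prints out 0
            Console.WriteLine("VAL2 = " + val2); // prints out 42

            Console.ReadLine();
        }
    }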

Note that when T is a value type (a struct), reading a cell that was never set returns default(T), for example 0 for int, rather than null, so call IsCellEmpty to tell an unset cell from one explicitly set to that default. You can also extend the code with a quick "SparseRatio" computed from the Size property and _cells.Count.
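A trimmed-down sketch of both points, using a hypothetical class RatioSparseMatrix<T> so that it compiles on its own. "SparseRatio" is assumed here to mean the fraction of cells that are actually stored (Count / Size); the inverse reading (fraction of empty cells) would be 1 minus this value:

```csharp
using System.Collections.Generic;

// Hypothetical minimal matrix illustrating IsCellEmpty versus default(T), plus a SparseRatio.
public class RatioSparseMatrix<T>
{
    private readonly Dictionary<long, T> _cells = new Dictionary<long, T>();
    public int Width { get; }
    public int Height { get; }
    public long Size => (long)Width * Height;

    public RatioSparseMatrix(int width, int height) { Width = width; Height = height; }

    private long Index(int row, int col) => (long)row * Width + col;

    public T this[int row, int col]
    {
        get { T v; _cells.TryGetValue(Index(row, col), out v); return v; } // default(T) if unset
        set { _cells[Index(row, col)] = value; }
    }

    // True when nothing was ever stored at (row, col), even though the getter returns default(T).
    public bool IsCellEmpty(int row, int col) => !_cells.ContainsKey(Index(row, col));

    // Assumed meaning: fraction of cells that are explicitly stored.
    public double SparseRatio => (double)_cells.Count / Size;
}
```

With a 10 x 10 matrix of int where only [1, 1] has been set to 0, both [1, 1] and [2, 2] read as 0, but only [2, 2] reports IsCellEmpty, and SparseRatio is 0.01.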

EDIT:

Well, if what you are interested in is speed, you can trade space for speed. Instead of having only one dictionary, have three! It roughly triples your space, but it makes enumerating in any way you want really easy. Here is some new code that shows that:

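    using System;
    using System.Collections.Generic;

    public class SparseMatrix<T>
    {
        public int Width { get; private set; }
        public int Height { get; private set; }
        public long MaxSize { get; private set; }
        public long Count { get { return _cells.Count; } }

        // Every stored cell, keyed by row * Width + col.
        private Dictionary<long, T> _cells = new Dictionary<long, T>();

        // The same values indexed two more ways: _rows[row][col] and _columns[col][row].
        private Dictionary<int, Dictionary<int, T>> _rows =
            new Dictionary<int, Dictionary<int, T>>();

        private Dictionary<int, Dictionary<int, T>> _columns =
            new Dictionary<int, Dictionary<int, T>>();

        public SparseMatrix(int w, int h)
        {
            this.Width = w;
            this.Height = h;
            this.MaxSize = (long)w * h; // widen before multiplying to avoid int overflow
        }

        // Maps (row, col) to a single long key; throws if the cell is outside the matrix.
        private long GetIndex(int row, int col)
        {
            if (row < 0 || row >= Height) throw new ArgumentOutOfRangeException("row");
            if (col < 0 || col >= Width) throw new ArgumentOutOfRangeException("col");
            return (long)row * Width + col;
        }

        // True when nothing has been stored at (row, col).
        public bool IsCellEmpty(int row, int col)
        {
            return !_cells.ContainsKey(GetIndex(row, col));
        }

        public T this[int row, int col]
        {
            get
            {
                T result;
                _cells.TryGetValue(GetIndex(row, col), out result);
                return result;
            }
            set
            {
                _cells[GetIndex(row, col)] = value;

                UpdateValue(col, row, _columns, value);
                UpdateValue(row, col, _rows, value);
            }
        }

        // Stored entries of one row, keyed by column (empty if the row has none).
        public IEnumerable<KeyValuePair<int, T>> GetRow(int row)
        {
            Dictionary<int, T> dict;
            return _rows.TryGetValue(row, out dict) ? dict : new Dictionary<int, T>();
        }

        // Stored entries of one column, keyed by row (empty if the column has none).
        public IEnumerable<KeyValuePair<int, T>> GetColumn(int col)
        {
            Dictionary<int, T> dict;
            return _columns.TryGetValue(col, out dict) ? dict : new Dictionary<int, T>();
        }

        // Stores value at parent[index1][index2], creating the inner dictionary on first use.
        private void UpdateValue(int index1, int index2,
            Dictionary<int, Dictionary<int, T>> parent, T value)
        {
            Dictionary<int, T> dict;
            if (!parent.TryGetValue(index1, out dict))
            {
                parent[index1] = dict = new Dictionary<int, T>();
            }
            dict[index2] = value;
        }
    }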

If you want to iterate over all the entries, use _cells. If you want all the entries in a given column, use _columns. If you want all the entries in a given row, use _rows.

If you want to iterate in sorted order, you can start to add LINQ into the mix and/or use a sorted list with an inner class that encapsulates an entry (which would have to store the row or column and implement IComparable<T> for sorting to work).
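A minimal sketch of the LINQ route, plus a SortedDictionary<TKey, TValue> alternative that is swapped in here instead of the sorted list with an IComparable<T> entry class. The class SortedAccess and its method names are hypothetical; the row dictionary has the same shape as _rows above:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers for sorted iteration over a Dictionary<int, Dictionary<int, T>>
// keyed by row index, then column index.
public static class SortedAccess
{
    // Entries of `row` as (column, value) pairs in ascending column order;
    // empty if the row has no stored entries. Sorts on every call.
    public static List<KeyValuePair<int, T>> RowInColumnOrder<T>(
        Dictionary<int, Dictionary<int, T>> rows, int row)
    {
        Dictionary<int, T> cols;
        if (!rows.TryGetValue(row, out cols))
            return new List<KeyValuePair<int, T>>();
        return cols.OrderBy(kv => kv.Key).ToList();
    }

    // Alternative: copy into a SortedDictionary, which keeps keys ordered
    // as entries are added (O(log n) insert and lookup) instead of sorting on each read.
    public static SortedDictionary<int, T> ToSorted<T>(Dictionary<int, T> cols)
    {
        return new SortedDictionary<int, T>(cols);
    }
}
```

If sorted traversal is the common case, declaring the inner dictionaries as SortedDictionary from the start avoids the copy entirely, at the cost of slower lookups than a hash-based Dictionary.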

Erich Mirabal 2009-04-16 15:04:19

Thank you, I like where you are going with this. Using dictionaries doesn't give me efficient access to entire rows or columns, does it? (Maybe it does using LINQ...?) See my edit above.

Jeffrey Cameron 2009-04-16 15:52:54

See the update for another option. If space is not an issue, trade space for faster access by having multiple dictionaries.

Erich Mirabal 2009-04-16 18:28:59

Excellent suggestions, thank you very much

Jeffrey Cameron 2009-04-17 13:08:10
