ansaurus

Question

How do I solve this sparse matrix storage problem in C?

Answer 1

+2 A:

An alternate approach is to use a linked structure (very efficient if the matrix very sparse, not so good as it gets more filled). I hinted at the implementation in a earlier answer.

I you are going to go with the continuous run implementation, I'm not sure that you really want/need to use equal lengths rows. Why not use a ragged array?

dmckee 2010-08-12 18:49:12

I think I dont want to use a ragged array is because I want to create A with a single malloc. I will eventually be doing a sparse vector/matrix multiply and I will need it to run as efficiently as possible

Derek 2010-08-12 18:50:45

In that case, your problem would appear to be one of determining the maximum row width before you can allocate the whole structure. I think that means you're going to have to make two passes over the data file.

dmckee 2010-08-12 18:59:36

I updated with a bit of a graphical picture to help understand what I am trying to accomplish

Derek 2010-08-12 19:12:00

What you wanted was quite clear from the beginning. But you need some info you don't have (or at least haven't told us about) in order to make the single malloc that you want. To wit: you need to know the maximum length-of-the-represented row (and the number of rows, though I've been assuming that you knew that). This means two passes over your data.

dmckee 2010-08-12 19:32:15

Answer 2

+3 A:

I would implement this as a ragged array, with A[n][0] always returning the element on the diagonal. A[n][1] will return the item just to the right of the diagonal, A[n][2] will return the item to the left of the diagonal, and so. Then, you just need a function that maps matrix index [i,j] to ragged array index[r][s].

This has the advantage of sparsity, and if your values stay close to the diagonal the arrays are not very long.

Alternatively, you could have this definition:

struct Row
{
    int InitialOffset;
    int NumElements;
    int[] Values;
}

Then you would have a Row[]. Retrieving a value based on matrix index would look like this:

//matrix is merely an array of rows...
int GetValue(*matrix this, int i, int j)
{
    Row CurrentRow = (*this)[i];
    if (CurrentRow.InitialOffset > j)
        return 0;
    else if (CurrentRow.InitialOffset + CurrentRow.NumElements < j)
        return 0; 
    return CurrentRow.Values[j - CurrentRow.InitialOffset]
}

My C syntax is a little hazy, but you should get the idea.

Based on your demonstration, I would recommend this:

struct Matrix
{
    int[,] Data
    int[] StartOffset;
    int[] NumberElements;
}

Used as follows...

int GetValue(*Matrix this, int i, int j)
{
    if (this.StartOffset[i] > j)
        return 0;
    else if (this.StartOffset[i] + this.NumberElements[i] < j)
        return 0; 
    return this.Data[i, j-this.StartOffset[i]];
}

Your initialization procedure would look something like this

//Data is a struct that holds row index, col index, and value
Matrix* InitMatrix (*Data values, int numVals)
{
    //loop through values to find longest row and number of rows
    //create new matrix, malloc matrix for longrow * numRows
    //malloc numrows elements for StartOffset and NumItems
    //foreach row, find min() and max()-min() of col indexs and 
    //store in StartOffset and NumItems
}

You need to do some processing, but data compression isn't cheap.

DonaldRay 2010-08-12 18:53:24

Cute indexing idea. Should be very helpful if the values hew close to the diagonal.

dmckee 2010-08-12 19:00:45

I had previously implemented something similar to this. However, since the element at A[n][0] is actually the same value as A[n+1][1] in my physical structure I am modelling, any update to A (as in parallel programming) was a pain, because I had to re-pack the entire matrix basically, or just re-build it completely.

Derek 2010-08-12 19:11:20

Eep. Should I clear out my previous suggestions or keep them for posterity? (see latest and greatest in #3, should do what your picture says)

DonaldRay 2010-08-12 19:16:48

@DonaldRay - but in the worst case you still need an array of size mxn even for very sparse matrices. Think of a matrix where only the first and last columns are populated. It is very empty, but you need to address all the elements to the left and right of the diagonal to get to the edges. Or did I misunderstand your suggestion?

ysap 2010-08-12 19:18:18

@Donald: I generally leave mine around. Mostly people won't down vote a good suggestion that has been rendered obsolete by new information.

dmckee 2010-08-12 19:18:47

So the problem with the suggestion you have is that I dont actually have a matrix to start from here. I am starting with a list of i,j coordinates, and the values corresponding to that i, j point. I want to end up essentially with a matrix that is fully-packed with all those zeroes, minus the leading and trailing zeros from the row.

Derek 2010-08-12 19:21:43

@ysap: that would be a worst case scenario for my first suggestion. I misread the original question; I interpreted "not a set distance from diagonal" as "close to the diagonal, but varyingly so". That case is still a weakness in the second suggestion, but is not present in the third.

DonaldRay 2010-08-12 19:23:23

Or to put it another way, I want to do a storage compression where I dont care about about any zeros except for the ones that are in between my non-zero values. I know those non-zero values will be relatively close to the diagonal, so there is no worst case close to what @ysap was suggesting

Derek 2010-08-12 19:23:47

@DonaldRay: he was right about worst case, but dont get hung up in that detail much..*most* of my cases are going to be somewhat near the diagonal, just doesnt have to be

Derek 2010-08-12 19:25:07

@Derek: updated answer to show how to initialize given a list of (i,j, data). You'll have to write the code yourself, but that should be a cakewalk :)

DonaldRay 2010-08-12 19:30:08

@DonaldRay I actually arlready have that exact code written. Your idea about suing the low values for the offset is the trick. It just hit me when re-reading the answer

Derek 2010-08-12 19:39:43

@DonaldRay - I still don't agree. Thinking of the MxN matrix I described above - your GetValue() implementation suggests that given a `StartOffset[i]` of 1, and assuming `j` is the required column (which for my example will be N), then your `.Data[]` array is still N elements long (`j-StartOffset[i]`).

ysap 2010-08-12 20:01:55

@Ysap: Read Derek's comments to this question. You're right, but the worst case just won't appear in the target domain

DonaldRay 2010-08-12 20:09:52

@DonaldRay - yes, I have read it. I just wanted to point out that your latest offer is no better than the previous one in terms of memory preservation. Having that said - it there is a priori information about a problem space, shortcuts and assumptions can always be made to ease a solution.

ysap 2010-08-12 20:13:00

@DonaldRay - While at it - I'm curious, how do you add the horizontal separator lines to your answer?

ysap 2010-08-12 20:14:10

@ysap: from http://stackoverflow.com/editing-help: Horizontal Rules Insert a horizontal rule <hr/> by putting three or more hyphens, asterisks, or underscores on a line by themselves.

DonaldRay 2010-08-13 12:53:03

Answer 3

+1 A:

Derek, you mentioned in one of the comments that you want to use a single malloc. That means that you know how many nonempty elements you have. Given this, tt is possible to store the sparse matrix in an array which holds, per element, the value of the matrix element and the "location delta" to the next element. Something like:

struct melem {
    int value; // value of data
    int offset; // offset to next element
}

struct melem matrix[num_nonempty_elements];

...

// Note: this is pseudocode!
matrix[row*COLS + col].value = a[row][col];
matrix[row*COLS + col].offset = (row*COLS + col)_[i] - (row*COLS + col)_[i-1];

EDIT: Thinking about it, this is pretty similar to the linked list approach, but requires 1 allocation. OTOH, it may require more calculation to access the required cell.

ysap 2010-08-12 19:32:16

ansaurus

tags:

views:

answers:

How do I solve this sparse matrix storage problem in C?

related questions