views:

248

answers:

5
+2  Q: 

2D Data Structure

I've given this a lot of thought but haven't really been able to come up with something.

Suppose I want a m X n collection of elements sortable by any column and any row in under O(m*n), and also the ability to insert or delete a row in O(m+n) or less... is it possible?

What I've come up with is a linked-grid, where the nodes are inserted into a vector so I have indices for them, and indexed the first row and column to remove the necessity to traverse the list in any one direction. with my method I've achieved the above complexity, but I was just wondering if it is possible to reduce that further by a non-constant factor.

Example for sortability:

1 100 25 34
2 20  15 16
3 165 1  27

Sorted by 3rd row:

25 1 34 100
15 2 16 20
 1 3 27 165

Sorting THAT by 1st column:

 1 3 27 165
15 2 16 20
25 1 34 100
A: 

You can use a hash table and insert (i,j) -> node where (i,j) is a 2-tuple containing 2 integers. You can write your own custom class which defines Equals method and a GetHash() method for that ... or Python gives it to you free of charge.

Now ... what exactly do you mean - sortable by a row or a column? Give an example with values please!

Hamish Grubijan
I thought of a hash table, but then the sorting becomes very troublesome.
Vanwaril
Nope, not very troublesome at all, but it will take O(m*n).
Hamish Grubijan
You need to look at the nth row or column, extract it as a list, sort it, deduce the permutation list e.g. (1,2,5,4,3,6) showing where the indices should go, an then apply this order to ALL elements inside the dictionary. This is a fun problem, and Python implementation can be quite concise.
Hamish Grubijan
Well, what you're saying then is to use a hash table instead of my vector, but that makes no difference, because the vector is still amortized constant time insertion. the node structure will still have to have 4 pointers, and the sorting and insertion logic will still have to deal with those.
Vanwaril
Yes. The way I'm doing it is through the indexing vectors -- I'm sorting the index based on the values of the permutation list. My implementation is in C++ and pretty long though, heh :)
Vanwaril
I am not a C++ guru. My feeling is that sometimes the complexity of C++ gets in the way. I like simpler implementations because they are less prone to bugs. Python + dict allows me to do exactly that, IMO. Also, how fast is your insertion / deletion of a row? Finally, if you are using c++, then add it as a tag.
Hamish Grubijan
Insertion and deletion of a row in O(m+n). It goes across the header to the row/column number and inserts, changing links on either side. Similarly for delete, connecting links.
Vanwaril
A: 

Perhaps by creating a small database for it?

Databases sorting algorithms probably are better than reinventing the wheel. MySql would do. In order to gain performance, table can be created in memory. Then you can index on columns as a usual table, and let the database engine do the dirty job (ordering and such). And then you just harvest the results.

oscar
+3  A: 

I would create two index arrays, one for the columns, and one for the rows. So for your data

1 100 25 34
2 20  15 16
3 165 1  27   

You create two arrays:

  • cols = [0, 1, 2, 3]
  • rows = [0, 1, 2]

Then when you want to sort the matrix by the 3rd row, you keep the original matrix intact, but just change the indices array accordingly:

  • cols = [2, 0, 3, 1]
  • rows = [0, 1, 2]

The trick now is to access your matrix with one indirection. So instead of accessing it with m[x][y] you access it by m[cols[x]][rows[y]]. You also have to use m[cols[x]][rows[y]] when you perform the reordering of the rows/cols array.

This way sorting is O(n*log(n)), and access is O(1).

For the data structure, I would use an array with links to another array:

+-+
|0| -> [0 1 2 3 4]
|1| -> [0 1 2 3 4]
|2| -> [0 1 2 3 4]
+-+

To insert a row, just insert it at the last position and update the the rows index array accordingly, with the correct position. E.g. when rows was [0, 1, 2] and you want to insert it at the front, rows will become [3, 0, 1, 2]. This way insertion of a row is O(n).

To insert a column, you also add it as the last element, and update cols accordingly. Inserting a column is O(m), row is O(n).

Deletion is also O(n) or O(m), here you just replace the column/row you want to delete with the last one, and then remove the index from the index array.

martinus
But then insertion and deletion of an element are O(m*n), and of a row, O(m^2*n)..
Vanwaril
Hi Vanwaril, I have update the entry, I think insertion and deletion can also be done in O(n) or O(m).
martinus
I never thought of swapping for delete... thanks!
Vanwaril
A: 

If I were handed this problem, I'd create row and column remapping vectors. E.G. to sort rows, I'd determine row order as normal, but instead of copying rows, I'd just change the row remapping vector.

It would look something like this:

// These need to be set up elsewhere.
size_t nRows, nCols;
std::vector<T> data;

// Remapping vectors.  Initially a straight-through mapping.
std::vector<size_t> rowMapping(nRows), colMapping(nCols);
for(size_t y = 0; y < nRows; ++y)
    rowMapping[y] = y;
for(size_t x = 0; x < nCols; ++x)
    colMapping[x] = x;

// Then you read data(row, col) with
T value = data[rowMapping[row] * nCols + colMapping[col]];

P.S. a small optimization would be to store pointers in rowMapping instead of indices. This would let you do T value = rowMapping[row][colMapping[col]];, however, you would have to recalculate the pointers every time that the dimensions of data changes, which could be error-prone.

Mike DeSimone
Again, the problem with this is though access and sorting are fast, insertion and deletion are not.
Vanwaril
Insertion and deletion are O(n) if you pre-allocate rows and columns. Also, fast insertion and deletion was not specified as a requirement.
Mike DeSimone
+1  A: 

Just to add to martinus and Mike's answers: what you need is, in essence, pivoting, which is what they suggest and a very well known technique used in pretty much any numeric algorithm involving matrices. For example, you can run a quick search for "LU decomposition with partial pivoting" and "LU decomposition with full pivoting". The additional vectors that store the permutations are called the "pivots".

d.