views:

519

answers:

13

This is a question asked to me by a very very famous MNC. The question is as follows ...

Input an 2D N*N array of 0's and 1's. If A(i,j) = 1, then all the values corresponding to the ith row and the jth column are going to be 1. If there is a 1 already, it remains as a 1.

As an example , if we have the array

  1 0 0 0 0 
  0 1 1 0 0 
  0 0 0 0 0 
  1 0 0 1 0
  0 0 0 0 0

we should get the output as

 1 1 1 1 1
 1 1 1 1 1
 1 1 1 1 0
 1 1 1 1 1
 1 1 1 1 0

The input matrix is sparsely populated.

Is this possible in less than O(N^2)?

No additional space is provided was another condition. I would like to know if there's a way to achieve the complexity using a space <= O(N).

P.S : I don't need answers that give me a complexity of O(N*N). This is not a homework problem. I have tried much and couldn't get a proper solution and thought I could get some ideas here.Leave the printing aside for the complexity

My rough idea was to may be dynamically eliminate the number of elements traversed restricting them to around 2N or so. But I couldn't get a proper idea.

+8  A: 

In the worst case, you may need to toggle N * N - N bits from 0 to 1 to generate the output. It would seem you're pretty well stuck with O(N*N).

kbrimington
but actually , the matrix is sparsely populated
ravi
@ravi: You are correct; however, a simple identity matrix, which is also sparse, causes N*N-N bits to be toggled to 1. Sparseness appears to offer no advantage in worst-case performance here.
kbrimington
@ravi: Which matrix is sparsely populated? The result matrix? Are you saying there's a limit on the number of 1s in the result?
David Thornley
@David Thornley: He's saying that the input matrix is sparse. The output matrix may or may not be sparse. As kbrimington points out, that seems to offer no advantage.
andand
@Tgr is right. Once you noticed that the result matrix as in intersted property which is : if a(i,j) = 0 and a(i',j') then a(i,j') and a(i',j) should be equal to 0. you just need to store the list of valid i and valid j for the negate output matrix. It the example you have i=[2,4] and j=[4]. So that way the identity is an 2 list of a length of N which O(2*N)= O(N) :-)
mb14
@mb14: That is an interesting point; however, the solution requires an alternate encoding of *both* input and output than required by the OP. Conversion from the matrix to the dual array format and back is still an O(N*N) operation.
kbrimington
@kbrimington: I see what you mean, but if you are using a "normal" 2D matrix, even creating an empty array is O(NxN) so in that case the question is not even interesting. I'm not a expert in sparse matrix , but I guess the idea is you can use whatever format you think is good to represent the matrix. And indeed , whatever format you chose, displaying as a 2D matrix will be O(NxN). But if the idea is to find the solution to a problem, I think you are free to use whatever reprensation you want, and even have a different for the input and output results. You just chose the most efficient one.
mb14
@mb14: Your point is well taken. With input and output specified in a fixed format, the best we could hope for was something that could update data in-place without iterating over all O(N*N) elements. It would be interesting, were we to ignore the output encoding, to see if @Tgr's alternate input encoding can be achieved in less than O(N*N) given the OP's input. The best I can dream up would run in N(N-1)/2, which is still O(N*N).
kbrimington
@kbrimington: talking about sparse matrices usually implies a special input format; otherwise you couldn't take advantage of it being sparse. If your input is an NxN bit array, doing anything interesting with it requires at least N*N operations in the worst case - doing any less would mean that you don't read some of the bits at all. (That is unless you suppose unconventional operations which read or manipulate multiple matrix elements at once.)
Tgr
@Tgr: Agreed. I'm in no way arguing that alternate encodings are bad or invalid. I find your solution to be clever. I remain curious if there is a sub-quadratic algorithm for converting the OP's input to your alternate input encoding.
kbrimington
I posted an answer whose worst case is with O(N*N)... see if possibl
ravi
You don't have to toggle any bits, because you don't have to store the resulting matrix. You just need to know what lines and columns are 'full'.
Andy
@kbrimington: you would need exactly N*N operations just to check every bit of the input. You cannot convert into any alternative format without doing that much.
Tgr
+5  A: 

I would imagine that you can optimize it for the best case, but I'm tempted to say that your worst case is still O(N*N): Your worst case will be an array of all 0s, and you will have to examine every single element.

The optimization would involve skipping a row or column as soon as you found a "1" (I can provide details, but you said you don't care about O(N*N)", but unless you have metadata to indicate that an entire row/column is empty, or unless you have a SIMD-style way to check multiple fields at once (say, if every row is aligned by 4, and you can read 32 bits worth data, or if your data is in form of a bitmask), you will always have to deal with the problem of an all-zero array.

EboMike
i actually have some solns. with N*N ... Using multiprocessors is not an option...
ravi
I didn't say multiprocessor. SIMD = Single Instruction, Multiple Data, i.e. one instruction to access more than one data.
EboMike
@Ebo : sry... an error
ravi
+1  A: 

There is clearly up to O(N^2) work to do. In the matrix

1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1

all bits have to be set to 1, and N*(N-1) are not set to one (20, in this 5x5 case).

Conversely, you can come up with an algorithm that always does it in O(N^2) time: sum along the top row and let column, and if the row or column gets a nonzero answer, fill in the entire row or column; then solve the smaller (N-1)x(N-1) problem.

So there exist cases that must take at least N^2 and any case can be solved in N^2 without extra space.

Rex Kerr
+3  A: 

Since every entry of the matrix has to be checked, your worst case is always going to be N*N.

With a small 2*N extra storage, you can perform the operation in O(N*N). Just create a mask for each row and another for each column - scan the array and update the masks as you go. Then scan again to populate the result matrix based on the masks.

If you're doing something where the input matrix is changing, you could store a count of non-zero entries for each row and column of the input (rather than a simple mask). Then when an entry in the input changes, you update the counts accordingly. At that point, I would drop the output matrix entirely and query the masks/counts directly rather than even maintaining the output matrix (which could also be updated as thing change in less than N*N time if you really wanted to keep it around). So loading the initial matrix would still be O(N*N) but updates could be much less.

phkahler
I like the idea of keeping track of a mask of set rows and a mask of set columns. You could input and output in a run length encoded format e.g. 1'1's 10'0's 3'1's... Remember the input in RLE form as it comes in while updating the mask of set rows and the mask of set columns. Then regurgitate the input in RLE form, taking account of the row and column masks as you go. For that matter, you could keep track of the masks in RLE form as well. With RLE you could have a compact form both for allmost all 0 on input and almost all 1 on output.
mcdowella
+3  A: 

The input matrix may be sparse, but unless you can get it in a sparse format (i.e. a list of (i,j) pairs that are initially set), just reading your input will consume Ω(n^2) time. Even with sparse input, it's easy to end up with O(n^2) output to write. As a cheat, if you were allowed to output a list of set rows and set columns, then you could get down to linear time. There's no magic to be had when your algorithm actually has to produce a result more substantial than 'yes' or 'no'.

Mcdowella's comment on another answer suggests another alternative input format: run-length encoding. For a sparse input, that clearly requires no more than O(n) time to read it (consider how many transitions there are between 0 and 1). However, from there it breaks down. Consider an input matrix structured as follows:

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 . . . 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . . .
.     .
.       .
.         .

That is, alternating 0 and 1 on the first row, 0 everywhere else. Clearly sparse, since there are n/2 ones in total. However, the RLE output has to repeat this pattern in every row, leading to O(n^2) output.

Novelocrat
A: 

If your matrix is sparse, the complexity depends much on the input encoding and its in particular not well measured in N N2 or something like that but in terms of N your input complexity Min and your output complexity Mout. I'd expect something like O(N + Min + Mout) but much depending on the encoding and the tricks that you can play with it.

Jens Gustedt
A: 

That depends entirely of your input data structure. If you pass your matrix (1s and 0s) as a 2D array you need to traverse it and that is O(N^2). But as your data is sparse, if you only pass the 1's as input, you can do it so the ouptut is O(M), where M is not the number of cells but the number of 1 cells. It would be something similar to this (pseudocode below):

list f(list l) {
   list rows_1;
   list cols_1;

    for each elem in l {
        rows_1[elem.row] = 1;
        cols_1[elem.col] = 1;
    }

    list result;
    for each row in rows_1 {
        for each col in cols_1 {
             if (row == 1 || col == 1) {
                 add(result, new_elem(row, col));
             }
        }
    } 
   return result;
}
gpeche
+2  A: 

You say:

we should get the output as...

So you need to output the entire matrix, which has N^2 elements. This is O(N*N).

The problem itself is not O(N*N): you dont have to compute and store the entire matrix: you only need two vectors, L and C, each of size N:

L[x] is 1 if line x is a line of ones, 0 otherwise;

C[x] is 1 if line x is a line of ones, 0 otherwise;

You can construct these vectors in O(N), because the initial matrix is sparse; your input data will not be a matrix, but a list containing the coordinates(line,column) of each non-zero element. While reading this list, you set L[line]=1 and C[column]=1, and the problem is solved: M[l,c] == 1 if L[l]==1 OR C[c]==1

Andy
I already mentioned... please dont consider the complexity of printing
ravi
A: 

Don't fill the center of the matrix when you're checking values. As you go through the elements, when you have 1 set the corresponding element in the first row and the first column. Then go back and fill down and across.

edit: Actually, this is the same as Andy's.

Hugh Brackett
+6  A: 

Clearly, nor the output matrix nor its negated version has to be sparse (take a matrix with half of the first row set to 1 and anything else to 0 to see), so time depends on what format you are allowed to use for the output. (I'm assuming the input is a list of elements or something equivalent, since otherwise you couldn't take advantage of the matrix being sparse.)

A simple solution for O(M+N) space and time (M is the number of ones in the input matrix): take two arrays of length N filled with ones, iterate through all ones in the input, and for each drop the X coordinate from the first array and the Y from the second one. The output is the two arrays, which clearly define the result matrix: its (X,Y) coordinate is 0 iff the X coordinate of the first array and the Y coordinate of the second are 0.

Update: depending on the language, you could use some trickery to return a normal 2D array by referencing the same row multiple times. For example in PHP:

// compute N-length arrays $X and $Y which have 1 at the column 
// and row positions which had no 1's in the input matrix
// this is O(M+N)
$result = array();
$row_one = array_fill(0,N,1);
for ($i=0; $i<N; $i++) {
    if ($Y[$i]) {
         $result[$i] = &$row_one;
    } else {
         $result[$i] = &$X;
    }
}
return $result;

Of course this is a normal array only as long as you don't try to write it.

Tgr
+1. I think it is important to recognize that sparse means that M/N (the equivalent number of diagonals or rows or columns filled with 1) does not increase with N and is much smaller than N. And I think without having an output data structure other than a plain NxN array performance better than O(N^2) is not achievable.
Andre Holzner
A: 

It depends on your data structure.

There are only two possible cases for rows:

  • A row i is filled with 1's if there is an element (i,_) in the input
  • All other rows are the same: i.e. the j-th element is 1 iff there is an element (_,j) in the input.

Hence the result could be represented compactly as an array of references to rows. Since we only need two rows the result would also only consume O(N) memory. As an example this could be implemented in python as follows:

def f(element_list, N):
  A = [1]*N
  B = [0]*N
  M = [B]*N
  for row, col in element_list:
    M[row] = A
    B[col] = 1
  return M

A sample call would be

 f([(1,1),(2,2),(4,3)],5)

with the result

[[0, 1, 1, 1, 0], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [0, 1, 1, 1, 0], [1, 1, 1, 1, 1]]

The important point is that the arrays are not copied here, i.e. M[row]=A is just an assignment of a reference. Hence the complexity is O(N+M), where M is the length of the input.

abc
+2  A: 

Hii guys ,

thanks to the comment from mb14 i think i could get it solved in less than O(N*N) time... The worst would take O(N*N)...

Actually , we have the given array suppose

  1 0 0 0 1
  0 1 0 0 0 
  0 1 1 0 0 
  1 1 1 0 1
  0 0 0 0 0 

Lets have 2 arrays of size N (this would be the worst case) ... One is dedicated for indexing rows and other columns... Put those with a[i][1] = 0 in one array and then a[1][j] =0 in another..

Then take those values only and check for the second row and colums...In this manner , we get the values of rows and colums where there are only 0;'s entirely...

The number of values in the row array gives number of 0's in the result array and the points a[row-array values][column array value] gives you those points ....

We could solve it in below O(N*N) and worst is O(N*N) ... As we can seee , the arrays ( of size N) diminishes ....

I did this for a few arrays and got the result for all of them ... :)

Please correct me if i am wrong anywhere...

Thanx for all your comments guys...You are all very helpful and i did learn quite a few things along the way ... :)

ravi
(vote for my comment then ;-) note this one)
mb14
yeah ... agreed...
ravi
A: 
#include<stdio.h>

include

int main() { int arr[5][5] = { {1,0,0,0,0}, {0,1,1,0,0}, {0,0,0,0,0}, {1,0,0,1,0}, {0,0,0,0,0} }; int var1=0,var2=0,i,j;

for(i=0;i<5;i++)
   var1 = var1 | arr[0][i];

for(i=0;i<5;i++)
   var2 = var2 | arr[i][0];

for(i=1;i<5;i++)
   for(j=1;j<5;j++)
      if(arr[i][j])
         arr[i][0] = arr[0][j] = 1;

for(i=1;i<5;i++)
   for(j=1;j<5;j++)
          arr[i][j] = arr[i][0] | arr[0][j];

for(i=0;i<5;i++)
   arr[0][i] = var1;

for(i=0;i<5;i++)
   arr[i][0] = var2;

for(i=0;i<5;i++)
{
   printf("\n");             
   for(j=0;j<5;j++)
      printf("%d ",arr[i][j]);
}

getch();

}

This program makes use of only 2 4 temporary variables (var1,var2,i and j) and hence runs in constant space with time complexity O(n^2).. I Think it is not possible at all to solve this problem in < O(n^2).

Raj Kumar