tags:

views:

190

answers:

1

I have the following matrix of size m=4

   0.00000   0.09130   0.09130   0.00000
   0.04565   0.00000   0.00000   0.00000
   0.04565   0.00000   0.00000   0.00000
   0.00000   0.00000   0.00000   0.00000

And I want to replace the diagonal of that matrix with (1 - sum of its column). Resulting matrix:

   0.90870   0.09130   0.09130   0.00000
   0.04565   0.90870   0.00000   0.00000
   0.04565   0.00000   0.90870   0.00000
   0.00000   0.00000   0.00000   1.00000

So for example for (1,1) we have

   1 - (0.04565 + 0.04565 + 0.00000) = 0.90870

Now the actual practice the size of m is very large of scale 10^6 to 10^7. So I can't afford to store the initial matrix into a container.

Is there any memory efficient alternative way to do it?

The current is the implementation I have for slurping it into vector of vectors. It cannot handle large m (10^6).

    #include <iostream>
    #include <vector>
    #include <fstream>
    #include <sstream>
    #include <map>
    using namespace std;

      // Initialize Matrix Before Slurping
       vector <vector<double> > Matrix;
        Matrix.resize(nofRow);
        for(size_t i = 0; i < nofRow; ++i)
        {
            Matrix[i].resize(nofCol);
        }



        if (arg_count !=2 ) {
        cerr << "expected one argument" << endl;
        return EXIT_FAILURE;
    }

    string line;
    ifstream myfile (arg_vec[1]);

    // Slurpint it
    int count1=0;
    if (myfile.is_open())
    {   

        while (getline(myfile,line) )
        {
            stringstream ss(line);
            double Value;
            count1++;            

            int count2=0;
            while (ss >> Value) {
                count2++;
                Matrix[count1][count2] = Value;
            }


        }
        myfile.close();
    }
    else { cout << "Unable to open file"; }


     // Summing up Column;
        vector <double> ColSum;
        ColSum.resize(nofCol);
        for(size_t i = 0; i < nofRow; ++i)
        {
            for(size_t j = 0; j < nofCol; ++j)
            {
                //std::cout <<"["<<i<<"]"<<"["<<j<<"] = " <<Matrix[i][j]<<std::endl;
                ColSum[j] += Matrix[i][j];
            }
        }  



        // Printing it
        for(size_t k = 0; k < nofRow; ++k)
        {
            for(size_t l = 0; l < nofCol; ++l)
            {
                  if (k == l ) {
                      double OneMinusSum = 1 - ColSum[k];
                      //if (OneMinusSum < 0) { OneMinusSum = 1; };
                     std::cout << OneMinusSum << "\t";
                  }
                  else {
                      std::cout<< Matrix[k][l] << "\t";
                  }
            }

            std::cout << std::endl;
        }
+4  A: 

Create a vector of size m to store the diagonal. Then go through the file and add the ith column of each line to diag[i]. Now go through the file again and output each line, but replace the value of the ith element on the ith line with diag[i]. This way you only need to store a vector of size m in memory.

sepp2k
Also, assuming disk I/O is going to be the bottleneck, and that the 1PB of input data is physically stored on more than one disk volume, some kind of parallelisation would be in order. The vector representing the diagonal can easily be sharded, so this algorithm is hopefully a good start.
Steve Jessop