I have a NumPy matrix whose values are mostly nonzero but occasionally zero. I need to be able to:
1.) count the non-zero values in each row and store that count in a variable that I can use in subsequent operations, perhaps by iterating through the row indices and doing the calculations inside the loop
2.) count the non-zero values in each column and store that count in a variable that I can use in subsequent operations, perhaps by iterating through the column indices and doing the calculations inside the loop

For example, one thing I need to do is to sum each row and then divide each row sum by the number of non-zero values in each row, reporting a separate result for each row index. And then I need to sum each column and then divide the column sum by the number of non-zero values in the column, also reporting a separate result for each column index. I need to do other things as well, but they should be easy after I figure out how to do the things that I am listing here.

The code I am working with is below. You can see that I am creating an array of zeros and then populating it from a csv file. Some of the rows will contain values for all the columns, but other rows will still have some zeros remaining in some of the last columns, thus creating the problem described above.

The last 5 lines of code below are from another posting on this forum, and those last 5 lines of code return a printed list of row/column indices for the zeros. But I do not know how to use that resulting information to create the nonzero row counts and nonzero column counts described above. Can anyone help me with this?

ANOVAInputMatrixValuesArray=zeros([len(TestIDs),9],float)
j=0
for j in range(0,len(TestIDs)):
    TestID=str(TestIDs[j])
    ReadOrWrite='Read'
    fileName=inputFileName
    directory=GetCurrentDirectory(arguments that return correct directory)
    inputfile=open(directory,'r')
    reader=csv.reader(inputfile)
    m=0
    for row in reader:
        if m<9:
            if row[0]!='TestID':
                ANOVAInputMatrixValuesArray[j,m]=row[2]  # row j; j-1 would wrap to the last row when j==0
                m+=1
    inputfile.close()

IndicesOfZeros = indices(ANOVAInputMatrixValuesArray.shape) 
locs = IndicesOfZeros[:,ANOVAInputMatrixValuesArray == 0]
pts = hsplit(locs, len(locs[0]))
for pt in pts:
    print(', '.join(str(p[0]) for p in pt))
A: 
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

columns = (a != 0).sum(0)
rows    = (a != 0).sum(1)

The expression (a != 0) is a boolean array of the same shape as the original a; it contains True for every non-zero element and False for every zero.

The .sum(x) method sums the elements along axis x. Because True counts as 1 and False as 0, summing a boolean array gives the number of True elements.

The variables columns and rows contain the number of non-zero (element != 0) values in each column/row of your original array:

columns = np.array([2, 1, 3])
rows    = np.array([2, 3, 1])
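
To make the two steps visible, here is a small sketch on the same example array that keeps the intermediate boolean mask and compares the sums with np.count_nonzero. The axis argument of np.count_nonzero is assumed to be available (it needs NumPy 1.12 or later); the names mask, columns_alt and rows_alt are only illustrative.

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

mask = (a != 0)          # boolean array, True where a is non-zero
columns = mask.sum(0)    # number of True values down each column
rows = mask.sum(1)       # number of True values across each row

# Alternative spelling of the same counts (axis argument needs NumPy >= 1.12)
columns_alt = np.count_nonzero(a, axis=0)
rows_alt = np.count_nonzero(a, axis=1)

print(mask)
print(columns, rows)
```

Both spellings should give the same counts; the mask form has the advantage that the mask can be reused for other per-element tests.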

EDIT: The whole code could look like this (with a few simplifications in your original code):

ANOVAInputMatrixValuesArray = zeros([len(TestIDs), 9], float)
for j, TestID in enumerate(TestIDs):
    ReadOrWrite = 'Read'
    fileName = inputFileName
    directory = GetCurrentDirectory(arguments that return correct directory)
    # use directory or filename to get the CSV file?
    with open(directory, 'r') as csvfile:
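        # 'TestID' matches the header check in the original loop; ndmin=1 keeps a single value as a 1-D array
        values = loadtxt(csvfile, comments='TestID', delimiter=';', usecols=(2,), ndmin=1)[:9]
        # files with fewer than 9 values leave the remaining columns at zero
        ANOVAInputMatrixValuesArray[j, :len(values)] = values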

nonZeroCols = (ANOVAInputMatrixValuesArray != 0).sum(0)
nonZeroRows = (ANOVAInputMatrixValuesArray != 0).sum(1)

EDIT 2:

To get the mean of the non-zero values in each column/row, divide the sums by the non-zero counts:

colMean = a.sum(0) / (a != 0).sum(0)
rowMean = a.sum(1) / (a != 0).sum(1)
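
Because the counts are ordinary arrays, a single entry can also be read by index (for example rows[0] for the first row). The following is a minimal sketch on the example array showing the loop form next to the vectorized form; the names row_sums, row_counts and row_means_loop are illustrative, and the array is created as float so that the division is a true division regardless of Python version.

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]], dtype=float)

row_sums = a.sum(1)              # sum of each row
row_counts = (a != 0).sum(1)     # non-zero count of each row

# Loop form: one result per row index
row_means_loop = []
for i in range(a.shape[0]):
    row_means_loop.append(row_sums[i] / row_counts[i])

# Vectorized form: element-wise division gives the same values
row_means = row_sums / row_counts

print(row_means_loop)
print(row_means)
```

The same pattern applies to columns by using axis 0 and iterating over range(a.shape[1]).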

What do you want to do if there are no non-zero elements in a column/row? Then we can adapt the code to solve such a problem.
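
One possible way to handle that case, sketched under the assumption that a row or column with no non-zero entries should report NaN (substitute another fill value if that suits the analysis better). The helper name nonzero_mean is hypothetical; it relies on the out and where arguments of np.divide so that the division is simply skipped where the count is zero.

```python
import numpy as np

def nonzero_mean(arr, axis):
    """Mean of the non-zero entries along `axis`; NaN where there are none."""
    sums = arr.sum(axis)
    counts = (arr != 0).sum(axis)
    out = np.full(sums.shape, np.nan)   # assumed fill value for empty rows/columns
    # Divide only where at least one non-zero value exists
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

a = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 7.0]])

row_mean = nonzero_mean(a, 1)
col_mean = nonzero_mean(a, 0)
print(row_mean)
print(col_mean)
```

Here the middle row and the middle column contain only zeros, so their entries stay at the fill value instead of triggering a division-by-zero warning.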

eumiro
A: 

Thank you. So I followed your advice and wrote:
columns = (ANOVAInputMatrixValuesArray != 0).sum(0)
rows = (ANOVAInputMatrixValuesArray != 0).sum(1)
print 'columns are: ',columns
print 'rows are: ',rows
This resulted in the following output being printed:
columns are: [2 2 2 2 2 2 2 1 1]
rows are: [9 7]

Now how do I access these numbers so that the sum of all numerical values in row 0 can be divided by 9 and the sum of all numerical values of row 1 can be divided by 7, etc.?

I will want to do the same thing with columns, but the row example may express my question more clearly, given all the duplicate values in the list of column counts above.

MedicalMath
Please add follow-up questions to the original post rather than creating new answers.
eumiro
I have answered this in a new EDIT to my original answer.
eumiro