I have a Python script that loads a csv file into a 2D numpy array and then extracts the value of a desired cell based on its column and row header values. For diagnostic purposes, the script prints the contents of the data matrix before it is put into a numpy array. The script works when the underlying csv file contains values for all rows and columns. The problem is that it throws an error when I run it on a csv file that apparently has a couple of empty rows/columns at the end.

I tried to address this by opening the csv file in Notepad++ and deleting as much as it would let me from the end of the file. Notepad++ let me delete one row at the end, but did not indicate that there were any empty columns. Comparing the Python printout more closely with the structure of my underlying data, I see that the printout shows two empty columns at the end of the array.

In any event, after editing the csv file, the script still printed the same data and still threw the same error, as if I had not deleted the empty line from the end of the file. I checked that I had saved the csv file, opened and closed it a couple of times, and closed and re-opened Python a couple of times, but the error persists.

Here is my question:
How do I modify the script below to avoid this error?

Here is the function I was referring to above:

import os
import time
import numpy as p  # assumption: the alias `p` in the original script refers to numpy

def GetHSD_alpha(NumberOfColumnMeans,dfResid):
    # Locate the table of critical values in ../resources
    dirname=os.path.dirname(os.getcwd())
    resources=os.path.join(dirname,'resources')
    inputfile=os.path.join(resources,'CriticalValuesOfTukeysHSD_a_0_01.csv')
    separator=','
    ColumnIndex=NumberOfColumnMeans
    RowIndex=dfResid
    cast = p.cast
    # One list per csv column; the column count is hard-coded here
    data = [[] for dummy in xrange(13)]
    for line in open(inputfile, 'r'):
        fields = line.strip().split(separator)
        for i, number in enumerate(fields):
            data[i].append(number)

    print 'data HSD alpha is:  ',data
    time.sleep(2)

    CriticalValuesArray=p.array(data)
    HSD_alpha_0_01=CriticalValuesArray[ColumnIndex,RowIndex]

    return HSD_alpha_0_01

Also, for reference, here is an ABBREVIATED version of the result of printing the data that throws the error. Notice the empty elements at the end, which I cannot seem to manually eliminate from my csv file before running the script:

data HSD alpha is: [['', '5', '6', '7'], ['2', '5.7', '5.24', '4.95'], ['3', '6.98', '6.33', '5.92'], ['11', '10.48', '9.3', '8.55'], [], []]

Also for reference, here is the ABBREVIATED version of the result of printing data from another csv file that I imported into the script for diagnostic purposes. The data corresponding to the printout below did NOT cause the script to throw an error:
data HSD alpha is: [['', '1', '2', '3'], ['1', '4052', '98.49', '34.12'], ['2', '4999', '99.01', '30.81'], ['3', '5403', '99.17', '29.46']]

Again, when I open the underlying csv files in Notepad++, there do not seem to be any empty columns or rows, and I have checked those data files carefully.

Finally, I imagine that the number of empty rows/columns may vary, so any solution would need to handle a variable number of empty rows/columns.

Thank you in advance.

A: 

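Found the answer. I needed to change the following line of code:

data = [[] for dummy in xrange(11)]

The argument to xrange needed to be 11, not 13. The script pre-allocates one list per column, and the file evidently has only 11 columns, so with 13 lists the last two were never filled. Those are the two empty lists at the end of the printout; the csv file itself had no empty rows or columns.

Simple answer, but it took a lot of digging. This thread is answered/finished now.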
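
For what it's worth, the hard-coded count can be avoided by collecting rows first and transposing afterwards. The following is only a minimal sketch, not the code from the script: `load_columns` is a hypothetical helper name, and the sample lines are built from the abbreviated printout in the question, with trailing commas and a blank line added purely to imitate empty columns and rows.

```python
def load_columns(lines, separator=','):
    """Return the csv data as a list of columns (hypothetical helper).

    Blank lines and trailing empty fields are dropped, so the number of
    columns is taken from the data instead of being hard-coded.
    """
    rows = []
    for line in lines:
        fields = line.strip().split(separator)
        # Drop empty fields at the end of the line, e.g. from "1,2,,"
        while fields and fields[-1] == '':
            fields.pop()
        if fields:  # skip lines that end up with no fields at all
            rows.append(fields)
    if not rows:
        return []
    # Pad short rows so every row has the same width before transposing
    width = max(len(row) for row in rows)
    padded = [row + [''] * (width - len(row)) for row in rows]
    return [list(column) for column in zip(*padded)]


# Illustrative input only; an open file object can be passed the same way.
sample_lines = [
    ",2,3,11,,\n",
    "5,5.7,6.98,10.48,,\n",
    "6,5.24,6.33,9.3,,\n",
    "7,4.95,5.92,8.55,,\n",
    "\n",
]
columns = load_columns(sample_lines)
```

With this layout `columns` can be handed to `p.array` as before, and the same function works regardless of how many empty trailing rows or columns the file has.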

MedicalMath
You should accept your own answer to mark the question as finished.
katrielalex
A: 

Why do you write your own csv loader? numpy.loadtxt would do it, or, in your case with missing values, numpy.genfromtxt.
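
A rough sketch of what that could look like with `numpy.genfromtxt`. The sample text is made up from the abbreviated printout in the question (trailing commas and a blank line added to imitate empty columns and rows), and `lookup` is a hypothetical helper, so treat this as an illustration under those assumptions and not a drop-in replacement.

```python
import io
import numpy as np

# Illustrative sample; in the real script this would be the csv file path.
sample = io.StringIO(
    u",2,3,11,,\n"
    u"5,5.7,6.98,10.48,,\n"
    u"6,5.24,6.33,9.3,,\n"
    u"7,4.95,5.92,8.55,,\n"
    u"\n"
)

# With the default float dtype, empty fields are read as nan.
table = np.genfromtxt(sample, delimiter=',')

# Drop rows and columns that contain nothing but nan.
empty = np.isnan(table)
table = table[~empty.all(axis=1)][:, ~empty.all(axis=0)]


def lookup(table, column_header, row_header):
    """Return the cell whose column and row headers match (hypothetical helper)."""
    col = np.where(table[0, 1:] == column_header)[0][0] + 1
    row = np.where(table[1:, 0] == row_header)[0][0] + 1
    return table[row, col]


value = lookup(table, 3, 6)
```

Looking cells up by header value instead of by raw index also means the result does not depend on where a given header happens to sit in the file.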

tillsten