import scipy,array


def try_read_file():

    def line_reader(lines):
        for l in lines:
            i = l.find('#')
            if i != -1: l = l[:i]
            l = l.strip()
            if l: yield l

    def column_counter():
        inputer = (line.split() for line in line_reader(file('/home/kartik/Downloads/yahoo_dataset/set1.train.txt'.strip())))
        loopexit = 0
        for line in inputer:
            feature_tokens = (token.split(':') for token in line[6:])
            feature_ids = array.array('I')
            for t in feature_tokens:
                feature_ids.append(int (t[0]))

            tmpLength = feature_ids[-1]
            print feature_ids
            loopexit = loopexit + 1
            if loopexit > 0:
                break

        return tmpLength

    def line_counter():
        inputer = (line.split() for line in line_reader(file('/home/kartik/Downloads/yahoo_dataset/set1.train.txt'.strip())))
        noOfRows = 0
        for line in inputer:
            noOfRows = noOfRows + 1
        return noOfRows


    inputer = (line.split() for line in line_reader(file('/home/kartik/Downloads/yahoo_dataset/set1.train.txt'.strip())))

    feature_id_list = []
    feature_value_list = []
    relevance_list = []

    noOfRows = line_counter()
    noOfCols = column_counter()

    print noOfRows
    print noOfCols              # line 52
    #Create the feature array
    feature_array = scipy.zeros((noOfRows,noOfCols), float) 
    rowCounter = 1;
    for line in inputer:
        feature_tokens = (token.split(':') for token in line[6:])
        feature_ids = array.array('I')  
        feature_values = array.array('f')

        for t in feature_tokens:
            feature_ids.append(int(t[0]))
            if (t[0]!=colCounter):
                feature_array[rowCounter,colCounter] = 0
            else:    
                feature_array[rowCounter,colCounter] = t[1]
            feature_values.append(float(t[1]))
            colCounter = colCounter + 1;  

        label = float(line[0])
        assert(line[1].startswith('qid:'))

        query_id = int(line[1][4:])
        feature_id_list.append(feature_ids)
        feature_value_list.append(feature_values)
        relevance_list.append(label)
        rowCounter = rowCounter + 1;

    return feature_array   

Error:

Traceback (most recent call last):
  File "<pyshell#97>", line 1, in <module>
    try_read_file()
  File "/home/kartik/Python/prelim_read.py", line 52, in try_read_file
    print noOfCols
TypeError: an integer is required

What is the problem? I couldn't figure it out.

I tried to debug it, but it doesn't really go inside those methods. It shows me an address in place of those variables.

A:

This error doesn't make sense on the face of things. Can you add a `print type(noOfCols)` before your call to `print noOfCols`?
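The same check can be written as a small Python 3 helper. The values passed in are hypothetical. A bare `print` shows `700` for both the integer `700` and the string `'700'`, so printing the `repr` and the type name is what actually tells them apart.

```python
def check_dims(n_rows, n_cols):
    """Print each dimension with its repr and type name; return the two types."""
    # A bare print() shows 700 for both 700 and "700"; repr() plus the
    # type name makes the difference visible.
    for name, value in (("noOfRows", n_rows), ("noOfCols", n_cols)):
        print(f"{name} = {value!r} ({type(value).__name__})")
    return type(n_rows), type(n_cols)

check_dims(10, "700")   # hypothetical values
```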

Chris AtLee
A: 

Offhand, `noOfCols` comes from `column_counter()`, which returns the value of the local `tmpLength`. If `tmpLength` wasn't an integer, it would be either because `feature_ids[-1]` wasn't an integer or because the `for line in inputer` loop body was never entered. On second thought, both of those scenarios should result in a different exception than the one you saw.
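Here is a minimal Python 3 sketch of that reasoning. `last_feature_id` is a hypothetical stand-in that mirrors the shape of `column_counter()`. Items read back from an `array.array('I')` are plain `int`s. If the loop never runs, the local is unbound. A line with no feature tokens makes `[-1]` index an empty array. Neither case produces `TypeError`.

```python
import array

def last_feature_id(lines):
    """Mirror column_counter(): return the last feature id on the first line."""
    for tokens in lines:
        feature_ids = array.array('I', (int(t.split(':')[0]) for t in tokens))
        tmp_length = feature_ids[-1]
        break
    return tmp_length  # unbound if the loop body never ran

# Normal case: array('I') items come back as Python ints.
print(type(last_feature_id([["1:0.5", "7:0.2"]])))

# Loop body never entered: tmp_length was never assigned.
try:
    last_feature_id([])
except UnboundLocalError as exc:
    print("empty input:", type(exc).__name__)

# First line has no feature tokens: indexing an empty array.
try:
    last_feature_id([[]])
except IndexError as exc:
    print("no features:", type(exc).__name__)
```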

outis
A: 

I don't know why you're being shown `print noOfCols` as line 52; I make it line 47 in your code. Line 50's the closest one where an integer is required:

feature_array = scipy.zeros((noOfRows,noOfCols), float) 
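To illustrate the "integer is required" point, here is a sketch that calls `numpy.zeros` directly. The top-level `scipy.zeros` used in the question re-exported NumPy's function and is no longer available in recent SciPy releases. `zeros_checked` is a hypothetical helper. It rejects a non-integer dimension with a message naming the offending argument, rather than leaving that to the allocation call.

```python
import numpy as np

def zeros_checked(n_rows, n_cols):
    """Allocate an n_rows x n_cols float matrix, rejecting non-integer sizes early."""
    for name, dim in (("n_rows", n_rows), ("n_cols", n_cols)):
        if not isinstance(dim, (int, np.integer)):
            raise TypeError(f"{name} must be an integer, got {type(dim).__name__} {dim!r}")
    return np.zeros((n_rows, n_cols), dtype=float)

# np.zeros itself refuses a float dimension with a TypeError.
try:
    np.zeros((3, 2.0))
except TypeError as exc:
    print("np.zeros:", exc)

print(zeros_checked(2, 3).shape)   # (2, 3)
```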

Your code is truly peculiar, e.g. the for loop ending with

        loopexit = loopexit + 1
        if loopexit > 0:
            break

which, given that `loopexit` is set to 0 just before the loop, is the weirdest way I've ever seen to unconditionally exit a loop; and a `for` loop that unconditionally ends at the end of its first leg is, essentially, not a loop at all. But these samples of utter weirdness still don't explain your bug, and especially not why the line number and source code shown as the point of the exception don't match. What do those `print` statements you have show?
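If the intent is only to look at the first line, `next()` with a default says that directly and needs no counter or `break`. This is a Python 3 sketch. `first_line_tokens` and the sample lines are hypothetical.

```python
def first_line_tokens(lines):
    """Return the split tokens of the first non-blank line, or None if there is none."""
    tokens = (line.split() for line in lines if line.strip())
    return next(tokens, None)  # replaces the for/loopexit/break pattern

sample = ["", "0 qid:1 3:0.5 9:0.1", "1 qid:1 2:0.7"]
print(first_line_tokens(sample))   # ['0', 'qid:1', '3:0.5', '9:0.1']
```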

Alex Martelli
Re line numbers: part of the difference might be due to my edits to fix indenting. However, even in the original, `print noOfCols` was at most line 48 of `try_read_file`.
outis
A: 

OK, I don't know what the problem was. When I restarted Python, it no longer gave this error.

I don't know what happened in between or what caused the error; it's a mystery to me. Of course, now the array I'm trying to put these values into has "dimensions too large" and I'm getting a ValueError. I'll probably split the array.
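An alternative to splitting is to skip the dense rows × columns array entirely and store each row sparsely, because every line only lists the features it actually has. This Python 3 sketch assumes the `label qid:N id:value ...` layout implied by the question's code (the `qid:` assertion and the `split(':')` on feature tokens). The question slices features from `line[6:]` for reasons not shown, so the sketch keeps only tokens that look like `id:value` after the `qid:` token. `parse_sparse` is a hypothetical name. The per-row dicts could later be loaded into a `scipy.sparse` matrix if needed.

```python
def parse_sparse(lines):
    """Parse `label qid:N id:value ...` lines into sparse per-row dicts.

    Returns (labels, qids, rows, n_cols): rows[i] maps feature id -> value,
    and n_cols is the largest feature id seen (0 if none).
    """
    labels, qids, rows = [], [], []
    n_cols = 0
    for raw in lines:
        line = raw.split('#', 1)[0].strip()     # drop comments, like line_reader
        if not line:
            continue
        tokens = line.split()
        if len(tokens) < 2 or not tokens[1].startswith('qid:'):
            raise ValueError(f"expected 'label qid:N ...', got {raw!r}")
        labels.append(float(tokens[0]))
        qids.append(int(tokens[1][4:]))
        features = {}
        for tok in tokens[2:]:
            key, sep, value = tok.partition(':')
            if sep and key.isdigit():           # keep only id:value tokens
                features[int(key)] = float(value)
        if features:
            n_cols = max(n_cols, max(features))
        rows.append(features)
    return labels, qids, rows, n_cols

sample = ["2 qid:10 1:0.5 4:0.25  # doc a", "0 qid:10 2:1.0"]
labels, qids, rows, n_cols = parse_sparse(sample)
print(labels, qids, n_cols)   # [2.0, 0.0] [10, 10] 4
print(rows[0])                # {1: 0.5, 4: 0.25}
```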

The line-number mismatch is probably due to some editing afterwards. Yes, the loop exit is weird; I just needed to see whether it worked, so it was a quick workaround, since I'm not familiar with Python at all.
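One possible explanation for both the vanishing error and the line-number mismatch is that the module was edited while an old copy kept running. This is an assumption, not something confirmed in the thread. The `<pyshell#97>` in the traceback shows the call came from a long-running IDLE shell. A module imported there and then edited keeps running its old code. The traceback, however, typically prints source text re-read from the file on disk. So the reported line number and the displayed line can disagree. Restarting the shell, as happened here, or reloading the module picks up the current file. This Python 3 sketch uses a throwaway, hypothetical module `prelim_demo` written to a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # keep the demo from reusing a cached .pyc

def demo_reload():
    """Show that an imported module keeps its old code until it is reloaded."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "prelim_demo.py"      # hypothetical module name
        src.write_text("def version():\n    return 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("prelim_demo")
            before = mod.version()

            # Edit the file on disk, as one would in an editor.
            src.write_text("def version():\n    return 2\n")
            stale = mod.version()               # still runs the old code

            mod = importlib.reload(mod)         # re-executes the current source
            after = mod.version()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("prelim_demo", None)
    return before, stale, after

print(demo_reload())   # (1, 1, 2)
```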

@All, thanks for your help!