I am reading a text file from the web. The file starts with some header lines containing the number of data points, followed by the actual vertices (3 coordinates each). The file looks like:

# comment
HEADER TEXT
POINTS 6 float
1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9
1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9
POLYGONS

The line starting with the word POINTS contains the number of vertices. In this case there are 3 vertices per line, but that could change.

This is how I am reading it right now:

ur=urlopen("http://.../file.dat")

j=0
contents = []
while 1:
    line = ur.readline()
    if not line:
        break
    else:
        line=line.lower()       

    if 'points' in line :
        myline=line.strip()
        word=myline.split()
        node_number=int(word[1])
        node_type=word[2]

        while 'polygons'  not in line :
            line = ur.readline()
            line=line.lower() 
            myline=line.split()

            i=0
            while(i<len(myline)):                    
                contents[j]=float(myline[i])
                i=i+1
                j=j+1

How can I read a specified number of floats instead of reading line by line as strings and converting to floating-point numbers?

Instead of ur.readline(), I want to read the specified number of elements from the file.

Any suggestions are welcome.

+3  A: 

I'm not entirely sure what your goal is from your explanation.

For the record, here is code that does basically the same thing yours seems to be trying to do, using some techniques I would prefer over the ones you have chosen. Using while loops and manual indices is usually a sign that you're doing something wrong, and indeed your code does not work: contents[j] = ... raises an IndexError because contents starts out as an empty list.

lines = (line.strip().lower() for line in your_web_page)

points_line = next(line for line in lines if 'points' in line)
_, node_number, node_type = points_line.split()
node_number = int(node_number)

def get_contents(lines):
    for line in lines:
        if 'polygons' in line:
            break

        for number in line.split():
            yield float(number)

contents = list(get_contents(lines))
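
If the goal is to read exactly the number of values announced on the POINTS line, rather than stopping at POLYGONS, one possible sketch uses itertools.islice over a stream of tokens. This is an illustration under assumptions, not a drop-in replacement: read_floats is a made-up helper name, the io.StringIO object stands in for the downloaded file, the code is written for Python 3 (where a real urlopen response yields bytes that would need decoding first), and it assumes 3 coordinates per vertex as in the question.

```python
import io
from itertools import islice

def read_floats(lines, count):
    """Return the first `count` floats found in `lines`, ignoring line breaks."""
    tokens = (token for line in lines for token in line.split())
    return [float(token) for token in islice(tokens, count)]

# Stand-in for the downloaded file (assumption: already decoded to text).
sample = io.StringIO(
    "# comment\n"
    "HEADER TEXT\n"
    "POINTS 6 float\n"
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n"
    "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n"
    "POLYGONS\n"
)

# Advance to the POINTS line, then pull exactly node_number * 3 values.
points_line = next(line for line in sample if line.lower().startswith("points"))
node_number = int(points_line.split()[1])
contents = read_floats(sample, node_number * 3)
```

Note that the values are still read as text and converted with float; islice only controls how many are taken. If the requested count ends in the middle of a line, the remaining tokens on that line are discarded.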

If you are more explicit about what new thing you want to do, maybe someone can provide a better answer for your ultimate goal.

Mike Graham
A: 

Here is a no-fuss cleanup of your code that should make the looping over the contents much faster.

ur=urlopen("http://.../file.dat")
contents = []
node_number = 0
node_type = None
while 1:
    line = ur.readline()
    if not line:
        break
    line = line.lower()       
    if 'points' in line :
        word = line.split()
        node_number = int(word[1])
        node_type = word[2]
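        while 1:
            raw = ur.readline()
            if not raw: break # EOF reached before POLYGONS; avoids looping forever
            pieces = raw.split()
            if not pieces: continue # blank line: skip it (or break or issue error message)
            if pieces[0].lower() == 'polygons': break
            contents.extend(map(float, pieces))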
assert len(contents) == node_number * 3

If you wrap the code in a function and call that, it will run even faster (because you will be accessing local variables instead of global ones).
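
As a rough sketch of what wrapping it in a function could look like: read_points is a hypothetical name, the io.StringIO object stands in for the object returned by urlopen, and the code assumes Python 3 with text lines (a real response would yield bytes and need decoding).

```python
import io

def read_points(stream):
    """Parse the POINTS section of a text stream.

    Returns (node_number, node_type, contents).
    """
    contents = []
    # Skip ahead to the POINTS header line.
    for line in stream:
        lowered = line.lower()
        if "points" in lowered:
            words = lowered.split()
            node_number, node_type = int(words[1]), words[2]
            break
    else:
        raise ValueError("no POINTS line found")
    # Collect floats until the POLYGONS line or end of input.
    for line in stream:
        pieces = line.split()
        if pieces and pieces[0].lower() == "polygons":
            break
        contents.extend(map(float, pieces))
    return node_number, node_type, contents

# Stand-in for the downloaded file.
sample = io.StringIO(
    "# comment\nHEADER TEXT\nPOINTS 2 float\n1.1 2.2 3.3\n4.4 5.5 6.6\nPOLYGONS\n"
)
node_number, node_type, contents = read_points(sample)
```

Everything the loops touch (contents, pieces, words) is now a local variable, which is where the speedup mentioned above would come from.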

Note that the most significant changes are near/at the end of the script.

HOWEVER: stand back and think about this for a few seconds: how much of the time is taken up by the ur.readline() and how much by unpacking the lines?
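
One way to answer that question is to time the parsing on its own, with the network out of the picture. The following is only a sketch: parse_points is a made-up helper, and the text is a synthetic payload standing in for the download. With a real URL you could time the read call the same way and compare the two numbers.

```python
import time

def parse_points(text):
    """Convert every token between the POINTS line and POLYGONS to float."""
    lines = iter(text.splitlines())
    # Skip the header up to and including the POINTS line.
    for line in lines:
        if "points" in line.lower():
            break
    values = []
    for line in lines:
        if "polygons" in line.lower():
            break
        values.extend(map(float, line.split()))
    return values

# Synthetic payload (assumption): 10000 lines of 9 floats each.
text = (
    "POINTS 30000 float\n"
    + "1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8 9.9\n" * 10000
    + "POLYGONS\n"
)

start = time.perf_counter()
values = parse_points(text)
parse_seconds = time.perf_counter() - start
print("parsed %d values in %.4f s" % (len(values), parse_seconds))
```

If the parse time turns out to be small next to the download time, tuning the float conversion will not buy much.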

John Machin
@John Machin, Good call with standing back and thinking about it, but it's quite possible we're not standing far enough back yet.
Mike Graham