I am writing a file to disk in stages. As I write it, I need to know the line numbers so I can build an index. The file now has 12 million lines, so I need to build the index on the fly. I am doing this in four steps, one for each of four groupings of the value I am indexing on. Based on some examples I found elsewhere on SO, I decided that, to keep my functions as clean as possible, I would get the line count of the file before I start writing and use that count to continue building my index.

I have run across a problem. In principle I don't know whether I am adding the first chunk or the last chunk to my file, so to get the current line count I thought I would do this:

myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt','a')
try:
    num_lines=sum(1 for line in myFile)
except IOError:
    num_lines=0

When I do this the result is always 0, even if the file exists and contains more than zero lines.

If I do this instead:

myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt')
try:
    num_lines=sum(1 for line in myFile)
except IOError:
    num_lines=0

I get the correct value if the file exists. But if the file does not exist, as on the first cycle, I get an error message, since open() is called outside the try block.

As I was writing out this question, it occurred to me why num_lines is 0 whenever the file exists: the file is opened for appending, so it is positioned at the end and is waiting for lines to be written. So this fixes the problem:

try:
    myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt')
    num_lines=sum(1 for line in myFile)

except IOError:
    num_lines=0

My question is whether this can be done another way. I ask because I now have to close myFile and reopen it for appending.

That is, now that I have the ending index number for the data already in the file, to do the rest of the work I have to:

myFile.close()
myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt','a')

Now, here is where maybe I am learning something: given that I have to open the file twice, maybe getting the starting index (num_lines) should be moved to a function:

def getNumbLines(myFileRef):
    try:
        myFile=open(myFileRef)
        num_lines=sum(1 for line in myFile)
        myFile.close()
    except IOError:
        num_lines=0
    return num_lines

It would be cleaner if I did not have to open/handle the file twice.

Based on Eric Wendelin's answer I can just do:

myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt','a+')
num_lines=sum(1 for line in myFile)
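
One caveat worth checking before relying on the 'a+' version: in Python 3 a file opened with 'a+' starts positioned at the end, so the count comes back as 0 unless you seek to the start first. Below is a minimal sketch under that assumption. The function name count_then_append and the throwaway demo path are made up for illustration and are not part of the original code.

```python
import os
import tempfile

def count_then_append(path, new_lines):
    """Count the existing lines in path, append new_lines, return the old count."""
    # 'a+' creates the file if it is missing and allows reading as well as appending.
    with open(path, 'a+') as f:
        # Assumption: Python 3, where the stream starts at end of file in append mode.
        f.seek(0)
        num_lines = sum(1 for _ in f)
        # In append mode, writes go to the end of the file regardless of position.
        for line in new_lines:
            f.write(line + '\n')
    return num_lines

# Usage with a temporary file (hypothetical path, not the one from the question).
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
first = count_then_append(demo_path, ['a', 'b'])   # file did not exist yet
second = count_then_append(demo_path, ['c'])       # file already has two lines
```

The file is opened only once per stage, and the missing-file case needs no try/except because 'a+' creates the file.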

Thanks

A: 

Open the file for updates ('u' or 'rw', I forget). Now you can read it until EOF and then start writing to append.

Aaron Digulla
+4  A: 

You can open a file for reading AND writing:

myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt','r+')

Try that.

UPDATE: Ah, my mistake since the file might not exist. Use 'a+' instead of 'r+'.

Eric Wendelin
I don't see how this gets me anything over my solution. If the file does not exist I have to handle it differently, so this is in some ways worse.
PyNEwbie
That's right. Fixed now. You can also do num_lines=sum(1 for line in myFile) or 0 instead of using the except: for it.
Eric Wendelin
A: 

I assume you are writing the file. In that case, why don't you keep a separate count of how many lines you have already written? To me it looks very wasteful to read the whole file line by line just to get the line number.

Anurag Uniyal
I am thinking ahead a bit. I want something I can run while on vacation without having to worry about whether the computer hiccuped between writing to the file and updating the count.
PyNEwbie
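
One way to reconcile the two points above is to count the file once at startup and keep a running count from then on, so nothing has to be stored separately and a restart simply recounts. This is only a sketch. The class name CountingAppender and its methods are hypothetical and do not come from the thread.

```python
import os
import tempfile

class CountingAppender:
    """Hypothetical helper: appends lines and tracks the next line index."""

    def __init__(self, path):
        # Count once at startup, so the index is rebuilt from the file itself
        # after a restart or crash rather than from a separately stored number.
        try:
            with open(path) as f:
                self.next_index = sum(1 for _ in f)
        except IOError:  # file does not exist yet
            self.next_index = 0
        self._file = open(path, 'a')

    def write_line(self, text):
        """Append one line and return the zero-based index it was written at."""
        index = self.next_index
        self._file.write(text + '\n')
        self.next_index += 1
        return index

    def close(self):
        self._file.close()

# Usage with a temporary file (hypothetical path).
counter_path = os.path.join(tempfile.mkdtemp(), 'counted.txt')
writer = CountingAppender(counter_path)
i0 = writer.write_line('x')
i1 = writer.write_line('y')
writer.close()

# Simulate a later stage: a fresh object recounts from the file.
writer2 = CountingAppender(counter_path)
i2 = writer2.write_line('z')
writer2.close()
```

The full read happens once per run instead of once per write, which matters with 12 million lines.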
A: 

A bit late to the party, but for the file-existence problem why not use (pseudocode):

If FileExists('C:\NEWMASTERLIST\FULLLIST.txt') then
begin
  Open file etc
  Calc numlines etc
end
else
begin
  Create new file etc
  NumLines := 0;
end;
Despatcher
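
In Python, the pseudocode above might be sketched with os.path.exists. The function name starting_line_count and the demo path are invented for illustration. Note that checking first leaves a small window between the check and the open, which the try/except version in the question avoids.

```python
import os
import tempfile

def starting_line_count(path):
    """Return the number of lines already in path, or 0 if it does not exist."""
    if os.path.exists(path):
        with open(path) as f:
            return sum(1 for _ in f)
    # No file yet: the first chunk will start at index 0.
    return 0

# Usage with a temporary file (hypothetical path).
exists_path = os.path.join(tempfile.mkdtemp(), 'maybe.txt')
before = starting_line_count(exists_path)   # file is missing
with open(exists_path, 'a') as out:
    out.write('one\ntwo\nthree\n')
after = starting_line_count(exists_path)    # file now has three lines
```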