I am writing a file to disk in stages. As I write, I need to know the line numbers I am writing so I can build an index. The file now has 12 million lines, so I need to build the index on the fly. I am doing this in four steps, one for each of four groupings of the value I am indexing on. Based on some examples I found elsewhere on SO, I decided that, to keep my functions as clean as possible, I would get the line count of the file before I start writing and use that count to continue building my index.
I have run into a problem. In principle I don't know whether I am adding the first chunk or the last chunk to my file, so to get the current line count I thought I would do this:
myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt','a')
try:
    num_lines=sum(1 for line in myFile)
except IOError:
    num_lines=0
When I do this, the result is always 0, even if myFile exists and has num_lines > 0.
If I do this instead:
myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt')
try:
    num_lines=sum(1 for line in myFile)
except IOError:
    num_lines=0
I get the correct value if and only if myFile exists. But if myFile does not exist (that is, if I am on the first cycle), I get an error message.
As I was writing out this question, it occurred to me that the reason num_lines is 0 whenever the file exists is that the file is opened for appending, so it is positioned at the end and is waiting for lines to be written. So this fixes the problem:
try:
    myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt')
    num_lines=sum(1 for line in myFile)
except IOError:
    num_lines=0
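For what it's worth, a small check suggests that the 0 in the append case comes from the except branch rather than from the file position: a file opened with mode 'a' is write-only, so iterating over it raises an error that except IOError catches. A minimal sketch, assuming Python 3 and a throwaway temp file (count_lines_in_mode and demo_path are names made up for this illustration):

```python
import os
import tempfile

def count_lines_in_mode(path, mode):
    """Return (line_count, error_caught) when counting lines with the given open mode."""
    try:
        with open(path, mode) as f:
            return sum(1 for _ in f), False
    except IOError:
        # In Python 3, IOError is an alias of OSError, and the
        # "not readable" error raised for write-only files subclasses it.
        return 0, True

# Hypothetical scratch file holding three lines
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("a\nb\nc\n")

append_result = count_lines_in_mode(demo_path, "a")  # reading fails, caught
read_result = count_lines_in_mode(demo_path, "r")    # reading works
os.remove(demo_path)
```

Here append_result reports that an error was caught, while read_result carries the real count.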
My question is whether this can be done another way. I ask because I now have to close myFile and reopen it for appending.
That is, now that I have the ending index number for the data already in the file, to do the work I need to do I have to:
myFile.close()
myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt','a')
Now, here is where maybe I am learning something: given that I have to open the file twice, maybe getting the starting index (num_lines) should be moved to a function:
def getNumbLines(myFileRef):
    try:
        myFile=open(myFileRef)
        num_lines=sum(1 for line in myFile)
        myFile.close()
    except IOError:
        num_lines=0
    return num_lines
It would be cleaner if I did not have to open/handle the file twice.
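To make the goal concrete, here is a rough sketch of how the starting count could feed the index while a chunk is appended. It still opens the file twice, but hides that inside one helper. The record layout (key, text), the dict-of-lists index, and the names count_lines, append_chunk and demo_path are all assumptions for illustration, not the real data:

```python
import os
import tempfile

def count_lines(path):
    """Number of lines already in the file, or 0 if it does not exist yet."""
    try:
        with open(path) as f:
            return sum(1 for _ in f)
    except IOError:
        return 0

def append_chunk(path, records, index):
    """Append records and store the 0-based line number of each key."""
    start = count_lines(path)
    with open(path, "a") as f:
        # enumerate continues numbering from the lines already on disk
        for line_no, (key, text) in enumerate(records, start):
            f.write(text + "\n")
            index.setdefault(key, []).append(line_no)
    return index

# Hypothetical scratch location instead of the real master list
demo_path = os.path.join(tempfile.mkdtemp(), "FULLLIST.txt")
index = {}
append_chunk(demo_path, [("x", "first"), ("y", "second")], index)
append_chunk(demo_path, [("x", "third")], index)
```

After the two calls, the line numbers recorded for the second chunk pick up where the first chunk stopped.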
Based on Eric Wendelin's answer I can just do:
myFile=open(r'C:\NEWMASTERLIST\FULLLIST.txt','a+')
num_lines=sum(1 for line in myFile)
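One caveat I would hedge on: where an 'a+' file starts reading from can differ between Python versions and platforms, and in Python 3 it starts at the end of the file, so the count can come back as 0. Seeking to the start first seems to make it dependable. A minimal sketch, assuming Python 3 (open_for_append_with_count and demo_path are made-up names):

```python
import os
import tempfile

def open_for_append_with_count(path):
    """Open path in 'a+' mode and return (file_object, existing_line_count)."""
    f = open(path, "a+")   # creates the file if it is missing
    f.seek(0)              # the read position may start at end of file
    num_lines = sum(1 for _ in f)
    return f, num_lines    # in append mode, writes still go to the end

# Hypothetical scratch location instead of the real master list
demo_path = os.path.join(tempfile.mkdtemp(), "FULLLIST.txt")

f, first_count = open_for_append_with_count(demo_path)
f.write("one\ntwo\n")
f.close()

f, second_count = open_for_append_with_count(demo_path)
f.write("three\n")
f.close()

with open(demo_path) as check:
    final_lines = check.read().splitlines()
```

This keeps a single open handle per cycle: the count is taken first, and the same file object is then used for appending.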
Thanks