views: 285
answers: 5
Hi,
Thanks in advance. I have written a program which works for small files, but it doesn't work for files of 1 GB. Please tell me if there is any way to handle big files. Here is the code:

fh=open('reg.fa','r')
c=fh.readlines()
fh.close() 
s=''  
for i in range(0,(len(c))):  
    s=s+c[i]  
    lines=s.split('\n')
    for line in s:
            s=s.replace('\n','')
s=s.replace('\n','')          
print s
+4  A: 

With readlines() you read the whole file at once, so you use 1 GB of memory. Instead of this, try:

f = open(...)
while 1:
    line = f.readline()
    if not line:
        break
    line = line.rstrip()
    ... do something with line
    ...
f.close()

If all you need is to remove \n then do not do it line by line, but do it with chunks of text:

import sys

f = open('query.txt', 'r')
while 1:
    part = f.read(1024)
    if not part:
        break
    part = part.replace('\n', '')
    sys.stdout.write(part)
Michał Niklas
1024 is an absurdly low buffer size. You should increase it to at least 64 KiB. Also, it's a shame Python doesn't use a generator in the readlines method.
Cheery
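A minimal sketch of that suggestion, reusing the 'query.txt' file name from the answer above and only changing the chunk size:

import sys

# Same chunked copy as above, but with a 64 KiB buffer as the comment suggests.
CHUNK_SIZE = 64 * 1024

f = open('query.txt', 'r')
while 1:
    part = f.read(CHUNK_SIZE)
    if not part:
        break
    sys.stdout.write(part.replace('\n', ''))
f.close()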
The readlines method was added before Python had generators, and changing it later would have caused existing programs to break. That's the curse of evolving languages.
Lars Wirzenius
+15  A: 

The readlines method reads in the entire file. You don't want to do that for a file that is large in relation to your physical memory size.

The fix is to read the file in small chunks, and process those individually. You can, for example, do something like this:

for line in f.xreadlines():
    ... do something with the line

The xreadlines method does not return a list of lines, but an iterator that returns one line at a time, as the for loop asks for it. An even simpler way of doing that is:

for line in f:
    ... do something with the line

Depending on what you do, processing the file line by line may be easy or hard. I didn't really follow what your sample code is trying to do, but it looks like it should be doable line by line.

Lars Wirzenius
+6  A: 

The script is not working because it reads all lines of the file in advance, making it necessary to keep the whole file in memory. The easiest way to iterate over all lines in a file is:

for line in open("test.txt", "r"):
    # do something with the "line"
fforw
Now this looks correct. Upvoted!
Cheery
+2  A: 

Your program is very redundant. Looks like everything you do can be done using these lines:

import sys
for line in open('reg.fa'):
    sys.stdout.write(line.rstrip())

That is enough. This program gives the same result as your original code in the question, but is much simpler and clearer. And it can also handle files of any size.

nosklo
Doesn't give exactly the same result: This strips all trailing whitespace on lines (not just the line terminator), and doesn't print a final newline
Miles
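If matching the original output exactly matters, a minimal sketch (assuming the intent of the original script was just to print the file with its newlines removed, plus the single trailing newline that print adds) is to strip only the line terminator and emit one final newline:

import sys

f = open('reg.fa', 'r')
for line in f:
    # Strip only the newline, keeping any other trailing whitespace.
    sys.stdout.write(line.rstrip('\n'))
f.close()
# print in the original adds one trailing newline; reproduce that here.
sys.stdout.write('\n')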
A: 

Hi ..

From your code it is clear that you want the file contents in a single-line string buffer. From a coding point of view it is bad to store the whole file content in one string buffer and only then process it, and the code contains too many local variables.

You could have used the following chunk of code.

f = open(file_name, mode)

for line in f:
    """
    Do the processing
    """
Charan