views: 285
answers: 5
Hi,
Thanks in advance. I have written a program which works for small files, but it doesn't work for files of 1 GB. Please tell me if there is any way to handle big files. Here is the code:

fh=open('reg.fa','r')
c=fh.readlines()
fh.close() 
s=''  
for i in range(0,(len(c))):  
    s=s+c[i]  
    lines=s.split('\n')
    for line in s:
            s=s.replace('\n','')
s=s.replace('\n','')          
print s
+4  A: 

With readlines() you read the whole file at once, so you use 1 GB of memory. Instead of this, try:

f = open(...)
while 1:
    line = f.readline()
    if not line:
        break
    line = line.rstrip()
    ... do something with line
    ...
f.close()

If all you need is to remove \n then do not do it line by line, but do it with chunks of text:

import sys

f = open('query.txt', 'r')
while 1:
    part = f.read(1024)
    if not part:
        break
    part = part.replace('\n', '')
    sys.stdout.write(part)
Michał Niklas
1024 is an absurdly low buffer size. You should increase it to at least 64 KiB. Also, it's a shame Python doesn't use a generator in the readlines method.
Cheery
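A minimal sketch of that suggestion, reusing the 'query.txt' file name from the answer above and only changing the chunk size:

import sys

# Same chunked copy as above, but with a 64 KiB buffer as the comment suggests.
CHUNK_SIZE = 64 * 1024

f = open('query.txt', 'r')
while 1:
    part = f.read(CHUNK_SIZE)
    if not part:
        break
    sys.stdout.write(part.replace('\n', ''))
f.close()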
The readlines method was added before Python had generators, and changing it later would have caused existing programs to break. That's the curse of evolving languages.
Lars Wirzenius
+15  A: 

The readlines method reads in the entire file. You don't want to do that for a file that is large in relation to your physical memory size.

The fix is to read the file in small chunks, and process those individually. You can, for example, do something like this:

for line in f.xreadlines():
    ... do something with the line

The xreadlines method does not return a list of lines, but an iterator that returns one line at a time, as the for loop asks for it. An even simpler way of doing that is:

for line in f:
    ... do something with the line

Depending on what you do, processing the file line by line may be easy or hard. I didn't really follow what your sample code is trying to do, but it looks like it should be doable line by line.

Lars Wirzenius
+6  A: 

The script is not working because it reads all lines of the file in advance, making it necessary to keep the whole file in memory. The easiest way to iterate over all lines in a file is:

for line in open("test.txt", "r"):
    # do something with the "line"
fforw
Now this looks correct. Upvoted!
Cheery
+2  A: 

Your program is very redundant. Looks like everything you do can be done using these lines:

import sys
for line in open('reg.fa'):
    sys.stdout.write(line.rstrip())

That is enough. This program gives the same result as your original code in the question, but is much simpler and clearer. And it can also handle files of any size.

nosklo
Doesn't give exactly the same result: This strips all trailing whitespace on lines (not just the line terminator), and doesn't print a final newline
Miles
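If matching the original output exactly matters, a minimal sketch (assuming the intent of the original script was just to print the file with its newlines removed, plus the single trailing newline that print adds) is to strip only the line terminator and emit one final newline:

import sys

f = open('reg.fa', 'r')
for line in f:
    # Strip only the newline, keeping any other trailing whitespace.
    sys.stdout.write(line.rstrip('\n'))
f.close()
# print in the original adds one trailing newline; reproduce that here.
sys.stdout.write('\n')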
A: 

Hi ..

From your code it is clear that you want the file contents in a single-line string buffer. From a coding point of view it is bad to store the whole file content in one string buffer and only then process it, and the code contains too many local variables.

You could have used the following chunk of code.

f = open(file_name, mode)

for line in f:
    """
    Do the processing
    """
Charan