ansaurus

Question

Suggestion needed on rewriting a file and string manipulation

Answer 1

A:

You are overwriting the file as you go, but each cleaned line is shorter than the original, so the last X characters of the original bleed through, where X is the difference in size between the original and the new version. The final seek() and truncate() calls in this version move to the end of your new output and cut off the rest of the file.

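filesrc = open('c:/ODI_FILE/split_doc.txt', 'r+')
# where the next cleaned line goes; assumes every character, "\n" included, is one byte on disk
offset = 0
for line in filesrc.readlines():
    # split the record on the delimiter and strip whitespace from each field
    fields = line.split(',')
    line = ",".join([s.strip() for s in fields])
    # jump back to the end of the cleaned output written so far
    filesrc.seek(offset)
    filesrc.write(line + "\n")
    offset += len(line + "\n")
# move to the end of the new output and cut off the rest of the original
filesrc.seek(offset)
filesrc.truncate()
filesrc.close()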
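
To see the bleed-through in isolation, here is a minimal sketch. The helper name `overwrite_start` and its `truncate` flag are made up for illustration, and the sketch assumes a small ASCII file opened in 'r+' mode.

```python
def overwrite_start(path, new_text, truncate):
    # Open for reading and writing without erasing the existing content.
    f = open(path, "r+")
    try:
        # Writing starts at position 0 and only replaces len(new_text) characters.
        f.write(new_text)
        if truncate:
            # Cut the file at the current position, dropping the old tail.
            f.truncate()
    finally:
        f.close()
```

If the file holds "a , b , c" and the new text is "a,b,c", then without truncation the file ends up as "a,b,c , c": the last four characters of the original survive. With truncation the file holds just "a,b,c".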

teepark 2009-12-17 19:59:06

Thanks so much, it worked. I am facing another issue now: I get a Java out-of-memory error when dealing with 500,000 rows. I have changed the memory setting to 512mx, but it still fails. I had the same issue when using a for loop in another program, and it worked once I switched to a while loop. Is it possible to change the program to use a while loop? Thanks again for your prompt help.

kdev 2009-12-17 20:11:50

readlines() reads the entire contents of the file into a list in memory. The problem with using an iterator instead is that you are seek()ing in the same file, which I suspect will cause problems with the iterator. To use a while loop, you would need two pointers into your file (one for reading, one for writing) and seek between them. Is it feasible to read from one file and write to another? That would simplify your task.

Mark Peters 2009-12-17 20:25:32

Initially I did write to another file, but later I realized that my requirement is to keep the same file name as the incoming file. I cannot change the file name, so I have to read from and write to the same file.

kdev 2009-12-17 20:46:24

You DON'T need to read and write into the same file. As others have pointed out, it is perilous. Consider what happens if power fails. DON'T read the whole file into memory. Rename old file to have a name that includes a timestamp, read old file, write/flush/close new file. Delete the backup file much later when you are sure you don't need it any more.

John Machin 2009-12-17 21:13:02
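
The rename-then-rewrite idea from the comment above could be sketched as follows. This is only an illustration: the function name `clean_file_with_backup`, the timestamp format and the `.bak` suffix are assumptions, not something given in the thread.

```python
import os
import time


def clean_file_with_backup(path):
    # Rename the original so that its name includes a timestamp; this is the backup.
    backup_path = path + "." + time.strftime("%Y%m%d%H%M%S") + ".bak"
    os.rename(path, backup_path)
    src = open(backup_path, "r")
    dst = open(path, "w")  # new file under the original name
    try:
        # Read the old file one line at a time and write the cleaned record.
        for line in src:
            dst.write(",".join([s.strip() for s in line.split(",")]) + "\n")
        dst.flush()
    finally:
        dst.close()
        src.close()
    # The caller deletes the backup later, once it is no longer needed.
    return backup_path
```

The file keeps its original name, and the untouched original stays on disk under the backup name until you remove it.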

There might be perils in overwriting the same file when you aren't reading the whole file into memory (reading the next chunk gives you some of the new version from the last iteration), so either one of those improvements on its own could be perilous, but doing them *both* is certainly a much better approach. Operate on one chunk at a time, outputting to a new file, then copy the new file to the old one's location.

teepark 2009-12-20 04:39:06

Answer 2

+1 A:

You don't want to write to the same file while you're reading it. It's technically possible, but that path is fraught with trouble and misery.

Here's the plain and simple process you should follow:

read the whole file into a string then close the file
split the string on newlines into a list
process each line to remove extra spacing
rejoin the list into a string
overwrite the source file with the new cleaned data
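
A minimal sketch of these steps, assuming the file fits in memory and uses a comma as the delimiter (as in the question); the function names `clean_line` and `clean_file_in_memory` are made up for illustration:

```python
def clean_line(line):
    # Strip the whitespace around every comma-separated field of one record.
    return ",".join([field.strip() for field in line.split(",")])


def clean_file_in_memory(path):
    # Read the whole file into a string, close it, and split on newlines.
    with open(path, "r") as src:
        lines = src.read().splitlines()
    # Clean each line and rejoin the list into one string.
    cleaned = "\n".join([clean_line(line) for line in lines])
    if lines:
        cleaned += "\n"
    # Reopening in "w" mode empties the file, so no old data can bleed through.
    with open(path, "w") as dst:
        dst.write(cleaned)
```

Because the file is closed before it is reopened for writing, reading and writing never overlap.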

If you don't want to load the whole file into memory at once, then try this process:

open the file for reading
read line by line
write cleaned lines to a new temp output file
when all lines are written, delete the original file
rename temp file to original name
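
The line-by-line variant might look like the sketch below. The function name `clean_file_via_temp` and the `.tmp` suffix are assumptions; the temp file is created in the same directory as the original so that the final rename does not cross filesystems.

```python
import os
import tempfile


def clean_file_via_temp(path):
    # Create the temp output file next to the original.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as dst:
        with open(path, "r") as src:
            # Iterating over the file keeps only one line in memory at a time.
            for line in src:
                fields = line.split(",")
                dst.write(",".join([s.strip() for s in fields]) + "\n")
    # Both files are closed here: delete the original, then rename the temp file.
    os.remove(path)
    os.rename(tmp_path, path)
```

Memory use stays roughly constant regardless of how many rows the file has, which is the point of this variant.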

My recommendation is to write it both ways and see what works or doesn't work and which way is faster, rather than assume you can't read it all into memory just because it is millions of lines. Maybe it will work just fine.

Also, you can certainly make this work with a while loop as well. To do so, you will want to read the Python docs on the form of a while loop and do some experiments. How you write that loop will depend on how you loaded the file: all at once into a string and then split into a list, or line by line directly from the file. For either case, how do you know how much work the while loop will have to do, how will you advance from one piece of work to the next, and how will you know when it's done? If you can answer these, you can write your loop.
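
One possible answer to those three questions for the line-by-line case is sketched below, writing to a separate output file. The function name `clean_lines_with_while` and its two path parameters are hypothetical.

```python
def clean_lines_with_while(src_path, dst_path):
    src = open(src_path, "r")
    dst = open(dst_path, "w")
    count = 0
    try:
        # How much work: unknown up front, so keep going until the file runs out.
        line = src.readline()
        # When it's done: readline() returns "" only at end of file
        # (a blank line still comes back as "\n").
        while line != "":
            dst.write(",".join([s.strip() for s in line.split(",")]) + "\n")
            count += 1
            # How to advance: fetch the next line.
            line = src.readline()
    finally:
        dst.close()
        src.close()
    return count
```

The return value is the number of lines processed, which is handy for a quick sanity check against the expected row count.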

Todd 2009-12-17 20:02:50

Thanks for the suggestion, I will try to work it out that way.

kdev 2009-12-17 20:19:56

Answer 3

A:

This does not answer your question, but have you considered not doing this with Jython?

Have you tried sed?

Peter Lang 2009-12-17 20:25:39
