OK, so I'm trying to delete lines from a text file with Java. Currently the way I'm doing this is by keeping track of a line number and taking an index as input; the index is the line I want deleted. Each time I read a new line of data I increment the line count, and when the line count matches the index, I don't write that line to the temporary file. Now this works, but what if, for example, I'm working with huge files and I have to worry about memory constraints? Could I do this with file markers? For example, place a file marker on the line I want to delete, then delete that line? Or is that just too much work?

A: 

You could use nio to delete the region of the file that corresponds to that line.

EDIT: added some hints

By creating a FileChannel and using a Buffer, you could open the file and erase the required line by shifting the content that comes after it over it.

Unfortunately, I must confess my knowledge of nio stops approximately here ...
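For what it's worth, here is a rough sketch of that idea (my own attempt, not tested against every edge case): scan for the byte range of the line, then slide the trailing bytes left over it and truncate. It assumes '\n' line endings and a 0-based line index; the class and method names are mine.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioLineDeleter {
    // Deletes one (0-based) line in place: finds the line's byte range,
    // slides the trailing bytes left over it, then truncates the file.
    // Only the 8 KB buffer is ever held in memory.
    static void deleteLine(Path file, int lineToDelete) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long lineStart = lineToDelete == 0 ? 0 : -1;
            long lineEnd = -1;
            long pos = 0;
            int line = 0;
            ByteBuffer buf = ByteBuffer.allocate(8192);
            scan:
            while (ch.read(buf) != -1) {
                buf.flip();
                while (buf.hasRemaining()) {
                    byte b = buf.get();
                    pos++;
                    if (b == '\n') {
                        line++;
                        if (line == lineToDelete) lineStart = pos;
                        else if (line == lineToDelete + 1) { lineEnd = pos; break scan; }
                    }
                }
                buf.clear();
            }
            if (lineStart < 0) return;            // line number past end of file
            if (lineEnd < 0) lineEnd = ch.size(); // deleting the last line
            // Copy everything after the line left over it, buffer by buffer.
            long readPos = lineEnd, writePos = lineStart;
            buf.clear();
            int n;
            while ((n = ch.read(buf, readPos)) != -1) {
                readPos += n;
                buf.flip();
                while (buf.hasRemaining()) writePos += ch.write(buf, writePos);
                buf.clear();
            }
            ch.truncate(writePos);
        }
    }
}
```

The positional read/write overloads never let the write position pass the read position, so no unread data is clobbered.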

Riduidel
More details? The technique is not at all obvious from that link.
Michael Myers
A: 

You could use a random access file. Keep one pointer to the byte you are reading and another to the byte you are writing. Fill a buffer with data and count the lines as you read. If you have nothing to delete, seek the channel to the write pointer, output the buffer, then seek back to the read pointer. If you find a line to delete, output the buffer up to that point at the write index, advance the read pointer past the end of the line, and then output the remainder of your buffer (refilling it as necessary). Repeat for each line to be deleted.
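A minimal sketch of this two-pointer technique, assuming 0-based line numbers and leaning on RandomAccessFile.readLine() (which reads bytes as ISO-8859-1) instead of a hand-rolled buffer; the names are mine, not the answerer's:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Set;

public class RafLineDeleter {
    // Compacts the file in place: one pointer reads, the other writes,
    // and lines whose 0-based number is in linesToDelete are skipped.
    // The write pointer never passes the read pointer, so no unread
    // data is ever overwritten.
    static void deleteLines(String path, Set<Integer> linesToDelete) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            long writePos = 0;
            int line = 0;
            String s;
            while ((s = raf.readLine()) != null) {   // strips the line terminator
                long readPos = raf.getFilePointer();
                if (!linesToDelete.contains(line)) {
                    // Re-append '\n'; readLine() decoded the bytes as ISO-8859-1.
                    byte[] out = (s + "\n").getBytes(StandardCharsets.ISO_8859_1);
                    raf.seek(writePos);
                    raf.write(out);
                    writePos += out.length;
                    raf.seek(readPos);
                }
                line++;
            }
            raf.setLength(writePos);                 // chop off the leftover tail
        }
    }
}
```

One caveat: this normalizes every line ending to '\n' and adds one to a last line that lacked it.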

M. Jessup
A: 

Ideally, I would use an ETL tool to perform this kind of batch work. Assuming you do not have access to such a tool, I would recommend gzipping the file first and then reading it using java.util.zip.

Here is a good tutorial on how to do it.
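A sketch of how that could look, as I understand the suggestion: stream the gzipped input through GZIPInputStream and write a gzipped copy, skipping the unwanted line, so only the stream buffers sit in memory (the method and parameter names are mine).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLineDeleter {
    // Reads a gzipped file line by line and writes a gzipped copy,
    // dropping the line with the given 0-based index. The file is
    // never held uncompressed in memory, only the stream buffers are.
    static void deleteLine(Path in, Path out, int lineToDelete) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                 new GZIPInputStream(Files.newInputStream(in)), StandardCharsets.UTF_8));
             BufferedWriter w = new BufferedWriter(new OutputStreamWriter(
                 new GZIPOutputStream(Files.newOutputStream(out)), StandardCharsets.UTF_8))) {
            String line;
            int n = 0;
            while ((line = r.readLine()) != null) {
                if (n++ != lineToDelete) {
                    w.write(line);
                    w.newLine();
                }
            }
        }
    }
}
```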

Hope this helps!

CoolBeans
I really don't think zipping the file would improve either the memory footprint or the execution speed.
Michael Myers
Not sure if I follow you. Zipping would make the file size smaller and hence less data to hold, right? The added execution time is the time it takes to compress it.
CoolBeans
If you zip it on the disk, then uncompress it when reading, it ends up the same size in memory as it originally was, plus the overhead of the zipping mechanism. If you do it the other way round, and zip it in memory to a byte buffer, it will be smaller, so it will take a little longer to run out of space, but you will still run out of memory at some point.
Pete Kirkham
No, I meant to read the file from the zipped stream itself using Java's GZIPInputStream. I agree you can eventually run out of memory if the file is that big. However, with 90% compression that threshold may be pretty high. The overhead of zipping can be eliminated by having the file gzipped before it's sent to the app.
CoolBeans
+2  A: 

Don't keep the file in memory; just read it one line at a time and write it out to the temporary file one line at a time, skipping the line that needs to be deleted.

MK
That's what it sounds like he's doing already.
Michael Myers