I have a general I/O question. I was trying to replace a single line in an ASCII-encoded file. After searching around quite a bit, I found that it is not possible to do that. According to what I read, if a single line needs to be replaced in a file, the whole file needs to be rewritten, and this is the same on all operating systems. After reading that I thought, OK, no choice, I'll just rewrite the whole file.

What got me wondering about this again is that I've been working with a program that uses a ".dat" and an ".idx" file for its database. The program is constantly reading from and writing to the database. It obviously needs to write only small portions at a time (the database is about 200 MB), so there's no way it could be efficient to rewrite the whole file each time. So my question is: what kind of solution would a program like this use for such a problem? Would it write to memory and then, every now and then, rewrite the whole database? Would it write temp files and then merge them into the database at some point? Or is it possible to write a single line (or several lines) in the database without rewriting the whole file?

Any info on this would be greatly appreciated!

Thx

nt

A: 

This is not true, ntmp. You can indeed write in the middle of a file. How you do it depends on the system and programming language you use; what you are probably looking for is the seek operation in file I/O.
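
A minimal sketch in Python (the language and the helper name `overwrite_at` are my own choices for illustration; the question doesn't name either): open the existing file for update without truncating it, seek to a byte offset, and write. Only the bytes you write change; nothing after them moves.

```python
import os
import tempfile


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes starting at `offset`; bytes after them do not move."""
    # "r+b" opens an existing file for reading and writing WITHOUT truncating it
    # (plain "w"/"wb" would empty the file first).
    with open(path, "r+b") as f:
        f.seek(offset)  # move the file position to the first byte to change
        f.write(data)   # replaces existing bytes one-for-one


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"alpha\nbravo\ncharlie\n")
    overwrite_at(path, 6, b"BRAVO")  # "alpha\n" is 6 bytes, so "bravo" starts at offset 6
    with open(path, "rb") as f:
        print(f.read())  # b'alpha\nBRAVO\ncharlie\n'
    os.remove(path)
```

This only replaces bytes; it never shifts the rest of the file, so the new data has to be exactly as long as the data it covers.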

Well, you will not exactly have to rewrite the whole file: only the part of the file from the point where you start inserting, since that part will need to be moved behind what you are inserting.
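
As a rough Python sketch of that (the name `insert_at` is hypothetical), an insertion reads the tail that starts at the insertion point, writes the new bytes there, and writes the old tail back after them. The bytes before the insertion point are never touched.

```python
import os
import tempfile


def insert_at(path: str, offset: int, data: bytes) -> None:
    """Insert `data` at `offset`, rewriting only the bytes from `offset` onward."""
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()  # everything after the insertion point (held in memory here)
        f.seek(offset)
        f.write(data)    # the new bytes go where the tail used to start...
        f.write(tail)    # ...and the old tail is written again, shifted by len(data)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"alpha\ncharlie\n")
    insert_at(path, 6, b"bravo\n")
    with open(path, "rb") as f:
        print(f.read())  # b'alpha\nbravo\ncharlie\n'
    os.remove(path)
```

For a very large tail you would copy it in chunks (working backwards from the end) instead of reading it into memory in one go, but the amount of data rewritten is the same: everything after the insertion point.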

There are several ways you can solve this. One, for example, would be to reserve space in the file (making the file larger). That way you would only have to move data once the placeholder areas have been filled.
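
One way to reserve space, sketched in Python under my own assumptions (a hypothetical fixed slot width `SLOT_SIZE` of 16 bytes per line, padded with spaces): because every slot has the same length, slot `i` always starts at `i * SLOT_SIZE` and can be rewritten in place as long as the new text fits. Only when it doesn't fit would data have to be moved.

```python
import os
import tempfile

SLOT_SIZE = 16  # hypothetical reserved width per line, including the trailing newline


def encode_slot(text: str) -> bytes:
    raw = text.encode("ascii")
    if len(raw) > SLOT_SIZE - 1:
        # The reserved space is used up: only now would data have to be moved.
        raise ValueError(f"{text!r} does not fit in a {SLOT_SIZE}-byte slot")
    # Pad with spaces so every slot has exactly the same length.
    return raw.ljust(SLOT_SIZE - 1, b" ") + b"\n"


def write_slots(path: str, lines: list[str]) -> None:
    with open(path, "wb") as f:
        for line in lines:
            f.write(encode_slot(line))


def update_slot(path: str, index: int, text: str) -> None:
    with open(path, "r+b") as f:
        f.seek(index * SLOT_SIZE)  # slot i always starts at i * SLOT_SIZE
        f.write(encode_slot(text))


def read_slot(path: str, index: int) -> str:
    with open(path, "rb") as f:
        f.seek(index * SLOT_SIZE)
        # Note: this also strips any trailing spaces that were part of the text itself.
        return f.read(SLOT_SIZE).rstrip(b" \n").decode("ascii")
```

The trade-off is wasted space in exchange for cheap in-place updates; picking the slot width is a guess about how long values can grow.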

Write a bit more and we might be able to help you out.

Thomas Børlum
Keep in mind that you can only replace stuff in a file (or append to it), not delete or insert stuff. Thus, if you want to replace a "line", your new line will have to have the same length as the old one.
nos
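
To illustrate that constraint, here is a small Python sketch (the function name is hypothetical, and it assumes `\n` line endings): it finds the byte offset of a given line, refuses the replacement unless the lengths match, and otherwise overwrites the line in place.

```python
import os
import tempfile


def replace_line_same_length(path: str, line_no: int, new_line: bytes) -> None:
    """Overwrite line `line_no` (0-based) in place; `new_line` excludes the newline."""
    # First pass: find the byte offset and current contents of the target line.
    offset = 0
    with open(path, "rb") as f:
        for i, line in enumerate(f):
            if i == line_no:
                old_line = line.rstrip(b"\n")
                break
            offset += len(line)
        else:
            raise IndexError(f"file has no line {line_no}")
    # Only an equal-length replacement can be done without moving the rest of the file.
    if len(new_line) != len(old_line):
        raise ValueError(
            f"length {len(new_line)} != {len(old_line)}; the rest of the file would have to move"
        )
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_line)
```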
A: 

The general comment 'you have to rewrite the whole file' applies when the line you are replacing is of length L1, the line you are adding is of length L2, and L1 ≠ L2. The trouble is that if L1 is bigger than L2, you have to move the data in the rest of the file toward the start of the file to avoid leaving a gap of garbage where the end of the old line was, and you must chop off the tail of the file (shortening it) to avoid leaving garbage at the end. Conversely, if L1 is smaller than L2, you have to move the lines after that line toward the end of the file to avoid having the new line overwrite the start of the next line.
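
A Python sketch of that general case (the helper `replace_range` is hypothetical): it replaces L1 bytes at an offset with L2 new bytes, rewrites the tail right after them, and then truncates the file at the new end so that no stale bytes remain when L2 < L1.

```python
import os
import tempfile


def replace_range(path: str, offset: int, old_len: int, new_data: bytes) -> None:
    """Replace `old_len` bytes at `offset` (length L1) with `new_data` (length L2)."""
    with open(path, "r+b") as f:
        f.seek(offset + old_len)
        tail = f.read()     # everything after the old line; it must move if L1 != L2
        f.seek(offset)
        f.write(new_data)   # the new line now occupies offset .. offset + L2
        f.write(tail)       # the tail is rewritten immediately after it
        f.truncate()        # cut the file at the current position, dropping stale bytes when L2 < L1


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"alpha\nbravo\ncharlie\n")
    replace_range(path, 6, 5, b"b")  # L1 = 5 > L2 = 1: tail moves toward the start, file shrinks
    with open(path, "rb") as f:
        print(f.read())  # b'alpha\nb\ncharlie\n'
    os.remove(path)
```

When L2 > L1 the final `truncate()` is a no-op, because the current position is already the end of the file.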

In the case of the .dat and .idx files, though, you will find that you are indeed correct: the software is not rewriting the whole file each time. There's a moderate chance that the files represent a C-ISAM file, or one of the related systems (D-ISAM, T-ISAM, etc.). In the original (Informix) C-ISAM, the .dat file contains fixed-length records, so it is possible to write over any old record with a new record because L1 = L2, always. The .idx file is more complex, but it is split into pages (possibly as small as 512 bytes per page), and when an edit is needed, the whole page is rewritten. Since the pages are all the same size, L1 = L2 again, and it is safe to rewrite just the section of the index file that changes.
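
To make the fixed-length idea concrete, here is a Python sketch using a made-up record layout (a 4-byte key plus a 20-byte name; this is NOT the real C-ISAM on-disk format). Record `n` always lives at byte offset `n * RECORD.size`, so updating it is a seek plus a write of exactly the same number of bytes.

```python
import os
import struct
import tempfile

# Hypothetical record layout for illustration only (NOT the real C-ISAM format):
# a 4-byte little-endian integer key followed by a 20-byte, NUL-padded name.
RECORD = struct.Struct("<i20s")  # RECORD.size == 24


def write_record(f, n: int, key: int, name: str) -> None:
    """Write record number n in place; every record has the same size (L1 == L2)."""
    f.seek(n * RECORD.size)
    f.write(RECORD.pack(key, name.encode("ascii")))  # struct pads/truncates the name to 20 bytes


def read_record(f, n: int) -> tuple[int, str]:
    f.seek(n * RECORD.size)
    key, raw = RECORD.unpack(f.read(RECORD.size))
    return key, raw.rstrip(b"\0").decode("ascii")
```

A page-structured index works the same way at a coarser grain: page `n` sits at `n * PAGE_SIZE`, and an edit rewrites that one page in full.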

When a C-ISAM file contains variable-length data, the fixed portion of the record is stored in the .dat file, and the variable-length portion of the data is stored in pages within the .idx file. This arrangement has just one merit: it leaves the records in the .dat file at a fixed size.
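
A simplified Python sketch of that split, under my own assumptions: the fixed record stores a key plus the offset and length of the variable part, and the variable part is simply appended to a second file (rather than kept in pages inside the .idx file as C-ISAM does). Whatever the payload size, the fixed record keeps its size and can be overwritten in place.

```python
import os
import struct
import tempfile

# Hypothetical layout (NOT C-ISAM's): key, byte offset and length of the variable part.
FIXED = struct.Struct("<iqi")  # 16 bytes per fixed record


def put(dat, var, n: int, key: int, payload: bytes) -> None:
    var.seek(0, os.SEEK_END)
    var_offset = var.tell()
    var.write(payload)  # variable-length data lives in the other file (simply appended here)
    dat.seek(n * FIXED.size)
    dat.write(FIXED.pack(key, var_offset, len(payload)))  # fixed part keeps its size


def get(dat, var, n: int) -> tuple[int, bytes]:
    dat.seek(n * FIXED.size)
    key, var_offset, var_len = FIXED.unpack(dat.read(FIXED.size))
    var.seek(var_offset)
    return key, var.read(var_len)
```

In this sketch a superseded payload is left behind as dead space in the second file; a real system would track and reuse freed pages.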

Jonathan Leffler
Thank you very much for the useful info!
ntmp
@ntmp: the way of showing appreciation on Stack Overflow and the related sites is by upvoting useful answers (possibly upvoting several answers to a question), and then selecting the most useful answer by clicking on the white tick, which turns green once you've clicked on it.
Jonathan Leffler
A: 

You mention that the .dat and .idx files are probably C-ISAM or related. Any recommendations for software to try reading the files? I can read them in a text editor, but that doesn't really help me because I actually want to edit them.

ntmp
This is a comment, not an answer.
seanizer