Hi,

Deleting a particular line or a range of bytes from a file is very inefficient, because a lot of reading and rewriting has to be done. Is there any way to minimize the work involved? If a file were stored as a set of linked lists, and the user knew the structure of those lists, such a task could be accomplished quite efficiently. Unfortunately, that is not the case.

I asked this question as I was curious to know how an OS manages a text file.

Thanks, Mir

+1  A: 

The best you can do is come up with a sparse storage scheme for the file. That is, blank out the line in place and come up with a way to signal that the line is "blanked".
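A minimal sketch of that idea in standard C, assuming that a line made up only of spaces is the "blanked" signal (that convention and the name blank_line are mine, not an established API). The line's bytes are overwritten in place, so the file length and every other offset stay the same and nothing after the line has to be moved:

```c
#include <stdio.h>

/* Overwrite line `target` (1-based) with spaces, keeping its '\n'.
 * Returns 0 on success, -1 if the file cannot be opened or the
 * line does not exist. */
int blank_line(const char *path, long target)
{
    FILE *f;
    long line = 1, start, len = 0, i;
    int c = 0;

    if (target < 1)
        return -1;
    f = fopen(path, "r+b");
    if (f == NULL)
        return -1;

    /* Skip forward to the first byte of the target line. */
    while (line < target && (c = fgetc(f)) != EOF) {
        if (c == '\n')
            line++;
    }
    if (line < target) {            /* reached EOF before the line */
        fclose(f);
        return -1;
    }

    /* Measure the line without its terminating newline. */
    start = ftell(f);
    while ((c = fgetc(f)) != EOF && c != '\n')
        len++;
    if (len == 0 && c == EOF) {     /* nothing after the last newline */
        fclose(f);
        return -1;
    }

    /* Seek back (required between a read and a write) and overwrite. */
    if (fseek(f, start, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    for (i = 0; i < len; i++)
        fputc(' ', f);

    return fclose(f) == 0 ? 0 : -1;
}
```

Readers of the file then have to skip all-space lines, and the space is only reclaimed if the file is compacted later.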

Mike Axiak
A: 

Not without help from the filesystem. That is an OS service, and it can't be addressed in "pure" C.

dmckee
+1  A: 

There is no way. A text file has to be parsed to find the needed line and rewritten for every insert or delete, and there is nothing you can do about it. I'd say it was a design error.

The FS doesn't offer support for that either: it works only with blocks of fixed size (and even then you have to do some magic to convince it to insert a new block in the middle of the file).

But you could use another idea: in some cases you could generate an easily parsable binary file from the text, do the operations on it, and regenerate the text file when needed (of course, this only makes sense if updates are frequent and the text file is rarely needed).
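One possible shape for such a binary file, as a rough sketch only: fixed-size records with a "live" flag, so record i sits at byte offset i * sizeof(struct record) and deleting it means flipping one byte. The record layout, the REC_TEXT limit and all function names here are assumptions made for illustration. Lines are assumed to be shorter than REC_TEXT bytes; longer ones would be split across records.

```c
#include <stdio.h>
#include <string.h>

#define REC_TEXT 62                 /* hypothetical per-line capacity */

struct record {
    unsigned char live;             /* 1 = in use, 0 = deleted */
    unsigned char len;              /* valid bytes in text */
    char text[REC_TEXT];
};

static int close_both(FILE *a, FILE *b, int rc)
{
    if (a != NULL && fclose(a) != 0) rc = -1;
    if (b != NULL && fclose(b) != 0) rc = -1;
    return rc;
}

/* Text file -> one fixed-size record per line. */
int import_text(const char *txt_path, const char *bin_path)
{
    FILE *in = fopen(txt_path, "r");
    FILE *out = fopen(bin_path, "wb");
    char buf[REC_TEXT + 1];

    if (in == NULL || out == NULL)
        return close_both(in, out, -1);
    while (fgets(buf, sizeof buf, in) != NULL) {
        struct record r;
        size_t n = strcspn(buf, "\n");      /* length without newline */
        memset(&r, 0, sizeof r);
        r.live = 1;
        r.len = (unsigned char)n;
        memcpy(r.text, buf, n);
        fwrite(&r, sizeof r, 1, out);
    }
    return close_both(in, out, 0);
}

/* Mark record `index` (0-based) as deleted: one seek, one byte written. */
int delete_record(const char *bin_path, long index)
{
    FILE *f = fopen(bin_path, "r+b");
    unsigned char dead = 0;
    long size, rec = (long)sizeof(struct record);

    if (f == NULL)
        return -1;
    fseek(f, 0, SEEK_END);
    size = ftell(f);
    if (index < 0 || (index + 1) * rec > size)
        return close_both(f, NULL, -1);
    fseek(f, index * rec, SEEK_SET);        /* `live` is the first byte */
    fwrite(&dead, 1, 1, f);
    return close_both(f, NULL, 0);
}

/* Regenerate the text file from the records that are still live. */
int export_text(const char *bin_path, const char *txt_path)
{
    FILE *in = fopen(bin_path, "rb");
    FILE *out = fopen(txt_path, "w");
    struct record r;

    if (in == NULL || out == NULL)
        return close_both(in, out, -1);
    while (fread(&r, sizeof r, 1, in) == 1) {
        if (r.live) {
            fwrite(r.text, 1, r.len, out);
            fputc('\n', out);
        }
    }
    return close_both(in, out, 0);
}
```

The trade-off is wasted space for short lines and a hard limit for long ones, in exchange for constant-time deletes by record number.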

ruslik
File systems *can* offer partial-length blocks, though most do not. With that support you *can* implement partial rewriting in the middle of a file, so *"no way"* is too strong, though not by much: *"no reasonable way with popular platforms"* is correct.
dmckee
+1  A: 

A file can be described as a container of zero or more characters in a fixed order; it is pretty much like a char * in RAM, but on disk (dead memory). How the file is physically arranged on the hardware depends on how the partition is formatted. While it would be technically possible to remove certain characters without rewriting the file, this would still be inefficient because of block sizes and the like.

Some tricks used by databases and other software to remove data without "closing the gap" are to

  • use a separate file for every line
    • Pro : removing and inserting lines is easily done
    • Con : removing and inserting lines means renaming the files (like RENUM in GW-BASIC) unless each file knows its previous and next file. Overall, this is a bad approach.
  • emulate a FS inside a file where each line is a virtual file (with a header, size, etc)
    • Pro : works just like a linked list to add and remove lines
    • Con : lots of work for simple text data
  • etc.
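To make the second trick more concrete, here is a rough sketch of a linked list stored inside a single file. The layout is invented for illustration (it is not how any particular database does it): a head offset at the start of the file, then nodes of the form [next offset][length][text], with -1 marking the end of the chain. Removing a line only patches one offset; the line's bytes stay in the file as dead space. The file is only readable on the machine that wrote it, since raw long values are stored.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: [long head][node][node]...
 * node = [long next][long len][len bytes of text] */
struct node_hdr {
    long next;      /* absolute offset of the next node, or -1 */
    long len;       /* number of text bytes that follow */
};

/* Create the file with one node per line, chained in order. */
int build_file(const char *path, const char **lines, int count)
{
    FILE *f = fopen(path, "wb");
    long pos = (long)sizeof(long);          /* first node follows the head */
    long head = count > 0 ? pos : -1;
    int i;

    if (f == NULL)
        return -1;
    fwrite(&head, sizeof head, 1, f);
    for (i = 0; i < count; i++) {
        struct node_hdr h;
        h.len = (long)strlen(lines[i]);
        pos += (long)sizeof h + h.len;      /* where the next node starts */
        h.next = (i + 1 < count) ? pos : -1;
        fwrite(&h, sizeof h, 1, f);
        fwrite(lines[i], 1, (size_t)h.len, f);
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Unlink the node at position `index` (0-based) by patching one offset. */
int unlink_line(const char *path, long index)
{
    FILE *f = fopen(path, "r+b");
    long link_pos = 0;      /* file position of the offset to patch */
    long cur;
    struct node_hdr h;

    if (f == NULL)
        return -1;
    if (fread(&cur, sizeof cur, 1, f) != 1) {
        fclose(f);
        return -1;
    }
    while (cur != -1) {
        if (fseek(f, cur, SEEK_SET) != 0 || fread(&h, sizeof h, 1, f) != 1)
            break;
        if (index-- == 0) {
            fseek(f, link_pos, SEEK_SET);
            fwrite(&h.next, sizeof h.next, 1, f);
            return fclose(f) == 0 ? 0 : -1;
        }
        link_pos = cur;     /* `next` is the first field of the node */
        cur = h.next;
    }
    fclose(f);
    return -1;              /* index out of range or read error */
}

/* Walk the chain and write every linked line to `out`. */
int dump_lines(const char *path, FILE *out)
{
    FILE *f = fopen(path, "rb");
    long cur;
    struct node_hdr h;
    char buf[256];          /* assumes lines shorter than 256 bytes */

    if (f == NULL)
        return -1;
    if (fread(&cur, sizeof cur, 1, f) != 1) {
        fclose(f);
        return -1;
    }
    while (cur != -1) {
        if (fseek(f, cur, SEEK_SET) != 0 ||
            fread(&h, sizeof h, 1, f) != 1 ||
            h.len < 0 || h.len >= (long)sizeof buf ||
            fread(buf, 1, (size_t)h.len, f) != (size_t)h.len) {
            fclose(f);
            return -1;
        }
        fwrite(buf, 1, (size_t)h.len, out);
        fputc('\n', out);
        cur = h.next;
    }
    fclose(f);
    return 0;
}
```

Finding the n-th line still means walking the chain, so this only pays off when deletes and inserts dominate; a real design would also need a free list to reuse the dead space.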

There's not much that can be done otherwise. There are many options for "optimizing" data manipulation beyond a flat file, but it does not fall to the OS or FS to solve this problem. For most use cases, where a flat file is good enough, rewriting part of the file to remove some data is an acceptable solution.
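For reference, the plain rewrite can be sketched in standard C as below. This version copies the whole file to a temporary file and skips the unwanted line, since standard C has no way to shorten a file in place. The function name and the caller-supplied tmp_path are assumptions for the example.

```c
#include <stdio.h>

/* Copy `path` to `tmp_path` without line `target` (1-based), then
 * replace the original. Returns 0 on success, -1 on error. If the
 * line does not exist, the file is rewritten unchanged. */
int delete_line(const char *path, const char *tmp_path, long target)
{
    FILE *in = fopen(path, "rb");
    FILE *out = fopen(tmp_path, "wb");
    long line = 1;
    int c;

    if (in == NULL || out == NULL) {
        if (in != NULL) fclose(in);
        if (out != NULL) fclose(out);
        return -1;
    }
    while ((c = fgetc(in)) != EOF) {
        if (line != target)
            fputc(c, out);          /* keep bytes of every other line */
        if (c == '\n')
            line++;
    }
    fclose(in);
    if (fclose(out) != 0)
        return -1;

    /* Removing first keeps this portable; on POSIX, rename() alone
     * would replace the original atomically. */
    remove(path);
    return rename(tmp_path, path) == 0 ? 0 : -1;
}
```

The cost is one full read and one full write of the file per delete, which is exactly the work the question wants to avoid, but for small or rarely edited files it is usually fine.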

Yanick Rochon