views: 214
answers: 5

A problem I was working on recently got me wishing that I could lop off the front of a file. Kind of like a "truncate at front," if you will. Truncating a file at the back end is a common operation, something we do without even thinking much about it. But lopping off the front of a file? Sounds ridiculous at first, but only because we've been trained to think it's impossible. Still, a lop operation could be useful in some situations.

A simple example (certainly not the only or necessarily the best example) is a FIFO queue. You're adding new items to the end of the file and pulling items out from the front. The file grows over time, and a huge region of dead space accumulates at the front. With current file systems, there are several ways around this problem:

  • As each item is removed, copy the remaining items up to replace it, and truncate the file. Although it works, this solution is very expensive time-wise.
  • Monitor the size of the empty space at the front, and when it reaches a particular size or percentage of the entire file size, move everything up and truncate the file. This is much more efficient than the previous solution, but still costs time when items are moved in the file.
  • Implement a circular queue in the file, adding new items into the hole at the front of the file as items are removed. This can be quite efficient, especially if you don't mind the possibility of things getting out of order in the queue. If you do care about order, there's the potential of having to move items around. But in general, a circular queue is pretty easy to implement and manages disk space well (a minimal sketch follows this list).
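For illustration, here is a minimal sketch of that third option in C, using POSIX pread/pwrite. The file name, header layout, fixed record size, and fixed capacity are all assumptions of the sketch, and error checking on the I/O calls is omitted for brevity. The point is that removing an item never moves data; it just advances the head index stored in the header.

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    #define REC_SIZE 64   /* fixed-size records keep wrap-around simple */
    #define CAPACITY 1024 /* maximum number of records in the file */

    /* File layout: [header][CAPACITY record slots].
       head  = index of the oldest record,
       count = number of records currently stored. */
    struct header { uint32_t head, count; };

    static off_t slot_pos(uint32_t i) {
        return (off_t)sizeof(struct header) + (i % CAPACITY) * (off_t)REC_SIZE;
    }

    /* Append a record at the tail; returns -1 if the queue is full. */
    int q_push(int fd, const char rec[REC_SIZE]) {
        struct header h;
        pread(fd, &h, sizeof h, 0);
        if (h.count == CAPACITY) return -1;
        pwrite(fd, rec, REC_SIZE, slot_pos(h.head + h.count));
        h.count++;
        pwrite(fd, &h, sizeof h, 0);
        return 0;
    }

    /* Remove the oldest record; the "lop" is just advancing the head. */
    int q_pop(int fd, char rec[REC_SIZE]) {
        struct header h;
        pread(fd, &h, sizeof h, 0);
        if (h.count == 0) return -1;
        pread(fd, rec, REC_SIZE, slot_pos(h.head));
        h.head = (h.head + 1) % CAPACITY;
        h.count--;
        pwrite(fd, &h, sizeof h, 0);
        return 0;
    }

    int main(void) {
        int fd = open("queue.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0) return 1;
        struct header h = {0, 0};
        pwrite(fd, &h, sizeof h, 0); /* start with an empty queue */
        char rec[REC_SIZE] = "hello";
        q_push(fd, rec);
        q_pop(fd, rec);
        close(fd);
        return 0;
    }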

But if there were a lop operation, removing an item from the queue would be as easy as updating the beginning-of-file marker. As easy, in fact, as truncating a file. Why, then, is there no such operation?

I understand a bit about file system implementation, and I don't see any particular reason this would be difficult. It looks to me like all it would require is another word (a dword, perhaps?) per allocation entry to say where the file starts within the block. With 1-terabyte drives under $100 US, that seems like a pretty small price to pay for such functionality.
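As a sketch of what such a call might look like: on Linux, fallocate() with the FALLOC_FL_COLLAPSE_RANGE flag behaves this way on filesystems that support it (ext4 and XFS), with the restriction that the offset and length must be multiples of the filesystem block size. A minimal illustration; the command-line wrapper is just for demonstration:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <file> <bytes>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        /* Remove the first <bytes> bytes of the file and shift the rest
           down; fails with EINVAL unless <bytes> is a multiple of the
           filesystem block size. */
        if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, atoll(argv[2])) < 0) {
            perror("fallocate");
            return 1;
        }
        close(fd);
        return 0;
    }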

What other tasks would be made easier if you could lop off the front of a file as efficiently as you can truncate at the end?

Can you think of any technical reason this function couldn't be added to a modern file system? Other, non-technical reasons?

A: 

I think there's a bit of a chicken-and-egg problem in there: because filesystems have not supported this kind of behavior efficiently, people haven't written programs to use it, and because people haven't written programs to use it, there's little incentive for filesystems to support it.

You could always write your own filesystem to do this, or maybe modify an existing one (although filesystems used "in the wild" are probably pretty complicated, you might have an easier time starting from scratch). If people find it useful enough it might catch on ;-)

David Zaslavsky
A: 

Actually, there are record-based file systems; IBM has one, and I believe DEC VMS also had this facility. I seem to remember both allowed (allow? I guess they are still around) deleting and inserting at random positions in a file.

anon
A: 

The Unix command tail might be what you are looking for:

tail -n +10001 file > file.tmp && mv file.tmp file  # drops the first 10000 lines

(Note that this rewrites the file rather than lopping it in place.)
Sanjaya R
A: 

NTFS can do something like this with its sparse-file support, but it's generally not that useful.
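For illustration, a minimal sketch (assuming Windows and an NTFS volume; the file name is hypothetical) that marks a file sparse and then deallocates its first megabyte with FSCTL_SET_ZERO_DATA. The file's logical size and all offsets stay the same; reads from the range return zeros, and NTFS frees the underlying clusters:

    #include <windows.h>

    /* Mark the file sparse, then deallocate (zero) its first `len` bytes.
       Offsets into the file do not change; the range just stops
       occupying disk space. */
    static BOOL lop_front_sparse(HANDLE h, LONGLONG len)
    {
        DWORD bytes;
        FILE_ZERO_DATA_INFORMATION zero;

        if (!DeviceIoControl(h, FSCTL_SET_SPARSE, NULL, 0,
                             NULL, 0, &bytes, NULL))
            return FALSE;

        zero.FileOffset.QuadPart = 0;
        zero.BeyondFinalZero.QuadPart = len;
        return DeviceIoControl(h, FSCTL_SET_ZERO_DATA, &zero, sizeof zero,
                               NULL, 0, &bytes, NULL);
    }

    int main(void)
    {
        HANDLE h = CreateFileA("queue.dat", GENERIC_READ | GENERIC_WRITE,
                               0, NULL, OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;
        BOOL ok = lop_front_sparse(h, 1024 * 1024); /* free the first 1 MB */
        CloseHandle(h);
        return ok ? 0 : 1;
    }

This is "something like" a lop rather than the real thing: a reader of the file still has to know where the live data begins, which is presumably why it's generally not that useful.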

Martin Beckett
A: 

Truncating files at the front seems not too hard to implement at the system level.

But there are issues.

  • The first one is at the programming level. When opening a file for random access, the current paradigm is to use offsets from the beginning of the file to address different places in it. If we truncate at the beginning of the file (or insert into or remove from the middle of it), those offsets are no longer stable. (Appending to or truncating from the end is not a problem.)

In other words, truncating the beginning would change the only reference point, and that is bad.

  • At the system level, uses exist, as you pointed out, but they are quite rare. I believe most use of files is of the write-once, read-many kind, so even truncate is not a critical feature and we could probably do without it (well, some things would become more difficult, but nothing would become impossible).

When we want more complex access (and there is indeed a need), we open files in random mode and add some internal structural information. This information can also be shared between several files. This leads us to the last issue I see, probably the most important.

  • In a sense, when we use random-access files with some internal structure... we are still using files, but we are no longer using the file paradigm. The typical case is the database, where we want to insert or remove records without caring at all about their physical place. Databases can use files as a low-level implementation, but for optimization purposes some database vendors choose to bypass the filesystem completely (think of Oracle partitions).

I see no technical reason why we couldn't do everything that is currently done with files in an operating system using a database as the data storage layer. I have even heard that NTFS has much in common with databases in its internals. An operating system can (and probably will, in some not-so-distant future) use a paradigm other than files.

To sum up, I believe there's no technical problem at all, just a change of paradigm: removing the beginning is definitely not part of the current file paradigm, and it's not a big and useful enough change to compel changing anything at all.

kriss