Suppose I have a process which creates a file, writes some data to it, then after a short amount of processing (by itself or another process), deletes it and closes all remaining file descriptors.
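
For concreteness, the pattern looks roughly like this in C (a sketch; the path, buffer size, and error handling are placeholders of my own, not part of the question):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical scratch file; name and size are arbitrary. */
        const char *path = "/tmp/scratch.dat";
        char buf[4096] = { 0 };

        int fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0)
            return 1;

        write(fd, buf, sizeof(buf));   /* data lands in the page cache */

        /* ... short processing, by this process or another ... */

        unlink(path);                  /* delete: the file loses its name */
        close(fd);                     /* last remaining descriptor closed */
        return 0;
    }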

I am assuming here that there is enough RAM to keep the pages in memory until the file is deleted, and that nobody calls sync() in the interim.

Will the blocks of the now-deleted file ever be written back to disc, or are they immediately removed from the dirty-list?

Or does it depend on the filesystem? Filesystems like XFS and ext4 have "delayed allocation", which might enable this behaviour, if it is implemented.

+2  A: 

In the classic Unix file systems, the answer would be "No" (that is, the data for a created and deleted file would not necessarily ever make it to the disk), though some of the directory metadata (modification time) would probably still change. Therefore, what happens does depend in part on the file system in use.

Note that even calling sync() doesn't guarantee that they are written; it only schedules the writing of the data back to disk. Hence the ancient injunction to type the sync command twice before bringing down the system: typing the second sync gave the computer enough time to complete the writes scheduled by the first, because the system could write to disk faster than you could type sync again (especially if you happened to be using a real Teletype at 110 baud).


The POSIX standard says (of the sync() function which is used by the sync command):

The sync() function shall cause all information in memory that updates file systems to be scheduled for writing out to all file systems.

The writing, although scheduled, is not necessarily complete upon return from sync().

If Linux has changed its definition to assure you that 'all data is written to disk', then that is a valid and useful extension. But it isn't the classic behaviour - and beware translating the Linux expertise to other systems.

There are other functions, such as fsync(), that give different, more stringent, promises:

The fsync() function shall request that all data for the open file descriptor named by fildes is to be transferred to the storage device associated with the file described by fildes. The nature of the transfer is implementation-defined. The fsync() function shall not return until the system has completed that action or until an error is detected.
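
For illustration, a write followed by the stronger guarantee might look like this (a sketch; the helper name is my own):

    #include <unistd.h>

    /* Hypothetical helper: returns 0 once the data for fd has been
       handed to the storage device, -1 on error. Unlike sync(),
       fsync() does not return until the transfer is complete or an
       error is detected. */
    int write_durably(int fd, const void *buf, size_t len)
    {
        if (write(fd, buf, len) != (ssize_t)len)
            return -1;
        return fsync(fd);
    }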

And there are options to file descriptors that give other promises again: O_SYNC, O_DSYNC, O_RSYNC. Look them up in the POSIX standard (open()).
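
With the flag-based variants, the promise travels with the descriptor rather than with an explicit call; a sketch (path arbitrary, and note that O_RSYNC is not available everywhere):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* With O_SYNC, each write() blocks until the data (and the
           metadata needed to retrieve it) reaches stable storage,
           so no separate fsync() call is needed. */
        int fd = open("/tmp/log.dat", O_CREAT | O_WRONLY | O_SYNC, 0600);
        if (fd < 0)
            return 1;

        write(fd, "entry\n", 6);
        close(fd);
        return 0;
    }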

Jonathan Leffler
I'm not completely sure that answers the question. Under Linux, sync claims to actually wait for the writes to complete. Also, on a modern system, you can have a lot more dirty data than the system can write while I'm typing "sync".
MarkR
Isn't it rather the case that sync *does* guarantee the data has been written, and that the second sync is actually useless - it just gives a time window for the hard disk's own write cache to be written to the platters?
snemarch
I'm still not sure this has anything to do with my question - which is that if a file is written, read, and then deleted WITHOUT sync (or fsync) being called, does it get written to disc at all? I don't know the answer.
MarkR
@MarkR: My response to your question is "it depends on the file system" and on the workload on the machine, but in classical systems, the file data would not necessarily hit the disk. With a journalled file system, I'm not sure whether the data would hit the disk, but it probably would (a guess).
Jonathan Leffler
@MarkR: the stuff below the line was more a response to the comment by @snemarch, pointing out that POSIX does not require the sync() system call to ensure that all data is written to disk by the time it returns.
Jonathan Leffler
+1  A: 

I agree with Jonathan Leffler, and not only for classic Unix file systems: there has been a discussion of a similar issue concerning the ext4 file system.

In a comment, Theodore Ts'o (one of the main developers of the ext4 file system) states: "... for example, if you create a scratch file, and then delete it 20 seconds later, it will probably never hit the disk."

Jochen Walter
Thank you, this is the first answer which actually properly answers my question.
MarkR
+1  A: 

I did some research on this and found that, under Linux, it does indeed depend on the filesystem.

ext3 seems to always write back the data of deleted files, no matter how small they are or how soon they are deleted. XFS does not always do so, which can result in much better performance for applications that use short-lived temporary files, if you have enough RAM.

I suspect that the "modern" Linux filesystems (ext4, btrfs) do this too. It's a good thing.
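
For what it's worth, the effect can be observed with a loop like the one below while watching disk write traffic in another terminal (with iostat, for example); the path and sizes are arbitrary choices of mine:

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[65536] = { 0 };

        /* Create, write, and delete many short-lived files. On a
           filesystem with delayed allocation the disk should stay
           nearly idle; on ext3 you should see steady writeback. */
        for (int i = 0; i < 10000; i++) {
            int fd = open("/tmp/scratch.tmp", O_CREAT | O_RDWR, 0600);
            if (fd < 0)
                return 1;
            write(fd, buf, sizeof(buf));
            unlink("/tmp/scratch.tmp");
            close(fd);
        }
        return 0;
    }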

MarkR
A: 

What do you really need to know, here?

If the question is "will it probably be written to disk?" the answer is no, if your processing is brief, but no promises.

If the question is "can I be sure that it won't be written to disk?" the answer is also no. A deleted file is a file like any other as long as it's held open; it's just a file without a name (link).

If the question is "is it totally free in terms of the disk?" the answer is no again - for example, I'm pretty certain that on a system with quotas, the number of "blocks" in the file would be charged against the user's filesystem quota as soon as you write them.

hobbs
I was trying to determine whether short-lived temporary files are causing IO load on busy machines, and if so, whether there is an approach which fixes it; the answers appear to be yes and yes respectively (use a filesystem which supports delayed allocation).
MarkR