I am thinking about letting my program use the same filename for both the input and the output file, so that it replaces the input file.

As the processed file may be quite large, I think the best solution would be to open the file first, then remove it and create a new one, like this:

/* input == output in this case */
FILE *inf = fopen(input, "r");
remove(output);
FILE *outf = fopen(output, "w");

(of course, with error handling added)

I am aware that not all systems are going to allow me to remove an open file, and that's acceptable as long as remove() fails in that case.

I am worried, though, that there might be a system which will allow me to remove the open file and then fail to read its contents.

The C99 standard specifies the behavior in that case as ‘implementation-defined’; SUS doesn't even mention the case.

What is your opinion/experience? Do I have to worry? Should I avoid such solutions?

EDIT: Please note this isn't supposed to be a mainline feature but rather a ‘last resort’ in case the user specifies the same filename as both the input and the output file.

EDIT: OK, one more question then: is it possible that, in this particular case, the solution I proposed could do more evil than just opening the output file write-only (i.e. like above but without the remove() call)?

+4  A: 

All systems that I'm aware of that let you remove open files implement some form of reference counting for file nodes. So, removing a file removes the directory entry, but the file node itself still has one reference from the open file handle. In such an implementation, removing a file obviously won't affect the ability to keep reading it, and I find it hard to imagine any other reasonable way to implement this behavior.
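A minimal sketch of how one might check this behavior on a given system. The function name `read_after_remove` and the test string are made up for illustration; the return value 0 is what a reference-counting (POSIX-style) implementation would be expected to produce, while 1 means remove() refused to delete the open file.

```c
#include <stdio.h>
#include <string.h>

/* Writes a small file, opens it for reading, removes it, then reads it
 * through the still-open stream.
 * Returns 0 if the contents survived the remove(), 1 if remove() refused
 * to delete the open file, -1 on any other error. */
int read_after_remove(const char *path)
{
    const char *text = "still here\n";
    char buf[64] = "";

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fputs(text, f);
    if (fclose(f) != 0)
        return -1;

    FILE *inf = fopen(path, "r");
    if (!inf)
        return -1;

    /* Implementation-defined in C99 when the file is open. */
    if (remove(path) != 0) {
        fclose(inf);
        remove(path);   /* clean up now that the file is closed */
        return 1;
    }

    /* The directory entry is gone; the open stream still holds the file. */
    char *got = fgets(buf, sizeof buf, inf);
    fclose(inf);
    return (got != NULL && strcmp(buf, text) == 0) ? 0 : -1;
}
```

Calling `read_after_remove("probe.tmp")` on the target platform gives a quick answer for that particular operating system and file system, but it says nothing about others.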

Pavel Minaev
+1  A: 

I've always got this to work on Linux/Unix. Never on Windows, OS/2, or (shudder) DOS. Any other platforms you are concerned about?

This behaviour is actually useful for temporary disk space: open the file for read/write, and immediately delete it. It gets cleaned up automatically on program exit (for any reason, including a power outage), and it becomes much harder (but not impossible) for others to monitor it (/proc can give clues, if you have read access to that process).
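A rough sketch of that idiom, assuming a POSIX system (mkstemp, unlink and fdopen are POSIX, not ISO C). The helper name `open_scratch` and the `/tmp/scratchXXXXXX` template are illustrative choices only.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Creates a scratch file and unlinks its name right away, so the storage
 * is released as soon as the stream is closed or the process ends.
 * Returns a read/write stream, or NULL on failure. */
FILE *open_scratch(void)
{
    char name[] = "/tmp/scratchXXXXXX";   /* assumed location */
    int fd = mkstemp(name);
    if (fd == -1)
        return NULL;

    unlink(name);   /* drop the directory entry; fd keeps the file alive */

    FILE *f = fdopen(fd, "w+");
    if (!f)
        close(fd);
    return f;
}
```

Standard C's tmpfile() offers the same "removed automatically when closed" guarantee without any POSIX calls, so it may be the simpler choice when the file's location does not matter.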

Tanktalus
What I'm concerned about is that Pavel Minaev is a developer at Microsoft and he says quite the opposite about Windows...
Pavel Shved
Do you have a link?
Tanktalus
Where did I say that it won't work in Windows?
Pavel Minaev
+4  A: 

No, it's not safe. It may work on your file system, but fail on others. Or it may intermittently fail. It really depends on your operating system AND file system. For an in-depth look at Solaris, see this article on file rotation.

Take a look at GNU sed's '--in-place' option. This option works by writing the output to a temporary file, and then renaming that file over the original. This is the only safe, compatible method.

You should also consider that your program could fail at any time, due to a power outage or the process being killed. If this occurs, then your original file will be lost. Additionally, for file systems which do have reference counting, you're not saving any space over the temp file solution, as both files have to exist on disk until the input file is closed.

If the files are huge, and space is at a premium, and developer time is cheap, you may be able to open a single file for read/write, and ensure that your write pointer does not advance beyond your read pointer.
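A minimal sketch of the temp-file approach, not sed's actual code. The names `rewrite_via_temp` and `transform`, the upper-casing transform, and the fixed ".tmp" suffix are all assumptions for illustration; real code would pick a unique temporary name. It also assumes rename() replaces an existing target, which POSIX guarantees but ISO C leaves implementation-defined.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical example transform: copy in to out, upper-casing each byte. */
static int transform(FILE *in, FILE *out)
{
    int c;
    while ((c = getc(in)) != EOF)
        if (putc(toupper(c), out) == EOF)
            return -1;
    return ferror(in) ? -1 : 0;
}

/* Rewrites `path` by writing the result to "<path>.tmp" and renaming it
 * over the original only after the output was written and closed
 * successfully. Returns 0 on success, -1 on failure (original untouched). */
int rewrite_via_temp(const char *path)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    FILE *in = fopen(path, "r");
    if (!in)
        return -1;
    FILE *out = fopen(tmp, "w");
    if (!out) {
        fclose(in);
        return -1;
    }

    int rc = transform(in, out);
    fclose(in);
    if (fclose(out) != 0)   /* buffered write errors surface here */
        rc = -1;

    if (rc == 0 && rename(tmp, path) != 0)
        rc = -1;
    if (rc != 0)
        remove(tmp);        /* leave no partial output behind */
    return rc;
}
```

Keeping the temporary file in the same directory as the target means the final rename() stays on one file system.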
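One way this could look, as a sketch only: the example below strips carriage returns in place, a transform chosen because its output never grows, so the write offset cannot overtake the read offset. The function name `strip_cr_in_place` is hypothetical, and the final truncation relies on POSIX ftruncate() and fileno(). A crash midway still leaves the file half-converted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Removes every '\r' from the file in place, using one read/write stream
 * and separate read and write offsets (wpos <= rpos at all times).
 * Returns 0 on success, -1 on failure. */
int strip_cr_in_place(const char *path)
{
    FILE *f = fopen(path, "r+b");
    if (!f)
        return -1;

    char buf[4096];
    long rpos = 0, wpos = 0;

    for (;;) {
        /* Seek before every switch between reading and writing. */
        if (fseek(f, rpos, SEEK_SET) != 0)
            goto fail;
        size_t n = fread(buf, 1, sizeof buf, f);
        if (n == 0) {
            if (ferror(f))
                goto fail;
            break;              /* end of file */
        }
        rpos += (long)n;

        size_t kept = 0;        /* compact the chunk, dropping '\r' */
        for (size_t i = 0; i < n; i++)
            if (buf[i] != '\r')
                buf[kept++] = buf[i];

        if (fseek(f, wpos, SEEK_SET) != 0)
            goto fail;
        if (fwrite(buf, 1, kept, f) != kept)
            goto fail;
        wpos += (long)kept;
    }

    /* Reposition, flush, then cut off the stale tail of the old contents. */
    if (fseek(f, wpos, SEEK_SET) != 0 || fflush(f) != 0)
        goto fail;
    if (ftruncate(fileno(f), wpos) != 0)
        goto fail;
    return fclose(f) == 0 ? 0 : -1;

fail:
    fclose(f);
    return -1;
}
```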

brianegge
Yes, I am aware of these issues. I was thinking more about using it to make an epic failure less likely if the user specifies the same file for both input and output.
Michał Górny
That is at least part of the value of always writing the output to a temp file. Furthermore, you might want to carefully rename the input to a 2nd temp, rename the output temp to the final name, then, only if everything succeeded, delete the input. If the use case wants the input retained as .bak or some such, then you do more shuffling accordingly.
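The shuffle described here might be sketched as follows. The function name `commit_output`, its parameters and the `keep_backup` flag are hypothetical; the sketch assumes the temporary output has already been completely written and closed, and that all three names live on the same file system.

```c
#include <stdio.h>

/* Moves finished output into place without ever leaving `final` missing
 * for longer than the gap between two renames.
 * Returns 0 on success, -1 on failure. */
int commit_output(const char *final, const char *tmp_out,
                  const char *backup, int keep_backup)
{
    /* 1. Park the original input under a second name. */
    if (rename(final, backup) != 0)
        return -1;

    /* 2. Promote the finished output to the final name. */
    if (rename(tmp_out, final) != 0) {
        rename(backup, final);   /* best-effort rollback */
        return -1;
    }

    /* 3. Only now discard the input, unless it should stay as a backup. */
    if (!keep_backup)
        remove(backup);
    return 0;
}
```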
RBerteig
You are indeed right, I totally forgot about that. The problem is that that will collide with my current method of output but I'll ask if it's useful in the next question.
Michał Górny