I accidentally deleted a file, but it is still open and in use.

As suggested in http://stackoverflow.com/questions/1178593/link-to-a-specific-inode, I could copy from /proc/###/fd/### to a new file, but that is not useful here because:

  1. The file is still changing.
  2. The file is 40G and the disk is full (150MB free).
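
For reference, the copy approach ruled out above could be sketched roughly as follows in Python. This is a minimal sketch, not taken from the linked answer: `fits_on_disk` and `copy_open_file` are hypothetical helper names, and a Linux /proc filesystem is assumed. It checks whether the file behind a /proc/<pid>/fd/<n> path would fit in the destination's free space and then takes a point-in-time copy.

```python
import os
import shutil


def fits_on_disk(proc_fd_path, dest_dir):
    """Return True if the file behind proc_fd_path would fit in dest_dir's free space."""
    size = os.stat(proc_fd_path).st_size        # stat follows the /proc link to the open file
    free = shutil.disk_usage(dest_dir).free     # free bytes on the destination filesystem
    return size <= free


def copy_open_file(proc_fd_path, dest_path):
    """Copy a snapshot of the open file; writes made after the copy are not reflected."""
    with open(proc_fd_path, "rb") as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

With a 40G file and 150MB free, `fits_on_disk` would return False, and even a successful copy would go stale as soon as the writer touched the original again.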

I am attempting to relink it to the filesystem (undelete it).

    COMMAND    PID USER    FD TYPE DEVICE        SIZE     NODE NAME
    vmware-vm 4281 root  126u  REG  253,0 40020664320 10928132 /var/mnt/partial.img

I am holding the file open by running "wc /proc/4281/fd/126" and then suspending that process.
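
Why an extra open descriptor protects the data can be illustrated with a small Python sketch (the function name `demo_unlinked_inode` is made up for illustration). It creates a scratch file, removes its only directory entry, and shows that the inode stays reachable and writable through the open descriptor. On typical local Linux filesystems the link count reads 0 at that point; network filesystems may behave differently.

```python
import os
import tempfile


def demo_unlinked_inode():
    """Return (link count, size before, size after) for a deleted-but-open file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"first block\n")
        os.unlink(path)                     # remove the only directory entry
        before = os.fstat(fd)               # the inode is still reachable through fd
        os.write(fd, b"second block\n")     # the file can still grow
        after = os.fstat(fd)
        return before.st_nlink, before.st_size, after.st_size
    finally:
        os.close(fd)                        # last reference gone: the inode is freed
```

The inode is only released once the last descriptor is closed, which is why keeping a second process attached guards against the original writer exiting.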

I created a link on the filesystem using debugfs (inspired by Dag Wieers), then edited the inode to set the deletion time to 0 and update the link count. After a reboot and an fsck run, all is well.

There is also a kernel module that does this; I have not tested it yet.

+1  A: 

It took a little bit to figure out what you were asking.

There is no user land API to do this that I know of. It would be nice to be able to create a link with an open file descriptor, which, of course, would fail if the file descriptor was anything not on disk or if the newpath did not reside on the same disk as that file, but I don't know of anything like that.

Part of the reason for this is that in reality the file no longer has to actually be on disk. It can live completely (or partially) in the file system's cache. The OS could decide not to flush changes to that file to disk because it may think that doing so won't matter (unless it needs to free up some RAM).

nategoose
I'm not convinced about the 3rd paragraph, but I agree with the 'no user-land API' part.
Jonathan Leffler
@Jonathan Leffler: You're right that this may not actually be the reason there is no system call for this (it's probably the other way round, now that I think about it). I do think that not flushing an unlinked file to disk would be reasonable behavior for an OS, except in the case of hibernating.
nategoose
I agree that the data may not be flushed to disk (and the system will probably try to avoid doing that) - but if the buffer pool needs the space, the system can still write it to disk. The inode exists to store the relevant information, and will be maintained accurately as blocks are added to the file, etc. There just isn't a directory entry referencing the inode.
Jonathan Leffler
The inode(s) is just another block of the file, and the one most likely not to be updated on the disk. It's the one that really has no on-disk references to it, in this case (unless the file grows and new blocks are added and the on disk version of the inode isn't updated to reflect this).
nategoose
A user land API has been brought up before and shot down several times in the Linux world; the justifications have been security-related, not filesystem disk format-related. http://lkml.org/lkml/1998/3/8/1 http://lkml.org/lkml/2002/1/19/16 http://lkml.org/lkml/2003/4/6/112 et al.
ephemient
@ephemient: interesting; none of the suggestions seemed to address the case where the fd has zero links - or suggested that the owner of the file should be changed to the EUID of the process (just as a normal file would be created with the EUID) and the group for the file should be the directory group (if the directory is SGID, or on MacOS X) or the EGID of the process. It would require an error EHASNAME or something similar. I think the security issues then cease to be an issue - only the process has access to the file, and it really doesn't matter whether it was readonly originally.
Jonathan Leffler
@Jonathan Leffler: I was actually thinking that the best limiting conditions would be that only the owner of the file or root could make a hard link via an open file descriptor. I can't think of any security issues that this would present.
nategoose
Sorry for the hastily written question. Agreed that the file, or parts thereof, may only be in cache. But since the kernel identifies files by inode, creating the link with debugfs/fdlink would bump the link count, forcing the file to be flushed to disk eventually, as it is no longer deleted.
Jason Pyeron
A: 

The best way I know is to use gdb and attach to the process that still has the file open, then manually call library functions from inside gdb to open a new file and copy the file contents to the new file.

R..
A: 

Use another disk?

SamB
Again, the contents are changing; a copy would be out of date.
Jason Pyeron
+1  A: 

The syscall you want (link a file descriptor to a new name) has been proposed and rejected several times in the past (discussions pop up on LKML time after time) for security reasons: if a file was deleted, then it was deleted, period. (See Edit1 below.)

Lots of security-oriented applications depend on the behavior that a file they have deleted, while keeping its file descriptor open, cannot be reopened ever again. To accomplish that, on one side there are the overly restrictive permissions on the /proc/*/fd/* links (only the owner may read them), and on the other side is the missing syscall.


> I am attempting to undelete it.
>
> 1. the file is changing
> 2. 40G and the disk is full

You are out of luck. You can't give a new name to a deleted (yet open) file. Learn to use rm -i (I hated Red Hat's default aliases for the root shell, but eventually learned to love them).
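
To illustrate, here is a hedged Python sketch that tries the obvious workaround: calling linkat(2) through ctypes on the /proc/self/fd/<n> path with AT_SYMLINK_FOLLOW. `try_relink` is a hypothetical helper, the constants are the Linux values from <fcntl.h>, and the outcome described is the expected behavior on current Linux kernels rather than a guarantee: once the link count has dropped to zero, the call is expected to fail (typically with ENOENT) instead of resurrecting the file.

```python
import ctypes
import os

AT_FDCWD = -100             # Linux value from <fcntl.h>
AT_SYMLINK_FOLLOW = 0x400   # Linux value from <fcntl.h>

_libc = ctypes.CDLL(None, use_errno=True)


def try_relink(fd, new_path):
    """Try to give the file behind fd a new name; return 0 on success, else the errno."""
    src = ("/proc/self/fd/%d" % fd).encode()
    rc = _libc.linkat(AT_FDCWD, src, AT_FDCWD, os.fsencode(new_path), AT_SYMLINK_FOLLOW)
    return 0 if rc == 0 else ctypes.get_errno()
```

Calling `try_relink` on a descriptor whose file has already been unlinked should return a non-zero errno and leave no new directory entry behind.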


Edit1: A comment on another response here by @ephemient, with the references I was too lazy to look up myself:

> A user land API has been brought up before and shot down several times in the Linux world; the justifications have been security-related, not filesystem disk format-related. lkml.org/lkml/1998/3/8/1 lkml.org/lkml/2002/1/19/16 lkml.org/lkml/2003/4/6/112 et al.

Dummy00001
Security-wise, if root can run debugfs and/or install kernel modules, what further security issues could there be with a userland option restricted to root?
Jason Pyeron
The /proc/*/fd/* links can be read by root, of course.
Jason Pyeron
@Jason: in Linux, "root" per se doesn't exist anymore. It is simply a user with every [capability](http://linux.die.net/man/7/capabilities) enabled. With a kernel option one can disable loadable modules (and that, I heard, is the usual practice for security-enhanced installations), but not a syscall implemented by the file systems. In short, the potential security concerns outweigh the benefits of having the syscall.
Dummy00001
A: 

It is not part of the base kernel, but there is a module at http://fdlink.svn.sourceforge.net/viewvc/fdlink/trunk/flink/ that is supposed to do it. I think that using debugfs is easier, but the kernel module might be cleaner.

Jason Pyeron