Consider a sparse file with 1s written to a portion of the file.

I want to reclaim the actual space on disk for these 1s as I no longer need that portion of the sparse file. The portion of the file containing these 1s should become a "hole" as it was before the 1s were themselves written.

To do this, I cleared the region to 0s. This does not reclaim the blocks on disk.

How do I actually make the sparse file, well, sparse again?

This question is similar to this one but there is no accepted answer for that question.

Consider the following sequence of events run on a stock Linux server:

$ cat /tmp/test.c
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>

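int main(int argc, char **argv) {
    int fd;
    char c[1024];

    (void)argv;
    /* No arguments: fill with 1s. Any argument (e.g. "clear"): fill with 0s. */
    memset(c, argc == 1, sizeof c);

    fd = open("test", O_CREAT | O_WRONLY, 0777);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Seek before writing so the first 10000 bytes are left as a hole. */
    if (lseek(fd, 10000, SEEK_SET) < 0 ||
        write(fd, c, sizeof c) != (ssize_t)sizeof c) {
        perror("lseek/write");
        close(fd);
        return 1;
    }
    close(fd);

    return 0;
}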

$ gcc -o /tmp/test /tmp/test.c

$ /tmp/test

$ hexdump -C ./test
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002710  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01  |................|
*
00002b10

$ du -B1 test; du -B1 --apparent-size test
4096        test
11024       test

$ /tmp/test clear

$ hexdump -C ./test
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002b10

$ du -B1 test; du -B1 --apparent-size test
4096        test
11024       test

# NO CHANGE IN SIZE.... HMM....

EDIT -

Let me further qualify that I don't want to rewrite files, copy files, etc. If it is not possible to somehow free previously allocated blocks in situ, so be it, but I'd like to determine if such is actually possible or not. It seems like "no, it is not" at this point. I suppose I'm looking for sys_punchhole for Linux (discussions of which I just stumbled upon).
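For checking the result from code rather than with du, the two numbers above can be read with fstat. This is only a sketch: the helper names are made up for illustration, and it assumes Linux, where st_blocks is counted in 512-byte units.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: bytes actually allocated on disk (like `du -B1`).
 * Assumes Linux, where st_blocks is in 512-byte units. Returns -1 on error. */
long long allocated_bytes(int fd)
{
    struct stat st;
    assert(fd >= 0);
    if (fstat(fd, &st) < 0)
        return -1;
    return (long long)st.st_blocks * 512;
}

/* Hypothetical helper: logical file length (like `du -B1 --apparent-size`). */
long long apparent_bytes(int fd)
{
    struct stat st;
    assert(fd >= 0);
    if (fstat(fd, &st) < 0)
        return -1;
    return (long long)st.st_size;
}

/* Print both numbers; a sparse file shows allocated < apparent. */
void report(int fd)
{
    printf("allocated=%lld apparent=%lld\n",
           allocated_bytes(fd), apparent_bytes(fd));
}
```

If the blocks had been released, the allocated figure would drop while the apparent size stayed at 11024.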

A: 

Writing zeros (as in the referenced question) to the part you're done with seems like a logical thing to try. Here is a link to an MSDN article on NTFS sparse files that does just that to "release" the "unused" part. YMMV.

http://msdn.microsoft.com/en-us/library/ms810500.aspx

No Refunds No Returns
I did that as noted in the 'script' output.
z8000
Read the article. Windows has a special call to release the blocks. Linux probably does too.
No Refunds No Returns
http://lists.linuxcoding.com/kernel/2005-q4/msg10956.html
z8000
+1  A: 

This way is cheap, but it works. :-P

  1. Read in all the data past the hole you want, into memory (or another file, or whatever).
  2. Truncate the file to the start of the hole (ftruncate is your friend).
  3. Seek to the end of the hole.
  4. Write the data back in.
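
The four steps above could be sketched roughly as follows. The function name is made up for illustration, the descriptor is assumed to be open for reading and writing, and the whole tail is held in memory, so this is not something to run on huge files. If the process dies between the truncate and the write, the tail is lost.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make [hole_start, hole_end) a hole by saving the
 * tail, truncating, and writing the tail back. fd must be open O_RDWR.
 * Returns 0 on success, -1 on error. */
int punch_by_rewrite(int fd, off_t hole_start, off_t hole_end)
{
    assert(hole_start >= 0 && hole_end > hole_start);

    off_t size = lseek(fd, 0, SEEK_END);
    if (size < 0)
        return -1;
    if (hole_start >= size)
        return 0;                       /* nothing to free */
    if (hole_end > size)
        hole_end = size;

    /* 1. Read everything past the hole into memory. */
    size_t tail_len = (size_t)(size - hole_end);
    char *tail = NULL;
    if (tail_len > 0) {
        tail = malloc(tail_len);
        if (tail == NULL)
            return -1;
        for (size_t done = 0; done < tail_len; ) {
            ssize_t n = pread(fd, tail + done, tail_len - done,
                              hole_end + (off_t)done);
            if (n <= 0) { free(tail); return -1; }
            done += (size_t)n;
        }
    }

    /* 2. Truncate to the start of the hole; this releases the blocks. */
    if (ftruncate(fd, hole_start) < 0) { free(tail); return -1; }

    /* 3. Seek to the end of the hole. */
    if (lseek(fd, hole_end, SEEK_SET) < 0) { free(tail); return -1; }

    /* 4. Write the saved data back in, handling short writes. */
    for (size_t done = 0; done < tail_len; ) {
        ssize_t n = write(fd, tail + done, tail_len - done);
        if (n < 0) { free(tail); return -1; }
        done += (size_t)n;
    }
    free(tail);

    /* If the hole ran to end-of-file there was no tail to write,
     * so restore the original length (a no-op otherwise). */
    return ftruncate(fd, size);
}
```

Only whole filesystem blocks inside the range are actually given back; a partially covered block at either edge stays allocated.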
Chris Jester-Young
Ouch. So let me further qualify that I am looking for something that "scales" well. :) I don't want to rewrite files, copy files, etc. If it is not possible to somehow free previously allocated blocks in situ, so be it, but I'd like to determine if this is true or false.
z8000
It depends on your filesystem. We've already seen that NTFS handles this. I imagine that any of the other filesystems Wikipedia lists as handling transparent compression (http://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies) would do exactly the same - this is, after all, equivalent to transparently compressing the file.
James Polley
It works, but in O(n).
dmeister
+2  A: 

Ron Yorston offers several solutions; but they all involve either mounting the FS read-only (or unmounting it) while the sparsifying takes place; or making a new sparse file, then copying across those chunks of the original that aren't just 0s, and then replacing the original file with the newly-sparsified file.

It really depends on your filesystem though. We've already seen that NTFS handles this. I imagine that any of the other filesystems Wikipedia lists as handling transparent compression would do exactly the same - this is, after all, equivalent to transparently compressing the file.

James Polley
+1  A: 

Right now it appears that only NTFS supports hole-punching. This has historically been a problem across most filesystems. POSIX, as far as I know, does not define an OS interface to punch holes, so none of the standard Linux filesystems have support for it. NetApp supports hole punching through Windows in its WAFL filesystem. There is a nice blog post about this here.

For your problem, as others have indicated, the only solution is to move the file, leaving out blocks containing zeroes. Yeah, it's going to be slow. Or write an extension for your filesystem on Linux that does this and submit a patch to the good folks in the Linux kernel team. ;)

Edit: Looks like XFS supports hole-punching. Check this thread.

Another really twisted option can be to use a filesystem debugger to go and punch holes in all indirect blocks which point to zeroed out blocks in your file (maybe you can script that). Then run fsck which will correct all associated block counts, collect all orphaned blocks (the zeroed out ones) and put them in the lost+found directory (you can delete them to reclaim space) and correct other properties in the filesystem. Scary, huh?


Disclaimer: Do this at your own risk. I am not responsible for any data loss you incur. ;)

Sudhanshu
A: 

Unmount your filesystem and edit it directly, in a way similar to debugfs or fsck. Usually you need a driver for each filesystem in use.

vitaly.v.ch
A: 

Hello,

After you have "zeroed" some region of the file, you have to tell the file system that this region is intended to be a sparse region. In the case of NTFS, you have to call DeviceIoControl() again for that region. At least that is how I do it in my utility, "sparse_checker".

For me the bigger problem is how to unset the sparse region back :).

Regards

opal
A: 

It appears that Linux has added a syscall called fallocate that can be used for "punching holes" in files. The implementations in individual filesystems seem to focus on the ability to use this for pre-allocating a larger contiguous number of blocks.

There is also the posix_fallocate call, which only covers the latter and is not usable for hole punching.
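
For what it's worth, here is a minimal sketch of punching a hole with fallocate. It assumes Linux 2.6.38 or later with glibc, where the FALLOC_FL_PUNCH_HOLE flag exists and has to be combined with FALLOC_FL_KEEP_SIZE. The wrapper name is made up for illustration. Whether the call succeeds depends on the filesystem; an unsupported one typically fails with EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>   /* FALLOC_FL_* on older glibc headers */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: deallocate [offset, offset + len) of fd in place.
 * The file length is unchanged and the range reads back as zeros.
 * Returns 0 on success or a negative errno value (e.g. -EOPNOTSUPP when
 * the filesystem cannot punch holes). fd must be open for writing. */
int punch_hole(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);

    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) != 0)
        return -errno;
    return 0;
}
```

Only whole blocks inside the range are released; partial blocks at the edges are just zeroed. For the test file in the question, assuming the 4096-byte block size that the du output suggests, the 1s at offsets 10000 to 11023 live in the block starting at 8192, so punch_hole(fd, 8192, 4096) would be the call that gives that block back.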

Christian