I'd like to create a sparse file such that all-zero blocks don't take up actual disk space until I write data to them. Is it possible?

+4  A: 

As in other Unixes, it's a feature of the filesystem. Either the filesystem supports it for ALL files or it doesn't. Unlike Win32, you don't have to do anything special to make it happen. Also unlike Win32, there is no performance penalty for using a sparse file.

On MacOS, the default filesystem is HFS+ which does not support sparse files. You can optionally format a volume with UFS which does support sparse files. HFS+ is the default filesystem on MacOS because it supports the archaic "resource fork" stuff which a few programs still use.

Colin Jensen
Hmmm...in some sense the data fork is the unixy one, and the resource fork is the specialized oddball, no? Not that it matters either way. Cheers.
dmckee
Yes, the "resource fork" is the archaic one. Most programs deal with the "data fork" only.
ephemient
Thanks -- I'll edit my comment to fix my error
Colin Jensen
A: 

hdiutil can handle sparse images and files but unfortunately the framework it links against is private.

You could try declaring external symbols that match those exported by the DiskImages framework (see below), but this is most likely unacceptable for production code. Since the framework is private, you would also have to reverse engineer how to use it.

cristi:~ diciu$ otool -L /usr/bin/hdiutil
/usr/bin/hdiutil: /System/Library/PrivateFrameworks/DiskImages.framework/Versions/A/DiskImages (compatibility version 1.0.8, current version 194.0.0) [..]

cristi:~ diciu$ nm /System/Library/PrivateFrameworks/DiskImages.framework/Versions/A/DiskImages | awk -F' ' '{print $3}' | c++filt | grep -i sparse
[..]
CSparseFile::sector2Band(long long)
CSparseFile::addIndexNode()
CSparseFile::readIndexNode(long long, SparseFileIndexNode*)
CSparseFile::readHeaderNode(CBackingStore*, SparseFileHeaderNode*, unsigned long)
[... cut for brevity]

Later Edit

You could use hdiutil as an external process and have it create a sparse disk image for you. From the C process you would then create a file in the (mounted) sparse disk image.
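As a rough sketch of that approach, the C code below shells out to hdiutil with fork/execv. The helper names run_command and create_sparse_image are made up for illustration, and the image path, size, and volume name are placeholders you would supply. As far as I know, -type SPARSE, -size, -fs, and -volname are regular hdiutil create options, but check man hdiutil on your system before relying on this.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with the given arguments and waits.
   Returns the child's exit status (127 if exec failed), or -1 on error. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);               /* only reached if execv failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: asks hdiutil to create a sparse HFS+ image.
   size is an hdiutil size spec such as "1g" (an assumed example value). */
int create_sparse_image(const char *image_path, const char *size,
                        const char *volname)
{
    char *const argv[] = {
        "/usr/bin/hdiutil", "create",
        "-type", "SPARSE",
        "-size", (char *)size,
        "-fs", "HFS+",
        "-volname", (char *)volname,
        (char *)image_path,
        NULL
    };
    return run_command(argv);
}
```

For example, create_sparse_image("/tmp/scratch", "1g", "Scratch") should leave a .sparseimage file behind, which you could then mount with hdiutil attach before creating files inside it.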

diciu
A: 

If you want portability, the last resort is to write your own access functions that manage an index and a set of blocks.

In essence, you manage a single file the way the OS manages a disk: you keep the chain of blocks that belong to the file, the bitmap of allocated/free blocks, and so on.

Of course, this leads to unoptimized, slower access. I would recommend this approach only if saving space is absolutely critical and you have enough time to write a robust set of access functions.

And even in that case, I would first investigate whether your problem calls for a different solution. Perhaps you should store your data differently?
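To make the idea concrete, here is a minimal sketch of such access functions, under simplifying assumptions of my own: a fixed-size index table at the start of the container file maps each logical block to a physical slot, and slots are only ever appended (no free-block bitmap, no block chains). All names (SparseStore, store_open, and so on) and the constants BLK and MAX_BLOCKS are invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLK 512            /* logical block size (assumed) */
#define MAX_BLOCKS 1024    /* number of logical blocks (assumed) */
#define INDEX_BYTES ((long)(MAX_BLOCKS * sizeof(uint32_t)))

typedef struct {
    FILE *fp;
    uint32_t index[MAX_BLOCKS]; /* 0 = hole, otherwise 1-based slot number */
    uint32_t used;              /* physical slots allocated so far */
} SparseStore;

/* Creates a fresh container file and reserves room for the index. */
int store_open(SparseStore *s, const char *path)
{
    memset(s, 0, sizeof *s);
    s->fp = fopen(path, "w+b");
    if (!s->fp)
        return -1;
    if (fwrite(s->index, sizeof s->index, 1, s->fp) != 1) {
        fclose(s->fp);
        return -1;
    }
    return 0;
}

static long slot_offset(uint32_t slot)
{
    return INDEX_BYTES + (long)(slot - 1) * BLK;
}

/* Writes one block; a physical slot is allocated on first write only. */
int store_write_block(SparseStore *s, uint32_t lbn, const void *data)
{
    if (lbn >= MAX_BLOCKS)
        return -1;
    uint32_t slot = s->index[lbn] ? s->index[lbn] : s->used + 1;
    if (fseek(s->fp, slot_offset(slot), SEEK_SET) != 0)
        return -1;
    if (fwrite(data, BLK, 1, s->fp) != 1)
        return -1;
    if (s->index[lbn] == 0) {   /* commit the allocation after success */
        s->index[lbn] = slot;
        s->used = slot;
    }
    return 0;
}

/* Reads one block; blocks that were never written read back as zeros. */
int store_read_block(SparseStore *s, uint32_t lbn, void *data)
{
    if (lbn >= MAX_BLOCKS)
        return -1;
    if (s->index[lbn] == 0) {
        memset(data, 0, BLK);
        return 0;
    }
    if (fseek(s->fp, slot_offset(s->index[lbn]), SEEK_SET) != 0)
        return -1;
    return fread(data, BLK, 1, s->fp) == 1 ? 0 : -1;
}

/* Writes the index back to the start of the file and closes it. */
int store_close(SparseStore *s)
{
    int rc = 0;
    if (fseek(s->fp, 0, SEEK_SET) != 0 ||
        fwrite(s->index, sizeof s->index, 1, s->fp) != 1)
        rc = -1;
    if (fclose(s->fp) != 0)
        rc = -1;
    return rc;
}
```

With this layout, writing only logical block 1000 costs the index plus one 512-byte slot on disk, regardless of the filesystem. A robust version would also need to reopen existing stores, free blocks, and survive crashes, which is where most of the effort goes.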

Remo.D
A: 

If you seek (fseek, ftruncate, ...) past the end, the file size will be increased without allocating blocks until you write to the holes. But there's no way to create a magic file that automatically converts blocks of zeroes to holes. You have to do it yourself.

This may be helpful to look at (the OpenBSD cp command inserts holes instead of writing zeroes). patch

This is true on Linux, but not on the default Mac OS X filesystem, HFS+. See my answer to this question.
titaniumdecoy
+3  A: 

There seems to be some confusion as to whether the default Mac OS X filesystem (HFS+) supports holes in files. The following program demonstrates that it does not.

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

void create_file_with_hole(void)
{
    int fd = open("file.hole", O_WRONLY|O_TRUNC|O_CREAT, 0600);
    write(fd, "Hello", 5);
    lseek(fd, 99988, SEEK_CUR); // Make a hole
    write(fd, "Goodbye", 7);
    close(fd);
}

void create_file_without_hole(void)
{
    int fd = open("file.nohole", O_WRONLY|O_TRUNC|O_CREAT, 0600);
    write(fd, "Hello", 5);
    char buf[99988];
    memset(buf, 'a', 99988);
    write(fd, buf, 99988); // Write lots of bytes
    write(fd, "Goodbye", 7);
    close(fd);
}

int main()
{
    create_file_with_hole();
    create_file_without_hole();
    return 0;
}

The program creates two files, each 100,000 bytes in length, one of which has a hole of 99,988 bytes.

On Mac OS X 10.5 on an HFS+ partition, both files take up the same number of disk blocks (200):

$ ls -ls
total 400
200 -rw-------  1 user  staff  100000 Oct 10 13:48 file.hole
200 -rw-------  1 user  staff  100000 Oct 10 13:48 file.nohole

Whereas on CentOS 5, the file without holes consumes 88 more disk blocks than the other:

$ ls -ls
total 136
 24 -rw-------  1 user   nobody 100000 Oct 10 13:46 file.hole
112 -rw-------  1 user   nobody 100000 Oct 10 13:46 file.nohole
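
The same comparison can be made from code with stat(). The sketch below is only a heuristic of mine: it assumes st_blocks counts 512-byte units (true on Linux and Mac OS X as far as I know, but not guaranteed everywhere), and the names allocated_bytes and looks_sparse are made up.

```c
#include <sys/stat.h>

/* Returns the number of bytes actually allocated for path, or -1 on
   error. Assumes st_blocks is expressed in 512-byte units. */
long long allocated_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}

/* Returns 1 if fewer bytes are allocated than the file's length
   (suggesting holes), 0 if not, -1 on error. */
int looks_sparse(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512 < (long long)st.st_size;
}
```

Going by the listings above, I would expect looks_sparse to return 1 only for file.hole on CentOS. Treat the result with care: filesystems that compress data or store tiny files inline can also report fewer allocated bytes than the file length.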
titaniumdecoy
+1  A: 

I think it'd be helpful to know WHY you want sparse files.

Wil Shipley
A: 

Hello. This thread has become a comprehensive source of information about sparse files. Here is the missing part for Win32:

Decent article with examples

A tool that estimates whether it makes sense to make a file sparse

Regards

opal