I have a directory I’m archiving:

$ du -s oldcode
1400848
$ tar cf oldcode.tar oldcode

So the directory is about 1.4 GB. The tar file is significantly smaller, though:

$ ls -l oldcode.tar
-rw-r--r-- 1 ieure ieure 940339200 2002-01-30 10:33 oldcode.tar

Only 897 MB. It’s not compressed in any way:

$ file oldcode.tar
oldcode.tar: POSIX tar archive

Why is the tar file smaller than its contents?

A: 

du counts disk blocks, not file sizes, duder.

+19  A: 

You get a difference because of the way the filesystem works.

In a nutshell, your disk is divided into clusters (blocks). Each cluster has a fixed size of, say, 4 KB. If you store a 1 KB file in such a cluster, the remaining 3 KB go unused. The exact details vary with the kind of filesystem you use, but most filesystems work that way.

3 KB of wasted space is not much for a single file, but if you have lots of very small files, the waste can become a significant part of the disk usage.

Inside the tar archive, the files are not stored in clusters but one after another, padded only to tar's much smaller 512-byte blocks. That's where the difference comes from.
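
The rounding described above can be sketched with a bit of shell arithmetic. The function name alloc_bytes and the 4096-byte default block size are assumptions made for illustration; real filesystems differ, and some store very small files without using a full block.

```shell
#!/bin/sh
# Hypothetical helper: bytes a file of $1 bytes occupies on disk when
# space is handed out in whole blocks of $2 bytes (default 4096, assumed).
alloc_bytes() {
    size=$1
    block=${2:-4096}
    # Round up to the next multiple of the block size.
    echo $(( (size + block - 1) / block * block ))
}

alloc_bytes 1024        # prints 4096: a 1 KB file still takes a 4 KB block
alloc_bytes 4097        # prints 8192: one byte over the boundary costs a second block
```

Summed over thousands of small source files, that per-file rounding is what du reports and what ls -l on the archive does not.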

Nils Pipenbrinck
Makes perfect sense. I completely forgot about block size.
ieure
+2  A: 

Without knowing which tar or which Unix system you're using, here's my guess: oldcode contains numerous small files, which on their own use disk space inefficiently, since disk space is allocated in blocks rather than byte by byte. In the tar file they're concatenated, and make much fuller use of the disk space they're assigned.
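
For a rough sense of what that concatenation costs: in the classic tar layout each member is stored as one 512-byte header followed by its data, padded to a multiple of 512 bytes. The sketch below assumes that plain layout, with no extra records for long file names or extended attributes; tar_member_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: bytes one regular file of $1 bytes occupies inside
# a plain tar archive: a 512-byte header plus data padded to 512 bytes.
# Assumes no additional header records (e.g. for very long file names).
tar_member_bytes() {
    size=$1
    echo $(( 512 + (size + 511) / 512 * 512 ))
}

tar_member_bytes 1024   # prints 1536
tar_member_bytes 2      # prints 1024
```

With 4 KB filesystem blocks, either of those files would occupy at least 4096 bytes on disk, so an archive of many small files can easily come out smaller than what du reports for the directory.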

David Thornley
+2  A: 

This has to do with the block size of your filesystem. man 1 du on Mac OS X 10.5.6 states:

The du utility displays the file system block usage for each file argument and for each directory in the file hierarchy rooted in each directory argument. If no file is specified, the block usage of the hierarchy rooted in the current directory is displayed.

[mirko@borg foo]$ ls -la
total 0
drwxr-xr-x   2 mirko  wheel   68 Jan 30 21:20 .
drwxrwxrwt  10 root   wheel  340 Jan 30 21:16 ..
[mirko@borg foo]$ du -sh
0B  .
[mirko@borg foo]$ touch foo
[mirko@borg foo]$ ls -la
total 0
drwxr-xr-x   3 mirko  wheel  102 Jan 30 21:20 .
drwxrwxrwt  10 root   wheel  340 Jan 30 21:16 ..
-rw-r--r--   1 mirko  wheel    0 Jan 30 21:20 foo
[mirko@borg foo]$ du -sh
0B  .
[mirko@borg foo]$ echo 1 > foo
[mirko@borg foo]$ ls -la
total 8
drwxr-xr-x   3 mirko  wheel  102 Jan 30 21:20 .
drwxrwxrwt  10 root   wheel  340 Jan 30 21:16 ..
-rw-r--r--   1 mirko  wheel    2 Jan 30 21:20 foo
[mirko@borg foo]$ du -sh
4.0K    .

As you can see, even a 2-byte file takes up a whole 4 KB block. Some filesystems avoid this waste through block suballocation.
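
The same comparison can be scripted. This is a minimal sketch that assumes a mktemp that accepts -d without a template (as on current GNU and macOS systems). The disk usage it prints depends on the filesystem, so no particular number is promised for it.

```shell
#!/bin/sh
# Compare a file's length in bytes with the space du reports for it.
dir=$(mktemp -d)
printf '1\n' > "$dir/foo"

# Length in bytes; the arithmetic expansion strips any padding wc adds.
apparent=$(( $(wc -c < "$dir/foo") ))
# Disk usage in 1024-byte units; the value depends on the filesystem.
usage_kb=$(du -k "$dir/foo" | cut -f1)

echo "length: $apparent bytes, on disk: $usage_kb KB"
rm -rf "$dir"
```

On a filesystem with 4 KB blocks and no suballocation, the second figure would be expected to be 4, matching the transcript above.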

Mirko Friedenhagen