With most Linux distributions dropping gzip and bzip2 in favor of LZMA2 for compressing their packages, and with open source implementations available for many platforms, I wonder: shouldn't we lay DEFLATE and the .zip format (which unfortunately got bastardized over and over) to rest, and move on to other, modern ways of distributing our (source) packages?
GNU tar supports the J switch, which uses xz (another LZMA2 compressor) as a filter:
$ tar cJf foo.tar.xz foo/
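For anyone who scripts their releases, roughly the same thing can be done without shelling out. This is only a minimal sketch using Python's standard tarfile module; the function name make_tar_xz and the file names in the demo are made up for illustration.

```python
import os
import tarfile
import tempfile


def make_tar_xz(src_dir, out_path):
    """Pack src_dir into an xz-compressed tarball with a single top-level directory."""
    top = os.path.basename(os.path.normpath(src_dir))
    # Mode "w:xz" makes tarfile write the stream through the lzma module.
    with tarfile.open(out_path, "w:xz") as tar:
        tar.add(src_dir, arcname=top)
    return out_path


# Demo on a throwaway tree (hypothetical file names).
with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "foo")
    os.mkdir(src)
    with open(os.path.join(src, "README"), "w") as fh:
        fh.write("hello\n")
    archive = make_tar_xz(src, os.path.join(work, "foo.tar.xz"))
    with tarfile.open(archive, "r:xz") as tar:
        print(tar.getnames())
```

Passing arcname keeps the absolute temporary path out of the archive, so the member names start at foo/ just as they do with the tar command above.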
However, I tend to use 7z (the p7zip implementation) and its friend 7za under Linux for creating archives. I still follow the "avoid tar-bombs" paradigm when creating archives, meaning there's a single directory in the archive, so extracting from the command line does not spill files into the current directory (this is standard modus operandi on Linux with things like tar, but it seems to be much less common under Windows).
Anyway, it seems that, due to its use in packages (Fedora RPMs and Ubuntu DEBs, for instance) and as a filter for tools like tar, LZMA2 is the "next big thing" to come into use after bzip2. It has a great compression ratio (beats bzip2 by far at standard settings) and is very fast at it, too (compression is slightly slower than gzip).
I did some benchmarking myself, but I'd like to point to some more extensive benchmarks:
- Rating based benchmark at compressionratings.com
- Efficiency based benchmark at maximumcompression.com
Now, you'll notice that 7-zip, which is the reference implementation, does not appear in first place. However, FreeArc uses its own .arc format, which is not really cross-platform and is not compatible with the old ARC from the '80s. NanoZip isn't open source, which is kind of a downer, but it's the algorithm that counts, not the archiver!
Anyway, now that performance with 7-zip and its derivative implementations (xz) is not an issue any more, and the compression ratio speaks for itself, I feel like distributing my source packages as .7z or .tar.xz archives. However, I have two hurdles in front of me which I don't seem able to clear:
- Advocates of WinRAR. Don't get me wrong, I hold no grudge against WinRAR or its users; it's just that I can't really make RARs on Linux, and there's no need to, since we have free LZMA2 tools. And as I said, since LZMA2 became an integral part of distribution packages, it's available on any modern distribution. Since it takes about the same time to make a .7z as a .rar, and LZMA2 files are generally smaller, I don't see why not to use 7-zip.
- tar archives have to be gzip or bzip2, no exceptions. This is a hard one. Why are so many people impressed with gzip? Even bzip2 doesn't see much usage most of the time. Granted, gzip is fast, a good point when it comes to on-demand compression such as in web servers, or when creating large mirror backups. But what about distributing software? LZMA2 is very asymmetrical: while compression takes its time, decompression is blazingly fast.
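The asymmetry is easy to check for yourself. The following is a rough sketch using Python's lzma module (which writes .xz streams by default); the function name time_roundtrip and the synthetic sample data are my own assumptions, and the timings will of course vary with the machine and the input.

```python
import lzma
import os
import time


def time_roundtrip(data, preset=6):
    """Return (compress_seconds, decompress_seconds, compressed_size) for one xz round trip."""
    start = time.perf_counter()
    packed = lzma.compress(data, preset=preset)
    mid = time.perf_counter()
    unpacked = lzma.decompress(packed)
    end = time.perf_counter()
    if unpacked != data:
        raise ValueError("round trip changed the data")
    return mid - start, end - mid, len(packed)


# Synthetic sample: repetitive text plus some incompressible random bytes.
sample = (b"The quick brown fox jumps over the lazy dog.\n" * 20000) + os.urandom(200_000)
c, d, size = time_roundtrip(sample)
print(f"compress {c:.3f}s  decompress {d:.3f}s  {len(sample)} -> {size} bytes")
```

A single run on synthetic data is not a benchmark, so treat the numbers as an illustration only.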
OK, now here comes my question:
Since LZMA2 is arguably the next, better compression algorithm, why are people not jumping on the bandwagon? Why do people still use WinRAR, which is proprietary, has a worse compression ratio, and is not ported to Linux (except unrar, but you obviously can't create archives with that)? Why are tarballs still mostly gzipped?
Is there no way to convince people to move on to a newer, reliable archiving format that's not only cross-platform but also free? When I give someone a file ending in .7z, they tend not to know what to do with it. Will this ever change?
Oh, and here's the little benchmark I did myself. I used the default settings everywhere:
11837440 GNUtar_TAR.tar
10657984 Arc_ARC.arc
9632524 PA2010_TAR_BZip2.tar.bz2
9536967 PA2010_LHA_Frozen5.lzh
9510148 PA2010_ZIP_BZip2.zipx
9490211 GNUtar_TAR.tar.bz2
9467242 PA2010_LHA_Frozen6.lzh
9463630 7-zip_ZIP_BZip2.zip
9437520 7-zip_7-ZIP_BZip2.7z
9398798 Arj_ARJ.arj
9373435 GNUtar_TAR.tar.gz
9370456 PA2010_BlackHole_Deflate.bh
9369621 Lha_LHA_Frozen6.lzh
9367712 PA2010_ZIP_Deflate.zip
9364237 PA2010_TAR_gzip.tar.gz
9360248 PA2010_Cabinet_MsZip.cab
9303923 7-zip_ZIP_Deflate.zip
9215279 7-zip_ZIP_Deflate64.zip
9189365 PA2010_ZIP_PPMd.zipx
9060663 PA2010_7-ZIP_PPMd.7z
8931280 PA2010_Cabinet_LZX.cab
8847427 7-zip_7-ZIP_PPMd.7z
8803350 PA2010_ZIP_Optimized.zipx
8803350 PA2010_ZIP_Wavpack.zipx
8802850 PA2010_ZIP_LZMA.zipx
5812491 FreeArc_7-ZIP.arc
5789853 7-zip_7-ZIP_LZMA.7z
5789853 PA2010_7-ZIP_LZMA.7z
5789024 GNUtar_TAR.tar.xz
5782637 FreeArc_UHARC.arc
5770969 FreeArc_CCM.arc
5739697 Fp8_5.fp8
5718865 Fp8_8.fp8
5685234 Paq8px_5.paq8px
5677662 Paq8kx_5.paq8kx
5644422 Paq8px_8.paq8px
5609608 Paq8kx_8.paq8kx
(Size in bytes; filename: Archiver_Format_Algorithm.Extension)
The set of files consists of disk images containing a DOS installation:
1474979 disk01.144
1474979 disk02.144
1474979 disk03.144
1474979 disk04.144
1474979 disk05.144
1474979 ldisk01.144
1474979 ldisk02.144
1474979 ldisk03.144
24325 diskcopy.com
(Size in Bytes)
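A much smaller version of such a comparison can be scripted. This sketch is not how the numbers above were produced: it compresses one in-memory byte string with the gzip, bz2 and lzma modules from Python's standard library rather than running the archivers listed. The levels are set explicitly to what I believe are the command-line defaults (gzip -6, bzip2 -9, xz -6), since the library defaults differ for gzip; the names compare_sizes and print_report are hypothetical.

```python
import bz2
import gzip
import lzma


def compare_sizes(data):
    """Compress data with three codecs and return the resulting sizes in bytes."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=6)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data, preset=6)),
    }


def print_report(sizes):
    """Print the sizes largest first, in the same style as the listing above."""
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size:>10} {name}")


# Synthetic stand-in for a disk image: short labels followed by zero-filled padding.
sample = b"".join(b"sector %06d: " % i + bytes(64) for i in range(20000))
print_report(compare_sizes(sample))
```

To test real files, read them with open(path, "rb").read() and pass the bytes to compare_sizes.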