views: 2391
answers: 11

This question about zip bombs naturally led me to the Wikipedia page on the topic. The article mentions an example of a 45.1 kB zip file that decompresses to 1.3 exabytes.

What are the principles/techniques that would be used to create such a file in the first place? I don't want to actually do this, more interested in a simplified "how-stuff-works" explanation of the concepts involved.

p.s.

The article mentions 9 layers of zip files, so it's not a simple case of zipping a bunch of zeros. Why 9, why 10 files in each?

+10  A: 

Create a 1.3 exabyte file of zeros.

Right click > Send to compressed (zipped) folder.

wefwfwefwe
I hope it's not that simple!!
pufferfish
You forgot the sarcasm "smiley."
tvanfosson
That would most likely be impossible with most file systems and compression algorithms due to file size limits. However, nesting files in the compressed archive (and putting more nested archives in the archive, if the compression algorithm has a total size limitation) allows you to bypass these limits.
Blixt
@pufferfish: lossless(!) data compression generally relies on repeating data, so the 1.3 exabytes of zeros can be seen as 1.3 * 2^60 (= 1.49879796 * 10^18) zeros. This little piece of information is sufficient to restore the original 1.3 exabyte file.
Martin Klinke
should make a 1.3 exabyte file of 1's. They're much skinnier than 0's :)
Quinn Wilson
I am not even sure this would work unless the ZIP utility has smarts. Even with a huge amount of similarity in the file, I think the file size would remain pretty constant, because otherwise the dictionary would have to hold an entry for 1.3 exabytes of zeros.
San Jacinto
and the reason for the 9 layers of nested zip files mentioned?
pufferfish
@unknown - or you could just put the 1 zero in the dictionary?
wefwfwefwe
Yes, you can - that's what I am referring to; it is up to the implementation. Remember, most of the time you're not going to be packing a file full of one symbol.
San Jacinto
@quinn - that's why compressing the (initially fatter) zeros is much more effective
wefwfwefwe
This gives you a >1 GB zip file, unless I'm mistaken
Chris S
+1 funny!
Mark Harrison
+1  A: 

I don't know if ZIP uses Run Length Encoding, but if it did, such a compressed file would contain a small piece of data and a very large run-length value. The run-length value would specify how many times the small piece of data is repeated. When you have a very large value, the resultant data is proportionally large.

Joe
ZIP uses DEFLATE (LZ77 plus Huffman coding), which effectively tokenises the data. Long runs of repeated bytes result in good compression, hence why GIF (which uses the related LZW algorithm) is good for flat graphics and JPEG (which uses a discrete cosine transform) is better for photos, where the data is much more 'random'.
Lazarus
A: 

Perhaps, on Unix, you could pipe a certain amount of zeros directly into a zip program? I don't know enough about Unix to explain exactly how, but you would need a source of zeros and a zipper that reads from stdin.

Svish
Why the down vote? Am I wrong?
Svish
Svish, someone downvoted all the answers on the page.
James McMahon
Nope, it works - see my solution below for an example.
Thomi
Downvoted for disregarding the actual question, which mentions a specific file that's explicitly not the result of zipping one big stream of zeroes.
Michael Borgwardt
Nope, you'll still be limited by the computing power. Ideally you don't want to run gzip/zip since it will use a lot of CPU (or at least O(n), where n is the size of the decompressed file)
tonfa
@tonfa: Well, of course you will be limited by computing power. My reasoning was that you might not want to create an exabyte large file on your disc and then zip that...
Svish
+2  A: 

To create one in a practical setting (i.e. without creating a 1.3 exabyte file on your enormous hard drive), you would probably have to learn the file format at a binary level and write something that emits, byte for byte, what your desired file would look like post-compression.

Andy_Vulhop
There are many ways to circumvent this.
mafutrct
+2  A: 

Serious answer:

(Very basically) Compression relies on spotting repeating patterns, so the zip file would contain data representing something like

0x100000000000000000000000000000000000  
(Repeat this '0' ten trillion times)

Very short zip file, but huge when you expand it.

wefwfwefwe
+4  A: 

Below is for Windows:

From the Security Focus proof of concept (NSFW!), it's a ZIP file containing 16 folders, each containing 16 more folders, and so on (42 is the zip file's name):

\42\lib 0\book 0\chapter 0\doc 0\0.dll
...
\42\lib F\book F\chapter F\doc F\0.dll

With 16 entries at each of the four levels, that works out to 16^4 (65,536) bottom-level directories. Because each directory needs an allocation unit of N bytes, the extracted tree ends up being huge. The DLL file at the end is 0 bytes.

Unzipping the first branch alone (\42\lib 0\book 0\chapter 0\doc 0\0.dll) results in 4 GB of allocation space.

Chris S
How is SF NSFW?
Robert Fraser
I guess some workplaces have filtering or logging of such "dubious" sites... :-)
scraimer
I just assumed there were naked ladies doing security research.
James McMahon
The zip was NSFW. A big red panic alarm will go off and a cage will fall down from the ceiling around your desk
Chris S
You'll get an angry sys admin running to your desk if you click on the link, or just blocked URL and then a meeting with HR if you work at that kind of establishment
Chris S
If every hit on a virus file results in an interview with HR, then either you don't need the virus scanner, or else you don't need your HR department. One of them isn't contributing to the business ;-)
Steve Jessop
Could also be NSFW because a Network Virus Scanner might want to check it - and extract it to do so.
Michael Stum
The virus scanner should just mark it suspicious (which may result in it being safely blocked, or may result in you unsafely being reported for trying to install viruses). If the bomb actually explodes, then your IT department has learnt something valuable - they need a better virus scanner.
Steve Jessop
+18  A: 

Citing from the Wikipedia page:

One example of a Zip bomb is the file 45.1.zip which was 45.1 kilobytes of compressed data, containing nine layers of nested zip files in sets of 10, each bottom layer archive containing a 1.30 gigabyte file for a total of 1.30 exabytes of uncompressed data.

So all you need is one single 1.3 GB file full of zeroes, compress that into a ZIP file, make 10 copies, pack those into a ZIP file, and repeat this copy-and-pack step until you have nine layers of zips.

This way, you get a file which, when uncompressed completely, produces an absurd amount of data without requiring you to start out with that amount.

Additionally, the nested archives make it much harder for programs like virus scanners (the main target of these "bombs") to be smart and refuse to unpack archives that are "too large": until the last level, the total amount of data is not that much; you don't "see" how large the bottom-level files are until you reach that level; and no individual file is "too large" - only the huge number of them is problematic.

Michael Borgwardt
Can't be... once you zip the file of zeros at the bottom, the resulting zipped file is not going to be nearly as compressible for the next layer.
pufferfish
Ah, but at each level, you have ten *identical* files - which again compresses nicely. Though ZIP does not exploit cross-file redundancy, an archive containing ten individually compressed identical files probably has lots of redundancy itself for the next layer to exploit.
Michael Borgwardt
This is insanely complicated, given that there are far simpler methods. As pufferfish pointed out, an already-compressed file is going to be *less* compressible than a non-compressed file, so your final zipped file will end up being larger than it needs to be.
Thomi
The point is NOT how to generate the maximum amount of data from the smallest possible file - the point is defeating virus scanners' attempts to guard against too-large archives.
Michael Borgwardt
That's not the thrust of the article on Wikipedia. It seems to push a DoS-style attack.
San Jacinto
Yeah, but for that attack to succeed, the scanner has to actually open the archives, not refuse to do so because it can apply a simple "reject archives when the sum of the decompressed file sizes is larger than the HD's remaining free space" rule - the nested archives make it very hard to apply such a rule without already crashing during its application.
Michael Borgwardt
What you are saying is true, but you are downvoting valid answers based on something the OP didn't even ask. Additionally, there are PLENTY of unzip tools that don't even pass the archive THROUGH the anti-virus, and there are many ways to obtain a file where the anti-virus doesn't have knowledge of the archive's existence. Also, what you are saying is extremely product-dependent. I see no reason for you to downvote simply because it doesn't fit your exact use case. Others have answered the OP correctly, even if not as thoroughly as your response was because of your knowledge of the topic.
San Jacinto
All modern anti-virus scanners work on-access and monitor all downloads (corporate proxies do this centrally). They do not depend on the cooperation of an unzip tool. In fact, I'm not aware of any unzip tool that actively calls a virus scanner to check archives.
Michael Borgwardt
Again, this is making a lot of assumptions. How many Linux servers are out there not running any anti-virus at all? I'm tired. You win.
San Jacinto
But the files don't get extracted recursively... the victim would have to keep extracting the nested zip files to make it work. Is there any workaround for that?
Manoj
A virus scanner *has* to recursively open archives in order to scan the files in them - or reject nested archives, but then you'll end up rejecting a lot of legitimate stuff (nearly every non-trivial Java app will have JAR libraries inside its distribution/installation archive).
Michael Borgwardt
@unknown I'm not saying all computers are running such virus scanners - but a very large percentage of all Windows PCs (and probably servers as well) does, and that's how they work.
Michael Borgwardt
@Michael I'm not contesting your explanation. Please see the comments on the OP.
San Jacinto
I tried making a file of zeros (1 GB) and zipping it. It produces a 500 MB zip file, unless Michael meant binary zeros (0000 0000) for each byte.
Chris S
Also, if you look at the exploit details, you can see McAfee, Sophos and a few others plugged this hole a while ago, though the exploit has been around since 2001. ScanSafe sees it as a trojan for some reason.
Chris S
If 1GB of zeroes (binary or ASCII zeroes does not matter) produces a 500MB ZIP file, then you either messed up and did not in fact fill the file with zeroes, or your ZIP packer is really really bad. And yeah, this isn't exactly new, so I'd expect the antivirus makers to have wised up... it's not impossible to defend against such a file, just somewhat hard.
Michael Borgwardt
+5  A: 

This is easily done under Linux using the following command:

dd if=/dev/zero bs=1024 count=10000 | zip zipbomb.zip -

Replace count with the number of KiB you want to compress. The example above compresses about 10 MiB of zeros (not much of a bomb at all, but it shows the process).

You DO NOT need hard disk space to store all the uncompressed data.

Thomi
But you *need* the computing power to compress the uncompressed data; it's still O(n) in the size of the *uncompressed* data.
tonfa
Yes, as are all the other answers here.
Thomi
Michael Borgwardt's answer is O(log N) in the size of the uncompressed data.
Steve Jessop
Approximately, anyway. Each repeat of the process "strip off the archive headers, duplicate the compressed file entry 10 times, replace the archive headers, compress" increases the level of zip nesting by 1, takes time proportional to the size of the compressed data from the previous step, multiplies the size of the uncompressed data by 10, and if it increases the size of the compressed data at all, certainly doesn't do so by anything like a linear factor.
Steve Jessop
So just as a test, I zip -9 1.3 GB of zeros. The result is a 1.3M file. I duplicated this 10 times (couldn't be bothered messing with the zip headers, so the result won't work as a zip bomb, but illustrates the principle) to give a 13M file, which compresses with zip -9 to 34381 bytes. So the duplication step actually makes the file smaller, because deflate only supports tokens of a certain max size. Next step results in 18453, then 19012, 19312, 19743, 20120, 20531, 20870.
Steve Jessop
A: 

All file compression algorithms rely on the entropy of the information to be compressed. Theoretically you can compress a stream of 0's or 1's, and if it's long enough, it will compress very well.

That's the theory part. The practical part has already been pointed out by others.

Calyth
+1  A: 

A nice way to create a zipbomb (or gzbomb) is to know the binary format you are targeting. Otherwise, even if you use a streaming file (for example using /dev/zero) you'll still be limited by computing power needed to compress the stream.

A nice example of a gzip bomb: http://selenic.com/googolplex.gz57 (there's a message embedded in the file after several levels of compression, resulting in huge files)

Have fun finding that message :)

tonfa
A: 

Even if I knew, I wouldn't put that in here for anyone to find.

redtuna