I'd like to create the following functionality for my web-based application:

  1. user uploads an archive file (zip/rar/tar.gz/tar.bz etc.) containing several image files
  2. the archive is automatically extracted after upload
  3. the images are shown in an HTML list (or similar)

Are there any security issues involved in the extraction process? E.g. the possibility of malicious code execution contained within the uploaded files (or a well-prepared archive file), or anything else?

+4  A: 

Aside from the possibility of exploiting the system with things like buffer overflows if the extraction is not implemented carefully, there can be issues if you blindly extract a well-crafted compressed file containing a large file with redundant patterns inside (a zip bomb). The compressed version is very small, but when you extract it, it will take up the whole disk, causing denial of service and possibly crashing the system.

Also, if you are not careful enough, the client might hand you a zip file with server-side executable content inside (.php, .asp, .aspx, ...) and then request that file over HTTP, which, if the server is not configured properly, can result in arbitrary code execution on the server.
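As a first line of defense against that, you can refuse any entry whose name doesn't end in an image extension you expect. A minimal sketch (the class name, method name, and the particular whitelist are my own, not from the answer; note that an extension check alone proves nothing about the actual content):

```java
import java.util.Locale;
import java.util.Set;

public class UploadNameCheck {
    // Hypothetical whitelist: only the image extensions we expect inside the archive.
    private static final Set<String> ALLOWED = Set.of("jpg", "jpeg", "png", "gif");

    // Returns true only if the file name ends in a whitelisted image extension.
    // This is a cheap first filter, not proof the bytes are really an image.
    public static boolean isAllowedName(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return false; // no extension at all
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return ALLOWED.contains(ext);
    }
}
```

This rejects `shell.php` outright, but something like `shell.jpg` with PHP inside would still pass, which is why content checks (see below in the thread) are also needed.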

Mehrdad Afshari
Whoa, that zip bomb thing looks rather devastating ;) (http://en.wikipedia.org/wiki/Zip_bomb). Are the other archive formats/extraction algorithms free of such issues?
gorsky
The very purpose of a compression algorithm is to compress stuff as well as possible, so this isn't considered a flaw in the algorithm. The entity that decompresses the file should perform some sanity checks on the size and refuse to decompress suspicious files.
Mehrdad Afshari
With a zip file you can't reliably work out what size you are going to end up with until after you have decompressed it.
Tom Hawtin - tackline
+2  A: 

In addition to Mehrdad's answer: hosting user-supplied content is a bit tricky. If you are hosting a zip file, it can also be used to store Java class files (the format is shared), and therefore the "same origin policy" can be broken. (There was the GIFAR attack, where a zip was attached to the end of another file, but that no longer works with the Java Plug-in/Web Start.) Image files should at the very least be checked to confirm that they actually are image files. There is also the problem of web browsers having buffer overflow vulnerabilities, so your site could now be used to attack your visitors (this may make you unpopular). You may find some client-side software using, say, regexes to parse data, so data in the middle of an image file can end up being executed. Zip files may also have naughty file names (for instance, directory traversal with ../ and strange characters).
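The "check that image files actually are image files" point can be combined with the decode-and-re-encode advice below. A sketch using the standard `javax.imageio` API (the class and method names are my own; re-encoding to PNG is one arbitrary choice, and it strips any non-pixel data such as trailing archives or script payloads):

```java
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class ImageSanitizer {
    // Decodes the uploaded bytes and re-encodes the pixels as a fresh PNG.
    // Returns null if the bytes are not an image ImageIO can decode.
    public static byte[] reencodeAsPng(byte[] uploaded) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(uploaded));
        if (img == null) {
            return null; // not a decodable image
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        return out.toByteArray();
    }
}
```

Because only the decoded pixels are written back out, anything appended after the image data (the GIFAR trick) does not survive the round trip.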

What to do (not necessarily an exhaustive list):

  • Host user supplied files on a completely different domain.
  • The domain with user files should use different IP addresses.
  • If possible decode and re-encode the data.
  • There's another stackoverflow question on zip bombs - I suggest decompressing using ZipInputStream and stopping if it gets too big.
  • Where native code touches user data, do it in a chroot gaol.
  • White list characters or entirely replace file names.
  • Potentially you could use an IDS of some description to scan for suspicious data (I really don't know how much this gets done - make sure your IDS isn't written in C!).
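The `ZipInputStream` suggestion above can be sketched roughly as follows (the class name, the particular limits, and the in-memory result map are my own illustrative choices, not from the answer; a real implementation would tune the limits and probably stream to quarantined storage instead):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class SafeUnzip {
    // Hypothetical limits; tune for your application.
    private static final long MAX_TOTAL_BYTES = 50L * 1024 * 1024; // 50 MB uncompressed
    private static final int MAX_ENTRIES = 100;

    // Extracts entries into memory, aborting on zip bombs and naughty names.
    // Counting the bytes actually read (rather than trusting the entry's
    // declared size) is what stops a zip bomb.
    public static Map<String, byte[]> extract(InputStream in) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        long total = 0;
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (files.size() >= MAX_ENTRIES) {
                    throw new IOException("too many entries");
                }
                String name = entry.getName();
                // Reject directory traversal and absolute paths.
                if (name.contains("..") || name.startsWith("/")) {
                    throw new IOException("suspicious entry name: " + name);
                }
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = zip.read(buf)) > 0) {
                    total += n;
                    if (total > MAX_TOTAL_BYTES) {
                        throw new IOException("archive expands too large");
                    }
                    out.write(buf, 0, n);
                }
                files.put(name, out.toByteArray());
            }
        }
        return files;
    }
}
```

This also covers the file-name bullet: entries with `../` or absolute paths are rejected before anything is written anywhere.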
Tom Hawtin - tackline