I'm really worried about people being able to upload malicious PHP files, big security risk!
Tip of the iceberg!
I also need to be aware of people changing the extensions of PHP files to try to get around this security feature.
Generally, changing the extension will stop PHP from interpreting those files as scripts. But that's not the only problem. There are more things than ‘.php’ that can damage the server-side; ‘.htaccess’ and files with the X bit set are the obvious ones, but by no means all you have to worry about. Even ignoring the server-side stuff, there's a huge client-side problem.
For example if someone can upload an ‘.html’ file, they can include a <script> tag in it that hijacks a third-party user's session, and deletes all their uploaded files or changes their password or something. This is a classic cross-site-scripting (XSS) attack.
Plus, thanks to the ‘content-sniffing’ behaviours of some browsers (primarily IE), a file that is uploaded as ‘.gif’ can actually contain malicious HTML such as this. If IE sees telltales like (but not limited to) ‘<html>’ near the start of the file it can ignore the served ‘Content-Type’ and display as HTML, resulting in XSS.
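To make the sniffing problem concrete, here's a rough sketch (in Python purely for brevity; the same check is a few lines in PHP). The telltale list and the 256-byte window are my assumptions for illustration, not IE's exact rules, so treat this as a cheap first filter and not a defence on its own.

```python
# Hypothetical telltale list; real sniffing rules are longer and vary by browser version.
TELLTALES = (b"<html", b"<head", b"<body", b"<script", b"<iframe", b"<!doctype")
SNIFF_WINDOW = 256  # assumed number of leading bytes to inspect


def looks_like_html(data: bytes) -> bool:
    """Return True if the start of the file contains an HTML-ish telltale."""
    head = data[:SNIFF_WINDOW].lower()
    return any(tag in head for tag in TELLTALES)


# A "GIF" whose header is immediately followed by markup.
fake_gif = b"GIF89a<html><script>alert(document.cookie)</script>"
print(looks_like_html(fake_gif))                               # True
print(looks_like_html(b"GIF89a\x01\x00\x01\x00\x80\x00\x00"))  # False
```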
Plus, it's possible to craft a file that is both a valid image your image parser will accept, and contains embedded HTML. There are various possible outcomes depending on the exact version of the user's browser and the exact format of the image file (JPEGs in particular have a very variable set of possible header formats). There are mitigations coming in IE8, but that's no use for now, and you have to wonder why MS can't simply stop doing content-sniffing instead of burdening us with shonky non-standard extensions to HTTP headers that should have Just Worked in the first place.
I'm falling into a rant again. I'll stop. Tactics for serving user-supplied images securely:
1: Never store a file on your server's filesystem using a filename taken from user input. This prevents bugs as well as attacks: different filesystems have different rules about what characters are allowable where in a filename, and it's much more difficult than you might think to ‘sanitise’ filenames.
Even if you took something very restrictive like “only ASCII letters and digits”, you still have to worry about too-long, too-short, and reserved names: try to save a file with as innocuous a name as “com1.txt” on a Windows server and watch your app go down. Think you know all the weird foibles of path names of every filesystem on which your app might run? Confident?
Instead, store file details (such as name and media-type) in the database, and use the primary key as a name in your filestore (eg. “74293.dat”). You then need a way to serve them with different apparent filenames, such as a downloader script spitting the file out, a downloader script doing a web server internal redirect, or URL rewriting.
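A minimal sketch of that storage scheme, in Python with an in-memory SQLite database standing in for whatever you actually use. The `uploads` table, its columns and the `store_upload` helper are all made up for the example.

```python
import os
import sqlite3
import tempfile


def store_upload(db, store_dir, original_name, media_type, data):
    """Record the user's metadata in the database; name the file after the row id."""
    cur = db.execute(
        "INSERT INTO uploads (original_name, media_type) VALUES (?, ?)",
        (original_name, media_type),
    )
    db.commit()
    # The on-disk name comes only from the primary key, never from user input.
    path = os.path.join(store_dir, f"{cur.lastrowid}.dat")
    with open(path, "xb") as f:  # "x": fail instead of overwriting an existing file
        f.write(data)
    return cur.lastrowid


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE uploads (id INTEGER PRIMARY KEY, original_name TEXT, media_type TEXT)"
)
store_dir = tempfile.mkdtemp()

# A hostile filename is just a string in a database column.
file_id = store_upload(db, store_dir, "../../evil.php", "image/gif", b"GIF89a")
print(os.listdir(store_dir))  # ['1.dat']
```

The apparent filename only comes back out when the downloader script builds its response headers, so the filesystem never has to cope with it.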
2: Be very, very careful using ZipArchive. There have been traversal vulnerabilities in extractTo of the same sort that have affected most naive path-based ZIP extractors. In addition, you lay yourself open to attack from ZIP bombs. Best to avoid any danger of bad filenames by stepping through each file entry in the archive (eg. using zip_read/zip_entry_*) and checking its details before manually unpacking its stream to a file whose name and mode flags are known-good because you generated them without the archive's help. Ignore the folder paths inside the ZIP.
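Roughly what I mean by stepping through the entries yourself, sketched with Python's `zipfile` standing in for zip_read/zip_entry_*. The limits and the `unpack_flat` helper are assumptions for the example; pick numbers that suit your app.

```python
import io
import os
import tempfile
import zipfile

MAX_ENTRIES = 100                   # assumed limits; tune for your app
MAX_TOTAL_BYTES = 10 * 1024 * 1024


def unpack_flat(zip_bytes, dest_dir):
    """Unpack file entries to generated names, ignoring paths stored in the archive."""
    mapping = {}  # generated name -> name claimed by the archive (untrusted)
    total = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        entries = [info for info in zf.infolist() if not info.is_dir()]
        if len(entries) > MAX_ENTRIES:
            raise ValueError("too many entries")
        for n, info in enumerate(entries):
            out_name = f"{n}.dat"
            out_path = os.path.join(dest_dir, out_name)
            with zf.open(info) as src, open(out_path, "xb") as dst:
                # Count bytes as they are decompressed, to cap ZIP bombs.
                while chunk := src.read(64 * 1024):
                    total += len(chunk)
                    if total > MAX_TOTAL_BYTES:
                        raise ValueError("archive expands too far")
                    dst.write(chunk)
            mapping[out_name] = info.filename
    return mapping


# Build a hostile archive in memory and unpack it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("../var/www/wr_dir/evil.php", b"<?php /* hostile */ ?>")
dest = tempfile.mkdtemp()
result = unpack_flat(buf.getvalue(), dest)
print(result)  # {'0.dat': '../var/www/wr_dir/evil.php'}
```

The archive's own idea of the filename survives only as data you can store in the database; it never touches a path.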
3: If you can load an image file and save it back out again, especially if you process it in some way in between (such as to resize/thumbnail it, or add a watermark) you can be reasonably certain that the results will be clean. Theoretically it might be possible to make an image that targeted a particular image compressor, so that when it was compressed the results would also look like HTML, but that seems like a very difficult attack to me.
4: If you can get away with serving all your images as downloads (ie. using ‘Content-Disposition: attachment’ in a downloader script), you're probably safe. But that might be too much of an inconvenience for users. This can work in tandem with (3), though, serving smaller, processed images inline and having the original higher-quality images available as a download only.
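The headers such a downloader script might send, sketched in Python. The `attachment_headers` helper and its character whitelist are illustrative assumptions; `X-Content-Type-Options: nosniff` is, as far as I know, the opt-out header that arrives with IE8, so older browsers will simply ignore it.

```python
import re


def attachment_headers(original_name: str, size: int) -> dict:
    """Build response headers that force a download rather than inline display."""
    # Conservative character set for the *apparent* filename only;
    # this value is never used as a path on the server.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", original_name)[:100] or "download"
    return {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f'attachment; filename="{safe}"',
        "Content-Length": str(size),
        "X-Content-Type-Options": "nosniff",
    }


headers = attachment_headers('holiday "pic".gif', 1234)
print(headers["Content-Disposition"])  # attachment; filename="holiday__pic_.gif"
```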
5: If you must serve unaltered images inline, you can remove the cross-site-scripting risk by serving them from a different domain. For example use ‘images.example.com’ for untrusted images and ‘www.example.com’ for the main site that holds all the logic. Make sure that cookies are limited to only the correct virtual host, and that the virtual hosts are set up so they cannot respond on anything but their proper names (see also: DNS rebinding attacks). This is what many webmail services do.
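A sketch of the two checks on the main site, again in Python. `ALLOWED_HOSTS`, `host_is_allowed` and `session_cookie` are hypothetical names, and in practice the host check usually lives in the web server's virtual host configuration, not in application code.

```python
ALLOWED_HOSTS = {"www.example.com"}  # the one name this virtual host should answer to


def host_is_allowed(host_header: str) -> bool:
    """Accept only requests whose Host header names this site exactly."""
    # Lower-case and drop an optional ":port" suffix before comparing.
    host = host_header.strip().lower().partition(":")[0]
    return host in ALLOWED_HOSTS


def session_cookie(value: str) -> str:
    """Build a Set-Cookie value scoped to this host only."""
    # No Domain attribute, so the cookie is not shared with images.example.com.
    return f"session={value}; Path=/; HttpOnly; Secure"


print(host_is_allowed("www.example.com:443"))  # True
print(host_is_allowed("images.example.com"))   # False
print(host_is_allowed("attacker.test"))        # False
print(session_cookie("abc123"))                # session=abc123; Path=/; HttpOnly; Secure
```

Rejecting unknown Host values is what stops a rebinding attacker's own domain name from being served your site's pages.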
In summary, user-submitted media content is a problem.
In summary of the summary, AAAARRRRRRRGGGGHHH.
ETA re comment:
At the top you mentioned 'files with the X bit set', what do you mean by that?
I can't speak for ZipArchive.extractTo() as I haven't tested it, but many extractors, when asked to dump files out of an archive, will recreate [some of] the Unix file mode flags associated with each file (if the archive was created on a Unix and so actually has mode flags). This can cause you permissions problems if, say, owner read permission is missing. But it can also be a security problem if your server is CGI-enabled: an X bit can allow the file to be interpreted as a script and passed to any script interpreter listed in the hashbang on the first line.
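You can see those mode flags for yourself. In this Python sketch (`entry_has_exec_bit` is a made-up helper), the Unix mode sits in the high 16 bits of each entry's external attributes when the archive was built on a Unix-like system.

```python
import io
import stat
import zipfile


def entry_has_exec_bit(info: zipfile.ZipInfo) -> bool:
    """True if the archive records any Unix execute bit for this entry."""
    unix_mode = info.external_attr >> 16  # high 16 bits hold the Unix mode, if any
    return bool(unix_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


# Build a small archive with one executable entry and one plain one.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    script = zipfile.ZipInfo("run.cgi")
    script.external_attr = 0o100755 << 16  # regular file, rwxr-xr-x
    zf.writestr(script, b"#!/bin/sh\necho hi\n")
    plain = zipfile.ZipInfo("photo.gif")
    plain.external_attr = 0o100644 << 16   # regular file, rw-r--r--
    zf.writestr(plain, b"GIF89a")

with zipfile.ZipFile(buf) as zf:
    flags = {info.filename: entry_has_exec_bit(info) for info in zf.infolist()}
print(flags)  # {'run.cgi': True, 'photo.gif': False}
```

When you unpack manually you never copy these bits; you create each output file with a mode you chose.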
I thought .htaccess had to be in the main root directory, is this not the case?
Depends how Apache is set up, in particular the AllowOverride directive. It is common for general-purpose hosts to AllowOverride on any directory.
What would happen if someone still uploaded a file like ../var/www/wr_dir/evil.php?
I would expect the leading ‘..’ would be discarded; that's what other tools that have suffered the same vulnerability have done.
But I still wouldn't trust extractTo() against hostile input. There are too many weird little filename/directory-tree things that can go wrong, especially if you're expecting ever to run on Windows servers. zip_read() gives you much greater control over the dearchiving process, and hence the attacker much less.
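To show what a naive path-joining extractor does with a name like that, here's a small Python sketch. `escapes_base` is a made-up helper and it only understands POSIX-style paths; symlinks and Windows separators are exactly the sort of extra foibles it does not cover.

```python
import posixpath


def escapes_base(base: str, entry_name: str) -> bool:
    """True if joining entry_name onto base resolves to somewhere outside base."""
    target = posixpath.normpath(posixpath.join(base, entry_name))
    return not (target == base or target.startswith(base + "/"))


base = "/var/www/wr_dir"
print(escapes_base(base, "../var/www/wr_dir/evil.php"))  # True
print(escapes_base(base, "photos/cat.gif"))              # False
print(escapes_base(base, "/etc/passwd"))                 # True
```

A check like this is a useful assertion, but generating the output names yourself sidesteps the question entirely.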