I want to check that a file system path is valid and safe to use relative to another path. So I want to know whether there are any special sequences besides /../ and /./ that might cause a path to actually point somewhere else.

If that is all I have to worry about, then a quick replace of those sequences, followed by something like this to check for any other bad file system characters, should work, right?

[^a-z0-9\.\-_]

(On Windows, drive prefixes such as C:\ would also have to be allowed.)

The use case is that I have a folder in which site administrators can create directories, and I want to FORCE them to create directories only inside that folder. In other words, no being sneaky with ...path/uploads/../../../var/otherfolder/ if you know what I mean ;)

+2  A: 

Which language are you using?

In PHP, for example, you can get the realpath() of any string and then compare it to a base directory. If your base directory is a prefix of the real path (compared component by component, so that /var/uploads2 does not count as being inside /var/uploads), then you're good to go.

Although that's only for PHP, you should be able to find a similar approach in other languages.
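A minimal sketch of the same idea in Python, assuming POSIX-style paths. `is_inside` is a hypothetical helper name: `os.path.realpath` plays the role of PHP's realpath(), and `os.path.commonpath` does the prefix comparison on whole path components. Unlike PHP's realpath(), Python's `os.path.realpath` (in its default, non-strict mode) does not fail on paths that do not exist yet. It resolves symlinks for the components that do exist and appends the rest.

```python
import os


def is_inside(base: str, candidate: str) -> bool:
    """Return True if `candidate` (relative to `base`) resolves to `base` or below it.

    Hypothetical helper: both paths are canonicalised with os.path.realpath,
    which collapses "." and "..", removes duplicate separators and follows
    symlinks for the components that already exist.
    """
    real_base = os.path.realpath(base)
    # An absolute `candidate` replaces `real_base` entirely in join(),
    # so "/etc/passwd" is checked as-is and rejected below.
    real_candidate = os.path.realpath(os.path.join(real_base, candidate))
    # Compare whole path components, not raw string prefixes, so that
    # "/srv/uploads2" is not accepted as a child of "/srv/uploads".
    return os.path.commonpath([real_base, real_candidate]) == real_base
```

Because realpath follows any symlink that already exists inside the base directory, a link planted there that points elsewhere is also rejected.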

Seb
I am in fact using PHP - but realpath() requires that the path actually exists in order for this to work. Some of these paths don't exist yet - I want to check that they are valid safe locations *before* I create them.
Xeoncross
Simple solution: if realpath() fails on the directory you pass it, remove the last file/directory component and test again. If that also fails, you know you need to create not only the leaf but also its parent directory. Do you want that? If yes, create it; if not, report it as an invalid directory.
hhafez
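The walk-up idea can be sketched in Python for illustration. `resolve_future_path` is a hypothetical helper that assumes POSIX-style paths. It strips components until an existing ancestor is found, canonicalises that ancestor, and then re-attaches the missing components. It refuses "." or ".." in the part that does not exist yet, because those segments cannot be checked against the real filesystem.

```python
import os


def resolve_future_path(path: str) -> str:
    """Canonicalise a path whose trailing components may not exist yet.

    Hypothetical helper: strips components off the end until an existing
    ancestor is found, resolves that ancestor with os.path.realpath (which
    follows symlinks), then re-appends the missing components.
    """
    if not os.path.isabs(path):
        path = os.path.join(os.getcwd(), path)
    missing = []
    # lexists() is also True for dangling symlinks, so those get resolved too.
    while not os.path.lexists(path):
        head, leaf = os.path.split(path)
        if head == path:  # reached a root that does not exist (e.g. a missing drive)
            break
        if leaf:
            missing.append(leaf)
        path = head
    resolved = os.path.realpath(path)
    for leaf in reversed(missing):
        # A dot segment inside the not-yet-existing part cannot be checked
        # against the real filesystem, so refuse it outright.
        if leaf in (".", ".."):
            raise ValueError(f"dot segment in a path that does not exist yet: {leaf!r}")
        resolved = os.path.join(resolved, leaf)
    return resolved
```

You can then compare the returned path with the base directory component by component before anything is created.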
+1  A: 

The answer depends on the filesystem used. It's different on Windows, different on *nix.

For example, on Windows-based desktop platforms, invalid path characters might include quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0), and Unicode characters 16 through 18 and 20 through 25.

I don't know which platform/language you are using, but if you are using .NET you can get the list of characters that cannot appear in a file name by calling Path.GetInvalidFileNameChars, and the list of characters that cannot appear in a path by calling Path.GetInvalidPathChars.
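For comparison, here is a rough Python check that uses only the characters listed in this answer: quote, <, >, |, backspace, NUL, and code points 16-18 and 20-25. It is deliberately not exhaustive, and `INVALID_PATH_CHARS` / `find_invalid_chars` are hypothetical names, not part of any .NET or Python API.

```python
# Only the characters named in the answer; the real .NET list is longer.
INVALID_PATH_CHARS = frozenset(
    ['"', "<", ">", "|", "\b", "\0"]
    + [chr(c) for c in range(16, 19)]   # code points 16..18
    + [chr(c) for c in range(20, 26)]   # code points 20..25
)


def find_invalid_chars(path: str) -> list[str]:
    """Return the characters of `path` found in INVALID_PATH_CHARS, in order."""
    return [ch for ch in path if ch in INVALID_PATH_CHARS]
```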

Martin Vobr
Thanks, but I'm not trying to support every possible character. Anything not matching the above regex is nonstandard on every system I have ever used. What I want to know is whether there are other *symbolic link* sequences that might change a path's value when it is resolved.
Xeoncross
+2  A: 

There are several oddities on Windows/DOS. Opening any of these names will read from and write to unexpected places. I haven't tried how .NET handles them, but I presume you would get some kind of security exception.

CON   Console. Reads from keyboard, writes to screen.
      "COPY CON temp.txt", end input with ctrl-z.

PRN   Printer. (Defaults to LPT1?)
LPTn  Parallel ports.

AUX   "Auxiliary device." I have never seen anyone use this myself.

COMn  Serial ports.

NUL   Null device (the equivalent of /dev/null).
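Here is a minimal Python sketch of rejecting these names, assuming the check runs on one path component at a time. It uses the names above with n taken as 1 to 9, and `is_reserved_device_name` is a hypothetical helper. Windows matches these names case-insensitively and also treats them as devices when an extension is appended (e.g. NUL.txt), so the sketch drops the extension and upper-cases the name before comparing.

```python
RESERVED_NAMES = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{n}" for n in range(1, 10)]
    + [f"LPT{n}" for n in range(1, 10)]
)


def is_reserved_device_name(component: str) -> bool:
    """True if a single path component names a DOS device on Windows."""
    # Drop any extension ("nul.txt" -> "nul") before comparing.
    stem = component.split(".", 1)[0]
    return stem.upper() in RESERVED_NAMES
```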
Simon Svensson
It's very enlightening to check out `FileInfo` and `FileStream` code, in fact - it checks for all that stuff, and then some.
Pavel Minaev
+1  A: 

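Unix symbolic links can be tricky, and on some systems they can even be used to create path loops. You should stat() both pathnames (stat() follows symbolic links, whereas lstat() reports on the link itself) and compare their device and inode numbers (st_dev and st_ino) to see whether the two pathnames actually refer to the same file.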
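Here is a device-and-inode comparison sketched in Python. It uses os.stat(), which follows symbolic links, because os.lstat() would describe the link rather than its target. The standard library's os.path.samefile performs the same check. `same_file` is a hypothetical name.

```python
import os


def same_file(a: str, b: str) -> bool:
    """True if both paths refer to the same file or directory.

    os.stat() follows symbolic links, so two different spellings of a path
    (or a symlink and its target) compare equal when they end up at the
    same device and inode.
    """
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
```

Both paths must exist, so for a directory that has not been created yet you would compare its parent instead.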

Loadmaster
Note that NTFS also supports linking, including directory symlinks (usually done using a junction point).
Michael Madsen
In Vista/2008 and above, NTFS supports full-featured directory symlinks.
Pavel Minaev
But again, what can I do if one of the paths does not exist yet? Before I create it, I want to ensure it is a child of another path.
Xeoncross
Then compare the two parent directories.
Loadmaster
+1  A: 

Have you considered using something like chroot? You can create something called a "chroot jail" that will prevent people from getting outside it. This is enforced by the OS, so you don't have to write it yourself. Note that this only works on *nix, and on some variants of *nix, it does not have all the security features necessary to make it foolproof (i.e. there are known ways of escaping).

rmeador
Great idea, but in my case I'm thinking of a web server with an app in Ruby or PHP that manages some directories in the webroot. Wouldn't chroot'ing it cause some problems for Apache?
Xeoncross
As the Wikipedia article notes, one common use of chroot is to confine a web server so that if it is compromised, the attacker can't take over the whole system. I've never set up this type of configuration, but it can be done :) I bet there are tutorials if you google for it.
rmeador
+1  A: 

When resolving paths, . and .. (and, in most cases, // on Unix and \\ on Windows) are the main things you really need to worry about. RFC 3986 defines the following algorithm for removing dot segments when resolving relative references in URIs. For the most part, it also applies to file system paths.

An algorithm, remove_dot_segments:

  1. The input buffer is initialized with the now-appended path components and the output buffer is initialized to the empty string.
  2. While the input buffer is not empty, loop as follows:
    A. If the input buffer begins with a prefix of "../" or "./", then remove that prefix from the input buffer; otherwise,
    B. If the input buffer begins with a prefix of "/./" or "/.", where "." is a complete path segment, then replace that prefix with "/" in the input buffer; otherwise,
    C. If the input buffer begins with a prefix of "/../" or "/..", where ".." is a complete path segment, then replace that prefix with "/" in the input buffer and remove the last segment and its preceding "/" (if any) from the output buffer; otherwise,
    D. If the input buffer consists only of "." or "..", then remove that from the input buffer; otherwise,
    E. Move the first path segment in the input buffer to the end of the output buffer, including the initial "/" character (if any) and any subsequent characters up to, but not including, the next "/" character or the end of the input buffer.
  3. Finally, the output buffer is returned as the result of remove_dot_segments.
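Here is a direct Python transcription of remove_dot_segments, offered as a sketch rather than a vetted library routine. Each branch is labelled with the step it implements (2A to 2E, matching the labels in the example runs), and the output buffer is kept as a list of segments so that "remove the last segment and its preceding /" becomes a simple pop().

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments as described in RFC 3986, section 5.2.4."""
    inp = path
    out = []  # output buffer: segments, each with its leading "/" (if any)
    while inp:
        if inp.startswith("../"):            # 2A
            inp = inp[3:]
        elif inp.startswith("./"):           # 2A
            inp = inp[2:]
        elif inp.startswith("/./"):          # 2B
            inp = "/" + inp[3:]
        elif inp == "/.":                    # 2B ("." is the final segment)
            inp = "/"
        elif inp.startswith("/../"):         # 2C
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp == "/..":                   # 2C (".." is the final segment)
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):             # 2D
            inp = ""
        else:                                # 2E
            # Move the first segment, with its leading "/" if any, to the output.
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)
```

For example, remove_dot_segments("/a/b/c/./../../g") returns "/a/g", matching the first run below. Excess ".." segments are silently dropped: "/../etc" comes out as "/etc" rather than being flagged.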

Example runs:

STEP   OUTPUT BUFFER         INPUT BUFFER

 1 :                         /a/b/c/./../../g
 2E:   /a                    /b/c/./../../g
 2E:   /a/b                  /c/./../../g
 2E:   /a/b/c                /./../../g
 2B:   /a/b/c                /../../g
 2C:   /a/b                  /../g
 2C:   /a                    /g
 2E:   /a/g

STEP   OUTPUT BUFFER         INPUT BUFFER

 1 :                         mid/content=5/../6
 2E:   mid                   /content=5/../6
 2E:   mid/content=5         /../6
 2C:   mid                   /6
 2E:   mid/6

Don't forget that it's possible to do things like specify more ".." segments than there are parent directories. So if you're trying to resolve a path, you could end up trying to resolve beyond /, or in the case of Windows, C:\.
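The RFC algorithm quietly discards ".." segments that would climb above the starting point, so a safety check has to detect that case rather than normalise it away. Below is a minimal Python sketch. It assumes "/" is the only separator (convert "\" first on Windows) and does not handle drive prefixes such as C:\. `escapes_root` is a hypothetical helper, and the check is purely lexical, so a symlink inside the tree can still point elsewhere.

```python
def escapes_root(relative_path: str) -> bool:
    """True if a relative path would climb above the directory it starts from.

    Walks the segments while tracking depth: "" and "." stay put, ".." goes
    up one level, anything else goes down one. An absolute path counts as
    escaping, since it ignores the starting directory entirely.
    """
    if relative_path.startswith("/"):
        return True
    depth = 0
    for segment in relative_path.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            depth -= 1
            if depth < 0:
                return True
        else:
            depth += 1
    return False
```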

Bob Aman
OK, then I guess with the addition of `//` and `\\`, that is all I need to worry about. At least, no one else has mentioned anything.
Xeoncross
Yeah, replace `\` with `/` to normalize, then replace `//` with `/` after running the above algorithm, and I think you should be covered, or at least very close.
Bob Aman
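Python's standard library has a comparable lexical normaliser, posixpath.normpath. It collapses duplicate slashes and resolves "." and "..". Unlike remove_dot_segments, it keeps leading ".." segments that it cannot resolve, which makes an escape easy to spot. Here is a sketch of the suggested pipeline using normpath in place of the RFC routine. `normalise_user_path` is a hypothetical name, and drive prefixes such as C: are not handled.

```python
import posixpath


def normalise_user_path(raw: str) -> str:
    """Normalise a user-supplied relative path, or raise ValueError if it escapes.

    Hypothetical helper: backslashes become "/", then posixpath.normpath
    collapses "//", "." and ".." lexically (it does not touch the filesystem,
    so symlinks are not resolved).
    """
    path = posixpath.normpath(raw.replace("\\", "/"))
    # normpath keeps unresolvable leading ".." segments, which is exactly the
    # "climbing out of the base directory" case we want to reject; a leading
    # "/" means the path ignores the base directory altogether.
    if path == ".." or path.startswith("../") or path.startswith("/"):
        raise ValueError(f"path escapes the base directory: {raw!r}")
    return path
```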
That is just incredibly dangerous.
Tom Hawtin - tackline
Depends on what he does with it. Admittedly, I did just gloss over the security aspect of this and just answered his question. But yes, you could do some serious damage if you don't actually check that the result of this algorithm is a legitimate location. The advice regarding `chroot` was not-to-be-ignored.
Bob Aman
In other news, this is one more reason never to use PHP.
Bob Aman
haha, the language makes no difference. People can program poorly anywhere.
Xeoncross
I beg to differ. You're right that people can program poorly anywhere, but the language absolutely **does** make a difference! You will find roughly a thousand times more vulnerable sites based on PHP with an objective google hacking query than say, ones based on Ruby or Python. You can write insecure sites in those languages too, but the tools are much better and the programmers are better educated on the security dangers, so they're a lot less common.
Bob Aman
@Bob - the reason you find more vulnerable sites in PHP is that it's simpler for a beginner to learn than Ruby or Python. I code in PHP and I consider my sites to be secure, because I've been programming professionally for over 7 years and I know what I'm doing.
Seb
@Seb The corollary to "people can program poorly anywhere" is that people can program well anywhere. That said, when I interview PHP programmers, the one question I'm guaranteed to ask is "What's good about PHP and what's bad about it?" I'm absolutely going to give a "Don't hire" answer if all they do is rave about it. As far as I'm concerned, there are three things that make PHP worth knowing: it's easy to learn, it's deployed everywhere, and it's wildly popular. If you aren't specifically trying to take advantage of one of those, you shouldn't be using it.
Bob Aman
@Bob The reason most PHP sites are such a mess is the caliber of the programmers creating them. By the time Ruby got big, only people fully capable of following proper coding standards made the jump, giving the illusion that Ruby was more secure simply because all the beginners were left behind.
Xeoncross
@Xeoncross @Seb I think both of you misunderstand. I'm saying that the language and the developer ecosystem around the language make it easier to write things securely in Ruby. For instance, virtually all of the DB access libraries for Ruby use the `?` notation for inserting variables properly escaped into SQL query strings. Conversely, the `mysql_query` function in PHP **still** doesn't do this. Even with the warning in the manual about escaping things, this makes it very easy for developers who don't know better to write vulnerable code.
Bob Aman
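Python's built-in sqlite3 module uses the same `?` placeholder convention. The value is passed separately from the SQL text and bound by the driver, so a hostile string is stored as data instead of being spliced into the query. The table and column names below are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, just for the demonstration.
conn.execute("CREATE TABLE dirs (name TEXT)")

user_input = "x'); DROP TABLE dirs; --"
# The value is bound to the "?" placeholder rather than concatenated into
# the SQL string, so it is stored verbatim.
conn.execute("INSERT INTO dirs (name) VALUES (?)", (user_input,))

rows = conn.execute("SELECT name FROM dirs").fetchall()
print(rows)  # [("x'); DROP TABLE dirs; --",)]
```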
True, many insecure properties from PHP's infancy still exist. Register Globals and the lack of multibyte support in the PHP 4/5 string functions are among them.
Xeoncross
+1  A: 

I've already directly answered the question, but as Tom said, what you're trying to do is inherently dangerous. What you should probably do instead is create one directory at a time. Pass it through a regexp validator and don't let them use dot segments at all. Just have a text field in a form for the directory name and a "Make Directory" button. Let them traverse the directory tree to create sub-directories. This way you can be absolutely confident that the files are going where they should.

This has the advantage of working on both Windows and *nix without the need for chroot.

Addenda:

This Regexp will only match illegitimate directory names, assuming that you're accepting directories one at a time:

/^(\.\.?|.*?[^a-zA-Z0-9\. _-]+.*?|^)$/

Valid directory names:

  • "This is a directory"
  • ".hidden"
  • "example.com"
  • "10-28-2009"

Invalid directory names:

  • ""
  • "."
  • ".."
  • "../somewhere/else"
  • "/etc/passwd"
  • "would:be?rejected!by;OS"
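You can exercise the pattern against the lists above with Python's re module. Python is used here only for illustration; PHP's preg_match accepts the same pattern, including its / delimiters. `is_invalid_dir_name` is a hypothetical wrapper.

```python
import re

# The pattern above, without the PHP/PCRE "/.../" delimiters.
INVALID_DIR_NAME = re.compile(r"^(\.\.?|.*?[^a-zA-Z0-9\. _-]+.*?|^)$")


def is_invalid_dir_name(name: str) -> bool:
    """True if `name` is not acceptable as a single new directory name."""
    return INVALID_DIR_NAME.match(name) is not None


valid = ["This is a directory", ".hidden", "example.com", "10-28-2009"]
invalid = ["", ".", "..", "../somewhere/else", "/etc/passwd", "would:be?rejected!by;OS"]

for name in valid:
    assert not is_invalid_dir_name(name), name
for name in invalid:
    assert is_invalid_dir_name(name), name
```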
Bob Aman
Yes, I plan on only allowing admins to create directories one step at a time so I can safely validate them. However, in this case I wanted to prevent bad coding from being allowed access to dangerous locations should something go wrong. Basically, just an extra safety check against bad PHP programmers. ;)
Xeoncross