What is the most correct regular expression (regex) for a UNIX file path?

For example, to detect something like this:

/usr/lib/libgccpp.so.1.0.2

It's pretty easy to write a regular expression that matches most file paths, but what's the best one? Ideally it would also handle escaped whitespace sequences and the unusual characters you don't normally find in file paths on UNIX.

Also, are there library functions in several different programming languages that provide a file path regex?

+1  A: 

This looks like it could fit what you're looking for: Windows and Unix Path Regex, or at least provide a starting point.

Andy Mikula
+5  A: 

If you don't mind false positives for identifying paths, then you really just need to ensure the path doesn't contain a NUL character; everything else is permitted (in particular, / is the name-separator character). The better approach would be to resolve the given path using the appropriate file IO function (e.g. File.exists(), File.getCanonicalFile() in Java).
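As a rough illustration of both ideas, here is a minimal Python sketch (Python rather than Java, purely for brevity). The function names `looks_like_unix_path` and `resolve_existing_path` are made up for this example; `os.path.exists` and `os.path.realpath` play roughly the role of `File.exists()` and `File.getCanonicalFile()`.

```python
import os


def looks_like_unix_path(path: str) -> bool:
    """Permissive syntactic check: non-empty and free of NUL characters."""
    return len(path) > 0 and "\0" not in path


def resolve_existing_path(path: str):
    """Return the canonical form of `path` if it exists on disk, else None."""
    if not looks_like_unix_path(path):
        return None
    if not os.path.exists(path):
        return None
    # realpath collapses "." and ".." components and follows symlinks.
    return os.path.realpath(path)
```

The first function accepts plenty of strings that are not real paths (the false positives mentioned above); only the second one actually asks the file system.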

Long answer:

This is both operating system and file system dependent. For example, the Wikipedia comparison of file systems notes that besides the limits imposed by the file system,

MS-DOS, Microsoft Windows, and OS/2 disallow the characters \ / : ? * " > < | and NUL in file and directory names across all filesystems. Unices and Linux disallow the characters / and NUL in file and directory names across all filesystems.

In Windows, the following reserved device names are also not permitted as filenames:

CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5,
COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, 
LPT5, LPT6, LPT7, LPT8, LPT9
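
If you do want a per-name check, the rules quoted above could be sketched in Python along these lines. The function and constant names are hypothetical, and the sketch assumes that reserved device names are compared case-insensitively and with any extension ignored, which the quoted lists do not state. It checks a single file or directory name, not a whole path.

```python
# Characters and device names taken from the lists quoted above.
WINDOWS_FORBIDDEN_CHARS = set('\\/:?*"><|\0')
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_valid_windows_name(name: str) -> bool:
    """Check one file or directory name against the Windows rules above."""
    if not name or any(ch in WINDOWS_FORBIDDEN_CHARS for ch in name):
        return False
    # Assumption: "con.txt" is treated like "CON" (case and extension ignored).
    stem = name.split(".", 1)[0]
    return stem.upper() not in WINDOWS_RESERVED_NAMES


def is_valid_unix_name(name: str) -> bool:
    """Check one file or directory name: only '/' and NUL are disallowed."""
    return bool(name) and "/" not in name and "\0" not in name
```

For instance, `a:b` passes the UNIX check but fails the Windows one, which is why a single regex for "file paths" in general does not really exist.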
Zach Scrivena
Additional: because of the variety between file systems, there are methods that get you the information you need.
Robert P
@Robert: Thanks! I've updated my answer accordingly.
Zach Scrivena
+2  A: 

I'm not sure how common a regex check for this is across systems, but most programming languages (especially the cross-platform ones) provide a "file exists" check that takes this kind of thing into account.

Out of curiosity, where are these paths being input? Could you control that to a greater degree, to the point where you won't have to check the individual pieces of the path? For example, by using a file chooser dialog?

greg
+3  A: 

The proper regular expression to match all UNIX paths is: [^\0]+

That is, one or more characters that are not a NUL.
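
A quick way to try it out, sketched in Python (the helper name `is_unix_path` is just for illustration). `fullmatch` is used so the whole string has to satisfy the pattern, not just a prefix of it.

```python
import re

# One or more characters, none of which is NUL.
UNIX_PATH_RE = re.compile(r"[^\0]+")


def is_unix_path(candidate: str) -> bool:
    """True if the entire string is a syntactically valid UNIX path."""
    return UNIX_PATH_RE.fullmatch(candidate) is not None
```

Note that this accepts spaces, newlines and every other oddity, since none of them is NUL; it only rejects the empty string and strings containing a NUL.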

Darron