tags:

views:

151

answers:

2

While answering this question regarding safe escaping of filename with spaces (and potentially other characters), one of the answers said to use Perl's built-in quotemeta function.

The documentation of quotemeta states:

quotemeta (and \Q ... \E ) are useful when interpolating strings 
into regular expressions, because by default an interpolated variable 
will be considered a mini-regular expression.  

In the documentation for quotemeta, the only mention of its use is to escape all the characters other than /[A-Za-z_0-9]/ with a \ for use in a regex. It does not state the use for filenames. This does seem like a very pleasant, if undocumented, side effect however.

In a comment to Sinan Ünür answer to the earlier question, hobbs states:

shell escaping is different from regexp escaping, and although I can't come up with a situation where quotemeta would give a truly unsafe result, it's not meant for the task. If you must escape, instead of bypassing the shell, I suggest trying String::ShellQuote which takes a more conservative approach using sh single quotes to defang everything except single quotes themselves, and backslashes for single quotes. – hobbs Aug 13 '09 at 14:25

Is it safe -- completely -- to use quotemeta in place of more conservative file quoting like String::Shellquote? Is quotemeta utf8 or multibyte character safe?

I put together a test that is unclear. quotemeta works well, it seems, except for a file name or directory name with a \n, or \r in it. While rare, these characters are legal in Unix and I have seen them. Recall that certain characters, such as LF, CR and NUL cannot be escaped with \. I read my hard drive with 700k files with quotemeta and had no failures.

I have suspicion (though I have not demonstrated it yet) that quotemeta might fail with multibyte characters where one or more of the bytes falls into the ASCII range. For example,à can be encoded as one character (UTF8 C3 A0) or as two characters (U+0061 gives a u+0300 is a combining graves accent). The only demonstrated failure I have with quotemeta is with files with a \n or \r in the path that I created. I would be interested in other characters to put in nasty_names to test.

ShellQuote works perfectly on all file names except those terminated by a NUL when creating a file. I have never ever had a failure with it.

So what to use? Just to be clear: shell quoting is not something I do often, since I usually just use Perl open to open a pipe to a process. That method does not suffer the shell issues discussed. I am interested since I have seen quotemeta used often for file name escaping.

(Thanks to Ether I have added IPC::System::Simple)

Test file:

use strict; use warnings; use autodie;
use String::ShellQuote;
use File::Find;
use File::Path;
use IPC::System::Simple 'capturex';

my @nasty_names;
my $top_dir = '/Users/andrew/bin/pipetestdir/testdir';
my $sub_dir = "easy_to_remove_me";
my (@qfail, @sfail, @ipcfail);

sub wanted { 
    if ($File::Find::name) { 
         my $rtr;
         my $exec1="ls ".quotemeta($File::Find::name);
         my $exec2="ls ".shell_quote($File::Find::name);
         my @exec3= ("ls", $File::Find::name);

         $rtr=`$exec1`;  
         push @qfail, "$exec1" 
              if $rtr=~/^\s*$/ ;

         $rtr=`$exec2`;
         push @sfail, "$exec2" 
              if $rtr=~/^\s*$/ ;

         $rtr = capturex(@exec3);
         push @ipcfail, \@exec3
              if $rtr=~/^\s*$/ ;     
    }
}

chdir($top_dir) or die "$!";
mkdir "$top_dir/$sub_dir";
chdir "$top_dir/$sub_dir";

push @nasty_names, "name with new line \n in the middle";
push @nasty_names, "name with CR \r in the middle";
push @nasty_names, "name with tab\tright there";
push @nasty_names, "utf \x{0061}\x{0300} combining diacritic";
push @nasty_names, "utf e̋ alt combining diacritic";
push @nasty_names, "utf e\x{cc8b} alt combining diacritic";
push @nasty_names, "utf άέᾄ greek";
push @nasty_names, 'back\slashes\\Not\\\at\\\\end';
push @nasty_names, qw|back\slashes\\IS\\\at\\\\end\\\\|;

sub create_nasty_files {
    for my $name (@nasty_names) {
       open my $fh, '>', $name ; 
       close $fh;
    }
}

for my $dir (@nasty_names) {
    chdir("$top_dir/$sub_dir");
    mkpath($dir);
    chdir $dir;
    create_nasty_files();
}

find(\&wanted, $top_dir);

print "\nquotemeta failed on:\n", join "\n", @qfail;
print "\nShell Quote failed on:\n", join "\n", @sfail;
print "\ncapturex failed on:\n", join "\n", @ipcfail;
print "\n\n\n",
      "Remove \"$top_dir/$sub_dir\" before running again...\n\n";
+2  A: 

You could also use IPC::System::Simple capture() or capturex() (which I suggested in another answer on that first question), which will let you bypass the shell.

I added these lines to your script and found that no examples failed:

use IPC::System::Simple 'capturex';
...
my (@qfail, @sfail, @ipcfail);
...
         my @exec3= ("ls", $File::Find::name);
...
         $rtr = capturex(@exec3);
         push @ipcfail, \@exec3
              if $rtr=~/^\s*$/ ;
...
print "\ncapturex failed on:\n", join "\n", @ipcfail;

But in general, you should solve the actual problem, rather than attempting to find better band-aids. quotemeta is intended specifically to escape regular expression-significant characters, which as you have discovered is not a perfect overlap with the set of characters that are significant to the shell.

Ether
+1: I agree with IPC or just the later Perl open form of it.
drewk
Thanks for adding to the test script...
drewk
It's not that `quotemeta` quotes the wrong characters, it's that in standard Unix shells, backslash followed by newline does not mean a newline, it means nothing. It's used to allow you to break long commands in a shell script over multiple lines. (I never use newlines in filenames, so I don't often think about the problems that causes.)
cjm
@cjm: well, it quotes the wrong characters in the sense that it isn't properly quoting a newline; a corollary in this case is one *cannot* properly quote a backslash in this case (for the shell). :)
Ether
+10  A: 

Quotemeta is safe under these assumptions:

  1. Only non-alphanumeric characters have a special meaning.
  2. If a non-alphanumeric character has a special meaning, putting a backslash in front of it will always make it non-special.
  3. If a non-alphanumeric character doesn't have a special meaning, putting a backslash in front of it will do nothing.

The shell violates rules 2 and 3 no matter what quote context you use -- outside of quotes, backslash-newline doesn't generate newline; in double-quotes, backslash-punctuation puts a backslash into the output (outside of a certain list of punctuation); and in single-quotes, everything is literal and backslash doesn't even protect you against a closing single-quote.

I still recommend String::ShellQuote if you need to quote things for the shell. I also recommend avoiding letting the shell process your filenames entirely, if you can, by using LIST-form system/exec/open or IPC::Open2, IPC::Open3, or IPC::System::Simple.

As for things besides the shell... lots of different things violate one or more of the rules. For example, obsolete POSIX "basic" regexes and various kinds of editor regexes have punctuation characters that are non-special by default, but become special when preceded by backslash. Basically what I'm saying is, know the thing that you're feeding your data to very well, and escape properly. Only use quotemeta if it's an exact fit, or if you're using it for something that's not very important.

hobbs
+1: Thanks. I use `String::ShellQuote` mostly because I understand how it works.
drewk