Possible Duplicate:
How can I manually interpolate string escapes in a Perl string?

I'm reading a string from a particular file. The problem with it is that it contains escaped characters, like:

Hello!\nI\'d like to tell you a little \"secret\"...

I'd like it to be printed out without escape sequences, like:

Hello!
I'd like to tell you a little "secret"...

I thought about removing single backslashes and replacing double backslashes with single ones (since \ is represented as \\), but that doesn't help with \n, \t and so on. Before fiddling with ugly, complex replace strings I thought I'd ask: does Perl have a built-in mechanism for such a transformation?

+1  A: 

I hate to suggest this, but a string eval would solve the problem. It also brings up a host of security and maintenance issues, though. Where does this data come from? Is there any contract between the producers of the data and you about what the string will hold?

#!/usr/bin/perl

use strict;
use warnings;

while (my $input = <DATA>) {
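    # Note: this only works if # is not allowed as a character in the string.
    # Warning: string eval runs any code embedded in the input (see the second DATA line).
    my $string = eval "qq#$input#";
    die $@ unless defined $string;   # "or die" would also die on "" or "0"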
    print $string;
}

__DATA__
Hello!\nI\'d like to tell you a little \"secret\".
This is bad @{[print "I have pwned you\n"]}.

The other solution is to create a hash that defines all of the escapes you want to implement and do a substitution.
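
A minimal sketch of that hash-based approach, assuming only the escapes from the question (\n, \t, \r, \\, \" and \') need to be handled. The names %escape_for and unescape are made up for illustration; escapes that are not in the table are left untouched, and no eval is involved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup table: character after the backslash => replacement.
my %escape_for = (
    n    => "\n",
    t    => "\t",
    r    => "\r",
    '\\' => '\\',
    '"'  => '"',
    "'"  => "'",
);

# Replace each backslash escape found in the table; leave any other
# backslash sequence exactly as it was.
sub unescape {
    my ($text) = @_;
    $text =~ s/\\(.)/exists $escape_for{$1} ? $escape_for{$1} : "\\$1"/ges;
    return $text;
}

print unescape(q{Hello!\nI\'d like to tell you a little \"secret\"...}), "\n";
```

Because the substitution scans left to right and consumes the backslash together with the following character, an input such as \\n comes out as a literal backslash followed by n rather than as a newline.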

Chas. Owens
It's a local application, a command line script, used to parse logfiles from some other utility. In that case I think eval wouldn't be that much of a security breach, right?
Neo
Are you eval'ing what is in the log files? If so, how did the data get in the log file? If all a user has to do is craft the right message to break or compromise your code, then they will. A better option would be to fix whoever is writing the log files to use a standardized method of escaping special characters like the one in RFC 3986 (i.e. URI escaping).
Chas. Owens
Try the Safe module for this.
sreservoir
+1  A: 

For Perl's single-character backslash escapes, you can do this safely by using a two-character eval as part of the substitution. Put the characters you are willing to interpret in the character class after the \; the backslash plus that single character is then eval'd and the result is inserted into the string.

Consider:

#!/usr/bin/perl
use warnings;
use strict;

print "\n\n\n\n";

while (my $data = <DATA>) {
    $data=~s/\\([rnt'"\\])/"qq|\\$1|"/gee;
    print $data;
}

__DATA__
Hello!\nI\'d like to tell you a little \"secret\".
A backslash:\\
Tab'\t'stop
line 1\rline 2  (on Unix, "line 1" will get overwritten)
line 3\\nline 4 (should result in "line 3\\nline 4")
line 5\r\nline 6

Output:

Hello!
I'd like to tell you a little "secret".
A backslash:\
Tab'    'stop
line 2  (on Unix, "line 1" will get overwritten)
line 3\nline 4 (should result in "line 3\nline 4")
line 5
line 6

The line s/\\([rnt'"\\])/"qq|\\$1|"/gee does the work.

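  • The \\([rnt'"\\]) part matches a backslash followed by one of the acceptable characters, which are listed inside the square brackets.

  • The g flag applies the substitution to every match, and the ee flags eval the replacement twice.

  • The "qq|\\$1|" part is eval'd twice. The first eval interpolates $1 into the string, and the second evaluates the resulting qq expression, which interprets the escape.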
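
To make the two evaluation steps concrete, here is a small sketch that performs them by hand for the \n case. The variable names ($captured, $code, $result) are made up for illustration; $captured stands in for what $1 would hold after the match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $captured = 'n';    # what $1 would hold after matching \n in the input

# First evaluation: the replacement is an ordinary double-quoted string,
# so it yields the text  qq|\n|  with the captured character filled in.
my $code = "qq|\\$captured|";

# Second evaluation: that text is run as Perl code, and the qq operator
# turns the backslash-n pair into a real newline.
my $result = eval $code;
die $@ if $@;

printf "code to eval: %s\n", $code;
printf "result is a newline: %s\n", ($result eq "\n" ? "yes" : "no");
```

Only the backslash and the one whitelisted character ever reach the second eval, which is what keeps the substitution from running arbitrary input as code.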

I cannot think of a two character combination here that would be a security breach...

This method does not deal with the following properly:

  • Quoted strings. For example, Perl would not unescape the string 'line 1\nline 2' because of the single quotes.

  • Escape sequences that are longer than a single character, such as hex \x1b, Unicode such as \N{U+...}, or control sequences such as \cD

  • Anchored escapes, such as \LMAKE LOWER CASE\E or \Umake upper case\E

If you want more complete escape replacement, you can use this regex:

#!/usr/bin/perl
use warnings;
use strict;

print "\n\n\n\n";

binmode STDOUT, ":utf8";

while (my $data = <DATA>) {
    $data=~s/\\(
        (?:[arnt'"\\]) |               # Single char escapes
        (?:[ul]\w) |                   # uc or lc next char (word chars only)
        (?:x[0-9a-fA-F]{2}) |          # 2 digit hex escape
        (?:x\{[0-9a-fA-F]+\}) |        # more than 2 digit hex
        (?:[0-7]{2,3}) |               # octal (digits 0-7 only)
        (?:N\{U\+[0-9a-fA-F]{2,4}\})   # unicode by hex
        )/"qq|\\$1|"/geex;
    print $data;
}

__DATA__
Hello!\nI\'d like to tell you a little \"secret\".
Here is octal: \120 
Here is UNICODE: \N{U+0041} and \N{U+41} and \N{U+263D}
Here is a little hex:\x50 \x5fa \x{5fa} \x{263B}
lower case next char \lU \lA
upper case next char \ua \uu
A backslash:\\
Tab'\t'stop
line 1\rline 2  (on Unix, "line 1" will get overwritten)
line 3\\nline 4 (should result in "line 3\\nline 4")
line 5\r\nline 6

That handles most Perl escapes. It does not handle:

  1. Anchored type (\Q, \U, \L ended by \E)

  2. Quoted forms, such as 'don't \n escape in single quotes' or [not \n in here]

  3. Named Unicode characters, such as \N{THAI CHARACTER SO SO}

  4. Control characters like \cD (that is easily added...)

But that was not part of your question as I understood it...
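
As a rough sketch of how the control-character case from item 4 might be added, the following extends the first, single-character regex with one more alternative. The helper name unescape_with_control is made up for illustration, and only letters are accepted after \c here, which is an assumption rather than the full set Perl allows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: interpret the single-character escapes plus
# control escapes of the form backslash, c, letter.
sub unescape_with_control {
    my ($text) = @_;
    $text =~ s/\\(
        (?:[rnt'"\\]) |    # single char escapes, as in the first regex
        (?:c[A-Za-z])      # control characters, e.g. ctrl-D
        )/"qq|\\$1|"/geex;
    return $text;
}

my $out = unescape_with_control('end of input: \cD');
printf "last character code: %d\n", ord(substr($out, -1));   # prints 4
```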

drewk
That first replace worked great, thanks!
Neo