For Perl single-character backslash escapes, you can do this safely using a two-character eval as part of the substitution. Put the characters that are acceptable to interpret in the character class after the \, and then the single character that follows is eval'd and inserted into the string.
Consider:
#!/usr/bin/perl
use warnings;
use strict;
print "\n\n\n\n";
while (my $data = <DATA>) {
    $data =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee;
    print $data;
}
__DATA__
Hello!\nI\'d like to tell you a little \"secret\".
A backslask:\\
Tab'\t'stop
line 1\rline 2 (on Unix, "line 1" will get overwritten)
line 3\\nline 4 (should result in "line 3\\nline 4")
line 5\r\nline 6
Output:
Hello!
I'd like to tell you a little "secret".
A backslask:\
Tab' 'stop
line 2 (on Unix, "line 1" will get overwritten)
line 3\nline 4 (should result in "line 3\nline 4")
line 5
line 6
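The line s/\\([rnt'"\\])/"qq|\\$1|"/gee does the work.
The \\([rnt'"\\]) part matches a backslash followed by one of the acceptable characters, which are listed inside the square brackets and captured as $1.
The gee part is the g flag (replace every match) plus ee, which does a double eval on the replacement string.
The "qq|\\$1|" part is what gets eval'd twice. The first eval interpolates $1 and produces the text qq|\n| (for a captured n); the second eval runs that text as Perl, so qq performs the escape interpolation and returns the actual character.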
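To make the two passes concrete, here is a minimal sketch that performs them by hand for a single captured character. The helper name two_stage_unescape is hypothetical (it is not part of the code above), and it assumes it is only ever called with a character from the whitelist.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: does by hand what the two evals of /ee do.
# Assumption: $char is always one of the whitelisted characters.
sub two_stage_unescape {
    my ($char) = @_;              # stands in for the captured $1
    my $code  = "qq|\\$char|";    # pass 1: build Perl source text, e.g. qq|\n|
    my $value = eval $code;       # pass 2: run that source to get the real character
    return ($code, $value);
}

my ($code, $value) = two_stage_unescape('n');
printf "source text is %d characters long; result is a newline: %s\n",
    length($code), ($value eq "\n" ? "yes" : "no");
```

For 'n' the intermediate text is the six characters qq|\n|, and evaluating it yields a single newline.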
Since only a character from that class can ever reach the eval, I cannot think of a two-character combination here that would be a security breach...
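If you would rather not eval at all, the same single-character escapes can be handled with a lookup table. This is a different technique from the /ee approach above, shown only as a sketch; the names %ESCAPE and unescape_with_table are made up for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical lookup table: escape letter => the character it stands for.
my %ESCAPE = (
    r    => "\r",
    n    => "\n",
    t    => "\t",
    "'"  => "'",
    '"'  => '"',
    '\\' => '\\',
);

# Same character class as the /ee version, but the replacement is a plain
# hash lookup, so no string is ever evaluated as code.
sub unescape_with_table {
    my ($text) = @_;
    $text =~ s/\\([rnt'"\\])/$ESCAPE{$1}/g;
    return $text;
}

print unescape_with_table('Tab\'\t\'stop and a \"quote\"\n');
```

The trade-off is that every escape has to be listed explicitly, whereas the eval version lets Perl work out the value.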
This method does not deal with the following properly:
Quoted strings. For example, Perl would not unescape the string 'line 1\nline 2' because of the single quotes.
Escape sequences that are longer than a single character, such as hex \x1b, Unicode such as \N{U+...}, or control sequences such as \cD
Anchored escapes, such as \LMAKE LOWER CASE\E or \Umake upper case\E
If you want more complete escape replacement, you can use this regex:
#!/usr/bin/perl
use warnings;
use strict;
print "\n\n\n\n";
binmode STDOUT, ":utf8";
while (my $data = <DATA>) {
    $data =~ s/\\(
        (?:[arnt'"\\])                |  # Single char escapes
        (?:[ul].)                     |  # uc or lc next char
        (?:x[0-9a-fA-F]{2})           |  # 2 digit hex escape
        (?:x\{[0-9a-fA-F]+\})         |  # more than 2 digit hex
        (?:[0-7]{2,3})                |  # octal, digits 0-7 only
        (?:N\{U\+[0-9a-fA-F]{2,4}\})     # unicode by hex
    )/"qq|\\$1|"/geex;
    print $data;
}
__DATA__
Hello!\nI\'d like to tell you a little \"secret\".
Here is octal: \120
Here is UNICODE: \N{U+0041} and \N{U+41} and \N{U+263D}
Here is a little hex:\x50 \x5fa \x{5fa} \x{263B}
lower case next char \lU \lA
upper case next char \ua \uu
A backslask:\\
Tab'\t'stop
line 1\rline 2 (on Unix, "line 1" will get overwritten)
line 3\\nline 4 (should result in "line 3\\nline 4")
line 5\r\nline 6
That handles most Perl escapes. The exceptions are:
Anchored type (\Q, \U, \L ended by \E)
Quoted forms, such as 'don't \n escape in single quotes' or [not \n in here]
Named Unicode characters, such as \N{THAI CHARACTER SO SO}
Control characters like \cD (that is easily added...)
But that was not part of your question as I understood it...
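For what it is worth, here is a rough sketch of how control characters could be added. It keeps only the single-character alternative plus a new one for \c followed by a letter; the function name unescape_with_control is hypothetical, and restricting \c to ASCII letters is my assumption, not something the code above requires.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical variant of the substitution with one extra alternative.
sub unescape_with_control {
    my ($text) = @_;
    $text =~ s/\\(
        (?:[arnt'"\\])  |  # single char escapes, as before
        (?:c[A-Za-z])      # control character: c plus one ASCII letter
    )/"qq|\\$1|"/geex;
    return $text;
}

my $out = unescape_with_control('EOT is \cD here');
print $out eq "EOT is \x04 here" ? "ok\n" : "not ok\n";
```

The rest of the alternatives from the larger regex can be put back alongside the new one unchanged.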