#!/usr/bin/perl
use strict; use warnings;
use Socket qw( :crlf );
my $text = "a${CR}b${CRLF}c${LF}";
$text =~ s/$LF|$CR$LF?/<br>/g;
print $text;
Following up on @daxim's comment, here is the modified version:
#!/usr/bin/perl
use strict; use warnings;
use charnames ':full';
my $text = "a\N{CR}b\N{CR}\N{LF}c\N{LF}";
$text =~ s/\N{LF}|\N{CR}\N{LF}?/<br>/g;
print $text;
Following up on @Marcus's comment, here is a contrived example:
#!/usr/bin/perl
use strict; use warnings;
use charnames ':full';
my $t = (my $s = "a\012\015\012b\012\012\015\015c");
$s =~ s/\r?\n/<br>/g;
$t =~ s/\N{LF}|\N{CR}\N{LF}?/<br>/g;
print "This is \$s: $s\nThis is \$t:$t\n";
The input is a mishmash of carriage returns and line feeds (which, at some point in the past, I did encounter).
Here is the output of the script on Windows using ActiveState Perl:
C:\Temp> t | xxd
0000000: 5468 6973 2069 7320 2473 3a20 613c 6272 This is $s: a<br
0000010: 3e3c 6272 3e62 3c62 723e 3c62 723e 0d0d ><br>b<br><br>..
0000020: 630d 0a54 6869 7320 6973 2024 743a 613c c..This is $t:a<
0000030: 6272 3e3c 6272 3e62 3c62 723e 3c62 723e br><br>b<br><br>
0000040: 3c62 723e 3c62 723e 630d 0a <br><br>c..
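or, as text in the console (the two leftover carriage returns in $s move the cursor back to the start of the line, so the c overwrites the T):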
chis is $s: a<br><br>b<br><br>
This is $t:a<br><br>b<br><br><br><br>c
Admittedly, you are not likely to end up with this input. However, if you want to cater for any unexpected combination of characters that might indicate a line ending, you might want to use:
$s =~ s/\N{LF}|\N{CR}\N{LF}?/<br>/g;
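As a minimal sketch, the same substitution can be wrapped in a small helper. The name nl2br is hypothetical (it is not part of any module), and the octal escapes assume an ASCII-based platform, where \012 is LF and \015 is CR regardless of what "\n" means locally:

```perl
#!/usr/bin/perl
use strict; use warnings;

# nl2br is a hypothetical helper name used only for illustration.
# Octal escapes pin the exact bytes (assumes an ASCII-based platform).
sub nl2br {
    my ($text) = @_;
    # Match a lone LF, a CRLF pair, or a lone CR, in that order of preference.
    $text =~ s/\012|\015\012?/<br>/g;
    return $text;
}

# Same contrived input as in the example above.
print nl2br("a\012\015\012b\012\012\015\015c"), "\n";
# prints: a<br><br>b<br><br><br><br>c
```

Because the CR alternative takes an optional LF with it, a CRLF pair becomes a single <br> rather than two.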
Also, for reference, CGI.pm canonicalizes line-endings this way:
# Define the CRLF sequence. I can't use a simple "\r\n" because the meaning
# of "\n" is different on different OS's (sometimes it generates CRLF, sometimes LF
# and sometimes CR). The most popular VMS web server
# doesn't accept CRLF -- instead it wants a LR. EBCDIC machines don't
# use ASCII, so \015\012 means something different. I find this all
# really annoying.
$EBCDIC = "\t" ne "\011";
if ($OS eq 'VMS') {
  $CRLF = "\n";
} elsif ($EBCDIC) {
  $CRLF= "\r\n";
} else {
  $CRLF = "\015\012";
}