ansaurus

Question

How do I convert various user-inputted line break characters to <br> using Perl?

Answer 1

+10 A:

There's nothing wrong with using regexes here:

s/\r?\n/<br>/g;

Ether 2010-06-18 21:15:04

Answer 2

+1 A:

Looking into this, I found the following modules:

http://search.cpan.org/~rubykat/txt2html-2.51/lib/HTML/TextToHTML.pm / http://search.cpan.org/~rubykat/txt2html-2.51/scripts/txt2html

http://search.cpan.org/~cwest/HTML-FromText-2.05/lib/HTML/FromText.pm

Both have reviews that you can check on CPAN.

However, I can't say much about them, or about how reliably they handle the different newline conventions.

Prix 2010-06-18 21:16:37

Answer 3

+3 A:

Actually, if you're having to deal with Mac users, or if there still happens to be some weird computer that uses form-feeds, you would probably have to use something like this:

$input =~ s/(\r\n|\n|\r|\f)/<br>/g;
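
As an aside, on Perl 5.10 or later the generic line break escape \R covers the same cases as the alternation above (CRLF as a single unit, plus a lone LF, CR, or form feed, along with a few other vertical whitespace characters). A minimal sketch, assuming a 5.10+ interpreter; the sample string is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # \R (generic line break) requires Perl 5.10 or later

# Hypothetical sample input mixing CRLF, bare LF, bare CR and a form feed.
my $input = "one\r\ntwo\nthree\rfour\ffive";

# \R treats \r\n as a single break, so CRLF yields one <br>, not two.
(my $html = $input) =~ s/\R/<br>/g;

print "$html\n";    # one<br>two<br>three<br>four<br>five
```

Note that \R also matches vertical tab and the Unicode line and paragraph separators, so it is slightly broader than the explicit alternation.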

Bushman 2010-06-18 21:20:30

Macs haven't used CRs for many, many years. Nowadays it's Windows vs. the rest of the world.

Ether 2010-06-18 21:27:10

@Ether: All internet text protocols use the same \r\n system as Windows, so it's actually more a case of "Windows and Internet protocols versus Unix".

Kinopiko 2010-06-19 00:52:56

Answer 4

+3 A:

#!/usr/bin/perl

use strict; use warnings;

use Socket qw( :crlf );

my $text = "a${CR}b${CRLF}c${LF}";

$text =~ s/$LF|$CR$LF?/<br>/g;

print $text;

Following up on @daxim's comment, here is the modified version:

#!/usr/bin/perl

use strict; use warnings;
use charnames ':full';

my $text = "a\N{CR}b\N{CR}\N{LF}c\N{LF}";

$text =~ s/\N{LF}|\N{CR}\N{LF}?/<br>/g;

print $text;

Following up on @Marcus's comment, here is a contrived example:

#!/usr/bin/perl

use strict; use warnings;
use charnames ':full';

my $t = (my $s = "a\012\015\012b\012\012\015\015c");
$s =~ s/\r?\n/<br>/g;

$t =~ s/\N{LF}|\N{CR}\N{LF}?/<br>/g;

print "This is \$s: $s\nThis is \$t:$t\n";

This is a mishmash of carriage returns and line feeds (which, at some point in the past, I did encounter).

Here is the output of the script on Windows using ActiveState Perl:

C:\Temp> t | xxd
0000000: 5468 6973 2069 7320 2473 3a20 613c 6272  This is $s: a<br
0000010: 3e3c 6272 3e62 3c62 723e 3c62 723e 0d0d  ><br>b<br><br>..
0000020: 630d 0a54 6869 7320 6973 2024 743a 613c  c..This is $t:a<
0000030: 6272 3e3c 6272 3e62 3c62 723e 3c62 723e  br><br>b<br><br>
0000040: 3c62 723e 3c62 723e 630d 0a              <br><br>c..

or, as text:

chis is $s: a<br><br>b<br><br>
This is $t:a<br><br>b<br><br><br><br>c

Admittedly, you are not likely to end up with this input. However, if you want to cater for any unexpected oddities that might indicate a line ending, you might want to use

$s =~ s/\N{LF}|\N{CR}\N{LF}?/<br>/g;
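
The same substitution can be wrapped in a small helper and written with octal escapes, which needs no charnames import and does not depend on what \r and \n mean on the local platform. This is only a sketch: the name newlines_to_br is hypothetical, and the test string is the one from the contrived example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper; the name newlines_to_br is an assumption for illustration.
sub newlines_to_br {
    my ($text) = @_;

    # \015\012 (CRLF) is tried first so it yields a single <br>;
    # a lone \015 (CR) or \012 (LF) is then matched on its own.
    $text =~ s/\015\012|\015|\012/<br>/g;

    return $text;
}

# Same mixed input as the contrived example: LF, CRLF, LF, LF, CR, CR.
print newlines_to_br("a\012\015\012b\012\012\015\015c"), "\n";
# prints a<br><br>b<br><br><br><br>c
```

Listing CRLF before the single-character alternatives is equivalent to the \N{CR}\N{LF}? form; it just makes the matching order explicit.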

Also, for reference, CGI.pm canonicalizes line-endings this way:

# Define the CRLF sequence.  I can't use a simple "\r\n" because the meaning
# of "\n" is different on different OS's (sometimes it generates CRLF, sometimes LF
# and sometimes CR).  The most popular VMS web server
# doesn't accept CRLF -- instead it wants a LR.  EBCDIC machines don't
# use ASCII, so \015\012 means something different.  I find this all 
# really annoying.
$EBCDIC = "\t" ne "\011";
if ($OS eq 'VMS') {
  $CRLF = "\n";
} elsif ($EBCDIC) {
  $CRLF= "\r\n";
} else {
  $CRLF = "\015\012";
}

Sinan Ünür 2010-06-18 21:27:26

Import from [`charnames`](http://perldoc.perl.org/charnames.html) is a better choice for named constants than `Socket`.

daxim 2010-06-18 22:21:19

I am intrigued by this solution and wonder what advantages it has over the simple regex above. Do the named constants account for a wider range of possible line break characters?

Marcus 2010-06-19 13:33:23

@Marcus The pattern itself also handles Mac OS 9 style line breaks, which consist of a lone carriage return. As for using the character codes rather than `\r` and `\n`, see the update to my post.

Sinan Ünür 2010-06-19 14:42:29

Answer 5

A:

Dave Sherohman 2010-06-19 12:04:16
