extending on your answer
use File::Slurp ();
my $value = File::Slurp::slurp($filename);
$value =~ s/\R*//g;
File::Slurp abstracts away the File IO stuff and just returns a string for you.
NOTE
Important to note the addition of /g
, without it, given a multi-line string, it will only replace the first offending character.
Also, the removal of $
, which is redundant for this purpose, as we want to strip all line breaks, not just line-breaks before whatever is meant by $
on this OS.
In a multi-line string, $
matches the end of the string and that would be problematic ).
Point 3 means that point 2 is made with the assumption that you'd also want to use /m
otherwise '$' would be basically meaningless for anything practical in a string with >1 lines, or, doing single line processing, an OS which actually understands $
and manages to find the \R*
that proceed the $
Examples
while( my $line = <$foo> ){
$line =~ $regex;
}
Given the above notation, an OS which does not understand whatever your files '\n' or '\r' delimiters, in the default scenario with the OS's default delimiter set for $/
will result in reading your whole file as one contiguous string ( unless your string has the $OS's delimiters in it, where it will delimit by that )
So in this case all of these regex are useless:
/\R*$//
: Will only erase the last sequence of \R
in the file
/\R*//
: Will only erase the first sequence of \R
in the file
/\012?\015?//
: When will only erase the first 012\015
, \012
, or \015
sequence, \015\012
will result in either \012
or \015
being emitted.
/\R*$//
: If there happens to be no byte sequences of '\015$OSDELIMITER' in the file, then then NO linebreaks will be removed except for the OS's own ones.
It would appear nobody gets what I'm talking about, so here is example code, that is tested to NOT remove line feeds. Run it, you'll see that it leaves the linefeeds in.
#!/usr/bin/perl
use strict;
use warnings;
my $fn = 'TestFile.txt';
my $LF = "\012";
my $CR = "\015";
my $UnixNL = $LF;
my $DOSNL = $CR . $LF;
my $MacNL = $CR;
sub generate {
my $filename = shift;
my $lineDelimiter = shift;
open my $fh, '>', $filename;
for ( 0 .. 10 )
{
print $fh "{0}";
print $fh join "", map { chr( int( rand(26) + 60 ) ) } 0 .. 20;
print $fh "{1}";
print $fh $lineDelimiter->();
print $fh "{2}";
}
close $fh;
}
sub parse {
my $filename = shift;
my $osDelimiter = shift;
my $message = shift;
print "Parsing $message File $filename : \n";
local $/ = $osDelimiter;
open my $fh, '<', $filename;
while ( my $line = <$fh> )
{
$line =~ s/\R*$//;
print ">|" . $line . "|<";
}
print "Done.\n\n";
}
my @all = ( $DOSNL,$MacNL,$UnixNL);
generate 'Windows.txt' , sub { $DOSNL };
generate 'Mac.txt' , sub { $MacNL };
generate 'Unix.txt', sub { $UnixNL };
generate 'Mixed.txt', sub {
return @all[ int(rand(2)) ];
};
for my $os ( ["$MacNL", "On Mac"], ["$DOSNL", "On Windows"], ["$UnixNL", "On Unix"]){
for ( qw( Windows Mac Unix Mixed ) ){
parse $_ . ".txt", @{ $os };
}
}
For the CLEARLY Unprocessed output, see here: http://pastebin.com/f2c063d74
Note there are certain combinations that of course work, but they are likely the ones you yourself naívely tested.
Note that in this output, all results must be of the form >|$string|<>|$string|<
with NO LINE FEEDS to be considered valid output.
and $string
is of the general form {0}$data{1}$delimiter{2}
where in all output sources, there should be either :
- Nothing between
{1}
and {2}
- only
|<>|
between {1}
and {2}