ansaurus

Question

Why does chomp fail to remove newlines on Windows XP with Eclipse and Cygwin Perl?

Answer 1

+4 A:

Based on the lengths, I'd say you're getting the input string as:

test<cr><lf>

where <cr> and <lf> are ASCII codes 0x0D (decimal 13) and 0x0A (decimal 10) respectively.

When you chomp it, it removes the <lf> but leaves the <cr> there.
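
A minimal sketch of that behaviour, assuming the default record separator ($/ set to "\n") and a hard-coded string standing in for the real input; the helper name chomped_length is hypothetical and used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: length of a string after a default chomp.
sub chomped_length {
    my ($text) = @_;
    chomp $text;                # $/ defaults to "\n", so only the LF goes
    return length $text;
}

my $line = "test\r\n";          # stand-in for input that arrives with CR LF
print length($line), "\n";          # 6: t e s t CR LF
print chomped_length($line), "\n";  # 5: the CR is still there
```

The 6 and 5 match the lengths reported in the question, which is what points to a leftover <cr>.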

It's almost certainly an interaction issue between Eclipse, Cygwin and Windows, disagreeing on what the end-of-line character sequence should be. I couldn't replicate your problem with just Perl/Cygwin or Perl/Windows but this command gives similar results (in Cygwin):

echo 'test^M' | perl qq.pl | sed 's/^M/\n/g'

(qq.pl is your script and "^M" is the actual CTRL-M). Here's the output in text form:

4 6
|test| |test
|
4 5

and octal dump:

0000000 2034 0a36 747c 7365 7c74 7c20 6574 7473
          4       6  \n   |   t   e   s   t   |       |   t   e   s   t
        064 040 066 012 174 164 145 163 164 174 040 174 164 145 163 164
0000020 7c0a 340a 3520 000a
         \n   |  \n   4       5  \n  \0
        012 174 012 064 040 065 012 000
0000027

So I'd say that your input is putting on both <cr> and <lf>, and the print is translating <cr> to <lf> (or just doing the same thing for both of them).

If you need a workaround for your environment, you can replace your chomp line with:

$input =~ s/\r?\n$//;

as in:

use warnings;
use strict;
my $test = "test";
my $input = <STDIN>;
print length $test ," ",length $input,"\n";
$input =~ s/\r?\n$//;
print "|$test| |$input|\n";
print length $test," ",length $input,"\n";
if ($test eq $input) {
    print "TIME TO QUIT";
}

which works on Cygwin for the test data I used (check it for your own situation, of course). You may find you can solve it better by using tools that all agree on the line-ending sequence (e.g., Perl for Windows rather than the Cygwin one may do the trick for you).

paxdiablo 2009-10-05 06:34:20

To expand on this answer: Ubuntu, and *NIX in general, uses only `LF` as an EOL (end of line) character, while Windows uses `CRLF`. `chomp` will always remove exactly zero or one characters; it doesn't remove EOLs as such (it just happens that it does on *NIX because they use a one-character EOL).

Matthew Scharley 2009-10-05 06:39:30

@Matthew: note that `chop` removes the last character; `chomp` is the safe form of `chop` that only removes line endings.

Jonathan Leffler 2009-10-05 06:48:18

@Matthew Scharley: http://perldoc.perl.org/functions/chomp.html

Sinan Ünür 2009-10-05 11:23:06

Thanks everyone above and below!

Beauchamp 2009-10-08 03:21:05

Answer 2

+4 A:

Given that Windows XP figures in the problem, the difference must be due to CRLF (carriage return, line feed) handling. The chomp removes, it appears, the LF but not the CR; the print translates the CR into CR LF.

The Perl doc for chomp says that if you set the EOL correctly for Windows ($/ = "\r\n";), then chomp should do its stuff correctly:

$/ = "\r\n";
$test = "test\r\n";
print "<<$test>>\n";
chomp $test;
print "<<$test>>\n";

A hex dump of the output of that yields:

0x0000: 3C 3C 74 65 73 74 0D 0A 3E 3E 0A 3C 3C 74 65 73   <<test..>>.<<tes
0x0010: 74 3E 3E 0A                                       t>>.
0x0014:

I'm not sure why $/ is not set automatically - it may be Cygwin confusing things (pretending too successfully that it is running on Unix).

Jonathan Leffler 2009-10-05 06:35:35

`chomp` always removes exactly zero or one character; it doesn't remove an EOL (sadly), so yes, you're right.

Matthew Scharley 2009-10-05 06:37:05

@Matthew: try the new fragment of code; it will remove multiple characters if `$/` is set appropriately.

Jonathan Leffler 2009-10-05 06:46:07

@Matthew Scharley: http://perldoc.perl.org/functions/chomp.html

Sinan Ünür 2009-10-05 11:43:09

`chop` removes only one character. `chomp` removes the value of `$/` , which can be multiple characters.
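
A small sketch of that difference, assuming a hard-coded "test\r\n" string; the helper names chomp_with and chop_once are hypothetical and exist only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: chomp using a caller-supplied record separator.
sub chomp_with {
    my ($text, $sep) = @_;
    local $/ = $sep;            # the change to $/ is limited to this sub
    chomp $text;
    return $text;
}

# Hypothetical helper: chop drops exactly one trailing character.
sub chop_once {
    my ($text) = @_;
    chop $text;
    return $text;
}

my $line = "test\r\n";
printf "chomp, \$/ = LF:   %d\n", length(chomp_with($line, "\n"));    # 5
printf "chomp, \$/ = CRLF: %d\n", length(chomp_with($line, "\r\n"));  # 4
printf "chop:             %d\n", length(chop_once($line));           # 5
```

Using `local` keeps the changed `$/` from affecting reads elsewhere in the program.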

Brad Gilbert 2009-10-06 03:44:05

Answer 3

+3 A:

Here is how to remove a trailing \r\n or \n (whichever is at the end):

$input =~ s@\r?\n\Z(?!\n)@@;

Another option is to do a

binmode(STDIN, ':crlf')

before reading anything from STDIN. This would convert trailing \r\n to just a \n, which you can remove using chomp. This will also work even if your input contains only \n. See the documentation about PerlIO for more.
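
As a rough sketch of the binmode approach, the following uses an in-memory filehandle as a stand-in for STDIN (an assumption made so the example is self-contained); the helper name read_lines_crlf is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: read every line of a string through the :crlf
# layer and return the lines chomped. The in-memory handle stands in
# for STDIN here.
sub read_lines_crlf {
    my ($data) = @_;
    open(my $fh, '<', \$data) or die "Cannot open in-memory handle: $!";
    binmode($fh, ':crlf');      # translate CR LF to LF while reading
    my @lines = <$fh>;
    close $fh;
    chomp @lines;               # the default $/ ("\n") is now sufficient
    return @lines;
}

# One line ends in CR LF, the other in a bare LF.
my @lines = read_lines_crlf("test\r\nplain\n");
print "|$_|\n" for @lines;
```

With real input the same idea would be binmode(STDIN, ':crlf') before the first read, followed by an ordinary chomp.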

pts 2009-10-05 07:18:37
