I'm running WinXP, Eclipse 3.2 with EPIC, and Cygwin for my Perl interpreter, and I'm getting an unexpected result.

FYI... when I run it on my Ubuntu distro (vmware, same pc) I get the expected results. What gives?

############ CODE: #############

use warnings ; 
use strict ; 

my $test = "test" ; 
my $input = <STDIN> ;

print length $test , " " , length $input , "\n"  ;

chomp $input ; 

print "|$test| |$input| \n";    #the bars indicate white space, new line etc...

print length $test ," " , length $input , "\n"  ; 

if ($test eq $input) {
    print "TIME TO QUIT" ; 
}

Results on XP:

test           <-- my input
4 6            <-- lengths printed before chomp
|test| |test   <-- print the variables after chomp
|              <-- there is still a new line there
4 5            <-- lengths after the initial chomp
+4  A: 

Based on the lengths, I'd say you're getting the input string as:

test<cr><lf>

where <cr> and <lf> are ASCII codes 0x0D and 0x0A (decimal 13 and 10) respectively.

When you chomp it, it removes the <lf> but leaves the <cr> there.
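A minimal sketch of that behaviour, assuming the line really arrives as "test" followed by CR LF (the helper name hex_bytes is made up for illustration). With the default `$/` of "\n", chomp strips only the trailing LF:

```perl
use strict;
use warnings;

# Hypothetical helper: show each byte of a string as two hex digits.
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $s;
}

# Assumed input: what <STDIN> would return if the line ended in CR LF.
my $input  = "test\015\012";
my $before = hex_bytes($input);

chomp $input;    # default $/ is "\n", so only the LF is removed
my $after = hex_bytes($input);

print "before chomp: $before\n";
print "after chomp:  $after\n";
print "length now:   ", length($input), "\n";
```

The leftover 0D byte is the carriage return, which accounts for the length of 5 after the chomp.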

It's almost certainly an interaction issue between Eclipse, Cygwin and Windows, disagreeing on what the end-of-line character sequence should be. I couldn't replicate your problem with just Perl/Cygwin or Perl/Windows but this command gives similar results (in Cygwin):

echo 'test^M' | perl qq.pl | sed 's/^M/\n/g'

(qq.pl is your script and "^M" is the actual CTRL-M). Here's the output in text form:

4 6
|test| |test
|
4 5

and octal dump:

0000000 2034 0a36 747c 7365 7c74 7c20 6574 7473
          4       6  \n   |   t   e   s   t   |       |   t   e   s   t
        064 040 066 012 174 164 145 163 164 174 040 174 164 145 163 164
0000020 7c0a 340a 3520 000a
         \n   |  \n   4       5  \n  \0
        012 174 012 064 040 065 012 000
0000027

So I'd say that your input is putting on both <cr> and <lf>, and the print is translating <cr> to <lf> (or just doing the same thing for both of them).

If you need a workaround for your environment, you can replace your chomp line with:

$input =~ s/\r?\n$//;

as in:

use warnings;
use strict;
my $test = "test";
my $input = <STDIN>;
print length $test ," ",length $input,"\n";
$input =~ s/\r?\n$//;
print "|$test| |$input|\n";
print length $test," ",length $input,"\n";
if ($test eq $input) {
    print "TIME TO QUIT";
}

which works on Cygwin for the test data I used (check it for your own situation, of course). You may find a better fix is to use tools that all agree on the line-ending sequence (e.g., Perl for Windows rather than the Cygwin one may do the trick for you).

paxdiablo
To expand on this answer: Ubuntu, and *NIX in general, uses only `LF` as an EOL (end of line) character, while Windows uses `CRLF`. `chomp` will always remove exactly zero or one characters; it doesn't remove EOLs as such (it just happens that it does on *NIX because they use a one-character EOL).
Matthew Scharley
@Matthew: note that `chop` removes the last character; `chomp` is the safe form of `chop` that only removes line endings.
Jonathan Leffler
@Matthew Scharley: http://perldoc.perl.org/functions/chomp.html
Sinan Ünür
Thanks everyone above and below!
Beauchamp
+4  A: 

Given that Windows XP figures in the problem, the difference must be due to CRLF (carriage return, line feed) handling. The chomp removes, it appears, the LF but not the CR; the print translates the CR into CR LF.

The Perl doc for chomp says that if you set the EOL correctly for Windows ($/ = "\r\n";), then chomp should do its stuff correctly:

$/ = "\r\n";
$test = "test\r\n";
print "<<$test>>\n";
chomp $test;
print "<<$test>>\n";

A hex dump of the output of that yields:

0x0000: 3C 3C 74 65 73 74 0D 0A 3E 3E 0A 3C 3C 74 65 73   <<test..>>.<<tes
0x0010: 74 3E 3E 0A                                       t>>.
0x0014:

I'm not sure why $/ is not set automatically - it may be Cygwin confusing things (pretending too successfully it is running on Unix).
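One caveat worth seeing in code: once `$/` is "\r\n", chomp removes only that exact sequence, so a line ending in a bare LF is left alone. The sketch below is an illustration under that assumption, not part of the original test; the helper name chomp_with is hypothetical, and `local` keeps the change to `$/` from leaking into the rest of the program.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp a copy of $line with $/ set to $sep and
# return the result together with the number of characters removed.
sub chomp_with {
    my ($sep, $line) = @_;
    local $/ = $sep;               # restored automatically on return
    my $removed = chomp($line);    # chomp returns the count removed
    return ($line, $removed);
}

my ($crlf_line, $crlf_removed) = chomp_with("\r\n", "test\r\n");
my ($lf_line,   $lf_removed)   = chomp_with("\r\n", "test\n");

printf "CRLF input: removed %d, length %d\n", $crlf_removed, length($crlf_line);
printf "LF input:   removed %d, length %d\n", $lf_removed,   length($lf_line);
```

So setting `$/` this way suits input that is known to be CRLF-terminated; for mixed input a substitution such as `s/\r?\n$//` is more forgiving.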

Jonathan Leffler
`chomp` always removes exactly zero or one character, it doesn't remove an EOL (sadly), so yes, you're right.
Matthew Scharley
@Matthew: try the new fragment of code; it will remove multiple characters if `$/` is set appropriately.
Jonathan Leffler
@Matthew Scharley: http://perldoc.perl.org/functions/chomp.html
Sinan Ünür
`chop` removes only one character. `chomp` removes the value of `$/` , which can be multiple characters.
Brad Gilbert
+3  A: 

Here is how to remove a trailing \r\n or \n (whichever is at the end):

$input =~ s@\r?\n\Z(?!\n)@@;

Another option is to do a

binmode(STDIN, ':crlf')

before reading anything from STDIN. This would convert trailing \r\n to just a \n, which you can remove using chomp. This will also work even if your input contains only \n. See the documentation about PerlIO for more.

pts