The following Perl statements behave identically on Unixish machines. Do they behave differently on Windows? If yes, is it because of the magic \n?

  split m/\015\012/ms, $http_msg;
  split m/\015\012/s, $http_msg;

I got a failure on one of my CPAN modules from a Win32 smoke tester. It looks like it's an \r\n vs \n issue. One change I made recently was to add //m to my regexes.

+10  A: 

For these regexes:

m/\015\012/ms
m/\015\012/s

Both /m and /s are meaningless.

  • /s: makes . match \n too. Your regex doesn't contain .
  • /m: makes ^ and $ match next to embedded \n in the string. Your regex contains neither ^ nor $, nor their synonyms.
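A quick way to check this claim is to run the same split with and without the modifiers. This is only a sketch: the sample message below is made up, and any string containing CRLF pairs should behave the same way.

```perl
use strict;
use warnings;

# Hypothetical sample message; not taken from the original question.
my $http_msg = "GET / HTTP/1.1\015\012Host: example.com\015\012\015\012body";

my @with_ms = split m/\015\012/ms, $http_msg;
my @with_s  = split m/\015\012/s,  $http_msg;
my @plain   = split /\015\012/,    $http_msg;

# Join with a separator that cannot occur in the fields, then compare.
my $same = join( "\0", @with_ms ) eq join( "\0", @with_s )
        && join( "\0", @with_s )  eq join( "\0", @plain );

print $same ? "all three splits agree\n" : "the splits differ\n";
```

Because the pattern contains only literal characters, the three calls should return the same list on any platform.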

What is indeed possible is that your input handle (a socket?) works in text mode, in which case the \r (\015) characters will already have been stripped on Windows by the time you see the data.

So, what to do? I suggest making the \015 characters optional, and split against

/\015?\012/

No need for /m, /s or even the leading m//. Those are just cargo cult.
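To illustrate why the optional \015 helps, here is a small sketch that splits the same message in two forms: with CRLF intact, and with the CR characters already removed as a text-mode handle on Windows would do. Both sample strings are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical inputs: the same two-line message, once with CRLF endings
# and once with bare LF endings.
my $raw_msg  = "HTTP/1.1 200 OK\015\012Content-Length: 0\015\012";
my $text_msg = "HTTP/1.1 200 OK\012Content-Length: 0\012";

my @from_raw  = split /\015?\012/, $raw_msg;
my @from_text = split /\015?\012/, $text_msg;

# Both lists hold the same two lines, with no stray \015 left behind.
print join( " | ", @from_raw ),  "\n";
print join( " | ", @from_text ), "\n";
```

Note that split discards trailing empty fields by default, so the final line terminator does not produce an extra empty element.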

bart
ARGH! I think you're right and I was on the wrong track with the regex modifiers. I'm using `` to get input from a subprocess and never thought to worry about binmode...
Chris Dolan
+1  A: 

Why did you add the /m? Are you trying to split into lines? To do that with /m you need to use either ^ or $ in the regex:

my @lines = split /^/m, $big_string;

However, if you want to treat a big string as lines, just open a filehandle on a reference to the scalar:

open my $string_fh, '<', \$big_string or die "Cannot open string: $!";
while ( my $line = <$string_fh> ) {
    # ... process $line here
}
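For comparison, here is a self-contained sketch of both approaches side by side. The contents of $big_string are made up for the example; the point is that split /^/m keeps each line's newline, while the filehandle loop lets you chomp as you go.

```perl
use strict;
use warnings;

my $big_string = "first\nsecond\nthird\n";    # hypothetical sample data

# split /^/m breaks before each line start, so every piece keeps its "\n".
my @pieces = split /^/m, $big_string;

# An in-memory filehandle yields the same lines one at a time.
open my $string_fh, '<', \$big_string
    or die "Cannot open in-memory handle: $!";
my @lines;
while ( my $line = <$string_fh> ) {
    chomp $line;              # drop the newline if it is not wanted
    push @lines, $line;
}
close $string_fh;

print scalar(@pieces), " pieces, ", scalar(@lines), " lines\n";
```

The filehandle version is usually the better fit when the string is large, since it avoids building the whole list of lines at once.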
brian d foy
+3  A: 

There is no magic \n. Both \n and \r always mean exactly one character, and on all ASCII-based platforms that is \cJ and \cM respectively. (The exceptions are EBCDIC platforms (for obvious reasons) and MacOS Classic (where \n and \r both mean \cM).)

The magic that happens on Windows is that when doing I/O through a file handle that is marked as being in text mode, \r\n is translated to \n upon reading and vice versa upon writing. (Also, \cZ is taken to mean end-of-file – surprise!) This is done at the C runtime library layer.

You need to binmode your socket to fix that.
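The effect can be reproduced on any platform by choosing the I/O layer explicitly. The sketch below is an illustration under stated assumptions, not the asker's code: it uses a temporary file rather than a socket, and the helper name slurp_with is made up. The :crlf layer stands in for a Windows text-mode handle, and :raw for a handle after binmode.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a CRLF-terminated sample in binary mode so the bytes reach disk unchanged.
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} "line one\015\012line two\015\012";
close $out or die "close failed: $!";

# Hypothetical helper: read the whole file through the given PerlIO layer.
sub slurp_with {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "Cannot open $path: $!";
    local $/;                 # slurp mode
    my $data = <$in>;
    close $in;
    return $data;
}

my $raw  = slurp_with(':raw');     # bytes as stored, CRLF intact
my $text = slurp_with(':crlf');    # CRLF translated to \n on input

printf "raw: %d bytes, text: %d bytes\n", length($raw), length($text);
```

For output captured from a subprocess, the analogous step would be a piped open whose handle is passed to binmode before the first read, since backticks give you no handle to adjust.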

You should also remove the /s and /m modifiers from your pattern: since you do not use the meta-characters whose behaviour they modify (. and the ^/$ pair, respectively), they do nothing – cargo cult.

Aristotle Pagaltzis