I have this conditional in a Perl script:

if ($lnFea =~ m/^(\d+) qid\:([^\s]+).*?\#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+)$/)

and $lnFea holds a line like this:

0 qid:7968 1:0.000000 2:0.000000 3:0.000000 4:0.000000 5:0.000000 6:0.000000 7:0.000000 8:0.000000 9:0.000000 10:0.000000 11:0.000000 12:0.000000 13:0.000000 14:0.000000 15:0.000000 16:0.005175 17:0.000000 18:0.181818 19:0.000000 20:0.003106 21:0.000000 22:0.000000 23:0.000000 24:0.000000 25:0.000000 26:0.000000 27:0.000000 28:0.000000 29:0.000000 30:0.000000 31:0.000000 32:0.000000 33:0.000000 34:0.000000 35:0.000000 36:0.000000 37:0.000000 38:0.000000 39:0.000000 40:0.000000 41:0.000000 42:0.000000 43:0.055556 44:0.000000 45:0.000000 46:0.000000 #docid = GX000-00-0000000 inc = 1 prob = 0.0214125

The problem is that the condition is true on Windows but false on Linux (Fedora 11). Both systems are running the most recent Perl version. What causes this difference?

+10  A: 

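Assuming that $lnFea is read from a file, I'd wager that the file is in DOS format (CRLF line endings). That would cause the $ anchor to prevent matching on Linux due to differences in the line endings between those platforms: $ matches only at the very end of the string or just before a final newline. Perl's automagic newline transformation only works for platform-native text files, so Windows turns each CRLF into a plain newline on input while Linux leaves it alone. If the input file is in DOS format, the Linux box would see an extra carriage return between the last field and the end-of-line, and ([^\s]+)$ can no longer match there.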
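The effect is easy to reproduce without any file at all. The following is a minimal sketch, assuming a shortened sample record (fewer features than the real line) and the variable names $re, $record, $unix_line and $dos_line, none of which come from the original script:

```perl
use strict;
use warnings;

# Same pattern as in the question, compiled once.
my $re = qr/^(\d+) qid\:([^\s]+).*?\#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+)$/;

# Shortened, hypothetical sample record in the same layout as the real data.
my $record = '0 qid:7968 1:0.000000 2:0.000000 #docid = GX000-00-0000000 inc = 1 prob = 0.0214125';

my $unix_line = "$record\n";      # LF only: what Linux sees for a native file
my $dos_line  = "$record\r\n";    # CRLF: what Linux sees for a DOS-format file

# $ may match just before a final "\n", but not before "\r\n".
my $unix_ok = ($unix_line =~ $re) ? 1 : 0;
my $dos_ok  = ($dos_line  =~ $re) ? 1 : 0;

print "LF line matches:   $unix_ok\n";
print "CRLF line matches: $dos_ok\n";
```

The LF line should match and the CRLF line should not, which mirrors the Windows versus Linux difference described in the question.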

It's probably best to convert the input file to the native format for each platform. If that's not possible you should binmode the filehandle (preventing Perl from performing newline transformations) before reading from it and account for the various newline sequences in the regex and anywhere else the data is used.
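If converting the file is not an option, one way to account for the various newline sequences is to strip them before matching. This is only a sketch: strip_eol is a hypothetical helper, the sample record is shortened, and the pattern is the question's regex with the unnecessary escapes dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing LF, CRLF, or lone CR from a line.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z|\r\z//;
    return $line;
}

# The question's pattern, with \S in place of [^\s] and no escaped ':' or '#'.
my $re = qr/^(\d+) qid:(\S+).*?#docid = (\S+) inc = (\S+) prob = (\S+)$/;

# Shortened, hypothetical sample record.
my $record = '0 qid:7968 1:0.000000 #docid = GX000-00-0000000 inc = 1 prob = 0.0214125';

my @fields;
for my $raw ("$record\n", "$record\r\n") {
    my $line = strip_eol($raw);            # line ending no longer matters
    if (my @m = $line =~ $re) {
        push @fields, \@m;                 # label, qid, docid, inc, prob
        print "docid=$m[2] prob=$m[4]\n";
    }
}
```

Both the LF and the CRLF variant of the line should now match. Ending the pattern in \r?$ instead would be a smaller change, at the cost of repeating it in every regex that touches the data.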

Michael Carman
+1 I concur. The OP should either convert the line ending format or include an optional CRLF sequence in the regex.
Adam Bellaire
+1 I was going to say the same thing. If you remove the $ at the end of your regex, it might work if this is the case.
Mark Synowiec
Alternatively: s/\r//g and chomp() to remove any EOL characters and accept arbitrary mixed line endings. But Mark is probably right that the $ adds no value to the regex and could be eliminated. Accepting a superset syntax is not in general a bug, and regexes make poor validity parsers. Finally: no need to escape your ':' and '#' characters in that regex.
Andy Ross
Thanks all, I removed the $ at the end and it worked. Thanks a lot.
vitorcoliveira