views: 66
answers: 3

I'm reading a text file via CGI in Perl, and I've noticed that when the file is saved in Mac's TextEdit the line separator is recognized, but when I upload a CSV exported straight from Excel, it is not. I'm guessing it's a \n vs. \r issue, but it got me thinking: I don't know how to specify what I'd like the line-terminator token to be, if I didn't want the one it looks for by default.

+4  A: 

Yes. You'll want to override the value of $/. From perlvar:

$/

The input record separator, newline by default. This influences Perl's idea of what a "line" is. Works like awk's RS variable, including treating empty lines as a terminator if set to the null string. (An empty line cannot contain any spaces or tabs.)

You may set it to a multi-character string to match a multi-character terminator, or to undef to read through the end of file.

Setting it to "\n\n" means something slightly different than setting to "", if the file contains consecutive empty lines. Setting to "" will treat two or more consecutive empty lines as a single empty line. Setting to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline. (Mnemonic: / delimits line boundaries when quoting poetry.)

local $/;           # enable "slurp" mode
local $_ = <FH>;    # whole file now here
s/\n[ \t]+/ /g;

Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)

Setting $/ to a reference to an integer, scalar containing an integer, or scalar that's convertible to an integer will attempt to read records instead of lines, with the maximum record size being the referenced integer. So this:

local $/ = \32768; # or \"32768", or \$var_containing_32768
open my $fh, "<", $myfile or die $!;
local $_ = <$fh>;

will read a record of no more than 32768 bytes from the file. If you're not reading from a record-oriented file (or your OS doesn't have record-oriented files), then you'll likely get a full chunk of data with every read. If a record is larger than the record size you've set, you'll get the record back in pieces. Trying to set the record size to zero or less will cause reading in the (rest of the) whole file.

On VMS, record reads are done with the equivalent of sysread, so it's best not to mix record and non-record reads on the same file. (This is unlikely to be a problem, because any file you'd want to read in record mode is probably unusable in line mode.) Non-VMS systems do normal I/O, so it's safe to mix record and non-record reads of a file.

See also "Newlines" in perlport. Also see $..
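Applied to the question's situation (a file whose records end in a bare \r, as classic-Mac saves and some Excel exports produce), a minimal self-contained sketch; the file name and contents here are made up for the demonstration:

```perl
use strict;
use warnings;

# Create a small sample file with bare-\r line endings so the
# example is self-contained. The name is hypothetical.
my $file = 'mac_sample.csv';
open my $out, '>', $file or die "Can't write $file: $!";
binmode $out;
print {$out} "a,b\r1,2\r3,4\r";
close $out;

my @lines;
open my $in, '<', $file or die "Can't read $file: $!";
{
    local $/ = "\r";          # carriage return is now the record separator
    while (my $line = <$in>) {
        chomp $line;          # chomp strips the current value of $/, i.e. "\r"
        push @lines, $line;
    }
}
close $in;
unlink $file;

print scalar(@lines), "\n";   # 3 records
print "$lines[0]\n";          # a,b
```

Because chomp always removes the current value of $/, the same loop body works unchanged whatever separator you localize.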

mobrule
thanks! I realize this is a dumb question, but will ask just in case... the setting is scoped only for this script, right? It's not going to affect how other scripts behave?
Dr.Dredel
@Dr.Dredel => the `local` keyword limits the scope of your change to a global variable. The scoping rules for `local` are the same as for `my`, so the localization ends at the close of the current block. It is very important to note that any localized variables will have the new value in code called from within `local`'s scope. So long as your localization is small, e.g. `my $file = do {local $/; <$fh>}`, you shouldn't have much to worry about. And all changes are of course lost when the script ends, even if you didn't use `local`.
Eric Strom
+2  A: 

The variable has multiple names:

  • $/
  • $RS
  • $INPUT_RECORD_SEPARATOR

For the longer names, you need:

use English;

Remember to localize carefully:

{
    local($/) = "\r\n";
    ...code to read...
}
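A self-contained sketch using the long name from the English module; the file name and contents are invented for the demonstration:

```perl
use strict;
use warnings;
use English qw( -no_match_vars );   # long names; -no_match_vars avoids the
                                    # $& penalty on older perls

# Hypothetical demo file with CRLF line endings.
my $file = 'crlf_demo.txt';
open my $out, '>', $file or die "Can't write $file: $!";
binmode $out;
print {$out} "x\r\ny\r\n";
close $out;

my @lines;
open my $fh, '<', $file or die "Can't read $file: $!";
{
    local $INPUT_RECORD_SEPARATOR = "\r\n";   # same variable as $/
    while (my $line = <$fh>) {
        chomp $line;                          # strips the full "\r\n"
        push @lines, $line;
    }
}
close $fh;
unlink $file;

print "@lines\n";    # x y
```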
Jonathan Leffler
+1  A: 

If you are reading in a file with CRLF line terminators, you can open it with the :crlf I/O layer (historically called a "discipline"), or set the binmode of the handle to do automatic translation.

open my $fh, '<:crlf', 'the_csv_file.csv' or die "Oh noes $!";

This will transparently convert \r\n sequences into \n sequences.

You can also apply this translation to an existing handle by doing:

binmode( $fh, ':crlf' );

The :crlf layer is the default in Win32 Perl environments and works very well in practice.
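A self-contained sketch of the layer in action (the file name and contents are made up for the demo); the \r\n sequences written in raw mode arrive as plain \n when read through :crlf:

```perl
use strict;
use warnings;

# Hypothetical CSV file written with literal CRLF line endings.
my $file = 'crlf_layer_demo.csv';
open my $out, '>:raw', $file or die "Can't write $file: $!";
print {$out} "h1,h2\r\n1,2\r\n";
close $out;

# The :crlf layer translates \r\n to \n on input, on any platform.
open my $fh, '<:crlf', $file or die "Can't read $file: $!";
my @lines = <$fh>;
close $fh;
unlink $file;

chomp @lines;        # removes the plain "\n" left after translation
print scalar(@lines), " lines; first is '$lines[0]'\n";
```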

daotoad