I'm trying to add " at the beginning and ", at the end of each non-empty line of a text file, using Perl.

perl -pi -e 's/^(.+)$/\"$1\",/g' something.txt

It adds " at the beginning of each non-empty line, but I have a problem with the ",.

Example input:

bla
bla bla
blah

This is the output I'm getting:

"bla
",
"bla bla
",
"blah
",

And this is the output I actually want:

"bla",
"bla bla",
"blah",

How do I fix this?

Edit: I have now opened my output file in vim (I had opened it in kwrite before, so it wasn't visible) and noticed that vim shows ^M before each ",. I don't know what in the code adds this.

A: 
sed 's/.\{1,\}/"&",/'

This was asked before http://stackoverflow.com/questions/1688952/

pixelbeat
+1 Perl was originally conceived as a more advanced sed/awk.
Byron Whitlock
...but any idea why the code in my question does what it does?
Phil
Well, pixel... it doesn't work. It deletes the first letter of each line and substitutes it with ",
Phil
Works for me with GNU sed on Linux. What version are you using?
pixelbeat
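
The "deletes the first letter" symptom reported above is consistent with a carriage return being left in each line: a terminal returns the cursor to column 0 on the CR, so the trailing ", overwrites the start of the line. A minimal sketch, assuming that is the cause; `quote_lines_sed` is a hypothetical helper name, and it filters stdin to stdout rather than editing a file:

```shell
# Hypothetical helper: quote every non-empty line read from stdin.
quote_lines_sed() {
    # tr removes every carriage return first, so sed only sees LF-terminated lines.
    # An empty line has nothing for .\{1,\} to match, so it is passed through as is.
    tr -d '\r' | sed 's/.\{1,\}/"&",/'
}
```

Usage would be something like `quote_lines_sed < something.txt > quoted.txt`. Note that this removes every CR in the input, not only those at line ends.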
+5  A: 

Looks like a line-ending problem. Did you edit the file on Windows? Try dos2unix.

If you don't want to use dos2unix, you can match the \r explicitly:

perl -pi -e 's/^(.+)\r$/\"$1\",/g' something.txt

The problem is that if the file contains carriage returns, .+ matches them too, so you get:

"bla^M",
"bla bla^M",
"blah^M",
No, I didn't. I only ran that script on it, no manual editing.
Phil
So how do I get rid of the ^M?
Phil
You can just match it: s/\r//g
+2  A: 

Your data file must have originated on Windows, which uses CRLF as a line delimiter instead of just LF. This means your text file looks like this:

bla[CR][LF]bla bla[CR][LF]blah[CR][LF]

You can verify this by using od -c something.txt.

$ od -c something.txt
0000000    b   l   a  \r  \n   b   l   a       b   l   a  \r  \n   b   l
0000020    a   h  \r  \n                                                
0000024

Unix and Linux tools split lines on LF only, so there each line appears to end in a stray CR:

bla\r
bla bla\r
blah\r

When perl makes its substitution, it results in this:

"bla\r",
"bla bla\r",
"blah\r",

And when you cat the result, you get what you see:

"bla
",
"bla bla
",
"blah
",

The easiest fix is to use dos2unix to convert the line endings to Unix format; your scripts will then behave as expected.
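
If dos2unix is not installed, the check and the conversion can be approximated with standard tools. A rough sketch; `has_cr` and `strip_cr` are hypothetical helper names, and `strip_cr` removes every CR in the input, not only those at line ends:

```shell
# Hypothetical helper: exit status 0 if the named file contains a carriage return.
has_cr() {
    cr=$(printf '\r')   # a literal CR; command substitution strips only trailing newlines
    grep -q "$cr" "$1"
}

# Hypothetical helper: copy stdin to stdout with every CR removed.
strip_cr() {
    tr -d '\r'
}
```

For example, `has_cr something.txt && strip_cr < something.txt > something.unix.txt` would write a converted copy only when the file actually contains CRs.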

Craig Trader
+1 for additional explanation :)
Phil
+1  A: 

On systems that use CRLF text files, Perl uses an IO layer to filter the CRLF so that we only see an LF in our scripts. If you open a CRLF file on a system that does not normally use CRLF, you can enable the CRLF translation yourself in a number of ways.

You can use binmode. I use the OO interface here because I think it is cleaner, YMMV:

use IO::File;

open( my $fh, '<', 'winfile.txt' ) 
    or die "Oh poo - $!\n";

$fh->binmode(':crlf');

You can also use a tweaked open:

open( my $fh, '<:crlf', 'winfile.txt' ) 
    or die "Oh poo - $!\n";

Or for your one-liner you can set the PERLIO environment variable (see PerlIO):

PERLIO=crlf perl -pi -e 's/^(.+)$/\"$1\",/g' something.txt

Of course, this approach will preserve the CRLF line endings in the processed file--which may or may not be what you want.

daotoad
No, I don't want to keep these Windows-style line endings in this case, but I voted you up because it's interesting and may be of use for something else.
Phil
A: 

Since you only want to add text at the beginning and end, you don't need a regex substitution for such a simple task.

perl -ne 'chomp;print "\"".$_."\",\n"' file
Except that it adds quotes for empty lines too.
J. A. Faucett
That should be easy to fix. Just do a check for empty lines first. I will leave it to the OP.