I am just a beginner in Perl and need some help filtering columns with a Perl script. I have a file with about 10 comma-separated columns; I need to keep 5 of them and get rid of all the others. How can I achieve this?

Thanks a lot for anybody's assistance.

cheers, Neel

+5  A: 

Use split to pull the line apart, then output the fields you want (say, every second column). Create the following xx.pl file:

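use strict;
use warnings;

while (<STDIN>) {
    chomp;
    # Naive split: assumes no field contains a comma.
    my @fields = split /,/, $_;
    # Keep every second column (indexes 1, 3, 5, 7, 9).
    print join(',', @fields[1, 3, 5, 7, 9]), "\n";
}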

then execute:

$ echo 1,2,3,4,5,6,7,8,9,10 | perl xx.pl
2,4,6,8,10
paxdiablo
Do not just use split unless you are confident the values don't contain any commas
Cebjyre
I think the definition of "CSV" precludes that
Sparr
Some variants of CSV allow use of commas inside of quotes.
Brad Gilbert
I like this solution best for a beginner. Bring in modules later.
slim
A: 

This answers a much larger question, but seems like a good relevant bit of information.

The unix cut command can do what you want (and a whole lot more). It has been reimplemented in Perl.

Sparr
Not really; cut cannot manage the mysteries of CSV with commas inside strings and so on.
Jonathan Leffler
I would probably not call those CSV files, then.
Sparr
Plenty of other things call them CSV files, and in the presence of a good solution, we don't need to resort to less than good solutions :)
brian d foy
+5  A: 

CSV is an ill-defined, complex format (weird issues with quoting, commas, and spaces). Look for a library that can handle the nuances for you and also give you conveniences like indexing by column names.

Of course, if you're just looking to split a text file by commas, look no further than @Pax's solution.

jleedev
+3  A: 

If you are talking about CSV files on Windows (e.g., generated from Excel), you will need to take care of fields that contain commas themselves but are enclosed in quotation marks.

In this case, a simple split won't work.
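To make the problem concrete, here is a minimal sketch with a made-up record (the sample line is an assumption, not taken from the question's data). The quoted field holds a comma, so split cuts it in two:

```perl
use strict;
use warnings;

# One CSV record whose second field contains a comma inside quotes.
my $line = q{1,"Smith, John",42};

my @fields = split /,/, $line;

# A CSV-aware parser would report 3 fields; split reports 4,
# because it also breaks on the comma inside the quotes.
print scalar(@fields), "\n";
print "[$_]\n" for @fields;
```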

PolyThinker
+16  A: 

Have a look at Text::CSV (or Text::CSV_XS) to parse CSV files in Perl. It's available on CPAN or you can probably get it through your package manager if you're using Linux or another Unix-like OS. In Ubuntu the package is called libtext-csv-perl.

It can handle cases like fields that are quoted because they contain a comma, something that a simple split command can't handle.
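As a rough sketch of how that might look (the helper name keep_columns and the sample line are made up for illustration), you can parse each line, take an array slice of the fields you want, and let the module re-quote the output:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new()
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

# Hypothetical helper: return one CSV line reduced to the given
# zero-based column indexes.
sub keep_columns {
    my ($parser, $line, @keep) = @_;
    $parser->parse($line) or die "Cannot parse line: $line\n";
    my @fields = $parser->fields;
    $parser->combine(@fields[@keep]) or die "Cannot combine fields\n";
    return $parser->string;
}

# The quoted comma survives as part of a single field.
print keep_columns($csv, q{1,"Smith, John",3,4,5}, 0, 1, 4), "\n";
```

For a real file you would call the helper once per line, for example inside a while (<STDIN>) loop after chomp, with the five indexes you want to keep.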

Paul Tomblin
I'm not exactly sure what Text::CSV_XS is, or how it differs from Text::CSV, but when I installed libtext-csv-perl, I evidently got both.
Paul Tomblin
http://perldoc?Text::CSV_XS
Brad Gilbert
_XS means external subroutine, meaning that it's written in another language (usually C) and runs faster. In the case of Text::CSV and Text::CSV_XS, the author of this module was kind enough to provide a Perl-only implementation (Text::CSV) and a faster C implementation (Text::CSV_XS).
Tom Feiner
+2  A: 

Alternatively, you could use Text::ParseWords, which is in the standard library. Add

use Text::ParseWords;

to the top of Pax's example above, and then substitute

  my @fields = parse_line(q{,}, 0, $_);

for the split.
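A small standalone sketch of what parse_line does with a quoted field (the sample line is made up). Note that parse_line also treats single quotes and backslashes specially, so it is not a full CSV parser:

```perl
use strict;
use warnings;
use Text::ParseWords;

my $line = q{1,"Smith, John",3,4,5};

# The second argument (0) tells parse_line not to keep the quote
# characters in the returned fields.
my @fields = parse_line(q{,}, 0, $line);

# Five fields come back; the second one is 'Smith, John'.
print join('|', @fields), "\n";
```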

oylenshpeegul
A: 

Thanks a lot for everybody's help

Neel
You can thank us more by accepting (clicking the checkmark beside) the answer which you ended up using - I'm guessing probably Pax's?
Paul Tomblin
Don't post an 'answer' unless it answers the question.
Brad Gilbert
A: 

In addition to what people here said about processing comma-separated files, I'd like to note that one can extract the even (or odd) array elements using an array slice and/or map:

@myarray[map { $_ * 2 } (0 .. 4)]
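For example, a short sketch with a ten-element array (the variable names are only for illustration):

```perl
use strict;
use warnings;

my @myarray = (1 .. 10);

# Elements at indexes 0, 2, 4, 6, 8.
my @at_even = @myarray[ map { $_ * 2 } (0 .. 4) ];

# Elements at indexes 1, 3, 5, 7, 9.
my @at_odd = @myarray[ map { $_ * 2 + 1 } (0 .. 4) ];

print join(',', @at_even), "\n";   # 1,3,5,7,9
print join(',', @at_odd),  "\n";   # 2,4,6,8,10
```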

Hope it helps.

Shlomi Fish
A: 

My personal favorite way to do CSV is using the AnyData module. It seems to make things pretty simple, and removing a named column can be done rather easily. Take a look on CPAN.

Jack M.
A: 

I went looking and didn't find a nice CSV-compliant filter program that's flexible enough to be useful for more than just a one-off, so I wrote one. Enjoy.

Basic usage is:

bash$ csvfilter [-r <columnTitle>]* [-quote] <csv.file>

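#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

use Text::CSV;

my $always_quote = 0;

my @remove;    # column titles to drop, one per --remove option
if ( ! GetOptions('remove:s'     => \@remove,
                  'quote-always' => sub { $always_quote = 1; }) ) {
    die "$0: invalid option (use --remove <columnTitle> [--quote-always])\n";
}

my @cols2remove;    # sorted indexes of the columns to drop

# Return the fields with the unwanted columns removed.
sub filter
{
    my @fields = @_;
    # Delete from the highest index down so the lower indexes stay valid.
    for my $c (reverse @cols2remove) {
        splice(@fields, $c, 1) if $c < @fields;
    }
    return @fields;
}

# create just one of these
my $csvOut = Text::CSV->new({ always_quote => $always_quote });

sub printLine
{
    my @fields = @_;
    $csvOut->combine(filter(@fields)) or return;
    my $str = $csvOut->string();
    if ( length($str) ) {
        print "$str\n";
    }
}

my $csv = Text::CSV->new();

while (<>) {
    chomp;
    $csv->parse($_) or die "$0: cannot parse line $.\n";
    if ( $. == 1 ) {
        # Header row: translate the titles to remove into column indexes.
        my $failures = 0;
        my @cols = $csv->fields;
        for my $rm (@remove) {
            my $found = 0;
            for (my $c = 0; $c < @cols; $c++) {
                next unless $cols[$c] eq $rm;
                push(@cols2remove, $c);
                $found = 1;
            }
            if ( ! $found ) {
                warn "$0: no column titled '$rm'\n";
                $failures++;
            }
        }
        die "$0: $failures column title(s) not found\n" if $failures;
        # Sort numerically and drop duplicate indexes.
        my %seen;
        @cols2remove = sort { $a <=> $b } grep { !$seen{$_}++ } @cols2remove;
    }
    printLine($csv->fields);
}

exit(0);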
+1  A: 

You can use some of Perl's built in runtime options to do this on the command line:

$ echo "1,2,3,4,5" | perl -a -F, -n -e 'print join(q{,}, $F[0], $F[3]).qq{\n}'
1,4

The above will autosplit (-a) each line on the field separator given with -F, here a comma. It then joins the fields you are interested in and prints them back out (with a line separator). This assumes simple data without embedded commas. I was doing this with an unprintable field separator (\x1d), so this wasn't an issue for me.

See http://perldoc.perl.org/perlrun.html#Command-Switches for more details.

haytona