I have some data from a Unix command-line call:

1  ab  45  1234
2  abc 5
4  yy  999 2
3  987 11

I'll use the system() function for the call.

How can I extract the second column of data into an array in Perl? Also, the array size has to be dependent on the number of rows that I have (it will not necessarily be 4).

I want the array to have ("ab", "abc", "yy", 987).

+7  A: 
use strict;
use warnings;

my $data = "1  ab  45  1234
2  abc 5
2  abc 5
2  abc 5
4  yy  999 2
3  987 11";

my @second_col = map { (split)[1] } split /\n/, $data;

To get unique values, see perlfaq4. Here's part of the answer provided there:

my %seen;
my @unique = grep { ! $seen{ $_ }++ } @second_col;
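Note that `system()` returns the command's exit status, not its output, so the output has to be captured another way. Here is a minimal sketch that uses backticks instead and then applies the same `split`/`map` and `%seen` steps. The `printf` command is only a stand-in (an assumption) for the real command-line call, which the question does not name.

```perl
use strict;
use warnings;

# Stand-in for the real command (hypothetical); printf just emits sample rows.
my $cmd = q{printf '1  ab  45  1234\n2  abc 5\n2  abc 5\n4  yy  999 2\n3  987 11\n'};

# In list context, backticks return one element per line of output.
my @lines = `$cmd`;
die "command failed: $?" if $? != 0;

# Second whitespace-separated field of each line; the array grows with the row count.
my @second_col = map { (split)[1] } @lines;

# Keep only the first occurrence of each value, preserving the original order.
my %seen;
my @unique = grep { !$seen{$_}++ } @second_col;

print "@unique\n";
```

Because the array is built by `map` over whatever lines come back, its size follows the number of rows automatically.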
FM
@FM: what does `my` do?
Lazer
@Lazer It declares the variable within the current lexical scope. Your asking this question suggests that you are not enabling `use strict` and possibly `use warnings` in your Perl scripts. If not, you should start doing so.
FM
@FM: thanks! while this solves my immediate problem, is there a simple way to get only unique results in `second_col`?
Lazer
There are multiple ways of removing duplicates from the array, and perlfaq explains it very nicely. See http://perldoc.perl.org/perlfaq4.html#How-can-I-remove-duplicate-elements-from-a-list-or-array?
gamen
@FM, +1, and when reading the scalar like a file, this can be compressed into: `open my $sh, '<', \$data;` `my @second_col = grep !$_{$_}++, map +(split)[1], <$sh>;` Regards, rbo
rubber boots
+3  A: 

You can chain a Perl command-line call (a.k.a. a one-liner) to your Unix command:

 perl -lane 'print $F[1]' data.dat

Instead of data.dat, you can use a pipe from your command-line tool:

 cat data.dat | perl -lane 'print $F[1]'

Addendum:

Extending this to print only the unique values of the column is straightforward:

cat data.dat | perl -lane 'print $F[1] unless $seen{$F[1]}++'

or, if you are lazy (employing %_):

cat data.dat | perl -lane 'print unless $_{$_=$F[1]}++'
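For readers unfamiliar with the switches, here is a rough sketch of what `-lane` amounts to when written as a plain script: `-n` wraps the code in a read loop, `-l` chomps each line and makes `print` append a newline, and `-a` splits each line into `@F`. It reads from an in-memory copy of the sample data (an assumption, so that it runs stand-alone) rather than from `data.dat` or a pipe, and the `@out` array is only there for illustration.

```perl
use strict;
use warnings;

# In-memory stand-in for data.dat (assumption, so the sketch runs on its own).
my $data = "1  ab  45  1234\n2  abc 5\n2  abc 5\n4  yy  999 2\n3  987 11\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";

my (%seen, @out);
{
    local $\ = "\n";                 # -l: print appends a newline
    while (my $line = <$in>) {       # -n: loop over the input lines
        chomp $line;                 # -l: strip the trailing newline
        my @F = split ' ', $line;    # -a: autosplit on whitespace into @F
        next if $seen{ $F[1] }++;    # skip values that were already printed
        push @out, $F[1];            # illustrative: also collect into an array
        print $F[1];
    }
}
close $in;
```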

Regards

rbo

rubber boots
+1 For reminding me about the `-a` option.
FM
Good answer, though it would have been nice to state explicitly that the `-a` option autosplits into `@F`. Not sure what `-l` does though...
PP
@PP, `-l` takes care of the *newline* handling, see http://sial.org/howto/perl/one-liner/. OK, I added a link for explanation (thanks).
rubber boots