ansaurus

Question

Extracting particular column name values using sed/awk/perl

Answer 1

+1 A:

Beta 2009-10-27 13:24:46

Answer 2

+2 A:

Here is the one-liner version:

$ perl -lpe '@x=/([bd]=[0-9])/g; $_="@x"' test.txt

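In list context, m//g returns all the matches as a list (or, when the pattern contains capturing parentheses as it does here, everything those parentheses captured).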

#!/usr/bin/perl
use strict; use warnings;

while ( <DATA> ) {
    if( my @cols = /([bd]=[0-9])/g ) {
        print "@cols";
    }
    print "\n";
}

__DATA__
a=1 b=2 c=3 d=4
a=2 b=3
a=0 c=7
a=3 b=9 c=0 d=5
a=4 d=1
c=9

Output:

C:\Temp> t.pl
b=2 d=4
b=3

b=9 d=5
d=1

Sinan Ünür 2009-10-27 13:26:33

This output is not what he asked for.

rsp 2009-10-27 13:32:47

@rsp Yeah, I somehow missed the sample output the first time around. It is fixed now.

Sinan Ünür 2009-10-27 13:33:21

Answer 3

+3 A:

perl -pe 's/[^bd]=\d+ *//g' data_file

FM 2009-10-27 13:26:41

Answer 4

+4 A:

sed 's/[^bd]=[0-9]* *//g'

John Kugelman 2009-10-27 13:27:52

Answer 5

+1 A:

In Ruby:

#!/usr/bin/env ruby
filename = ARGV[0]
fields = ARGV[1..-1]

File.open(filename) do |file|
  file.each_line do |line|
    pairs = line.split(' ').map { |expression| expression.split('=') }
    value_hash = Hash[pairs]

    requested_fields = []

    fields.each do |field|
      requested_fields << "#{field}=#{value_hash[field]}" unless value_hash[field].nil?
    end

    puts requested_fields.join(' ')
  end
end

Call it as: ruby ruby_script_name.rb input_file.txt field1 field2

I like how short the sed/perl solution is -- but how easily can it be modified to take longer field names? It seems like the regex would become messy quickly... Anyway, that strategy would work here as well, if you wanted to use it.
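As a rough sketch of how the regex strategy might cope with longer field names: instead of hand-writing a character class like [bd], the pattern can be built from the requested names with Regexp.union. The helper names field_pattern and extract_fields are made up for illustration, and the sketch assumes values never contain whitespace.

```ruby
# Hypothetical helper: build one regex from arbitrary field names.
def field_pattern(fields)
  # Regexp.union escapes each name, so regex metacharacters in names are safe.
  names = Regexp.union(fields)
  # (?<!\S) demands start of line or whitespace before the name,
  # so "id" is not matched inside "uid=7".
  /(?<!\S)#{names}=\S+/
end

# Hypothetical helper: keep only the requested name=value pairs, in input order.
def extract_fields(line, fields)
  line.scan(field_pattern(fields)).join(' ')
end

puts extract_fields('a=1 b=2 c=3 d=4', %w[b d])        # prints "b=2 d=4"
puts extract_fields('uid=7 id=42 name=x', %w[id name]) # prints "id=42 name=x"
```

Unlike the script above, this keeps the pairs in the order they appear in the line rather than the order the fields were requested, which may or may not be what you want.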

Benjamin Oakes 2009-10-27 13:37:58

`ruby -pe 'gsub(/[^bd]=\d+ */, "")' file`

Telemachus 2009-10-27 13:48:18

Ruby can do one-liners, even if it's not the most common or preferred use for the language: http://www.fepus.net/ruby1line.txt

Telemachus 2009-10-27 13:48:48

Thanks, Telemachus. I'll use one-liners like that, but I've found that they have limited use in the long term. That is, I'm happy to use them for stuff *I* know will only be used a few times and not need to be maintained -- I tend to use them the most in `vim` (see `rubydo`). (Anything requested by someone else tends to be relied on, so it's bad when you come back to it in 3 months and can't figure out why a chain of 10 regular expressions is breaking. I've been there with my code and other people's and it's no fun.) Depending on what the asker needs, either could be useful.

Benjamin Oakes 2009-10-27 15:17:28

(Since this example input seems simple at the moment, the one-liner could be best. Things tend to get more complicated as you go on, though...)

Benjamin Oakes 2009-10-27 15:19:55

Answer 6

+3 A:

$ awk '{ for(i=1;i<=NF;i++){if($i~/^[bd]=/){printf "%s ", $i} }print ""}' file
b=2 d=4
b=3

b=9 d=5
d=1

ghostdog74 2009-10-27 13:53:14

Answer 7

+1 A:

Assuming you may want to do something with the values in the future, beyond just filtering, you could use this as a basis.

#! /usr/bin/env perl
use warnings;
use strict;

my @lines;

while(<>){
  my %kv = /([a-z])=([0-9])/ig;
  push @lines, \%kv;
}

for my $kv (@lines){
  # $kv->{a} ||= 1;
  # next unless $kv->{c};

  print "b=$kv->{b} " if defined $kv->{b};
  print "d=$kv->{d} " if defined $kv->{d};
  print "\n";
}

Brad Gilbert 2009-10-27 14:34:50

@Brad I am glad someone else had the same idea (see the first version of my post which got downvoted while I was expanding on it). +1. Note that you should use `if defined $kv->{b}` because `0` is an allowed value.

Sinan Ünür 2009-10-27 14:39:28

Answer 8

A:

Clearly, PostScript is the way to go ... XD

(%stdin) (r) file
{
    dup 100 string readline not {exit} if
    {
        dup () eq {pop exit} if
        token pop 3 string cvs
        dup 0 get << 98 / 100 / >> exch known
        {print ( ) print} {pop} ifelse
    } loop
    / =
} loop

Usage: gs -q -dNOPROMPT -dNODISPLAY -dBATCH thisfile.ps < input

Notes: Replace the << 98 / 100 / >> with the appropriate ASCII values (98 = b, 100 = d), each followed by a space-delimited slash (though you don't have to use the slash; it's just a dummy object). For example, to select 'c', 'e', and 'f', use << 99 / 101 / 102 / >>

Each line can be at most 100 characters; if your lines are longer, replace the 100 in 100 string with a larger number. Likewise, replace the 3 in 3 string if your x=# entries are longer than three characters. This doesn't work if the x is more than one character, though.

KirarinSnow 2009-10-29 02:27:36
