
I regularly deal with data files that consist of many data blocks of the following format:

*name* attr (
        VALID (
                late_lead_up xxx ar uclk reff xxx slope xxx
                late_lead_dn xxx af uclk reff xxx slope xxx
                early_trail_up xxx af uclk reff xxx slope xxx
                early_trail_dn xxx ar uclk reff xxx slope xxx
              )
        CEXT xxx
        CREF xxx
        REFF xxx
        QUALIFIED_CLOCK
)

Is there any way to extract the block for a "name" I am interested in, using a one-liner from the command line?

A: 

If your blocks always start with '*name* attr (' and always end with a ')' alone on a line with no leading space, you can try this (where foo is the block name and data.txt is the file to parse):

awk '/ attr \($/ {if($1==n)b=1}  {if(b)print}  /^\)$/ {b=0}' n=foo data.txt
mouviciel
A: 

Well, you tagged it Perl, so here is how I would do it in Perl:

#!/usr/bin/perl

use strict;
use warnings;

die "usage: $0 name datafile\n    or cat datafile | $0 name\n" 
    unless @ARGV > 0;

my $name = shift;
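# \Q...\E matches the name literally; /m lets ^ match after the blank
# lines that separate records, which end up at the start of $rec.
my $re   = qr/^\Q$name\E attr/m;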

my $rec = '';
while (my $line = <>) {
    $rec .= $line;
    next unless $line =~ /^\)/;
    print $rec if $rec =~ /$re/;
    $rec = '';
}

You could turn it into a one-liner like this:

perl -ne '$a.=$_;next unless/^\)/;print$a if$a=~/^name attr/m;$a=""' datafile

but I prefer the script. Remember to replace name with the name of the record.
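For comparison, here is a minimal awk sketch of the same accumulate-then-test idea, wrapped in a shell function. The function name extract_records is hypothetical, and it assumes every block ends with an unindented ")" line, as in the question's format.

```shell
# Hypothetical helper: print every block whose header starts with
# "<name> attr". Like the Perl script, it buffers lines and only decides
# what to do when an unindented ")" closes the block.
extract_records() {
    _name=$1
    shift
    awk -v name="$_name" '
        # Skip blank separator lines so the buffer always starts at the header.
        buf == "" && /^[ \t]*$/ { next }
        { buf = buf $0 "\n" }
        # An unindented ")" marks the end of a block.
        /^\)/ {
            # index() == 1 means the buffer starts with the literal prefix,
            # so regex metacharacters in the name need no escaping.
            if (index(buf, name " attr") == 1) printf "%s", buf
            buf = ""
        }
    ' "$@"
}
```

Usage: extract_records foo data.txt, or cat data.txt | extract_records foo. Because the name is compared with index() rather than a regex, a name containing characters such as "." is matched literally.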

Chas. Owens
A: 

Using this file for demo purposes:

of_interest attr (
    1:VALID (
        1:late_lead_up xxx ar uclk reff xxx slope xxx
        1:late_lead_dn xxx af uclk reff xxx slope xxx
        1:early_trail_up xxx af uclk reff xxx slope xxx
        1:early_trail_dn xxx ar uclk reff xxx slope xxx
    1:)
    1:CEXT xxx
    1:CREF xxx
    1:REFF xxx
    1:QUALIFIED_CLOCK
)

boring attr (
    2:VALID (
        2:late_lead_up xxx ar uclk reff xxx slope xxx
        2:late_lead_dn xxx af uclk reff xxx slope xxx
        2:early_trail_up xxx af uclk reff xxx slope xxx
        2:early_trail_dn xxx ar uclk reff xxx slope xxx
    2:)
    2:CEXT xxx
    2:CREF xxx
    2:REFF xxx
    2:QUALIFIED_CLOCK
)

of_interest attr (
    3:VALID (
        3:late_lead_up xxx ar uclk reff xxx slope xxx
        3:late_lead_dn xxx af uclk reff xxx slope xxx
        3:early_trail_up xxx af uclk reff xxx slope xxx
        3:early_trail_dn xxx ar uclk reff xxx slope xxx
    3:)
    3:CEXT xxx
    3:CREF xxx
    3:REFF xxx
    3:QUALIFIED_CLOCK
)

This one-liner (split for readability):

awk '
    BEGIN               {s=0}
    /^of_interest /     {s=1}
    /^\)$/              {if (s==1) {print};s=0}
                        {if (s==1) print}'

or the minimum character version:

awk 'BEGIN{s=0}/^of_interest /{s=1}/^\)$/{if(s==1){print};s=0}{if(s==1)print}'

gives you:

of_interest attr (
    1:VALID (
        1:late_lead_up xxx ar uclk reff xxx slope xxx
        1:late_lead_dn xxx af uclk reff xxx slope xxx
        1:early_trail_up xxx af uclk reff xxx slope xxx
        1:early_trail_dn xxx ar uclk reff xxx slope xxx
    1:)
    1:CEXT xxx
    1:CREF xxx
    1:REFF xxx
    1:QUALIFIED_CLOCK
)
of_interest attr (
    3:VALID (
        3:late_lead_up xxx ar uclk reff xxx slope xxx
        3:late_lead_dn xxx af uclk reff xxx slope xxx
        3:early_trail_up xxx af uclk reff xxx slope xxx
        3:early_trail_dn xxx ar uclk reff xxx slope xxx
    3:)
    3:CEXT xxx
    3:CREF xxx
    3:REFF xxx
    3:QUALIFIED_CLOCK
)

which I believe is what you were after.

It's basically a simple state machine that turns on printing when it finds the desired block start and turns it off when it finds the end of that block.
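A minimal sketch of that state machine with the block name passed in as an awk variable instead of being hard-coded in the regex. The helper name print_block is hypothetical; it assumes the header's first field is the name and the second is "attr".

```shell
# Hypothetical helper: on/off state machine that prints one named block.
print_block() {
    _name=$1
    shift
    awk -v name="$_name" '
        # Header line: first field is the name, second is "attr".
        # Comparing fields avoids any regex escaping of the name.
        $1 == name && $2 == "attr" { s = 1 }
        # While the state is on, echo the line (header and ")" included).
        s { print }
        # An unindented ")" on its own closes the block: turn printing off.
        /^\)$/ { s = 0 }
    ' "$@"
}
```

Usage: print_block of_interest data.txt.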

UPDATE: Here's a Perl one-liner that takes care of your QUALIFIED_CLOCK requirement: it reads the data on standard input and prints only the of_interest blocks that contain QUALIFIED_CLOCK. Enjoy :-)

perl -e '$s=0;while(<STDIN>){if(/^of_interest /){$s=1;$f=0;$x="";}if(($s==1)&&/QUALIFIED_CLOCK/){$f=1;}if(/^\)$/){if($s==1){$x.=$_;}if(($s==1)&&($f==1)){print$x;}$s=0;$f=0;next;}if($s==1){$x.=$_;}}'
paxdiablo
What if some data blocks have QUALIFIED_CLOCK and some don't, and I would like to extract all blocks with QUALIFIED_CLOCK?
Then you would need to store the lines rather than print them, clear a flag when starting the block, set it if you find QUALIFIED_CLOCK and, when you find the block end, output all the lines if the flag is set.
paxdiablo
If there are going to be more requirements changes, I'd opt to move to a Python/Perl-based solution, but it won't be a readable one-liner in any language :-)
paxdiablo
Thanks, I will try to code it.
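A minimal awk sketch of the buffer-plus-flag approach described in the comments: clear a flag at each block start, set it when the keyword appears, and print the stored lines at the block end only if the flag is set. The helper name blocks_with is hypothetical, and it assumes headers end in " attr (" and blocks end with an unindented ")".

```shell
# Hypothetical helper: print every block (any name) containing a keyword.
blocks_with() {
    _kw=$1
    shift
    awk -v kw="$_kw" '
        # Start of any block: clear the buffer and the flag.
        / attr \($/ { buf = ""; found = 0; inblk = 1 }
        # Inside a block: store the line and check for the keyword.
        inblk {
            buf = buf $0 "\n"
            if (index($0, kw)) found = 1
        }
        # Unindented ")" ends the block: flush only if the keyword was seen.
        /^\)$/ && inblk {
            if (found) printf "%s", buf
            inblk = 0
        }
    ' "$@"
}
```

Usage: blocks_with QUALIFIED_CLOCK data.txt.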
A: 

Here is one way to do it as a Perl one-liner:

perl -ne '$m = 1 if /^insert_name_here attr/; print if $m; $m = 0 if /^\)$/' file.txt
jmcnamara
A: 

I see in your comments to another answer that you also want to search within the block for a string like 'QUALIFIED_CLOCK'.

In that case, if your data blocks are separated by a blank line you can use Perl's paragraph mode to read it in blocks and print out the ones you are interested in. For example:

perl -00 -ne 'print if /^block_name/ and /QUALIFIED_CLOCK/' file.txt

This is also possible in awk by setting RS to the empty string (awk's paragraph mode).
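A minimal sketch of the awk counterpart to perl -00, assuming (as the Perl version does) that blocks are separated by at least one blank line. With RS set to "", each blank-line-separated block becomes one record. The helper name para_grep is hypothetical.

```shell
# Hypothetical helper: paragraph-mode filter on block name and keyword.
para_grep() {
    _name=$1
    _kw=$2
    shift 2
    awk -v name="$_name" -v kw="$_kw" '
        # RS = "" splits input on blank lines; ORS restores the separator.
        BEGIN { RS = ""; ORS = "\n\n" }
        # $0 is the whole block; index() does literal substring tests.
        index($0, name " attr") == 1 && index($0, kw) { print }
    ' "$@"
}
```

Usage: para_grep block_name QUALIFIED_CLOCK file.txt.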

jmcnamara
Your solution is elegant! Thanks a lot.
A: 

Far fewer characters and a simpler solution than Pax's:

perl -ne '/^of_interest /../^\)/ and print'

or

awk '/^of_interest /,/^\)/{print}'

or

sed -n '/^of_interest /,/^)/p'
Hynek -Pichi- Vychodil
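The range patterns above hard-code the block name. A minimal sketch with the name passed in as an awk variable, relying on the fact that an awk range pattern accepts any expression, not just a regex literal. The helper name range_block is hypothetical.

```shell
# Hypothetical helper: awk range pattern with a variable start condition.
range_block() {
    _name=$1
    shift
    awk -v name="$_name" '
        # From a line starting with "<name> attr" (literal match via index)
        # through the next unindented ")" line, inclusive; no action = print.
        index($0, name " attr") == 1, /^\)/
    ' "$@"
}
```

Usage: range_block of_interest data.txt.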