tags:

views:

148

answers:

5

I have text file of this kind

File 1
-------
ABC 123
DEF 456
GHI 111

And I have another file

File 2
------
stringaa ttt stringbb yyy

Ouput
-----
stringaa ABC stringbb 123
stringaa DEF stringbb 456
stringaa GHI stringbb 111

Reading the file File 1 update File2 such that the Ouput is produced, any ideas.

+1  A: 
use strict;
use warnings;

my ($file1, $file2) = @ARGV;
open F, $file2 or die "Can't open $file2: $!\n";
$_ = <F>; # File2 should have one line only
close F;

die "$file2 in unexpected format for second file '$_'\n" unless /(\w+)\s\w+\s(\w+)/;
my ($stra, $strb) = ($1, $2);

open F, $file1 or die "Can't open $file1: $!\n";
while(<F>)
{
    s/(\w+)\s(\d+)/$stra $1 $strb $2/;
        print;
}
Motti
+1  A: 

Even though I'm not sure this is what you want (see comment). This is a way to get that output:

vinko@parrot:~$ more file1.txt
ABC 123
DEF 456
GHI 111
vinko@parrot:~$ more file2.txt
stringaa ttt stringbb yyy
vinko@parrot:~$ more concat.pl
use strict;
use warnings;

open (F1,"<",file1.txt) or die $!;
open (F2,"<",file2.txt) or die $!;

while (<F2>) {
        my ($field1, $field2, $field3, $field4) = split /\s/;
        while (<F1>) {
                my ($innerfield1, $innerfield2) = split /\s/;
                print "$field1 $innerfield1 $field3 $innerfield2\n";
        }
}
close F1;
close F2;
vinko@parrot:~$ perl concat.pl
stringaa ABC stringbb 123
stringaa DEF stringbb 456
stringaa GHI stringbb 111
Vinko Vrsalovic
Please use 3 arg form of `open` http://perldoc.perl.org/open.html
Brad Gilbert
+1  A: 

Try this:

my $file1 = shift @ARGV;
my $file2 = shift @ARGV;

open F2, $file2 or die $!;
chomp(my $template = <F2>);
my @fields = split/\s+/,$template;
close F2;

open F1, $file1 or die $!;
while (<F1>) {
    chomp;
    ($val1,$val2) = split/\s+/;
    print join("\t",$fields[0],$val1,$fields[2],$val2),"\n";

}
close F1;
bubaker
+1  A: 

This code is more verbose than the other suggestions posted here.

But it has several advantages:

  • It is commented.
  • It uses lexical filehandles and 3 argument open().
  • Variable names are descriptive and not file1 and file2.
  • It is more flexible
    • Easy to add/modify replacement fields.
    • Easy to process multiple data files in one script
    • Easy to apply same data to multiple specifications
  • Does not split or modify the specification except to make substitutions.

While this has no bearing on whether this a good design to use in practical terms, this code demonstrates several useful techniques.

  • It generates closures to handle the formatting.
  • It uses atomic exception handling instead of the flawed eval {}; if ($@) { ...handle exception... } idiom.


#!/usr/bin/perl

use strict;
use warnings;

# Supply test data - remove from real code.
my $test_data = <<'END';
ABC 123
DEF 456
GHI 111
JKL
MNO 999 888
END

my $test_spec = <<'END';
stringaa ttt stringbb yyy
END

# Use test data if no files specified.
# works because you can open() a scalar ref as a file.
# remove from real code -> should display usage information and die.
my $file_data = shift @ARGV || \$test_data;
my $file_spec = shift @ARGV || \$test_spec;

# List of tokens to replace in spec file.
# Real code should probably take list of tokens as argument.
my @replace = qw( ttt yyy );

my $spec   = Read_Spec_From_File( $file_spec );
my $format = Make_Formatter( $spec, @replace );
Print_Formatted_Data( $format, $file_data );

exit;

# -----------------------------------------------------------


# Read a specification from a file.
sub Read_Spec_From_File {
    my $file = shift;   # path to file

    open( my $fh, '<', $file )
        or die "Unable to open format specification file '$file' - $!\n";

    my $spec;

    local $_;
    while( <$fh> ) {

        die "Specification file format error - too many lines.\n"
            if defined $spec;

        $spec = $_;
    }

    die "Specification file format error - no specification.\n"
        unless defined $spec;


    return $spec;
}

# Create a formatting function that can be used to apply data to a
# specification.
#
# Formatting function takes a list of data values to apply to replacement
# tokens.
#
# Given spec 'word aaa blah bbb cheese ccc bar aaa'
# With token list is 'aaa', 'bbb', 'ccc',
# and data 111, 222, 333
# The result is 'word 111 blah 222 cheese 333 bar 111'
# 
sub Make_Formatter {
    my $spec = shift;
    my @replacement_tokens = @_;

    # formatter expects a list of data values.
    return sub {
        my $new_line = $spec;

        die "More data than tokens\n" 
            if @_ > @replacement_tokens;

        for ( 0..$#replacement_tokens ) {

            my $token = $replacement_tokens[$_];
            my $value = $_[$_];


            if ( not defined $value ) {
                die "No data for '$token'\n"; 
                $value = '<UNDEF>';
            }

            $new_line =~ s/$token/$value/g;

        }

        return $new_line;
    };
}

# Process a data file and print a set of formatted data.
sub Print_Formatted_Data {
    my $format    = shift; # Formatter function
    my $data_file = shift; # Path to data file.

    open( my $data_fh, '<', $data_file )
        or die "Unable to open data file '$data_file' - $!\n";

    while ( my $raw_data = <$data_fh> ) { 
        my @data_set  = split /\s+/, $raw_data;

        eval { 
            my $formatted = $format->(@data_set);

            print $formatted;
            1;
        }
        or do {
            warn "Error processing line $. of '$data_file' - $@";
        }

    }
}
daotoad
A: 

Hopefully this will work for you.

#! /usr/bin/env perl
use strict;
use warnings;
use 5.010;
use autodie;

my($in_file,$filter,$out_file);

if( @ARGV == 0 ){
  die "Must have filter at least\n";
}elsif( @ARGV == 1 ){
  ($filter) = @ARGV;
}elsif( @ARGV >= 2 ){
  ($in_file,$filter) = @ARGV;
}else{
  ($in_file,$filter,$out_file) = @ARGV;
}


{
  # autodie checks open() for errors
  # so we don't have to
  my($IN,$OUT);
  if( defined $in_file ){
    open $IN,  '<', $in_file;
  }else{
    $IN = *STDIN{IO};
  }
  if( defined $out_file ){
    open $OUT, '>', $out_file;
  }else{
    $OUT = *STDOUT{IO};
  }

  ProcessFiles($IN,$OUT,$filter);

  close $OUT;
  close $IN;
}

sub ProcessFilter{
  my($filter,$str) = @_;

  my @elem = grep {$_} split ' ', $str;

  $filter =~ s/\$(?|(?:{(\d+)})|(\d+))/ $elem[$1-1] /eg;

  return $filter;
}
sub ProcessFiles{
  my($IN,$OUT,$filter) = @_;

  while( my $line = <$IN> ){
    chomp $line;
    next unless $line;
    $line = ProcessFilter($filter,$line);
    say {$OUT} $line;
  }
}

It is called in one of the following manners

perl program.pl <input-file> 'filter string' <output-file>
perl program.pl <input-file> 'filter string' # sends to STDOUT
perl program.pl 'filter string' # recieves from STDIN, sends to STDOUT

If called like this

program.pl FILE1 'stringaa ${1} stringbb $2'

it reads FILE1 and outputs:

stringaa ABC stringbb 123
stringaa DEF stringbb 456
stringaa GHI stringbb 111
Brad Gilbert