I'm processing a huge file with (GNU) awk. The other available tools are the standard Linux shell tools and an old (but >5.0) version of Perl; I can't install modules.

My problem: whenever field1, field2, and field3 of a line contain X, Y, and Z, I must search another directory for a file that contains field4 and field5 together on one line, and insert some data from the found file into the current output.

E.g.:

Actual file line:

f1 f2 f3 f4 f5
X  Y  Z  A  B

Now I need to find another file (in another directory) that contains e.g.

f1 f2 f3 f4
A  U  B  W

And write to STDOUT $0 from the original file, and f2 and f3 from the found file, then process the next line of the original file.

Is it possible to do it with awk?

Thanks in advance.

A: 

This seems to work for some test files I set up matching your examples. Involving perl in this manner (interposed with grep) is probably going to hurt the performance a great deal, though...

## perl code to do some dirty work

# pull the matching lines out of the big file in a single grep call
for my $line (`grep 'X Y Z' myhugefile`) {
    chomp $line;
    # split ' ' copes with runs of whitespace, unlike split / /
    my ($a, $b, $c, $d, $e) = split ' ', $line;
    # look for lines in the other file that have field4 ... field5
    my $cmd = 'grep -P "' . $d . '\s+.+?\s+' . $e . '" otherfile';
    for my $from_otherfile (`$cmd`) {
        chomp $from_otherfile;
        my ($oa, $ob, $oc, $od) = split ' ', $from_otherfile;
        # print the first field of the original line, then f2 and f3
        # of the line found in the other file
        print "$a $ob $oc\n";
    }
}

EDIT: Use tsee's solution (below); it's much better thought out.

Adam Bellaire
I'll try it out on Monday, thanks.
Zsolt Botykai
Involving perl doesn't hurt performance at all! Calling shell commands via backticks from perl (as you do) is what ruins performance. If you use the shell-typical idiom of piping things through lots of programs or spawning many extra processes, you're going to send performance down the john.
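
For example, the per-line grep can be done in-process instead; a rough sketch, assuming $d and $e hold field4 and field5 as in the snippet above, with 'otherfile' standing in for the real file:

# open the file once and match with perl's own regex engine,
# instead of forking a shell plus a grep process per input line
open my $fh, '<', 'otherfile' or die "Can't open otherfile: $!";
my @hits = grep { /\Q$d\E\s+.+?\s+\Q$e\E/ } <$fh>;
close $fh;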
tsee
You're quite correct, tsee. I meant that involving perl in this particular way was going to hurt performance. Your properly written script is much better.
Adam Bellaire
Sorry for being so blunt, I misunderstood your post's second sentence. Cheers,
tsee
+2  A: 

Let me start out by saying that your problem description isn't really that helpful. Next time, please be more specific: you might be missing out on much better solutions.

So from your description, I understand you have two files which contain whitespace-separated data. In the first file, you want to match the first three columns against some search pattern. If found, you want to find all lines in another file which contain the fourth and fifth columns of the matching line from the first file. From those lines, you need to extract the second and third columns, and then print the first column of the first file and the second and third from the second file. Okay, here goes:

#!/usr/bin/perl -nwa
use strict;
use File::Find 'find';
our @F;  # filled by the -a autosplit; declared so it passes strict
my @search = qw(X Y Z);

# if you know in advance that the otherfile isn't
# huge, you can cache it in memory as an optimization.

# with any more columns, you want a loop here:
if ($F[0] eq $search[0]
    and $F[1] eq $search[1]
    and $F[2] eq $search[2])
{
  my @files;
  find(sub {
      return if not -f $_;
      # verbatim search for the columns in the file name.
      # I'm still not sure what your file-search criteria are, though.
      push @files, $File::Find::name if /\Q$F[3]\E/ and /\Q$F[4]\E/;
      # alternatively search for the combination:
      #push @files, $File::Find::name if /\Q$F[3]\E.*\Q$F[4]\E/;
      # or search *all* files in the search path?
      #push @files, $File::Find::name;
    }, '/search/path'
  );
  foreach my $file (@files) {
    open my $fh, '<', $file or die "Can't open file '$file': $!";
    while (defined($_ = <$fh>)) {
      chomp;
      # order of fields doesn't matter per your requirement.
      my @cols = split ' ', $_;
      my %seen = map {($_=>1)} @cols;
      if ($seen{$F[3]} and $seen{$F[4]}) {
        print join(' ', $F[0], @cols[1,2]), "\n";
      }
    }
    close $fh;
  }
} # end if matching line

Unlike another poster's solution, which makes lots of system calls, this doesn't fall back to the shell at all, so it should be plenty fast.
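
As the comment near the top of the script hints, if the matched files are known to be small, you can cache their contents instead of re-reading a file every time it turns up. A minimal sketch of that variant (@files and @F are as in the script above; everything else is illustrative):

our %cache;  # a package variable, so it survives each pass of the -n loop
for my $file (@files) {
  if (not exists $cache{$file}) {
    open my $fh, '<', $file or die "Can't open file '$file': $!";
    chomp(my @lines = <$fh>);
    close $fh;
    $cache{$file} = \@lines;
  }
  for my $line (@{ $cache{$file} }) {
    my @cols = split ' ', $line;
    my %seen = map { ($_ => 1) } @cols;
    print join(' ', $F[0], @cols[1,2]), "\n"
      if $seen{$F[3]} and $seen{$F[4]};
  }
}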

tsee
Sorry about not specifying things correctly. I'll try your solution as well at work. One question: how do I handle the case where the name of the other file (t.txt in your answer) is unknown, so that I need to search for a file matching my criteria?
Zsolt Botykai
What are your criteria for the file name? What you should do is use File::Find. It's a module for recursively traversing directories, and it has shipped with perl since 5.0, so you can safely use it.
tsee
This is a much better solution than my hack, which would load the entire contents of both greps into memory and (probably) be painfully slow. It would be nice to see the addition of File::Find for a complete solution.
Adam Bellaire
+1  A: 

This is the type of work that got me to move from awk to perl in the first place. If you do have to accomplish this with awk, you may actually find it easier to create a shell script that generates awk script(s) to query and then update in separate steps.

(I've written such a beast for reading/updating Windows-INI-style files; it's ugly. I wish I could have used perl.)

Tanktalus
+1  A: 

I often see the restriction "I can't use any Perl modules", and when it's not a homework question, it's often just due to a lack of information. "Yes, even you can use CPAN" contains the instructions on how to install CPAN modules locally without having root privileges. Another alternative is just to take the source code of a CPAN module and paste it into your program.

None of this helps if there are other, unstated restrictions, like a lack of disk space, that prevent the installation of (too many) additional files.
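
For reference, a minimal sketch of the local-install route (the PREFIX, the paths, and Some::Module are only placeholders; the exact library subdirectory depends on the PREFIX and your perl version):

# one-time install, run in the module's unpacked source directory:
#   perl Makefile.PL PREFIX=~/perl5
#   make && make test && make install

# then, in your script, put the local path on @INC before loading:
use lib "$ENV{HOME}/perl5/lib/perl5";
use Some::Module;  # placeholder for the locally installed module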

Corion
You are almost right, except in the case of very rigorous sysadmins, in a very big bank, on a live system, where I just got a call for not properly logging what I did with a file (my .vimrc), with no internet connection to the machine, so I have to ask the admins to upload files...
Zsolt Botykai