Is there a way to make awk (gawk) ignore or skip missing files? That is, files passed on the command line that no longer exist in the file system (e.g. rapidly appearing/disappearing files under /proc/[1-9]*).

By default, a missing file is a fatal error :-(

I would like to be able to do the equivalent of something like this:

BEGIN { MISSING_FILES_ARE_FATAL = 0 }  # <- Wishful thinking!
      { count++ }
END   { print count }

A wrapper script cannot check that the files exist before awk is run, because they may disappear between the time they are checked and the time awk tries to open them, i.e., it is a race condition. (Checking and then opening within awk is also a race condition, although the window is tighter.)

A: 

In the finest of traditions, I will answer your awk question with a Perl program.

#!/usr/bin/perl -w

for my $file (@ARGV) {
    # The open itself is the existence check: skip files that cannot be opened.
    open my $fh, '<', $file or next;
    while (<$fh>) {
        # ...do your thing here...
    }
}

(It's not awk, but it is the only solution without a race condition.)

Schwern
+1  A: 

Even with a Perl or shell wrapper around your awk script, I think there's still going to be a race condition. For example, using ADEpt's otherwise fine shell snippet:

[ -r "$filename" ] && awk -f ... "$filename"

there's nothing preventing the process from going away between the -r and the time awk gets around to trying to open the file...

The only answer I can think of is to use LD_PRELOAD to interpose on the open system call for awk, so that if the file is missing, a read-only file descriptor on /dev/null is returned instead.

That might work...

Mike G.
+1  A: 

Well, you can check each entry of ARGV with a system() call, then process the file via getline.

 if (system("test -r \"" ARGV[1] "\"") == 0) {
   while ((getline aline < ARGV[1]) > 0) {
     # process ARGV[1] via `aline` instead of $0
   }
   close(ARGV[1])
 }

...

Then process ARGV[2], etc. HTH
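
A possible variation on this idea, as a minimal sketch: getline with a file redirection returns -1 when the file cannot be opened instead of aborting awk, so the open attempt itself can serve as the existence check and the separate test (with its race window) is not needed. The function name count_lines_skip_missing and the line-counting body are hypothetical stand-ins for the real processing.

```shell
# Hypothetical helper: count lines across the given files,
# silently skipping any file that cannot be opened.
count_lines_skip_missing() {
    awk '
    BEGIN {
        for (i = 1; i < ARGC; i++) {
            file = ARGV[i]
            # getline yields 1 per line, 0 at end of file, and -1 if the
            # file cannot be opened or read, so a missing file simply
            # ends the loop for that argument.
            while ((getline line < file) > 0)
                count++
            close(file)
        }
        print count + 0
    }' "$@"
}
```

It could be called as count_lines_skip_missing /proc/[1-9]*/status, for example. Because everything happens in BEGIN, awk never tries to open the command-line files itself; the cost is that each record arrives in the variable line rather than in $0, and FILENAME is not set.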

Zsolt Botykai
A: 

Oh, sorry. Disregard my previous answer. Here is another suggestion:

cat /proc/[1-9]* 2>/dev/null | awk ....

cat will gobble up all the files, missing and existing alike; its error messages are discarded (a missing file is a non-fatal error for cat), and awk will be able to process the result.
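
As a rough illustration of this pipeline, here is a small sketch wrapped in a shell function. The name count_lines_via_cat and the line-counting awk program are hypothetical placeholders for whatever the real script does.

```shell
# Hypothetical helper: let cat absorb the missing files, then count
# the lines that actually arrived on awk's standard input.
count_lines_via_cat() {
    # cat reports each missing file on stderr (discarded here) and
    # carries on with the remaining arguments.
    cat -- "$@" 2>/dev/null | awk 'END { print NR }'
}
```

The trade-off is that awk sees a single anonymous stream: FILENAME and per-file boundaries (FNR resets) are lost, so this only suits scripts that do not care which file a record came from.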

ADEpt
+1  A: 

It looks to me as though a "MISSING_FILES_ARE_FATAL = 0" feature will be part of the next gawk release. See the ChangeLog file of the current gawk-stable source code:

--- snip ---

Fri Aug 22 14:43:49 2008 Arnold D. Robbins [email protected]

* io.c (nextfile): Users Strong In The Ways Of The Source can use
non-existant files on the command line without it being a fatal error.

--- snip ---

http://cvs.savannah.gnu.org/viewvc/gawk-stable/ChangeLog?revision=1.87&root=gawk&view=markup
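
For what it is worth, here is a sketch of how this might look in practice, assuming gawk 4.0 or later, where (to my knowledge) the capability is documented as the BEGINFILE pattern: if a command-line file cannot be opened, ERRNO is non-empty inside BEGINFILE, and executing nextfile there skips the file instead of letting gawk die. The ChangeLog entry above does not spell out this interface, so treat the connection as an assumption; the function name count_lines_gawk is hypothetical.

```shell
# Hypothetical helper; assumes gawk 4.0+ with BEGINFILE support.
count_lines_gawk() {
    gawk '
    BEGINFILE {
        # ERRNO is non-empty when gawk failed to open the current file;
        # nextfile then skips it rather than raising a fatal error.
        if (ERRNO != "")
            nextfile
    }
    { count++ }
    END { print count + 0 }
    ' "$@"
}
```

Unlike the cat pipeline, this keeps FILENAME, FNR and $0 working as usual, and the check happens at the moment gawk opens the file, so there is no check-then-open window.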

Hermann