I have two implementations of what I call sgrep
, one in Perl, one using just pre-Perl (pre-GNU) standard Unix commands. If you've got GNU grep
, you've no particular need of these. It would be more complex to deal with forwards and backwards context searches, but that might be a useful exercise.
Perl solution:
#!/usr/perl/v5.8.8/bin/perl -w
#
# @(#)$Id: sgrep.pl,v 1.6 2007/09/18 22:55:20 jleffler Exp $
#
# Perl-based SGREP (special grep) command
#
# Print lines around the line that matches (by default, 3 before and 3 after).
# By default, include file names if more than one file to search.
#
# Options:
# -b n1 Print n1 lines before match
# -f n2 Print n2 lines following match
# -n Print line numbers
# -h Do not print file names
# -H Do print file names
use strict;
use constant debug => 0;
use Getopt::Std;
my(%opts);
sub usage
{
print STDERR "Usage: $0 [-hnH] [-b n1] [-f n2] pattern [file ...]\n";
exit 1;
}
usage unless getopts('hnf:b:H', \%opts);
usage unless @ARGV >= 1;
if ($opts{h} && $opts{H})
{
print STDERR "$0: mutually exclusive options -h and -H specified\n";
exit 1;
}
my $op = shift;
print "# regex = $op\n" if debug;
# print file names if -h omitted and more than one argument
$opts{F} = (defined $opts{H} || (!defined $opts{h} and scalar @ARGV > 1)) ? 1 : 0;
$opts{n} = 0 unless defined $opts{n};
my $before = (defined $opts{b}) ? $opts{b} + 0 : 3;
my $after = (defined $opts{f}) ? $opts{f} + 0 : 3;
print "# before = $before; after = $after\n" if debug;
my @lines = (); # Accumulated lines
my $tail = 0; # Line number of last line in list
my $tbp_1 = 0; # First line to be printed
my $tbp_2 = 0; # Last line to be printed
# Print lines from @lines in the range $tbp_1 .. $tbp_2,
# leaving $leave lines in the array for future use.
sub print_leaving
{
my ($leave) = @_;
while (scalar(@lines) > $leave)
{
my $line = shift @lines;
my $curr = $tail - scalar(@lines);
if ($tbp_1 <= $curr && $curr <= $tbp_2)
{
print "$ARGV:" if $opts{F};
print "$curr:" if $opts{n};
print $line;
}
}
}
# General logic:
# Accumulate each line at end of @lines.
# ** If current line matches, record range that needs printing
# ** When the line array contains enough lines, pop line off front and,
# if it needs printing, print it.
# At end of file, empty line array, printing requisite accumulated lines.
while (<>)
{
# Add this line to the accumulated lines
push @lines, $_;
$tail = $.;
printf "# array: N = %d, last = $tail: %s", scalar(@lines), $_ if debug > 1;
if (m/$op/o)
{
# This line matches - set range to be printed
my $lo = $. - $before;
$tbp_1 = $lo if ($lo > $tbp_2);
$tbp_2 = $. + $after;
print "# $. MATCH: print range $tbp_1 .. $tbp_2\n" if debug;
}
# Print out any accumulated lines that need printing
# Leave $before lines in array.
print_leaving($before);
}
continue
{
if (eof)
{
# Print out any accumulated lines that need printing
print_leaving(0);
# Reset for next file
close ARGV;
$tbp_1 = 0;
$tbp_2 = 0;
$tail = 0;
@lines = ();
}
}
Pre-Perl Unix solution (using plain ed
, sed
, and sort
- though it uses getopt
which was not necessarily available back then):
#!/bin/ksh
#
# @(#)$Id: old.sgrep.sh,v 1.5 2007/09/15 22:15:43 jleffler Exp $
#
# Special grep
# Finds a pattern and prints lines either side of the pattern
# Line numbers are always produced by ed (substitute for grep),
# which allows us to eliminate duplicate lines cleanly. If the
# user did not ask for numbers, these are then stripped out.
#
# BUG: if the pattern occurs in in the first line or two and
# the number of lines to go back is larger than the line number,
# it fails dismally.
set -- `getopt "f:b:hn" "$@"`
case $# in
0) echo "Usage: $0 [-hn] [-f x] [-b y] pattern [files]" >&2
exit 1;;
esac
# Tab required - at least with sed (perl would be different)
# But then the whole problem would be different if implemented in Perl.
number="'s/^\\([0-9][0-9]*\\) /\\1:/'"
filename="'s%^%%'" # No-op for sed
f=3
b=3
nflag=no
hflag=no
while [ $# -gt 0 ]
do
case $1 in
-f) f=$2; shift 2;;
-b) b=$2; shift 2;;
-n) nflag=yes; shift;;
-h) hflag=yes; shift;;
--) shift; break;;
*) echo "Unknown option $1" >&2
exit 1;;
esac
done
pattern="${1:?'No pattern'}"
shift
case $# in
0) tmp=${TMPDIR:-/tmp}/`basename $0`.$$
trap "rm -f $tmp ; exit 1" 0
cat - >$tmp
set -- $tmp
sort="sort -t: -u +0n -1"
;;
*) filename="'s%^%'\$file:%"
sort="sort -t: -u +1n -2"
;;
esac
case $nflag in
yes) num_remove='s/[0-9][0-9]*://';;
no) num_remove='s/^//';;
esac
case $hflag in
yes) fileremove='s%^$file:%%';;
no) fileremove='s/^//';;
esac
for file in $*
do
echo "g/$pattern/.-${b},.+${f}n" |
ed - $file |
eval sed -e "$number" -e "$filename" |
$sort |
eval sed -e "$fileremove" -e "$num_remove"
done
rm -f $tmp
trap 0
exit 0
The shell version of sgrep
was written in February 1989, and bug fixed in May 1989. It then remained unchanged except for an administrative change (SCCS to RCS transition) in 1997 until 2007, when I added the -h
option. I switched to the Perl version in 2007.