Shell scripts can use pipes, so you do not need temporary files for this:
cat "$@" |
tr -cs "a-zA-Z0-9." '\012' |
{
    old="aaa."
    while read new
    do
        case "$old" in
        *.) : OK;;
        *)  echo "$old $new";;
        esac
        old="$new"
    done
}
The code uses cat as the universal collector of data; tr is a pure filter that does not accept any filename arguments. The basic idea is that the variable old holds the previous word, and read stores the next word in new. When old ends with a period (as its initial value does), the pair spans a sentence boundary and does not form a valid bigram under your rules, so nothing is printed. If you want to remove the dots from the sentence-ending bigrams, you can use:
echo "$old ${new%.}"
The unadorned version (with dots echoed) works with the Bourne shell as well as its derivatives; the version with ${new%.} only works with the Korn shell and its derivatives (and with other POSIX shells such as bash and dash), not the original Bourne shell.
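As a rough sketch of how the pieces fit together, the pipeline with the dot-stripping echo can be wrapped in a function and tried on a short sample. The function name bigrams and the sample sentence are made up for illustration; the body is the same pipeline as above, and it assumes a POSIX shell because of ${new%.}.

```shell
# bigrams: hypothetical wrapper name, not part of the original script.
# Reads the files named as arguments, or standard input if there are none.
bigrams() {
    cat "$@" |
    tr -cs "a-zA-Z0-9." '\012' |      # one word per line; dots stay attached to words
    {
        old="aaa."                    # sentinel ending in a dot: the first word starts a sentence
        while read new
        do
            case "$old" in
            *.) : ;;                  # old ended a sentence, so no bigram across the boundary
            *)  echo "$old ${new%.}" ;;   # strip one trailing dot from the second word
            esac
            old="$new"
        done
    }
}

# Sample run (made-up input):
printf 'The quick fox. Jumps high.\n' | bigrams
# prints:
#   The quick
#   quick fox
#   Jumps high
```

Note that no bigram is printed for "fox. Jumps", because old still carries its trailing dot when the next word is read.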
If you must use temporary files, then make their names contain the process ID ($$) and use trap to remove them:
tmp=${TMPDIR:-/tmp}/bigram.$$
trap 'rm -f "$tmp".?; exit 1' 0 1 2 3 13 15
...code using $tmp.1, $tmp.2, etc...
rm -f "$tmp".?
trap 0
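Signal 1 is hangup (HUP), 2 is interrupt (INT), 3 is quit (QUIT), 13 is pipe (PIPE) and 15 is terminate (TERM); 0 is not a real signal but means 'any exit' from the shell, which makes it almost magic in this context. Before actually exiting, remember to cancel the exit trap, as shown; otherwise the exit 1 in the trap turns a successful run into a failing exit status.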
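To make the pattern concrete, here is a minimal sketch that fills in the middle with a made-up task: staging standard input in one temporary file and writing its sorted, de-duplicated lines to a second. The function name sorted_unique and the task itself are assumptions for illustration only. The function body is a subshell, which is a design choice of this sketch so that the traps and the exit stay local to the function.

```shell
# sorted_unique: hypothetical helper, not from the original answer.
# The ( ... ) body runs in a subshell, so traps and 'exit' do not affect the caller.
sorted_unique() (
    tmp=${TMPDIR:-/tmp}/bigram.$$
    # Remove the temporary files on any exit (0) and on HUP, INT, QUIT, PIPE, TERM.
    trap 'rm -f "$tmp".?; exit 1' 0 1 2 3 13 15

    cat > "$tmp.1"                  # first temporary file: a copy of standard input
    sort -u "$tmp.1" > "$tmp.2"     # second temporary file: sorted, unique lines
    cat "$tmp.2"

    rm -f "$tmp".?                  # normal cleanup
    trap 0                          # cancel the exit trap so the status is not forced to 1
)

# Sample run (made-up input):
printf 'b\na\nb\n' | sorted_unique
# prints:
#   a
#   b
```

If the function is interrupted part-way, the trap still removes both files; on the normal path they are removed explicitly and the exit trap is cancelled first, so the function returns success.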