ansaurus

Question

Answer 1

+1 A:

Just replace the paste line with this:

paste out1 out2 | grep -v '\..'

This will filter out any lines that contain a period which is not the last character of a line.
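As a quick check of that filter, here is a minimal sketch that runs the same grep over a few hand-written pairs. The sample words are made up and simply stand in for the real output of paste out1 out2:

```shell
# Made-up sample pairs standing in for the output of `paste out1 out2`.
# printf reuses its format, so each two arguments become one tab-separated line.
pairs=$(printf '%s\t%s\n' the cat cat sat. sat. The The end.)

# Drop every line in which a period is followed by at least one more character.
filtered=$(printf '%s\n' "$pairs" | grep -v '\..')

printf '%s\n' "$filtered"
```

Only the "sat. The" pair is dropped, because it is the only line where something (the tab) follows a period.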

Robert Gamble 2008-10-28 22:29:55

The grep expression as originally posted matched (and the `-v` option excluded) any line containing a dot followed by a character other than a dollar sign. Since the output of `tr` contains no dollar signs, it worked, but it is not obvious that the character class was necessary. It could have been just '\..'.

Jonathan Leffler 2008-10-28 23:31:13

Doh, thanks for pointing that out, fixed.

Robert Gamble 2008-10-29 05:38:35

Answer 2

A:

thank you very much!

2008-10-28 22:41:04

Please just upvote and avoid such noisy answers.

bene 2008-10-28 23:32:48

Answer 3

+2 A:

Shell scripts can use pipes.

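# Split the input into one word per line (dots kept), then pair each word
# with the one before it.
cat "$@" |
tr -cs "a-zA-Z0-9." '\012' |
{
    old="aaa."                      # sentinel: behaves like the end of a previous sentence
    while read new
    do
        case "$old" in
        *.) : OK;;                  # old ended a sentence, so no bigram here
        *)  echo "$old $new";;      # old and new form a bigram
        esac
        old="$new"
    done
}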

The code uses cat as the universal collector of data - tr is a pure filter that does not accept any filename arguments. The basic idea is that the variable old holds the previous word and new holds the word just read. When old ends with a period (as its initial value does), the pair does not form a valid bigram under your rules, so nothing is printed. If you want to remove the dots from the sentence-ending bigrams, you can use:

 echo "$old ${new%.}"

The unadorned version (with dots echoed) works with Bourne shell as well as derivatives; the version with the ${new%.} only works with Korn shell and derivatives (including other POSIX shells such as bash) - not the original Bourne shell.
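To see the dot-stripping variant end to end, here is a rough sketch that wraps the same loop in a function and feeds it one sample line. The function name bigrams and the sample sentence are illustrative only, not part of the answer, and the input comes from stdin rather than from cat "$@":

```shell
# bigrams: read text on stdin, print one "word word" pair per line.
# The function name is hypothetical, chosen only for this example.
bigrams() {
    tr -cs "a-zA-Z0-9." '\012' |
    {
        old="aaa."                          # sentinel ending in a dot
        while read new
        do
            case "$old" in
            *.) : ;;                        # previous word ended a sentence: no pair
            *)  echo "$old ${new%.}" ;;     # strip one trailing dot from the second word
            esac
            old="$new"
        done
    }
}

# Made-up sample input.
result=$(printf 'The cat sat. The end.\n' | bigrams)
printf '%s\n' "$result"
```

With that input the pairs printed are "The cat", "cat sat" and "The end"; no pair spans the sentence boundary between "sat." and "The".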

If you must use temporary files, then make their names contain the process ID ($$) and use trap to remove them:

tmp=${TMPDIR:-/tmp}/bigram.$$
trap 'rm -f $tmp.?; exit 1' 0 1 2 3 13 15

...code using $tmp.1, $tmp.2, etc...

rm -f $tmp.?
trap 0

Signal 1 is hangup (HUP), 2 is interrupt (INT), 3 is quit (QUIT), 13 is pipe (PIPE) and 15 is terminate (TERM); 0 is 'any exit' and is almost juju in this context. Before actually exiting, remember to cancel the exit trap, as shown; otherwise the trap action also runs on a normal exit, and its exit 1 makes the script report failure.
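Putting the temporary-file pattern together, a minimal sketch might look like the following. The work in the middle (splitting one sample sentence into words, making a copy shifted by one line, and pasting the two) is only assumed for illustration; the tmp naming and the trap lines follow the snippet above, with the variable quoted:

```shell
# Base name for the temporary files, made unique by the process ID.
tmp=${TMPDIR:-/tmp}/bigram.$$

# On exit, or on HUP, INT, QUIT, PIPE or TERM: remove the files and fail.
trap 'rm -f "$tmp".?; exit 1' 0 1 2 3 13 15

# Placeholder work (assumed): one word per line, then the same list shifted by one.
printf 'The cat sat.\n' | tr -cs "a-zA-Z0-9." '\012' > "$tmp.1"
tail -n +2 "$tmp.1" > "$tmp.2"
pairs=$(paste "$tmp.1" "$tmp.2" | grep -v '\..')

# Normal completion: clean up, then cancel the exit trap so the status stays 0.
rm -f "$tmp".?
trap 0

printf '%s\n' "$pairs"
```

If the script is interrupted between the trap line and the cleanup, the trap removes both files; on the normal path they are removed explicitly and the trap is reset before the script ends.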

Jonathan Leffler 2008-10-28 22:57:26

Answer 4

+1 A:

You may also want to browse Ken Church's "Unix for Poets" (PDF) - a classic describing solutions to similar problems.

Yuval F 2008-10-29 06:36:21


Shell script to find bigrams.
