views: 61
answers: 3
Hi all,

I have 2 files, the first contains the following:

...
John Allen Smith II 16 555-555-5555 10/24/2010
John Allen Smith II 3 555-555-5555 10/24/2010
John Allen Smith II 17 555-555-5555 10/24/2010
John Doe 16 555-555-5555 10/24/2010
Jane Smith 16 555-555-5555 9/16/2010
Jane Smith 00 555-555-5555 10/24/2010
...

and the second file is a list of names so...

...
John Allen Smith II
John Doe
Jane Smith
...

Is it possible to use awk (or another bash command) to print the lines in the first file that match any name in the second file? (The names can repeat in the first file.)

Bonus: Is there an easy way to remove those repeated/duplicate lines in the first file?

Thanks very much,

Tomek

+1  A: 

You can use grep as:

grep -f file2 file1   # file2 is the file with the names.

The -f option of grep reads the patterns to be searched for from the given file.

To remove exact duplicate lines from the output you can use sort as:

grep -f file2 file1 | sort -u
codaddict
I tried the command but got the following: "grep: Unmatched [ or [^". I then tried using -F to force it, but there was no output. Is this command using all of file2 as one search pattern to try to match against file1?
Tomek
I specified the -F flag wrong (I replaced -f with -F), so the final command grep -f file2 -F file1 worked. Thanks for the help.
Tomek
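For what it's worth, the "grep: Unmatched [ or [^" error most likely means a line in file2 contains a character such as [ that grep treats as a regex metacharacter, and replacing -f with -F makes grep search for the literal string "file2" instead of reading patterns from it, hence the empty output. The two flags do different things and can be combined; a sketch, assuming file2 holds the names:

grep -F -f file2 file1    # -F: match fixed strings, -f file2: read one pattern per line from file2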
@codaddict For the unique part, I was looking to remove the lines from file1 where just the name repeats (the other columns have different data).
Tomek
If there is a chance that duplicate lines appear anywhere in the file (not just next to each other), sort is needed.
ghostdog74
@ghostdog74 Duplicate lines (names) only occur right after each other; the problem is that the rest of the line (after the name) can be different.
Tomek
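Since the repeated names sit on adjacent lines but the trailing columns differ, plain uniq won't collapse them. A sketch of one way to keep only the first line per name, assuming the count, phone number, and date are always the last 3 fields on each line:

grep -F -f file2 file1 |
awk '{ name = $1; for (i = 2; i <= NF - 3; i++) name = name " " $i }  # rebuild the name from the leading fields
     !seen[name]++'                                                   # print only the first line seen for each name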
+1  A: 

awk

#!/bin/bash
awk 'FNR==NR { a[$0]++; next }   # first file (file1): count each full line
     { b[$0]++ }                 # second file (file2): collect the names
END {
  for (i in a) {                 # every distinct line from file1
    for (k in b) {               # every name from file2
      # print lines that occur exactly once in file1 and match a name
      if (a[i] == 1 && i ~ k) { print i }
    }
  }
}' file1 file2
ghostdog74
+1  A: 

Expanding on codaddict's answer:

grep -f file2 file1 | sort | uniq

This will remove lines that are exactly the same, but the side effect (which may be unwanted) is that your data file will now be sorted. It also requires the lines to be exactly the same, which is not the case in your example data: the names are the same, but the data after them differs. uniq can skip a fixed number of leading fields or characters, but that won't work on your data because your names have a variable length and a variable number of fields. If you know the data fields are always the last 3 fields on a line, then you can do this:

grep -f file2 file1 | sort | rev | uniq -f 3 | rev

(rev reverses each line, so the variable-length name ends up at the end; uniq -f 3 then skips the three reversed data fields and compares on the name, and the second rev restores the original order.) Your output will be only one line per name, but which one? The lexicographically lowest one, because the input was sorted (sort is needed for uniq to collapse non-adjacent duplicates). If you don't want to sort first, or need control over which lines are dropped, then an awk, perl, ruby, or python solution using associative arrays will probably work best.

codeboy2k
@codeboy2k Yeah, that's exactly my problem. The names can be of variable length and the data after the name is different. I am just looking to get the first occurrence of, say, John Allen Smith II. I'll look into associative arrays with awk. Thanks for the info.
Tomek
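A minimal sketch of the associative-array approach mentioned above, assuming file2 holds the names, each data line in file1 starts with one of those names, and only the first line per name should be kept:

awk 'FNR == NR { names[$0] = 1; next }            # first file (file2): collect the names
     {
       for (n in names)                           # find the name this line starts with
         if (index($0, n) == 1 && !seen[n]++) {   # ...and print only its first occurrence
           print
           break
         }
     }' file2 file1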