views:

26

answers:

1

I have the following code:

get_list_a()
{
    $MYSQL -B -u $USER --passwword="$PW" $DB <<EOF
select name, value from mytable_a
EOF
}
get_list_b()
{
    $MYSQL -B -u $USER --passwword="$PW" $DB <<EOF
select name, value from mytable_b
EOF
}

get_list_a >$test.txt


Now I need to combine a and b first and remove all dups(key is name, the first column) and then write them to test.txt. List a and list b by itself are assumed to be distinct. If x in a and y in b exist such that x.name=y.name, then I only want to keep x. how do I do it?

A: 

You want all the records from list_A supplemented by all the records from list_B for which there is not already a matching name in list A. Mathematically this is:

A + B - {w in B | (w,value) in A }

There are many ways of accomplishing this, depending on access and needed efficiencies.

  • If you can modify DB1 (with A), then download table B from DB2, upload it to DB1, then extract your data with the appropriate join
  • If you can't modify DB1, then download both A and B and concatenate them to the same stream, with A followed by B. Then sort by the first field. Then process the stream one record at time. Duplicate names will be side-by-side. If the same name appears more than one time, print the first and ignore subsequent records with the same name.

Here is a sample solution to your problem (starting with two lists of names/values):

#!/bin/bash

A="Smith value1
Jones value2
Wilson value3"

B="Smith value10
Wilson value11
Fox value12
Brown value13"

PrevName="Not a valid name"
echo "$A
$B" | sort -k1  |
while read Name Value
do
   if [ "$Name" != "$PrevName" ]; then
      echo $Name $Value
   fi
   PrevName="$Name"
done > outfile

Here is the output:

Brown value13
Fox value12
Jones value2
Smith value1
Wilson value11

This solution, of course, assumes you are allowed to change the order of the records, as is done in the sort step.

Larry Morell