Hello, is it possible to take the difference of two arrays in Bash? It would be great if you could suggest a way to do it.

Code :

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" ) 

Array3 =diff(Array1, Array2)

Array3 should ideally be:
Array3=( "key7" "key8" "key9" "key10" )

Appreciate your help,

Thanks Kiran

A:

In Bash 4:

declare -A temp    # associative array
for element in "${Array1[@]}" "${Array2[@]}"
do
    ((temp[$element]++))
done
for element in "${!temp[@]}"
do
    if [[ ${temp[$element]} -gt 1 ]]    # numeric comparison; > inside [[ ]] compares strings
    then
        unset "temp[$element]"
    fi
done
Array3=("${!temp[@]}")    # retrieve the keys as values

Edit:

ephemient pointed out a potentially serious bug. If an element exists in one array with one or more duplicates and doesn't exist at all in the other array, it will be incorrectly removed from the list of unique values. The version below attempts to handle that situation.

declare -A temp1 temp2    # associative arrays
for element in "${Array1[@]}"
do
    ((temp1[$element]++))
done

for element in "${Array2[@]}"
do
    ((temp2[$element]++))
done

for element in "${!temp1[@]}"
do
    if [[ ${temp1[$element]:-0} -ge 1 && ${temp2[$element]:-0} -ge 1 ]]    # present in both arrays
    then
        unset "temp1[$element]" "temp2[$element]"
    fi
done
Array3=("${!temp1[@]}" "${!temp2[@]}")
Dennis Williamson
That performs a symmetric difference, and assumes that the original arrays have no duplicates. So it's not what I would have thought of first, but it works well for OP's one example.
ephemient
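For anyone who wants the one-directional difference mentioned in the comment above (elements of Array1 that are not in Array2, with duplicates kept), here is a minimal sketch. It assumes Bash 4 or later and non-empty elements; the names `seen` and `OnlyIn1` and the sample data are made up for illustration and do not come from the answer.

```shell
#!/usr/bin/env bash
# Sketch: elements of Array1 that never appear in Array2 (one-directional difference).
# Requires Bash 4+ for associative arrays. `seen` and `OnlyIn1` are hypothetical names.
Array1=( "key1" "key2" "key3" "key7" "key7" "key8" )
Array2=( "key1" "key2" "key3" "key11" )

declare -A seen=()
for element in "${Array2[@]}"; do
    seen[$element]=1            # mark everything present in Array2
done

OnlyIn1=()
for element in "${Array1[@]}"; do
    # ${seen[$element]+x} expands to "x" only if the key exists
    if [[ -z ${seen[$element]+x} ]]; then
        OnlyIn1+=( "$element" )
    fi
done

printf '%s\n' "${OnlyIn1[@]}"
```

Unlike the symmetric version, `key11` is not reported here because it only exists in Array2, and the repeated `key7` is kept in its original order.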
@ephemient: Right, the parallel would be to `diff(1)` which is also symmetric. Also, this script will work to find elements unique to any number of arrays simply by adding them to the list in the second line of the first version. I've added an edit which provides a version to handle duplicates in one array which don't appear in the other.
Dennis Williamson
Thanks a lot. I was wondering whether there was an obvious way of doing it, since I am not aware of any command that readily gives the diff of two arrays. Thanks for your support and help. I modified the code to read the diff of two files, which was a little easier to program.
Kiran
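On the question of a ready-made command: as a rough sketch, `comm` can be applied to arrays by feeding it sorted, newline-separated lists through process substitution. This assumes Bash 4 or later (for `mapfile`) and elements that contain no newlines; the result comes back in sorted order, not in the original array order.

```shell
#!/usr/bin/env bash
# Sketch: use comm(1) on two arrays by turning each into a sorted line list.
# Assumes elements contain no newlines. LC_ALL=C keeps sort and comm in agreement.
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )

# comm -23 suppresses lines unique to the second input (column 2) and
# lines common to both (column 3), leaving lines found only in the first.
mapfile -t Array3 < <(
    LC_ALL=C comm -23 \
        <(printf '%s\n' "${Array1[@]}" | LC_ALL=C sort) \
        <(printf '%s\n' "${Array2[@]}" | LC_ALL=C sort)
)

printf '%s\n' "${Array3[@]}"
```

Swapping `-23` for `-13` would give the elements found only in Array2, and `-3` alone would print both sides in two columns.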
A: 
ephemient
A: 
Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
Array3=( "key1" "key2" "key3" "key4" "key5" "key6" "key11" )
a1="${Array1[*]}"; a2="${Array2[*]}"; a3="${Array3[*]}"    # join each array into one space-separated string
diff(){
    local a1="$1"    # list to report from
    local a2="$2"    # list to exclude
    awk -va1="$a1" -va2="$a2" '
     BEGIN{
       m= split(a1, A1," ")
       n= split(a2, t," ")
       for(i=1;i<=n;i++) { A2[t[i]] }
       for (i=1;i<=m;i++){
            if( ! (A1[i] in A2)  ){
                printf "%s ", A1[i]
            }
        }
    }'
}
Array4=( $(diff "$a1" "$a2") )  #compare a1 against a2
echo "Array4: ${Array4[@]}"
Array4=( $(diff "$a3" "$a1") )  #compare a3 against a1
echo "Array4: ${Array4[@]}"

Output

$ ./shell.sh
Array4: key7 key8 key9 key10
Array4: key11
ghostdog74
Thanks a lot. This gave me very good insight into shell programming.
Kiran
A: 

Anytime a question pops up dealing with unique values that may not be sorted, my mind immediately goes to awk. Here is my take on it.

Code

#!/bin/bash

diff(){
  awk 'BEGIN{RS=ORS=" "}
       {NR==FNR?a[$0]++:a[$0]--}
       END{for(k in a)if(a[k])print k}' <(echo -n "${!1}") <(echo -n "${!2}")
}

Array1=( "key1" "key2" "key3" "key4" "key5" "key6" "key7" "key8" "key9" "key10" )
Array2=( "key1" "key2" "key3" "key4" "key5" "key6" )
Array3=($(diff Array1[@] Array2[@]))
echo "${Array3[@]}"

Output

$ ./diffArray.sh
key10 key7 key8 key9

Note: As with the other answers given, duplicate keys within an array are reported only once; this may or may not be the behavior you are looking for. The awk code to handle duplicates is messier.

SiegeX