views:

127

answers:

5

I am looking for the most efficent way of solving the following

Problem:

given an array Before = { 8, 7, 2, 1} and an array After ={1, 3, 8, 8}
find the added and the removed elements

the solution is:
        added = 3, 8 
        removed = 7, 2

My idea so far is:

for i = 0 .. B.Lenghtt-1
{
    for j= 0 .. A.Lenght-1
    {
        if A[j] == B[i]

            A[j] = 0;
            B[i] = 0;

            break;
    }
}

// B elemnts different from 0 are the Removed elements
// A elemnts different from 0 are the Added elemnts

Does anyone know a better solution perhaps more efficent and that doesn't overwrite the original arrays

+1  A: 

In some sort of C++ pseudo code:

Before.sort();
After.sort();
int i = 0;
int j = 0;
for (; i < Before.size() && j < After.size(); ) {
    if (Before[i] < After[j]) {
        Removed.add(Before[i]);
        ++i;
        continue;
    }
    if (Before[i] > After[j]) {
        Added.add(After[j]);
        ++j;
        continue;
    }
    ++i;
    ++j;
}
for (; i < Before.size(); ++i) {
     Removed.add(Before[i]);
}
for (; j < After.size(); ++j) {
     Added.add(After[j]);
}
Andreas Brinck
+7  A: 

Sorting is your friend.

Sort the two arrays (a and b), and then walk them (using x and y as counters). Move down both 1 at a time. You can derive all your tests from there:

  • if a[x] < b[y], then a[x] was removed (and only increment x)
  • if a[x] > b[y], then b[y] was added (and only increment y)

(I may have missed an edge case, but you get the general idea.)

(edit: the primary edge case that isn't covered here is handling when you reach the end of one of the arrays before the other, but it's not hard to figure out. :)

Joe
Thanks Joe, Andreas and Kriss.Just a final check, in terms of efficency is correct to say that your solutions costs isn log(n) + n log(n) + n = O (n log n)Therefore, in approching this kind of problems is always recomended to sort first, right?jj
jj
@jj: basically yes, sorting before is better. In this particular case you can also avoid sorting using hashes because you don't really need the order but need only know if the item is here or not.
kriss
I would prefer hashes too for the lower complexity `Theta(N)`.
Matthieu M.
Good points re: hashing.
Joe
Joe and Andreas, how did you find this algorithm? can you shed some light in your thinking and how did you identify that sorted arrays and that way of scanning will prove to be efficient? jj
jj
+3  A: 

You could also use a Dictionary<int, int> and a algorithm similar to this:

foreach i in source_list: dictionary[i]++;
foreach i in dest_list: dictionary[i]--;

The final dictionary tells you which elements were inserted/removed (and how often). This solution should be quite fast even for bigger lists - faster than sorting.

mafutrct
+2  A: 

if your language as multiset available (set with count of elements) your question is a standard operation.

B = multiset(Before)
A = multiset(After)

result is A.symdiff(B) (symdiff is union minus intersection and that is exactly what you are looking for to have removed and added).

Obviously you can also get removed only or added only using classical difference between sets.

It can trivially be implemented using hashes and it's O(n) (using sort is slightly less efficient as it is O(n.log(n)) because of the sort itself).

kriss
A: 

perl:

@a = ( 8, 7, 2, 2, 1 );
@b = ( 1, 3, 8, 8, 8 );

$d{$_}++ for(@a);
$d{$_}-- for(@b);

print"added = "; 
for(keys %d){print "$_ " x (-$d{$_}) if($d{$_}<0)}
print"\n";
print"removed = "; 
for(keys %d){print "$_ " x ($d{$_}) if($d{$_}>0)}
print"\n";

result:

$ ./inout.pl 
added = 8 8 3 
removed = 7 2 2 
oraz
This is not a code golf entry! Seriously, perl is cryptic at the best of times so use an easier language (or pseudo-code) to illustrate an algorithm you wish to share. I've got a hard time understanding what you're doing here even though the problem is trivial.
Matthieu M.