views:

434

answers:

3

I use this question in interviews and I wonder what the best solution is.

Write a Perl sub that takes n lists, and then returns 2^n-1 lists telling you which items are in which lists; that is, which items are only in the first list, the second, list, both the first and second list, and all other combinations of lists. Assume that n is reasonably small (less than 20).

For example:

list_compare([1, 3], [2, 3]);
  => ([1], [2], [3]);

Here, the first result list gives all items that are only in list 1, the second result list gives all items that are only in list 2, and the third result list gives all items that are in both lists.

list_compare([1, 3, 5, 7], [2, 3, 6, 7], [4, 5, 6, 7])
  => ([1], [2], [3], [4], [5], [6], [7])

Here, the first list gives all items that are only in list 1, the second list gives all items that are only in list 2, and the third list gives all items that are in both lists 1 and 2, as in the first example. The fourth list gives all items that are only in list 3, the fifth list gives all items that are only in lists 1 and 3, the sixth list gives all items that are only in lists 2 and 3, and the seventh list gives all items that are in all 3 lists.

I usually give this problem as a follow up to the subset of this problem for n=2.

What is the solution?

Follow-up: The items in the lists are strings. There might be duplicates, but since they are just strings, duplicates should be squashed in the output. Order of the items in the output lists doesn't matter, the order of the lists themselves does.

A: 

Here is my solution:

Construct a hash whose keys are the union of all the elements in the input lists, and the values are bit strings, where bit i is set if the element is present in list i. The bit strings are constructed using bitwise or. Then, construct the output lists by iterating over the keys of the hash, adding keys to the associated output list.

sub list_compare {
    my (@lists) = @_;
    my %compare;
    my $bit = 1;
    foreach my $list (@lists) {
        $compare{$_} |= $bit foreach @$list;
        $bit *= 2; # shift over one bit
    }


    my @output_lists;
    foreach my $item (keys %compare) {
        push @{ $output_lists[ $compare{$item} - 1 ] }, $item;
    }

    return \@output_lists;

}

Updated to include the inverted output list generation suggested by Aristotle

nohat
+1  A: 

First of all I would like to note that nohat's answer simply does not work. Try running it, and look at the output in Data::Dumper to verify that.

That said, your question is not well-posed. It looks like you are using sets as arrays. How do you wish to handle duplicates? How do you want to handle complex data structures? What order do you want elements in? For ease I'll assume that the answers are squash duplicates, it is OK to stringify complex data structures, and order does not matter. In that case the following is a perfectly adequate answer:

sub list_compare {
  my @lists = @_;

  my @answers;
  for my $list (@lists) {
    my %in_list = map {$_=>1} @$list;
    # We have this list.
    my @more_answers = [keys %in_list];
    for my $answer (@answers) {
      push @more_answers, [grep $in_list{$_}, @$answer];
    }
    push @answers, @more_answers;
  }

  return @answers;
}

If you want to adjust those assumptions, you'll need to adjust the code. For example not squashing complex data structures and not squashing duplicates can be done with:

sub list_compare {
  my @lists = @_;

  my @answers;
  for my $list (@lists) {
    my %in_list = map {$_=>1} @$list;
    # We have this list.
    my @more_answers = [@$list];
    for my $answer (@answers) {
      push @more_answers, [grep $in_list{$_}, @$answer];
    }
    push @answers, @more_answers;
  }

  return @answers;
}

This is, however, using the stringification of the data structure to check whether things that exist in one exist in another. Relaxing that condition would require somewhat more work.

Sorry, I wasn't clear in the problem. I meant each output list should contain the items that are *only* in the input list(s) the output list corresponds to. This solution doesn't match the results in the examples; it gives *any* items in the input list(s), which is a much simpler problem.
nohat
+2  A: 

Your given solution can be simplified quite a bit still.

In the first loop, you can use plain addition since you are only ever ORing with single bits, and you can narrow the scope of $bit by iterating over indices. In the second loop, you can subtract 1 from the index instead of producing an unnecessary 0th output list element that needs to be shifted off, and where you unnecessarily iterate m*n times (where m is the number of output lists and n is the number of unique elements), iterating over the unique elements would reduce the iterations to just n (which is a significant win in typical use cases where m is much larger than n), and would simplify the code.

sub list_compare {
 my ( @list ) = @_;
 my %dest;

 for my $i ( 0 .. $#list ) {
  my $bit = 2**$i;
  $dest{$_} += $bit for @{ $list[ $i ] };
 }

 my @output_list;

 for my $val ( keys %dest ) {
  push @{ $output_list[ $dest{ $val } - 1 ] }, $val;
 }

 return \@output_list;
}

Note also that once thought of in this way, the result gathering process can be written very concisely with the aid of the List::Part module:

use List::Part;

sub list_compare {
 my ( @list ) = @_;
 my %dest;

 for my $i ( 0 .. $#list ) {
  my $bit = 2**$i;
  $dest{$_} += $bit for @{ $list[ $i ] };
 }

 return [ part { $dest{ $_ } - 1 } keys %dest ];
}

But note that list_compare is a terrible name. Something like part_elems_by_membership would be much better. Also, the imprecisions in your question Ben Tilly pointed out need to be rectified.

Aristotle Pagaltzis
Not doing bitwise or means the solution fails if there are duplicates in the input lists. However, inverting the loop for the output list generation is a significant improvement. Thanks.
nohat