views:

129

answers:

2

Hi all,

I have two arrays (or arraylists if it is easier) of strings. I need to compare these, find which only exist in the first array, which exist in both, and which only exist in the second array. These arrays are different lengths, and may be in different orders. If necessary, I suppose I could sort them...

I know I could hack this together, but I think this might have a fairly standard and efficient / "best" solution, and I am curious more than anything.

I am using c# for this, but if you want to write your solution in another language, any help is welcome.

Thanks for the help!

+1  A: 
var onlyinfirst = from s in list1 where !list2.Contains(s) select s;
var onlyinsecond = from s in list2 where !list1.Contains(s) select s;
var onboth = from s in list1 where list2.Contains(s) select s;
Luiscencio
This is what I had come up with. I just figured there is some nice c# / .net way of doing it using comparables or something. I'll also give some linq a try, but if all else fails this will work.
Wes
Note that if the lists are of size n and m then these solutions are all O(n*m). There exist far more efficient solutions if m and n are large.
Eric Lippert
+4  A: 

If the arrays are large then you'll want to use a data structure that is efficient for these operations; arrays are not.

The naive solution is O(n^2) in time if the arrays are of size n.

If you sort the arrays in place then you can binary search them for the items; sorting will likely be O(n lg n) and searching n times at a cost of lg n per search will also be O(n lg n) in time.

If you turn each array into a HashSet<T> first then you can do it in O(n) time and O(n) extra space.

Eric Lippert
I've never used a hashset but am very curious, I'll look into it, thanks!
Wes
@Wes: `HashSet` was introduced in .NET Framework 3.5.
Brian
@Wes: @Brian: And it was called HashSet instead of the more sensible name "Set" because... wait for it... because "Set" is *a keyword of Visual Basic*.
Eric Lippert