tags:

views:

354

answers:

5

I'm having a problem with my work that hopefully reduces to the following: I have two List<int>s, and I want to see if any of the ints in ListA are equal to any int in ListB. (They can be arrays if that makes life easier, but I think List<> has some built-in magic that might help.) And I'm sure this is a LINQ-friendly problem, but I'm working in 2.0 here.

My solution so far has been to foreach through ListA, and then foreach through ListB,

foreach (int a in ListA)
{
    foreach (int b in ListB)
    {
        if (a == b)
        {
            return true;
        }
    }
}

which was actually pretty slick when they were each three items long, but now they're 200 long and they frequently don't match, so we get the worst-case of N^2 comparisons. Even 40,000 comparisons go by pretty fast, but I think I might be missing something, since N^2 seems pretty naive for this particular problem.

Thanks!

A: 

How about using BinarySearch method instead of iterating over all elements in the inner loop?

shahkalpesh
Doesn't BinarySearch rely on the list being sorted? http://msdn.microsoft.com/en-us/library/w4e7fxsh.aspx
Fabrizio C.
+3  A: 

Load the whole of ListA into a HashSet instance, and then test foreach item in ListB against the HastSet: I'm pretty sure that this would be O(N).

//untested code ahead
HashSet<int> hashSet = new HashSet<int>(ListA);
foreach (int i in ListB)
{
    if (hashSet.Contains(i))
        return true;
}

Here's the same thing in one line:

return new HashSet<int>(ListA).Overlaps(ListB);

HashSet doesn't exist in .NET 3.5, so in .NET 2.0 you can use Dictionary<int,object> (instead of using HashSet<int>), and always store null as the object/value in the Dictionary since you're only interested in the keys.

ChrisW
Hashset was not introduced until .NET 3.5.
casperOne
Hashing in general is not a bad idea. It's not difficult to implement one if necessary.
PolyThinker
In that case, using .Net 2.0, you can use Dictionary<int,object> instead of HashSet (and always store null as the object/value in the Dictionary, since you're only interested in the keys).
ChrisW
A: 

Instead of iterating through each list, take a look at the List.Contains method:

foreach (int a in ListA)
{
  if (ListB.Contains(a))
    return true;
}
Metro Smurf
That's no better than the original solution: still O(N^2)
ChrisW
Teach me to post just before bed... in looking at the Contains method in more depth, it does indeed perform an internal iteration of the list. In this case, a Dictionary object is probably the better route.
Metro Smurf
+2  A: 

Chris gives an O(N) solution by hashing. Now, depending on the constant factor (due to hashing), it might be worth considering an O(N log(N)) solution by sorting. There are a few different variants that you may consider depending on your use case.

  1. Sort ListB ( O(N log(N) ), and use a searching algorithm to parse each element in ListA (which is again O(N) * O(log(N))).

  2. Sort both ListA and ListB ( O(N log(N) ), and use an O(N) algorithm to compare these lists for duplicates.

If both lists are going to be used more than once, the second method is preferred.

PolyThinker
+2  A: 

You are going to want to use a Dictionary here:

// The dictionary.
IDictionary<int, bool> dict = new Dictionary<int, bool>();

// Cycle through the first list.
foreach (int a in ListA)
{
  // Add the number if it isn't in the set already.
  if (!dict.ContainsKey(a))
  {
    // Add the number.
    dict.Add(a, true);
  }
}

// The list that has the intersection.
List<int> intersection = new List<int>();

// Cycle through b.
foreach (int b in ListB)
{
  // If the dictionary contains the item, add to the intersection.
  if (dict.ContainsKey(b))
  {
    // Add to the intersection.
    intersection.Add(b);
  }
}

Note that you can get duplicates in the intersection list if there are duplicate values in ListB. You can get around that easily though by using the same approach in the first loop and checking for it's existence already.

casperOne