views:

33

answers:

3

I've written a short function for array intersection and wanted to know why one function is faster than the other.

1)

Dim list2() As String 'Assume it has values'
Dim list2length As Integer = list2.length

Function newintersect(ByRef list1() As String) As String()
    Dim intersection As New ArrayList
    If (list1.Length < list2length) Then
        'use list2'
        For Each thing As String In list2
            If (Array.IndexOf(list1, thing) <> -1) Then
                intersection.Add(thing)
            End If
        Next
    Else
        'use list1'
        For Each thing As String In list1
            If (Array.IndexOf(list2, thing) <> -1) Then
                intersection.Add(thing)
            End If
        Next
    End If
    Return intersection
End Function

2)

Dim list2() As String 'Assume it has values'
Dim list2length As Integer = list2.length

Function newintersect(ByRef list1() As String) As String()
    Dim intersection As New ArrayList
    If (list1.Length > list2length) Then 'changed >'
        'use list2'
        For Each thing As String In list2
            If (Array.IndexOf(list1, thing) <> -1) Then
                intersection.Add(thing)
            End If
        Next
    Else
        'use list1'
        For Each thing As String In list1
            If (Array.IndexOf(list2, thing) <> -1) Then
                intersection.Add(thing)
            End If
        Next
    End If
    Return intersection
End Function

3)

Dim list2() As String 'Assume it has values'
Dim list2length As Integer = list2.length

Function newintersect(ByRef list1() As String) As String()
    For Each thing As String In list1
        If (Array.IndexOf(list2, thing) <> -1) Then
            intersection.Add(thing)
        End If
    Next
    Return intersection
End Function

So for my testcase, 1 take 65 seconds, 3 takes 63 seconds, while 2 actually takes 75 seconds. Anyone know why 3 is the fastest? And why is 1 faster than 2?

(Sorry about the poor formatting...can't seem to paste it right)

A: 

I'd expect you'd find that with different test cases you can reverse the results you had above and reach a situation where 2 is fastest and 1 & 3 are slower.

It's difficult to comment without knowing the make-up of the test case, it'll depend on the location of 'intersecting' items within the two arrays - if these tend to be close the to front on one array and closer to the end of another then the nesting order of the array iteration / IndexOf will have markedly different performance.

BTW - there are better ways of performing an intersection - sorting one or other array and performing a BinarySearch is one means - using a Dictionary(Of String, ...) or similar is another - and either would result in far better performance.

Will A
+1  A: 

That's not much of a difference. Also, it doesn't seem like the methods would produce the same result, so it would be pointless to compare the performance, right?

Anyhow, the Array.IndexOf is not very efficient way to find items, and doesn't scale well. You should get a dramatic improvement if you use a hash key based collection as lookup instead, something like this:

Function newintersect(ByRef list1 As String(), ByRef list2 As String()) As String()
  Dim smaller As HashSet(Of String)
  Dim larger As String()
  If list1.Length < list2.Length Then
    smaller = New HashSet(Of String)(list1)
    larger = list2
  Else
    smaller = New HashSet(Of String)(list2)
    larger = list1
  End If
  Dim intersection As New List(Of String)
  For Each item As String In larger
    If smaller.Contains(item) Then
      intersection.Add(item)
    End If
  Next
  Return intersection.ToArray()
End Function
Guffa
Whoa...that made a big difference...takes only 3.3 seconds as compared to 65 seconds before=D. You did make a mistake though...it shouldn't have a "not"...I'm doing an intersection..."not" would give me all those that aren't both in each list. Also this requires .net 3.5+, and while that's okay for me, is there a similar method for .net 2.0?
@user389823: Yes, the intersection... then it makes better sense. :) In framework 2.0 there is no HashSet, so you have to use a Dictionary, where the value would be unused.
Guffa
A: 

This is from the MSDN documentation

    Dim id1() As Integer = {44, 26, 92, 30, 71, 38}
    Dim id2() As Integer = {39, 59, 83, 47, 26, 4, 30}

    ' Find the set intersection of the two arrays.
    Dim intersection As IEnumerable(Of Integer) = id1.Intersect(id2)

    For Each id As Integer In intersection
        Debug.WriteLine(id.ToString)
    Next
dbasnett
Would that be faster than using a hashset comparison?