ansaurus

Question

Finding distinct lines in large datatables

Answer 1

+1 A:

Using a Dictionary rather than a List will be quicker:

    Dim seen As New Dictionary(Of String, String)
    ...
        If Not seen.ContainsKey(value) Then
            seen.Add(value, "")
        End If

When you search a List, each lookup compares value against the entries one by one, so by the end of the process you're doing ~124K comparisons for each record. A Dictionary, on the other hand, hashes the key, so each lookup takes roughly constant time no matter how many entries it already holds.

When you want to return the list of unique values, use seen.Keys.
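To put the pieces above together, here is a minimal sketch of the whole loop, assuming the values come from a single DataTable column. The helper name GetDistinctValues, the column name "Line", and the sample rows are hypothetical and not from the question; a Boolean is used as the placeholder value in place of the empty string, which is only a matter of taste.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module DistinctValuesSketch

    ' Hypothetical helper: collects the distinct values of one column.
    ' Only .NET 2.0 types are used (no HashSet, no LINQ).
    Function GetDistinctValues(ByVal table As DataTable, ByVal columnName As String) As List(Of String)
        ' The Dictionary is used purely for its keys; the Boolean is a placeholder.
        Dim seen As New Dictionary(Of String, Boolean)

        For Each row As DataRow In table.Rows
            ' Convert.ToString turns DBNull into an empty string,
            ' so the key is never Nothing.
            Dim value As String = Convert.ToString(row(columnName))

            ' Hashed lookup instead of a linear scan through a List.
            If Not seen.ContainsKey(value) Then
                seen.Add(value, True)
            End If
        Next

        ' Copy the keys into a List for the caller.
        Return New List(Of String)(seen.Keys)
    End Function

    Sub Main()
        ' Sample data, invented for illustration only.
        Dim table As New DataTable()
        table.Columns.Add("Line", GetType(String))
        table.Rows.Add("alpha")
        table.Rows.Add("beta")
        table.Rows.Add("alpha")

        ' Each distinct value is printed once.
        For Each value As String In GetDistinctValues(table, "Line")
            Console.WriteLine(value)
        Next
    End Sub

End Module
```

Don't rely on the order in which seen.Keys comes back: the Dictionary documentation does not guarantee any ordering, so sort the resulting list if order matters.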

(Note that you'd ideally use a Set type for this, but .NET 2.0 doesn't have one; HashSet(Of T) only arrived in .NET 3.5.)

RichieHindle 2010-10-06 10:28:14

That seems to have gotten it down to under 1 second. Looks like that was one massive bottleneck!

themaninthesuitcase 2010-10-06 10:49:12
