views:

96

answers:

3

I have a large list of values (100-200 character strings) and I need to return a distinct listing of them. What is the most efficient way to do this using .NET? The 2 ways that I can think of are:

  1. Use the Distinct() method of the IEnumerable class
  2. Use a Dictionary

If the Dictionary approach is faster in raw terms, consider a trade-off decision around maintainability of code.

+1  A: 

I would siggest you to use profiling here. Generate a list with sample items, sort it say 1M times using both ways, and measure the time used by each way.

If readability is a concern, create a GetDistinctItems method and put your code inside it: voilà, self-documented code.

Konamiman
+2  A: 

I would personally go with the Distinct() method provided by LINQ. It's far easier to read and maintain. Whilst using LINQ will be slower than using a dictionary the difference will be small (in the case you've listed) and you'd be better spending time optimizing database queries or web service calls.

Kane
+6  A: 

I would expect Enumerable.Distinct to be about as fast as using a dictionary if you're only doing it once. If you want to be able to add/remove values and keep the distinct-ness, you could build a HashSet<string> (which is basically what I expect Distinct is doing under the hood, but Distinct() will obviously return new values as it finds them, maintaining order.

In fact, just using:

HashSet<string> distinctItems = new HashSet<string>(list);

will be a pretty good (and simple) solution if you don't mind the ordering being messed up. It's simpler than using a Dictionary, and conceptually cleaner as well (as you don't really want to map keys to values).

(As ever, I would suggest finding the most readable solution first, and benchmark it - if it's "fast enough" then go with that. If you want to use this as part of another query, then Distinct may well be the most readable way. Otherwise, I'd suggest HashSet.)

Jon Skeet