I have a class Agent with a property Id

Given a collection of Agents I need to check if any of them have duplicate Ids.

I am currently doing this with a hash table but am trying to get Linq-ified. What's a good way of doing this?

+1  A: 
foreach(var agent in Agents) {
    if(Agents.Count(a => a.ID == agent.ID) > 1)
        Console.WriteLine("Found: {0}", agent.ID);
}
Y Low
+1  A: 
bool b = list.Any(i => list.Any(j => j.ID == i.ID && j != i));

That's a bit of a brute-force approach but it works. There might be a smarter way to do it using the Except() extension method.

Edit: You didn't actually say that you needed to know which items are "duplicated", only that you needed to know whether any were. This will do the same thing, except that it gives you a list you can iterate over:

list.Where(i => list.Any(j => j.ID == i.ID && j != i))

I like the grouping approach too (group by ID and find the groups with count > 1).

Matt Hamilton
+5  A: 

Similar to Y Low's approach,

Edited:

 var duplicates = agents.GroupBy(a => a.ID).Where(a=>a.Count() > 1);

 foreach (var agent in duplicates)
 {
         Console.WriteLine(agent.Key.ToString());
 }
Codewerks
Hmm, GroupBy, interesting. Wouldn't this work then: bool b = (agents.GroupBy(a => a.Id)).Count() == agents.Count();
George Mauer
Just posted an update, just combine the two (GroupBy and Where) to get the key of the duplicate object...
Codewerks
Ha - comparing the number of groups to the number of elements. That's an inspired idea.
Matt Hamilton
I'm wondering whether grouping would be a bit less efficient than the Any() method, though, since Any() gives up as soon as it finds a match, whereas grouping has to visit every element.
Matt Hamilton
Matt... we'd have to test that, I guess. I just approached it like a SQL problem. I'm a db guy, so I love Linq for its SQL-like approach to problems like this...
Codewerks
Yeah from an SQL perspective it's GROUP BY vs WHERE NOT EXISTS.
Matt Hamilton
Glad you like it Matt. You're probably right about efficiency but I find this more readable.
George Mauer
As an aside, the Parallel Extensions to .NET are going to make stuff like this very interesting (and fast)...!
Codewerks
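On the efficiency question raised in the comments above (Any() stops at the first match, while grouping visits every element), one option that keeps the early exit without the nested loop is to track the IDs seen so far in a HashSet. This is only a minimal sketch: the Agent class with an int ID property and the DuplicateCheck / HasDuplicateIds names are assumptions made for illustration, not code from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the class from the question (int ID is a guess).
public class Agent
{
    public int ID { get; set; }
}

// Hypothetical helper, named for illustration only.
public static class DuplicateCheck
{
    // Returns true as soon as a repeated ID is encountered.
    public static bool HasDuplicateIds(IEnumerable<Agent> agents)
    {
        var seen = new HashSet<int>();
        // HashSet<T>.Add returns false when the value is already present,
        // so All() stops enumerating at the first repeated ID.
        return !agents.All(a => seen.Add(a.ID));
    }
}
```

Calling DuplicateCheck.HasDuplicateIds(agents) makes a single pass over the collection. Note that the lambda has a side effect (it mutates the set), which some people consider un-Linq-like; it is essentially the hash table approach from the question expressed as a query.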
A: 

My take (no counting!):

var duplicates = agents
  .GroupBy(a => a.ID)
  .Where(g => g.Skip(1).Any());
David B
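If only the duplicated IDs are needed rather than the groups themselves, the query above can be extended with a projection onto each group's key. A small sketch, assuming an Agent class with an int ID property; the DuplicateIds / Find names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the class from the question (int ID is a guess).
public class Agent
{
    public int ID { get; set; }
}

// Hypothetical helper, named for illustration only.
public static class DuplicateIds
{
    // Returns each ID that occurs on more than one agent, once per ID.
    public static List<int> Find(IEnumerable<Agent> agents)
    {
        return agents
            .GroupBy(a => a.ID)
            .Where(g => g.Skip(1).Any())   // true when the group has at least two members
            .Select(g => g.Key)            // keep only the duplicated ID
            .ToList();
    }
}
```

An empty result means no ID is shared, so DuplicateIds.Find(agents).Count > 0 also answers the original yes/no question.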
+2  A: 

For what it's worth, I just compared the two methods we've struck upon in this thread. First I defined a helper class:

public class Foo
{
    public int ID;
}

... and then made a big list of instances with a random ID:

var list = new List<Foo>();

var r = new Random();

for (int i = 0; i < 10000; i++) list.Add(new Foo { ID = r.Next() });

... and lastly, timed the code:

var sw = new Stopwatch();
sw.Start();
bool b = list.Any(i => list.Where(j => i != j).Any(j => j.ID == i.ID));
Console.WriteLine(b);
Console.WriteLine(sw.ElapsedTicks);

sw.Reset();
sw.Start();
b = (list.GroupBy(i => i.ID).Count() != list.Count);
Console.WriteLine(b);
Console.WriteLine(sw.ElapsedTicks);

Here's one output:

False
59392129
False
168151

So I think it's safe to say that grouping and then comparing the count of groups to the count of items is way, way faster than doing a brute-force "nested Any" comparison: roughly 350 times faster in the run shown here.

Matt Hamilton
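Two caveats apply to a measurement like the one above. Both checks printed False, so the list contained no duplicates and the nested Any could never exit early; that is its worst case. The first timed call also includes JIT compilation. The sketch below is one way to repeat the comparison with a warm-up run; the DuplicateTiming class and its method names are hypothetical, and no particular timings are claimed for it.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Foo
{
    public int ID;
}

// Hypothetical timing helper, named for illustration only.
public static class DuplicateTiming
{
    // Runs the check once untimed so JIT compilation is not measured,
    // then times a second run and returns the elapsed ticks.
    public static long TimeTicks(Func<bool> check, out bool result)
    {
        check();
        var sw = Stopwatch.StartNew();
        result = check();
        sw.Stop();
        return sw.ElapsedTicks;
    }

    // Brute-force check: stops at the first item that shares an ID with another item.
    public static bool NestedAny(List<Foo> list)
    {
        return list.Any(i => list.Any(j => j.ID == i.ID && j != i));
    }

    // Grouping check: fewer groups than items means some ID repeats.
    public static bool GroupCount(List<Foo> list)
    {
        return list.GroupBy(i => i.ID).Count() != list.Count;
    }
}
```

For example, DuplicateTiming.TimeTicks(() => DuplicateTiming.GroupCount(list), out b) times the grouping check. Running both checks on a list with a known duplicate near the front, as well as on a list with none, would show how much the early exit matters.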