How can I use LINQ to select the top value from each group?

I have a code segment like this:

var teams = new Team[]
{
    new Team{PlayerName="Ricky",TeamName="Australia", PlayerScore=234},
    new Team{PlayerName="Hussy",TeamName="Australia", PlayerScore=134},
    new Team{PlayerName="Clark",TeamName="Australia", PlayerScore=334},

    new Team{PlayerName="Sankakara",TeamName="SriLanka", PlayerScore=34},
    new Team{PlayerName="Udana",TeamName="SriLanka", PlayerScore=56},
    new Team{PlayerName="Jayasurya",TeamName="SriLanka", PlayerScore=433},

    new Team{PlayerName="Flintop",TeamName="England", PlayerScore=111},
    new Team{PlayerName="Hamirson",TeamName="England", PlayerScore=13},
    new Team{PlayerName="Colingwood",TeamName="England", PlayerScore=421}
};

Desired result:

Team Name         Player Name     Score
SriLanka          Jayasurya       433
England           Colingwood      421
Australia         Clark           334
+4  A: 

The following code gets the desired value:

foreach (Team team in teams
    .GroupBy(t => t.TeamName)
    .Select(ig => ig.MaxValue(t => t.PlayerScore)))
{
    Console.WriteLine(team.TeamName + " " + 
        team.PlayerName + " " + 
        team.PlayerScore);
}

It requires the following extension that I wrote earlier today:

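public static T MaxValue<T>(this IEnumerable<T> e, Func<T, int> f)
{
    if (e == null) throw new ArgumentNullException(nameof(e));
    if (f == null) throw new ArgumentNullException(nameof(f));
    // Dispose the enumerator once the scan is finished.
    using (var en = e.GetEnumerator())
    {
        if (!en.MoveNext()) throw new ArgumentException("Sequence contains no elements.", nameof(e));
        // Start with the first element as the current maximum.
        int max = f(en.Current);
        T maxValue = en.Current;
        while (en.MoveNext())
        {
            int possible = f(en.Current);
            if (max < possible)
            {
                max = possible;
                maxValue = en.Current;
            }
        }
        return maxValue;
    }
}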

The following gets the answer without the extension, but is slightly slower because it sorts each group instead of scanning it once:

foreach (Team team in teams
    .GroupBy(t => t.TeamName)
    .Select(ig => ig.OrderByDescending(t => t.PlayerScore).First()))
{
    Console.WriteLine(team.TeamName + " " + 
        team.PlayerName + " " + 
        team.PlayerScore);
}
Yuriy Faktorovich
Thank you very much, Yuriy, for showing a different approach.
A: 

I would suggest you first implement an extension method on IEnumerable<T> called Top. For example:

public static IEnumerable<T> Top<T, TKey>(this IEnumerable<T> target, Func<T, TKey> keySelector, int topCount)
{
    // Sort descending so that the highest keys come first.
    return target.OrderByDescending(keySelector).Take(topCount);
}

Then you can apply it to each group:

teams.GroupBy(team => team.TeamName).SelectMany(g => g.Top(team => team.PlayerScore, 1));

It might need slight modifications to compile (for example, the extension method must be declared in a static class).

Vitaliy
+6  A: 

This will require you to group by team name then select the max score.

The only tricky part is getting the corresponding player, but it's not too bad: just select the player with the max score. Of course, if it's possible for more than one player to have the same score, use the First() function as shown below rather than Single(), which throws when more than one element matches.

var x =
    from t in teams
    group t by t.TeamName into groupedT
    select new
    {
        TeamName = groupedT.Key,
        MaxScore = groupedT.Max(gt => gt.PlayerScore),
        MaxPlayer = groupedT.First(gt2 => gt2.PlayerScore ==
                        groupedT.Max(gt => gt.PlayerScore)).PlayerName
    };

FYI - I did run this code against your data and it worked (after I fixed that one, little data mistake).
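
If tied scores matter, a variation along these lines might list every player who shares a team's top score instead of only the first one. This is a rough sketch, not part of the tested query above: the anonymous objects stand in for the question's Team class, the tie between Ricky and Clark is made up for illustration, and topScorers and Players are hypothetical names.

```csharp
using System;
using System.Linq;

// Anonymous objects stand in for the question's Team class.
// The tie between Ricky and Clark is hypothetical, added to show the behaviour.
var teams = new[]
{
    new { PlayerName = "Ricky", TeamName = "Australia", PlayerScore = 334 },
    new { PlayerName = "Clark", TeamName = "Australia", PlayerScore = 334 },
    new { PlayerName = "Udana", TeamName = "SriLanka", PlayerScore = 56 },
};

var topScorers = teams
    .GroupBy(t => t.TeamName)
    .Select(g =>
    {
        // Compute the group's maximum once, then keep every player who matches it.
        int maxScore = g.Max(p => p.PlayerScore);
        return new
        {
            TeamName = g.Key,
            MaxScore = maxScore,
            Players = g.Where(p => p.PlayerScore == maxScore)
                       .Select(p => p.PlayerName)
                       .ToList()
        };
    })
    .ToList();

foreach (var row in topScorers)
{
    Console.WriteLine($"{row.TeamName}: {string.Join(", ", row.Players)} ({row.MaxScore})");
}
```

With the sample data above, the Australia row should contain both Ricky and Clark, while SriLanka contains only Udana.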

Michael La Voie
Thank you very much
A: 

You can do it without writing any extension methods of your own, because LINQ has a built-in Max:

teams.GroupBy(t => t.TeamName).Select(t => new { t.Key, Score = t.Max(p => p.PlayerScore) });

Results:

Key        Score
Australia  334
SriLanka   433
England    421
Slace
That doesn't give the player name.
Jon Skeet
Thank you very much
@Jon - yeah i noticed after I posted it. Couldn't be bothered taking it down though :P
Slace
+2  A: 

My answer is similar to Yuriy's, but using MaxBy from MoreLINQ, which doesn't require the comparison to be done by ints:

var query = from player in players
            group player by player.TeamName into team
            select team.MaxBy(p => p.PlayerScore);

foreach (Player player in query)
{
    Console.WriteLine("{0}: {1} ({2})",
        player.TeamName,
        player.PlayerName,
        player.PlayerScore);
}

Note that I've changed the type name from "Team" to "Player" as I believe it makes more sense - you don't start off with a collection of teams, you start off with a collection of players.
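
For readers on .NET 6 or later, Enumerable.MaxBy is also available in System.Linq itself, so the same idea should work without an extra assembly. The sketch below assumes that runtime; the anonymous objects stand in for a Player class, and best is a hypothetical variable name.

```csharp
using System;
using System.Linq;

// Anonymous objects stand in for a Player class with the same three properties.
var players = new[]
{
    new { PlayerName = "Ricky", TeamName = "Australia", PlayerScore = 234 },
    new { PlayerName = "Hussy", TeamName = "Australia", PlayerScore = 134 },
    new { PlayerName = "Clark", TeamName = "Australia", PlayerScore = 334 },
    new { PlayerName = "Sankakara", TeamName = "SriLanka", PlayerScore = 34 },
    new { PlayerName = "Udana", TeamName = "SriLanka", PlayerScore = 56 },
    new { PlayerName = "Jayasurya", TeamName = "SriLanka", PlayerScore = 433 },
    new { PlayerName = "Flintop", TeamName = "England", PlayerScore = 111 },
    new { PlayerName = "Hamirson", TeamName = "England", PlayerScore = 13 },
    new { PlayerName = "Colingwood", TeamName = "England", PlayerScore = 421 },
};

// Enumerable.MaxBy (System.Linq, .NET 6+) returns the element with the largest key.
// A group produced by GroupBy is never empty, so each call yields a player.
var best = players
    .GroupBy(p => p.TeamName)
    .Select(team => team.MaxBy(p => p.PlayerScore))
    .ToList();

foreach (var player in best)
{
    Console.WriteLine("{0}: {1} ({2})",
        player.TeamName,
        player.PlayerName,
        player.PlayerScore);
}
```

GroupBy yields groups in the order their keys first appear, so the rows come out as Australia, SriLanka, England rather than sorted by score.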

Jon Skeet
Well said, Jon; players would be the appropriate name. :)
Is MoreLINQ a separate assembly?
I think you have to use "player.TeamName" instead of "team.TeamName", don't you?
Thanks, yes - fixed. And yes, MoreLINQ is a separate assembly - it's an open source project with useful LINQ operators. See the linked page.
Jon Skeet
Wow, I had never heard of MaxBy(). Thanks!
Michael La Voie
Wow! There is no doubt that MoreLINQ will solve even more complex problems. Thanks for the link.
A: 

The implementation proposed by The Lame Duck is great, but it evaluates Max more than once: once for MaxScore, and again inside the First predicate for every element it examines, and each evaluation is an O(n) pass over the group. It would benefit from calculating MaxScore once and then reusing it. This is where the let keyword (which the compiler translates into a Select call) comes in handy. Here is the optimized query:

var x = from t in teams 
        group t by t.TeamName into groupedT 
        let maxScore = groupedT.Max(gt => gt.PlayerScore)
        select new 
        { 
           TeamName = groupedT.Key,
           MaxScore = maxScore, 
           MaxPlayer = groupedT.First(gt2 => gt2.PlayerScore == maxScore).PlayerName 
        };
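
If only the top player object is needed, one pass per group might be enough by using Aggregate to carry the best element seen so far. This is a sketch of a different technique from the let query above, not a replacement for it: the anonymous objects stand in for the question's Team class, and topPerTeam is a hypothetical name.

```csharp
using System;
using System.Linq;

// Anonymous objects stand in for the question's Team class.
var teams = new[]
{
    new { PlayerName = "Ricky", TeamName = "Australia", PlayerScore = 234 },
    new { PlayerName = "Clark", TeamName = "Australia", PlayerScore = 334 },
    new { PlayerName = "Udana", TeamName = "SriLanka", PlayerScore = 56 },
    new { PlayerName = "Jayasurya", TeamName = "SriLanka", PlayerScore = 433 },
};

// Aggregate without a seed starts from the group's first element and keeps
// whichever of (current best, next element) has the higher score.
// The strict '>' means the earliest player wins a tie.
var topPerTeam = teams
    .GroupBy(t => t.TeamName)
    .Select(g => g.Aggregate((top, next) => next.PlayerScore > top.PlayerScore ? next : top))
    .ToList();

foreach (var t in topPerTeam)
{
    Console.WriteLine($"{t.TeamName} {t.PlayerName} {t.PlayerScore}");
}
```

The seedless Aggregate overload throws on an empty sequence, which cannot happen here because GroupBy never produces an empty group.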
Drew Marsh