views:

116

answers:

2

I'm using LINQPad to learn LINQ and I've run into a stumbling block.

The goal is to get a list of Network Ids, Network Names and how many Stations each has.

Here is my original SQL:

SELECT n.iStationId AS NetworkID, n.sPrettyName AS NetworkName, COUNT(s.iStationID) AS StationCount
FROM T_StationInfo AS s, T_StationInfo as n
WHERE s.iNetworkId = n.iStationId
GROUP BY n.sPrettyName, n.iStationId
ORDER BY COUNT(s.iStationID) DESC

Here is my LINQ:

from s in T_stationInfo
from n in T_stationInfo
where s.INetworkID == n.IStationID
group s by s.INetworkID into stations
orderby stations.Count(x => x.INetworkID == stations.Key) descending
select new {
    NetworkId = stations.Key,
    NetworkName = T_stationInfo.Single(x => x.IStationID == stations.Key).SPrettyName,
    StationCount = stations.Count(x => x.INetworkID == stations.Key)
};

LINQ takes 5 times longer to execute. I'm looking at the SQL that the linq statement generates and it pulls in the t_stationInfo table 7 times.

I believe this is because I am misusing LINQ but I don't see where or how.

What LINQ statement would create equivalent SQL or, at least, SQL that isn't so poor performing?

A couple notes:

  1. The structure of the table/database can not be changed.
  2. This question is more about learning to use LINQ than getting the list of ids, names, and counts.
  3. I do appreciate it! :)

--EDIT--

Just to clarify the structure: Each row in the table is an entity that has various information (name, contact, etc) and can have a parent. Those parents are also in the table. In this case parents can't have parents. Their parent field is NULL or 0.

So to get the Name of the Parent of a Station(called Network in the table), I pull the station info table in twice and join the parent id (network id) to the entity id (station id) so that on a single row I have the station's info and the parent's info. Hence the two froms of the same table.

Did that make sense?

--EDIT2--

This is the sql generated by the original LINQ query:

SELECT [t2].[iNetworkID] AS [NetworkId], (
    SELECT [t5].[sPrettyName]
    FROM [t_stationInfo] AS [t5]
    WHERE (CONVERT(Decimal(29,4),[t5].[iStationID])) = [t2].[iNetworkID]
    ) AS [NetworkName], (
    SELECT COUNT(*)
    FROM [t_stationInfo] AS [t6], [t_stationInfo] AS [t7]
    WHERE ([t6].[iNetworkID] = [t2].[iNetworkID]) AND ([t2].[iNetworkID] = [t6].[iNetworkID]) AND ([t6].[iNetworkID] = (CONVERT(Decimal(29,4),[t7].[iStationID])))
    ) AS [StationCount]
FROM (
    SELECT [t0].[iNetworkID]
    FROM [t_stationInfo] AS [t0], [t_stationInfo] AS [t1]
    WHERE [t0].[iNetworkID] = (CONVERT(Decimal(29,4),[t1].[iStationID]))
    GROUP BY [t0].[iNetworkID]
    ) AS [t2]
ORDER BY (
    SELECT COUNT(*)
    FROM [t_stationInfo] AS [t3], [t_stationInfo] AS [t4]
    WHERE ([t3].[iNetworkID] = [t2].[iNetworkID]) AND ([t2].[iNetworkID] = [t3].[iNetworkID]) AND ([t3].[iNetworkID] = (CONVERT(Decimal(29,4),[t4].[iStationID])))
    ) DESC
A: 

Thanks to Alexander Taran's comment, I reinvestigated the group syntax and finally understood how to group by more than one field. That led me to this:

from s in T_stationInfo
from n in T_stationInfo
where s.INetworkID == n.IStationID
group s by new { s.INetworkID, n.SPrettyName } into stations
orderby stations.Count() descending
select new {
    NetworkId = stations.Key.INetworkID,
    NetworkName = stations.Key.SPrettyName,
    StationCount = stations.Count()
};

That generates the following SQL:

SELECT [t2].[iNetworkID] AS [NetworkId], [t2].[sPrettyName] AS [NetworkName], [t2].[value2] AS [StationCount]
FROM (
    SELECT COUNT(*) AS [value], COUNT(*) AS [value2], [t0].[iNetworkID], [t1].[sPrettyName]
    FROM [t_stationInfo] AS [t0], [t_stationInfo] AS [t1]
    WHERE [t0].[iNetworkID] = (CONVERT(Decimal(29,4),[t1].[iStationID]))
    GROUP BY [t0].[iNetworkID], [t1].[sPrettyName]
    ) AS [t2]
ORDER BY [t2].[value] DESC

The outer select looks like simple renaming and reordering. If I move the ORDER BY to the inside select and strip off the outer select, it runs at the same speed as my hand made SQL.

As a bonus, the new query seems much easier to understand.

-- EDIT --

In accordance with Justin Rusbatch's answer I pulled stations.Count() out into a let and it did have a small performace increase (about 10%). It also cleaned up the rendered sql somewhat.

As of now this is the best I have accomplished:

from station in T_stationInfo 
from network in T_stationInfo 
where station.INetworkID == network.IStationID 
group station by new { station.INetworkID, network.SPrettyName } into stations 
let stationCount = stations.Count()
orderby stationCount descending 
select new 
{ 
    NetworkId = stations.Key.INetworkID, 
    NetworkName = stations.Key.SPrettyName, 
    StationCount = stationCount
}; 

And that creates the following SQL:

SELECT [t2].[iNetworkID] AS [NetworkId], [t2].[sPrettyName] AS [NetworkName], [t2].[value] AS [StationCount]
FROM (
    SELECT COUNT(*) AS [value], [t0].[iNetworkID], [t1].[sPrettyName]
    FROM [t_stationInfo] AS [t0], [t_stationInfo] AS [t1]
    WHERE [t0].[iNetworkID] = (CONVERT(Decimal(29,4),[t1].[iStationID]))
    GROUP BY [t0].[iNetworkID], [t1].[sPrettyName]
    ) AS [t2]
ORDER BY [t2].[value] DESC

Which is a massive improvement over where I started.

Jere.Jones
+1  A: 

I don't how big of an impact this will have on your performance, if any. But when I look at your query I see one function declared twice:

stations.Count(s => s.INetworkID == stations.Key)

Does using a let clause improve performance at all?

from station in T_stationInfo
from network in T_stationInfo
where station.INetworkID == network.IStationID
group station by station.INetworkID into stations
let stationCount = stations.Count(x => x.INetworkID == stations.Key)
orderby stationCount descending
select new
{
    NetworkId = stations.Key,
    NetworkName = T_stationInfo.First(x => x.IStationID == stations.Key).sPrettyName,
    StationCount = stationCount
};

I feel like there should also be a better way to assign the NetworkName property, but I'm not sure.

Oh, and sorry for renaming the variables. I changed s to station and n to network to help me follow it a little better.

Justin R.