views:

287

answers:

4

I have a basically sql select question that people gave me different answers over the years. Say I have a couple of tables designed each with over 40 columns and potentially will hold ten and thousands of row, I'm using SqlServer2005.

On joining these tables, in the where clause if I have things like

select * from t1, t2
where t1.UserID = 5 
and t1.SomeID = t2.SomeOtherID

some people say you should alwasys have the constant "t1.UserID = 5" up front rather than after the "t1.SomeID = t2.SomeOtherID", it boosts the select performance. While others say it doesn't matter.

What is the correct answer?

Also, if I use ADO.NET Entity Framework to implement my DAL, will modeling tables that have over 40 columns and doing CRUD operations be a performance issue to it?

Thank you,

Ray.

+4  A: 

In general, with database optimization, you should write SQL which is conceptually correct first, and then tweak performance if profiling shows it to be necessary. When doing an inner join, it is better to use SQL-92, explicit INNER JOINs than Cartesian products. So I would begin by writing your SQL as follows:

SELECT * 
FROM t1
  INNER JOIN t2
    ON t1.SomeID = t2.SomeOtherID
WHERE
  t1.UserID = 5

The t1.SomeID = t2.SomeOtherID that goes in the ON part of the INNER JOIN, because it expresses the relationship between the two tables. The UserID that goes in the WHERE clause because it is a filter to limit the result set. Writing your SQL in this way gives more information to the database optimizer, because it expresses your intentions about the join versus the filtering.

Now IF you are not getting acceptable performance with this syntax in a real-world database, then do feel free to experiment with moving bits around. But like I said, start with something which is conceptually correct.

With regards to the second part of your question, the most obvious performance implication is that when you select a collection of entities, the Entity Framework needs to bring back all properties for the entities it is materializing. So if you have 40 columns, then you will be pulling that data back over the wire, if you materialize them as entities. It is, however, possible to write LINQ queries which return anonymous types containing only the columns you need. However, to do full CRUD, you will need to return entities.

Craig Stuntz
+1  A: 

I know this answer is kind of trite, but I would suggest writing benchmarks. Whip up a console app and test it out yourself. Run the query a couple hundred times and see how long it takes for each way.

There is a lot of superstition when it comes to SQL query performance and optimization. Some people do things thinking it is faster but they don't actually check their facts. Also, the way EF or LinqToSql work and interact with the DB may introduce performance differences not evident in SQL.

If you're optimizing code you may also want to use a profiler like RedGate ANTS. Its not free, but it can help a lot to find bottlenecks in your code. Then you can find places in your code to optimize much easier. It's not always your database slowing your apps down. Or sometimes you're executing a fast query, but doing it a jillion times when you could actually be caching the result.

WillCodeForCoffee
+2  A: 

People's opinion on this will change over time because RDBMS query optimisation has evolved over time, and different RDBMSs will have different approaches. I can't speak for every syste out there but it's really unlikely that in 2008 this is going to make any difference. YMMV if you are interested only in a specific system.

I can tell you that for any recent version of Oracle it makes no difference.

David Aldridge
A: 

Firstly, construct the query using an explicit JOIN syntax, rather than the cartesian product. It probably won't make any difference performance-wise for any modern optimiser, but it does make the information on how the JOINs work more accessible for the programmers.


SELECT Player.Name, Game.Date
 FROM Player
  INNER JOIN Game ON Game.WinnerPlayerID = Player.PlayerID
 WHERE Game.WinnerFrags > Game.TotalFrags/2
 ORDER BY Player.Name

Which will give us all the players sorted by name who have take more frags in a game than all the other players in the game put together, and the dates of the games. Putting both the conditions are in the JOIN probably won't affect performance either, since the optimiser is likely do the filtering as part of the JOIN anyway. It does start to matter for LEFT JOINs though. Lets say we're looking for how many games the week's top ten players have ever won by the margin described above. Since it is possible that some of them have never one this spectacularly, we'll need LEFT JOIN.


SELECT Player.WeekRank, Player.Name, COUNT(Game.*) AS WhitewashCount
 FROM Player
  LEFT JOIN Game ON Game.WinnerPlayerID = Player.PlayerID
 WHERE Player.WeekRank >= 10
  AND Game.WinnerFrags > Game.TotalFrags/2
 GROUP BY Player.WeekRank, Player.Name
 ORDER BY Player.WeekRank

Well, not quite. The JOIN will return records for each game played by a player, or the player data and NULL game data if the player has played no games. These results will get filtered, during or after the JOIN depending on the optimiser's decision, based on the frag criteria. This will eliminate all the records that don't meet the frag criteria. So there will be no records to group for players who have never had such a spectacular win. Effectively creating an INNER JOIN .... FAIL.


SELECT Player.WeekRank, Player.Name, COUNT(Game.*) AS WhitewashCount
 FROM Player
  LEFT JOIN Game ON Game.WinnerPlayerID = Player.PlayerID
   AND Game.WinnerFrags > Game.TotalFrags/2
 WHERE Player.WeekRank >= 10
 GROUP BY Player.WeekRank, Player.Name
 ORDER BY Player.WeekRank

Once we move the frag criteria into the JOIN the query will behave correctly, returning records for all players in the week's top ten, irrespective of whether they've achieved a whitewash.

After all of that, the short answer is:

For INNER JOIN situations it probably doesn't make a to performance difference where you put the conditions. The queries are more readable if you separate the the join and filtering conditions though. And getting a condition in the wrong place can seriously mess up the results of a LEFT JOIN.

Bell