Which SQL statement is faster?

SELECT TOP 2 c1.Price, c2.Price, ..... c49.Price, c50.Price
FROM Table1 AS c1, Table2 AS c2, ..... Table49 AS c49, Table50 AS c50
WHERE c1.Date = c2.Date AND c2.Date = c3.Date ..... c49.Date = c50.Date
ORDER BY c1.ID DESC

  OR

SELECT TOP 2 c1.Price, c2.Price, ..... c49.Price, c50.Price
FROM (Table1 AS c1
 INNER JOIN (Table2 AS c2
    ........
  INNER JOIN (Table49 AS c49
   INNER JOIN Table50 AS c50
   ON c49.Date = c50.Date)
    ........
  ON c2.Date = c3.Date)
 ON c1.Date = c2.Date)
ORDER BY c1.ID DESC

Basically I need to extract 2 rows from each table to produce a summary periodically. Which statement is faster?

+5  A: 

WHERE would usually be better, but the best approach is case by case: throw this into the profiler, or simpler yet, display the execution plan. Folk often have very strong opinions on which approach is fastest/best in theory, but there is no replacement for tuning against the data you actually deal with, as the applicable theory changes with your data load.

If you do not have real data in your app yet, try to create some realistic stress data; it will continue to be useful for testing. Then schedule time to tune once the application is live.
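For example, a minimal sketch comparing the plans for a two-table slice of the question's queries (EXPLAIN is MySQL syntax; in SQL Server the execution-plan display serves the same purpose):

-- Plan for the WHERE form...
EXPLAIN SELECT c1.Price, c2.Price
FROM Table1 AS c1, Table2 AS c2
WHERE c1.Date = c2.Date;

-- ...versus the plan for the JOIN form; identical output means identical work.
EXPLAIN SELECT c1.Price, c2.Price
FROM Table1 AS c1
INNER JOIN Table2 AS c2 ON c1.Date = c2.Date;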

dove
A: 

Avoid JOINs whenever possible. Performance-wise, JOIN statements are not efficient at all.

mnour
The WHERE clause also does a "join", just without using the JOIN keyword. Avoiding joins for performance reasons is done in the database model design phase, by denormalizing specific parts of your model.
Ruben
This answer is 100% wrong. With the right indexes JOINs are very efficient.
Mitch Wheat
I have never heard of a DBMS that has inefficient joins (when indexed).
Giovanni Galbo
And there's the issue of readability.
Romulo A. Ceccon
To be fair, I think this was true in some old versions of MySQL (before 3?). Of course, there are many of us who would contend that this bit of software wasn't really a database in any meaningful sense anyway :-). It's completely untrue of any modern RDBMS.
Cruachan
No, it's not 100% wrong! By default joins are not efficient; this is a fact. However, if you have two or more tables that are frequently joined together, then the columns used for the joins should have an appropriate index on all of those tables.
mnour
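A minimal sketch of that indexing, using the table and column names from the question (repeat for each of the 50 tables):

-- Index each table's join column so the equi-joins on Date can seek instead of scan.
CREATE INDEX ix_table1_date ON Table1 (Date);
CREATE INDEX ix_table2_date ON Table2 (Date);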
Read this: http://www.sql-server-performance.com/tips/tuning_joins_p1.aspx. The link shows that you need to do extra work to get good performance out of JOIN statements.
mnour
Did you notice this sentence in that link: "This comes back to the original statement, that the number of rows in a table can affect JOIN performance."
mnour
Performance-wise, your programmers will have a painful time keeping everything running smoothly if your database schema sucks. If your schema is good and your programming smooth but you need more performance, then get more indexes or more hardware.
Justice
mnour, when the page says "joins are not efficient" they refer to table joining in general. The question here refers to the syntax used for joining (SQL-92 joins vs. SQL-86 joins). Just because you match the rows in the WHERE clause doesn't mean you aren't doing a JOIN.
Romulo A. Ceccon
+3  A: 

You will probably find that the SQL optimising engine will generate the same internal query (if the logic is the same), and as a result there will be no difference.

As mentioned by others, run this through a profiler (such as Query Analyser) to determine the difference (if there is one).

Ady
+8  A: 

What is faster is not having 50 tables to start with. Joining 50 tables might be OK, but it's a highly counter-intuitive design and probably not the most maintainable solution.

Can you not store your data in rows (or columns) of a single table (or fewer tables) rather than 50 tables?
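For illustration, a hedged sketch of such a consolidation (the Prices table and SeriesId column are assumptions, not from the question):

-- One table holding all 50 series, distinguished by a SeriesId column.
CREATE TABLE Prices (
    SeriesId INT           NOT NULL,  -- which of the 50 original tables the row came from
    ID       INT           NOT NULL,
    Date     DATE          NOT NULL,
    Price    DECIMAL(10,2),
    PRIMARY KEY (SeriesId, ID)
);

The periodic summary then becomes one indexed query per series instead of a 50-way join.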

MarkR
yeah, interesting observation from the table names.
duckworth
Almost certainly weeks in a year, I guess, and probably being harvested from elsewhere. Certainly a consolidation script before he hits this point would seem sensible.
Cruachan
A: 

Normally the database optimises both statements, so the difference would not be that big. But you can verify this by comparing the explain plans for the two queries.

One thing that could possibly optimize the query with joins (I have not verified this) is to put extra constraints (non-join constraints) in the join statement, although this is not a recommended style, since it does not clearly separate the join conditions from the other conditions.

For example:

SELECT *
FROM A a
JOIN B b ON b.x = a.y
WHERE b.z = 'ok';

can be written as

SELECT *
FROM A a
JOIN B b ON b.x = a.y AND b.z = 'ok';
Ruben
I would say that the optimizer/execution plan can have a huge effect. If one results in a table scan and the other does not, it's a big deal. I really can't imagine, in the general case, that the optimizer would turn those statements into equivalent execution paths.
MikeJ
A: 

If you attach a screenshot of your query plans and a Profiler trace, I will be happy to let you know which is faster. There really isn't enough information to answer the question otherwise.

My gut feeling is that both have very similar performance in SQL Server and that SQL Server optimises both to the same query plan, but who knows; it's possible that a fifty-table join makes the optimiser go a little crazy.

I will in general stick to the JOIN syntax because I find it easier to read and maintain. The comma-style cross join is very prone to errors and extremely uncommon.

Sam Saffron
A: 

Thanks for answers guys.

I do not have access to Query Analyser since I am currently moving the database from MS Access, where I was doing a quick prototype, to MySQL. I believe Query Analyser is only available for SQL Server, but I may be wrong; either way, I can't attach a Profiler trace.

Each table is distinct (i.e. the values in it are unique, even though the column names may be the same) and is used separately to generate other objects, but I need to occasionally run a summary that collects rows from each table. So I believe I need 50 tables, although I haven't fleshed out the whole scheme of things and will therefore look into it. (P.S. I am new to databases and SQL, but not new to programming.) I also need to consider the ramifications on memory size if I were to put all the info into one table when only a small section of it will be used at a time.

However, from what I have gathered, the difference should not be major, since the two statements will probably be compiled to the same internal query. I asked the question wanting to know whether the internals would be different. I will run tests on actual data to find out.

By the way, will the performance of the two statements matter if we throw concurrent queries by multiple users into the equation?

A: 

You don't specify the expected volume of your tables, but be aware that if the queries do optimize to different query plans, then what is fastest with 100 rows in your table may not be the same as with 100,000 rows or more.

In fact there is usually little to be gained from obsessively optimizing queries against tables of fewer than 10,000 records, so long as you have reasonably sensibly designed indexes and queries. However, somewhere around 100,000 records the performance of badly optimized queries will start to degrade, generally catastrophically. The exact figure depends upon the row size and the amount of memory you have in the server, but it's not unusual to see performance degrade by an order of magnitude or more for a doubling of table size.

Generally, then, the best strategy is not to spend time on minor queries against smaller tables; the effort can usually be spent more profitably elsewhere. However, aggressively optimize any queries which run against your main tables if these are expected to grow beyond 10,000 rows. Usually this will mean using a QA instance loaded with 10 times the expected volume to check the actual behavior.
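For instance, a minimal sketch of bulking up MySQL test data, assuming a table shaped like those in the question (each run doubles the row count; the shifted dates keep the copies distinct):

-- Hypothetical: double Table1's rows by copying them a year into the future.
INSERT INTO Table1 (Date, Price)
SELECT DATE_ADD(Date, INTERVAL 1 YEAR), Price
FROM Table1;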

Cruachan
+1  A: 

All the talk about having fewer tables got me thinking (thanks, MarkR). I have been going through the MySQL documentation for the past couple of hours and realized that a better solution would be to create a new summary table to hold the initial results, and then create a trigger that updates the summary table whenever an insert happens on one of the tables the summary always touches.

Another idea I thought of is creating a view of the query. However, it seems that MySQL runs a view's underlying query every time the view is called. Am I right? Is there a way to make MySQL store the resultant table of a pre-executed view, and then use a trigger to tell the view when to refresh that table? Is there any RDBMS that does this?
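A minimal sketch of that trigger idea, assuming MySQL 5.0+ and guessing at names (the Summary table and trigger are hypothetical):

-- Hypothetical summary table fed by one AFTER INSERT trigger per source table.
CREATE TABLE Summary (
    SourceTable VARCHAR(32)  NOT NULL,
    Date        DATE         NOT NULL,
    Price       DECIMAL(10,2),
    PRIMARY KEY (SourceTable, Date)
);

DELIMITER //
CREATE TRIGGER trg_table1_summary
AFTER INSERT ON Table1
FOR EACH ROW
BEGIN
    -- REPLACE keeps the latest price per (source table, date).
    REPLACE INTO Summary (SourceTable, Date, Price)
    VALUES ('Table1', NEW.Date, NEW.Price);
END//
DELIMITER ;

One such trigger per source table; the periodic summary then reads Summary directly.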

SQL Server has the concept of an indexed view; I'm not sure if MySQL has an equivalent.
Ady
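For reference, a minimal sketch of a SQL Server indexed view (hypothetical names; real indexed views carry further restrictions such as schema binding and two-part table names, and the unique clustered index here assumes Date is unique):

-- Materialize the view by giving it a unique clustered index.
CREATE VIEW dbo.SummaryView WITH SCHEMABINDING AS
SELECT Date, Price FROM dbo.Table1;
GO
CREATE UNIQUE CLUSTERED INDEX IX_SummaryView ON dbo.SummaryView (Date);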
A: 

Optimizing the join order takes time exponential in the number of tables, so each database engine simply picks a small number of possible join orders and estimates the best one out of those.

It seems like you will always want to join ... on c1.Date = cn.Date for all n (see the sketch below).

You will also want to get rid of the extremely odd database schema you have.
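A sketch of that join shape, keeping the elisions from the question:

SELECT c1.Price, c2.Price, ..... c50.Price
FROM Table1 AS c1
INNER JOIN Table2 AS c2 ON c2.Date = c1.Date
.....
INNER JOIN Table50 AS c50 ON c50.Date = c1.Date
ORDER BY c1.ID DESC

This joins every table directly back to c1, which is what the per-n condition above describes.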

Justice
A: 

What happened when you tried?

I mean seriously, Query Analyzer has a little timer for a reason. Different query structures sometimes give wildly different execution times, often with no intuitive reason behind it.

Write both queries. Test them. Then come back and answer your own question.
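For instance, in SQL Server a crude timing run looks like this (in MySQL, the client's reported elapsed time serves the same purpose):

-- Report parse/compile and execution times for each statement that follows.
SET STATISTICS TIME ON;
-- run the WHERE version here, then the JOIN version, and compare elapsed times
SET STATISTICS TIME OFF;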

Jason Kester