views:

93

answers:

5

I have a doubt. Assume R and S are 2 relations with attributes A and B respectively . If I have a query

Select * 
From R, S
Where R.A = S.B 

Does this work like a double For Loop in say c or c++

For( i=0; i<n; i++)
    For( j=0; j<n; j++)
        if (i == j)
           //DO some work 
+1  A: 

Functionality-wise, yes. That's how it works. You can imagine it loops through all rows in both tables in a nested for loop and only selects the ones where the fields match.

Implementation-wise the situation is very different. Database engines use many kinds of optimization to speed up queries. How the database engine actually executes the query depends on many factors, such as the type of database engine, (very importantly) indexes, the amount of data, etc.

Matti Virkkunen
+2  A: 

First of all: there is no knowing how mysql will internally optimize the query (without knowing the internals of mysql).

In pure relational databases words, this is what you are doing:

SELECT * FROM R, S -> perform cross join, that generates all (r,s) tuples.

WHERE R.A = S.B -> now select those tuples that have this behaviour

So it will go over all tuples (more or less like your code). However, it is perfectly possible mysql will internally reduce this to a more efficient inner join that never creates all tuples but only the tuples where R.A=S.B is valid.

KillianDS
I should hope there *is* a way to know how MySQL will optimize the query. It's deterministic, isn't it? So someone knowledgeable about MySQL's internals should be able to describe what MySQL does. And it's open source, right? Anyone sufficiently dedicated should be able to study the program and *learn* what MySQL does.
Rob Kennedy
Okay yes, if you know the implementation you can know that :).
KillianDS
@Rob Kennedy: There absolutely is a way to know what MySQL will do -- EXPLAIN SELECT. Just by looking at the query, and knowing the indices, you can try to guess what MySQL will do, but you can still get it wrong. MySQL will actually look at the distribution of data inside the tables, as well as the query itself, and can make different decisions at different times, as the data changes.
Ian Clelland
+2  A: 

Yes, at least conceptually. The join creates the Cartesian cross of the elements in the two tables, which is what you are doing with your two loops, and then the Where clause restricts to those members of the Cartesian cross for which the condition is true. Of course the implementation won't actually create the entire Cartesian cross; it will use indexes to identify the matches without going through all the pairwise comparisons.

Dan Menes
+2  A: 

If there are no indexes on either of those attributes, then that is exactly what MySQL will have to do, and it can be very inefficient.

Having indexes makes all the difference in the world, though. If there is an index on S.B, for instance, then MySQL can do something more like this:

for (i=0; i<n_r; i++) { // loop over all rows in R
    matching_rows = retrieve_from_index_s_b(i); // very fast operation, like direct array access
    for (j=0; j<matching_rows.length(); j++)
        // do some work 
}

Similarly, if the index is on R.A instead, then the outer loop will be on rows in S, and the inner loop will be just on matching rows in R.

If there are indexes on both attributes, then MySQL can look at the amount of data in each table, and organize the loops so that the least amount of work is required. This is the job of the MySQL query optimizer, and it can do quite a bit of work to determine the proper order to look at tables, to minimize the number of disk accesses required.

As other people have mentioned already, though, SQL is primarily a declarative language, where you just say what results you want, without specifying how the database goes about getting those results. You can imagine that the database is always doing the full set of nested loops, if that helps you visualize the results, but as long as you have indexes set up correctly, it will usually be doing something smarter.

Ian Clelland
+1 for tweaking his example to show impact of using indexes.
MJB
+1  A: 

What you are describing is a nested loops join strategy. The optimiser might choose this or another join strategy (options available will depend on RDBMS here is a summary of some common join algorithms).

Which will be chosen will depend on a variety of issues including JOIN condition (e.g. some will only work for equijoins), whether the data is already sorted, amount of memory available, size of tables, availability of indexes etc.

Martin Smith