views:

1663

answers:

8

My question is not how to use inner join in sql. I know about how it matches between table a and table b.

I'd like to ask how is the internal working of inner working. What algorithm it involves? What happens internally when joining multiple tables?

A: 

In this case you should see how to saved data in b-tree after it i think you will understand JOIN algorithm.

Galkin
+2  A: 

There are different algorithms, depending on the DB server, indexes and data order (clustered PK), whether calculated values are joined or not etc.

Have a look at a query plan, which most SQL systems can create for a query, it should give you an idea what it does.

Lucero
any article to begin with?
henry
What DB engine?
Lucero
preferably MySql
henry
A: 

It creates a Cartesian Product of the two tables and then selects the rows out of it. Read Korth book on Databases for the same.

Geek
I'm pretty sure thats not true, producing a cartesian product would be very inefficient
codeulike
I'm pretty sure that this is what is there in the book written by Korth. Modern Databases might not do that.
Geek
If you do an full outer join with a condition which cannot be computed before joining, this *may* be what happens. But for inner joins, that would not make much sense, since it yields way too many records.
Lucero
+1  A: 

In MS Sql, different join algorithms will be used in different situations depending on the tables (their size, what sort of indexes are available, etc). I imagine other DB engines also use a variety of algorithms.

The main types of join used by Ms Sql are:
- Nested loops joins
- Merge joins
- Hash joins

You can read more about them on this page: Msdn -Advanced Query Tuning Concepts

If you get SQL to display the 'execution plan' for your queries you will be able to see what type of join is being used in different situations.

codeulike
Can you throw light on what Algorithms ?
Geek
See the msdn link in my answer. That page links to three further 'understanding ...' pages that outline the basic algorithm that SQL follows in each case.
codeulike
+1  A: 

It depends on what database you're using, what you're joining (large/small, in sequence/random, indexed/non-indexed etc).

For example, SQL Server has several different join algorithms; loop joins, merge joins, hash joins. Which one is used is determined by the optimizer when it is working out an execution plan. Sometimes it makes a misjudgement and you can then force a specific join algorithm by using join hints.

You may find the following MSDN pages interesting:
http://msdn.microsoft.com/en-us/library/ms191318.aspx (loop)
http://msdn.microsoft.com/en-us/library/ms189313.aspx (hash)
http://msdn.microsoft.com/en-us/library/ms190967.aspx (merge)
http://msdn.microsoft.com/en-us/library/ms173815.aspx (hints)

KristoferA - Huagati.com
A: 

All based set theory, been around a while. Try not to link too many table at any one time, seems to conk out database resources with all the scanning. Indices help with performance, look at some sql sites and search on optimising sql queries to get some insight. SQL Management Studio has some inbuilt execution plan utility that's often interesting, especially for large complex queries.

A: 

The optimizer will (or should) choose the fastest join algo.

However there are two different kinds of determining what is fast:

  1. You measure the time that it takes to return all the joined rows.
  2. You measure the time that it takes to return the first joined rows.

If you want to return all the rows as fast as possible the optimizer will often choose a hash join or a merge join. If you want to return the first few rows as fast as possible the optimzer will choose a nested loops join.

tuinstoel
A: 

Some DB join algorithms are described: http://iusoltsev.wordpress.com/home/individual-sql-and-cbo/cbo-access-path/