When do we need a self join on a table?
Here is an example.
A self join is commonly used when you have a table with dates and you want to compare one date to another within the same table. By extension, it applies whenever you compare values of a field to a subset of values from the same field in the table. An article discussing this in the context of an Oracle database can be found here.
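As a rough sketch of the date-comparison case, the snippet below joins each row to the row dated one day earlier. The `Readings` table, its columns, and the sample data are hypothetical, and SQLite (through Python's `sqlite3` module) is used only so the example runs on its own; the `date()` call is SQLite-specific and other databases spell date arithmetic differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: one reading per calendar day.
    CREATE TABLE Readings (ReadingDate TEXT, Amount INTEGER);
    INSERT INTO Readings VALUES
        ('2024-01-01', 10),
        ('2024-01-02', 15),
        ('2024-01-03', 12);
""")

# Self join: "cur" is the current day, "prev" is the same table
# aliased again and matched on the previous calendar day.
day_over_day = conn.execute("""
    SELECT cur.ReadingDate, cur.Amount - prev.Amount AS Delta
    FROM Readings cur
    INNER JOIN Readings prev
        ON prev.ReadingDate = date(cur.ReadingDate, '-1 day')
    ORDER BY cur.ReadingDate
""").fetchall()

print(day_over_day)  # [('2024-01-02', 5), ('2024-01-03', -3)]
```

The first day has no predecessor, so the inner join drops it; a LEFT JOIN would keep it with a NULL difference.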
When you have child/parent relationships in the table.
CREATE TABLE Users(
    UserID INT,
    FName VARCHAR(50),
    LName VARCHAR(50),
    ManagerID INT
);

-- u is the user, um is the same table aliased as that user's manager
SELECT u.UserID,
       u.FName,
       u.LName,
       um.UserID AS ManagerID,
       um.FName AS ManagerFName,
       um.LName AS ManagerLName
FROM Users u INNER JOIN
     Users um ON u.ManagerID = um.UserID;
Use a self join to simplify nested SQL queries where the inner and outer queries reference the same table. Self joins are often used in subqueries. Sometimes GROUP BY can be used to avoid a self join.
This explains how subqueries and self joins relate: http://www.firstsql.com/tutor3.htm#self
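To illustrate how a nested query on the same table can be flattened, here is a minimal sketch that reuses the `Users` table from the earlier answer with made-up rows. It runs on SQLite via Python's `sqlite3` purely for convenience. The two queries return the same rows here because `UserID` is assumed to be unique; with duplicate IDs the join version could return repeated rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INT, FName VARCHAR(50),
                        LName VARCHAR(50), ManagerID INT);
    -- Sample rows are invented for illustration.
    INSERT INTO Users VALUES
        (1, 'Ann', 'Smith', NULL),
        (2, 'Bob', 'Jones', 1),
        (3, 'Cid', 'Brown', 1),
        (4, 'Dee', 'White', 2);
""")

# Nested form: inner and outer queries both read Users.
nested = conn.execute("""
    SELECT u.UserID
    FROM Users u
    WHERE u.ManagerID IN (SELECT m.UserID FROM Users m
                          WHERE m.LName = 'Smith')
    ORDER BY u.UserID
""").fetchall()

# Self-join form: the subquery becomes a second alias of Users.
joined = conn.execute("""
    SELECT u.UserID
    FROM Users u
    INNER JOIN Users m ON u.ManagerID = m.UserID
    WHERE m.LName = 'Smith'
    ORDER BY u.UserID
""").fetchall()

print(nested, joined)  # [(2,), (3,)] [(2,), (3,)]
```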
Self joins are common in (arguably poorly designed) tables that store more than one entity type. For example, in a typical EAV (Entity-Attribute-Value) design, queries tend to join to the same table many times.
In a system I'm currently working with, the majority of attributes aren't stored together in single rows at all; instead, the designers created a series of "property" tables, each with a structure like (ID, AttributeName, AttributeValue, StartDate, EndDate) - which means that typical queries will join these property tables once for each attribute needed. Not the most efficient design for batch processing and reporting, believe me!
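A small sketch of what "one join per attribute" looks like, assuming a single hypothetical `Properties` table with the (ID, AttributeName, AttributeValue, StartDate, EndDate) layout described above. The attribute names and data are invented, and SQLite via Python's `sqlite3` is used only to make it runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Properties (ID INT, AttributeName TEXT,
                             AttributeValue TEXT,
                             StartDate TEXT, EndDate TEXT);
    -- Invented rows: two entities, two attributes each.
    INSERT INTO Properties VALUES
        (1, 'Name', 'Alice', '2020-01-01', NULL),
        (1, 'City', 'Oslo',  '2020-01-01', NULL),
        (2, 'Name', 'Bob',   '2021-06-01', NULL),
        (2, 'City', 'Lima',  '2021-06-01', NULL);
""")

# Each attribute needed in the output costs one more alias of the
# same table: "n" supplies Name, "c" supplies City.
rows = conn.execute("""
    SELECT n.ID, n.AttributeValue AS Name, c.AttributeValue AS City
    FROM Properties n
    INNER JOIN Properties c
        ON c.ID = n.ID AND c.AttributeName = 'City'
    WHERE n.AttributeName = 'Name'
    ORDER BY n.ID
""").fetchall()

print(rows)  # [(1, 'Alice', 'Oslo'), (2, 'Bob', 'Lima')]
```

Pulling a third attribute would mean a third alias, which is where the cost for reporting queries comes from.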
Here are a few reasons off the top of my head:
- The table design is self-referencing.
  - This is typical of hierarchical structures such as employee -> manager relationships.
  - Double-entry transaction systems may store both legs of a transaction in the same table.
- Sometimes self joins are required purely as a result of the information required, having nothing to do with the actual design.
  - Gap-finding queries are a classic case: how-to-find-missing-data-rows-using-sql
  - Finding all customers that bought both product x and product y: Here's a similar problem
  - Another example where a self join could be used as a possible solution: select-the-x-closest-ids
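The "bought both product x and product y" case from the list above could be sketched as follows. The `Purchases` table and its rows are hypothetical, and SQLite via Python's `sqlite3` is only a convenient way to run the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical purchase log: one row per purchase.
    CREATE TABLE Purchases (CustomerID INT, Product TEXT);
    INSERT INTO Purchases VALUES
        (1, 'x'), (1, 'y'),
        (2, 'x'),
        (3, 'y'),
        (4, 'x'), (4, 'y'), (4, 'x');
""")

# One alias is filtered to product x, the other to product y; a
# customer survives the join only if both rows exist. DISTINCT
# collapses repeats caused by customers buying an item twice.
both = conn.execute("""
    SELECT DISTINCT px.CustomerID
    FROM Purchases px
    INNER JOIN Purchases py ON py.CustomerID = px.CustomerID
    WHERE px.Product = 'x' AND py.Product = 'y'
    ORDER BY px.CustomerID
""").fetchall()

print(both)  # [(1,), (4,)]
```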
I just used it to store and display a hierarchical menu system for a CMS without going all drupally and storing everything in some sort of JSON string thingy.
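For what it's worth, a menu stored this way might look like the sketch below. The `MenuItems` table, its columns, and the labels are all assumptions for illustration, not the actual CMS schema, and SQLite via Python's `sqlite3` is used just to make it runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical menu table: ParentID points at another row's ItemID.
    CREATE TABLE MenuItems (ItemID INT, Label TEXT, ParentID INT);
    INSERT INTO MenuItems VALUES
        (1, 'Home',     NULL),
        (2, 'Products', NULL),
        (3, 'Widgets',  2),
        (4, 'Gadgets',  2);
""")

# LEFT JOIN keeps top-level items, whose ParentID is NULL and
# therefore matches no parent row.
menu = conn.execute("""
    SELECT child.Label, parent.Label AS ParentLabel
    FROM MenuItems child
    LEFT JOIN MenuItems parent ON child.ParentID = parent.ItemID
    ORDER BY child.ItemID
""").fetchall()

for label, parent_label in menu:
    print(label, "<-", parent_label)
```

A single self join like this only resolves one level; deeper trees need one join per level or a recursive query.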