views:

1566

answers:

4

I recently asked a question about Self-Joins and I got a great answer.

The query is meant to find the ID, Start Date, and Price of Event2, Following Event1 by 1 Day.

The code WORKS fine. But I don't understand HOW.

Could someone explain as thoroughly as you can- what the different parts of the query are and what they do?

SELECT event2.id, event2.startdate, event2.price
FROM mm_eventlist_dates event1
JOIN mm_eventlist_dates event2 
ON event2.startdate = date_add(event1.enddate, INTERVAL 1 DAY)
WHERE event1.id=$id

I really appreciate your help, for whatever reason I'm having a really hard time wrapping my head around this.

+3  A: 

The way I'd try to understand this is to write out two lists on piece one labelled event1 and one event2. Then list a few records in each list (the lists will be indentical) now start at the WHERE in the description below.

We're taking data from two tables (OK the same table used twice, but thry ignore that for the moment)

FROM mm_eventlist_dates event1
JOIN mm_eventlist_dates event2

It probably helps to read the rest from the bottom up.

  WHERE event1.id=$id

So we want the record from event1 that has the specified record id. Presumably that's exactly one record. Now we figure out the day after that event ended.

 date_add(event1.enddate, INTERVAL 1 DAY)

Now that tells us the records from event2, they need to start on that date,

ON event2.startdate = date_add(event1.enddate, INTERVAL 1 DAY)

We now have two records identified, what fields do we want?

SELECT event2.id, event2.startdate, event2.price

Oh, just fields from the one whose start date we figured out.

djna
Thanks for your help with this.
Castgame
+3  A: 

When you create a join in a query you are literally doing that - joining 2 tables together. This can be 2 different or 2 of the same tables, it doesn't matter. When specifying a join, creating an alias for the table (a name that refers to it in the rest of the query) is useful if the tables are different, and essential if they are the same. Your query is taking table 1 (event1) which has the columns:

event1.id, event1.startdate, event1.price

and joining table 2 (event2):

event2.id, event2.startdate, event2.price

which leaves you with the result set:

event1.id, event1.startdate, event1.price, event2.id, event2.startdate, event2.price

The criteria for the join is specified as:

ON event2.startdate = date_add(event1.enddate, INTERVAL 1 DAY)

which is saying 'For each row in event1 join the row(s) in event2 that has a startdate of 1 day after the startdate in event1'

Then as you are only interested in the data for one event the where clause limits the result set.

WHERE event1.id=$id

Finally, as you don't need the information from event1 about the original event your select statement simply selects the event2 columns from the resultset:

SELECT event2.id, event2.startdate, event2.price
Macros
Thanks, your response was very helpful.
Castgame
+1  A: 

A self join works by referencing the same table twice using different selection criteria. Think of each reference to the table as a different "Virtual Table" created by filtering the original table. Generally, one of the tables is "filtered" using the WHERE clause and the second is "filtered" in the join clause. This is the most efficient way to do it, it is also possible to "filter" both in the join clause.

So we have two virtual tables based on data in the same underlying table and they are joined together as though they were two totally separate tables.

The crux of it is that you store data in one table that takes on slightly different meaning based on context.

    Consider a table of people, each with a unique id, and a column for father
    id   name    fatherID
    1    Joseph  [null]
    2    Greg     1

    SELECT child.name as childName, father.name as fatherName
        FROM people as child
        INNER JOIN people as father on (child.fatherID = father.id)  

    Would yield 1 row

    childName   fatherName
    Greg       Joseph
Scott
This is the simplest example I could ask for. Thank you very much.
Castgame
A: 

A join combines two tables based on a certain criteria. The criteria is what follows the ON clause.

If you join the a table with itself, it effectively is the same as creating a copy of the table, rename it and perform the join with that copy.

For example

Table foo         Table bar
+---+---+---+     +---+---+---+
| a | b | c |     | a | d | e |
+---+---+---+     +---+---+---+
| 1 | 2 | 3 |     | 1 | 0 | 0 |
+---+---+---+     +---+---+---+
| 1 | 3 | 4 |     | 2 | 9 | 3 |
+---+---+---+     +---+---+---+
| 1 | 3 | 5 |
+---+---+---+
| 2 | 4 | 6 |
+---+---+---+

If we do

select * from foo join bar on (foo.a = bar.a and foo.c > 4)

we end up with


foo join bar on (foo.a = bar.a and foo.c > 4)
+---+---+---+---+---+
| a | b | c | d | e |
+---+---+---+---+---+
| 1 | 3 | 5 | 0 | 0 |
+---+---+---+---+---+
| 2 | 4 | 6 | 9 | 3 |
+---+---+---+---+---+

Now,

SELECT event2.id, event2.startdate, event2.price 
FROM mm_eventlist_dates event1                     
JOIN mm_eventlist_dates event2 
ON event2.startdate = date_add(event1.enddate, INTERVAL 1 DAY)
WHERE event1.id=$id

that query follows the same principle, but with two instances of the table mm_eventlist_dates, one aliased as event1 and the other as event2. We can think of this like having two tables and perform the join just as in the real two tables scenario.

The join criteria in this case is that for table event2, the startdate matches the enddate plus one day of table event1.

The where clause restricts over what is the join performed, in this case it performs the join only over the rows of event1 table having the supplied id.

Vinko Vrsalovic