In just about any formally structured set of information, you read either from the start towards the end or, occasionally, from the end towards the beginning (street addresses, for example). But in SQL, especially in SELECT queries, you have to start in the middle, at the FROM clause, to properly understand what the query means. This can make long queries very difficult to read, especially if they contain nested SELECT queries.

Usually in programming, when something doesn't seem to make any sense, there's a historical reason behind it. Starting with the SELECT instead of the FROM doesn't make sense. Does anyone know the reason it's done that way?

+5  A: 

It's designed to be English-like. I think that's the primary reason.

As a side note, I remember the initial previews of LINQ were directly modeled after it (select ... from ...). This was changed in later previews to be more programming-language-like (so that the scope goes downwards). Anders Hejlsberg specifically mentioned this weird fact about SQL (which makes IntelliSense harder and doesn't match C# scope rules) as the reason they made this decision.

Anyhow, good or bad, it's what it is and it's too late to change anything.

Mehrdad Afshari
That's hardly an invention of LINQ, though. And I think the LINQ designers explicitly acknowledge the influence of XQuery, which I believe is the first "SQL-cousin" to put the clauses in the right order. LINQ just copied XQuery. After all, LINQ is not "SQL in C#" but rather "the Haskell Query Monad in C#, with syntax familiar to most programmers".
Jörg W Mittag
Yes. I was mentioning that Anders specified the weirdness of scope in SQL. It was mostly influenced by SQL in the beginning. It's fundamentally changed since that time.
Mehrdad Afshari
+8  A: 

I think the way in which a SQL statement is structured makes logical sense as far as English sentences are structured. Basically

I WANT THIS
FROM HERE
WHERE WHAT I WANT MEETS THESE CRITERIA

I don't think it makes much sense, in English at least, to say

FROM HERE
I WANT THIS
WHERE WHAT I WANT MEETS THESE CRITERIA
Russ Cam
It makes sense in a statement that simple. But when your query is 100 lines long, with 10 joins and two sub-selects, the "looks like English" approach stops making sense real quickly.
Mason Wheeler
On the other hand, it does make sense to say, "Go to the rack and get me all the hats with hatbands", which is exactly the order you want: "FROM rack SELECT hat WHERE has_hatband = 1". You could equally comfortably say, "Bring me all the hats from the rack that have hatbands", which is "SELECT hat FROM rack WHERE has_hatband = 1" --- oh, wait, that's regular SQL.
JasonFruit
@Mason - 100 lines of English can quickly become complicated too, and stop being a useful model for language design :) I think the goal was to lower the entrance barrier. It worked for me: when I first learned SELECT * FROM MyTable, I thought, "SQL is easy!". Oh, how naive I was...
RedFilter
+7  A: 

The SQL Wikipedia entry briefly describes some history:

During the 1970s, a group at IBM San Jose Research Laboratory developed the System R relational database management system, based on the model introduced by Edgar F. Codd in his influential paper, "A Relational Model of Data for Large Shared Data Banks". Donald D. Chamberlin and Raymond F. Boyce of IBM subsequently created the Structured English Query Language (SEQUEL) to manipulate and manage data stored in System R. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.

The original name explicitly mentioned English, explaining the syntax.

Digging a little deeper, we find the FLOW-MATIC programming language.

FLOW-MATIC, originally known as B-0 (Business Language version 0), is possibly the first English-like data processing language. It was invented and specified by Grace Hopper, and development of the commercial variant started at Remington Rand in 1955 for the UNIVAC I. By 1958, the compiler and its documentation were generally available and being used commercially.

FLOW-MATIC was the inspiration behind the Common Business Oriented Language (COBOL), one of the oldest programming languages still in active use. In keeping with that spirit, SEQUEL was designed with English-like syntax (the 1970s are modern compared with the 1950s and 1960s).

In perspective, "modern" programming systems still access databases using the age-old ideas behind

MULTIPLY PRICE BY QUANTITY GIVING COST.
gimel
good catch - I thought I remembered there being the word "English" in the original acronym, which goes some way to explain clause order in statements.
Russ Cam
Added historical perspective.
gimel
+7  A: 

I must disagree. SQL grammar is not inside-out.

From the very first look you can tell whether the query will SELECT, INSERT, UPDATE, or DELETE data (all the rest of SQL, e.g. DDL, omitted on purpose).


Back to your SELECT statement confusion: The aim of SQL is to be declarative. Which means you express WHAT you want and not HOW you want it. So it makes every sense to first state WHAT YOU WANT (list of attributes you're selecting) and then provide the DBMS with some additional info on where that should be looked up FROM.

Placing the WHERE clause at the end makes great sense too: Imagine a funnel, wide at the top, narrow at the bottom. By adding a WHERE clause towards the end of the statement, you are choking down the amount of resulting data. Applying restrictions to your query anywhere other than at the bottom would require the developer to turn their head around.


ORDER BY clause at the very end: once the data has gone through the funnel, sort it.

JOINS (JOIN criteria) really belong in the FROM clause.

GROUPING: basically running data through a funnel before it gets into another funnel.
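To make the funnel picture concrete, here is a minimal sketch using Python's built-in sqlite3 module. The hats table, its columns, and its rows are all made up for illustration; the comments just label each clause with the role described above.

```python
import sqlite3

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hats (style TEXT, has_hatband INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO hats VALUES (?, ?, ?)",
    [
        ("fedora", 1, 40.0),
        ("fedora", 1, 60.0),
        ("beanie", 0, 15.0),
        ("bowler", 1, 55.0),
    ],
)

query = """
    SELECT style, COUNT(*) AS n, AVG(price) AS avg_price  -- WHAT you want
    FROM hats                                             -- where to look it up
    WHERE has_hatband = 1                                 -- first funnel: drop rows
    GROUP BY style                                        -- second funnel: collapse rows
    ORDER BY avg_price DESC                               -- sort what came through
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('bowler', 1, 55.0), ('fedora', 2, 50.0)]
```

The beanie never reaches the grouping step because the WHERE clause has already filtered it out, and the sort is applied only to the two grouped rows that remain.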

SQL syntax is sweet. There's nothing inside out about it. Maybe that's why SQL is so popular even after so many decades. It's rather easy to grasp and to make sense out of. (Although I have once faced a 7-page (A4-size) SQL statement which took me quite a while to get my head around.)

Peter Perháč
Once? I face queries like that once or twice a week. The constant exposure to it is what led me to this question.
Mason Wheeler
+1  A: 

It's consistent with the rest of SQL's syntax of having every statement start with a verb (CREATE, DROP, UPDATE, etc.).

The major disadvantage of having the column list first is that it's inconvenient for auto-complete (as Hejlsberg has mentioned), but this wasn't a concern when the syntax was designed in the 1970s.

We could have had the best of both worlds with a syntax like SELECT FROM SomeTable: ColumnA, ColumnB, but it's too late to change it now.

Anyhow, SQL's SELECT statement order isn't unique. It exactly matches that of Python list comprehensions:

[(rec.a, rec.b) for rec in data if rec.a > 0]
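A runnable version of that comparison, as a small sketch: the Rec type and the sample data are assumptions added for illustration, and note that Python spells the filter keyword `if` rather than `where`. The output expression comes first (like the SELECT list), then the source (like FROM), then the filter (like WHERE).

```python
from collections import namedtuple

# Hypothetical record type and sample data, for illustration only.
Rec = namedtuple("Rec", ["a", "b"])
data = [Rec(1, "x"), Rec(-2, "y"), Rec(3, "z")]

# Roughly: SELECT a, b FROM data WHERE a > 0
result = [(rec.a, rec.b) for rec in data if rec.a > 0]
print(result)  # [(1, 'x'), (3, 'z')]
```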

dan04