views:

1161

answers:

12

In my last job we worked on a very database heavy application and I developed some formatting standards so that we would all write SQL with a common layout. We also developed coding standards but these are more platform specific so I'll not go into them here.

I'm interested to know what other people use for SQL formatting standards. Unlike most other coding environments as I haven't found much of a consensus online for them.

To cover the main query types:

select
    ST.ColumnName1,
    JT.ColumnName2,
    SJT.ColumnName3
from 
    SourceTable ST
inner join JoinTable JT
    on JT.SourceTableID = ST.SourceTableID
inner join SecondJoinTable SJT
    on ST.SourceTableID = SJT.SourceTableID
    and JT.Column3 = SJT.Column4
where
    ST.SourceTableID = X
    and JT.ColumnName3 = Y

There was some disagreement about line feeds after "select", "from" and "where". The intention on the select line is to allow other operators such as "top X" without altering the layout. Following on from that simply keeping a consistent line feed after the key query elements seemed to result in a good level of readability.

Dropping the linefeed after the "from" and "where" would be an understandable revision. However in queries such as the "update" below we see that the line feed after the "where" gives us good column alignment. Similarly a linefeed after "group by" or "order by" keeps our column layouts clear and easy to read.

update
    TargetTable
set
    ColumnName1 = @value,
    ColumnName2 = @value2
where
    Condition1 = @test

Finally an insert:

insert into TargetTable (
    ColumnName1,
    ColumnName2,
    ColumnName3
) values (
    @value1,
    @value2,
    @value3
)

For the most part these don't deviate that far from the way MS sql managements studio / query analyser write out SQL, however they do differ.

I look forward to seeing whether there is any consensus in the stackoverflow community on this topic. I'm constantly amazed how many developers can follow standard formatting for other languages and suddenly go so random when hitting SQL.

A: 

Not really answering your question, but SQLInform is a useful tool if you come across a lot of badly formatted SQL.

Galwegian
+4  A: 

Hi,

I am of the opinion that so long as you can read the source code easily, the formatting is secondary. So long as this objective is achieved then there are a number of good layout styles that can be adopted.

The only other aspect that is important to me is that whatever coding layout/style you choose to adopt in your shop, ensure that it is consistently used by all coders.

Just for your reference, here is how I would present the example you provided, just my layout preference. Of particular note, the ON clause is on the same line as the join, only the primary join condition is listed in the join (i.e the key match) and other conditions are moved to the where cluase.

select
    ST.ColumnName1,
    JT.ColumnName2,
    SJT.ColumnName3
from 
    SourceTable ST
inner join JoinTable JT on 
    JT.SourceTableID = ST.SourceTableID
inner join SecondJoinTable SJT on 
    ST.SourceTableID = SJT.SourceTableID
where
        ST.SourceTableID = X
    and JT.ColumnName3 = Y
    and JT.Column3 = SJT.Column4

One tip, get yourself a copy of SQL Refactor from RedGate. You can customise the tool to use your desired layout preferences, and then the coders in your shop can all use it to ensure the same coding standards are being adopted by everyone.

Cheers, John

John Sansom
+1  A: 

Sql Prompt Pro is also a great tool to format SQL, as you type and existing statements. You can by and large configure the tool to format the SQL according to predefined standards

Conrad
A: 

If I am making changes to already written T-SQL, then I follow the already used convention (if there is one).

If I am writing from scratch or there is no convention, then I tend to follow your convention given in the question, except I prefer to use capital letters for keywords (just a personal preference for readability).

I think with SQL formatting as with other code format conventions, the important point is to have a convention, not what that convention is (within the realms of common sense of course!)

Russ Cam
A: 

Yeah I can see the value of laying out your sql in some rigourously defined way, but surely the naming convention and your intent are far more important. Like 10 times more important.

Based on that my pet hates are tables prefixed by tbl, and stored procedures prefixed by sp - we know they're tables and SPs. Naming of DB objects is far more important than how many spaces there are

Just my $0.02 worths

MrTelly
A: 

Nice. As a Python programmer, here are my preferences:

Newlines after "select", "from" and "where" only when it is needed for readability.

When code code can be more compact and equally readable, I usually prefer the more compact form. Being able to fit more code in one screenful improves productivity.

select ST.ColumnName1, JT.ColumnName2, SJT.ColumnName3
from SourceTable ST
inner join JoinTable JT
    on JT.SourceTableID = ST.SourceTableID
inner join SecondJoinTable SJT
    on ST.SourceTableID = SJT.SourceTableID
    and JT.Column3 = SJT.Column4
where ST.SourceTableID = X and JT.ColumnName3 = Y

Ultimately, this will be a judgment call that will be made during code review.

For insert, I would place the parens differently:

insert into TargetTable (
    ColumnName1,
    ColumnName2,
    ColumnName3)
values (
    @value1,
    @value2,
    @value3)

The reasoning for this formatting is that if SQL used idendation for block structure (like Python), then the parenthesis would not be needed. So if indentation is used anyway, then parens should have the minimum effect on the layout. This is achieved by placing them at the end of the lines.

ddaa
I would prefer:insert into TargetTabel(ColumnName1, ColumnName2, ColumnName3This way I could remark out one of the columns without having to reformatinsert into TargetTable(ColumnName1--, ColumnName2, ColumnName3
Jeff O
A: 

I tend to use a layout similar to yours although I even go a few steps further, e.g.:-

select
        ST.ColumnName1
    ,   JT.ColumnName2
    ,   SJT.ColumnName3
from
                SourceTable     ST

    inner join  JoinTable       JT
        on  JT.SourceTableID    =   ST.SourceTableID

    inner join  SecondJoinTable SJT
        on  ST.SourceTableID    =   SJT.SourceTableID

where
        ST.SourceTableID    =   X
    and JT.ColumnName3      =   Y
    and JT.Column3          =   SJT.Column4

Perhaps it looks a little over the top at first but IMHO the use of tabulation in this way gives the cleanest, most systematic layout given the declarative nature of SQL.

You'll probably end up with all sorts of answers here, in the end it's down to personal or team-agreed preference .

AdamRalph
I prefer to have the comma at the beginning. It makes it easier to remark out the field --,ST.ColumnName1. The same is true in the Where clause with ands and ors.
Jeff O
+1  A: 

I use a format similar to yours except that I put the "ON" keyword on the same line as the join and I put "AND" and "OR" operators at the end of lines so that all of my join/selection criteria line up nicely.

While my style is similar to John Sansom's I disagree about putting join criteria in the WHERE clause. I think that it should be with the joined table so that it's organized and easy to find.

I also tend to put parentheses on new lines, aligned with the line above it and then indenting on the next line, although for short statements I may just keep the parentheses on the original line. For example:

SELECT
     my_column
FROM
     My_Table
WHERE
     my_id IN
     (
          SELECT
               my_id
          FROM
               Some_Other_Table
          WHERE
               some_other_column IN (1, 4, 7)
     )

For CASE statements I give a new line and indentation for each WHEN and ELSE and I align the END back to the CASE:

CASE
     WHEN my_column = 1 THEN 'one'
     WHEN my_column = 2 THEN 'two'
     WHEN my_column = 3 THEN 'three'
     WHEN my_column = 4 THEN 'four'
     ELSE 'who knows'
END
Tom H.
+1  A: 

I'm late to the party, but I'll just add my preferred formatting style, which I must've learned from books and manuals: it's compact. Here's the sample SELECT statement:

SELECT  st.column_name_1, jt.column_name_2,
        sjt.column_name_3
FROM    source_table AS st
        INNER JOIN join_table AS jt USING (source_table_id)
        INNER JOIN second_join_table AS sjt ON st.source_table_id = sjt.source_table_id
                AND jt.column_3 = sjt.column_4
WHERE   st.source_table_id = X
AND     jt.column_name_3 = Y

In short: 8-space indentation, keywords in caps (although SO colours them better when in lowercase), no camelcase (pointless on Oracle), and line wraps when needed.

The UPDATE:

UPDATE  target_table
SET     column_name_1 = @value,
        column_name_2 = @value2
WHERE   condition_1 = @test

And the INSERT:

INSERT  INTO target_table (column_name_1, column_name_2,
                column_name_3)
VALUES  (@value1, @value2, @value3)

Now, let me be the first to admit that this style has it's problems. The 8-space indent means that ORDER BY and GROUP BY either misalign the indent, or split the word BY off by itself. It would also be more natural to indent the entire predicate of the WHERE clause, but I usually align following AND and OR operators at the left margin. Indenting after wrapped INNER JOIN lines is also somewhat arbitrary.

But for whatever reason, I still find it easier to read than the alternatives.

I'll finish with one of my more complex creations of late using this formatting style. Pretty much everything you'd encounter in a SELECT statement shows up in this one. (It's also been altered to disguise its origins, and I may have introduced errors in so doing.)

SELECT  term, student_id,
        CASE
            WHEN ((ft_credits > 0 AND credits >= ft_credits) OR (ft_hours_per_week > 3 AND hours_per_week >= ft_hours_per_week)) THEN 'F'
            ELSE 'P'
        END AS status
FROM    (
        SELECT  term, student_id,
                pm.credits AS ft_credits, pm.hours AS ft_hours_per_week,
                SUM(credits) AS credits, SUM(hours_per_week) AS hours_per_week
        FROM    (
                SELECT  e.term, e.student_id, NVL(o.credits, 0) credits,
                        CASE
                            WHEN NVL(o.weeks, 0) > 5 THEN (NVL(o.lect_hours, 0) + NVL(o.lab_hours, 0) + NVL(o.ext_hours, 0)) / NVL(o.weeks, 0)
                            ELSE 0
                        END AS hours_per_week
                FROM    enrollment AS e
                        INNER JOIN offering AS o USING (term, offering_id)
                        INNER JOIN program_enrollment AS pe ON e.student_id = pe.student_id AND e.term = pe.term AND e.offering_id = pe.offering_id
                WHERE   e.registration_code NOT IN ('A7', 'D0', 'WL')
                )
                INNER JOIN student_history AS sh USING (student_id)
                INNER JOIN program_major AS pm ON sh.major_code_1 = pm._major_code AND sh.division_code_1 = pm.division_code
        WHERE   sh.eff_term = (
                        SELECT  MAX(eff_term)
                        FROM    student_history AS shi
                        WHERE   sh.student_id = shi.student_id
                        AND     shi.eff_term <= term)
        GROUP   BY term, student_id, pm.credits, pm.hours
        )
ORDER   BY term, student_id

This abomination calculates whether a student is full-time or part-time in a given term. Regardless of the style, this one's hard to read.

yukondude
+2  A: 

I use Red Gate SQL ReFactor within SSMS, but another tool that does reformating (and is a replacement for SSMS) is Apex's SQL Edit. If you're looking to post code on-line there's The Simple-Talk SQL Prettifier.

K. Brian Kelley
+1 for the links
aberrant80
+1  A: 

I realise I am very late to this debate but I would like to give my thoughts. I am definitely in favour of commas at the start of the line. Like you say AdamRalph it's easier to comment out a field and I also find it is more difficult to miss out a comma accidently when they are at the beginning, whilst this doesn't sound like a major issue I have spent hours in the past trying to track down accidental syntax errors in lengthy T-SQL procedures where I have accidently missed out a comma at the end of the line (I'm sure some of you have probably done this as well). I'm also in favour of aliasing as much as possible.

Overall though I realise it's all down to personal preference, what works for some doesn't for others. As long as you can read the code easily and each developer shows some consistency in their style throughout I think that's most important.

A: 

I am working on writing an open source SQL Formatter (MS SQL only at this stage) in C#, so i put the above queries through it.

It employs a similar strategy to the OP, namely that each 'section' has child elements indented beneath it. Where required, I add white space between sections to aid clarity - these wouldn't be added when there are no joins or minimal where conditions

Result:

SELECT
    ST.ColumnName1,
    JT.ColumnName2,
    SJT.ColumnName3

FROM SourceTable ST

INNER JOIN JoinTable JT
        ON JT.SourceTableID = ST.SourceTableID

INNER JOIN SecondJoinTable SJT
        ON ST.SourceTableID = SJT.SourceTableID
       AND ST.SourceTable2ID = SJT.SourceTable2ID

WHERE ST.SourceTableID = X
  AND JT.ColumnName3 = Y
  AND JT.Column3 = SJT.Column4

ORDER BY
    ST.ColumnName1
Ben Laan