+1  A: 

Lets assume this is your db data:

column1 | column2 | column3
1       | 2       | 1
1       | 2       | 2
1       | 2       | 1
3       | 1       | 2
1       | 2       | 2
1       | 2       | 2
1       | 2       | 2

First query

In the first example you will get all column combinations from the db (as GROUP BY 1,2,3 does nothing) including duplicates, so it will return:

1       | 2       | 1
1       | 2       | 2
1       | 2       | 1
3       | 1       | 2
1       | 2       | 2
1       | 2       | 2
1       | 2       | 2

2nd query

Second example takes unique values for column tuples so you will end with

1       | 2       | 1
1       | 2       | 2
3       | 1       | 2

3rd query

Last query takes all values from three columns and then it removes duplicates from that set. So you will get all values from any of the tables. In the end this will return

1
2
3

Does this makes it clear?

RaYell
In some dialects of SQL GROUP BY 1, 2, 3 means "Group by the first, second and third column" and thus would be equivalent to the second query
Mr. Shiny and New
Thanks. But what will happen if you do UNION on the same table?. Group by 1,2,3 means group by column1,column2, column3 in teradata sql. So the 1st and second query returns same.
Enjoy coding
Group by 1,2,3 is specific to some SQL dialects. Some may group by first, second and third column, others may ignore it. Union on the same table will combine the values from the three columns into one column that can be filtered later on. I think that those fields may have to be of the same/similar type. I'm not sure if you can make a union of INT and TEXT fields.
RaYell
+3  A: 

Starting with what I think is the simplest, DISTINCT, really is just that. It returns the distinct combinations of rows. Think of this dataset:

COL1      COL2      COL3
A         B         C
D         E         F
G         H         I
A         B         C   <- duplicate of row 1

This will return 3 rows because the 4th row in the dataset exactly matches the first row. Result:

COL1      COL2      COL3
A         B         C
D         E         F
G         H         I

The GROUP BY is frequently used for summaries and other calculations select COL1, SUM(COL2) from table group by column1;

For this dataset:

COL1      COL2
A         5
A         6
B         2
C         3
C         4
C         5

would return

COL1     SUM(COL2)
A        11
B        2
C        12

a UNION just takes results from different queries and presents them as 1 result set:

Table1
COL1
A

Table2
COLX
B

Table3
WHATEVER_COLUMN_NAME
Giddyup

select COL1 from Table1
UNION
select COLX from Table2
UNION 
select WHATEVER_COLUMN_NAME from Table3;

Result Set:

A
B
Giddyup

When performing a union, the column datatypes must match up. You can't UNION a number column with a char column (unless you explicitly perform a data conversion)

Brett McCann
Thanks. Actually I am confused about the UNION on same table two times as in the table. How will the result set come when we do UNION on same table?
Enjoy coding
In your example you were union'ing off of different columns. So you are in effect, taking row based data, and presenting it as column based. If you had a table with 4 text columns and 1 row of data, and you selected each column using unions, your result set would be 1 column with 4 rows of data.
Brett McCann
A: 

If you include "Actual Execution Plan" (control + M in MS SQL Management Studio), it will give you a diagram of how the SQL engine optimises each of your statements. Understanding this will help you write better queries.

Joshua
Well.. I asked in general sql. But my problem is in teradata sql.
Enjoy coding
+1  A: 

Lets go with a sample set of data

orderid    customer orderdate
1          B        July 29
2          A        Aug 1
3          A        Aug 4
4          C        Aug 5
5          B        Aug 6
6          A        Aug 11

Distinct basically returns a single instance of a given record with no duplicates of the entire set of columns in the result set. Ex: "select distinct customer from orders" would return "A", "B", "C" defaulted in alpha order of column(s) chosen.

Group by is to do aggregations within a given set of fields in a query. Ex:

select customer, count(*) as NumberOfOrders from Orders group by 1

Would result with...
A    3
B    2
C    1

You can also apply distinct (only once), within a query, but within a given group..

select customer, count(*) as NumberOfOrders, count( distinct {month of orderdate} ) as CustomerMonths from orders group by customer

Would result with
A    3    1  (all orders were in August)
B    2    2  (had orders in July and August)
C    1    1  (only one order in August)

Unions are queries that must be the exact same result format, column names and sequence of fields. Lets say you have an orders table that is the exact same structure as an archived version of data too. You only keep current data over the most current year, all historical is pushed to archive. If you wanted to get ALL order activity for a given customer in one query, you would want to do a union

select customerid, orderdate, amount from CurrentOrders where customerid = ?? order by 2 descending UNION select customerid, orderdate, amount from ArchivedOrders where customerid = ??

The ORDER by clause of the first select will drive the results all all subsequent records being pulled into the results. Its like SQL saying go to table one, get all that qualify, then sort. Then, go to table two, get all that qualify there and pull into the existing sorted list from table one. Final result is ALL records.

HTH

DRapp
Thank you very much. Your explanation of group by is fantastic. The last 3 paras of your answer was very much helpful. But What will happen when you do UNION on same table? Also "group by 1,2,3" and "distinct" will give the same results. Thanks
Enjoy coding
Don't know... never had an instance to need that... However, what you may do is a second select based on the result of the first using the same group by to "roll-up" common elements coming from the second instance table.
DRapp