views:

241

answers:

3

Hello,

I was wondering how sql server sorts it's data. I noticed that if I have a table that doesn't contain the column "Id" and you select data without "ORDER BY" sql server doesn't automatically sort on the primary column.

Does anyone know what rule sql server follows to sort it's data?

Regards, M.

A: 

When SQL Server has no index, the SQL engine needs to visit every row in a table. In database terminology we call this behavior a table scan, or just scan.

I expect in this case it will just read through the table and output the data in whatever order the data was entered into the table.

kevchadders
No, even without indexes or an ORDER BY, there is no reason to assume that a scan will always scan the rows in the physical/chronological order of insertion. This may be a valid assumption with file-based databases like Access, but not with SQL Server.
Aaron Bertrand
interesting i didnt know this and assumed it would just read the data the ways its stored in the table... thanks.
kevchadders
+1  A: 

If you don't specify an ORDER BY clause explicitly, there is no guaranteed order that the results will be sorted in. It isn't even guaranteed to be based on the clustered index.

You can see an example of this in this article.

AdaTheDev
+3  A: 

Although it's good to wonder about how it might be explained that you often see the same order, I'd like to point out that it never a good idea to rely on implicit order caused by the particular implementation of the underlying database engine. In other words, its nice to know why, but you should never ever rely on it. For MS SQL, the only thing that reliably delivers the rows in a certain order, is an explicit ORDER BY clause.

Not only do different RDMBS-es behave differently, one particular instance may behave differently due to an update (patch). Not only that, even the state of the RDBMS software may have an impact: a "warm" database behaves differently than a "cold" one, a small table behaves differently than a large one.

Even if you have background information about the implementation (ex: "there is a clustered index, thus it is likely the data will be returned by order of the clustered index"), there is always a possibility that there is another mechanism you don't know about that causes the rows to be returned in a different order (ex1: "if another session just did a full table scan with an explicit ORDER BY the resultset may have been cached; a subsequent full scan will attempt to return the rows from the cache"; ex2: "a GROUP BY may be implemented by sorting the data, thus impacting the order the rows are returned"; ex3: "If the selected columns are all in a secondary index that is already cached in memory, the engine may scan the secondary index instead of the table, most likely returning the rows by order of the secondary index").

Here's a very simple test that illustrates some of my points.

First, startup SQL server (I'm using 2008). Create this table:

create table test_order (
    id int not null identity(1,1) primary key
,   name varchar(10) not null 
)

Examine the table and witness that a clusted index was created to support the primary key on the id column. For example, in sql server management studio, you can use the tree view and navigate to the indexes folder beneath your table. There you should see one index, with a name like: PK__test_ord__3213E83F03317E3D (Clustered)

Insert the first row with this statement:

insert into test_order(name)
select RAND()

Insert more rows by repeating this statement 16 times:

insert into test_order(name)
select RAND()
from   test_order

You should now have 65536 rows:

select COUNT(*) 
from   test_order

Now, select all rows without using an order by:

select *
from   test_order

Most likely, the results will be returned by order of the primary key (although there is no guarantee). Here's the result I got (which is indeed by order of primary key):

#      id    name
1      1     0.605831
2      2     0.517251
3      3     0.52326
.      .     .......
65536  65536 0.902214

(the # is not a column but the ordinal position of the row in the result)

Now, create a secondary index on the name column:

create index idx_name on test_order(name)

Select all rows, but retrieve only the name column:

select name
from   test_order

Most likely the results will be returned by order of the secondary index idx_name, since the query can be resolved by only scanning the index (i.o.w. idx_name is a covering index). Here's the result I got, which is indeed by order of name.

#      name
1      0.0185732
2      0.0185732
.      .........
65536  0.981894

Now, select all columns and all rows again:

select * 
from test_order

Here's the result I got:

#      id    name
1      17    0.0185732
2      18    0.0185732
3      19    0.0185732
...    ..    .........

as you can see, quite different from the first time we ran this query. (It looks like the rows are ordered by the secondary index, but I don't have an explanation why that should be so).

Anyway, the bottom line is - don't rely on implicit order. You can think of explanations why a particular order can be observed, but even then you can't always predict it (like in the latter case) without having intimate knowledge of implementation and runtime state.

Roland Bouman
thank you for the well explained example.you are right, we do use order by but I was just wondering why sql behaves like that.
Sem Dendoncker

related questions