Is a table intrinsically sorted by its primary key? If I have a table with the primary key on a BigInt identity column, can I trust that queries will always return the data sorted by the key, or do I need to add an explicit "ORDER BY"? The performance difference is significant.
Data is physically stored in clustered index order, and the clustered index is usually on the primary key but doesn't have to be.
Data in SQL is not guaranteed to have order without an ORDER BY clause. You should always specify an ORDER BY clause when you need the data to be in a particular order. If the table is already sorted that way, the optimizer won't do any extra work, so there's no harm in having it there.
Without an ORDER BY clause, the RDBMS might return cached pages matching your query while it waits for records to be read in from disk. In that case, even if there is an index on the table, data might not come back in the index's order. (Note this is just an example - I don't know or even think that a real-world RDBMS will do this, but it's acceptable behaviour for an SQL implementation.)
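To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in (not SQL Server; the `readings` table, its columns and the index name are made up for illustration). The engine is free to answer the first query from the secondary index rather than in key order, so only the second query has a defined order.

```python
import sqlite3

def fetch_ids(conn, sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
# Hypothetical table: integer primary key plus a secondary index on `sensor`.
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX ix_readings_sensor ON readings (sensor)")
conn.executemany(
    "INSERT INTO readings (id, sensor, payload) VALUES (?, ?, ?)",
    [(1, "c", "x"), (2, "a", "x"), (3, "b", "x")],
)

# No ORDER BY: the row order is whatever access path the planner picked.
unordered = fetch_ids(conn, "SELECT id FROM readings WHERE sensor >= 'a'")

# Explicit ORDER BY: the only form whose order is guaranteed.
ordered = fetch_ids(
    conn, "SELECT id FROM readings WHERE sensor >= 'a' ORDER BY id"
)

print("without ORDER BY:", unordered)
print("with ORDER BY:   ", ordered)
```

Whether the first list happens to come back in key order depends on the engine, its version and the plan it chooses, which is exactly why it should not be relied on.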
EDIT
If you have a performance impact when sorting versus when not sorting, you're probably sorting on a column (or set of columns) that doesn't have an index (clustered or otherwise). Given that it's a time series, you might be sorting based on time, but the clustered index is on the primary bigint. SQL Server doesn't know that both increase the same way, so it has to re-sort everything.
If the time column and the primary key column are related by order (one increases if and only if the other increases or stays the same), sort by the primary key instead. If they aren't related this way, move the clustered index from the primary key to whatever column(s) you're sorting by.
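One way to check whether a sort is the cost is to look at the query plan. The sketch below again uses SQLite through Python's sqlite3 module as a stand-in (SQL Server's execution plan would show a Sort operator instead; the `samples` table and the helper names are hypothetical). It compares ordering by the key the table is stored on with ordering by an unindexed time column.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def needs_sort(conn, sql):
    """True if SQLite plans a separate sort step (a temporary B-tree)."""
    return any("TEMP B-TREE" in detail for detail in plan_details(conn, sql))

conn = sqlite3.connect(":memory:")
# Hypothetical time-series table: rows are stored by `id`,
# and `recorded_at` has no index.
conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, recorded_at TEXT, value REAL)"
)

by_key = "SELECT * FROM samples ORDER BY id"
by_time = "SELECT * FROM samples ORDER BY recorded_at"

print("ORDER BY id needs a sort step:         ", needs_sort(conn, by_key))
print("ORDER BY recorded_at needs a sort step:", needs_sort(conn, by_time))
```

Ordering by the storage key needs no extra step, while ordering by the unindexed column does, which matches the advice above: sort by the key if it tracks time, or index the column you actually sort on.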
In SQL Server: no, by its clustering key - which defaults to the primary key, but doesn't have to be the same.
The primary key's main function is to uniquely identify each row in the table - but it doesn't imply any (physical) sorting per se.
Not sure about the other database systems.
Marc
This may be implementation-specific, but MySQL seems to sort by the primary key by default. However, any time you need a guarantee that rows will be ordered a certain way, you should add ORDER BY.
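A table is not necessarily 'clustered', i.e. organized by PK. Without a clustered index it is a "HEAP" (in no particular order); the option you are looking for is "CLUSTERED" (SQL Server; in Oracle it's called an IOT, an index-organized table). Note that in SQL Server a PRIMARY KEY constraint creates a clustered index by default unless you specify NONCLUSTERED or the table already has a clustered index.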
- A table can only have one clustered index (makes sense: the rows can only be stored in one order)
- Use the PRIMARY KEY CLUSTERED syntax in the DDL
- ORDER BY on the PK still needs to be issued in your SELECTs. Because the table is clustered, the query will run faster: the optimizer knows it does not need a separate sort when reading along the clustered index
The earlier poster is correct: SQL (and the theoretical basis of it) specifically defines the result of a SELECT as an unordered set of tuples.
SQL usually tries to stay in the logical realm and not make assumptions about the physical organization, location, etc. of the data. The CLUSTERED option lets us control the physical organization for practical real-life situations.
Almost every time it will come back sorted by the table's identity column. It actually follows the clustered index, so it may not always be sorted by the identity, but I've never seen it not sorted by the identity id when selecting *. What's the reason behind not specifying an ORDER BY? I don't see why it causes a difference in performance.
Without an explicit ORDER BY, there is no default sort order. A very common question. As such, there is a canned answer:
Without ORDER BY, there is no default sort order.
Can you elaborate on why "The performance difference is significant"?