views:

774

answers:

12

I know you can ALTER the column order in MySQL with FIRST and AFTER, but why would you want to bother? Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?

+7  A: 

No, the order of the columns in a SQL database table is totally irrelevant - except for display / printing purposes. There's no point in reordering columns - most systems don't even provide a way to do that (except dropping the old table and recreating it with the new column order).

Marc

EDIT: from the Wikipedia entry on relational database, here's the relevant portion which to me clearly shows that column order should never be of concern:

A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of items, although some DBMSs impose an order to their data. In mathematics, a tuple has an order, and allows for duplication. E.F. Codd originally defined tuples using this mathematical definition. Later, it was one of E.F. Codd's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations. This insight is still being used today.

marc_s
I've seen column difference have a big impact with my own eyes, so I can't believe this is the right answer. Even though the voting puts it first. Hrm.
Andomar
What SQL environment would that be in?
marc_s
The biggest impact I've seen was on Sql Server 2000, where moving a foreign key forward sped up some queries by 2 to 3 times. Those queries had big table scans (1M+ rows) with a condition on the foreign key.
Andomar
The whole concept of SQL and RDBMS revolves around "tuples", which are - by definition - unordered bags of elements. So I would almost say if a particular RDBMS *depends* on a given order of columns, it's not a *REAL* RDBMS.
marc_s
RDBMS don't depend on table ordering *unless you care for performance*. Different implementations will have different performance penalties for the order of the columns. It could be huge or it could be tiny, it depends on the implementation. Tuples are theoretical, RDBMS are practical.
voyager
I agree - tuples are a theoretical concept. And I agree with other folks in this thread that having a good consistent layout makes sense. But I still think neither performance nor functionality of any decent RDBMS should really depends on the order of your columns. At least not in any serious RDBMS in the 21st century.
marc_s
@marc_s: Sure, there are databases that perform equally poor despite the column order :)
Quassnoi
+3  A: 

Readability of the output when you have to type:

select * from <table>

in your database management software?

It's a very spurious reason, but at the moment I can't think of anything else.

ChrisF
+4  A: 

Some badly-written applications might be dependent on column order / index instead of column name. They shouldn't be, but it does happen. Changing the order of the columns would break such applications.

Craig Walker
Application developers that make their code dependent on column order in a table DESERVE to have their applications broken. But the users of the application don't deserve the outage.
spencer7593
A: 

The only time you'll need to worry about column order is if your software specifically relies on that order. Typically this is due to the fact that the developer got lazy and did a select * and then referred to the columns by index rather than by name in their result.

Soviut
+21  A: 

Column order had a big performance impact on some of the databases I've tuned, spanning Sql Server, Oracle, and MySQL. This post has good rules of thumb:

  • Primary key columns first
  • Foreign key columns next.
  • Frequently searched columns next
  • Frequently updated columns later
  • Nullable columns last.
  • Least used nullable columns after more frequently used nullable columns

An example for difference in performance is an Index lookup. The database engine finds a row based on some conditions in the index, and gets back a row address. Now say you are looking for SomeValue, and it's in this table:

 SomeId int,
 SomeString varchar(100),
 SomeValue int

The engine has to guess where SomeValue starts, because SomeString has an unknown length. However, if you change the order to:

 SomeId int,
 SomeValue int,
 SomeString varchar(100)

Now the engine knows that SomeValue can be found 4 bytes after the start of the row. So column order can have a considerable performance impact.

EDIT: Sql Server 2005 stores fixed-length fields at the start of the row. And each row has a reference to the start of a varchar. This completely negates the effect I've listed above. So for recent databases, column order no longer has any impact.

Andomar
Every database engine I know of reserves 100 bytes for SomeString, even if it's null
James L
Wow, hadn't know this. Doesn't it have to fetch the entire block anyway, so you're not really saving any time in IO, just computation speed when calculating the offset.
Allain Lalonde
So this would have some impact on "select SomeValue from t"... (surely more to do with returning that value from lots of rows rather than about index lookups?) But how much impact?
araqnid
@TopBanana: Postgresql will not reserve 100 chars, but then with MVCC you don't rewrite the value if it gets updated so there wouldn't be value in doing it, I guess
araqnid
@TopBanana: not with varchars, that's what diffentiates them to normal char columns.
Allain Lalonde
How do you know the engine has to guess? It could simply reorder the columns on the disk storage. It could store the varchars as pointers to the end of the record where the variable data is stored. The whole row could be stuffed in a text file in a CSV format. You have no idea how a "generic" DB is going to store this, or what optimizations it may have to solve this exact problem. If you want to know for a specific database, contact the vendor. But it's a silly "guideline" or rule of thumb.
Will Hartung
I don't think the order of the columns IN THE TABLE makes any difference - it definitely makes a difference in the INDEXES you might create, true.
marc_s
@Allain: no, varchars and chars both reserve the space. (Although I don't know about Postgresql!)
James L
@TopBanana: not sure if you know Oracle or not, but it doesn't reserve 100 bytes for a VARCHAR2(100)
Quassnoi
@Andomar: I wouldn't call this impact "considerable", but there cetrainly is some impact on finding where the field starts. +1.
Quassnoi
@Quassnoi: Thanks! The "considerable" is something experience has taught me, I've seen column order can change the speed of a query by 2-3x. I might not be giving the best or right reason for it.
Andomar
@Andomar: if it's Oracle, it must be putting NULL columns to the end of the table.
Quassnoi
@Quassnoi: the biggest impact was on Sql Server, on a table with many nullable varchar() columns.
Andomar
@Andomar: this must be row chaining then
Quassnoi
@Andomar: the point on CLUSTERED INDEXES in not so good :) Physical table layout will not depend on the logical position of the PRIMARY KEY: it will always be stored first, despite the logical position of a column (of columns) in the table definition.
Quassnoi
@Quassnoi: Makes sense, I'll remove it.
Andomar
@TopBanana - I actually made the test with SQL Server, and Varchar's DON'T reserve the space (although I thought they would). I created 2 identical DBs, one table each, one with CHARs, one with VARCHARs, and I filled them up with identical random data (or random length). The VARCHAR one took about half the space in the hard drive.
Daniel Magliola
+5  A: 

During Oracle training at a previous job, our DBA suggested that putting all the non-nullable columns before the nullable ones was advantageous... although TBH I don't remember the details of why. Or maybe it was just the ones that were likely to get updated should go at the end? (Maybe puts off having to move the row if it expands)

In general, it shouldn't make any difference. As you say, queries should always specify columns themselves rather than relying on the ordering from "select *". I don't know of any DB that allows them to be changed... well, I didn't know MySQL allowed it until you mentioned it.

araqnid
He was right, Oracle does not write trailing NULL columns to disk, saving some bytes. See http://www.dba-oracle.com/oracle_tips_ault_nulls_values.htm
Andomar
absolutely, it can make a big difference in the size on disk
Alex
Is that the link you meant? It's related to the non-indexing of null in indices rather than column order.
araqnid
Wrong link, and can't find the original. Though you can google for it, for example http://www.tlingua.com/new/articles/Chapter2.html
Andomar
A: 

As is often the case, the biggest factor is the next guy who has to work on the system. I try to have the primary key columns first, the foreign key columns second, and then the rest of the columns in descending order of importance / significance to the system.

James L
We typically start out with the last column being "created" (timestamp for when the row is inserted). With older tables, of course, it can have several columns added after that...And we have the occasional table where a compound primary key was changed to a surrogate key so the primary key is several columns over.
araqnid
A: 

If you're going to be using UNION a lot, it makes matching columns easier if you have a convention about their ordering.

Allain Lalonde
Sounds like your database needs normalizing! :)
James L
Hey! Take it back, I didn't say my database. :)
Allain Lalonde
There are licit reasons to use UNION ;)See http://www.postgresql.org/docs/current/static/ddl-partitioning.htmland http://stackoverflow.com/questions/863867/database-speed-optimization-few-tables-with-many-rows-or-many-tables-with-few-r
voyager
+11  A: 

Update:

In MySQL, there may be a reason to do this.

Since variable datatypes (like VARCHAR) are stored with variable lengths in InnoDB, the database engine should traverse all previous columns in each row to find out the offset of the given one.

The impact may be as big as 17% for 20 columns.

See this entry in my blog for more detail:

In Oracle, trailing NULL columns consume no space, that's why you should always put them to the end of the table.

Also in Oracle and in SQL Server, in case of a large row, a ROW CHAINING may occur.

ROW CHANING is splitting a row that doesn't fit into one block and spanning it over the multiple blocks, connected with a linked list.

Reading trailing columns that didn't fit into the first block will require traversing the linked list, which will result in an extra I/O operation.

See this page for illustration of ROW CHAINING in Oracle:

That's why you should put columns you often use to the beginning of the table, and columns you don't use often, or columns that tend to be NULL, to the end of the table.

Important note:

If you like this answer and want to vote for it, please also vote for @Andomar's answer.

He answered the same thing, but seems to be downvoted for no reason.

Quassnoi
So you're saying this would be slow :select tinyTable.id, tblBIG.firstColumn, tblBIG.lastColumnfrom tinyTableinner join tblBIGon tinyTable.id = tblBIG.fkIDIf tblBIG records are over 8KB (in which case some row chaining would occur) and the join would be synchronous...But this would be fast:select tinyTable.id, tblBIG.firstColumnfrom tinyTableinner join tblBIGon tinyTable.id = tblBIG.fkIDSince I wouldn't use the column in other blocks hence no need to traverse the linked listDid I get this right?
jfrobishow
@jfrobishow: right.
Quassnoi
+2  A: 

The only reason I can think about is for debugging and fire-fighting. We have a table whose "name" column's appears about 10th on the list. It's a pain when you do a quick select * from table where id in (1,2,3) and then you have to scroll across to look at the names.

But that's about it.

Chris Simpson
A: 

In general what happens in SQL Server when you change column order through Management Studio, is that it creates a temp table with the new structure, moves the data to that structure from the old table, drops the old table and renames the new one. As you might imagine, this is a very poor choice for performance if you have a large table. I don't know if My SQL does the same, but it is one reason why many of us avoid reordering columns. Since select * should never be used in a production system, adding columns at the end is not aproblem for a well-designed system. Order of columns inthe table should in genral not be messed with.

HLGEM
A: 

As noted, there are numerous potential performance issues. I once worked on a database where putting very large columns at the end improved performance if you didn't reference those columns in your query. Apparently if a record spanned multiple disk blocks, the database engine could stop reading blocks once it got all the columns it needed.

Of course any performance implications are highly dependent not just on the manufacturer that you're using, but also potentially on the version. A few months ago I noticed that our Postgres could not use an index for a "like" comparison. That is, if you wrote "somecolumn like 'M%'", it wasn't smart enough to skip to the M's and quit when it found the first N. I was planning to change a bunch of queries to use "between". Then we got a new version of Postgres and it handled the like's intelligently. Glad I never got around to changing the queries. Obviously not directly relevant here but my point is that anything you do for efficiency considerations could be obsolete with the next version.

Column order is almost always very relevant to me because I routinely write generic code that reads the database schema to create screens. Like, my "edit a record" screens are almost always built by reading the schema to get the list of fields, and then displaying them in order. If I changed the order of columns, my program would still work, but the display might be strange to the user. Like, you expect to see name / address / city / state / zip, not city / address / zip / name / state. Sure, I could put the display order of the columns in code or a control file or something, but then every time we added or removed a column we'd have to remember to go update the control file. I like to say things once. Also, when the edit screen is built purely from the schema, adding a new table can mean writing zero lines of code to create an edit screen for it, which is way cool. (Well, okay, in practice usually I have to add an entry to the menu to call the generic edit program, and I've generally given up on generic "select a record to update" because there are too many exceptions to make it practical.)

Jay