views:

118

answers:

11

This may be a pretty naive and stupid question, but I'm going to ask it anyway

I have a table with several fields, none of which are unique, and a primary key, which obviously is.

This table is accessed via the non-unique fields regularly, but no user SP or process access data via the primary key. Is the primary key necessary then? Is it used behind the scenes? Will removing it affect performance Positively or Negatively?

A: 

The primary key when defined will help improve performance within the database for indexing and relationships.

I always tend to define a primary key as an auto incrementing integer in all my tables, regardless of if I access it or not, this is because when you start to scale up your application, you may find you do actually need it, and it makes life a lot simpler.

Tom Gullen
+2  A: 

I would never have a table without a primary key. Suppose you ever need to remove a duplicate - how would you identify which one to remove and which to keep?

HLGEM
In Oracle, the rowid can be used to identify a row.
ar
Yes but this is a SQL Server question
HLGEM
Well, since nothing references that primary key, it wouldn't matter which one you removed, would it? Your statement could just delete the amount of duplicates - 1
roryok
+13  A: 

Necessary? No. Used behind the scenes? Well, it's saved to disk and kept in the row cache, etc. Removing will slightly increase your performance (use a watch with millisecond precision to notice).

But ... the next time someone needs to create references to this table, they will curse you. If they are brave, they will add a PK (and wait for a long time for the DB to create the column). If they are not brave or dumb, they will start creating references using the business key (i.e. the data columns) which will cause a maintenance nightmare.

Conclusion: Since the cost of having a PK (even if it's not used ATM) is so small, let it be.

Aaron Digulla
Actually: having a primary key (which by default also is the clustering key) can actually **speed up** many operations on SQL Server tables - even if the key itself is not really needed. The point is: without a clustering key, the table is a "heap" which uses a rather messy structure and organization. See Kim Tripp's blog post explaining in great detail what a clustering key does (as a positive!) to your tables: http://sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-Continues.aspx
marc_s
A: 

A PK is not necessary.

But you should consider to place a non-unique index on the columns that you use for querying (i.e. that appear in the WHERE-clause). This will considerably boost lookup performance.

HTH!
Thomas

Thomas Weller
A: 

If you are accessing them via non-key fields the performance probably will not change. However it might be nice to keep the PK for future enhancements or interfaces to these tables. Does your application only use this one table?

ProfessionalAmateur
+3  A: 

The primary key is behind the scenes a clustered index (by default unless generated as a non clustered index) and holds all the data for the table. If the PK is an identity column the inserts will happen sequentially and no page splits will occur.

But if you don't access the id column at all then you probably want to add some indexes on the other columns. Also when you have a PK you can setup FK relationships

SQLMenace
"The primary key is behind the scenes a clustered index" - that might mislead people. Just because it's the default, doesn't mean it has to be (as you know).
Mitch Wheat
And whilst on an anti confusion drive you don't need a PK to setup FK relationships. A unique constraint will suffice.
Martin Smith
Mitch, yes that is why I added the stuff in parentheses...but most people just either click the yellow key or do create table bla(id int primary key)
SQLMenace
Martin yes that is true, you can of course also use triggers to 'enforce' this
SQLMenace
A: 

Hi,

The primary key is to ensure you have an unique identifier for each record. It is not necessary to use if you do not need this functionality. In terms of performance; the most import item is to ensure you have indexes on the columns you are using to lookup your records. Generally these are the fields in the where clause of your query's.

Enjoy!

Doug
+1  A: 

As SQLMenace said, the clustered index is an important column for the physical layout of the table. In addition, having a clustered index, especially a well chosen one on a skinny column like an integer pk, actually increases insert performance.

marr75
you can only ever have one clustered index...
Mitch Wheat
By definition, since it determines the physical storage of the rows, you can only have one. My emphasis was "having one CLUSTERED index" not "having ONE clustered index". In other words, this is an index that actually improves insert performance.
marr75
+2  A: 

Do you have any foreign keys, do you ever join on the PK?

If the answer to this is no, and your app never retrieves an item from the table by its PK, and no query ever uses it in a where clause, therefore you just added an IDENTITY column to have a PK, then:

  • the PK in itself adds no value, but does no damage either
  • the fact that the PK is very likely the clustered index too is .. it depends.

If you have NC indexes, then the fact that you have a narrow artificial clustered key (the IDENTITY PK) is helpful in keeping those indexes narrow (the CDX key is reproduced in every NC leaf slots). So a PK, even if never used, is helpful if you have significant NC indexes.

On the other hand, if you have a prevalent access pattern, a certain query that outweighs all the other is frequency and importance, or which is part of a critical time code path (eg. is the query run on every page visit on your site, or every second by and app etc) then that query is a good candidate to dictate the clustered key order.

And finally, if the table is seldom queried but often written to then it may be a good candidate for a HEAP (no clustered key at all) since heaps are so much better at inserts. See Comparing Tables Organized with Clustered Indexes versus Heaps.

Remus Rusanu
Thanks, I do have a lot of NC indexes (indeces?) so perhaps it's best to keep it. I will read up on heaps though, as we do an awful lot of inserts and very little reading
roryok
A: 

A primary key is really a property of your domain model, and it uniquely identifies an instance of a domain object.

Having a clustered index on a montonically increasing column (such as an identity column) will mean page splits will not occur, BUT insertions will unbalance the index over time and therefore rebuilding indexes needs to be done regulary (or when fragmentation reaches a certain threshold).

I have to have a very good reason to create a table without a primary key.

Mitch Wheat
A: 

In the logical model, a table must have at least one key. There is no reason to arbitarily specify that one of the keys is 'primary'; all keys are equal. Although the concept of 'primary key' can be traced back to Ted Codd's early work, the mistake was picked up early on has long been corrected in relational theory.

Sadly, PRIMARY KEY found it's way into SQL and we've had to live with it ever since. SQL tables can have duplicate rows and, if you consider the resultset of a SELECT query to also be a table, then SQL tables can have duplciate rows too. Relational theorists dislike SQL a lot. However, just because SQL lets you do all kinds of wacky non-relational things, that doesn't mean that you have to actually do them. Is is good practise to ensure that every SQL table has at least one key.

In SQL, using PRIMARY KEY on its own has implications e.g. NOT NULL, UNIQUE, the table's default reference for foreign keys. In SQL Server, using PRIMARY KEY on its own has implications e.g. the table's clustered index. However, in all these cases, the implicit behaviour can be made explicit using specific syntax.

You can use UNIQUE (constraint rather than index) and NOT NULL in combination to enforce keys in SQL. Therefore, no, a primary key (or even PRIMARY KEY) is not necessary in SQL Server.

onedaywhen