views:

67

answers:

5

I am desiging a new table that will potentially have 200K rows. I would like to make sure that querys to this table are efficiant.

in the past I had always given a row a unique id in the assumption that this would result in an index:

CREATE TABLE [dbo].[Equipment](
    [EquipID] [nchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [EquipDescription] [nchar](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [Category] [nchar](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [id] [int] IDENTITY(1,1) NOT NULL
) ON [PRIMARY]

is this enough? , should I be setting a primary key.

If any one has suggestions please let them fly.

T-SQL, SQL2000,

+3  A: 

You only get an index if you either create a primary key or explicitly create one. The indices that you need are determined by your queries, just having an index on a column does not make that query faster unless you are querying or joining on that column.

Indices are also not without cost, they make your database larger, and they increase the cost of modifying the table.

This article, although old seems to give a good overview of indices.

If you are going to do a bunch of work with databases getting a database textbook like Database Systems and reading it, will prove invaluable. Finding the most efficient arrangement of indices and queries is difficult, and trial and error is not a good way of trying to optimize queries.

deinst
A: 

The first thing you should ask yourself is "what are the properties of 'Equipment'?" Can a piece of Equipment exist without EquipID, Description or Category? If not, then those columns should not be allowed to be null.

Second is "what uniquely defines a piece of 'Equipment'?" Is it the EquipID? Then that should be your Primary Key. Is it a combination of EquipID and Category? Then you have a composite Primary Key consisting of both columns. Sometimes, however, the data doesn't naturally lend itself to a primary key that can easily be joined against in a fully relational model. Therefore, you could consider the Identity ID column like you show - this is known as a surrogate key. Know that creating a primary key by default creates a clustered index on those key columns. If you go with the surrogate key approach, IMHO it is a good idea then to create another unique index on the uniqueness of your object (i.e. EquipID).

As far as other indexes go, you should further ask yourself "what columns am I going to be querying against most often?" Perhaps you will have a lot of queries like "SELECT EquipID FROM Equipment WHERE Category = 3". This would suggest that Category is a good candidate column for an index.

Finally, another good rule of thumb is to index any foreign key columns - which it appears that Category could be. This optimizes any join queries that you may perform.

A good approach to this would be something like below (quickly thrown together, not tested):

CREATE TABLE [dbo].[Equipment]( 
  [EquipID] [nchar](20) NOT NULL
  ,[EquipDescription] [nchar](100) NOT NULL
  ,[CategoryID] [bigint] NOT NULL
  ,CONSTRAINT [PK_Equipment] PRIMARY KEY CLUSTERED (
    [EquipID] ASC
  )
) ON [PRIMARY] 
GO

CREATE TABLE [dbo].[Categories]( 
  [CategoryID] [bigint] IDENTITY(1,1) NOT NULL 
  ,[CategoryName] [nchar](100) NOT NULL
  ,CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED (
    [CategoryID] ASC
  )
) ON [PRIMARY] 
GO

CREATE NONCLUSTERED INDEX [IDX_Equipment_Category] ON [dbo].[Equipment] (
  [CategoryID] ASC
) ON [PRIMARY]

CREATE UNIQUE NONCLUSTERED INDEX [IDX_Categories_CategoryName] ON [dbo].[Categories] (
  [CategoryName] ASC
) ON [PRIMARY]

ALTER TABLE [dbo].[Equipment]  WITH CHECK ADD  CONSTRAINT [FK_Equipment_Categories] 
FOREIGN KEY([CategoryID]) REFERENCES [dbo].[Categories] ([CategoryID])
GO
BradBrening
+1  A: 

It is best practice to create a primary key on the table. As @deinst indicated, you won't get indexes unless you explicitly create them. Creating the primary key is one way to create an index.

The [id] column is probably a good candidate for the primary key. And, it's probably okay to have it be your clustered index (you get one clustered index per table), which is the default when creating the primary key.

You may want to create an indexes on the other columns based on how the table will be queried (again as indicated by @deinst).

Is [EquipID] the natural key for the table. The natural key is a unique attribute in the business domain. How do people in the business reference each item. If [EquipID] is the natural key, you might want to add a unique constraint, or a unique index, for this column, and you may want to change its nullability to NOT NULL.

bobs
+1  A: 

Database optimization is a whole art/science unto itself, but so is data modeling. First make sure your schema is solid and supports your application. Then, you can add indexes to improve query performance. But don't make data modeling choices based on some notion that the choice will somehow make the table faster to query.

1) All tables should have a primary key for reasons other than performance. You need to be able to uniquely identify a record.

2) The performance of queries and which indexes will boost their efficiency depend on the query. If you're using a column in a JOIN, a WHERE clause, or an ORDER BY clause, then it should have an index on it. Your primary key will automatically get an index, so consider which other columns might be used in this way. In some cases, multi-column indexes are the best choice.

grossvogel
A: 

This question is impossible to answer without posting also the Queries and Relations which you will have.

  • Are you going to always refer to an Equipment by the id? Or by EquipID?
  • Do other tables store references (foreign keys) to Equipment by storing the id value or the EquipID?
  • Are you going to aggregate by Category?
  • Is there a hierachy implied by Category?
  • Does the EquipID ever change for an Equipment?
Remus Rusanu