I have a table that stores comments for users. It will hold 100 million+ comments.
There are two ways I can design it:
Option 1: use the user name and comment ID as the clustered primary key. That way, all comments are physically stored in (user name, comment ID) order.
CREATE TABLE [dbo].[Comments](
    [user]          [varchar](20)       NOT NULL,
    [com_id]        [int] IDENTITY(1,1) NOT NULL,
    [com_posted_by] [varchar](20)       NOT NULL,
    [com_posted_on] [smalldatetime]     NOT NULL DEFAULT (getdate()),
    [com_text]      [nvarchar](225)     NOT NULL,
    -- Clustered key (user, com_id): each user's comments are stored together, in com_id order
    CONSTRAINT [PK_channel_comments] PRIMARY KEY CLUSTERED
        ([user] ASC, [com_id] ASC) WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Pros: My main query retrieves all, or the top 10, comments for a user, ordered by com_id DESC. With this key, that query is an index SEEK.
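A sketch of that query against the Option 1 table, assuming the [user] column from the DDL above; @user and the value 'alice' are placeholders. Because the clustered key starts with [user] and continues with [com_id], SQL Server can seek to that user's range and read it backward, so TOP (10) needs no separate sort step:

```sql
-- Latest 10 comments for one user on the Option 1 table.
-- @user / 'alice' are placeholder values for illustration.
DECLARE @user varchar(20) = 'alice';

SELECT TOP (10)
       [com_id], [com_posted_by], [com_posted_on], [com_text]
FROM   [dbo].[Comments]
WHERE  [user] = @user        -- seek on the leading clustered key column
ORDER  BY [com_id] DESC;     -- satisfied by a backward scan of that range
```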
Option 2: make the comment ID alone the clustered primary key. The comments are then stored sorted by comment ID, not by user name.
Cons: Getting the latest 10 comments for a given user is no longer a seek on the clustered index, because the data is not stored (sorted) by user. So I would have to create an additional nonclustered index to make that query fast.
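For comparison, a sketch of Option 2: the same columns clustered on [com_id] alone, plus the secondary index on (user, com_id) that the test below used. The names dbo.Comments2, PK_Comments2 and IX_Comments2_user_com_id are hypothetical. The per-user query can still seek, but on the nonclustered index; columns that are not in that index typically come from a key lookup into the clustered index for each returned row:

```sql
-- Option 2 sketch: clustered on com_id only (hypothetical object names).
CREATE TABLE [dbo].[Comments2](
    [user]          [varchar](20)       NOT NULL,
    [com_id]        [int] IDENTITY(1,1) NOT NULL,
    [com_posted_by] [varchar](20)       NOT NULL,
    [com_posted_on] [smalldatetime]     NOT NULL DEFAULT (getdate()),
    [com_text]      [nvarchar](225)     NOT NULL,
    -- New rows always get a higher com_id, so inserts append at the end of the table
    CONSTRAINT [PK_Comments2] PRIMARY KEY CLUSTERED ([com_id] ASC)
);

-- Secondary index so "latest N comments for a user" can seek on [user];
-- com_text etc. are fetched from the clustered index via key lookups.
CREATE NONCLUSTERED INDEX [IX_Comments2_user_com_id]
    ON [dbo].[Comments2] ([user] ASC, [com_id] ASC);
```

Adding INCLUDE ([com_posted_by], [com_posted_on], [com_text]) to the index would remove the key lookups, at the cost of storing those columns a second time and widening the space gap.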
Which way is the better way to proceed? What about insertion and deletion? Both operations are allowed, but reads are the most frequent operation.
Users can't modify their comments.
I tested both tables with about 1.1M rows. Here are the results:
table_name  rows     reserved   data        index_size  unused  indexes
comments2   1079892  99488 KB   62824 KB    36576 KB    88 KB   PK: com_id; second index on (user_name, com_id)
comments1   1079892  82376 KB   82040 KB    328 KB      8 KB    PK: (user_name, com_id); no other indexes
--------------------------------------------------------------------
diff        same     17112 KB   -19216 KB   36248 KB    80 KB
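The column headings match the result set of SQL Server's sp_spaceused procedure. Assuming that is how the numbers were collected, here is a sketch for repeating the measurement. In that output, data includes the leaf (data) pages of the clustered index, while index_size covers the nonclustered indexes plus the upper levels of the clustered index. The rebuild step is an optional extra (an assumption, not part of the original test): it removes free space left by page splits, which could account for part of the 19 MB data difference, since comments1 receives new com_id values in the middle of each user's range rather than at the end of the table.

```sql
-- Optional (assumption): rebuild so both tables are measured with freshly packed pages.
ALTER INDEX ALL ON [dbo].[comments1] REBUILD;
ALTER INDEX ALL ON [dbo].[comments2] REBUILD;

-- Report rows, reserved, data, index_size and unused for each table;
-- @updateusage = 'TRUE' corrects stale page counts before reporting.
EXEC sp_spaceused @objname = N'dbo.comments1', @updateusage = N'TRUE';
EXEC sp_spaceused @objname = N'dbo.comments2', @updateusage = N'TRUE';
```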
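So the table with com_id as the PK uses about 36 MB more index space (index_size) for its two indexes. Its total reserved space is only about 17 MB larger, because its data pages take about 19 MB less. The SELECT TOP query uses a SEEK on both tables, but it is slower on the table with com_id as the PK. Insertion, however, is slightly faster when com_id is the PK.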
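To put a number on "slower", one option is to compare logical reads and CPU time for the same query on both test tables. A sketch, assuming both tables have the [user] column from the DDL above (the results table calls it user_name, so adjust as needed) and using 'alice' as a placeholder user:

```sql
-- Compare read cost of the same "latest 10" query on both designs.
DECLARE @user varchar(20) = 'alice';   -- placeholder value

SET STATISTICS IO ON;    -- logical/physical reads per table, shown in the Messages output
SET STATISTICS TIME ON;  -- CPU and elapsed time per statement

-- Option 1: clustered on (user, com_id); one clustered index seek.
SELECT TOP (10) * FROM [dbo].[comments1] WHERE [user] = @user ORDER BY [com_id] DESC;

-- Option 2: clustered on com_id; nonclustered seek on (user, com_id),
-- typically followed by a key lookup per returned row.
SELECT TOP (10) * FROM [dbo].[comments2] WHERE [user] = @user ORDER BY [com_id] DESC;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

If the key lookups show up as extra logical reads on comments2, that would explain why its seek is slower even though both plans seek.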
Any comments?