views: 59 · answers: 4
I have a 'SessionVisit' table which collects data about user visits; around 25,000 rows may be added a day.

The table's CREATE statement is below. My database knowledge is definitely not up to scratch as far as understanding the implications of such a schema.

Can anyone give me their two cents of advice on some of these issues:

  • Do I need to worry about row size for this schema in SQL Server 2008? I'm not sure how the 8KB row size limit works in 2008, or whether I'm wasting a lot of space if I'm not using all 8KB.
  • How should I purge old records I don't want? Will new rows fill in the empty space left by deleted rows?
  • Any advice on indexes?

I know this is quite general in nature. Any 'obvious' or non-obvious info would be appreciated.

Here's the table:

USE [MyDatabase]
GO
/****** Object:  Table [dbo].[SessionVisit]    Script Date: 06/06/2009 16:55:05 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[SessionVisit](
    [SessionGUID] [uniqueidentifier] NOT NULL,
    [SessionVisitId] [int] IDENTITY(1,1) NOT NULL,
    [timestamp] [timestamp] NOT NULL,
    [SessionDate] [datetime] NOT NULL CONSTRAINT [DF_SessionVisit_SessionDate]  DEFAULT (getdate()),
    [UserGUID] [uniqueidentifier] NOT NULL,
    [CumulativeVisitCount] [int] NOT NULL CONSTRAINT [DF_SessionVisit_CumulativeVisitCount]  DEFAULT ((0)),
    [SiteUserId] [int] NULL,
    [FullEntryURL] [varchar](255) NULL,
    [SiteCanonicalURL] [varchar](100) NULL,
    [StoreCanonicalURL] [varchar](100) NULL,
    [CampaignId] [int] NULL,
    [CampaignKey] [varchar](50) NULL,
    [AdKeyword] [varchar](50) NULL,
    [PartnerABVersion] [varchar](10) NULL,
    [ABVersion] [varchar](10) NULL,
    [UserAgent] [varchar](255) NULL,
    [Referer] [varchar](255) NULL,
    [KnownRefererId] [int] NULL,
    [HostAddress] [varchar](20) NULL,
    [HostName] [varchar](100) NULL,
    [Language] [varchar](50) NULL,
    [SessionLog] [xml] NULL,
    [OrderDate] [datetime] NULL,
    [OrderId] [varchar](50) NULL,
    [utmcc] [varchar](1024) NULL,
    [TestSession] [bit] NOT NULL CONSTRAINT [DF_SessionVisit_TestSession]  DEFAULT ((0)),
    [Bot] [bit] NULL,
 CONSTRAINT [PK_SessionVisit] PRIMARY KEY CLUSTERED 
(
    [SessionGUID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[SessionVisit]  WITH CHECK ADD  CONSTRAINT [FK_SessionVisit_KnownReferer] FOREIGN KEY([KnownRefererId])
REFERENCES [dbo].[KnownReferer] ([KnownRefererId])
GO
ALTER TABLE [dbo].[SessionVisit] CHECK CONSTRAINT [FK_SessionVisit_KnownReferer]
GO
ALTER TABLE [dbo].[SessionVisit]  WITH CHECK ADD  CONSTRAINT [FK_SessionVisit_SiteUser] FOREIGN KEY([SiteUserId])
REFERENCES [dbo].[SiteUser] ([SiteUserId])
GO
ALTER TABLE [dbo].[SessionVisit] CHECK CONSTRAINT [FK_SessionVisit_SiteUser]
+1  A: 

Well, I'd recommend NOT inserting a few KB of data with EVERY page view!

First thing I'd do would be to see how much of this information I could get from a third-party analytics tool, perhaps combined with log analysis. That should allow you to drop a lot of the fields.

25k inserts a day isn't much, but the catch is that the busier your site gets, the more load this puts on the db. Perhaps you could build a queuing system that batches the writes, but really, most of this information is already in the logs.
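One way to batch the writes is a plain staging table that a scheduled job periodically flushes into SessionVisit in a single set-based statement. This is only a sketch: the staging table's name, its trimmed column list, and the SQL Agent schedule are all assumptions, not part of the original schema.

```sql
-- Hypothetical staging table: same columns as SessionVisit, but a heap
-- with no constraints or indexes, so individual inserts are cheap.
CREATE TABLE dbo.SessionVisitStaging (
    SessionGUID uniqueidentifier NOT NULL,
    SessionDate datetime NOT NULL DEFAULT (GETDATE())
    -- ... remaining SessionVisit columns ...
);
GO

-- Run from a SQL Agent job every minute or so:
-- move everything across in one batch, then empty the staging table.
BEGIN TRANSACTION;

INSERT INTO dbo.SessionVisit (SessionGUID, SessionDate /* , ... */)
SELECT SessionGUID, SessionDate /* , ... */
FROM dbo.SessionVisitStaging WITH (TABLOCKX);

DELETE FROM dbo.SessionVisitStaging WITH (TABLOCKX);

COMMIT;
```

The TABLOCKX hint keeps new page requests from interleaving with the flush; under heavy load you'd more likely swap two staging tables rather than lock one.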

chris
I concur -- if you really can't get the info you need from the text logs (and can't add it to them), then make sure you batch the writes together.
Jonathan
And you really, really don't want to include any fields that have to be computed - I just noticed the "cumulative visit count" field. That's something you can do while processing the information; there's no good reason you need it in real time.
chris
@chris one view of this data is a list of purchases, and we're showing how many visits it took that user before they made a purchase. It needs to be here because if old sessions are purged we'd lose that data otherwise.
Simon_Weaver
If you tag users with a persistent cookie and log that, you have that information - you can tie the user id to multiple session ids.
chris
A: 

Agree with Chris that you would probably be better off using log analysis (check out Microsoft's free Log Parser).

Failing that, I would remove the Foreign Key constraints from your SessionVisit table.

You mentioned row size; the varchars in your table do not pre-allocate to their maximum length - each variable-length column costs roughly 2 bytes of row overhead plus the data actually stored. That said, a general rule is to keep rows as 'lean' as possible.
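If you want to see what your rows actually cost rather than guess, you can measure them. A sketch, assuming the table exists as scripted above (column choices here are just examples):

```sql
-- Per-column byte counts for a sample of rows.
SELECT TOP 100
    DATALENGTH(FullEntryURL) AS url_bytes,
    DATALENGTH(UserAgent)    AS ua_bytes,
    DATALENGTH(utmcc)        AS utmcc_bytes
FROM dbo.SessionVisit;

-- Or ask the engine for the average on-disk record size and page count.
SELECT index_id, avg_record_size_in_bytes, page_count
FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID('dbo.SessionVisit'), NULL, NULL, 'DETAILED');
```

avg_record_size_in_bytes against the 8,060-byte in-row limit tells you directly how many rows fit on a page.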

Also, I would remove the primary key from the SessionGUID (GUID) column. Clustering on a random GUID won't help you much.
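If you go that route, one common arrangement is to cluster on the IDENTITY column and keep the GUID unique with a nonclustered constraint instead. A sketch only - it assumes no other tables have foreign keys pointing at PK_SessionVisit, and the constraint names are illustrative:

```sql
-- Drop the clustered PK on the GUID.
ALTER TABLE dbo.SessionVisit DROP CONSTRAINT PK_SessionVisit;

-- Cluster on the ever-increasing identity instead.
ALTER TABLE dbo.SessionVisit
    ADD CONSTRAINT PK_SessionVisit PRIMARY KEY CLUSTERED (SessionVisitId);

-- Keep GUID lookups fast and guarantee uniqueness without clustering on it.
ALTER TABLE dbo.SessionVisit
    ADD CONSTRAINT UQ_SessionVisit_SessionGUID UNIQUE NONCLUSTERED (SessionGUID);
```

Note the rebuild rewrites the whole table, so on a live system you'd do it in a maintenance window.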

Mitch Wheat
BTW, if you do decide to use log analysis, in IIS you can log cookies, so you should be able to reduce a lot of the fields you list above - you'll have session id, referrer, url, etc. which you can use to tie a user to a specific session id.
chris
A: 

That's also an awful lot of nulls in that table. I think you should group together the columns that must be non-null at the same time. In fact, you should do a better analysis of the data you're writing, rather than lumping it all together in a single table.

John Saunders
+1  A: 

I see SessionGUID and SessionVisitId - why have both a uniqueidentifier and an IDENTITY(1,1) on the same table? Seems redundant to me.

I see referer and knownrefererid; consider looking up the referer from knownrefererid where possible. This will help reduce excess writes.
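In other words, store only the id and join the text back in when you report. A sketch - it assumes the KnownReferer table carries the referer string in a column, here guessed as Referer:

```sql
-- Reconstruct the referer text at query time instead of storing it per visit.
SELECT sv.SessionGUID,
       sv.SessionDate,
       kr.Referer          -- assumed column name on dbo.KnownReferer
FROM dbo.SessionVisit sv
JOIN dbo.KnownReferer kr
  ON kr.KnownRefererId = sv.KnownRefererId;
```

You'd still keep the raw Referer column for visits whose referer isn't in the lookup table yet.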

I see campaignkey and campaignid; again, get the key from the campaigns table if possible.

I see orderid and orderdate. I'm sure you can get the order date from the orders table, correct?

I see hostaddress and hostname, do you really need the name? Usually the hostname doesn't serve much purpose and can be easily misleading.

I see multiple dates and timestamps, is any of this duplicate?

How about that SessionLog column? I see that it's XML. Is it a lot of data? Is it data you may already have in other columns? If so, get rid of the XML or the duplicated columns. Using SQL 2008 you can parse data out of that XML column when reporting, and possibly eliminate a few extra columns (and thus writes). Are you going to be in trouble in the future when developers add more to that XML? XML to me just screams 'a lot of excessive writing'.
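Pulling values out of the XML column at report time looks something like this. The XML shape used here (/log/page/@url) is invented for illustration - substitute whatever structure SessionLog actually holds:

```sql
-- Shred scalar values out of the xml column with the value() method.
SELECT
    SessionGUID,
    SessionLog.value('(/log/page/@url)[1]', 'varchar(255)') AS first_page,
    SessionLog.value('count(/log/page)', 'int')             AS page_count
FROM dbo.SessionVisit
WHERE SessionLog IS NOT NULL;
```

value() requires a singleton XQuery expression (hence the [1]) and a target SQL type; for extracting many rows per document you'd use nodes() with CROSS APPLY instead.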

Mitch says to remove the primary key. Personally I would still keep a clustered index on the table - but clustered on the IDENTITY column rather than the GUID. With an ever-increasing clustered key the DB always writes new rows at the end of the table on disk, which speeds up inserts; clustering on a random GUID causes page splits instead.

Strip out some of this duplicate information and you'll probably do just fine writing a row each visit.

qbn