views: 118

answers: 5

I have to design a database to store log data, but I have no prior experience with this. My table contains about 19 columns (roughly 500 bytes per row) and grows by up to 30,000 new rows per day. My app must be able to query this table efficiently.

I'm using SQL Server 2005.

How can I design this database?

EDIT: the data I want to store is a mix of types: datetime, string, short and int. NULL cells are about 25% of the total :)

A: 

Well, given the description you've provided, all you can really do is ensure that your data is normalized and that your 19 columns don't lead you to a "sparse" table (one where a great number of those columns are NULL).

If you'd like to add some more detail (your existing schema and some sample data, perhaps), then I can offer more specific advice.
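
In the meantime, if sparseness does turn out to be the issue, one common fix is to move the rarely populated columns out into an optional child table. A minimal sketch with invented table and column names, since the real schema isn't posted:

    -- Invented names for illustration only.
    CREATE TABLE LogEntry (
        LogId    INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        LoggedAt DATETIME          NOT NULL,
        Severity SMALLINT          NOT NULL,
        Message  VARCHAR(400)      NOT NULL
    );

    -- Only entries that actually have these values get a row here, so the
    -- main table never carries the NULL columns.
    CREATE TABLE LogEntryRequestDetail (
        LogId      INT          NOT NULL PRIMARY KEY REFERENCES LogEntry (LogId),
        Url        VARCHAR(200) NOT NULL,
        StatusCode INT          NOT NULL
    );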

Adam Robinson
A: 

Throw an index on every column you'll be querying against.

Huge amounts of test data and execution plans (via Query Analyzer) are your friends here.
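
For example (purely a sketch with invented table and column names, since the real schema isn't shown), assuming a LogEntry table that's mostly filtered by Severity:

    -- Invented names; adjust to the real schema.
    CREATE NONCLUSTERED INDEX IX_LogEntry_Severity
        ON LogEntry (Severity)
        INCLUDE (LoggedAt, Message);  -- INCLUDE is available from SQL Server 2005

    -- Check that the plan actually uses the index before keeping it.
    SET SHOWPLAN_TEXT ON;
    GO
    SELECT LoggedAt, Message
    FROM   LogEntry
    WHERE  Severity >= 3;
    GO
    SET SHOWPLAN_TEXT OFF;
    GO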

Sii
I'd actually wait on this one. I'm not saying "don't do it", but adding 30K records per day means that unneeded indexes could negatively impact performance. So you need a strong emphasis on "you'll be querying against": you don't really know that until you've used it for a while. **Profile** first.
Joel Coehoorn
A: 

In addition to the comment on sparse tables, you should index the table on the columns you wish to query.

Alternatively, you could test it using the profiler and see what it suggests in terms of indexing based on actual usage.
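
A different, non-GUI way to get similar hints (just a sketch, and a separate technique from the profiler): SQL Server 2005's missing-index DMVs record the indexes the optimizer wished it had for the queries you've actually run:

    -- Lists indexes the optimizer has flagged as missing, roughly weighted
    -- by how often and how much they would have helped.
    SELECT  d.statement AS table_name,
            d.equality_columns,
            d.inequality_columns,
            d.included_columns,
            s.user_seeks,
            s.avg_user_impact
    FROM    sys.dm_db_missing_index_details     AS d
    JOIN    sys.dm_db_missing_index_groups      AS g ON g.index_handle = d.index_handle
    JOIN    sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle
    ORDER BY s.user_seeks * s.avg_user_impact DESC;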

Colin Mackay
A: 

Some optimisations you could make:

  1. Cluster your data based on the most likely look-up criteria (e.g. a clustered primary key on each row's creation date-time will make look-ups of this nature very fast).
  2. Assuming that rows are written one at a time (not in batch) and that each row is inserted but never updated, you could code all select statements to use the "with (NOLOCK)" option. This will offer a massive performance improvement if you have many readers, as you're completely bypassing the lock system. The risk of reading invalid data is greatly reduced given the structure of the table. Both points are sketched just after this list.
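
A rough sketch of both points, assuming a hypothetical LogEntry table with a LoggedAt creation column (names invented; this uses a plain clustered index rather than a clustered primary key, since creation times may not be unique):

    -- 1. Cluster on the creation time so date-range look-ups read a
    --    contiguous range of pages and new rows are appended at the end.
    CREATE CLUSTERED INDEX CIX_LogEntry_LoggedAt
        ON LogEntry (LoggedAt);

    -- 2. Readers bypass the lock system entirely, accepting dirty reads.
    SELECT  LoggedAt, Severity, Message
    FROM    LogEntry WITH (NOLOCK)
    WHERE   LoggedAt >= '20090601'
        AND LoggedAt <  '20090602';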

If you're able to post your table definition I may be able to offer more advice.

Adamski
+1  A: 

However else you'll do lookups, a logging table will almost certainly have a timestamp column. You'll want to cluster on that timestamp first to keep inserts efficient. That may also mean always constraining your queries to specific date ranges, so that the selectivity of your clustered index is good.

You'll also want indexes for the fields you'll query on most often, but don't jump the gun here. You can add the indexes later. Profile first so you know which indexes you'll really need. On a table with a lot of inserts, unwanted indexes can hurt your performance.
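
As a rough sketch (table and column names invented): with the clustered index on the timestamp, write the range predicate so it can seek, and only add further nonclustered indexes once profiling shows they pay for themselves:

    -- Range predicate on the leading clustered-index column: a seek over
    -- one week of rows instead of a scan of the whole table.
    SELECT  LoggedAt, Severity, Message
    FROM    LogEntry
    WHERE   LoggedAt >= '20090601'
        AND LoggedAt <  '20090608';

    -- Added later, only if profiling shows frequent filtering on this column.
    CREATE NONCLUSTERED INDEX IX_LogEntry_Severity
        ON LogEntry (Severity);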

Joel Coehoorn