ansaurus

Question

VERY huge SQL Database: How should the schema look like?

Answer 1

A:

This is a very common question. Creating indexes potentially reduces the time required for queries, but it increases the time required for updates/inserts and also increases the amount of disk space required per record.

You need to decide for each column if the index offers a performance boost for your queries and if it warrants the impact to insert/update performance and disk space utilization.

As an alternative to indexes, you might be able to utilize an OLAP cube. If your query is producing an aggregate or applying computations then you might want to consider performing the query nightly and storing the results in a different table. You can run simpler queries against the smaller table and achieve the same result with less impact on performance.
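The precomputed-results idea can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module as a stand-in for SQL Server; the table names `hands` and `hand_summary` and the sample values are assumptions, while `hand_index` and `hs1` come from the question's schema.

```python
import sqlite3

def build_summary(conn):
    """Aggregate the large table once and store the result in a small one."""
    cur = conn.cursor()
    # Hypothetical, tiny stand-in for the large detail table.
    cur.execute("CREATE TABLE hands (hand_index INTEGER, hs1 REAL)")
    rows = [(1, 0.2), (1, 0.4), (2, 0.5), (2, 0.7), (2, 0.9)]
    cur.executemany("INSERT INTO hands VALUES (?, ?)", rows)
    # The "nightly job": run the expensive aggregate once and keep the output.
    cur.execute(
        "CREATE TABLE hand_summary AS "
        "SELECT hand_index, AVG(hs1) AS avg_hs1, COUNT(*) AS n "
        "FROM hands GROUP BY hand_index"
    )
    conn.commit()

def lookup(conn, hand_index):
    """Read the precomputed aggregate instead of scanning the detail table."""
    return conn.execute(
        "SELECT avg_hs1, n FROM hand_summary WHERE hand_index = ?",
        (hand_index,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
build_summary(conn)
print(lookup(conn, 2))
```

Later queries only touch `hand_summary`, which has one row per `hand_index` rather than one row per hand.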

Mayo 2009-08-31 19:46:01

Answer 2

A:

How you set up your indexes and primary keys depends on how the data will be used. If you just want to analyze the data and you're pretty sure subsequent DML commands will only be SELECTs (no INSERTs), then removing the PK should be fine. In fact, the hand_id column is an IDENTITY (auto-increment) column, meaning that SQL Server manages that value anyway (you can't insert values into that column without going to the extra trouble of switching on IDENTITY_INSERT mode prior to beginning your INSERT statements, IIRC).

Be wary of evolving needs for this database, of course. Should needs change, then you should consider constraints/indexes/keys.

If data mining is a consideration in the future, consider using Microsoft's SSAS (Analysis Services).

UPDATE: After reading mayo's reply, I agree that indexes (purely for speed, not constraint enforcement) are advisable for subsequent queries (recall that indexes speed up read operations but make inserts/updates take longer, typically). Since your goal is to do a single bulk insert followed by SELECT queries, you could do your bulk insert, then add the necessary indexes to your database on columns that are likely candidates in your queries.
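A rough sketch of the "bulk insert first, index afterwards" order, again using Python's sqlite3 module rather than SQL Server. The table name `hands`, the index name `idx_hand_index`, and the generated rows are assumptions made for the example.

```python
import sqlite3

def load_then_index(conn, rows):
    """Insert all rows into an unindexed table, then build the index once."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE hands (flop_index INTEGER, hand_index INTEGER, hs1 REAL)"
    )
    # Bulk load while no index has to be maintained.
    cur.executemany("INSERT INTO hands VALUES (?, ?, ?)", rows)
    # Build the index after the data is in place.
    cur.execute("CREATE INDEX idx_hand_index ON hands (hand_index)")
    conn.commit()

def query_plan(conn, hand_index):
    """Return SQLite's query plan text for a lookup by hand_index."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM hands WHERE hand_index = ?",
        (hand_index,),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(str(row[-1]) for row in plan)

conn = sqlite3.connect(":memory:")
sample_rows = [(f, h, 0.5) for f in range(10) for h in range(100)]
load_then_index(conn, sample_rows)
print(query_plan(conn, 42))
```

The plan text should name `idx_hand_index`, which indicates the lookup is served by the index rather than a full table scan. In SQL Server you would check the execution plan instead.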

Garrett 2009-08-31 19:51:05

Actually I won't use the hand_id at all. I created the PK because I've been taught to always create a PK in every table. Also, in my scenario there will never be any inserts or updates once the data is inserted, and I will always query by hand_index, so each query will return 1176 rows. So is it normal that the DB size doubles after creating an index on the hand_index column? I thought this was odd, but if that is how it works, then so be it.

Simon 2009-08-31 19:57:49

Answer 3

+1 A:

To answer your question about needing a primary key - with only the information you provided in the question:

Based on your table schema, you might as well keep it there. If you remove that identity column, you'd also be removing your clustered index. Your clustered index value (4 bytes) is stored as the pointer in each non-clustered index row. By removing that clustered index, you'd be leaving the table as a heap - and SQL Server will create an 8-byte RID (row identifier) for each row in the table, and use that as the pointer in the non-clustered index instead. So, in your case, based on the schema you've provided in the question - you could potentially INCREASE the size of your non-clustered indexes, and in the end slow them down.

With all that said, depending on the queries you will be running (and their usage patterns), which weren't included in the question, it may also be worth evaluating whether your clustered index should be on something other than the identity column.
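To put rough numbers on that, here is a back-of-the-envelope calculation in Python. The row count is a made-up figure (the real one isn't stated here), and the estimate covers only the pointer stored in each non-clustered index row, ignoring row headers, page overhead and the indexed column itself.

```python
def pointer_bytes(row_count, pointer_size):
    """Bytes spent on row pointers in one non-clustered index."""
    return row_count * pointer_size

ROWS = 10_000_000  # hypothetical row count, not taken from the question

with_clustered_key = pointer_bytes(ROWS, 4)  # 4-byte int clustered key as pointer
with_heap_rid = pointer_bytes(ROWS, 8)       # 8-byte RID when the table is a heap
extra = with_heap_rid - with_clustered_key

print(with_clustered_key, with_heap_rid, extra)
```

Under these assumptions the heap version spends twice as many bytes on pointers per non-clustered index, and that difference repeats for every additional non-clustered index on the table.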

Scott Ivey 2009-08-31 19:52:46

Answer 4

+1 A:

Well you could break up the table into smaller tables if for example the hs(X) and ppot(X) need to grow past nine.

This is what you have:

    [hand_id] [int] IDENTITY(1,1) NOT NULL,
    [flop_index] [smallint] NULL,
    [hand_index] [smallint] NULL,
    [hs1] [real] NULL,
    [ppot1] [real] NULL,
    etc...

You could break it up into 2 tables (maybe 3 if you need to)

Table hand: (EXAMPLE)
    [hand_id] [int] IDENTITY(1,1) NOT NULL,
    [flop_index] [smallint] NULL,
    [hand_index] [smallint] NULL

Table hs_ppot: (EXAMPLE)
    [hand_id] [int] IDENTITY(1,1) NOT NULL,
    [hs] [real] NULL,
    [ppot] [real] NULL

Then you could reference by hand_id in each table. Just a thought.

BTW what are hs and ppot?
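A minimal sketch of the two-table split and the join on hand_id, written with Python's sqlite3 module instead of T-SQL. Two things here are assumptions that go beyond the example tables: the `slot` column (to tell hs1/ppot1 from hs2/ppot2, and so on) and making hand_id in hs_ppot a plain reference to hand rather than a second auto-increment column. The sample values are invented.

```python
import sqlite3

def create_split_schema(conn):
    """Create the parent table and the child table holding one hs/ppot pair per row."""
    conn.executescript("""
        CREATE TABLE hand (
            hand_id    INTEGER PRIMARY KEY,
            flop_index INTEGER,
            hand_index INTEGER
        );
        CREATE TABLE hs_ppot (
            hand_id INTEGER NOT NULL REFERENCES hand(hand_id),
            slot    INTEGER NOT NULL,  -- hypothetical: 1 for hs1/ppot1, 2 for hs2/ppot2, ...
            hs      REAL,
            ppot    REAL,
            PRIMARY KEY (hand_id, slot)
        );
    """)

def values_for_hand(conn, hand_index):
    """Join the two tables on hand_id and return every hs/ppot pair for a hand_index."""
    return conn.execute(
        "SELECT h.hand_id, p.slot, p.hs, p.ppot "
        "FROM hand AS h JOIN hs_ppot AS p ON p.hand_id = h.hand_id "
        "WHERE h.hand_index = ? ORDER BY h.hand_id, p.slot",
        (hand_index,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
create_split_schema(conn)
conn.execute("INSERT INTO hand (hand_id, flop_index, hand_index) VALUES (1, 7, 42)")
conn.executemany(
    "INSERT INTO hs_ppot VALUES (?, ?, ?, ?)",
    [(1, 1, 0.61, 0.12), (1, 2, 0.58, 0.15)],
)
rows = values_for_hand(conn, 42)
print(rows)
```

With this layout, growing past nine hs/ppot pairs means adding rows to hs_ppot rather than adding columns to the main table.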

Phill Pafford 2009-08-31 19:55:21

hs means Handstrength and ppot means "Positive Potential"

Simon 2009-08-31 20:12:00

ok thanks, not big into poker

Phill Pafford 2009-08-31 20:23:30

I'm actually trying to break the data into multiple tables, I'll let you know how it works out. Unfortunately I'm not big into SQL ;)

Simon 2009-08-31 20:36:56

Answer 5

A:

Let me preface my response by saying that putting every possible combination in a database feels wrong. I'll get to why in a minute.

I'd start with a table called Cards. There would be 1 record for every possible card and it would include fields for Suit, Face value, rank and yes, a CardID as a primary key. Also index the suit, and face value.

If you want to table out every possible Hold'em hand, then I would make separate tables for the pocketCards(pocketID, pCardID1, pCardID2), flopCards(flopID, fCardID1, fCardID2, fCardID3) then a table for the TurnAndRiver(turnAndRiverID, turnCardID, riverCardID). Then a Hand table with (handID, pocketID, flopID, turnAndRiverID, handScore).

HandScore would be a calculated field computed by a table-valued or scalar-valued function.

By separating out those bits, you avoid a great deal of the duplication, but you will still have to worry about card selection and overlap.
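As an illustration of the Cards and pocketCards tables, here is a small sketch using Python's sqlite3 module. The column types, the index names, the `CardRank` column name (standing in for "rank") and the choice to store each unordered pair once are assumptions; the table and ID column names follow the description above.

```python
import sqlite3
from itertools import combinations

SUITS = ["clubs", "diamonds", "hearts", "spades"]
FACES = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]

def build_cards(conn):
    """Create the Cards table with one row per card and index suit and face value."""
    conn.execute(
        "CREATE TABLE Cards ("
        "CardID INTEGER PRIMARY KEY, Suit TEXT, FaceValue TEXT, CardRank INTEGER)"
    )
    cards = [
        (suit, face, rank)
        for suit in SUITS
        for rank, face in enumerate(FACES, start=2)
    ]
    conn.executemany(
        "INSERT INTO Cards (Suit, FaceValue, CardRank) VALUES (?, ?, ?)", cards
    )
    conn.execute("CREATE INDEX idx_cards_suit ON Cards (Suit)")
    conn.execute("CREATE INDEX idx_cards_face ON Cards (FaceValue)")

def build_pocket_cards(conn):
    """Store every unordered pair of distinct cards exactly once."""
    conn.execute(
        "CREATE TABLE pocketCards ("
        "pocketID INTEGER PRIMARY KEY, "
        "pCardID1 INTEGER REFERENCES Cards(CardID), "
        "pCardID2 INTEGER REFERENCES Cards(CardID))"
    )
    ids = [row[0] for row in conn.execute("SELECT CardID FROM Cards ORDER BY CardID")]
    # combinations() yields each pair once with the lower CardID first,
    # so a card never pairs with itself and no pair is duplicated.
    conn.executemany(
        "INSERT INTO pocketCards (pCardID1, pCardID2) VALUES (?, ?)",
        combinations(ids, 2),
    )

conn = sqlite3.connect(":memory:")
build_cards(conn)
build_pocket_cards(conn)
card_count = conn.execute("SELECT COUNT(*) FROM Cards").fetchone()[0]
pocket_count = conn.execute("SELECT COUNT(*) FROM pocketCards").fetchone()[0]
print(card_count, pocket_count)
```

The flopCards and TurnAndRiver tables could be filled the same way, but a Hand row would still need a check that its pocket, flop, turn and river cards do not overlap.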

Ideally, I would forgo the hand tables and calculate the hand and score in whatever application I was building to consume this data.

Putting too much of your logic in the database may make it hard to adapt when the client asks you to model Omaha or five-card draw for example.

In regard to your index question, yes, I would use a primary key as that will allow you to quickly reference a specific hand in your code.

Update

In response to the OP's edit: It sounds like you are using the wrong tool for this task. What is the value of having the data in a database if you are always going to select the exact same recordset? Examine other options (a flat XML file or a static DataSet in your code, for example). That would save you the connection time and the overhead of running a server for what is essentially static data.

Rob Allen 2009-08-31 20:03:24
