views:

1068

answers:

4

Hi,

On my current project, I came across our master DB script. Taking a closer look at it, I noticed that all of our original primary keys have a data type of numeric(38,0) We are currently running SQL Server 2005 as our primary DB platform.

For a little context, we support both Oracle and SQL Server as our back-end. In Oracle, our primary keys have a data type of number(38,0).

Does anybody know of possible side-effects and performance impact of such implementation? I have always advocated and implemented int or bigint as primary keys and would love to know if numeric(38,0) is a better alternative.

Thank your for your input,

Eric.

+3  A: 

Barring the storage considerations and some initial confusion from future DBAs, I don't see any reason why NUMERIC(38,0) would be a bad idea. You're allowing for up to 9.99 x 10^38 records in your table, which you will certainly never reach. My quick digging into this didn't turn up any glaring reason not to use it. I suspect that your only potential issue will be the storage space consumed by that, but seeing as how storage space is so cheap, that shouldn't be an issue.

I've seen this a fair number of times in Oracle databases since it's a pretty big default value that you don't need to think about when you're creating a table, similar to using INT or BIGINT by default in SQL Server.

Jeremiah Peschka
+6  A: 

Well, you are spending more data to store numbers that you will never really reach.

bigint goes up to 9,223,372,036,854,775,807 in 8 Bytes

int goes up to 2,147,483,647 in 4 bytes

A NUMERIC(38,0) is going to take, if I am doing the math right, 17 bytes.

Not a huge difference, but: smaller datatypes = more rows in memory (or fewer pages for the same # of rows) = fewer disk I/O to do lookups (either indexed or data page seeks). Goes the same for replication, log pages, etc.

For SQL Server: INT is an IEEE standard and so is easier for the CPU to compare, so you get a slight performance increase by using INT vs. NUMERIC (which is a packed decimal format). (Note in Oracle, if the current version matches the older versions I grew up on, ALL datatypes are packed so an INT inside is pretty much the same thing as a NUMERIC( x,0 ) so there's no performance difference)

So, in the grand scheme of things -- if you have lots of disk, RAM, and spare I/O, use whatever datatype you want. If you want to get a little more performance, be a little more conservative.

Otherwise at this point, I'd leave it as it is. No need to change things.

Matt Rogish
One small comment: "smaller datatypes = more pages in memory" should read "smaller datatypes = more rows or records in memory". Pages are always 8K, regardless of the size of the columns or indexes they store.
Rick
Rick: Yep my bad. Should be: FEWER pages in memory for the same number of rows, or like you said -- more ROWS. Good catch. I think I can edit it without becoming wiki, I'll check
Matt Rogish
Out of curiosity, why does the fact that INT is an IEEE standard make it easier for the CPU to compare?
Ian Varley
Ian Varley: Because most CPU manufacturers follow the IEEE standard. Oracle must convert it's packed format into the IEEE equivalents for processing. Note that 10G finally has IEEE floating point numbers
Matt Rogish
+2  A: 

This is overly large because you are never going to have that many rows. The larger size will result in more storage space. This is not a big deal in itself but will also mean more disk reads to retrieve data from a table or index. It will mean less rows will fit into memory on the database server.

I don't think it's broken enough to be bothered fixing.

WW
"Never" is a dangerous word. If I built a logging table to store all the traffic from StackOverflow's webserver(s), I imagine I could overflow BigInt without much trouble.
Rick
He forgot the most important problem: comparison time for joins. It takes longer to match a numeric(38) than an int. If you're joining a couple large tables together that can add up.
Joel Coehoorn
@Rick - Even if SO were getting 10 million hits a day, it would take 2.5 billion years before the BIGINT overflowed.
Tom H.
+1  A: 

You'd be better off using a GUID. Really. The normal reason not to use one is that an integer performs better. But GUID is smaller than numeric(38), and has the added benefit of making it a little easier to do thing like let disconnected users create and sync new records.

Joel Coehoorn
Joel, Thanks for your reply, but since we need to use those PKs in FKs, GUIDs are a little more cumbersome to work with for our purpose. Now, I would totally agree that we should have 3 keys: internal (int), business, and of course, the GUID for disconnected needs.
How are GUIDs more cumbersome for FKs? You shouldn't be looking at them anyway, and since GUID is smaller than numeric(38) the database will handle joins and such faster.
Joel Coehoorn