views:

61

answers:

4

Hello,

i think the title says everything. Is it better(faster,space-saving according memory and disk) to store 8-digit unsigned numbers as Int or as char(8) type? Would i get into trouble when the number will change to 9 digits in future when i use a fixed char-length?

Background-Info: i want to store TACs

Thanks

+1  A: 

Stick to INT for this one, DEFFINITELY INT (OR BIGINT)

Have a look at int, bigint, smallint, and tinyint (Transact-SQL)

Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647). Storage size is 4 bytes,

bigint (whole number) data from -2^63 (-9,223,372,036,854,775,808) through 2^63-1 (9,223,372,036,854,775,807). Storage size is 8 bytes.

compared to

char and varchar

Fixed-length non-Unicode character data with length of n bytes. n must be a value from 1 through 8,000. Storage size is n bytes.

Also, once you query against this, you will have degraded performance if you use ints compared to your char column, as SQL Server will have to do as cast for you...

astander
I had added that i need it for storing TAC's that are unique numeric identifiers. Other mentioned that a char would be better because of leading zeros. Thanks anyway.
Tim Schmelter
+2  A: 

If it is a number, store it as a number.

Integers are stored using 4 bytes, giving them the range:

-2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)

So, suitable for your needs.

char[8] will be stored as 8 bytes, so double the storage, and of course suffers from the need to expand in the future (converting almost 10M records from 8 to 9 chars will take time and will probably require taking the database offline for the duration).

So, from storage, speed, memory and disk usage (all related to the number of bytes used for the datatype), readability, semantics and future proofing, int wins hands down on all.


Update

Now that you have clarified that you are not storing numbers, I would say that you will have to use char in order to preserve the leading zeroes.

As for the issue with future expansion - since char is a fixed length field, changing from char[8] to char[9] would not lose information. However, I am not sure if the additional character will be added on the right or left (though this is possibly undetermined). You will have to test and once the field has been expanded you will need to ensure that the original data has been preserved.

A better way may be to create a new char[9] field, migrate all the char[8] data to it (to keep things reliable and consistent), then remove the char[8] field and rename the new field to the original name. Of course this would ruin all statistics on the table (but so would expanding the field directly).

Oded
+3  A: 

Given that TACs can have leading zeroes, that they're effectively an opaque identifier, and are never calculated with, use a char column.

Don't start optimizing for space before you're sure you've modelled your data types correctly.

Edit

But to avoid getting junk in there, make sure you apply a CHECK constraint also. E.g if it's meant to be 8 digits, add

CONSTRAINT CK_Only8Digits CHECK (not TAC like '%[^0-9]%' and LEN(RTRIM(TAC)) = 8)
Damien_The_Unbeliever
Exactly. THis is not na NUMBER, it is a numeric identifier. YOu will never do maths, but you may want to look for all ending with certain digits or containing certain digits. Not a number by definition.
TomTom
Thank you for clarifying. I would have added leading zeros in the backend, but i think that could cause more problems than leave it as what it was. Char(10) would be best in this case, wouldn't it? Because the number of digits could change in future.
Tim Schmelter
@Tim Schmelter - I'm not familiar with this particular field - is such a change currently under discussion, or due in N number of years (where N is small compared to the projected life of your system)? If it is, then yes, I might make a wider column.
Damien_The_Unbeliever
@Damien: Before 2004 they were 6 digit long, so they might increase the length in future too.
Tim Schmelter
+1  A: 

An int will use less memory space and give faster indexing than a char.

If you need to take these numbers apart -- search for everything where digits 3-4 are "02" or some such -- char would be simpler and probably faster.

I gather you're not doing arithmetic on them. You'd not adding two TACs together or finding the average TAC for a set of records or anything like that. If you were, that would be a slam-dunk argument for using int.

If they have leading zeros, its probably easier to use char so you don't have to always pad the number with zeros to the correct length.

If none of the above applies, it doesn't matter much. I'd probably use char. I can't think of a compelling reason to go either way.

Jay