views:

1178

answers:

14

When defining datatypes in a database, I have always had a problem with choosing whether to use integers or strings to store certain 'numerical' data.

Say I am building Yet Another Address Book and there is a post code field. Provided that post codes are always a 4 digit number, which data type do I store it as? Integer or string? Technically it is an integer, but I'm not doing any sort of calculations on it, I'm just spitting it out into a table. Would your opinion change if I want to sort the table by post code?

Now, I'm not stupid. I do recognize a valid need for integers, such as page views and unique users or logged in users and guest users. But what about for storing how many files are in a torrent? Integer or string?

A: 

Postcodes are strings. For some comtries, those strings may consist onl of numeric digits, but that doesn't make them integers. And sooner or later your potal system will run out of digits and decide to start using letters as well. If your database uses integers for the postcode field, you will be in deep doo-doo.

Bottom line - if you don't do arithmetic on it, it's probably not really a number.

anon
+2  A: 

For a postal code I would choose a string. It is not intrinsically an integer. It is just an identifier for something and it could just as well have been a series of four characters.

As for the number of files inside a torrent, that should be an integer.

Ronald Wildenberg
+2  A: 

Is '0000' a postcode ? Is it distinct from '0' ?

If it's always a four-digit number, I would always store it as 4 digits, and that would point to keeping it as a string.

Brian Agnew
+9  A: 

in my opinion for postal codes you have to use strings, because you can have postal codes that stards with zeros (09100) and if you use integers it would be 9100: sorting is not a problem, because there is still an alphabetical order ('09100' comes before '09101'). For Storing file numbers I would expect an interger, so you don't have any problem in incresing / decreasing its number. So integer vs strings depends upon the use you make!

Enreeco
+4  A: 

This is a question of semantics. You are trying to decide the appropriate datatype for storage which can be a tricky question. The best rule of thumb is to store your data as integers if you will need to use the data as an integer.

In other words, since you will never be using a postal code as a number it does not make sense to store it as one. It doesn't matter what the data looks like, it matters what it is. Is a postal code a number? No, it's a string of characters that just happens to be made up of wholly numeric characters. Therefore a postal code is best stored as a string.

Andrew Hare
+11  A: 

In my country, post-codes are also always 4 digits. But the first digit can be zero.

If you store "0700" as an integer, you can get a lot of problems:

  • It may be read as an octal value
  • If it is read correctly as a decimal value, it gets turned into "700"
  • When you get the value "700", you must remember to add the zero
  • I you don't add the zero, later on, how will you know if "700" is "0700", or someone mistyped "7100"?

Technically, our post codes is actual strings, even if it is always 4 digits.

You can store them as integers, to save space. But remember this is a simple DB-trick, and be careful about leading zeroes.

But what about for storing how many files are in a torrent? Integer or string?

That's clearly an integer.

myplacedk
+1  A: 

I don't use a numerical data type unless I expect to do math on the data. Why risk finding a problem in the future for something that you were "sure" would always be a number that someone decides to put a non-numeric character in.

If you aren't going to do math on it make it a string.

Jim Blizard
A: 

The critical determinant, imho, is whether the applcation will need to do numerical arithmetic calculations on the values, if not, then the only reason to use integers is to reduce the storage requirements, (which "May" be important for performance in a critical application - by reducing the width of a table index to increase index performance, for example) but otherwise, is generally not important...

If there's no need to do arithmetic with the values, then a string is best.

Charles Bretana
+3  A: 

Post code is not a number: it's a code or identifier. The same applies to phone numbers.

Number of files in a torrent is an integer.

Not least, in this case you can create a CHECK CONSTRAINT LIKE '[09][09][09][09]' to keep data correct at the database level.

gbn
+1  A: 

It is also good to remember that not all postal codes in all counrties are numbers only. Just because you don't have any addreses in Canada right now doesn't mean you won't have any. I've always gone by the rule, if you want to do math calculations store it in a numeric type, if it is just a code (postalcodes, phones, SSN, partnumber, etc) then I store it as a string. What you want to avoid is any unnecessary casting of the data into another format every time you call it up (for instance code to add the leading zeros if you store the postal code as a number or code to convert a string to a number for calulations). These can be expensive operations if you need to do them repeatedly, espcially when the tables are large and you end up having to do the conversion in the where clause. It is far better to store the data the way you need to use it.

HLGEM
+3  A: 

I always use the following rule:

If you plan on performing mathematical calculations on it (adding/subtracting/etc) make it an integer or other numerical data type.

If you do not plan on performing any types of mathematical calculations on the field, store it as a string.

In the instance of Zip codes, you should never have a time where you need to add to a zip code, or subtract, or multiply two zip codes together. Mathematical functions generally are not used on ZIP codes because they are used as identifiers and not quantities. Therefore you should store your zip code as a string datatype

TheTXI
I completely agree with and use that rationale as well. +1
Cerebrus
+1  A: 

Well as far as postcodes go, this is a typical UK postcode:

EC2R 6PK

In university my databases lecturer told me something that has stuck with me and still holds 15+ years later:

If you perform arithmetic on it, store it as a number. Otherwise it's a string.

Frankly I don't think you can go wrong with that advice.

Obviously you don't perform arithmetic on postcodes, therefore they're strings.

cletus
A: 

Somtimes "always" means "for the next month". I wouldn't count on 4 digit codes not going alphanumeric within the lifespan of my responsibility.

Some dialects of SQL support a dataype that's like NUMBER(4). This works much like a character string, but the alphabet is 0 through 9.

Walter Mitty
A: 

I see no problem with storing a zip code as a number even if you don't expect to perform math operations on it.

In our corporate data warehouse, we are the recipients of data from many legacy systems. As a result, we see a lot of garbage data being used.

Take our case where we have a Geographical identifier that is a zero filled 4-digit "numeric" value. This field is often used to join tables together.

I would take one of two approaches: 1) declare the column as a char field of length 4 and add a CONSTRAINT LIKE '[09][09][09][09]' 2) define it as a numeric length 4 and, if the users want it, format the value WHEN DISPLAYING only.

Approach numeric 1 saves you the hassle of constantly formatting, which is no big deal, but if you are often filtering and even indexing/joining on the column, I'd consider saying that we're off with option #2.

A third reason is that my experience is that people are just plain lazy when it comes to adding constraints to a database or they are ignorant. I think it is more laziness, personally. I find the constraints that do exist are mostly applied as edits in the application which originally captures the data and these that edits are not applied uniformly.

As a result, our data warehouse ends up receiving all sorts of variations including inconsistant pre-filling with zeros or justification of the value.

When you define something as an INTEGER, you automatically get more efficient storage, esp. when indexing on the column, and and edit which everyone understands and is more likely to be applied consistently across legacy systems by database designers of various abilities.

I have no problem with option #1, with the exception of using the field in an index and my concern over the approach of once you accept a field as being an apha numeric, people tend to throw more junk into it.

Take for example, our Peoplesoft employee identifier. Somebody decided to add an "X" in front of an employee 6-char zero filled "number" to designate that the employee is a contractor. This violates a personal practice of mine not to combine separate pieces of information into a single field. This caused all sorts of inconsistency problems across various systems. If this field were a numeric, no one would've tried to do that.

Comments?

Velika