views:

480

answers:

10

I have id values for products that I need store. Right now they are all integers, but I'm not sure if the data provider in the future will introduce letters or symbols into that mix, so I'm debating whether to store it now as integer or string.

Are there performance or other disadvantages to saving the values as strings?

A: 

Integers are more efficient from a storage and performance perspective. However, if there is a remote chance that alpha characters may be introduced, then you should use a string. In my opinion, the efficiency and performance benefits are likely to be negligible, whereas the time it takes to modify your code may not be.

G Mastros
+1  A: 

I'm not sure how good databases are at comparing whether one string is greater than another, like it can with integers. Try a query like this:

SELECT * FROM my_table WHERE integer_as_string > '100';
Mike Mazur
Databases are very good at that (however, they will use string comparison rules instead of numeric comparison rules). But how often will you want to compare product IDs like that?
Greg Hewgill
Whenever you sort by part number (product ID)...and that's a problem if the data is currently presented in numeric order and that changes to string order.
Jonathan Leffler
+3  A: 

It really depends on what kind of id you are talking about. If it's a code like a phone number it would actually be better to use a varchar for the id and then have your own id to be a serial for the db and use for primary key. In a case where the integer have no numerical value, varchars are generally prefered.

googletorp
+1 for recommending an integer PK if he converts to strings
colithium
+8  A: 

Unless you really need the features of an integer (that is, the ability to do arithmetic), then it is probably better for you to store the product IDs as strings. You will never need to do anything like add two product IDs together, or compute the average of a group of product IDs, so there is no need for an actual numeric type.

It is unlikely that storing product IDs as strings will cause a measurable difference in performance. While there will be a slight increase in storage size, the size of a product ID string is likely to be much smaller than the data in the rest of your database row anyway.

Storing product IDs as strings today will save you much pain in the future if the data provider decides to start using alphabetic or symbol characters. There is no real downside.

Greg Hewgill
+1 Exactly what I would have said. :) Use a numeric type only if you're storing quantities or magnitudes.
cheduardo
+1 The point is, the ID is _conceptually_ not a number, as evidenced by the possibility of mixing letters with digits.
Svante
A: 

As answered in Integer vs String in database

In my country, post-codes are also always 4 digits. But the first digit can be zero.

If you store "0700" as an integer, you can get a lot of problems:

It may be read as an octal value If it is read correctly as a decimal value, it gets turned into "700" When you get the value "700", you must remember to add the zero I you don't add the zero, later on, how will you know if "700" is "0700", or someone mistyped "7100"? Technically, our post codes is actual strings, even if it is always 4 digits.

You can store them as integers, to save space. But remember this is a simple DB-trick, and be careful about leading zeroes.

But what about for storing how many files are in a torrent? Integer or string?

That's clearly an integer.

If the ID would ever start with zero, store it as in interger.

joshhunt
-1: Leading zeros for integers are a terrible idea in Python. 0900 is (exception Python 3.0) a syntax error. If there are leading zeroes, you MUST use a string.
S.Lott
Yeah, I assumed joshhunt meant to say "If the ID would ever start with a zero, store it as a _string_". Sorting order is another consideration with this stuff (also mentioned on that linked thread).
cheduardo
+6  A: 

Do NOT consider performance. Consider meaning.

ID "numbers" are not numeric except that they are written with an alphabet of all digits.

If I have part number 12 and part number 14, what is the difference between the two? Is part number 2 or -2 meaningful? No.

Part numbers (and anything that doesn't have units of measure) are not "numeric". They're just strings of digits.

Zip codes in the US, for example. Phone numbers. Social security numbers. These are not numbers. In my town the difference between zip code 12345 and 12309 isn't the distance from my house to downtown.

Do not conflate numbers -- with units -- where sums and differences mean something with strings of digits without sums or differences.

Part ID numbers are -- properly -- strings. Not integers. They'll never be integers because they don't have sums, differences or averages.

S.Lott
Your response is good. One nitpick: there are numbers where the sum is meaningless. What is the sum of two temepratures? The difference is still meaningful.
Walter Mitty
The sum of two temperatures means twice their average temperature ;) And if you use Kelvins, the sum means the sum of their internal energies.
Kiv
@Walter Mitty: You are oh, so correct in spotting the issue with any measure that is actually an average. Averages (and other quantized samples of continuous data) are what the data warehouse people call a "semi-additive" dimension -- they don't sum -- but they do average. Semi-additive dimensions (like temperature) are still numbers. ID's are still not numbers.
S.Lott
+1  A: 

The space an integer would take up would me much less than a string. For example 2^32-1 = 4,294,967,295. This would take 10 bytes to store, where as the integer would take 4 bytes to store. For a single entry this is not very much space, but when you start in the millions... As many other posts suggest there are several other issues to consider, but this is one drawback of the string representation.

Milhous
+1  A: 

I've just spent the last year dealing with a database that has almost all IDs as strings, some with digits only, and others mixed. These are the problems:

  1. Grossly restricted ID space. A 4 char (digit-only) ID has capacity for 10,000 unique values. A 4 byte numeric has capacity for over 4 billion.
  2. Unpredictable ID space coverage. Once IDs start including non-digits it becomes hard to predict where you can create new IDs without collisions.
  3. Conversion and display problems in certain circumstances, when scripting or on export for instance. If the ID gets interpreted as a number and there is a leading zero, the ID gets altered.
  4. Sorting problems. You can't rely on the natural order being helpful.

Of course, if you run out of IDs, or don't know how to create new IDs, your app is dead. I suggest that if you can't control the format of your incoming IDs then you need to create your own (numeric) IDs and relate the user provided ID to that. You can then ensure that your own ID is reliable and unique (and numeric) but provide a user-viewable ID that can have whatever format your users want, and doesn't even have to be unique across the whole app. This is more work, but if you'd been through what I have you'd know which way to go.

Anil G

Anil G
A: 
  1. You won't be able to do comparisons correctly. "... where x > 500" is not same as ".. where x > '500'" because "500" > "100000"
  2. Performance wise string it would be a hit especially if you use indexes as integer indexes are much faster than string indexes.

On the other hand it really depends upon your situation. If you intend to store something like phone numbers or student enrollment numbers, then it makes perfect sense to use strings.

sharjeel
A: 

Better use independent ID and add string ID if necessary: if there's a business indicator you need to include, why make it system ID?

Main drawbacks:

  1. Integer operations and indexing always show better performance on large scales of data (more than 1k rows in a table, not to speak of connected tables)

  2. You'll have to make additional checks to restrict numeric-only values in a column: these can be regex whether on client or database side. Anyway, you'll have to guarantee somehow that there's actually integer.

  3. And you will create additional context layer for developers to know, and anyway someone will always mess this up :)

terR0Q