views:

537

answers:

6

I have found a few threads regarding this issue. Most people appear to favor using int in their C# code across the board, even if a byte or smallint would handle the data, unless it is a mobile app. I don't understand why. Doesn't it make more sense to define your C# datatype as the same datatype that would be in your data storage solution?

My Premise: If I am using a typed DataSet, Linq2SQL classes, or POCOs, one way or another I will run into compiler datatype conversion issues if I don't keep my datatypes in sync across my tiers. I don't really like doing System.Convert all the time just because it was easier to use int across the board in C# code. I have always used the smallest datatype needed to handle the data, in the database as well as in code, to keep my interface to the database clean. So I would bet 75% of my C# code is using byte or short as opposed to int, because that is what is in the database.
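The friction described here comes down to one compile-time rule: C# widens small integer types to int implicitly, but narrowing back down requires an explicit cast or System.Convert. A minimal sketch (the variable names are illustrative, not from any real schema):

```csharp
class NarrowingDemo
{
    static void Main()
    {
        byte ageFromDb = 42;        // a tinyint column mapped to byte
        int age = ageFromDb;        // byte -> int widens implicitly, no cast

        int next = age + 1;         // arithmetic result is int
        // byte back = next;        // error CS0266: no implicit int -> byte
        byte back = (byte)next;     // explicit cast (or Convert.ToByte) required

        System.Console.WriteLine(back);  // prints 43
    }
}
```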

Possibilities: Does this mean that most people who just use int for everything in code also use the int datatype for their SQL storage types and couldn't care less about the overall size of their database, or do they do System.Convert in code wherever applicable?

Why I care: I have worked on my own forever and I just want to be familiar with best practices and standard coding conventions.

+3  A: 

For the most part, 'No'.

Unless you know up front that you are going to be dealing with hundreds of millions of rows, it's a micro-optimisation.

Do what fits the Domain model best. Later, if you have performance problems, benchmark and profile to pin-point where they are occurring.

Mitch Wheat
I believe you are saying 'no' to using these types, although it's slightly ambiguous with the question asking whether to avoid them. Anyway, it's good advice regarding micro-optimisation.
Noldorin
So both of you are suggesting sticking with int across the board unless it's millions of rows and you're dealing with micro-optimisation?
Breadtruck
Yes, stick to int, unless in the Domain a tinyint (for example) makes more sense. When I say micro-optimisation, I mean that it's a bad idea. It's not the way to optimise.
Mitch Wheat
I think "unless in the domain a tinyint makes more sense" is sort of contradictory. The question I am asking is: even though I could use a byte or smallint based on what number I am storing, it seems that everyone prefers or uses an int, even on the database side of things. Using ints everywhere in the database just because it maps better for programming purposes seems silly to me, yet from my limited .NET experience it seems to be the easiest way to go. Easy doesn't always translate to the "right way".
Breadtruck
@Breadtruck: good point. I'm saying there is no big deal with using int everywhere unless you are going to be dealing with a huge number of rows. I'm also saying that if a tinyint makes more sense in the domain, then I would tend to use it because of the extra implicit information it conveys.
Mitch Wheat
+1  A: 

If int is used everywhere, no casting or conversions are required. That is a bigger bang for the buck than the memory you will save by using multiple integer sizes.

It just makes life simpler.
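One concrete source of those conversions: C# performs arithmetic on byte and short operands in int, so even byte-plus-byte cannot be stored back into a byte without a cast. A small sketch of the difference:

```csharp
class PromotionDemo
{
    static void Main()
    {
        byte a = 1, b = 2;
        // byte sum = a + b;       // error CS0266: a + b has type int
        byte sum = (byte)(a + b);  // a cast is needed every single time

        int easy = a + b;          // with int everywhere, no casts at all
        System.Console.WriteLine(sum + " " + easy);  // prints "3 3"
    }
}
```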

Robert Harvey
+2  A: 

The .NET runtime is optimised for Int32. See previous discussion at http://stackoverflow.com/questions/129023/net-integer-vs-int16

Dan Diplo
+12  A: 

Performance-wise, an int is faster in almost all cases. The CPU is designed to work efficiently with 32-bit values.

Shorter values are complicated to deal with. To read a single byte, say, the CPU has to read the 32-bit block that contains it, and then mask out the upper 24 bits.

To write a byte, it has to read the destination 32-bit block, overwrite the lower 8 bits with the desired byte value, and write the entire 32-bit block back again.

Space-wise, of course, you save a few bytes by using smaller datatypes. So if you're building a table with a few million rows, then shorter datatypes may be worth considering. (And the same might be a good reason to use smaller datatypes in your database.)

And correctness-wise, an int doesn't overflow easily. What if you think your value is going to fit within a byte, and then at some point in the future some harmless-looking change to the code means larger values get stored into it?
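The overflow failure mode is easy to demonstrate. In C#'s default unchecked context a byte silently wraps around; only inside a checked context does the runtime throw. A sketch:

```csharp
class OverflowDemo
{
    static void Main()
    {
        byte hitCount = 255;       // looked big enough when it was written...
        unchecked { hitCount++; }  // silently wraps to 0 (the default behaviour)
        System.Console.WriteLine(hitCount);  // prints 0

        checked
        {
            int big = 300;
            try
            {
                byte b = (byte)big;  // throws in a checked context
            }
            catch (System.OverflowException)
            {
                System.Console.WriteLine("overflow");
            }
        }
    }
}
```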

Those are some of the reasons why int should be your default datatype for all integral data. Only use byte if you actually want to store machine bytes. Only use shorts if you're dealing with a file format or protocol or similar that actually specifies 16-bit integer values. If you're just dealing with integers in general, make them ints.

jalf
I think another case where it's OK to use byte/short is when you receive arguments in a method/property and you know, by definition, that the value should be constrained to 8/16 bits. Otherwise, the only (worse, in my opinion) options I see are: 1. ignoring bad values, 2. trimming bad values, 3. raising an exception.
maayank
@maayank: Even if you know the values are always constrained, there are still two problems: 1) what if I change the code later, so that larger values may be passed, which would overflow my needlessly narrow 8/16-bit variable, and 2) why would I do this, when a 32-bit value is generally faster?
jalf
+1  A: 

You would have to be dealing with a few BILLION rows before this makes any significant difference in terms of storage capacity. Let's say you have three columns, and instead of using a byte-equivalent database type, you use an int-equivalent one.

That gives us 3 (columns) x 3 (bytes extra) per row, or 9 bytes per row.

This means, for "a few million rows" (let's say three million), you are consuming a whole extra 27 megabytes of disk space! Fortunately, as we're no longer living in the 1970s, you shouldn't have to worry about this :)
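The arithmetic checks out, assuming an int-equivalent column costs 4 bytes and a byte-equivalent column 1 byte (ignoring row overhead and padding, which vary by database):

```csharp
class StorageMath
{
    static void Main()
    {
        const long rows = 3_000_000;            // "a few million rows"
        const int columns = 3;
        const int extraBytesPerColumn = 4 - 1;  // int (4 bytes) vs tinyint (1 byte)

        long extraBytes = rows * columns * extraBytesPerColumn;
        System.Console.WriteLine(extraBytes);              // 27000000
        System.Console.WriteLine(extraBytes / 1_000_000);  // 27 (megabytes)
    }
}
```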

As said above, stop micro-optimising - the performance hit in converting to/from different integer-like numeric types is going to hit you much, much harder than the bandwidth/diskspace costs, unless you are dealing with very, very, very large datasets.

Jon Grant
+1  A: 

Not that I didn't believe Jon Grant and the others, but I had to see for myself with our "million row table". The table has 1,018,000 rows. I converted 11 tinyint columns and 6 smallint columns into int; there were already 5 int and 3 smalldatetime columns. Four different indexes used a combination of the various data types, but obviously the new indexes are now all using int columns.

Making the changes only cost me 40 MB, calculating base table disk usage with no indexes. When I added the indexes back in, the change was only 30 MB overall. I was surprised, because I thought the index size increase would be larger.

So is 30 MB worth the hassle of using all the different data types? No way! I am off to INT land. Thanks, everyone, for setting this anal-retentive programmer back on the straight path to the happy, blissful life of no more integer conversions... yippee!

Breadtruck