If I use a larger datatype than needed, where I know a smaller datatype would have been sufficient for the values I will insert into the table, will that affect performance in SQL Server in terms of speed or in any other way?

E.g.

IsActive (0, 1, 2, 3), never more than 3 in any case.

I know I should use tinyint, but due to some reasons (consider it a compulsion) I am taking every numeric field as bigint and every character field as nvarchar(max).

Please give statistics if possible, to help me try to overcome that compulsion.

I need some solid analysis that can really make someone rethink before choosing a datatype.

EDIT

Say, I am using

SELECT * FROM tblXYZ WHERE IsActive = 1

How will it get affected? Consider that I have 1 million records. Will there only be memory wastage, or performance wastage as well? I know that the more pages there are, the more indexing effort is required, and hence performance will also be affected. But I need some statistics if possible.
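For anyone who wants to measure this directly, a minimal sketch using the query above (SET STATISTICS IO / TIME are standard SQL Server session options; the output appears on the Messages tab) would be:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Logical reads reported here scale with the number of pages the table occupies
SELECT * FROM tblXYZ WHERE IsActive = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Comparing the logical reads and CPU / elapsed time between a tinyint and a bigint version of the column gives the kind of statistics being asked for.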

+4  A: 

You are basically wasting 7 bytes per row with bigint. This will make your tables bigger, so fewer rows will be stored per page and more IO will be needed to bring back the same number of rows than if you had used tinyint. If you have a billion-row table it will add up.
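As an illustration only (the table names below are made up, not from the question), one could compare the storage for the two choices like this:

-- Hypothetical tables holding the same flag as tinyint vs bigint
CREATE TABLE dbo.FlagsTinyInt (IsActive TINYINT NOT NULL);
CREATE TABLE dbo.FlagsBigInt  (IsActive BIGINT NOT NULL);

-- Load the same 1 million rows into both
INSERT INTO dbo.FlagsTinyInt (IsActive)
SELECT TOP (1000000) 1 FROM sys.all_objects a CROSS JOIN sys.all_objects b;

INSERT INTO dbo.FlagsBigInt (IsActive)
SELECT TOP (1000000) 1 FROM sys.all_objects a CROSS JOIN sys.all_objects b;

-- The bigint table reports noticeably more data space for the same row count
EXEC sp_spaceused 'dbo.FlagsTinyInt';
EXEC sp_spaceused 'dbo.FlagsBigInt';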

SQLMenace
+1  A: 

A lot of it comes down to space. Your bigints are going to take 8 times the space (8 bytes vs 1 byte for tinyint). Your nvarchar is going to take twice as many bytes as a varchar. Making it max won't affect much of anything.

This will really come into play if you're doing lookups on values. The indexes you will (hopefully) be applying will be much larger.
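A sketch of how the index-size effect could be observed (hypothetical table and index names, not from the question):

-- Same status value keyed as tinyint vs bigint
CREATE TABLE dbo.StatusTiny (Id INT IDENTITY PRIMARY KEY, IsActive TINYINT NOT NULL);
CREATE TABLE dbo.StatusBig  (Id INT IDENTITY PRIMARY KEY, IsActive BIGINT NOT NULL);

CREATE INDEX IX_StatusTiny_IsActive ON dbo.StatusTiny (IsActive);
CREATE INDEX IX_StatusBig_IsActive  ON dbo.StatusBig (IsActive);

-- After loading the same rows into both tables, compare the pages used per index
SELECT OBJECT_NAME(i.object_id) AS table_name, i.name AS index_name,
       ps.used_page_count, ps.used_page_count * 8 AS used_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE i.object_id IN (OBJECT_ID('dbo.StatusTiny'), OBJECT_ID('dbo.StatusBig'));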

Mike M.
If it's varchar(max) it won't be possible to make it part of an index.
Martin Smith
I agree with Martin. No index can be applied on datatypes like varchar(max), xml, image, etc. Full-text search has to be applied on varchar(max) to search a record if the string exceeds the 4000-character range, I suppose.
Shantanu Gupta
Turns out I was slightly wrong in my statement above, in that it can be an included column to make the index covering, but that still isn't going to help in lookups on that column.
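A sketch of that distinction (hypothetical table and column names): varchar(max) is rejected as an index key column but accepted as an included column.

CREATE TABLE dbo.Notes
(
    NoteId    INT IDENTITY PRIMARY KEY,
    CreatedOn DATETIME NOT NULL,
    Body      VARCHAR(MAX) NOT NULL
);

-- Fails: varchar(max) cannot be a key column of an index
-- CREATE INDEX IX_Notes_Body ON dbo.Notes (Body);

-- Works: it can be carried as an included column to make the index covering,
-- but that does not help seeks or lookups on Body itself
CREATE INDEX IX_Notes_CreatedOn ON dbo.Notes (CreatedOn) INCLUDE (Body);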
Martin Smith
Good point! I was talking about doing lookups on the bigints though.
Mike M.
By lookups you mean that the indexes will become large and slow when performing lookups on their reference tables. Is that so?
Shantanu Gupta
+1  A: 
Joel Coehoorn
In my case the IsActive field only seems like it should be a bit; that's our mistake in naming. It basically contains a status, like deleted, pending, etc., around 5 or 6 values.
Shantanu Gupta
+1  A: 

All the 'wasted' space also comes into play for DR (disaster recovery): if your database is 4-6 times the size due to poor datatype choices, your recovery can take correspondingly longer.

Not only do the larger pages/extents require more IO to serve, you also shrink your effective memory cache. With billions of rows, depending on your server, you could be dealing with constant memory pressure and cache evictions simply because you chose a datatype that was 8 times the size you needed it to be.
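One way to see that effect (a sketch; these DMVs exist in SQL Server 2005 and later) is to count how many 8 KB pages of each table are currently sitting in the buffer pool:

SELECT o.name AS table_name,
       COUNT(*) AS cached_pages,
       COUNT(*) * 8 / 1024 AS cached_mb
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au ON au.allocation_unit_id = bd.allocation_unit_id
JOIN sys.partitions AS p ON p.hobt_id = au.container_id AND au.type IN (1, 3)
JOIN sys.objects AS o ON o.object_id = p.object_id
WHERE bd.database_id = DB_ID()
GROUP BY o.name
ORDER BY cached_pages DESC;

A table declared with wider-than-needed columns occupies more of these cached pages for the same number of rows, which is the memory pressure described above.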

Gator
+1 for pointing out the difference, but I am still looking for statistics.
Shantanu Gupta
+2  A: 

Defining this in statistical terms is somewhat difficult; you can literally do the maths and work out the additional IO overhead.

Let's take a table with 1 million rows, assume no page padding or compression, and use some simple figures.

Given a table whose row size is 100 bytes and which contains 10 tinyints, the number of rows per page (assuming no padding / fragmentation) is 80 (8,096 / 100).

By using bigints, a total of 70 bytes would be added to the row size (10 fields that are 7 bytes larger each), giving a row size of 170 bytes and reducing the rows per page to 47.

For the 1 million rows this results in 12,500 pages for the tinyints and 21,277 pages for the bigints.

Taking a single disk reading sequentially, we might expect 300 IOs per second, and each read is 8 KB (i.e. a page).

The respective read times on this theoretical disk are then 41.6 seconds and 70.9 seconds, for a very theoretical scenario with a made-up table / row.

That, however, only applies to a scan; under an index seek the increase in IO would be relatively small, depending on how many of the bigints were in the index or clustered key. In terms of backup and restore, as mentioned, the data is expanded out and the time loss can be calculated as linear unless compression is at play.

In terms of memory caching, each byte wasted on a page on disk is a byte wasted in memory, but that only applies to the pages actually in memory. This is where it gets more complex, since the memory wastage depends on how many of the pages are sitting in the buffer pool. For the above example it would be broadly 97.6 MB of data vs 166 MB of data, and assuming the entire table was scanned and thus in the buffer pool, you would be wasting roughly 68 MB of memory.
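To turn the theory above into measured numbers for a real table, a sketch (using tblXYZ from the question) could read the actual page counts:

SELECT OBJECT_NAME(object_id) AS table_name,
       SUM(in_row_data_page_count) AS data_pages,
       SUM(in_row_data_page_count) * 8 / 1024 AS data_mb,
       SUM(row_count) AS total_rows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.tblXYZ')
  AND index_id IN (0, 1)   -- heap or clustered index only
GROUP BY object_id;

Dividing total_rows by data_pages gives the real rows-per-page figure, which can be substituted into the arithmetic above in place of the assumed 80 and 47.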

Andrew
I would like to change the accepted answer if someone can provide me some statistical data along with such nice conceptual information as Andrew has provided.
Shantanu Gupta
Statistical data in what sense? If you detail the specifics of your table, we can calculate the difference in size and IO. Tell us what the clustered key is and we can also calculate the effect on that in terms of index levels and index storage. But expecting there to be one magical statistical value that tells you how to manage SQL performance is not going to happen.
Andrew