If I use a larger datatype than needed, where I know a smaller datatype would have been sufficient for the values I will insert into the table, will that affect performance in SQL Server in terms of speed or in any other way?

E.g.

IsActive (0, 1, 2, 3), never more than 3 in any case.

I know I should use tinyint, but due to some reasons (consider it a compulsion) I am taking every numeric field as bigint and every character field as nvarchar(max).

Please give statistics if possible, to help me try to overcome that compulsion.

I need some solid analysis that can really make someone rethink before choosing a datatype.

EDIT

Say, I am using

SELECT * FROM tblXYZ WHERE IsActive = 1

How will it get affected? Consider that I have 1 million records. Will there only be memory wastage, or performance wastage as well? I know that the more pages there are, the more indexing effort is required, and hence performance will also be affected. But I need some statistics if possible.
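For anyone who wants to measure this directly, a minimal sketch using the query above (SET STATISTICS IO / TIME are standard SQL Server session options; the output appears on the Messages tab) would be:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Logical reads reported here scale with the number of pages the table occupies
SELECT * FROM tblXYZ WHERE IsActive = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Comparing the logical reads and CPU / elapsed time between a tinyint and a bigint version of the column gives the kind of statistics being asked for.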

+4  A: 

You are basically wasting 7 bytes per row with bigint. This will make your tables bigger, so fewer rows will be stored per page and more IO will be needed to bring back the same number of rows than if you had used tinyint. If you have a billion-row table it will add up.
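As an illustration only (the table names below are made up, not from the question), one could compare the storage for the two choices like this:

-- Hypothetical tables holding the same flag as tinyint vs bigint
CREATE TABLE dbo.FlagsTinyInt (IsActive TINYINT NOT NULL);
CREATE TABLE dbo.FlagsBigInt  (IsActive BIGINT NOT NULL);

-- Load the same 1 million rows into both
INSERT INTO dbo.FlagsTinyInt (IsActive)
SELECT TOP (1000000) 1 FROM sys.all_objects a CROSS JOIN sys.all_objects b;

INSERT INTO dbo.FlagsBigInt (IsActive)
SELECT TOP (1000000) 1 FROM sys.all_objects a CROSS JOIN sys.all_objects b;

-- The bigint table reports noticeably more data space for the same row count
EXEC sp_spaceused 'dbo.FlagsTinyInt';
EXEC sp_spaceused 'dbo.FlagsBigInt';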

SQLMenace
+1  A: 

A lot of it comes down to space. Your bigints are going to take 8 times the space (8 bytes vs 1 byte for tinyint). Your nvarchar is going to take twice as many bytes as a varchar. Making it max won't affect much of anything.

This will really come into play if you're doing lookups on values. The indexes you will (hopefully) be applying will be much larger.
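A sketch of how the index-size effect could be observed (hypothetical table and index names, not from the question):

-- Same status value keyed as tinyint vs bigint
CREATE TABLE dbo.StatusTiny (Id INT IDENTITY PRIMARY KEY, IsActive TINYINT NOT NULL);
CREATE TABLE dbo.StatusBig  (Id INT IDENTITY PRIMARY KEY, IsActive BIGINT NOT NULL);

CREATE INDEX IX_StatusTiny_IsActive ON dbo.StatusTiny (IsActive);
CREATE INDEX IX_StatusBig_IsActive  ON dbo.StatusBig (IsActive);

-- After loading the same rows into both tables, compare the pages used per index
SELECT OBJECT_NAME(i.object_id) AS table_name, i.name AS index_name,
       ps.used_page_count, ps.used_page_count * 8 AS used_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE i.object_id IN (OBJECT_ID('dbo.StatusTiny'), OBJECT_ID('dbo.StatusBig'));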

Mike M.
If it's varchar(max) it won't be possible to make it part of an index.
Martin Smith
I agree with Martin. No index can be applied on datatypes like varchar(max), xml, image, etc. Full-text search has to be applied on varchar(max) to search a record if the string exceeds the 4000-character range, I suppose.
Shantanu Gupta
Turns out I was slightly wrong in my statement above, in that it can be an included column to make the index covering, but that still isn't going to help in lookups on that column.
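A sketch of that distinction (hypothetical table and column names): varchar(max) is rejected as an index key column but accepted as an included column.

CREATE TABLE dbo.Notes
(
    NoteId    INT IDENTITY PRIMARY KEY,
    CreatedOn DATETIME NOT NULL,
    Body      VARCHAR(MAX) NOT NULL
);

-- Fails: varchar(max) cannot be a key column of an index
-- CREATE INDEX IX_Notes_Body ON dbo.Notes (Body);

-- Works: it can be carried as an included column to make the index covering,
-- but that does not help seeks or lookups on Body itself
CREATE INDEX IX_Notes_CreatedOn ON dbo.Notes (CreatedOn) INCLUDE (Body);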
Martin Smith
Good point! I was talking about doing lookups on the bigints though.
Mike M.
By lookups you mean that the indexes will become large and slow when performing lookups on their reference tables. Is that so?
Shantanu Gupta
+1  A: 
Joel Coehoorn
In my case the IsActive field only seems like it should be a bit; that's our mistake in naming. It basically contains a status, like deleted, pending, etc., around 5 or 6 values.
Shantanu Gupta
+1  A: 

All the 'wasted' space also comes into play for DR (disaster recovery): if your database is 4-6 times the size due to poor datatype choices, your recovery can take correspondingly longer.

Not only do the larger pages/extents require more IO to serve, you also shrink your effective memory cache. With billions of rows, depending on your server, you could be dealing with constant memory pressure and cache evictions simply because you chose a datatype that was 8 times the size you needed it to be.
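One way to see that effect (a sketch; these DMVs exist in SQL Server 2005 and later) is to count how many 8 KB pages of each table are currently sitting in the buffer pool:

SELECT o.name AS table_name,
       COUNT(*) AS cached_pages,
       COUNT(*) * 8 / 1024 AS cached_mb
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au ON au.allocation_unit_id = bd.allocation_unit_id
JOIN sys.partitions AS p ON p.hobt_id = au.container_id AND au.type IN (1, 3)
JOIN sys.objects AS o ON o.object_id = p.object_id
WHERE bd.database_id = DB_ID()
GROUP BY o.name
ORDER BY cached_pages DESC;

A table declared with wider-than-needed columns occupies more of these cached pages for the same number of rows, which is the memory pressure described above.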

Gator
+1 for pointing out the difference, but I am still looking for statistics.
Shantanu Gupta
+2  A: 

Defining this in statistical terms is somewhat difficult; you can literally do the maths and work out the additional IO overhead.

Let's take a table with 1 million rows, assume no page padding or compression, and use some simple figures.

Given a table whose row size is 100 bytes and which contains 10 tinyints, the number of rows per page (assuming no padding / fragmentation) is 80 (8,096 / 100).

By using bigints, a total of 70 bytes would be added to the row size (10 fields that are 7 bytes larger each), giving a row size of 170 bytes and reducing the rows per page to 47.

For the 1 million rows this results in 12,500 pages for the tinyints and 21,277 pages for the bigints.

Taking a single disk reading sequentially, we might expect 300 IOs per second, and each read is 8 KB (i.e. a page).

The respective read times on this theoretical disk are then 41.6 seconds and 70.9 seconds, for a very theoretical scenario with a made-up table / row.

That, however, only applies to a scan; under an index seek the increase in IO would be relatively small, depending on how many of the bigints were in the index or clustered key. In terms of backup and restore, as mentioned, the data is expanded out and the time loss can be calculated as linear unless compression is at play.

In terms of memory caching, each byte wasted on a page on disk is a byte wasted in memory, but that only applies to the pages actually in memory. This is where it gets more complex, since the memory wastage depends on how many of the pages are sitting in the buffer pool. For the above example it would be broadly 97.6 MB of data vs 166 MB of data, and assuming the entire table was scanned and thus in the buffer pool, you would be wasting roughly 68 MB of memory.
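To turn the theory above into measured numbers for a real table, a sketch (using tblXYZ from the question) could read the actual page counts:

SELECT OBJECT_NAME(object_id) AS table_name,
       SUM(in_row_data_page_count) AS data_pages,
       SUM(in_row_data_page_count) * 8 / 1024 AS data_mb,
       SUM(row_count) AS total_rows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.tblXYZ')
  AND index_id IN (0, 1)   -- heap or clustered index only
GROUP BY object_id;

Dividing total_rows by data_pages gives the real rows-per-page figure, which can be substituted into the arithmetic above in place of the assumed 80 and 47.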

Andrew
I would like to change the accepted answer if someone can provide me some statistical data along with such nice conceptual information as Andrew has provided.
Shantanu Gupta
Statistical data in what sense? If you detail the specifics of your table, we can calculate the difference in size and IO. Tell us what the clustered key is and we can also calculate the effect on that in terms of index levels and index storage. But expecting there to be one magical statistical value that tells you how to manage SQL performance is not going to happen.
Andrew