SQL / WINDOWS - optimal char lengths - DOES IT MATTER

+1 Q:


Three little questions for the clever people of stackoverflow....

WINDOWS:

The maximum file name length in Windows is 255. Why is this, and why not 256?
Why is the maximum fully qualified file name (full path) stated as 32,767 when in practice it has to stay within 255/260 to avoid errors?

SQL:

When creating char or varchar fields in SQL, do the lengths you specify affect performance? For example, does a 256 varchar perform better than a 260, or a 4096 better than a 4000?

Thanks for any help given.

+2 A:

Lots of things used to have a maximum of 255, because that is the largest value an unsigned 1-byte (8-bit) number can represent: 1111 1111 = 255. To get to 256, you need 9 bits (1 0000 0000).
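The arithmetic behind the 255 limit can be checked in a few lines. This is only an illustrative sketch; the names `BITS`, `max_byte`, `distinct_values` and `bits_needed` are made up for the example.

```python
# Illustrative only: how far an unsigned 8-bit value can count.
BITS = 8

max_byte = (1 << BITS) - 1       # all eight bits set: 0b11111111
distinct_values = 1 << BITS      # number of different values, counting 0


def bits_needed(n: int) -> int:
    """Return how many binary digits are needed to write non-negative n."""
    return max(1, n.bit_length())


print(max_byte, bin(max_byte))              # 255 0b11111111
print(distinct_values)                      # 256
print(bits_needed(255), bits_needed(256))   # 8 9
```

So a byte holds 256 distinct values, but the largest of them is 255, and 256 itself needs a ninth bit.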

As to the second part, I'm not sure. As for the SQL question: it used to be much more of an issue than it is today. What you specified controlled the maximum size of each row of data, and the database engine had to allocate enough space for each row to store that maximum. So the values you entered affected how much space was allocated on disk for each record, and therefore also the maximum number of records per "page" of disk space. The fewer the records per page, the more disk I/O operations it takes to retrieve any given number of records. As disk I/O is the overriding factor in database performance, this had a major impact.

Nowadays, however, modern RDBMS systems are coded to optimize that problem away, I think, by dynamically controlling how many records are actually stored on a page based on the data that is actually in the record instead of on the maximums you specify.

Charles Bretana 2009-07-30 13:19:16

This is wrong. 0 to 255 is 256 VALUES. The string length was often required to be 255 bytes because you needed one byte to store the terminating zero.

Christopher 2009-07-30 13:31:09

Not one less bit: one less byte, because you'd want that to line up on even bytes. And not just one less byte: one less byte for every file on the system.

Joel Coehoorn 2009-07-30 13:37:05

@Christopher, When there are 256 possible values, the last value is 255 more than the first one. So if the first one is zero, the last one is .... 255. but if that doesn't convince you just add up the individual bit values in 1111 1111 ...... 1 + 2 + 4 + 8 + ... + 64 + 128 = 255.

Charles Bretana 2009-07-30 13:56:17

@Christopher, RDBMS systems don't store terminating zeros for strings. The max was 255 because they were using a single byte to store the value of MAX SIZE. It's exactly the same reason why the highest memory address on a machine using a CPU with a 32-bit address register is (2^32 - 1): that's the value of a 32-bit integer with all the bits set to one.

Charles Bretana 2009-07-30 14:01:13

@Charles: there are 256 possible values in a byte. The zero always counts. Additionally, it doesn't matter whether the RDBMS used a terminating 0 or a byte to indicate the length of the string. The fact of the matter is that they had a block with a maximum size of 256, and had to indicate how much was actually in the block. Apparently they chose to use internal boundary tags of some sort. Finally, while you are correct that the maximum address in memory is (2^32)-1, there are actually 2^32 SLOTS to store data. Just like you can use 8 bits to address 256 bytes.

Christopher 2009-07-30 19:35:28

@Christopher, you may be focusing on the wrong thing... if the RDBMS uses a byte to record the user's choice for how many characters to store in a column, how is the system going to represent the value 256 in that byte? Please include in your answer the 8 individual bit values you think the RDBMS would store in that 8-bit byte...

Charles Bretana 2009-07-30 19:56:48

@Christopher, to put it another way: yes, you're right, there are 256 distinct values possible in a byte, but those 256 values start at 0 and end at 255, not at 256. The RDBMS designers could have started at 1 and gone to 256 (for that matter, they could have started at 100 and gone to 355 if they wanted), but they didn't. They started at value = 0 and ended at value = 255.

Charles Bretana 2009-07-30 19:59:51

+1 A:

There are 256 possible values because the sequence starts with zero :)

Matias 2009-07-30 13:20:42

+3 A:

1) see Charles's answer above

2) Come on now. This one is too easy. Path != Filename

3) What matters is the relationship between page size and record size. If you use varchars and have lots of data changes where fields go from being short to long, SQL spends time moving records to different pages. If your char fields are really long such that not very many records fit on a page, it may hurt performance a bit. If the length of the data varies a lot, varchars will make better use of storage space. And they don't have those pesky spaces on the end of your data that you have to strip off all the time.

Bill 2009-07-30 13:27:27

SQL Server 2000 has a hard 8KB field, page and row size limit. In SQL Server 2005+ the 8KB limit on pages and fields still applies but rows can overflow (with dire consequences to performance of course).

I've been told by a few DBAs that ideal row sizes should be divisors of 8KB though none of them have ever been able to explain exactly how to calculate row sizes accurately.

CptSkippy 2009-07-30 13:35:57

Not necessarily exact divisors. But the more rows on a page, the better performance can be expected. (So a row of 4000 bytes is much better than one of 4100 bytes, since you can fit two instead of one row on a page.)

Arjan Einbu 2009-07-30 13:45:55

8KB is the size of a page, some of that page is used for some internal structures. You're left with about 8030 bytes or so...
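The rows-per-page point in the comments above can be sketched as simple integer division. This is a rough model, not SQL Server's actual row-size calculation: it ignores per-row overhead, and `usable_bytes` is an assumed figure for the space left on an 8KB page after internal structures (8060 is used as the default here; the comment above estimates about 8030, which gives the same result for these two row sizes). The function names are hypothetical.

```python
def rows_per_page(row_bytes: int, usable_bytes: int = 8060) -> int:
    """Whole rows that fit in the usable part of one page (rough model)."""
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    return usable_bytes // row_bytes


def pages_needed(row_count: int, row_bytes: int, usable_bytes: int = 8060) -> int:
    """Pages required to hold row_count rows, rounding up."""
    per_page = rows_per_page(row_bytes, usable_bytes)
    if per_page == 0:
        raise ValueError("a single row does not fit on one page")
    return -(-row_count // per_page)  # ceiling division


print(rows_per_page(4000))             # 2
print(rows_per_page(4100))             # 1
print(pages_needed(1_000_000, 4000))   # 500000
print(pages_needed(1_000_000, 4100))   # 1000000
```

Under these assumptions, the extra 100 bytes per row doubles the number of pages that have to be read to scan the same million rows.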

Arjan Einbu 2009-07-30 13:47:14

+1 A:

You'll want to give yourself considerably more room than 255 characters to store filenames in your database, if you're storing path information as well.

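While the Windows shell limits paths to roughly 255 to 260 characters (the Win32 MAX_PATH constant is 260), NTFS itself supports paths of up to about 32,000 characters (32,767 with the extended-length path syntax), reportedly for compatibility with UNIX. A single file or folder name within the path is still limited to 255 characters.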

It's trivially easy to fake out the Windows shell and trick it into storing paths/filenames that exceed 255 characters. Mapping a drive or sharing a folder can do it, for example.
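For code that has to open such paths, the usual way past the 260-character limit is the extended-length prefix `\\?\` (or `\\?\UNC\` for network shares). Here is a minimal sketch of adding that prefix; the function names are hypothetical, and it assumes the path is already absolute and uses backslashes, since Windows skips its normal path cleanup when the prefix is present.

```python
MAX_PATH = 260  # classic Win32 limit, which includes the terminating NUL

EXTENDED_PREFIX = "\\\\?\\"          # the four characters \\?\
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # \\?\UNC\ for network shares


def exceeds_max_path(path: str) -> bool:
    """True if the path is too long for APIs bound by MAX_PATH."""
    return len(path) >= MAX_PATH


def to_extended_length(path: str) -> str:
    """Add the extended-length prefix to an absolute Windows path."""
    if path.startswith(EXTENDED_PREFIX):
        return path                                  # already prefixed
    if path.startswith("\\\\"):
        return EXTENDED_UNC_PREFIX + path[2:]        # UNC path: drop leading \\
    return EXTENDED_PREFIX + path                    # drive-letter path


print(to_extended_length("C:\\data\\file.txt"))
print(to_extended_length("\\\\server\\share\\file.txt"))
```

Either way, a database column meant to hold full paths needs to be sized for far more than 255 characters.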

I discuss this in further detail in this blog post: "256 character filenames should be enough for anybody".

dthrasher 2009-07-30 13:40:06

A varchar(256) will perform just the same as a varchar(8000) or a varchar(5). They are stored the same... (In MS SQL Server)

Arjan Einbu 2009-07-30 13:51:01

+1 A:

Byte 0 = length, 8 bit unsigned so allows 255 characters. Zero = empty string. The other 255 are, er, the 255 limit.
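The "byte 0 = length" layout can be sketched like this. It is a generic length-prefixed string, not a claim about how any particular RDBMS lays out its rows; the function names and the choice of a single-byte encoding (latin-1) are assumptions for the example.

```python
def encode_short_string(text: str) -> bytes:
    """Store the length in byte 0, followed by up to 255 bytes of data."""
    data = text.encode("latin-1")  # assumed single-byte encoding
    if len(data) > 255:
        raise ValueError("length does not fit in one unsigned byte")
    return bytes([len(data)]) + data


def decode_short_string(buf: bytes) -> str:
    """Read the length from byte 0 and return that many characters."""
    length = buf[0]
    return buf[1:1 + length].decode("latin-1")


print(encode_short_string(""))                   # b'\x00'
print(len(encode_short_string("a" * 255)))       # 256
print(decode_short_string(encode_short_string("hello")))  # hello
```

A full 256-byte block therefore holds one length byte plus at most 255 characters, and a length of 0 is the empty string.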

char(4000) is very different to char(10). SQL Server always pads the data if it's not null.

There is no difference between varchar(4000) and varchar(10), unless you want to index the column, where you have a key limit of 900 bytes.

SQL Server before 7 had varchar limit of 255 and no unicode support. That was a pain...

gbn 2009-07-30 13:57:37

CHAR is fixed length and will take exactly what you specify as the length regardless of whether you fill it or not.

VARCHAR is variable length and only takes what you put in it. However, a data model with the freedom to blast in 4000 characters when the real-world application requires 30 is asking for trouble.

Do field sizes matter? On small to medium systems, probably not, because the hardware is sufficient to carry a poor design. On large, high-traffic, high-concurrency systems, every byte counts.
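A rough way to see the CHAR versus VARCHAR difference is to count bytes. This sketch assumes single-byte characters and a 2-byte length overhead per VARCHAR value (the figure SQL Server documents for varchar); other engines differ, and the function names are made up for the example.

```python
def char_storage(value: str, declared: int) -> int:
    """CHAR(n): always n bytes, the value is padded with spaces."""
    if len(value) > declared:
        raise ValueError("value longer than declared length")
    return declared


def varchar_storage(value: str, declared: int, overhead: int = 2) -> int:
    """VARCHAR(n): actual length plus an assumed fixed length overhead."""
    if len(value) > declared:
        raise ValueError("value longer than declared length")
    return len(value) + overhead


def pad_char(value: str, declared: int) -> str:
    """What a CHAR(n) column hands back: the value padded to n characters."""
    return value.ljust(declared)


print(char_storage("Smith", 10), char_storage("Smith", 4000))        # 10 4000
print(varchar_storage("Smith", 10), varchar_storage("Smith", 4000))  # 7 7
print(repr(pad_char("Smith", 10)))                                   # 'Smith     '
```

The declared length changes the stored size for CHAR but not for VARCHAR, which is why it works better as a validation limit on VARCHAR columns than as a storage setting.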

2009-07-30 14:31:58
