tags:
views: 165
answers: 7

Hello. Just a quick question: what are people's practices for choosing the (arbitrary) maximum size that some array can take in C? Some people just pick a round number and hope it will be big enough, others pick the prime number closest to that round number (!), others some more esoteric value, and so on.

I'm wondering, then, what are some best practices for deciding such values?

Thanks.

+5  A: 

There is no general rule. Powers of two work for buffers; I use 1024 quite often for string buffers in C, but any other number would work. Prime numbers are useful for hash tables, where simple modulo hashing works well with prime-number sizes. Of course, you should define the size as a symbolic constant so that you can change it later.
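The symbolic-constant practice might look like this, a minimal sketch (STRING_BUF_SIZE and make_label are made-up names for illustration):

```c
#include <stdio.h>
#include <string.h>

/* 1024 is just the arbitrary round number discussed above, kept as a
   symbolic constant so it can be changed in one place later. */
#define STRING_BUF_SIZE 1024

/* Writes "label:<name>" into a caller-provided buffer of
   STRING_BUF_SIZE bytes, truncating safely if the name is too long. */
void make_label(char *out, const char *name)
{
    snprintf(out, STRING_BUF_SIZE, "label:%s", name);
}
```

snprintf never writes past the stated size, so even a badly chosen constant degrades to truncation rather than a buffer overflow.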

Juho Östman
hah. ok, so you actually take into account the data structure you will be working with. Thanks.
Dervin Thunk
It's actually not a great idea to use a power of two for buffers, because the system might allocate a couple of record-keeping bytes, and since page sizes are almost always powers of two, it could end up wasting a page on your allocation.
Chuck
+1 for the SIZE tip.
dheerosaur
@Chuck: For which systems does this actually apply? As for me, I would consider an allocator that requires the user to know about its internal book-keeping to work efficiently to be badly leaking the abstraction.
Juho Östman
@Juho Östman: To use something with perfect low-level efficiency, of *course* you'd need to know low-level details. But memory pages *are* allocated in powers of two almost everywhere that I know of, so *any* bookkeeping makes exact power-of-two allocation sizes the most likely worst-case scenario. For one example I know off the top of my head, OS X's allocation granularities are 2^4, 2^9 and 2^22 bytes.
Chuck
A: 

If I need to do this I usually go with either a power of two or, for larger data sets, the number of pages required to hold the data. Most of the time, though, I prefer to allocate a chunk of memory on the heap and then realloc if the buffer size turns out to be insufficient.

CodeninjaTim
+5  A: 

If I can't pin down a reasonable maximum, I tend to use malloc and realloc to grow the array as needed. Using a fixed-size array when you can't guarantee that it is large enough for the intended purpose is hazardous.
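A minimal sketch of the malloc/realloc approach (function and variable names are invented for illustration): double the capacity whenever the array fills up, and keep the old buffer valid if realloc fails.

```c
#include <stdlib.h>

/* Appends value to a growable int array. Returns the (possibly moved)
   buffer, or NULL on allocation failure, in which case the caller's
   old buffer is still valid and must still be freed. */
int *append(int *arr, size_t *len, size_t *cap, int value)
{
    if (*len == *cap) {
        size_t newcap = *cap ? *cap * 2 : 8;   /* start small, double */
        int *tmp = realloc(arr, newcap * sizeof *tmp);
        if (tmp == NULL)
            return NULL;                        /* arr is untouched */
        arr = tmp;
        *cap = newcap;
    }
    arr[(*len)++] = value;
    return arr;
}
```

Assigning realloc's result to a temporary first is the important detail: `arr = realloc(arr, ...)` would leak the original block on failure.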

torak
+3  A: 

Best practice is to avoid arbitrary limits whenever possible.

It's not always possible, so the second-best practice is to make an educated estimate of the largest thing the array is ever likely to need to hold, and then round up by a healthy margin, at least 25%. I tend to prefer powers of ten when I do this, because it makes it obvious on inspection that the number is an arbitrary limit. (Powers of two also often signify that, but only if the reader recognizes the number as a power of two, and most readers-of-code don't have that table memorized much past 2^16. If there's a good reason to use a power of two and it needs to be bigger than that, write it in hex. End of digression.) Always document the reasoning behind your estimate of the largest thing the array needs to hold, even if it's as simple as "anyone with a single source file bigger than 2GB needs to rethink their coding style" (actual example).
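In code, that practice boils down to something like the following sketch (the numbers and the constant name are invented for illustration; the comment is the important part):

```c
/* Largest input observed in practice was ~700 records; rounded up to
   the next power of ten, which gives well over the 25% margin and
   makes the arbitrariness of the limit obvious on inspection. */
#define MAX_RECORDS 1000
```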

Don't use a prime number unless you specifically need the properties of a prime number (e.g., as Juho mentions, for hash tables -- though you only need it there if your hash function isn't very good, which, unfortunately, is often the case). When you do, document that you are intentionally using prime numbers and why, because most people do not recognize prime numbers on sight or know why they might be necessary in a particular situation.

Zack
A: 

I only define a maximum when I have a strong reason for a particular number to be the maximum. Otherwise, I size it dynamically, perhaps with a sanity-check maximum (e.g. a person's name should not be several megabytes long).

Chuck
A: 

Round numbers (powers of two) are used because they are often easy for things like malloc to work with (many implementations manage memory in blocks of various power-of-two sizes), easier for linkers to use (in the case of static or global arrays), and also because you can use bitwise operations to test for their limits, which are often faster than < and >.
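The bitwise trick looks like this (names are illustrative): when the size is a power of two, `i & (SIZE - 1)` wraps an index with a single AND instead of a modulo or a comparison-and-reset.

```c
#define RING_SIZE 1024            /* must be a power of two */

/* Advances a ring-buffer index, wrapping at RING_SIZE. */
unsigned ring_next(unsigned i)
{
    return (i + 1) & (RING_SIZE - 1);
}
```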

Prime numbers are used because prime-number-sized hash tables are supposed to reduce collisions.
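For instance, a simple modulo hash over a prime-sized table might be sketched like this (TABLE_SIZE and the function are illustrative; 31 is just a prime near the round number 32):

```c
#include <stddef.h>

#define TABLE_SIZE 31   /* prime near the round number 32 */

/* Maps a key to a bucket by simple modulo hashing. With a prime size,
   keys that share a common stride with a power-of-two table (e.g.
   multiples of 8 against size 32) no longer pile into a few buckets. */
size_t bucket_of(size_t key)
{
    return key % TABLE_SIZE;
}
```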

Many people likely use both prime number and power of two sizes for things in cases where they don't actually provide any benefit, though.

nategoose
Keep in mind that allocating blocks in powers of two on the heap is not always the best way to go. In fact, depending on the implementation, it may be the worst, because a few bytes per block are used for bookkeeping. The implementation of the QString class in the Qt library considers this possibility: when regrowing itself (beyond a certain threshold), it expands to the next power of two, minus 12.
scrapdog
Qt actually considers the size of the internal header of QString, it does not take any system allocation headers into account. The needed size is the size of the header and the buffer. If the needed size is less than a page, the allocated space is 8*2^n such that the result is at least the needed size. If the needed size is at least one page, the allocated space is pagesize*2^n. For sizes less than 64, the allocated space is the next number aligned to 8. The algorithm is in the function qAllocMore in src/corelib/tools/qbytearray.cpp.
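That growth policy, as described in this comment, might be sketched roughly like this (this is not the actual Qt source; PAGE_SIZE is assumed to be 4096 here):

```c
#include <stddef.h>

#define PAGE_SIZE 4096   /* assumed page size for this sketch */

/* Rounds a needed size up per the policy described above:
   < 64 bytes  -> next multiple of 8;
   < one page  -> smallest 8*2^n that fits;
   >= one page -> smallest pagesize*2^n that fits. */
size_t alloc_more(size_t needed)
{
    if (needed < 64)
        return (needed + 7) & ~(size_t)7;   /* align up to 8 */

    size_t size = (needed < PAGE_SIZE) ? 64 : PAGE_SIZE;
    while (size < needed)
        size *= 2;                          /* 8*2^n or pagesize*2^n */
    return size;
}
```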
Juho Östman
A: 

It really isn't possible to predict at the outset what the maximum size could be.

For example, I coded a small cmdline interpreter, where each line of output produced was stored in a char array of size 200. Sufficient for all possible outputs, don't you think?

That was until I issued the env command, which produced a line of ~400 characters (!):

LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;
05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;
32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;
31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;
35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:';

Moral of the story: use dynamic allocation wherever possible.
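One way to follow that moral when reading lines, a sketch assuming a POSIX system: getline() grows its buffer automatically, so a 400-character LS_COLORS line is no problem.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Counts the lines in a stream using POSIX getline(), which
   reallocates the line buffer to fit however long each line is. */
long count_lines(FILE *fp)
{
    char *line = NULL;
    size_t cap = 0;
    long count = 0;

    while (getline(&line, &cap, fp) != -1)
        count++;

    free(line);   /* getline's buffer is heap-allocated */
    return count;
}
```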

Kedar Soparkar