tags:
views: 165
answers: 7

Hello. Just a quick question: what are people's practices for choosing the (arbitrary) maximum size that some array can take in C? Some people just pick a round number and hope it will be big enough, others pick the prime number closest to that round number (!), others some more esoteric value, and so on.

I'm wondering, then, what are some best practices for deciding such values?

Thanks.

+5  A: 

There is no general rule. Powers of two work for buffers; I use 1024 quite often for string buffers in C, but any other number would work. Prime numbers are useful for hash tables, where simple modulo hashing works well with prime-number sizes. Of course, you should define the size as a symbolic constant so that you can change it later.
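The symbolic-constant practice might look like this, a minimal sketch (STRING_BUF_SIZE and make_label are made-up names for illustration):

```c
#include <stdio.h>
#include <string.h>

/* 1024 is just the arbitrary round number discussed above, kept as a
   symbolic constant so it can be changed in one place later. */
#define STRING_BUF_SIZE 1024

/* Writes "label:<name>" into a caller-provided buffer of
   STRING_BUF_SIZE bytes, truncating safely if the name is too long. */
void make_label(char *out, const char *name)
{
    snprintf(out, STRING_BUF_SIZE, "label:%s", name);
}
```

snprintf never writes past the stated size, so even a badly chosen constant degrades to truncation rather than a buffer overflow.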

Juho Östman
hah. ok, so you actually take into account the data structure you will be working with. Thanks.
Dervin Thunk
It's actually not a great idea to use a power of two for buffers, because the system might allocate a couple of record-keeping bytes, and since page sizes are almost always powers of two, it could end up wasting a page on your allocation.
Chuck
+1 for the SIZE tip.
dheerosaur
@Chuck: For which systems does this actually apply? As for me, I would consider an allocator that requires the user to know about its internal book-keeping to work efficiently to be badly leaking the abstraction.
Juho Östman
@Juho Östman: To use something with perfect low-level efficiency, of *course* you'd need to know low-level details. But memory pages *are* allocated in powers of two almost everywhere that I know of, so *any* bookkeeping makes exact power-of-two allocation sizes the most likely worst-case scenario. For one example I know off the top of my head, OS X's allocation granularities are 2^4, 2^9 and 2^22 bytes.
Chuck
A: 

If I need to do this I usually go with either a power of two or, for larger data sets, the number of pages required to hold the data. Most of the time, though, I prefer to allocate a chunk of memory on the heap and then realloc if the buffer size turns out to be insufficient.

CodeninjaTim
+5  A: 

If I can't pin down a reasonable maximum, I tend to use malloc and realloc to grow the array as needed. Using a fixed-size array when you can't guarantee that it is large enough for the intended purpose is hazardous.
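A minimal sketch of the malloc/realloc approach (function and variable names are invented for illustration): double the capacity whenever the array fills up, and keep the old buffer valid if realloc fails.

```c
#include <stdlib.h>

/* Appends value to a growable int array. Returns the (possibly moved)
   buffer, or NULL on allocation failure, in which case the caller's
   old buffer is still valid and must still be freed. */
int *append(int *arr, size_t *len, size_t *cap, int value)
{
    if (*len == *cap) {
        size_t newcap = *cap ? *cap * 2 : 8;   /* start small, double */
        int *tmp = realloc(arr, newcap * sizeof *tmp);
        if (tmp == NULL)
            return NULL;                        /* arr is untouched */
        arr = tmp;
        *cap = newcap;
    }
    arr[(*len)++] = value;
    return arr;
}
```

Assigning realloc's result to a temporary first is the important detail: `arr = realloc(arr, ...)` would leak the original block on failure.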

torak
+3  A: 

Best practice is to avoid arbitrary limits whenever possible.

It's not always possible, so the second-best practice is to make an educated estimate of the largest thing the array is ever likely to need to hold, and then round up by a healthy margin, at least 25%. I tend to prefer powers of ten when I do this, because it makes it obvious on inspection that the number is an arbitrary limit. (Powers of two also often signify that, but only if the reader recognizes the number as a power of two, and most readers-of-code don't have that table memorized much past 2^16. If there's a good reason to use a power of two and it needs to be bigger than that, write it in hex. End of digression.) Always document the reasoning behind your estimate of the largest thing the array needs to hold, even if it's as simple as "anyone with a single source file bigger than 2GB needs to rethink their coding style" (actual example).
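In code, that practice boils down to something like the following sketch (the numbers and the constant name are invented for illustration; the comment is the important part):

```c
/* Largest input observed in practice was ~700 records; rounded up to
   the next power of ten, which gives well over the 25% margin and
   makes the arbitrariness of the limit obvious on inspection. */
#define MAX_RECORDS 1000
```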

Don't use a prime number unless you specifically need the properties of a prime number (e.g., as Juho mentions, for hash tables -- though you only need it there if your hash function isn't very good, which, unfortunately, is often the case). When you do, document that you are intentionally using prime numbers and why, because most people do not recognize prime numbers on sight or know why they might be necessary in a particular situation.

Zack
A: 

I only define a maximum when I have a strong reason for a particular number to be the maximum. Otherwise, I size it dynamically, perhaps with a sanity-check maximum (e.g. a person's name should not be several megabytes long).

Chuck
A: 

Round numbers (powers of two) are used because they are often easy for things like malloc to work with (many implementations manage memory in blocks of various power-of-two sizes), easier for linkers to use (in the case of static or global arrays), and also because you can use bitwise operations to test for their limits, which are often faster than < and >.
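The bitwise trick looks like this (names are illustrative): when the size is a power of two, `i & (SIZE - 1)` wraps an index with a single AND instead of a modulo or a comparison-and-reset.

```c
#define RING_SIZE 1024            /* must be a power of two */

/* Advances a ring-buffer index, wrapping at RING_SIZE. */
unsigned ring_next(unsigned i)
{
    return (i + 1) & (RING_SIZE - 1);
}
```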

Prime numbers are used because prime-number-sized hash tables are supposed to reduce collisions.
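For instance, a simple modulo hash over a prime-sized table might be sketched like this (TABLE_SIZE and the function are illustrative; 31 is just a prime near the round number 32):

```c
#include <stddef.h>

#define TABLE_SIZE 31   /* prime near the round number 32 */

/* Maps a key to a bucket by simple modulo hashing. With a prime size,
   keys that share a common stride with a power-of-two table (e.g.
   multiples of 8 against size 32) no longer pile into a few buckets. */
size_t bucket_of(size_t key)
{
    return key % TABLE_SIZE;
}
```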

Many people likely use both prime number and power of two sizes for things in cases where they don't actually provide any benefit, though.

nategoose
Keep in mind that allocating blocks in powers of two on the heap is not always the best way to go. In fact, depending on the implementation, it may be the worst, because a few bytes per block are used for bookkeeping. The implementation of the QString class in the Qt library considers this possibility: when regrowing itself (beyond a certain threshold), it expands to the next power of two, minus 12.
scrapdog
Qt actually considers the size of the internal header of QString, it does not take any system allocation headers into account. The needed size is the size of the header and the buffer. If the needed size is less than a page, the allocated space is 8*2^n such that the result is at least the needed size. If the needed size is at least one page, the allocated space is pagesize*2^n. For sizes less than 64, the allocated space is the next number aligned to 8. The algorithm is in the function qAllocMore in src/corelib/tools/qbytearray.cpp.
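That growth policy, as described in this comment, might be sketched roughly like this (this is not the actual Qt source; PAGE_SIZE is assumed to be 4096 here):

```c
#include <stddef.h>

#define PAGE_SIZE 4096   /* assumed page size for this sketch */

/* Rounds a needed size up per the policy described above:
   < 64 bytes  -> next multiple of 8;
   < one page  -> smallest 8*2^n that fits;
   >= one page -> smallest pagesize*2^n that fits. */
size_t alloc_more(size_t needed)
{
    if (needed < 64)
        return (needed + 7) & ~(size_t)7;   /* align up to 8 */

    size_t size = (needed < PAGE_SIZE) ? 64 : PAGE_SIZE;
    while (size < needed)
        size *= 2;                          /* 8*2^n or pagesize*2^n */
    return size;
}
```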
Juho Östman
A: 

It really isn't possible to predict at the outset what the maximum size could be.

For example, I coded a small cmdline interpreter, where each line of output produced was stored in a char array of size 200. Sufficient for all possible outputs, don't you think?

That was until I issued the env command, which produced a line of ~400 characters (!):

LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;
05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;
32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;
31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;
35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:';

Moral of the story: use dynamic allocation wherever possible.
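One way to follow that moral when reading lines, a sketch assuming a POSIX system: getline() grows its buffer automatically, so a 400-character LS_COLORS line is no problem.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Counts the lines in a stream using POSIX getline(), which
   reallocates the line buffer to fit however long each line is. */
long count_lines(FILE *fp)
{
    char *line = NULL;
    size_t cap = 0;
    long count = 0;

    while (getline(&line, &cap, fp) != -1)
        count++;

    free(line);   /* getline's buffer is heap-allocated */
    return count;
}
```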

Kedar Soparkar