Is there a historical reason or something? I've seen quite a few times something like char foo[256]; or #define BUF_SIZE 1024. I mostly use 2^n-sized buffers myself, mainly because I think it looks more elegant and that way I don't have to think of a specific number. But I'm not quite sure that's the reason most people use them; more information would be appreciated. :)

Thanks in advance, Hoffa.

+1  A: 

Because of the simplicity (read also cost) of base 2 arithmetic in electronics: shift left (multiply by 2), shift right (divide by 2).

In the CPU domain, lots of constructs revolve around base 2 arithmetic. Buses (control & data) used to access memory structures are often aligned on powers of 2. The cost of implementing logic in electronics (e.g. in a CPU) makes base 2 arithmetic compelling.
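As an illustrative sketch of the shift equivalences (the variable and values here are mine, not from the answer):

#include <stdio.h>

int main(void)
{
    unsigned int n = 40;
    printf("%u %u\n", n << 1, n >> 1);   /* 80 (same as n * 2) and 20 (same as n / 2) */
    return 0;
}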

Of course, if we had analog computers, the story would be different.


FYI: the attributes of a system sitting at layer X are a direct consequence of the attributes of the server layers below it, i.e. layers < X. The reason I am stating this stems from some comments I received with regard to my posting.

E.g. the properties that can be manipulated at the "compiler" level are inherited & derived from the properties of the system below it i.e. the electronics in the CPU.

jldupont
+7  A: 

Cache lines are usually some power of 2 in size (often 32 or 64 bytes). Data that is an integral multiple of that size can fit into (and fully utilize) the corresponding number of cache lines. The more data you can pack into your cache, the better the performance, so I think people who design their structures that way are optimizing for that.
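A sketch of that idea, assuming a 64-byte cache line (an assumption about the target, not something stated in the answer):

#include <stdio.h>

/* Pad the record so an array of them packs exactly into 64-byte cache
   lines; 64 is assumed here and should be checked for the real target. */
struct record {
    int   key;
    float value;
    char  pad[64 - sizeof(int) - sizeof(float)];
};

int main(void)
{
    printf("%zu\n", sizeof(struct record));   /* prints 64 on typical platforms */
    return 0;
}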

int3
+10  A: 

There may be a number of reasons, although many people will as you say just do it out of habit.

One place where it is very useful is in the efficient implementation of circular buffers, especially on architectures where the % operator is expensive (those without a hardware divide - primarily 8-bit microcontrollers). By using a 2^n buffer in this case, the modulo is simply a matter of masking off the upper bits, or in the case of, say, a 256-byte buffer, simply using an 8-bit index and letting it wrap around.
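For illustration, a minimal sketch of the masking trick (RB_SIZE, rb_head and rb_advance are names made up for the example):

#include <stdint.h>

#define RB_SIZE 256u                 /* must be a power of two */

static unsigned rb_head = 0;

/* Advance the head index; with a power-of-two size the wraparound is a
   single AND instead of a divide-based modulo. */
static void rb_advance(void)
{
    rb_head = (rb_head + 1) & (RB_SIZE - 1);
}

/* Equivalent 256-byte special case: an 8-bit index wraps around by itself. */
static uint8_t rb_head8 = 0;

static void rb_advance8(void)
{
    rb_head8++;                      /* 255 + 1 wraps to 0 automatically */
}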

In other cases, alignment with page boundaries, caches, etc. may provide opportunities for optimisation on some architectures - but that would be very architecture-specific. It may also just be that such buffers give the compiler optimisation possibilities, so all other things being equal, why not?

Clifford
exactly, comes back to my point too.
jldupont
Some systems are also able to do zero-copy I/O to user space given a number of criteria, one of them being that the size to read/write is a multiple of the system's page size.
nos
+2  A: 

Another reason, in addition to what everyone else has mentioned, is that SSE instructions operate on multiple elements at a time, and the number of elements is always some power of two. Making the buffer a power of two guarantees you won't be reading unallocated memory. This only applies if you're actually using SSE instructions, though.
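A minimal sketch of the idea, assuming SSE1 intrinsics, a 16-byte-aligned float buffer, and a made-up scale_buffer function:

#include <xmmintrin.h>   /* SSE intrinsics */

#define BUF_SIZE 1024    /* power of two, so a whole number of 4-float chunks */

/* Because BUF_SIZE is a multiple of 4, the loop consumes the buffer in
   full 128-bit registers with no scalar tail and never reads past the end. */
void scale_buffer(float *buf, float factor)   /* buf assumed 16-byte aligned */
{
    __m128 f = _mm_set1_ps(factor);
    for (int i = 0; i < BUF_SIZE; i += 4) {
        __m128 v = _mm_load_ps(&buf[i]);      /* load 4 floats at once */
        _mm_store_ps(&buf[i], _mm_mul_ps(v, f));
    }
}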

I think in the end though, the overwhelming reason in most cases is that programmers like powers of two.

Ray Hidayat
+2  A: 

Hash Tables, Allocation by Pages

This really helps for hash tables, because you compute the index modulo the size, and if that size is a power of two, the modulus can be computed with a simple bitwise AND (&) rather than a much slower divide-class instruction implementing the % operator.
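As a sketch (TABLE_SIZE and slot_for are illustrative names, not from the answer):

#define TABLE_SIZE 1024u                    /* power of two */

/* With a power-of-two table size, hash % TABLE_SIZE reduces to a mask. */
unsigned slot_for(unsigned hash)
{
    return hash & (TABLE_SIZE - 1u);        /* same result as hash % TABLE_SIZE */
}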

Looking at an old Intel i386 book, AND is 2 cycles and DIV is 40 cycles. The disparity persists today due to the much greater fundamental complexity of division, even though overall cycle times roughly 1000x faster tend to hide the impact of even the slowest machine ops.

There was also a time when malloc overhead was occasionally avoided at great length. Allocations available directly from the operating system were (and still are) a specific number of pages, so a power of two would be likely to make the most of the allocation granularity.

And, as others have noted, programmers like powers of two.

DigitalRoss
It is noteworthy that some hashing algorithms suggest using a table size equal to a prime number rather than 2^n, to make their behaviour more predictable; these algorithms usually have a strong background in theoretical computer science.
liori
"allocation granularity" Can go horribly wrong, though. If your memory allocator has any per-item overhead (to store the size of the allocation, perhaps), then a power of two actually makes the worst possible use of allocator granularity. Best to ignore the issue and do what you say first, which is to pick the most convenient size, hope for the best, and do platform-specific optimisations if necessary. Or if you want to count on it then do what many games do: write your own allocator to remove all doubt.
Steve Jessop
It's not just hashtables that use a "modulo the size of the array" operation - circular buffers are another example.
caf
A: 

I was going to use the shift argument, but couldn't think of a good reason to justify it.

One thing that is nice about a buffer that is a power of two is that circular buffer handling can use simple ands rather than divides:

#define BUFSIZE 1024

++index;                // increment the index.
index &= BUFSIZE;       // Make sure it stays in the buffer.

If it weren't a power of two, a divide would be necessary. In the olden days (and currently on small chips) that mattered.

Richard Pennington
Shouldn't that be index &= (BUFSIZE - 1)? Masking with BUFSIZE itself (1024) doesn't keep the index inside the buffer.
Yuliy
+1 for "clever" code getting the wrong answer in the name of efficiency.
Roger Pate
You're absolutely correct. It should have been index %= BUFSIZE, which would have been correct in all cases and also implemented using AND in any decent compiler.
Richard Pennington
A: 

It's also common for page sizes to be powers of 2.

On Linux I like to use getpagesize() when doing something like chunking a buffer and writing it to a socket or file descriptor.
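For example, a small sketch along those lines (the actual chunked writing is omitted; getpagesize() is declared in <unistd.h> on Linux/BSD):

#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int page = getpagesize();   /* e.g. 4096 on most Linux systems */
    printf("chunking writes in %d-byte pieces\n", page);
    return 0;
}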

Sean A.O. Harney
"Common" sounds like you've seen page sizes that were _not_ a power of 2. I have a hard time believing any hardware would do that (all page table lookups I've seen are based on specific bits of the virtual address; the remaining bits necessarily form a power-of-2 size).
Bahbar
Depends on how you count. :-) The PDP-8 had pages of 128 words, but 12-bit words, so it was a power of 2 in words but not bits.
Ken
I said "common" because I knew somebody would find some ancient architecture which was an exception to the rule. So the PDP-8 used a 12-bit CPU in that case? CPU bit size is always the word size, I think.
Sean A.O. Harney
The 1970's are "ancient"? Damn, I feel old. :-) How about the IBM AS/400, then, which used 48-bit processors up until they switched to PPC in 1995? The 21st century x86 monoculture is depressing.
Ken
Who talked about x86? PowerPC, some RISC, 68000... we have lots of playgrounds too :D. Seriously, I agree with you that it depends how you count. I'm standing by my comment that hw implementors will keep on using bit subsets to select pages and/or bit subsets to select elements from within a single page.
Bahbar
In a desktop or laptop PC, I haven't seen a PPC in years, and I haven't seen a 68K or any RISC this century. Like the Burroughs 5000, they are great architectures, and also dead. :-)
Ken
go program some consoles or embedded systems. There's still life out there :p
Bahbar
+1  A: 

I can think of a few reasons off the top of my head:

  1. 2^n is a very common value in computing. It is directly related to the way bits are represented (2 possible values), which means variable ranges tend to have boundaries at powers of 2.
  2. Because of the point above, you'll often find 256 as a buffer size. A byte can represent 256 distinct values (0-255), so if you want to store a string together with its size, you'll be most efficient storing it as SIZE_BYTE+ARRAY, where the size byte tells you the size of the array. That means the array can be any size from 0 to 255 (see the sketch after this list).
  3. Many other times, sizes are chosen based on physical things (for example, the amount of memory an operating system can address is related to the size of the CPU's registers, etc.), and these are also going to be a specific number of bits. Meaning, the amount of memory you can use will usually be bounded by some power of two (for a 32-bit system, 2^32).
  4. There might be performance benefits or alignment issues for such values. Most processors can access a certain number of bytes at a time, so even if you have a variable whose size is (let's say) 20 bits, a 32-bit processor will still read 32 bits no matter what. So it's often more efficient to just make the variable 32 bits. Also, some processors require variables to be aligned to a certain number of bytes (because they can't read memory from, for example, odd addresses). Of course, sometimes it's not about odd memory locations, but locations that are multiples of 4 or 8, etc. So in these cases, it's more efficient to just make buffers that will always be aligned.
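A sketch of the SIZE_BYTE+ARRAY layout from point 2 (pstring and pstring_set are made-up names for the example):

#include <string.h>
#include <stdio.h>

/* A length-prefixed ("Pascal-style") string: one size byte, then the data. */
typedef struct {
    unsigned char len;       /* one size byte: lengths 0..255 */
    char data[255];
} pstring;

void pstring_set(pstring *p, const char *s)
{
    size_t n = strlen(s);
    if (n > 255) n = 255;    /* clamp to what the size byte can hold */
    p->len = (unsigned char)n;
    memcpy(p->data, s, n);
}

int main(void)
{
    pstring p;
    pstring_set(&p, "hello");
    printf("%d: %.*s\n", p.len, p.len, p.data);
    return 0;
}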

Ok, those points came out a bit jumbled. Let me know if you need further explanation, especially point 4 which IMO is the most important.

Edan Maor
A: 

It makes a nice, round number in base 2, just as 10, 100 or 1000000 are nice, round numbers in base 10.

If it weren't a power of 2 (or something close, such as 96=64+32 or 192=128+64), then you could wonder why there's the added precision. A size that isn't rounded in base 2 can come from external constraints or programmer ignorance, and you'll want to know which one it is.

Other answers have pointed out a bunch of technical reasons as well that are valid in special cases. I won't repeat any of them here.

laalto
10, 100 or 1000000 are nice, round numbers in any base. :)
aib
A: 

In hash tables, 2^n makes it easier to handle key collisions in a certain way. In general, when there is a key collision, you either make a substructure, e.g. a list, of all entries with the same hash value, or you find another free slot. You could just add 1 to the slot index until you find a free slot, but this strategy is not optimal because it creates clusters of occupied slots. A better strategy is to calculate a second hash number h2 such that gcd(n,h2)=1, then add h2 to the slot index until you find a free slot (with wraparound). If n is a power of 2, finding an h2 that fulfills gcd(n,h2)=1 is easy: every odd number will do.
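A sketch of the power-of-two case (TABLE_SIZE and probe are illustrative names): forcing the step to be odd guarantees gcd(step, n)=1, so the probe sequence visits every slot.

#define TABLE_SIZE 1024u                     /* n = 2^10 */

/* i-th probe position for double hashing in a power-of-two table. */
unsigned probe(unsigned h1, unsigned h2, unsigned i)
{
    unsigned step = h2 | 1u;                 /* odd step, so gcd(step, TABLE_SIZE) == 1 */
    return (h1 + i * step) & (TABLE_SIZE - 1u);
}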

ammoQ