views: 171
answers: 5

Howdy all,

I've been learning to program for the Mac over the past few months (I have experience in other languages). Naturally that has meant learning the Objective-C language and thus the plainer C it is predicated on. In the process I stumbled on this quote, which refers to the C/C++ languages in general, not just the Mac platform.

With C and C++ prefer use of int over char and short. The main reason behind this is that C and C++ perform arithmetic operations and parameter passing at integer level. If you have an integer value that can fit in a byte, you should still consider using an int to hold the number. If you use a char, the compiler will first convert the values into integers, perform the operations, and then convert the result back to char.

So my question is: is this the case in the Mac desktop and iPhone OS environments? I understand that when talking about these environments we're actually talking about 3-4 different architectures (PPC, i386, ARM and the A4 ARM variant), so there may not be a single answer.

Nevertheless, does the general principle hold that on modern 32-bit/64-bit systems, using 1-2 byte variables that don't align with the machine's natural 4-byte words doesn't provide much of the efficiency we might expect?

For instance, a plain old C array of 100,000 chars is smaller than the same 100,000 ints by a factor of four, but if reading out each index during an enumeration involves a cast/boxing/unboxing of sorts, will we see lower overall 'performance' despite the saved memory overhead?

+3  A: 

The processor is very very fast compared to the memory speed. It will always pay to store values in memory as chars or shorts (though to avoid porting problems you should use int8_t and int16_t). Less cache will be used, and there will be fewer memory accesses.

Doug Currie
This is the principle I had been working under until I encountered the quote in my original question. Am I therefore right in saying you believe that assertion to be plain wrong, or merely not a black-and-white truth? I will confess that it, along with the common responses to similar questions on here (see the related "difference in speed between char and integer arrays?" and "Memory alignment on modern processors?"), still maintains the argument that time-critical applications (such as those dealing with large data sets, like graphics) benefit from an "aligned approach".
Dean P
It would be bad to align a value so that it crosses a word boundary, or especially a cache line boundary. For small (8 or 16 bit) values, it's easy to avoid it. The quote in your question doesn't say much about memory accesses. I agree that using byte or short doesn't generally buy anything when values are in registers.
Doug Currie
The bit shifts to get stuff in and out of sub-word variables are incredibly fast compared to the cost of an L1 cache miss, and you can fit more small variables in cache at once. The exception is mutable data shared between threads. In this case you may actually want to use an entire cache line (about 128 bytes) per variable even if it's boolean, to prevent false sharing. Embedded processors don't typically have multilevel transparent caches, but most still have on-chip SRAM which is faster than the off-chip RAM by an order of magnitude, so it pays to fit the working set on chip.
Ben Voigt
+2  A: 

Can't speak for PPC/ARM/A4 ARM, but x86 has the ability to operate on data as if it were 8-bit, 16-bit, or 32-bit (64-bit on x86_64 in 64-bit mode), although I'm not sure whether the compiler would take advantage of those instructions. Even when using a 32-bit load, the compiler could AND the data with a mask that clears the upper 16/24 bits, which would be relatively fast.

Likely, the ability to fit far more data into the cache would at least cancel out the speed difference... although the only way to know for sure would be to actually profile the code.

Kitsune
Thank you for your reply! Of course, tricky questions like mine are always hypothetical and can only truly be answered by profiling. I suppose the question I was really posing was: is the original quote I cited an accepted universal truth (among seasoned C devs), or merely an opinion presented as fact? Is the reality that judgements need to be made case by case, for example, random access vs sequential, reading vs writing, small sets (that fit in registers) vs large sets (that require paging)?
Dean P
+1  A: 

Of course there is a need to use data structures smaller than the register size of the target machine. Imagine you are storing text data encoded as UTF-8 or ASCII in memory, where each character is most likely a byte in size; do you want to store the characters as 64-bit quantities?

The advice you are quoting is a warning not to over-optimize. You have to balance the savings in space against the computational performance of your choice.

I wouldn't worry too much about it; today's CPUs are complicated enough that it's hard to make this kind of judgement on your own. Choose the obvious datatype and let the compiler worry about the rest.

BeWarned
A: 

The addressing model of the x86 architecture is that the basic unit of memory is 8 bit bytes.

This is to simplify operation with character strings and decimal arithmetic.

Then, in order to have useful sizes of integers, the instruction set allows using these in units of 1, 2, 4, and (recently) 8 bytes.

Mike Dunlavey
Thank you for your reply! Since you mention instruction sets... With x86 working in units of 8-bit bytes, is it therefore incorrect to say, as the original source I cited did, that it 'prefers' to work in sets of 4 bytes, especially when using SSE? What the author seemed to suggest is that in time-critical code (graphics handling, say) declaring a function with char parameters incurs an inherent char-to-int-to-char conversion, which may prove slower than if we had declared our parameter as int, even if we know in advance our argument value will never exceed 255.
Dean P
@Dean: The CPU has registers, most likely 32 bits. That's what it "likes". Showing my age, the first machine to address bytes was the IBM 360. Before that (IBM 7094), they would tend to have 36-bit addressable words, but string manipulation was very awkward in such machines. In modern machines, almost any arithmetic involving bytes involves expanding to 32 bits and back, but it's very quick. Only in the very tightest code would it make a difference. Very rarely do people write such tight code, though they imagine they always do.
Mike Dunlavey
Thank you again, Mike. In the time since I posted this question I have been trying to run some profiler tests to identify a best-use/best-case scenario. Empirical evidence seems, as you correctly identify, to require iteration counts in the multiple millions before there's a discernible difference in enumeration performance.
Dean P
A: 

A fact to remember is that most software development targets different processors than the ones most of us here deal with on a day-to-day basis.

C and assembler are common languages for these.

About ten billion CPUs were manufactured in 2008. About 98% of new CPUs produced each year are embedded.

Romain Hippeau