views:

345

answers:

6

Recently, I was challenged in a recent interview with a string manipulation problem and asked to optimize for performance. I had to use an iterator to move back and forth between TCHAR characters (with UNICODE support - 2bytes each).

Not really thinking of the array length, I made a curial mistake with not using size_t but an int to iterate through. I understand it is not compliant and not secure.

int i, size = _tcslen(str);    
for(i=0; i<size; i++){
   // code here
}

But, the maximum memory we can allocate is limited. And if there is a relation between int and register sizes, it may be safe to use an integer.

E.g.: Without any virtual mapping tools, we can only map 2^register-size bytes. Since TCHAR is 2 bytes long, half of that number. For any system that has int as 32-bits, this is not going to be a problem even if you dont use an unsigned version of int. People with embedded background used to think of int as 16-bits, but memory size will be restricted on such a device. So I wonder if there is a architectural fine-tuning decision between integer and register sizes.

+2  A: 

AFAIK, there is no direct link between register size and the size of int.

However, since you know for which platform you're compiling the application, you can define your own type alias with the sizes you need:

Example

#ifdef WIN32 // Types for Win32 target
#define Int16 short
#define Int32 int
// .. etc.
#elif defined // for another target

Then, use the declared aliases.

Cătălin Pitiș
+3  A: 

Going strictly by the standard, there is no guarantee as to how big/small an int is, much less any relation to the register size. Also, some architectures have different sizes of registers (i.e: not all registers on the CPU are the same size) and memory isn't always accessed using just one register (like DOS with its Segment:Offset addressing). With all that said, however, in most cases int is the same size as the "regular" registers since it's supposed to be the most commonly used basic type and that's what CPUs are optimized to operate on.

Tal Pressman
+2  A: 

I am not totally aware, if I understand this correct, since some different problems (memory sizes, allocation, register sizes, performance?) are mixed here.

What I could say is (just taking the headline), that on most actual processors for maximum speed you should use integers that match register size. The reason is, that when using smaller integers, you have the advantage of needing less memory, but for example on the x86 architecture, an additional command for conversion is needed. Also on Intel you have the problem, that accesses to unaligned (mostly on register-sized boundaries) memory will give some penality. Off course, on todays processors things are even more complex, since the CPUs are able to process commands in parallel. So you end up fine tuning for some architecture.

So the best guess -- without knowing the architectore -- speeedwise is, to use register sized ints, as long you can afford the memory.

Juergen
+3  A: 

The C++ standard doesn't specify the size of an int. (It says that sizeof(char) == 1, and sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long).

So there doesn't have to be a relation to register size. A fully conforming C++ implementation could give you 256 byte integers on your PC with 32-bit registers. But it'd be inefficient.

So yes, in practice, the size of the int datatype is generally equal to the size of the CPU's general-purpose registers, since that is by far the most efficient option.

If an int was bigger than a register, then simple arithmetic operations would require more than one instruction, which would be costly. If they were smaller than a register, then loading and storing the values of a register would require the program to mask out the unused bits, to avoid overwriting other data. (That is why the int datatype is typically more efficient than short.)

(Some languages simply require an int to be 32-bit, in which case there is obviously no relation to register size --- other than that 32-bit is chosen because it is a common register size)

jalf
Thanks, this answers my performance oriented concerns and shows it is not so silly to use an integer if platform choice is stable.
Burcu Dogan
That int is the size of a register is in general true on 16 and 32bit platforms, but all 64bit platforms(That I know about) have int as 32bit and long as 64bit. (And they have 64bit registers).
Martin Tilsted
but these 64-bit architectures generally still have all the 32-bit instructions, so there is no loss of performance. On a "pure" 64-bit architecture, an int would likely be 64-bit. Maybe. Good point though.
jalf
+1  A: 

I don't have a copy of the standard, but my old copy of The C Programming Language says (section 2.2) int refers to "an integer, typically reflecting the natural size of integers on the host machine." My copy of The C++ Programming Language says (section 4.6) "the int type is supposed to be chosen to be the most suitable for holding and manipulating integers on a given computer."

You're not the only person to say "I'll admit that this is technically a flaw, but it's not really exploitable."

Max Lybbert
+1  A: 

There are different kinds of registers with different sizes. What's important are the address registers, not the general purpose ones. If the machine is 64-bit, then the address registers (or some combination of them) must be 64-bits, even if the general-purpose registers are 32-bit. In this case, the compiler may have to do some extra work to actually compute 64-bit addresses using multiple general purpose registers.

If you don't think that hardware manufacturers ever make odd design choices for their registers, then you probably never had to deal with the original 8086 "real mode" addressing.

Mr Fooz