views:

152

answers:

4
+2  Q: 

Analysis of C code

Hello!

Here is function that i am writing on 64 bit linux machine.

void myfunc(unsigned char* arr) //array of 8 bytes is passed by reference
{
   unsigned long a = 0; //8 bytes
   unsigned char* LL = (unsigned char*) &a;

   LL[0] = arr[6];
   LL[1] = arr[3];
   LL[2] = arr[1];
   LL[3] = arr[7];
   LL[4] = arr[5];
   LL[5] = arr[4];
   LL[6] = arr[0];
   LL[7] = arr[2];
}

Now my questions are:

  1. Will variable 'a' be stored in a register so that It wont be accessed again and again from RAM or chache?
  2. Working on 64 bit architecture, should I assume that 'arr' array will be stored in a register as functions parameters are stored in a register in 64 bit arch?
  3. How efficient is Pointer type casting? my guess is that It should be inefficient at all?

Any help would be appriciated.

Regards

+2  A: 

You might do better to use explicit shift and mask instructions to accomplish this, instead of using array indexing.

The array operations are going to make it harder for the compiler to use registers for this, because there typically are not instructions that do things like "load 8 bits from the 3rd byte of register A". (An optimizing compiler could figure out that it's possible to do this with shifts/masks, but I'm not sure how likely that is).

David Gelhar
+2  A: 
  1. a cannot be stored in a register, as you have taken its address. (valdo correctly points out that a really smart compiler could optimize the array accesses into bit operations and leave a in a register, but I've never seen a compiler do that, and I'm not sure it would wind up being faster).
  2. arr (the pointer itself) is stored in a register (%edi, on amd64). The contents of the array are in memory.
  3. Pointer type casting by itself often generates no code at all. However, doing silly things with type casts can lead to very inefficient code, or even to code whose behavior is undefined.

It looks like you are trying to permute the bytes in an array and then shove them into a number, and the machine code your example generates is not terribly bad for that. David's suggestion to use shift and mask operations instead is good (this will also avoid problems if your code ever needs to run on a big-endian machine), and there are also the SSE vector permute instructions, but I have heard they're kind of a pain to use.

Incidentally, you should make the return type of your example function be unsigned long and put return a; at the very end; then you can use gcc -O2 -S and see exactly what you get from compilation. Without the change to return a, GCC will cheerfully optimize away the entire body of the function, since it has no externally visible side effects.

Zack
Ok so there is little chance that either 'a' or 'arr' will be stored in registers. what about cache hits in this code? Can i assume that reading and writing to variables 'arr' and 'a' produced 100% cache hits?
Jewel Thief
Yeah, that's probably a safe assumption - the only thing I can think of is if you were so unlucky as to get context switched in the middle of the function, then they might not be in cache anymore when control came back to your process.
Zack
A: 
  1. The question about if the variable a will be stored in the register is a matter of optimization. Since there's no volatile modifier IMHO a smart compiler will do this.

  2. It's a question of the calling convention. If by convention a single pointer parameter is transferred in a register - so will be arr.

  3. Pointer type casting is not an operation that CPU interprets. There's no code generated for it. It just the information for the compiler about what do you mean.

(Actually sometimes casting does produce extra code, but this is related to multiple inheritance and polymorphism)

valdo
GCC doesn't optimize `a` into a register, and I'm not sure it would be faster in this case. I filed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45861 to see what they think.
Zack
Zack! I saw yout post at gcc website. The second bitwise operations function that you wrote in your C file does a lot of type casting on char array. Can you tell me how expansive these type castings shall be? Sorry I dont know much about assembly code so couldnt read it.
Jewel Thief
Why are you so worried about the cost of type casts? In general, casts in C cost zero or one instructions. They're not at all like conversion operations in high-level languages. (In *this* case, they don't do anything by themselves, but they force the compiler to emit shift instructions that operate on the full width of a register -- which is necessary, or all the shifts would produce zeroes, not at all what you wanted.)
Zack
A: 

Depends on your optimization level. You can examine the assembly to answer your questions. With gcc, use the "-S" flag.

gcc -S -O0 -o /tmp/xx-O0.s /tmp/xx.c
gcc -S -O3 -o /tmp/xx-O3.s /tmp/xx.c

The generated assembly is completely different. (Be sure to make the return a; change suggested by Zack.)

See also this message for hints on how to generate a mixed c/assembly listing (which quickly becomes useless with optimization).

bstpierre
`-O4` is exactly the same as `-O3`, fyi (it is unlikely that GCC will ever add an optimization level higher than `-O3`).
Zack
@Zack - thanks, fixed.
bstpierre