Hello,

I ran into the following weird behavior. I have two functions that do almost the same thing: they measure the number of cycles it takes to do a certain operation. In one function I increment a variable inside the loop; in the other, nothing happens in the loop body. The variables are volatile so they won't be optimized away. These are the functions:

unsigned int _osm_iterations=5000;

double osm_operation_time(){
    // volatile is used so that j will not be optimized away and the
    // ++ operation will actually be performed on each iteration
    volatile unsigned int j=0;
    volatile unsigned int i;
    tsc_counter_t start_t, end_t;
    start_t = tsc_readCycles_C();
    for (i=0; i<_osm_iterations; i++){
       ++j;
    }
    end_t = tsc_readCycles_C();
    if (tsc_C2CI(start_t) ==0 || tsc_C2CI(end_t) ==0 || tsc_C2CI(start_t) >= tsc_C2CI(end_t))
         return -1;
    return (tsc_C2CI(end_t)-tsc_C2CI(start_t))/_osm_iterations;
}

double osm_empty_time(){
    volatile unsigned int i;
    volatile unsigned int j=0;
    tsc_counter_t start_t, end_t;
    start_t = tsc_readCycles_C();
    for (i=0; i<_osm_iterations; i++){
        ;
    }
    end_t = tsc_readCycles_C();
    if (tsc_C2CI(start_t) ==0 || tsc_C2CI(end_t) ==0 || tsc_C2CI(start_t) >= tsc_C2CI(end_t))
        return -1;
    return (tsc_C2CI(end_t)-tsc_C2CI(start_t))/_osm_iterations;
}

There are some non-standard functions there but I'm sure you'll manage.

The thing is, the first function returns 4 while the second one returns 6, even though the second one obviously does less work than the first.

Does that make any sense to anyone?

Actually, the point of having both functions is so I can subtract the loop overhead from my measurement of the operation itself. Do you have any idea how to do that properly (as this method doesn't really cut it)?

I'm on Ubuntu (64 bit I think).

Thanks a lot.

+4  A: 

I can see a couple of things here. First, the code for the two loops looks identical. Second, the compiler will probably realise that the variables i and j always have the same value and optimise one of them away. You should look at the generated assembly and see what is really going on.

Another theory is that the change to the inner body of the loop has affected the cacheability of the code: it could have moved it across cache lines or something similar.

Since the code is so trivial, you may find it difficult to get an accurate timing value. Even with 5000 iterations, the total time may be inside the margin of error of the timing code you are using: a modern computer can probably run that in far less than a millisecond. Perhaps you should increase the number of iterations?

To see the generated assembly in gcc, specify the -S compiler option:

Q: How can I peek at the assembly code generated by GCC?

Q: How can I create a file where I can see the C code and its assembly translation together?

A: Use the -S (note: capital S) switch to GCC, and it will emit the assembly code to a file with a .s extension. For example, the following command:

gcc -O2 -S -c foo.c

will leave the generated assembly code in the file foo.s.

If you want to see the C code together with the assembly it was converted to, use a command line like this:

gcc -c -g -Wa,-a,-ad [other GCC options] foo.c > foo.lst

which will output the combined C/assembly listing to the file foo.lst.

1800 INFORMATION
The first thing is just a copy mistake. About the second one - I don't really know how to do it... any tips?
R S
Also, why would it optimize them away anyway if they are declared volatile?
R S
I don't really know whether it has or not, but since they are stack-allocated local variables and you never take their address or do anything else unusual with them, maybe the compiler can tell that they can never be touched by an external source.
1800 INFORMATION
Generally your compiler has options to view the generated assembly - I don't know what yours would be
1800 INFORMATION
It's gcc. Thanks :)
R S
More iterations don't fix it :)
R S
Then in that case, you should definitely look at the generated compiler output. You could be left scratching your head though, if your real code is as simple as this.
1800 INFORMATION
The volatile qualifiers mean that the optimizer cannot tell what is in i or j; there could apparently be something else that changes them that the compiler cannot detect.
Jonathan Leffler
A: 

It's sometimes difficult to guess at this sort of thing, especially due to the small number of iterations. One thing that might be happening, though, is that the increment could be executing on a free integer execution unit, gaining some slight degree of parallelism, since it has no dependency on the value of i.

Since you mentioned this is a 64-bit OS, it's almost certain all these values are in registers, since there are more registers in the x86_64 architecture. Other than that, I'd say perform many more iterations and see how stable the results are.

Brian Mitchell
A: 

If you are truly trying to test the operation of a piece of code ("j++;" in this case), you're actually better off doing the following:

1/ Do it in two separate executables since there is a possibility that position within the executable may affect the code.

2/ Make sure you use CPU time rather than elapsed time (I'm not sure what "tsc_readCycles_C()" gives you). This is to avoid errant results from a CPU loaded up with other tasks.

3/ Turn off compiler optimization (e.g., "gcc -O0") to ensure gcc doesn't put in any fancy stuff that's likely to skew the results.

4/ You don't need to worry about volatile if you use the actual result, such as placing:

printf ("%u\n", j);

after the loop, or:

FILE *fx = fopen ("/dev/null","w");
fprintf (fx, "%u\n", j);
fclose (fx);

if you don't want any output at all. I can't remember whether volatile was a suggestion to the compiler or enforced.

5/ 5,000 iterations seems a little on the low side; "noise" could affect the readings, so a higher value would probably be better. This may not be an issue if you're timing a larger piece of code and have just included "j++;" as a place-holder. (A sketch pulling some of these points together follows this list.)
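
Purely as an illustration of how points 2, 4 and 5 might fit together (the iteration count and the use of clock() from <time.h> are my assumptions, not part of your tsc_* code):

#include <stdio.h>
#include <time.h>

#define ITERATIONS 100000000UL    /* much larger than 5,000 (point 5) */

int main(void){
    unsigned long i, j = 0;
    clock_t start, end;

    start = clock();              /* CPU time rather than elapsed time (point 2) */
    for (i = 0; i < ITERATIONS; i++){
        ++j;
    }
    end = clock();

    /* Use the result (point 4); together with -O0 (point 3) this keeps
       the compiler from removing the loop. */
    printf("j=%lu, CPU time per iteration: %g s\n",
           j, ((double)(end - start) / CLOCKS_PER_SEC) / ITERATIONS);
    return 0;
}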

paxdiablo
A: 

When I'm running tests similar to this, I normally:

  1. Ensure that the times are measured in at least seconds, preferably (small) tens of seconds.
  2. Have a single run of the program call the first function, then the second, then the first again, then the second again, and so on, just to see if there are weird cache warm-up issues (see the sketch after this list).
  3. Run the program multiple times to see how stable the timing is across runs.
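
A small, hypothetical driver for point 2 (it assumes it is compiled and linked together with the osm_* functions from your question):

#include <stdio.h>

/* Declarations for the functions from the question; this driver assumes
   they are compiled and linked in separately. */
double osm_operation_time(void);
double osm_empty_time(void);

int main(void){
    int pass;
    /* Alternate the two measurements several times so any warm-up effect
       shows up as a difference between the first pass and later ones. */
    for (pass = 0; pass < 5; pass++){
        printf("pass %d: operation=%.2f empty=%.2f\n",
               pass, osm_operation_time(), osm_empty_time());
    }
    return 0;
}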

I'm still at a loss to explain your observed results, but if you're sure you've got your functions identified properly (not self-evident, given the copy'n'paste errors earlier, for example), then looking at the assembler output is the main option left.

Jonathan Leffler