ansaurus

Question

How does the code behave differently with the Java and C compilers?

Answer 1

+4 A:

In C++, the result is indeterminate, i.e., not specified or guaranteed to be consistent - the compiler is free to do whatever suits it best at any time based on sequence points.

I suspect the same for Java [and C# etc.]

Ruben Bartelink 2009-11-24 08:37:07

Where does it state that this is indeterminate? By my understanding of the operator, the result should always be 6.

Thorarin 2009-11-24 08:44:40

Could you explain this statement? There are usually very specific rules in C and C++ for when L-values are updated in an expression... ?

Dathan 2009-11-24 08:45:21

Read about sequence points at http://en.wikipedia.org/wiki/Sequence_point

Ruben Bartelink 2009-11-24 08:49:38

Pre/post decrement cases are [obviously] not simple L-values

Ruben Bartelink 2009-11-24 08:50:22

Ruben Bartelink 2009-11-24 08:52:17

Ah, good reading. I was unaware of the implications of that. More specifically than just the concept of sequence points, come to find out, the C++ standard says that the behavior of any expression is undefined if the prior value of a modified expression is accessed for any purpose other than to determine the value to be stored. So, the behavior of array[x++] is undefined - even though most compilers in my experience play nice and defer evaluation of the postdecrement until the sequence point at the end of the assignment.

Dathan 2009-11-24 08:58:00

Michael Foukarakis 2009-11-24 09:00:11

Unlike C, this order IS defined in Java.

Blindy 2009-11-24 09:28:00

@Blindy: Any citation? What is the ordering?

Ruben Bartelink 2009-11-24 09:47:51

Answer 2

A:

I don't know for sure, but I'm guessing it's because Java evaluates the postdecrement on the last x-- before evaluating the -= operator, whereas C++ evaluates the -= first and the postdecrement after the entire rest of the expression is done.

Dathan 2009-11-24 08:46:36

In C++, the order is not guaranteed to be consistent across compilers (or even within the same compiler in the same file). It's more likely to be specified in Java, but I'm not sure.

Ruben Bartelink 2009-11-24 08:53:42

@Rubin: Nothing about the expression is guaranteed to be consistent in C or C++, since it's undefined behavior according to the standard. I don't know about Java.

David Thornley 2009-11-24 17:09:25

Answer 3

+27 A:

You are wrong when you say that the output of this code considered as a C program is 6.

Considered as a C program, this is undefined. You just happened to get 6 with your compiler, but you could just as well have gotten 24, segmentation fault, or a compile-time error.

See the C99 standard, 6.5, paragraph 2:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

--x-x-- is explicitly forbidden by this paragraph.

EDIT:

Aaron Digulla writes in the comments:

Is it really undefined?

Did you notice that I linked to the C99 standard and indicated the paragraph that says this is undefined?

gcc -Wall (GCC 4.1.2) doesn't complain about this and I doubt that any compiler would reject this code.

The standard describes some behaviors as "undefined" precisely because not all ways for a C program to be nonsense can be detected reliably at compile-time. If you think that "no warning" should mean everything's fine, you should switch to a language other than C. Many modern languages are better defined. I use OCaml when I have a choice, but there are countless other well-defined languages.

There is a reason why it returns 6 and you should be able to explain it.

I did not notice your explanation of why this expression evaluated to 6. I hope you don't spend too much time writing it, because for me it returns 0.

Macbook:~ pascalcuoq$ cat t.c
#include <stdio.h>

int main(int argc, char **argv)
{
  int y;
  printf("argc:%d\n", argc);
  y = --argc - argc--;
  printf("y:%d\n", y);
  return 0;
}
Macbook:~ pascalcuoq$ gcc t.c
Macbook:~ pascalcuoq$ ./a.out 1 2 3 4 5 6 7 8 9
argc:10
y:0

This is the time at which you argue that there is a bug in my compiler (since it doesn't return the same thing as yours).

Macbook:~ pascalcuoq$ gcc -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5490~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5490)

Aaron also writes:

As an engineer, you should still be able to explain why it returns one result or the other.

Exactly! I gave the simplest explanation why one might get 6: the result is explicitly specified in C99 as undefined behavior, and it was in earlier standards too.

and:

Lastly, please show a compiler which warns about this construct.

To the best of my knowledge, no compiler warns about *(&x - 1) where x is defined by int x;. Are you claiming that this construct is valid C and that a good engineer should be able to predict the result because no compiler warns about it? This construct is undefined, just like the one being discussed.

Lastly, if you absolutely need warnings to believe there is a problem, consider using a verification tool such as Frama-C. It needs to make some assumptions that are not in the standard to capture some existing practices, but it correctly warns about --x-x-- and most other undefined C behaviors.

Pascal Cuoq 2009-11-24 08:55:20

I stand corrected - thanks for the reference!

Paul Dixon 2009-11-24 09:05:26

Is it really undefined? I haven't looked at C99 yet, but in ANSI C, there is no rule why this should be illegal. I don't have many C compilers at hand but `gcc -Wall` (GCC 4.1.2) doesn't complain about this and I doubt that any compiler would reject this code.

Aaron Digulla 2009-11-24 12:04:09

@Aaron: yes, it's really undefined. If you think Pascal is quoting the standard incorrectly, but haven't looked at it yourself, then there's not much else that can be done to convince you. The sentence he quotes is also in C89, at "3.3 Expressions" (I'm only looking at a mirror of a draft, though, I don't have a copy of the C89 standard).

Steve Jessop 2009-11-24 12:32:30

@Aaron: Illegal != undefined. There's also this language from appendix J.2 (Undefined behavior): "Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored (6.5)." The expression `--x - x--` attempts to modify the object more than once between sequence points, hence the behavior is undefined.

John Bode 2009-11-24 14:59:34

@Aaron: "Undefined" means the standard puts absolutely no restrictions on what an implementation can do, so anything it does (refusing to compile, doing something maybe halfway reasonable, sending geeky jokes to your mother-in-law) is conformant. Apparently gcc 4.1.2 tries to do something that might be what you were thinking, which is perfectly legal, although a warning would be nice. Other compilers may guess your meaning differently, or refuse to guess, and those would also be perfectly legal.

David Thornley 2009-11-24 16:05:34

Let me put it differently: the standard says that the behavior is undefined but saying "forbidden" is too strong, IMO. Compilers compile such code and you get a result. So maybe the standard doesn't say what result you can expect but that doesn't mean it's random. There is a *reason* why it returns 6 and you should be able to explain it.

Aaron Digulla 2009-11-24 16:06:49

@Aaron: C and C++ compilers may compile such code; it's not guaranteed. You will probably get a result. There is indeed a reason why it returns 6 on one particular trial on one particular version of one particular compiler, and you could explain it by studying how that particular compiler parses expressions. There is no particular reason to believe it would return 6 consistently, or that another compiler would return 6. Trying to establish expectations for undefined behavior is foolish and generally futile.

David Thornley 2009-11-24 16:35:48

As an engineer, you should still be able to explain why it returns one result or the other. Moreover, standing on the standard doesn't mean you're right. Prove your assumption by posting a compiler version or options which returns something else. Lastly, please show a compiler which warns about this construct. The standard can say a lot which means exactly nothing unless someone writes the code that implements it.

Aaron Digulla 2009-11-26 15:01:24

@aaron I replied to your comments in my answer.

Pascal Cuoq 2009-11-26 20:50:19

"you could just as well have gotten 24, segmentation fault, or a compile-time error" Or, indeed, "a suffusion of yellow" -- http://www.thateden.co.uk/dirk/

Cowan 2009-11-26 20:52:23

+1 for the lengthy explanation.

Aaron Digulla 2009-11-27 12:06:16

Oh, and your example is wrong. It should be `int y = argc; y -= ...` (you have an assignment instead of a -=). You get 0 because argc-argc == 0.

Aaron Digulla 2009-11-27 12:08:23

I just posted the complete flow of my reasoning. I'm pretty sure, now, that you will always get 6 for a C compiler on a register based CPU. Oh, and I hope you have as much fun with this as I do :)

Aaron Digulla 2009-11-27 12:56:27

Answer 4

+2 A:

Well... which do you think is correct, and what is your reasoning?

I believe x is pretty well determined for the first three steps

x = 10
x is decremented (its initial value is used first)
x is decremented again (its resulting value is used after)

Now x == 8. But look at what you're doing to it here (pardon the insertion of human-friendly whitespace):

x -= --x - x--

Which could be compiled to (this is what I'd do if I had to include the ++ and -- operators in my language: the side effects are identified first and moved to the fore and aft of the statement as a whole):

--x
t = x - x
x -= t
x--

Giving a result of x == 6. Or maybe it's been compiled to (the statement is reduced first by subexpression):

t1 = --x     // t1 = 7, x = 7
t2 = x--     // t2 = 7, x = 6
t = t1 - t2  // t = 7 - 7 = 0
x -= t       // x = 6

Or the subexpressions could have landed the other way round:

t1 = x--     // t1 = 8, x = 7
t2 = --x     // t2 = 6, x = 6
t = t2 - t1  // t = 6 - 8 = -2
x -= t       // x = 8

In the absence of a formal description of the operators' behaviour in such a case, who's to say which is correct?

Edmund 2009-11-24 08:57:16

In C, all possible ways of handling `x -= --x - x--` are perfectly correct according to the C standard, including either of your examples, returning 42, or formatting your hard drive. It's undefined behavior.

David Thornley 2009-11-24 16:12:42

Answer 5

+6 A:

How is the term evaluated? The right-hand side --x - x-- evaluates to 0 for both Java and C, but it changes x. So the question is: how does -= work? Does it read x before the right-hand side (RHS) is evaluated and then subtract the RHS, or does it read x after the RHS has been evaluated? So do you have

tmp = x // copy the value of x
x = tmp - (--x - x--) // complicated way to say x = x

or

tmp = (--x - x--) // first evaluate RHS, from left to right, which means x -= 2.
x = x - tmp // subtract 0 from x

In Java, here is the rule:

A compound assignment expression of the form E1 op= E2 is equivalent to E1 = (T)((E1) op (E2)), where T is the type of E1, except that E1 is evaluated only once. (see 15.26.2 Compound Assignment Operators)

This means the value of x is copied before the right-hand side is evaluated, so the pre- and post-decrements have no effect on the result. Your C compiler probably uses a different rule.

For C, this article might help:

The moral is that writing code that depends on order of evaluation is a bad programming practice in any language.

[EDIT] Pascal Cuoq (see below) insists that the standard says the result is undefined. This is probably correct: I stared at the part he copied out of the standard for a couple of minutes and couldn't understand what that sentence said. I guess I'm not alone here :) So I went to see how the C interpreter I developed for my master's thesis works. It's not standard-compliant, but I understand how it works. Guess I'm a Heisenberg-type guy: I can have either at any precision but not both ;) Anyway.

When parsing this construct, you get this parse tree:

        +---- (-=) ----+
        v     -=       v
        x        +--- (-) ----+
                 v            v
              PREDEC x    POSTDEC x

The standard states that modifying x three times (once on the left and twice in the two decrement ops) makes the behavior undefined. Okay. But a compiler is a deterministic program, so when it accepts some input, it will always produce the same output. And most compilers work the same way. I think we all agree that any C compiler will in fact accept this input. What outputs can we expect? Answer: 6 or 8. Reasoning:

x-x is 0 for any value of x.
--x-x is 0 for any value of x, because it can be written as --x, x-x
x-x-- is 0 because the result of the minus operator is calculated before the post-decrement.

So the pre-decrement has no influence on the result, and neither has the post-decrement. Also, there is no interference between the two operators (using them both in the same expression, as in a = --y - x--, doesn't change their behavior). Conclusion: every C compiler will return 0 for --x - x-- (well, except the buggy ones).

Which leaves us with my original assumption: the RHS has no influence on the result; it always evaluates to 0, but it modifies x. So the question is how -= is implemented. There are quite a few factors that play a role here:

Does the CPU have a native operator for -=? Register-based CPUs do (in fact, they only have such operators: to do a+b, they have to copy a into a register and then they can += b to it); stack-based CPUs don't (they push all the values on the stack and then use operators which use the topmost stack elements as operands).
Are the values saved on the stack or in registers? (Another way to ask the first question)
Which optimization options are active?

To go any further, we must look at the code:

#include <stdio.h>

int main() {
  int x = 8;
  x -= --x - x--;
  printf("x=%d\n", x);
}

When compiled, we get this assembler code for the assignment (x86-64 code):

    .loc 1 4 0
    movl    $8, -4(%rbp)    # x = 8
    .loc 1 5 0
    subl    $1, -4(%rbp)    # --x
    movl    $0, %eax        # tmp = 0
    subl    %eax, -4(%rbp)  # x -= tmp
    subl    $1, -4(%rbp)    # x--
    .loc 1 6 0
    movl    -4(%rbp), %esi  # load x into %esi, where printf() expects its second argument

The first movl sets x to 8 which means -4(%rbp) is x. As you can see, the compiler actually notices x-x and optimizes that to 0 as predicted (even without any optimization options). We also have the two expected -- operations which means the result must always be 6.

So who is right? We both are. Pascal is right when he says that the standard doesn't define this behavior. But that doesn't mean it's random. All the pieces of the code have a well-defined behavior, so the behavior of the sum can't suddenly be undefined (unless there is something else missing - but not in this case). So even though the standard doesn't treat this problem, it's still deterministic.

For stack-based CPUs (that don't have any registers), the result should be 8 since they will copy the value of x before they start evaluating the right-hand side. For register-based CPUs, it should always be 6.

Moral: The standard is always right but if you must understand, look at the code ;)

Aaron Digulla 2009-11-24 09:24:42

The program is not valid C, and its behavior is *undefined* by the language.

Stephen Canon 2009-11-24 15:29:53

Okay, let's put it this way: It's undefined by the standard but I'd be willing to bet that you can't find a compiler which behaves differently. Also, since the compiler compiles it, it's valid. Maybe the behavior is undefined by the standard but the code itself is valid.

Aaron Digulla 2009-11-24 16:04:17

Valid in a sense that it does not have random behavior, it's not a "bug", or even much of a hack.

Aaron Digulla 2009-11-24 16:08:04

@Aaron: Valid in the sense that the standard doesn't forbid it. There is no guaranteed result. If your compiler documentation doesn't mention it, the compiler could come up with any result, and this can vary. Compiler implementors don't tend to care about what their compilers do with undefined behavior, so in general the compiler will do whatever was convenient to define for similar things. Note that order of evaluation is never guaranteed for arithmetic operators (or much else), and can vary within the same compiler (e.g., to assist common subexpression elimination).

David Thornley 2009-11-24 16:31:39

Whether or not a compiler accepts it, it is *not* valid C. Quoting from the standard (6.5, paragraph 2): "Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression."

Stephen Canon 2009-11-24 17:03:20

The `--x - x--` in C has undefined behavior, meaning that if it "evaluates" to something, the result is essentially random.

AndreyT 2009-11-24 18:24:04

@Aaron: I expect it to be very easy to find a C compiler that, in some circumstances, returns a different thing. I'd even expect it to vary by optimization level.

erikkallen 2009-11-24 19:45:05

@AndreyT,@erikkallen: looking at the grammar tree, I doubt it can return anything but 8 or 6 and, based on my knowledge of C, I'm pretty sure you'll get 6 everywhere. Please prove me wrong and post a compiler version/options which returns something else.

Aaron Digulla 2009-11-26 15:05:06

@aaron Done: my compiler gives the value 0 to this expression. Details in my answer.

Pascal Cuoq 2009-11-26 21:02:03

@Pascal: That's because your example isn't the same as above. See my comment below.

Aaron Digulla 2009-11-27 12:57:21

Some years ago, I wrote code to explore all the permutations of x++ + x++ for both pre- and post-increment and decrement, for all arithmetic operators. I compiled and ran the code on a Mac (PPC, OS 8ish, MPW) and a PC (x86, Win NT, VC). The results *did not match*. The two compilers used different strategies for applying the side effects. IIRC, MPW deferred all side effects until after all the expressions had been evaluated, whereas VC applied side effects earlier. Even *that* wasn't 100% consistent; it really depended on the individual expression.

John Bode 2009-12-03 16:03:19

Continuing from the above comment -- You *cannot* say that the result will always be `x` for a given CPU, because exactly when the side effect is applied will vary based on the compiler, optimization settings, even the expression itself.

John Bode 2009-12-03 16:06:24

@John Bode: You may get different results for other permutations than "--x - x--" but this specific expression *must* evaluate to 0. Always. The side effects of the pre- and post-decrement can't influence the result. Never. No way. Even when hell freezes over. Note that I'm not talking about the assignment, just the RHS of the "-=".

Aaron Digulla 2009-12-03 16:25:04

*If* the side effects of `--x` and `x--` are deferred until after the subtraction or the assignment (which is possible, although not apparently common), then you will get a different result. `--x` does not mean "decrement x right now"; it means, give me the current value of x subtracted by 1, and sometime before the next sequence point, change the value of x. You cannot rely on that side effect being applied immediately.

John Bode 2009-12-03 18:49:09

@John Bode: And your point is? Please tell me something that isn't already in my answer. "x-x == 0" for any x. The decrement doesn't influence that, so the value of the RHS must be 0. Things would be different for "--x - ++x", for example.

Aaron Digulla 2009-12-04 08:22:38

My point is that the expression `--x - x--` will not evaluate to 0 *if* the side effects of the pre- and post-decrement operators are deferred until after the subtraction.

John Bode 2009-12-04 21:54:58

Answer 6

A:

whoops, wrong explanation

Fred 2009-11-24 18:38:33

Answer 7

+3 A:

The fundamental difference between Java and C is that in the C language the temporal relationships between different actions (what happens "before" and what happens "after") are determined by so-called sequence points. Sequence points implement the concept of time in the process of execution of a C program. If two actions are separated from each other by a sequence point, then you can say that one action happens "before" and another happens "after". When two actions have no sequence point between them, there's no defined temporal ordering between them and there's no way to say what happens "first" and what happens "later". Consider a pair of adjacent sequence points in a C program as the minimal indivisible unit of time. What happens within that unit of time cannot be described in terms of "before" and "after". One might as well think that between two adjacent sequence points everything is happening simultaneously. Or in random order, whichever you prefer.

In C language the statement

x -= --x - x--;

has no sequence points inside. It only has a sequence point at the very beginning and at the very end. This means that there's no way to say in which order this expression statement is evaluated. It is indivisible in terms of C time, as described above. Every time someone tries to explain what happens here by imposing a specific temporal ordering, they are just wasting their time and producing utter nonsense. This is actually the reason why the C language does not (and cannot) make any attempt to define the behavior of expressions with multiple modifications of the same object (x in the above example). The behavior is undefined.

Java is significantly different in this respect. In Java the concept of time is defined differently: operands are always evaluated in a strict left-to-right order, with grouping determined by operator precedence and associativity. This imposes a strict temporal ordering on the events that take place during the evaluation of the above expression, which makes the result of this expression defined, as opposed to C.

AndreyT 2009-11-24 18:44:50

This is absolutely correct, but the use of the word "time" is rather misleading. Expressions that precede a sequence point don't actually need to be evaluated chronologically earlier than those that follow the sequence point; rather, the observable behavior of the code needs to be indistinguishable from what the behavior would be if such a chronological ordering were strictly observed. This is an important detail, as a compiler will often reorder instructions across sequence points in the process of optimization.

Stephen Canon 2009-11-24 19:03:32

Yes, but I consider these to be different levels of abstraction. The "time" that I describe is the time of the "abstract C++ machine", which evaluates everything in strict order as long as that strict order is defined by the language specification. The C++ standard describes the behavior of that "abstract machine", while practical implementations just need to emulate the observable behavior of that machine.

AndreyT 2009-11-24 19:12:26

I would just prefer that be made clear, which it doesn't seem to be in your post.

Stephen Canon 2009-11-24 21:16:41
