The correct answer is that, from a formal point of view, the code is labelled "undefined behaviour". This has nothing to do with logic or Superior Laws. The standard simply tries to define what can be done and what can't, what produces "good" code and what produces "bad" code, what is guaranteed to work and what is not; and between the standard and what actually happens there are people who have to interpret it, in order to write a compiler, to write reliable code, or to judge someone else's code.
So, from this point of view, there's nothing to be understood: the standard defines the code as "undefined behaviour" (indirectly, of course: your code does not appear as an example in the standard's papers!). You learn the standard, apply it blindly to analyse your code, and say whether it is "good" or not. "Formally" and "blindly" are often two dangerous words, so let's take a look at your code on our own. Since, after all, for languages that have one, it is the standard that defines the language and not vice versa (even though common practice or an implementation can trigger a modification of the standard), let's start from code that is considered good.
int x = 1;
printf("%d %d", x+1, x+1);
The compiler does not need to know anything about printf, even though extensions allow one to mark a function as "printf-like" in order to add extra checking... but these are compiler tricks to help programmers, not mandatory features. Moreover, printf is described in the standard too (in the library part, not the language part), and this is why I added the %d: the code of printf, using the stdarg variable-argument feature (another thing described by the standard), scans the first argument in order to use the others properly.
So, if the descriptor (%d) and the argument (x+1) do not match, we are doing something wrong; the standard's description of fprintf does in fact label such a mismatch "undefined behaviour". Either way, we can be sure strange things may happen. (A compiler may check for it, but the feature is not mandatory; more likely it is an "extension" provided by many if not all compilers.)
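For instance (a made-up illustration, not code from the question), a mismatch could look like the line below; GCC and Clang can warn about it with -Wformat, but such a check is not required by the language:

int x = 1;
printf("%s", x+1);   /* "%s" expects a char*, but an int is passed: mismatch */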
Now let's take a look at printf. In order to pick its arguments, it uses va_start, va_arg and va_end. Let's take into account only va_arg (which is a macro, by the way): it is the way you pick the next argument and cast it to the datum you expect. The cast is driven by the "%d", "%s" and so on, and the cast also determines the size of the argument, so that the macro can access the next one.
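To make this concrete, here is a minimal sketch of a printf-like function walking its arguments with the stdarg macros; print_ints is a hypothetical name, and it handles only "%d", just to show the mechanism:

#include <stdarg.h>
#include <stdio.h>

void print_ints(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);               /* start right after the last named argument */
    for (const char *p = fmt; *p; p++) {
        if (p[0] == '%' && p[1] == 'd') {
            int v = va_arg(ap, int); /* the named type drives both the cast and the step */
            printf("%d", v);
            p++;                     /* skip the 'd' */
        } else {
            putchar(*p);
        }
    }
    va_end(ap);
}

Called as print_ints("%d %d", x+1, x+1), it behaves like the good code above.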
Knowing this, you can easily imagine that if things do not match, odd things may happen.
Now, what if we call va_arg once more than the actual number of arguments passed? Of course, we go and pick a datum from the stack (or wherever) that does not "match" anything real or usable.
What happens if we pick one fewer? Nothing special! It is perfectly doable and "legal". Of course, it could be a clue that you're doing something wrong, but it could also be intentional (in general, that is; in the particular case of printf it would merely be odd)... Nonetheless, you can do it; again, compilers may treat printf and similar functions in a special way, checking whether each "%" has its matching argument(s) and warning if more (a dangerous case) or fewer (the harmless case) arguments are passed.
So
int x = 1;
printf("%d", x+1, x+1);
is still good.
Now let's talk briefly about side effects. Considered as "atomic" expressions, both occurrences of x+1 have no side effects: x is unchanged. Someone in the comments said that an implementation could indeed modify x in memory. This is ok if and only if the modification is reverted, all "atomically", i.e. the modification can't be seen outside the expression. If it could be, even the perfectly legal code I've written so far could fail.
Now, let's modify the code to turn it into the bad "undefined behaviour" case.
int x = 1;
printf("%d", ++x, x+1);
Why is this code "undefined behaviour"? It is so because the standard says so. There is no special reason given, but the fact must be a consequence of something, and currently the only reason I can see is the order of evaluation. No matter what other people say here, the explanations so far cite the standard's "rule" but not the reason behind the rule; the order-of-evaluation argument fits "unspecified behaviour" just as well, so the existence of this particular undefined behaviour still looks unjustified to me (this is also why, in this answer, it will seem that I am confusing "undefined" with "unspecified").
Moreover, in a lazy attempt to find a quick definition of "sequence point", I landed on Wikipedia, where note 3 cites the same piece of the standard quoted in another answer here, but with the observation that "Accessing the value of j inside f therefore invokes undefined behavior". I interpret this to mean that printf("%d", inc_x(&x), x_plus_1(x)) is undefined behaviour too, and not only unspecified behaviour, as stated in that answer.
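For clarity, the helpers in that expression could be defined like this (hypothetical definitions matching only the names used above; the point is that the side effect on x happens inside a function call):

int inc_x(int *px)  { return ++*px; }   /* increments the caller's x and returns it */
int x_plus_1(int x) { return x + 1; }   /* no side effect at all */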
If the standard dictated that expressions must be evaluated in order, "from left to right" in our example, we would be able to predict the passed arguments. First, ++x would be evaluated, so the second argument pushed would be 2 and x would now be 2; then x+1 would be evaluated as 2+1, i.e. 3, and the third argument would be 3.
Given an order of evaluation, we can predict the real values of the arguments.
It is worth noting that many languages do not allow assignments inside expressions (++x can be read as x = x + 1). If we were forced to write x++; printf("%d %d", x, x+1) instead, as sketched below, there would be no problem...
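Here is that rewrite spelled out (same variables as above); the statement boundary is a sequence point, so the increment is complete before the arguments of printf are evaluated:

int x = 1;
x++;                      /* x is 2 before the call is even considered */
printf("%d %d", x, x+1);  /* always prints "2 3" */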
Anyway... if the order of evaluation is not forced upon the implementation, each implementation can pick one; even the same implementation, depending on optimizations or for whatever other reason, could use different orders of evaluation in the same code.
So we have two possibilities, as another user already wrote: (2, 2) (first x+1, then ++x) or (2, 3) (first ++x, then x+1). Other options are not possible, since they would break even "legal" code.
But your printf ignores the third value (and vararg functions can do that legally), so it happens that the output is predictable, and it is always 2.
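Put together as a complete program (still undefined behaviour according to the standard, so what follows is an observation about typical implementations, not a guarantee):

#include <stdio.h>

int main(void)
{
    int x = 1;
    printf("%d\n", ++x, x+1);  /* the third argument is simply never fetched */
    return 0;
}

On every implementation I would call non-pathological, this prints 2.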
Is there any reason why one could say that some "special" implementation could output a value different from 2? Theoretically, could an evil programmer write a compiler that purposely gives "random outputs" whenever UB, as defined by the standard, is met?
The short answer is no, since by doing so the compiler could break its ability to compile some "legal" code correctly (the existence of one such piece of code suffices); demonstrating this formally would be long, and I am not sure I am able to. But it is not important.
What I think shouldn't be forgotten is the fact that behind UBs, or whatever, there are reasons that should sound "logical" within the "system". Saying a priori that a complex expression like printf("%d", ++x, x+1) must be UB is illogical; it must instead be the consequence of a "relaxed condition", or of a basic "axiom" we can "backtrack" to.
The complex description provided in the standard, quoted by someone else here,
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored
does not explain much about those reasons, perhaps not even if, to be pedantic, we also add the standard's definitions of "sequence point", "stored value", "evaluation" and "expression".
If I consider printf("%d", ++x, x+1) as an expression, with ++x and x+1 delimited by sequence points among others, then the stored value of the object x is modified only once: the whole "expression" modifies the stored value of x only once. So, in this "interpretation" of the ungiven definitions, what makes the code bad should be the second part, from "furthermore" onwards...
To teachers, students could answer simply: "according to the current standard it is formally undefined behaviour; but implementations not purposely built to exploit this particular UB should output 2, because..." (and here you can add the explanation I've given before).