The reason is that the compiler has plenty to do already, without also being a full-fledged interpreter, able to evaluate arbitrary C++ code.
Sticking to single expressions dramatically limits the number of cases the compiler has to consider. Loosely speaking, the absence of semicolons in particular simplifies things a lot.
Every time a ; is encountered, the compiler has to deal with side effects. It means that some local state was changed in the previous statement, and that the following statement is going to rely on it. It means that the code being evaluated is no longer just a series of simple operations, each taking the previous operation's output as its input, but requires access to memory as well, which is much harder to reason about.
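To make the contrast concrete, here is a minimal sketch. It assumes the restriction under discussion is the C++11 rule that a constexpr function body is a single return statement; the function names are made up for illustration.

```cpp
#include <cassert>

// Single-expression form: the body is one return statement, so evaluating
// a call only means evaluating an expression. Valid as constexpr in C++11.
constexpr int factorial(int n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// Statement form: `result` and `i` are local state that every statement
// reads and modifies, so an evaluator has to track that state.
int factorial_loop(int n) {
    int result = 1;
    for (int i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}

// The single-expression version can be checked at compile time.
static_assert(factorial(5) == 120, "evaluated by the compiler");

// Both versions compute the same value at run time.
void self_check() {
    assert(factorial_loop(5) == factorial(5));
}
```

The loop has to be rewritten as recursion plus a conditional expression precisely because the single-expression form has nowhere to keep a mutable counter.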
In a nutshell, this:
7 * 2 + 4 * 3
is simple to compute. You can build a syntax tree which looks like this:
      +
     / \
    /   \
   *     *
  / \   / \
 7   2 4   3
and the compiler can simply traverse this tree performing these primitive operations at each node, and the root node is implicitly the return value of the expression.
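The traversal described above can be sketched in a few lines. This is only an illustration, not how any real compiler represents expressions; the Expr type and the lit, bin and eval helpers are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical node type: either an integer literal or a binary operation.
struct Expr {
    char op;                         // '+', '*', or '\0' for a literal leaf
    int value;                       // only meaningful for a literal leaf
    std::unique_ptr<Expr> lhs, rhs;  // only set for a binary operation
};

std::unique_ptr<Expr> lit(int v) {
    return std::unique_ptr<Expr>(new Expr{'\0', v, nullptr, nullptr});
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    return std::unique_ptr<Expr>(new Expr{op, 0, std::move(l), std::move(r)});
}

// Each subtree is evaluated in isolation; the result of the root is the
// value of the whole expression. No memory or environment is needed.
int eval(const Expr& e) {
    if (e.op == '\0') return e.value;
    int l = eval(*e.lhs);
    int r = eval(*e.rhs);
    return e.op == '+' ? l + r : l * r;
}

// Builds the tree for 7 * 2 + 4 * 3 and evaluates it.
int example() {
    auto tree = bin('+', bin('*', lit(7), lit(2)), bin('*', lit(4), lit(3)));
    return eval(*tree);
}

void self_check() {
    assert(example() == 7 * 2 + 4 * 3);
}
```

Note that eval takes nothing but the node itself: that is the whole point of the single-expression restriction.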
If we were to write the same computation using multiple lines we could do it like this:
int i0 = 7;
int i1 = 2;
int i2 = 4;
int i3 = 3;
int i4 = i0 * i1;
int i5 = i2 * i3;
int i6 = i4 + i5;
return i6;
which is much harder to interpret. We need to handle memory reads and writes, and we have to handle return statements. Our syntax tree just became a lot more complex. We need to handle variable declarations. We need to handle statements which have no return value (say, a loop, or a memory write), but which simply modify some memory somewhere. Which memory? Where? What if it accidentally overwrites some of the compiler's own memory? What if it segfaults?
Even without all the nasty 'what-if's, the code the compiler has to interpret just got a lot more complex. The syntax tree might now look something like this (LD and ST are load and store operations respectively):
    ;
   / \
  ST  \
 /  \  \
i0   7  \
         ;
        / \
       ST  \
      /  \  \
     i1   2  \
              ;
             / \
            ST  \
           /  \  \
          i2   4  \
                   ;
                  / \
                 ST  \
                /  \  \
               i3   3  \
                        ;
                       / \
                      ST  \
                     /  \  \
                    i4   *  \
                        / \  \
                       LD LD  \
                       |  |    \
                       i0 i1    \
                                 ;
                                / \
                               ST  \
                              /  \  \
                             i5   *  \
                                 / \  \
                                LD LD  \
                                |  |    \
                                i2 i3    \
                                          ;
                                         / \
                                        ST  \
                                       /  \  \
                                      i6   +  \
                                          / \  \
                                         LD LD  \
                                         |  |    \
                                         i4 i5    \
                                                   LD
                                                   |
                                                   i6
Not only does it look a lot more complex, it also now requires state. Before, each subtree could be interpreted in isolation. Now, they all depend on the rest of the program. An LD leaf operation doesn't make sense unless it is placed in the tree so that an ST operation has previously been executed on the same location.
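A minimal sketch of what that state dependency means for an interpreter follows. The Env type and its store and load members are hypothetical, standing in for the ST and LD operations; a real compiler would use something far more elaborate.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical environment: the "memory" that ST writes and LD reads.
struct Env {
    std::map<std::string, int> slots;

    // ST: write a value to a named location.
    void store(const std::string& name, int value) { slots[name] = value; }

    // LD: read a named location; only meaningful after a matching ST.
    int load(const std::string& name) const {
        auto it = slots.find(name);
        if (it == slots.end())
            throw std::runtime_error("LD before ST: " + name);
        return it->second;
    }
};

// The i0..i6 statements from above, executed in order against one Env.
int run_statements() {
    Env env;
    env.store("i0", 7);
    env.store("i1", 2);
    env.store("i2", 4);
    env.store("i3", 3);
    env.store("i4", env.load("i0") * env.load("i1"));
    env.store("i5", env.load("i2") * env.load("i3"));
    env.store("i6", env.load("i4") + env.load("i5"));
    return env.load("i6");
}

// An LD taken out of context has nothing to read.
bool load_without_store_fails() {
    Env env;
    try {
        env.load("i4");
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

void self_check() {
    assert(run_statements() == 7 * 2 + 4 * 3);
    assert(load_without_store_fails());
}
```

Unlike the pure expression tree, every step here needs the Env that earlier steps produced, so no statement can be evaluated on its own.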