Let's look at the class layout of the two cases.
Without the virtual, you have two base classes ("X" and "Y") with an integer each, and each of those classes have integrated into them a "Base" base class which also has an integer. That is 4 integers, 32-bits each, totalling your 16 bytes.
Offset Size Type Scope Name
0 4 int Base a
4 4 int X x
8 4 int Base a
12 4 int Y y
16 size (Z members would come at the end)
(Edit: I've written a program in DJGPP to get the layout and tweaked the table to account for it.)
Now let's talk about virtual base classes: they replace the actual instance of the class with a pointer to a shared instance. Your "Z" class has only one "Base" class, and both instances of "X" and "Y" point to it. Therefore, you have integers in X, Y, and Z, but you only have the one Z. That means you have three integers, or 12 bytes. But X and Y also have a pointer to the shared Z (otherwise they wouldn't know where to find it). On a 32-bit machine two pointers will add an additional 8 bytes. This totals the 20 that you see. The memory layout might look something like this (I haven't verified it... the ARM has an example where the ordering is X, Y, Z, then Base):
Offset Size Type Scope Name Value (sort of)
0 4 Base offset X ? 16 (or ptr to vtable)
4 4 int X x
8 4 Base offset Y ? 16 (or ptr to vtable)
12 4 int Y y
16 4 int Base a
20 size (Z members would come before the Base)
So the memory difference is a combination of two things: one less integer and two more pointers. Contrary to another answer, I don't believe vtables pay any (edit) direct (/edit) roll in this, since there are no virtual functions.
Edit: ppinsider has provided more information on the gcc case, in which he demonstrates that gcc implements the pointer to the virtual base class by making use of an otherwise empty vtable (i.e., no virtual functions). That way, if there were virtual functions, it wouldn't require an additional pointer in the class instance, requiring more memory. I suspect the downside is an additional indirection to get to the base class.
We might expect all compilers to do this, but perhaps not. The ARM page 225 discusses virtual base classes without mentioning vtables. Page 235 specifically addresses "virtual base classes with virtual functions" and has a diagram indicating a memory layout where there are pointers from the X and Y parts that are separate from the pointers to the vtable. I would advise anyone not to take for granted that the pointer to Base will be implemented in terms of a table.