I take it for granted that we're discussing a traditional "stack-based" virtual machine here.
You pop off the current function's local stack preserving the still-relevant parts in non-stack "registers" (where the "relevant parts" are, clearly, the argument for the forthcoming recursive tail call), then (once all of the function's local stack and arguments are cleaned up) you push the arguments for the recursive call. E.g., suppose the function you're optimizing is something like:
def aux(n, tot):
if n <= 1: return tot
return aux(n-1, tot * n)
which without optimization might produce byte-code symbolically like:
AUX: LOAD_VAR N
LOAD_CONST 1
COMPARE
JUMPIF_GT LAB
LOAD_VAR TOT
RETURN_VAL
LAB: LOAD_VAR N
LOAD_CONST 1
SUBTRACT
LOAD_VAR TOT
LOAD_VAR N
MULTIPLY
CALL_FUN2 AUX
RETURN_VAL
the CALL_FUN2 means "call a function with two arguments". With the optimization, it could become sometime like:
POP_KEEP 2
POP_DISCARD 2
PUSH_KEPT 2
JUMP AUX
Of course I'm making up my symbolic bytecodes as I go along, but I hope the intent is clear: POP_DISCARD n
is the normal pop that just discards the top n
entries from the stack, but POP_KEEP n
is a variant that keeps them "somewhere" (e.g. in an auxiliary stack not directly accessible to the application but only to the VM's own machinery -- storage with such a character is sometimes called "a register" when discussing VM implementation) and a matching PUSH_KEPT n
which empties the "registers" back into the VM's normal stack.