Some (many? all?) 64-bit[1] Linux distros allow running 32-bit applications by shipping parallel collections of 32-bit and 64-bit libraries (including libc). So a 32-bit application can link against 32-bit libs and be run by a 64-bit kernel.

I'd like to know the mechanics of how 32-bit applications make system calls on a 64-bit kernel. I suspect the answer is somewhere in libc and/or the kernel source, but it would be time-consuming for me to dive into the source since I don't know where to look.

And a more important question: is there any performance overhead?[2] Logically, a system call from a 32-bit app has to be translated to the kernel's 64-bit internal environment. How and where is this accomplished?

[1] "32-bit" = IA-32, and "64-bit" = AMD64
[2] In your answer, assume that it matters :)

+10  A: 

From the userspace side, the mechanics are identical to making a syscall on a native 32-bit kernel - all the usermode code, including the 32-bit glibc, works the same way.

From the kernel side, the old IA32 entry points from userspace (e.g. int 0x80) are set up to call the ia32_syscall assembler routine. (On a 64-bit kernel the processor is already in long mode, and a 32-bit task runs in its compatibility sub-mode. The transition to kernel space loads the kernel's 64-bit code segment selector, which switches the processor to 64-bit mode.)

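The ia32_syscall routine then shuffles some of the arguments around to match the x86_64 calling convention that the kernel's syscall handlers expect (the C function-call convention, with arguments in rdi, rsi, rdx, rcx, r8, r9):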

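    movl    %edi,%r8d   /* IA32 arg 5 (edi) -> r8 */
    .if \noebp          /* macro parameter: skip ebp when it holds no argument */
    .else
    movl    %ebp,%r9d   /* IA32 arg 6 (ebp) -> r9 */
    .endif
    xchg    %ecx,%esi   /* IA32 arg 2 (ecx) -> rsi, IA32 arg 4 (esi) -> rcx */
    movl    %ebx,%edi   /* IA32 arg 1 (ebx) -> rdi */
    movl    %edx,%edx   /* IA32 arg 3 stays in rdx; zero extension */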
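The shuffle above can also be read as a plain data mapping. The following is only an illustrative userspace model in C, not kernel code: the struct and function names (`demo_ia32_regs`, `demo_x86_64_args`, `demo_ia32_arg_fixup`) are hypothetical, and it assumes the usual int 0x80 convention of passing arguments in ebx, ecx, edx, esi, edi, ebp.

```c
#include <stdint.h>

/* Hypothetical model of the registers a 32-bit task loads before int 0x80. */
struct demo_ia32_regs {
    uint32_t eax;                          /* syscall number */
    uint32_t ebx, ecx, edx, esi, edi, ebp; /* arguments 1..6 */
};

/* Hypothetical model of the x86_64 argument registers, in argument order. */
struct demo_x86_64_args {
    uint64_t rdi, rsi, rdx, rcx, r8, r9;
};

/* Mirrors the register moves shown above. Assigning a uint32_t to a
 * uint64_t zero-extends, just as a 32-bit movl clears the upper half
 * of the destination register. */
struct demo_x86_64_args demo_ia32_arg_fixup(const struct demo_ia32_regs *in)
{
    struct demo_x86_64_args out;
    out.r8  = in->edi;  /* movl %edi,%r8d */
    out.r9  = in->ebp;  /* movl %ebp,%r9d */
    out.rsi = in->ecx;  /* xchg %ecx,%esi (first half) */
    out.rcx = in->esi;  /* xchg %ecx,%esi (second half) */
    out.rdi = in->ebx;  /* movl %ebx,%edi */
    out.rdx = in->edx;  /* movl %edx,%edx */
    return out;
}
```

Note that a 32-bit value such as 0xFFFFFFFF arrives as 0x00000000FFFFFFFF, not as -1; this is why some syscalls need the wrappers discussed further down.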

It then uses the IA32 syscall number to make a function call through a table, ia32_sys_call_table. This essentially matches up the IA32 syscall numbers with the native syscall implementations (syscall numbers differ wildly between IA32 and x86_64). The first part of this table looks like:

ia32_sys_call_table:
    .quad sys_restart_syscall
    .quad sys_exit
    .quad stub32_fork
    .quad sys_read
    .quad sys_write
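The table lookup can be sketched in C as an array of function pointers indexed by the IA32 syscall number. This is a toy model under stated assumptions, not the kernel's implementation: all `demo_*` names are hypothetical, the handlers are trivial stand-ins, and only the five entries shown above are modelled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type: every handler receives the six arguments. */
typedef long (*demo_syscall_fn)(const long args[6]);

/* Stand-in handlers; they only return recognisable values. */
long demo_restart_syscall(const long args[6]) { (void)args; return 0; }
long demo_exit(const long args[6])            { return args[0]; }
long demo_fork_stub(const long args[6])       { (void)args; return 1234; }
long demo_read(const long args[6])            { return args[2]; }
long demo_write(const long args[6])           { return args[2]; }

/* Indexed by IA32 syscall number, in the same order as the table above. */
static const demo_syscall_fn demo_ia32_table[] = {
    demo_restart_syscall, /* 0 */
    demo_exit,            /* 1 */
    demo_fork_stub,       /* 2: a wrapper, not the native handler */
    demo_read,            /* 3 */
    demo_write,           /* 4 */
};

/* Bounds-check the number, then make an indirect call through the table. */
long demo_ia32_dispatch(unsigned long nr, const long args[6])
{
    size_t count = sizeof(demo_ia32_table) / sizeof(demo_ia32_table[0]);
    if (nr >= count)
        return -ENOSYS; /* unknown syscall number */
    return demo_ia32_table[nr](args);
}
```

For example, `demo_ia32_dispatch(4, args)` reaches the write stand-in, while an out-of-range number yields `-ENOSYS`.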

For most syscalls, such as exit(), the x86_64 implementation can now be called directly. For others, like fork(), a wrapper is provided that correctly implements the expected IA32 semantics (in particular, when arguments must be sign-extended from 32 bits to 64 bits).
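To illustrate the sign-extension case, here is a small C sketch of what such a wrapper has to do. It is an assumption-laden model rather than kernel code: `demo_native_handler` and `demo_compat_wrapper` are hypothetical names, and the `uint32_t` to `int32_t` conversion relies on the two's-complement behaviour that GCC and Clang provide.

```c
#include <stdint.h>

/* Hypothetical native handler that expects a signed 64-bit argument. */
int64_t demo_native_handler(int64_t value)
{
    return value;
}

/* Hypothetical wrapper: the entry path zero-extended the 32-bit register,
 * so a negative 32-bit argument looks like a large positive number.
 * Narrowing back to int32_t and widening again restores the sign. */
int64_t demo_compat_wrapper(uint64_t raw_reg)
{
    int32_t narrow = (int32_t)(uint32_t)raw_reg; /* keep the low 32 bits */
    return demo_native_handler((int64_t)narrow); /* sign-extend to 64 bits */
}
```

With a zero-extended register value of 0xFFFFFF9C (the 32-bit encoding of -100), calling the native handler directly would see 4294967196, while going through the wrapper yields -100.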

As you can see, the overhead in kernel code is minimal - a few trivial modifications to register values, and for a few functions, an extra function call. I'm not sure whether loading a code segment selector that switches the processor from 32-bit mode to 64-bit mode is slower to execute than loading one that doesn't - check the processor architecture manuals for that.

caf
Seems like all of the wrappers start with `compat_*`, with most of them being in `kernel/compat.c`. Looks like there is some overhead, but not as much as I expected.
Alex B
@Alex B: Yes, that turned out to be an advantage of passing system call parameters in registers.
caf