Hi

I often read or hear the argument that making a lot of system calls is inefficient because each call makes the application do a mode switch: it goes from user mode to kernel mode and, after the system call executes, switches back to user mode.

My question is: what is the overhead of a mode switch? Does the CPU cache get invalidated, are TLB entries flushed, or does something else cause the overhead?

Please note that I am asking about the overhead of a mode switch, not a context switch. I know that the two are different things, and I am fully aware of the overhead associated with a context switch; what I fail to understand is what overhead a mode switch causes.

If possible, please provide some information about a particular *nix platform such as Linux, FreeBSD, or Solaris.

Regards

lali

+1  A: 

There should be no CPU cache or TLB flush on a simple mode switch.

A quick test tells me that on my Linux laptop it takes about 0.11 microseconds for a userspace process to complete a simple syscall that does an insignificant amount of work beyond the switch to kernel mode and back. I'm using getuid(), which only copies a single integer from an in-memory struct. strace confirms that the syscall is repeated MAX times.

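#include <unistd.h>

#define MAX 100000000  /* number of getuid() calls to time */

int main(void) {
  int ii;
  /* Each getuid() call enters the kernel and returns: one round trip. */
  for (ii = 0; ii < MAX; ii++) getuid();
  return 0;
}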

This takes about 11 seconds on my laptop, measured with time ./testover; 11 seconds divided by 100 million calls is 0.11 microseconds per call.

Technically, that's two mode switches, so I suppose you could claim that a single mode switch takes 0.055 microseconds, but a one-way switch isn't very useful, so I'd consider the there-and-back number to be the more relevant one.
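
For a per-call figure that does not depend on the shell's time command, the loop can time itself. The following is a minimal sketch, not the code used for the numbers above: the function name getuid_roundtrip_ns is made up for illustration, and it assumes Linux with glibc, where syscall(SYS_getuid) issues the system call directly.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Average user -> kernel -> user round-trip time, in nanoseconds,
 * over `iterations` getuid system calls. Hypothetical helper name. */
double getuid_roundtrip_ns(long iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getuid);  /* raw system call, no libc wrapper logic */
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Convert the elapsed wall-clock interval to nanoseconds. */
    double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9
                      + (end.tv_nsec - start.tv_nsec);
    return elapsed_ns / iterations;
}
```

Calling getuid_roundtrip_ns(100000000) from main and printing the result gives the same kind of there-and-back figure; the value depends on the CPU, the kernel version, and the syscall entry mechanism in use.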

Eric Seppanen
A: 

There are many ways to do a mode switch on x86 CPUs (which I am assuming here). For a user-called function, the normal way is a Task jump or Call (through what are referred to as Task Gates and Call Gates). Both of these involve a Task switch (equivalent to a context switch). Add to that a bit of processing before the call, the standard verification after the call, and the return. That is the bare minimum for a safe mode switch.

As for Eric's timing, I am not a Linux expert, but in most OSes I have dealt with, simple system calls cache data in user space (when it can be done safely) to avoid this overhead. It would seem to me that getuid() is a prime candidate for such caching. Thus Eric's timing could be more a reflection of the overhead of the pre-switch processing in user space than of anything else.

Juice
No data caching is happening; that's why I used strace to verify that the system call is taking place. A syscall, by definition, is a call into kernel space.
Eric Seppanen
Also, your assertion that a mode switch is equivalent to a context switch is not true. A context switch involves swapping out the entire CPU state and page tables; this is significant work that is not required for system calls. A syscall is a simple software interrupt (x86 assembly "int $0x80").
Eric Seppanen
So in the syscall there is only a preamble and an INT instruction?
Juice
Inform yourself: a software interrupt that goes into privileged mode on an x86 has to be made through a Trap or Interrupt gate, and these are the same as a Call gate, which involves a full save of the state of the current task followed by a load of the destination Task.
Juice
getuid() on i386 Linux is mov $0xC7,%eax ; int $0x80. Disassemble it yourself and see. Note that the syscall number is being passed in eax (0xC7 = 199, which on i386 is getuid32, the 32-bit-UID variant that getuid() uses); that demonstrates that CPU registers survive the interrupt. And of course there is no effect on page tables, TLB, or CPU caches, which are the important costs of a context switch.
Eric Seppanen
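
The point that a system call is just a number handed to the kernel can also be checked from C without disassembling anything. This is a rough sketch assuming Linux with glibc; the macro GETUID_NR and the function raw_getuid_matches_wrapper are names invented here. It issues the call by number through syscall() and compares the result with the ordinary getuid() wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* On i386 the 32-bit-UID variant has its own number (getuid32);
 * other architectures only define SYS_getuid. */
#ifdef SYS_getuid32
#define GETUID_NR SYS_getuid32
#else
#define GETUID_NR SYS_getuid
#endif

/* Returns 1 if the call made by number and the libc wrapper agree. */
int raw_getuid_matches_wrapper(void)
{
    long raw = syscall(GETUID_NR);  /* libc loads the number into the syscall register */
    return (uid_t)raw == getuid();
}
```

Both paths end up in the same kernel handler, so the two values are expected to match for the calling process.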
Juice: The `int 0x80` syscall entry on x86 Linux uses a Trap Gate, which does *not* change the task (`tr` register). It merely specifies a Segment Selector and Offset to jump to. The code segment descriptor that the Segment Selector refers to carries the new privilege level (DPL), and because we're changing privilege level, we get a new stack segment and stack pointer as well; these are loaded from the TSS.
caf
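
To make the gate types in this exchange concrete, here is an illustrative sketch that decodes a 32-bit protected-mode IDT gate descriptor into its fields. The field positions and type codes follow the layout documented for x86 (type 0x5 is a task gate, 0xE a 32-bit interrupt gate, 0xF a 32-bit trap gate); the struct, enum, and function names are invented for this example and are not taken from the Linux source.

```c
#include <stdint.h>

/* Type codes for 32-bit gates as they appear in the descriptor. */
enum gate_type { GATE_TASK = 0x5, GATE_INTERRUPT32 = 0xE, GATE_TRAP32 = 0xF };

struct gate_info {
    uint16_t selector;  /* code segment selector to load */
    uint32_t offset;    /* entry point within that segment */
    unsigned type;      /* one of enum gate_type */
    unsigned dpl;       /* lowest privilege allowed to use the gate via int */
    unsigned present;   /* 1 if the gate is valid */
};

/* Split a raw 8-byte gate descriptor into its fields. */
struct gate_info decode_gate(uint64_t d)
{
    struct gate_info g;
    g.offset   = (uint32_t)(d & 0xFFFF)                 /* offset bits 15..0  */
               | (uint32_t)((d >> 48) & 0xFFFF) << 16;  /* offset bits 31..16 */
    g.selector = (uint16_t)((d >> 16) & 0xFFFF);
    g.type     = (unsigned)((d >> 40) & 0xF);
    g.dpl      = (unsigned)((d >> 45) & 0x3);
    g.present  = (unsigned)((d >> 47) & 0x1);
    return g;
}
```

A trap or interrupt gate carries only a selector and an offset, which is why following it does not by itself switch hardware tasks. A gate meant to be reachable from user mode through int needs a DPL of 3.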
It seems you have read a bit on it but are still missing some. Unfortunately these comments are too small to debate the details of Task switching. But let me say this: It is not safe to not use a Task switch for anything accessing internal OS data (BTW this is a hardware task, not a thread, process or other OS entity). And I refuse to believe that Linux is unsafe. What might be happening is an Interrupt gate to a conforming code segment (no task switch, very fast and efficient) where some code makes the decision to either do a Call gate or return cached info.
Juice
I think you're mistaken - the Linux kernel hasn't used hardware tasks for some time. There is just one TSS per cpu, plus a double-fault handler TSS on i386 (double fault is the only place task gates are used). If it was unsafe I'm sure one of the Intel employees that are heavily involved in Linux kernel development would have said so by now. See the comment at line 35 in `init_task.c`: http://lxr.linux.no/#linux+v2.6.32/arch/x86/kernel/init_task.c
caf
I was mistaken. In my defense, it has been a while since I played with this stuff. You can do a safe, higher-privilege call without a Task switch (with a Call, Interrupt or Trap gate). My apologies.
Juice