I recently asked myself: if a program, Mozilla Firefox for example, is started, control must somehow be given to it. But when the program crashes, why doesn't my whole system crash, as it did in early Windows versions?

  • How can Windows take control back from the program, or avoid giving it full control in the first place?

(Note: This is not my homework; I go to school, but the people in my informatics class would only answer "Can I eat that?" if I asked them about kernels. Same with my teacher.)

+2  A: 

Usually, a crash will cause an interrupt to occur in the processor. The OS has handlers set up for each of these interrupts, so at that point control is given back to the OS.

Not all interrupts are bad (e.g. I/O interrupts for reading from disk or network). However, when the OS does encounter a bad interrupt, it will do one of the following:

  • Ignore it and allow the program to continue running
  • Shut down the program and (in most OS's) notify the user
  • Crash itself, if the program caused an interrupt that the OS can't ignore and is unable to recover from
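
The second outcome can be watched from user space. What follows is only a minimal sketch, assuming CPython: a parent starts a child interpreter that deliberately reads from address 0 through `ctypes`, and the parent simply carries on after the OS has shut the child down. The helper name `run_crashing_child` is made up for this illustration.

```python
import subprocess
import sys

# Child program: reading a C string at address 0 is an invalid memory access.
CRASH_SNIPPET = "import ctypes; ctypes.string_at(0)"


def run_crashing_child():
    """Run the crashing snippet in a separate process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", CRASH_SNIPPET],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    status = run_crashing_child()
    # On POSIX systems a negative status means "killed by signal number -status".
    print("child exit status:", status)
    print("parent is still running")
```

The exact status value depends on the platform, so the sketch only reports it; the point is that the parent gets to the last line at all.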

As for how the OS can avoid giving full control to programs: modern processors have a flag (called the PE bit) which determines if a process is running at full privileges (kernel mode) or limited privileges (user mode). User-mode programs are isolated from one another, and must communicate with each other through the OS ("system calls").

BlueRaja - Danny Pflughoeft
No, a crash does not cause an interrupt. An operating system causes an application to crash, which results from a protection interrupt.
WhirlWind
@Whirlwind: The operating system causes the program to shut down - the program itself is the thing that did something bad to cause the interrupt. There are many types of interrupts that would cause the program to crash (be shut down by the operating system) - protection faults are just one of them.
BlueRaja - Danny Pflughoeft
@Whirlwind - Malformed code can ask the processor to do something that the processor itself recognizes as impossible. While it may not be an interrupt in the usual sense, the CPU reacts similarly, suspending execution of the (in this case, offensive) code, and jumping into the OS kernel where decisions can be made as to how to deal with the bad program, and oftentimes that means terminating it. (Sometimes you hear these referred to as 'exceptions'.)
JustJeff
Yup, my mistake, I meant, from any sort of exception. Still, it's possible to write an application that recovers from most signals before it gets an exception, unless that exception can't be caught.
WhirlWind
The PE bit doesn't determine if the process is running in kernel mode or user mode; it determines if the processor is running in protected mode (as opposed to real mode), which in a modern OS will be the case all of the time. The current privilege level is actually determined by the CPL bits of the CS register.
caf
+3  A: 

That's the story about rings and exceptions. An access violation throws control to a pre-set OS handler, which decides what to do. A program may also set a handler of its own, but if it doesn't, the result is an unhandled access violation, which is one of the things you call a crash.
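
As a rough illustration of a program-set handler, CPython's standard `faulthandler` module installs handlers for faults such as invalid memory accesses; they write a traceback to stderr before the process is terminated anyway. The sketch below assumes CPython and uses a made-up helper name, `crash_with_handler`; it runs the crash in a child process started with the `-X faulthandler` option.

```python
import subprocess
import sys

# Child program: reading a C string at address 0 is an invalid memory access.
CRASH_SNIPPET = "import ctypes; ctypes.string_at(0)"


def crash_with_handler():
    """Crash a child that has fault handlers installed; return (status, stderr)."""
    result = subprocess.run(
        # -X faulthandler enables the handlers at interpreter startup.
        [sys.executable, "-X", "faulthandler", "-c", CRASH_SNIPPET],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    status, report = crash_with_handler()
    print("child exit status:", status)
    print("what the child's handler wrote before dying:")
    print(report)
```

The handler here only reports the fault; it does not recover from it, so the child still ends up terminated.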

In some cases, this mechanism is used for good things. For example, this is how page faults work, when the disk imitates actual memory. The OS catches the access violation, loads the needed data, and then resumes program execution as if nothing had happened.

Other things may cause a crash, too.

An invalid instruction will also be caught by the OS. If it's a valid instruction from a newer instruction set that the CPU does not yet support, the OS may emulate it in software. If not, it will declare an unhandled exception and shut down your process.

Access to hardware ports from a process which is not running in the proper mode would also cause the program to crash.

Blue screens are caused by a deliberate call to a special function, known as KeBugCheckEx(). This is done by the kernel or by device drivers running in kernel mode. It announces that they have reached an inconsistent logical state, and that they are important enough to consider this a good reason to bring the whole system down immediately, to avoid further damage to the hardware or other components.

Pavel Radzivilovsky
+1 because I never realized device drivers could deliberately cause a blue screen
BlueRaja - Danny Pflughoeft
+1  A: 

This is because modern operating systems run user (as opposed to kernel) processes in a virtual environment. A process has access to the full addressable memory range (this is a gross simplification), but that is virtual memory. It also utilizes the CPU, but the kernel time-slices all the processes so the CPUs are shared more or less fairly (again a simplification) among them, so this is sort of a virtual CPU. A process does not talk to hardware directly, but via the kernel's syscall "API".
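
To make the syscall "API" a little more concrete, here is a small sketch in Python, whose `os.pipe`, `os.write`, `os.read` and `os.close` functions are thin wrappers over the corresponding kernel calls. The function name `roundtrip_through_kernel` is invented for this example, and the payload is assumed to be small enough to fit in the pipe's buffer.

```python
import os


def roundtrip_through_kernel(payload: bytes) -> bytes:
    """Send bytes through a kernel-managed pipe and read them back."""
    read_fd, write_fd = os.pipe()       # ask the kernel to create a pipe
    try:
        os.write(write_fd, payload)     # kernel copies the bytes into its buffer
        return os.read(read_fd, len(payload))  # kernel copies them back out
    finally:
        os.close(read_fd)               # hand both descriptors back to the kernel
        os.close(write_fd)


if __name__ == "__main__":
    print(roundtrip_through_kernel(b"hello via the kernel"))
```

At no point does the process touch the buffer itself; it only holds two descriptor numbers and asks the kernel to act on them.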

The kernel, with help from hardware (the MMU for memory access, and user/supervisor separation for privileged instruction access), protects itself and user processes from each other. Termination of a user-level process with a crash is a normal, well-defined event for the kernel.

Nikolai N Fetissov
+1  A: 

It is actually very simple. Because Windows is a multitasking operating system, it continually switches (every X milliseconds) from one application to the next. By very frequently giving each program a very short time to run, it creates the illusion that the programs are working simultaneously.

When an application hangs, the application is probably in a long (possibly endless) loop. Windows keeps giving the application a short time to run and doesn't notice this (unless you want to interact with the application and it doesn't respond within a second). This is the first type of 'crash'.
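
A hang of this first kind is easy to reproduce. The sketch below (Python, with an invented helper name `run_beside_busy_loop`) starts a child stuck in an endless loop and shows that the parent keeps getting CPU time anyway. One caveat: on a multi-core machine the two loops may simply run on different cores, so this shows that the system soldiers on rather than proving time slicing by itself.

```python
import subprocess
import sys
import time


def run_beside_busy_loop(seconds=0.5):
    """Start a child stuck in an endless loop and keep working in the parent."""
    child = subprocess.Popen([sys.executable, "-c", "while True: pass"])
    try:
        deadline = time.monotonic() + seconds
        ticks = 0
        while time.monotonic() < deadline:
            ticks += 1                      # the parent still makes progress
        still_looping = child.poll() is None  # the OS has not ended the child
    finally:
        child.kill()                        # we have to end the hung child ourselves
        child.wait()
    return ticks, still_looping


if __name__ == "__main__":
    ticks, still_looping = run_beside_busy_loop()
    print("parent iterations:", ticks)
    print("child still looping:", still_looping)
```

Note that nothing terminates the child on its own: the OS does not treat a busy loop as an error.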

In the second type, a real crash, some serious error has occurred so that Windows cannot let the program continue. For example, the program attempts to write to a memory area that is reserved for some other program or for Windows itself. The processor has a built-in mechanism which generates an interrupt (a sort of event for the processor) when this happens. Windows is programmed to react to this interrupt, and because it has no way to fix the problem, it will treat the program as 'crashed' and terminate it immediately.

As mentioned, writing to the wrong memory address automatically causes a (protection) interrupt from the processor. Other things which may cause such an interrupt for an unrecoverable error include:

  • Reading from an unallowed memory address
  • Insufficient memory for this specific application (however, paging mostly removes this problem)
  • Attempt to execute unexecutable memory (for example, data)
  • Jumping to an invalid address (e.g. in the middle of a machine instruction)

Windows constructs special tables which are used by the Memory Management Unit (MMU) on the processor, and which contain information about which areas of memory the current process can access. This table is different for each process, because each process resides at a different location in memory and has to be able to access its own data and code.

So the special access tables set up by the OS, combined with the protection interrupts fired by the processor, are the main reason that a crashing program doesn't take the whole operating system with it. In addition, timesharing allows the rest of the OS and the other programs to continue when a program hangs.

Virtlink
+1  A: 

In early Windows, there was no real isolation between processes, and no real preemptive scheduling. Multiple processes had to share the system resources in something known as 'cooperative multitasking'. So if one process stopped cooperating, even by accident, your whole system was toast.

Modern OS's (including Windows, since NT/2K anyway) isolate processes from one another by use of virtual memory, and control is periodically transferred from one process to the next by a timing mechanism driven by hardware interrupts, known as preemptive multitasking. If one process goes bonkers and gets into a tight loop, it's only a matter of time (milliseconds!) before the dud process is preempted, the OS gets control, and transfers it to the next process. If a process goes berserk and dereferences a bad pointer, it cannot corrupt another process's data because the memory management unit (MMU) has each process's virtual memory mapped to different areas of physical memory.
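
The memory isolation can be seen in a few lines. This is a sketch only, assuming a POSIX system where `os.fork` is available (it does not exist on Windows); the function name `fork_and_modify` is invented. After the fork, parent and child each have their own copy of the list, so a change made in the child never shows up in the parent.

```python
import os


def fork_and_modify():
    """Fork, change a list in the child, and return (parent_view, child_view)."""
    data = [1, 2, 3]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it has its own copy of `data` in its own address space.
        try:
            os.close(read_fd)
            data.append(99)
            os.write(write_fd, repr(data).encode())
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup code
    # Parent: read what the child saw, then look at our own copy.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe_in:
        child_view = pipe_in.read()
    os.waitpid(pid, 0)
    return data, child_view


if __name__ == "__main__":
    parent_view, child_view = fork_and_modify()
    print("child saw: ", child_view)
    print("parent has:", parent_view)
```

The only way the child's result reaches the parent is through the pipe, which is to say through the kernel.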

Now, detecting when a program is off the rails is another matter. Maybe you WANT to spin in a tight loop; is it up to the OS to decide that's a crash? So generally you don't see a program that's gone into a loop terminated, but you will see it load down the CPU. How much load depends on the details of the OS scheduler, but generally the system soldiers on. Bad pointers are easier to recognize, the null pointer being the most obvious one. Modern CPUs usually have segment descriptors that can be used to recognize when an illegal memory reference has been attempted, for example when a process uses up all the stack space allotted to it. The MMU will usually allow programs fairly liberal access to the address space, but if the OS designer so desires, the MMU can be configured to put certain virtual addresses off limits. If a program tries to access one of these areas, an exception results, which lets the OS immediately seize control and deal with the offending process.

JustJeff