views:

88

answers:

4

My understanding is that a user application requesting a file system path (e.g. /aFile) will invoke the file system and get back the virtual address of the requested file. Then the application will attempt a read/write operation with that address as its argument, presumably as a CPU instruction? On execution of the read instruction, the Memory Management Unit will translate that address into the physical address by looking it up in a page table. If the user does not have the privilege to access that memory location (where is that information carried?), the operation is aborted. Otherwise, if the physical page is found in memory, the read/write operation is carried out on it; if not, the page is brought in from disk and the operation is repeated.

So, there seems to be no system call at all. Could someone correct any mistakes in the procedure described above?

+2  A: 

Your very first sentence ("invoke the file system") already implies a system call, since it must of necessity transfer control to the kernel.

ennuikiller
Then how does a DBMS avoid making a system call when it retrieves a block of memory (from disk, when it's not in the buffer pool)? Are you saying that when it wants a block of memory x, x is the physical address it stores in its own structures, thus bypassing all the OS/filesystem services for accessing memory and disk?
simpatico
Database code still needs to make system calls. It just makes different system calls, for example if it uses raw blocks instead of filesystem structures. Also, most databases will not allocate memory from disk; they set up shared memory segments upon initialization and rely on the memory management of the OS (system code) to swap memory if needed.
ennuikiller
Please stay on the same line of thought. So, if the DBMS uses a filesystem structure, will the read sequence described above (and therefore a system call) occur?
simpatico
Yes, every time an application interacts with the filesystem it invokes kernel code, and therefore system calls.
ennuikiller
+2  A: 

(Typically) when you open/read/write a file in Java, a call is made to the OS kernel, a.k.a. a system call, to open/read/write that file. How that is done, and the memory management involved, is entirely in the hands of the kernel, but eventually the bytes read from the file are copied back into a buffer supplied through the system call.
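
As a rough illustration of that explicit path, here is a minimal sketch in Python rather than Java, because Python's `os.open`, `os.read` and `os.close` are thin wrappers over the corresponding system calls. The helper name `read_with_syscalls`, the chunk size and the temporary demo file are assumptions made up for this example.

```python
import os
import tempfile

def read_with_syscalls(path, chunk_size=4096):
    """Read a whole file through explicit open/read/close calls."""
    fd = os.open(path, os.O_RDONLY)        # open: control enters the kernel
    try:
        chunks = []
        while True:
            buf = os.read(fd, chunk_size)  # read: kernel copies bytes into our buffer
            if not buf:                    # empty result means end of file
                break
            chunks.append(buf)
        return b"".join(chunks)
    finally:
        os.close(fd)                       # close: releases the descriptor

# Hypothetical demo file, created only for this example.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello from disk\n" * 1000)
    demo_path = tmp.name

data = read_with_syscalls(demo_path)
os.remove(demo_path)
```

Every loop iteration crosses into the kernel once, and each `os.read` hands back a fresh copy of the data, which is the "buffer supplied through the system call" part.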

nos
A: 

What you are describing is memory-mapped file I/O, which is by no means the only way, or even the standard way, to access files. And even in this scenario kernel work still happens, although it might happen behind the back of the application.

When you have a page fault, i.e. a piece of memory is missing, it is still the kernel that gets notified and does all the magic to get the pages into memory from the block device. Something needs to figure out which device to get the data from. With software RAID this can be very convoluted; in other setups it can be as simple as configuring a DMA transfer and letting it rip. Maybe there are chipsets which can do this on their own under certain circumstances, but certainly not all.

Just because your program does not make them explicitly does not mean there are no "system calls". However, such an abstraction gives the kernel specialists more freedom to squeeze the last ounce of performance out of the hardware.
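
To make the contrast concrete, here is a small sketch using Python's standard `mmap` module. The helper name `read_via_mmap` and the demo file are hypothetical. Note that setting up the mapping is itself a system call; only the later accesses are plain memory reads.

```python
import mmap
import os
import tempfile

def read_via_mmap(path, offset, length):
    """Return `length` bytes at `offset` by reading mapped memory."""
    with open(path, "rb") as f:
        # Creating the mapping goes through the kernel once (length 0 = whole file).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing is an ordinary memory access. If the page is not
            # resident, the MMU raises a page fault and the kernel loads it.
            return m[offset:offset + length]

# Hypothetical demo file, created only for this example.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789" * 1000)
    demo_path = tmp.name

snippet = read_via_mmap(demo_path, 10, 10)
os.remove(demo_path)
```

The application code never calls `read` here, yet the kernel is still the one fetching the data whenever a touched page is missing.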

Peter Tillemans
So, yes. Now how could a DBMS be more efficient with its own buffer manager, given what Stonebraker describes as: "the overhead to fetch a block from the buffer pool manager usually includes that of a system call and a core-to-core move"? Forget about the buffer-replacement strategy, etc. The only point I question is the quoted one.
simpatico
In the MMIO scenario the system call is replaced by a page fault generated by the MMU and trapped by a kernel handler. Since a lot of the checks have already been done when the mapping was set up, I can imagine this being faster than descending through the layers of a regular system call.
Peter Tillemans
Well, that still does not make sense of the quoted statement. Please describe how the read will occur while avoiding the system call (which until now you were claiming was indispensable).
simpatico
Schematically, when the read hits a page which is not in memory, the MMU triggers a page fault, which makes the CPU stop what it is doing and drop into a special handler in the kernel. This handler figures out where to get the missing block, loads it into memory and exits, which returns control to your program. So an explicit system call which performs a work package in the kernel has now become an 'implicit' system call which performs a kernel work package. I use "system call" in the sense of "executing a kernel work package" as opposed to "calling an API".
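
One way to observe those implicit kernel entries, sketched here under the assumption of a Unix-like system where Python's `resource` module is available: compare the process's page-fault counters before and after touching fresh pages. The function name `count_faults_while_touching` is made up, and the example uses an anonymous mapping rather than a file, so it mainly shows minor faults; the exact numbers depend on the OS.

```python
import mmap
import resource

def count_faults_while_touching(num_pages):
    """Touch num_pages fresh pages; return (touched, minor, major) fault deltas."""
    page = mmap.PAGESIZE
    before = resource.getrusage(resource.RUSAGE_SELF)
    with mmap.mmap(-1, num_pages * page) as m:   # anonymous mapping, no file behind it
        for i in range(num_pages):
            m[i * page] = 1    # plain store; the first touch of a page may fault
        touched = sum(m[i * page] for i in range(num_pages))
    after = resource.getrusage(resource.RUSAGE_SELF)
    minor = after.ru_minflt - before.ru_minflt   # faults served without disk I/O
    major = after.ru_majflt - before.ru_majflt   # faults that needed disk I/O
    return touched, minor, major

touched, minor, major = count_faults_while_touching(2048)
```

The loop contains no read or write calls, but on a typical Linux system the minor-fault counter still goes up, since each first touch is handled by the kernel's fault handler.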
Peter Tillemans
Also note that the text from Stonebraker was written in the time of VAXen and PDP-11s, and today's chipsets contain more logic in a single chip than those refrigerators did. The operating environment has not become simpler over the years. Oracle's advice is to use raw devices (although they are rumored to use NetApps and NFS for their hosted solutions), while PostgreSQL favors a small DB buffer cache and leverages the OS buffer cache. From experience I know that the small PostgreSQL cache fails in a virtual environment because the OS cache is trashed by the other virtual servers. Benchmarking is key.
Peter Tillemans
A: 

What you are wondering about is operating system design. Many approaches are available and by having the file system abstraction on top of the file abstraction (everything is a file consisting of a stream of bytes) you can do quite a lot without having to change the abstraction.

Just think how differently an operating system must treat a RAM disk, compared to a FireWire drive, compared again to a Windows network share. The file abstraction is the same.

Now, if you want to actually KNOW what happens, I can strongly recommend downloading and installing OpenSolaris and learning how to work with DTrace. It lets you ask the system what it is doing, all the way down from your main method to the individual drivers on top of the physical hardware.

Thorbjørn Ravn Andersen