views:

131

answers:

3

Hi everybody, I have a small question:

For example I'm using System.IO.File.Copy() method from .NET Framework. This method is a managed wrapper for CopyFile() function from WinAPI. But how CopyFile function works? It is interacts with HDD's firmware or maybe some other operations are performed through Assembler or maybe something other...

How does it look like from the highest level to the lowest?

+2  A: 

The .NET framework invokes the Windows API.

The Windows API has functions for managing files across various file systems.

Then it depends on the file system in question. Remember, it's not necessarily a "normal" file system over a HDD; It could even be a shell extension that just emulates a drive and keeps the data in you gmail account, or whatever. The point is that the same file manipulation functions in the Windows API are used as an abstraction over many possible lower layers of data.

So the answer really depends on the kind of file system you're interested in.

Assaf Lavie
+5  A: 

This is a question demanding or a really long answer, but I'm trying to make it brief.

Basically, the .NET Framework wraps some "native" calls, calls that are processed in lower-level libraries. These lower-level calls are often wrapped in a buffer logic to hide complicated stuff like synchronizing file contents from you.

Below, there is the native level, interacting with the OS' kernel. The kernel, the core of any operating system, then translates your high-level instruction to something your hardware can understand. Windows and Linux are for example both using a Hardware Abstraction Layer, a system that hides hardware specific details behind a generic interface. Writing a driver for a specific device is then only the task of implementing all methods certain device has to provide.

Before anything gets called on your hardware, the filesystem gets involved, and the filesystem for itself also buffers and caches a lot, but again transparently, so you don't even notice that. The last element in the call-queue is the device itself, and again, most devices conform to some standard ( like SATA or IDE ) and can thus be interfaced in a similar manner.

I hope this helps :-)

moritz
+5  A: 

Better to start at the bottom and work your way up.

Disk drives are organized, at the lowest level, in to a collection of Sectors, Tracks, and Heads. Sectors are segments of a track, Tracks are area on the disks itself, represented by the heads position as the platters spins underneath it, and the head is the actual element that reads the data from the platter.

Since Tracks are measured based on the distance that a head is from the center of a disk, you can see how towards the center of the disk the "length" of a track is short than one at the outer edge of the disk.

Sectors are pieces of a track, typically of a fixed length. So, an inner track will hold fewer sectors than an outer track.

Much of this disk geometry is handled by the drive controllers themselves nowadays, though in the past this organization was managed directly by the operating systems and the disk drivers.

The drive electronics and disk drivers cooperate to try and represent the disk as a sequential series of fixed length blocks.

So, you can see that if you have a 10MB drive, and you use 512 byte disk blocks, then that drive would have a capacity of 20,480 "blocks".

This block organization is the foundation upon which everything else is built. Once you have this capability, you can tell the disk, via the disk driver and drive controller, to go to a specific block on the disk, and read/write that block with new data.

A file system organizes this heap of blocks in to it's own structure. The FS must track which blocks are being used, and by which files.

Most file systems have a fixed location "where they start", that is, some place that upon start up they can go to try and find out information about the disk layout.

Consider a crude file system that doesn't have directories, and support files that have 8 letter names and 3 letter extension, plus 1 byte of status information, and 2 bytes for block number where the file starts on the disk. We can also assume that the system has a hard limit of 1024 files. Finally, it must know which blocks on the disk are being used. For that it will use 1 bit per block.

This information is commonly called the "file system metadata". When a disk is "formatted", nowadays it's simply a matter of writing new file system metadata. In the old days, it was a matter of actually writing sector marks and other information on blank magnetic media (commonly known as a "low level format"). Today, most drives already have a low level format.

For our crude example, we must allocate space for the directory, and space for the "Table of Contents", the data that says which blocks are being used.

We'll also say that the file system must start at block 16, so that the OS can use the first 16 blocks for, say, a "boot sector".

So, at block 16, we need to store 14 bytes (each file entry) * 1024 (number of files) = 12K. Divide that by 512 (block size) is 24 blocks. For our 10MB drive, it has 20,480 blocks. 20,480 / 8 (8 bits/byte) is 2,560 bytes / 512 = 5 blocks.

Of the 20,480 block available on the disk, the file system metadata is 29 blocks. Add in the 16 for the OS, that 45 blocks out of the 20,480, leaving 20,435 "free blocks".

Finally, each of the data blocks reserves the last 2 bytes to point to the next block in the file.

Now, to read a file, you look up the file name in the directory blocks. From there, you find the offset to the first data block for the file. You read that data block, grab the last two bytes. If those two byte are 00 00, then that's the end of the file. Otherwise, take that number, load that data block, and keep going until the entire file is read.

The file system code hides the details of the pointers at the end, and simply loads blocks in to memory, for use by the program. If the program does a read(buffer, 10000), you can see how this will translate in to reading several blocks of data from the disk until the buffer has been filled, or the end of file is reached.

To write a file, the system must first find a free space in the directory. Once it has that, it then finds a free block in the TOC bitmap. Finally, it takes the data, write the directory entry, sets its first block to the available block from the bitmap, toggles the bit on the bitmap, and then takes the data and writes it to the correct block. The system will buffer this information so that it ideally only has to write the blocks once, when they're full.

As it writes the blocks, it continues to consume bits from the TOC, and chains the blocks together as it goes.

Beyond that, a "file copy" is a simple process, from a system leverage the file system code and disk drivers. The file copy simply reads a buffer in, fills it up, writes the buffer out.

The file system has to maintain all of the meta data, keep track of where you are reading from a file, or where you are writing. For example, if you read only 100 bytes from a file, obviously the system will need to read the entire 512 byte datablock, and then "know" it's on byte 101 for when you try to read another 100 bytes from the file.

Also, I hope it's obvious, this is a really, really crude file system layout, with lots of issues.

But the fundamentals are there, and all file systems work in some manner similar to this, but the details vary greatly (most modern file systems don't have hard limits any more, as a simple example).

Will Hartung