What are some best practices for prototyping a filesystem?

I've had an attempt in Python using fusepy, and now I'm curious:

  • In the long run, should any respectable filesystem implementation be in C? Will not being in C hamper portability, or eventually cause performance issues?
  • Are there other implementations like FUSE?
  • Evidently core filesystem technology moves slowly (FAT32, ext3, NTFS; everything else is small fish). What debugging techniques are employed?
  • What is the general course filesystem development takes in arriving at a highly optimized, fully supported implementation in major OSs?
+1  A: 

In the long run, should any respectable filesystem implementation be in C? Will not being in C hamper portability, or eventually cause performance issues?

Not necessarily. There are plenty of fast languages other than C (OCaml and C++ are the first that come to mind). In fact, I would expect NTFS to be written in C++. The thing is, you seem to come from a Linux background, and since the Linux kernel is written in C, any filesystem that hopes to be merged into the kernel has to be written in C as well.

Are there other implementations like FUSE?

There are a couple for Windows, at various levels of maturity, for example http://code.google.com/p/winflux/ and http://dokan-dev.net/en/

Evidently core filesystem technology moves slowly (FAT32, ext3, NTFS; everything else is small fish). What debugging techniques are employed?

Again, that is mostly true of Windows. Solaris has ZFS, and Linux has ext4 and btrfs. Debugging techniques usually involve turning machines off in the middle of various operations to see what state the data is left in, and storing huge amounts of data to see how performance holds up.

What is the general course filesystem development takes in arriving at a highly optimized, fully supported implementation in major OSs?

Again, this depends on the OS, but it does involve a fair amount of testing, especially to make sure that failures do not lose data.

Vinko Vrsalovic
Don't forget de facto standard filesystem benchmarks like bonnie++: http://www.coker.com.au/bonnie++/
caf
Well, it's the best answer, but rather sparser than I had hoped. Probably not the right forum for a full-blown article :)
Matt Joiner
@Matt: You'd be amazed at what people do for a bounty :-) If this isn't enough, modify your question a bit, stating precisely what you are hoping for, and add a bounty.
Vinko Vrsalovic
+4  A: 

A filesystem that lives in userspace (be that in FUSE or the Mac version thereof) is a very handy thing indeed, but it will not have the same performance as a traditional one that lives in kernel space (and thus must be in C). You could say that's the reason microkernel systems (where filesystems and other things live in userspace) never really "left monolithic kernels in the dust", as A. Tanenbaum so assuredly predicted when he attacked Linux in a famous posting on the Minix newsgroup (comp.os.minix) almost twenty years ago. As a CS professor, he said he'd fail Linus for choosing a monolithic architecture for his OS. Linus of course responded spiritedly, and the whole exchange is now pretty famous and can be found in many spots on the web ;-)

Portability's not really a problem, unless perhaps you're targeting "embedded" devices with very limited amounts of memory -- with the exception of such devices, you can run Python where you can run C (if anything it's the availability of FUSE that will limit you, not that of a Python runtime). But performance could definitely be.

Alex Martelli
+1  A: 

I recommend you create a mock object for the kernel block device API layer. The mock layer should use a mmap'd file as the backing store for the file system. There are a lot of benefits to doing this:

  1. Extremely fast FS performance for running unit test cases.
  2. Ability to insert debug code/break points into the mock layer to check for failure conditions.
  3. Easy to save multiple copies of the file system state for study or running test cases.
  4. Ability to deterministically introduce block device errors or other system events that the file system will have to handle.
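
A minimal sketch of such a mock layer in Python (the language of the fusepy prototype in the question) might look like the following. Everything here is an assumption made for illustration: the class name `MockBlockDevice`, its method names, the `InjectedIOError` exception and the 512-byte block size are hypothetical and do not correspond to any real kernel or FUSE API. It only shows the shape of the idea: a mmap'd file as backing store, snapshots of the whole image, and deterministic fault injection per block.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 512  # hypothetical block size, chosen only for this sketch


class InjectedIOError(OSError):
    """Raised when a test has armed a fault for the requested block."""


class MockBlockDevice:
    """Stand-in for a block device, backed by a mmap'd regular file."""

    def __init__(self, path, num_blocks, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.num_blocks = num_blocks
        self._file = open(path, "w+b")
        self._file.truncate(num_blocks * block_size)  # zero-filled image
        self._map = mmap.mmap(self._file.fileno(), num_blocks * block_size)
        self._bad_reads = set()   # blocks whose reads should fail
        self._bad_writes = set()  # blocks whose writes should fail

    def _span(self, block):
        if not 0 <= block < self.num_blocks:
            raise IndexError("block %d out of range" % block)
        start = block * self.block_size
        return start, start + self.block_size

    def read_block(self, block):
        start, end = self._span(block)
        if block in self._bad_reads:
            raise InjectedIOError("injected read error on block %d" % block)
        return self._map[start:end]  # slicing a mmap returns a bytes copy

    def write_block(self, block, data):
        start, end = self._span(block)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        if block in self._bad_writes:
            raise InjectedIOError("injected write error on block %d" % block)
        self._map[start:end] = data

    def fail_reads(self, block):
        self._bad_reads.add(block)

    def fail_writes(self, block):
        self._bad_writes.add(block)

    def snapshot(self):
        """Return a copy of the whole device image for later study."""
        return self._map[:]

    def restore(self, image):
        """Roll the device back to a previously taken snapshot."""
        self._map[:] = image

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        dev = MockBlockDevice(os.path.join(tmp, "disk.img"), num_blocks=8)
        try:
            dev.write_block(0, b"S" * BLOCK_SIZE)
            before = dev.snapshot()
            dev.fail_writes(3)  # the next write to block 3 will fail
            try:
                dev.write_block(3, b"X" * BLOCK_SIZE)
            except InjectedIOError as exc:
                print("filesystem code would have to handle:", exc)
            dev.restore(before)
        finally:
            dev.close()


if __name__ == "__main__":
    demo()
```

The filesystem code under test would talk only to `read_block` and `write_block`, so the same test cases could later be pointed at a real device wrapper. A power-cut test could be approximated by taking a snapshot partway through an operation and remounting from that image.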
Casey
A: 

Respectable filesystems will be fast and efficient. For Linux, that will basically mean writing in C, because you won't be taken seriously if you're not distributed with the kernel.

As for other tools like FUSE, there's MacFUSE, which will allow you to use the same code on Macs as well as Linux.

Sean McMillan