I've been thinking about this for a while now (you know, that dangerous thing programmers tend to do) and I've been wondering: is the way we're so accustomed to storing data really all that efficient? The trouble with answering this question is that I really don't have anything to compare it to, since it's the only thing I've ever used.

I don't mean FAT or NTFS or any particular file system; I mean the filesystem structure as a whole. We are simply used to thinking of "files" inside "folders", as if our hard drive were one giant filing cabinet. It's a great analogy, and it certainly makes things easier to learn, but is it really the best way to describe programs and their respective parts?

I'd like to know if anyone can think of (or knows about) a storage technique that an operating system could use to organize its data in a fundamentally different way. Does anything... different even exist?

A: 

Well, there's always Pick, where the OS and file system were an integrated database.

Paul Tomblin
A:

You can, for example, have dedicated solutions like Oracle Raw Partitions. Other databases support similar things. In these cases the filesystem adds unnecessary overhead and can be omitted; the DB software takes care of organising the structure.

The problem seems very application-dependent, and files/folders seem to be a reasonable compromise for many applications (and are easy for human beings to comprehend).
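To make the raw-partition idea concrete, here is a minimal Python sketch of an application organizing storage itself, with a bytearray standing in for the raw partition. The slot size, the RawRecordStore class and its in-memory key index are hypothetical illustrations, not how Oracle or any other database actually lays out a raw device.

```python
import struct

BLOCK_SIZE = 64                 # hypothetical fixed slot size in bytes
HEADER = struct.Struct("<H")    # 2-byte payload length at the start of each slot


class RawRecordStore:
    """Records in fixed-size slots of a raw byte region (hypothetical design).

    A bytearray stands in for the raw partition; the application alone
    decides which slot holds which record, with no file system involved.
    """

    def __init__(self, num_blocks: int) -> None:
        self.device = bytearray(num_blocks * BLOCK_SIZE)
        self.index = {}                      # record key -> slot number
        self.free = list(range(num_blocks))  # unused slot numbers

    def put(self, key: str, payload: bytes) -> None:
        if len(payload) > BLOCK_SIZE - HEADER.size:
            raise ValueError("payload too large for one slot")
        if key not in self.index:
            self.index[key] = self.free.pop(0)  # IndexError when the device is full
        offset = self.index[key] * BLOCK_SIZE
        HEADER.pack_into(self.device, offset, len(payload))
        start = offset + HEADER.size
        self.device[start:start + len(payload)] = payload

    def get(self, key: str) -> bytes:
        offset = self.index[key] * BLOCK_SIZE
        (length,) = HEADER.unpack_from(self.device, offset)
        start = offset + HEADER.size
        return bytes(self.device[start:start + length])


store = RawRecordStore(num_blocks=4)
store.put("customer:42", b"Ada Lovelace")
store.put("customer:7", b"Grace Hopper")
print(store.get("customer:42"), store.index)
```

The trade-off the answer mentions shows up directly: the store is fast and has no filesystem layer, but it only understands its own fixed layout, which is why general-purpose files/folders remain the usual compromise.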

Anonymous
A:

Mainframes used to just give programmers a number of 'devices' to use. A device corresponded to a drive or a partition thereof, and the programmer was responsible for the organisation of all data on it. Of course, they quickly built up libraries to help with that.

The only OS I can think of that doesn't use the common hierarchical arrangement of flat files (like UNIX) is PICK. That used a sort of relational database as the filesystem.

U62
A:

Microsoft had originally planned to introduce a new file system for Windows Vista (WinFS, Windows Future Storage). The idea was to store everything in a relational database (SQL Server). As far as I know, this project was never (or not yet?) finished.

There's more information about it on Wikipedia.
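This is not WinFS's API or schema. As a rough illustration of "store everything in a relational database", the following Python sketch uses the standard sqlite3 module to keep file-like items as rows with typed attributes and to find them by those attributes instead of by path; the items table and its columns are hypothetical.

```python
import sqlite3

# In-memory database standing in for a relational store that replaces folders.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE items (
           id INTEGER PRIMARY KEY,
           kind TEXT NOT NULL,      -- e.g. 'photo', 'document'
           title TEXT NOT NULL,
           author TEXT,
           year INTEGER,
           content BLOB
       )"""
)
conn.executemany(
    "INSERT INTO items (kind, title, author, year, content) VALUES (?, ?, ?, ?, ?)",
    [
        ("document", "Quarterly report", "alice", 2006, b"..."),
        ("photo", "Beach", "bob", 2006, b"\x89PNG..."),
        ("document", "Thesis draft", "alice", 2005, b"..."),
    ],
)

# No path is needed: items are located by their attributes.
rows = conn.execute(
    "SELECT title FROM items WHERE kind = ? AND author = ? ORDER BY year",
    ("document", "alice"),
).fetchall()
titles = [title for (title,) in rows]
print(titles)  # ['Thesis draft', 'Quarterly report']
```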

M4N
A: 

Traditional file systems are optimized for fast file access if you know the name of the file you want (including its path). Directories are a way of grouping files together so that they're easier to find if you know properties of the file but not its actual name.
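A minimal Python sketch of why lookup by full path is cheap, assuming a directory tree modelled as nested dicts (a hypothetical stand-in for on-disk directory structures): resolving a path costs one lookup per component, however many other files exist, and listing a directory shows related files grouped together.

```python
def resolve(root: dict, path: str) -> bytes:
    """Follow a slash-separated path one component at a time.

    Directories are nested dicts (hypothetical stand-in for directory
    blocks on disk); files are bytes.
    """
    node = root
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict):
            raise NotADirectoryError(part)
        node = node[part]  # one lookup per component; KeyError if missing
    if isinstance(node, dict):
        raise IsADirectoryError(path)
    return node


root = {
    "home": {
        "john": {
            "personal": {"contacts.txt": b"Alice: 555-0100"},
            "work": {"budget.xls": b"..."},
        }
    }
}
print(resolve(root, "/home/john/personal/contacts.txt"))  # b'Alice: 555-0100'
# Grouping: a directory listing shows related files together.
print(sorted(root["home"]["john"]))  # ['personal', 'work']
```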

Traditional file systems are not good at finding files if you know very little about them. However, they are robust enough that one can add a layer on top of them to help retrieve files based on content or meta-information such as tags. That's what indexers are for.
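A minimal Python sketch of such a layer: an inverted index that maps words and tags back to ordinary paths, leaving the file system itself unchanged. The Indexer class and its "tag:" prefix convention are hypothetical; real desktop indexers are far more elaborate.

```python
import re
from collections import defaultdict


class Indexer:
    """A toy inverted index layered on top of path-addressed files."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # word or "tag:<name>" -> set of paths

    def add(self, path: str, text: str, tags: tuple = ()) -> None:
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(path)
        for tag in tags:
            self.postings["tag:" + tag.lower()].add(path)

    def search(self, *terms: str) -> set:
        """Return the paths that match every term (AND semantics)."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()


idx = Indexer()
idx.add("/home/john/personal/contacts.txt", "Alice phone 555-0100", tags=("personal",))
idx.add("/home/john/work/notes.txt", "Call Alice about the budget", tags=("work",))
print(sorted(idx.search("alice")))
print(idx.search("alice", "tag:work"))  # {'/home/john/work/notes.txt'}
```

The files keep their normal paths; the index only adds a second way in, which is the "layer on top" described above.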

The bottom line is that we need a way to persistently store the bytes the CPU needs to execute. So we have traditional file systems, which are very good at organizing sequential sets of bytes. We also need to persistently store the bytes of files that aren't executed directly but are used by things that do execute. Why create a new system for the same fundamental thing?

What more should a file system do other than store and retrieve bytes?

Welbog
I agree, I see very little need to change a standard that obviously works very well, but I'm just interested in the alternatives. I guess the real plan would be to meld good ideas together.
Nicholas Flynt
A:

I knew a guy who wrote his doctoral thesis about a hard disk that comes with its own file system. It was based on an extension of the SCSI commands that allowed the usual open, read, write and close commands to be sent to the disk directly, bypassing the OS's file system drivers. I think the conclusion was that it is inflexible and does not add much efficiency.

Anyway, I believe this disk-based file system still had a folder-like structure, so I don't think it really counts for you ;-)

Treb
A: 

I'll echo the other responses. If I could pick a file system type, I personally would rather see a hybrid approach: a flat database of subtrees. Each subtree would be treated as a cohesive unit, but the subtrees themselves would have no hierarchy among them; instead, each would carry metadata and be queryable on that metadata.
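A minimal Python sketch of that hybrid, assuming subtrees are kept in a flat list (no hierarchy between them) and queried by exact metadata matches; the Subtree class and the query helper are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Subtree:
    """A cohesive unit: an ordinary hierarchy inside, queryable metadata outside."""
    name: str
    files: dict                                   # nested dicts, like a normal directory tree
    metadata: dict = field(default_factory=dict)


def query(store: list, **criteria) -> list:
    """Return the subtrees whose metadata matches every key=value criterion."""
    return [
        s for s in store
        if all(s.metadata.get(k) == v for k, v in criteria.items())
    ]


# The store itself is flat: subtrees are siblings, never nested in each other.
store = [
    Subtree("thesis", {"chapters": {"intro.tex": b"..."}, "refs.bib": b"..."},
            {"owner": "jason", "project": "phd"}),
    Subtree("tax-2008", {"receipts": {}, "return.pdf": b"..."},
            {"owner": "jason", "category": "finance"}),
]
print([s.name for s in query(store, owner="jason", project="phd")])  # ['thesis']
```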

Jason S
A:

Emails are often stored in folders. But ever since I have migrated to Gmail, I have become accustomed to classifying my emails with tags.

I have often wondered whether we could manage a whole file system that way: instead of storing files in folders, you would tag files with whatever tags you like. A file identifier would not look like this:

/home/john/personal/contacts.txt

but more like this:

contacts[john,personal]

Well... just food for thought (maybe this already exists!)
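A minimal Python sketch of tag-addressed storage, assuming a file's identity is its name plus an unordered set of tags; TagStore and its methods are hypothetical. Querying with any subset of tags plays the role a directory listing plays in a hierarchy, and the tag order does not matter.

```python
class TagStore:
    """Files identified by a name plus a set of tags instead of a path."""

    def __init__(self) -> None:
        self.files = {}  # (name, frozenset of tags) -> file contents

    def put(self, name: str, tags: set, data: bytes) -> None:
        self.files[(name, frozenset(tags))] = data

    def get(self, name: str, *tags: str) -> bytes:
        """Exact lookup: the name plus the full tag set, in any order."""
        return self.files[(name, frozenset(tags))]

    def find(self, *tags: str) -> list:
        """Identifiers, written as name[tag,...], of files carrying all given tags."""
        wanted = set(tags)
        return sorted(
            f"{name}[{','.join(sorted(ftags))}]"
            for (name, ftags) in self.files
            if wanted <= ftags
        )


store = TagStore()
store.put("contacts", {"john", "personal"}, b"Alice: 555-0100")
store.put("budget", {"john", "work"}, b"...")
print(store.find("personal"))  # ['contacts[john,personal]']
print(store.find("john"))      # ['budget[john,work]', 'contacts[john,personal]']
print(store.get("contacts", "personal", "john"))  # b'Alice: 555-0100'
```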

MiniQuark
+1 - Good extension..
torial
Wow, I actually like it a lot as well. Perhaps there's a way to mix this with a more structured system.
Nicholas Flynt
I feel like this is the most original idea of the bunch, mainly because for a user-based OS (not a server OS for a business) a database would be... weird. I think, however, that the tag system would refer to a directory, and the "files" seen in that directory would be the results of the search.
Nicholas Flynt
A: 

The reason for files is that humans like to attach names to "things" they have to use. Otherwise, it becomes hard to talk or think about or even distinguish them.

When we have too many things on a heap, we like to split the heap up. We sort it by some criterion and build hierarchies that let us navigate arbitrarily large collections of things.

Hence directories and files simply map our natural way of working with real objects. And since you can put anything in a file, the metaphor stretches a long way: on Unix, even hardware is mapped into the file system as "device nodes", special files you can read from or write to in order to send commands to the hardware.
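A small Python illustration, assuming a Linux or similar Unix system. /dev/null and /dev/urandom are kernel pseudo-devices rather than physical hardware, but they are exposed in exactly the same way; real devices such as disks usually need elevated privileges to open.

```python
import os
import stat

# Device nodes show up in the file system with their own file type.
for path in ("/dev/null", "/dev/urandom"):
    mode = os.stat(path).st_mode
    print(path, "is a character device:", stat.S_ISCHR(mode))

# The same open/read/write calls used for regular files talk to the driver.
with open("/dev/urandom", "rb") as dev:
    random_bytes = dev.read(8)            # bytes produced by the kernel's RNG
with open("/dev/null", "wb") as sink:
    written = sink.write(b"discarded")    # accepted by the driver and thrown away
print(len(random_bytes), written)         # 8 9
```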

I think the metaphor is so powerful, it will stay.

Aaron Digulla
A: 

I spent a while trying to come up with an automagically versioning file system that would maintain versions (and version history) of any specific file and/or directory structure.

The idea was that all of the standard access commands (e.g. dir, read, etc.) would have an optional date/time parameter that could be passed to access the file system as it looked at that point in time.

I got pretty far with it, but had to abandon it when I had to actually go out and earn some money. It's been on the back-burner since then.
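A minimal Python sketch of the read-with-timestamp idea described above, assuming versions are appended in time order and kept in memory; VersionedStore and its at parameter are hypothetical. Writes never overwrite an old version, which is the copy-on-write spirit at a very coarse level; a real copy-on-write file system such as ZFS works at the block level and exposes history through snapshots.

```python
import bisect
from datetime import datetime
from typing import Optional


class VersionedStore:
    """Keeps every version of every file; reads can ask for a past moment."""

    def __init__(self) -> None:
        self.history = {}  # path -> list of (timestamp, content), in time order

    def write(self, path: str, data: bytes, when: datetime) -> None:
        versions = self.history.setdefault(path, [])
        if versions and when < versions[-1][0]:
            raise ValueError("writes must arrive in time order")
        versions.append((when, data))  # old versions are never overwritten

    def read(self, path: str, at: Optional[datetime] = None) -> bytes:
        versions = self.history[path]
        if at is None:
            return versions[-1][1]  # latest version
        times = [t for t, _ in versions]
        i = bisect.bisect_right(times, at)  # number of versions written at or before `at`
        if i == 0:
            raise FileNotFoundError(f"{path} did not exist at {at}")
        return versions[i - 1][1]


fs = VersionedStore()
fs.write("/notes.txt", b"draft", datetime(2001, 5, 1))
fs.write("/notes.txt", b"final", datetime(2002, 1, 1))
print(fs.read("/notes.txt"))                           # b'final'
print(fs.read("/notes.txt", at=datetime(2001, 6, 1)))  # b'draft'
```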

Andrew Rollings
Is it something along the lines of ZFS? (http://en.wikipedia.org/wiki/ZFS)
Subtwo
Pretty much. That has a lot more features, but the copy-on-write mechanism was what I was focussing on. This was back in the 2001/2002 timeframe...
Andrew Rollings
Also, I had planned it to have a 'peer-to-peer' extension to act as a virtual 'cloud' of redundant storage... I was so proud of myself at the time :) (A bit ahead of the curve there too - for once!)
Andrew Rollings
A: 

If you look at the start-up times of operating systems, it should be clear that disk access can be improved. I'm not sure whether the changes belong in the file system or in the OS start-up code.

Stephan Eggermont
A: 

Personally, I'm really sorry WinFS didn't fly. I loved the concept. From Wikipedia (http://en.wikipedia.org/wiki/WinFS):

WinFS includes a relational database for storage of information, and allows any type of information to be stored in it, provided there is a well defined schema for the type. Individual data items could then be related together by relationships, which are either inferred by the system based on certain attributes or explicitly stated by the user. As the data has a well defined schema, any application can reuse the data; and using the relationships, related data can be effectively organized as well as retrieved. Because the system knows the structure and intent of the information, it can be used to make complex queries that enable advanced searching through the data and aggregating various data items by exploiting the relationships between them.
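As a rough illustration of querying through relationships (not WinFS's actual schema or API, which was never released as a finished product), here is a Python sketch using the standard sqlite3 module. The items and relationships tables and the relationship kinds are hypothetical; the point is that a question like "documents attached to emails sent by Alice" becomes a join rather than a folder search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, kind TEXT, name TEXT);
    CREATE TABLE relationships (
        source INTEGER REFERENCES items(id),
        target INTEGER REFERENCES items(id),
        kind TEXT                -- e.g. 'sent_by', 'attached_to'
    );
    INSERT INTO items VALUES (1, 'contact', 'Alice'),
                             (2, 'email', 'Budget 2007'),
                             (3, 'document', 'budget.xls');
    INSERT INTO relationships VALUES (2, 1, 'sent_by'),
                                     (3, 2, 'attached_to');
    """
)

# Two hops across explicit relationships: document -> email -> sender.
rows = conn.execute(
    """
    SELECT doc.name
    FROM items AS doc
    JOIN relationships AS att ON att.source = doc.id AND att.kind = 'attached_to'
    JOIN relationships AS snt ON snt.source = att.target AND snt.kind = 'sent_by'
    JOIN items AS person ON person.id = snt.target
    WHERE doc.kind = 'document' AND person.name = ?
    """,
    ("Alice",),
).fetchall()
print(rows)  # [('budget.xls',)]
```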

torial