views:

124

answers:

4

I heard that the NTFS file system is basically a b-tree. Is that true? What about the other file systems? What kind of trees are they?

Also, how is FAT32 different from FAT16?

What kind of tree are the FAT file systems using?

+1  A: 

Here is a nice chart on FAT16 vs FAT32.

The numerals in the names FAT16 and FAT32 refer to the number of bits required for a file allocation table entry.

FAT16 uses a 16-bit file allocation table entry (2 16 allocation units).

Windows 2000 reserves the first 4 bits of a FAT32 file allocation table entry, which means FAT32 has a maximum of 2 28 allocation units. However, this number is capped at 32 GB by the Windows 2000 format utilities.

http://technet.microsoft.com/en-us/library/cc940351.aspx

Tom
+1  A: 

FAT32 uses 32bit numbers to store cluster numbers. It supports larger disks and files up to 4 GiB in size.

As far as I understand the topic, FAT uses File Allocation Tables which are used to store data about status on disk. It appears that it doesn't use trees. I could be wrong though.

AndrejaKo
+3  A: 

ext3 and ext4 use "H-trees", which are apparently a specialized form of B-tree.

BTRFS uses B-trees (B-Tree File System).

ReiserFS uses B+trees, which are apparently what NTFS uses.

By the way, if you search for these on Wikipedia, it's all listed in the info box on the right side under "Directory contents".

Brendan Long
NTFS uses B trees for indexes, not B+ trees. My understanding is that btrfs uses B+ trees and for more than the indexes. Small nit, but...
jrtipton
+1  A: 

FAT (FAT12, FAT16, and FAT32) do not use a tree of any kind. Two interesting data structures are used, in addition to a block of data describing the partition itself. Full details at the level required to write a compatible implementation in an embedded system are available from Microsoft and third parties. Wikipedia has a decent article as an alternative starting point that also includes a lot of the history of how it got the way it is.

Since the original question was about the use of trees, I'll provide a quick summary of what little data structure is actually in a FAT file system. Refer to the above references for accurate details and for history.

The set of files in each directory is stored in a simple list, initially in the order the files were created. Deletion is done by marking an entry as deleted, so a subsequent file creation might re-use that slot. Each entry in the list is a fixed size struct, and is just large enough to hold the classic 8.3 file name along with the flag bits, size, dates, and the starting cluster number. Long file names (which also includes international character support) is done by using extra directory entry slots to hold the long name alongside the original 8.3 slot that holds all the rest of the file attributes.

Each file on the disk is stored in a sequence of clusters, where each cluster is a fixed number of adjacent disk blocks. Each directory (except the root directory of a disk) is just like a file, and can grow as needed by allocating additional clusters.

Clusters are managed by the (misnamed) File Allocation Table from which the file system gets its common name. This table is a packed array of slots, one for each cluster in the disk partition. The name FAT12 implies that each slot is 12 bits wide, FAT16 slots are 16 bits, and FAT32 slots are 32 bits. The slot stores code values for empty, last, and bad clusters, or the cluster number of the next cluster of the file. In this way, the actual content of a file is represented as a linked list of clusters called a chain.

Larger disks require wider FAT entries and/or larger allocation units. FAT12 is essentially only found on floppy disks where its upper bound of 4K clusters makes sense for media that was never much more than 1MB in size. FAT16 and FAT32 are both commonly found on thumb drives and flash cards. The choice of FAT size there depends partly on the intended application.

Access to the content of a particular file is straightforward. From its directory entry you learn its total size in bytes and its first cluster number. From the cluster number, you can immediately calculate the address of the first logical disk block. From the FAT indexed by cluster number, you find each allocated cluster in the chain assigned to that file.

Discovery of free space suitable for storage of a new file or extending an existing file is not as easy. The FAT file system simply marks free clusters with a code value. Finding one or more free clusters requires searching the FAT.

Locating the directory entry for a file is not fast either since the directories are not ordered, requiring a linear time search through the directory for the desired file. Note that long file names increase the search time by occupying multiple directory entries for each file with a long name.

FAT still has the advantage that it is simple enough to implement that it can be done in small microprocessors so that data interchange between even small embedded systems and PCs can be done in a cost effective way. I suspect that its quirks and oddities will be with us for a long time as a result.

RBerteig