Jump to content

File Allocation Table

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Plugwash (talk | contribs) at 15:01, 26 October 2005. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

FAT12 FAT16 FAT32
Developer Microsoft
Full Name File Allocation Table
(12-bit version) (16-bit version) (32-bit version)
Introduced 1977 (Microsoft Disk BASIC) July 1988 (MS-DOS 4.0) August 1996 (Windows 95 OSR2)
Partition identifier 0x01 (MBR) 0x04, 0x06, 0x0E (MBR) 0x0B, 0x0C (MBR)
EBD0A0A2-B9E5-4433
-87C0-68B6B72699C7
(GPT)
Structures FAT12 FAT16 FAT32
Directory contents Table
File allocation Linked List
Bad blocks Linked List
Limits FAT12 FAT16 FAT32
Max file size 32 MiB 4 GiB
Max number of files 4077 65517 268435437
Max filename size 8.3 or 255 characters when using LFNs
Max volume size 32 MiB 4 GiB 2 TiB
Features FAT12 FAT16 FAT32
Dates recorded Creation, modified, access
Date range January 1, 1980 - December 31, 2107
Forks Only under OS/2 and Windows NT No
Attributes Read-only, hidden, system, archive, volume name
Permissions No
Transparent compression Per-volume, Stacker, DoubleSpace, DriveSpace No
Transparent encryption Per-volume only with DR-DOS No

File Allocation Table (FAT) is a file system that was developed for MS-DOS and is the primary file system for consumer versions of Microsoft Windows up to and including Windows Me. The FAT file system is considered relatively uncomplicated, and because of that, it is a popular format for floppy disks; moreover, it is supported by virtually all existing operating systems for personal computers, and because of that, it is often used to share data between several operating systems booting on the same computer (a multiboot environment). It is also used on solid-state memory cards and other similar devices. The most common implementations have a serious drawback in that when files are deleted and new files written to the media, the files fragments tend to become scattered over the entire media making reading and writing a slow process. De-fragmentation is one solution to this, but is often a lengthy process in itself and has to be repeated regularly to keep the FAT file system clean.

History

The FAT filesystem was invented by Bill Gates and Marc McDonald in 1977 for managing disks in Microsoft Disk BASIC and was incorporated by Tim Paterson in August 1980 to his 86-DOS operating system for the S-100 8086 CPU boards; the filesystem was the main difference between 86-DOS and CP/M, of which 86-DOS was otherwise mostly a clone. [1]

FAT12

This initial version of FAT is now referred to as FAT12. As a filesystem for floppy disks, it had a number of limitations: no support for hierarchical directories, cluster addresses were "only" 12-bits long (which made the code manipulating the FAT a bit tricky) and the disk size was stored as a 16-bit count of sectors, which limited the size to 32MB.

An entry-level diskette at the time would be 5.25", single-sided, 40 tracks, with 8 sectors per track, resulting in a capacity of slightly less than 160KB. The above limits exceeded this capacity by one or more orders of magnitude and at the same time allowed to fit all the control structures inside the first track, thus avoiding head movement during read and write operations. The limits were successively lifted in the following years.

In order to properly support the newer IBM PC XT computer, which featured a 10 MB hard disk, MS-DOS 2.0 was released around the same time, at the beginning of 1983, and introduced hierarchical directories. The format of the FAT itself did not change. The hard disk on the PC XT had 8 KB clusters.

In 1984 IBM released the PC AT, which featured a 20 MB hard disk. Microsoft introduced MS-DOS 3.0 in parallel. Cluster addresses were increased to 16-bit, allowing for a greater number of clusters (up to 65,536) and consequently much greater filesystem sizes. However, the maximum possible number of sectors and the maximum (partition, rather than disk) size of 32 MB did not change.

MS-DOS 3.0 also introduced support for high-density 1.2 MB 5.25" diskettes, which notably had 15 sectors per track, hence more space for FAT. This probably prompted a dubious optimization of the cluster size, which went down from 2 sectors to just 1. The net effect was that high density diskettes were significantly slower than older double density ones.

FAT16

In 1987 finally came what today is called the FAT16 format, with the removal of the 16-bit counter of disk sectors, in Compaq DOS 3.31. In 1988 the improvement became generally available through MS-DOS 4.0. The partition size was now limited by the 8-bit signed count of sectors per cluster, which could reach a maximum power-of-two value of 64, giving 32 KB clusters with the usual 512 bytes per sector, hence fixing the "definitive" limit for FAT16 partition size at 2 gigabytes. Much later, Windows NT increased the maximum cluster size to 64K by considering the sector per cluster count as unsigned. However the resulting format was not compatible with any other FAT implementation of the time and generated really huge internal fragmentation anyway. Windows 98 also supported reading and writing this variant but its disk utilities didn't work with it.

VFAT and FASTFAT

Windows 3.11 introduced a new scheme to access filesystems, using a 32-bit protected mode filesystem driver outside the DOS kernel, using directly BIOS or hardware (when available) access to disk, that also integrated caching, making accesses faster than with the DOS in kernel driver. It was called Virtual FAT (or VFAT).

Windows NT 3.1 provided the same approach, calling it FASTFAT, however in a different fashion as it is natural for a filesystem driver in NT to use 32-bit protected mode. Like VFAT, FASTFAT is cached.

It is commonly confused with LFN support, as almost nobody configured Windows 3.11 to use VFAT and it was automatically enabled (and was also enhanced to support LFNs) in Windows 95.

LFNs (Long File Names)

One of the user experience goals for the designers of Windows 95 was the use of long file names (known as LFNs for short) in the new operating system. These were implemented using a work-around in the way directory entries are laid out (see below). The version of the file system with this extention is usually known as VFAT after the Windows 95 VxD device driver which first supported them. Long file names are also supported by Windows NT 3.5 (FASTFAT 1.1) and above.

FAT32

In order to overcome the volume size limit of FAT16 while still allowing DOS real-mode code to handle the format without unnecessarily reducing the availible conventional memory, Microsoft decided to implement a newer generation of FAT, known as FAT32, with 32-bit cluster numbers, of which 28 bits are currently used.

In theory, this should support a total of approximately 268,435,438 (< 228) clusters, allowing for drive sizes in the range of 2 terabytes. However, due to limitations in Microsoft's scandisk utility, the FAT is not allowed to grow beyond 4,177,920 (< 224) clusters, placing the volume limit at 124.55 gigabytes, unless "scandisk" is not needed. [2] Windows 2000 and XP placed a limit on the size of FAT32 partitions they can create at 32 GB; Microsoft says this is by design but does not explain why, and those versions of Windows are quite capable of reading and writing larger FAT32 partitions created by other means.

FAT32 was introduced with Windows 95 OSR2, although reformatting was needed to use FAT32 advantages, and DriveSpace 3 (the version that came with Windows 95 OSR2 and Windows 98) never supported it. Windows 98 introduced a utility to convert existing hard disks from FAT16 to FAT32 without loss of data. Support in the NT line came with Windows 2000.

The maximum possible file size for a FAT32 volume is 4 GiB minus 1 byte (232-1 bytes). For most users, this has become the most nagging limit of FAT32 as of 2005, since video capture and editing applications can easily exceed this limit, as can the system swap file.

Third party support

The alternative IBM PC operating systems—such as Linux, FreeBSD, and BeOS—have all supported FAT, and most gained support for VFAT and FAT32 shortly after the appropriate Windows versions were released. Early Linux distributions also supported a format known as UMSDOS, which was nothing more than FAT with the UNIX file properties (e.g. long file name and access permissions) stored in a separate file called --linux-.---. UMSDOS fell into disuse after VFAT was released and has been dropped from recent versions of the Linux kernel. The Mac OS X operating system also supports the FAT filesystems on volumes other than the boot disk.

Future

Microsoft recently tried to secure existing patents for VFAT and FAT32, which caused concern that they might later seek royalties from Linux distros and from media vendors that pre-format their products (See FAT Licensing below). This was however rejected on October 6, 2005.

Since Microsoft has announced the discontinuation of its MS-DOS-based consumer operating systems with Windows Me, it remains unlikely that any new versions of FAT will appear. For most purposes, the NTFS file system that was developed for the Windows NT line is superior to FAT from the points of view of efficiency, performance and reliability; its main drawbacks are the size overhead for small volumes and the very limited support by anything other than the NT-based versions of Windows, since the exact specification is a trade secret of Microsoft, which in turn makes it difficult to use a DOS floppy for recovery purposes. Microsoft provided a recovery console to work around this issue, but they severely limited what could be done through it for "security" reasons.

FAT is still the normal filesystem for removable media, with FAT12 used on floppies, and FAT16 on most other removable media (such as flash memory cards for digital cameras and USB flash drives). Most removable media is not yet large enough to benefit from FAT32. FAT is used on these drives for reasons of compatibility and size overhead, as well as the fact that file permissions on removable media are likely to be more trouble than they are worth.

The FAT32 formatting support in Windows 2000 and XP is limited to drives of about 30 gigabytes, which effectively forces users of modern hard drives either to use NTFS or to format the drive using other tools outside Windows. A workaround to this involves using a version of mkdosfs that has been ported from Linux to Windows.

Design

A FAT file system is composed of four different sections.

  1. The Boot Sector (aka Partition Boot Record, BIOS Parameter Block or Reserved Sector). This is always the first sector of the partition and includes some basic file system information (in particular, its type), pointers to the location of the other sections and the operating system's boot loader code.
  2. The FAT Region. This contains two copies of the File Allocation Table for the sake of redundancy, although the extra copy is rarely used, even by disk repair utilities. These are maps of the partition, indicating how the clusters are allocated.
  3. The Root Directory Region. This is a Directory Table that stores information about the files and directories in the root directory. With FAT32 it can be stored anywhere in the partition, however with earlier versions it is always located immediately after the FAT Region.
  4. The Data Region. This is where the actual file and directory data is stored and takes up most of the partition. The size of files and subdirectories can be increased arbitrarily (as long as there are free clusters) by simply adding more links to the file's chain in the FAT. Note however, that each cluster can be taken only by one file, and so if a 1KB file resides in a 32KB cluster, 31KB are wasted.

The Boot Sector is of the following format:

Byte Offset Length Description
0 (0x00) 3 Jump instruction (to skip over header on boot)
3 (0x03) 8 OEM Name (padded with spaces)
11 (0x0b) 2 Bytes per sector
13 (0x0d) 1 Sectors per cluster
14 (0x0e) 2 Reserved sector count (including boot sector)
16 (0x10) 1 Number of file allocation tables
17 (0x11) 2 Number of root directory entries
19 (0x13) 2 Total sectors (0 if greater than 65535. See Offset 32)
21 (0x15) 1 Media descriptor
22 (0x16) 2 Sectors per file allocation table (FAT16)
24 (0x18) 2 Sectors per track
26 (0x1a) 2 Number of heads
28 (0x1c) 4 Hidden sectors
32 (0x20) 4 Total sectors (if greater than 65535)
36 (0x24) 4 Sectors per file allocation table (FAT32)
36 (0x24) 1 Physical drive number (FAT16)
37 (0x25) 1 Current head (FAT16)
38 (0x26) 1 Signature (FAT16)
39 (0x27) 4 ID (FAT16)
43 (0x2b) 11 Volume Label
54 (0x36) 8 FAT file system type (e.g. FAT, FAT12, FAT16, FAT32)

File Allocation Table

A partition is divided up into identically sized clusters, small blocks of contiguous space. Cluster sizes vary depending on the type of FAT file system being used and the size of the partition, typically cluster sizes lie somewhere between 2KB and 32KB. Each file may occupy one or more of these clusters depending on its size; thus, a file is represented by a chain of these clusters (referred to as a singly linked list). However these chains are not necessarily stored adjacently on the disk's surface but are often instead fragmented throughout the Data Region.

The File Allocation Table (FAT) is a list of entries that map to each cluster on the partition. Each entry records one of five things:

  • the address of the next cluster in a chain
  • a special end of file (EOF) character that indicates the end of a chain
  • a special character to mark a bad cluster
  • a special character to mark a reserved cluster
  • a zero to note that that cluster is unused

Each version of the FAT file system uses a different size for FAT entries. The size is indicated by the name, for example the FAT16 file system uses 16 bits for each entry while the FAT32 file system uses 32 bits. This difference means that the File Allocation Table of a FAT32 system can map a greater number of clusters than FAT16, allowing for larger partition sizes with FAT32. This also allows for more efficient use of space than FAT16, because on the same hard drive a FAT32 table can address smaller clusters which means less wasted space.

FAT entry values:

FAT12 FAT16 FAT32 Description
0x000 0x0000 0x?0000000 Free Cluster
0x001 0x0001 0x?0000001 Reserved Cluster
0x002 - 0xFEF 0x0002 - 0xFFEF 0x?0000002 - 0x?FFFFFEF Used cluster; value points to next cluster
0xFF0 - 0xFF6 0xFFF0 - 0xFFF6 0x?FFFFFF0 - 0x?FFFFFF6 Reserved values
0xFF7 0xFFF7 0x?FFFFFF7 Bad cluster
0xFF8 - 0xFFF 0xFFF8 - 0xFFFF 0x?FFFFFF8 - 0x?FFFFFFF Last cluster in file

Note that FAT32 uses only 28 bits of the 32 possible bits. The upper 4 bits are usually zero but are reserved and should be left untouched. In the table above these are denoted by a question mark.

Directory table

A directory table is a special type of file that represents a directory (nowadays commonly known as a folder). Each file or directory stored within it is represented by a 32 byte entry in the table. Each entry records the name, extension, attributes (archive, directory, hidden, read-only, system and volume), the date and time of creation, the address of the first cluster of the file/directory's data and finally the size of the file/directory.

Aside from the Root Directory Table in FAT12 and FAT16 file systems which occupies the special Root Directory Region location, all Directory Tables are stored in the Data Region.

Legal characters for DOS file names include the following:

  • Upper case letters A-Z
  • Numbers 0-9
  • Space (though trailing spaces are considered to be padding and not a part of the file name)
  • ! # $ % & ( ) - @ ^ _ ` { } ~ '
  • Values 128-255

The DOS file names are in the OEM character set.

Directory entries, both in the Root Directory Region and in subdirectories, are of the following format:

Byte Offset Length Description
0 8 DOS file name (padded with spaces)

The first byte can have the following special values:

0x00 Entry is available and no subsequent entry is in use
0x05 Initial character is actually 0xE5
0x2E 'Dot' entry; either '.' or '..'
0xE5 Entry has been previously erased and is not available. File undelete utilities must replace this character with a regular character as part of the undeletion process.
8 3 DOS file extension (padded with spaces)
11 1 File Attributes

The first byte can have the following special values:

Bit Mask Description
0 0x01 Read Only
1 0x02 Hidden
2 0x04 System
3 0x08 Volume Label
4 0x10 Subdirectory
5 0x20 Archive
6 0x40 Device (internal use only, never found on disk)
7 0x80 Unused

An attribute value of 0x0F is used to designate a long file name entry.

12 1 Reserved, used by NT
13 1 Fine resolution creation time stamp, in tenths of a second
14 4 Time of Creation
18 2 Last Access Time
20 2 EA-Index (used by OS/2 and NT) in FAT12 and FAT16, High 2 bytes of first cluster number in FAT32
22 4 Last Modified Time
26 2 First cluster in FAT12 and FAT16. Low 2 bytes of first cluster in FAT32.
28 4 File size

Long File Names (LFN) are stored on a FAT file system using a trick - adding phoney entries into the Directory Tables. The entries are marked with a Volume Label attribute which is impossible for a regular file and because of that they're ignored by most old MS-DOS programs. Notably, a directory containing only volume labels is considered as empty and is allowed to be deleted; such a situation appears if files created with long names are deleted from plain DOS. A checksum also allows to detect whether a long file name matches the 8.3 name; such a mismatch could appear if a file was deleted and re-created using DOS in the same directory position. Older versions of PC-DOS mistake LFN names in the root directory for the volume label, and are likely to display an incorrect label.

Each phoney entry can contain up to 13 UTF-16 characters (26 bytes), gaining about 15 bytes in addition to the old 8 + 3 by using fields in the record which contained file size or time stamps (but for security versus disk checking tools the starting cluster field is left unused with a 0 value). See 8.3 for additional explanations.

LFN entries use the following format:


Byte Offset Length Description
0 1 Sequence Number
1 10 Name characters (five UTF-16 characters)
11 1 Attributes (always 0x0F)
12 1 Reserved (always 0x00)
13 1 Checksum of DOS file name
14 12 Name characters (six UTF-16 characters)
26 2 First cluster (always 0x0000)
28 4 Name characters (two UTF-16 characters)

If a filename contains purely lowercase, has no special characters, and fits within the 8.3 limits, a VFAT entry is not created. Instead, some special undocumented bits in byte 13 of the directory entry are used to indicate that the filename should be displayed in lowercase. Few other operating systems support this, because it's risky to support undocumented Microsoft features.

FAT licensing

Microsoft applied for, and was granted, a series of patents for key parts of the FAT file system in the mid-1990s. Being almost universally compatible and well-understood, FAT is frequently chosen as an interchange format for flash media used in digital cameras and PDAs.

On December 3 2003, Microsoft announced it would be offering licenses for use of its FAT specification and "associated intellectual property", at the cost of a US $0.25 royalty per unit sold, with a $250,000 maximum royalty per license agreement.

To this end, Microsoft cited four patents on the FAT filesystem as the basis of its intellectual property claims. All four pertain to long-filename extensions to FAT first seen in Windows 95:

  • U.S. patent 5,745,902 - Method and system for accessing a file using file names having different file name formats. Filed July 6, 1992. This covered a means of generating and associating a short, 8.3-compatible filename with long one (for example, "Microsoft.txt" with "MICROS~1.TXT") and a means of enumerating conflicting short filenames (for example, "MICROS~2.TXT" and "MICROS~3.TXT"). It is unclear whether this patent would cover an implementation of FAT without explicit long filename capabilities. Hard links in UNIX file systems do not appear to be prior art: deleting a FAT file via its long name will also remove its short name. Renaming a file to a "short" name also updates the long file name for coherency; similarly, renaming a file to a "long" name will allocate a new "short" name. In NTFS, hard links and dual names are separate concepts and each hard link can have two names. Finally, at the API level, only one name or the other appears in directory listings; they do not appear as two separate files and do not have to be "matched".
  • U.S. patent 6,286,013 - Method and system for providing a common name space for long and short file names in an operating system. Filed on January 28, 1997. This makes claims on the methods used when Windows 95, Windows 98 and Windows Me expose long filenames to their MS-DOS compatibility layer. It does not appear to affect any non-Microsoft FAT implementations.

Many technical commentators have concluded that these patents only cover FAT implementations that include support for long filenames, and that removable solid state media and consumer devices only using short names would be unaffected.

Additionally, in the document "Microsoft Extensible Firmware Initiative FAT 32 File System Specification, FAT: General Overview of On-Disk Format" published by Microsoft (version 1.03, December 6, 2000), Microsoft specifically grants a number of rights, which many readers have interpreted as permitting operating system vendors to implement FAT.

Appeal

As there has been widespread call for these patents to be re-examined, the Public Patent Foundation submitted evidence to the US Patent and Trade Office (USPTO) disputing the validity of these patents, including prior art references from Xerox and IBM. The USPTO acknowledged that the evidence raised "substantial new question[s] of patentability," and opened an investigation into the validity of Microsoft's FAT patents.

On September 30 2004, the USPTO rejected all claims of U.S. patent 5,579,517, based primarily on evidence provided by the Public Patent Foundation. Dan Ravicher, the foundation's executive director, said "The Patent Office has simply confirmed what we already knew for some time now, Microsoft's FAT patent is bogus."

According to the PUBPAT press release, "Microsoft still has the opportunity to respond to the Patent Office's rejection. Typically, third party requests for reexamination, like the one filed by PUBPAT, are successful in having the subject patent either narrowed or completely revoked roughly 70% of the time."

On October 5, 2005, the Patent Office announced that, following the re-examination process, it had again rejected all claims of patent 5.579,517, and it additionally found U.S. patent 5,758,352 invalid on the grounds that the patent had incorrect assignees.

References

See also