
<html><head>
<title>HPFS: Fault Tolerance</title>
</head>

<body>
<center>
<h1>Fault Tolerance</h1>
</center>

The HPFS's extensive use of lazy writes makes it imperative that the file
system recover gracefully from write errors under all but the most dire
circumstances.  After all, by the time a write is known to have failed, the
application has long since gone on its way under the illusion that it has
safely shipped the data into disk storage. The errors may be detected by the
hardware (such as a "sector not found" error returned by the disk adapter),
or they may be detected by the disk driver in spite of the hardware during a
read-after-write verification of the data.
<p>

The primary mechanism for handling write errors is called a hot fix. When an
error is detected, the file system
takes a free block out of a reserved hot fix pool, writes the data to that
block, and updates the hot fix map. (The hot fix map is simply a series of
pairs of double words, with each pair containing the number of a bad sector
associated with the number of its hot fix replacement. A pointer to the hot
fix map is maintained in the Spare Block.) A copy of the hot fix map is then
written to disk, and a warning message is displayed to let the user know that
all is not well with the disk device.
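<p>

As a rough illustration of the hot fix mechanism described above, the
following Python sketch models the hot fix map as a list of (bad sector,
replacement sector) pairs. The class name HotFixVolume, the dictionary that
stands in for the disk, and the failing_sectors test hook are all assumptions
made for this example; they are not the HPFS on-disk format or the OS/2 disk
driver interface.

```python
class HotFixVolume:
    """Toy model of a volume with a reserved hot fix pool (illustrative only)."""

    def __init__(self, hot_fix_pool, failing_sectors):
        self.disk = {}                        # sector number -> data (stands in for the media)
        self.pool = list(hot_fix_pool)        # free replacement sectors
        self.failing = set(failing_sectors)   # sectors whose writes fail (test hook)
        self.hot_fix_map = []                 # pairs: (bad sector, replacement sector)
        self.saved_map = []                   # stands in for the on-disk copy of the map
        self.warnings = []                    # messages that would be shown to the user

    def _raw_write(self, sector, data):
        # Simulate the hardware or driver reporting a write error.
        if sector in self.failing:
            raise IOError("sector not found: %d" % sector)
        self.disk[sector] = data

    def write(self, sector, data):
        try:
            self._raw_write(sector, data)
        except IOError:
            if not self.pool:
                raise                         # pool exhausted: nothing left to try
            replacement = self.pool.pop(0)    # take a free block from the pool
            self._raw_write(replacement, data)
            self.hot_fix_map.append((sector, replacement))
            self.saved_map = list(self.hot_fix_map)   # "write" the map to disk
            self.warnings.append("write error on sector %d; hot fixed" % sector)
```

In this model a failed write to sector 42 lands in the first pool sector
instead, and the pair (42, replacement) is recorded so that later accesses
can be redirected.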
<p>

Each time the file system requests a sector read or write from the disk
driver, it scans the hot fix map and replaces any bad sector numbers with the
corresponding good sectors holding the actual data. This lookaside translation
of sector numbers is not as expensive as it sounds, since the hot fix map
need only be scanned when a sector is physically read or written, not each
time it is accessed in the cache.
<p>

One of CHKDSK's duties is to empty the hot fix map. For each replacement block
on the hot fix map, it allocates a new
sector that is in a favorable location for the file that owns the data, moves
the data from the hot fix block to the newly allocated sector, and updates
the file's allocation information (which may involve rebalancing allocation
trees and other elaborate operations). It then adds the bad sector to the bad
block list, releases the replacement sector back to the hot fix pool, deletes
the hot fix entry from the hot fix map, and writes the updated hot fix map to
disk.
<p>

Of course, write errors that can be detected and fixed on the fly are
not the only calamity that can befall a file system. The HPFS designers also
had to consider the inevitable damage to be wreaked by power failures, program
crashes, malicious viruses and Trojan horses, and those users who turn off
the machine without selecting Shutdown in the Presentation Manager Shell.
(Shutdown notifies the file system to flush the disk cache, update directories,
and do whatever else is necessary to bring the disk to a consistent state.)
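<p>

The sector translation and the CHKDSK step that empties the hot fix map, both
described above, might look roughly like the following Python sketch. The
function names and the two callbacks (allocate_near, which picks a favorable
free sector, and update_allocation, which fixes the owning file's allocation
information) are hypothetical stand-ins for logic the text does not spell out.

```python
def translate(hot_fix_map, sector):
    """Return the sector that really holds the data for `sector`."""
    for bad, replacement in hot_fix_map:
        if bad == sector:
            return replacement
    return sector          # not hot fixed: use the sector as requested


def empty_hot_fix_map(hot_fix_map, hot_fix_pool, bad_block_list, disk,
                      allocate_near, update_allocation):
    """Move each hot-fixed sector to a normal location, following the steps in the text."""
    for bad, replacement in list(hot_fix_map):      # iterate over a copy while removing
        new_sector = allocate_near(bad)             # favorable spot for the owning file
        disk[new_sector] = disk.pop(replacement)    # move the data out of the pool sector
        update_allocation(bad, new_sector)          # repoint the file's allocation info
        bad_block_list.append(bad)                  # retire the bad sector
        hot_fix_pool.append(replacement)            # release the replacement to the pool
        hot_fix_map.remove((bad, replacement))      # delete the hot fix entry
    # A real implementation would now write the updated hot fix map to disk.
```

Because translate is only needed on a physical read or write, a linear scan
of a short list is cheap next to the disk access it precedes.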
<p>

The HPFS defends itself against the user who is too abrupt with the Big Red
Switch by maintaining a Dirty FS flag in the Spare Block of each HPFS volume.
The flag is only cleared when all files on the volume have been closed and
all dirty buffers in the cache have been written out or, in the case of the
boot volume (since OS2.INI and the swap file are never closed), when Shutdown
has been selected and has completed its work. During the OS/2 boot sequence,
the file system inspects the Dirty FS flag on each HPFS volume and, if the
flag is set, will not allow further access to that volume until CHKDSK has
been run. If the Dirty FS flag is set on the boot volume, the system will
refuse to boot; the user must boot OS/2 in maintenance mode from a diskette
and run CHKDSK to check and possibly repair the boot volume.
<p>

In the event
of a truly major catastrophe, such as loss of the Super Block or the root
directory, the HPFS is designed to give data recovery the best possible
chance of success. Every type of crucial file object, including Fnodes,
allocation sectors, and directory blocks, is doubly linked to both its parent
and its children and contains a unique 32-bit signature. Fnodes also contain
the initial portion of the name of their file or directory. Consequently,
CHKDSK can rebuild an entire volume by methodically scanning the disk for
Fnodes, allocation sectors, and directory blocks, using them to reconstruct
the files and directories, and finally regenerating the free space bitmaps.
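<p>

The first stage of such a rebuild, finding candidate structures by their
signatures, can be sketched in Python as follows. This is only an
illustration: the signature values are placeholders rather than the real HPFS
constants, and the 512-byte sector size, the little-endian byte order, and
the position of the signature in the first four bytes of a sector are
assumptions of the example.

```python
SECTOR_SIZE = 512   # assumed sector size for this sketch

# Placeholder signature values for the example, NOT the real HPFS constants.
SIGNATURES = {
    0x11111111: "fnode",
    0x22222222: "allocation sector",
    0x33333333: "directory block",
}


def scan_for_structures(image):
    """Return (sector number, kind) for every sector that starts with a known signature."""
    found = []
    for offset in range(0, len(image) - SECTOR_SIZE + 1, SECTOR_SIZE):
        # Read the first four bytes of the sector as an unsigned 32-bit value.
        signature = int.from_bytes(image[offset:offset + 4], "little")
        kind = SIGNATURES.get(signature)
        if kind is not None:
            found.append((offset // SECTOR_SIZE, kind))
    return found
```

A recovery tool would then follow the parent and child links stored in each
structure it finds to reassemble files and directories; that part is omitted
here.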

<p>
<hr>

&lt; <a href="perform.html">[Performance Issues]</a> |
<a href="hpfs.html">[HPFS Home]</a> |
<a href="app_hpfs.html">[Application Programs and the HPFS]</a> &gt;

<hr>

<font size=-1>
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
</font>

</body></html>