oldlinux-files/study/sabre/os/files/FileSystems/DesignGoalsHPFS/sum.html

<html><head>
<title>HPFS: Summary</title>
</head>

<body>
<center>
<h1>Summary</h1>
</center>

The HPFS solves all of the historical problems of the FAT file system. it
achieves excellent throughput even in extreme casses--many very small files
or a few very large files--by means of advanced data structures and techniques
such as intelligent caching read-ahead and write-behind. Disk space is used
economically because it is managed on a sector basis. Existing application
programs will need modification to take advantage of the HPF'S's support for
extended attributes and long filenames but these changes will not be difficult.
All application programs will benefit from the HPFS's improved performance and
decreased CPU use whether they are modified or not. This article is based on a
prerelease version of the HPFS that was still undergoing modification and
tuning. Therefore the final release of the HPFS may differ in some details
from the description given here.
<p>
Most programmers are at at least passingly famiIiar with the data structure
known as a binary tree. Binary trees are a technique for imposing a logical
ordering on a collection of data items by means of pointers, without regard
to the physical order of the data. In a simple binary tree, each node contains
some data, including a key value that determines the node's logical position
in the tree, as well as pointers to the node's left and right sub trees. The
node that begins the tree is known as the root: the nodes that sit at the
ends of the tree's branches are sometimes called the leaves. To find a
particular piece of data, the binary tree is traversed from the root. At each
node, the desired key is compared with the node's key: if they don't match,
one branch of the node's sub tree or another is selected based on whether the
desired key is less than or greater than the node's key. This process
continues until a match is found or an empty sub tree is encountered
(see <a href="#figa">Figure A</a>).
Such simple binary trees, although easy to understand and implement, have
disadvantages in practice. If keys are not well distributed or are added to
the tree in a non-random fashion, the tree can become quite asymmetric,
leading to wide variations in tree traversal times. In order to make access
times uniform, many programmers prefer a particular type of balanced tree
known as a B- Tree. For the purposes of this discussion, the important
points about a B-Tree are that data is stored in all nodes, more than one
data item might be stored in a node, and all of the branches of the tree
are of identical length (see <a href="#figb">Figure B</a>).
The worst-case behavior of a B-Tree is predictable and much better than that
of a simple binary tree, but the maintenance of a B-Tree is correspondingly
more complex. Adding a new data item, changing a key value, or deleting a
data item may result in the splitting or merging of a node, which in turm
forces a cascade of other operations on the tree to rebalance it. A B+ Tree
is a specialized form of B-Tree that has two types of nodes: internal, which
only point to other nodes, and external, which contain the actual data
(see <a href="#figc">Figure C</a>).
The advantage of a B+ Tree over a B- Tree is that the internal nodes of the
B+Tree can hold many more decision values than the intenmediate-level nodes
of a B-Tree, so the fan out of the tree is faster and the average length of
a branch is shorter. This makes up for the fact that yell must always follow
a B+ Tree branch to its end to get the data for which you are looking, whereas
in a B-Tree you may discover the data at an interme-diate code or even at the
root.

<table><tr>
<th>                                      </th><th align=left>FAT File System          </th><th align=left>High Performance File System</th>
</tr><tr>
<td>Maximum filename length               </td><td>11(in 8.3 format)                   </td><td>254</td>
</tr><tr>
<td>Number of dot (.) delimeters allowed  </td><td>One                                 </td><td>Multiple</td>
</tr><tr>
<td>File Attributes	                  </td><td>Bit flags	                       </td><td>Bit flags plus up to 64Kb of free-form ASCll of binary information</td>
</tr><tr>
<td>Maximium Path Length	          </td><td>64	                               </td><td>260</td>
</tr><tr>
<td>Miniumum disk space overhead per file </td><td>Directory entry  (32 bytes)	       </td><td>Directory entry (length varies) + Fnode (512 bytes)</td>
</tr><tr>
<td>Average wasted space per file	  </td><td>1/2 cluster (typically 2Kb or more) </td><td>1/2 sector (256 bytes)</td>
</tr><tr>
<td>Minimum alocation unit 	          </td><td>Cluster (typically 4Kb or more)     </td><td>Sector (512 bytes)</td>
</tr><tr>
<td>Allocation info for files	          </td><td>Centralized in FAT on home track    </td><td>Located nearby each file in its Fnode</td>
</tr><tr>
<td>Free disk space info	          </td><td>Centralized in FAT on home track    </td><td>Located near free space in bitmaps</td>
</tr><tr>
<td>Free disk space described per byte    </td><td>2Kb ( 1/2 cluster at 8 sectors /clustor)
                                                                                       </td><td>4Kb (8 sectors)</td>
</tr><tr>
<td>Directory structure	                  </td><td>Unsorted linear list, must be searched exhaustivily
                                                                                       </td><td>Sorted B-Tree</td>
</tr><tr>
<td>Directory Location	                  </td><td>Root directory on home track, others scattered
                                                                                       </td><td>Localized near seek center of volume</td>
</tr><tr>
<td>Cache replacement strategy	          </td><td>Simple LRU
                                          	                                       </td><td>Modified LRU, sensitive to data type and usage history</td>
</tr><tr>
<td>Read ahead	                          </td><td>None in MS-DOS 3.3 or earlier, primitive read-ahead optional in MS-DOS 4
                                                                                       </td><td>Always present, sensitive to data type and usage history</td>
</tr><tr>
<td>Write behind	                  </td><td>Not available	               </td><td>Used by default, but can be defeated on per-handle basis</td>
</tr></table>
<p>


<center>
<a href="figa.gif" name="figa">
<img src="figa.gif" alt="[Fig. A]" border=0></a>
</center>
<p>
<b>FIGURE A</b>:
To find a piece of data, the binary tree is traversed from the root untill
the data is found or an empty subtree is encountered.
<p>


<center>
<a href="figb.gif" name="figb">
<img src="figb.gif" alt="[Fig. B]" border=0></a>
</center>
<p>
<b>FIGURE B</b>:
In a balanced B-Tree, data is stored in nodes, more than one data item can
be stored in a node, and all branches of the tree are the same length.
<p>


<center>
<a href="figc.gif" name="figc">
<img src="figc.gif" alt="[Fig. C]" border=0></a>
</center>
<p>
<b>FIGURE C</b>:
A B+ Tree has internal nodes that point to other nodes and external nodes
that contain actual data.


<p>
<hr>

&lt; <a href="app_hpfs.html">[Application Programs and HPFS]</a> |
<a href="hpfs.html">[HPFS Home]</a>

<hr>

<font size=-1>
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
</font>

</body></html>