add directory study
1133
study/sabre/os/files/FileSystems/AFS_info.txt
Normal file
BIN
study/sabre/os/files/FileSystems/Bill_Earl_LSMWS.pdf
Normal file
BIN
study/sabre/os/files/FileSystems/ClassHierarchy.pdf
Normal file
@@ -0,0 +1,73 @@
|
||||
<html><head>
|
||||
<title>HPFS: Application Programs and the HPFS</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>Application Programs and the HPFS</h1>
|
||||
</center>
|
||||
|
||||
Each of the OS/2 releases thus far has carried with it a major
|
||||
discontinuity for application programmers who learned their trade in the
|
||||
MS-DOS environment. In OS/2 1.0, such programmers were faced for the first
|
||||
time with virtual memory, multitasking, inter-process communications, and
|
||||
the protected mode restrictions on addressing and direct control of the
|
||||
hardware, and were challenged to master powerful new concepts such as
|
||||
threading and dynamic linking. In OS/2 Version 1.1, the stakes were raised
|
||||
even further. Programmers were offered a powerful hardware-independent
|
||||
graphical user interface but had to restructure their applications
|
||||
drastically for an event-driven environment based on objects and message
|
||||
passing. In OS/2 Version 1.2, it is time for many of the file-oriented
|
||||
programming habits and assumptions carried forward from the MS-DOS
|
||||
environment to fall by the wayside. An application that wishes to take
|
||||
full advantage of the HPFS must allow for long, free-form, mixed-case
|
||||
filenames and paths with few restrictions on punctuation and must be
|
||||
sensitive to the presence of EAs and ACLs.
|
||||
<p>
|
||||
After all, if EAs are to be of any use, it won't suffice for applications
|
||||
to update a file by renaming the old file and creating a new one without
|
||||
also copying the EAs. But the necessary changes for OS/2 Version 1.2 are
|
||||
not tricky to make. A new API function, DosCopy, helps applications create
|
||||
backups--it essentially duplicates an existing file together with its EAs.
|
||||
EAs can also be manipulated explicitly with DosQFileInfo, DosSetFileInfo,
|
||||
DosQPathInfo, and DosSetPathInfo. A program should call DosQSysInfo at run
|
||||
time to find the maximum possible path length for the system and ensure
|
||||
that all buffers used by DosChDir, DosQCurDir, and related functions are
|
||||
sufficiently large. Similarly, the buffers used by DosOpen, DosMove,
|
||||
DosGetModName, DosFindFirst, DosFindNext, and like functions must allow for
|
||||
longer filenames. Any logic that folds case in filenames or tests for the
|
||||
occurrence of only one dot delimiter in a filename must be rethought or
|
||||
eliminated. The other changes in the API will not affect the average
|
||||
application. The functions DosQFileInfo, DosFindFirst, and DosFindNext now
|
||||
return all three sets of times and dates (created, last accessed, last
|
||||
modified) for a file on an HPFS volume, but few programs are concerned with
|
||||
time and date stamps anyway. DosQFsInfo is used to obtain volume labels or
|
||||
disk characteristics just as before, and the use of DosSetFsInfo for volume
|
||||
labels is unchanged. There are a few totally new API functions such as
|
||||
DosFsCtl (analogous to DosDevIOCtl but used for communication between an
|
||||
application and an FSD), DosFsAttach (a sort of explicit mount call), and
|
||||
DosQFsAttach (determines which FSD owns a volume); these are intended mainly
|
||||
for use by disk utility programs. In order to prevent old OS/2 applications
|
||||
and MS-DOS applications running in the DOS box from inadvertently damaging
|
||||
HPFS files, a new flag bit has been defined in the EXE file header that
|
||||
indicates whether an application is HPFS-aware. If this bit is not set, the
|
||||
application will only be able to search for, open, or create files on HPFS
|
||||
volumes that are compatible with the FAT file system's 8.3 naming conventions.
|
||||
If the bit is set, OS/2 allows access to all files on an HPFS volume because
|
||||
it assumes that the program knows how to handle long, free-form filenames and
|
||||
will take the responsibility of conserving a file's EAs and ACLs.
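
The compatibility rule described above can be illustrated with a minimal sketch. It is not OS/2 code: the helper name is_8_3_compatible is hypothetical, the set of permitted characters is an approximation, and treating mixed-case names as compatible after case folding is an assumption.

```python
import re

# Characters accepted by this sketch. The exact FAT character set is an
# assumption here; only the 8.3 shape comes from the article.
_CHAR = r"[A-Z0-9!#$%&'()\-@^_{}~]"

# One to eight name characters, optionally followed by a single dot and
# one to three extension characters.
_EIGHT_DOT_THREE = re.compile(rf"{_CHAR}{{1,8}}(\.{_CHAR}{{1,3}})?")


def is_8_3_compatible(filename: str) -> bool:
    """Return True if filename fits the 8.3 shape after case folding (hypothetical helper)."""
    return _EIGHT_DOT_THREE.fullmatch(filename.upper()) is not None
```

Under these assumptions, a name such as CONFIG.SYS passes, while a long free-form name or a name with two dots does not, which is the kind of file an application without the HPFS-aware bit would not see.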
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
< <a href="faultol.html">[Fault Tolerance]</a> |
|
||||
<a href="hpfs.html">[HPFS Home]</a> |
|
||||
<a href="sum.html">[Summary]</a> >
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
43
study/sabre/os/files/FileSystems/DesignGoalsHPFS/design.html
Normal file
@@ -0,0 +1,43 @@
|
||||
<html><head>
|
||||
<title>HPFS: Design</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>
|
||||
Design Goals and Implementation of the<br>
|
||||
New High Performance File System
|
||||
</h1>
|
||||
</center>
|
||||
|
||||
The High Performance File System (hereafter HPFS), which is making its first
|
||||
appearance in the OS/2 operating system Version 1.2, had its genesis in the
|
||||
network division of Microsoft and was designed by Gordon Letwin, the chief
|
||||
architect of the OS/2 operating system. The HPFS has been designed to meet
|
||||
the demands of increasingly powerful PCs, fixed disks, and networks for many
|
||||
years to come and to serve as a suitable platform for object-oriented languages,
|
||||
applications, and user interfaces.
|
||||
The HPFS is a complex topic because it incorporates three distinct yet
|
||||
interrelated file system issues. First, the HPFS is a way of organizing data
|
||||
on a random access block storage device. Second, it is a software module that
|
||||
translates file-oriented requests from an application program into more
|
||||
primitive requests that a device driver can understand, using a variety of
|
||||
creative techniques to maximize performance. Third, the HPFS is a practical
|
||||
illustration of an important new OS/2 feature known as Installable File Systems.
|
||||
This article introduces the three aspects of the HPFS. But first, it puts the
|
||||
HPFS in perspective by reviewing some of the problems that led to the system's
|
||||
existence.
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
<a href="hpfs.html">[HPFS Home]</a> |
|
||||
<a href="fat.html">[FAT File System]</a> >
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
91
study/sabre/os/files/FileSystems/DesignGoalsHPFS/dirs.html
Normal file
@@ -0,0 +1,91 @@
|
||||
<html><head>
|
||||
<title>HPFS: Directories</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>Directories</h1>
|
||||
</center>
|
||||
|
||||
Directories, like files, are anchored on Fnodes. A pointer to the Fnode for
|
||||
the root directory is found in the Super Block. The Fnodes for directories
|
||||
other than the root are reached through subdirectory entries in their parent
|
||||
directories. Directories can grow to any size and are built up from 2Kb
|
||||
directory blocks, which are allocated as four consecutive sectors on the disk.
|
||||
The file system attempts to allocate directory blocks in the directory band,
|
||||
which is located at or near the seek center of the disk. Once the directory
|
||||
band is full, the directory blocks are allocated wherever space is available.
|
||||
Each 2Kb directory block contains from one to many directory entries.
|
||||
A directory entry contains several fields, including time and date stamps,
|
||||
an Fnode pointer, a usage count for use by disk maintenance programs, the
|
||||
length of the file or directory name, the name itself, and a B-Tree pointer.
|
||||
Each entry begins with a word that contains the length of the entry. This
|
||||
provides for a variable amount of flex space at the end of each entry, which
|
||||
can be used by special versions of the file system and allows the directory
|
||||
block to be traversed very quickly (<a href="#fig5">Figure 5</a>).
|
||||
The number of entries in a directory block varies with the length of names.
|
||||
If the average filename length is 13 characters, an average directory block
|
||||
will hold about 40 entries.
|
||||
<p>
|
||||
|
||||
The entries in a directory block are sorted by the binary lexical order of
|
||||
their name fields (this happens to put them in alphabetical order for the U.S.
|
||||
alphabet). The last entry in a directory block is a dummy record that marks
|
||||
the end of the block. When a directory gets too large to be stored in one
|
||||
block, it increases in size by the addition of 2Kb blocks that are organized
|
||||
as a B-Tree (see B-Trees and B+ Trees). When searching for a specific name,
|
||||
the file system traverses a directory block until it either finds a match or
|
||||
finds a name that is lexically greater than the target. In the latter case,
|
||||
the file system extracts the B-Tree pointer from the entry. If there is no
|
||||
pointer, the search has failed; otherwise, the file system follows the pointer to
|
||||
the next directory block in the tree and continues the search. A little
|
||||
back-of-the-envelope arithmetic yields some impressive statistics. Assuming
|
||||
40 entries per block, a two-level tree of directory blocks can hold 1640
|
||||
directory entries and a three-level tree can hold an astonishing 65,640 entries.
|
||||
In other words, a particular file can be found (or shown not to exist) in a
|
||||
typical directory of 65,640 files with a maximum of three disk hits--the
|
||||
actual number of disk accesses depending on cache contents and the location
|
||||
of the file's name in the directory block B-Tree. That's quite a contrast to
|
||||
the FAT file system, where in the worst case more than 4000 sectors would
|
||||
have to be read to establish that a file was or was not present in a directory
|
||||
containing the same number of files. The B-Tree directory structure has
|
||||
interesting implications beyond its effect on open and find operations.
|
||||
A file creation, renaming, or deletion may result in a cascade of complex
|
||||
operations, as directory blocks are added or freed or names are moved from
|
||||
one block to the other to keep the tree balanced. In fact, a rename
|
||||
operation could theoretically fail for lack of disk space even though the
|
||||
file itself is not growing. In order to avoid this sort of disaster, the
|
||||
HPFS maintains a small pool of free blocks that can be drawn from in a
|
||||
directory emergency; a pointer to this pool of free blocks is stored in the
|
||||
Spare Block.
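
The back-of-the-envelope figures above can be reproduced with a short calculation. This is a sketch under stated assumptions: the article's totals (1640 and 65,640) match a tree in which each block holds 40 entries and leads to 40 child blocks, so that fanout is inferred rather than quoted. The FAT comparison assumes the standard 32-byte FAT directory entry and 512-byte sectors.

```python
ENTRIES_PER_BLOCK = 40  # the article's estimate for 13-character names
FANOUT = 40             # assumption: inferred from the article's totals


def tree_capacity(levels: int) -> int:
    """Entries held by a directory tree with the given number of levels."""
    return sum(ENTRIES_PER_BLOCK * FANOUT ** depth for depth in range(levels))


def fat_worst_case_sectors(files: int, entry_bytes: int = 32,
                           sector_bytes: int = 512) -> int:
    """Sectors a linear FAT directory scan must read to rule a name out."""
    # Ceiling division: a partially filled sector still has to be read.
    return -(-files * entry_bytes // sector_bytes)


print(tree_capacity(2))               # 1640
print(tree_capacity(3))               # 65640
print(fat_worst_case_sectors(65640))  # 4103
```

The last figure agrees with the article's "more than 4000 sectors" for a FAT directory of the same size.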
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig5.gif" name="fig5">
|
||||
<img src="fig5.gif" alt="[Fig. 5]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 5</b>:
|
||||
Here directories are anchored on an Fnode and are built up from 2Kb directory
|
||||
blocks. The number of entries in a directory block varies because the length
|
||||
of the entries depends on the filename. When a directory requires more than
|
||||
one block, the blocks are organized as a B-Tree. This allows a filename to be
|
||||
located very quickly with a small number of disk accesses even when the
|
||||
directory grows very large.
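
The lookup procedure described on this page can be sketched in a few lines. The Entry and Block classes below are hypothetical in-memory stand-ins, not the on-disk layout, and treating the dummy end record as "greater than every name" is an assumption made so that the search can descend to the right of the last real entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entry:
    name: Optional[str]             # None marks the dummy end-of-block record
    fnode: Optional[int] = None     # illustrative Fnode sector number
    down: Optional["Block"] = None  # B-Tree pointer to a block of smaller names


@dataclass
class Block:
    entries: List[Entry] = field(default_factory=list)


def find(block: Optional[Block], target: str) -> Tuple[Optional[int], int]:
    """Return (fnode, blocks_read); fnode is None if the name is absent."""
    reads = 0
    while block is not None:
        reads += 1
        next_block = None
        for entry in block.entries:
            if entry.name == target:
                return entry.fnode, reads
            # First name greater than the target (or the end record):
            # follow its B-Tree pointer, if it has one.
            if entry.name is None or entry.name > target:
                next_block = entry.down
                break
        block = next_block
    return None, reads
```

Each pass through the outer loop corresponds to one directory block read, so the second element of the result is the number of disk hits in the worst case where nothing is cached.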
|
||||
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
< <a href="fnodes.html">[Files and Fnodes]</a> |
|
||||
<a href="hpfs.html">[HPFS Home]</a> |
|
||||
<a href="ea.html">[Extended Attributes]</a> >
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
65
study/sabre/os/files/FileSystems/DesignGoalsHPFS/ea.html
Normal file
@@ -0,0 +1,65 @@
|
||||
<html><head>
|
||||
<title>HPFS: Extended Attributes</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>Extended Attributes</h1>
|
||||
</center>
|
||||
|
||||
File attributes are information about a file that is maintained by the
|
||||
operating system outside the file's overt storage area. The FAT file
|
||||
system supports only a few simple attributes (read only, system, hidden,
|
||||
and archive) that are actually stored as bit flags in the file's directory
|
||||
entry; these attributes are inspected or modified by special function calls
|
||||
and are not accessible through the normal file open, read, and write calls.
|
||||
The HPFS supports the same attributes as the FAT file system for historical
|
||||
reasons, but it also supports a new form of file-associated, highly
|
||||
generalized information called Extended Attributes (EAs). Each EA is
|
||||
conceptually similar to an environment variable, taking the form
|
||||
(name=value) except that the value portion can be either a null-terminated
|
||||
(ASCIIZ) string or binary data. In OS/2 1.2, each file or directory can
|
||||
have a maximum of 64Kb of EAs attached to it. This limit may be lifted in
|
||||
a later release of OS/2. The storage method for EAs can vary. If the EAs
|
||||
associated with a given file or directory are small enough, they will be
|
||||
stored right in the Fnode. If the total size of the EAs is too large, they
|
||||
are stored outside the Fnode in sector runs, and a B+ Tree of allocation
|
||||
sectors can be created to describe the runs. If a single EA gets too large,
|
||||
it can be pushed outside the Fnode into a B+ Tree of its own.
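
As an illustration of the name=value model and the 64Kb ceiling, here is a minimal sketch that keeps a file's EAs in a dictionary. It does not use the OS/2 API. The EA names in the usage note are made up, and counting only name bytes plus value bytes is an assumption, since the article does not say how the limit is measured.

```python
MAX_EA_BYTES = 64 * 1024  # per-file limit the article quotes for OS/2 1.2


def ea_size(name: str, value: bytes) -> int:
    """Size charged for one EA (assumption: name bytes plus value bytes)."""
    return len(name.encode("ascii")) + len(value)


def set_ea(eas: dict, name: str, value: bytes) -> None:
    """Add or replace one EA, refusing changes that would exceed the limit."""
    candidate = dict(eas)
    candidate[name] = value
    total = sum(ea_size(n, v) for n, v in candidate.items())
    if total > MAX_EA_BYTES:
        raise ValueError("EAs for this file would exceed 64Kb")
    eas[name] = value
```

For example, set_ea(eas, "OWNER", b"EDITOR.EXE\0") stores an ASCIIZ string and set_ea(eas, "ICON", icon_bytes) stores binary data; both names are illustrative only.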
|
||||
<p>
|
||||
The kernel API functions DosQFileInfo and DosSetFileInfo have been expanded
|
||||
with new information levels that allow application programs to manipulate
|
||||
extended attributes for files. The new functions DosQPathInfo and
|
||||
DosSetPathInfo are used to read or write the EAs associated with arbitrary
|
||||
path names. An application program can either ask for the value of a
|
||||
specific EA (supplying a name to be matched) or can obtain all of the EAs
|
||||
for the file or directory at once. Although application programs can begin
|
||||
to take advantage of EAs as soon as the HPFS is released, support for EAs
|
||||
is an essential component in Microsoft's long-range plans for object-oriented
|
||||
file systems. Information of almost any type can be stored in EAs, ranging
|
||||
from the name of the application that owns the file to names of dependent
|
||||
files to icons to executable code. As the HPFS evolves, its facilities for
|
||||
manipulating EAs are likely to become much more sophisticated. It's easy to
|
||||
imagine, for example, that in future versions the API might be extended with
|
||||
EA functions that are analogous to DosFindFirst and DosFindNext and EA data
|
||||
might get organized into B-Trees. I should note here that in addition to EAs,
|
||||
the LAN Manager version of HPFS will support another class of file-associated
|
||||
information called Access Control Lists (ACLs). ACLs have the same general
|
||||
appearance as EAs and are manipulated in a similar manner, but they are used
|
||||
to store access rights, passwords, and other information of interest in a
|
||||
networking multiuser environment.
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
< <a href="dirs.html">[Directories]</a> |
|
||||
<a href="hpfs.html">[HPFS Home]</a> |
|
||||
<a href="ifs.html">[Installable File Systems]</a> >
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
94
study/sabre/os/files/FileSystems/DesignGoalsHPFS/fat.html
Normal file
@@ -0,0 +1,94 @@
|
||||
<html><head>
|
||||
<title>HPFS: FAT File System</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>FAT File System</h1>
|
||||
</center>
|
||||
|
||||
The so-called FAT file system, which is the file system used in all versions of the
|
||||
MS-DOS operating system to date and in the first two releases of OS/2 (Versions 1.0
|
||||
and 1.1), has a dual heritage in Microsoft's earliest programming language products
|
||||
and the Digital Research CP/M operating system--software originally written for
|
||||
8080-based and Z-80-based microcomputers. It inherited characteristics from both
|
||||
ancestors that have progressively turned into handicaps in this new era of
|
||||
multitasking, protected mode, virtual memory, and huge fixed disks.
|
||||
<p>
|
||||
|
||||
The FAT file system revolves around the File Allocation Table for which it is
|
||||
named. Each logical volume has its own FAT, which serves two important functions:
|
||||
it contains the allocation information for each file on the volume in the form
|
||||
of linked lists of allocation units (clusters, which are power-of-2 multiples
|
||||
of sectors) and it indicates which allocation units are free for assignment to
|
||||
a file that is being created or extended.
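
The two roles of the FAT described above (per-file linked lists of clusters, and a record of which clusters are free) can be sketched with a plain list. The marker values are illustrative; real FATs use reserved 12-bit or 16-bit codes, and the tiny table in the usage note is invented.

```python
FREE = 0            # entry value for an unallocated cluster
END_OF_CHAIN = -1   # illustrative marker; real FATs use reserved values


def cluster_chain(fat: list, first_cluster: int) -> list:
    """Follow the linked list of clusters that make up one file."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen or fat[cluster] == FREE:
            raise ValueError("corrupt chain at cluster %d" % cluster)
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # each entry names the next cluster
    return chain


def free_clusters(fat: list) -> list:
    """Clusters available for a new or growing file (data clusters start at 2)."""
    return [i for i, nxt in enumerate(fat) if i >= 2 and nxt == FREE]
```

With fat = [-1, -1, 3, 7, 5, -1, 0, -1, 0, 0], a file starting at cluster 2 occupies clusters 2, 3, and 7, which also shows how a file ends up fragmented when the next free cluster is not adjacent.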
|
||||
<p>
|
||||
|
||||
The FAT was invented by Bill Gates and Marc McDonald in 1977 as
|
||||
a method of managing disk space in the NCR version of Microsoft's standalone Disk
|
||||
BASIC. Tim Paterson, at that time an employee of Seattle Computer Products (SCP), was
|
||||
introduced to the FAT concept when his company shared a booth with Microsoft at the
|
||||
National Computer Conference in 1979. Paterson subsequently incorporated FATs into
|
||||
the file system of 86-DOS, an operating system for SCP's S-100 bus 8086 CPU boards.
|
||||
86-DOS was eventually purchased by Microsoft and became the starting point for
|
||||
MS-DOS Version 1.0, which was released for the original IBM PC in August 1981.
|
||||
<p>
|
||||
When the FAT was conceived, it was an excellent solution to disk management,
|
||||
mainly because the floppy disks on which it was used were rarely larger than
|
||||
1 Mb.
|
||||
On such disks, the FAT was small enough to be held in memory at all times,
|
||||
allowing very fast random access to any part of any file. This proved far
|
||||
superior to the CP/M method of tracking disk space, in which the information
|
||||
about the sectors assigned to a file might be spread across many directory
|
||||
entries, which were in turn scattered randomly throughout the disk directory.
|
||||
When applied to fixed disks, however, the FAT began to look more like a bug
|
||||
than a feature. It became too large to be held entirely resident and had to
|
||||
be paged into memory in pieces; this paging resulted in many superfluous disk
|
||||
head movements as a program was reading through a file and degraded system
|
||||
throughput. In addition, because the information about free disk space was
|
||||
dispersed across many sectors of the FAT, it was impractical to allocate file
|
||||
space contiguously, and file fragmentation became another obstacle to good
|
||||
performance. Moreover, the use of relatively large clusters on fixed disks
|
||||
resulted in a lot of dead space, since an average of one-half cluster was
|
||||
wasted for each file. (Some network servers use clusters as large as 64Kb.)
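
The half-cluster rule of thumb is easy to check with a small calculation. This sketch only restates the arithmetic; the function names and the file counts in the usage note are illustrative.

```python
def slack_bytes(file_size: int, cluster_bytes: int) -> int:
    """Unused bytes in the last cluster allocated to one file."""
    return -file_size % cluster_bytes


def expected_slack(files: int, cluster_bytes: int) -> int:
    """Total dead space if each file wastes half a cluster on average."""
    return files * cluster_bytes // 2
```

For instance, expected_slack(10_000, 64 * 1024) gives 327,680,000 bytes, so a server with 64Kb clusters and ten thousand files would lose roughly 312 megabytes to slack under this estimate.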
|
||||
<p>
|
||||
|
||||
The FAT file system's restrictions on naming files and directories are
|
||||
inherited from CP/M. When Paterson was writing 86-DOS, one of his primary
|
||||
objectives was to make programs easy to port from CP/M to his new operating
|
||||
system. He therefore adopted CP/M's limits on filenames and extensions so the
|
||||
critical fields of 86-DOS File Control Blocks (FCBs) would look almost exactly
|
||||
like those of CP/M. The sizes of the FCB filename and extension fields were
|
||||
also propagated into the structure of disk directory entries. In due time, 86-DOS
|
||||
became MS-DOS, and application programs for MS-DOS proliferated beyond anyone's
|
||||
wildest dreams. Since most of the early programs depended on the structure of
|
||||
FCBs, the 8.3 format for filenames became irrevocably locked into the system.
|
||||
<p>
|
||||
|
||||
During the last couple of years, Microsoft and IBM have made valiant attempts
|
||||
to prolong the useful life of the FAT file system by lifting the restrictions
|
||||
on volume sizes, improving allocation strategies, caching path names, and moving
|
||||
tables and buffers into expanded memory. But these can only be regarded as
|
||||
temporizing measures because the fundamental data structures used by the FAT
|
||||
file system are simply not well suited to large random access devices.
|
||||
The HPFS solves the FAT file system problems mentioned here and many others,
|
||||
but it is not derived in any way from the FAT file system. The architect of
|
||||
the HPFS started with a clean sheet of paper and designed a file system that
|
||||
can take full advantage of a multitasking environment and that will be able to
|
||||
cope with any sort of disk device likely to arrive on microcomputers during
|
||||
the next decade.
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
< <a href="design.html">[HPFS Design]</a> |
|
||||
<a href="hpfs.html">[HPFS Home]</a> |
|
||||
<a href="hpfs_vol.html">[HPFS Volumes]</a> >
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
@@ -0,0 +1,84 @@
|
||||
<html><head>
|
||||
<title>HPFS: Fault Tolerance</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>Fault Tolerance</h1>
|
||||
</center>
|
||||
|
||||
The HPFS's extensive use of lazy writes makes it imperative for the HPFS to
|
||||
be able to recover gracefully from write errors under any but the most dire
|
||||
circumstances. After all, by the time a write is known to have failed, the
|
||||
application has long since gone on its way under the illusion that it has
|
||||
safely shipped the data into disk storage. The errors may be detected by the
|
||||
hardware (such as a "sector not found" error returned by the disk adapter),
|
||||
or they may be detected by the disk driver in spite of the hardware during a
|
||||
read-after-write verification of the data. The primary mechanism for handling
|
||||
write errors is called a hot fix. When an error is detected, the file system
|
||||
takes a free block out of a reserved hot fix pool, writes the data to that
|
||||
block, and updates the hot fix map. (The hot fix map is simply a series of
|
||||
pairs of double words, with each pair containing the number of a bad sector
|
||||
associated with the number of its hot fix replacement. A pointer to the hot
|
||||
fix map is maintained in the Spare Block.) A copy of the hot fix map is then
|
||||
written to disk, and a warning message is displayed to let the user know that
|
||||
all is not well with the disk device.
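
The hot fix mechanism, together with the sector-number translation the article goes on to describe, can be sketched as follows. The class and method names are hypothetical, and a dictionary stands in for the on-disk series of doubleword pairs.

```python
class HotFixMap:
    """Toy model of a hot fix pool and map (not the on-disk format)."""

    def __init__(self, spare_sectors):
        self.pool = list(spare_sectors)  # reserved replacement sectors
        self.map = {}                    # bad sector -> replacement sector

    def record_failure(self, bad_sector: int) -> int:
        """Take a spare from the pool and remember the redirection."""
        if not self.pool:
            raise RuntimeError("hot fix pool exhausted")
        replacement = self.pool.pop(0)
        self.map[bad_sector] = replacement
        return replacement

    def translate(self, sector: int) -> int:
        """Redirect a physical read or write if the sector was hot-fixed."""
        return self.map.get(sector, sector)
```

A write that fails on sector 120 would call record_failure(120), write the data to the returned sector, and from then on translate(120) yields the replacement while every other sector number passes through unchanged.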
|
||||
<p>
|
||||
|
||||
Each time the file system requests a sector read or write from the disk
|
||||
driver, it scans the hot fix map and replaces any bad sector numbers with the
|
||||
corresponding good sector holding the actual data. This lookaside translation
|
||||
of sector numbers is not as expensive as it sounds, since the hot fix list
|
||||
need only be scanned when a sector is physically read or written, not each
|
||||
time it is accessed in the cache. One of CHKDSK's duties is to empty the hot
|
||||
fix map. For each replacement block on the hot fix map, it allocates a new
|
||||
sector that is in a favorable location for the file that owns the data, moves
|
||||
the data from the hot fix block to the newly allocated sector, and updates
|
||||
the file's allocation information (which may involve rebalancing allocation
|
||||
trees and other elaborate operations). It then adds the bad sector to the bad
|
||||
block list, releases the replacement sector back to the hot fix pool, deletes
|
||||
the hot fix entry from the hot fix map, and writes the updated hot fix map to
|
||||
disk. Of course, write errors that can be detected and fixed on the fly are
|
||||
not the only calamity that can befall a file system. The HPFS designers also
|
||||
had to consider the inevitable damage to be wreaked by power failures, program
|
||||
crashes, malicious viruses and Trojan horses, and those users who turn off
|
||||
the machine without selecting Shutdown in the Presentation Manager Shell.
|
||||
(Shutdown notifies the file system to flush the disk cache, update directories,
|
||||
and do whatever else is necessary to bring the disk to a consistent state.)
|
||||
<p>
|
||||
|
||||
The HPFS defends itself against the user who is too abrupt with the Big Red
|
||||
Switch by maintaining a Dirty FS flag in the Spare Block of each HPFS volume.
|
||||
The flag is only cleared when all files on the volume have been closed and
|
||||
all dirty buffers in the cache have been written out or, in the case of the
|
||||
boot volume (since OS2.INI and the swap file are never closed), when Shutdown
|
||||
has been selected and has completed its work. During the OS/2 boot sequence,
|
||||
the file system inspects the Dirty FS flag on each HPFS volume and, if the
|
||||
flag is set, will not allow further access to that volume until CHKDSK has
|
||||
been run. If the Dirty FS flag is set on the boot volume, the system will
|
||||
refuse to boot; the user must boot OS/2 in maintenance mode from a diskette
|
||||
and run CHKDSK to check and possibly repair the boot volume. In the event
|
||||
of a truly major catastrophe, such as loss of the Super Block or the root
|
||||
directory, the HPFS is designed to give data recovery the best possible
|
||||
chance of success. Every type of crucial file object, including Fnodes,
|
||||
allocation sectors, and directory blocks, is doubly linked to both its parent
|
||||
and its children and contains a unique 32-bit signature. Fnodes also contain
|
||||
the initial portion of the name of their file or directory. Consequently,
|
||||
CHKDSK can rebuild an entire volume by methodically scanning the disk for
|
||||
Fnodes, allocation sectors, and directory blocks, using them to reconstruct
|
||||
the files and directories and finally regenerating the freespace bitmaps.
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
< <a href="perform.html">[Performance Issues]</a> |
|
||||
<a href="hpfs.html">[HPFS Home]</a> |
|
||||
<a href="app_hpfs.html">[Application Programs and the HPFS]</a> >
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
BIN
study/sabre/os/files/FileSystems/DesignGoalsHPFS/fig1.gif
Normal file
|
After Width: | Height: | Size: 7.0 KiB |
BIN
study/sabre/os/files/FileSystems/DesignGoalsHPFS/fig2.gif
Normal file
|
After Width: | Height: | Size: 2.5 KiB |
BIN
study/sabre/os/files/FileSystems/DesignGoalsHPFS/fig3.gif
Normal file
|
After Width: | Height: | Size: 8.3 KiB |
BIN
study/sabre/os/files/FileSystems/DesignGoalsHPFS/fig4.gif
Normal file
|
After Width: | Height: | Size: 5.1 KiB |
BIN
study/sabre/os/files/FileSystems/DesignGoalsHPFS/fig5.gif
Normal file
|
After Width: | Height: | Size: 9.7 KiB |
BIN
study/sabre/os/files/FileSystems/DesignGoalsHPFS/fig6.gif
Normal file
|
After Width: | Height: | Size: 4.5 KiB |
BIN
study/sabre/os/files/FileSystems/DesignGoalsHPFS/figa.gif
Normal file
|
After Width: | Height: | Size: 2.2 KiB |
BIN
study/sabre/os/files/FileSystems/DesignGoalsHPFS/figb.gif
Normal file
|
After Width: | Height: | Size: 2.5 KiB |
BIN
study/sabre/os/files/FileSystems/DesignGoalsHPFS/figc.gif
Normal file
|
After Width: | Height: | Size: 3.4 KiB |
156
study/sabre/os/files/FileSystems/DesignGoalsHPFS/figs.html
Normal file
@@ -0,0 +1,156 @@
|
||||
<html><head>
|
||||
<title>HPFS: Illustrations</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>HPFS: Illustrations</h1>
|
||||
</center>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig1.gif" name="fig1">
|
||||
<img src="fig1.gif" alt="[Fig. 1]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 1</b>:
|
||||
This figure shows the overall structure of an HPFS volume. The most important
|
||||
fixed objects in such a volume are the Bootblock, the Super Block, and the
|
||||
Spare Block. The remainder of the volume is divided into 8Mb bands. There is
|
||||
a freespace bitmap for each band, and the bitmaps are located between alternate
|
||||
bands; consequently, the maximum contiguous space which can be allocated to a
|
||||
file is 16Mb.
|
||||
<li><a href="hpfs_vol.html">HPFS Volume Structure</a>
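
The sizes in this caption can be related with a little arithmetic. This is a sketch under two assumptions: 512-byte sectors (consistent with the 2Kb directory blocks that occupy four sectors) and one bitmap bit per sector, which the caption itself does not state.

```python
SECTOR_BYTES = 512              # assumption: 2Kb directory block = 4 sectors
BAND_BYTES = 8 * 1024 * 1024    # 8Mb band from the caption

sectors_per_band = BAND_BYTES // SECTOR_BYTES   # sectors tracked per bitmap
bitmap_bytes = sectors_per_band // 8            # assumption: one bit per sector
bitmap_sectors = bitmap_bytes // SECTOR_BYTES   # disk space taken by one bitmap

# Bitmaps sit between alternate bands, so two bands lie back to back
# with no bitmap between them.
max_contiguous_bytes = 2 * BAND_BYTES

print(sectors_per_band, bitmap_bytes, bitmap_sectors, max_contiguous_bytes)
```

Under these assumptions each band covers 16,384 sectors, its bitmap fits in 2Kb (four sectors), and the largest unbroken run of file data is the 16Mb the caption mentions.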
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig2.gif" name="fig2">
|
||||
<img src="fig2.gif" alt="[Fig. 2]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 2</b>:
|
||||
This figure shows the overall structure of an Fnode. The Fnode is the
|
||||
fundamental object in an HPFS volume and is the first sector allocated to a
|
||||
file or directory. It contains control and access history information used
|
||||
by the file system, cached EAs and ACLs or pointers to same, a truncated
|
||||
copy of the file or directory name (to aid disk repair programs), and an
|
||||
allocation structure which defines the size and location of the file's storage.
|
||||
<li><a href="fnodes.html">Files and FNodes</a>
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig3.gif" name="fig3">
|
||||
<img src="fig3.gif" alt="[Fig. 3]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 3</b>:
|
||||
The simplest form of tracking for the sectors owned by a file is shown. The
|
||||
Fnode's allocation structure points directly to as many as eight sector runs.
|
||||
Each run pointer consists of a pair of 32-bit doublewords: a starting sector
|
||||
number and a length in sectors.
|
||||
<li><a href="fnodes.html">Files and FNodes</a>
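
A run pointer as described in this caption, and the lookup it enables, can be sketched briefly. The little-endian byte order is an assumption (the caption only says two 32-bit doublewords), and the function names are hypothetical.

```python
import struct

MAX_DIRECT_RUNS = 8  # runs an Fnode can point to directly, per the caption


def pack_run(start_sector: int, length: int) -> bytes:
    """Encode one run pointer as two 32-bit values (byte order assumed)."""
    return struct.pack("<II", start_sector, length)


def locate(runs, file_sector: int) -> int:
    """Map a sector offset within the file to a sector number on disk."""
    for start, length in runs:
        if file_sector < length:
            return start + file_sector
        file_sector -= length  # skip past this run
    raise IndexError("offset beyond end of file")
```

For a file stored as runs [(1000, 4), (5000, 2)], the fifth sector of the file (offset 4) is disk sector 5000, found after examining at most MAX_DIRECT_RUNS entries.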
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig4.gif" name="fig4">
|
||||
<img src="fig4.gif" alt="[Fig. 4]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 4</b>:
|
||||
This figure demonstrates the technique used to track the sectors owned by a
|
||||
file with 9-480 sector runs. The allocation structure in the Fnode holds the
|
||||
roots for a B+ Tree of allocation sectors. Each allocation sector can describe
|
||||
as many as 40 sector runs. If the file requires more than 480 sector runs,
|
||||
additional intermediate levels are added to the B+ Tree, which increases the
|
||||
number of possible sector runs by a factor of sixty for each new level.
|
||||
<li><a href="fnodes.html">Files and FNodes</a>
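
The capacity figures in this caption follow from a short calculation. The count of 12 roots in the Fnode is inferred from 480 divided by 40 and is not stated in the caption; the function name is hypothetical.

```python
RUNS_PER_ALLOC_SECTOR = 40                       # from the caption
FNODE_ROOTS = 480 // RUNS_PER_ALLOC_SECTOR       # 12, inferred from 480 / 40
LEVEL_FACTOR = 60                                # growth per added level


def max_runs(intermediate_levels: int) -> int:
    """Sector runs addressable with the given number of intermediate levels."""
    return FNODE_ROOTS * RUNS_PER_ALLOC_SECTOR * LEVEL_FACTOR ** intermediate_levels


print(max_runs(0))  # 480
print(max_runs(1))  # 28800
print(max_runs(2))  # 1728000
```

Even a single intermediate level therefore covers tens of thousands of runs, which is far more fragmentation than a file would normally exhibit.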
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig5.gif" name="fig5">
|
||||
<img src="fig5.gif" alt="[Fig. 5]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 5</b>:
|
||||
Here directories are anchored on an Fnode and are built up from 2Kb directory
|
||||
blocks. The number of entries in a directory block varies because the length
|
||||
of the entries depends on the filename. When a directory requires more than
|
||||
one block, the blocks are organized as a B-Tree. This allows a filename to be
|
||||
located very quickly with a small number of disk accesses even when the
|
||||
directory grows very large.
|
||||
<li><a href="dirs.html">Directories</a>
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig6.gif" name="fig6">
|
||||
<img src="fig6.gif" alt="[Fig. 6]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 6</b>:
|
||||
A simplified sketch of the relationship between an application program, the
|
||||
OS/2 kernel, an installable file system, a disk driver, and the physical disk
|
||||
device. The application issues logical file requests to the OS/2 kernel by
|
||||
calling the entry points for DosOpen, DosRead, DosWrite, DosChgFilePtr, and
|
||||
so on. The kernel passes these requests to the appropriate installable file
|
||||
system for the volume holding the file. The installable file system translates
|
||||
the logical file requests into requests for reads or writes of logical sectors
|
||||
and calls a kernel File System Helper (FsHlp) to pass these requests to the
|
||||
appropriate disk driver. The disk driver transforms the logical sector
requests into requests for specific physical units, cylinders, heads, and
|
||||
sectors, and issues commands to the disk adapter to transfer data between the
|
||||
disk and memory.
|
||||
<li><a href="ifs.html">Installable File Systems</a>
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="figa.gif" name="figa">
|
||||
<img src="figa.gif" alt="[Fig. A]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE A</b>:
|
||||
To find a piece of data, the binary tree is traversed from the root until
|
||||
the data is found or an empty subtree is encountered.
|
||||
<li><a href="sum.html">Summary</a>
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="figb.gif" name="figb">
|
||||
<img src="figb.gif" alt="[Fig. B]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE B</b>:
|
||||
In a balanced B-Tree, data is stored in nodes, more than one data item can
|
||||
be stored in a node, and all branches of the tree are the same length.
|
||||
<li><a href="sum.html">Summary</a>
|
||||
<p>
|
||||
|
||||
<center>
|
||||
<a href="figc.gif" name="figc">
|
||||
<img src="figc.gif" alt="[Fig. C]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE C</b>:
|
||||
A B+ Tree has internal nodes that point to other nodes and external nodes
|
||||
that contain actual data.
|
||||
<li><a href="sum.html">Summary</a>
|
||||
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
<a href="hpfs.html">[HPFS Home]</a>
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
122
study/sabre/os/files/FileSystems/DesignGoalsHPFS/fnodes.html
Normal file
@@ -0,0 +1,122 @@
|
||||
<html><head>
|
||||
<title>HPFS: Files and FNodes</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>Files and FNodes</h1>
|
||||
</center>
|
||||
|
||||
Every file or directory on an HPFS volume is anchored on a fundamental file
|
||||
system object called an Fnode (pronounced "eff node"). Each Fnode occupies a
|
||||
single sector and contains control and access history information used
internally by the file system, extended attributes and access control lists
(more about this later), the length and the first 15 characters of the name
of the associated file or directory, and an allocation structure
|
||||
(<a href="#fig2">Figure 2</a>).
|
||||
<p>
|
||||
|
||||
An Fnode is always stored near the file or directory that it represents.
|
||||
The allocation structure in the Fnode can take several forms depending on the
|
||||
size and degree of contiguity of the file or directory.
|
||||
The HPFS views a file as a collection of one or more runs or extents of one
|
||||
or more contiguous sectors. Each run is symbolized by a pair of
|
||||
double-words--a 32-bit starting sector number and a 32-bit length in sectors
|
||||
(this is referred to as run length encoding).
|
||||
From an application program's point of view, the extents are invisible; the
file appears as a seamless stream of bytes. The space reserved for allocation
information in an Fnode can hold pointers to as many as eight runs of sectors
of up to 16Mb each. (This maximum run size is a result of the band size and
|
||||
free space bitmap placement only; it is not an inherent limitation of the file
|
||||
system.)
|
||||
Reasonably small files or highly contiguous files can therefore be described
|
||||
completely within the Fnode (<a href="#fig3">Figure 3</a>).
|
||||
<p>
|
||||
|
||||
HPFS uses a new method to represent the location of files that are too large
|
||||
or too fragmented for the Fnode and consist of more than eight runs.
The Fnode's allocation structure becomes the root for a B+ Tree of allocation
sectors, which in turn contain the actual pointers to the file's sector runs
|
||||
(see <a href="#fig4">Figure 4</a> and the sidebar, "B-Trees and B+ Trees").
|
||||
The Fnode's root has room for 12 elements. Each allocation sector can contain,
|
||||
in addition to various control information, as many as 40 pointers to sector
|
||||
runs. Therefore a two-level allocation B+ Tree can describe a file of 480
|
||||
(12x40) sector runs with a theoretical maximum size of 7.68Gb (12x40x16Mb)
|
||||
in the current implementation (although the 32-bit signed offset parameter
|
||||
for DosChgFilePtr effectively limits the sizes to 2Gb).
|
||||
In the unlikely event that a two-level B+ Tree is not sufficient to describe
|
||||
the highly fragmented file, the file system will introduce additional levels
in the tree as needed. Allocation sectors in the intermediate levels can hold
as many as 60 internal (nonterminal) B+ Tree nodes, which means that the
descriptive ability of this structure rapidly grows to numbers that are nearly
beyond comprehension. For example, a three-level allocation B+ Tree can describe
a file with as many as 28,800 (12x60x40) sector runs. Run-length encoding and
B+ Trees of allocation sectors are a memory-efficient way to specify a file's
size and location, but they have other significant advantages.
|
||||
Translating a logical file offset into a sector number is extremely fast:
|
||||
the file system just needs to traverse the list (or B+ Tree of lists)
|
||||
of run pointers until it finds the correct range. It can then identify the
|
||||
sector within the run with a simple calculation.
|
||||
Run-length encoding also makes it trivial to extend the file logically if
|
||||
the newly assigned sector is contiguous with the file's previous last sector:
|
||||
the file system merely needs to increment the size double word of the file's
|
||||
last run pointer and clear the sector's bit in the appropriate freespace bitmap.
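<p>
The offset translation and the in-place extension described above can be
sketched as follows. This is only an illustration in Python, assuming the
run list is already in memory as (starting sector, length) pairs; the names
<tt>offset_to_sector</tt> and <tt>extend_file</tt> are hypothetical and are
not part of any HPFS interface.

```python
SECTOR_SIZE = 512  # bytes per sector on IBM-compatible HPFS volumes


def offset_to_sector(runs, byte_offset):
    """Map a logical byte offset to (sector number, offset within sector).

    `runs` is a list of (starting_sector, length_in_sectors) pairs in file
    order, mirroring the run pointers described in the text.
    """
    logical_sector = byte_offset // SECTOR_SIZE
    first = 0  # logical sector index at which the current run begins
    for start, length in runs:
        if logical_sector < first + length:
            # Found the run: the sector is a simple offset from its start.
            return start + (logical_sector - first), byte_offset % SECTOR_SIZE
        first += length
    raise ValueError("offset lies beyond the allocated runs")


def extend_file(runs, new_sector):
    """Add one sector; grow the last run when the new sector is contiguous."""
    if runs and runs[-1][0] + runs[-1][1] == new_sector:
        start, length = runs[-1]
        runs[-1] = (start, length + 1)  # just bump the length doubleword
    else:
        runs.append((new_sector, 1))    # otherwise a new run pointer is needed
    return runs
```

A real file system would also update the freespace bitmap and, for large
files, walk the B+ Tree of allocation sectors rather than a flat list.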
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig2.gif" name="fig2">
|
||||
<img src="fig2.gif" alt="[Fig. 2]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 2</b>:
|
||||
This figure shows the overall structure of an Fnode. The Fnode is the
|
||||
fundamental object in an HPFS volume and is the first sector allocated to a
|
||||
file or directory. It contains control and access history information used
by the file system, cached EAs and ACLs or pointers to same, a truncated
copy of the file or directory name (to aid disk repair programs), and an
|
||||
allocation structure which defines the size and location of the file's storage.
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig3.gif" name="fig3">
|
||||
<img src="fig3.gif" alt="[Fig. 3]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 3</b>:
|
||||
The simplest form of tracking for the sectors owned by a file is shown. The
|
||||
Fnode's allocation structure points directly to as many as eight sector runs.
Each run pointer consists of a pair of 32-bit doublewords: a starting sector
number and a length in sectors.
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig4.gif" name="fig4">
|
||||
<img src="fig4.gif" alt="[Fig. 4]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 4</b>:
|
||||
This figure demonstrates the technique used to track the sectors owned by a
|
||||
file with 9-480 sector runs. The allocation structure in the Fnode holds the
|
||||
roots for a B+ Tree of allocation sectors. Each allocation sector can describe
|
||||
as many as 40 sector runs. If the file requires more than 480 sector runs,
|
||||
additional intermediate levels are added to the B+ Tree, which increases the
|
||||
number of possible sector runs by a factor of sixty for each new level.
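<p>
The capacity figures quoted on this page (8, 480, and 28,800 sector runs)
follow from simple multiplication. The sketch below reproduces that
arithmetic in Python; the constant and function names are made up for
illustration, and the numbers are the ones given in the text for the
current implementation.

```python
RUNS_IN_FNODE = 8        # run pointers held directly in the Fnode
FNODE_ROOT_ENTRIES = 12  # B+ Tree root elements in the Fnode
RUNS_PER_LEAF = 40       # run pointers per leaf allocation sector
FANOUT_INTERNAL = 60     # entries per intermediate allocation sector
MAX_RUN_MB = 16          # maximum run size in the current implementation


def max_runs(levels):
    """Maximum number of sector runs for a tree with `levels` levels.

    levels == 1: runs stored in the Fnode itself.
    levels == 2: Fnode root plus leaf allocation sectors.
    Each further level adds one tier of intermediate allocation sectors.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    if levels == 1:
        return RUNS_IN_FNODE
    return FNODE_ROOT_ENTRIES * FANOUT_INTERNAL ** (levels - 2) * RUNS_PER_LEAF


# Theoretical maximum size of a two-level file, in Mb (12 x 40 x 16).
TWO_LEVEL_MAX_MB = max_runs(2) * MAX_RUN_MB
```

Each extra level multiplies the run count by sixty, which is why the text
calls a file needing more than two levels an unlikely event.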
|
||||
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
< <a href="hpfs_vol.html">[HPFS Volume Structure]</a> |
|
||||
<a href="hpfs.html">[HPFS Home]</a> |
|
||||
<a href="dirs.html">[Directories]</a> >
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
BIN
study/sabre/os/files/FileSystems/DesignGoalsHPFS/hpfs.inf
Normal file
@@ -0,0 +1,70 @@
|
||||
<html><head>
|
||||
<title>HPFS: HPFS Volume Structure</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>HPFS Volume Structure</h1>
|
||||
</center>
|
||||
|
||||
HPFS volumes are a new partition type--type 7--and can exist on a fixed disk
|
||||
alongside the several previously defined FAT partition types.
IBM-compatible HPFS volumes use a sector size of 512 bytes and have a maximum
size of 2199Gb (2<sup>32</sup> sectors).
Although there is no particular reason why floppy disks can't be formatted
as HPFS volumes, Microsoft plans to stick with FAT file systems on floppy disks
|
||||
for the foreseeable future.
|
||||
(This ensures that users will be able to transport files easily between MS-DOS
|
||||
and OS/2 systems.)
|
||||
An HPFS volume has very few fixed structures (<a href="#fig1">Figure 1</a>).
|
||||
Sectors 0-15 of a volume (8Kb) are the Bootblock and contain a volume name,
|
||||
32-bit volume ID, and a disk bootstrap program. The bootstrap is relatively
|
||||
sophisticated (by MS-DOS standards) and can use the HPFS in a restricted
|
||||
mode to locate and read the operating system files wherever they might be found.
|
||||
Sectors 16 and 17 are known as the Super Block and the Spare Block respectively.
|
||||
The Super Block is only modified by disk maintenance utilities.
|
||||
It contains pointers to the free space bitmaps, the bad block list, the directory
block band, and the root directory.
It also contains the date that the volume was last checked out and repaired
with CHKDSK /F. The Spare Block contains various flags and pointers that
will be discussed later; it is modified, although infrequently, as the system
|
||||
executes. The remainder of the disk is divided into 8Mb bands.
|
||||
Each band has its own free space bitmap in which a bit represents each sector.
|
||||
A bit is 0 if the sector is in use and 1 if the sector is available.
|
||||
The bitmaps are located at the head or tail of a band so that two bitmaps are
|
||||
adjacent between alternate bands. This allows the maximum contiguous free space
|
||||
that can be allocated to a file to be 16Mb. One band located at or toward the
|
||||
seek center of the disk is called the directory block band and receives
|
||||
special treatment (more about this later). Note that the band size is a
|
||||
characteristic of the current implementation and may be changed in later
|
||||
versions of the file system.
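<p>
To make the band and bitmap sizes concrete, here is a small Python sketch
under stated assumptions. The text only says that a 1 bit means "available"
and a 0 bit means "in use"; the bit order within each byte (least
significant bit first) and all function names below are assumptions made
for the example.

```python
SECTOR_SIZE = 512                              # bytes per sector
BAND_BYTES = 8 * 1024 * 1024                   # 8Mb band
SECTORS_PER_BAND = BAND_BYTES // SECTOR_SIZE   # 16384 sectors
BITMAP_BYTES = SECTORS_PER_BAND // 8           # 2048 bytes (2Kb) per bitmap
MAX_VOLUME_BYTES = 2 ** 32 * SECTOR_SIZE       # about 2199 x 10^9 bytes


def is_free(bitmap, sector):
    """A bit is 1 if the sector is available and 0 if it is in use."""
    byte, bit = divmod(sector, 8)  # assumed bit order: LSB first
    return (bitmap[byte] >> bit) & 1 == 1


def mark_used(bitmap, first, count):
    """Clear the bits for `count` sectors starting at `first`."""
    for sector in range(first, first + count):
        byte, bit = divmod(sector, 8)
        bitmap[byte] &= ~(1 << bit) & 0xFF


def find_free_run(bitmap, count):
    """Return the first sector of a run of `count` free sectors, or None."""
    run_start, run_len = 0, 0
    for sector in range(len(bitmap) * 8):
        if is_free(bitmap, sector):
            if run_len == 0:
                run_start = sector
            run_len += 1
            if run_len == count:
                return run_start
        else:
            run_len = 0
    return None
```

The real routines are described elsewhere in the article as hand-tuned
assembly language; this linear scan only shows the idea.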
|
||||
<p>
|
||||
|
||||
<center>
|
||||
<a href="fig1.gif" name="fig1">
|
||||
<img src="fig1.gif" alt="[Fig. 1]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 1</b>.
|
||||
This figure shows the overall structure of an HPFS volume.
|
||||
The most important fixed objects in such a volume are the Bootblock, the Super
|
||||
Block, and the Spare Block.
|
||||
The remainder of the volume is divided into 8Mb bands.
|
||||
There is a freespace bitmap for each band and the bitmaps are located between
|
||||
alternate bands; consequently, the maximum contiguous space which can be
|
||||
allocated to a file is 16Mb.
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
< <a href="fat.html">[FAT File System]</a> |
|
||||
<a href="hpfs.html">[HPFS Home]</a> |
|
||||
<a href="fnodes.html">[Files and Fnodes]</a> >
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
81
study/sabre/os/files/FileSystems/DesignGoalsHPFS/ifs.html
Normal file
@@ -0,0 +1,81 @@
|
||||
<html><head>
|
||||
<title>HPFS: Installable File Systems</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>Installable File Systems</h1>
|
||||
</center>
|
||||
|
||||
Support for installable file systems has been one of the most eagerly
|
||||
anticipated features of OS/2 Version 1.2. It will make it possible to
access multiple incompatible volume structures--FAT, HPFS, CD ROM, and
perhaps even UNIX--on the same OS/2 system at the same time, will
|
||||
simplify the life of network implementors, and will open the door to
|
||||
rapid file system evolution and innovation. Installable file systems
|
||||
are, however, only relevant to the HPFS insofar as they make use of
|
||||
the HPFS optional. The FAT file system is still embedded in the OS/2
|
||||
kernel, as it was in OS/2 1.0 and 1.1, and will remain there as the
|
||||
compatibility file system for some time to come. An installable file
|
||||
system driver (FSD) is analogous in many ways to a device driver. An
|
||||
FSD resides on the disk in a file that is structured like a
|
||||
dynamic-link library (DLL), typically with a SYS or IFS extension,
|
||||
and is loaded during system initialization by <tt>IFS=</tt> statements
|
||||
in the <tt>CONFIG.SYS</tt> file. <tt>IFS=</tt> directives are processed
|
||||
in the order they are encountered and are also sensitive to the order of
|
||||
<tt>DEVICE=</tt> statements for device drivers. This lets you load a
|
||||
device driver for a nonstandard device, load a file system driver from
|
||||
a volume on that device, and so on. Once an FSD is installed and
|
||||
initialized, the kernel communicates with it in terms of logical requests
|
||||
for file opens, reads, writes, seeks, closes, and so on.
|
||||
The FSD translates these requests--using control structures and tables
|
||||
found on the volume itself--into requests for sector reads and writes for
|
||||
which it can call special kernel entry points called File System Helpers
|
||||
(FsHlps). The kernel passes the demands for sector I/O to the appropriate
|
||||
device driver and returns the results to the FSD
|
||||
(<a href="#fig6">Figure 6</a>).
|
||||
The procedure used by the operating system to associate volumes with
|
||||
FSDs is called dynamic mounting and works as follows. Whenever a volume
|
||||
is first accessed, or after it has been locked for direct access and then
|
||||
unlocked (for example, by a FORMAT operation), OS/2 presents identifying
|
||||
information from the volume to each of the FSDs in turn until one of them
|
||||
recognizes the information. When an FSD claims the volume, the volume is
|
||||
mounted and all subsequent file I/O requests for the volume are routed to
|
||||
that FSD.
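<p>
The dynamic mounting procedure described above amounts to offering the
volume to each FSD in turn until one claims it. A minimal Python sketch of
that loop follows; the class, the <tt>recognizes</tt> method, and the
signature bytes are all hypothetical stand-ins, not the actual kernel
interface.

```python
class FileSystemDriver:
    """Hypothetical stand-in for an installed FSD."""

    def __init__(self, name, signature):
        self.name = name
        self.signature = signature  # made-up identifying bytes

    def recognizes(self, volume_info):
        # A real FSD would inspect structures on the volume itself.
        return volume_info.startswith(self.signature)


def mount(volume_info, drivers):
    """Offer the volume to each FSD in load order; the first claimant wins.

    Returns the claiming driver, or None if no FSD recognizes the volume
    (what the system does in that case is not covered by the text).
    """
    for fsd in drivers:
        if fsd.recognizes(volume_info):
            return fsd
    return None
```

Once a driver is returned, all later file I/O requests for that volume
would be routed to it until the volume is remounted.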
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="fig6.gif" name="fig6">
|
||||
<img src="fig6.gif" alt="[Fig. 6]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE 6</b>:
|
||||
A simplified sketch of the relationship between an application program, the
|
||||
OS/2 kernel, an installable file system, a disk driver, and the physical
disk device. The application issues logical file requests to the OS/2 kernel
by calling the entry points for DosOpen, DosRead, DosWrite, DosChgFilePtr, and
|
||||
so on. The kernel passes these requests to the appropriate installable file
|
||||
system for the volume holding the file. The installable file system translates
|
||||
the logical file requests into requests for reads or writes of logical sectors
|
||||
and calls a kernel File System Helper (FsHlp) to pass these requests to the
|
||||
appropriate disk driver. The disk driver transforms the logical sector requests
into requests for specific physical units, cylinders, heads, and sectors, and
|
||||
issues commands to the disk adapter to transfer data between the disk and
|
||||
memory.
|
||||
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
< <a href="ea.html">[Extended Attributes]</a> |
|
||||
<a href="hpfs.html">[HPFS Home]</a> |
|
||||
<a href="perform.html">[Performance Issues]</a> >
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
38
study/sabre/os/files/FileSystems/DesignGoalsHPFS/index.html
Normal file
@@ -0,0 +1,38 @@
|
||||
<html><head>
|
||||
<title>HPFS</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>HPFS</h1>
|
||||
<b>High Performance File System</b>
|
||||
</center>
|
||||
|
||||
<p>
|
||||
Also available in OS/2 IPF format: <a href="hpfs.inf">hpfs.inf</a>.
|
||||
|
||||
<ul>
|
||||
<li><a href="design.html">Design Goals and implementation of the new
|
||||
High Performance File System</a>
|
||||
<li><a href="fat.html">FAT File System</a>
|
||||
<li><a href="hpfs_vol.html">HPFS Volume Structure</a>
|
||||
<li><a href="fnodes.html">Files and Fnodes</a>
|
||||
<li><a href="dirs.html">Directories</a>
|
||||
<li><a href="ea.html">Extended Attributes</a>
|
||||
<li><a href="ifs.html">Installable File Systems</a>
|
||||
<li><a href="perform.html">Performance issues</a>
|
||||
<li><a href="faultol.html">Fault Tolerance</a>
|
||||
<li><a href="app_hpfs.html">Application Programs and the HPFS</a>
|
||||
<li><a href="sum.html">Summary</a>
|
||||
<p>
|
||||
<li><a href="figs.html">Collection of Illustrations</a>
|
||||
</ul>
|
||||
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
@@ -0,0 +1,92 @@
|
||||
<html><head>
|
||||
<title>HPFS: Performance issues</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>Performance issues</h1>
|
||||
</center>
|
||||
|
||||
The HPFS attacks potential bottlenecks in disk throughput at multiple levels.
It uses advanced data structures, contiguous sector allocation, intelligent
caching, read-ahead, and deferred writes in order to boost performance.
|
||||
First, the HPFS matches its data structures to the task at hand:
|
||||
sophisticated data structures (B-Trees and B+ Trees) for fast random access
|
||||
to filenames, directory names, and lists of sectors allocated to files or
|
||||
directories, and simple compact data structures (bitmaps) for locating chunks
|
||||
of free space of the appropriate size. The routines that manipulate these
|
||||
data structures are written in assembly language and have been painstakingly
|
||||
tuned, with special focus on the routines that search the freespace bitmaps
|
||||
for patterns of set bits (unused sectors). Next, the HPFS's main goal--its
prime directive, if you will--is to assign consecutive sectors to files
whenever possible. The time required to move the disk's read/write head from
one track to another far outweighs the other possible delays, so the HPFS
|
||||
works hard to avoid or minimize such head movements by allocating file space
|
||||
contiguously and by keeping control structures such as Fnodes and freespace
|
||||
bitmaps near the things they control.
|
||||
<p>
|
||||
|
||||
Highly contiguous files also help the file system make fewer requests of the
|
||||
disk driver for more sectors at a time, allow the disk driver to exploit the
|
||||
multisector transfer capabilities of the disk controller, and reduce the
|
||||
number of disk completion interrupts that must be serviced. Of course, trying
|
||||
to keep files from becoming fragmented in a multitasking system in which many
files are being updated concurrently is no easy chore. One strategy the HPFS
uses is to scatter newly created files across the disk--in separate bands,
if possible--so that the sectors allocated to the files as they are extended
will not be interleaved. Another strategy is to preallocate approximately 4Kb
|
||||
of contiguous space to the file each time it must be extended and give back
|
||||
any excess when the file is closed. If an application knows the ultimate size
|
||||
of a new file in advance, it can assist the file system by specifying an
|
||||
initial file allocation when it creates the file. The system will then search
|
||||
all the free space bitmaps to find a run of consecutive sectors large enough
|
||||
to hold the file. That failing, it will search for two runs that are half
|
||||
the size of the file, and so on.
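<p>
The search order described above (one run large enough for the whole file,
failing that two runs of half the size, and so on) can be modelled roughly
as below. This Python sketch assumes the free space is already summarized
as a list of (start, length) extents; the halving follows the text, while
the function name and everything else are illustrative guesses, not the
actual HPFS algorithm.

```python
def plan_allocation(free_extents, sectors_needed):
    """Return a list of (start, length) runs covering sectors_needed, or None.

    Tries 1 piece, then 2 half-size pieces, then 4 quarter-size pieces, ...
    `sectors_needed` is assumed to be a positive integer.
    """
    pieces = 1
    while pieces <= sectors_needed:
        piece_size = -(-sectors_needed // pieces)  # ceiling division
        plan = []
        remaining = sectors_needed
        for start, length in free_extents:
            offset = 0
            # Carve as many whole pieces as fit out of this free extent.
            while length - offset >= piece_size and remaining > 0:
                take = min(piece_size, remaining)
                run_start = start + offset
                if plan and plan[-1][0] + plan[-1][1] == run_start:
                    # Adjacent to the previous piece: keep it as one run.
                    plan[-1] = (plan[-1][0], plan[-1][1] + take)
                else:
                    plan.append((run_start, take))
                offset += piece_size
                remaining -= take
            if remaining == 0:
                return plan
        pieces *= 2
    return None
```

For example, with free extents of 10, 4, and 4 sectors, a request for 8
sectors is satisfied by a single run, while a request for 12 sectors falls
through to smaller pieces and ends up split across two extents.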
|
||||
<p>
|
||||
|
||||
The HPFS relies on several different kinds of caching to minimize the number
|
||||
of physical disk transfers it must request. Naturally, it caches sectors, as
|
||||
did the FAT file system. But unlike the FAT file system, the HPFS can manage
|
||||
very large caches efficiently and adjusts sector caching on a per-handle basis
|
||||
to the manner in which a file is used. The HPFS also caches path names and
|
||||
directories, transforming disk directory entries into an even more compact and
|
||||
efficient in-memory representation. Another technique that the HPFS uses to
|
||||
improve performance is to preread data it believes the program is likely to
|
||||
need. For example, when a file is opened, the file system will preread and
|
||||
cache the Fnode and the first few sectors of the file's contents. If the file
|
||||
is an executable program or the history information in the file's Fnode shows
|
||||
that an open operation has typically been followed by an immediate sequential
|
||||
read of the entire file, the file system will preread and cache much more of
|
||||
the file's contents. When a program issues relatively small read requests, the
|
||||
file system always fetches data from the file in 2Kb chunks and caches the
|
||||
excess, allowing most read operations to be satisfied from the cache. Finally,
|
||||
the OS/2 operating system's support for multitasking makes it possible for the
|
||||
HPFS to rely heavily on lazy writes (sometimes called deferred writes or write
|
||||
behind) to improve performance. When a program requests a disk write, the data
|
||||
is placed in the cache and the cache buffer is flagged as dirty (that is,
|
||||
inconsistent with the state of the data on disk). When the disk becomes idle
|
||||
or the cache becomes saturated with dirty buffers, the file system uses a
|
||||
captive thread from a daemon process to write the buffers to disk, starting
|
||||
with the oldest data. In general, lazy writes mean that programs run faster
|
||||
because their read requests will almost never be stalled waiting for a write
|
||||
request to complete. For programs that repeatedly read, modify, and write a
|
||||
small working set of records, it also means that many unnecessary or redundant
|
||||
physical disk writes may be avoided. Lazy writes have their dangers, of course,
|
||||
so a program can defeat them on a per-handle basis by setting the write-through
|
||||
flag in the Open Mode parameter for DosOpen or it can commit data to disk on a
|
||||
per-handle basis with the DosBufReset function.
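<p>
The lazy-write behavior described above can be illustrated with a toy
cache. This Python sketch is an assumption-laden model, not the HPFS cache:
the class name, the <tt>max_dirty</tt> threshold, and the dictionary used
as a stand-in for the disk are all invented for the example. It shows dirty
buffers being held in memory, flushed oldest first, and bypassed when
write-through is requested.

```python
from collections import OrderedDict


class LazyWriteCache:
    """Toy write-behind cache keyed by sector number."""

    def __init__(self, disk, max_dirty=4):
        self.disk = disk             # dict: sector -> data (stand-in for a disk)
        self.dirty = OrderedDict()   # dirty buffers, oldest first
        self.max_dirty = max_dirty   # hypothetical saturation threshold
        self.physical_writes = 0

    def write(self, sector, data, write_through=False):
        if write_through:
            # Defeat lazy writes for this request: go straight to disk.
            self.dirty.pop(sector, None)
            self._write_to_disk(sector, data)
            return
        # Rewriting a dirty buffer replaces its data but keeps its age.
        self.dirty[sector] = data
        if len(self.dirty) > self.max_dirty:
            self.flush()             # cache saturated with dirty buffers

    def read(self, sector):
        if sector in self.dirty:
            return self.dirty[sector]
        return self.disk.get(sector)

    def flush(self):
        """Write all dirty buffers to disk, starting with the oldest."""
        while self.dirty:
            sector, data = self.dirty.popitem(last=False)
            self._write_to_disk(sector, data)

    def _write_to_disk(self, sector, data):
        self.disk[sector] = data
        self.physical_writes += 1
```

Three successive writes to the same sector followed by one flush cost a
single physical write here, which is the redundancy saving the text
describes for programs that repeatedly update a small working set.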
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
< <a href="ifs.html">[Installable File Systems]</a> |
|
||||
<a href="hpfs.html">[HPFS Home]</a> |
|
||||
<a href="faultol.html">[Fault Tolerance]</a> >
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
147
study/sabre/os/files/FileSystems/DesignGoalsHPFS/sum.html
Normal file
@@ -0,0 +1,147 @@
|
||||
<html><head>
|
||||
<title>HPFS: Summary</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<center>
|
||||
<h1>Summary</h1>
|
||||
</center>
|
||||
|
||||
The HPFS solves all of the historical problems of the FAT file system. It
achieves excellent throughput even in extreme cases--many very small files
or a few very large files--by means of advanced data structures and techniques
such as intelligent caching, read-ahead, and write-behind. Disk space is used
economically because it is managed on a sector basis. Existing application
programs will need modification to take advantage of the HPFS's support for
extended attributes and long filenames, but these changes will not be difficult.
All application programs will benefit from the HPFS's improved performance and
decreased CPU use, whether they are modified or not. This article is based on a
prerelease version of the HPFS that was still undergoing modification and
tuning. Therefore, the final release of the HPFS may differ in some details
|
||||
from the description given here.
|
||||
<p>
|
||||
Most programmers are at least passingly familiar with the data structure
|
||||
known as a binary tree. Binary trees are a technique for imposing a logical
|
||||
ordering on a collection of data items by means of pointers, without regard
|
||||
to the physical order of the data. In a simple binary tree, each node contains
|
||||
some data, including a key value that determines the node's logical position
|
||||
in the tree, as well as pointers to the node's left and right subtrees. The
node that begins the tree is known as the root; the nodes that sit at the
ends of the tree's branches are sometimes called the leaves. To find a
particular piece of data, the binary tree is traversed from the root. At each
node, the desired key is compared with the node's key; if they don't match,
one branch of the node's subtree or another is selected based on whether the
desired key is less than or greater than the node's key. This process
continues until a match is found or an empty subtree is encountered
|
||||
(see <a href="#figa">Figure A</a>).
|
||||
Such simple binary trees, although easy to understand and implement, have
|
||||
disadvantages in practice. If keys are not well distributed or are added to
|
||||
the tree in a non-random fashion, the tree can become quite asymmetric,
|
||||
leading to wide variations in tree traversal times. In order to make access
|
||||
times uniform, many programmers prefer a particular type of balanced tree
|
||||
known as a B-Tree. For the purposes of this discussion, the important
|
||||
points about a B-Tree are that data is stored in all nodes, more than one
|
||||
data item might be stored in a node, and all of the branches of the tree
|
||||
are of identical length (see <a href="#figb">Figure B</a>).
|
||||
The worst-case behavior of a B-Tree is predictable and much better than that
|
||||
of a simple binary tree, but the maintenance of a B-Tree is correspondingly
|
||||
more complex. Adding a new data item, changing a key value, or deleting a
|
||||
data item may result in the splitting or merging of a node, which in turn
|
||||
forces a cascade of other operations on the tree to rebalance it. A B+ Tree
|
||||
is a specialized form of B-Tree that has two types of nodes: internal, which
|
||||
only point to other nodes, and external, which contain the actual data
|
||||
(see <a href="#figc">Figure C</a>).
|
||||
The advantage of a B+ Tree over a B-Tree is that the internal nodes of the
B+ Tree can hold many more decision values than the intermediate-level nodes
of a B-Tree, so the fan out of the tree is faster and the average length of
a branch is shorter. This makes up for the fact that you must always follow
a B+ Tree branch to its end to get the data for which you are looking, whereas
in a B-Tree you may discover the data at an intermediate node or even at the
|
||||
root.
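<p>
The simple binary tree search described earlier in this sidebar, and the
way insertion order affects the tree's shape, can be shown with a short
Python sketch. The names here are illustrative only; nothing in it is
specific to the HPFS.

```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None    # subtree with smaller keys
        self.right = None   # subtree with larger keys


def insert(node, key, value):
    """Insert into an unbalanced binary tree; returns the (sub)tree root."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value
    return node


def search(node, key):
    """Walk from the root until the key matches or an empty subtree is hit."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


def height(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k, str(k))
    return root
```

Building the tree from the keys 1 through 7 in ascending order yields a
chain seven nodes deep, whereas the order 4, 2, 6, 1, 3, 5, 7 yields a tree
only three nodes deep. That asymmetry is the weakness that balanced B-Trees
are designed to avoid.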
|
||||
|
||||
<table><tr>
|
||||
<th> </th><th align=left>FAT File System </th><th align=left>High Performance File System</th>
|
||||
</tr><tr>
|
||||
<td>Maximum filename length </td><td>11 (in 8.3 format) </td><td>254</td>
</tr><tr>
<td>Number of dot (.) delimiters allowed </td><td>One </td><td>Multiple</td>
</tr><tr>
<td>File Attributes </td><td>Bit flags </td><td>Bit flags plus up to 64Kb of free-form ASCII or binary information</td>
</tr><tr>
<td>Maximum Path Length </td><td>64 </td><td>260</td>
</tr><tr>
<td>Minimum disk space overhead per file </td><td>Directory entry (32 bytes) </td><td>Directory entry (length varies) + Fnode (512 bytes)</td>
</tr><tr>
<td>Average wasted space per file </td><td>1/2 cluster (typically 2Kb or more) </td><td>1/2 sector (256 bytes)</td>
</tr><tr>
<td>Minimum allocation unit </td><td>Cluster (typically 4Kb or more) </td><td>Sector (512 bytes)</td>
|
||||
</tr><tr>
|
||||
<td>Allocation info for files </td><td>Centralized in FAT on home track </td><td>Located nearby each file in its Fnode</td>
|
||||
</tr><tr>
|
||||
<td>Free disk space info </td><td>Centralized in FAT on home track </td><td>Located near free space in bitmaps</td>
|
||||
</tr><tr>
|
||||
<td>Free disk space described per byte </td><td>2Kb (1/2 cluster at 8 sectors/cluster)
|
||||
</td><td>4Kb (8 sectors)</td>
|
||||
</tr><tr>
|
||||
<td>Directory structure </td><td>Unsorted linear list, must be searched exhaustively
|
||||
</td><td>Sorted B-Tree</td>
|
||||
</tr><tr>
|
||||
<td>Directory Location </td><td>Root directory on home track, others scattered
|
||||
</td><td>Localized near seek center of volume</td>
|
||||
</tr><tr>
|
||||
<td>Cache replacement strategy </td><td>Simple LRU
|
||||
</td><td>Modified LRU, sensitive to data type and usage history</td>
|
||||
</tr><tr>
|
||||
<td>Read ahead </td><td>None in MS-DOS 3.3 or earlier, primitive read-ahead optional in MS-DOS 4
|
||||
</td><td>Always present, sensitive to data type and usage history</td>
|
||||
</tr><tr>
|
||||
<td>Write behind </td><td>Not available </td><td>Used by default, but can be defeated on per-handle basis</td>
|
||||
</tr></table>
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="figa.gif" name="figa">
|
||||
<img src="figa.gif" alt="[Fig. A]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE A</b>:
|
||||
To find a piece of data, the binary tree is traversed from the root until
|
||||
the data is found or an empty subtree is encountered.
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="figb.gif" name="figb">
|
||||
<img src="figb.gif" alt="[Fig. B]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE B</b>:
|
||||
In a balanced B-Tree, data is stored in nodes, more than one data item can
|
||||
be stored in a node, and all branches of the tree are the same length.
|
||||
<p>
|
||||
|
||||
|
||||
<center>
|
||||
<a href="figc.gif" name="figc">
|
||||
<img src="figc.gif" alt="[Fig. C]" border=0></a>
|
||||
</center>
|
||||
<p>
|
||||
<b>FIGURE C</b>:
|
||||
A B+ Tree has internal nodes that point to other nodes and external nodes
|
||||
that contain actual data.
|
||||
|
||||
|
||||
<p>
|
||||
<hr>
|
||||
|
||||
< <a href="app_hpfs.html">[Application Programs and HPFS]</a> |
|
||||
<a href="hpfs.html">[HPFS Home]</a>
|
||||
|
||||
<hr>
|
||||
|
||||
<font size=-1>
|
||||
Html'ed by <a href="http://www.seds.org/~spider/">Hartmut Frommert</a>
|
||||
</font>
|
||||
|
||||
</body></html>
|
||||
BIN
study/sabre/os/files/FileSystems/Ext2fs-overview-0.1.pdf
Normal file
BIN
study/sabre/os/files/FileSystems/FatFormat.pdf
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig1.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig10.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig10_4.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig1_4.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig2.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig2_4.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig3.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig3_4.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig4.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig4_4.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig5.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig6.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig6_3.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig8.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig9.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig9a.gif
Normal file
BIN
study/sabre/os/files/FileSystems/HPFS/fig9b.gif
Normal file
238
study/sabre/os/files/FileSystems/HPFS/hpfs0.html
Normal file
@@ -0,0 +1,238 @@
|
||||
<H1>Inside the High Performance File System</H1>
|
||||
<H2>Part 0: Preface</H2>
|
||||
Written by Dan Bridges
|
||||
|
||||
<H2>Introduction</H2>
|
||||
|
||||
<P>
|
||||
I am not a programmer's backside but I am an enthusiast interested in
|
||||
finding out more about HPFS. There is so little detailed information
|
||||
available on HPFS that I think you will find this modest series
|
||||
instructive. The REXX programs to be presented are functional but they
|
||||
are not particularly pleasing in an aesthetic sense. However they do
|
||||
ferret out information and will help you to understand what is going on.
|
||||
I'm sure that a programming guru, once motivated, could come up with
|
||||
superior versions. Hopefully they will. This installment originally
|
||||
appeared at the OS2Zone web site (http://www.os2zone.aus.net).
|
||||
|
||||
<P>
|
||||
I've been asked [by someone else. Ed.] to write a preface to this series.
|
||||
Normally I prefer to write on little-covered topics whereas much of what I'm
|
||||
going to discuss in this installment often appears in a cursory examination of
|
||||
the HPFS. The trouble with most of what has been written about HPFS in books on
|
||||
OS/2 is that the topic is never considered very deeply. After finishing working
|
||||
your way through this series (still being written on a monthly basis, but
|
||||
expected to occupy eight parts including this one) you will have a detailed
|
||||
knowledge of the structures of the HPFS. Having said that, there is a place for
|
||||
some initial information for readers who currently know very little about the
|
||||
subject.
|
||||
|
||||
<P>
|
||||
<H2>File Systems</H2>
|
||||
|
||||
<P>
|
||||
A File System (FS) is a combination of hardware and software that
|
||||
enables the storage and retrieval of information on removable (floppy
|
||||
disk, tape, CD) and non-removable (HD) media. The File Allocation Table
|
||||
FS (FAT) is used by DOS. It is also built into OS/2. Now FAT appeared
|
||||
back in the days of DOS v1 in 1981 and was designed with a backward
|
||||
glance to CP/M. A hierarchical directory structure arrived with DOS v2
|
||||
to support the XT's 10 MB HD. OS/2 v1.x used straight FAT. OS/2 v2.x
|
||||
and later provide "Super FAT". This uses the same layout of
|
||||
information on the storage medium (e.g. a floppy written under OS/2 v2
|
||||
can easily be read by a DOS system) but adds performance improvements to
|
||||
the software used to transfer the data. Super FAT will be covered in
|
||||
Part 1.
|
||||
|
||||
<P>
|
||||
<H2>FAT</H2>
|
||||
|
||||
<P>
|
||||
Figure 1 shows the layout of a FAT volume. There are two copies of the
|
||||
FAT. These should be identical. This may seem like a safety feature
|
||||
but it only works in the case of physical corruption (if a bad sector
|
||||
develops in one of the sectors in a FAT, the other one is automatically
|
||||
used instead) not for logical corruption. So if the FS gets confused
|
||||
and the two copies are not the same there is no easy way to determine
|
||||
which copy is still OK.
|
||||
|
||||
<P>
|
||||
<IMG SRC="hpfs1.gif" WIDTH=498 HEIGHT=64>
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 1: The layout of a volume formatted with the FAT file system.
|
||||
Note: this diagram is not to scale. The data area is quite large in
|
||||
practice.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
The root directory is made a fixed known size because the system files
|
||||
are placed immediately after it. The known location for the initial
|
||||
system files enables DOS or OS/2 to commence loading itself. (The boot
|
||||
record, which loads first off, is small and only has enough space for
|
||||
code to find the initial system files at a known location.) However
|
||||
this design decision also limits the number of files that can be listed
|
||||
in the root directory of a FAT volume.
|
||||
|
||||
<P>
|
||||
Entries in the root directory and in subdirectories are not ordered so
|
||||
searching for a particular file can take some time, particularly if
|
||||
there are many files in a directory.
|
||||
|
||||
<P>
|
||||
The FAT and the root directory are positioned at the beginning of the
|
||||
volume (on a disk this is typically on the outside). These entries are
|
||||
read often, particularly in a multitasking environment, requiring a lot
|
||||
of relatively slow (in CPU terms) head movement.
|
||||
|
||||
<P>
|
||||
<H2>How Files are Stored on a FAT Volume</H2>
|
||||
|
||||
<P>
|
||||
Files are stored on a FAT volume using the FS' minimum allocation unit,
|
||||
the cluster (1-64 sectors). A 32-byte directory entry only provides
|
||||
sufficient space for an 8.3 filename, file attributes, last alteration
|
||||
date/time, filesize and the starting cluster. See Figure 2.
|
||||
|
||||
<P>
|
||||
<IMG SRC="hpfs2.gif" WIDTH=388 HEIGHT=209>
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 2: The layout of the 32 bytes in a directory entry in a FAT
|
||||
system.
|
||||
</FONT>
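<P>
As a rough illustration, a 32-byte entry of this kind can be unpacked in a few lines of Python. This is only a sketch: the field widths follow the standard DOS layout (8-byte name, 3-byte extension, attribute byte, 10 reserved bytes, time, date, starting cluster, size), which is assumed here to match Figure 2, and the sample entry is made up.

```python
import struct
from datetime import datetime

# Standard DOS directory entry: 8+3+1+10+2+2+2+4 = 32 bytes, little-endian.
# (Assumed layout; the 10 reserved bytes are skipped over.)
DIRENT = struct.Struct("<8s3sB10sHHHI")

def parse_dir_entry(raw: bytes) -> dict:
    """Decode one 32-byte FAT directory entry into its fields."""
    name, ext, attr, _reserved, time, date, cluster, size = DIRENT.unpack(raw)
    modified = datetime(
        1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,    # year, month, day
        time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2,      # hour, minute, second
    )
    return {
        "name": name.decode("ascii").rstrip(),
        "ext": ext.decode("ascii").rstrip(),
        "attributes": attr,
        "modified": modified,
        "start_cluster": cluster,
        "size": size,
    }

# Hypothetical entry for README.TXT: 1,234 bytes, starting at cluster 5.
sample = DIRENT.pack(
    b"README  ", b"TXT", 0x20, bytes(10),
    (12 << 11) | (30 << 5) | 5,              # 12:30:10 (seconds stored halved)
    ((1996 - 1980) << 9) | (2 << 5) | 1,     # 1 February 1996
    5, 1234,
)
entry = parse_dir_entry(sample)
print(entry)
```

Note that nothing in the entry says where the rest of the file lives; only the starting cluster is recorded, and the remainder has to be found through the FAT.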
|
||||
|
||||
<P>
|
||||
The corresponding initial cluster entry in the FAT then points to the
|
||||
next FAT entry for the second cluster of the file (assuming that the
|
||||
file was big enough) which in turn points to the next cluster and so on.
|
||||
FAT entries can be 16-bit (max. FFFFh) or 12-bit (max. FFFh) in size,
|
||||
with volumes less than 16 MB using the 12-bit scheme. FAT entries can
|
||||
be of four types:
|
||||
|
||||
<UL>
|
||||
<LI>Contain 0000h if the cluster is free (available);
|
||||
<LI>Contain the number of the next cluster in the chain;
|
||||
<LI>If this is the last cluster in the chain then the FAT entry will
|
||||
consist of a reserved value which signifies the end of the chain (EOF);
|
||||
<LI>Contain another reserved value if the cluster of the disk is bad
|
||||
(unreliable).
|
||||
</UL>
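<P>
The four entry types can be seen at work in a small Python sketch that follows a file's chain through a 16-bit FAT. The marker values used (FFF7h for a bad cluster, FFF8h-FFFFh for end of chain) are the usual 16-bit FAT conventions rather than something stated above, and the toy FAT is invented for the example.

```python
FREE = 0x0000
BAD = 0xFFF7        # bad-cluster marker in a 16-bit FAT
EOF_MIN = 0xFFF8    # FFF8h-FFFFh all mark the end of a chain

def follow_chain(fat, start):
    """Return the list of clusters that make up a file beginning at `start`."""
    chain, seen = [], set()
    cluster = start
    while True:
        if cluster in seen:
            raise ValueError(f"loop in chain at cluster {cluster}")
        seen.add(cluster)
        chain.append(cluster)
        entry = fat[cluster]
        if entry >= EOF_MIN:              # last cluster of the file
            return chain
        if entry == FREE or entry == BAD:
            raise ValueError(f"chain broken at cluster {cluster}")
        cluster = entry                   # entry holds the next cluster number

# Toy FAT: one file in three fragments, clusters 2-3, 6-7 and 10.
fat = [0] * 12
fat[2], fat[3], fat[6], fat[7], fat[10] = 3, 6, 7, 10, 0xFFFF
print(follow_chain(fat, 2))   # [2, 3, 6, 7, 10]
```

The only way to reach cluster 10 is through clusters 2, 3, 6 and 7 in turn, which is why a single damaged entry is enough to lose the tail of a file.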
|
||||
|
||||
<P>
|
||||
The FAT FS is prone to fragmentation (i.e. a file's clusters are not in
|
||||
one, contiguous chain) in a single-tasking environment because the FAT
|
||||
is searched sequentially for the next free entry in the FAT when a file
|
||||
is written, regardless of how much needs to be written. The situation
|
||||
is even worse in a multitasking environment because you can have more
|
||||
than one writing operation in progress at the same time. See Figures 3
|
||||
and 4 for an example of a fragmented file under FAT.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=391 HEIGHT=238 SRC="hpfs3.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 3: The layout of a contiguous file in the FAT.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=458 HEIGHT=232 SRC="hpfs4.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 4: An example of a fragmented file under FAT in three pieces.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
The FAT FS uses a singly-linked scheme i.e. the FAT entry points only
|
||||
to the next cluster. If, for some reason, the chain is accidentally
|
||||
broken (the next cluster value is corrupted) then there is no
|
||||
information in the isolated next cluster to indicate what it was
|
||||
previously connected to. So the FAT FS, while relatively simple, is
|
||||
also rather vulnerable.
|
||||
|
||||
<P>
|
||||
FAT was designed in the days of small disk size and today it really
|
||||
shows its age. The maximum number of entries (clusters) in a 16-bit FAT
|
||||
is just under 64K (due to technical reasons, the actual maximum is
|
||||
65,518). Since we can't increase the number of clusters past this
|
||||
limit, a large volume requires the use of large cluster sizes. So, for
|
||||
example, a volume in the 1-2 GB range has 32 KB clusters. Now a cluster
|
||||
is the minimum allocation unit so a 1 byte file on such a volume would
|
||||
consume 32 KB of space, a 33 KB file would consume 64 KB and so on. A
|
||||
rough assumption you can make is that, on average, half a cluster of
|
||||
space is wasted per file. You can run CHKDSK on a FAT volume, note the
|
||||
total number of files and also the allocation unit size and then
|
||||
multiply these two figures together and divide the result by 2 to get
|
||||
some idea of the wastage. The situation is quite different with HPFS as
|
||||
you will see when you read Part 1.
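<P>
The rounding-up to whole clusters is easy to check with a short Python sketch. The function names are purely illustrative; the figures are the ones used in the paragraph above.

```python
KB = 1024

def allocated_size(file_bytes, cluster_bytes):
    """Space consumed on disk: the file size rounded up to whole clusters."""
    clusters = -(-file_bytes // cluster_bytes)   # ceiling division
    return clusters * cluster_bytes

def slack(file_sizes, cluster_bytes):
    """Total bytes allocated to the given files but not actually used."""
    return sum(allocated_size(size, cluster_bytes) - size for size in file_sizes)

cluster = 32 * KB                                # cluster size on a 1-2 GB volume
print(allocated_size(1, cluster) // KB)          # 1-byte file: 32 (KB)
print(allocated_size(33 * KB, cluster) // KB)    # 33 KB file: 64 (KB)
print(slack([1, 33 * KB], cluster))              # bytes wasted by the two files
```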
|
||||
|
||||
<P>
|
||||
Finally, FAT under OS/2 supports Extended Attributes (EAs - up to 64 KB
|
||||
of extra information associated with a file), but since there is very
|
||||
little extra space in a 32-byte directory entry it is only possible to
|
||||
store a pointer into an external file with all EAs on a volume being
|
||||
stored in this file ("EA DATA. SF"). In general it is fair to state
|
||||
that EAs are tacked on to FAT. With HPFS the integration is much
|
||||
better. If the EA is small enough HPFS stores it completely within the
|
||||
file's FNODE (every file and directory has an FNODE). Otherwise the EA is
|
||||
stored outside the file but closely associated with it and usually
|
||||
situated physically close to the file for performance reasons. Some
|
||||
users have occasionally reported crosslinking of EAs under FAT. This
|
||||
can be quite a serious matter requiring reinstallation of the operating
|
||||
system. I've not heard of this occurring under HPFS. Note that the
|
||||
WorkPlace Shell relies heavily on EAs.
|
||||
|
||||
<P>
|
||||
<H2>HPFS</H2>
|
||||
|
||||
<P>
|
||||
HPFS is an example of a class of file systems known as Installable File
|
||||
Systems (IFS). Other types of IFS include CD support (CDFS), Network
|
||||
File System (NFS), Toronto Virtual File System (TVFS - combines FS
|
||||
elements of VM, namely CMS search path, with elements of UNIX, namely
|
||||
symbolic link), EXT2-OS (read Linux EXT2FS partitions under OS/2) and
|
||||
HPFS386 (with IBM LAN Server Advanced).
|
||||
|
||||
<P>
|
||||
An IFS is installed at start-up time. The software to access the actual
|
||||
device is specified as a device driver (usually BASEDEV=xxxxx.DMD/.ADD)
|
||||
while a Dynamic Link Library (DLL) is loaded to control the format/layout
|
||||
of the data (with IFS=xxxxx.IFS). OS/2 can run more than one IFS at a
|
||||
time so you could, for example, copy from a CD to a HPFS volume in one
|
||||
session while reading a floppy disk (FAT) in another session.
|
||||
|
||||
<P>
|
||||
HPFS has many advantages over FAT: long filenames (up to 254 characters
|
||||
including spaces); excellent performance when directories contain
|
||||
many files; designed to be fault tolerant; fragmentation resistant;
|
||||
space efficient with large partitions; works well in a multitasking
|
||||
environment. These topics will be explored in the series.
|
||||
|
||||
<P>
|
||||
<H2>REXX</H2>
|
||||
|
||||
<P>
|
||||
One of the many benefits of using OS/2 is that it comes with REXX
|
||||
(providing you install it - it requires very little extra space). REXX
|
||||
is a surprisingly versatile and powerful scripting language and there
|
||||
are oodles of REXX programs and add-ons available, many of them free.
|
||||
This series presents REXX programs that access HPFS structures and
|
||||
decode their contents.
|
||||
|
||||
<P>
|
||||
<H2>Conclusion</H2>
|
||||
|
||||
<P>
|
||||
In this installment you have seen that the FAT FS has a number of
|
||||
problems related to its ancient origins. HPFS comes from a fresh design
|
||||
with one eye on likely advances in storage that would occur in the
|
||||
foreseeable future and the other eye on obtaining good performance. In
|
||||
the next installment we look at the many techniques HPFS uses to achieve
|
||||
its better performance.
|
||||
|
||||
</BODY>
|
||||
</HTML>
|
||||
800
study/sabre/os/files/FileSystems/HPFS/hpfs1.html
Normal file
@@ -0,0 +1,800 @@
|
||||
<H1>Inside the High Performance File System</H1>
|
||||
<H2>Part 1: Introduction</H2>
|
||||
Written by Dan Bridges
|
||||
|
||||
<H2>Introduction</H2>
|
||||
|
||||
<P>
|
||||
This article originally appeared in the February 1996 issue of
|
||||
Significant Bits, the monthly magazine of the Brisbug PC User Group Inc.
|
||||
|
||||
<P>
|
||||
It is sad to think that most OS/2 users are not using HPFS. The main
|
||||
reason is that unless you own the commercial program Partition Magic,
|
||||
switching to HPFS involves a destructive reformat and that most users
|
||||
couldn't be bothered (at least initially). Another reason is user
|
||||
ignorance of the numerous technical advantages of using HPFS.
|
||||
|
||||
<P>
|
||||
This month we start a series that delves into the structures that make
|
||||
up OS/2's HPFS. It is very difficult to get any public information on
|
||||
it aside from what appeared in an article written by Ray Duncan in the
|
||||
September '89 issue of Microsoft Systems Journal, Vol 4 No 5. I suspect
|
||||
that the IBM-Microsoft marriage break-up that occurred in 1991 may have
|
||||
caused an embargo on further HPFS information. I've been searching
|
||||
books and the Internet for more than a year looking for information with
|
||||
very little success. You usually end up finding a superficial
|
||||
description without any detailed discussion of the internal layout of
|
||||
its structures.
|
||||
|
||||
<P>
|
||||
There are three commercial utilities that I've found very useful. SEDIT
|
||||
from the GammaTech Utilities v3 is a wonder. It decodes quite a bit of
|
||||
the information in HPFS' structures. HPFSINFO and HPFSVIEW from the
|
||||
Graham Utilities are also good. HPFSINFO lists information gleaned from
|
||||
HPFS' SuperBlock and SpareBlock sectors, while HPFSVIEW provides the
|
||||
best visual display I've seen of the layout of a HPFS partition. You
|
||||
can receive some information on a sector by clicking on it. HPFSVIEW is
|
||||
also freely available in the demo version of the Graham Utilities,
|
||||
GULITE.xxx. I've also written a REXX program to assist with
|
||||
cross-referencing locations between SEDIT & HPFSVIEW and to provide a
|
||||
convenient means of dumping a sector.
|
||||
|
||||
<P>
|
||||
Probably the most useful program around at the moment is freeware,
|
||||
FST03F.xxx (File System Tool) written by Eberhard Mattes. This provides
|
||||
lots of information and comes with source. Even if you aren't a C
|
||||
programmer (I'm not) you can learn much from its definition of
|
||||
structures. Unfortunately I wrote the first three instalments without
|
||||
seeing this information so that made the task more difficult.
|
||||
|
||||
<P>
|
||||
In the early stages I've had to employ a very laborious process in an
|
||||
attempt to learn more. I created the smallest OS/2 HPFS partition
|
||||
possible (1 MB). Then I created/altered a file or directory and
|
||||
compared the changes. Sometimes I knew where the changes would occur so
|
||||
I could just compare the two sectors but often I ended up comparing two
|
||||
1 MB image files looking for differences and then translated the location
|
||||
in the image into C/H/S (a physical address in Cylinder/Head/Sector
|
||||
format) or LSN (Logical Sector Number). While more information will be
|
||||
presented in this series than I've seen in the public domain, there are
|
||||
still things that I've been unable to decipher.
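<P>
The translation from a position in an image file to an LSN or a C/H/S address is simple arithmetic, sketched here in Python. The geometry (16 heads, 63 sectors per track) and the partition's starting sector are hypothetical values chosen for the example; real figures depend on the drive.

```python
SECTOR_SIZE = 512

def offset_to_lsn(offset):
    """Byte offset in a partition image -> (LSN, offset inside that sector)."""
    return divmod(offset, SECTOR_SIZE)

def lsn_to_chs(lsn, heads, sectors_per_track, partition_start=0):
    """LSN within the partition -> absolute (cylinder, head, sector).

    `partition_start` is the absolute sector at which the partition begins.
    Cylinders and heads count from 0, sectors within a track from 1.
    """
    absolute = partition_start + lsn
    cylinder, rest = divmod(absolute, heads * sectors_per_track)
    head, sector = divmod(rest, sectors_per_track)
    return cylinder, head, sector + 1

# A difference found at byte 2210h of the image file:
lsn, within = offset_to_lsn(0x2210)
print(lsn, within)                                   # 17 16
# Hypothetical disk: 16 heads, 63 sectors/track, partition starts at sector 63.
print(lsn_to_chs(lsn, heads=16, sectors_per_track=63, partition_start=63))  # (0, 1, 18)
```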
|
||||
|
||||
<P>
|
||||
<H2>The Win95 Fizzer</H2>
|
||||
|
||||
<P>
|
||||
For me, the most disappointing feature of Win 95 is the preservation of
|
||||
the FAT (File Allocation Table) system. It's now known as VFAT but
|
||||
aside from integrated 32-bit file and disk access, the structure on the
|
||||
disk is basically the same as DOS v4 (circa 1988). An ungainly method
|
||||
involving the volume label file attribute was used to graft long
|
||||
filename support onto the file system. These engineering compromises
|
||||
were made to most easily achieve backward compatibility. It's a pity
|
||||
because Microsoft has an excellent file system available in NT, namely
|
||||
NTFS. This file system is very robust although perhaps NTFS is overkill
|
||||
for the small user.
|
||||
|
||||
<P>
|
||||
The Program Manager graphical user interface (GUI) appeared in OS/2 v1.1
|
||||
in 1988. The sophisticated High-Performance File System came with OS/2
|
||||
v1.2 which was released way back in 1989! The powerful REXX scripting
|
||||
language showed up in OS/2 v1.3 (1991). And the largely
|
||||
object-orientated WPS (Work Place Shell) GUI appeared in 1992 in OS/2
|
||||
v2.0. So it is hardly surprising that experienced OS/2 users were not
|
||||
swept up in the general hysteria about Windows 95 being the latest and
|
||||
greatest.
|
||||
|
||||
<P>
|
||||
A positive aspect of the Win 95 craze has been that the minimum system
|
||||
requirement of 8 MB RAM, 486/33 makes a good platform for OS/2 Warp. So
|
||||
now the disgruntled Win 95 user will find switching OSs less daunting,
|
||||
at least from a hardware viewpoint.
|
||||
|
||||
<P>
|
||||
<H2>Dual Boot and Boot Manager</H2>
|
||||
|
||||
<P>
|
||||
I've never used Dual Boot because it seems so limiting. I've always
|
||||
reformatted and installed Boot Manager so that I could select from up to
|
||||
four different Operating Systems, for example OS/2 v2.1, OS/2 Warp
|
||||
Connect (peer-to-peer networking with TCP/IP and Internet support), IBM
|
||||
DOS v7 and Linux.
|
||||
|
||||
<P>
|
||||
In previous OS/2 installations, I've left a small (50 MB) FAT partition
|
||||
that could be seen when I booted under either DOS or OS/2, while the
|
||||
rest of the HD space (910 MB) was formatted as HPFS. Recently I
|
||||
upgraded to Warp Connect and this time I dropped FAT and the separate
|
||||
DOS boot partition completely. This does not mean I am unable to run
|
||||
DOS programs. OS/2 has inbuilt IBM DOS v5 and you can install boot
|
||||
images of other versions of DOS, or even CP/M, for near instantaneous
|
||||
booting of these versions. There is no reason why you can't have
|
||||
multiple flavours of DOS running at the same time as you're running
|
||||
multiple OS/2 sessions. Furthermore DOS programs have no problems
|
||||
reading from, writing to or running programs on HPFS partitions even
|
||||
though the layout is nothing like FAT. It's all handled transparently
|
||||
by OS/2. But this does mean you have to have booted OS/2 first. HPFS
|
||||
is not visible if you use either Dual Boot or Boot Manager to boot
|
||||
directly to DOS, but there are a number of shareware programs around to
|
||||
allow read-access to HPFS drives from DOS.
|
||||
|
||||
<P>
|
||||
DOS uses the system BIOS to access the hard disk. This is limited to
|
||||
dealing with a HD that has no more than 1,024 cylinders due to 10 bits
|
||||
(2^10 = 1,024) being used in the BIOS for cylinder numbering. OS/2 uses
|
||||
the system BIOS at boot time but then completely replaces it in memory
|
||||
with a special Advanced BIOS. This means that the boot partition and,
|
||||
if you use it, Boot Manager's 1 MB partition, must be within the first
|
||||
1,024 cylinders. Once you've booted OS/2, however, you can access
|
||||
partitions on cylinders past the Cyl 1023 point (counting from zero)
|
||||
without having to worry about LBA (Logical Block Addressing) translation
|
||||
schemes.
|
||||
|
||||
<P>
|
||||
Now this can still catch you out if you boot DOS. On my old system I'd
|
||||
sometimes use Boot Manager to boot a native DOS version. I'd load AMOS
|
||||
(a shareware program) to see the HPFS drives. I thought there must have
|
||||
been a bug in AMOS because I could only see half of F: and none of G:
|
||||
until I realised that these partitions were situated on a third HD that
|
||||
had 1,335 cylinders. So this was just the effect of DOS' 1,024 cylinder
|
||||
limitation which the AMOS program was unable to circumvent.
|
||||
|
||||
<P>
|
||||
<H2>Differences between an Easy and an Advanced Installation</H2>
|
||||
|
||||
<P>
|
||||
Most new OS/2 users select the "Easy Installation" option. This is
|
||||
satisfactory but it only utilises FAT, installs OS/2 on the same drive
|
||||
as DOS and Windows, does not reformat the partition and Dual Boot is
|
||||
installed.
|
||||
|
||||
<P>
|
||||
If you know what you're doing or are more aggressive in wanting to take
|
||||
advantage of what OS/2 can provide then the "Advanced Installation"
|
||||
option is for you. Selecting it enables you to selectively install
|
||||
parts of OS/2, install OS/2 in a primary or logical (extended) partition
|
||||
other than C: or even on a 2nd HD (I don't know whether you can install
|
||||
on higher physical drives than the 2nd one in a SCSI multi-drive setup);
|
||||
the option of installing Boot Manager is provided; you can use HPFS if
|
||||
you wish; installation can occur on a blank HD.
|
||||
|
||||
<P>
|
||||
<H2>FAT vs HPFS: If Something Goes Wrong</H2>
|
||||
|
||||
<P>
|
||||
CHKDSK on a HPFS partition can recover from much more severe faults than
|
||||
it can on a FAT system. This is because the cluster linkages in a FAT
|
||||
system are one-way, pointing to the next cluster in the chain. If the
|
||||
link is broken it is usually impossible to work out where the lost
|
||||
clusters ("x lost clusters in y chains") should be reattached. Often
|
||||
they are just artifacts of a program's use of temporary files that
|
||||
haven't been cleaned up properly. But "file truncated" and
|
||||
"cross-linked files" messages are usually an indication of more serious
|
||||
FAT problems.
|
||||
|
||||
<P>
|
||||
HPFS uses double linking: the allocation block of a directory or file
|
||||
points back to its predecessor ("parent") as well as to the next element
|
||||
("child"). Moreover, major structures contain dword (32-bit) signatures
|
||||
identifying their role and each file/directory's FNODE contains the
|
||||
first 15 characters of its name. So blind scanning can be performed by
|
||||
CHKDSK or other utilities to rebuild much of the system after a
|
||||
significant problem.
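<P>
The idea of blind scanning can be sketched in Python as follows. This is not how CHKDSK is implemented; it only shows the principle of checking the first dword of every sector against a table of known signatures. The signature values in the example are placeholders, not the real HPFS ones, and little-endian byte order is assumed.

```python
import struct

SECTOR_SIZE = 512

def scan_for_signatures(image: bytes, signatures: dict) -> list:
    """Return (LSN, label) for each sector whose first dword is a known signature."""
    hits = []
    for lsn in range(len(image) // SECTOR_SIZE):
        (dword,) = struct.unpack_from("<I", image, lsn * SECTOR_SIZE)
        if dword in signatures:
            hits.append((lsn, signatures[dword]))
    return hits

# Placeholder signature values for illustration only (NOT the real HPFS ones).
SIGNATURES = {0x11111111: "FNODE", 0x22222222: "directory block"}

# Toy 8-sector image with two tagged sectors.
image = bytearray(8 * SECTOR_SIZE)
struct.pack_into("<I", image, 3 * SECTOR_SIZE, 0x11111111)
struct.pack_into("<I", image, 6 * SECTOR_SIZE, 0x22222222)
print(scan_for_signatures(bytes(image), SIGNATURES))
# [(3, 'FNODE'), (6, 'directory block')]
```

Because each structure identifies itself, a scan like this needs no intact chain of pointers to find it, which is exactly what FAT's data clusters lack.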
|
||||
|
||||
<P>
|
||||
As a personal comment, I've been using HPFS since April, 1993, and I've
|
||||
yet to experience any serious file system problems. I've had many OS/2
|
||||
lockups while downloading with a DOS comms program and until recently
|
||||
I was running a 4 MB hardware disk cache with delayed writes, yet,
|
||||
aside from the lost download file, the file system has not been
|
||||
permanently corrupted.
|
||||
|
||||
<P>
|
||||
<H2>Warp, FORMAT /FS:HPFS, CHKDSK /F:3 and The Lazarus Effect</H2>
|
||||
|
||||
<P>
|
||||
Warp, by default, does a quick format when you format a HD under either
|
||||
FAT or HPFS. So FORMAT /FS:HPFS x:, which is what the installation
|
||||
program performs if you decide to format the disk with HPFS, is
|
||||
performed very quickly. It's almost instantaneous if you decide to
|
||||
reformat with FAT (/FS:FAT). Now this speed differential does not mean
|
||||
that FAT is much quicker, only that FORMAT has very little work to
|
||||
perform during a quick FAT reformat since the FAT structures are so
|
||||
simple compared to HPFS.
|
||||
|
||||
<P>
|
||||
As mentioned earlier, CHKDSK has extended recovery abilities when
|
||||
dealing with HPFS. It has four levels of /F:n checking/recovery. These
|
||||
will be considered in greater detail in a later article in this series
|
||||
when we look at fault tolerance. The default of CHKDSK /F is equivalent
|
||||
to using /F:2. If you decide to use /F:3 then CHKDSK will dig deep and
|
||||
recover information that existed on the partition prior to the
|
||||
reformatting providing that it was previously formatted as HPFS. Using
|
||||
CHKDSK /F:3 after performing a quick format on a partition that was
|
||||
previously FAT but is now HPFS will not cause this, since none of the
|
||||
previous data has HPFS signature words embedded at the beginning of its
|
||||
sectors. However, if you ever use /F:3 after quickly reformatting a
|
||||
HPFS partition you could end up with a bit of a mess since everything
|
||||
would be recovered that existed on the old partition and which hadn't
|
||||
been overwritten by the current contents.
|
||||
|
||||
<P>
|
||||
To guard against this, OS/2 stores whether or not a quick format has
|
||||
been performed on a HPFS partition in bit 5 (counting from zero) of byte
|
||||
08h in LSN (Logical Sector Number) 17, the SpareBlock sector. This
|
||||
particular byte is known as the Partition Status byte, with 20h
|
||||
indicating that a quick format was performed. Bit 0 of this byte is
|
||||
also used to indicate whether the partition is "clean" or "dirty" so 21h
|
||||
indicates that the partition was quick formatted and is currently
|
||||
"dirty" (these concepts will be covered in a later instalment).
|
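<P>
Testing these two bits is straightforward. The Python sketch below reads the Partition Status byte from a partition image using the location given above (byte 08h of LSN 17); the image itself is a blank toy buffer, and only the two flags mentioned in the text are decoded.

```python
SECTOR_SIZE = 512
SPAREBLOCK_LSN = 17        # LSN of the SpareBlock sector
STATUS_OFFSET = 0x08       # Partition Status byte within that sector
DIRTY = 0x01               # bit 0
QUICK_FORMATTED = 0x20     # bit 5

def partition_status(image: bytes) -> dict:
    """Decode the two Partition Status flags described in the text."""
    status = image[SPAREBLOCK_LSN * SECTOR_SIZE + STATUS_OFFSET]
    return {
        "raw": status,
        "dirty": bool(status & DIRTY),
        "quick_formatted": bool(status & QUICK_FORMATTED),
    }

# Toy image: 18 blank sectors with the status byte set to 21h.
image = bytearray(18 * SECTOR_SIZE)
image[SPAREBLOCK_LSN * SECTOR_SIZE + STATUS_OFFSET] = 0x21
print(partition_status(bytes(image)))
# {'raw': 33, 'dirty': True, 'quick_formatted': True}
```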
||||
|
||||
<P>
|
||||
If you attempt to perform a CHKDSK /F:3 on a quick-formatted partition,
|
||||
you will receive the following warning:
|
||||
|
||||
<PRE>
|
||||
SYS0641: Using CHKDSK /F:3 on this drive may cause files that existed
|
||||
before the last FORMAT to be recovered. Proceed with CHKDSK (Y/N)?
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
If you type "HELP 641" for further information you'll see:
|
||||
|
||||
<PRE>
|
||||
EXPLANATION: The target drive was formatted in "fast format" mode,
|
||||
which does not erase all data areas. CHKDSK /F:3 searches data areas
|
||||
for "lost" files. If a file existed on this drive before the last
|
||||
format, CHKDSK may find it, and attempt to recover it.
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
ACTION: Use CHKDSK /F:2 to check this drive. If you use /F:3, be aware
|
||||
that files recovered to the FOUND directories may be old files. Also,
|
||||
if you format a drive using FORMAT /L, FORMAT will completely erase all
|
||||
old files, and avoid this warning.
|
||||
|
||||
<P>
|
||||
It seems a pity to forgo the power of CHKDSK /F:3 in the future.
|
||||
As is suggested, FORMAT /L (for "Long" I presume) will completely
|
||||
obliterate the prior partition's contents, but you can't specify this
|
||||
during a reinstall. To perform it you need to use FORMAT /L on the
|
||||
partition before reinstalling. For this to be practical you will
|
||||
probably need to keep OS/2 and nothing else on a separate partition and
|
||||
to have a recent tape backup of the remaining volumes' contents. Note:
|
||||
in my opinion keeping OS/2 on a separate partition is the best way of
|
||||
laying out a system but make sure you leave enough room for things like
|
||||
extra postscript fonts and programs that insist on putting things on C:.
|
||||
|
||||
<P>
|
||||
<H2>Capacity</H2>
|
||||
|
||||
<P>
|
||||
Figure 1 shows a table comparing the capacity of OS/2's FAT and HPFS
|
||||
file systems. The difference in the logical drive numbers arises due to
|
||||
A: and B: being assigned to floppies which are always FAT. It would
|
||||
be ridiculous to put a complex, relatively large file system, which was
|
||||
designed to overcome FAT's limitations with big partitions, on volumes
|
||||
as small as current FDs.
|
||||
|
||||
<PRE>
|
||||
                     FAT             HPFS
|
||||
|
||||
Logical drives       26              24
|
||||
Num of Partitions    16              16
|
||||
Max Partition Size   2 GB            64 GB
|
||||
Max File Size        2 GB            2 GB
|
||||
Sector Size          512 bytes       512 bytes
|
||||
Cluster/Block Size   0.5 KB-32 KB    512 bytes
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Fig. 1: Comparing the capacity of FAT and HPFS
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
The next point of interest is the much greater partition size supported by HPFS.
|
||||
HPFS has a maximum possible partition size of about 2,200 GB (2^32 sectors) but
|
||||
is restricted in the current implementation to 64 GB. (Note: older references
|
||||
state that the maximum is 512 GB.) I don't know what imposes this limitation.
|
||||
Note: the effective limitation on partition size is currently around 8 GB.
|
||||
This is due to CHKDSK's inability to handle a larger partition. I presume this
|
||||
limitation will be rectified soon as ultra large HDs will become common in the
|
||||
next year or two.
|
||||
|
||||
<P>
|
||||
The 2 GB maximum filesize limit is common to DOS, OS/2 and 32-bit Unix. A
|
||||
32-bit file size should be able to span a range of 4 GB (2^32) but the
|
||||
DosSetFilePtr API function requires that the highest bit be used for indicating
|
||||
sign (forward or backward direction of movement), leaving 31 for size.
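<P>
The effect of reserving the top bit for the sign can be checked with a couple of lines of Python. The sketch only demonstrates the arithmetic of a signed 32-bit value; it does not call any OS/2 API.

```python
import struct

MAX_FILE_SIZE = 2**31 - 1          # largest value a signed 32-bit offset can hold
print(MAX_FILE_SIZE)               # 2147483647, just under 2 GB

def fits_signed_32(value):
    """True if `value` can be stored in a signed 32-bit field."""
    try:
        struct.pack("<i", value)
        return True
    except struct.error:
        return False

print(fits_signed_32(2**31 - 1))   # True
print(fits_signed_32(2**31))       # False: this would need the sign bit
```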
|
||||
|
||||
<P>
|
||||
The cluster size on a 1.4 MB FD is 512 bytes. For a 100 MB HD formatted
|
||||
with FAT it is 2 KB. Due to the relatively small 64K (2^16) limit on
|
||||
cluster numbering, as FAT partitions get bigger the size of clusters
|
||||
must also increase. So for a 1-2 GB partition you end up with whopping
|
||||
32 KB clusters. Since the average wastage of HD space due to the
|
||||
cluster size is half a cluster per file, storing 10,000 files on such a
|
||||
partition will typically waste 160 MB (10,000 * 32 KB / 2).
|
||||
|
||||
<P>
|
||||
HPFS has no such limitation. File space is allocated in sector-sized
|
||||
blocks unlike the FAT system. A FNODE sector is also always associated
|
||||
with each file. So for 10,000 files, the wastage due to sector size is
|
||||
typically 2.5 MB (10,000 * 512 / 2) for the files themselves + 5 MB
|
||||
consumed by the files' FNODEs = 7.5 MB. And this overhead is constant
|
||||
whether the HPFS partition is 10 MB or 100 GB.
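<P>
The two estimates can be put side by side in a short Python sketch. It uses the same rule of thumb as the text (half an allocation unit wasted per file, plus one FNODE sector per file on HPFS); the function names are illustrative only.

```python
KB = 1024
SECTOR = 512

def fat_overhead(files, cluster_bytes):
    """Average FAT wastage: half a cluster per file."""
    return files * cluster_bytes // 2

def hpfs_overhead(files):
    """Average HPFS wastage: half a sector per file plus one FNODE sector each."""
    return files * SECTOR // 2 + files * SECTOR

files = 10_000
print(fat_overhead(files, 32 * KB) // KB)   # 160000 KB, i.e. about 160 MB
print(hpfs_overhead(files) // KB)           # 7500 KB, i.e. about 7.5 MB
```

The HPFS figure depends only on the number of files, whereas the FAT figure doubles every time the cluster size doubles.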
|
||||
|
||||
<P>
|
||||
This must be balanced against the diskspace consumed by HPFS. Since
|
||||
HPFS is a sophisticated file system that is designed to accomplish a lot
|
||||
more than FAT, it correspondingly requires more diskspace than FAT.
|
||||
Figure 2 illustrates this. You may think that 10 MB for the file system
|
||||
is too much for a 1,000 MB partition but you should consider this as a
|
||||
percentage.
|
||||
|
||||
<PRE>
|
||||
            System Usage including   Disk Space available   Allocation Unit
|
||||
            MBR track                to user                + Fnode for HPFS
|
||||
            FAT/HPFS in KB           FAT/HPFS in %          FAT/HPFS in KB
|
||||
|
||||
10 MB       44/415                   99.57/95.95            4/0.5+0.5
|
||||
|
||||
100 MB      76/3,195                 99.77/96.88            2/0.5+0.5
|
||||
|
||||
1000 MB     289(est)/10,430          99.98(est)/98.98       16/0.5+0.5
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Fig. 2: Space used by FAT and HPFS on different volumes
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
Furthermore, once cluster size wastage is also considered, then the
|
||||
break-even point (as regards diskspace) for a 1,000 MB partition is
|
||||
about 2,200 files which isn't very many files. This is based on a 16 KB
|
||||
cluster size. In the 1,024-2,047 MB partition size range the cluster
|
||||
size increases to 32 KB so the "crossover" point shifts to only 1,100
|
||||
files.
|
||||
|
||||
<P>
|
||||
I had to calculate the 1,000 MB FAT partition values since OS/2 wouldn't
|
||||
let me have a FAT partition situated in the greater than Cyl 1023
|
||||
region. The 4 KB cluster size of the 10 MB partition is not a misprint.
|
||||
Below 16 MB, a 12-bit FAT scheme (1.5 bytes in the FAT representing 1
|
||||
cluster) is used instead of a 16-bit one.
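<P>
Packing entries into 1.5 bytes each means that two neighbouring entries share a byte. The Python sketch below shows the usual way of extracting entry n from a 12-bit FAT; the eight sample bytes are invented for the example.

```python
def fat12_entry(fat: bytes, n: int) -> int:
    """Return the 12-bit FAT entry for cluster `n`."""
    offset = n + n // 2                         # n * 1.5, rounded down
    pair = fat[offset] | (fat[offset + 1] << 8) # the two bytes holding the entry
    if n & 1:
        return pair >> 4                        # odd entry: upper 12 bits
    return pair & 0x0FFF                        # even entry: lower 12 bits

# Five entries packed into eight bytes: FF8h, FFFh, 003h, 004h, FFFh.
fat = bytes([0xF8, 0xFF, 0xFF, 0x03, 0x40, 0x00, 0xFF, 0x0F])
print([hex(fat12_entry(fat, n)) for n in range(5)])
# ['0xff8', '0xfff', '0x3', '0x4', '0xfff']
```

Here cluster 2 points to cluster 3, cluster 3 points to cluster 4, and cluster 4 holds FFFh, the largest 12-bit value.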
|
||||
|
||||
<P>
|
||||
<H2>Directory Search Speed</H2>
|
||||
|
||||
<P>
|
||||
Consider an extreme case: FAT system on a full partition which has a
|
||||
maximum-sized FAT (64K entries - this is the maximum number of files a
|
||||
FAT disk can hold). The size of such a partition would be 128 MB, 256
|
||||
MB, 512 MB, 1 GB or 2 GB, depending on cluster size. Each FAT is 128 KB
|
||||
in size. (There is a second FAT which mirrors the first.) In this
|
||||
example all the files are in one subdirectory. This can't be in the
|
||||
root directory because it only has space for 512 entries. (With HPFS
|
||||
you can have as many files as you want in the root directory.) 64 K of
|
||||
entries in a FAT directory requires 2 MB of diskspace (64K * 32
|
||||
bytes/directory entry). To find a file, on average, 32 K directory
|
||||
entries would need to be searched. To say that a file is not on the
disk, the full 64 K entries must be scanned before the "File not found"
message is shown. The same figures would apply if you were using a
file-finding utility to look for a file in 1,024 directories, each
containing 63 files (the subdirectory entry also consumes space).
|
||||
|
||||
<P>
|
||||
If the directory entries were always sorted, the situation would greatly
|
||||
improve. Assuming you had a quick means of getting to the file in the
|
||||
sorted sequence, if it's the file you're looking for then you've found
|
||||
its directory entry (and thus its starting cluster's address). If a
|
||||
file greater in the sequence than the required file is found instead
|
||||
then you immediately know that the file does not exist.
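<P>
The difference between the two approaches can be sketched in Python. This is a toy model, not how either file system is coded: the filenames are invented, the unsorted scan stands in for a FAT directory search, and the standard bisect module stands in for "a quick means of getting to the file in the sorted sequence".

<PRE>
```python
from bisect import bisect_left


def linear_probes(names: list, target: str) -> int:
    """Entries examined by an unsorted, FAT-style scan (all of them on a miss)."""
    for probes, name in enumerate(names, start=1):
        if name == target:
            return probes
    return len(names)


def sorted_lookup(sorted_names: list, target: str) -> bool:
    """Binary search: the entry at the insertion point settles hit or miss."""
    i = bisect_left(sorted_names, target)
    return i != len(sorted_names) and sorted_names[i] == target


# 64 K hypothetical filenames; zero padding keeps them in sorted order
names = [f"FILE{n:05d}" for n in range(65536)]
```
</PRE>

<P>
A miss costs the linear scan all 65,536 comparisons, whereas the sorted lookup settles the question after roughly 16 (the base-2 logarithm of 65,536), because the entry found at the insertion point is either the wanted name or a greater one.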
|
||||
|
||||
<P>
|
||||
HPFS stores directory files in a balanced multi-branch tree structure
|
||||
(B-tree) which is always sorted due to the way the branches are
|
||||
assigned. This can lead to some extra HD activity, caused by adjustment
|
||||
of the tree structure, when a new file is added or a file is renamed.
|
||||
This is done to keep the tree balanced i.e. the total length of each
|
||||
branch from the root to the leaves is the same. The extra work when
|
||||
writing to the disk is hidden from the user by the use of "lazy writes"
|
||||
(delayed write caching).
|
||||
|
||||
<P>
|
||||
HPFS directory entries are stored in contiguous directory blocks of four
|
||||
sectors i.e. 2 KB known as DIRBLKs. A lot of information is stored in
|
||||
each variable-length (unlike FAT) file entry in a DIRBLK structure,
|
||||
namely:
|
||||
|
||||
<UL>
|
||||
<LI>The length of the entry;
|
||||
<LI>File attributes;
|
||||
<LI>A pointer to the HPFS structure (FNODE; usually just before the
|
||||
first sector of a file) that describes the sector disposition of the
|
||||
file;
|
||||
<LI>Three different date/time stamps (Created, Last Accessed, Last
|
||||
Modified);
|
||||
<LI>Usage count. Although mentioned in the 1989 document, this has not
been implemented;
|
||||
<LI>The length of the name (up to 254 characters);
|
||||
<LI>A B-tree pointer to the next level of the tree structure if there
|
||||
are any further levels. The pointer will be to another directory
|
||||
block if the directory entries are too numerous to fit in one 2 KB
|
||||
block;
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
At the end of the sector there is extra ("flex") space available for
|
||||
special purposes.
|
||||
|
||||
<P>
|
||||
If the average size of the filenames is 10-13 characters, then a
|
||||
directory block can store 44 of them (11 entries/sector). A two-level
|
||||
B-tree arrangement can store 1,980 entries (1 * 44-entry directory root
|
||||
block + 44 directory leaf blocks * 44 entries/block) while a three-level
|
||||
structure could accommodate 87,164 files (the number of files in the
|
||||
two-level tree + 1,936 third-level directory leaf blocks * 44
|
||||
entries/block). So the 64 K of directory entries in our example can be
|
||||
searched in a maximum of 3 "hits" (disk accesses). The term "maximum"
|
||||
was used because it depends on what level the filename in question is
|
||||
stored in the B-tree structure and what's in the disk cache.
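<P>
The capacity figures can be reproduced with a short Python sketch. It follows the simplified counting used above (every level contributes 44 entries per block and each entry can lead to one block on the next level); the function names are mine.

<PRE>
```python
ENTRIES_PER_DIRBLK = 44  # 11 entries per sector * 4 sectors, for 10-13 character names


def dirblk_tree_capacity(levels: int, fanout: int = ENTRIES_PER_DIRBLK) -> int:
    """Entries held by a tree of the given depth, counting every level as the text does."""
    return sum(fanout ** level for level in range(1, levels + 1))


def levels_needed(entries: int, fanout: int = ENTRIES_PER_DIRBLK) -> int:
    """Smallest depth whose capacity covers the requested number of entries."""
    levels = 1
    while entries > dirblk_tree_capacity(levels, fanout):
        levels += 1
    return levels
```
</PRE>

<P>
With these assumptions dirblk_tree_capacity(2) is 1,980 and dirblk_tree_capacity(3) is 87,164, so levels_needed(65536) comes out at 3, matching the "maximum of 3 hits" above.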
|
||||
|
||||
<P>
|
||||
Adding files to a directory containing many files (say 500+) under FAT
|
||||
becomes an exasperating affair. I've often experienced this because a
|
||||
DOS program we've installed on hundreds of our customers' machines has
|
||||
648 files in a sub-sub-subdirectory. Watching the archive unpack on a
|
||||
machine without disk caching is bad news and it still slows down
|
||||
noticeably on machines with large SMARTDRIVE caches.
|
||||
|
||||
<P>
|
||||
Figure 3 shows a simple REXX program you can create to investigate this
phenomenon while Figure 4 tabulates some results. The program creates a
large number of zero-length files in a directory. Perform this test in
a subdirectory to overcome FAT's restriction on a maximum of 512 entries
in the root directory. Reformatting and rebooting were performed before
each test to ensure consistent conditions. With both FAT and HPFS, a
1,536 KB lazy-writing cache with a maximum cacheable read/write size of
8 KB was used. Note 1: with HPFS, a "zero-length" file consumes
diskspace because there is always a FNODE sector associated with a
file/directory, regardless of the file's contents. So 1,000 empty files
consume 500 KB of space. Note 2: there is a timing slop of about 0.1
seconds due to the 55 msec timer tick uncertainty affecting both the
start time and stop time values.
|
||||
|
||||
<PRE>
|
||||
/* Create or open a large number of empty files in a directory */
CALL Time 'R'                              /* Reset timer */

DO x = 1 TO 1000
   CALL STREAM 'file'||x, 'c', 'open'      /* Will create if not exist */
   CALL STREAM 'file'||x, 'c', 'close'
END

SAY Time('E')                              /* Report elapsed time */
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Fig 3: A REXX program to assess the directory searching and file
|
||||
creation speeds of FAT and HPFS.
|
||||
</FONT>
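<P>
If you don't have REXX to hand, the same experiment can be approximated in Python. This is a rough equivalent rather than a port: the function names are mine, the scratch directory is created wherever your system keeps temporary files (so point it at the partition you want to test), and the timings will not be comparable with Figure 4.

<PRE>
```python
import os
import tempfile
import time


def create_empty_files(directory: str, count: int) -> float:
    """Create `count` zero-length files in `directory`; return elapsed seconds."""
    start = time.perf_counter()
    for x in range(1, count + 1):
        # Opening in append mode creates the file if it does not exist
        with open(os.path.join(directory, f"file{x}"), "a"):
            pass
    return time.perf_counter() - start


def run_trial(count: int) -> tuple:
    """Time one run in a fresh scratch directory; return (seconds, files created)."""
    with tempfile.TemporaryDirectory() as scratch:
        elapsed = create_empty_files(scratch, count)
        return elapsed, len(os.listdir(scratch))
```
</PRE>

<P>
Calling run_trial(1000) mirrors the 1,000-file case; the scratch directory is removed when the trial finishes.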
|
||||
|
||||
<PRE>
|
||||
             Number of Files in a Directory (times in seconds)

             125    250    500   1000   2000   4000   4001->4100

FAT          1.7    3.4    8.0   23.4   99.4  468.4      26.6
FAT (LW)     0.7    1.7    5.1   17.9   89.6  447.3      26.1

HPFS         7.4   14.7   30.7   62.9  129.0  262.6       7.5
HPFS (LW)    0.5    1.0    2.2    4.5    9.0   18.3       0.5
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Fig 4: Timing results of the program in Figure 3. The beneficial effect
|
||||
of lazy writing on performance is clearly demonstrated. Tests were
|
||||
performed in an initially empty subdirectory except for the last one
|
||||
which adds 100 new files to a subdirectory already containing 4,000
|
||||
files.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
To investigate further, the full data set was plotted on a graph with
|
||||
logarithmic axes. Examine Figure 5. As you can see, HPFS' performance
|
||||
is reasonably linear (in y = a*x^b + c, b was actually 1.1) while FAT's
|
||||
performance appears to follow a third-order polynomial (y = a*x^3 +
|
||||
b*x^2 + c*x + d). It is apparent that FAT's write caching becomes less
|
||||
effective when many files are in a directory presumably because much
|
||||
time is being spent sifting through the FAT in memory. (Disk access was
|
||||
only occurring briefly about once a second based on the flashing of the
|
||||
HD light). HPFS' performance was dramatically improved in this test by
|
||||
the use of write caching. Again, disk access was about once a second
|
||||
(due to CACHE's /MAXAGE:1000 parameter). While, typically, most disk
|
||||
access will involve reading rather than writing, this graph shows how
|
||||
effective lazy writing is at hiding the extra work from the user. It is
|
||||
also apparent that HPFS handles large numbers of files well. We now
|
||||
turn to examining how this improvement is achieved.
|
||||
|
||||
<P>
|
||||
<A HREF="fig5.gif">
|
||||
<IMG WIDTH=100 HEIGHT=57 SRC="fig5_small.gif"></A>
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Fig. 5: Log-log graph comparing file system performance creating test
|
||||
files in a subdirectory. Extra data points shown. Number of files was
|
||||
increased using a cube-root-of-2 multiple. (Click for large version.)
|
||||
</FONT>
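<P>
The shape of the two curves can be checked directly from the Figure 4 numbers. On a log-log plot a power law y = a*x^b appears as a straight line of slope b, so the slope between neighbouring points is a quick way to see how steeply each file system's time grows. The sketch below uses only the six tabulated points (not the fuller data set behind the graph), and the function name is mine.

<PRE>
```python
from math import log2

FILES = [125, 250, 500, 1000, 2000, 4000]
SECONDS = {
    "FAT": [1.7, 3.4, 8.0, 23.4, 99.4, 468.4],
    "FAT (LW)": [0.7, 1.7, 5.1, 17.9, 89.6, 447.3],
    "HPFS": [7.4, 14.7, 30.7, 62.9, 129.0, 262.6],
    "HPFS (LW)": [0.5, 1.0, 2.2, 4.5, 9.0, 18.3],
}


def local_slopes(xs: list, ys: list) -> list:
    """Slope of the log-log curve between each pair of neighbouring points."""
    return [
        log2(ys[i + 1] / ys[i]) / log2(xs[i + 1] / xs[i])
        for i in range(len(xs) - 1)
    ]
```
</PRE>

<P>
Both HPFS rows stay close to a slope of 1 across the whole range, whereas the FAT rows start near 1 and climb past 2 by the 2,000 to 4,000 file step, which is the steepening you would expect from a higher-order polynomial.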
|
||||
|
||||
<P>
|
||||
<H2>Directory Location and Fragmentation</H2>
|
||||
|
||||
<P>
|
||||
Subdirectories on a FAT disk are usually splattered all around it.
|
||||
Similarly, entries in a subdirectory may not all be in contiguous
|
||||
sectors on the disk. Searching a FAT system's directory structure can
|
||||
involve a large amount of HD seeking back and forth, i.e. more time.
|
||||
Sure, you can use a defragger option to move all the directories to the
|
||||
front of the disk, but this usually takes a lot of time to reshuffle
|
||||
everything and the next time you create a new subdirectory or add files
|
||||
to an existing subdirectory there will be no free space at the front so
|
||||
directory separation and fragmentation will occur again.
|
||||
|
||||
<P>
|
||||
HPFS takes a much better approach. On typical partitions (i.e. not
|
||||
very small ones) a directory band, containing many DIRBLKs, is placed at
|
||||
or near the seek centre (half the maximum cylinder number). On a 100 MB
|
||||
test partition the directory band starts at Cyl 48 (counting from 0) of
|
||||
a volume that spans 100 cylinders. Here 1,980 contiguous Directory
|
||||
sectors (just under 1 MB) were situated. Assuming 11 entries per
|
||||
Directory sector (44 entries per DIRBLK), this means that the first
|
||||
21,780 directory entries will be next to each other. So if a blind file
|
||||
search needs to be performed this can be done with just 1 or 2 long disk
|
||||
reads (assuming <20,000 files and 1-2 MB disk cache). The maximum
|
||||
size of the contiguous directory band appears to be 8,000 KB for about
|
||||
176,000 entries with 13-character names. Once the directory band is
|
||||
completely full, new Directory sectors are scattered throughout the
|
||||
partition but still in four-sector DIRBLKs.
|
||||
|
||||
<P>
|
||||
Another important aspect of HPFS' directory band is its location. By
|
||||
being situated near the seek centre rather than at the very beginning
|
||||
(as in FAT), the average distance that the heads must traverse, when
|
||||
moving between files and directories, is halved. Of course, with lazy
|
||||
writing, traversals to frequently update a directory entry while writing
|
||||
to a temporary file, would be much reduced anyway.
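<P>
The "halved" claim is easy to check with a toy model in Python. It assumes, purely for illustration, that files are spread evenly over all cylinders and that seek cost is proportional to the number of cylinders crossed; neither assumption comes from the HPFS design documents.

<PRE>
```python
def mean_seek_distance(cylinders: int, directory_cyl: int) -> float:
    """Average head travel from the directory to a file on a uniformly random cylinder."""
    return sum(abs(c - directory_cyl) for c in range(cylinders)) / cylinders
```
</PRE>

<P>
On a 1,000-cylinder volume, mean_seek_distance(1000, 0) is 499.5 cylinders with the directory at the very beginning, against 250.0 for mean_seek_distance(1000, 500) with it at the seek centre.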
|
||||
|
||||
<P>
|
||||
<H2>File Location and Fragmentation</H2>
|
||||
|
||||
<P>
|
||||
HPFS expends a lot of effort to keep a file either in one piece if
|
||||
possible or otherwise within a minimum number of pieces and close
|
||||
together on the disk so it can be retrieved in the minimum number of
|
||||
reads (remembering also that cache read-ahead can take in more than one
|
||||
nearby piece in the same read). Also, the seek distance, and hence time
|
||||
required to access extra pieces, is kept to an absolute minimum. The
|
||||
main design philosophy of HPFS is that mechanical head movement is a
|
||||
very time-consuming operation in CPU terms. So it is worthwhile doing
|
||||
more work looking for a good spot on the disk to place the file. There
|
||||
are many aspects to this and I'm sure there are plenty of nuances of
|
||||
which I'm ignorant.
|
||||
|
||||
<P>
|
||||
Files are stored in 8 MB contiguous runs of sectors known as data bands.
|
||||
Each data band has a four-sector (2 KB) freespace bitmap situated at
|
||||
either the band's beginning or end. Consecutive data bands have
|
||||
tail-to-head placement of the freespace bitmaps so that maximum
|
||||
contiguous filespace is 16 MB (actually 16,380 KB due to the presence of
|
||||
the bitmaps within the adjoining band). See Figure 6.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=403 HEIGHT=213 SRC="fig6.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Fig. 6: The basic data layout of an HPFS volume
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
Near the start of the partition there is a list of the sectors where
|
||||
each of the freespace bitmaps commences. I'm sure that this small list
|
||||
would be kept loaded into memory for performance reasons. Having two
|
||||
small back-to-back bitmaps adjoining a combined 16 MB data band is
|
||||
advantageous when HPFS is looking for the size of each freespace region
|
||||
within bands, prior to allocating a large file. But it does mean that a
|
||||
fair number of seeks to different bitmaps might need to be performed on
|
||||
a well-filled disk, in search of a contiguous space. Or perhaps these
|
||||
bitmaps are also kept memory resident if the disk is not too big.
|
||||
|
||||
<P>
|
||||
A 2 GB file would be split into approximately 128 chunks of 16 MB, but
|
||||
these chunks are right after each other (allowing for the presence of
|
||||
the intervening 4 KB of back-to-back freespace bitmaps). So to refer to
|
||||
this file as "fragmented", while technically correct, would be
|
||||
misleading.
|
||||
|
||||
<P>
|
||||
As mentioned earlier, every file has an associated FNODE, usually right
|
||||
before the start of the file. The pieces a file is stored in
are referred to as extents. A "zero-length" file has 0 extents; a
|
||||
contiguous file has 1 extent; a file of 2-8 extents is "nearly"
|
||||
contiguous (the extents should be close together).
|
||||
|
||||
<P>
|
||||
An FNODE sector contains:
|
||||
|
||||
<UL>
|
||||
<LI>The real filename length;
|
||||
<LI>The first 15 characters of the filename;
|
||||
<LI>Pointer to the directory LSN that contains this file;
|
||||
<LI>EAs (Extended Attributes) are completely stored within the FNODE
|
||||
structure if the total of the EAs is 145 bytes or less;
|
||||
<LI>0-8 contiguous sector runs (extents), organised as eight LSN
|
||||
run-starting-points (dword), run lengths (dword) and offsets into
|
||||
the file (dword).
|
||||
</UL>
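<P>
To see how such an extent list is used, here is a minimal Python sketch that translates a position within a file into a sector on the disk. The Extent type and its field names are hypothetical, and I've assumed the file offsets are counted in sectors; the real on-disk field layout is not reproduced here.

<PRE>
```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    file_offset: int  # first file sector covered by this run (assumed unit: sectors)
    length: int       # run length in sectors
    start_lsn: int    # logical sector number where the run begins on disk


def lsn_for_file_sector(extents: List[Extent], file_sector: int) -> Optional[int]:
    """Translate a sector offset within the file into an on-disk LSN."""
    for ext in extents:
        delta = file_sector - ext.file_offset
        if delta >= 0 and ext.length > delta:
            return ext.start_lsn + delta
    return None  # beyond the end of the file


# A made-up file stored in two extents of 10 and 5 sectors
example = [Extent(0, 10, 1000), Extent(10, 5, 5000)]
```
</PRE>

<P>
With at most eight entries to scan, the lookup needs nothing beyond the FNODE sector itself.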
|
||||
|
||||
<P>
|
||||
A run can be up to 16 MB (back-to-back data bands) in size. If the file
|
||||
is too big or more fragmented than can be described in 8 extents, then
|
||||
an ALNODE (allocation block) is pointed to from the FNODE. In this case
|
||||
the FNODE structure changes so that it now contains up to 12 ALNODE
|
||||
pointers within the FNODE and each ALNODE can then point to either 40
|
||||
direct sector runs (extents) or to 60 further ALNODEs, and each of these
|
||||
lower-level ALNODEs could point to either... and so on.
|
||||
|
||||
<P>
|
||||
If ALNODEs are involved then a modified balanced tree structure called a
|
||||
B+tree is used with the file's FNODE forming the root of the structure.
|
||||
So only a two-level B+tree would be required to completely describe a 2
|
||||
GB (or smaller) file if it consists of no more than 480 runs (12 ALNODEs *
|
||||
40 direct runs described in each ALNODE). Otherwise a 3-level structure
|
||||
would have no problems since it can handle up to 28,800 runs (12 ALNODEs
|
||||
* 60 further ALNODEs * 40 direct runs). It's difficult to imagine a
|
||||
situation where a four or higher level B+tree would ever be needed.
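<P>
The run limits follow directly from the pointer counts given above, as this small Python sketch shows. The constant and function names are mine; the FNODE is counted as level 1 of the tree.

<PRE>
```python
FNODE_DIRECT_RUNS = 8       # extents listed directly in the FNODE
FNODE_ALNODE_POINTERS = 12  # ALNODE pointers an FNODE can hold instead
ALNODE_LEAF_RUNS = 40       # extents described by a leaf ALNODE
ALNODE_NODE_POINTERS = 60   # child pointers in an intermediate ALNODE


def max_runs(levels: int) -> int:
    """Most extents describable with the FNODE as level 1 of the tree."""
    if levels == 1:
        return FNODE_DIRECT_RUNS
    intermediate = ALNODE_NODE_POINTERS ** (levels - 2)
    return FNODE_ALNODE_POINTERS * intermediate * ALNODE_LEAF_RUNS
```
</PRE>

<P>
max_runs(2) and max_runs(3) give the 480 and 28,800 quoted above, while a fourth level would already allow 1,728,000 runs.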
|
||||
|
||||
<P>
|
||||
Consider how much disk activity would be required to work out the layout
|
||||
of a 2 GB file under FAT and under HPFS. With FAT the full 128 KB of
|
||||
the FAT must be read to determine the file's layout. If this layout can
|
||||
be kept in the cache during the file access then fine. Otherwise the
|
||||
FAT would need to be reread one or more times (probably starting from
|
||||
the beginning on each reread). With HPFS, up to 361 sector reads, in a
|
||||
three-level B+tree structure, and possibly up to just 13 sector reads,
|
||||
in a two-level structure, would provide the information. The HPFS
|
||||
figures are maximums and the actual sector-read figure would most
|
||||
probably be much lower since HPFS was trying hard to reduce the number
|
||||
of runs when the file was written. Also if the ALNODEs are near each
|
||||
other then read-ahead would reduce the actual hits. Furthermore, OS/2
|
||||
will keep the file's allocation information resident in memory while the
|
||||
file is open, so no rereads would be needed.
|
||||
|
||||
<P>
|
||||
If you've ever looked at the layout of files on a HPFS partition, you
|
||||
may have been shocked to see the large gaps in the disk usage. This is
|
||||
FAT-coloured thinking. There are good reasons not to use the first
|
||||
available spot next to an existing file, particularly in a multitasking
|
||||
environment where more than one write operation can be occurring
|
||||
concurrently. HPFS uses three strategies here that I'm aware of.
|
||||
First, the destination of write operations involving new files will tend
|
||||
not to be near (preferably in a different band from) where an existing
|
||||
file is also being updated. Otherwise, fragmentation would be highly
|
||||
likely to occur.
|
||||
|
||||
<P>
|
||||
Second, 4 KB of extra space is allocated by the file system to the end
|
||||
of a file when it is created. Again the reason is to reduce the
|
||||
likelihood of fragmentation from other concurrent writing tasks.
|
||||
If not utilised, this space is recovered afterwards. To test this
|
||||
assertion, create the REXX cmdfile shown in Figure 7 and run it on an
|
||||
empty HPFS partition. (You can also do this on a partition with files
|
||||
in it but it is easier on an empty one.) Run it and when the "Press any
|
||||
key" message appears start up another OS/2 session and run CHKDSK (no
|
||||
switches) on the partition under examination. CHKDSK will get confused
|
||||
about the space allotted to the file open in the other session and will
|
||||
say it is correcting an allocation error (which it really isn't doing
|
||||
because you did not use the /F switch). Ignore this and notice that "4
|
||||
kilobytes are in 1 user files". Switch back to the other session and
|
||||
press Enter to close the file. Repeat and again run CHKDSK in the other
|
||||
session. Notice this time that no extra space is allocated since the
|
||||
file is being reopened rather than being created.
|
||||
|
||||
<PRE>
|
||||
/* Test to check the space preallocated to an open file */

CALL STREAM 'zerofile', 'c', 'open'    /* Will create if it does not exist */
'@pause'
CALL STREAM 'zerofile', 'c', 'close'
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Fig. 7: A simple REXX program to demonstrate how HPFS allocates 4 KB of
|
||||
diskspace to a new file.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
Third, if a program has been written to report the likely filesize to
|
||||
OS/2, or if you are copying an existing file (i.e. the final filesize
|
||||
is known) then HPFS will expend a great deal of effort to find a free
|
||||
space big enough to accommodate the file in one extent. If that is not
|
||||
possible then it looks for two free spaces half the size of the file and
|
||||
so on. Again this can result in two files in a directory not being next
|
||||
to each other on the disk.
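<P>
The third strategy can be modelled roughly as follows. This Python sketch is my own guess at the general idea, not HPFS's actual algorithm: it assumes that "and so on" means doubling the number of pieces each time, and that the smallest free runs that are still big enough are preferred. All names are hypothetical and sizes are in sectors.

<PRE>
```python
def plan_allocation(size: int, free_runs: list) -> list:
    """Return the lengths of the free runs chosen, or [] if no plan fits.

    Tries 1 piece, then 2, then 4 and so on; the doubling step is an assumption.
    """
    pieces = 1
    while size >= pieces:
        piece = -(-size // pieces)  # ceiling division: sectors needed per piece
        candidates = sorted(r for r in free_runs if r >= piece)
        if len(candidates) >= pieces:
            return candidates[:pieces]  # smallest runs that are still big enough
        pieces *= 2
    return []
```
</PRE>

<P>
For a 100-sector file, plan_allocation(100, [30, 120, 60]) picks the single 120-sector run, while plan_allocation(100, [30, 60, 55, 10]) has to settle for the 55 and 60-sector runs, i.e. two extents.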
|
||||
|
||||
<P>
|
||||
Since DOS and Windows programs are not written with preallocation space
|
||||
requesting in mind, they tend to be more likely candidates for
|
||||
fragmentation than properly written OS/2 programs. So, for example,
|
||||
using a DOS comms program to download a large file will often result in
|
||||
a fragmented file. Compared with FAT, though, fragmentation on heavily
|
||||
used HPFS volumes is very low, usually less than 1%. We'll consider
|
||||
fragmentation levels in more depth in Part 3.
|
||||
|
||||
<P>
|
||||
<H2>Other Matters</H2>
|
||||
|
||||
<P>
|
||||
It has also been written that the HPFS cache is smart enough to adjust
|
||||
the value of its sector read-ahead for each opened file based on the
|
||||
file's usage history or its type (Ray Duncan, 1989). It is claimed that
|
||||
EXE files and files that typically have been fully read in the past are
|
||||
given big read-aheads when next loaded. This is a fascinating concept
|
||||
but unfortunately it has not been implemented.
|
||||
|
||||
<P>
|
||||
Surprisingly, like other device drivers, HPFS is still 16-bit code. I
|
||||
think this is one of the few remaining areas of 16-bit code in Warp. I
|
||||
believe IBM's argument is that 32-bit code here would not help
|
||||
performance much as mechanical factors are the ones imposing the limits,
|
||||
at least in typical single-user scenarios.
|
||||
|
||||
<P>
|
||||
HPFS is run as a ring 3 task in the 80x86 processor protection mechanism
|
||||
i.e. at the application level. HPFS386 is a 32-bit version of HPFS
|
||||
that comes only with IBM LAN SERVER Advanced Version. HPFS386 runs in
|
||||
ring 0, i.e. at kernel level. This ensures the highest file system
|
||||
performance in demanding network situations. It can also provide much
|
||||
bigger caches than standard HPFS which is limited to 2 MB. There is a
|
||||
chance that this version will appear in a later release of Warp.
|
||||
|
||||
<P>
|
||||
OS/2 v2.x onwards also boosts the performance of FAT. This improvement,
|
||||
called "Super FAT", is a combination of 32-bit executable code and the
|
||||
mirroring of the FAT and directory paths in RAM. This requires a fair
|
||||
bit of memory. Super FAT also speeds the search for free space by
keeping an in-memory bitmap of the used sectors in the FAT. This does
|
||||
help the performance but I think the results in Figure 4, which were
|
||||
performed using the Super FAT system, still highlight FAT's
|
||||
architectural weaknesses.
|
||||
|
||||
<P>
|
||||
You can easily tell whether a partition is formatted under HPFS or FAT. Just
|
||||
run DIR in the root directory. If "." and ".." directory entries are shown
|
||||
then HPFS is used [Unless the HPFS partition was formatted under Warp 4 -- Ed].
|
||||
|
||||
<P>
|
||||
<H2>Conclusion</H2>
|
||||
|
||||
<P>
|
||||
HPFS does require 300-400 KB of memory to implement, so it's only
|
||||
suitable for OS/2 v2.1 systems with at least 12 MB or Warp systems with
|
||||
at least 8 MB. For partitions of 100 MB+ it offers definite technical
|
||||
advantages over FAT. By now you should have developed an understanding
|
||||
of how these improvements are achieved.
|
||||
|
||||
<P>
|
||||
In the next installment, we look at a shareware program to visually
|
||||
inspect the layout of a HPFS partition and a REXX program to dump the
|
||||
contents of a disk sector by specifying either decimal LSN, hexadecimal
|
||||
LSN, dword byte-order-reversed hexadecimal LSN (what you see when you
|
||||
look at a dword pointer in a hex dump) or Cyl/Hd/Sec coordinates. Other
|
||||
REXX programs will convert the data stored in the SuperBlock and the
|
||||
SpareBlock sectors into intelligible values. You should find it quite
|
||||
informative.
|
||||
1171
study/sabre/os/files/FileSystems/HPFS/hpfs2.html
Normal file
804
study/sabre/os/files/FileSystems/HPFS/hpfs3.html
Normal file
@@ -0,0 +1,804 @@
|
||||
<H1>Inside the High Performance File System</H1>
|
||||
<H2>Part 3: Fragmentation, Diskspace Bitmaps and Code Pages</H2>
|
||||
Written by Dan Bridges
|
||||
|
||||
<H2>Introduction</H2>
|
||||
|
||||
<P>
|
||||
This article originally appeared in the May 1996 issue of Significant
|
||||
Bits, the monthly magazine of the Brisbug PC User Group Inc.
|
||||
|
||||
<P>
|
||||
This month we look at how HPFS knows which sectors are occupied and which ones
|
||||
are free. We examine the amount of file fragmentation on five HPFS volumes and
|
||||
also check out the fragmentation of free space. A program will be presented to
|
||||
show free runs and some other details. Finally, we'll briefly discuss Code
|
||||
Pages and look at a program to display their contents.
|
||||
|
||||
<P>
|
||||
<H2>How Sectors are Mapped on a HPFS Volume</H2>
|
||||
|
||||
<P>
|
||||
The sector usage on a HPFS partition is mapped in data band bitmap blocks.
|
||||
These blocks are 2 KB in size (four sectors) and are usually situated at either
|
||||
the beginning or end of a data band. A data band is almost 8 MB. (Actually
|
||||
8,190 KB since 2 KB is needed for its bitmap.) See Figure 1. The state of each
|
||||
bit in the block indicates whether or not a sector (HPFS' allocation unit) is
|
||||
occupied. If a bit is set (1) then its corresponding sector is free. If the
bit is not set (0) then the sector is occupied. Structures situated within the
confines of a data band such as Code Page Info & Data sectors, Hotfix sectors,
the Root Directory DirBlk etc. are all marked as fully occupied within that
band's usage bitmap.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=435 HEIGHT=257 SRC="fig1.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 1: The basic data layout of a HPFS volume.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
Since each bit maps a sector, a byte maps eight sectors and the complete 2 KB
|
||||
block maps the 16,384 sectors (including the bitmap block itself) in an 8 MB
|
||||
band. And since two blocks can face each other, we arrive at the maximum
|
||||
possible extent (fragment) size of 16,380 KB. Examine Figure 2 now to see
|
||||
examples of file and freespace mapping.
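<P>
The band sizes quoted so far all follow from the 2 KB bitmap, as this short Python sketch shows. The names are mine; the only inputs are the 512-byte sector and the four-sector bitmap block.

<PRE>
```python
SECTOR_BYTES = 512
BITMAP_SECTORS = 4  # one 2 KB bitmap block per data band


def band_geometry() -> dict:
    """Derive the data band sizes from the size of the bitmap block."""
    bitmap_bytes = BITMAP_SECTORS * SECTOR_BYTES
    sectors_mapped = bitmap_bytes * 8            # one bit per sector
    band_kb = sectors_mapped * SECTOR_BYTES // 1024
    data_kb = band_kb - bitmap_bytes // 1024     # band minus its own bitmap
    return {
        "sectors_mapped": sectors_mapped,
        "band_kb": band_kb,
        "data_kb": data_kb,
        "max_extent_kb": 2 * data_kb,            # two adjoining bands, bitmaps at the far ends
    }
```
</PRE>

<P>
This yields 16,384 sectors per band, 8,192 KB per band of which 8,190 KB is available for data, and a maximum extent of 16,380 KB.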
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=429 HEIGHT=302 SRC="fig2.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 2: The correspondence of the first five bytes in a data band's usage
|
||||
bitmap to the first 40 sectors in the band.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
In this example we see 23 occupied sectors ("u") and 4 unoccupied areas (".")
|
||||
which we will refer to as "freeruns" [of sectors]. At one extreme, the 23
|
||||
sectors might belong to one file (here in four extents) while at the other
|
||||
extreme we might have the FNODEs of 23 "zero-length" files. (Every file and
|
||||
directory entry on a HPFS volume must have an FNODE sector.)
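<P>
Finding the freeruns in a bitmap is a simple scan, sketched here in Python. A set bit means a free sector, as described earlier. The bit order within each byte is an assumption on my part (lowest bit = lowest-numbered sector), and the function name is hypothetical.

<PRE>
```python
def free_runs(bitmap: bytes) -> list:
    """Return (first_sector, length) for each run of free sectors (bit value 1)."""
    runs = []
    start = None
    total_bits = len(bitmap) * 8
    for sector in range(total_bits + 1):  # one extra step closes a run at the end
        is_free = (
            sector != total_bits
            and (bitmap[sector // 8] >> (sector % 8)) % 2 == 1
        )
        if is_free and start is None:
            start = sector
        elif not is_free and start is not None:
            runs.append((start, sector - start))
            start = None
    return runs
```
</PRE>

<P>
Given the two bytes 0x0F and 0xF0, the function reports a freerun of four sectors starting at sector 0 and another of four sectors starting at sector 12, with eight occupied sectors in between.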
|
||||
|
||||
<P>
|
||||
The advantages of the bitmap approach are twofold. First, the small allocation
|
||||
unit size on a HPFS volume means greatly reduced allocation unit wastage
|
||||
compared to large FAT partitions. Second, the compact mapping structure makes
|
||||
it feasible for HPFS to quickly search a data band for enough free space to slot
|
||||
in a file of known size, in one piece if possible. For example, as just
|
||||
mentioned HPFS can map 32,760 allocation units with just 4 KB of bitmaps whereas
|
||||
a 16-bit FAT structure requires 64 KB (per FAT copy) to map 32,768 allocation
|
||||
units.
|
||||
|
||||
<P>
|
||||
<H2>A Fragmentation Analysis</H2>
|
||||
|
||||
<P>
|
||||
In this section we'll examine the level of fragmentation on the five HPFS
|
||||
partitions of my first HD. Look at Figure 3. Notes:
|
||||
|
||||
<P>
|
||||
1. A time-since-last-defrag figure of "Never" means that I've never run a
|
||||
defragger across this partition since upgrading to OS/2 Warp 118 days ago. This
|
||||
value is stored in the SuperBlock (LSN 16) and was determined by using the
|
||||
ShowSuperSpare REXX program featured in Part 2.
|
||||
|
||||
<P>
|
||||
2. The fragmentation levels were reported by the wondrous FST (freeware) with
|
||||
"FST -n check -f C:" while the names of the fragmented files and their sizes
|
||||
came from the GammaTech Utilities (commercial) "HPFSOPT C: -u -d -o1 -l
|
||||
logfile". You can also use the Graham Utilities (commercial) "HPFS-EXT C: -s".
|
||||
|
||||
<P>
|
||||
3. The high number of files with 0 data extents on C: is due to the presence of
|
||||
the WPS folders on this drive. Each of these has "zero" bytes in the main file
|
||||
but they usually have bytes in EAs.
|
||||
|
||||
<P>
|
||||
4. Files with 0 or 1 extents are considered to be fully contiguous, so I've placed
|
||||
them in one grouping.
|
||||
|
||||
<P>
|
||||
5. Files with 2-8 extents are considered to be "nearly" contiguous since the
|
||||
fragments will usually be placed close together on the disk and also because a
|
||||
list of the location and length of up to 8 extents can be kept in a file's FNODE
|
||||
sector. This list will be kept memory resident while the file is open. Note 1:
|
||||
the extents themselves can not be kept memory resident since, theoretically,
|
||||
they could be up to 8*16,380 KB in size. But no non-data disk reads, after the
|
||||
initial read of the FNODE, would be required to work with the file. Note 2:
|
||||
under some circumstances, the 8 extents, if small enough, could be kept memory
|
||||
resident in the sense that they could be held in HPFS' cache. We will consider
|
||||
FNODEs in detail in a later installment.
|
||||
|
||||
<P>
|
||||
6. Files with more than 8 extents have too many fragments to be listed in their
|
||||
FNODEs. Instead a B+tree allocation sector structure (an ALSEC) is used to map
the extents. The sector mappings are small enough to keep memory resident while
the file is open. ALSECs will be covered in a later installment.
|
||||
|
||||
<P>
|
||||
7. EAs are usually not fragmented since, in the current implementation of OS/2,
|
||||
the total EA size associated with any one file is only 64 KB. If a file has EAs
|
||||
in 0 extents then the EA information is stored completely within the FNODE
|
||||
sector. (There is space in the FNODE for up to 145 bytes of "internal" EAs.)
|
||||
In all other cases on my system they are currently stored in single, external runs
|
||||
of sectors. EAs will be covered in later installments.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=443 HEIGHT=490 SRC="fig3.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 3: Fragmentation analysis of five HPFS partitions.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
We now turn to the topic of what circumstances are leading to file fragmentation
|
||||
on these partitions.
|
||||
|
||||
<P>
|
||||
C: - The OS/2 system partition. I've run out of space on this drive on
occasions. Activity here occurs through the running of Fixpacks (FP 16 and then
|
||||
FP 17 were run), INI maintenance utilities and driver upgrades. There is really
|
||||
nothing of concern here. Most HPFS defraggers suggest not trying to defrag
|
||||
files that have less than 2 or 3 extents since you run the risk of fragmenting
|
||||
the free space. We will return to this topic shortly.
|
||||
|
||||
<P>
|
||||
D: - My main work area and the location of communications files. I use the DOS
|
||||
comms package TELEMATE because I've always liked its features (although OS/2 has
|
||||
to work hard to handle its modem access during a file transfer - OS/2 comms
|
||||
programs, in general, are much less demanding of the CPU's attention). The
|
||||
other major comms package I use is OS/2 BinkleyTerm v2.60 feeding OS/2 Squish
|
||||
message databases. The fragmented files consist mainly of files downloaded by
|
||||
TELEMATE (DOS comms programs do not inform HPFS, ahead of time, of how much
|
||||
space the downloaded file will occupy) and Squish databases (*.SQD). The drive
|
||||
was defragged 53 days ago at which time no special effort was made to reduce
|
||||
file fragmentation below 2-3 extents, accounting for the presence of 245 files
|
||||
with two extents. This really is an insignificant amount regardless of what the
|
||||
4% figure may lead you to believe.
|
||||
|
||||
<P>
|
||||
The most fragmented file on this partition is a 150 KB BinkleyTerm logfile with
|
||||
30 extents. The main reason I can see for fragmentation in this case is that
|
||||
the file is frequently being updated with information while file transfers are
|
||||
in progress. The Squish databases are also prone to fragmentation. Out of a
|
||||
total of 25 database files there were 8, averaging 500 KB each, with an average
|
||||
of 15 extents.
|
||||
|
||||
<P>
|
||||
E: - The fragmentation here was insignificant apart from a single 2.8 MB
|
||||
executable Windows program that has had a DOS patch program run over it,
|
||||
resulting in 38 fragments. The 2-extent files were mainly data files that are
|
||||
produced by this same Windows package (being run under WIN-OS2).
|
||||
|
||||
<P>
|
||||
F: - Almost no fragmentation since this partition is reserved for DOS programs
|
||||
and I don't use them much.
|
||||
|
||||
<P>
|
||||
G: - My second major work partition. Fragmentation is low and unlikely to go
|
||||
much lower since 2 extents is considered below the point of defragger
|
||||
involvement.
|
||||
|
||||
<P>
|
||||
The conclusion to be drawn from the above is that, if you don't get too hot
|
||||
under the collar about some files having 2 or 3 extents then there will
|
||||
generally be little need to worry about fragmentation under HPFS. Only certain
|
||||
types of files (some comms/DOS/Windows) will be candidates. And keeping
|
||||
partitions less than 80% full should help reduce general fragmentation as well.
|
||||
|
||||
<P>
|
||||
<H2>Defragmenting Files</H2>
|
||||
|
||||
<P>
|
||||
Since fragmentation is a relatively minor concern under HPFS there is not much
|
||||
of an argument for purchasing OS/2 utilities based mainly on their ability to
|
||||
defragment HPFS drives, especially since it's not hard to defragment files
|
||||
yourself. You see, providing there is enough contiguous freespace on a volume,
|
||||
the mere act of copying the files to a temporary directory, deleting the
|
||||
original and then moving the files back will usually eliminate, or at least
|
||||
reduce fragmentation since HPFS, knowing the original filesize, will look for a
|
||||
suitably sized freespace. The success of this technique is demonstrated in
|
||||
Figure 4 where 25 Squish database files (*.SQD) totalling 5.7 MB were shuffled
|
||||
about on D:. Note: don't use the MOVE command to initially transfer the files
|
||||
to the temp directory since this will just alter the directory entry rather than
|
||||
actually rewriting the files.
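<P>
The copy, delete and move-back sequence can be sketched in a few lines of Python. This is only an illustration of the order of operations: the function name rewrite_files and the scratch directory are my own inventions, and whether the new copy really ends up in fewer extents depends on the file system and on how much contiguous freespace it can find.

```python
import shutil
from pathlib import Path

def rewrite_files(paths, scratch_dir):
    """Copy each file to scratch_dir, delete the original, then move it back.

    Hypothetical helper. The copy step is what forces the data to be written
    out again; a plain move on the same volume would only change the
    directory entry.
    """
    scratch = Path(scratch_dir)
    scratch.mkdir(parents=True, exist_ok=True)
    for original in map(Path, paths):
        temp_copy = scratch / original.name
        shutil.copy2(original, temp_copy)            # real copy: data is rewritten
        original.unlink()                            # release the old extents
        shutil.move(str(temp_copy), str(original))   # put the fresh copy back
```

The scratch directory should sit on the same volume as the files, so that the final move is just a rename of the freshly written copy.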
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=159 HEIGHT=232 SRC="fig4.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 4: Number of extents in 25 SQD files before and after the defrag process
|
||||
described in the text.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
I've used the GU's HPFS-EXT to report these figures. This is freely available
|
||||
in the GULITE demo package. Note: the fully functional HPFSDFRG is also in
|
||||
this package but I wanted to show that it's not that hard to do this by hand.
|
||||
HPFSDFRG does much the same as I did except that you can specify the
|
||||
optimisation threshold (minimum number of extents before a file becomes a
|
||||
candidate) and it will retry the copying operation up to ten times if there are
|
||||
more extents after the operation than before it (due to heavily fragmented
|
||||
freespace).
|
||||
|
||||
<P>
|
||||
<H2>The Fragmentation of Freespace</H2>
|
||||
|
||||
<P>
|
||||
Another significant aspect of HPFS' fragmentation resistance is how well the FS
|
||||
keeps disk freespace in big, contiguous chunks. If the current files on a
|
||||
partition are relatively fragmentation free but the remaining freespace is
|
||||
arranged in lots of small chunks then there is a good chance that new files will
|
||||
be fragmented. You can check this with "FST -n info -f C:". This produces a
|
||||
table that counts the number of freespace extents that are 1, 2-3, 4-7, 8-15,
|
||||
... 16384-32767 sectors long. In my opinion though it is more important to
|
||||
consider the product of the actual extent size by their frequency since the
|
||||
presence of numerous 1-extent spaces is not important if there are still a
|
||||
number of large spaces available.
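<P>
To make the "size times frequency" idea concrete, here is a rough Python sketch that groups freespace runs into the same power-of-two size ranges and reports both the run count and the total sectors in each range. The function names are made up for this illustration; the run lengths are the nine freeruns listed in Figure 5.

```python
from collections import defaultdict

def bucket_label(run_sectors):
    """Return the power-of-two range (low, high) that a run length falls in."""
    low = 1 << (run_sectors.bit_length() - 1)
    return low, 2 * low - 1

def weighted_histogram(run_lengths):
    """Map each size range to (number of runs, total sectors in those runs)."""
    table = defaultdict(lambda: [0, 0])
    for sectors in run_lengths:
        entry = table[bucket_label(sectors)]
        entry[0] += 1
        entry[1] += sectors
    return {bucket: tuple(pair) for bucket, pair in sorted(table.items())}

# Freerun lengths in sectors, copied from the Figure 5 listing.
runs = [32634, 32760, 16380, 16366, 2015, 28668, 32760, 32760, 8156]
for (low, high), (count, sectors) in weighted_histogram(runs).items():
    print(f"{low:>6}-{high:<6} runs={count}  sectors={sectors}")
```

Reading the sectors column rather than the runs column shows at a glance where the bulk of the freespace actually lives.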
|
||||
|
||||
<P>
|
||||
Figure 5 shows the output of the REXX program ShowFreeruns.cmd. The partition
|
||||
of 100 MB is almost empty. The display shows the location of the 2 KB block
|
||||
that holds the list of the starting LSNs of each bitmap block (this figure comes
|
||||
from the dword at offset 18h in the SuperBlock), the location of each bitmap
|
||||
block on the left and the sector size and location of freespace on the right.
|
||||
As you see, this partition has 13 data bands, 6 of which face each other. A
|
||||
version of ShowFreeruns.cmd that only outputs the run size was used to generate
|
||||
a list of figures. This list was loaded into a spreadsheet, sorted and a
|
||||
frequency distribution performed. See Figure 6. You can see that C: has no
|
||||
large areas remaining, D: has the majority of its freespace in the 4 MB < 8 MB
|
||||
range and that E:, F: and G: have kept large majorities of their freespace in
|
||||
very big runs. Overall, this is quite good performance.
|
||||
|
||||
<PRE>
|
||||
Inspecting drive O:
|
||||
|
||||
List of Bmp Sectors: 0x00018FF0 (102384)
|
||||
|
||||
Space-Usage Bitmap Blocks:
|
||||
Freespace Runs:
|
||||
|
||||
0x00000014-00000017 (20-23)
|
||||
0x00007FFC-00007FFF (32764-32767)
|
||||
130-32763 (#1:32634)
|
||||
|
||||
0x00008000-00008003 (32768-32771)
|
||||
0x0000FFFC-0000FFFF (65532-65535)
|
||||
32772-65531 (#2:32760)
|
||||
|
||||
0x00010000-00010003 (65536-65539)
|
||||
0x00017FFC-00017FFF (98300-98303)
|
||||
65540-81919 (#3:16380)
|
||||
81926-98291 (#4:16366)
|
||||
|
||||
0x00018000-00018003 (98304-98307)
|
||||
0x0001FFFC-0001FFFF (131068-131071)
|
||||
100369-102383 (#5:2015)
|
||||
102400-131067 (#6:28668)
|
||||
|
||||
0x00020000-00020003 (131072-131075)
|
||||
0x00027FFC-00027FFF (163836-163839)
|
||||
131076-163835 (#7:32760)
|
||||
|
||||
0x00028000-00028003 (163840-163843)
|
||||
0x0002FFFC-0002FFFF (196604-196607)
|
||||
163844-196603 (#8:32760)
|
||||
|
||||
0x00030000-00030003 (196608-196611)
|
||||
196612-204767 (#9:8156)
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 5: Output from the ShowFreeruns.cmd REXX program.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=429 HEIGHT=378 SRC="fig6_3.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 6: Freespace analysis on five HPFS partitions.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
<H2>The ShowFreeruns Program</H2>
|
||||
|
||||
<P>
|
||||
Like other programs in this series, ShowFreeruns.cmd (see Figure 7) uses
|
||||
SECTOR.DLL to read a sector off a logical drive. I was motivated to design this
|
||||
program after seeing the output of the GU's "HPFSINFO C: -F". On a one-third
|
||||
full 1.2 GB partition, the program presented here takes 17 secs compared to
|
||||
HPFSINFO's time of 26 secs. HPFSINFO also shows the CHS (Cyl/Hd/Sec)
|
||||
coordinates of each run. I was not interested in these but instead display the
|
||||
freerun's size. HPFSINFO also displays the meaning of what's in the SuperBlock
|
||||
and the SpareBlock. If you want to do this, you can include the code from
|
||||
ShowSuperSpare.cmd from Part 2 and it will only add an extra 0.5 secs to the
|
||||
time. The performance then, for an interpreted program (REXX), is quite good and
|
||||
was achieved primarily through a speed-up technique to be discussed shortly.
|
||||
Moreover, HPFSINFO consistently overstates the end of each freerun by 1 and it
|
||||
sometimes does not show the last run (e.g. on C: it states that there are 366
|
||||
freeruns but only shows 365 of them). This last bug appears to be caused by the
|
||||
last freerun continuing to the end of the partition. My design accounts for
|
||||
this situation.
|
||||
|
||||
<PRE>
|
||||
/* Shows bitmap locations and free space runs */
|
||||
ARG drive . /* First parm should always be drive */
|
||||
|
||||
IF drive = '' THEN CALL HELP
|
||||
parmList = "? /? /H HELP A: B:"
|
||||
IF WordPos(drive, parmList) \= 0 THEN CALL Help
|
||||
|
||||
/* Register external DLL functions */
|
||||
CALL RxFuncAdd 'ReadSect','Sector','ReadSect'
|
||||
CALL RxFuncAdd 'RxDate','RexxDate','RxDate'
|
||||
|
||||
/* Initialise Lookup Table*/
|
||||
DO exponent = 0 TO 7
|
||||
bitValue.exponent = D2C(2**exponent)
|
||||
END exponent
|
||||
|
||||
secString = ReadSect(drive, 16) /*Read Superblk sec*/
|
||||
freespaceBmpList = C2D(Reverse(Substr(secString,25,4)))
|
||||
totalsecs = C2D(Reverse(Substr(secString,17,4)))
|
||||
|
||||
'@cls'
|
||||
SAY
|
||||
SAY "Inspecting drive" drive
|
||||
SAY
|
||||
/* LSN 25 = list of bitmap blocks */
|
||||
CALL ShowDword " List of Bitmap secs",25
|
||||
|
||||
startOfListBlk = 0
|
||||
startOfBlk = 0
|
||||
bmpListBlk = ""
|
||||
bmpBlk = ""
|
||||
getFacingBands = 0
|
||||
runNumber = 0
|
||||
byteOffset = 0
|
||||
runNumber = 0
|
||||
/* Read in 4 secs of the list of sec-usage bmp blks */
|
||||
DO secWithinBlk = freespaceBmpList TO freespaceBmpList+3
|
||||
temp = StartOfListBlk + secWithinBlk
|
||||
bmpListBlk = bmpListBlk||ReadSect(drive, temp)
|
||||
END secWithinBlk
|
||||
|
||||
SAY
|
||||
SAY "Space-Usage Bitmap Blocks:"
|
||||
SAY " Freespace Runs:"
|
||||
|
||||
/* Use dword pointers to bmps to read in 2KB bmp blks */
|
||||
DO listOffset = 1 TO 2048 BY 4
|
||||
startDecStr = C2D(Reverse(Substr(bmpListBlk,ListOffset,4)))
|
||||
IF startDecStr = 0 THEN /* No more bmps listed */
|
||||
DO
|
||||
IF getFacingBands = 1 THEN
|
||||
DO /* Last data band had no facing data band */
|
||||
bmpSize = 2048
|
||||
CALL DetermineFreeruns
|
||||
LEAVE
|
||||
END
|
||||
|
||||
LEAVE
|
||||
END
|
||||
|
||||
/*Display a blank line when a new facing band occurs*/
|
||||
IF (ListOffset+7)//8 = 0 THEN SAY
|
||||
|
||||
CALL ShowBmpBlk listOffset
|
||||
DO secWithinBlk = 0 TO 3
|
||||
temp = StartOfBlk + secWithinBlk
|
||||
bmpBlk = bmpBlk||ReadSect(drive, temp)
|
||||
END secWithinBlk
|
||||
|
||||
getFacingBands = getFacingBands + 1
|
||||
IF getFacingBands = 2 THEN /* Wait until you get both */
|
||||
DO /* bmps for the facing data*/
|
||||
bmpSize = 4096 /* bands since maximum extent*/
|
||||
CALL DetermineFreeruns /* length is 16,380 KB */
|
||||
byteOffset = byteOffset+4096
|
||||
getFacingBands = 0
|
||||
bmpBlk = ""
|
||||
END
|
||||
END listOffset
|
||||
|
||||
EXIT /**************EXECUTION ENDS HERE**************/
|
||||
|
||||
|
||||
FourBytes2Hex: /* Given offset, return dword */
|
||||
ARG startPos
|
||||
rearranged = Reverse(Substr(secString,startPos,4))
|
||||
RETURN C2X(rearranged)
|
||||
|
||||
|
||||
ShowDword: /* Display dword and dec equivalent */
|
||||
PARSE ARG label, offset
|
||||
hexStr = FourBytes2Hex(offset)
|
||||
SAY label": 0x"hexStr "("X2D(hexStr)")"
|
||||
RETURN
|
||||
|
||||
|
||||
ShowBmpBlk:
|
||||
/* Show start-end of freespace runs in hex & dec */
|
||||
PARSE ARG offset
|
||||
endDecStr = C2D(Reverse(Substr(bmpListBlk,offset,4)))+3
|
||||
SAY " 0x"D2X(startDecStr,8)"-"D2X(endDecStr,8) ||,
|
||||
" ("startDecStr"-"endDecStr")"
|
||||
startOfBlk = startDecStr
|
||||
RETURN
|
||||
|
||||
|
||||
DetermineFreeruns:
|
||||
runStatus = 0
|
||||
oldchar = ''
|
||||
/* Check 128 secs at a time to speed up operation */
|
||||
DO para = 1 to bmpSize BY 16
|
||||
/* 16 bytes*8 secs/byte = 128 secs per para scanned */
|
||||
char = Substr(bmpBlk,para,16)
|
||||
IF char = 'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF'x &,
|
||||
runstatus = 1 THEN ITERATE para
|
||||
IF char = '00000000000000000000000000000000'x &,
|
||||
runstatus = 0 THEN ITERATE para
|
||||
/* Part of paragraph has run start/end
|
||||
so check a byte (8 secs) at a time. */
|
||||
DO byte = para TO para + 15
|
||||
char = Substr(bmpBlk,byte,1)
|
||||
IF char > '0'x THEN /* 1 or more free secs */
|
||||
DO
|
||||
IF char = 'FF'x THEN /* 8 unoccupied secs */
|
||||
IF runStatus = 1 THEN /* Run is in progress */
|
||||
NOP
|
||||
ELSE /* Run starts on 8 sec boundary */
|
||||
DO
|
||||
startByte = byte + byteOffset
|
||||
startBitPos = 0
|
||||
runStatus = 1 /* Start run determination */
|
||||
END
|
||||
ELSE
|
||||
CALL DetermineBit /* Partial usage of 8 secs */
|
||||
END
|
||||
ELSE
|
||||
DO /* All 8 secs are used */
|
||||
IF runStatus = 1 THEN
|
||||
DO
|
||||
endByte = byte + byteOffset
|
||||
endBitPos = -1 /* Run ends with prior sec */
|
||||
CALL ShowRun
|
||||
END
|
||||
END
|
||||
END byte
|
||||
END para
|
||||
|
||||
IF runStatus = 1 THEN /* Freespace at end of part. */
|
||||
DO
|
||||
endByte = 9999999999 /* Larger than # of secs in */
|
||||
endBitPos = 0 /* max. possible part.(512GB) */
|
||||
CALL ShowRun /* so ShowRun will set runEnd */
|
||||
/* to last LSN in this part. */
|
||||
END
|
||||
RETURN
|
||||
|
||||
|
||||
DetermineBit: /* Free/occupied usage within 8 sec blk */
|
||||
DO bitPos = 0 TO 7
|
||||
IF runStatus = 0 THEN
|
||||
DO /* No run currently in progress */
|
||||
IF BitAnd(char, bitValue.bitPos) > '0'x THEN
|
||||
DO /* sec is free */
|
||||
startByte = byte + byteOffset
|
||||
startBitPos = bitPos
|
||||
runStatus = 1
|
||||
END
|
||||
END
|
||||
ELSE
|
||||
DO
|
||||
IF BitAnd(char, bitValue.bitPos) = '0'x THEN
|
||||
DO /* sec is used */
|
||||
endByte = byte + byteOffset
|
||||
/* When a run ends, the sec before the first
|
||||
used one is the last sec in the freerun. */
|
||||
endBitPos = bitPos - 1
|
||||
CALL ShowRun
|
||||
END
|
||||
END
|
||||
END bitPos
|
||||
RETURN
|
||||
|
||||
|
||||
ShowRun:
|
||||
/* Display freerun start-end secs & reset run status */
|
||||
runNumber = runNumber + 1
|
||||
runStart = (startByte - 1) * 8 + startBitPos
|
||||
runEnd = (endByte - 1) * 8 + endBitPos
|
||||
|
||||
IF runEnd > totalSecs THEN runEnd = TotalSecs - 1
|
||||
IF runStart \= runEnd THEN /* More than 1 sec is free */
|
||||
DO
|
||||
run = runStart"-"runEnd
|
||||
run = Left(run||Copies(" ",14),15)
|
||||
SAY Copies(" ",40) run "(#"runNumber":"runEnd-RunStart+1")"
|
||||
END
|
||||
ELSE
|
||||
DO
|
||||
run = Left(runStart||Copies(" ",14),15)
|
||||
SAY Copies(" ",40) run "(#"runNumber":1)"
|
||||
END
|
||||
|
||||
runStatus = 0
|
||||
RETURN
|
||||
|
||||
|
||||
Help:
|
||||
SAY
|
||||
SAY "Purpose:"
|
||||
SAY " ShowFreeruns displays the location of the
|
||||
sec-usage bitmap blocks" /* Wrapped long line */
|
||||
SAY " and the location and extent of free space runs."
|
||||
SAY
|
||||
SAY "Example:"
|
||||
SAY " ShowFreeruns C:"
|
||||
SAY
|
||||
EXIT
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 7: The ShowFreeruns.cmd REXX program. Requires SECTOR.DLL. Note that
|
||||
the long SAY line (line 40) should include the next line as well. (SAY clauses
|
||||
can't be continued on to the next line with a comma.)
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
Since a sector is mapped by a bit, the program often needs to check the status
|
||||
of a bit within a bitmap's byte. This is done using the BITAND(string1,
|
||||
string2) inbuilt function. In this design string 1 holds the byte to be
|
||||
examined and string 2 holds a character that only has the corresponding bit set.
|
||||
Rather than having to work out the character for string 2 each time BITAND() is
|
||||
used, we instead precalculate the eight characters and then store them in the
|
||||
BitValue. compound variable for later use.
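<P>
The same lookup-table idea, sketched in Python for comparison. The names BIT_VALUE and sector_is_free are my own; as in the listing, a set bit is taken to mean that the sector is free.

```python
# One single-bit mask per bit position, built once up front (the REXX listing
# keeps the same eight values in the BitValue. compound variable).
BIT_VALUE = [1 << exponent for exponent in range(8)]

def sector_is_free(bitmap_byte, bit_pos):
    """True when bit `bit_pos` of the byte is set, i.e. that sector is free."""
    return (bitmap_byte & BIT_VALUE[bit_pos]) != 0

# Byte 0x05 has bits 0 and 2 set, so only those two sectors are free.
print([sector_is_free(0x05, pos) for pos in range(8)])
```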
|
||||
|
||||
<P>
|
||||
The next step is to read in the SuperBlock and from it get the location of the
|
||||
list of bitmap sectors and the total number of sectors. The latter value is
|
||||
required so we know when we've reached the end of the partition.
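<P>
For readers more at home outside REXX, the two SuperBlock reads can be sketched in Python as follows. The helper names are hypothetical. REXX string positions start at 1, so positions 17 and 25 in the listing correspond to byte offsets 10h and 18h, and the Reverse() calls in the listing reflect the fact that the dwords are stored low byte first.

```python
import struct

def read_dword(sector, offset):
    """Return the little-endian 32-bit value stored at a 0-based byte offset."""
    return struct.unpack_from("<I", sector, offset)[0]

def superblock_fields(superblock):
    """Pick out the two fields the listing reads (REXX positions 17 and 25)."""
    return {
        "total_sectors": read_dword(superblock, 0x10),
        "bitmap_list_lsn": read_dword(superblock, 0x18),
    }
```

Passing it the 512 bytes of LSN 16 would return both values as ordinary integers.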
|
||||
|
||||
<P>
|
||||
We then read in the four sectors of the block holding the list of bitmaps. The
|
||||
list consists of dwords that store the starting LSN of each bitmap block. 128
|
||||
dwords can fit in each sector of the list so the four sectors of the list can
|
||||
hold 512 bitmap block LSNs. Now a bitmap block maps 8 MB of diskspace so this
|
||||
'lite' version is only good when dealing with a partition of less than 4 GB.
|
||||
(Earlier works refer to the maximum partition size as 512 GB but in the recent
|
||||
"Just Add OS/2 Warp" package, in its technical section, it is stated that the
|
||||
maximum partition size is 64 GB.) I won't be able to check this aspect of the
|
||||
design until I get a HD bigger than 4 GB and succumb to the mad urge to
|
||||
partition it as one volume.
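<P>
The 4 GB figure follows directly from the sizes just mentioned. A short Python check of the arithmetic, using only the numbers given in the text (512-byte sectors, 4-sector bitmap blocks, one bit per sector, a 4-sector list of dwords):

```python
SECTOR_BYTES = 512
BITMAP_BLOCK_BYTES = 4 * SECTOR_BYTES   # each bitmap block is 4 sectors (2 KB)
LIST_BLOCK_BYTES = 4 * SECTOR_BYTES     # the list block is also 4 sectors
DWORD_BYTES = 4

sectors_per_bitmap = BITMAP_BLOCK_BYTES * 8            # one bit per sector
bytes_per_bitmap = sectors_per_bitmap * SECTOR_BYTES   # disk space one bitmap maps
lsns_in_list_block = LIST_BLOCK_BYTES // DWORD_BYTES   # dwords that fit in the list
max_mapped_bytes = lsns_in_list_block * bytes_per_bitmap

print(sectors_per_bitmap)            # 16384
print(bytes_per_bitmap // 2**20)     # 8   (MB per bitmap block)
print(lsns_in_list_block)            # 512
print(max_mapped_bytes // 2**30)     # 4   (GB covered by one list block)
```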
|
||||
|
||||
<P>
|
||||
The end of the list is indicated by the first zero dword (00000000h). The list of
|
||||
the 100 MB partition shown in Figure 5 contains only 13 dwords since it has 13
|
||||
data bands so, in a typical case, you should not expect to find much data stored
|
||||
in this block.
|
||||
|
||||
<P>
|
||||
A freerun can be bigger than a data band since pairs of bands face each other,
|
||||
so we consider two bands at a time, unless we reach the end of the partition
|
||||
without a facing band. Once we have a data region we call the DetermineFreeruns
|
||||
procedure. Here we examine the two, combined data bitmaps (unless it's a solo
|
||||
band at the end). In the initial design I looked at each byte in the 4 KB
|
||||
bitmap combination to see if it was either 00h (all eight sectors used) or
|
||||
FFh (all eight sectors free). Typically, you will find lots of occupied or free
|
||||
sectors together, so checking eight at a time speeds up the search. Only when
|
||||
the byte was neither of these is a bit-level search required.
|
||||
|
||||
<P>
|
||||
However, the speed of this version was poor, with the search through each byte of
|
||||
the 322 KB of bitmaps for the 161 databands in the 1.2 GB partition taking a
|
||||
total of 104 secs. The obvious solution was to extend the optimisation method
|
||||
to a second, higher level by checking more bytes first to see if they were all
|
||||
set or clear. I settled on 16 bytes which covers 128 sectors (64 KB) of
|
||||
diskspace at a time and this resulted in the final time of 17 secs. Further
|
||||
experiments with larger (64 byte) groups and also with third-level optimisation
|
||||
did not show much improvement with my mix of partitions but your situation may
|
||||
warrant further experimentation.
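<P>
The layered search is easier to see in a compact form. The following Python sketch is not a translation of ShowFreeruns.cmd, just the same idea under the same assumptions: a set bit means a free sector, and bit 0 of each byte maps the lowest-numbered sector of its group of eight. The function name find_free_runs and its chunk parameter are invented for the example.

```python
def find_free_runs(bitmap, chunk=16):
    """Return inclusive (first, last) sector pairs for each run of set bits."""
    all_free = b"\xff" * chunk
    all_used = b"\x00" * chunk
    runs = []
    run_start = None                      # None means no run is in progress
    for base in range(0, len(bitmap), chunk):
        block = bitmap[base:base + chunk]
        # Level 1: skip a whole chunk that cannot start or end a run.
        if block == all_free and run_start is not None:
            continue
        if block == all_used and run_start is None:
            continue
        for index, byte in enumerate(block, start=base):
            # Level 2: skip a whole byte that cannot start or end a run.
            if byte == 0xFF and run_start is not None:
                continue
            if byte == 0x00 and run_start is None:
                continue
            # Level 3: a boundary lies in this byte, so test bit by bit.
            for bit in range(8):
                sector = index * 8 + bit
                free = (byte >> bit) & 1
                if free and run_start is None:
                    run_start = sector
                elif not free and run_start is not None:
                    runs.append((run_start, sector - 1))
                    run_start = None
    if run_start is not None:             # run reaches the end of the bitmap
        runs.append((run_start, len(bitmap) * 8 - 1))
    return runs
```

Sector numbers here are relative to the start of the bitmap passed in; the REXX program adds byteOffset to turn them into LSNs. Changing chunk from 16 to 64 is an easy way to repeat the group-size experiment on your own partitions.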
|
||||
|
||||
<P>
|
||||
<H2>Code Pages</H2>
|
||||
|
||||
<P>
|
||||
Different languages have different character sets. Code Pages (CPs) are used to
|
||||
map an ASCII character to the actual character. CP tables reside in
|
||||
COUNTRY.SYS. They are also present on a HPFS volume and every directory entry
|
||||
(DIRENT) includes a CP index value.
|
||||
|
||||
<P>
|
||||
CPs are used to map character case (i.e. in a foreign character set the
|
||||
relationship between lower and upper-case characters) and for collating
|
||||
sequences used when sorting. As mentioned in Part 1, HPFS directories use a
|
||||
B-tree structure which, as part of its operation, always stores file/directory
|
||||
names in sorted order. Remember that HPFS is not case-sensitive (including when
|
||||
sorting) but it preserves case.
|
||||
|
||||
<P>
|
||||
The European-style languages (including English) have relatively straightforward
|
||||
Single-Byte Character Sets (SBCS) i.e. one character is represented by one
|
||||
byte. Asian character sets typically have many characters so they require two
|
||||
bytes per character (DBCS).
|
||||
|
||||
<P>
|
||||
The first 128 characters in all ASCII CPs are the same so the CP tables on the
|
||||
disk only map ASCII 128-255.
|
||||
|
||||
<P>
|
||||
The SpareBlock holds the LSN of the first CP Info sector. There is a header
|
||||
followed by up to 31 16-byte CP Info Entries. There is provision for more than
|
||||
one CP Info sector which could hold CP Info Entries 31-61 (counting from 0).
|
||||
Why so many different CPs are catered for I have no idea since I've been unable
|
||||
to have more than two loaded at a time. In Australia we typically use CP437
|
||||
(standard PC) - Country 061 and CP850 (multilingual Latin-1) - Country 000. The
|
||||
layout of a CP Info sector is shown in Figure 8.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=431 HEIGHT=400 SRC="fig8.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 8: The layout of a Code Page Information Sector.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
The CP Info Entry contains the LSN where this entry's CP mapping table is
|
||||
stored. This sector is a CP Data Sector. As well as a header there is enough
|
||||
space for up to three 128-byte CP maps per sector. Figure 9 shows the layout of
|
||||
a CP Data Sector.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=431 HEIGHT=450 SRC="fig9.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 9: The layout of a Code Page Data Sector.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
<H2>The CP.cmd Program</H2>
|
||||
|
||||
<P>
|
||||
Figure 10 shows the display produced by the REXX CP.cmd program (Figure 11).
|
||||
I've stopped it before it reached ASCII 255. Normally, the output will scroll
|
||||
off the screen, so either pause it or send it to the printer. If the mapped
|
||||
character has the same value as its ASCII value the word "same" is displayed
|
||||
instead to reduce clutter.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=430 HEIGHT=320 SRC="fig10.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 10: Partial output from the CP.cmd program. List continues on to ASCII
|
||||
255.
|
||||
</FONT>
|
||||
|
||||
<PRE>
|
||||
/* Decodes CP info & CP data sectors on a HPFS volume */
|
||||
ARG drive . /* First parm should always be drive */
|
||||
IF drive = '' | drive = "?" | drive = "HELP",
|
||||
| drive = "A:" | drive = "B:" THEN CALL Help
|
||||
|
||||
CALL RxFuncAdd 'ReadSect','Sector','ReadSect' /* In SECTOR.DLL */
|
||||
secString = ReadSect(drive,17) /* SpareBlock is LSN 17 */
|
||||
'@cls'
|
||||
SAY
|
||||
SAY "Inspecting drive" drive
|
||||
SAY
|
||||
|
||||
/* Offset 33 in Spareblock contains dword of CP info LSN */
|
||||
cpInfoSec = C2D(Reverse(Substr(secString,33,4)))
|
||||
secString = ReadSect(drive,cpInfoSec) /* Load CP info sec */
|
||||
numOfCodePages = C2D(Reverse(Substr(secString,5,2)))
|
||||
prevDataSec = ''
|
||||
|
||||
SAY "CODE PAGE INFORMATION (sector" cpInfoSec"):"
|
||||
SAY "Signature Dword: 0x"FourChars2Hex(1)
|
||||
SAY " CP# Ctry Code Code Page CP Data Sec Offset"
|
||||
|
||||
DO x = 0 TO numOfCodePages-1
|
||||
hexCountry = TwoChars2Hex((16*x)+17)
|
||||
decCountry = Right('00'X2D(hexCountry),3)
|
||||
cp = TwoChars2Hex((16*x)+19)
|
||||
country.x = X2D(cp)
|
||||
hexSec = FourChars2Hex((16*x)+25)
|
||||
decSec = X2D(hexSec)
|
||||
cpDataSec = decSec
|
||||
/* Since up to 3 CP tables can fit in 1 CP data sec,
|
||||
only read in a new data sec when the need arises. */
|
||||
IF cpDataSec \= prevDataSec THEN
|
||||
DO
|
||||
dataSecString = ReadSect(drive,cpDataSec)
|
||||
prevDataSec = cpDataSec
|
||||
END
|
||||
|
||||
offset = C2D(Reverse(Substr(dataSecString,(2*x)+21,2)))
|
||||
start = offset + 1
|
||||
SAY " " x " 0x"hexCountry "("decCountry") 0x"cp "("X2D(cp)") 0x"
|
||||
hexSec "("decSec") 0x"D2X(offset) "("offset")"
|
||||
/* Wrapped long line */
|
||||
/* Store table contents of each CP in an array */
|
||||
DO y = 128 TO 255
|
||||
char = Substr(dataSecString,start+6+y-128,1)
|
||||
IF C2D(char) \= y THEN
|
||||
array.x.y = Format(C2D(char),4) "("char")"
|
||||
ELSE
|
||||
array.x.y = " same "
|
||||
END y
|
||||
END x
|
||||
|
||||
/* Work out title line based on number of CPs */
|
||||
titleLine = " ASCII "
|
||||
DO x = 0 TO numOfCodePages-1
|
||||
titleLine = titleLine " CP" country.x
|
||||
END x
|
||||
SAY
|
||||
SAY titleLine
|
||||
|
||||
/* Display each table entry based on number of CPs */
|
||||
DO y = 128 TO 255
|
||||
dispLine = ''
|
||||
DO x = 0 TO numOfCodePages-1
|
||||
dispLine = dispLine" "array.x.y
|
||||
END x
|
||||
SAY "" y "("D2C(y)"):" dispLine
|
||||
END y
|
||||
|
||||
EXIT /****************EXECUTION ENDS HERE****************/
|
||||
|
||||
|
||||
FourChars2Hex:
|
||||
ARG offset
|
||||
RETURN C2X(Reverse(Substr(secString,offset,4)))
|
||||
|
||||
|
||||
TwoChars2Hex:
|
||||
ARG offset
|
||||
RETURN C2X(Reverse(Substr(secString,offset,2)))
|
||||
|
||||
|
||||
Help:
|
||||
SAY "Purpose:"
|
||||
SAY " CP decodes the CodePage Directory sector &"
|
||||
SAY " the CodePage sector on a HPFS volume"
|
||||
SAY
|
||||
SAY "Example:"
|
||||
SAY " CP C:"
|
||||
EXIT
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 11: The CP.cmd REXX program. Requires SECTOR.DLL.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
While REXX does not support arrays it does have compound variables and I've used
|
||||
a CV called "array" to store the contents of each CP's mapping table. The
|
||||
design only deals with the first 31 CP Info entries (that should be more than
|
||||
enough anyway) and accommodates additional CPs by adding new columns to the
|
||||
display.
|
||||
|
||||
<P>
|
||||
Armed with this printout you can experiment with different collating sequences
|
||||
when switching CPs. You can check out your current CP by typing "CHCP" and then
|
||||
switch to a different CP by issuing, say, "CHCP 850". I used "REM >
|
||||
File[Alt-nnn]" to create zero-length files, with one or more high-order ASCII
|
||||
characters in their filenames, as test fodder.
|
||||
|
||||
<P>
|
||||
<H2>Conclusion</H2>
|
||||
|
||||
<P>
|
||||
In this installment you've learned how to decode the data band usage bitmaps
|
||||
contents and how to display the contents of the Code Page mapping tables. Next
|
||||
time we'll examine B-trees, DIRBLKs and DIRENTs.
|
||||
1006
study/sabre/os/files/FileSystems/HPFS/hpfs4.html
Normal file
932
study/sabre/os/files/FileSystems/HPFS/hpfs5.html
Normal file
@@ -0,0 +1,932 @@
|
||||
|
||||
<H1>Inside the High Performance File System</H1>
|
||||
<H2>Part 5: FNODEs, ALSECs and B+trees</H2>
|
||||
Written by Dan Bridges
|
||||
|
||||
<H2>Introduction</H2>
|
||||
|
||||
This article originally appeared in the August 1996 issue of Significant
|
||||
Bits, the monthly magazine of the Brisbug PC User Group Inc.
|
||||
|
||||
<P>Last month you saw how DIRENTs (directory entries) are stored in
|
||||
4-sector structures known as DIRBLKs. These blocks have limited space
|
||||
available for entries. Due to the variable length of filenames (1-254
|
||||
characters), the maximum number of entries depends on the average filename
|
||||
length. If the average name length is in the 10-13 character range, a
|
||||
DIRBLK can hold up to 44 entries.
|
||||
|
||||
<P>When there are more files in a directory than can fit in a single
|
||||
DIRBLK, other DIRBLKs will be used and the connection between these blocks
|
||||
forms a structure known as a B-tree. Since there can be many elements
|
||||
(entries) in a node (DIRBLK), a HPFS B-tree has a quick "fan-out" and a
|
||||
low height (number of levels), ensuring fast entry location.
|
||||
|
||||
<P>This time, we'll take a long look at how a file's contents are
|
||||
logically stored under HPFS. To the best of my knowledge, this topic has
|
||||
not been well-covered in the scanty information available about HPFS. You
|
||||
will find it helpful to contrast the following file-sector allocation
|
||||
methods with last month's directory entry concepts.
|
||||
|
||||
<H2>Fragging a File</H2>
|
||||
|
||||
Since HPFS is inherently fragmentation-resistant, we have to twist its arm
|
||||
a little to produce fragmented files. The method I came up with first
|
||||
fills up an empty partition with a number of files created in an ascending
|
||||
name sequence. The next step deletes every second file. Finally, I create
|
||||
a file that is approximately one-half the partition's size. This file then
|
||||
has nowhere to go except into all the discontiguous regions previously
|
||||
occupied by the deleted file entries.
|
||||
|
||||
<P>This process takes some time with a large partition (100 MB) so I
|
||||
suggest you use a very small partition (1 MB). At first glance, you may
|
||||
think that if we fill up a 1 MB partition with say 100 files, then delete
|
||||
File1, File3, ... File99, and then create a 512K file, we will end up with
|
||||
a file with exactly 50 extents (fragments). This is not so, since each
|
||||
individual file occupies a FNODE sector as well as the sectors for the
|
||||
file itself, whereas a single fragmented file still has only 1 FNODE. So
|
||||
there is slightly more space available in each gap for an extent than
|
||||
there was for a file. A 512K file therefore finds more than 512K of space
|
||||
available, occupies fewer gaps than expected, and ends up with fewer
|
||||
extents than the number of gaps specified. For example, in the
|
||||
50-gap, 1 MB partition scenario we end up with 45 extents. There are also
|
||||
variations produced by things like the centrally located DIRBAND, the
|
||||
separate Root DIRBLK and multiple Databands to "fragment" the available
|
||||
freespace for very large files. So the number of gaps produced by deleting
|
||||
alternate files is only a rough approximation of the number of extents
|
||||
that will be produced.
|
||||
|
||||
<P>Figure 1 shows the MakeExtents.cmd REXX program. You specify the number
|
||||
of gaps you want to produce. For example, to originally produce 100 files
|
||||
on N:, delete half of them and leave 50 gaps, you would issue the command
|
||||
"MakeExtents N: 50".
|
||||
|
||||
<PRE>
|
||||
/* Produces a large, fragmented file */
|
||||
PARSE ARG numOfExts
|
||||
CALL RxFuncAdd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'
|
||||
CALL SysLoadFuncs /* Load REXXUTIL.DLL external funcs */
|
||||
CALL SysCls
|
||||
EXIT /* Safety line. Delete this when you've adjusted the
|
||||
drive to suit your system. Formats the drive. */
|
||||
'echo y | format n: /l /fs:hpfs'
|
||||
SAY
|
||||
CALL SysMkDir 'n:\test' /* REXX MD. Faster than OS/2 MD */
|
||||
currentDir = Directory() /* Store current drive/directory */
|
||||
CALL Directory 'n:\test' /* Change to test drive/directory*/
|
||||
/* Determine free space */
|
||||
PARSE VALUE SysDriveInfo('n:') WITH . free .
|
||||
|
||||
/* Determine size of each sequential file */
|
||||
fileSize = (free - (numOfExts*2*512)) % (numOfExts*2)
|
||||
secsInFile = fileSize % 512
|
||||
sectorFill = Copies('x',512) /* 512 bytes of 'x' char */
|
||||
Fill_20K = Copies(sectorFill,40) /* 20,480 bytes of 'x' */
|
||||
|
||||
/* Create string of the required length */
|
||||
CALL MakeFile secsInFile
|
||||
|
||||
DO i = 1 TO numOfExts*2 /* Produce the file sequence */
|
||||
CALL CreateFile /* Fixed-length filenames: File00001 */
|
||||
END i
|
||||
|
||||
DO i = 1 TO numOfExts*2 BY 2 /* Delete alternate files */
|
||||
CALL SysFileDelete 'n:\test\file'||Right("0000"||i,5)
|
||||
END i
|
||||
|
||||
PARSE VALUE SysDriveInfo('n:') WITH . free .
|
||||
|
||||
fragmentedFileSecs = ((free-512) % 512)-1
|
||||
CALL MakeFile fragmentedFileSecs
|
||||
|
||||
i='FRAGG' /* Fragmented filename: FileFRAGG */
|
||||
CALL CreateFile /* Create "FileFRAGG" */
|
||||
CALL Directory currentDir /* Return to original location */
|
||||
|
||||
EXIT /********************************************/
|
||||
|
||||
|
||||
MakeFile: PROCEDURE EXPOSE file sectorFill fill_20K
|
||||
ARG secs
|
||||
file = ''
|
||||
/* If final file is over 20K, speed up creation a little */
|
||||
IF secs >= 40 THEN
|
||||
file = Copies(fill_20K, secs%40)
|
||||
|
||||
file = file||Copies(sectorFill, secs//40)
|
||||
RETURN file
|
||||
|
||||
|
||||
CreateFile:
|
||||
CALL Charout 'n:\test\file'||Right("0000"||i,5),file,1
|
||||
CALL Stream 'n:\test\file'||Right("0000"||i,5),'C','CLOSE'
|
||||
RETURN
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 1: The MakeExtents.cmd program produces a fragmented file. When set up
|
||||
correctly, this program will wipe a partition.
|
||||
</FONT>
|
||||
|
||||
<H2>FNODEs, ALSECs, ALLEAFs and ALNODEs</H2>
|
||||
|
||||
Every file and directory on a HPFS partition has an associated FNODE,
|
||||
usually situated in the sector just before the file's first sector. The
|
||||
role of an FNODE is quite specific: to map the location of the file's
|
||||
extents (fragments) and any associated components, namely EAs (Extended
|
||||
Attributes - up to 64K of ancillary information) and ACLs (Access Control
|
||||
Lists - to do with LAN Manager).
|
||||
|
||||
<P>FNODEs and ALSECs (to be discussed shortly) contain a list of either
|
||||
ALLEAF or ALNODE entries. See Figure 2. An ALLEAF entry contains three
|
||||
dwords: logical sector offset (where the start of this run of sectors is
|
||||
within the total number of sectors in the file - the logical start sector
|
||||
is 0); run size in sectors; physical LSN (where the run starts in the
|
||||
partition). An ALLEAF entry is at the end of the B+tree. An ALNODE entry
|
||||
is an intermediate component in that it does not contain any extent
|
||||
information. Rather, it points to an ALSEC, and in turn the ALSEC can
|
||||
contain a list of either ALLEAFs (the end of the line) or ALNODEs (another
|
||||
descendant level in the B+tree).
|
||||
|
||||
<PRE>
|
||||
Offset Data Size Comment
|
||||
hex (dec, 1-based) bytes
|
||||
|
||||
Header
|
||||
00h (1) Signature 4 0xF7E40AAE
|
||||
04h (5) Seq. Read History 4 Not implemented.
|
||||
08h (9) Fast Read History 4 Not implemented.
|
||||
0Ch (13) Name Length 1 0-254.
|
||||
0Dh (14) Name 15 Last 15 chars. (Full name in DIRBLK.)
|
||||
1Ch (29) Container Dir LSN 4 FNODE of Dir that contains this one.
|
||||
20h (33) ACL Ext. Run Size 4 Secs in external ACL, if present.
|
||||
24h (37) ACL LSN 4 Location of external ACL run.
|
||||
28h (41) ACL Int. Size 2 Bytes in internal (inside FNODE) ACL.
|
||||
2Ah (43) ACL ALSEC Flag 1 >0 if ACL LSN points to an ALSEC.
|
||||
2Bh (44) History Bits Count 1 Not implemented.
|
||||
2Ch (45) EA Ext. Run Size 4
|
||||
30h (49) EA LSN 4
|
||||
34h (53) EA Int. Size 2
|
||||
36h (55) EA ALSEC Flag 1 >0 if EA LSN points to an ALSEC.
|
||||
37h (56) Dir Flag 1 Bit0 = 1 if dir FNODE, else file FNODE.
|
||||
38h (57) B+Tree Info Flag 1 0x20 (5) Parent is an FNODE, else ALSEC.
|
||||
0x80 (7) ALNODEs follow, else ALLEAFs.
|
||||
39h (58) Padding 3 Reestablish 32-bit alignment.
|
||||
3Ch (61) Free Entries 1 Number of free array entries.
|
||||
3Dh (62) Used Entries 1 Number of used array entries.
|
||||
3Eh (63) Free Ent. Offset 2 Offset to next free entry in array.
|
||||
|
||||
If ALLEAFs (Maximum of 8 in an FNODE)
|
||||
Extent #0
|
||||
40h (65) Logical LSN 4 Sec offset of this extent within file.
|
||||
The first extent has an offset of 0.
|
||||
44h (69) Run Size 4 Number of sectors in this extent.
|
||||
48h (73) Physical LSN 4 File: LSN of extent start.
|
||||
Dir: This B-tree's topmost DIRBLK LSN.
|
||||
...
|
||||
|
||||
Extent #7
|
||||
94h (149) Logical LSN 4
|
||||
98h (153) Run Size 4
|
||||
9Ch (157) Physical LSN 4
|
||||
|
||||
If ALNODEs (Maximum of 12 in an FNODE)
|
||||
Extent #0
|
||||
40h (65) End Sector Count 4 Running total of secs mapped by this
|
||||
alnode. 1-based. If EOF is within this
|
||||
alnode then field contains 0xFFFFFFFF.
|
||||
44h (69) Physical LSN 4 File: LSN of ALSEC.
|
||||
Dir: This B-tree's topmost DIRBLK LSN.
|
||||
...
|
||||
|
||||
Extent #11
|
||||
98h (153) End Sector Count 4
|
||||
9Ch (157) Physical LSN 4
|
||||
|
||||
Tail
|
||||
A0h (161) Valid File Length 4 Should be the same as File Size in DIRENT.
|
||||
A4h (165) "Needed" EAs Count 4 If any, EAs vital to the file's wellbeing.
|
||||
A8h (169) User ID 16 Not used.
|
||||
B8h (185) ACL/EA Offset 2 Offset in FNODE to first ACL, if present,
|
||||
otherwise offset to where EAs would be
|
||||
stored, if internalised.
|
||||
BAh (187) Spare 10 Unused.
|
||||
C4h (197) ACL/EA Storage 316 Only 145 bytes appear available for EAs.
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 2: Layout of an FNODE. This component can contain either an array
|
||||
of ALNODE or ALLEAF entries.
|
||||
</FONT>
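<P>As a cross-check on Figure 2, here is a minimal Python sketch that decodes a few of the header fields from a raw 512-byte sector. It uses the 0-based hex offsets from the table and assumes little-endian dwords, as the byte reversal in the REXX programs implies. The function names and the synthetic demo sector are illustrative assumptions, not part of the original programs.

```python
import struct

FNODE_SIGNATURE = 0xF7E40AAE

def parse_fnode_header(sector: bytes) -> dict:
    """Decode selected FNODE fields; offsets are 0-based, as in the hex column."""
    if len(sector) != 512:
        raise ValueError("an FNODE occupies one 512-byte sector")
    (signature,) = struct.unpack_from("<I", sector, 0x00)
    if signature != FNODE_SIGNATURE:
        raise ValueError("signature does not match an FNODE")
    name_length = sector[0x0C]
    stored = min(name_length, 15)          # only 15 name bytes are kept
    return {
        "name_length": name_length,
        "name": sector[0x0D:0x0D + stored].decode("ascii", "replace"),
        "container_dir_lsn": struct.unpack_from("<I", sector, 0x1C)[0],
        "is_directory": bool(sector[0x37] & 0x01),
        "alnodes_follow": bool(sector[0x38] & 0x80),
        "free_entries": sector[0x3C],
        "used_entries": sector[0x3D],
        "valid_file_length": struct.unpack_from("<I", sector, 0xA0)[0],
    }

def make_demo_fnode() -> bytes:
    """Synthetic sector carrying the header values printed in Figure 6."""
    sector = bytearray(512)
    struct.pack_into("<I", sector, 0x00, FNODE_SIGNATURE)
    sector[0x0C] = 9
    sector[0x0D:0x16] = b"fileFRAGG"
    struct.pack_into("<I", sector, 0x1C, 1033)
    sector[0x3D] = 8
    struct.pack_into("<I", sector, 0xA0, 420352)
    return bytes(sector)

fnode = parse_fnode_header(make_demo_fnode())
for field, value in fnode.items():
    print(f"{field}: {value}")
```

<P>On a real system the sector would come from a raw read of the partition rather than from make_demo_fnode.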
|
||||
|
||||
<P>Returning to the B-tree structure of DIRBLKs, you will remember that
|
||||
both intermediate and leaf components contain DIRENT data. So you may find
|
||||
the entry you're looking for in a node. This is not the case with a
|
||||
B+tree. Since an ALNODE can only point to an ALSEC, you must always
|
||||
proceed to the bottom of the tree, to a leaf, to retrieve extent
|
||||
information.
|
||||
|
||||
<P>An ALNODE entry only contains two dwords: a running total indicating
|
||||
the logical sector offset of the last sector in the ALSEC (i.e. how far we
|
||||
are through the file - this starts from 1); the physical LSN of where to
|
||||
find the ALSEC. The advantage of the smaller entry size of an ALNODE
|
||||
compared to an ALLEAF is that, in the same space, there can be more of
|
||||
them.
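<P>The two entry formats, and the rule that a lookup must always end in a leaf, can be sketched in Python as follows. This is a rough illustration under stated assumptions: entries are little-endian dwords, the helper names are invented for this example, and the sample ALNODE and ALLEAF values are borrowed from Figures 9 and 6 respectively, so they do not describe the same file.

```python
import struct

END_OF_FILE = 0xFFFFFFFF  # End Sector Count value of the last ALNODE entry

def decode_alleafs(raw: bytes, used: int):
    """Return (logical offset, run size, physical LSN) triples, 12 bytes each."""
    return [struct.unpack_from("<3I", raw, i * 12) for i in range(used)]

def decode_alnodes(raw: bytes, used: int):
    """Return (end sector count, ALSEC LSN) pairs, 8 bytes each."""
    return [struct.unpack_from("<2I", raw, i * 8) for i in range(used)]

def find_child(alnodes, logical_sector: int) -> int:
    """Pick the ALSEC whose running total first exceeds the wanted sector."""
    for end_count, alsec_lsn in alnodes:
        if end_count == END_OF_FILE or logical_sector < end_count:
            return alsec_lsn
    raise ValueError("sector lies beyond the mapped range")

def find_physical(alleafs, logical_sector: int) -> int:
    """Translate a file-relative sector number to a partition LSN."""
    for offset, run, lsn in alleafs:
        if offset <= logical_sector < offset + run:
            return lsn + (logical_sector - offset)
    raise ValueError("sector is not mapped by this leaf array")

# Three illustrative ALNODE entries in the style of Figure 9.
node_raw = struct.pack("<6I", 582, 328, 622, 394, END_OF_FILE, 1331)
alnodes = decode_alnodes(node_raw, 3)

# The first three ALLEAF entries of Figure 6.
leaf_raw = struct.pack("<9I", 0, 115, 317, 115, 116, 548, 231, 116, 780)
alleafs = decode_alleafs(leaf_raw, 3)

print(find_child(alnodes, 600))      # which ALSEC to read next
print(find_physical(alleafs, 120))   # LSN that holds file sector 120
```

<P>An ALNODE only narrows the search to one ALSEC; the physical LSN becomes known only once an ALLEAF array has been reached.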
|
||||
|
||||
<P>An FNODE contains other data. One important piece of information is the
|
||||
last 15 characters of the filename. This comes in handy when we need to
|
||||
undelete. The last 316 bytes of the sector are also set aside for internal
|
||||
ACL/EAs (stored completely within the FNODE). In the Graham Utilities
|
||||
manual it is stated that up to 316 bytes of EAs can be stored within the
|
||||
FNODE but my experiments with OS/2 Warp v3 show that only up to 145 bytes
|
||||
of EAs can be internalised. Refer to Part 6 for further information.
|
||||
|
||||
<P>Figure 3 shows the structure of an ALSEC. You will notice that there is
|
||||
much more space in the sector devoted to ALNODE/ALLEAF entries than is
|
||||
available in an FNODE sector (480 bytes compared to 96 bytes). This leads
|
||||
to the following maximum number of entries:
|
||||
|
||||
<PRE>
|
||||
ALLEAF ALNODE
|
||||
FNODE 8 12
|
||||
ALSEC 40 60
|
||||
</PRE>
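<P>These maxima follow from dividing each array region by the entry size. A short Python check, using the region boundaries from Figures 2 and 3 (the constant and function names are illustrative only):

```python
ALLEAF_SIZE = 12  # three dwords
ALNODE_SIZE = 8   # two dwords

# Array regions as 0-based hex offsets: first entry byte, first tail byte.
FNODE_ARRAY = (0x40, 0xA0)
ALSEC_ARRAY = (0x14, 0x1F4)

def capacity(region, entry_size):
    """Number of whole entries that fit in the region."""
    start, end = region
    return (end - start) // entry_size

for name, region in (("FNODE", FNODE_ARRAY), ("ALSEC", ALSEC_ARRAY)):
    print(name, capacity(region, ALLEAF_SIZE), capacity(region, ALNODE_SIZE))
```

<P>The FNODE region is 96 bytes and the ALSEC region is 480 bytes, which gives 8 and 12 entries for an FNODE and 40 and 60 for an ALSEC.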
|
||||
|
||||
<PRE>
|
||||
Offset Data Size Comment
|
||||
hex (dec, 1-based) bytes
|
||||
|
||||
Header
|
||||
00h (1) Signature 4 0x37E40AAE
|
||||
04h (5) This block's LSN 4 Helps when placing other blks nearby.
|
||||
08h (9) Parent's LSN 4 Points to either FNODE or another ALSEC.
|
||||
0Ch (13) Btree Flag 1 0x20 (5) Parent is an FNODE, else ALSEC.
|
||||
0x80 (7) ALNODEs follow, else ALLEAFs.
|
||||
0Dh (14) Padding 3 Reestablish dword alignment.
|
||||
10h (17) Free Entries 1 Number of free array entries.
|
||||
11h (18) Used Entries 1 Number of used array entries.
|
||||
12h (19) Free Ent. Offset 2 Offset to first free entry.
|
||||
|
||||
|
||||
If ALLEAFs (Maximum of 40 in an ALSEC)
|
||||
Extent #0
|
||||
14h (21) Logical LSN 4 Sec offset of this extent within file.
|
||||
Zero-based.
|
||||
18h (25) Run Size 4 Secs in this extent.
|
||||
1Ch (29) Physical LSN 4 File: LSN of extent start.
|
||||
Dir: This B-tree's topmost DIRBLK LSN.
|
||||
...
|
||||
|
||||
Extent #39
|
||||
1E8h (489) Logical LSN 4
|
||||
1ECh (493) Run Size 4
|
||||
1F0h (497) Physical LSN 4
|
||||
|
||||
|
||||
If ALNODEs (Maximum of 60 in an ALSEC)
|
||||
Extent #0
|
||||
14h (21) End Sector Count 4 Running total of secs mapped by this
|
||||
alnode. 1-based. If EOF is within this
|
||||
alnode then field contains 0xFFFFFFFF.
|
||||
18h (25) Physical LSN 4 File: LSN of ALSEC.
|
||||
Dir: This B-tree's topmost DIRBLK LSN.
|
||||
...
|
||||
|
||||
Extent #59
|
||||
1ECh (493) End Sector Count 4
|
||||
1F0h (497) Physical LSN 4
|
||||
|
||||
|
||||
Tail
|
||||
1F4h (501) Padding 12 Unused.
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 3: The layout of an ALSEC. This component can contain either an
|
||||
array of ALNODE or ALLEAF entries.
|
||||
</FONT>
|
||||
|
||||
<H2>Some Examples</H2>
|
||||
|
||||
The main program this month, ShowExtents.cmd (to be discussed later),
|
||||
needs to know the LSN of the FNODE or ALSEC that you want to start with.
|
||||
It would be possible to design a version that accepted the full pathname
|
||||
of a file but it would be a larger program. For the purpose of
|
||||
comprehending these structures, the requirement of having to specify a LSN
|
||||
is acceptable. To determine the file's FNODE location use last month's
|
||||
ShowBtree.cmd. Figure 4 shows ShowBtree's output on a 1 MB partition after
|
||||
"MakeExtents 7" was issued. From the information reported in Figure 4 we
|
||||
will first examine the TEST directory's FNODE. Figure 5 shows the result
|
||||
of issuing "ShowExtents N: 1033". Since there is no information in the
|
||||
allocation array area of a directory FNODE (the 96-byte region commencing
|
||||
at decimal offset 65), ShowExtents is designed to terminate early in such
|
||||
a situation.
|
||||
|
||||
<PRE>
|
||||
Root Directory:
|
||||
1016-1019 Next Byte Free: 125 Topmost DirBlk
|
||||
This directory's FNODE: 1032 (\ [level 1]) 1016->1032
|
||||
**************************************************
|
||||
SD 21 #00: .. FNODE:1032
|
||||
D 57 #01: test FNODE:1033
|
||||
E 93 #02:
|
||||
|
||||
36-39 Next Byte Free: 409 Topmost DirBlk
|
||||
This directory's FNODE: 1033 (test [level 1]) 36->1033
|
||||
**************************************************
|
||||
SD 21 #00: .. FNODE:1033
|
||||
57 #01: file00002 FNODE:432
|
||||
97 #02: file00004 FNODE:664
|
||||
137 #03: file00006 FNODE:896
|
||||
177 #04: file00008 FNODE:1154
|
||||
217 #05: file00010 FNODE:1386
|
||||
257 #06: file00012 FNODE:1618
|
||||
297 #07: file00014 FNODE:1850
|
||||
337 #08: fileFRAGG FNODE:316
|
||||
E 377 #09:
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 4: Last month's program, ShowBtree.cmd, shows the LSN of
|
||||
FileFRAGG's FNODE.
|
||||
</FONT>
|
||||
|
||||
<PRE>
|
||||
FNODE STRUCTURE
|
||||
LSN: 1033
|
||||
Signature: F7E40AAE
|
||||
Name Length: 4
|
||||
Name: test
|
||||
Container Dir LSN: 1032
|
||||
EA Ext. Run Size: 0
|
||||
EA LSN: 0
|
||||
EA Int. Size: 0
|
||||
EA ALSEC Flag: 0
|
||||
Dir Flag: Directory FNODE
|
||||
Topmost DIRBLK LSN: 36
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 5: ShowExtents' output when displaying the contents of a directory
|
||||
FNODE.
|
||||
</FONT>
|
||||
|
||||
<P>Next, we'll look at an FNODE with a full complement of 8 ALLEAF
|
||||
entries. On my system, this is produced when "MakeExtents 7" is issued.
|
||||
See Figure 6. The next free entry in the array of ALLEAF entries is at
|
||||
offset 104 dec. Since this offset is counted from the start of the B+tree
header at 57 dec, this means that the next entry would start at 161 dec. This is
|
||||
actually past the end of the available entry area, at the beginning of the
|
||||
tail region. This is another indication that the array is full. (The main
|
||||
indication is the "0" value in the Free Entries field.)
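<P>The Next Free Offset values reported in Figures 6 to 9 all fit one simple rule: the 8 bytes of B+tree header plus the used entries multiplied by the entry size. The Python sketch below checks that rule against the figures. It is an inference from the reported numbers rather than something taken from IBM documentation, and the names are illustrative.

```python
HEADER_SIZE = 8   # flag byte, 3 padding bytes, free count, used count, 2-byte offset
ALLEAF_SIZE, ALNODE_SIZE = 12, 8

def next_free_offset(used_entries: int, entry_size: int) -> int:
    """Offset of the first unused entry, measured from the B+tree header."""
    return HEADER_SIZE + used_entries * entry_size

# (used entries, entry size, value reported in the figure)
reported = [
    (8, ALLEAF_SIZE, 104),    # Figure 6, FNODE
    (1, ALNODE_SIZE, 16),     # Figure 7, FNODE
    (10, ALLEAF_SIZE, 128),   # Figure 7, ALSEC
    (2, ALNODE_SIZE, 24),     # Figure 8, FNODE
    (2, ALLEAF_SIZE, 32),     # Figure 8, second ALSEC
    (13, ALNODE_SIZE, 112),   # Figure 9, first ALSEC
]
for used, size, shown in reported:
    print(used, size, next_free_offset(used, size), shown)
```

<P>Under this reading the offset is measured from the start of the header (57 dec in an FNODE, 13 dec in an ALSEC). For a full ALSEC of 40 ALLEAFs the rule gives 488; Figures 8 and 9 show 232, which equals 488 modulo 256, as would result if only the low byte of the two-byte field were displayed.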
|
||||
|
||||
<PRE>
|
||||
FNODE STRUCTURE
|
||||
LSN: 316
|
||||
Signature: F7E40AAE
|
||||
Name Length: 9
|
||||
Name: fileFRAGG
|
||||
Container Dir LSN: 1033
|
||||
EA Ext. Run Size: 0
|
||||
EA LSN: 0
|
||||
EA Int. Size: 0
|
||||
EA ALSEC Flag: 0
|
||||
Dir Flag: File FNODE
|
||||
B+tree Info Flag: ALLEAFs follow
|
||||
Free Entries: 0
|
||||
Used Entries: 8
|
||||
Next Free Offset: 104
|
||||
Valid data size: 420352
|
||||
"Needed" EAs: 0
|
||||
EA/ACL Int. Off: 0
|
||||
|
||||
ALLEAF INFORMATION
|
||||
Extent #0: 115 sectors starting at LSN 317 (file sec offset:0)
|
||||
Extent #1: 116 sectors starting at LSN 548 (file sec off:115)
|
||||
Extent #2: 116 sectors starting at LSN 780 (file sec off:231)
|
||||
Extent #3: 116 sectors starting at LSN 1038 (file sec off:347)
|
||||
Extent #4: 116 sectors starting at LSN 1270 (file sec off:463)
|
||||
Extent #5: 116 sectors starting at LSN 1502 (file sec off:579)
|
||||
Extent #6: 116 sectors starting at LSN 1734 (file sec off:695)
|
||||
Extent #7: 10 sectors starting at LSN 1966 (file sec off:811)
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 6: A FNODE with a full ALLEAF array.
|
||||
</FONT>
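<P>The numbers in Figure 6 can be checked against each other: every file sector offset should equal the sum of the preceding run sizes, and the total should account for the Valid data size. A small Python sketch of that check, with the extent values copied from the figure (the function name is illustrative):

```python
SECTOR_SIZE = 512

# (run size in sectors, starting LSN, file sector offset) copied from Figure 6.
FIGURE_6 = [
    (115, 317, 0), (116, 548, 115), (116, 780, 231), (116, 1038, 347),
    (116, 1270, 463), (116, 1502, 579), (116, 1734, 695), (10, 1966, 811),
]

def check_extents(extents):
    """Confirm each offset equals the sectors mapped so far; return the total."""
    mapped = 0
    for run, _lsn, offset in extents:
        if offset != mapped:
            raise ValueError(f"gap or overlap at file sector {offset}")
        mapped += run
    return mapped

total = check_extents(FIGURE_6)
print(total, total * SECTOR_SIZE)
```

<P>The eight runs add up to 821 sectors, and 821 * 512 = 420,352 bytes, which matches the Valid data size shown in the figure.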
|
||||
|
||||
<P>If we need to map any more extents we must switch from a FNODE (with
|
||||
ALLEAFs) structure to FNODE (with ALNODEs) -> ALSEC (with ALLEAFs). Figure
|
||||
7 shows the mapping of a 10-extent file ("MakeExtents 8"). The B+tree Info
|
||||
Flag tells us that the FNODE contains an array of ALNODEs. There is only
|
||||
one entry in this array. The End Sector Count value is not shown here but,
|
||||
in this example, you could easily check it out using Part 2's SEC.cmd
|
||||
("SEC N: 316") and then look at the four bytes at offset 40h (in the case
|
||||
of a single entry in the array). Since this is the sole entry, you will
|
||||
find FFFFFFFFh (appears to be the array End-of-Entries indicator) at this
|
||||
location.
|
||||
|
||||
<PRE>
|
||||
FNODE STRUCTURE
|
||||
LSN: 316
|
||||
Signature: F7E40AAE
|
||||
Name Length: 9
|
||||
Name: fileFRAGG
|
||||
Container Dir LSN: 1033
|
||||
EA Ext. Run Size: 0
|
||||
EA LSN: 0
|
||||
EA Int. Size: 0
|
||||
EA ALSEC Flag: 0
|
||||
Dir Flag: File FNODE
|
||||
B+tree Info Flag: ALNODEs follow
|
||||
Free Entries: 11
|
||||
Used Entries: 1
|
||||
Next Free Offset: 16
|
||||
Valid data size: 418304
|
||||
"Needed" EAs: 0
|
||||
EA/ACL Int. Off: 0
|
||||
|
||||
FNODE Entry #0
|
||||
ALSEC STRUCTURE
|
||||
Signature: 37E40AAE
|
||||
This LSN: 933
|
||||
Parent's LSN: 316
|
||||
B+tree Info Flag: Parent was an FNODE; ALLEAFs follow
|
||||
Free Entries: 30
|
||||
Used Entries: 10
|
||||
Next Free Offset: 128
|
||||
|
||||
ALLEAF INFORMATION
|
||||
Extent #0: 101 sectors starting at LSN 317 (file sec off:0)
|
||||
Extent #1: 102 sectors starting at LSN 520 (file sec off:101)
|
||||
Extent #2: 102 sectors starting at LSN 724 (file sec off:203)
|
||||
Extent #3: 102 sectors starting at LSN 1158 (file sec off:305)
|
||||
Extent #4: 102 sectors starting at LSN 1362 (file sec off:407)
|
||||
Extent #5: 102 sectors starting at LSN 1566 (file sec off:509)
|
||||
Extent #6: 102 sectors starting at LSN 1770 (file sec off:611)
|
||||
Extent #7: 42 sectors starting at LSN 1974 (file sec off:713)
|
||||
Extent #8: 5 sectors starting at LSN 928 (file sec off:755)
|
||||
Extent #9: 57 sectors starting at LSN 934 (file sec off:760)
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 7: A 10-extent file is mapped in a 1-level B+tree with a single
|
||||
ALSEC.
|
||||
</FONT>
|
||||
|
||||
<P>The next section in the display in Figure 7, labelled "FNODE Entry #0"
|
||||
shows us that the sole ALNODE entry points to LSN 933. Here we are seeing
|
||||
this ALSEC's layout. The B+tree Info Flag informs us that this ALSEC
|
||||
contains ALLEAF entries i.e. the actual mapping of the extents. Notice
|
||||
that we have 10 ALLEAF entries in the allocation array. Remember that an
|
||||
ALSEC has much more space available for array entries than an FNODE has,
|
||||
in that it can store up to 40 ALLEAF entries. You can verify this by
|
||||
adding the ALSEC's Free Entries and the Used Entries values together.
|
||||
|
||||
<P>When you try and map more than 40 extents you will exceed the capacity
|
||||
of a sole ALSEC. What happens in this case is that more ALNODE entries are
|
||||
created in the FNODE, each pointing to an ALSEC. Figure 8 shows a
|
||||
42-extent layout (produced with a parameter of "45").
|
||||
|
||||
<PRE>
|
||||
FNODE STRUCTURE
|
||||
LSN: 316
|
||||
Signature: F7E40AAE
|
||||
Name Length: 9
|
||||
Name: fileFRAGG
|
||||
Container Dir LSN: 1033
|
||||
EA Ext. Run Size: 0
|
||||
EA LSN: 0
|
||||
EA Int. Size: 0
|
||||
EA ALSEC Flag: 0
|
||||
Dir Flag: File FNODE
|
||||
B+tree Info Flag: ALNODEs follow
|
||||
Free Entries: 10
|
||||
Used Entries: 2
|
||||
Next Free Offset: 24
|
||||
Valid data size: 393192
|
||||
"Needed" EAs: 0
|
||||
EA/ACL Int. Off: 0
|
||||
|
||||
FNODE Entry #0
|
||||
ALSEC STRUCTURE
|
||||
Signature: 37E40AAE
|
||||
This LSN: 588
|
||||
Parent's LSN: 316
|
||||
B+tree Info Flag: Parent was an FNODE; ALLEAFs follow
|
||||
Free Entries: 0
|
||||
Used Entries: 40
|
||||
Next Free Offset: 232
|
||||
|
||||
ALLEAF INFORMATION
|
||||
Extent #0: 16 sectors starting at LSN 317 (file sec off:0)
|
||||
...
|
||||
Extent #39: 17 sectors starting at LSN 1668 (file sec off:720)
|
||||
|
||||
FNODE Entry #1
|
||||
ALSEC STRUCTURE
|
||||
Signature: 37E40AAE
|
||||
This LSN: 996
|
||||
Parent's LSN: 316
|
||||
B+tree Info Flag: Parent was an FNODE; ALLEAFs follow
|
||||
Free Entries: 38
|
||||
Used Entries: 2
|
||||
Next Free Offset: 32
|
||||
|
||||
ALLEAF INFORMATION
|
||||
Extent #40: 17 sectors starting at LSN 1702 (file sec off:737)
|
||||
Extent #41: 14 sectors starting at LSN 1736 (file sec off:754)
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 8: 42 extents require a 1-level B+tree with 2 ALNODE entries in the
|
||||
FNODE pointing to 2 ALSECs.
|
||||
</FONT>
|
||||
|
||||
<P>There is space in an FNODE for 12 ALNODE entries. If each of these
|
||||
points to a full ALSEC (with ALLEAFs), i.e. 40 entries each, this 1-level
|
||||
structure can accommodate 480 extents (parameter "564").
|
||||
|
||||
<P>Let's see what happens when we exceed this value. Figure 9 shows a
|
||||
482-extent layout ("565"). Interesting things have occurred. We now have a
|
||||
2-level B+tree structure. The FNODE ALNODE array has been adjusted to
|
||||
contain a sole entry. This in turn points to an ALSEC that has 13 ALNODE
|
||||
entries. Each of these ALNODE points to another ALSEC which contains
|
||||
ALLEAF entries. 12 of the ALSECs (with ALLEAFs) are full i.e. 12*40 while
|
||||
the 13th ALSEC (with ALLEAFs) only maps 2 extents.
|
||||
|
||||
<PRE>
|
||||
FNODE STRUCTURE
|
||||
LSN: 1000
|
||||
Signature: F7E40AAE
|
||||
Name Length: 9
|
||||
Name: fileFRAGG
|
||||
Container Dir LSN: 1033
|
||||
EA Ext. Run Size: 0
|
||||
EA LSN: 0
|
||||
EA Int. Size: 0
|
||||
EA ALSEC Flag: 0
|
||||
Dir Flag: File FNODE
|
||||
B+tree Info Flag: ALNODEs follow
|
||||
Free Entries: 11
|
||||
Used Entries: 1
|
||||
Next Free Offset: 16
|
||||
Valid data size: 524264
|
||||
"Needed" EAs: 0
|
||||
EA/ACL Int. Off: 0
|
||||
|
||||
FNODE Entry #0
|
||||
ALSEC STRUCTURE
|
||||
Signature: 37E40AAE
|
||||
This LSN: 1333
|
||||
Parent's LSN: 1000
|
||||
B+tree Info Flag: Parent was an FNODE; ALNODEs follow
|
||||
Free Entries: 47
|
||||
Used Entries: 13
|
||||
Next Free Offset: 112
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #0 situated at LSN 328 (file sec count:582)
|
||||
ALSEC STRUCTURE
|
||||
Signature: 37E40AAE
|
||||
This LSN: 328
|
||||
Parent's LSN: 1333
|
||||
B+tree Info Flag: ALLEAFs follow
|
||||
Free Entries: 0
|
||||
Used Entries: 40
|
||||
Next Free Offset: 232 ALLEAF INFORMATION Extent #0-#39
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #1 situated at LSN 394 (file sec count:622)
|
||||
ALSEC STRUCTURE 394 (40) ALLEAF INFORMATION Extent #40-#79
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #2 situated at LSN 476 (file sec count:662)
|
||||
ALSEC STRUCTURE 476 (40) ALLEAF INFORMATION Extent #80-#119
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #3 situated at LSN 558 (file sec count:702)
|
||||
ALSEC STRUCTURE 558 (40) ALLEAF INFORMATION Extent #120-#159
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #4 situated at LSN 640 (file sec count:742)
|
||||
ALSEC STRUCTURE 640 (40) ALLEAF INFORMATION Extent #160-#199
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #5 situated at LSN 722 (file sec count:782)
|
||||
ALSEC STRUCTURE 722 (40) ALLEAF INFORMATION Extent #200-#239
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #6 situated at LSN 804 (file sec count:822)
|
||||
ALSEC STRUCTURE 804 (40) ALLEAF INFORMATION Extent #240-#279
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #7 situated at LSN 886 (file sec count:862)
|
||||
ALSEC STRUCTURE 886 (40) ALLEAF INFORMATION Extent #280-#319
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #8 situated at LSN 968 (file sec count:902)
|
||||
ALSEC STRUCTURE 968 (40) ALLEAF INFORMATION Extent #320-#359
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #9 situated at LSN 1085 (file sec count:942)
|
||||
ALSEC STRUCTURE 1085 (40) ALLEAF INFORMATION Extent #360-#399
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #10 situated at LSN 1167 (file sec count:982)
|
||||
ALSEC STRUCTURE 1167 (40) ALLEAF INFORMATION Extent #400-#439
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #11 situated at LSN 1249 (file sec count:1022)
|
||||
ALSEC STRUCTURE 1249 (40) ALLEAF INFORMATION Extent #440-#479
|
||||
|
||||
ALNODE INFORMATION
|
||||
ALSEC Entry #12 situated at LSN 1331 (file sec count:At end)
|
||||
ALSEC STRUCTURE 1331 (2) ALLEAF INFORMATION Extent #480-#481
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 9: 482 extents are mapped by a 2-level B+tree with 1 ALNODE entry
|
||||
in the FNODE pointing to 1 ALSEC, which in turn points to 13 ALSECs.
|
||||
</FONT>
|
||||
|
||||
<P>If you look at FNODE Entry #0's Used & Free Entries values you can
|
||||
verify that, in an ALSEC, there can be a maximum of 60 ALNODEs. It would
|
||||
take 60*40 = 2,400 extents to fill this level up again. Going past this
|
||||
would require the presence of a second FNODE entry. Since we can have up
|
||||
to 12 ALNODE entries in an FNODE, this means we could map 12*60*40 =
|
||||
28,800 extents before the need to insert another intermediary ALSEC level
|
||||
would arise.
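<P>The fan-out arithmetic generalises to any number of ALSEC levels. The following Python sketch computes the ceiling for a given height and the smallest height that can hold a given number of extents. It assumes every block is packed full, which is an upper bound rather than a description of how HPFS actually fills blocks, and the function names are illustrative.

```python
FNODE_ALLEAFS, FNODE_ALNODES = 8, 12
ALSEC_ALLEAFS, ALSEC_ALNODES = 40, 60

def max_extents(alsec_levels: int) -> int:
    """Largest extent count for a tree with the given number of ALSEC levels."""
    if alsec_levels == 0:
        return FNODE_ALLEAFS                    # extents held in the FNODE itself
    inner = ALSEC_ALNODES ** (alsec_levels - 1)  # intermediate ALSEC fan-out
    return FNODE_ALNODES * inner * ALSEC_ALLEAFS

def levels_needed(extents: int) -> int:
    """Smallest number of ALSEC levels whose ceiling covers the extent count."""
    levels = 0
    while max_extents(levels) < extents:
        levels += 1
    return levels

for levels in range(4):
    print(levels, max_extents(levels))
for extents in (8, 10, 482, 44413):
    print(extents, levels_needed(extents))
```

<P>The ceilings are 8, 480, 28,800 and 1,728,000 extents for zero to three ALSEC levels, so the 482-extent file needs two levels and a file with 44,413 extents needs three.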
|
||||
|
||||
<P>On a 100 MB partition I produced a 3-level 44,413 extent structure
|
||||
("44500"). To put this discussion on B+tree fan-out in perspective, it
|
||||
should be remembered that, in the fragmentation analysis performed in Part
|
||||
3 on 20,800 files in 5 partitions, there were only 14 files with more than
|
||||
8 extents (i.e. requiring an ALSEC) and the largest number of extents
|
||||
reported was 30.
|
||||
|
||||
<H2>The ShowExtents Program</H2>
|
||||
|
||||
Figure 10 presents the ShowExtents.cmd REXX program. You will need to get
|
||||
SECTOR.DLL. The program first determines if the LSN you've specified
|
||||
belongs to an FNODE or ALSEC. (You can bypass the FNODE and commence the
|
||||
examination from an ALSEC.) Once it has determined this, the next most
|
||||
important consideration is: does the allocation array consist of ALLEAFs
|
||||
or ALNODEs? If it contains ALLEAFs we've reached the end of the tree and
|
||||
need only show the extents. If we are looking at an array of ALNODEs we
|
||||
need to recurse down each ALNODE entry, loading the ALSEC pointed to by
|
||||
the entry and then see whether it contains either ALLEAFs or ALNODEs. And
|
||||
so on...
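<P>The recursion just described can be sketched compactly in Python. This is not a translation of ShowExtents.cmd: it assumes an in-memory dictionary that maps an LSN to its 512 sector bytes as a stand-in for ReadSect, and the helper make_block exists only to build synthetic test sectors. The demo data is the tree of Figure 7, trimmed to its first three extents.

```python
import struct

FNODE_SIG, ALSEC_SIG = 0xF7E40AAE, 0x37E40AAE

def walk(disk, lsn):
    """Yield (run size, physical LSN, file sector offset) for each extent."""
    sector = disk[lsn]
    (signature,) = struct.unpack_from("<I", sector, 0)
    if signature == FNODE_SIG:
        if sector[0x37] & 0x01:
            return                      # directory FNODE: no extents to list
        header = 0x38
    elif signature == ALSEC_SIG:
        header = 0x0C
    else:
        raise ValueError(f"LSN {lsn} is not an FNODE or ALSEC")
    alnodes_follow = bool(sector[header] & 0x80)
    used = sector[header + 5]
    array = header + 8
    for i in range(used):
        if alnodes_follow:
            _end, child = struct.unpack_from("<2I", sector, array + 8 * i)
            yield from walk(disk, child)        # descend one level
        else:
            offset, run, start = struct.unpack_from("<3I", sector, array + 12 * i)
            yield run, start, offset

def make_block(signature, header, flag, entries):
    """Build a synthetic 512-byte FNODE or ALSEC holding the given entries."""
    sector = bytearray(512)
    struct.pack_into("<I", sector, 0, signature)
    sector[header] = flag
    sector[header + 5] = len(entries)
    for i, entry in enumerate(entries):
        width = 4 * len(entry)
        struct.pack_into(f"<{len(entry)}I", sector, header + 8 + i * width, *entry)
    return bytes(sector)

# In-memory stand-in for a partition.
disk = {
    316: make_block(FNODE_SIG, 0x38, 0x80, [(0xFFFFFFFF, 933)]),
    933: make_block(ALSEC_SIG, 0x0C, 0x20,
                    [(0, 101, 317), (101, 102, 520), (203, 102, 724)]),
}
for run, start, offset in walk(disk, 316):
    print(f"{run} sectors starting at LSN {start} (file sec offset:{offset})")
```

<P>Because the FNODE and ALSEC headers share the same 8-byte layout, one routine can serve both once the header position is known.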
|
||||
|
||||
<PRE>
|
||||
/*Shows the layout of FNODE and ALSECs. Requires SECTOR.DLL*/
|
||||
PARSE UPPER ARG drive lsn
|
||||
/* There must be at least two parms supplied */
|
||||
IF drive = '' | lsn = '' THEN CALL HELP
|
||||
/* Register external functions */
|
||||
CALL RxFuncAdd 'QDrive','sector','QDrive'
|
||||
CALL RxFuncAdd 'ReadSect','sector','ReadSect'
|
||||
alleafEntryCount = 0
|
||||
anodeEntryCount = 0
|
||||
SAY
|
||||
CALL MainRoutine
|
||||
EXIT /*****************EXECUTION ENDS HERE*****************/
|
||||
|
||||
|
||||
MainRoutine:
|
||||
PROCEDURE EXPOSE drive lsn alleafEntryCount anodeEntryCount
|
||||
usedEntries = 0
|
||||
sectorString = ReadSect(drive,lsn) /* Read in required sec */
|
||||
IF FourBytes2Hex(1) = 'F7E40AAE' THEN
|
||||
/* Is an FNODE */
|
||||
DO
|
||||
alSecIndicator = ''
|
||||
CALL DisplayFnode
|
||||
END
|
||||
ELSE
|
||||
/* Not an FNODE */
|
||||
DO
|
||||
IF FourBytes2Hex(1) = '37E40AAE' THEN
|
||||
/* Is an ALSEC */
|
||||
DO
|
||||
alSecIndicator = 'Y'
|
||||
CALL DisplayALSEC
|
||||
END
|
||||
ELSE
|
||||
/* Neither an FNODE or an ALSEC */
|
||||
DO
|
||||
SAY 'LSN' lsn 'is not an FNODE or ALSEC'
|
||||
EXIT
|
||||
END
|
||||
END
|
||||
RETURN
|
||||
|
||||
|
||||
DisplayFnode:
|
||||
SAY 'FNODE STRUCTURE'
|
||||
SAY 'LSN: ' lsn
|
||||
SAY 'Signature: ' FourBytes2Hex(1)
|
||||
SAY 'Name Length: ' Bytes2Dec(13,1)
|
||||
SAY 'Name: ' Substr(sectorString,14,Bytes2Dec(13,1))
|
||||
SAY 'Container Dir LSN:' Bytes2Dec(29,4)
|
||||
SAY 'EA Ext. Run Size: ' Bytes2Dec(45,4)
|
||||
SAY 'EA LSN: ' Bytes2Dec(49,4)
|
||||
SAY 'EA Int. Size: ' Bytes2Dec(53,2)
|
||||
SAY 'EA ALSEC Flag: ' Bytes2Dec(55,1)
|
||||
IF Bitand(Byte2Char(56),'1'x) = '1'x THEN
|
||||
dirFlag = 'Directory FNODE'
|
||||
ELSE
|
||||
dirFlag = 'File FNODE'
|
||||
|
||||
SAY 'Dir Flag: ' dirFlag
|
||||
IF dirFlag = 'Directory FNODE' THEN
|
||||
SAY 'Topmost DIRBLK LSN:'||Bytes2Dec(73,4)
|
||||
ELSE
|
||||
DO
|
||||
/* Is a file, so determine extents */
|
||||
CALL DetermineBtreeInfo 57
|
||||
SAY 'B+tree Info Flag: ' btreeInfo
|
||||
SAY 'Free Entries: ' Bytes2Dec(61,1)
|
||||
usedEntries = Bytes2Dec(62,1)
|
||||
SAY 'Used Entries: ' usedEntries
|
||||
SAY 'Next Free Offset: ' Bytes2Dec(63,2)
|
||||
SAY 'Valid data size: ' Bytes2Dec(161,4)
|
||||
SAY '"Needed" EAs: ' Bytes2Dec(165,4)
|
||||
SAY 'EA/ACL Int. Off: ' Bytes2Dec(185,2) /* ACL/EA Offset: 2 bytes at B8h (see Figure 2) */
|
||||
CALL ShowALLEAF_or_ANODE
|
||||
END
|
||||
RETURN
|
||||
|
||||
FourBytes2Hex: /* Given offset, return Dword */
|
||||
ARG startPos
|
||||
rearranged = Reverse(Substr(sectorString,startPos,4))
|
||||
RETURN C2X(rearranged)
|
||||
|
||||
|
||||
Bytes2Dec:
|
||||
ARG startPos,numOfChars
|
||||
temp = Substr(sectorString,startPos,numOfChars)
|
||||
IF C2X(temp) = 'FFFFFFFF' THEN
|
||||
RETURN 'At the end'
|
||||
ELSE
|
||||
RETURN Format(C2D(Reverse(temp)),,0)
|
||||
|
||||
|
||||
Byte2Char:
|
||||
ARG startPos
|
||||
RETURN Substr(sectorString,startPos,1)
|
||||
|
||||
|
||||
DetermineBtreeInfo:
|
||||
ARG btreeByteOffset
|
||||
IF Bitand(Byte2Char(btreeByteOffset),'20'x) = '20'x THEN
|
||||
btreeInfo = 'Parent was an FNODE; '
|
||||
ELSE
|
||||
btreeInfo = ''
|
||||
|
||||
IF Bitand(Byte2Char(btreeByteOffset),'80'x) = '80'x THEN
|
||||
DO
|
||||
btreeInfo = btreeInfo||'ALNODEs follow'
|
||||
alNodeIndicator = 'Y'
|
||||
END
|
||||
ELSE
|
||||
DO
|
||||
btreeInfo = btreeInfo||'ALLEAFs follow'
|
||||
alNodeIndicator = 'N'
|
||||
END
|
||||
RETURN
|
||||
|
||||
|
||||
DisplayALSEC:
|
||||
SAY 'ALSEC STRUCTURE'
|
||||
alSecIndicator = 'Y'
|
||||
SAY 'Signature: ' FourBytes2Hex(1)
|
||||
lsn = Bytes2Dec(5,4)
|
||||
SAY 'This LSN: ' lsn
|
||||
SAY "Parent's LSN: " Bytes2Dec(9,4)
|
||||
CALL DetermineBtreeInfo 13
|
||||
SAY 'B+tree Info Flag: ' btreeInfo
|
||||
SAY 'Free Entries: ' Bytes2Dec(17,1)
|
||||
usedEntries = Bytes2Dec(18,1)
|
||||
SAY 'Used Entries: ' usedEntries
|
||||
SAY 'Next Free Offset: ' Bytes2Dec(19,2) /* 2-byte field; a 1-byte read shows 488 as 232 */
|
||||
CALL ShowALLEAF_or_ANODE
|
||||
RETURN
|
||||
|
||||
|
||||
ShowALLEAF_or_ANODE: PROCEDURE EXPOSE drive lsn sectorString,
|
||||
usedEntries alleafEntryCount anodeEntryCount entrySize,
|
||||
alsecIndicator alnodeIndicator
|
||||
IF alsecIndicator = 'Y' THEN
|
||||
entryOffset = 21
|
||||
ELSE
|
||||
entryOffset = 65
|
||||
|
||||
IF alnodeIndicator \= 'Y' THEN
|
||||
/* Is an ALLEAF */
|
||||
DO
|
||||
SAY
|
||||
IF usedEntries = 0 THEN
|
||||
DO
|
||||
SAY 'Zero-length file'
|
||||
EXIT
|
||||
END
|
||||
|
||||
SAY 'ALLEAF INFORMATION'
|
||||
entrySize = 12
|
||||
DO entry = alleafEntryCount TO alleafEntryCount+usedEntries-1
|
||||
fileSecOffset = Bytes2Dec(entryOffset,4)
|
||||
runSize = Bytes2Dec(entryOffset+4,4)
|
||||
physicalLSN = Bytes2Dec(entryOffset+8,4)
|
||||
SAY 'Extent #'||entry||':' runSize 'sectors starting at LSN',
physicalLSN '(file sec offset:'||fileSecOffset||')'
|
||||
entryOffset = entryOffset+entrySize
|
||||
END entry
|
||||
|
||||
alleafEntryCount = entry
|
||||
END
|
||||
ELSE
|
||||
DO
|
||||
/* Is either an ALNODE in an ALSEC or in an FNODE */
|
||||
entrySize = 8
|
||||
IF alSecIndicator \= 'Y' THEN
|
||||
/* In an FNODE */
|
||||
DO entry = anodeEntryCount TO anodeEntryCount+usedEntries-1
|
||||
lsn = Bytes2Dec(entryOffset+4,4)
|
||||
SAY
|
||||
SAY 'FNODE Entry #' || entry
|
||||
CALL MainRoutine
|
||||
entryOffset = entryOffset+entrySize
|
||||
END entry
|
||||
ELSE
|
||||
DO
|
||||
/* In an ALSEC */
|
||||
listStart = 65
|
||||
sectorString = ReadSect(drive,lsn)
|
||||
DO entry = anodeEntryCount TO anodeEntryCount+usedEntries-1
|
||||
SAY
|
||||
SAY 'ALNODE INFORMATION'
|
||||
fileSecOffset = Bytes2Dec(entryOffset,4)
|
||||
lsn = Bytes2Dec(entryOffset+4,4)
|
||||
SAY 'ALSEC Entry #'||entry 'situated at LSN',
lsn '(file sec count:'||fileSecOffset||')'
|
||||
CALL MainRoutine
|
||||
anodeEntryCount = entry
|
||||
entryOffset = entryOffset+entrySize
|
||||
END entry
|
||||
END
|
||||
END
|
||||
RETURN
|
||||
|
||||
|
||||
Help:
|
||||
SAY 'ShowExtents shows the extents mapped by a FNODE or ALSEC'
|
||||
SAY 'structure.'
|
||||
SAY
|
||||
SAY ' Usage: ShowExtents drive LSN_of_a_FNODE/ALSEC'
|
||||
SAY ' Example: ShowExtents C: 316'
|
||||
EXIT
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 10: The ShowExtents.cmd program.
|
||||
</FONT>
|
||||
|
||||
<H2>Counting Extents</H2>
|
||||
|
||||
It is handy to be able to report just the number of extents in a file.
|
||||
HPFS-EXT, in the Graham Utilities, can do this. It takes a filename. It is
|
||||
available in the demo version of the GU's, "GULITE.xxx".
|
||||
|
||||
<P>The freeware FST (currently FST03F.xxx) does just about everything. You
|
||||
can specify either a filename ("FST INFO N: \TEST\FILEFRAGG" - note the
|
||||
space after the drive letter) or a LSN ("FST INFO N: 1000"). It will
|
||||
include the height of the B+tree and the total number of extents at the
|
||||
end of its display. Unfortunately, it displays a lot of other info, and
|
||||
sometimes you're only interested in the number of levels and
|
||||
extents.
|
||||
|
||||
<P>I cut down ShowExtents.cmd to produce CountExtents.cmd. The design was
|
||||
not amenable to showing the height but it was a straightforward matter to
|
||||
show just the number of extents. I've not bothered to present it here
|
||||
since most readers will probably prefer to specify the filename. (The
|
||||
FNODE LSN keeps changing as you increase the number of extents so this
|
||||
makes it more difficult to use CountExtents.)
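<P>Reporting the height alongside the extent count is straightforward if the recursive routine returns both values instead of printing as it goes. The Python sketch below shows the idea on an already-decoded model of Figure 9. The dictionary layout, the "nodes" and "leaves" labels and the function name are assumptions made for this illustration; they are not how CountExtents.cmd works.

```python
def measure(tree, lsn):
    """Return (ALSEC levels below this block, total extents mapped)."""
    kind, payload = tree[lsn]
    if kind == "leaves":
        return 0, payload               # payload is the number of ALLEAF entries
    results = [measure(tree, child) for child in payload]
    return 1 + max(h for h, _ in results), sum(n for _, n in results)

# Figure 9 as a decoded model: the FNODE at LSN 1000 points to the ALSEC at
# LSN 1333, which in turn points to 13 ALSECs holding ALLEAF entries.
leaf_lsns = [328, 394, 476, 558, 640, 722, 804, 886, 968, 1085, 1167, 1249, 1331]
tree = {1000: ("nodes", [1333]), 1333: ("nodes", leaf_lsns)}
for lsn in leaf_lsns[:-1]:
    tree[lsn] = ("leaves", 40)          # twelve full ALSECs
tree[1331] = ("leaves", 2)              # the thirteenth maps two extents

height, extents = measure(tree, 1000)
print(height, extents)
```

<P>For the Figure 9 layout this gives a height of 2 and 482 extents, in agreement with the figure's caption.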
|
||||
|
||||
<H2>Conclusion</H2>
|
||||
|
||||
In this installment we have seen how a file's sectors are mapped by FNODEs
|
||||
and ALSECs. These file system components can contain either an array of
|
||||
ALNODE or ALLEAF entries. By following through to the ALLEAFs we can
|
||||
examine the mapping of extents.
|
||||
|
||||
<P>We have also seen how a B+tree is different from a B-tree. In a DIRBLK
|
||||
B-tree, DIRENT information can be found in a node entry. But in an ALSEC
|
||||
B+tree, extent information is not stored in node entries, only in the
|
||||
leaves. The filling of nodes in an ALSEC B+tree is also much more
|
||||
efficient than the utilisation of nodal space in a DIRBLK B-tree.
|
||||
|
||||
<P>When the next installment is presented we'll look at Extended
|
||||
Attributes. While not specifically a HPFS topic, they are well integrated
|
||||
into the file system and will fit well into this series.
|
||||
BIN
study/sabre/os/files/FileSystems/HPFS/hpfs_11.gif
Normal file
|
After Width: | Height: | Size: 7.0 KiB |
53
study/sabre/os/files/FileSystems/HPFS/index.html
Normal file
@@ -0,0 +1,53 @@
|
||||
<html><head><title>Operating Systems: The HPFS Filesystem</title></head>
|
||||
<body BGCOLOR=#FFFFFF TEXT=#000000 LINK="#0000FF" VLINK="#0000FF" ALINK="#107010">
|
||||
|
||||
<center><font face=Verdana size=7><b>HPFS FileSystem</b></font></center>
|
||||
<hr><p>
|
||||
|
||||
This series of articles, written by Dan Bridges, apparently first appeared in the now-defunct OS2Zone (their page should be at http://www.os2zone.aus.net). I ran across it during my journeys of the net and put it up here. The "original" form is <a href="hpfs.zip">available here</a>. This is a six-part series of articles on HPFS.<p>
|
||||
|
||||
<ul><DL>
|
||||
<DT><font size=+1><a href="hpfs0.html">Part #0 - Preface</a></font><br>
|
||||
<DD>This article is the initial "preface" article that explains the motivations behind the series.
|
||||
It also talks about the filesystem organization scheme used by the FAT filesystem... and briefly
|
||||
introduces HPFS.<p>
|
||||
|
||||
<DT><font size=+1><a href="hpfs1.html">Part #1 - Introduction</a></font><br>
|
||||
<DD>This introductory article compares the FAT filesystem against the HPFS filesystem in terms that
|
||||
a user would understand. This talks about the practical differences, such as speed, size, and
|
||||
fragmentation.<p>
|
||||
|
||||
<DT><font size=+1><a href="hpfs2.html">Part #2 - The SuperBlock and the SpareBlock</a></font><br>
|
||||
<DD>This article starts delving more deeply into HPFS' internal structures. Two REXX programs are
|
||||
presented that greatly assist in the search for information. It also briefly looks at some
|
||||
other HPFS-related programs. Finally, you will see the Big Picture when the major structures
|
||||
of a HPFS partition are shown. <p>
|
||||
|
||||
<DT><font size=+1><a href="hpfs3.html">Part #3 - Fragmentation, Diskspace Bitmaps and Code Pages</a></font><br>
|
||||
<DD>This article looks at how HPFS knows which sectors are occupied and which ones are free.
|
||||
It examines the amount of file fragmentation on five HPFS volumes and also checks out the
|
||||
fragmentation of free space. A program is presented to show free runs and some other
|
||||
details. Finally, it briefly discusses Code Pages and looks at a program that displays
|
||||
their contents.<p>
|
||||
|
||||
<DT><font size=+1><a href="hpfs4.html">Part #4 - B-Trees, DIRBLKs, and DIRENTs</a></font><br>
|
||||
<DD>The most basic structures in the HPFS are DIRBLKs, DIRENTs and FNODEs. This article examines
|
||||
DIRBLKs and DIRENTs, talks about the differences between binary trees and B-trees and shows
|
||||
how DIRBLKs are interconnected to facilitate quick access in a large directory (one of HPFS'
|
||||
strengths). To assist in this investigation, a program, ShowBtree.cmd, helps to visualise
|
||||
the layout of directory and file entries in a partition.<p>
|
||||
|
||||
<DT><font size=+1><a href="hpfs5.html">Part #5 - FNODEs, ALSECs and B+trees</a></font><br>
|
||||
<DD>This article takes a long look at how a file's contents are logically stored under HPFS.
|
||||
It is helpful to contrast the following file-sector allocation methods with the last article's
|
||||
directory entry concepts. It also talks about fragmentation and how HPFS deals with it.<p>
|
||||
|
||||
<DT><font size=+1>Part #6 - ?</font><br>
|
||||
<DD>This is as far as I can go... if anyone has any of the other articles that appeared in this
|
||||
series, please please send them my way...<p>
|
||||
|
||||
</DL></ul>
|
||||
|
||||
<p><hr><FONT SIZE = 4><TABLE ALIGN=RIGHT BORDER=0><TR><TD><center>
|
||||
Copyright © 1998 <i><a href="mailto:sabre@nondot.org">Chris Lattner</a></i><br>
|
||||
Last modified: Wednesday, 13-Sep-2000 14:10:50 CDT </center></TD></TR></TABLE>
|
||||
593
study/sabre/os/files/FileSystems/Joliet.html
Normal file
@@ -0,0 +1,593 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>Joliet Specification</title>
|
||||
</head>
|
||||
<body bgcolor="#ffffff">
|
||||
<a name="top"></a><center>
|
||||
<h1>Joliet Specification</h1>
|
||||
<b>
|
||||
<p>CD-ROM Recording Spec ISO 9660:1988</b> </center> <br>
|
||||
</p>
|
||||
<p>Extensions for Unicode Version 1; May 22, 1995 </p>
|
||||
<p>Copyright 1995, Microsoft Corporation All Rights Reserved <br>
|
||||
Contact Microsoft Developer Relations Group <br>
|
||||
MAC@avca.com </p>
|
||||
<hr>
|
||||
<h2><a name="contents">CONTENTS</a></h2>
|
||||
<ul>
|
||||
<li><a href="#preface">Preface</a>
|
||||
<ul>
|
||||
<li><a href="#scope">Purpose and Scope</a> </li>
|
||||
<li><a href="#overview">Overview </a> </li>
|
||||
<li><a href="#terms">Terminology and Notation</a> </li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="#recording">Joliet Recording Specification</a>
|
||||
<ul>
|
||||
<li><a href="#change">Change Summary </a> </li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="#unicode">Identifying an ISO 9660 SVD as Unicode
|
||||
(UCS-2)</a>
|
||||
<ul>
|
||||
<li><a href="#escapes">SVD Escape Sequences Field</a> </li>
|
||||
<li><a href="#flags">SVD Volume Flags Field</a> </li>
|
||||
<li><a href="#resolution">Resolution of ISO 9660
|
||||
Ambiguities for Wide Characters </a> </li>
|
||||
<li><a href="#wide">Wide Character Byte Ordering</a> </li>
|
||||
<li><a href="#allowed">Allowed Character Set </a> </li>
|
||||
<li><a href="#identifiers">Special Directory
|
||||
Identifiers</a> </li>
|
||||
<li><a href="#separator">Separator Characters</a> </li>
|
||||
<li><a href="#sort">Sort Ordering</a> </li>
|
||||
<li><a href="#relaxation">Relaxation of ISO 9660
|
||||
Restrictions on UCS-2 Volumes </a> </li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="#extension">Extensions to Joliet</a>
|
||||
<ul>
|
||||
<li><a href="#multisession">Joliet for Multisession
|
||||
Media</a> </li>
|
||||
<li><a href="#cdxa">CD-XA Extensions to Joliet</a> </li>
|
||||
<li><a href="#other">Other Extensions to Joliet </a> </li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="#bibliography">Bibliography</a> </li>
|
||||
</ul>
|
||||
<h2><a name="preface">Preface</a></h2>
|
||||
<h3><a name="scope"></a>Purpose and Scope </h3>
|
||||
<p>While the CD-ROM media provides for cost-effective software
|
||||
distribution, the existing ISO 9660 file system contains a number
|
||||
of restrictions which interfere with simple and efficient
|
||||
distribution of files on a CD-ROM. </p>
|
||||
<p>The read-only nature of CD-ROM media has led content authors
|
||||
to continue to use traditional magnetic media as their main
|
||||
avenue for creating applications. Each of the existing file
|
||||
systems for magnetic media contains various features which cannot
|
||||
be represented on CD-ROM media using an unenhanced version of ISO
|
||||
9660. </p>
|
||||
<p>As content authors attempt to transfer their applications to
|
||||
the CD-ROM, they are likely to find that some of their work
|
||||
cannot be distributed on the CD-ROM media due to restrictions in
|
||||
the ISO 9660 format. This frustrates some content authors. </p>
|
||||
<p>Because the CD-ROM media is mainly a distribution media,
|
||||
rather than a creative (read/write) media, it is necessary for
|
||||
the CD-ROM file system to support a superset of the creative
|
||||
media features. This fundamental flaw in the design of ISO 9660
|
||||
has prompted several operating systems vendors to extend ISO 9660
|
||||
in several ways. Some examples are Rock Ridge Interchange
|
||||
Protocol and Apple's use of the System Use Area to store finder
|
||||
flags. </p>
|
||||
<p>Some of the ISO 9660 problems which are addressed by this
|
||||
specification include: </p>
|
||||
<ul>
|
||||
<li>Character Set limitations. </li>
|
||||
<li>File Name Length limitations </li>
|
||||
<li>Directory Tree Depth limitations </li>
|
||||
<li>Directory Name Format limitations </li>
|
||||
<li>Wide Character (16-bit character) ambiguities </li>
|
||||
</ul>
|
||||
<p>The general design approach used in the Joliet specification
|
||||
is to relax restrictions and resolve ambiguities in the ISO
|
||||
9660:1988 specification so the practical goals can be met. </p>
|
||||
<h3><a name="overview"></a>Overview </h3>
|
||||
<p>The Joliet specification utilizes the supplementary volume
|
||||
descriptor (SVD) feature of ISO 9660 to specify a set of files
|
||||
recorded within the Unicode character set. </p>
|
||||
<p>The ISO 10646 character set specification may be identified by
|
||||
an ISO 2022 escape sequence. By recording this escape sequence in
|
||||
an ISO 9660 SVD, this technique for identifying the Unicode SVD
|
||||
is compliant with the ISO 9660 specification. It also retains
|
||||
interchange by not disrupting the files referenced through the
|
||||
primary volume descriptor (PVD). </p>
|
||||
<p>All that remains is to resolve minor technical ambiguities
|
||||
within ISO 9660 which arise as the result of the use of wide
|
||||
characters. </p>
|
||||
<p>Because the use of this particular escape sequence in an ISO
|
||||
9660 SVD is unprecedented up to this time, several of the
|
||||
restrictions which are imposed by ISO 9660 may be relaxed without
|
||||
significantly disrupting information interchange between existing
|
||||
systems from a practical standpoint. </p>
|
||||
<p>This design approach has several benefits. For instance, the
|
||||
use of the existing ISO 9660 standard allows for straightforward
|
||||
integration with existing extensions to ISO 9660. The designs for
|
||||
the System Use Sharing Protocol, Rock Ridge extensions for POSIX
|
||||
semantics, CD-XA System Use Area Semantics, Apple's Finder Flags
|
||||
and Resource Forks, all port in a straightforward manner to the
|
||||
Joliet specification. </p>
|
||||
<p>Also, the use of a new SVD eliminates the danger of breaking
|
||||
software compatibility with existing ISO 9660 systems. Existing
|
||||
software will ignore the Unicode SVD and will simply use
|
||||
the PVD instead. This compatibility "safety-valve"
|
||||
makes the goal of relaxing the file system's restrictions easier. </p>
|
||||
<p>This document describes how a CD-ROM may be constructed so
|
||||
that names on the volume can be recorded in Unicode while
|
||||
remaining in compliance with ISO 9660. The particular ISO 10646
|
||||
character sets used here are UCS-2 Level 1, UCS-2 Level 2, and
|
||||
UCS-2 Level 3. </p>
|
||||
<p>The basic strategy of CD-ROM volume recognition is the Volume
|
||||
Recognition Sequence, which is a sequence of volume descriptors,
|
||||
recorded one per sector, starting at Sector 16 in the first track
|
||||
of the last session on the disc. A receiving system reads these
|
||||
sectors and chooses a particular volume descriptor from the
|
||||
sequence. This volume descriptor acts as a kind of anchor upon
|
||||
which the remainder of the volume is constructed. </p>
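<p>As an informal illustration (not part of the specification), the recognition step can be sketched in Python. The function name <code>read_volume_descriptors</code>, the 2048-byte sector size, and the <code>session_start</code> parameter are assumptions made for this sketch; the 'CD001' identifier and the terminator type 255 come from ISO 9660. </p>

```python
import io

SECTOR_SIZE = 2048        # assumed logical sector size for this sketch
VRS_START_SECTOR = 16     # the sequence starts at sector 16 of the session
TERMINATOR_TYPE = 255     # Volume Descriptor Set Terminator (ISO 9660)


def read_volume_descriptors(image, session_start=0):
    """Yield (descriptor_type, raw_sector) for each volume descriptor.

    `image` is any seekable binary file object; `session_start` is the
    logical sector at which the last session begins (hypothetical input).
    """
    image.seek((session_start + VRS_START_SECTOR) * SECTOR_SIZE)
    while True:
        sector = image.read(SECTOR_SIZE)
        # Every ISO 9660 descriptor carries 'CD001' in bytes 1 to 5.
        if len(sector) != SECTOR_SIZE or sector[1:6] != b"CD001":
            return
        yield sector[0], sector
        if sector[0] == TERMINATOR_TYPE:
            return
```

<p>A receiving system would then pick one descriptor from the yielded sequence, for example a Unicode SVD when present and the PVD otherwise. </p>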
|
||||
<h3><a name="terms">Terminology and Notation</a></h3>
|
||||
<p>Joliet is based on the ISO 9660:1988 standard. Unless defined
|
||||
in this document, the terminology used shall be as defined in ISO
|
||||
9660:1988. </p>
|
||||
<p>The following notation is used in this document. </p>
|
||||
<ul>
|
||||
<li>Decimal and Hexadecimal Notation
|
||||
<ul>
|
||||
<li>Numbers in decimal notation are represented by
|
||||
decimal digits, namely 0 to 9. </li>
|
||||
<li>Numbers in hexadecimal notation are represented
|
||||
by hexadecimal digits, namely 0 to 9 and A to F,
|
||||
shown in parentheses. For instance, the
|
||||
hexadecimal number D0 shall be written as (D0). </li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>A literal sequence of ASCII characters will be
|
||||
represented by those characters within single quotes. For
|
||||
instance, 'ABC' means the byte sequence (41)(42)(43). </li>
|
||||
<li>References to characters in the ISO 2022 escape sequence
|
||||
will be given in comma-separated decimal nibble/nibble
|
||||
format, in hexadecimal format, and as ASCII characters,
|
||||
with equal signs between each format, all enclosed within
|
||||
parentheses. For instance, the 3-byte ISO 2022 escape
|
||||
sequence for Shift-JIS is (2/4, 2/11, 3/10 =
|
||||
(24)(2B)(3A)= '$+:'). </li>
|
||||
</ul>
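<p>The notation above can be reproduced mechanically. The following Python sketch is only an illustration; the helper names <code>hex_notation</code> and <code>nibble_notation</code> are invented here and do not appear in the specification. </p>

```python
def hex_notation(data: bytes) -> str:
    """Render bytes as parenthesised hexadecimal pairs, e.g. (41)(42)(43)."""
    return "".join(f"({b:02X})" for b in data)


def nibble_notation(data: bytes) -> str:
    """Render bytes in decimal nibble/nibble form, e.g. 2/4, 2/11, 3/10."""
    return ", ".join(f"{b >> 4}/{b & 0x0F}" for b in data)


print(hex_notation(b"ABC"))      # (41)(42)(43)
print(nibble_notation(b"$+:"))   # 2/4, 2/11, 3/10
```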
|
||||
<p><a name="recording"></a><a href="#contents">return to the
|
||||
table of contents</a> </p>
|
||||
<h2>Joliet Recording Specification</h2>
|
||||
<h3><a name="change"></a>Change Summary</h3>
|
||||
<p>The Joliet specification resolves the following ISO 9660
|
||||
ambiguities for UCS-2 volumes: </p>
|
||||
<ul>
|
||||
<li>Use a SVD with a UCS-2 (UNICODE) Escape Sequence. </li>
|
||||
<li>The UCS-2 escape sequences used are: (25)(2F)(40),
|
||||
(25)(2F)(43), or (25)(2F)(45). </li>
|
||||
<li>The default setting of bit 0 of the SVD "Volume
|
||||
Flags Field" is ZERO. </li>
|
||||
<li>The Unicode Wide characters shall be recorded in
|
||||
"Big Endian" (Motorola) format. </li>
|
||||
<li>Special Directory Identifiers are recorded as single byte
|
||||
names containing (00) or (01). </li>
|
||||
<li>SEPARATOR 1 and SEPARATOR 2 are encoded using an
|
||||
equivalent 16-bit code point. </li>
|
||||
<li>Sort ordering is unchanged, except that all justification
|
||||
pad bytes are to be set to (00). </li>
|
||||
</ul>
|
||||
<p>The Joliet specification recommends that several ISO 9660
|
||||
restrictions be lifted on UCS-2 volumes. The Joliet specification
|
||||
allows for the following interchange rules: </p>
|
||||
<ul>
|
||||
<li>The File or Directory Identifiers may be up to 128 bytes
|
||||
(64 Unicode characters) in length. </li>
|
||||
<li>Directory Identifiers may contain file name extensions. </li>
|
||||
<li>The Directory Hierarchy may be recorded deeper than 8
|
||||
levels. </li>
|
||||
<li>The volume recognition sequence supports multisession.
|
||||
This is compatible with the CD-Bridge specification. </li>
|
||||
</ul>
|
||||
<p>The Joliet specification may be extended through the use of
|
||||
the following specifications: </p>
|
||||
<ul>
|
||||
<li>Mode 2 Form 2 extents and CD-DA extents, ("System
|
||||
Description CD-ROM XA") </li>
|
||||
<li>System Use Sharing Protocol (not explicitly specified
|
||||
here) </li>
|
||||
<li>RockRidge Interchange Protocol (not explicitly specified
|
||||
here) </li>
|
||||
<li>Other future CD-ROM file system formats </li>
|
||||
</ul>
|
||||
<p> <a name="unicode"></a> </p>
|
||||
<p><a href="#contents">return to the table of contents</a> </p>
|
||||
<h2>Identifying an ISO 9660 SVD as Unicode (UCS-2)</h2>
|
||||
<h3><a name="escapes">SVD Escape Sequences Field</a></h3>
|
||||
<p>The Escape Sequences field of an ISO 9660 Supplementary Volume
|
||||
Descriptor (ISO 9660 section 8.5.6) shall identify the character
|
||||
set used to interpret descriptor fields related to the Directory
|
||||
Hierarchy identified by the Volume Descriptor. </p>
|
||||
<p>If the Escape Sequences field of an ISO 9660 SVD identifies
|
||||
any of the following UCS-2 escape sequences, then the descriptor
|
||||
fields related to the Directory Hierarchy identified by that
|
||||
Volume Descriptor shall be interpreted according to the
|
||||
identified UCS-2 character set. </p>
|
||||
<p> </p>
|
||||
<hr>
|
||||
<b>
|
||||
<p>Table 1 - ISO 2022 UCS-2 Escape Sequences</b> </p>
|
||||
<pre>
|
||||
                  ISO 2022 Escape Sequence as recorded in the ISO 9660 SVD

Standard Level    Decimal           Hex Bytes       ASCII
UCS-2 Level 1     2/5, 2/15, 4/0    (25)(2F)(40)    '%/@'
UCS-2 Level 2     2/5, 2/15, 4/3    (25)(2F)(43)    '%/C'
UCS-2 Level 3     2/5, 2/15, 4/5    (25)(2F)(45)    '%/E'
|
||||
</pre>
|
||||
<hr>
|
||||
<p>A "Unicode Volume" refers to the Volume Descriptor
|
||||
and Directory Hierarchy identified by a Supplementary Volume
|
||||
Descriptor containing an Escape Sequences field which identifies
|
||||
any of the above UCS-2 character sets. </p>
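<p>A minimal Python sketch of this check follows. It is an illustration under stated assumptions, not a normative procedure: the function name <code>joliet_level</code> is invented, the field position (byte positions 89 to 120 of the descriptor) is taken from ISO 9660, and the sketch assumes the UCS-2 sequence is the first one recorded in the field. </p>

```python
# Escape sequences keyed by their recorded bytes, taken from the hex column
# of Table 1; the values are the UCS-2 levels.
UCS2_ESCAPES = {
    bytes.fromhex("252F40"): 1,
    bytes.fromhex("252F43"): 2,
    bytes.fromhex("252F45"): 3,
}


def joliet_level(descriptor: bytes):
    """Return the UCS-2 level named by an SVD, or None if it is not Unicode."""
    # Type 2 marks a Supplementary Volume Descriptor; 'CD001' is the
    # ISO 9660 standard identifier.
    if descriptor[0] != 2 or descriptor[1:6] != b"CD001":
        return None
    escapes = descriptor[88:120]  # Escape Sequences field (BP 89 to 120)
    # Assumption: the UCS-2 sequence is recorded first in the field.
    return UCS2_ESCAPES.get(escapes[:3])
```

<p>A descriptor for which this returns 1, 2, or 3 would anchor a Unicode Volume in the sense defined above. </p>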
|
||||
<h3><a name="flags">SVD Volume Flags Field</a></h3>
|
||||
<p>The UCS-2 Level 1, UCS-2 Level 2, and UCS-2 Level 3 escape
|
||||
sequences are considered to be registered according to ISO 2375 for
|
||||
purposes of setting bit 0 of the Volume Flags field of the SVD. </p>
|
||||
<p>The nominal value of Bit 0 of the Volume Flags field for a
|
||||
Unicode SVD shall be ZERO. </p>
|
||||
<h3><a name="resolution">Resolution of ISO 9660 </a>Ambiguities
|
||||
for Wide Characters</h3>
|
||||
<p>This specification resolves ISO 9660 ambiguities with respect
|
||||
to wide (16-bit) character sets, such as the UCS-2 character set. </p>
|
||||
<h3><a name="wide">Wide Character Byte Ordering</a> </h3>
|
||||
<p>All UCS-2 characters shall be recorded according to ISO
|
||||
9660:1988 section 7.2.2, 16-bit numerical value, most significant
|
||||
byte first ("Big Endian"). </p>
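<p>For illustration only, the byte ordering can be shown with Python's built-in <code>utf-16-be</code> codec, which matches UCS-2 Big Endian for code points up to (FF)(FF). The helper names below are invented for this sketch. </p>

```python
def encode_ucs2_be(name: str) -> bytes:
    """Encode a name as 16-bit code points, most significant byte first."""
    for ch in name:
        if ord(ch) > 0xFFFF:
            # UCS-2 has no representation for code points beyond 16 bits.
            raise ValueError(f"{ch!r} is outside the UCS-2 range")
    return name.encode("utf-16-be")


def decode_ucs2_be(raw: bytes) -> str:
    """Decode Big Endian 16-bit code points back into a string."""
    if len(raw) % 2:
        raise ValueError("UCS-2 data must have an even number of bytes")
    return raw.decode("utf-16-be")


print(encode_ucs2_be("AB"))  # b'\x00A\x00B'
```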
|
||||
<h3><a name="allowed">Allowed Character Set</a> </h3>
|
||||
<p>All UCS-2 code points shall be allowed except for the
|
||||
following UCS-2 code points: </p>
|
||||
<ul>
|
||||
<li>All code points between (00)(00) and (00)(1F), inclusive.
|
||||
(Control Characters) </li>
|
||||
<li>(00)(2A) '*' (Asterisk) </li>
|
||||
<li>(00)(2F) '/' (Forward Slash) </li>
|
||||
<li>(00)(3A) ':' (Colon) </li>
|
||||
<li>(00)(3B) ';' (Semicolon) </li>
|
||||
<li>(00)(3F) '?' (Question Mark) </li>
|
||||
<li>(00)(5C) '\' (Backslash) </li>
|
||||
</ul>
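<p>The exclusion list above lends itself to a simple check. The Python sketch below is an informal illustration; the names <code>FORBIDDEN</code> and <code>invalid_characters</code> are assumptions of this example. </p>

```python
# The six printable characters excluded by the list above.
FORBIDDEN = frozenset("*/:;?\\")


def invalid_characters(name: str) -> list:
    """Return the characters in `name` that the list above excludes."""
    bad = []
    for ch in name:
        code_point = ord(ch)
        is_control = code_point <= 0x1F          # (00)(00) to (00)(1F)
        beyond_ucs2 = code_point > 0xFFFF        # not a UCS-2 code point
        if is_control or beyond_ucs2 or ch in FORBIDDEN:
            bad.append(ch)
    return bad


print(invalid_characters("a*b?.txt"))  # ['*', '?']
```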
|
||||
<p><a name="identifiers"></a> </p>
|
||||
<p><a href="#contents">return to the table of contents</a> </p>
|
||||
<h3>Special Directory Identifiers </h3>
|
||||
<p>Section 7.6 of ISO 9660 describes the recording of reserved
|
||||
directory identifiers for the root, current, and parent directory
|
||||
identifiers as single (00) or single (01) bytes. </p>
|
||||
<p>In a wide character set, it is not possible to represent a
|
||||
character in a single byte. The following portions of the ISO
|
||||
9660:1988 specification referring to reserved directory
|
||||
identifiers are ambiguous. </p>
|
||||
<p>The ISO 9660:1988 sections in question are as follows: </p>
|
||||
<ul>
|
||||
<li>6.8.2.2 Identification of directories </li>
|
||||
<li>7.6.2 Reserved Directory Identifiers </li>
|
||||
<li>9.1.11 File Identifier </li>
|
||||
<li>9.4.5 Directory Identifier </li>
|
||||
</ul>
|
||||
<p>These special case directory identifiers are not intended to
|
||||
represent characters in a graphic character set. These characters
|
||||
are placeholders, not characters. Therefore, these definitions
|
||||
remain unchanged on a volume recorded in Unicode. </p>
|
||||
<p>Simply put, Special Directory Identifiers shall remain as
|
||||
8-bit values, even on a UCS-2 volume, where other characters have
|
||||
been expanded to 16 bits. </p>
|
||||
<dl>
|
||||
<dt>Root Directory </dt>
|
||||
<dt><dfn>The Directory Identifier of a Directory Record
|
||||
describing the Root Directory shall consist of a single
|
||||
(00) byte.</dfn> </dt>
|
||||
<dt>Current Directory </dt>
|
||||
<dt><dfn>The Directory Identifier of the first Directory
|
||||
Record of each directory shall consist of a single (00)
|
||||
byte.</dfn> </dt>
|
||||
<dt>Parent Directory </dt>
|
||||
<dt><dfn>The Directory Identifier of the second Directory
|
||||
Record of each directory shall consist of a single (01)
|
||||
byte.</dfn> </dt>
|
||||
</dl>
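<p>A reader therefore has to test for the single-byte placeholders before decoding a directory identifier as UCS-2. The following Python sketch illustrates one way to do so; the function name and the choice of "." and ".." as display names are assumptions of this example. </p>

```python
def decode_directory_identifier(raw: bytes) -> str:
    """Decode a directory identifier from a Unicode volume."""
    if raw == b"\x00":
        return "."    # root directory, or the current directory entry
    if raw == b"\x01":
        return ".."   # parent directory entry
    # Every other identifier is recorded as Big Endian 16-bit code points.
    return raw.decode("utf-16-be")


print(decode_directory_identifier(b"\x00"))          # .
print(decode_directory_identifier(b"\x00D\x00o"))    # Do
```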
|
||||
<h3><a name="separator">Separator Characters</a> </h3>
|
||||
<p>The separator characters SEPARATOR 1 and SEPARATOR 2 are
|
||||
specified as 8-bit characters, which cannot be represented in a
|
||||
wide character set, so the ISO 9660:1988 specification sections
|
||||
referring to SEPARATOR 1 and SEPARATOR 2 are ambiguous. </p>
|
||||
<p>The ISO 9660:1988 sections in question are as follows: </p>
|
||||
<ul>
|
||||
<li>7.4.3 Separators </li>
|
||||
<li>7.5.1 File Identifier format </li>
|
||||
<li>7.5.2 File Identifier length </li>
|
||||
<li>8.4.24 Abstract File Identifier </li>
|
||||
<li>8.4.25 Bibliographic File Identifier </li>
|
||||
<li>8.5.17 Copyright File Identifier </li>
|
||||
<li>8.5.19 Bibliographic File Identifier </li>
|
||||
<li>9.1.11 File Identifier </li>
|
||||
</ul>
|
||||
<p>The values SEPARATOR 1 and SEPARATOR 2 shall be represented
|
||||
differently depending on the d1 character set. </p>
|
||||
<p>In the case of an SVD identifying a UCS-2 character set, the
|
||||
values of SEPARATOR 1 and SEPARATOR 2 shall be recorded as a
|
||||
UCS-2 character with an equivalent code point value. </p>
|
||||
<p>Otherwise, the definitions of SEPARATOR 1 and SEPARATOR 2
|
||||
shall be recorded according to section 7.4.3 of ISO 9660:1988. </p>
|
||||
<p>Simply put, SEPARATOR 1 and SEPARATOR 2 shall be expanded to
|
||||
16 bits. </p>
|
||||
<p> </p>
|
||||
<hr>
|
||||
<b>
|
||||
<p>Table 2 - Separator Representations</b> </p>
|
||||
<pre>
|
||||
               ISO 9660:1988 Volume    Unicode Volume

Separator      Bit Combination         UCS-2 Codepoint
SEPARATOR 1    (2E)                    (00)(2E)
SEPARATOR 2    (3B)                    (00)(3B)
|
||||
</pre>
|
||||
<hr>
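<p>As an informal sketch, a file identifier on a Unicode volume could be assembled as shown below. The function name is invented, and recording the version number digits as 16-bit characters is an assumption of this example rather than something stated in this section. </p>

```python
SEPARATOR_1 = bytes.fromhex("002E")  # '.' expanded to 16 bits
SEPARATOR_2 = bytes.fromhex("003B")  # ';' expanded to 16 bits


def build_file_identifier(name: str, extension: str, version: int = 1) -> bytes:
    """Join name, extension and version with the 16-bit separators."""
    return (
        name.encode("utf-16-be")
        + SEPARATOR_1
        + extension.encode("utf-16-be")
        + SEPARATOR_2
        # Assumption: version digits are recorded as UCS-2 characters too.
        + str(version).encode("utf-16-be")
    )


print(build_file_identifier("A", "B"))  # b'\x00A\x00.\x00B\x00;\x001'
```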
|
||||
<p><a name="sort"></a><a href="#contents">return to the table of
|
||||
contents</a> </p>
|
||||
<h3>Sort Ordering</h3>
|
||||
<p>ISO 9660 specifies the order of path table records within a
|
||||
path table, and specifies the order of directory records within a
|
||||
directory. These sorting algorithms assume an 8-bit character set
|
||||
is used. These sorting algorithms are ambiguous when used with
|
||||
wide characters. </p>
|
||||
<p>The ISO 9660:1988 sections in question are as follows: </p>
|
||||
<ul>
|
||||
<li>6.9.1 Order of Path Table Records </li>
|
||||
<li>9.3 Order of Directory Records </li>
|
||||
</ul>
|
||||
<p>The only change required is to redefine the value of the sort
|
||||
justification pad byte to zero (00). </p>
|
||||
<p>Simply put, comparing the byte contents in all positions
|
||||
remains a suitable sorting algorithm for the descriptor fields
|
||||
recorded in a UCS-2 SVD Directory Hierarchy. This is one of the
|
||||
primary reasons for selecting the Big Endian format to represent
|
||||
all UCS-2 characters. </p>
|
||||
<p><b>Natural Language Sorting</b> </p>
|
||||
<p>On a Unicode volume, the 16-bit UCS-2 code points are used to
|
||||
determine the Order of Path Table Records and the Order of
|
||||
Directory Records. </p>
|
||||
<p>No attempt will be made to provide natural language sorting on
|
||||
the media. Natural language sorting may optionally be provided by
|
||||
a display application as desired. </p>
|
||||
<p><b>Justification Pad Bytes</b> </p>
|
||||
<p>The sort ordering algorithms as specified in ISO 9660:1988
|
||||
sections 6.9.1 and 9.3 are acceptable except for the value of the
|
||||
justification "pad byte". </p>
|
||||
<p>The value of the justification "pad byte" as
|
||||
specified in ISO 9660:1988 section 6.9.1 shall be (00). This is
|
||||
changed from a value of (20) as specified in that same section. </p>
|
||||
<p>The value of the justification "pad byte" as
|
||||
specified in ISO 9660:1988 section 9.3 subsections (a) and (b)
|
||||
shall be (00). This is changed from a value of (20) as specified
|
||||
in those same sections. </p>
|
||||
<p>The value of the justification "pad byte" as
|
||||
specified in ISO 9660:1988 section 9.3 subsection (c) shall be
|
||||
(00). This is changed from a value of (30) as specified in that
|
||||
same section. </p>
|
||||
<p>Simply put, set all the justification "pad bytes" to
|
||||
zero to simplify sorting. </p>
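<p>The effect of a (00) pad byte combined with Big Endian recording can be sketched in Python. This is a simplification for illustration: it compares whole identifiers, whereas ISO 9660 section 9.3 compares name, extension, and version as separate fields. The function names are assumptions of this example. </p>

```python
from functools import cmp_to_key

PAD = b"\x00"  # justification pad byte on a Unicode volume


def compare_identifiers(a: bytes, b: bytes) -> int:
    """Compare two recorded identifiers byte by byte after (00) padding."""
    width = max(len(a), len(b))
    left, right = a.ljust(width, PAD), b.ljust(width, PAD)
    return (left > right) - (left < right)


def sort_names(names):
    """Sort names in the order their Big Endian UCS-2 bytes would take."""
    encoded = [n.encode("utf-16-be") for n in names]
    ordered = sorted(encoded, key=cmp_to_key(compare_identifiers))
    return [raw.decode("utf-16-be") for raw in ordered]


print(sort_names(["b", "ab", "a"]))  # ['a', 'ab', 'b']
```

<p>Because the most significant byte comes first, this byte comparison agrees with the numeric order of the 16-bit code points. </p>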
|
||||
<p> <b>Mandatory Sort Ordering.</b> </p>
|
||||
<p>Correct sort ordering is mandatory on UCS-2 volumes. </p>
|
||||
<p><b>Descriptor Fields affected by the UCS-2 Escape Sequence</b> </p>
|
||||
<p>If a UCS-2 escape sequence is detected in a supplementary
|
||||
volume descriptor, the following descriptor fields referenced
|
||||
from that supplementary volume descriptor shall contain UCS-2
|
||||
characters. </p>
|
||||
<ul>
|
||||
<li>ISO 9660:1988 Section 8.5.4 System Identifier </li>
|
||||
<li>ISO 9660:1988 Section 8.5.5 Volume Identifier </li>
|
||||
<li>ISO 9660:1988 Section 8.5.13 Volume Set Identifier </li>
|
||||
<li>ISO 9660:1988 Section 8.5.14 Publisher Identifier </li>
|
||||
<li>ISO 9660:1988 Section 8.5.15 Data Preparer Identifier </li>
|
||||
<li>ISO 9660:1988 Section 8.5.16 Application Identifier </li>
|
||||
<li>ISO 9660:1988 Section 8.5.17 Copyright File Identifier </li>
|
||||
<li>ISO 9660:1988 Section 8.5.18 Abstract File Identifier
|
||||
(missing section) </li>
|
||||
<li>ISO 9660:1988 Section 8.5.19 Bibliographic File
|
||||
Identifier </li>
|
||||
<li>ISO 9660:1988 Section 9.1.11 File Identifier </li>
|
||||
<li>ISO 9660:1988 Section 9.4.5 Directory Identifier </li>
|
||||
<li>ISO 9660:1988 Section 9.5.11 System Identifier (of
|
||||
Extended Attribute Record) </li>
|
||||
</ul>
|
||||
<p><a name="relaxation"></a><a href="#contents">return to the
|
||||
table of contents</a> </p>
|
||||
<h3>Relaxation of ISO 9660 Restrictions on UCS-2 Volumes </h3>
|
||||
<p>Several ISO 9660 restrictions will be relaxed to achieve a
|
||||
more useful recording specification. Joliet receiving systems
|
||||
shall be capable of receiving media recorded with restrictions
|
||||
which have been relaxed relative to ISO 9660. </p>
|
||||
<p> <b>Maximum File Identifier Length Increased</b> </p>
|
||||
<p>Joliet receiving systems shall receive directory hierarchies
|
||||
recorded with file identifiers longer than those allowed by ISO
|
||||
9660 receiving systems. </p>
|
||||
<p>ISO 9660 (Section 7.5.1) states that the sum of the following
|
||||
shall not exceed 30: </p>
|
||||
<ul>
|
||||
<li>If there is a file name, the length of the file name. </li>
|
||||
<li>If there is a file name extension, the length of the file
|
||||
name extension. </li>
|
||||
</ul>
|
||||
<p>On Joliet compliant media, however, the sum as calculated
|
||||
above shall not exceed 128, to allow for longer file identifiers. </p>
|
||||
<p>The above lengths shall be expressed as a number of bytes. </p>
|
||||
<p><b>Maximum Directory Identifier Length Increased</b> </p>
|
||||
<p>Joliet receiving systems shall receive directory hierarchies
|
||||
recorded with directory identifiers longer than those allowed by ISO 9660
|
||||
receiving systems. </p>
|
||||
<p>ISO 9660 (Section 7.6.3) states that the length of a directory
|
||||
identifier shall not exceed 31. </p>
|
||||
<p>On Joliet compliant media, however, the length of a directory
|
||||
identifier shall not exceed 128, to allow for longer directory
|
||||
identifiers. </p>
|
||||
<p>The above lengths shall be expressed as a number of bytes. </p>
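<p>Since the limits are counted in bytes and every character occupies two bytes, 128 bytes corresponds to 64 characters. A hedged Python sketch of both checks follows; the function names are invented for this example. </p>

```python
MAX_IDENTIFIER_BYTES = 128  # Joliet limit, counted in bytes


def recorded_length(text: str) -> int:
    """Length in bytes of `text` when recorded as Big Endian UCS-2."""
    return len(text.encode("utf-16-be"))


def file_name_fits(name: str, extension: str) -> bool:
    """Check the sum of file name length and file name extension length."""
    return recorded_length(name) + recorded_length(extension) <= MAX_IDENTIFIER_BYTES


def directory_identifier_fits(identifier: str) -> bool:
    """Check the length of a directory identifier."""
    return recorded_length(identifier) <= MAX_IDENTIFIER_BYTES


print(directory_identifier_fits("a" * 64))  # True
print(directory_identifier_fits("a" * 65))  # False
```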
|
||||
<p> <b>Directory Names May Have File Name Extensions</b> </p>
|
||||
<p>ISO 9660 does not allow directory identifiers to contain file
|
||||
name extensions. </p>
|
||||
<p>On Joliet compliant media, however, directory identifiers may
|
||||
contain file name extensions. </p>
|
||||
<p>The Joliet directory identifier format shall be calculated
|
||||
according to ISO 9660 section 7.5.1 "File Identifier
|
||||
format", with the exception that the length of a directory
|
||||
identifier may exceed 31, but shall not exceed 128. </p>
|
||||
<p>In addition, the Joliet directory identifier format shall
|
||||
comply with ISO 9660 section 7.6.2 "Reserved Directory
|
||||
Identifiers". </p>
|
||||
<p>The directory identifier length shall be calculated according
|
||||
to ISO 9660 section 7.5.2 "File Identifier length". </p>
|
||||
<p>The above lengths shall be expressed as a number of bytes. </p>
|
||||
<p><b>Maximum Directory Hierarchy Depth May Exceed 8 Levels</b> </p>
|
||||
<p>ISO 9660 (Section 6.8.2.1) specifies restrictions regarding
|
||||
the Depth of Directory Hierarchy. This section of ISO 9660
|
||||
specifies that the number of levels in the hierarchy shall not
|
||||
exceed eight. </p>
|
||||
<p>On Joliet compliant media, however, the number of levels in
|
||||
the hierarchy may exceed eight. </p>
|
||||
<p>Joliet compliant media shall comply with the remainder of ISO
|
||||
9660 section 6.8.2.1, so that for each file recorded, the sum of
|
||||
the following shall not exceed 240: </p>
|
||||
<ul>
|
||||
<li>the length of the file identifier; </li>
|
||||
<li>the length of the directory identifiers of all relevant
|
||||
directories; </li>
|
||||
<li>the number of relevant directories. </li>
|
||||
</ul>
|
||||
<p>The above lengths shall be expressed as a number of bytes. </p>
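<p>The 240-byte rule can be illustrated with a short Python sketch. The function names are invented, and treating the "relevant directories" as every directory between the root and the file (root excluded) is an assumption of this example. </p>

```python
MAX_PATH_SUM = 240


def path_sum(directories, file_identifier: str) -> int:
    """Sum the three quantities listed above, with lengths in bytes."""
    def recorded(text: str) -> int:
        return len(text.encode("utf-16-be"))

    return (
        recorded(file_identifier)                  # length of the file identifier
        + sum(recorded(d) for d in directories)    # lengths of directory identifiers
        + len(directories)                         # number of relevant directories
    )


def path_fits(directories, file_identifier: str) -> bool:
    return path_sum(directories, file_identifier) <= MAX_PATH_SUM


print(path_sum(["docs", "notes"], "a.txt;1"))  # 34
```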
|
||||
<p><a name="extension"></a><a href="#contents">return to the
|
||||
table of contents</a> </p>
|
||||
<h2>Extensions to Joliet </h2>
|
||||
<h3><a name="multisession"></a>Joliet for Multisession Media </h3>
|
||||
<p><b>Multisession Recordings are Received</b> </p>
|
||||
<p>When provided with CD-ROM reader hardware with multisession
|
||||
capability, Joliet receiving systems shall receive media recorded
|
||||
using the multisession recording technique. </p>
|
||||
<p>The details of this technique are provided below. </p>
|
||||
<p><b>Logical Sector Addressing on Multisession Recordings</b> </p>
|
||||
<p>Each sector on the media is assigned a unique Logical Sector
|
||||
Address. </p>
|
||||
<p>Logical Sector Addresses zero and above increase linearly
|
||||
across the surface of the disc, regardless of session boundaries. </p>
|
||||
<p>Logical Sector Address zero references the sector with
|
||||
Minute:Second:Frame address 00:02:00 in the first session. All
|
||||
other Logical Sector Addresses are relative to
|
||||
Minute:Second:Frame address 00:02:00 in the first session. </p>
|
||||
<p>The conversion between Logical Sector Addresses and
|
||||
Minute:Second:Frame addresses is Logical Sector Address =
|
||||
(((Minute*60)+Second)*75) + Frame - 150. </p>
|
||||
<p>Simply put, the Logical Sector Address on a multisession disc
|
||||
describes a flat address space. </p>
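<p>The conversion can be sketched in Python as follows. The function names are invented for this illustration. The Frame term is included on the assumption that a full Minute:Second:Frame address is being converted, since 75 frames make up one second and address 00:02:00 must map to Logical Sector Address zero. </p>

```python
FRAMES_PER_SECOND = 75
OFFSET_FRAMES = 150  # 00:02:00 expressed in frames


def msf_to_lsa(minute: int, second: int, frame: int) -> int:
    """Convert a Minute:Second:Frame address to a Logical Sector Address."""
    return ((minute * 60 + second) * FRAMES_PER_SECOND + frame) - OFFSET_FRAMES


def lsa_to_msf(lsa: int):
    """Convert a Logical Sector Address back to (minute, second, frame)."""
    total = lsa + OFFSET_FRAMES
    minute, rest = divmod(total, 60 * FRAMES_PER_SECOND)
    second, frame = divmod(rest, FRAMES_PER_SECOND)
    return minute, second, frame


print(msf_to_lsa(0, 2, 0))   # 0
print(lsa_to_msf(16))        # (0, 2, 16)
```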
|
||||
<p> <b>Multisession Addressability</b> </p>
|
||||
<p>The data area for a volume may span multiple sessions. </p>
|
||||
<p>For example, if a disc is recorded with 3 sessions, the
|
||||
directory hierarchy described by a volume descriptor in session 3
|
||||
may reference logical sectors recorded in session 1, 2, or 3. </p>
|
||||
<p><b>Multisession Volume Recognition Sequence</b> </p>
|
||||
<p>The Volume Recognition Sequence shall begin at the 16th
|
||||
logical sector of the first track of the last session on the
|
||||
disc. </p>
|
||||
<p>This volume recognition sequence supersedes all other volume
|
||||
recognition sequences on the disc. The interpretation of the
|
||||
Volume Recognition Sequence is otherwise unchanged. </p>
|
||||
<p>For example, consider a disc that contains 3 sessions, where
|
||||
session 1 starts at 00:00:00, session 2 starts at 10:00:00, and
|
||||
session 3 starts at 20:00:00. The Volume Recognition Sequence for
|
||||
this disc would start at Minute:Second:Frame address 20:00:16. </p>
|
||||
<p>This technique is compatible with the CD-Bridge multisession
|
||||
technique. </p>
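<p>The three-session example can be reproduced with a small Python sketch. The function name and the representation of session starts as (minute, second, frame) tuples are assumptions of this illustration; how the session start addresses are obtained from the drive is outside its scope. </p>

```python
FRAMES_PER_SECOND = 75
VRS_OFFSET = 16  # the sequence begins at the 16th sector of the session


def vrs_start(session_starts):
    """Return the Minute:Second:Frame address where the sequence begins.

    `session_starts` lists the start address of the first track of each
    session as (minute, second, frame) tuples; the last session wins.
    """
    minute, second, frame = max(session_starts)
    total = (minute * 60 + second) * FRAMES_PER_SECOND + frame + VRS_OFFSET
    minute, rest = divmod(total, 60 * FRAMES_PER_SECOND)
    second, frame = divmod(rest, FRAMES_PER_SECOND)
    return minute, second, frame


print(vrs_start([(0, 0, 0), (10, 0, 0), (20, 0, 0)]))  # (20, 0, 16)
```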
|
||||
<p><b>Track Modes and Sector Forms</b> </p>
|
||||
<p>The data area for a Joliet volume on a CD-ROM shall be
|
||||
comprised of either Mode 1 or Mode 2 Form 1 sectors. CD-ROM media
|
||||
utilizing the multisession recording techniques outlined above
|
||||
may not contain any Mode 1 sectors anywhere on the media. Mode 1
|
||||
sectors are allowed only on single-session media. </p>
|
||||
<p>Mode 2 Form 2 sectors and CD-Digital Audio tracks may be
|
||||
recorded on the same media as a Joliet volume. In this case, the
|
||||
CD-XA extensions to Joliet may be utilized to identify Mode 2
|
||||
Form 2 extents and CD-Digital Audio extents. </p>
|
||||
<p>CD-Digital Audio tracks may not be recorded in sessions 2 and
|
||||
higher. If any CD-Digital Audio tracks are recorded, all the
|
||||
CD-Digital Audio tracks shall be recorded in the first session. </p>
|
||||
<h3><a name="_Toc305607052"></a><a name="cdxa"></a>CD-XA
|
||||
Extensions to Joliet </h3>
|
||||
<p>CD-ROM discs utilizing the Joliet extensions to ISO 9660 and
|
||||
which also identify mode 2 form 2 extents or CD-Digital Audio
|
||||
extents shall be marked with a CD-ROM XA Label as specified in
|
||||
"System Description CD-XA" section 2.1. </p>
|
||||
<p>The CD-ROM XA Label shall be located at offset 1024 (byte
|
||||
position 1025) in the Joliet Supplementary Volume Descriptor. The
|
||||
identifying signature 'CD-XA001' shall be recorded starting at
|
||||
offset 1024 in the Joliet Supplementary Volume Descriptor. This
|
||||
identifying signature is equivalent to the hex bytes
|
||||
(43)(44)(2D)(58)(41)(30)(30)(31). </p>
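<p>Checking for the label is a fixed-offset comparison. The Python sketch below is illustrative only; the function name is invented, and the descriptor is assumed to be the full 2048-byte sector. </p>

```python
XA_LABEL_OFFSET = 1024
XA_SIGNATURE = bytes.fromhex("43442D5841303031")  # 'CD-XA001'


def has_xa_label(descriptor: bytes) -> bool:
    """Report whether the CD-ROM XA Label is present in the descriptor."""
    end = XA_LABEL_OFFSET + len(XA_SIGNATURE)
    return descriptor[XA_LABEL_OFFSET:end] == XA_SIGNATURE


print(XA_SIGNATURE)  # b'CD-XA001'
```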
|
||||
<p>Mode 2 form 2 extents shall be identified using recording
|
||||
rules outlined in "System Description CD-XA", section
|
||||
2.7. In this case, bit 12 of the Attributes field of the "XA
|
||||
System Use Information" shall be set to one to identify that
|
||||
the file contains mode 2 form 2 sectors. See below for additional
|
||||
information regarding Data Length. </p>
|
||||
<p>CD-Digital Audio extents shall be identified using recording
|
||||
rules outlined in "System Description CD-XA", section
|
||||
2.7. In this case, bit 14 of the Attributes field of the "XA
|
||||
System Use Information" shall be set to one to identify that
|
||||
the file is comprised of an extent of CD-Digital Audio. See below
|
||||
for additional information regarding Data Length. </p>
|
||||
<p>If a file is marked such that either bit 12 is set to one or
|
||||
bit 14 is set to one in the Attributes field of the "XA
|
||||
System Use Information", then the Data Length field of the
|
||||
Directory Record shall be set to 2048 times the number of sectors
|
||||
contained in the extent. </p>
|
||||
<p>See ISO 9660:1988 section 9.1.4. </p>
|
||||
<h3><a name="_Toc305607053"></a><a name="other"></a>Other
|
||||
Extensions to Joliet </h3>
|
||||
<p>The Joliet Extensions to ISO 9660 are designed to coexist with
|
||||
other extensions such as the "System Use Sharing
|
||||
Protocol" and "RockRidge Interchange Protocol".
|
||||
However, these protocols are not an integral part of the Joliet
|
||||
specification. </p>
|
||||
<p>The method used to integrate these other protocols into Joliet
|
||||
is not defined here. </p>
|
||||
<p><a name="bibliography"></a><a href="#contents">return to the
|
||||
table of contents</a> </p>
|
||||
<h2>Bibliography </h2>
|
||||
<p><u>ISO 2022 - <i>Information processing </i>- ISO 7-bit and
|
||||
8-bit coded character sets - Code extension techniques</u>,
|
||||
International Organization for Standardization, </p>
|
||||
<p><u>ISO 9660 - <i>Information processing </i>- Volume and file
|
||||
structure of CD-ROM for information interchange</u>,
|
||||
International Organization for Standardization, 1988-04-15 </p>
|
||||
<p><u>ISO 10149 : 1989 (E) - <i>Information technology</i> - Data
|
||||
interchange on read-only 120mm optical data discs (CD-ROM)
|
||||
"YellowBook", </u>International Organization for
|
||||
Standardization, 1989-09-01 </p>
|
||||
<p><u>ISO 10646 - Information technology - Universal
|
||||
Multiple-Octet Coded Character Sets (UCS)</u>, International
|
||||
Organization for Standardization, </p>
|
||||
<p><u>The Unicode Standard - <i>Worldwide Character Encoding </i>Version
|
||||
1.0,</u> The Unicode Consortium, Addison-Wesley Publishing
|
||||
Company, Inc, 1990-1991 Unicode, Inc., Volume 1 </p>
|
||||
<p><u>Orangebook</u>, N. V. Philips and Sony Corporation,
|
||||
November 1990 </p>
|
||||
<p><u>System Description CD-XA, </u>N. V. Philips and Sony
|
||||
Corporation, March 1991 </p>
|
||||
<p><u>System Use Sharing Protocol</u> </p>
|
||||
<p><u>RockRidge Interchange Protocol</u> </p>
|
||||
<p>
|
||||
<hr>
|
||||
<p><b>Copyright © 1995 Microsoft Corporation unless
|
||||
otherwise specified. All Rights Reserved.<br>
|
||||
</b> </center> </p>
|
||||
</body>
|
||||
</html>
|
||||
140
study/sabre/os/files/FileSystems/LF1.txt
Normal file
@@ -0,0 +1,140 @@
|
||||
Date: Sat, 29 Jun 1996 18:59:41 -0500
|
||||
From: Robert Vandervelde <RVand@SNOWHILL.COM>
|
||||
Subject: Re: Long Filename Structure (Windows 95).
|
||||
|
||||
>Does anybody know how Windows 95 implemented long filenames?
|
||||
>I used Norton Disk Editor to read the directory and found some encrypted
|
||||
>form of the long filenames.
|
||||
>I plan to write a utility that changes the long filenames only.
|
||||
>
|
||||
>Thank you,
|
||||
>;-b Sintar Wirawan at Menur 30 Surabaya - Indonesia
|
||||
>8-@ squid@sby.mega.net.id
|
||||
>
|
||||
>
|
||||
How Windows 95 Stores Long Filenames
|
||||
|
||||
Copyright notice: Taken from PC Magazine, June 25, 1996 by Jeff Prosise
|
||||
|
"Windows 95 stores short filenames the same way DOS and 16-bit Windows do.
|
||||
Every file on every disk is accompanied by a 32-byte directory entry that
|
||||
records the name of the file as well as the file's attributes, a date and
|
||||
time stamp, and other information."
|
||||
|
||||
The format of the short directory entry is as follows:
|
||||
|
||||
Offset  Description         Size
  0     Filename            8 bytes (ASCII)
  8     Filename extension  3 bytes (ASCII)
 11     File attributes     1 byte (encoded)
 12     reserved            10 bytes
 22     Time stamp          2 bytes (encoded)
 24     Date stamp          2 bytes (encoded)
 26     Starting cluster    2 bytes
 28     File size           4 bytes
|
||||
|
||||
File attributes byte
|
||||
7: reserved 3: Volume label
|
||||
6: reserved 2: System
|
||||
5: archive 1: Hidden
|
||||
4: subdirectory 0: Read-only
|
||||
|
||||
|
||||
Time stamp (16-bit word)
11-15: Hours (0-23)
 5-10: Minutes (0-59)
 0- 4: Seconds divided by 2 (0-29)

Date stamp (16-bit word)
 9-15: Year (relative to 1980)
 5- 8: Month (1-12)
 0- 4: Day of month (1-31)
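
The two encoded stamps can be unpacked with a few shifts and masks. The
following is only a sketch: the struct and function names are made up for
illustration, and it assumes the standard FAT layout in which the year
occupies bits 9-15 of the date word and both words are stored little-endian
on disk (the caller passes them already assembled into 16-bit values).

```c
#include <stdint.h>

/* Hypothetical container for a decoded FAT time/date stamp. */
struct fat_datetime {
    int year, month, day;       /* calendar date */
    int hour, minute, second;   /* time of day, 2-second resolution */
};

/* Unpack the 16-bit time and date words of a short directory entry. */
struct fat_datetime fat_decode_stamp(uint16_t time, uint16_t date)
{
    struct fat_datetime dt;

    dt.hour   = (time >> 11) & 0x1F;          /* bits 11-15 */
    dt.minute = (time >> 5)  & 0x3F;          /* bits 5-10  */
    dt.second = (time & 0x1F) * 2;            /* bits 0-4, stored halved */

    dt.year   = ((date >> 9) & 0x7F) + 1980;  /* bits 9-15, relative to 1980 */
    dt.month  = (date >> 5) & 0x0F;           /* bits 5-8   */
    dt.day    = date & 0x1F;                  /* bits 0-4   */
    return dt;
}
```

For example, a time word of 9DB9h and a date word of 251Fh decode to
19:45:50 on 1998-08-31.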
|
||||
|
||||
"Because of compatibility issues, adding long filename support to an
|
||||
operating system that uses 8.3 filenames isn't as simple as expanding directory
|
||||
entries to hold more than 11 characters. ...
|
||||
|
||||
Windows 95's designers devised a clever solution to the problem of
|
||||
supporting long filenames while preserving compatibility with previous
|
||||
versions of DOS and Windows applications. ... Through testing, Microsoft
|
||||
found that if a directory entry is marked with an "impossible" combination
|
||||
of read-only, hidden, system, and volume label attribute bits - that is,
|
||||
if the directory entry's attribute byte holds the value 0Fh - the
|
||||
enumeration functions built into all existing versions of DOS and pre-95
|
||||
versions of Windows will skip over that directory entry as if it weren't
|
||||
there.
|
||||
|
||||
The solution for Windows 95, then, was to store two names for every file and
|
||||
subdirectory: a short name that's visible to all applications and a long
|
||||
name that's visible only to Windows 95 applications...Short filenames are
|
||||
stored in 8.3 format in conventional 32-byte directory entries. Windows
|
||||
creates a short filename from a long one by truncating it to six uppercase
|
||||
characters and adding "~1" to the end of the base filename. If there's
|
||||
already another filename with the same first six characters, the number is
|
||||
incremented. The extension is kept the same, and any character that was
|
||||
illegal in earlier versions of Windows and DOS is replaced with an
|
||||
underscore.
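
The short-name rule described above can be sketched roughly as follows.
This is not Windows 95's actual code. The function names are hypothetical,
the set of characters treated as illegal (+ , ; = [ ]) is an assumption, so
is dropping spaces and extra dots, and the sketch only handles sequence
numbers 1 to 9. Checking the directory for an existing name with the same
first six characters is left to the caller, who would retry with the next
number.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumption: these long-name characters are not allowed in 8.3 names. */
static int illegal_in_83(char c)
{
    return c != '\0' && strchr("+,;=[]", c) != NULL;
}

/* Copy at most `max` characters of src[0..len) into dst: uppercase them,
 * skip spaces and dots (assumption), replace illegal characters with '_'.
 * Returns the number of characters written (dst is not terminated). */
static size_t squeeze(const char *src, size_t len, char *dst, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < len && n < max; i++) {
        char c = src[i];
        if (c == ' ' || c == '.')
            continue;
        dst[n++] = illegal_in_83(c) ? '_' : (char)toupper((unsigned char)c);
    }
    return n;
}

/* Build "BASENA~n.EXT" from a long name; seq must be 1..9. */
void make_alias(const char *longname, int seq, char out[13])
{
    const char *dot = strrchr(longname, '.');   /* extension starts here */
    size_t base_len = dot ? (size_t)(dot - longname) : strlen(longname);
    char base[7], ext[4];

    size_t nb = squeeze(longname, base_len, base, 6);
    size_t ne = dot ? squeeze(dot + 1, strlen(dot + 1), ext, 3) : 0;
    base[nb] = '\0';
    ext[ne] = '\0';

    if (ne > 0)
        snprintf(out, 13, "%s~%d.%s", base, seq, ext);
    else
        snprintf(out, 13, "%s~%d", base, seq);
}
```

With these assumptions, "My Big File.Extension which is long" becomes
MYBIGF~1.EXT for sequence number 1.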
|
||||
|
||||
|
||||
Long filenames are stored in specially formatted 32-byte long filename (LFN)
|
||||
directory entries marked with attribute bytes set to 0Fh. For a given
|
||||
file or subdirectory, a group of one or more LFN directory entries
|
||||
immediately precedes the single 8.3 directory entry on the disk. Each LFN
|
||||
directory entry contains up to 13 characters of the long filename, and the
|
||||
OS strings together as many as needed to comprise an entire long filename.
|
||||
|
||||
Filenames are stored in Unicode format, which requires 2 bytes per character
|
||||
as opposed to ASCII's 1 byte. Filename characters are spread among three
|
||||
separate fields: the first 10 bytes (five characters) in length, the second
|
||||
12 bytes (6 characters), and the third 4 bytes (two characters). The lowest
|
||||
five bits of the directory entry's first byte hold a sequence number that
|
||||
identifies the directory entry's position relative to other LFN directory
|
||||
entries associated with the same file. If a long filename requires three
|
||||
LFN directory entries, for example, the sequence number of the first will
|
||||
be 1, that of the second will be 2, and the sequence of the third will be
|
||||
3. Bit 6 of the third entry's first byte is set to 1 to indicate that it's
|
||||
the last entry in the sequence.
|
||||
|
||||
The attribute field appears at the same location in LFN directory entries
|
||||
as in 8.3 directory entries. ... The starting cluster number field also
|
||||
appears at the same location, but in LFN directory entries its value is
|
||||
always 0. The type indicator field also holds 0 in every long filename I've
|
||||
examined, but Adrian King's Inside Windows 95 (Microsoft Press, 1994) says
|
||||
it can also hold a nonzero value indicating that the directory entry
|
||||
contains "class information" for the corresponding file. ... The LFN
|
||||
directory entry's checksum byte holds an eight-bit checksum value computed
|
||||
by adding certain fields of the 8.3 directory entry and performing a
|
||||
modulo 256 operation on the result. Windows 95 uses this checksum to detect
|
||||
orphaned or corrupted LFN directory entries.
|
||||
|
||||
|
||||
Long filename directory entry
|
||||
|
||||
OFFSET  DESCRIPTION                   Size
  0     Sequence byte                 1 byte
  1     First five characters of LFN  10 bytes
 11     File attributes               1 byte
 12     Type indicator                1 byte (always 0??)
 13     Checksum                      1 byte
 14     Next six characters of LFN    12 bytes
 26     Starting cluster number       2 bytes (always 0)
 28     Next two characters of LFN    4 bytes
|
||||
|
||||
*NOTE: The above structure may span up to 31 entries. The last entry will
|
||||
be a standard 8.3 filename directory structure.
|
||||
|
||||
Sequence byte
|
||||
7: apparently unused (always 0)
|
||||
6: 1=final component of this LFN
|
||||
5: apparently unused (always 0)
|
||||
0-4: sequence number (1-31)
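
A reader could pick one LFN directory entry apart roughly like this. It is
a sketch only: all names are invented for illustration, the entry is taken
as a raw 32-byte buffer, characters are read as little-endian 16-bit values,
and it assumes that a name shorter than 13 characters is ended by 0000h
with the remaining positions filled with FFFFh.

```c
#include <stddef.h>
#include <stdint.h>

#define LFN_ATTR      0x0F   /* attribute byte that marks an LFN entry */
#define LFN_LAST_FLAG 0x40   /* bit 6 of the sequence byte */
#define LFN_SEQ_MASK  0x1F   /* bits 0-4 of the sequence byte */

/* Byte offsets of the 13 two-byte characters inside the 32-byte entry. */
static const unsigned char lfn_char_offset[13] = {
    1, 3, 5, 7, 9,              /* first field:  5 characters */
    14, 16, 18, 20, 22, 24,     /* second field: 6 characters */
    28, 30                      /* third field:  2 characters */
};

int lfn_is_entry(const unsigned char e[32]) { return e[11] == LFN_ATTR; }
int lfn_sequence(const unsigned char e[32]) { return e[0] & LFN_SEQ_MASK; }
int lfn_is_last(const unsigned char e[32])  { return (e[0] & LFN_LAST_FLAG) != 0; }

/* Copy the characters of one entry into out[]; returns how many were
 * found before the 0000h terminator or FFFFh padding (at most 13). */
size_t lfn_chars(const unsigned char e[32], uint16_t out[13])
{
    size_t n = 0;
    for (size_t i = 0; i < 13; i++) {
        unsigned off = lfn_char_offset[i];
        uint16_t ch = (uint16_t)(e[off] | (e[off + 1] << 8));
        if (ch == 0x0000 || ch == 0xFFFF)
            break;
        out[n++] = ch;
    }
    return n;
}
```

To rebuild the whole name, the pieces would be placed by sequence number:
the characters of entry k start at position 13 * (k - 1) of the long name.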
|
||||
|
||||
|
||||
|
||||
--------------------------------------------------------
|
||||
Robert Vandervelde + ...that what we have learned and
|
||||
RVand@snowhill.com + truly understood, we discovered
|
||||
Enterprise, AL + ourselves.
|
||||
The Wiregrass + - Richard C. Dorf
|
||||
--------------------------------------------------------
|
||||
|
||||
41
study/sabre/os/files/FileSystems/LF2.txt
Normal file
@@ -0,0 +1,41 @@
|
||||
From: noesis@ucscb.UCSC.EDU (Kyle Anthony York)
|
||||
Newsgroups: comp.os.msdos.programmer
|
||||
Subject: Re: Win95 FAT long file name storage?
|
||||
|
||||
ok, here goes:
|
||||
|
||||
long file names are stored as follows, i've been using direct sector
|
||||
access, so i don't know if findfirst(..) findnext(..) will work.
|
||||
|
||||
the long names are stored as unicode strings in the immediately preceding
|
||||
entries. the entry attribute byte is 0x0f.
|
||||
|
||||
the long name format is also used whenever a filename has lower case
|
||||
characters, thus preserving case and backwards compatibility.
|
||||
|
||||
so...if the name is ``abcdefghijklmnop'' and this is the first entry of a
|
||||
subdirectory:
|
||||
entry[0] = "."
|
||||
entry[1] = ".."
|
||||
entry[2] = "nop", attribute = 0x0f
entry[3] = "abcdefghijklm", attribute = 0x0f
|
||||
entry[4] = "ABCDEF~1", attribute = normal
|
||||
|
||||
in addition to having the attribute 0x0f, the entry format is:
|
||||
BYTE 0 --- bit 7 = 1 if deleted, 0 if not
|
||||
6 = 1 if last block of extended entry, 0 if not
|
||||
5..0 = extended entry # (1..31)
|
||||
BYTE 1..10 --- first 5 characters in unicode ("abcde" becomes
"a", 0, "b", 0, "c", 0, "d", 0, "e", 0)
|
||||
BYTE 11 --- attribute (0x0f)
|
||||
BYTE 12 --- ?? unknown ??
|
||||
BYTE 13 --- ?? unknown ??
|
||||
BYTE 14..25 -- next 6 characters (in unicode)
|
||||
BYTE 26..27 --- 0x0000 (first cluster #)
|
||||
BYTE 28..31 --- last 2 characters (in unicode)
|
||||
|
||||
unused bytes are set to 0xff
|
||||
end of string is denoted by 0x00, 0x00
|
||||
|
||||
best o' luck
|
||||
--kyle
|
||||
107
study/sabre/os/files/FileSystems/LF3.txt
Normal file
@@ -0,0 +1,107 @@
|
||||
LONG FILENAMES
|
||||
How does Windows 95 store LONG FILENAMES?
|
||||
|
||||
This file was worked out by Jozsef Hidasi
|
||||
Hidasi.Jozsef@MTTBBS.hu
|
--- [ Contact Info > ] ---------------------------------------------------------
|
||||
If you notice any mistakes, please contact me and let me know, so I can correct them!
Thanks to everybody who helps to keep this dox up to date!
|
||||
|
||||
Don't hesitate to contact me!
|
||||
|
||||
Jozsef Hidasi
|
||||
E-Mail: Hidasi.Jozsef@MTTBBS.hu
|
||||
FIDO: 2:371/4.13 (At the moment this is my BBS :-) You can write to SysOp?!
|
||||
|
--- [ WARNING! > ] -------------------------------------------------------------
This text contains all the info I know at the moment! I'm not responsible for
any DATA LOSS!
|
||||
"???" Means I don't know what that field means...
|
||||
|
--- [ What is this document about? > ] -----------------------------------------
This document contains some info on how Windows 95 stores long filenames.
I don't know how long a filename Windows can handle, but as I calculated
a file entry can be 832 bytes long. (See below)
|
||||
|
||||
Windows uses a simple method to hide a file from DOS: it changes the "file"'s
attribute to VolumeLabel. Basically a disk can have only one VolumeLabel, and
this attribute is not used by any other file! In this way we can tell the
difference between a DOS File Record (I won't describe it now) and a Windows Record.
Both the DOS File Record and the Windows Record are 32 bytes long. (The DOS File Record is the
main file descriptor: date/time/attrib/etc...)
|
||||
|
||||
Windows Record>
|
||||
OFFSET Count Type Description Remark
|
||||
------------------------------------------------------------------
|
||||
0000h 1 byte Counter -
|
||||
0001h 10 char FileName E1 Entry 1
|
||||
000Bh 1 byte Attrib Always 0Fh
|
||||
000Ch 2 word ??? 0
|
||||
000Eh 12 char FileName E2 Entry 2
|
||||
001Ah 2 word ??? 0
|
||||
001Ch 4 char FileName E3 Entry 3
|
||||
|
||||
Counter:
|
||||
If attrib=0Fh and the counter>64 then Windows Entries will follow:
|
||||
Entry no.: Counter-'@'
|
||||
|
||||
Filename: The FileName is cut in 3 parts... Because of DOS compatibility...
|
||||
???: Reserved or don't know...
|
||||
|
||||
|
||||
Simple Example:
|
||||
Sector 19 ; Don't laugh! This is a simple floppy :-)
|
||||
This is a simple DOS filenamed file>
|
||||
00000000: 53 49 4D 50 4C 45 20 20 - 44 4F 53 20 00 03 B8 9D  SIMPLE  DOS ....
00000010: 1F 25 1F 25 00 00 B9 9D - 1F 25 00 00 00 00 00 00  .%.%.....%......
|
||||
This is the first entry of the new Long filenamed file>
|
||||
(I've created this first and renamed by Windows)
|
||||
(This file is erased because the filename's first byte is 0E5h)
00000020: E5 49 4D 50 4C 45 20 20 - 57 49 4E 20 00 2A C6 9D  .IMPLE  WIN .*..
00000030: 1F 25 1F 25 00 00 C7 9D - 1F 25 00 00 00 00 00 00  .%.%.....%......
|
||||
|
||||
This is the first windows entry.
|
||||
The Filename's first byte (Counter) is 'C' (43h), so this long filename uses 3 Windows entries (Counter-'@' = 3), followed by the DOS entry ...
|
||||
(One entry can hold 13 characters of the Long Filename...)
|
||||
This entry holds "m e d F i l e "=Filename E1+Filename E2+Filename E3
|
||||
(See below)
|
||||
This means that the first entry holds the last characters of the long
|
||||
filename...
|
||||
00000040: 43 6D 00 65 00 64 00 20 - 00 46 00 0F 00 44 69 00  Cm.e.d. .F...Di.
00000050: 6C 00 65 00 00 00 FF FF - FF FF 00 00 FF FF FF FF  l.e.............
|
||||
|
||||
Here is the next entry>
|
||||
Counter=2 means this is the 2nd entry of 3...
|
||||
Holds: " a L o n g F i l e n a"
|
||||
00000060: 02 20 00 61 00 20 00 4C - 00 6F 00 0F 00 44 6E 00  . .a. .L.o...Dn.
00000070: 67 00 46 00 69 00 6C 00 - 65 00 00 00 6E 00 61 00  g.F.i.l.e...n.a.
|
||||
|
||||
Here is the next entry>
|
||||
Counter=1 means this is the 1st entry of 3...
|
||||
Holds: "S i m p l e . w i n i s "
|
||||
00000080: 01 53 00 69 00 6D 00 70 - 00 6C 00 0F 00 44 65 00  .S.i.m.p.l...De.
00000090: 2E 00 77 00 69 00 6E 00 - 20 00 00 00 69 00 73 00  ..w.i.n. ...i.s.
|
||||
|
||||
Here is the next entry>
|
||||
This entry has no Counter: it is the ordinary DOS entry that follows the 3 Windows entries...
This entry holds the file's general info like date/time/attrib/length/etc...
|
||||
and DOS filename...
|
||||
000000A0: 53 49 4D 50 4C 45 7E 31 - 57 49 4E 20 00 2A C6 9D  SIMPLE~1WIN .*..
000000B0: 1F 25 1F 25 00 00 C7 9D - 1F 25 00 00 00 00 00 00  .%.%.....%......
|
||||
--------------------------
|
||||
Summary>
|
||||
DOS Filename: simple~1.win
|
||||
Long Filename: Simple.win is a LongFilenamed file...
|
||||
|
||||
* Every character is followed by a 0 byte, I don't know why!
|
||||
|
||||
The followings are empty:
|
||||
000000C0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
|
||||
000000D0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
|
||||
|
||||
Well, that's all I can write now, but it's quite hard to explain how
|
||||
this works! Write me a letter instead... :-)
|
||||
|
--- [ End of document > ] ------------------------------------------------------
|
||||
Best wishes,
|
||||
Hidi...
|
||||
BIN
study/sabre/os/files/FileSystems/LongFileName.pdf
Normal file
392
study/sabre/os/files/FileSystems/VFATInfo.txt
Normal file
@@ -0,0 +1,392 @@
|
||||
NOTES ON THE STRUCTURE OF THE VFAT FILESYSTEM
|
||||
----------------------------------------------------------------------
|
||||
(This documentation was provided by Galen C. Hunt <gchunt@cs.rochester.edu>
|
||||
and lightly annotated by Gordon Chaffee).
|
||||
|
||||
This document presents a very rough, technical overview of my
|
||||
knowledge of the extended FAT file system used in Windows NT 3.5 and
|
||||
Windows 95. I don't guarantee that any of the following is correct,
|
||||
but it appears to be so.
|
||||
|
||||
The extended FAT file system is almost identical to the FAT
|
||||
file system used in DOS versions up to and including 6.223410239847
|
||||
:-). The significant change has been the addition of long file names.
|
||||
These names support up to 255 characters including spaces and lower
|
||||
case characters as opposed to the traditional 8.3 short names.
|
||||
|
||||
Here is the description of the traditional FAT entry in the current
|
||||
Windows 95 filesystem:
|
||||
|
||||
struct directory { // Short 8.3 names
        unsigned char name[8];          // file name
        unsigned char ext[3];           // file extension
        unsigned char attr;             // attribute byte
        unsigned char lcase;            // Case for base and extension
        unsigned char ctime_ms;         // Creation time, milliseconds
        unsigned char ctime[2];         // Creation time
        unsigned char cdate[2];         // Creation date
        unsigned char adate[2];         // Last access date
        unsigned char reserved[2];      // reserved values (ignored)
        unsigned char time[2];          // time stamp
        unsigned char date[2];          // date stamp
        unsigned char start[2];         // starting cluster number
        unsigned char size[4];          // size of the file
};
|
||||
|
||||
The lcase field specifies if the base and/or the extension of an 8.3
|
||||
name should be capitalized. This field does not seem to be used by
|
||||
Windows 95 but it is used by Windows NT. The case of filenames is not
|
||||
completely compatible from Windows NT to Windows 95. It is not completely
|
||||
compatible in the reverse direction, however. Filenames that fit in
|
||||
the 8.3 namespace and are written on Windows NT to be lowercase will
|
||||
show up as uppercase on Windows 95.
|
||||
|
||||
Note that the "start" and "size" values are actually little
|
||||
endian integer values. The descriptions of the fields in this
|
||||
structure are public knowledge and can be found elsewhere.
|
||||
|
||||
With the extended FAT system, Microsoft has inserted extra
|
||||
directory entries for any files with extended names. (Any name which
|
||||
legally fits within the old 8.3 encoding scheme does not have extra
|
||||
entries.) I call these extra entries slots. Basically, a slot is a
|
||||
specially formatted directory entry which holds up to 13 characters of
|
||||
a file's extended name. Think of slots as additional labeling for the
|
||||
directory entry of the file to which they correspond. Microsoft
|
||||
prefers to refer to the 8.3 entry for a file as its alias and the
|
||||
extended slot directory entries as the file name.
|
||||
|
||||
The C structure for a slot directory entry follows:
|
||||
|
||||
struct slot { // Up to 13 characters of a long name
        unsigned char id;               // sequence number for slot
        unsigned char name0_4[10];      // first 5 characters in name
        unsigned char attr;             // attribute byte
        unsigned char reserved;         // always 0
        unsigned char alias_checksum;   // checksum for 8.3 alias
        unsigned char name5_10[12];     // 6 more characters in name
        unsigned char start[2];         // starting cluster number
        unsigned char name11_12[4];     // last 2 characters in name
};
|
||||
|
||||
If the layout of the slots looks a little odd, it's only
|
||||
because of Microsoft's efforts to maintain compatibility with old
|
||||
software. The slots must be disguised to prevent old software from
|
||||
panicking. To this end, a number of measures are taken:
|
||||
|
||||
1) The attribute byte for a slot directory entry is always set
|
||||
to 0x0f. This corresponds to an old directory entry with
|
||||
attributes of "hidden", "system", "read-only", and "volume
|
||||
label". Most old software will ignore any directory
|
||||
entries with the "volume label" bit set. Real volume label
|
||||
entries don't have the other three bits set.
|
||||
|
||||
2) The starting cluster is always set to 0, an impossible
|
||||
value for a DOS file.
|
||||
|
||||
Because the extended FAT system is backward compatible, it is
|
||||
possible for old software to modify directory entries. Measures must
|
||||
be taken to ensure the validity of slots. An extended FAT system can
|
||||
verify that a slot does in fact belong to an 8.3 directory entry by
|
||||
the following:
|
||||
|
||||
1) Positioning. Slots for a file always immediately precede
|
||||
their corresponding 8.3 directory entry. In addition, each
|
||||
slot has an id which marks its order in the extended file
|
||||
name. Here is a very abbreviated view of an 8.3 directory
|
||||
entry and its corresponding long name slots for the file
|
||||
"My Big File.Extension which is long":
|
||||
|
<preceding files...>
|
||||
<slot #3, id = 0x43, characters = "h is long">
|
||||
<slot #2, id = 0x02, characters = "xtension whic">
|
||||
<slot #1, id = 0x01, characters = "My Big File.E">
|
||||
<directory entry, name = "MYBIGFIL.EXT">
|
||||
|
||||
Note that the slots are stored from last to first. Slots
|
||||
are numbered from 1 to N. The Nth slot is or'ed with 0x40
|
||||
to mark it as the last one.
|
||||
|
||||
2) Checksum. Each slot has an "alias_checksum" value. The
|
||||
checksum is calculated from the 8.3 name using the
|
||||
following algorithm:
|
||||
|
||||
                for (sum = i = 0; i < 11; i++) {
                        sum = (((sum&1)<<7)|((sum&0xfe)>>1)) + name[i];
                }
|
||||
|
||||
3) If there is free space in the final slot, a Unicode NULL (0x0000) is stored
|
||||
after the final character. After that, all unused characters in
|
||||
the final slot are set to Unicode 0xFFFF.
|
||||
|
||||
Finally, note that the extended name is stored in Unicode. Each Unicode
|
||||
character takes two bytes.
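
The checksum loop from item 2 above can be written out as a complete
function. This is a sketch, not Microsoft's code: the function name is
hypothetical, and it assumes the input is the 11 raw bytes of the alias as
stored on disk (8 name bytes followed by 3 extension bytes, space padded,
without the dot). Using an 8-bit type for the sum keeps the result
modulo 256.

```c
#include <stdint.h>

/* Rotate-right-and-add checksum over the 11 bytes of the 8.3 alias. */
uint8_t vfat_alias_checksum(const unsigned char name[11])
{
    uint8_t sum = 0;

    for (int i = 0; i < 11; i++) {
        /* Rotate sum right by one bit, then add the next name byte;
         * the cast to uint8_t discards any carry out of bit 7. */
        sum = (uint8_t)((((sum & 1) << 7) | ((sum & 0xFE) >> 1)) + name[i]);
    }
    return sum;
}
```

For the alias SIMPLE~1.WIN (stored as the bytes "SIMPLE~1WIN") this yields
0x44. A slot whose alias_checksum does not match the value computed from
the 8.3 entry that follows it would be treated as orphaned.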
|
||||
|
||||
|
||||
NOTES ON UNICODE TRANSLATION IN VFAT FILESYSTEM
|
||||
----------------------------------------------------------------------
|
||||
(Information provided by Steve Searle <steve@mgu.bath.ac.uk>)
|
||||
|
||||
Char used as Char(s) used Char(s) used in Entries which have
|
||||
filename in shortname longname slot been corrected
|
||||
0x80 (128) 0x80 0xC7
|
||||
0x81 (129) 0x9A 0xFC
|
||||
0x82 (130) 0x90 0xE9 E
|
||||
0x83 (131) 0xB6 0xE2 E
|
||||
0x84 (132) 0x8E 0xE4 E
|
||||
0x85 (133) 0xB7 0xE0 E
|
||||
0x86 (134) 0x8F 0xE5 E
|
||||
0x87 (135) 0x80 0xE7 E
|
||||
0x88 (136) 0xD2 0xEA E
|
||||
0x89 (137) 0xD3 0xEB E
|
||||
0x8A (138) 0xD4 0xE8 E
|
||||
0x8B (139) 0xD8 0xEF E
|
||||
0x8C (140) 0xD7 0xEE E
|
||||
0x8D (141) 0xDE 0xEC E
|
||||
0x8E (142) 0x8E 0xC4 E
|
||||
0x8F (143) 0x8F 0xC5 E
|
||||
0x90 (144) 0x90 0xC9 E
|
||||
0x91 (145) 0x92 0xE6 E
|
||||
0x92 (146) 0x92 0xC6 E
|
||||
0x93 (147) 0xE2 0xF4 E
|
||||
0x94 (148) 0x99 0xF6
|
||||
0x95 (149) 0xE3 0xF2
|
||||
0x96 (150) 0xEA 0xFB
|
||||
0x97 (151) 0xEB 0xF9
|
||||
0x98 (152) "_~1" 0xFF
|
||||
0x99 (153) 0x99 0xD6
|
||||
0x9A (154) 0x9A 0xDC
|
||||
0x9B (155) 0x9D 0xF8
|
||||
0x9C (156) 0x9C 0xA3
|
||||
0x9D (157) 0x9D 0xD8
|
||||
0x9E (158) 0x9E 0xD7
|
||||
0x9F (159) 0x9F 0x92
|
||||
0xA0 (160) 0xB5 0xE1
|
||||
0xA1 (161) 0xD6 0xE0
|
||||
0xA2 (162) 0xE0 0xF3
|
||||
0xA3 (163) 0xE9 0xFA
|
||||
0xA4 (164) 0xA5 0xF1
|
||||
0xA5 (165) 0xA5 0xD1
|
||||
0xA6 (166) 0xA6 0xAA
|
||||
0xA7 (167) 0xA7 0xBA
|
||||
0xA8 (168) 0xA8 0xBF
|
||||
0xA9 (169) 0xA9 0xAE
|
||||
0xAA (170) 0xAA 0xAC
|
||||
0xAB (171) 0xAB 0xBD
|
||||
0xAC (172) 0xAC 0xBC
|
||||
0xAD (173) 0xAD 0xA1
|
||||
0xAE (174) 0xAE 0xAB
|
||||
0xAF (175) 0xAF 0xBB
|
||||
0xB0 (176) 0xB0 0x91 0x25
|
||||
0xB1 (177) 0xB1 0x92 0x25
|
||||
0xB2 (178) 0xB2 0x93 0x25
|
||||
0xB3 (179) 0xB3 0x02 0x25
|
||||
0xB4 (180) 0xB4 0x24 0x25
|
||||
0xB5 (181) 0xB5 0xC1
|
||||
0xB6 (182) 0xB6 0xC2
|
||||
0xB7 (183) 0xB7 0xC0
|
||||
0xB8 (184) 0xB8 0xA9
|
||||
0xB9 (185) 0xB9 0x63 0x25
|
||||
0xBA (186) 0xBA 0x51 0x25
|
||||
0xBB (187) 0xBB 0x57 0x25
|
||||
0xBC (188) 0xBC 0x5D 0x25
|
||||
0xBD (189) 0xBD 0xA2
|
||||
0xBE (190) 0xBE 0xA5
|
||||
0xBF (191) 0xBF 0x10 0x25
|
||||
0xC0 (192) 0xC0 0x14 0x25
|
||||
0xC1 (193) 0xC1 0x34 0x25
|
||||
0xC2 (194) 0xC2 0x2C 0x25
|
||||
0xC3 (195) 0xC3 0x1C 0x25
|
||||
0xC4 (196) 0xC4 0x00 0x25
|
||||
0xC5 (197) 0xC5 0x3C 0x25
|
||||
0xC6 (198) 0xC7 0xE3 E
|
||||
0xC7 (199) 0xC7 0xC3
|
||||
0xC8 (200) 0xC8 0x5A 0x25 E
|
||||
0xC9 (201) 0xC9 0x54 0x25 E
|
||||
0xCA (202) 0xCA 0x69 0x25 E
|
||||
0xCB (203) 0xCB 0x66 0x25 E
|
||||
0xCC (204) 0xCC 0x60 0x25 E
|
||||
0xCD (205) 0xCD 0x50 0x25 E
|
||||
0xCE (206) 0xCE 0x6C 0x25 E
|
||||
0xCF (207) 0xCF 0xA4 E
|
||||
0xD0 (208) 0xD1 0xF0
|
||||
0xD1 (209) 0xD1 0xD0
|
||||
0xD2 (210) 0xD2 0xCA
|
||||
0xD3 (211) 0xD3 0xCB
|
||||
0xD4 (212) 0xD4 0xC8
|
||||
0xD5 (213) 0x49 0x31 0x01
|
||||
0xD6 (214) 0xD6 0xCD
|
||||
0xD7 (215) 0xD7 0xCE
|
||||
0xD8 (216) 0xD8 0xCF
|
||||
0xD9 (217) 0xD9 0x18 0x25
|
||||
0xDA (218) 0xDA 0x0C 0x25
|
||||
0xDB (219) 0xDB 0x88 0x25
|
||||
0xDC (220) 0xDC 0x84 0x25
|
||||
0xDD (221) 0xDD 0xA6
|
||||
0xDE (222) 0xDE 0xCC
|
||||
0xDF (223) 0xDF 0x80 0x25
|
||||
0xE0 (224) 0xE0 0xD3
|
||||
0xE1 (225) 0xE1 0xDF
|
||||
0xE2 (226) 0xE2 0xD4
|
||||
0xE3 (227) 0xE3 0xD2
|
||||
0xE4 (228) 0x05 0xF5
|
||||
0xE5 (229) 0x05 0xD5
|
||||
0xE6 (230) 0xE6 0xB5
|
||||
0xE7 (231) 0xE8 0xFE
|
||||
0xE8 (232) 0xE8 0xDE
|
||||
0xE9 (233) 0xE9 0xDA
|
||||
0xEA (234) 0xEA 0xDB
|
||||
0xEB (235) 0xEB 0xD9
|
||||
0xEC (236) 0xED 0xFD
|
||||
0xED (237) 0xED 0xDD
|
||||
0xEE (238) 0xEE 0xAF
|
||||
0xEF (239) 0xEF 0xB4
|
||||
0xF0 (240) 0xF0 0xAD
|
||||
0xF1 (241) 0xF1 0xB1
|
||||
0xF2 (242) 0xF2 0x17 0x20
|
||||
0xF3 (243) 0xF3 0xBE
|
||||
0xF4 (244) 0xF4 0xB6
|
||||
0xF5 (245) 0xF5 0xA7
|
||||
0xF6 (246) 0xF6 0xF7
|
||||
0xF7 (247) 0xF7 0xB8
|
||||
0xF8 (248) 0xF8 0xB0
|
||||
0xF9 (249) 0xF9 0xA8
|
||||
0xFA (250) 0xFA 0xB7
|
||||
0xFB (251) 0xFB 0xB9
|
||||
0xFC (252) 0xFC 0xB3
|
||||
0xFD (253) 0xFD 0xB2
|
||||
0xFE (254) 0xFE 0xA0 0x25
|
||||
0xFF (255) 0xFF 0xA0
|
||||
|
||||
|
||||
Page 0
|
||||
0x80 (128) 0x00
|
||||
0x81 (129) 0x00
|
||||
0x82 (130) 0x00
|
||||
0x83 (131) 0x00
|
||||
0x84 (132) 0x00
|
||||
0x85 (133) 0x00
|
||||
0x86 (134) 0x00
|
||||
0x87 (135) 0x00
|
||||
0x88 (136) 0x00
|
||||
0x89 (137) 0x00
|
||||
0x8A (138) 0x00
|
||||
0x8B (139) 0x00
|
||||
0x8C (140) 0x00
|
||||
0x8D (141) 0x00
|
||||
0x8E (142) 0x00
|
||||
0x8F (143) 0x00
|
||||
0x90 (144) 0x00
|
||||
0x91 (145) 0x00
|
||||
0x92 (146) 0x00
|
||||
0x93 (147) 0x00
|
||||
0x94 (148) 0x00
|
||||
0x95 (149) 0x00
|
||||
0x96 (150) 0x00
|
||||
0x97 (151) 0x00
|
||||
0x98 (152) 0x00
|
||||
0x99 (153) 0x00
|
||||
0x9A (154) 0x00
|
||||
0x9B (155) 0x00
|
||||
0x9C (156) 0x00
|
||||
0x9D (157) 0x00
|
||||
0x9E (158) 0x00
|
||||
0x9F (159) 0x92
|
||||
0xA0 (160) 0xFF
|
||||
0xA1 (161) 0xAD
|
||||
0xA2 (162) 0xBD
|
||||
0xA3 (163) 0x9C
|
||||
0xA4 (164) 0xCF
|
||||
0xA5 (165) 0xBE
|
||||
0xA6 (166) 0xDD
|
||||
0xA7 (167) 0xF5
|
||||
0xA8 (168) 0xF9
|
||||
0xA9 (169) 0xB8
|
||||
0xAA (170) 0x00
|
||||
0xAB (171) 0xAE
|
||||
0xAC (172) 0xAA
|
||||
0xAD (173) 0xF0
|
||||
0xAE (174) 0x00
|
||||
0xAF (175) 0xEE
|
||||
0xB0 (176) 0xF8
|
||||
0xB1 (177) 0xF1
|
||||
0xB2 (178) 0xFD
|
||||
0xB3 (179) 0xFC
|
||||
0xB4 (180) 0xEF
|
||||
0xB5 (181) 0xE6
|
||||
0xB6 (182) 0xF4
|
||||
0xB7 (183) 0xFA
|
||||
0xB8 (184) 0xF7
|
||||
0xB9 (185) 0xFB
|
||||
0xBA (186) 0x00
|
||||
0xBB (187) 0xAF
|
||||
0xBC (188) 0xAC
|
||||
0xBD (189) 0xAB
|
||||
0xBE (190) 0xF3
|
||||
0xBF (191) 0x00
|
||||
0xC0 (192) 0xB7
|
||||
0xC1 (193) 0xB5
|
||||
0xC2 (194) 0xB6
|
||||
0xC3 (195) 0xC7
|
||||
0xC4 (196) 0x8E
|
||||
0xC5 (197) 0x8F
|
||||
0xC6 (198) 0x92
|
||||
0xC7 (199) 0x80
|
||||
0xC8 (200) 0xD4
|
||||
0xC9 (201) 0x90
|
||||
0xCA (202) 0xD2
|
||||
0xCB (203) 0xD3
|
||||
0xCC (204) 0xDE
|
||||
0xCD (205) 0xD6
|
||||
0xCE (206) 0xD7
|
||||
0xCF (207) 0xD8
|
||||
0xD0 (208) 0x00
|
||||
0xD1 (209) 0xA5
|
||||
0xD2 (210) 0xE3
|
||||
0xD3 (211) 0xE0
|
||||
0xD4 (212) 0xE2
|
||||
0xD5 (213) 0xE5
|
||||
0xD6 (214) 0x99
|
||||
0xD7 (215) 0x9E
|
||||
0xD8 (216) 0x9D
|
||||
0xD9 (217) 0xEB
|
||||
0xDA (218) 0xE9
|
||||
0xDB (219) 0xEA
|
||||
0xDC (220) 0x9A
|
||||
0xDD (221) 0xED
|
||||
0xDE (222) 0xE8
|
||||
0xDF (223) 0xE1
|
||||
0xE0 (224) 0x85, 0xA1
|
||||
0xE1 (225) 0xA0
|
||||
0xE2 (226) 0x83
|
||||
0xE3 (227) 0xC6
|
||||
0xE4 (228) 0x84
|
||||
0xE5 (229) 0x86
|
||||
0xE6 (230) 0x91
|
||||
0xE7 (231) 0x87
|
||||
0xE8 (232) 0x8A
|
||||
0xE9 (233) 0x82
|
||||
0xEA (234) 0x88
|
||||
0xEB (235) 0x89
|
||||
0xEC (236) 0x8D
|
||||
0xED (237) 0x00
|
||||
0xEE (238) 0x8C
|
||||
0xEF (239) 0x8B
|
||||
0xF0 (240) 0xD0
|
||||
0xF1 (241) 0xA4
|
||||
0xF2 (242) 0x95
|
||||
0xF3 (243) 0xA2
|
||||
0xF4 (244) 0x93
|
||||
0xF5 (245) 0xE4
|
||||
0xF6 (246) 0x94
|
||||
0xF7 (247) 0xF6
|
||||
0xF8 (248) 0x9B
|
||||
0xF9 (249) 0x97
|
||||
0xFA (250) 0xA3
|
||||
0xFB (251) 0x96
|
||||
0xFC (252) 0x81
|
||||
0xFD (253) 0xEC
|
||||
0xFE (254) 0xE7
|
||||
0xFF (255) 0x98
|
||||
|
||||
|
||||
|
||||
162
study/sabre/os/files/FileSystems/bfs-structure.html
Normal file
@@ -0,0 +1,162 @@
|
||||
<html>
|
||||
<head><title>The BFS filesystem structure</title></head>
|
||||
<body>
|
||||
|
||||
<center><h1>The BFS filesystem structure</h1></center>
|
||||
The UnixWare Boot FileSystem (BFS) is a filesystem used in SCO UnixWare.
|
||||
It contains all files necessary for UnixWare boot procedures (such as
|
||||
<tt>unix</tt>).
|
||||
Because the object of the bfs filesystem type is to allow quick and
|
||||
simple booting, BFS was designed as a contiguous flat filesystem. It
|
||||
is not intended to support general users. The only directory bfs
|
||||
supports is the root directory. Users can create only regular files;
|
||||
no directories or special files can be created in the bfs filesystem.<p>
|
||||
|
||||
A BFS filesystem consists of three parts:
|
||||
<ul>
|
||||
<li> Superblock
|
||||
<li> Inodes
|
||||
<li> Data area
|
||||
</ul>
|
||||
Each block on disk is 512 bytes long; blocks are numbered from zero. Most
data structures use an "offset from beginning of disk". Divide this number by 512 to
get the block number.<p>
|
||||
|
<b>NOTE:</b> Operations on a BFS filesystem in SCO UnixWare are severely limited.
|
||||
For example, it is not possible to have two files open for writing
|
||||
simultaneously. These restrictions do not
|
||||
apply to operations involving only the reading of files.<p>
|
||||
|
||||
You can read a BFS filesystem from your Linux box. See
|
||||
<A href="http://www.penguin.cz/~mhi/fs/bfs/">BFS Linux module home page</a>.
|
||||
<p>
|
||||
|
||||
<h2>The BFS superblock</h2>
|
||||
|
||||
The superblock is at the beginning of disk, block 0.
|
||||
|
||||
<table border=1>
|
||||
<tr><th>Type
|
||||
<th>Name
|
||||
<th>Description
|
||||
<tr><td>32bit int
|
||||
<td>magic
|
||||
<td>Magic number (0x1BADFACE)
|
||||
<tr><td>32bit int
|
||||
<td>start
|
||||
<td>Start of data blocks (in bytes)
|
||||
<tr><td>32bit int
|
||||
<td>size
|
||||
<td>Size of filesystem (in bytes)
|
||||
<tr><td>4x 32bit int
|
||||
<td>sanity words
|
||||
<td>Sanity words are used to recover filesystem after interrupted
|
||||
<A href="#compaction">compaction</a>. They are usually 0xFFFFFFFF.
|
||||
</table>
|
||||
|
||||
|
||||
<h2>BFS inodes</h2>
|
||||
The inode contains all the information about a file except its name.
|
||||
Filenames are kept in the root directory, the only directory in the
|
||||
BFS filesystem. An inode is 64 bytes long. The inode table starts at
block number 1 and fills the space between the superblock and the first data
block (usually the root directory). The first inode has number 2.
|
||||
|
||||
<table border=1>
|
||||
<tr><th>Type
|
||||
<th>Name
|
||||
<th>Description
|
||||
<tr><td>32bit int
|
||||
<td>inode number
|
||||
<td>Inode number, often contains "garbage" in high 16 bits.
|
||||
<tr><td>32bit int
|
||||
<td>first block
|
||||
<td>First block of file. Next block is n+1, n+2, ... n+x.
|
||||
<tr><td>32bit int
|
||||
<td>Last block
|
||||
<td>Last block of file
|
||||
<tr><td>32bit int
|
||||
<td>offset to eof
|
||||
<td>Disk offset to end of file (in bytes)
|
||||
<tr><td>32bit int
|
||||
<td>Attributes
|
||||
<td>File attributes (1 = regular file, 2 = directory)
|
||||
<tr><td>32bit int
|
||||
<td>mode
|
||||
<td>File mode, rwxrwxrwx (only low 9 bits used)
|
||||
<tr><td>32bit int
|
||||
<td>uid
|
||||
<td>File owner - user id
|
||||
<tr><td>32bit int
|
||||
<td>gid
|
||||
<td>File owner - group id
|
||||
<tr><td>32bit int
|
||||
<td>nlinks
|
||||
<td>Hard link count
|
||||
<tr><td>32bit int
|
||||
<td>atime
|
||||
<td>Access time
|
||||
<tr><td>32bit int
|
||||
<td>mtime
|
||||
<td>Modify time
|
||||
<tr><td>32bit int
|
||||
<td>ctime
|
||||
<td>Create time
|
||||
<tr><td>4x 32bit int
|
||||
<td>spare
|
||||
<td>Unused, should be zero
|
||||
</table>
|
||||
The number of inodes is defined when mkfs is used to create the filesystem.
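<p>The inode table above maps directly onto a fixed 64-byte record. The following sketch decodes one such record. It assumes little-endian fields in the order the table lists them; the field names and the sample values are made up for illustration.

```python
import struct

# Assumption: little-endian; 12 fields followed by 4 spare words = 64 bytes.
INODE = struct.Struct("<12I4I")
FIELDS = ("ino", "first_block", "last_block", "eof_offset", "attributes",
          "mode", "uid", "gid", "nlinks", "atime", "mtime", "ctime")


def parse_inode(raw: bytes) -> dict:
    """Decode one 64-byte inode record into a dictionary."""
    values = INODE.unpack(raw)
    inode = dict(zip(FIELDS, values))   # the 4 spare words are dropped
    inode["ino"] &= 0xFFFF              # ignore "garbage" in the high 16 bits
    inode["mode"] &= 0o777              # only the low 9 bits are used
    return inode


# Made-up root directory inode: number 2, blocks 10..12, attribute 2 (directory).
raw = INODE.pack(0xABCD0002, 10, 12, 6000, 2, 0o40755, 0, 0, 2, 0, 0, 0,
                 0, 0, 0, 0)
root = parse_inode(raw)
```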
|
||||
<p>
|
||||
|
||||
<h2>BFS storage blocks</h2>
|
||||
The remainder of the space allocated to the filesystem is taken up by
|
||||
data blocks. The storage blocks store the root directory and
|
||||
the regular files. For a regular file, the storage blocks contain the
|
||||
contents of the file. For the root directory, the storage blocks
|
||||
contain 16-byte entries.
|
||||
<table border=1>
|
||||
<tr><th>Type
|
||||
<th>Name
|
||||
<th>Description
|
||||
<tr><td>16bit int
|
||||
<td>inode
|
||||
<td>File inode number
|
||||
<tr><td>14 8bit characters
|
||||
<td>name
|
||||
<td>File name
|
||||
</table>
|
||||
The root directory *MUST* begin with two entries "." and "..", both with
|
||||
inode number 2 (root directory).
|
||||
<p>
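<p>A minimal sketch of reading the 16-byte root directory entries described above. Little-endian byte order, NUL padding of short names, and the convention that inode number 0 marks an unused slot are all assumptions of this example; the function names are hypothetical.

```python
import struct

# 16-bit inode number followed by a 14-byte name: 16 bytes per entry.
DIRENT = struct.Struct("<H14s")


def read_directory(data: bytes) -> list:
    """Return (inode number, name) pairs found in raw directory data."""
    entries = []
    for offset in range(0, len(data) - DIRENT.size + 1, DIRENT.size):
        ino, raw_name = DIRENT.unpack_from(data, offset)
        if ino == 0:        # assumption: 0 means the slot is unused
            continue
        name = raw_name.split(b"\0", 1)[0].decode("ascii", "replace")
        entries.append((ino, name))
    return entries


def root_is_valid(entries: list) -> bool:
    """The root directory must begin with "." and "..", both inode 2."""
    return entries[:2] == [(2, "."), (2, "..")]


# Synthetic directory data with one regular file and one unused slot.
data = b"".join(DIRENT.pack(ino, name) for ino, name in
                [(2, b"."), (2, b".."), (3, b"unix"), (0, b"")])
entries = read_directory(data)
```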
|
||||
|
||||
<h2>Managing BFS data blocks</h2>
|
||||
The data or storage blocks for a file are allocated contiguously. The
|
||||
data block after the last data block used in the filesystem is
|
||||
considered the next data block available to store a file. When a file
|
||||
is deleted, its data blocks are released.<p>
|
||||
|
||||
<A name="compaction"><h2>Compaction</h2></a>
|
||||
Compaction is a way of recovering data blocks by shifting files until
|
||||
the gaps left behind by deleted files are eliminated. This operation
|
||||
can be expensive, but it is necessary because of the method used by
|
||||
BFS to store and delete files.
|
||||
You need to perform compaction when either of the following situations occurs:
|
||||
<ul>
|
||||
<li> The system has reached the end of the filesystem, and there are
|
||||
still free blocks available.
|
||||
<li> The system deletes a large file and the file after it on disk is
|
||||
small and is the last file in the filesystem. (Small files are
|
||||
files of no more than ten blocks; large files are files of 500 or
|
||||
more blocks.)
|
||||
</ul>
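<p>The effect of compaction can be sketched on an in-memory list of files. This models only the bookkeeping (new first and last block numbers); it does not move data or update sanity words, and the tuple layout is an assumption of the example rather than the on-disk format.

```python
def compact(files, data_start):
    """Shift files towards data_start so that no gaps remain.

    files: list of (name, first_block, last_block) tuples.
    Returns the new layout and the next free block number.
    """
    next_free = data_start
    layout = []
    for name, first, last in sorted(files, key=lambda f: f[1]):
        length = last - first + 1
        layout.append((name, next_free, next_free + length - 1))
        next_free += length
    return layout, next_free


# File "b" (blocks 10..19) was deleted, leaving a gap between "a" and "c".
before = [("a", 8, 9), ("c", 20, 24)]
after, next_free = compact(before, data_start=8)
```

After the call, "c" occupies the blocks directly following "a", and new files can be allocated from next_free onwards.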
|
||||
|
||||
<h2>Related links</h2>
|
||||
<A href="http://www.penguin.cz/~mhi/fs/bfs/">BFS Linux module</a><br>
|
||||
<A href="http://www.sco.com/">SCO homepage</a><br>
|
||||
<A href="http://www.penguin.cz/~mhi/fs/">Filesystems HOWTO</a><br>
|
||||
|
||||
<hr>
|
||||
<center><i>Copyright (c) 1999 Martin Hinner,
|
||||
<A href="mailto:mhi@penguin.cz">mhi@penguin.cz</a></i></center>
|
||||
</body>
|
||||
BIN
study/sabre/os/files/FileSystems/darmstadt-GFS.pdf
Normal file
BIN
study/sabre/os/files/FileSystems/dimp-ext2/ext2-dir.gif
Normal file
|
After Width: | Height: | Size: 1.8 KiB |
BIN
study/sabre/os/files/FileSystems/dimp-ext2/ext2-inode.gif
Normal file
|
After Width: | Height: | Size: 3.1 KiB |
BIN
study/sabre/os/files/FileSystems/dimp-ext2/ext2-vfs.gif
Normal file
|
After Width: | Height: | Size: 4.7 KiB |
901
study/sabre/os/files/FileSystems/dimp-ext2/index.html
Normal file
@@ -0,0 +1,901 @@
|
||||
<!-- X-URL: http://www.mit.edu/~tytso/linux/ext2intro.html -->
|
||||
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<TITLE>Design and Implementation of the Second Extended Filesystem</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
|
||||
<P>This paper was first published in the Proceedings of the First Dutch
|
||||
International Symposium on Linux, ISBN 90-367-0385-9.</P>
|
||||
|
||||
<HR>
|
||||
|
||||
<H2>Design and Implementation of the Second Extended Filesystem</H2>
|
||||
|
||||
<H4>Rémy Card, Laboratoire MASI--Institut Blaise Pascal,
|
||||
E-Mail: card@masi.ibp.fr, and<BR>
|
||||
Theodore Ts'o, Massachusetts Institute of Technology,
|
||||
E-Mail: tytso@mit.edu, and<BR>
|
||||
Stephen Tweedie, University of Edinburgh,
|
||||
E-Mail: sct@dcs.ed.ac.uk</H4>
|
||||
|
||||
<H3>Introduction</H3>
|
||||
|
||||
<P>Linux is a Unix-like operating system, which runs on PC-386
|
||||
computers. It was implemented first as an extension to the Minix
|
||||
operating system <A href="#minix">[Tanenbaum 1987]</A> and its
|
||||
first versions included support for the Minix filesystem only.
|
||||
The Minix filesystem contains two serious limitations: block
|
||||
addresses are stored in 16 bit integers, thus the maximal
|
||||
filesystem size is restricted to 64 megabytes, and directories
|
||||
contain fixed-size entries and the maximal file name is 14
|
||||
characters.
|
||||
|
||||
<P>We have designed and implemented two new filesystems that are
|
||||
included in the standard Linux kernel. These filesystems,
|
||||
called ``Extended File System'' (Ext fs) and ``Second Extended
|
||||
File System'' (Ext2 fs), lift these limitations and add new
|
||||
features.
|
||||
|
||||
<P>In this paper, we describe the history of Linux filesystems. We
|
||||
briefly introduce the fundamental concepts implemented in Unix
|
||||
filesystems. We present the implementation of the Virtual File
|
||||
System layer in Linux and we detail the Second Extended File
|
||||
System kernel code and user mode tools. Last, we present
|
||||
performance measurements made on Linux and BSD filesystems and
|
||||
we conclude with the current status of Ext2fs and the future
|
||||
directions.
|
||||
|
||||
<H3>History of Linux filesystems</H3>
|
||||
|
||||
<P>In its very early days, Linux was cross-developed under the
|
||||
Minix operating system. It was easier to share disks between
|
||||
the two systems than to design a new filesystem, so Linus
|
||||
Torvalds decided to implement support for the Minix filesystem
|
||||
in Linux. The Minix filesystem was an efficient and relatively
|
||||
bug-free piece of software.
|
||||
|
||||
<P>However, the restrictions in the design of the Minix
|
||||
filesystem were too limiting, so people started thinking and
|
||||
working on the implementation of new filesystems in Linux.
|
||||
|
||||
<P>In order to ease the addition of new filesystems into the
|
||||
Linux kernel, a Virtual File System (VFS) layer was developed.
|
||||
The VFS layer was initially written by Chris Provenzano, and
|
||||
later rewritten by Linus Torvalds before it was integrated into
|
||||
the Linux kernel. It is described in <A href="#section:vfs">The Virtual File System</A>.
|
||||
|
||||
<P>After the integration of the VFS in the kernel, a new
|
||||
filesystem, called the ``Extended File System'' was implemented
|
||||
in April 1992 and added to Linux 0.96c. This new filesystem
|
||||
removed the two big Minix limitations: its maximal size was 2
|
||||
gigabytes and the maximal file name size was 255 characters.
|
||||
It was an improvement over the Minix filesystem but some
|
||||
problems were still present in it. There was no support for the
|
||||
separate access, inode modification, and data modification
|
||||
timestamps. The filesystem used linked lists to keep track of
|
||||
free blocks and inodes and this produced poor performance: as
|
||||
the filesystem was used, the lists became unsorted and the
|
||||
filesystem became fragmented.
|
||||
|
||||
<P>As a response to these problems, two new filesystems were
|
||||
released in Alpha version in January 1993: the Xia filesystem
|
||||
and the Second Extended File System. The Xia filesystem was
|
||||
heavily based on the Minix filesystem kernel code and only
|
||||
added a few improvements over this filesystem. Basically, it
|
||||
provided long file names, support for bigger partitions and
|
||||
support for the three timestamps. On the other hand, Ext2fs was
|
||||
based on the Extfs code with many reorganizations and many
|
||||
improvements. It had been designed with evolution in mind and
|
||||
contained space for future improvements. It will be described
|
||||
in more detail in <A href="#section:ext2fs">The Second
Extended File System</A>.
|
||||
|
||||
<P>When the two new filesystems were first released, they
|
||||
provided essentially the same features. Due to its minimal
|
||||
design, Xia fs was more stable than Ext2fs. As the filesystems
|
||||
were used more widely, bugs were fixed in Ext2fs and lots of
|
||||
improvements and new features were integrated. Ext2fs is now
|
||||
very stable and has become the de-facto standard Linux
|
||||
filesystem.
|
||||
|
||||
<P>This table contains a summary of the features
|
||||
provided by the different filesystems:
|
||||
|
||||
<TABLE border>
|
||||
<TR><TH></TH><TH>Minix FS</TH><TH>Ext FS</TH><TH>Ext2 FS</TH><TH>Xia FS</TH></TR>
|
||||
<TR><TH>Max FS size</TH><TD>64 MB</TD><TD>2 GB</TD><TD>4 TB</TD><TD>2 GB</TD></TR>
|
||||
<TR><TH>Max file size</TH><TD>64 MB</TD><TD>2 GB</TD><TD>2 GB</TD><TD>64 MB</TD></TR>
|
||||
<TR><TH>Max file name</TH><TD>16/30 c</TD><TD>255 c</TD><TD>255 c</TD><TD>248 c</TD></TR>
|
||||
<TR><TH>3 times support</TH><TD>No</TD><TD>No</TD><TD>Yes</TD><TD>Yes</TD></TR>
|
||||
<TR><TH>Extensible</TH><TD>No</TD><TD>No</TD><TD>Yes</TD><TD>No</TD></TR>
|
||||
<TR><TH>Var. block size</TH><TD>No</TD><TD>No</TD><TD>Yes</TD><TD>No</TD></TR>
|
||||
<TR><TH>Maintained</TH><TD>Yes</TD><TD>No</TD><TD>Yes</TD><TD>?</TD></TR>
|
||||
</TABLE>
|
||||
|
||||
<H3>Basic File System Concepts</H3>
|
||||
|
||||
<P>Every Linux filesystem implements a basic set of common
|
||||
concepts derived from the Unix operating system
|
||||
<A href="#bach">[Bach 1986]</A>: files are represented by inodes,
|
||||
directories are simply files containing a list of entries and
|
||||
devices can be accessed by requesting I/O on special files.
|
||||
|
||||
<H4>Inodes</H4>
|
||||
|
||||
<P>Each file is represented by a structure, called an inode.
|
||||
Each inode contains the description of the file: file type,
|
||||
access rights, owners, timestamps, size, pointers to data
|
||||
blocks. The addresses of data blocks allocated to a file are
|
||||
stored in its inode. When a user requests an I/O operation on
|
||||
the file, the kernel code converts the current offset to a
|
||||
block number, uses this number as an index in the block
|
||||
addresses table and reads or writes the physical block. This
|
||||
figure represents the structure of an inode:
|
||||
|
||||
<IMG SRC="ext2-inode.gif">
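<P>The offset-to-block conversion described above can be sketched as follows. This toy version assumes a flat table of block addresses; it ignores the indirect blocks a real inode uses for large files, and the function name and sample addresses are hypothetical.

```python
def locate(offset, block_size, block_addresses):
    """Map a byte offset in a file to (physical block, offset within block)."""
    index, within = divmod(offset, block_size)
    return block_addresses[index], within


# A three-block file whose blocks are not contiguous on disk.
addresses = [120, 121, 305]
physical_block, byte_in_block = locate(2500, 1024, addresses)
```

Offset 2500 with 1024-byte blocks falls in the third logical block (index 2), so the lookup returns the third entry of the address table.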
|
||||
|
||||
<H4>Directories</H4>
|
||||
|
||||
<P>Directories are structured in a hierarchical tree. Each
|
||||
directory can contain files and subdirectories.
|
||||
|
||||
<P>Directories are implemented as a special type of files.
|
||||
Actually, a directory is a file containing a list of entries.
|
||||
Each entry contains an inode number and a file name. When a
|
||||
process uses a pathname, the kernel code searches in the
|
||||
directories to find the corresponding inode number. After the
|
||||
name has been converted to an inode number, the inode is loaded
|
||||
into memory and is used by subsequent requests.
|
||||
|
||||
<P>This figure represents a directory:
|
||||
|
||||
<IMG SRC="ext2-dir.gif">
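<P>The pathname-to-inode conversion can be illustrated with a small in-memory model. The directory contents and inode numbers below are invented, and the function is named after the kernel's <TT>namei</TT> lookup only by analogy; this is a sketch of the idea, not the kernel code.

```python
# Hypothetical model: directory inode number -> {entry name: inode number}.
DIRECTORIES = {
    2: {"usr": 11, "etc": 12},
    11: {"bin": 21},
    12: {},
    21: {"ls": 42},
}
ROOT_INO = 2


def namei(path):
    """Walk an absolute path one component at a time, directory by directory."""
    ino = ROOT_INO
    for component in path.strip("/").split("/"):
        if not component:
            continue
        entries = DIRECTORIES.get(ino)
        if entries is None:
            raise NotADirectoryError(path)
        if component not in entries:
            raise FileNotFoundError(path)
        ino = entries[component]
    return ino


ls_ino = namei("/usr/bin/ls")
```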
|
||||
|
||||
<H4>Links</H4>
|
||||
|
||||
<P>Unix filesystems implement the concept of a link. Several
names can be associated with an inode. The inode contains a
field holding the number of links associated with the file. Adding a
link simply consists of creating a directory entry, where the
inode number points to the inode, and incrementing the links
|
||||
count in the inode. When a link is deleted, i.e. when one uses
|
||||
the <TT>rm</TT> command to remove a filename, the kernel
|
||||
decrements the links count and deallocates the inode if this
|
||||
count becomes zero.
|
||||
|
||||
<P>This type of link is called a hard link and can only be used
|
||||
within a single filesystem: it is impossible to create
|
||||
cross-filesystem hard links. Moreover, hard links can only
point to files: hard links to directories are not allowed, which
prevents cycles from appearing in the directory tree.
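<P>The link-count bookkeeping for hard links can be sketched with two dictionaries. This is a toy model with made-up names and inode numbers; a real kernel also has to handle open files, locking, and on-disk updates.

```python
inodes = {}      # inode number -> link count
directory = {}   # file name -> inode number


def create(name, ino):
    """Create a file: one directory entry, link count 1."""
    inodes[ino] = 1
    directory[name] = ino


def link(existing, new_name):
    """Add a hard link: new directory entry, incremented link count."""
    ino = directory[existing]
    directory[new_name] = ino
    inodes[ino] += 1


def unlink(name):
    """Remove a name; deallocate the inode when the count reaches zero."""
    ino = directory.pop(name)
    inodes[ino] -= 1
    if inodes[ino] == 0:
        del inodes[ino]


create("report", 7)
link("report", "alias")
unlink("report")     # the inode survives: "alias" still refers to it
```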
|
||||
|
||||
<P>Another kind of link exists in most Unix filesystems.
|
||||
Symbolic links are simply files which contain a filename. When
|
||||
the kernel encounters a symbolic link during a pathname to
|
||||
inode conversion, it replaces the name of the link by its
|
||||
contents, i.e. the name of the target file, and restarts the
|
||||
pathname interpretation. Since a symbolic link does not point
|
||||
to an inode, it is possible to create cross-filesystem
symbolic links. Symbolic links can point to any type of file,
even to nonexistent files. Symbolic links are very useful
because they don't have the limitations associated with hard
|
||||
links. However, they use some disk space, allocated for their
|
||||
inode and their data blocks, and cause an overhead in the
|
||||
pathname to inode conversion because the kernel has to restart
|
||||
the name interpretation when it encounters a symbolic link.
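<P>The restart behaviour of symbolic link resolution can be shown with a deliberately simplified model in which whole paths are looked up in a flat table. The table contents and the limit of 8 link traversals are assumptions of this sketch, not values taken from the kernel.

```python
# Hypothetical namespace: path -> ("file", inode) or ("symlink", target path).
NAMESPACE = {
    "/bin/sh": ("symlink", "/bin/bash"),
    "/bin/bash": ("file", 42),
    "/tmp/dangling": ("symlink", "/nowhere"),
}
MAX_FOLLOWS = 8   # arbitrary guard against symbolic link loops


def resolve(path):
    """Return the inode a path refers to, following symbolic links."""
    for _ in range(MAX_FOLLOWS):
        kind, value = NAMESPACE[path]   # KeyError if the name does not exist
        if kind == "file":
            return value
        path = value    # replace the link by its contents and restart
    raise OSError("too many levels of symbolic links")


sh_ino = resolve("/bin/sh")
```

A link to a nonexistent target, such as /tmp/dangling here, can be created without error; the failure only shows up when the link is resolved.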
|
||||
|
||||
<H4>Device special files</H4>
|
||||
|
||||
<P>In Unix-like operating systems, devices can be accessed via
|
||||
special files. A device special file does not use any space on
|
||||
the filesystem. It is only an access point to the device
|
||||
driver.
|
||||
|
||||
<P>Two types of special files exist: character and block
|
||||
special files. The former allows I/O operations in character
|
||||
mode while the latter requires data to be written in block mode
|
||||
via the buffer cache functions. When an I/O request is made on
|
||||
a special file, it is forwarded to a (pseudo) device driver. A
|
||||
special file is referenced by a major number, which identifies
|
||||
the device type, and a minor number, which identifies the unit.
|
||||
|
||||
<A name="section:vfs">
|
||||
<H3>The Virtual File System</H3>
|
||||
</A>
|
||||
|
||||
<H4>Principle</H4>
|
||||
|
||||
<P>The Linux kernel contains a Virtual File System layer which
|
||||
is used during system calls acting on files. The VFS is an
|
||||
indirection layer which handles the file oriented system calls
|
||||
and calls the necessary functions in the physical filesystem
|
||||
code to do the I/O.
|
||||
|
||||
<P>This indirection mechanism is frequently used in Unix-like
|
||||
operating systems to ease the integration and the use of
|
||||
several filesystem types <A href="#vnodes">[Kleiman 1986,</A>
|
||||
<A href="#lfs:unix">Seltzer <I>et al.</I> 1993]</A>.
|
||||
|
||||
<P>When a process issues a file oriented system call, the
|
||||
kernel calls a function contained in the VFS. This function
|
||||
handles the structure independent manipulations and redirects
|
||||
the call to a function contained in the physical filesystem
|
||||
code, which is responsible for handling the structure dependent
|
||||
operations. Filesystem code uses the buffer cache functions to
|
||||
request I/O on devices. This scheme is illustrated in this
|
||||
figure:
|
||||
|
||||
<IMG SRC="ext2-vfs.gif">
|
||||
|
||||
<H4>The VFS structure</H4>
|
||||
|
||||
<P>The VFS defines a set of functions that every filesystem has
|
||||
to implement. This interface is made up of a set of operations
|
||||
associated with three kinds of objects: filesystems, inodes, and
|
||||
open files.
|
||||
|
||||
<P>The VFS knows about filesystem types supported in the
|
||||
kernel. It uses a table defined during the kernel
|
||||
configuration. Each entry in this table describes a filesystem
|
||||
type: it contains the name of the filesystem type and a pointer
|
||||
to a function called during the mount operation. When a
|
||||
filesystem is to be mounted, the appropriate mount function is
|
||||
called. This function is responsible for reading the superblock
|
||||
from the disk, initializing its internal variables, and
|
||||
returning a mounted filesystem descriptor to the VFS. After the
|
||||
filesystem is mounted, the VFS functions can use this
|
||||
descriptor to access the physical filesystem routines.
|
||||
|
||||
<P>A mounted filesystem descriptor contains several kinds of
|
||||
data: information that is common to every filesystem type,
|
||||
pointers to functions provided by the physical filesystem
|
||||
kernel code, and private data maintained by the physical
|
||||
filesystem code. The function pointers contained in the
|
||||
filesystem descriptors allow the VFS to access the filesystem
|
||||
internal routines.
|
||||
|
||||
<P>Two other types of descriptors are used by the VFS: an inode descriptor
|
||||
and an open file descriptor. Each descriptor contains information related to
|
||||
files in use and a set of operations provided by the physical filesystem code.
|
||||
While the inode descriptor contains pointers to functions that can be used to
|
||||
act on any file (e.g. <TT>create</TT>, <TT>unlink</TT>), the file descriptor
contains pointers to functions which can only act on open files (e.g.
|
||||
<TT>read</TT>, <TT>write</TT>).
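<P>The indirection described in this section can be sketched as follows. The kernel implements it in C with tables of function pointers; here a Python class stands in for the physical filesystem, and every name (ToyFS, vfs_mount, vfs_write, the device string) is hypothetical.

```python
class ToyFS:
    """Stand-in for a physical filesystem (illustration only)."""

    def __init__(self, device):
        self.device = device
        self.files = {}     # private data of the physical filesystem

    # Inode operations: can act on any file.
    def create(self, name):
        self.files[name] = b""

    def unlink(self, name):
        del self.files[name]

    # File operations: can act only on open files.
    def read(self, name):
        return self.files[name]

    def write(self, name, data):
        self.files[name] += data


# Table of known filesystem types: type name -> function called at mount time.
FILESYSTEM_TYPES = {"toyfs": ToyFS}


def vfs_mount(fstype, device):
    """Look up the filesystem type and return a mounted filesystem descriptor."""
    return FILESYSTEM_TYPES[fstype](device)


def vfs_write(mounted, name, data):
    """Filesystem-independent entry point that redirects to the physical code."""
    mounted.write(name, data)


fs = vfs_mount("toyfs", "/dev/hypothetical0")
fs.create("notes")
vfs_write(fs, "notes", b"hello")
```

The caller of vfs_write never needs to know which filesystem type sits behind the descriptor, which is the point of the indirection layer.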
|
||||
|
||||
<A name="section:ext2fs">
|
||||
<H3>The Second Extended File System</H3>
|
||||
</A>
|
||||
|
||||
<H4>Motivations</H4>
|
||||
|
||||
<P>The Second Extended File System has been designed and
|
||||
implemented to fix some problems present in the first Extended
|
||||
File System. Our goal was to provide a powerful filesystem,
|
||||
which implements Unix file semantics and offers advanced
|
||||
features.
|
||||
|
||||
<P>Of course, we wanted Ext2fs to have excellent
|
||||
performance. We also wanted to provide a very robust
|
||||
filesystem in order to reduce the risk of data loss in
|
||||
intensive use. Last, but not least, Ext2fs had to include
|
||||
provision for extensions to allow users to benefit from new
|
||||
features without reformatting their filesystem.
|
||||
|
||||
<H4>``Standard'' Ext2fs features</H4>
|
||||
|
||||
<P>The Ext2fs supports standard Unix file types: regular files,
|
||||
directories, device special files and symbolic links.
|
||||
|
||||
<P>Ext2fs is able to manage filesystems created on really big
|
||||
partitions. While the original kernel code restricted the
|
||||
maximal filesystem size to 2 GB, recent work in the VFS layer
|
||||
has raised this limit to 4 TB. Thus, it is now possible to use
|
||||
big disks without the need of creating many partitions.
|
||||
|
||||
<P>Ext2fs provides long file names. It uses variable length
|
||||
directory entries. The maximal file name size is 255
|
||||
characters. This limit could be extended to 1012 if needed.
|
||||
|
||||
<P>Ext2fs reserves some blocks for the super user
|
||||
(<TT>root</TT>). Normally, 5% of the blocks are reserved. This
|
||||
allows the administrator to recover easily from situations
|
||||
where user processes fill up filesystems.
|
||||
|
||||
<A name="subsection:ext2fs:adv-feat">
|
||||
<H4>``Advanced'' Ext2fs features</H4>
|
||||
</A>
|
||||
|
||||
<P>In addition to the standard Unix features, Ext2fs supports
|
||||
some extensions which are not usually present in Unix
|
||||
filesystems.
|
||||
|
||||
<P>File attributes allow the users to modify the kernel
|
||||
behavior when acting on a set of files. One can set attributes
|
||||
on a file or on a directory. In the latter case, new files
|
||||
created in the directory inherit these attributes.
|
||||
|
||||
<P>BSD or System V Release 4 semantics can be selected at mount
|
||||
time. A mount option allows the administrator to choose the
|
||||
file creation semantics. On a filesystem mounted with BSD
|
||||
semantics, files are created with the same group id as their
|
||||
parent directory. System V semantics are a bit more complex: if
|
||||
a directory has the setgid bit set, new files inherit the group
|
||||
id of the directory and subdirectories inherit the group id and
|
||||
the setgid bit; in the other case, files and subdirectories are
|
||||
created with the primary group id of the calling process.
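<P>The two file creation semantics can be written down as a small decision function. This is a sketch of the rules as stated in the paragraph above, with hypothetical function and parameter names; it is not the kernel code.

```python
def new_file_gid(parent_gid, parent_setgid, process_gid, bsd_semantics):
    """Group id given to a newly created regular file."""
    if bsd_semantics or parent_setgid:
        return parent_gid       # inherit the group of the parent directory
    return process_gid          # System V without setgid: caller's primary group


def new_subdir_group(parent_gid, parent_setgid, process_gid):
    """System V case for a new subdirectory: returns (gid, setgid bit)."""
    if parent_setgid:
        return parent_gid, True   # group id and setgid bit are both inherited
    return process_gid, False
```

For example, with a parent directory owned by group 50 and a process whose primary group is 100, BSD semantics always yield 50, while System V semantics yield 50 only if the directory has the setgid bit set.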
|
||||
|
||||
<P>BSD-like synchronous updates can be used in Ext2fs. A mount
|
||||
option allows the administrator to request that metadata
|
||||
(inodes, bitmap blocks, indirect blocks and directory blocks)
|
||||
be written synchronously on the disk when they are modified.
|
||||
This can be useful to maintain a strict metadata consistency
|
||||
but this leads to poor performance. Actually, this feature is
|
||||
not normally used, since in addition to the performance loss
|
||||
associated with using synchronous updates of the metadata, it
|
||||
can cause corruption in the user data which will not be flagged
|
||||
by the filesystem checker.
|
||||
|
||||
<P>Ext2fs allows the administrator to choose the logical block
|
||||
size when creating the filesystem. Block sizes can typically be
|
||||
1024, 2048 and 4096 bytes. Using big block sizes can speed up
|
||||
I/O since fewer I/O requests, and thus fewer disk head seeks,
|
||||
need to be done to access a file. On the other hand, big blocks
|
||||
waste more disk space: on the average, the last block allocated
|
||||
to a file is only half full, so as blocks get bigger, more
|
||||
space is wasted in the last block of each file. In addition,
|
||||
most of the advantages of larger block sizes are obtained by
|
||||
Ext2 filesystem's preallocation techniques (see section
|
||||
<A href="#subsection:ext2fs:allocation">Performance optimizations</A>).
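<P>The space cost of larger blocks can be estimated with simple arithmetic. The sketch below computes the unused bytes in the last block of one file and the average-case estimate of half a block per file; the file size and file count are arbitrary example values.

```python
def slack(file_size, block_size):
    """Unused bytes in the last block allocated to a file."""
    return -file_size % block_size


def expected_waste(num_files, block_size):
    """Average-case estimate: half a block is wasted per file."""
    return num_files * block_size // 2


# Compare the three typical block sizes for a 5000-byte file
# and for a filesystem holding 10,000 files.
for block_size in (1024, 2048, 4096):
    print(block_size, slack(5000, block_size), expected_waste(10_000, block_size))
```

The estimate grows linearly with the block size, so moving from 1024-byte to 4096-byte blocks quadruples the expected waste for the same set of files.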
|
||||
|
||||
<P>Ext2fs implements fast symbolic links. A fast symbolic link
|
||||
does not use any data block on the filesystem. The target name
|
||||
is not stored in a data block but in the inode itself. This
|
||||
policy can save some disk space (no data block needs to be
|
||||
allocated) and speeds up link operations (there is no need to
|
||||
read a data block when accessing such a link). Of course, the
|
||||
space available in the inode is limited so not every link can
|
||||
be implemented as a fast symbolic link. The maximal size of the
|
||||
target name in a fast symbolic link is 60 characters. We plan
|
||||
to extend this scheme to small files in the near future.
|
||||
|
||||
<P>Ext2fs keeps track of the filesystem state. A special field
|
||||
in the superblock is used by the kernel code to indicate the
|
||||
status of the file system. When a filesystem is mounted in
|
||||
read/write mode, its state is set to ``Not Clean''. When it is
|
||||
unmounted or remounted in read-only mode, its state is reset to
|
||||
``Clean''. At boot time, the filesystem checker uses this
|
||||
information to decide if a filesystem must be checked. The
|
||||
kernel code also records errors in this field. When an
|
||||
inconsistency is detected by the kernel code, the filesystem is
|
||||
marked as ``Erroneous''. The filesystem checker tests this to
|
||||
force the check of the filesystem regardless of its apparently
|
||||
clean state.
|
||||
|
||||
<P>Always skipping filesystem checks may sometimes be
|
||||
dangerous, so Ext2fs provides two ways to force checks at
|
||||
regular intervals. A mount counter is maintained in the
|
||||
superblock. Each time the filesystem is mounted in read/write
|
||||
mode, this counter is incremented. When it reaches a maximal
|
||||
value (also recorded in the superblock), the filesystem checker
|
||||
forces the check even if the filesystem is ``Clean''. A last
|
||||
check time and a maximal check interval are also maintained in
|
||||
the superblock. These two fields allow the administrator to
|
||||
request periodical checks. When the maximal check interval has
|
||||
been reached, the checker ignores the filesystem state and
|
||||
forces a filesystem check.
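<P>The decision the filesystem checker makes at boot time can be summarized in a few lines. This is a sketch of the rules described above, not the actual <TT>e2fsck</TT> logic; the parameter names are hypothetical, and treating a check interval of 0 as "disabled" is an assumption of the example.

```python
def needs_check(state, mount_count, max_mount_count,
                last_check, check_interval, now):
    """Decide whether a filesystem must be checked.

    state is one of "clean", "not clean" or "erroneous";
    times are in seconds.
    """
    if state != "clean":
        return True                      # unclean unmount or recorded error
    if mount_count >= max_mount_count:
        return True                      # maximal mount count reached
    if check_interval and now - last_check >= check_interval:
        return True                      # maximal check interval reached
    return False


DAY = 24 * 3600
overdue = needs_check("clean", 3, 20, last_check=0,
                      check_interval=180 * DAY, now=200 * DAY)
```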
|
||||
|
||||
<P>Ext2fs offers tools to tune the filesystem behavior.
|
||||
The <TT>tune2fs</TT> program can be used to modify:
|
||||
<UL>
|
||||
<LI>the error behavior. When an inconsistency is detected by
|
||||
the kernel code, the filesystem is marked as ``Erroneous'' and
|
||||
one of the three following actions can be done: continue normal
|
||||
execution, remount the filesystem in read-only mode to avoid
|
||||
corrupting the filesystem, make the kernel panic and reboot to
|
||||
run the filesystem checker.
|
||||
<LI>the maximal mount count.
|
||||
<LI>the maximal check interval.
|
||||
<LI>the number of logical blocks reserved for the super user.
|
||||
</UL>
|
||||
|
||||
<P>Mount options can also be used to change the kernel error behavior.
|
||||
|
||||
<P>An attribute allows the users to request secure deletion on
|
||||
files. When such a file is deleted, random data is written in
|
||||
the disk blocks previously allocated to the file. This prevents
|
||||
malicious people from gaining access to the previous content of
|
||||
the file by using a disk editor.
|
||||
|
||||
<P>Last, new types of files inspired from the 4.4 BSD
|
||||
filesystem have recently been added to Ext2fs. Immutable files
|
||||
can only be read: nobody can write or delete them. This can be
|
||||
used to protect sensitive configuration files. Append-only
|
||||
files can be opened in write mode but data is always appended
|
||||
at the end of the file. Like immutable files, they cannot be
|
||||
deleted or renamed. This is especially useful for log files
|
||||
which can only grow.
|
||||
|
||||
<H4>Physical Structure</H4>
|
||||
|
||||
<P>The physical structure of Ext2 filesystems has been strongly
|
||||
influenced by the layout of the BSD filesystem
|
||||
<A href="#mckusick:ffs">[McKusick <I>et al.</I> 1984]</A>. A
|
||||
filesystem is made up of block groups. Block groups are
|
||||
analogous to BSD FFS's cylinder groups. However, block groups
|
||||
are not tied to the physical layout of the blocks on the disk,
|
||||
since modern drives tend to be optimized for sequential access
|
||||
and hide their physical geometry to the operating system.
|
||||
|
||||
<P>The physical structure of a filesystem is represented in this
|
||||
table:
|
||||
<TABLE border>
|
||||
<TR>
|
||||
<TD>Boot<BR>Sector</TD>
|
||||
<TD>Block<BR>Group 1</TD>
|
||||
<TD>Block<BR>Group 2</TD>
|
||||
<TD>...<BR>...</TD>
|
||||
<TD>Block<BR>Group N</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
|
||||
<P>Each block group contains a redundant copy of crucial filesystem
|
||||
control information (superblock and the filesystem descriptors) and
|
||||
also contains a part of the filesystem (a block bitmap, an inode
|
||||
bitmap, a piece of the inode table, and data blocks). The structure of
|
||||
a block group is represented in this table:
|
||||
<TABLE border>
|
||||
<TR>
|
||||
<TD>Super<BR>Block</TD>
|
||||
<TD>FS<BR>descriptors</TD>
|
||||
<TD>Block<BR>Bitmap</TD>
|
||||
<TD>Inode<BR>Bitmap</TD>
|
||||
<TD>Inode<BR>Table</TD>
|
||||
<TD>Data<BR>Blocks</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
|
||||
<P>Using block groups is a big win in terms of reliability:
|
||||
since the control structures are replicated in each block
|
||||
group, it is easy to recover from a filesystem where the
|
||||
superblock has been corrupted. This structure also helps to get
|
||||
good performance: by reducing the distance between the inode
|
||||
table and the data blocks, it is possible to reduce the disk
|
||||
head seeks during I/O on files.
|
||||
|
||||
<P>In Ext2fs, directories are managed as linked lists of
|
||||
variable length entries. Each entry contains the inode number,
|
||||
the entry length, the file name and its length. By using
|
||||
variable length entries, it is possible to implement long file
|
||||
names without wasting disk space in directories. The structure
|
||||
of a directory entry is shown in this table:
|
||||
<TABLE border>
|
||||
<TR>
|
||||
<TD>inode number</TD><TD>entry length</TD>
|
||||
<TD>name length</TD><TD>filename</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
|
||||
<P>As an example, the next table represents the structure of a
|
||||
directory containing three files: <TT>file1</TT>,
|
||||
<TT>long_file_name</TT>, and <TT>f2</TT>:
|
||||
<TABLE border>
|
||||
<TR><TD>i1</TD><TD>16</TD><TD>05</TD><TD><TT>file1 </TT></TD></TR>
|
||||
</TABLE>
|
||||
<TABLE border>
|
||||
<TR><TD>i2</TD><TD>40</TD><TD>14</TD><TD><TT>long_file_name </TT></TD></TR>
|
||||
</TABLE>
|
||||
<TABLE border>
|
||||
<TR><TD>i3</TD><TD>12</TD><TD>02</TD><TD><TT>f2 </TT></TD></TR>
|
||||
</TABLE>
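<P>The variable length entries can be sketched with Python's struct module. The field widths (32-bit inode number, 16-bit entry length, 16-bit name length), little-endian byte order, and rounding of entries to a multiple of 4 bytes are assumptions of this example; the paper only names the fields. With those assumptions the minimal lengths for <TT>file1</TT> and <TT>f2</TT> come out as 16 and 12, matching the table. The minimum for <TT>long_file_name</TT> would be 24; the value 40 in the table is passed in explicitly, since an entry length may be larger than the minimum.

```python
import struct

# Assumed header: inode number (32 bit), entry length (16), name length (16).
HEADER = struct.Struct("<IHH")


def min_entry_length(name: bytes) -> int:
    """Header plus name, rounded up to a multiple of 4 bytes."""
    return (HEADER.size + len(name) + 3) & ~3


def pack_entry(ino, name, rec_len=None):
    """Build one directory entry, padded with NUL bytes to rec_len."""
    rec_len = rec_len or min_entry_length(name)
    header = HEADER.pack(ino, rec_len, len(name))
    return header + name.ljust(rec_len - HEADER.size, b"\0")


def walk(block):
    """Follow the entry lengths through a directory block."""
    offset = 0
    while offset < len(block):
        ino, rec_len, name_len = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            raise ValueError("corrupt entry length")
        start = offset + HEADER.size
        yield ino, rec_len, block[start:start + name_len].decode("ascii")
        offset += rec_len


block = (pack_entry(1, b"file1")
         + pack_entry(2, b"long_file_name", rec_len=40)
         + pack_entry(3, b"f2"))
listing = list(walk(block))
```

Because each entry records its own length, a reader can skip from one entry to the next without knowing how long the names are, which is why the paper describes the directory as a linked list.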
|
||||
|
||||
<A name="subsection:ext2fs:allocation">
|
||||
<H4>Performance optimizations</H4>
|
||||
</A>
|
||||
|
||||
<P>The Ext2fs kernel code contains many performance
|
||||
optimizations, which tend to improve I/O speed when reading and
|
||||
writing files.
|
||||
|
||||
<P>Ext2fs takes advantage of the buffer cache management by
|
||||
performing readaheads: when a block has to be read, the kernel
|
||||
code requests the I/O on several contiguous blocks. This way,
|
||||
it tries to ensure that the next block to read will already be
|
||||
loaded into the buffer cache. Readaheads are normally performed
|
||||
during sequential reads on files and Ext2fs extends them to
|
||||
directory reads, either explicit reads (<TT>readdir(2)</TT>
|
||||
calls) or implicit ones (<TT>namei</TT> kernel directory
|
||||
lookup).
|
||||
|
||||
<P>Ext2fs also contains many allocation optimizations. Block
|
||||
groups are used to cluster together related inodes and data:
|
||||
the kernel code always tries to allocate data blocks for a file
|
||||
in the same group as its inode. This is intended to reduce the
|
||||
disk head seeks made when the kernel reads an inode and its
|
||||
data blocks.
|
||||
|
||||
<P>When writing data to a file, Ext2fs preallocates up to 8
|
||||
adjacent blocks when allocating a new block. Preallocation hit
|
||||
rates are around 75% even on very full filesystems. This
|
||||
preallocation achieves good write performance under heavy
|
||||
load. It also allows contiguous blocks to be allocated to
|
||||
files, thus it speeds up the future sequential reads.
|
||||
|
||||
<P>These two allocation optimizations produce a very good locality of:
|
||||
<UL>
|
||||
<LI>related files through block groups
|
||||
<LI>related blocks through the 8 bits clustering of block allocations.
|
||||
</UL>
|
||||
|
||||
<H3>The Ext2fs library</H3>
|
||||
|
||||
<P>To allow user mode programs to manipulate the control
|
||||
structures of an Ext2 filesystem, the libext2fs library was
|
||||
developed. This library provides routines which can be used to
|
||||
examine and modify the data of an Ext2 filesystem, by accessing
|
||||
the filesystem directly through the physical device.
|
||||
|
||||
<P>The Ext2fs library was designed to allow maximal code reuse
|
||||
through the use of software abstraction techniques. For
|
||||
example, several different iterators are provided. A program
|
||||
can simply pass in a function to
|
||||
<TT>ext2fs_block_iterate()</TT>, which will be called for each
|
||||
block in an inode. Another iterator function allows a
|
||||
user-provided function to be called for each file in a
|
||||
directory.
|
||||
|
||||
<P>Many of the Ext2fs utilities (<TT>mke2fs</TT>,
|
||||
<TT>e2fsck</TT>, <TT>tune2fs</TT>, <TT>dumpe2fs</TT>, and
|
||||
<TT>debugfs</TT>) use the Ext2fs library. This greatly
|
||||
simplifies the maintenance of these utilities, since any
|
||||
changes to reflect new features in the Ext2 filesystem format
|
||||
need only be made in one place--in the Ext2fs library. This
|
||||
code reuse also results in smaller binaries, since the Ext2fs
|
||||
library can be built as a shared library image.
|
||||
|
||||
<P>Because the interfaces of the Ext2fs library are so abstract
|
||||
and general, new programs which require direct access to the
|
||||
Ext2fs filesystem can very easily be written. For example, the
|
||||
Ext2fs library was used during the port of the 4.4BSD dump and
|
||||
restore backup utilities. Very few changes were needed to adapt
|
||||
these tools to Linux: only a few filesystem dependent functions
|
||||
had to be replaced by calls to the Ext2fs library.
|
||||
|
||||
<P>The Ext2fs library provides access to several classes of
|
||||
operations. The first class comprises the filesystem-oriented
|
||||
operations. A program can open and close a filesystem, read
|
||||
and write the bitmaps, and create a new filesystem on the disk.
|
||||
Functions are also available to manipulate the filesystem's bad
|
||||
blocks list.
|
||||
|
||||
<P>The second class of operations affects directories. A caller
|
||||
of the Ext2fs library can create and expand directories, as
|
||||
well as add and remove directory entries. Functions are also
|
||||
provided to both resolve a pathname to an inode number, and to
|
||||
determine a pathname of an inode given its inode number.
|
||||
|
||||
<P>The final class of operations is oriented around inodes.
|
||||
It is possible to scan the inode table, read and write inodes,
|
||||
and scan through all of the blocks in an inode. Allocation and
|
||||
deallocation routines are also available and allow user mode
|
||||
programs to allocate and free blocks and inodes.
|
||||
|
||||
<H3>The Ext2fs tools</H3>
|
||||
|
||||
<P>Powerful management tools have been developed for Ext2fs.
|
||||
These utilities are used to create, modify, and correct any
|
||||
inconsistencies in Ext2 filesystems. The <TT>mke2fs</TT>
|
||||
program is used to initialize a partition to contain an empty
|
||||
Ext2 filesystem.
|
||||
|
||||
<P>The <TT>tune2fs</TT> program can be used to modify the filesystem
|
||||
parameters. As explained in section <A href="#subsection:ext2fs:adv-feat">
|
||||
``Advanced'' Ext2fs features</A>, it can change the error
|
||||
behavior, the maximal mount count, the maximal check interval,
|
||||
and the number of logical blocks reserved for the super user.
|
||||
|
||||
<P>The most interesting tool is probably the filesystem
|
||||
checker. <TT>E2fsck</TT> is intended to repair filesystem
|
||||
inconsistencies after an unclean shutdown of the system. The
|
||||
original version of <TT>e2fsck</TT> was based on Linus
|
||||
Torvalds' fsck program for the Minix filesystem. However, the
|
||||
current version of <TT>e2fsck</TT> was rewritten from scratch,
|
||||
using the Ext2fs library, and is much faster and can correct
|
||||
more filesystem inconsistencies than the original version.
|
||||
|
||||
<P>The <TT>e2fsck</TT> program is designed to run as quickly as
|
||||
possible. Since filesystem checkers tend to be disk bound,
|
||||
this was done by optimizing the algorithms used by
|
||||
<TT>e2fsck</TT> so that filesystem structures are not
|
||||
repeatedly accessed from the disk. In addition, the order in
|
||||
which inodes and directories are checked is sorted by block
|
||||
number to reduce the amount of time spent in disk seeks. Many of
|
||||
these ideas were originally explored by
|
||||
<A href="#bsd:fsck">[Bina and Emrath 1989]</A> although they have
|
||||
since been further refined by the authors.
|
||||
|
||||
<P>In pass 1, <TT>e2fsck</TT> iterates over all of the inodes
|
||||
in the filesystem and performs checks over each inode as an
|
||||
unconnected object in the filesystem. That is, these checks do
|
||||
not require any cross-checks to other filesystem objects.
|
||||
Examples of such checks include making sure the file mode is
|
||||
legal, and that all of the blocks in the inode are valid block
|
||||
numbers. During pass 1, bitmaps indicating which blocks and
|
||||
inodes are in use are compiled.
|
||||
|
||||
<P>If <TT>e2fsck</TT> notices data blocks which are claimed by
|
||||
more than one inode, it invokes passes 1B through 1D to resolve
|
||||
these conflicts, either by cloning the shared blocks so that
|
||||
each inode has its own copy of the shared block, or by
|
||||
deallocating one or more of the inodes.
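<P>The kind of per-inode checking done in pass 1 can be sketched as follows. This is a toy model in Python, not the <TT>e2fsck</TT> implementation: inodes are given as a dictionary, the "valid block" test is simplified to a range check, and all names are hypothetical.

```python
import stat

LEGAL_TYPES = {
    stat.S_IFREG, stat.S_IFDIR, stat.S_IFLNK, stat.S_IFCHR,
    stat.S_IFBLK, stat.S_IFIFO, stat.S_IFSOCK,
}


def pass1(inodes, total_blocks):
    """inodes: dict mapping inode number -> (mode, [block numbers]).

    Returns (inodes_in_use, blocks_in_use, multiply_claimed, problems).
    """
    inodes_in_use = set()
    block_owner = {}          # block number -> first inode claiming it
    multiply_claimed = set()  # blocks claimed by more than one inode
    problems = []

    for ino, (mode, blocks) in sorted(inodes.items()):
        inodes_in_use.add(ino)
        if stat.S_IFMT(mode) not in LEGAL_TYPES:
            problems.append((ino, "illegal mode"))
        for blk in blocks:
            # Simplified validity test: block must lie inside the filesystem.
            if not 1 <= blk < total_blocks:
                problems.append((ino, f"invalid block {blk}"))
            elif blk in block_owner:
                multiply_claimed.add(blk)
            else:
                block_owner[blk] = ino

    return inodes_in_use, set(block_owner), multiply_claimed, problems
```

A non-empty <TT>multiply_claimed</TT> set corresponds to the situation in which the extra passes 1B through 1D would be needed.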
|
||||
|
||||
<P>Pass 1 takes the longest time to execute, since all of the
|
||||
inodes have to be read into memory and checked. To reduce the
|
||||
I/O time necessary in future passes, critical filesystem
|
||||
information is cached in memory. The most important example of
|
||||
this technique is the location on disk of all of the directory
|
||||
blocks on the filesystem. This obviates the need to re-read
|
||||
the directory inode structures during pass 2 to obtain this
|
||||
information.
|
||||
|
||||
<P>Pass 2 checks directories as unconnected objects. Since
|
||||
directory entries do not span disk blocks, each directory block
|
||||
can be checked individually without reference to other
|
||||
directory blocks. This allows <TT>e2fsck</TT> to sort all of
|
||||
the directory blocks by block number, and check directory
|
||||
blocks in ascending order, thus decreasing disk seek time. The
|
||||
directory blocks are checked to make sure that the directory
|
||||
entries are valid, and contain references to inode numbers
|
||||
which are in use (as determined by pass 1).
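<P>The ordering trick used in pass 2 can be sketched like this. The example assumes that pass 1 left behind a list of <TT>(block_number, directory_inode)</TT> pairs and that a caller-supplied <TT>read_block</TT> function returns the <TT>(name, inode)</TT> entries of one directory block; both are hypothetical stand-ins for the real data structures.

```python
def pass2(dir_blocks, read_block, inodes_in_use):
    """Check directory blocks in ascending block order.

    dir_blocks:    list of (block_number, dir_ino) cached during pass 1
    read_block:    function block_number -> list of (name, ino) entries
    inodes_in_use: set of inode numbers found in use by pass 1

    Returns a list of (dir_ino, name, ino) for entries that reference
    an inode which is not in use.
    """
    bad_entries = []
    # Sorting by block number turns the reads into one sweep across the disk.
    for blk, dir_ino in sorted(dir_blocks):
        for name, ino in read_block(blk):
            if ino not in inodes_in_use:
                bad_entries.append((dir_ino, name, ino))
    return bad_entries
```

Because each directory block is self-contained, the order in which blocks are visited does not affect the result, only the seek pattern.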
|
||||
|
||||
<P>For the first directory block in each directory inode, the
|
||||
`.' and `..' entries are checked to make sure they exist, and
|
||||
that the inode number for the `.' entry matches the current
|
||||
directory. (The inode number for the `..' entry is not checked
|
||||
until pass 3.)
|
||||
|
||||
<P>Pass 2 also caches information concerning the parent
|
||||
directory in which each directory is linked. (If a directory
|
||||
is referenced by more than one directory, the second reference
|
||||
of the directory is treated as an illegal hard link, and it is
|
||||
removed).
|
||||
|
||||
<P>It is worth noting that at the end of pass 2, nearly
|
||||
all of the disk I/O which <TT>e2fsck</TT> needs to perform is
|
||||
complete. Information required by passes 3, 4 and 5 is cached
|
||||
in memory; hence, the remaining passes of <TT>e2fsck</TT> are
|
||||
largely CPU bound, and take less than 5-10% of the total
|
||||
running time of <TT>e2fsck</TT>.
|
||||
|
||||
<P>In pass 3, the directory connectivity is checked.
|
||||
<TT>E2fsck</TT> traces the path of each directory back to the
|
||||
root, using information that was cached during pass 2. At this
|
||||
time, the `..' entry for each directory is also checked to make
|
||||
sure it is valid. Any directories which cannot be traced back
|
||||
to the root are linked to the <TT>/lost+found</TT> directory.
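<P>The connectivity check of pass 3 amounts to walking parent links until the root is reached. The Python sketch below assumes the parent information cached in pass 2 is available as a dictionary and only reports unreachable directories instead of relinking them; the function name and data layout are hypothetical. Inode 2 is the root directory inode in Ext2.

```python
ROOT_INO = 2  # inode number of the root directory in Ext2


def pass3(parent_of):
    """parent_of: dict mapping directory inode -> parent directory inode.

    Returns the sorted list of directories that cannot be traced back
    to the root (candidates for reconnection under /lost+found).
    """
    connected = {ROOT_INO}
    unreachable = []
    for ino in sorted(parent_of):
        path = []
        cur = ino
        # Follow '..' links until we hit a known-connected directory,
        # run out of parent information, or detect a loop.
        while cur not in connected and cur in parent_of and cur not in path:
            path.append(cur)
            cur = parent_of[cur]
        if cur in connected:
            connected.update(path)  # remember the result for later walks
        else:
            unreachable.append(ino)
    return unreachable
```

Remembering already-connected directories keeps the total work close to linear in the number of directories in this model.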
|
||||
|
||||
<P>In pass 4, <TT>e2fsck</TT> checks the reference counts for
|
||||
all inodes, by iterating over all the inodes and comparing the
|
||||
link counts (which were cached in pass 1) against internal
|
||||
counters computed during passes 2 and 3. Any undeleted files
|
||||
with a zero link count are also linked to the
|
||||
<TT>/lost+found</TT> directory during this pass.
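<P>A minimal sketch of the pass 4 comparison is shown below. It assumes two dictionaries: the link counts stored in the inodes (cached in pass 1) and the number of directory references actually counted in passes 2 and 3. Treating "zero link count" as "zero counted references" is an interpretation made for this example, and the names are hypothetical.

```python
def pass4(stored_links, counted_links):
    """Compare stored link counts with the references actually found.

    stored_links:  dict inode -> link count recorded in the inode
    counted_links: dict inode -> references counted in passes 2 and 3

    Returns (fixes, reconnect):
      fixes      maps inode -> corrected link count
      reconnect  lists in-use inodes that no directory references
    """
    fixes = {}
    reconnect = []
    for ino, stored in sorted(stored_links.items()):
        counted = counted_links.get(ino, 0)
        if counted == 0:
            reconnect.append(ino)   # would be linked into /lost+found
        elif counted != stored:
            fixes[ino] = counted    # stored count disagrees with reality
    return fixes, reconnect
```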
|
||||
|
||||
<P>Finally, in pass 5, <TT>e2fsck</TT> checks the validity of
|
||||
the filesystem summary information. It compares the block and
|
||||
inode bitmaps which were constructed during the previous passes
|
||||
against the actual bitmaps on the filesystem, and corrects the
|
||||
on-disk copies if necessary.
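<P>The bitmap comparison of pass 5 can be sketched as follows, again with bitmaps modelled as Python lists of booleans rather than packed bits. The function name is hypothetical.

```python
def pass5(computed, on_disk):
    """Compare the bitmap rebuilt by the checker with the on-disk copy.

    Both arguments are equal-length lists of booleans. The on-disk list
    is corrected in place; the indices that differed are returned.
    """
    differences = [
        i for i, (want, have) in enumerate(zip(computed, on_disk))
        if want != have
    ]
    for i in differences:
        on_disk[i] = computed[i]
    return differences
```

The same routine would be applied once to the block bitmap and once to the inode bitmap.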
|
||||
|
||||
<P>The filesystem debugger is another useful tool.
|
||||
<TT>Debugfs</TT> is a powerful program which can be used to
|
||||
examine and change the state of a filesystem. Basically, it
|
||||
provides an interactive interface to the Ext2fs library:
|
||||
commands typed by the user are translated into calls to the
|
||||
library routines.
|
||||
|
||||
<P><TT>Debugfs</TT> can be used to examine the internal
|
||||
structures of a filesystem, manually repair a corrupted
|
||||
filesystem, or create test cases for <TT>e2fsck</TT>.
|
||||
Unfortunately, this program can be dangerous if it is used by
|
||||
people who do not know what they are doing; it is very easy to
|
||||
destroy a filesystem with this tool. For this reason,
|
||||
<TT>debugfs</TT> opens filesystems for read-only access by
|
||||
default. The user must explicitly specify the <TT>-w</TT> flag
|
||||
in order to use <TT>debugfs</TT> to open a filesystem for
|
||||
read/write access.
|
||||
|
||||
<H3>Performance Measurements</H3>
|
||||
|
||||
<H4>Description of the benchmarks</H4>
|
||||
|
||||
<P>We have run benchmarks to measure filesystem performance.
|
||||
Benchmarks have been made on a mid-range PC, based on an
|
||||
i486DX2 processor, using 16 MB of memory and two 420 MB IDE
|
||||
disks. The tests were run on Ext2 fs and Xia fs (Linux 1.1.62)
|
||||
and on the BSD Fast filesystem in asynchronous and synchronous
|
||||
mode (FreeBSD 2.0 Alpha--based on the 4.4BSD Lite
|
||||
distribution).
|
||||
|
||||
<P>We have run two different benchmarks. The Bonnie benchmark
|
||||
tests I/O speed on a big file--the file size was set to 60 MB
|
||||
during the tests. It writes data to the file using character
|
||||
based I/O, rewrites the contents of the whole file, writes data
|
||||
using block based I/O, reads the file using character I/O and
|
||||
block I/O, and seeks into the file. The Andrew Benchmark was
|
||||
developed at Carnegie Mellon University and has been used at
|
||||
the University of California at Berkeley to benchmark BSD FFS and LFS. It
|
||||
runs in five phases: it creates a directory hierarchy, makes a
|
||||
copy of the data, recursively examines the status of every file,
examines every byte of every file, and compiles several of the
|
||||
files.
|
||||
|
||||
<H4>Results of the Bonnie benchmark</H4>
|
||||
|
||||
<P>The results of the Bonnie benchmark are presented in this
|
||||
table:
|
||||
<TABLE border>
|
||||
<TR><TH></TH><TH>Char Write<BR>(KB/s)</TH>
|
||||
<TH>Block Write<BR>(KB/s)</TH>
|
||||
<TH>Rewrite<BR>(KB/s)</TH>
|
||||
<TH>Char Read<BR>(KB/s)</TH>
|
||||
<TH>Block Read<BR>(KB/s)</TH></TR>
|
||||
<TR><TD>BSD Async</TD><TD align="right">710</TD><TD align="right">684</TD><TD align="right">401</TD><TD align="right">721</TD><TD align="right">888</TD></TR>
|
||||
<TR><TD>BSD Sync</TD><TD align="right">699</TD><TD align="right">677</TD><TD align="right">400</TD><TD align="right">710</TD><TD align="right">878</TD></TR>
|
||||
<TR><TD>Ext2 fs</TD><TD align="right">452</TD><TD align="right">1237</TD><TD align="right">536</TD><TD align="right">397</TD><TD align="right">1033</TD></TR>
|
||||
<TR><TD>Xia fs</TD><TD align="right">440</TD><TD align="right">704</TD><TD align="right">380</TD><TD align="right">366</TD><TD align="right">895</TD></TR>
|
||||
</TABLE>
|
||||
|
||||
<P>The results are very good in block oriented I/O: Ext2 fs
|
||||
outperforms other filesystems. This is clearly a benefit of the
|
||||
optimizations included in the allocation routines. Writes are
|
||||
fast because data is written in cluster mode. Reads are fast
|
||||
because contiguous blocks have been allocated to the file. Thus
|
||||
there is no head seek between two reads and the readahead
|
||||
optimizations can be fully used.
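<P>To make the comparison concrete, the ratios between filesystems can be computed directly from the table. The short Python sketch below copies the figures from the Bonnie table; the column keys and the helper name <TT>ratio</TT> are made up for the example.

```python
# Figures copied from the Bonnie results table (KB/s).
COLUMNS = ("char_write", "block_write", "rewrite", "char_read", "block_read")
BONNIE = {
    "BSD Async": (710, 684, 401, 721, 888),
    "BSD Sync": (699, 677, 400, 710, 878),
    "Ext2 fs": (452, 1237, 536, 397, 1033),
    "Xia fs": (440, 704, 380, 366, 895),
}


def ratio(metric, fs, baseline):
    """Throughput of `fs` divided by throughput of `baseline` for one column."""
    i = COLUMNS.index(metric)
    return BONNIE[fs][i] / BONNIE[baseline][i]


def report():
    """Ratios of Ext2 fs against the other filesystems for block I/O."""
    lines = []
    for metric in ("block_write", "block_read"):
        for baseline in ("BSD Async", "BSD Sync", "Xia fs"):
            value = ratio(metric, "Ext2 fs", baseline)
            lines.append(f"{metric}: Ext2 fs / {baseline} = {value:.2f}")
    return lines
```

A ratio above 1 means Ext2 fs was faster than the baseline for that column; the same helper applied to the character columns gives ratios below 1 against BSD.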
|
||||
|
||||
<P>On the other hand, performance is better in the FreeBSD
|
||||
operating system in character oriented I/O. This is probably
|
||||
due to the fact that FreeBSD and Linux do not use the same
|
||||
stdio routines in their respective C libraries. It seems that
|
||||
FreeBSD has a more optimized character I/O library and its
|
||||
performance is better.
|
||||
|
||||
<H4>Results of the Andrew benchmark</H4>
|
||||
|
||||
<P>The results of the Andrew benchmark are presented in
|
||||
this table:
|
||||
<TABLE border>
|
||||
<TR>
|
||||
<TH></TH>
|
||||
<TH>P1 Create<BR>(ms)</TH>
|
||||
<TH>P2 Copy<BR>(ms)</TH>
|
||||
<TH>P3 Stat<BR>(ms)</TH>
|
||||
<TH>P4 Grep<BR>(ms)</TH>
|
||||
<TH>P5 Compile<BR>(ms)</TH>
|
||||
</TR>
|
||||
<TR><TD>BSD Async</TD><TD align="right">2203</TD><TD align="right">7391</TD><TD align="right">6319</TD><TD align="right">17466</TD><TD align="right">75314</TD></TR>
|
||||
<TR><TD>BSD Sync</TD><TD align="right">2330</TD><TD align="right">7732</TD><TD align="right">6317</TD><TD align="right">17499</TD><TD align="right">75681</TD></TR>
|
||||
<TR><TD>Ext2 fs</TD><TD align="right">790</TD><TD align="right">4791</TD><TD align="right">7235</TD><TD align="right">11685</TD><TD align="right">63210</TD></TR>
|
||||
<TR><TD>Xia fs</TD><TD align="right">934</TD><TD align="right">5402</TD><TD align="right">8400</TD><TD align="right">12912</TD><TD align="right">66997</TD></TR>
|
||||
</TABLE>
|
||||
|
||||
<P>The results of the first two passes show that Linux benefits
|
||||
from its asynchronous metadata I/O. In passes 1 and 2,
|
||||
directories and files are created and BSD synchronously writes
|
||||
inodes and directory entries. There is an anomaly, though: even
|
||||
in asynchronous mode, the performance under BSD is poor. We
|
||||
suspect that the asynchronous support under FreeBSD is not
|
||||
fully implemented.
|
||||
|
||||
<P>In pass 3, the Linux and BSD times are very similar. This is
|
||||
a big improvement over the same benchmark run six months ago.
|
||||
While BSD used to outperform Linux by a factor of 3 in this
|
||||
test, the addition of a file name cache in the VFS has fixed
|
||||
this performance problem.
|
||||
|
||||
<P>In passes 4 and 5, Linux is faster than FreeBSD mainly
|
||||
because it uses unified buffer cache management. The buffer
|
||||
cache space can grow when needed and use more memory than the
|
||||
one in FreeBSD, which uses a fixed size buffer cache.
|
||||
Comparison of the Ext2fs and Xiafs results shows that the
|
||||
optimizations included in Ext2fs are really useful: the
|
||||
performance gain between Ext2fs and Xiafs is around 5-10%.
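<P>The per-phase gain can be recomputed from the Andrew table. The Python sketch below copies the timings from the table and expresses the gain as the percentage of time saved relative to a baseline; the phase keys and the function name <TT>gain_percent</TT> are chosen for the example.

```python
# Timings copied from the Andrew results table (milliseconds).
PHASES = ("create", "copy", "stat", "grep", "compile")
ANDREW_MS = {
    "BSD Async": (2203, 7391, 6319, 17466, 75314),
    "BSD Sync": (2330, 7732, 6317, 17499, 75681),
    "Ext2 fs": (790, 4791, 7235, 11685, 63210),
    "Xia fs": (934, 5402, 8400, 12912, 66997),
}


def gain_percent(fs, baseline):
    """Percentage of time saved by `fs` relative to `baseline`, per phase.

    A negative value means `fs` was slower than `baseline` in that phase.
    """
    return [
        100.0 * (base - mine) / base
        for mine, base in zip(ANDREW_MS[fs], ANDREW_MS[baseline])
    ]


def gains_by_phase(fs, baseline):
    """Same figures keyed by phase name, rounded to one decimal."""
    return dict(zip(PHASES, (round(g, 1) for g in gain_percent(fs, baseline))))
```

With these figures, the gain of Ext2fs over Xiafs in phases 4 and 5 falls within the 5-10% range quoted above, while phases 1 to 3 come out somewhat higher.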
|
||||
|
||||
<H3>Conclusion</H3>
|
||||
|
||||
<P>The Second Extended File System is probably the most widely
|
||||
used filesystem in the Linux community. It provides standard
|
||||
Unix file semantics and advanced features. Moreover, thanks to
|
||||
the optimizations included in the kernel code, it is robust and
|
||||
offers excellent performance.
|
||||
|
||||
<P>Since Ext2fs has been designed with evolution in mind, it
|
||||
contains hooks that can be used to add new features. Some
|
||||
people are working on extensions to the current filesystem:
|
||||
access control lists conforming to the Posix semantics
|
||||
<A href="#posix6">[IEEE 1992]</A>, undelete, and on-the-fly
|
||||
file compression.
|
||||
|
||||
<P>Ext2fs was first developed and integrated in the Linux
|
||||
kernel and is now actively being ported to other operating
|
||||
systems. An Ext2fs server running on top of the GNU Hurd has
|
||||
been implemented. People are also working on an Ext2fs port in
|
||||
the LITES server, running on top of the Mach microkernel
|
||||
<A href="#mach:foundation">[Accetta <I>et al.</I> 1986]</A>, and
|
||||
in the VSTa operating system. Last, but not least, Ext2fs is an
|
||||
important part of the Masix operating system
|
||||
<A href="#masix:osf">[Card <I>et al.</I> 1993]</A>,
|
||||
currently under development by one of the authors.
|
||||
|
||||
<H3>Acknowledgments</H3>
|
||||
|
||||
<P>The Ext2fs kernel code and tools have been written mostly by
|
||||
the authors of this paper. Some other people have also
|
||||
contributed to the development of Ext2fs either by suggesting
|
||||
new features or by sending patches. We want to thank these
|
||||
contributors for their help.
|
||||
|
||||
<H3>References</H3>
|
||||
|
||||
<P><A name="mach:foundation">[Accetta <I>et al.</I> 1986]</A>
|
||||
M. Accetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and
|
||||
M. Young.
|
||||
Mach: A New Kernel Foundation For UNIX Development.
|
||||
In <I>Proceedings of the USENIX 1986 Summer Conference</I>, June 1986.
|
||||
|
||||
<P><A name="bach">[Bach 1986]</A>
|
||||
M. Bach.
|
||||
<I>The Design of the UNIX Operating System</I>.
|
||||
Prentice Hall, 1986.
|
||||
|
||||
<P><A name="bsd:fsck">[Bina and Emrath 1989]</A>
|
||||
E. Bina and P. Emrath.
|
||||
A Faster fsck for BSD Unix.
|
||||
In <I>Proceedings of the USENIX Winter Conference</I>, January 1989.
|
||||
|
||||
<P><A name="masix:osf">[Card <I>et al.</I> 1993]</A>
|
||||
R. Card, E. Commelin, S. Dayras, and F. Mével.
|
||||
The MASIX Multi-Server Operating System.
|
||||
In <I>OSF Workshop on Microkernel Technology for Distributed Systems</I>,
|
||||
June 1993.
|
||||
|
||||
<P><A name="posix6">[IEEE 1992]</A>
|
||||
<I>SECURITY INTERFACE for the Portable Operating System Interface for
|
||||
Computer Environments - Draft 13</I>.
|
||||
Institute of Electrical and Electronics Engineers, Inc, 1992.
|
||||
|
||||
<P><A name="vnodes">[Kleiman 1986]</A>
|
||||
S. Kleiman.
|
||||
Vnodes: An Architecture for Multiple File System Types
|
||||
in Sun UNIX.
|
||||
In <I>Proceedings of the Summer USENIX Conference</I>, pages 260--269,
|
||||
June 1986.
|
||||
|
||||
<P><A name="mckusick:ffs">[McKusick <I>et al.</I> 1984]</A>
|
||||
M. McKusick, W. Joy, S. Leffler, and R. Fabry.
|
||||
A Fast File System for UNIX.
|
||||
<I>ACM Transactions on Computer Systems</I>, 2(3):181--197, August
|
||||
1984.
|
||||
|
||||
<P><A name="lfs:unix">[Seltzer <I>et al.</I> 1993]</A>
|
||||
M. Seltzer, K. Bostic, M. McKusick, and C. Staelin.
|
||||
An Implementation of a Log-Structured File System for
|
||||
UNIX.
|
||||
In <I>Proceedings of the USENIX Winter Conference</I>, January 1993.
|
||||
|
||||
<P><A name="minix">[Tanenbaum 1987]</A>
|
||||
A. Tanenbaum.
|
||||
<I>Operating Systems: Design and Implementation</I>.
|
||||
Prentice Hall, 1987.
|
||||
<P>
|
||||
|
||||
<HR>
|
||||
|
||||
<P>Thanks to Michael Johnson for HTMLizing it (originally for use in
|
||||
the <A HREF="http://khg.redhat.com/HyperNews/get/fs/fs.html"> Kernel
|
||||
Hacker's Guide</A>).</P>
|
||||
|
||||
</BODY>
|
||||
</HTML>
|
||||
BIN
study/sabre/os/files/FileSystems/dtfs-thesis.pdf
Normal file
83
study/sabre/os/files/FileSystems/ext2-doc.htm
Normal file
@@ -0,0 +1,83 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
|
||||
<HTML>
|
||||
<body bgcolor="#ffffff" text="#000000">
|
||||
<font size="+1">Second Extended File System</font><BR>
|
||||
by Dave Poirier (<a href="mailto:instinc@users.sf.net">instinc@users.sf.net</a>)<br>
|
||||
<a href="http://savannah.gnu.org/projects/ext2-doc/">Project Page</a>
|
||||
<p>
|
||||
<center>View the information in:<br>
|
||||
<a href="http://freesoftware.fsf.org/download/ext2-doc/ext2.dvi">dvi</a> -
|
||||
<a href="ext2.html">html</a> -
|
||||
<a href="http://freesoftware.fsf.org/download/ext2-doc/ext2.pdf">pdf</a> -
|
||||
<a href="http://freesoftware.fsf.org/download/ext2-doc/ext2.ps">ps</a> -
|
||||
<a href="http://freesoftware.fsf.org/download/ext2-doc/ext2.rtf">rtf</a>
|
||||
<p><font size="-1">Last update: August 5th, 2002</font></center>
|
||||
<p>
|
||||
When I was working on my first ext2 driver implementation, I found myself short of
|
||||
documentation on the subject. It wasn't so much the information not being available
|
||||
as it not being available all in one place.
|
||||
<p>
|
||||
This project tries to fix this by bringing all the useful information together in
one easy-to-understand package. I try not to tie the documentation
|
||||
to any particular operating system, so that it may be useful to the widest audience.
|
||||
<p>
|
||||
<b>Change Log</b>
|
||||
<hr>
|
||||
August 5th, 2002
|
||||
<blockquote>
|
||||
<li>Added a note to .i_blocks and .i_dtime</li>
|
||||
</blockquote>
|
||||
August 2nd, 2002
|
||||
<blockquote>
|
||||
<li>Updated the values of EXT2_S_IFLNK and EXT2_S_IFSOCK as noted by Jeremy Stanley of AccessData Inc</li>
|
||||
<li>Added a note about the reserved inode entries</li>
|
||||
</blockquote>
|
||||
July 31st, 2002
|
||||
<blockquote>
|
||||
<li>Fixed the 0 and 1 definitions for the block and inode bitmaps.</li>
|
||||
</blockquote>
|
||||
June 16th, 2002
|
||||
<blockquote>
|
||||
<li>Cleared up the confusion about the location of the group descriptors in section 'Group Descriptor'</li>
|
||||
</blockquote>
|
||||
April 1st, 2002
|
||||
<blockquote>
|
||||
<li>Added the description of EXT2_INDEX_FL (Hash Indexed Directory)</li>
|
||||
<li>Fixed many table layouts</li>
|
||||
</blockquote>
|
||||
March 31st, 2002
|
||||
<blockquote>
|
||||
<li>Added the Indexed Directory Format</li>
|
||||
<li>Added .i_flags descriptions</li>
|
||||
<li>Added a collaborator section and a credits appendix</li>
|
||||
<li>Added some notes for compat/incompat features</li>
|
||||
<li>Completed the inode chapter</li>
|
||||
</blockquote>
|
||||
March 25th, 2002
|
||||
<blockquote>
|
||||
<li>Added extended attributes</li>
|
||||
</blockquote>
|
||||
<hr>
|
||||
References:
|
||||
<li><a href="http://www.science.unitn.it/~fiorella/guidelinux/tlk/node95.html">Physical Layout</a></li>
|
||||
<li><a href="http://e2fsprogs.sourceforge.net/">e2fsprogs (e2fsck)</a></li>
|
||||
<li><a href="http://e2fsprogs.sourceforge.net/ext2intro.html">Design & Implementation</a></li>
|
||||
<li><a href="ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/">Journaling (ext3)</a></li>
|
||||
<li><a href="http://kernelnewbies.org/~phillips/htree/">Hashed Directories</a></li>
|
||||
<li><a href="http://ext2resize.sourceforge.net/">Filesystem Resizing</a></li>
|
||||
<li><a href="http://acl.bestbits.at/">Extended Attributes & Access Control Lists</a></li>
|
||||
<li><a href="http://www.netspace.net.au/~reiter/e2compr/">Compression</a> (*)</li>
|
||||
<BR><BR>
|
||||
Implementations for:
|
||||
<li><a href="http://uranus.it.swin.edu.au/~jn/linux/explore2fs.htm">Windows 95/98/NT/2000</a></li>
|
||||
<li><a href="http://www.yipton.demon.co.uk/content.html#FSDEXT2">Windows 95</a> (*)</li>
|
||||
<li><a href="ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/">DOS client</a> (*)</li>
|
||||
<li><a href="http://perso.wanadoo.fr/matthieu.willm/ext2-os2/">OS/2</a> (*)</li>
|
||||
<!-- invalid url .. <li><a href="ftp://ftp.barnet.ac.uk/pub/acorn/armlinux/iscafs/">RISC OS client</a></li> -->
|
||||
<li><a href="http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/uuu/dimension/cell/fs/ext2/ext2.asm">Unununium</a></li>
|
||||
<BR><BR>
|
||||
(*) no longer actively developed/supported (as of March 2002)
|
||||
<hr>
|
||||
<center>graciously hosted by <a href="http://savannah.gnu.org">Savannah</a></center>
|
||||
</body>
|
||||
</HTML>
|
||||
83
study/sabre/os/files/FileSystems/ext2fs/ext2fs_1.html
Normal file
@@ -0,0 +1,83 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_1.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Introduction</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_2.html">next</A> section.<P>
|
||||
<P>
|
||||
Copyright (C) 1994 Louis-Dominique Dubeau.
|
||||
<P>
|
||||
You may without charge, royalty or other payment, copy and distribute
|
||||
copies of this work and derivative works of this work in source or
|
||||
binary form provided that:
|
||||
<P>
|
||||
(1) you appropriately publish on each copy an appropriate copyright
|
||||
notice; (2) faithfully reproduce all prior copyright notices included in
|
||||
the original work (you may add your own copyright notice); and (3) agree
|
||||
to indemnify and hold all prior authors, copyright holders and licensors
|
||||
of the work harmless from and against all damages arising from the use
|
||||
of the work.
|
||||
<P>
|
||||
You may distribute sources of derivative works of the work provided
|
||||
that:
|
||||
<P>
|
||||
(1) (a) all source files of the original work that have been modified,
|
||||
(b) all source files of the derivative work that contain any part of the
|
||||
original work, and (c) all source files of the derivative work that are
|
||||
necessary to compile, link and run the derivative work without
|
||||
unresolved external calls and with the same functionality of the
|
||||
original work ("Necessary Sources") carry a prominent notice explaining
|
||||
the nature and date of the modification and/or creation. You are
|
||||
encouraged to make the Necessary Sources available under this license in
|
||||
order to further development and acceptance of the work.
|
||||
<P>
|
||||
EXCEPT AS OTHERWISE RESTRICTED BY LAW, THIS WORK IS PROVIDED WITHOUT ANY
|
||||
EXPRESS OR IMPLIED WARRANTIES OF ANY KIND, INCLUDING BUT NOT LIMITED TO,
|
||||
ANY IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE,
|
||||
MERCHANTABILITY OR TITLE. EXCEPT AS OTHERWISE PROVIDED BY LAW, NO
|
||||
AUTHOR, COPYRIGHT HOLDER OR LICENSOR SHALL BE LIABLE TO YOU FOR DAMAGES
|
||||
OF ANY KIND, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
|
||||
<P>
|
||||
<H1><A NAME="SEC1" HREF="ext2fs_toc.html#SEC1">Introduction</A></H1>
|
||||
<P>
|
||||
This document has been written by Louis-Dominique Dubeau. It contains
|
||||
an analysis of the structure of the Second Extended File System and is
|
||||
based on a study of the Linux kernel source files. This document does
|
||||
not contain specifications written by the Ext2fs development team.
|
||||
<P>
|
||||
Ext2fs was designed by
|
||||
Rémy Card <A NAME="FOOT1" HREF="ext2fs_foot.html#FOOT1">(1)</A>
|
||||
as an extensible and powerful file system for Linux. It is also the most
|
||||
successful file system so far in the Linux community.
|
||||
<P>
|
||||
The first Linux file system was Minixfs: a file system originally
|
||||
developed for the Minix operating system. This file system had many
|
||||
disadvantages. Among them were the 64MB limit on partitions, the
14-character limit on file names, and the lack of built-in extensibility.
|
||||
<P>
|
||||
To overcome those problems,
|
||||
Rémy Card
|
||||
wrote extfs. This file system was mostly based upon the original Minixfs
|
||||
code and implementation. However, it removed the 64MB size limit on
|
||||
partitions, and increased the file name size limit to 255 characters.
|
||||
<P>
|
||||
In his quest for the perfect file system,
|
||||
Rémy
|
||||
was still unsatisfied. So he decided to write a brand new file system:
|
||||
ext2fs. This file system not only has the advantages of extfs but also
|
||||
provides a better space allocation management, allows the use of special
|
||||
flags for file management, the use of access control lists and is
|
||||
extensible.
|
||||
<P>
|
||||
Will someday
|
||||
Rémy
|
||||
come up with ext3fs? Who knows? However, in the meantime ext2fs is
|
||||
<STRONG>the</STRONG> de-facto standard Linux file system. This document describes
|
||||
the physical layout of an ext2 file system on disk and the management
|
||||
policies that every ext2 file system manager should implement. The
|
||||
information in this document is accurate as of version 0.5 of ext2fs
|
||||
(Linux kernel version 1.0). The information about access control lists
|
||||
is not included because no implementation of ext2fs enforces them anyway.
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_2.html">next</A> section.<P>
|
||||
49
study/sabre/os/files/FileSystems/ext2fs/ext2fs_10.html
Normal file
@@ -0,0 +1,49 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_10.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Error Handling</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_9.html">previous</A>, <A HREF="ext2fs_11.html">next</A> section.<P>
|
||||
<A NAME="IDX48"></A>
|
||||
<A NAME="IDX49"></A>
|
||||
<H1><A NAME="SEC10" HREF="ext2fs_toc.html#SEC10">Error Handling</A></H1>
|
||||
<P>
|
||||
This chapter describes how a standard ext2 file system must handle
|
||||
errors. The superblock contains two parameters controlling the way
|
||||
errors are handled. See section <A HREF="ext2fs_4.html#SEC4">Superblock</A>.
|
||||
<P>
|
||||
The first of these is the <CODE>s_mount_opt</CODE> member of the superblock
|
||||
structure in memory. Its value is computed from the options specified
|
||||
when the fs is mounted. Its error handling related values are:
|
||||
<P>
|
||||
<DL COMPACT>
|
||||
<DT><CODE>EXT2_MOUNT_ERRORS_CONT</CODE>
|
||||
<DD>continue even if an error occurs.
|
||||
<P>
|
||||
<DT><CODE>EXT2_MOUNT_ERRORS_RO</CODE>
|
||||
<DD>remount the file system read only.
|
||||
<P>
|
||||
<DT><CODE>EXT2_MOUNT_ERRORS_PANIC</CODE>
|
||||
<DD>the kernel panics on error.
|
||||
</DL>
|
||||
<P>
|
||||
The second of these is the <CODE>s_errors</CODE> member of the superblock
|
||||
structure on disk. It may take one of the following values:
|
||||
<P>
|
||||
<DL COMPACT>
|
||||
<DT><CODE>EXT2_ERRORS_CONTINUE</CODE>
|
||||
<DD>continue even if an error occurs.
|
||||
<P>
|
||||
<DT><CODE>EXT2_ERRORS_RO</CODE>
|
||||
<DD>remount the file system read only.
|
||||
<P>
|
||||
<DT><CODE>EXT2_ERRORS_PANIC</CODE>
|
||||
<DD>in which case the kernel simply panics.
|
||||
<P>
|
||||
<DT><CODE>EXT2_ERRORS_DEFAULT</CODE>
|
||||
<DD>use the default behavior (as of 0.5a <CODE>EXT2_ERRORS_CONTINUE</CODE>).
|
||||
</DL>
|
||||
<P>
|
||||
<CODE>s_mount_opt</CODE> takes precedence over <CODE>s_errors</CODE>.
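<P>
The precedence rule can be sketched in Python as follows. The enumeration and the function <CODE>effective_behavior</CODE> are hypothetical and only mirror the three behaviors listed in this chapter; the numeric values are arbitrary and are not the on-disk constants. <CODE>None</CODE> stands for "no error option given at mount time" in the first argument and for <CODE>EXT2_ERRORS_DEFAULT</CODE> in the second.

```python
from enum import Enum, auto


class Behavior(Enum):
    CONTINUE = auto()    # continue even if an error occurs
    REMOUNT_RO = auto()  # remount the file system read only
    PANIC = auto()       # the kernel panics


def effective_behavior(mount_opt, s_errors, default=Behavior.CONTINUE):
    """Pick the behavior to apply when an error is detected.

    mount_opt: Behavior taken from the mount options, or None if unset
    s_errors:  Behavior stored in the superblock, or None for "default"
    default:   what "default" means (CONTINUE as of 0.5a, per the text)
    """
    if mount_opt is not None:
        return mount_opt       # mount options win over the superblock
    if s_errors is None:
        return default
    return s_errors
```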
|
||||
<P>Go to the <A HREF="ext2fs_9.html">previous</A>, <A HREF="ext2fs_11.html">next</A> section.<P>
|
||||
16
study/sabre/os/files/FileSystems/ext2fs/ext2fs_11.html
Normal file
@@ -0,0 +1,16 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_11.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Formulae</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_10.html">previous</A>, <A HREF="ext2fs_12.html">next</A> section.<P>
|
||||
<H1><A NAME="SEC11" HREF="ext2fs_toc.html#SEC11">Formulae</A></H1>
|
||||
<P>
|
||||
Here are a couple of formulae usually used in ext2fs managers.
|
||||
<P>
|
||||
The block number of a file relative offset:
|
||||
<P>
|
||||
block = offset / s_blocksize
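<P>
As a small worked example, the formula can be written in Python with integer division. The helper name <CODE>locate</CODE> is hypothetical, and returning the byte position inside the block is an addition made for convenience; it is not part of the formula above.

```python
def locate(offset, s_blocksize):
    """Map a byte offset within a file to (logical block, offset in block).

    The block number is offset / s_blocksize using integer division.
    """
    block, within = divmod(offset, s_blocksize)
    return block, within
```

For instance, with a block size of 1024 bytes, byte offset 5000 falls in logical block 4, at byte 904 of that block.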
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_10.html">previous</A>, <A HREF="ext2fs_12.html">next</A> section.<P>
|
||||
17
study/sabre/os/files/FileSystems/ext2fs/ext2fs_12.html
Normal file
@@ -0,0 +1,17 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_12.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Invariants</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_11.html">previous</A>, <A HREF="ext2fs_13.html">next</A> section.<P>
|
||||
<H1><A NAME="SEC12" HREF="ext2fs_toc.html#SEC12">Invariants</A></H1>
|
||||
<P>
|
||||
Here we define a set of invariant propositions. These propositions can
|
||||
be momentarily false during file manipulations in the ext2 file system
|
||||
manager. However, file invariants should always be true for the set
|
||||
of files not currently manipulated by the file system manager. File
|
||||
system invariants should always be true when the file system manager is
|
||||
not currently manipulating the file system.
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_11.html">previous</A>, <A HREF="ext2fs_13.html">next</A> section.<P>
|
||||
10
study/sabre/os/files/FileSystems/ext2fs/ext2fs_13.html
Normal file
@@ -0,0 +1,10 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_13.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - File Invariants</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_12.html">previous</A>, <A HREF="ext2fs_14.html">next</A> section.<P>
|
||||
<H2><A NAME="SEC13" HREF="ext2fs_toc.html#SEC13">File Invariants</A></H2>
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_12.html">previous</A>, <A HREF="ext2fs_14.html">next</A> section.<P>
|
||||
9
study/sabre/os/files/FileSystems/ext2fs/ext2fs_14.html
Normal file
@@ -0,0 +1,9 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_14.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - File System Invariants</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_13.html">previous</A>, <A HREF="ext2fs_15.html">next</A> section.<P>
|
||||
<H2><A NAME="SEC14" HREF="ext2fs_toc.html#SEC14">File System Invariants</A></H2>
|
||||
<P>Go to the <A HREF="ext2fs_13.html">previous</A>, <A HREF="ext2fs_15.html">next</A> section.<P>
|
||||
30
study/sabre/os/files/FileSystems/ext2fs/ext2fs_15.html
Normal file
@@ -0,0 +1,30 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_15.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - References</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_14.html">previous</A>, <A HREF="ext2fs_16.html">next</A> section.<P>
|
||||
<H1><A NAME="SEC15" HREF="ext2fs_toc.html#SEC15">References</A></H1>
|
||||
<P>
|
||||
The sources used to write this document are cited here. Everything is
|
||||
cited: books, sources, man pages, etc.
|
||||
<A NAME="FOOT4" HREF="ext2fs_foot.html#FOOT4">(4)</A>
|
||||
<P>
|
||||
Card, Rémy. 1993. <EM>Implémentation du système de fichiers ext2 dans Linux</EM>,
|
||||
Rapport MASI, Institut Blaise Pascal, Paris, France.
|
||||
<P>
|
||||
Card, Rémy, et al. 1994. The ext2fs sources in Linux kernel. Available
|
||||
by ftp at nic.funet.fi.
|
||||
<P>
|
||||
Card, Rémy, Ts'o, Theodore and Tweedie, Stephen. 1994. <EM>Linux File
|
||||
Systems</EM>. Available at
|
||||
ftp://ftp.ibp.fr/pub2/linux/packages/ext2fs/ext2-1.eps.gz
|
||||
<P>
|
||||
Torvalds, Linus, et al. 1994. The Linux 1.0 kernel sources. Available
|
||||
by ftp at nic.funet.fi.
|
||||
<P>
|
||||
Ts'o, Theodore, and Card, Rémy. 1994. The e2fsprogs-0.5a sources. Available
|
||||
by ftp at sunsite.unc.edu
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_14.html">previous</A>, <A HREF="ext2fs_16.html">next</A> section.<P>
|
||||
76
study/sabre/os/files/FileSystems/ext2fs/ext2fs_16.html
Normal file
@@ -0,0 +1,76 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_16.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Concept Index</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_15.html">previous</A> section.<P>
|
||||
<H1><A NAME="SEC16" HREF="ext2fs_toc.html#SEC16">Concept Index</A></H1>
|
||||
<P>
|
||||
<DIR>
|
||||
<H2>a</H2>
|
||||
<LI><A HREF="ext2fs_8.html#IDX44">Access path</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX39">ACL inode</A>
|
||||
<H2>b</H2>
|
||||
<LI><A HREF="ext2fs_7.html#IDX36">Bad blocks list</A>
|
||||
<LI><A HREF="ext2fs_4.html#IDX20">Bitmap cache</A>
|
||||
<LI><A HREF="ext2fs_6.html#IDX22">Bitmaps, in general</A>
|
||||
<LI><A HREF="ext2fs_6.html#IDX23">Block allocation and bitmaps</A>
|
||||
<LI><A HREF="ext2fs_6.html#IDX24">Block bitmap</A>
|
||||
<LI><A HREF="ext2fs_2.html#IDX1">Blocks, in general</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX40">Boot loader inode</A>
|
||||
<H2>c</H2>
|
||||
<LI><A HREF="ext2fs_4.html#IDX21">Caching of bitmaps</A>
|
||||
<LI><A HREF="ext2fs_3.html#IDX17">Content of a group</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX33">Content of an inode</A>
|
||||
<LI><A HREF="ext2fs_8.html#IDX47">Current directory</A>
|
||||
<H2>d</H2>
|
||||
<LI><A HREF="ext2fs_2.html#IDX3">Definition of a block</A>
|
||||
<LI><A HREF="ext2fs_2.html#IDX11">Definition of a fragment</A>
|
||||
<LI><A HREF="ext2fs_8.html#IDX43">Directories, in general</A>
|
||||
<LI><A HREF="ext2fs_8.html#IDX45">Directory entries</A>
|
||||
<LI><A HREF="ext2fs_3.html#IDX15">Duplication of information</A>
|
||||
<H2>e</H2>
|
||||
<LI><A HREF="ext2fs_10.html#IDX49">Error handling</A>
|
||||
<LI><A HREF="ext2fs_10.html#IDX48">Errors, in general</A>
|
||||
<H2>f</H2>
|
||||
<LI><A HREF="ext2fs_7.html#IDX42">First normal inode</A>
|
||||
<LI><A HREF="ext2fs_2.html#IDX13">Fragment size</A>
|
||||
<LI><A HREF="ext2fs_2.html#IDX2">Fragments, in general</A>
|
||||
<H2>g</H2>
|
||||
<LI><A HREF="ext2fs_3.html#IDX14">Groups, in general</A>
|
||||
<H2>i</H2>
|
||||
<LI><A HREF="ext2fs_3.html#IDX16">Information duplication</A>
|
||||
<LI><A HREF="ext2fs_6.html#IDX25">Inode allocation and bitmaps</A>
|
||||
<LI><A HREF="ext2fs_6.html#IDX26">Inode bitmap</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX32">Inode content</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX28">Inode layout</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX31">Inode structure</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX27">Inodes, in general</A>
|
||||
<H2>l</H2>
|
||||
<LI><A HREF="ext2fs_3.html#IDX18">Layout of a group</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX29">Layout of an inode</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX35">List of bad blocks</A>
|
||||
<LI><A HREF="ext2fs_2.html#IDX10">Logical addresses range</A>
|
||||
<LI><A HREF="ext2fs_2.html#IDX8">Logical block size</A>
|
||||
<LI><A HREF="ext2fs_2.html#IDX6">Logical versus physical addresses</A>
|
||||
<LI><A HREF="ext2fs_2.html#IDX5">Logical versus physical blocks</A>
|
||||
<H2>p</H2>
|
||||
<LI><A HREF="ext2fs_8.html#IDX46">Parent directory</A>
|
||||
<LI><A HREF="ext2fs_2.html#IDX7">Physical blocks</A>
|
||||
<H2>r</H2>
|
||||
<LI><A HREF="ext2fs_2.html#IDX4">Reserved blocks</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX38">Root directory</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX37">Root inode</A>
|
||||
<H2>s</H2>
|
||||
<LI><A HREF="ext2fs_2.html#IDX12">Size of a fragment</A>
|
||||
<LI><A HREF="ext2fs_2.html#IDX9">Size of logical blocks</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX34">Special inodes</A>
|
||||
<LI><A HREF="ext2fs_7.html#IDX30">Structure of an inode</A>
|
||||
<H2>t</H2>
|
||||
<LI><A HREF="ext2fs_4.html#IDX19">Times</A>
|
||||
<H2>u</H2>
|
||||
<LI><A HREF="ext2fs_7.html#IDX41">Undelete directory inode</A>
|
||||
</DIR>
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_15.html">previous</A> section.<P>
|
||||
71
study/sabre/os/files/FileSystems/ext2fs/ext2fs_2.html
Normal file
@@ -0,0 +1,71 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_2.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Blocks and Fragments</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_1.html">previous</A>, <A HREF="ext2fs_3.html">next</A> section.<P>
|
||||
<A NAME="IDX1"></A>
|
||||
<A NAME="IDX2"></A>
|
||||
<H1><A NAME="SEC2" HREF="ext2fs_toc.html#SEC2">Blocks and Fragments</A></H1>
|
||||
<A NAME="IDX3"></A>
|
||||
<P>
|
||||
Blocks are the basic building blocks of a file system. The file system
|
||||
manager's requests to read or write from the disk are always translated to
|
||||
a query to read or write an integral number of blocks from the disk.
|
||||
<A NAME="IDX4"></A>
|
||||
<P>
|
||||
Some blocks on the file system are reserved for the exclusive use of the
|
||||
superuser. This information is recorded in the <CODE>s_r_blocks_count</CODE>
|
||||
member of the superblock structure. See section <A HREF="ext2fs_4.html#SEC4">Superblock</A>. Whenever the total
|
||||
number of free blocks becomes equal to the number of reserved blocks,
|
||||
normal users can no longer allocate blocks for their use. Only the
|
||||
superuser may allocate new blocks. Without this provision for reserved
|
||||
blocks, filling up the file system might make the computer unbootable.
|
||||
Whenever the startup tasks would try to allocate a block, the computer
|
||||
would crash. With reserved blocks, we ensure a minimum space for booting
|
||||
and allowing the superuser to clean up the disk.
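<P>
The reservation rule can be sketched in C as follows. This is only an illustration under stated assumptions: <CODE>struct sb_counts</CODE> and <CODE>may_allocate_block</CODE> are hypothetical names, and the struct carries just the two superblock counters the rule needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the superblock: only the two counters used here. */
struct sb_counts {
    unsigned long s_free_blocks_count;
    unsigned long s_r_blocks_count;
};

/* Returns true when the caller may allocate one more block.
 * Normal users are refused as soon as the free count has dropped
 * to the reserved count; the superuser may use the reserve too. */
bool may_allocate_block(const struct sb_counts *sb, bool is_superuser)
{
    assert(sb != NULL);
    if (sb->s_free_blocks_count == 0)
        return false;                      /* nothing left for anyone */
    if (is_superuser)
        return true;
    return sb->s_free_blocks_count > sb->s_r_blocks_count;
}
```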
|
||||
<A NAME="IDX5"></A>
|
||||
<A NAME="IDX6"></A>
|
||||
<A NAME="IDX7"></A>
|
||||
<A NAME="IDX8"></A>
|
||||
<A NAME="IDX9"></A>
|
||||
<P>
|
||||
This is all very simple. However, computer scientists like to
|
||||
complicate things a bit. There are in fact two kinds of blocks, logical
|
||||
blocks and physical blocks. The addressing scheme and size of these two
|
||||
kinds of blocks may vary. What happens is that when a request is made to
|
||||
manipulate the range <SAMP>`[a,b]'</SAMP> of some file, this range is first
|
||||
converted by the higher parts of the file system into a request to
|
||||
manipulate an integral number of logical blocks: <SAMP>`a'</SAMP> is rounded
|
||||
down to a logical block boundary, and <SAMP>`b'</SAMP> is rounded up to a
|
||||
logical block boundary. Then, this range of logical blocks is converted
|
||||
by lower parts of the file system into a request to manipulate an
|
||||
integral number of physical blocks. The logical block size must be the
|
||||
physical block size multiplied by a power of two <A NAME="FOOT2" HREF="ext2fs_foot.html#FOOT2">(2)</A>. So when going from logical to physical addressing
|
||||
we just have to multiply the address by this power of two.
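<P>
The two conversion steps can be sketched in C. The names are illustrative, the byte range <SAMP>`[a,b]'</SAMP> is assumed to be inclusive at both ends, and <CODE>shift</CODE> is assumed to be the base-2 logarithm of the ratio between the logical and physical block sizes.

```c
#include <assert.h>

/* A run of consecutive blocks: first block address and block count. */
struct block_range {
    unsigned long first;
    unsigned long count;
};

/* Logical blocks covering the byte range [a, b] (inclusive).
 * `a` is rounded down and `b` is rounded up to block boundaries. */
struct block_range bytes_to_logical(unsigned long a, unsigned long b,
                                    unsigned long logical_size)
{
    struct block_range r;
    assert(logical_size > 0 && a <= b);
    r.first = a / logical_size;
    r.count = b / logical_size - r.first + 1;
    return r;
}

/* Logical to physical addressing: multiply by the power of two,
 * which is a left shift by log2(logical_size / physical_size). */
struct block_range logical_to_physical(struct block_range l, unsigned int shift)
{
    struct block_range p;
    p.first = l.first << shift;
    p.count = l.count << shift;
    return p;
}
```

For example, with 1024-byte logical blocks and 512-byte physical blocks (<CODE>shift</CODE> of 1), bytes 1500 to 5000 cover logical blocks 1 to 4, that is, physical blocks 2 to 9.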
|
||||
<A NAME="IDX10"></A>
|
||||
<P>
|
||||
The logical addresses of the file system go from zero up to the total
|
||||
number of blocks minus one. Block zero is the boot block and is usually
|
||||
only accessed during special operations.
|
||||
<P>
|
||||
Now, the problem with blocks is that if we have a file that is not an
|
||||
integral number of blocks, space at the end of the last block is wasted.
|
||||
On average, one half block is wasted per file. On most file systems this
|
||||
means a lot of wasted space.
|
||||
<A NAME="IDX11"></A>
|
||||
<A NAME="IDX12"></A>
|
||||
<A NAME="IDX13"></A>
|
||||
<P>
|
||||
To circumvent this inconvenience, the file system uses fragments. The
|
||||
fragment size must be the physical block size multiplied by a power of
|
||||
two <A NAME="FOOT3" HREF="ext2fs_foot.html#FOOT3">(3)</A>. A file is therefore a sequence of
|
||||
blocks followed by a small sequence of consecutive fragments. When a file
|
||||
has enough ending fragments to fill a block, those fragments are grouped
|
||||
into a block. When a file is shortened, the last block may be broken into
|
||||
many contiguous fragments.
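<P>
As a rough sketch of this layout in C, one can compute how many whole blocks and how many trailing fragments a file of a given size would occupy. The names are hypothetical, and the sketch assumes the block size is a multiple of the fragment size.

```c
#include <assert.h>

/* How a file of a given size splits into whole blocks plus tail fragments. */
struct file_layout {
    unsigned long full_blocks;
    unsigned long tail_frags;
};

struct file_layout layout_for_size(unsigned long size,
                                   unsigned long block_size,
                                   unsigned long frag_size)
{
    struct file_layout l;
    unsigned long tail;

    assert(frag_size > 0 && block_size >= frag_size);
    assert(block_size % frag_size == 0);

    l.full_blocks = size / block_size;
    tail = size % block_size;                        /* bytes past the last whole block */
    l.tail_frags = (tail + frag_size - 1) / frag_size;  /* rounded up to whole fragments */

    /* Enough ending fragments to fill a block are grouped into a block. */
    if (l.tail_frags == block_size / frag_size) {
        l.full_blocks++;
        l.tail_frags = 0;
    }
    return l;
}
```

With 4096-byte blocks and 1024-byte fragments, a 5000-byte file takes one block and one fragment, while an 8000-byte file takes two blocks because its four ending fragments fill a block.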
|
||||
<P>
|
||||
The general relationship between sizes is:
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_1.html">previous</A>, <A HREF="ext2fs_3.html">next</A> section.<P>
|
||||
38
study/sabre/os/files/FileSystems/ext2fs/ext2fs_3.html
Normal file
@@ -0,0 +1,38 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_3.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Groups</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_2.html">previous</A>, <A HREF="ext2fs_4.html">next</A> section.<P>
|
||||
<A NAME="IDX14"></A>
|
||||
<H1><A NAME="SEC3" HREF="ext2fs_toc.html#SEC3">Groups</A></H1>
|
||||
<A NAME="IDX15"></A>
|
||||
<A NAME="IDX16"></A>
|
||||
<P>
|
||||
The blocks on disk are divided into groups. Each of these groups
|
||||
duplicates critical information of the file system. Moreover, the
|
||||
presence of block groups on disk allows the use of efficient disk
|
||||
allocation algorithms.
|
||||
<A NAME="IDX17"></A>
|
||||
<A NAME="IDX18"></A>
|
||||
<P>
|
||||
Each group contains, in this order:
|
||||
<P>
|
||||
<UL>
|
||||
<LI>the superblock. See section <A HREF="ext2fs_4.html#SEC4">Superblock</A>
|
||||
<P>
|
||||
<LI>the group descriptors. See section <A HREF="ext2fs_5.html#SEC5">Group Descriptors</A>
|
||||
<P>
|
||||
<LI>the block bitmap of the group. See section <A HREF="ext2fs_6.html#SEC6">Bitmaps</A>
|
||||
<P>
|
||||
<LI>the inode bitmap of the group.
|
||||
<P>
|
||||
<LI>the inode table of the group. See section <A HREF="ext2fs_7.html#SEC7">Inodes</A>
|
||||
<P>
|
||||
<LI>the data blocks in the group. See section <A HREF="ext2fs_2.html#SEC2">Blocks and Fragments</A>
|
||||
</UL>
|
||||
<P>
|
||||
The superblock and group descriptors of each group must carry the same
|
||||
values on disk.
|
||||
<P>Go to the <A HREF="ext2fs_2.html">previous</A>, <A HREF="ext2fs_4.html">next</A> section.<P>
|
||||
231
study/sabre/os/files/FileSystems/ext2fs/ext2fs_4.html
Normal file
@@ -0,0 +1,231 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_4.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Superblock</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_3.html">previous</A>, <A HREF="ext2fs_5.html">next</A> section.<P>
|
||||
<H1><A NAME="SEC4" HREF="ext2fs_toc.html#SEC4">Superblock</A></H1>
|
||||
<P>
|
||||
In this section, the layout of a superblock is described. Here is the
|
||||
official structure of an ext2fs superblock [include/linux/ext2_fs.h]:
|
||||
<P>
|
||||
<PRE>
struct ext2_super_block {
        unsigned long s_inodes_count;
        unsigned long s_blocks_count;
        unsigned long s_r_blocks_count;
        unsigned long s_free_blocks_count;
        unsigned long s_free_inodes_count;
        unsigned long s_first_data_block;
        unsigned long s_log_block_size;
        long s_log_frag_size;
        unsigned long s_blocks_per_group;
        unsigned long s_frags_per_group;
        unsigned long s_inodes_per_group;
        unsigned long s_mtime;
        unsigned long s_wtime;
        unsigned short s_mnt_count;
        short s_max_mnt_count;
        unsigned short s_magic;
        unsigned short s_state;
        unsigned short s_errors;
        unsigned short s_pad;
        unsigned long s_lastcheck;
        unsigned long s_checkinterval;
        unsigned long s_reserved[238];
};
</PRE>
|
||||
<P>
|
||||
<DL COMPACT>
|
||||
<DT><CODE>s_inodes_count</CODE>
|
||||
<DD>the total number of inodes on the fs.
|
||||
<P>
|
||||
<DT><CODE>s_blocks_count</CODE>
|
||||
<DD>the total number of blocks on the fs.
|
||||
<P>
|
||||
<DT><CODE>s_r_blocks_count</CODE>
|
||||
<DD>the total number of blocks reserved for the exclusive use of the
|
||||
superuser.
|
||||
<P>
|
||||
<DT><CODE>s_free_blocks_count</CODE>
|
||||
<DD>the total number of free blocks on the fs.
|
||||
<P>
|
||||
<DT><CODE>s_free_inodes_count</CODE>
|
||||
<DD>the total number of free inodes on the fs.
|
||||
<P>
|
||||
<DT><CODE>s_first_data_block</CODE>
|
||||
<DD>the position on the fs of the first data block. Usually, this is block
|
||||
number 1 for an fs with 1024-byte blocks and number 0 for other
|
||||
fs.
|
||||
<P>
|
||||
<DT><CODE>s_log_block_size</CODE>
|
||||
<DD>used to compute the logical block size in bytes. The logical block size
|
||||
is in fact <CODE>1024 << s_log_block_size</CODE>.
|
||||
<P>
|
||||
<DT><CODE>s_log_frag_size</CODE>
|
||||
<DD>used to compute the logical fragment size. The logical fragment size is
|
||||
in fact <CODE>1024 << s_log_frag_size</CODE> if <CODE>s_log_frag_size</CODE> is positive
|
||||
and <CODE>1024 >> -s_log_frag_size</CODE> if <CODE>s_log_frag_size</CODE> is negative.
|
||||
<P>
|
||||
<DT><CODE>s_blocks_per_group</CODE>
|
||||
<DD>the total number of blocks contained in a group.
|
||||
<P>
|
||||
<DT><CODE>s_frags_per_group</CODE>
|
||||
<DD>the total number of fragments contained in a group.
|
||||
<P>
|
||||
<DT><CODE>s_inodes_per_group</CODE>
|
||||
<DD>the total number of inodes contained in a group.
|
||||
<P>
|
||||
<DT><CODE>s_mtime</CODE>
|
||||
<DD>the time at which the last mount of the fs was performed.
|
||||
<P>
|
||||
<DT><CODE>s_wtime</CODE>
|
||||
<DD>the time at which the last write of the superblock on the fs was performed.
|
||||
<P>
|
||||
<DT><CODE>s_mnt_count</CODE>
|
||||
<DD>the number of times the fs has been mounted in read-write mode without having
|
||||
been checked.
|
||||
<P>
|
||||
<DT><CODE>s_max_mnt_count</CODE>
|
||||
<DD>the maximum number of times the fs may be mounted in read-write mode before a
|
||||
check must be done.
|
||||
<P>
|
||||
<DT><CODE>s_magic</CODE>
|
||||
<DD>a magic number that permits the identification of the file system. It is
|
||||
<CODE>0xEF53</CODE> for a normal ext2fs and <CODE>0xEF51</CODE> for versions of
|
||||
ext2fs prior to 0.2b.
|
||||
<P>
|
||||
<DT><CODE>s_state</CODE>
|
||||
<DD>the state of the file system. It contains an or'ed value of EXT2_VALID_FS
|
||||
(0x0001) which means: unmounted cleanly; and EXT2_ERROR_FS (0x0002) which
|
||||
means: errors detected by the kernel code.
|
||||
<P>
|
||||
<DT><CODE>s_errors</CODE>
|
||||
<DD>indicates what operation to perform when an error occurs. See section <A HREF="ext2fs_10.html#SEC10">Error Handling</A>
|
||||
<P>
|
||||
<DT><CODE>s_pad</CODE>
|
||||
<DD>unused.
|
||||
<P>
|
||||
<DT><CODE>s_lastcheck</CODE>
|
||||
<DD>the time of the last check performed on the fs.
|
||||
<P>
|
||||
<DT><CODE>s_checkinterval</CODE>
|
||||
<DD>the maximum possible time between checks on the fs.
|
||||
<P>
|
||||
<DT><CODE>s_reserved</CODE>
|
||||
<DD>unused.
|
||||
</DL>
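<P>
The size and magic-number rules described above can be sketched in C. The function names and the sanity bounds on the shift counts are assumptions made for this illustration; the constants and shift formulas come from the field descriptions.

```c
#include <assert.h>

#define EXT2_SUPER_MAGIC    0xEF53  /* normal ext2fs */
#define EXT2_PRE_02B_MAGIC  0xEF51  /* ext2fs prior to 0.2b */

/* Logical block size in bytes: 1024 << s_log_block_size. */
unsigned long ext2_block_size(unsigned long s_log_block_size)
{
    assert(s_log_block_size < 21);   /* illustrative bound keeping the shift defined */
    return 1024UL << s_log_block_size;
}

/* Logical fragment size in bytes; the log value may be negative. */
unsigned long ext2_frag_size(long s_log_frag_size)
{
    assert(s_log_frag_size > -11 && s_log_frag_size < 21);
    if (s_log_frag_size >= 0)
        return 1024UL << s_log_frag_size;
    return 1024UL >> -s_log_frag_size;
}

/* Nonzero when s_magic identifies an ext2 file system. */
int ext2_magic_ok(unsigned short s_magic)
{
    return s_magic == EXT2_SUPER_MAGIC || s_magic == EXT2_PRE_02B_MAGIC;
}
```

A value of 2 in <CODE>s_log_block_size</CODE> thus means 4096-byte blocks, and a value of -1 in <CODE>s_log_frag_size</CODE> means 512-byte fragments.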
|
||||
<A NAME="IDX19"></A>
|
||||
<P>
|
||||
Times are measured in seconds since 00:00:00 GMT, January 1, 1970.
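<P>
The state, mount-count and time fields together suggest when a check of the file system is due. The following C sketch is one possible reading, not the logic of any particular tool: <CODE>struct check_fields</CODE> and <CODE>ext2_needs_check</CODE> are hypothetical names, and treating a negative <CODE>s_max_mnt_count</CODE> or a zero <CODE>s_checkinterval</CODE> as "no limit" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define EXT2_VALID_FS  0x0001  /* unmounted cleanly */
#define EXT2_ERROR_FS  0x0002  /* errors detected by the kernel code */

/* Hypothetical subset of the superblock used to decide on a check. */
struct check_fields {
    unsigned short s_mnt_count;
    short          s_max_mnt_count;
    unsigned short s_state;
    unsigned long  s_lastcheck;      /* seconds since 1 January 1970 */
    unsigned long  s_checkinterval;  /* seconds */
};

/* Returns 1 when a check appears due at time `now`, 0 otherwise. */
int ext2_needs_check(const struct check_fields *sb, unsigned long now)
{
    assert(sb != NULL);
    if (!(sb->s_state & EXT2_VALID_FS))
        return 1;                                   /* not unmounted cleanly */
    if (sb->s_state & EXT2_ERROR_FS)
        return 1;                                   /* errors were recorded */
    if (sb->s_max_mnt_count >= 0 &&
        sb->s_mnt_count >= sb->s_max_mnt_count)
        return 1;                                   /* mounted too many times */
    if (sb->s_checkinterval != 0 &&
        now >= sb->s_lastcheck + sb->s_checkinterval)
        return 1;                                   /* too long since last check */
    return 0;
}
```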
|
||||
<P>
|
||||
Once the superblock is read in memory, the ext2fs kernel code calculates
|
||||
some other information and keeps it in another structure. This structure
|
||||
has the following layout:
|
||||
<P>
|
||||
<PRE>
struct ext2_sb_info {
        unsigned long s_frag_size;
        unsigned long s_frags_per_block;
        unsigned long s_inodes_per_block;
        unsigned long s_frags_per_group;
        unsigned long s_blocks_per_group;
        unsigned long s_inodes_per_group;
        unsigned long s_itb_per_group;
        unsigned long s_desc_per_block;
        unsigned long s_groups_count;
        struct buffer_head * s_sbh;
        struct ext2_super_block * s_es;
        struct buffer_head * s_group_desc[EXT2_MAX_GROUP_DESC];
        unsigned short s_loaded_inode_bitmaps;
        unsigned short s_loaded_block_bitmaps;
        unsigned long s_inode_bitmap_number[EXT2_MAX_GROUP_LOADED];
        struct buffer_head * s_inode_bitmap[EXT2_MAX_GROUP_LOADED];
        unsigned long s_block_bitmap_number[EXT2_MAX_GROUP_LOADED];
        struct buffer_head * s_block_bitmap[EXT2_MAX_GROUP_LOADED];
        int s_rename_lock;
        struct wait_queue * s_rename_wait;
        unsigned long s_mount_opt;
        unsigned short s_mount_state;
};
</PRE>
|
||||
<P>
|
||||
<DL COMPACT>
|
||||
<DT><CODE>s_frag_size</CODE>
|
||||
<DD>fragment size in bytes.
|
||||
<P>
|
||||
<DT><CODE>s_frags_per_block</CODE>
|
||||
<DD>number of fragments in a block.
|
||||
<P>
|
||||
<DT><CODE>s_inodes_per_block</CODE>
|
||||
<DD>number of inodes in a block of the inode table.
|
||||
<P>
|
||||
<DT><CODE>s_frags_per_group</CODE>
|
||||
<DD>number of fragments in a group.
|
||||
<P>
|
||||
<DT><CODE>s_blocks_per_group</CODE>
|
||||
<DD>number of blocks in a group.
|
||||
<P>
|
||||
<DT><CODE>s_inodes_per_group</CODE>
|
||||
<DD>number of inodes in a group.
|
||||
<P>
|
||||
<DT><CODE>s_itb_per_group</CODE>
|
||||
<DD>number of inode table blocks per group.
|
||||
<P>
|
||||
<DT><CODE>s_desc_per_block</CODE>
|
||||
<DD>number of group descriptors per block.
|
||||
<P>
|
||||
<DT><CODE>s_groups_count</CODE>
|
||||
<DD>number of groups.
|
||||
<P>
|
||||
<DT><CODE>s_sbh</CODE>
|
||||
<DD>the buffer containing the disk superblock in memory.
|
||||
<P>
|
||||
<DT><CODE>s_es</CODE>
|
||||
<DD>pointer to the superblock in the buffer.
|
||||
<P>
|
||||
<DT><CODE>s_group_desc</CODE>
|
||||
<DD>pointers to the buffers containing the group descriptors.
|
||||
<P>
|
||||
<DT><CODE>s_loaded_inode_bitmaps</CODE>
|
||||
<DD>number of inodes bitmap cache entries used.
|
||||
<P>
|
||||
<DT><CODE>s_loaded_block_bitmaps</CODE>
|
||||
<DD>number of blocks bitmap cache entries used.
|
||||
<P>
|
||||
<DT><CODE>s_inode_bitmap_number</CODE>
|
||||
<DD>indicates to which group each inode bitmap in the buffers belongs.
|
||||
<P>
|
||||
<DT><CODE>s_inode_bitmap</CODE>
|
||||
<DD>inode bitmap cache.
|
||||
<P>
|
||||
<DT><CODE>s_block_bitmap_number</CODE>
|
||||
<DD>indicates to which group each block bitmap in the buffers belongs.
|
||||
<P>
|
||||
<DT><CODE>s_block_bitmap</CODE>
|
||||
<DD>block bitmap cache.
|
||||
<P>
|
||||
<DT><CODE>s_rename_lock</CODE>
|
||||
<DD>lock used to avoid two simultaneous rename operations on a fs.
|
||||
<P>
|
||||
<DT><CODE>s_rename_wait</CODE>
|
||||
<DD>wait queue used to wait for the completion of a rename operation in progress.
|
||||
<P>
|
||||
<DT><CODE>s_mount_opt</CODE>
|
||||
<DD>the mounting options specified by the administrator.
|
||||
<P>
|
||||
<DT><CODE>s_mount_state</CODE>
|
||||
<DD></DL>
|
||||
<P>
|
||||
Most of those values are computed from the superblock on disk.
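<P>
A sketch of how some of these values might be derived from the on-disk superblock follows. The formulas are assumptions consistent with the field descriptions rather than a copy of the kernel code, <CODE>struct ext2_derived</CODE> and <CODE>ext2_derive</CODE> are hypothetical names, and the 128-byte inode and 32-byte group descriptor sizes assume the structures in this document with 32-bit longs.

```c
#include <assert.h>

/* Assumed on-disk sizes with 32-bit longs. */
#define INODE_BYTES       128UL
#define GROUP_DESC_BYTES   32UL

/* Derived values; member names mirror ext2_sb_info. */
struct ext2_derived {
    unsigned long s_frags_per_block;
    unsigned long s_inodes_per_block;
    unsigned long s_itb_per_group;
    unsigned long s_desc_per_block;
    unsigned long s_groups_count;
};

struct ext2_derived ext2_derive(unsigned long blocks_count,
                                unsigned long first_data_block,
                                unsigned long block_size,
                                unsigned long frag_size,
                                unsigned long blocks_per_group,
                                unsigned long inodes_per_group)
{
    struct ext2_derived d;

    assert(frag_size > 0 && block_size >= INODE_BYTES);
    assert(blocks_per_group > 0 && blocks_count > first_data_block);

    d.s_frags_per_block  = block_size / frag_size;
    d.s_inodes_per_block = block_size / INODE_BYTES;
    /* Inode table blocks per group, rounded up. */
    d.s_itb_per_group    = (inodes_per_group + d.s_inodes_per_block - 1)
                           / d.s_inodes_per_block;
    d.s_desc_per_block   = block_size / GROUP_DESC_BYTES;
    /* Number of groups needed to cover all blocks, rounded up. */
    d.s_groups_count     = (blocks_count - first_data_block
                            + blocks_per_group - 1) / blocks_per_group;
    return d;
}
```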
|
||||
<A NAME="IDX20"></A>
|
||||
<A NAME="IDX21"></A>
|
||||
<P>
|
||||
The Linux ext2fs manager caches access to the inodes and blocks
|
||||
bitmaps. This cache is a list of buffers ordered from the most recently
|
||||
used to the least recently used buffer. Managers should use the same kind
|
||||
of bitmap caching or other similar method of improving access time to
|
||||
disk.
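<P>
The ordering of such a cache can be sketched in C. This is a simplified illustration, not the kernel implementation: it tracks group numbers only (a real cache would move the buffers along with them), <CODE>CACHE_SLOTS</CODE> is a made-up stand-in for <CODE>EXT2_MAX_GROUP_LOADED</CODE>, and the other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 8   /* stand-in for EXT2_MAX_GROUP_LOADED (value assumed) */

/* Group numbers of the cached bitmaps, most recently used first. */
struct bitmap_cache {
    unsigned long group[CACHE_SLOTS];
    unsigned int  loaded;              /* number of slots in use */
};

/* Marks `group` as most recently used.
 * Returns 1 on a cache hit and 0 on a miss. On a miss with a full
 * cache, the least recently used entry (the last slot) is evicted. */
int cache_touch(struct bitmap_cache *c, unsigned long group)
{
    unsigned int i, pos;
    int hit = 0;

    assert(c != NULL && c->loaded <= CACHE_SLOTS);

    for (pos = 0; pos < c->loaded; pos++) {
        if (c->group[pos] == group) {
            hit = 1;
            break;
        }
    }
    if (!hit) {
        if (c->loaded < CACHE_SLOTS)
            c->loaded++;
        pos = c->loaded - 1;           /* fresh slot, or the LRU slot to evict */
    }
    /* Shift the more recent entries down by one and put `group` in front. */
    for (i = pos; i > 0; i--)
        c->group[i] = c->group[i - 1];
    c->group[0] = group;
    return hit;
}
```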
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_3.html">previous</A>, <A HREF="ext2fs_5.html">next</A> section.<P>
|
||||
53
study/sabre/os/files/FileSystems/ext2fs/ext2fs_5.html
Normal file
@@ -0,0 +1,53 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_5.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Group Descriptors</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_4.html">previous</A>, <A HREF="ext2fs_6.html">next</A> section.<P>
|
||||
<H1><A NAME="SEC5" HREF="ext2fs_toc.html#SEC5">Group Descriptors</A></H1>
|
||||
<P>
|
||||
On disk, the group descriptors immediately follow the superblock and
|
||||
each descriptor has the following layout:
|
||||
<P>
|
||||
<PRE>
struct ext2_group_desc
{
        unsigned long bg_block_bitmap;
        unsigned long bg_inode_bitmap;
        unsigned long bg_inode_table;
        unsigned short bg_free_blocks_count;
        unsigned short bg_free_inodes_count;
        unsigned short bg_used_dirs_count;
        unsigned short bg_pad;
        unsigned long bg_reserved[3];
};
</PRE>
|
||||
<P>
|
||||
<DL COMPACT>
|
||||
<DT><CODE>bg_block_bitmap</CODE>
|
||||
<DD>points to the blocks bitmap block for the group.
|
||||
<P>
|
||||
<DT><CODE>bg_inode_bitmap</CODE>
|
||||
<DD>points to the inodes bitmap block for the group.
|
||||
<P>
|
||||
<DT><CODE>bg_inode_table</CODE>
|
||||
<DD>points to the inodes table first block.
|
||||
<P>
|
||||
<DT><CODE>bg_free_blocks_count</CODE>
|
||||
<DD>number of free blocks in the group.
|
||||
<P>
|
||||
<DT><CODE>bg_free_inodes_count</CODE>
|
||||
<DD>number of free inodes in the group.
|
||||
<P>
|
||||
<DT><CODE>bg_used_dirs_count</CODE>
|
||||
<DD>number of inodes allocated to directories in the group.
|
||||
<P>
|
||||
<DT><CODE>bg_pad</CODE>
|
||||
<DD>padding.
|
||||
</DL>
|
||||
<P>
|
||||
The information in a group descriptor pertains only to the group it is
|
||||
actually describing.
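<P>
Since the descriptors immediately follow the superblock, the descriptor of a given group can be located with simple arithmetic. The C sketch below assumes that the superblock occupies block <CODE>s_first_data_block</CODE> and that descriptors are packed back to back; the names are illustrative.

```c
#include <assert.h>

/* Where the descriptor of one group lives on disk. */
struct desc_location {
    unsigned long block;   /* block holding the descriptor */
    unsigned long index;   /* descriptor index inside that block */
};

struct desc_location locate_group_desc(unsigned long group,
                                       unsigned long first_data_block,
                                       unsigned long desc_per_block)
{
    struct desc_location loc;
    assert(desc_per_block > 0);
    /* +1 skips the superblock, assumed to sit in first_data_block. */
    loc.block = first_data_block + 1 + group / desc_per_block;
    loc.index = group % desc_per_block;
    return loc;
}
```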
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_4.html">previous</A>, <A HREF="ext2fs_6.html">next</A> section.<P>
|
||||
38
study/sabre/os/files/FileSystems/ext2fs/ext2fs_6.html
Normal file
@@ -0,0 +1,38 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_6.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Bitmaps</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_5.html">previous</A>, <A HREF="ext2fs_7.html">next</A> section.<P>
|
||||
<A NAME="IDX22"></A>
|
||||
<H1><A NAME="SEC6" HREF="ext2fs_toc.html#SEC6">Bitmaps</A></H1>
|
||||
<P>
|
||||
The ext2 file system uses bitmaps to keep track of allocated blocks
|
||||
and inodes.
|
||||
<A NAME="IDX23"></A>
|
||||
<A NAME="IDX24"></A>
|
||||
<P>
|
||||
The blocks bitmap of each group refers to blocks ranging from the first
|
||||
block in the group to the last block in the group. To access the bit of
|
||||
a precise block, we first have to look for the group the block belongs
|
||||
to and then look for the bit of this block in the blocks bitmap
|
||||
contained in the group. It is very important to note that the blocks
|
||||
bitmap refers in fact to the smallest allocation unit supported by the
|
||||
file system: fragments. Since the block size is always a multiple of
|
||||
fragment size, when the file system manager allocates a block, it
|
||||
actually allocates a whole number of fragments. This use of the
|
||||
blocks bitmap permits the file system manager to allocate and
|
||||
deallocate space on a fragment basis.
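<P>
The two-step lookup can be sketched in C. The sketch makes several assumptions: the fragment size equals the block size (one bit per block), groups are numbered starting from the first data block, and bit 0 is the least significant bit of the first byte of the bitmap. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Group owning a block, and the bit for that block in the group's bitmap. */
struct bit_location {
    unsigned long group;
    unsigned long bit;
};

struct bit_location locate_block_bit(unsigned long block,
                                     unsigned long first_data_block,
                                     unsigned long blocks_per_group)
{
    struct bit_location loc;
    assert(blocks_per_group > 0 && block >= first_data_block);
    loc.group = (block - first_data_block) / blocks_per_group;
    loc.bit   = (block - first_data_block) % blocks_per_group;
    return loc;
}

/* Returns 1 when `bit` is set (allocated) in the bitmap, 0 otherwise. */
int bitmap_test(const unsigned char *bitmap, unsigned long bit)
{
    assert(bitmap != NULL);
    return (bitmap[bit / 8] >> (bit % 8)) & 1;
}
```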
|
||||
<A NAME="IDX25"></A>
|
||||
<A NAME="IDX26"></A>
|
||||
<P>
|
||||
The inode bitmap of each group refers to inodes ranging from the first
|
||||
inode of the group to the last inode of the group. To access the bit of
|
||||
a precise inode, we first have to look for the group the inode belongs
|
||||
to and then look for the bit of this inode in the inode bitmap contained
|
||||
in the group. To obtain the inode information from the inode table, the
|
||||
process is the same, except that the final search is in the inode table
|
||||
of the group instead of the inode bitmap.
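<P>
The same search applied to inodes can be sketched in C. It assumes that inode numbers start at 1 (consistent with the special inode numbers listed in the Inodes section) and that inodes are packed into the table blocks in order; the names are illustrative.

```c
#include <assert.h>

/* Where an inode lives: its group, then its place in that group's table. */
struct inode_location {
    unsigned long group;        /* group owning the inode */
    unsigned long index;        /* bit in the group's inode bitmap */
    unsigned long table_block;  /* block offset inside the group's inode table */
    unsigned long slot;         /* inode slot inside that block */
};

struct inode_location locate_inode(unsigned long ino,
                                   unsigned long inodes_per_group,
                                   unsigned long inodes_per_block)
{
    struct inode_location loc;
    assert(ino >= 1 && inodes_per_group > 0 && inodes_per_block > 0);
    loc.group       = (ino - 1) / inodes_per_group;
    loc.index       = (ino - 1) % inodes_per_group;
    loc.table_block = loc.index / inodes_per_block;
    loc.slot        = loc.index % inodes_per_block;
    return loc;
}
```

The block to read would then be <CODE>bg_inode_table</CODE> of the group plus <CODE>table_block</CODE>.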
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_5.html">previous</A>, <A HREF="ext2fs_7.html">next</A> section.<P>
|
||||
180
study/sabre/os/files/FileSystems/ext2fs/ext2fs_7.html
Normal file
@@ -0,0 +1,180 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_7.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Inodes</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_6.html">previous</A>, <A HREF="ext2fs_8.html">next</A> section.<P>
|
||||
<A NAME="IDX27"></A>
|
||||
<H1><A NAME="SEC7" HREF="ext2fs_toc.html#SEC7">Inodes</A></H1>
|
||||
<A NAME="IDX28"></A>
|
||||
<A NAME="IDX29"></A>
|
||||
<A NAME="IDX30"></A>
|
||||
<A NAME="IDX31"></A>
|
||||
<A NAME="IDX32"></A>
|
||||
<A NAME="IDX33"></A>
|
||||
<P>
|
||||
An inode uniquely describes a file. Here's what an inode looks like on
|
||||
disk:
|
||||
<P>
|
||||
<PRE>
struct ext2_inode {
        unsigned short i_mode;
        unsigned short i_uid;
        unsigned long i_size;
        unsigned long i_atime;
        unsigned long i_ctime;
        unsigned long i_mtime;
        unsigned long i_dtime;
        unsigned short i_gid;
        unsigned short i_links_count;
        unsigned long i_blocks;
        unsigned long i_flags;
        unsigned long i_reserved1;
        unsigned long i_block[EXT2_N_BLOCKS];
        unsigned long i_version;
        unsigned long i_file_acl;
        unsigned long i_dir_acl;
        unsigned long i_faddr;
        unsigned char i_frag;
        unsigned char i_fsize;
        unsigned short i_pad1;
        unsigned long i_reserved2[2];
};
</PRE>
|
||||
<P>
|
||||
<DL COMPACT>
|
||||
<DT><CODE>i_mode</CODE>
|
||||
<DD>type of file (character, block, link, etc.) and access rights on the
|
||||
file.
|
||||
<P>
|
||||
<DT><CODE>i_uid</CODE>
|
||||
<DD>uid of the owner of the file.
|
||||
<P>
|
||||
<DT><CODE>i_size</CODE>
|
||||
<DD>logical size in bytes.
|
||||
<P>
|
||||
<DT><CODE>i_atime</CODE>
|
||||
<DD>last time the file was accessed.
|
||||
<P>
|
||||
<DT><CODE>i_ctime</CODE>
|
||||
<DD>last time the inode information of the file was changed.
|
||||
<P>
|
||||
<DT><CODE>i_mtime</CODE>
|
||||
<DD>last time the file content was modified.
|
||||
<P>
|
||||
<DT><CODE>i_dtime</CODE>
|
||||
<DD>when this file was deleted.
|
||||
<P>
|
||||
<DT><CODE>i_gid</CODE>
|
||||
<DD>gid of the file.
|
||||
<P>
|
||||
<DT><CODE>i_links_count</CODE>
|
||||
<DD>number of links pointing to this file.
|
||||
<P>
|
||||
<DT><CODE>i_blocks</CODE>
|
||||
<DD>number of blocks allocated to this file, counted in 512-byte units.
|
||||
<P>
|
||||
<DT><CODE>i_flags</CODE>
|
||||
<DD>flags (see below).
|
||||
<P>
|
||||
<DT><CODE>i_reserved1</CODE>
|
||||
<DD>reserved.
|
||||
<P>
|
||||
<DT><CODE>i_block</CODE>
|
||||
<DD>pointers to blocks (see below).
|
||||
<P>
|
||||
<DT><CODE>i_version</CODE>
|
||||
<DD>version of the file (used by NFS).
|
||||
<P>
|
||||
<DT><CODE>i_file_acl</CODE>
|
||||
<DD>access control list of the file (not used yet).
|
||||
<P>
|
||||
<DT><CODE>i_dir_acl</CODE>
|
||||
<DD>access control list of the directory (not used yet).
|
||||
<P>
|
||||
<DT><CODE>i_faddr</CODE>
|
||||
<DD>block where the fragment of the file resides.
|
||||
<P>
|
||||
<DT><CODE>i_frag</CODE>
|
||||
<DD>number of the fragment in the block.
|
||||
<P>
|
||||
<DT><CODE>i_fsize</CODE>
|
||||
<DD>size of the fragment.
|
||||
<P>
|
||||
<DT><CODE>i_pad1</CODE>
|
||||
<DD>padding.
|
||||
<P>
|
||||
<DT><CODE>i_reserved2</CODE>
|
||||
<DD>reserved.
|
||||
</DL>
|
||||
<P>
|
||||
As you can see, the inode contains <CODE>EXT2_N_BLOCKS</CODE> (15 in ext2fs
|
||||
0.5) pointers to blocks. Of these pointers, the first
|
||||
<CODE>EXT2_NDIR_BLOCKS</CODE> (12) are direct pointers to data. The following entry
|
||||
points to a block of pointers to data (indirect). The following entry
|
||||
points to a block of pointers to blocks of pointers to data (double
|
||||
indirection). The following entry points to a block of pointers to
blocks of pointers to blocks of pointers to data (triple indirection).
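<P>
To see which of these entries serves a given block of a file, the C sketch below classifies a file-relative block index by indirection level. It is an illustration only: the function name is hypothetical, and <CODE>ptrs_per_block</CODE> is assumed to be the block size divided by the size of one block pointer (for instance 256 for 1024-byte blocks and 4-byte pointers).

```c
#include <assert.h>

#define EXT2_NDIR_BLOCKS 12   /* number of direct pointers */

/* Level of indirection needed to reach block `file_block` of a file:
 * 0 = direct, 1 = indirect, 2 = double, 3 = triple,
 * -1 = beyond what triple indirection can address. */
int indirection_level(unsigned long file_block, unsigned long ptrs_per_block)
{
    unsigned long long n = file_block;
    unsigned long long p = ptrs_per_block;

    assert(ptrs_per_block > 0);

    if (n < EXT2_NDIR_BLOCKS)
        return 0;
    n -= EXT2_NDIR_BLOCKS;
    if (n < p)
        return 1;
    n -= p;
    if (n < p * p)
        return 2;
    n -= p * p;
    if (n < p * p * p)
        return 3;
    return -1;
}
```

With 256 pointers per block, file blocks 0 to 11 are direct, 12 to 267 go through the indirect block, 268 to 65803 through double indirection, and later blocks through triple indirection.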
|
||||
<P>
|
||||
The inode flags may take one or more of the following or'ed values:
|
||||
<P>
|
||||
<DL COMPACT>
|
||||
<DT><CODE>EXT2_SECRM_FL 0x0001</CODE>
|
||||
<DD>secure deletion. This usually means that when this flag is set and we
|
||||
delete the file, random data is written in the blocks previously allocated
|
||||
to the file.
|
||||
<P>
|
||||
<DT><CODE>EXT2_UNRM_FL 0x0002</CODE>
|
||||
<DD>undelete. When this flag is set and the file is being deleted, the file
|
||||
system code must store enough information to ensure the undeletion of
|
||||
the file (to a certain extent).
|
||||
<P>
|
||||
<DT><CODE>EXT2_COMPR_FL 0x0004</CODE>
|
||||
<DD>compress file. The content of the file is compressed; the file system
|
||||
code must use compression/decompression algorithms when accessing the
|
||||
data of this file.
|
||||
<P>
|
||||
<DT><CODE>EXT2_SYNC_FL 0x0008</CODE>
|
||||
<DD>synchronous updates. The disk representation of this file must be kept
|
||||
in sync with its in-core representation. Asynchronous I/O on this kind
|
||||
of file is not possible. The synchronous updates only apply to the inode
|
||||
itself and to the indirect blocks. Data blocks are always written
|
||||
asynchronously on the disk.
|
||||
</DL>
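<P>
Because the flags are or'ed together in <CODE>i_flags</CODE>, each one is tested with a bitwise AND. A minimal C sketch, in which <CODE>EXT2_KNOWN_FL</CODE> and <CODE>inode_has_flag</CODE> are names invented for this example:

```c
#include <assert.h>

#define EXT2_SECRM_FL  0x0001  /* secure deletion */
#define EXT2_UNRM_FL   0x0002  /* undelete */
#define EXT2_COMPR_FL  0x0004  /* compress file */
#define EXT2_SYNC_FL   0x0008  /* synchronous updates */

/* Mask of the four flags listed above (name is hypothetical). */
#define EXT2_KNOWN_FL  (EXT2_SECRM_FL | EXT2_UNRM_FL | \
                        EXT2_COMPR_FL | EXT2_SYNC_FL)

/* Returns 1 when `flag` is set in i_flags, 0 otherwise. */
int inode_has_flag(unsigned long i_flags, unsigned long flag)
{
    assert(flag != 0 && (flag & ~(unsigned long)EXT2_KNOWN_FL) == 0);
    return (i_flags & flag) != 0;
}
```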
|
||||
<A NAME="IDX34"></A>
|
||||
<A NAME="IDX35"></A>
|
||||
<A NAME="IDX36"></A>
|
||||
<A NAME="IDX37"></A>
|
||||
<A NAME="IDX38"></A>
|
||||
<A NAME="IDX39"></A>
|
||||
<A NAME="IDX40"></A>
|
||||
<A NAME="IDX41"></A>
|
||||
<A NAME="IDX42"></A>
|
||||
<P>
|
||||
Some inodes have a special meaning:
|
||||
<P>
|
||||
<DL COMPACT>
|
||||
<DT><CODE>EXT2_BAD_INO 1</CODE>
|
||||
<DD>a file containing the list of bad blocks on the file system.
|
||||
<P>
|
||||
<DT><CODE>EXT2_ROOT_INO 2</CODE>
|
||||
<DD>the root directory of the file system.
|
||||
<P>
|
||||
<DT><CODE>EXT2_ACL_IDX_INO 3</CODE>
|
||||
<DD>ACL inode.
|
||||
<P>
|
||||
<DT><CODE>EXT2_ACL_DATA_INO 4</CODE>
|
||||
<DD>ACL inode.
|
||||
<P>
|
||||
<DT><CODE>EXT2_BOOT_LOADER_INO 5</CODE>
|
||||
<DD>the file containing the boot loader. (Not used yet it seems.)
|
||||
<P>
|
||||
<DT><CODE>EXT2_UNDEL_DIR_INO 6</CODE>
|
||||
<DD>the undelete directory of the system.
|
||||
<P>
|
||||
<DT><CODE>EXT2_FIRST_INO 11</CODE>
|
||||
<DD>this is the first inode that does not have a special meaning.
|
||||
</DL>
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_6.html">previous</A>, <A HREF="ext2fs_8.html">next</A> section.<P>
|
||||
50
study/sabre/os/files/FileSystems/ext2fs/ext2fs_8.html
Normal file
@@ -0,0 +1,50 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_8.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Directories</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_7.html">previous</A>, <A HREF="ext2fs_9.html">next</A> section.<P>
|
||||
<A NAME="IDX43"></A>
|
||||
<H1><A NAME="SEC8" HREF="ext2fs_toc.html#SEC8">Directories</A></H1>
|
||||
<A NAME="IDX44"></A>
|
||||
<P>
|
||||
Directories are special files that are used to create access paths to
|
||||
the files on disk. It is very important to understand that an inode may
|
||||
have many access paths. Since directories are an essential part of the
|
||||
file system, they have a specific structure. A directory file is a list
|
||||
of entries of the following format:
|
||||
<P>
|
||||
<PRE>
|
||||
struct ext2_dir_entry {
|
||||
unsigned long inode;
|
||||
unsigned short rec_len;
|
||||
unsigned short name_len;
|
||||
char name[EXT2_NAME_LEN];
|
||||
};
|
||||
</PRE>
|
||||
<P>
|
||||
<DL COMPACT>
|
||||
<DT><CODE>inode</CODE>
|
||||
<DD>points to the inode of the file.
|
||||
<P>
|
||||
<DT><CODE>rec_len</CODE>
|
||||
<DD>length of the entry record.
|
||||
<P>
|
||||
<DT><CODE>name_len</CODE>
|
||||
<DD>length of the file name.
|
||||
<P>
|
||||
<DT><CODE>name</CODE>
|
||||
<DD>name of the file. This name may have a maximum length of
|
||||
<CODE>EXT2_NAME_LEN</CODE> bytes (255 bytes as of version 0.5).
|
||||
</DL>
|
||||
<A NAME="IDX45"></A>
|
||||
<A NAME="IDX46"></A>
|
||||
<A NAME="IDX47"></A>
|
||||
<P>
|
||||
There is such an entry in the directory file for each file in the
|
||||
directory. Since ext2fs is a Unix file system the first two entries in
|
||||
the directory are the files <SAMP>`.'</SAMP> and <SAMP>`..'</SAMP>, which point to the
|
||||
current directory and the parent directory respectively.
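<P>
The record format can be walked with a few lines of code. The sketch below is
illustrative only and rests on assumptions the text does not spell out: fields
are little-endian, <CODE>inode</CODE> is 32 bits wide, <CODE>rec_len</CODE> is
the distance to the next entry, only <CODE>name_len</CODE> bytes of the name
are stored, records are padded to a multiple of 4 bytes, and an inode number
of 0 marks an unused entry. <CODE>make_entry</CODE> and
<CODE>parse_directory</CODE> are hypothetical helper names.

```python
import struct

# inode (32 bits), rec_len (16 bits), name_len (16 bits); little-endian assumed.
HEADER = struct.Struct("<IHH")


def make_entry(inode, name):
    """Build one directory record, padded to a multiple of 4 bytes (assumption)."""
    raw = name.encode("ascii")
    size = HEADER.size + len(raw)
    rec_len = (size + 3) & ~3
    return HEADER.pack(inode, rec_len, len(raw)) + raw + b"\0" * (rec_len - size)


def parse_directory(block):
    """Return (inode, name) pairs for every used entry in a directory block."""
    entries = []
    offset = 0
    while offset + HEADER.size <= len(block):
        inode, rec_len, name_len = HEADER.unpack_from(block, offset)
        if rec_len < HEADER.size:
            break  # corrupt or zero-filled tail; stop rather than loop forever
        start = offset + HEADER.size
        if inode != 0:  # inode 0 treated as "unused entry" (assumption)
            entries.append((inode, block[start:start + name_len].decode("ascii")))
        offset += rec_len  # rec_len, not name_len, locates the next record
    return entries
```

Stepping by <CODE>rec_len</CODE> rather than by the name length is the point
of the design: an entry can be longer than its name requires.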
|
||||
<P>
|
||||
<P>Go to the <A HREF="ext2fs_7.html">previous</A>, <A HREF="ext2fs_9.html">next</A> section.<P>
|
||||
39
study/sabre/os/files/FileSystems/ext2fs/ext2fs_9.html
Normal file
@@ -0,0 +1,39 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_9.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Allocation algorithms</TITLE>
|
||||
<P>Go to the <A HREF="ext2fs_8.html">previous</A>, <A HREF="ext2fs_10.html">next</A> section.<P>
|
||||
<H1><A NAME="SEC9" HREF="ext2fs_toc.html#SEC9">Allocation algorithms</A></H1>
|
||||
<P>
|
||||
Here are the allocation algorithms that ext2 file system managers
|
||||
<STRONG>must</STRONG> use. We are adamant on this point. Nowadays, many users
|
||||
use more than one operating system on the same computer. If more than
|
||||
one operating system uses the same ext2 partition, they have to use the
|
||||
same allocation algorithms. If they do otherwise, what will happen is
|
||||
that one file system manager will undo the work of the other file system
|
||||
manager. It is useless to have a manager that uses highly efficient
|
||||
allocation algorithms if the other one does not bother with allocation
|
||||
and uses quick and dirty algorithms.
|
||||
<P>
|
||||
Here are the rules used to allocate new inodes:
|
||||
<P>
|
||||
<UL>
|
||||
<LI>the inode for a new file is allocated in the same group as the
|
||||
inode of its parent directory.
|
||||
<P>
|
||||
<LI>inodes are allocated equally between groups.
|
||||
</UL>
|
||||
<P>
|
||||
Here are the rules used to allocate new blocks:
|
||||
<P>
|
||||
<UL>
|
||||
<LI>a new block is allocated in the same group as its inode.
|
||||
<P>
|
||||
<LI>allocate consecutive sequences of blocks.
|
||||
</UL>
|
||||
<P>
|
||||
Of course, it may sometimes be impossible to abide by those rules. In
|
||||
this case, the manager may allocate the block or inode anywhere.
|
||||
<P>Go to the <A HREF="ext2fs_8.html">previous</A>, <A HREF="ext2fs_10.html">next</A> section.<P>
|
||||
29
study/sabre/os/files/FileSystems/ext2fs/index.html
Normal file
@@ -0,0 +1,29 @@
|
||||
<!-- X-URL: http://step.polymtl.ca/~ldd/ext2fs/ext2fs_toc.html -->
|
||||
|
||||
<!-- This HTML file has been created by texi2html 1.29
|
||||
from ext2fs.texi on 3 August 1994 -->
|
||||
|
||||
<TITLE>Analysis of the Ext2fs structure - Table of Contents</TITLE>
|
||||
<H1>Analysis of the Ext2fs structure</H1>
|
||||
<ADDRESS>Louis-Dominique Dubeau</ADDRESS>
|
||||
<P>
|
||||
<UL>
|
||||
<LI><A NAME="SEC1" HREF="ext2fs_1.html#SEC1">Introduction</A>
|
||||
<LI><A NAME="SEC2" HREF="ext2fs_2.html#SEC2">Blocks and Fragments</A>
|
||||
<LI><A NAME="SEC3" HREF="ext2fs_3.html#SEC3">Groups</A>
|
||||
<LI><A NAME="SEC4" HREF="ext2fs_4.html#SEC4">Superblock</A>
|
||||
<LI><A NAME="SEC5" HREF="ext2fs_5.html#SEC5">Group Descriptors</A>
|
||||
<LI><A NAME="SEC6" HREF="ext2fs_6.html#SEC6">Bitmaps</A>
|
||||
<LI><A NAME="SEC7" HREF="ext2fs_7.html#SEC7">Inodes</A>
|
||||
<LI><A NAME="SEC8" HREF="ext2fs_8.html#SEC8">Directories</A>
|
||||
<LI><A NAME="SEC9" HREF="ext2fs_9.html#SEC9">Allocation algorithms</A>
|
||||
<LI><A NAME="SEC10" HREF="ext2fs_10.html#SEC10">Error Handling</A>
|
||||
<LI><A NAME="SEC11" HREF="ext2fs_11.html#SEC11">Formulae</A>
|
||||
<LI><A NAME="SEC12" HREF="ext2fs_12.html#SEC12">Invariants</A>
|
||||
<UL>
|
||||
<LI><A NAME="SEC13" HREF="ext2fs_13.html#SEC13">File Invariants</A>
|
||||
<LI><A NAME="SEC14" HREF="ext2fs_14.html#SEC14">File System Invariants</A>
|
||||
</UL>
|
||||
<LI><A NAME="SEC15" HREF="ext2fs_15.html#SEC15">References</A>
|
||||
<LI><A NAME="SEC16" HREF="ext2fs_16.html#SEC16">Concept Index</A>
|
||||
</UL>
|
||||
BIN
study/sabre/os/files/FileSystems/ext2future.ag.pdf
Normal file
169
study/sabre/os/files/FileSystems/fatFilesystem.txt
Normal file
@@ -0,0 +1,169 @@
|
||||
By: Inbar Raz
|
||||
--------------------------------------------------------------------
|
||||
|
||||
The FAT is a linked-list table that DOS uses to keep track of the physical
|
||||
position of data on a disk and for locating free space for storing new files.
|
||||
|
||||
The word at offset 1aH in a directory entry is a cluster number of the first
|
||||
cluster in an allocation chain. If you locate that cell in the FAT, it will
|
||||
either indicate the end of the chain or the next cell, etc. Observe:
|
||||
|
||||
starting cluster number --|
|
||||
Directory +-------------------+-+-------------------+---+---+-+-+-------+
|
||||
Entry -- |M Y F I L E T X T|a| |tim|dat|08 | size |
|
||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|-+-+-+-+-+
|
||||
+-------------------------+
|
||||
00 01 02 03 04 05 06 07 |8 09 0a 0b 0c 0d 0e 0f
|
||||
+--++--++--++--++--++--++--++--++-++--++--++--++--++--++--++--+
|
||||
00 |ID||ff||03-04-05-ff||00||00||09-0a-0b-15||00||00||00||00|
|
||||
+--++--++--++--++--++--++--++--++--++--++--++|-++--++--++--++--+
|
||||
+-----------------------+
|
||||
+--++--++--++--++--++-++--++--++--++--++--++--++--++--++--++--+
|
||||
10 |00||00||00||00||00||16-17-19||f7||1a-1b-ff||00||00||00||00|
|
||||
+--++--++--++--++--++--++--++|-++--++-++--++--++--++--++--++--+
|
||||
+-------+
|
||||
This diagram illustrates the main concepts of reading the FAT. In it:
|
||||
* The file MYFILE.TXT is 10 clusters long. The first byte is in cluster 08
|
||||
and the last is in cluster 1bH. The chain is 8,9,0a,0b,15,16,17,19,1a,1b.
|
||||
Each entry indicates the next entry in the chain, with a special code in
|
||||
the last entry.
|
||||
* Cluster 18H is marked bad and is not part of any allocation chain.
|
||||
* Clusters 6,7, 0cH-14H, and 1cH-1fH are empty and available for allocation.
|
||||
* Another chain starts at cluster 2 and ends at cluster 5.
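
The chain walk in the diagram can be sketched in a few lines. This is only an
illustration: the table below transcribes the diagram using full 12-bit codes
(0fffH for end of chain, 0ff7H for bad), the value stored in cell 0 is a
placeholder for the ID cell, and example_fat/follow_chain are hypothetical
names.

```python
def example_fat():
    """Rebuild the 32-cell FAT from the diagram as a list of 12-bit values."""
    fat = [0] * 0x20
    fat[0] = 0xFFD          # placeholder for the ID cell (value is an assumption)
    fat[1] = 0xFFF
    for chain in ([2, 3, 4, 5],
                  [0x08, 0x09, 0x0A, 0x0B, 0x15, 0x16, 0x17, 0x19, 0x1A, 0x1B]):
        for here, nxt in zip(chain, chain[1:]):
            fat[here] = nxt      # each cell names the next cluster
        fat[chain[-1]] = 0xFFF   # special code in the last cell
    fat[0x18] = 0xFF7            # bad cluster
    return fat


def follow_chain(fat, start):
    """Return the list of clusters in the chain beginning at `start`."""
    chain, seen, cluster = [], set(), start
    while True:
        if cluster in seen:
            raise ValueError("loop in allocation chain")
        seen.add(cluster)
        chain.append(cluster)
        nxt = fat[cluster]
        if nxt >= 0xFF8:
            return chain         # end-of-chain marker
        if nxt < 2 or nxt > 0xFEF or nxt >= len(fat):
            raise ValueError("chain runs into a free, bad or reserved cell")
        cluster = nxt
```

Calling follow_chain(example_fat(), 8) walks the ten clusters of MYFILE.TXT
in order, skipping the bad cluster 18H exactly as the diagram shows.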
|
||||
|
||||
+-----------+
|
||||
| FAT Facts | The FAT normally starts at logical sector 1 in the DOS partition
|
||||
+-----------+ (eg, you can read it with INT 25H with DX=1). The only way to
|
||||
be sure is to read the boot sector (DX=0), and examine offset 0eH. This
|
||||
tells how many boot and reserved sectors come before the FAT. Use that
|
||||
number (usually 1) in DX to read the FAT via INT 25H.
|
||||
|
||||
There may be more than one copy of the FAT. There are usually two complete
|
||||
copies. If there are two or more, they will all be adjacent (the second FAT
|
||||
directly follows the first).
|
||||
|
||||
You have the following services available to help you determine information
|
||||
about the FAT:
|
||||
|
||||
* Use INT 25H to read the Boot Sector and examine the data fields therein
|
||||
* Use DOS Fn 36H or 1cH to determine total disk sectors and clusters
|
||||
* Use DOS Fn 44H (if the device driver supports Generic IOCTL; DOS 3.2+)
|
||||
* Use DOS Fn 32H to get all kinds of useful information (undocumented)
|
||||
|
||||
Note: The boot sector of non-booting disks (such as network block devices
|
||||
and very old hard disks) may contain nothing but garbage.
|
||||
|
||||
+---------------+
|
||||
| 12-bit/16-bit | The FAT can be laid out in 12-bit or 16-bit entries. 12-bit
|
||||
+---------------+ entries are very efficient for media less than 384K--the
|
||||
entire FAT can fit in a single 512-byte disk sector. For larger media, each
|
||||
FAT entry must map to a larger and larger cluster size--to the point where a
|
||||
20M hard disk would need to allocate in units of 16 sectors in order to use
|
||||
the 12-bit format (in other words, a 1-byte file would take up a full 8K
|
||||
cluster of a disk).
|
||||
|
||||
16-bit FAT entries were introduced with DOS 3.0 with the necessity of
|
||||
efficiently handling the AT's 20-Megabyte hard disk. However, floppy disks
|
||||
and 10M hard disks continue to use the 12-bit layout. You can determine if
|
||||
the FAT is laid out with 12-bit or 16-bit elements:
|
||||
|
||||
DOS 3.0 says: If a disk has more than 4086 (0ff6H) clusters, it uses 16 bits
|
||||
(4095 (0fffH) is the max value for a 12-bit number and >0ff6H is reserved)
|
||||
DOS 3.2 says: If a disk has more than 20740 (5104H) SECTORS, it uses 16 bits
|
||||
(in other words, any disk over 10 Megabytes uses a 16-bit FAT
|
||||
and all others--including large RAM disks--use 12-bits).
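
The two rules quoted above can be written as a pair of small predicates. This
is just a restatement of the thresholds in the text, not DOS code; the
function names are hypothetical.

```python
def fat_bits_dos30(total_clusters):
    """DOS 3.0 rule: more than 4086 (0ff6H) clusters means 16-bit entries."""
    return 16 if total_clusters > 4086 else 12


def fat_bits_dos32(total_sectors):
    """DOS 3.2 rule: more than 20740 (5104H) sectors means 16-bit entries."""
    return 16 if total_sectors > 20740 else 12
```

Note that the two rules look at different quantities (clusters versus
sectors), so a program that must handle both DOS versions needs both counts.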
|
||||
|
||||
Note: It's a common misconception that the 16-bit FAT allows DOS to work with
|
||||
disks larger than 32 Megabytes. In fact, the limiting factor is that
|
||||
INT 25H/26H (through which DOS performs its disk I/O) is unable to
|
||||
access a SECTOR number higher than 65535. Normally, sectors are 512
|
||||
bytes (1/2K), so that sets the 32M limit.
|
||||
|
||||
In DOS 4.0, INT 25H/26H supports a technique for accessing sector numbers
higher than 65535, and thus supports trans-32M DOS partitions. This
|
||||
has no effect on the layout of the FAT itself. Using 16-bit FAT
|
||||
entries and 4-sector clusters, DOS now supports partitions up to 134M
|
||||
(twice that for 8-sector clusters, etc.).
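
The size limits quoted in this note follow from simple multiplication. The
sketch below reproduces the arithmetic; it treats all 65536 16-bit values as
usable clusters, which slightly overstates the real limit because a few values
are reserved. Function names are hypothetical.

```python
SECTOR_BYTES = 512


def max_bytes_by_sector_number(sector_bytes=SECTOR_BYTES):
    """Sector numbers 0..65535 give 65536 addressable sectors."""
    return 65536 * sector_bytes


def max_bytes_by_fat16(sectors_per_cluster, sector_bytes=SECTOR_BYTES):
    """Upper bound for a 16-bit FAT, ignoring the reserved entry values."""
    return 65536 * sectors_per_cluster * sector_bytes
```

With 512-byte sectors the first function gives 32 * 1024 * 1024 bytes (the
32M limit), and the second gives 134,217,728 bytes for 4-sector clusters,
which is the "134M" figure above.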
|
||||
|
||||
+-----------------+
|
||||
| Reading the FAT | To read the value of any entry in a FAT (as when following
|
||||
+-----------------+ a FAT chain), first read the entire FAT into memory and
|
||||
obtain a starting cluster number from a directory. Then, for 12-bit entries:
|
||||
* Multiply the cluster number by 3
* Divide the result by 2, discarding the remainder (each entry is 1.5 (3/2) bytes long)
|
||||
* Read the WORD at the resulting address (as offset from the start of the FAT)
|
||||
* If the cluster number was even, mask the value with 0fffH (keep the low 12 bits)
|
||||
If the cluster number was odd, shift the value right by 4 bits (keep the
|
||||
upper 12 bits)
|
||||
* The result is the entry for the next cluster in the chain (0fffH=the end).
|
||||
|
||||
Note: A 12-bit entry can cross over a sector boundary, so be careful with
|
||||
1-sector FAT buffering schemes.
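
The 12-bit lookup steps translate directly into code. A minimal sketch,
assuming the whole FAT has been read into a bytes object and that the WORD is
little-endian (as on the PC); fat12_entry is a hypothetical name.

```python
def fat12_entry(fat, cluster):
    """Return the 12-bit FAT entry for `cluster` from the raw FAT bytes."""
    offset = cluster * 3 // 2                    # multiply by 3, divide by 2
    word = fat[offset] | (fat[offset + 1] << 8)  # little-endian WORD at that offset
    if cluster % 2 == 0:
        return word & 0x0FFF                     # even: keep the low 12 bits
    return word >> 4                             # odd: keep the upper 12 bits
```

Because the function reads two adjacent bytes, a caller that buffers the FAT
one sector at a time must make sure both bytes are available when an entry
straddles a sector boundary.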
|
||||
|
||||
16-bit entries are simpler--the entry for a cluster is the WORD at byte offset
(cluster number * 2) from the start of the FAT, and it holds the number of the
next cluster in the chain (0ffffH indicates the end).
|
||||
|
||||
+-------------+
|
||||
| FAT Content | The first byte of the FAT is called the Media Descriptor or
|
||||
+-------------+ FAT ID byte. The next 2 bytes (12-bit FATs) or 3 bytes
|
||||
(16-bit FATs) are 0ffH. The rest of the FAT is composed of 12-bit or 16-bit
|
||||
cells that each represent one disk cluster. These cells will contain one of
|
||||
the following values:
|
||||
|
||||
* (0)000H ................... an available cluster
|
||||
* (f)ff0H through (f)ff6H ... a reserved cluster
|
||||
* (f)ff7H ................... a bad cluster
|
||||
* (f)ff8H through (f)fffH ... the end of an allocation chain
|
||||
* (0)002H through (f)fefH ... the number of the next cluster in a chain
|
||||
|
||||
Note: the high nibble of the value is used only in 16-bit FATs; eg, a bad
|
||||
cluster is marked with 0ff7H in 12-bit FATs, and fff7H with 16-bit FATs.
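
The value ranges above can be folded into one classifier that works for both
entry widths. In this sketch (f)ff7H is treated as the bad-cluster marker
only, so the reserved range is taken to stop at (f)ff6H; a value of 1, which
the list does not cover, is reported as "invalid". classify_fat_entry is a
hypothetical name.

```python
def classify_fat_entry(value, bits=12):
    """Classify a FAT cell value for a 12-bit or 16-bit FAT."""
    if bits not in (12, 16):
        raise ValueError("bits must be 12 or 16")
    top = (1 << bits) - 1            # 0fffH or ffffH
    if value == 0:
        return "free"
    if 2 <= value <= top - 16:       # (0)002H .. (f)fefH
        return "next"
    if top - 15 <= value <= top - 9: # (f)ff0H .. (f)ff6H
        return "reserved"
    if value == top - 8:             # (f)ff7H
        return "bad"
    if top - 7 <= value <= top:      # (f)ff8H .. (f)fffH
        return "end"
    return "invalid"
```

The same 16-bit number can mean different things depending on the FAT width:
0ff7H is a bad-cluster mark in a 12-bit FAT but an ordinary next-cluster
number in a 16-bit FAT.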
|
||||
|
||||
+------------------------------------------------+
|
||||
| Converting a Cluster Number to a Sector Number | After you obtain a file's
|
||||
+------------------------------------------------+ starting cluster number
|
||||
from a directory entry you will want to locate the actual disk sector that
|
||||
holds the file (or subdirectory) data.
|
||||
|
||||
A diskette (or a DOS partition of a hard disk) is laid out like so:
|
||||
|
||||
* Boot and reserved sector(s)
|
||||
* FAT #1
|
||||
* FAT #2 (optional -- not used on RAM disks)
|
||||
* root directory
|
||||
* data area (all file data reside here, including files for directories)
|
||||
|
||||
Every section of this layout is variable and the sizes of each section must
|
||||
be known in order to perform a correct cluster-to-sector conversion. The
|
||||
following formulae represent the only documented method of determining a DOS
|
||||
logical sector number from a cluster number:
|
||||
|
||||
RootDirSectors = (rootDirEntries * 32) / sectorBytes
|
||||
FatSectors = fatCount * sectorsPerFat
|
||||
DataStart = reservedSectors + FatSectors + RootDirSectors
|
||||
|
||||
INT 25h/26h Sector = DataStart + ((AnyClusterNumber-2) * sectorsPerCluster)
|
||||
|
||||
Where the variables:
|
||||
|
||||
sectorBytes sectorsPerFat fatCount
|
||||
rootDirEntries reservedSectors sectorsPerCluster
|
||||
|
||||
are obtained from the Boot Sector or from a BPB (if you can get access). The
|
||||
resulting sector number can be used in DX for INT 25H/26H DOS absolute disk
|
||||
access.
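
As a sketch of the conversion, the function below takes the six boot-sector
values as keyword arguments and returns the logical sector of a cluster. It
computes the root directory size as the number of entries times 32 bytes,
divided by the sector size and rounded up. cluster_to_sector is a hypothetical
name, and the sample figures in the usage note are illustrative values typical
of a 1.44M diskette, not taken from the text.

```python
def cluster_to_sector(cluster, *, sector_bytes, reserved_sectors, fat_count,
                      sectors_per_fat, root_dir_entries, sectors_per_cluster):
    """Return the DOS logical sector number where `cluster` begins."""
    if cluster < 2:
        raise ValueError("data clusters are numbered from 2")
    # Each root directory entry is 32 bytes; round up to whole sectors.
    root_dir_sectors = (root_dir_entries * 32 + sector_bytes - 1) // sector_bytes
    fat_sectors = fat_count * sectors_per_fat
    data_start = reserved_sectors + fat_sectors + root_dir_sectors
    return data_start + (cluster - 2) * sectors_per_cluster
```

With 512-byte sectors, 1 reserved sector, 2 FATs of 9 sectors each, 224 root
entries and 1 sector per cluster, the root directory takes 14 sectors and
cluster 2 starts at logical sector 1 + 18 + 14 = 33.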
|
||||
|
||||
If you are a daring sort of person, you can save trouble by using the
|
||||
undocumented DOS Fn 32H (Get Disk Info) which provides a package of pre-
|
||||
calculated data, including the sector number of the start of file data (it
|
||||
gives you "DataStart", in the above equation).
|
||||
|
||||
Author's note: The best use I've found for all this information is in
|
||||
directory scanning; ie, to bypass the DOS file-searching services and read
|
||||
directory sectors directly. For a program that must obtain a list of all
|
||||
files and directories, direct access of directory sectors will work roughly
|
||||
twice as fast as DOS Fns 4eH and 4fH.
|
||||
|
||||
|
||||
BIN
study/sabre/os/files/FileSystems/ih99-stegfs.pdf
Normal file
7
study/sabre/os/files/FileSystems/index.html
Normal file
@@ -0,0 +1,7 @@
|
||||
<html>
|
||||
<head>
|
||||
<meta http-equiv="refresh" content="0;url=/Linux.old/sabre/os/articles">
|
||||
</head>
|
||||
<body lang="zh-CN">
|
||||
</body>
|
||||
</html>
|
||||
531
study/sabre/os/files/FileSystems/iso9660.html
Normal file
@@ -0,0 +1,531 @@
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<TITLE>ISO9660 Simplified for DOS/Windows</TITLE>
|
||||
<META name="description"
|
||||
content="A simplified description of the ISO9660 file specification as
|
||||
used on CD-ROM disks with DOS and Windows.">
|
||||
</HEAD>
|
||||
<BODY>
|
||||
|
||||
<H2>ISO9660 Simplified for DOS/Windows<BR>
|
||||
by Philip J. Erdelsky</H2>
|
||||
|
||||
<H4>1. Introduction</H4>
|
||||
|
||||
<P>We weren't sure about it a few years ago, but by now it should be
|
||||
clear to everyone that CD-ROM's are here to stay. Most PC's are equipped
|
||||
with CD-ROM readers, and most major PC software packages are being
|
||||
distributed on CD-ROM's.
|
||||
|
||||
<P>Under DOS (and Windows, which uses the DOS file system) files are
|
||||
written to both hard and floppy disks with a so-called FAT (File
|
||||
Allocation Table) file system.
|
||||
|
||||
<P>Files on a CD-ROM, however, are written to a different standard,
|
||||
called ISO9660. ISO9660 is rather complex and poorly written, and
|
||||
obviously contains a number of diplomatic compromises among advocates of
|
||||
DOS, UNIX, MVS and perhaps other operating systems.
|
||||
|
||||
<P>The simplified version presented here includes only features that
|
||||
would normally be found on a CD-ROM to be used in a DOS system and which
|
||||
are supported by the Microsoft MS-DOS CD-ROM Extensions (MSCDEX). It is
|
||||
based on ISO9660, on certain documents regarding MSCDEX (version 2.10),
|
||||
and on the contents of some actual CD-ROM's.
|
||||
|
||||
<P>Where a field has a specific value on a CD-ROM to be used with DOS,
|
||||
that value is given in this document. However, in some cases a brief
|
||||
description of values for use with other operating systems is given in
|
||||
square brackets.
|
||||
|
||||
<P>ISO9660 makes provisions for sets of CD-ROM's, and apparently even
|
||||
permits a file system to span more than one CD-ROM. However, this
|
||||
feature is not supported by MSCDEX.
|
||||
|
||||
|
||||
<H4>2. Files</H4>
|
||||
|
||||
<P>The directory structure on a CD-ROM is almost exactly like that on a
|
||||
DOS floppy or hard disk. (It is presumed that the reader of this
|
||||
document is reasonably familiar with the DOS file system.) For this
|
||||
reason, DOS and Windows applications can read files from a CD-ROM just
|
||||
as they would from a floppy or hard disk.
|
||||
|
||||
<P>There are only a few differences, which do not affect most
|
||||
applications:
|
||||
|
||||
<OL>
|
||||
<LI>The root directory contains the notorious "." and ".." entries,
|
||||
just like any other directory.
|
||||
|
||||
<LI>There is no limit, other than disk capacity, to the size of the
|
||||
root directory.
|
||||
|
||||
<LI>The depth of directory nesting is limited to eight levels,
|
||||
including the root. For example, if drive E: contains a CD-ROM,
|
||||
a file such as E:\D2\D3\D4\D5\D6\D7\D8\FOO.TXT is permitted but
|
||||
E:\D2\D3\D4\D5\D6\D7\D8\D9\FOO.TXT is not.
|
||||
|
||||
<LI>If a CD-ROM is to be used by a DOS system, file names and
|
||||
extensions must be limited to eight and three characters,
|
||||
respectively, even though ISO9660 permits longer names and
|
||||
extensions.
|
||||
|
||||
<LI>ISO9660 permits only capital letters, digits and underscores in
|
||||
a file or directory name or extension, but DOS also permits a
|
||||
number of other punctuation marks.
|
||||
|
||||
<LI>ISO9660 permits a file to have an extension but not a name, but
|
||||
DOS does not.
|
||||
|
||||
<LI>DOS permits a directory to have an extension, but ISO9660 does
|
||||
not.
|
||||
|
||||
<LI>Directories on a CD-ROM are always sorted, as described below.
|
||||
</OL>
|
||||
|
||||
<P>Of course, neither DOS, nor UNIX, nor any other operating system can
|
||||
WRITE files to a CD-ROM as it would to a floppy or hard disk, because a
|
||||
CD-ROM is not rewritable. Files must be written to the CD-ROM by a
|
||||
special program with special hardware.
|
||||
|
||||
|
||||
<H4>3. Sectors</H4>
|
||||
|
||||
<P>The information on a CD-ROM is divided into sectors, which are
|
||||
numbered consecutively, starting with zero. There are no gaps in the
|
||||
numbering.
|
||||
|
||||
<P>Each sector contains 2048 8-bit bytes. (ISO9660 apparently permits
|
||||
other sector sizes, but the 2048-byte size seems to be universal.)
|
||||
|
||||
<P>When a number of sectors are to be read from the CD-ROM, they should
|
||||
be read in order of increasing sector number, if possible, since that is
|
||||
the order in which they pass under the read head as the CD-ROM rotates.
|
||||
Most implementations arrange the information so sectors will be read in
|
||||
this order for typical file operations, although ISO9660 does not
|
||||
require this in all cases.
|
||||
|
||||
<P>The order of bytes within a sector is considered to be the order in
|
||||
which they appear when read into memory; i.e., the "first" bytes are
|
||||
read into the lowest memory addresses. This is also the order used in
|
||||
this document; i.e., the "first" bytes in any list appear at the top of
|
||||
the list.
|
||||
|
||||
|
||||
<H4>4. Character Sets</H4>
|
||||
|
||||
<P>Names and extensions of files and directories, the volume name, and
|
||||
some other names are expressed in standard ASCII character codes
|
||||
(although ISO9660 does not use the name ASCII). According to ISO9660,
|
||||
only capital letters, digits, and underscores are permitted. However,
|
||||
DOS permits some other punctuation marks, which are sometimes found on
|
||||
CD-ROM's, in apparent defiance of ISO9660.
|
||||
|
||||
<P>MSCDEX does offer support for the kanji (Japanese) character set.
|
||||
However, this document does not cover kanji.
|
||||
|
||||
|
||||
<H4>5. Sorting Names or Extensions</H4>
|
||||
|
||||
<P>Where ISO9660 requires file or directory names or extensions to be
|
||||
sorted, the usual ASCII collating sequence is used. That is, two
|
||||
different names or extensions are compared as follows:
|
||||
|
||||
<OL>
|
||||
<LI>ASCII blanks (32) are added to the right end of the shorter
|
||||
name or extension, if necessary, to make it as long as the
|
||||
longer name or extension.
|
||||
|
||||
<LI>The first (leftmost) position in which the names or extensions
|
||||
are not identical determines the order. The name or extension
|
||||
with the lower ASCII code in that position appears first in the
|
||||
sorted order.
|
||||
</OL>
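
<P>The two comparison steps above can be sketched as a comparison function.
This is an illustration rather than code from any ISO9660 implementation;
<CODE>compare_names</CODE> is a hypothetical name, and the inputs are assumed
to be plain ASCII strings.

```python
from functools import cmp_to_key


def compare_names(a, b):
    """Return -1, 0 or 1 according to the padded ASCII ordering."""
    width = max(len(a), len(b))
    # Step 1: pad the shorter string on the right with ASCII blanks (32).
    padded_a, padded_b = a.ljust(width), b.ljust(width)
    # Step 2: the first differing position decides; Python compares strings
    # code point by code point, which matches ASCII order for ASCII input.
    return (padded_a > padded_b) - (padded_a < padded_b)


def sort_names(names):
    """Sort a list of names or extensions into the padded ASCII order."""
    return sorted(names, key=cmp_to_key(compare_names))
```

Since the underscore (95) sorts after every capital letter, README comes
before READ_ME in this order.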
|
||||
|
||||
|
||||
<H4>6. Multiple-Byte Values</H4>
|
||||
|
||||
<P>A 16-bit numeric value (usually called a word) may be represented on
|
||||
a CD-ROM in any of three ways:
|
||||
|
||||
<DL>
|
||||
<DT>Little Endian Word: <DD>The value occupies two consecutive bytes, with
|
||||
the less significant byte first.
|
||||
|
||||
<DT>Big Endian Word: <DD>The value occupies two consecutive bytes, with
|
||||
the more significant byte first.
|
||||
|
||||
<DT>Both Endian Word: <DD>The value occupies FOUR consecutive bytes; the
|
||||
first and second bytes contain the value expressed as a little
|
||||
endian word, and the third and fourth bytes contain the same
|
||||
value expressed as a big endian word.
|
||||
</DL>
|
||||
|
||||
<P>A 32-bit numeric value (usually called a double word) may be
|
||||
represented on a CD-ROM in any of three ways:
|
||||
|
||||
<DL>
|
||||
<DT>Little Endian Double Word: <DD>The value occupies four consecutive
|
||||
bytes, with the least significant byte first and the other bytes
|
||||
in order of increasing significance.
|
||||
|
||||
<DT>Big Endian Double Word: <DD>The value occupies four consecutive bytes,
|
||||
with the most significant first and the other bytes in order of
|
||||
decreasing significance.
|
||||
|
||||
<DT>Both Endian Double Word: <DD>The value occupies EIGHT consecutive
|
||||
bytes; the first four bytes contain the value expressed as a
|
||||
little endian double word, and the last four bytes contain the
|
||||
same value expressed as a big endian double word.
|
||||
</DL>
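
<P>A both endian value is simply the little endian form followed by the big
endian form. The sketch below packs and unpacks both widths with Python's
<CODE>struct</CODE> module; the helper names are hypothetical, and rejecting a
value whose two halves disagree is a design choice of this sketch, not a
requirement stated here.

```python
import struct


def pack_both_word(value):
    """16-bit value as 4 bytes: little endian word, then big endian word."""
    return struct.pack("<H", value) + struct.pack(">H", value)


def pack_both_dword(value):
    """32-bit value as 8 bytes: little endian double word, then big endian."""
    return struct.pack("<I", value) + struct.pack(">I", value)


def unpack_both_word(data):
    little, = struct.unpack("<H", data[0:2])
    big, = struct.unpack(">H", data[2:4])
    if little != big:
        raise ValueError("little and big endian halves disagree")
    return little


def unpack_both_dword(data):
    little, = struct.unpack("<I", data[0:4])
    big, = struct.unpack(">I", data[4:8])
    if little != big:
        raise ValueError("little and big endian halves disagree")
    return little
```

For instance, the sector size 2048 (0800 hex) is stored as the four bytes
00 08 08 00.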
|
||||
|
||||
<H4>7. The First Sixteen Sectors are Empty</H4>
|
||||
|
||||
<P>The first sixteen sectors (sector numbers 0 to 15, inclusive) contain
|
||||
nothing but zeros. ISO9660 does not define the contents of these
|
||||
sectors, but for DOS they are apparently always written as zeros. They
|
||||
are apparently reserved for use by systems that can be booted from a
|
||||
CD-ROM.
|
||||
|
||||
|
||||
<H4>8. The Volume Descriptors</H4>
|
||||
|
||||
<P>Sector 16 and a few of the following sectors contain a series of
|
||||
volume descriptors. There are several kinds of volume descriptor, but
|
||||
only two are normally used with DOS. Each volume descriptor occupies
|
||||
exactly one sector.
|
||||
|
||||
<P>The last volume descriptors in the series are one or more Volume
|
||||
Descriptor Set Terminators. The first seven bytes of a Volume Descriptor
|
||||
Set Terminator are 255, 67, 68, 48, 48, 49 and 1, respectively. The
|
||||
other 2041 bytes are zeros. (The middle bytes are the ASCII codes for
|
||||
the characters CD001.)
|
||||
|
||||
<P>The only volume descriptor of real interest under DOS is the Primary
|
||||
Volume Descriptor. There must be at least one, and there is usually only
|
||||
one. However, some CD-ROM's have two or more identical Primary Volume
|
||||
Descriptors. The contents of a Primary Volume Descriptor are as follows:
|
||||
|
||||
<pre>
|
||||
length
|
||||
in bytes contents
|
||||
-------- ---------------------------------------------------------
|
||||
1 1
|
||||
6 67, 68, 48, 48, 49 and 1, respectively (same as Volume
|
||||
Descriptor Set Terminator)
|
||||
1 0
|
||||
32 system identifier
|
||||
32 volume identifier
|
||||
8 zeros
|
||||
8 total number of sectors, as a both endian double word
|
||||
32 zeros
|
||||
4 1, as a both endian word [volume set size]
|
||||
4 1, as a both endian word [volume sequence number]
|
||||
4 2048 (the sector size), as a both endian word
|
||||
8 path table length in bytes, as a both endian double word
|
||||
4 number of first sector in first little endian path table,
|
||||
as a little endian double word
|
||||
4 number of first sector in second little endian path table,
|
||||
as a little endian double word, or zero if there is no
|
||||
second little endian path table
|
||||
4 number of first sector in first big endian path table,
|
||||
as a big endian double word
|
||||
4 number of first sector in second big endian path table,
|
||||
as a big endian double word, or zero if there is no
|
||||
second big endian path table
|
||||
34 root directory record, as described below
|
||||
128 volume set identifier
|
||||
128 publisher identifier
|
||||
128 data preparer identifier
|
||||
128 application identifier
|
||||
37 copyright file identifier
|
||||
37 abstract file identifier
|
||||
37 bibliographical file identifier
|
||||
17 date and time of volume creation
|
||||
17 date and time of most recent modification
|
||||
17 date and time when volume expires
|
||||
17 date and time when volume is effective
|
||||
1 1
|
||||
1 0
|
||||
512 reserved for application use (usually zeros)
|
||||
653 zeros
|
||||
</pre>
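
<P>Summing the lengths in the table gives the byte offset of each field, which
is enough to pull the interesting values out of a raw sector. The sketch below
reads only a handful of fields; the offsets are derived from the table above,
and the function and key names are hypothetical.

```python
import struct

SECTOR_SIZE = 2048


def parse_primary_volume_descriptor(sector):
    """Extract a few fields from a 2048-byte Primary Volume Descriptor."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("a volume descriptor occupies exactly one sector")
    if sector[0] != 1 or sector[1:7] != b"CD001\x01":
        raise ValueError("not a Primary Volume Descriptor")

    def both(fmt, offset):
        # Read a both endian value and check that the two copies agree.
        size = struct.calcsize(fmt)
        little, = struct.unpack_from("<" + fmt, sector, offset)
        big, = struct.unpack_from(">" + fmt, sector, offset + size)
        if little != big:
            raise ValueError("both endian halves disagree at offset %d" % offset)
        return little

    volume_id = sector[40:72].decode("ascii").rstrip()
    return {
        "system_id": sector[8:40].decode("ascii").rstrip(),
        "volume_id": volume_id,
        "dos_label": volume_id[:11],          # first 11 characters, as DOS reports
        "total_sectors": both("I", 80),
        "sector_size": both("H", 128),
        "path_table_bytes": both("I", 132),
        "le_path_table_sector": struct.unpack_from("<I", sector, 140)[0],
        "be_path_table_sector": struct.unpack_from(">I", sector, 148)[0],
        "root_directory_record": bytes(sector[156:190]),
    }
```

A reader would typically start at sector 16 and call this on each sector in
turn until a descriptor with first byte 1 is found or a terminator (first byte
255) is reached.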
|
||||
|
||||
<P>The first 11 characters of the volume identifier are returned as the
|
||||
volume identifier by standard DOS system calls and utilities.
|
||||
|
||||
<P>Other identifiers are not used by DOS, and may be filled with ASCII
|
||||
blanks (32).
|
||||
|
||||
<P>Each date and time field is of the following form:
|
||||
|
||||
<pre>
|
||||
length
|
||||
in bytes contents
|
||||
-------- ---------------------------------------------------------
|
||||
4 year, as four ASCII digits
|
||||
2 month, as two ASCII digits, where
|
||||
01=January, 02=February, etc.
|
||||
2 day of month, as two ASCII digits, in the range
|
||||
from 01 to 31
|
||||
2 hour, as two ASCII digits, in the range from 00 to 23
|
||||
2 minute, as two ASCII digits, in the range from 00 to 59
|
||||
2 second, as two ASCII digits, in the range from 00 to 59
|
||||
2 hundredths of a second, as two ASCII digits, in the range
|
||||
from 00 to 99
|
||||
1 offset from Greenwich Mean Time, in 15-minute intervals,
|
||||
as a twos complement signed number, positive for time
|
||||
zones east of Greenwich, and negative for time zones
|
||||
west of Greenwich
|
||||
</pre>
|
||||
|
||||
<P>If the date and time are not specified, the first 16 bytes are all
|
||||
ASCII zeros (48), and the last byte is zero.
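
<P>A 17-byte date and time field can be decoded as follows. This is a sketch
under the format described above; <CODE>parse_volume_datetime</CODE> is a
hypothetical name, and returning <CODE>None</CODE> for an unspecified date is
a choice made for the example.

```python
from datetime import datetime, timedelta, timezone


def parse_volume_datetime(field):
    """Decode a 17-byte volume descriptor date/time; None if unspecified."""
    if len(field) != 17:
        raise ValueError("date and time fields are 17 bytes long")
    digits, offset_byte = field[:16], field[16]
    if digits == b"0" * 16 and offset_byte == 0:
        return None
    text = digits.decode("ascii")
    # The last byte is a twos complement count of 15-minute intervals.
    intervals = offset_byte - 256 if offset_byte > 127 else offset_byte
    zone = timezone(timedelta(minutes=15 * intervals))
    return datetime(
        int(text[0:4]), int(text[4:6]), int(text[6:8]),     # year, month, day
        int(text[8:10]), int(text[10:12]), int(text[12:14]),  # hour, minute, second
        int(text[14:16]) * 10000,                            # hundredths -> microseconds
        tzinfo=zone,
    )
```

An offset byte of 236 (EC hex) is -20 as a signed number, i.e. five hours west
of Greenwich.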
|
||||
|
||||
<P>Other kinds of Volume Descriptors (which are normally ignored by DOS)
|
||||
have the following format:
|
||||
|
||||
<pre>
|
||||
length
|
||||
in bytes contents
|
||||
-------- ---------------------------------------------------------
|
||||
1 neither 1 nor 255
|
||||
6 67, 68, 48, 48, 49 and 1, respectively (same as Volume
|
||||
Descriptor Set Terminator)
|
||||
2041 other things
|
||||
</pre>
|
||||
|
||||
<H4>9. Path Tables</H4>
|
||||
|
||||
<P>The path tables normally come right after the volume descriptors.
|
||||
However, ISO9660 merely requires that each path table begin in the
|
||||
sector specified by the Primary Volume Descriptor.
|
||||
|
||||
<P>The path tables are actually redundant, since all of the information
|
||||
contained in them is also stored elsewhere on the CD-ROM. However, their
|
||||
use can make directory searches much faster.
|
||||
|
||||
<P>There are two kinds of path table -- a little endian path table, in
|
||||
which multiple-byte values are stored in little endian order, and a big
|
||||
endian path table, in which multiple-byte values are stored in big
|
||||
endian order. The two kinds of path tables are identical in every other
|
||||
way.
|
||||
|
||||
<P>A path table contains one record for each directory on the CD-ROM
|
||||
(including the root directory). The format of a record is as follows:
|
||||
|
||||
<pre>
|
||||
length
|
||||
in bytes contents
|
||||
-------- ---------------------------------------------------------
|
||||
1 N, the name length (or 1 for the root directory)
|
||||
1 0 [number of sectors in extended attribute record]
|
||||
4 number of the first sector in the directory, as a
|
||||
double word
|
||||
2 number of record for parent directory (or 1 for the root
|
||||
directory), as a word; the first record is number 1,
|
||||
the second record is number 2, etc.
|
||||
N name (or 0 for the root directory)
|
||||
0 or 1 padding byte: if N is odd, this field contains a zero; if
|
||||
N is even, this field is omitted
|
||||
</pre>
|
||||
|
||||
<P>According to ISO9660, a directory name consists of at least one and
|
||||
not more than 31 capital letters, digits and underscores. For DOS the
|
||||
upper limit is eight characters.
|
||||
|
||||
<P>A path table occupies as many consecutive sectors as may be required
|
||||
to hold all its records. The first record always begins in the first
|
||||
byte of the first sector. Except for the single byte described above, no
|
||||
padding is used between records; hence the last record in a sector is
|
||||
usually continued in the following sector. The unused part of the
|
||||
last sector is filled with zeros.
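
<P>Given the bytes of a path table (its length comes from the Primary Volume
Descriptor), the records can be walked as sketched below. The function name
and the dictionary keys are hypothetical; stopping at a zero name length is an
assumption that relies on the zero fill at the end of the last sector.

```python
import struct


def parse_path_table(data, little_endian=True):
    """Return the path table records found in `data`, in table order."""
    fmt = "<IH" if little_endian else ">IH"   # first sector, parent record number
    records = []
    offset = 0
    while offset < len(data) and data[offset] != 0:
        name_len = data[offset]
        first_sector, parent = struct.unpack_from(fmt, data, offset + 2)
        name = data[offset + 8:offset + 8 + name_len]
        records.append({
            # The root directory's name is a single zero byte.
            "name": "" if name == b"\x00" else name.decode("ascii"),
            "first_sector": first_sector,
            "parent": parent,
        })
        # 8 fixed bytes, the name, and one padding byte when N is odd.
        offset += 8 + name_len + (name_len % 2)
    return records
```

Record numbers start at 1, so the parent of record <CODE>r</CODE> is
<CODE>records[r["parent"] - 1]</CODE> in the returned list.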
|
||||
|
||||
<P>The records in a path table are arranged in a precisely specified
order. For this purpose, each directory has an associated number called
its level. The level of the root directory is 1. The level of each other
directory is one greater than the level of its parent. As noted above,
ISO9660 does not permit levels greater than 8.

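<P>Levels can be recovered from the parent record numbers alone. The
following sketch assumes the parent numbers have already been extracted
from the path table in record order; the function name is hypothetical.

```python
def directory_levels(parents):
    """Compute the level of every path table record.

    parents[i] is the parent record number (1-based) stored in record
    i + 1. The first record is the root directory, whose level is 1.
    This relies on every parent appearing before its children.
    """
    levels = []
    for index, parent in enumerate(parents):
        if index == 0:
            levels.append(1)                       # root directory
        else:
            levels.append(levels[parent - 1] + 1)  # one deeper than parent
    return levels
```

A table is within the ISO9660 limit when
<code>max(directory_levels(parents)) &lt;= 8</code>.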
<P>The relative positions of any two records are determined as follows:

<OL>
<LI>If the levels are different, the directory with the lower level
appears first. In particular, this implies that the root
directory is always represented by the first record in the
table, because it is the only directory with level 1.

<LI>If the levels are identical, but the directories have different
parents, then the directories are in the same relative
positions as their parents.

<LI>Directories with the same level and the same parent are
arranged in the order obtained by sorting on their names, as
described in Section 5.
</OL>


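<P>The three rules amount to a breadth-first walk of the directory tree
in which the children of each directory are visited in name order. The
sketch below assumes the tree is given as nested dictionaries and that
plain ASCII comparison is an adequate stand-in for the name sorting of
Section 5; both the input shape and the function name are hypothetical.

```python
from collections import deque


def path_table_order(tree):
    """Return [(name, parent_number), ...] in path table order.

    tree maps each directory name to a dict of its subdirectories.
    The record number of an entry is its list index plus one.
    """
    records = [("\x00", 1)]           # root: name 0, parent number 1
    queue = deque([(tree, 1)])        # (children, record number of owner)
    while queue:
        children, number = queue.popleft()
        # Assumption: ASCII order approximates the Section 5 ordering.
        for name in sorted(children):
            records.append((name, number))
            queue.append((children[name], len(records)))
    return records
```

For a tree with top-level directories A and B, where A contains Y and Z
and B contains X, the resulting order is root, A, B, Y, Z, X.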
<H4>10. Directories</H4>

<P>A directory consists of a series of directory records in one or more
consecutive sectors. However, unlike path table records, directory records
may not straddle sector boundaries. There may be unused space at the end of
each sector, which is filled with zeros.

<P>Each directory record represents a file or directory. Its format is
as follows:

<pre>
  length
  in bytes  contents
  --------  ---------------------------------------------------------
     1      R, the number of bytes in the record (which must be even)
     1      0 [number of sectors in extended attribute record]
     8      number of the first sector of file data or directory
              (zero for an empty file), as a both endian double word
     8      number of bytes of file data or length of directory,
              excluding the extended attribute record,
              as a both endian double word
     1      number of years since 1900
     1      month, where 1=January, 2=February, etc.
     1      day of month, in the range from 1 to 31
     1      hour, in the range from 0 to 23
     1      minute, in the range from 0 to 59
     1      second, in the range from 0 to 59
              (for DOS this is always an even number)
     1      offset from Greenwich Mean Time, in 15-minute intervals,
              as a two's complement signed number, positive for time
              zones east of Greenwich, and negative for time zones
              west of Greenwich (DOS ignores this field)
     1      flags, with bits as follows:
              bit     value
              ------  ------------------------------------------
              0 (LS)  0 for a normal file, 1 for a hidden file
              1       0 for a file, 1 for a directory
              2       0 [1 for an associated file]
              3       0 [1 for record format specified]
              4       0 [1 for permissions specified]
              5       0
              6       0
              7 (MS)  0 [1 if not the final record for the file]
     1      0 [file unit size for an interleaved file]
     1      0 [interleave gap size for an interleaved file]
     4      1, as a both endian word [volume sequence number]
     1      N, the identifier length
     N      identifier
     P      padding byte: if N is even, P = 1 and this field contains
              a zero; if N is odd, P = 0 and this field is omitted
  R-33-N-P  unspecified field for system use; must contain an even
              number of bytes
</pre>

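<P>A directory record can be decoded field by field from the offsets
implied by the table. The code below is a sketch under that reading of
the layout; the function names and dictionary keys are hypothetical, and
the consistency check on both-endian values is an extra safeguard rather
than something the format requires of a reader.

```python
import struct


def both_endian_u32(buf, offset):
    """Read a double word stored little endian first, then big endian."""
    little = struct.unpack_from("<I", buf, offset)[0]
    big = struct.unpack_from(">I", buf, offset + 4)[0]
    if little != big:
        raise ValueError("both-endian halves disagree")
    return little


def parse_directory_record(buf, offset=0):
    """Decode the directory record starting at buf[offset]."""
    length = buf[offset]                                   # R
    years, month, day, hour, minute, second = buf[offset + 18:offset + 24]
    gmt_quarters = struct.unpack_from("b", buf, offset + 24)[0]  # signed
    flags = buf[offset + 25]
    id_len = buf[offset + 32]                              # N
    id_start = offset + 33
    pad = 1 if id_len % 2 == 0 else 0                      # P
    return {
        "length": length,
        "first_sector": both_endian_u32(buf, offset + 2),
        "data_length": both_endian_u32(buf, offset + 10),
        "timestamp": (1900 + years, month, day, hour, minute, second),
        "gmt_offset_minutes": gmt_quarters * 15,
        "hidden": bool(flags & 0x01),
        "is_directory": bool(flags & 0x02),
        "more_records": bool(flags & 0x80),
        "identifier": bytes(buf[id_start:id_start + id_len]),
        "system_use": bytes(buf[id_start + id_len + pad:offset + length]),
    }
```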
<P>The length of a directory includes the unused space, if any, at the
ends of sectors. Hence it is always an exact multiple of 2048 (the
sector size). Since every directory, even a nominally empty one,
contains at least two records, the length of a directory is never zero.

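<P>Since records never straddle sectors and unused space is zero filled,
a reader can walk a directory using only the record length byte. The
sketch below assumes the complete directory has been read into memory and
treats a zero length byte as the start of the unused tail of a sector;
the names are hypothetical.

```python
SECTOR_SIZE = 2048


def iter_directory_records(data):
    """Yield (offset, raw_record) for each record in a directory.

    data holds the whole directory, so its length is a multiple of 2048.
    """
    offset = 0
    while offset < len(data):
        length = data[offset]
        if length == 0:
            # Zero fill: skip to the first byte of the next sector.
            offset = (offset // SECTOR_SIZE + 1) * SECTOR_SIZE
            continue
        yield offset, data[offset:offset + length]
        offset += length
```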
<P>All fields in the first record (sometimes called the "." record)
refer to the directory itself, except that the identifier length is 1,
and the identifier is zero. The root directory record in the Primary
Volume Descriptor also has this format.

<P>All fields in the second record (sometimes called the ".." record)
refer to the parent directory, except that the identifier length is 1,
and the identifier is 1. The second record in the root directory refers
to the root directory.

<P>The identifier for a subdirectory is its name. The identifier for a
file consists of the following fields, in the order given:

<OL>
<LI>The name, consisting of the ASCII codes for at least one and
not more than eight capital letters, digits and underscores.

<LI>If there is an extension, the ASCII code for a period (46). If
there is no extension, this field is omitted.

<LI>The extension, consisting of the ASCII codes for not more than
three capital letters, digits and underscores. If there is no
extension, this field is omitted.

<LI>The ASCII code for a semicolon (59).

<LI>The ASCII code for 1 (49). [On other systems, this is the
version number, consisting of the ASCII codes for a sequence of
digits representing a number between 1 and 32767, inclusive.]
</OL>

<P>Some implementations for DOS omit (4) and (5), and some use
punctuation marks other than underscores in file names and extensions.

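<P>A reader therefore has to cope with identifiers both with and without
the trailing semicolon and version. A minimal sketch of such a split
follows; the function name is hypothetical, and returning
<code>None</code> for a missing version is a choice made here, not
something the format prescribes.

```python
def split_file_identifier(identifier):
    """Split a file identifier into (name, extension, version).

    identifier is the raw identifier field, e.g. b"README.TXT;1".
    version is None when fields (4) and (5) were omitted.
    """
    text = identifier.decode("ascii")
    base, semicolon, version_text = text.partition(";")
    version = int(version_text) if semicolon and version_text else None
    name, _, extension = base.partition(".")
    return name, extension, version
```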
<P>Directory records other than the first two are sorted as follows:

<OL>
<LI>Records are sorted by name, as described above.

<LI>Every series of records with the same name is sorted by
extension, as described above. For this purpose, a record
without an extension is sorted as though its extension
consisted of ASCII blanks (32).

<LI>[On other systems, every series of records with the same name
and extension is sorted in order of decreasing version number.]

<LI>[On other systems, two records with the same name, extension
and version number are permitted, if the first record is an
associated file.]
</OL>

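<P>Rules 1 to 3 can be captured in a sort key. The sketch below assumes
that the sorting "described above" means padding the shorter string with
blanks and comparing ASCII codes, and it pads to the DOS lengths of eight
and three characters; the function name is hypothetical, and rule 4
(associated files) is not modelled.

```python
def directory_sort_key(identifier):
    """Build a sort key from an identifier such as "README.TXT;1"."""
    base, _, version_text = identifier.partition(";")
    name, _, extension = base.partition(".")
    version = int(version_text) if version_text else 1
    # Blank padding (assumed rule); a missing extension becomes blanks.
    # The version is negated so that higher versions sort first.
    return (name.ljust(8), extension.ljust(3), -version)


names = ["README.TXT;1", "README;1", "READ.ME;1", "A.B;1", "A.B;2"]
ordered = sorted(names, key=directory_sort_key)
```

With this key, <code>README;1</code> sorts before
<code>README.TXT;1</code>, and <code>A.B;2</code> before
<code>A.B;1</code>.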
<P>[ISO9660 permits names containing more than eight characters and
extensions containing more than three characters, as long as both of
them together contain no more than 30 characters.]

<P>It is apparently permissible under ISO9660 to use two or more
consecutive records to represent consecutive pieces of the same file.
Bit 7 of the flags byte is set in every record except the last one.
However, this technique seems pointless and is apparently not used. It
is not supported by MSCDEX.

<P>Interleaving is another technique that is apparently seldom used. It
is not supported by MSCDEX (version 2.10).


<H4>11. Arrangement of Directory and Data Sectors</H4>

<P>ISO9660 does not specify the order of directory or file sectors. It
merely requires that the first sector of each directory or file be in
the location specified by its directory record, and that the sectors for
directories and non-interleaved files be consecutive.

<P>However, most implementations arrange the directories so each
directory follows its parent, and the data sectors for the files in each
directory lie immediately after the directory and immediately before the
next directory. This appears to be an efficient arrangement
for most applications.

<P>Some implementations go one step further and order the directories in
the same manner as the corresponding path table records.


<H4>12. Extended Attribute Records</H4>

<P>Extended attribute records contain file and directory information
used by operating systems other than DOS, such as permissions and
logical record lengths.

<P>A CD-ROM written for DOS normally does not contain any extended
attribute records.

<P>When reading a CD-ROM containing extended attribute records, early
versions of MSCDEX simply returned incorrect results. Later versions
learned to skip over extended attribute records.

<P>Philip J. Erdelsky<BR>
San Diego, California USA<BR>
<A HREF="mailto:pje@acm.org">pje@acm.org</A><BR>
<A HREF="http://www.alumni.caltech.edu/~pje/">
http://www.alumni.caltech.edu/~pje/</A><BR>

</BODY>
</HTML>