add directory study

New binary files under study/sabre/os/files/FileSystems/HPFS/:

  fig1.gif     3.6 KiB
  fig10.gif    4.4 KiB
  fig10_4.gif  4.3 KiB
  fig1_4.gif   2.9 KiB
  fig2.gif     4.2 KiB
  fig2_4.gif   1.8 KiB
  fig3.gif     8.4 KiB
  fig3_4.gif   2.2 KiB
  fig4.gif     1.9 KiB
  fig4_4.gif   3.2 KiB
  fig5.gif     13 KiB
  fig6.gif     2.8 KiB
  fig6_3.gif   9.5 KiB
  fig8.gif     9.4 KiB
  fig9.gif     9.4 KiB
  fig9a.gif    6.0 KiB
  fig9b.gif    7.4 KiB

study/sabre/os/files/FileSystems/HPFS/hpfs0.html (new file, 238 lines)
@@ -0,0 +1,238 @@
<H1>Inside the High Performance File System</H1>

<H2>Part 0: Preface</H2>

Written by Dan Bridges

<H2>Introduction</H2>

<P>
I am not a programmer, but I am an enthusiast interested in finding out
more about HPFS. There is so little detailed information available on
HPFS that I think you will find this modest series instructive. The REXX
programs to be presented are functional but not particularly pleasing in
an aesthetic sense. However, they do ferret out information and will help
you to understand what is going on. I'm sure that a programming guru,
once motivated, could come up with superior versions. Hopefully they
will. This installment originally appeared at the OS2Zone web site
(http://www.os2zone.aus.net).

<P>
I've been asked [by someone else. Ed.] to write a preface to this series.
Normally I prefer to write on little-covered topics, whereas much of what
I'm going to discuss in this installment often appears in a cursory
examination of HPFS. The trouble with most of what has been written about
HPFS in books on OS/2 is that the topic is never considered very deeply.
After working your way through this series (still being written on a
monthly basis, but expected to occupy eight parts including this one) you
will have a detailed knowledge of the structures of HPFS. Having said
that, there is a place for some initial information for readers who
currently know very little about the subject.
<P>
<H2>File Systems</H2>

<P>
A File System (FS) is a combination of hardware and software that
enables the storage and retrieval of information on removable (floppy
disk, tape, CD) and non-removable (HD) media. The File Allocation Table
FS (FAT) is used by DOS. It is also built into OS/2. Now FAT appeared
back in the days of DOS v1 in 1981 and was designed with a backward
glance to CP/M. A hierarchical directory structure arrived with DOS v2
to support the XT's 10 MB HD. OS/2 v1.x used straight FAT. OS/2 v2.x
and later provide "Super FAT". This uses the same layout of information
on the storage medium (e.g. a floppy written under OS/2 v2 can easily be
read by a DOS system) but adds performance improvements to the software
used to transfer the data. Super FAT will be covered in Part 1.

<P>
<H2>FAT</H2>

<P>
Figure 1 shows the layout of a FAT volume. There are two copies of the
FAT. These should be identical. This may seem like a safety feature but
it only works in the case of physical corruption (if a bad sector
develops in one of the sectors in a FAT, the other one is automatically
used instead), not for logical corruption. So if the FS gets confused
and the two copies are not the same, there is no easy way to determine
which copy is still OK.

<P>
<IMG SRC="hpfs1.gif" WIDTH=498 HEIGHT=64>

<P>
<FONT SIZE=2>
Figure 1: The layout of a volume formatted with the FAT file system.
Note: this diagram is not to scale. The data area is quite large in
practice.
</FONT>

<P>
The root directory is made a fixed, known size because the system files
are placed immediately after it. The known location for the initial
system files enables DOS or OS/2 to commence loading itself. (The boot
record, which loads first, is small and only has enough space for code
to find the initial system files at a known location.) However, this
design decision also limits the number of files that can be listed in
the root directory of a FAT volume.

<P>
Entries in the root directory and in subdirectories are not ordered, so
searching for a particular file can take some time, particularly if
there are many files in a directory.

<P>
The FAT and the root directory are positioned at the beginning of the
volume (on a disk this is typically on the outside). These entries are
read often, particularly in a multitasking environment, requiring a lot
of relatively slow (in CPU terms) head movement.
<P>
<H2>How Files are Stored on a FAT Volume</H2>

<P>
Files are stored on a FAT volume using the FS' minimum allocation unit,
the cluster (1-64 sectors). A 32-byte directory entry only provides
sufficient space for an 8.3 filename, file attributes, last alteration
date/time, filesize and the starting cluster. See Figure 2.

<P>
<IMG SRC="hpfs2.gif" WIDTH=388 HEIGHT=209>

<P>
<FONT SIZE=2>
Figure 2: The layout of the 32 bytes in a directory entry in a FAT
system.
</FONT>
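<P>
To make the layout concrete, here is a small REXX sketch that picks these
fields out of one 32-byte entry. Figure 2 shows the same fields; the byte
offsets used here are the standard FAT ones, and the raw entry is assumed
to have already been read into the REXX variable entry. The word and
dword fields are stored little-endian on disk, hence the REVERSE() before
C2D().

<PRE>
/* Sketch: decode one 32-byte FAT directory entry held in 'entry' */
name    = STRIP(SUBSTR(entry, 1, 8))'.'STRIP(SUBSTR(entry, 9, 3))
attr    = C2X(SUBSTR(entry, 12, 1))            /* attribute byte      */
time    = C2D(REVERSE(SUBSTR(entry, 23, 2)))   /* packed DOS time     */
date    = C2D(REVERSE(SUBSTR(entry, 25, 2)))   /* packed DOS date     */
cluster = C2D(REVERSE(SUBSTR(entry, 27, 2)))   /* starting cluster    */
size    = C2D(REVERSE(SUBSTR(entry, 29, 4)))   /* filesize in bytes   */
SAY name 'attr='attr 'start cluster='cluster 'size='size
</PRE>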
<P>
The corresponding initial cluster entry in the FAT then points to the
next FAT entry for the second cluster of the file (assuming that the
file was big enough), which in turn points to the next cluster, and so
on. FAT entries can be 16-bit (max. FFFFh) or 12-bit (max. FFFh) in
size, with volumes less than 16 MB using the 12-bit scheme. A FAT entry
can hold one of four kinds of value:

<UL>
<LI>0000h if the cluster is free (available);
<LI>The number of the next cluster in the chain;
<LI>A special value signifying the end of the chain (EOF), if this is
the last cluster in the chain;
<LI>Another special value if the cluster is bad (unreliable).
</UL>
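<P>
The chain-following logic is easy to sketch in REXX. The fragment below
is illustrative only: it assumes the FAT has already been loaded into the
stem variable fat. (indexed by cluster number, in decimal), and it uses
the usual 16-bit FAT marker values (FFF7h for a bad cluster, FFF8h-FFFFh
for end of chain).

<PRE>
/* Sketch: follow a cluster chain through a 16-bit FAT held in fat. */
fat. = 0
fat.2 = 5; fat.5 = 6; fat.6 = 65535   /* a sample file: 2 -> 5 -> 6 */

cluster = 2                 /* starting cluster from the directory entry */
chain   = ''
DO WHILE cluster > 1 & cluster < 65527  /* stop on free/bad/EOF values   */
  chain = chain cluster
  cluster = fat.cluster
END
SAY 'Cluster chain:' STRIP(chain)       /* displays: 2 5 6 */
</PRE>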
<P>
The FAT FS is prone to fragmentation (i.e. a file's clusters are not in
one contiguous chain) even in a single-tasking environment, because the
FAT is searched sequentially for the next free entry when a file is
written, regardless of how much needs to be written. The situation is
even worse in a multitasking environment because you can have more than
one writing operation in progress at the same time. See Figures 3 and 4
for an example of a fragmented file under FAT.

<P>
<IMG WIDTH=391 HEIGHT=238 SRC="hpfs3.gif">

<P>
<FONT SIZE=2>
Figure 3: The layout of a contiguous file in the FAT.
</FONT>

<P>
<IMG WIDTH=458 HEIGHT=232 SRC="hpfs4.gif">

<P>
<FONT SIZE=2>
Figure 4: An example of a fragmented file under FAT in three pieces.
</FONT>

<P>
The FAT FS uses a singly-linked scheme, i.e. the FAT entry points only
to the next cluster. If, for some reason, the chain is accidentally
broken (the next cluster value is corrupted) then there is no
information in the isolated next cluster to indicate what it was
previously connected to. So the FAT FS, while relatively simple, is also
rather vulnerable.
<P>
FAT was designed in the days of small disks and today it really shows
its age. The maximum number of entries (clusters) in a 16-bit FAT is
just under 64K (for technical reasons, the actual maximum is 65,518).
Since we can't increase the number of clusters past this limit, a large
volume requires the use of large cluster sizes. So, for example, a
volume in the 1-2 GB range has 32 KB clusters. Now a cluster is the
minimum allocation unit, so a 1-byte file on such a volume would consume
32 KB of space, a 33 KB file would consume 64 KB, and so on. A rough
assumption you can make is that, on average, half a cluster of space is
wasted per file. You can run CHKDSK on a FAT volume, note the total
number of files and the allocation unit size, multiply these two figures
together and divide the result by 2 to get some idea of the wastage. The
situation is quite different with HPFS, as you will see when you read
Part 1.
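<P>
As a quick illustration of that rule of thumb, the following REXX
fragment works out the estimate for a hypothetical volume; substitute the
file count and allocation unit size that CHKDSK reports for your own
drive.

<PRE>
/* Sketch: rough FAT slack-space estimate (half a cluster per file) */
files       = 10000                /* total files reported by CHKDSK    */
clusterSize = 32 * 1024            /* allocation unit in bytes (32 KB)  */

wasted = files * clusterSize % 2   /* about half a cluster per file     */
SAY 'Estimated wasted space:' wasted % 1024 'KB'
</PRE>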
<P>
Finally, FAT under OS/2 supports Extended Attributes (EAs - up to 64 KB
of extra information associated with a file), but since there is very
little extra space in a 32-byte directory entry it is only possible to
store a pointer into an external file, with all the EAs on a volume
being stored in this one file ("EA DATA. SF"). In general it is fair to
state that EAs are tacked on to FAT. With HPFS the integration is much
better. If the EAs are small enough, HPFS stores them completely within
the file's FNODE (every file and directory has an FNODE). Otherwise the
EAs are stored outside the file but closely associated with it, and
usually situated physically close to the file for performance reasons.
Some users have occasionally reported crosslinking of EAs under FAT.
This can be quite a serious matter, requiring reinstallation of the
operating system. I've not heard of this occurring under HPFS. Note that
the WorkPlace Shell relies heavily on EAs.

<P>
<H2>HPFS</H2>

<P>
HPFS is an example of a class of file systems known as Installable File
Systems (IFS). Other types of IFS include CD support (CDFS), the Network
File System (NFS), the Toronto Virtual File System (TVFS - combines FS
elements of VM, namely the CMS search path, with elements of UNIX,
namely the symbolic link), EXT2-OS (read Linux EXT2FS partitions under
OS/2) and HPFS386 (with IBM LAN Server Advanced).

<P>
An IFS is installed at start-up time. The software to access the actual
device is specified as a device driver (usually BASEDEV=xxxxx.DMD/.ADD)
while a Dynamic Link Library (DLL) is loaded to control the
format/layout of the data (with IFS=xxxxx.IFS). OS/2 can run more than
one IFS at a time, so you could, for example, copy from a CD to a HPFS
volume in one session while reading a floppy disk (FAT) in another
session.

<P>
HPFS has many advantages over FAT: long filenames (254 characters,
including spaces); excellent performance with directories containing
many files; designed to be fault tolerant; fragmentation resistant;
space efficient with large partitions; works well in a multitasking
environment. These topics will be explored in the series.

<P>
<H2>REXX</H2>

<P>
One of the many benefits of using OS/2 is that it comes with REXX
(providing you install it - it requires very little extra space). REXX
is a surprisingly versatile and powerful scripting language and there
are oodles of REXX programs and add-ons available, much of it for free.
This series presents REXX programs that access HPFS structures and
decode their contents.

<P>
<H2>Conclusion</H2>

<P>
In this installment you have seen that the FAT FS has a number of
problems related to its ancient origins. HPFS comes from a fresh design,
with one eye on the likely advances in storage that would occur in the
foreseeable future and the other eye on obtaining good performance. In
the next installment we look at the many techniques HPFS uses to achieve
its better performance.

</BODY>
</HTML>
study/sabre/os/files/FileSystems/HPFS/hpfs1.html (new file, 800 lines)
@@ -0,0 +1,800 @@
<H1>Inside the High Performance File System</H1>
|
||||
<H2>Part 1: Introduction</H2>
|
||||
Written by Dan Bridges
|
||||
|
||||
<H2>Introduction</H2>
|
||||
|
||||
<P>
|
||||
This article originally appeared in the February 1996 issue of
|
||||
Significant Bits, the monthly magazine of the Brisbug PC User Group Inc.
|
||||
|
||||
<P>
|
||||
It is sad to think that most OS/2 users are not using HPFS. The main
reason is that, unless you own the commercial program Partition Magic,
switching to HPFS involves a destructive reformat, and most users
couldn't be bothered (at least initially). Another reason is user
ignorance of the numerous technical advantages of using HPFS.
|
||||
|
||||
<P>
|
||||
This month we start a series that delves into the structures that make
|
||||
up OS/2's HPFS. It is very difficult to get any public information on
|
||||
it aside from what appeared in an article written by Ray Duncan in the
|
||||
September '89 issue of Microsoft Systems Journal, Vol 4 No 5. I suspect
|
||||
that the IBM-Microsoft marriage break-up that occurred in 1991 may have
|
||||
caused an embargo on further HPFS information. I've been searching
|
||||
books and the Internet for more than a year looking for information with
|
||||
very little success. You usually end up finding a superficial
|
||||
description without any detailed discussion of the internal layout of
|
||||
its structures.
|
||||
|
||||
<P>
|
||||
There are three commercial utilities that I've found very useful. SEDIT
|
||||
from the GammaTech Utilities v3 is a wonder. It decodes quite a bit of
|
||||
the information in HPFS' structures. HPFSINFO and HPFSVIEW from the
|
||||
Graham Utilities are also good. HPFSINFO lists information gleaned from
|
||||
HPFS' SuperBlock and SpareBlock sectors, while HPFSVIEW provides the
|
||||
best visual display I've seen of the layout of a HPFS partition. You
|
||||
can receive some information on a sector by clicking on it. HPFSVIEW is
|
||||
also freely available in the demo version of the Graham Utilities,
|
||||
GULITE.xxx. I've also written a REXX program to assist with
|
||||
cross-referencing locations between SEDIT & HPFSVIEW and to provide a
|
||||
convenient means of dumping a sector.
|
||||
|
||||
<P>
|
||||
Probably the most useful program around at the moment is freeware,
|
||||
FST03F.xxx (File System Tool) written by Eberhard Mattes. This provides
|
||||
lots of information and comes with source. Even if you aren't a C
|
||||
programmer (I'm not) you can learn much from its definition of
|
||||
structures. Unfortunately I wrote the first three instalments without
|
||||
seeing this information so that made the task more difficult.
|
||||
|
||||
<P>
|
||||
In the early stages I've had to employ a very laborious process in an
|
||||
attempt to learn more. I created the smallest OS/2 HPFS partition
|
||||
possible (1 MB). Then I created/altered a file or directory and
|
||||
compared the changes. Sometimes I knew where the changes would occur so
|
||||
I could just compare the two sectors but often I ended up comparing two
|
||||
1 MB image files looking for differences and then translated the location
|
||||
in the image into C/H/S (a physical address in Cylinder/Head/Sector
|
||||
format) or LSN (Logical Sector Number). While more information will be
|
||||
presented in this series than I've seen in the public domain, there are
|
||||
still things that I've been unable to decipher.
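<P>
For anyone who wants to try the same kind of detective work, the
arithmetic for turning a partition-relative LSN into C/H/S coordinates is
straightforward. The REXX fragment below is a sketch only: the geometry
figures are examples and you would substitute the head and
sectors-per-track counts of your own drive, and it assumes LSN 0 sits at
the very start of the partition.

<PRE>
/* Sketch: convert a Logical Sector Number to Cylinder/Head/Sector */
lsn   = 16                      /* e.g. the SuperBlock sector           */
heads = 64                      /* drive geometry - example values only */
spt   = 32                      /* sectors per track                    */

cyl  = lsn % (heads * spt)
head = (lsn // (heads * spt)) % spt
sec  = (lsn // spt) + 1         /* physical sectors are numbered from 1 */
SAY 'LSN' lsn '-> Cyl' cyl 'Head' head 'Sec' sec
</PRE>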
|
||||
|
||||
<P>
|
||||
<H2>The Win95 Fizzer</H2>
|
||||
|
||||
<P>
|
||||
For me, the most disappointing feature of Win 95 is the preservation of
|
||||
the FAT (File Allocation Table) system. It's now known as VFAT but
|
||||
aside from integrated 32-bit file and disk access, the structure on the
|
||||
disk is basically the same as DOS v4 (circa 1988). An ungainly method
|
||||
involving the volume label file attribute was used to graft long
|
||||
filename support onto the file system. These engineering compromises
|
||||
were made to most easily achieve backward compatibility. It's a pity
|
||||
because Microsoft has an excellent file system available in NT, namely
|
||||
NTFS. This file system is very robust although perhaps NTFS is overkill
|
||||
for the small user.
|
||||
|
||||
<P>
|
||||
The Presentation Manager graphical user interface (GUI) appeared in OS/2 v1.1
|
||||
in 1988. The sophisticated High-Performance File System came with OS/2
|
||||
v1.2 which was released way back in 1989! The powerful REXX scripting
|
||||
language showed up in OS/2 v1.3 (1991). And the largely
|
||||
object-orientated WPS (Work Place Shell) GUI appeared in 1992 in OS/2
|
||||
v2.0. So it is hardly surprising that experienced OS/2 users were not
|
||||
swept up in the general hysteria about Windows 95 being the latest and
|
||||
greatest.
|
||||
|
||||
<P>
|
||||
A positive aspect of the Win 95 craze has been that the minimum system
|
||||
requirement of 8 MB RAM, 486/33 makes a good platform for OS/2 Warp. So
|
||||
now the disgruntled Win 95 user will find switching OSs less daunting,
|
||||
at least from a hardware viewpoint.
|
||||
|
||||
<P>
|
||||
<H2>Dual Boot and Boot Manager</H2>
|
||||
|
||||
<P>
|
||||
I've never used Dual Boot because it seems so limiting. I've always
|
||||
reformatted and installed Boot Manager so that I could select from up to
|
||||
four different Operating Systems, for example OS/2 v2.1, OS/2 Warp
|
||||
Connect (peer-to-peer networking with TCP/IP and Internet support), IBM
|
||||
DOS v7 and Linux.
|
||||
|
||||
<P>
|
||||
In previous OS/2 installations, I've left a small (50 MB) FAT partition
|
||||
that could be seen when I booted under either DOS or OS/2, while the
|
||||
rest of the HD space (910 MB) was formatted as HPFS. Recently I
|
||||
upgraded to Warp Connect and this time I dropped FAT and the separate
|
||||
DOS boot partition completely. This does not mean I am unable to run
|
||||
DOS programs. OS/2 has inbuilt IBM DOS v5 and you can install boot
|
||||
images of other versions of DOS, or even CP/M, for near instantaneous
|
||||
booting of these versions. There is no reason why you can't have
|
||||
multiple flavours of DOS running at the same time as you're running
|
||||
multiple OS/2 sessions. Furthermore DOS programs have no problems
|
||||
reading from, writing to or running programs on HPFS partitions even
|
||||
though the layout is nothing like FAT. It's all handled transparently
|
||||
by OS/2. But this does mean you have to have booted OS/2 first. HPFS
|
||||
is not visible if you use either Dual Boot or Boot Manager to boot
|
||||
directly to DOS, but there are a number of shareware programs around to
|
||||
allow read-access to HPFS drives from DOS.
|
||||
|
||||
<P>
|
||||
DOS uses the system BIOS to access the hard disk. This is limited to
|
||||
dealing with a HD that has no more than 1,024 cylinders due to 10 bits
|
||||
(2^10 = 1,024) being used in the BIOS for cylinder numbering. OS/2 uses
|
||||
the system BIOS at boot time but then completely replaces it in memory
|
||||
with a special Advanced BIOS. This means that the boot partition and,
|
||||
if you use it, Boot Manager's 1 MB partition, must be within the first
|
||||
1,024 cylinders. Once you've booted OS/2, however, you can access
|
||||
partitions on cylinders past the Cyl 1023 point (counting from zero)
|
||||
without having to worry about LBA (Logical Block Addressing) translation
|
||||
schemes.
|
||||
|
||||
<P>
|
||||
Now this can still catch you out if you boot DOS. On my old system I'd
|
||||
sometimes use Boot Manager to boot a native DOS version. I'd load AMOS
|
||||
(a shareware program) to see the HPFS drives. I thought there must have
|
||||
been a bug in AMOS because I could only see half of F: and none of G:
|
||||
until I realised that these partitions were situated on a third HD that
|
||||
had 1,335 cylinders. So this was just the effect of DOS' 1,024 cylinder
|
||||
limitation which the AMOS program was unable to circumvent.
|
||||
|
||||
<P>
|
||||
<H2>Differences between an Easy and an Advanced Installation</H2>
|
||||
|
||||
<P>
|
||||
Most new OS/2 users select the "Easy Installation" option. This is
|
||||
satisfactory but it only utilises FAT, installs OS/2 on the same drive
|
||||
as DOS and Windows, does not reformat the partition and Dual Boot is
|
||||
installed.
|
||||
|
||||
<P>
|
||||
If you know what you're doing or are more aggressive in wanting to take
|
||||
advantage of what OS/2 can provide then the "Advanced Installation"
|
||||
option is for you. Selecting it enables you to selectively install
|
||||
parts of OS/2, install OS/2 in a primary or logical (extended) partition
|
||||
other than C: or even on a 2nd HD (I don't know whether you can install
|
||||
on higher physical drives than the 2nd one in a SCSI multi-drive setup);
|
||||
the option of installing Boot Manager is provided; you can use HPFS if
|
||||
you wish; installation can occur on a blank HD.
|
||||
|
||||
<P>
|
||||
<H2>FAT vs HPFS: If Something Goes Wrong</H2>
|
||||
|
||||
<P>
|
||||
CHKDSK on a HPFS partition can recover from much more severe faults than
|
||||
it can on a FAT system. This is because the cluster linkages in a FAT
|
||||
system are one-way, pointing to the next cluster in the chain. If the
|
||||
link is broken it is usually impossible to work out where the lost
|
||||
clusters ("x lost clusters in y chains") should be reattached. Often
|
||||
they are just artifacts of a program's use of temporary files that
|
||||
haven't been cleaned up properly. But "file truncated" and
|
||||
"cross-linked files" messages are usually an indication of more serious
|
||||
FAT problems.
|
||||
|
||||
<P>
|
||||
HPFS uses double linking: the allocation block of a directory or file
|
||||
points back to its predecessor ("parent") as well as to the next element
|
||||
("child"). Moreover, major structures contain dword (32-bit) signatures
|
||||
identifying their role and each file/directory's FNODE contains the
|
||||
first 15 characters of its name. So blind scanning can be performed by
|
||||
CHKDSK or other utilities to rebuild much of the system after a
|
||||
significant problem.
|
||||
|
||||
<P>
|
||||
As a personal comment, I've been using HPFS since April, 1993, and I've
|
||||
yet to experience any serious file system problems. I've had many OS/2
|
||||
lockups while downloading with a DOS comms program and until recently
|
||||
I was running a 4 MB hardware disk cache with delayed writes, yet,
|
||||
aside from the lost download file, the file system has not been
|
||||
permanently corrupted.
|
||||
|
||||
<P>
|
||||
<H2>Warp, FORMAT /FS:HPFS, CHKDSK /F:3 and The Lazarus Effect</H2>
|
||||
|
||||
<P>
|
||||
Warp, by default, does a quick format when you format a HD under either
|
||||
FAT or HPFS. So FORMAT /FS:HPFS x:, which is what the installation
|
||||
program performs if you decide to format the disk with HPFS, is
|
||||
performed very quickly. It's almost instantaneous if you decide to
|
||||
reformat with FAT (/FS:FAT). Now this speed differential does not mean
|
||||
that FAT is much quicker, only that FORMAT has very little work to
|
||||
perform during a quick FAT reformat since the FAT structures are so
|
||||
simple compared to HPFS.
|
||||
|
||||
<P>
|
||||
As mentioned earlier, CHKDSK has extended recovery abilities when
|
||||
dealing with HPFS. It has four levels of /F:n checking/recovery. These
|
||||
will be considered in greater detail in a later article in this series
|
||||
when we look at fault tolerance. The default of CHKDSK /F is equivalent
|
||||
to using /F:2. If you decide to use /F:3 then CHKDSK will dig deep and
|
||||
recover information that existed on the partition prior to the
|
||||
reformatting providing that it was previously formatted as HPFS. Using
|
||||
CHKDSK /F:3 after performing a quick format on a partition that was
|
||||
previously FAT but is now HPFS will not cause this, since none of the
|
||||
previous data has HPFS signature words embedded at the beginning of its
|
||||
sectors. However, if you ever use /F:3 after quickly reformatting a
|
||||
HPFS partition you could end up with a bit of a mess since everything
|
||||
would be recovered that existed on the old partition and which hadn't
|
||||
been overwritten by the current contents.
|
||||
|
||||
<P>
|
||||
To guard against this, OS/2 stores whether or not a quick format has
|
||||
been performed on a HPFS partition in bit 5 (counting from zero) of byte
|
||||
08h in LSN (Logical Sector Number) 17, the SpareBlock sector. This
|
||||
particular byte is known as the Partition Status byte, with 20h
|
||||
indicating that a quick format was performed. Bit 0 of this byte is
|
||||
also used to indicate whether the partition is "clean" or "dirty" so 21h
|
||||
indicates that the partition was quick formatted and is currently
|
||||
"dirty" (these concepts will be covered in a later instalment).
|
||||
|
||||
<P>
|
||||
If you attempt to perform a CHKDSK /F:3 on a quick-formatted partition,
|
||||
you will receive the following warning:
|
||||
|
||||
<PRE>
|
||||
SYS0641: Using CHKDSK /F:3 on this drive may cause files that existed
|
||||
before the last FORMAT to be recovered. Proceed with CHKDSK (Y/N)?
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
If you type "HELP 641" for further information you'll see:
|
||||
|
||||
<PRE>
|
||||
EXPLANATION: The target drive was formatted in "fast format" mode,
|
||||
which does not erase all data areas. CHKDSK /F:3 searches data areas
|
||||
for "lost" files. If a file existed on this drive before the last
|
||||
format, CHKDSK may find it, and attempt to recover it.
|
||||
</PRE>
|
||||
|
||||
<P>
|
||||
ACTION: Use CHKDSK /F:2 to check this drive. If you use /F:3, be aware
|
||||
that files recovered to the FOUND directories may be old files. Also,
|
||||
if you format a drive using FORMAT /L, FORMAT will completely erase all
|
||||
old files, and avoid this warning.
|
||||
|
||||
<P>
|
||||
It seems a pity to forego the power of the CHKDSK /F:3 in the future.
|
||||
As is suggested, FORMAT /L (for "Long" I presume) will completely
|
||||
obliterate the prior partition's contents, but you can't specify this
|
||||
during a reinstall. To perform it you need to use FORMAT /L on the
|
||||
partition before reinstalling. For this to be practical you will
|
||||
probably need to keep OS/2 and nothing else on a separate partition and
|
||||
to have a recent tape backup of the remaining volumes' contents. Note:
|
||||
in my opinion keeping OS/2 on a separate partition is the best way of
|
||||
laying out a system but make sure you leave enough room for things like
|
||||
extra postscript fonts and programs that insist on putting things on C:.
|
||||
|
||||
<P>
|
||||
<H2>Capacity</H2>
|
||||
|
||||
<P>
|
||||
Figure 1 shows a table comparing the capacity of OS/2's FAT and HPFS
|
||||
file systems. The difference in the logical drive numbers arises due to
|
||||
A: and B: being assigned to floppies which are always FAT. It would
|
||||
be ridiculous to put a complex, relatively large file system, which was
|
||||
designed to overcome FAT's limitations with big partitions, on volumes
|
||||
as small as current FDs.
|
||||
|
||||
<PRE>
                          FAT            HPFS

   Logical drives         26             24
   Num of Partitions      16             16
   Max Partition Size     2 GB           64 GB
   Max File Size          2 GB           2 GB
   Sector Size            512 bytes      512 bytes
   Cluster/Block Size     0.5 KB-32 KB   512 bytes
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Fig.1 Comparing the capacity of FAT and HPFS
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
The next point of interest is the much greater partition size supported by HPFS.
|
||||
HPFS has a maximum possible partition size of about 2,200 GB (2^32 sectors of 512 bytes) but
|
||||
is restricted in the current implementation to 64 GB. (Note: older references
|
||||
state that the maximum is 512 GB.) I don't know what imposes this limitation.
|
||||
Note: the effective limitation on partition size is currently around 8 GB.
|
||||
This is due to CHKDSK's inability to handle a larger partition. I presume this
|
||||
limitation will be rectified soon as ultra large HDs will become common in the
|
||||
next year or two.
|
||||
|
||||
<P>
|
||||
The 2 GB maximum filesize limit is common to DOS, OS/2 and 32-bit Unix. A
|
||||
32-bit file size should be able to span a range of 4 GB (2^32) but the
|
||||
DosSetFilePtr API function requires that the highest bit be used for indicating
|
||||
sign (forward or backward direction of movement), leaving 31 for size.
|
||||
|
||||
<P>
|
||||
The cluster size on a 1.4 MB FD is 512 bytes. For a 100 MB HD formatted
|
||||
with FAT it is 2 KB. Due to the relatively small 64K (2^16) limit on
|
||||
cluster numbering, as FAT partitions get bigger the size of clusters
|
||||
must also increase. So for a 1-2 GB partition you end up with whopping
|
||||
32 KB clusters. Since the average wastage of HD space due to the
|
||||
cluster size is half a cluster per file, storing 10,000 files on such a
|
||||
partition will typically waste 160 MB (10,000 * 32 KB / 2).
|
||||
|
||||
<P>
|
||||
HPFS has no such limitation. File space is allocated in sector-sized
|
||||
blocks unlike the FAT system. A FNODE sector is also always associated
|
||||
with each file. So for 10,000 files, the wastage due to sector size is
|
||||
typically 2.5 MB (10,000 * 512 / 2) for the files themselves + 5 MB
|
||||
consumed by the file's FNODEs = 7.5 MB. And this overhead is constant
|
||||
whether the HPFS partition is 10 MB or 100 GB.
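<P>
Again, the arithmetic is easy to reproduce in a few lines of REXX:

<PRE>
/* Sketch: HPFS space overhead for 10,000 files, as estimated above */
files  = 10000
sector = 512
slack  = files * sector % 2        /* about half a sector per file      */
fnodes = files * sector            /* one FNODE sector per file         */
SAY 'Slack:' slack % 1024 'KB  FNODEs:' fnodes % 1024 'KB  Total:' (slack + fnodes) % 1024 'KB'
</PRE>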
|
||||
|
||||
<P>
|
||||
This must be balanced against the diskspace consumed by HPFS. Since
|
||||
HPFS is a sophisticated file system that is designed to accomplish a lot
|
||||
more than FAT, it correspondingly requires more diskspace than FAT.
|
||||
Figure 2 illustrates this. You may think that 10 MB for the file system
|
||||
is too much for a 1,000 MB partition but you should consider this as a
|
||||
percentage.
|
||||
|
||||
<PRE>
              System Usage including   Disk Space available   Allocation Unit
              MBR track                to user                + Fnode for HPFS
              FAT/HPFS in KB           FAT/HPFS in %          FAT/HPFS in KB

   10 MB      44/415                   99.57/95.95            4/0.5+0.5
  100 MB      76/3,195                 99.77/96.88            2/0.5+0.5
 1000 MB      289(est)/10,430          99.98(est)/98.98       16/0.5+0.5
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Fig. 2: Space used by FAT and HPFS on different volumes
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
Furthermore, once cluster size wastage is also considered, then the
|
||||
break-even point (as regards diskspace) for a 1,000 MB partition is
|
||||
about 2,200 files which isn't very many files. This is based on a 16 KB
|
||||
cluster size. In the 1,024-2,047 MB partition size range the cluster
|
||||
size increases to 32 KB so the "crossover" point shifts to only 1,100
|
||||
files.
|
||||
|
||||
<P>
|
||||
I had to calculate the 1,000 MB FAT partition values since OS/2 wouldn't
|
||||
let me have a FAT partition situated in the greater than Cyl 1023
|
||||
region. The 4 KB cluster size of the 10 MB partition is not a misprint.
|
||||
Below 16 MB, a 12-bit FAT scheme (1.5 bytes in the FAT representing 1
|
||||
cluster) is used instead of a 16-bit one.
|
||||
|
||||
<P>
|
||||
<H2>Directory Search Speed</H2>
|
||||
|
||||
<P>
|
||||
Consider an extreme case: FAT system on a full partition which has a
|
||||
maximum-sized FAT (64K entries - this is the maximum number of files a
|
||||
FAT disk can hold). The size of such a partition would be 128 MB, 256
|
||||
MB, 512 MB, 1 GB or 2 GB, depending on cluster size. Each FAT is 128 KB
|
||||
in size. (There is a second FAT which mirrors the first.) In this
|
||||
example all the files are in one subdirectory. This can't be in the
|
||||
root directory because it only has space for 512 entries. (With HPFS
|
||||
you can have as many files as you want in the root directory.) 64 K of
|
||||
entries in a FAT directory requires 2 MB of diskspace (64K * 32
|
||||
bytes/directory entry). To find a file, on average, 32 K directory
|
||||
entries would need to be searched. To say that a file was not on the
|
||||
disk, the full 64 K entries must be scanned before the "File not found"
|
||||
message was shown. The same figures would apply if you were using a
|
||||
file-finding utility to look for a file in 1,024 directories, each
|
||||
containing 63 files (the subdirectory entry also consumes space).
|
||||
|
||||
<P>
|
||||
If the directory entries were always sorted, the situation would greatly
|
||||
improve. Assuming you had a quick means of getting to the file in the
|
||||
sorted sequence, if it's the file you're looking for then you've found
|
||||
its directory entry (and thus its starting cluster's address). If a
|
||||
file greater in the sequence than the required file is found instead
|
||||
then you immediately know that the file does not exist.
|
||||
|
||||
<P>
|
||||
HPFS stores directory files in a balanced multi-branch tree structure
|
||||
(B-tree) which is always sorted due to the way the branches are
|
||||
assigned. This can lead to some extra HD activity, caused by adjustment
|
||||
of the tree structure, when a new file is added or a file is renamed.
|
||||
This is done to keep the tree balanced i.e. the total length of each
|
||||
branch from the root to the leaves is the same. The extra work when
|
||||
writing to the disk is hidden from the user by the use of "lazy writes"
|
||||
(delayed write caching).
|
||||
|
||||
<P>
|
||||
HPFS directory entries are stored in contiguous directory blocks of four
|
||||
sectors i.e. 2 KB known as DIRBLKs. A lot of information is stored in
|
||||
each variable-length (unlike FAT) file entry in a DIRBLK structure,
|
||||
namely:
|
||||
|
||||
<UL>
|
||||
<LI>The length of the entry;
|
||||
<LI>File attributes;
|
||||
<LI>A pointer to the HPFS structure (FNODE; usually just before the
|
||||
first sector of a file) that describes the sector disposition of the
|
||||
file;
|
||||
<LI>Three different date/time stamps (Created, Last Accessed, Last
|
||||
Modified);
|
||||
<LI>Usage count. Although mentioned in the 1989 document, this appears
not to have been implemented;
|
||||
<LI>The length of the name (up to 254 characters);
|
||||
<LI>A B-tree pointer to the next level of the tree structure if there
|
||||
are any further levels. The pointer will be to another directory
|
||||
block if the directory entries are too numerous to fit in one 2 KB
|
||||
block;
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
At the end of the sector there is extra ("flex") space available for
|
||||
special purposes.
|
||||
|
||||
<P>
|
||||
If the average size of the filenames is 10-13 characters, then a
|
||||
directory block can store 44 of them (11 entries/sector). A two-level
|
||||
B-tree arrangement can store 1,980 entries (1 * 44-entry directory root
|
||||
block + 44 directory leaf blocks * 44 entries/block) while a three-level
|
||||
structure could accommodate 87,164 files (the number of files in the
|
||||
two-level tree + 1,936 third-level directory leaf blocks * 44
|
||||
entries/block). So the 64 K of directory entries in our example can be
|
||||
searched in a maximum of 3 "hits" (disk accesses). The term "maximum"
|
||||
was used because it depends on what level the filename in question is
|
||||
stored in the B-tree structure and what's in the disk cache.
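<P>
The capacity figures above are easy to verify with a few lines of REXX,
again assuming 44 usable entries per 2 KB DIRBLK:

<PRE>
/* Sketch: directory entries reachable per B-tree depth (44/DIRBLK) */
perBlock   = 44
twoLevel   = perBlock + perBlock ** 2    /* root block + 44 leaf blocks  */
threeLevel = twoLevel + perBlock ** 3    /* + 1,936 more leaf blocks     */
SAY 'Two levels:' twoLevel 'entries   Three levels:' threeLevel 'entries'
</PRE>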
|
||||
|
||||
<P>
|
||||
Adding files to a directory containing many files (say 500+) under FAT
|
||||
becomes an exasperating affair. I've often experienced this because a
|
||||
DOS program we've installed on hundreds of our customers' machines has
|
||||
648 files in a sub-sub-subdirectory. Watching the archive unpack on a
|
||||
machine without disk caching is bad news and it still slows down
|
||||
noticeably on machines with large SMARTDRIVE caches.
|
||||
|
||||
<P>
|
||||
Figure 3 shows a simple REXX program you can create to investigate this
|
||||
phenomenon while Figure 4 tables some results. The program creates a
|
||||
large number of zero-length files in a directory. Perform this test in
|
||||
a subdirectory to overcome FAT's restriction on a maximum of 512 entries
|
||||
in the root directory. Reformating and rebooting was performed before
|
||||
each test to ensure consistent conditions. With both FAT and HPFS, a
|
||||
1,536 KB lazy-writing cache with a maximum cacheable read/write size of
|
||||
8 KB was used. Note 1: with HPFS, a "zero-length" file consumes
|
||||
diskspace because there is always a FNODE sector associated with a
|
||||
file/directory, regardless of the file's contents. So 1,000 empty files
|
||||
consume 500 KB of space. Note 2: there is a timing slop of about 0.1
|
||||
seconds due to the 55 msec timer tick uncertainty affecting both the
|
||||
start time and stop time values.
|
||||
|
||||
<PRE>
|
||||
/* Create or open a large number of empty files in a directory */
|
||||
CALL Time 'R' /* Reset timer */
|
||||
|
||||
DO x = 1 TO 1000
|
||||
CALL STREAM 'file'||x, 'c', 'open' /* Will create if not exist */
|
||||
CALL STREAM 'file'||x, 'c', 'close'
|
||||
END
|
||||
|
||||
SAY Time('E') /* Report elapsed time */
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Fig 3: A REXX program to assess the directory searching and file
|
||||
creation speeds of FAT and HPFS.
|
||||
</FONT>
|
||||
|
||||
<PRE>
                            Number of Files in a Directory

              125    250    500    1000    2000    4000    4001->4100

   FAT        1.7    3.4    8.0    23.4    99.4    468.4     26.6
   FAT (LW)   0.7    1.7    5.1    17.9    89.6    447.3     26.1

   HPFS       7.4   14.7   30.7    62.9   129.0    262.6      7.5
   HPFS (LW)  0.5    1.0    2.2     4.5     9.0     18.3      0.5
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Fig 4: Timing results of the program in Figure 3. The beneficial effect
|
||||
of lazy writing on performance is clearly demonstrated. Tests were
|
||||
performed in an initially empty subdirectory except for the last one
|
||||
which adds 100 new files to a subdirectory already containing 4,000
|
||||
files.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
To investigate further, the full data set was plotted on a graph with
|
||||
logarithmic axes. Examine Figure 5. As you can see, HPFS' performance
|
||||
is reasonably linear (in y = a*x^b + c, b was actually 1.1) while FAT's
|
||||
performance appears to follow a third-order polynomial (y = a*x^3 +
|
||||
b*x^2 + c*x + d). It is apparent that FAT's write caching becomes less
|
||||
effective when many files are in a directory presumably because much
|
||||
time is being spent sifting through the FAT in memory. (Disk access was
|
||||
only occurring briefly about once a second based on the flashing of the
|
||||
HD light). HPFS' performance was dramatically improved in this test by
|
||||
the use of write caching. Again, disk access was about once a second
|
||||
(due to CACHE's /MAXAGE:1000 parameter). While, typically, most disk
|
||||
access will involve reading rather than writing, this graph shows how
|
||||
effective lazy writing is at hiding the extra work from the user. It is
|
||||
also apparent that HPFS handles large numbers of files well. We now
|
||||
turn to examining how this improvement is achieved.
|
||||
|
||||
<P>
|
||||
<A HREF="fig5.gif">
|
||||
<IMG WIDTH=100 HEIGHT=57 SRC="fig5_small.gif"></A>
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Fig. 5: Log-log graph comparing file system performance creating test
|
||||
files in a subdirectory. Extra data points shown. Number of files was
|
||||
increased using a cube-root-of-2 multiple. (Click for large version.)
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
<H2>Directory Location and Fragmentation</H2>
|
||||
|
||||
<P>
|
||||
Subdirectories on a FAT disk are usually splattered all around it.
|
||||
Similarly, entries in a subdirectory may not all be in contiguous
|
||||
sectors on the disk. Searching a FAT system's directory structure can
|
||||
involve a large amount of HD seeking back and forth, i.e. more time.
|
||||
Sure, you can use a defragger option to move all the directories to the
|
||||
front of the disk, but this usually takes a lot of time to reshuffle
|
||||
everything and the next time you create a new subdirectory or add files
|
||||
to an existing subdirectory there will be no free space up the front so
|
||||
directory separation and fragmentation will occur again.
|
||||
|
||||
<P>
|
||||
HPFS takes a much better approach. On typical partitions (i.e. not
|
||||
very small ones) a directory band, containing many DIRBLKs, is placed at
|
||||
or near the seek centre (half the maximum cylinder number). On a 100 MB
|
||||
test partition the directory band starts at Cyl 48 (counting from 0) of
|
||||
a volume that spans 100 cylinders. Here 1,980 contiguous Directory
|
||||
sectors (just under 1 MB) were situated. Assuming 11 entries per
|
||||
Directory sector (44 entries per DIRBLK), this means that the first
|
||||
21,780 directory entries will be next to each other. So if a blind file
|
||||
search needs to be performed this can be done with just 1 or 2 long disk
|
||||
reads (assuming <20,000 files and 1-2 MB disk cache). The maximum
|
||||
size of the contiguous directory band appears to be 8,000 KB for about
|
||||
176,000 entries with 13-character names. Once the directory band is
|
||||
completely full new Directory sectors are scattered throughout the
|
||||
partition but still in four-sector DIRBLKs.
|
||||
|
||||
<P>
|
||||
Another important aspect of HPFS' directory band is its location. By
|
||||
being situated near the seek centre rather than at the very beginning
|
||||
(as in FAT), the average distance that the heads must traverse, when
|
||||
moving between files and directories, is halved. Of course, with lazy
|
||||
writing, traversals to frequently update a directory entry while writing
|
||||
to a temporary file, would be much reduced anyway.
|
||||
|
||||
<P>
|
||||
<H2>File Location and Fragmentation</H2>
|
||||
|
||||
<P>
|
||||
HPFS expends a lot of effort to keep a file either in one piece if
|
||||
possible or otherwise within a minimum number of pieces and close
|
||||
together on the disk so it can be retrieved in the minimum number of
|
||||
reads (remembering also that cache read-ahead can take in more than one
|
||||
nearby piece in the same read). Also, the seek distance, and hence time
|
||||
required to access extra pieces, is kept to an absolute minimum. The
|
||||
main design philosophy of HPFS is that mechanical head movement is a
|
||||
very time-consuming operation in CPU terms. So it is worthwhile doing
|
||||
more work looking for a good spot on the disk to place the file. There
|
||||
are many aspects to this and I'm sure there are plenty of nuances of
|
||||
which I'm ignorant.
|
||||
|
||||
<P>
|
||||
Files are stored in 8 MB contiguous runs of sectors known as data bands.
|
||||
Each data band has a four-sector (2 KB) freespace bitmap situated at
|
||||
either the band's beginning or end. Consecutive data bands have
|
||||
tail-to-head placement of the freespace bitmaps so that maximum
|
||||
contiguous filespace is 16 MB (actually 16,380 KB due to the presence of
|
||||
the bitmaps within the adjoining band). See Figure 6.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=403 HEIGHT=213 SRC="fig6.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Fig. 6: The basic data layout of an HPFS volume
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
Near the start of the partition there is a list of the sectors where
|
||||
each of the freespace bitmaps commences. I'm sure that this small list
|
||||
would be kept loaded into memory for performance reasons. Having two
|
||||
small back-to-back bitmaps adjoining a combined 16 MB data band is
|
||||
advantageous when HPFS is looking for the size of each freespace region
|
||||
within bands, prior to allocating a large file. But it does mean that a
|
||||
fair number of seeks to different bitmaps might need to be performed on
|
||||
a well-filled disk, in search of a contiguous space. Or perhaps these
|
||||
bitmaps are also kept memory resident if the disk is not too big.
|
||||
|
||||
<P>
|
||||
A 2 GB file would be split into approximately 128 chunks of 16 MB, but
|
||||
these chunks are right after each other (allowing for the presence of
|
||||
the intervening 4 KB of back-to-back freespace bitmaps). So to refer to
|
||||
this file as "fragmented", while technically correct, would be
|
||||
misleading.
|
||||
|
||||
<P>
|
||||
As mentioned earlier, every file has an associated FNODE, usually right
|
||||
before the start of the file. The number of pieces a file is stored in
|
||||
are referred to as extents. A "zero-length" file has 0 extents; a
|
||||
contiguous file has 1 extent; a file of 2-8 extents is "nearly"
|
||||
contiguous (the extents should be close together).
|
||||
|
||||
<P>
|
||||
An FNODE sector contains:
|
||||
|
||||
<UL>
|
||||
<LI>The real filename length;
|
||||
<LI>The first 15 characters of the filename;
|
||||
<LI>Pointer to the directory LSN that contains this file;
|
||||
<LI>EAs (Extended Attributes) are completely stored within the FNODE
|
||||
structure if the total of the EAs is 145 bytes or less;
|
||||
<LI>0-8 contiguous sector runs (extents), organised as eight LSN
|
||||
run-starting-points (dword), run lengths (dword) and offsets into
|
||||
the file (dword).
|
||||
</UL>
|
||||
|
||||
<P>
|
||||
A run can be up to 16 MB (back-to-back data bands) in size. If the file
|
||||
is too big or more fragmented than can be described in 8 extents, then
|
||||
an ALNODE (allocation block) is pointed to from the FNODE. In this case
|
||||
the FNODE structure changes so that it now contains up to 12 ALNODE
|
||||
pointers within the FNODE and each ALNODE can then point to either 40
|
||||
direct sector runs (extents) or to 60 further ALNODEs, and each of these
|
||||
lower-level ALNODEs could point to either... and so on.
|
||||
|
||||
<P>
|
||||
If ALNODEs are involved then a modified balanced tree structure called a
|
||||
B+tree is used with the file's FNODE forming the root of the structure.
|
||||
So only a two-level B+tree would be required to completely describe a 2
|
||||
GB (or smaller) file if it consists of less than 480 runs (12 ALNODEs *
|
||||
40 direct runs described in each ALNODE). Otherwise a 3-level structure
|
||||
would have no problems since it can handle up to 28,800 runs (12 ALNODEs
|
||||
* 60 further ALNODEs * 40 direct runs). It's difficult to imagine a
|
||||
situation where a four or higher level B+tree would ever be needed.
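<P>
A few lines of REXX confirm the run counts quoted above for each tree
depth:

<PRE>
/* Sketch: extents (runs) describable by an FNODE at each tree depth */
fnodeOnly  = 8                   /* runs held directly in the FNODE      */
twoLevel   = 12 * 40             /* 12 ALNODEs, 40 direct runs each      */
threeLevel = 12 * 60 * 40        /* 12 ALNODEs -> 60 ALNODEs -> 40 runs  */
SAY fnodeOnly twoLevel threeLevel    /* displays: 8 480 28800 */
</PRE>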
|
||||
|
||||
<P>
|
||||
Consider how much disk activity would be required to work out the layout
|
||||
of a 2 GB file under FAT and under HPFS. With FAT the full 128 KB of
|
||||
the FAT must be read to determine the file's layout. If this layout can
|
||||
be kept in the cache during the file access then fine. Otherwise the
|
||||
FAT would need to be reread one or more times (probably starting from
|
||||
the beginning on each reread). With HPFS, up to 361 sector reads, in a
|
||||
three-level B+tree structure, and possibly up to just 13 sector reads,
|
||||
in a two-level structure, would provide the information. The HPFS
|
||||
figures are maximums and the actual sector-read figure would most
|
||||
probably be much lower since HPFS was trying hard to reduce the number
|
||||
of runs when the file was written. Also if the ALNODEs are near each
|
||||
other then read-ahead would reduce the actual hits. Furthermore, OS/2
|
||||
will keep the file's allocation information resident in memory while the
|
||||
file is open, so no rereads would be needed.
|
||||
|
||||
<P>
|
||||
If you've ever looked at the layout of files on a HPFS partition, you
|
||||
may have been shocked to see the large gaps in the disk usage. This is
|
||||
FAT-coloured thinking. There are good reasons not to use the first
|
||||
available spot next to an existing file, particularly in a multitasking
|
||||
environment where more than one write operation can be occurring
|
||||
concurrently. HPFS uses three strategies here that I'm aware of.
|
||||
First, the destination of write operations involving new files will tend
|
||||
not to be near (preferably in a different band from) where an existing
|
||||
file is also being updated. Otherwise, fragmentation would be highly
|
||||
likely to occur.
|
||||
|
||||
<P>
|
||||
Second, 4 KB of extra space is allocated by the file system to the end
|
||||
of a file when it is created. Again the reason is to reduce the
|
||||
likelihood of fragmentation from other concurrent writing tasks.
|
||||
If not utilised, this space is recovered afterwards. To test this
|
||||
assertion, create the REXX cmdfile shown in Figure 7 and run it on an
|
||||
empty HPFS partition. (You can also do this on a partition with files
|
||||
in it but it is easier on an empty one.) Run it and when the "Press any
|
||||
key" message appears start up another OS/2 session and run CHKDSK (no
|
||||
switches) on the partition under examination. CHKDSK will get confused
|
||||
about the space allotted to the file open in the other session and will
|
||||
say it is correcting an allocation error (which it really isn't doing
|
||||
because you did not use the /F switch). Ignore this and notice that "4
|
||||
kilobytes are in 1 user files". Switch back to the other session and
|
||||
press Enter to close the file. Repeat and again run CHKDSK in the other
|
||||
session. Notice this time that no extra space is allocated since the
|
||||
file is being reopened rather than being created.
|
||||
|
||||
<PRE>
|
||||
/* Test to check the space
|
||||
preallocated to an open file */
|
||||
|
||||
CALL STREAM 'zerofile', 'c', 'open'
|
||||
/* Will create if it does not exist */
|
||||
'@pause'
|
||||
CALL STREAM 'zerofile', 'c', 'close'
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Fig. 7: A simple REXX program to demonstrate how HPFS allocates 4 KB of
|
||||
diskspace to a new file.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
Third, if a program has been written to report the likely filesize to
|
||||
OS/2, or if you are copying an existing file (i.e. the final filesize
|
||||
is known) then HPFS will expend a great deal of effort to find a free
|
||||
space big enough to accommodate the file in one extent. If that is not
|
||||
possible then it looks for two free spaces half the size of the file and
|
||||
so on. Again this can result in two files in a directory not being next
|
||||
to each other on the disk.
|
||||
|
||||
<P>
|
||||
Since DOS and Windows programs are not written with preallocation space
|
||||
requesting in mind, they tend to be more likely candidates for
|
||||
fragmentation than properly written OS/2 programs. So, for example,
|
||||
using a DOS comms program to download a large file will often result in
|
||||
a fragmented file. Compared with FAT, though, fragmentation on heavily
|
||||
used HPFS volumes is very low, usually less than 1%. We'll consider
|
||||
fragmentation levels in more depth in Part 3.
|
||||
|
||||
<P>
|
||||
<H2>Other Matters</H2>
|
||||
|
||||
<P>
|
||||
It has also been written that the HPFS cache is smart enough to adjust
|
||||
the value of its sector read-ahead for each opened file based on the
|
||||
file's usage history or its type (Ray Duncan, 1989). It is claimed that
|
||||
EXE files and files that typically have been fully read in the past are
|
||||
given big read-aheads when next loaded. This is a fascinating concept
|
||||
but unfortunately it has not been implemented.
|
||||
|
||||
<P>
|
||||
Surprisingly, like other device drivers, HPFS is still 16-bit code. I
|
||||
think this is one of the few remaining areas of 16-bit code in Warp. I
|
||||
believe IBM's argument is that 32-bit code here would not help
|
||||
performance much as mechanical factors are the ones imposing the limits,
|
||||
at least in typical single-user scenarios.
|
||||
|
||||
<P>
|
||||
HPFS is run as a ring 3 task in the 80x86 processor protection mechanism
|
||||
i.e. at the application level. HPFS386 is a 32-bit version of HPFS
|
||||
that comes only with IBM LAN SERVER Advanced Version. HPFS386 runs in
|
||||
ring 0, i.e. at kernel level. This ensures the highest file system
|
||||
performance in demanding network situations. It can also provide much
|
||||
bigger caches than standard HPFS which is limited to 2 MB. There is a
|
||||
chance that this version will appear in a later release of Warp.
|
||||
|
||||
<P>
|
||||
OS/2 v2.x onwards also boosts the performance of FAT. This improvement,
|
||||
called "Super FAT", is a combination of 32-bit executable code and the
|
||||
mirroring of the FAT and directory paths in RAM. This requires a fair
|
||||
bit of memory. Also, Super FAT speeds the search for free space by
keeping an in-memory bitmap of the used sectors in the FAT. This does
|
||||
help the performance but I think the results in Figure 4, which were
|
||||
performed using the Super FAT system, still highlight FAT's
|
||||
architectural weaknesses.
|
||||
|
||||
<P>
|
||||
You can easily tell whether a partition is formatted under HPFS or FAT. Just
|
||||
run DIR in the root directory. If "." and ".." directory entries are shown
|
||||
then HPFS is used [Unless the HPFS partition was formatted under Warp 4 -- Ed].
|
||||
|
||||
<P>
|
||||
<H2>Conclusion</H2>
|
||||
|
||||
<P>
|
||||
HPFS does require 300-400 KB of memory to implement, so it's only
|
||||
suitable for OS/2 v2.1 systems with at least 12 MB or Warp systems with
|
||||
at least 8 MB. For partitions of 100 MB+ it offers definite technical
|
||||
advantages over FAT. By now you should have developed an understanding
|
||||
of how these improvements are achieved.
|
||||
|
||||
<P>
|
||||
In the next installment, we look at a shareware program to visually
|
||||
inspect the layout of a HPFS partition and a REXX program to dump the
|
||||
contents of a disk sector by specifying either decimal LSN, hexadecimal
|
||||
LSN, dword byte-order-reversed hexadecimal LSN (what you see when you
|
||||
look at a dword pointer in a hex dump) or Cyl/Hd/Sec coordinates. Other
|
||||
REXX programs will convert the data stored in the SuperBlock and the
|
||||
SpareBlock sectors into intelligible values. You should find it quite
|
||||
informative.
|
||||
study/sabre/os/files/FileSystems/HPFS/hpfs2.html (new file, 1171 lines)

study/sabre/os/files/FileSystems/HPFS/hpfs3.html (new file, 804 lines)
@@ -0,0 +1,804 @@
<H1>Inside the High Performance File System</H1>
|
||||
<H2>Part 3: Fragmentation, Diskspace Bitmaps and Code Pages</H2>
|
||||
Written by Dan Bridges
|
||||
|
||||
<H2>Introduction</H2>
|
||||
|
||||
<P>
|
||||
This article originally appeared in the May 1996 issue of Significant
|
||||
Bits, the monthly magazine of the Brisbug PC User Group Inc.
|
||||
|
||||
<P>
|
||||
This month we look at how HPFS knows which sectors are occupied and which ones
|
||||
are free. We examine the amount of file fragmentation on five HPFS volumes and
|
||||
also check out the fragmentation of free space. A program will be presented to
|
||||
show free runs and some other details. Finally, we'll briefly discuss Code
|
||||
Pages and look at a program to display their contents.
|
||||
|
||||
<P>
|
||||
<H2>How Sectors are Mapped on a HPFS Volume</H2>
|
||||
|
||||
<P>
|
||||
The sector usage on a HPFS partition is mapped in data band bitmap blocks.
|
||||
These blocks are 2 KB in size (four sectors) and are usually situated at either
|
||||
the beginning or end of a data band. A data band is almost 8 MB. (Actually
|
||||
8,190 KB since 2 KB is needed for its bitmap.) See Figure 1. The state of each
|
||||
bit in the block indicates whether or not a sector (HPFS' allocation unit) is
|
||||
occupied. If a bit is set (1) then its corresponding sector is free. If the
|
||||
bit is not set (0) then the sector is occupied. Structures situated within the
|
||||
confines of a data band such as Code Page Info & Data sectors, Hotfix
|
||||
sectors,
|
||||
the Root Directory DirBlk etc. are all marked as fully occupied within that
|
||||
band's usage bitmap.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=435 HEIGHT=257 SRC="fig1.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 1: The basic data layout of a HPFS volume.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
Since each bit maps a sector, a byte maps eight sectors and the complete 2 KB
|
||||
block maps the 16,384 sectors (including the bitmap block itself) in an 8 MB
|
||||
band. And since two blocks can face each other, we arrive at the maximum
|
||||
possible extent (fragment) size of 16,380 KB. Examine Figure 2 now to see
|
||||
examples of file and freespace mapping.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=429 HEIGHT=302 SRC="fig2.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 2: The correspondence of the first five bytes in a data band's usage
|
||||
bitmap to the first 40 sectors in the band.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
In this example we see 23 occupied sectors ("u") and 4 unoccupied areas (".")
|
||||
which we will refer to as "freeruns" [of sectors]. At one extreme, the 23
|
||||
sectors might belong to one file (here in four extents) while at the other
|
||||
extreme we might have the FNODEs of 23 "zero-length" files. (Every file and
|
||||
directory entry on a HPFS volume must have an FNODE sector.)
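<P>
If you would like to experiment with this mapping yourself, here is a minimal
REXX sketch (it is not one of the article's utilities). It turns a few bitmap
bytes into the "u"/"." display used in Figure 2, following the same bit order
the ShowFreeruns program presented later uses. The five sample bytes are made
up for illustration; a real program would read them from a bitmap block via
SECTOR.DLL.

<PRE>
/* Minimal sketch: render bitmap bytes as "u" (used) and "." (free).   */
/* A set bit means the sector is free; bit 0 maps the lowest-numbered  */
/* sector of the eight covered by each byte.                           */
sample = 'E0FF0F30F8'x                 /* 5 hypothetical bytes = 40 secs */
display = ''
DO byte = 1 TO Length(sample)
  char = Substr(sample, byte, 1)
  DO bitPos = 0 TO 7
    IF BitAnd(char, D2C(2**bitPos)) > '0'x THEN
      display = display'.'             /* bit set: sector is free       */
    ELSE
      display = display'u'             /* bit clear: sector is occupied */
  END bitPos
END byte
SAY display                            /* one character per sector      */
</PRE>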
|
||||
|
||||
<P>
|
||||
The advantages of the bitmap approach are twofold. First, the small allocation
|
||||
unit size on a HPFS volume means greatly reduced allocation unit wastage
|
||||
compared to large FAT partitions. Second, the compact mapping structure makes
|
||||
it feasible for HPFS to quickly search a data band for enough free space to slot
|
||||
in a file of known size, in one piece if possible. For example, as just
|
||||
mentioned HPFS can map 32,760 allocation units with just 4 KB of bitmaps whereas
|
||||
a 16-bit FAT structure requires 64 KB (per FAT copy) to map 32,768 allocation
|
||||
units.
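<P>
As a quick sanity check of those figures, the arithmetic can be run in REXX
(assuming the usual 512-byte sectors):

<PRE>
/* Back-of-envelope check of the mapping figures quoted above           */
bandSecs   = 8 * 1024 * 1024 / 512    /* 16384 sectors in an 8 MB band   */
bitmapSecs = 2048 / 512               /* 4 sectors per 2 KB bitmap block */
SAY 'Data sectors mapped by two facing bands:' 2 * (bandSecs - bitmapSecs)
SAY 'Bitmap bytes needed to map them:        ' 2 * 2048
</PRE>

The first SAY reports 32,760 sectors, which is also where the maximum extent
size of 16,380 KB comes from.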
|
||||
|
||||
<P>
|
||||
<H2>A Fragmentation Analysis</H2>
|
||||
|
||||
<P>
|
||||
In this section we'll examine the level of fragmentation on the five HPFS
|
||||
partitions of my first HD. Look at Figure 3. Notes:
|
||||
|
||||
<P>
|
||||
1. A time-since-last-defrag figure of "Never" means that I've never run a
|
||||
defragger across this partition since upgrading to OS/2 Warp 118 days ago. This
|
||||
value is stored in the SuperBlock (LSN 16) and was determined by using the
|
||||
ShowSuperSpare REXX program featured in Part 2.
|
||||
|
||||
<P>
|
||||
2. The fragmentation levels were reported by the wondrous FST (freeware) with
|
||||
"FST -n check -f C:" while the names of the fragmented files and their sizes
|
||||
came from the GammaTech Utilities (commercial) "HPFSOPT C: -u -d -o1 -l
|
||||
logfile". You can also use the Graham Utilities (commercial) "HPFS-EXT C: -s".
|
||||
|
||||
<P>
|
||||
3. The high number of files with 0 data extents on C: is due to the presence of
|
||||
the WPS folders on this drive. Each of these has "zero" bytes in the main file
|
||||
but they usually have bytes in EAs.
|
||||
|
||||
<P>
|
||||
4. Files with 0 or 1 extents are considered to be fully contiguous, so I've placed
|
||||
them in one grouping.
|
||||
|
||||
<P>
|
||||
5. Files with 2-8 extents are considered to be "nearly contiguous" since the
|
||||
fragments will usually be placed close together on the disk and also because a
|
||||
list of the location and length of up to 8 extents can be kept in a file's FNODE
|
||||
sector. This list will be kept memory resident while the file is open. Note 1:
|
||||
the extents themselves can not be kept memory resident since, theoretically,
|
||||
they could be up to 8*16,380 KB in size. But no non-data disk reads, after the
|
||||
initial read of the FNODE, would be required to work with the file. Note 2:
|
||||
under some circumstances, the 8 extents, if small enough, could be kept memory
|
||||
resident in the sense that they could be held in HPFS' cache. We will consider
|
||||
FNODEs in detail in a later installment.
|
||||
|
||||
<P>
|
||||
6. Files with more than 8 extents have too many fragments to be listed in their
|
||||
FNODEs. Instead, a B+tree allocation sector structure (an ALSEC) is used to map
|
||||
the extents. The sector mappings are small enough to keep memory resident while
|
||||
the file is open. ALSECs will be covered in a later installment.
|
||||
|
||||
<P>
|
||||
7. EAs are usually not fragmented since, in the current implementation of OS/2,
|
||||
the total EA size associated with any one file is only 64 KB. If a file has EAs
|
||||
in 0 extents then the EA information is stored completely within the FNODE
|
||||
sector. (There is space in the FNODE for up to 145 bytes of "internal" EAs.)
|
||||
In all other cases on my system they are currently stored in single, external runs
|
||||
of sectors. EAs will be covered in later installments.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=443 HEIGHT=490 SRC="fig3.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 3: Fragmentation analysis of five HPFS partitions.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
We now turn to the topic of what circumstances are leading to file fragmentation
|
||||
on these partitions.
|
||||
|
||||
<P>
|
||||
C: _ The OS/2 system partition. I've run out of space on this drive on
|
||||
occasions. Activity here occurs through the running of Fixpacks (FP 16 and then
|
||||
FP 17 were run), INI maintenance utilities and driver upgrades. There is really
|
||||
nothing of concern here. Most HPFS defraggers suggest not trying to defrag
|
||||
files that have less than 2 or 3 extents since you run the risk of fragmenting
|
||||
the free space. We will return to this topic shortly.
|
||||
|
||||
<P>
|
||||
D: _ My main work area and the location of communications files. I use the DOS
|
||||
comms package TELEMATE because I've always liked its features (although OS/2 has
|
||||
to work hard to handle its modem access during a file transfer - OS/2 comms
|
||||
programs, in general, are much less demanding of the CPU's attention). The
|
||||
other major comms package I use is OS/2 BinkleyTerm v2.60 feeding OS/2 Squish
|
||||
message databases. The fragmented files consist mainly of files downloaded by
|
||||
TELEMATE (DOS comms programs do not inform HPFS, ahead of time, of how much
|
||||
space the downloaded file will occupy) and Squish databases (*.SQD). The drive
|
||||
was defragged 53 days ago at which time no special effort was made to reduce
|
||||
file fragmentation below 2-3 extents, accounting for the presence of 245 files
|
||||
with two extents. This really is an insignificant amount regardless of what the
|
||||
4% figure may lead you to believe.
|
||||
|
||||
<P>
|
||||
The most fragmented file on this partition is a 150 KB BinkleyTerm logfile with
|
||||
30 extents. The main reason I can see for fragmentation in this case is that
|
||||
the file is frequently being updated with information while file transfers are
|
||||
in progress. The Squish databases are also prone to fragmentation. Out of a
|
||||
total of 25 database files there were 8, averaging 500 KB each, with an average
|
||||
of 15 extents.
|
||||
|
||||
<P>
|
||||
E: _ The fragmentation here was insignificant apart from a single 2.8 MB
|
||||
executable Windows program that has had a DOS patch program run over it,
|
||||
resulting in 38 fragments. The 2-extent files were mainly data files that are
|
||||
produced by this same Windows package (being run under WIN-OS2).
|
||||
|
||||
<P>
|
||||
F: _ Almost no fragmentation since this partition is reserved for DOS programs
|
||||
and I don't use them much.
|
||||
|
||||
<P>
|
||||
G: _ My second major work partition. Fragmentation is low and unlikely to go
|
||||
much lower since 2 extents is considered below the point of defragger
|
||||
involvement.
|
||||
|
||||
<P>
|
||||
The conclusion to be drawn from the above is that, if you don't get too hot
|
||||
under the collar about some files having 2 or 3 extents then there will
|
||||
generally be little need to worry about fragmentation under HPFS. Only certain
|
||||
types of files (some comms/DOS/Windows) will be candidates. And keeping
|
||||
partitions less than 80% full should help reduce general fragmentation as well.
|
||||
|
||||
<P>
|
||||
<H2>Defragmenting Files</H2>
|
||||
|
||||
<P>
|
||||
Since fragmentation is a relatively minor concern under HPFS there is not much
|
||||
of an argument for purchasing OS/2 utilities based mainly on their ability to
|
||||
defragment HPFS drives, especially since it's not hard to defragment files
|
||||
yourself. You see, providing there is enough contiguous freespace on a volume,
|
||||
the mere act of copying the files to a temporary directory, deleting the
|
||||
original and then moving the files back will usually eliminate, or at least
reduce, fragmentation since HPFS, knowing the original filesize, will look for a
suitably sized freespace. The success of this technique is demonstrated in
Figure 4, where 25 Squish database files (*.SQD) totalling 5.7 MB were shuffled
|
||||
about on D:. Note: don't use the MOVE command to initially transfer the files
|
||||
to the temp directory since this will just alter the directory entry rather than
|
||||
actually rewriting the files.
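<P>
As a sketch of the procedure just described (the directory names are purely
hypothetical - adjust them to suit, and make sure there is enough contiguous
freespace first), a small REXX .cmd file could shell out the steps like this:

<PRE>
/* Hand-defrag sketch: copy out, delete the originals, then move back.  */
/* The initial transfer must be a real COPY so the data is rewritten;   */
/* the final MOVE only has to adjust the directory entries.             */
'md d:\dfgtmp'
'copy d:\comms\*.sqd d:\dfgtmp >nul'
'del d:\comms\*.sqd'
'move d:\dfgtmp\*.sqd d:\comms >nul'
'rd d:\dfgtmp'
</PRE>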
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=159 HEIGHT=232 SRC="fig4.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 4: Number of extents in 25 SQD files before and after the defrag process
|
||||
described in the text.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
I've used the GU's HPFS-EXT to report these figures. This is freely available
|
||||
in the GULITE demo package. Note: the fully functional HPFSDFRG is also in
|
||||
this package but I wanted to show that it's not that hard to do this by hand.
|
||||
HPFSDFRG does much the same as I did except that you can specify the
|
||||
optimisation threshold (minimum number of extents before a file becomes a
|
||||
candidate) and it will retry the copying operation up to ten times if there are
|
||||
more extents after the operation than before it (due to heavily fragmented
|
||||
freespace).
|
||||
|
||||
<P>
|
||||
<H2>The Fragmentation of Freespace</H2>
|
||||
|
||||
<P>
|
||||
Another significant aspect of HPFS' fragmentation resistance is how well the FS
|
||||
keeps disk freespace in big, contiguous chunks. If the current files on a
|
||||
partition are relatively fragmentation free but the remaining freespace is
|
||||
arranged in lots of small chunks then there is a good chance that new files will
|
||||
be fragmented. You can check this with "FST -n info -f C:". This produces a
|
||||
table that counts the number of freespace extents that are 1, 2-3, 4-7, 8-15,
|
||||
... 16384-32767 sectors long. In my opinion though it is more important to
|
||||
consider the product of the actual extent sizes and their frequencies, since the
presence of numerous 1-extent spaces is not important if there are still a
|
||||
number of large spaces available.
|
||||
|
||||
<P>
|
||||
Figure 5 shows the output of the REXX program ShowFreeruns.cmd. The partition
|
||||
of 100 MB is almost empty. The display shows the location of the 2 KB block
|
||||
that holds the list of the starting LSNs of each bitmap block (this figure comes
|
||||
from the dword at offset 18h in the SuperBlock), the location of each bitmap
|
||||
block on the left and the sector size and location of freespace on the right.
|
||||
As you see, this partition has 13 data bands, 6 of which face each other. A
|
||||
version of ShowFreeruns.cmd that only outputs the run size was used to generate
|
||||
a list of figures. This list was loaded into a spreadsheet, sorted and a
|
||||
frequency distribution performed. See Figure 6. You can see that C: has no
|
||||
large areas remaining, D: has the majority of its freespace in the 4 MB < 8 MB
|
||||
range and that E:, F: and G: have kept large majorities of their freespace in
|
||||
very big runs. Overall, this is quite good performance.
|
||||
|
||||
<PRE>
|
||||
Inspecting drive O:
|
||||
|
||||
List of Bmp Sectors: 0x00018FF0 (102384)
|
||||
|
||||
Space-Usage Bitmap Blocks:
|
||||
Freespace Runs:
|
||||
|
||||
0x00000014-00000017 (20-23)
|
||||
0x00007FFC-00007FFF (32764-32767)
|
||||
130-32763 (#1:32634)
|
||||
|
||||
0x00008000-00008003 (32768-32771)
|
||||
0x0000FFFC-0000FFFF (65532-65535)
|
||||
32772-65531 (#2:32760)
|
||||
|
||||
0x00010000-00010003 (65536-65539)
|
||||
0x00017FFC-00017FFF (98300-98303)
|
||||
65540-81919 (#3:16380)
|
||||
81926-98291 (#4:16366)
|
||||
|
||||
0x00018000-00018003 (98304-98307)
|
||||
0x0001FFFC-0001FFFF (131068-131071)
|
||||
100369-102383 (#5:2015)
|
||||
102400-131067 (#6:28668)
|
||||
|
||||
0x00020000-00020003 (131072-131075)
|
||||
0x00027FFC-00027FFF (163836-163839)
|
||||
131076-163835 (#7:32760)
|
||||
|
||||
0x00028000-00028003 (163840-163843)
|
||||
0x0002FFFC-0002FFFF (196604-196607)
|
||||
163844-196603 (#8:32760)
|
||||
|
||||
0x00030000-00030003 (196608-196611)
|
||||
196612-204767 (#9:8156)
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 5: Output from the ShowFreeruns.cmd REXX program.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=429 HEIGHT=378 SRC="fig6_3.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 6: Freespace analysis on five HPFS partitions.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
<H2>The ShowFreeruns Program</H2>
|
||||
|
||||
<P>
|
||||
Like other programs in this series, ShowFreeruns.cmd (see Figure 7) uses
|
||||
SECTOR.DLL to read a sector off a logical drive. I was motivated to design this
|
||||
program after seeing the output of the GU's "HPFSINFO C: -F". On a one-third
|
||||
full 1.2 GB partition, the program presented here takes 17 secs compared to
|
||||
HPFSINFO's time of 26 secs. HPFSINFO also shows the CHS (Cyl/Hd/Sec)
|
||||
coordinates of each run. I was not interested in these but instead display the
|
||||
freerun's size. HPFSINFO also displays the meaning of what's in the SuperBlock
|
||||
and the SpareBlock. If you want to do this, you can include the code from
|
||||
ShowSuperSpare.cmd from Part 2 and it will only add an extra 0.5 secs to the
|
||||
time. The performance then, for an interpreted program (REXX), is quite good and
|
||||
was achieved primarily through a speed-up technique to be discussed shortly.
|
||||
Moreover, HPFSINFO consistently overstates the end of each freerun by 1 and it
|
||||
sometimes does not show the last run (e.g. on C: it states that there are 366
|
||||
freeruns but only shows 365 of them). This last bug appears to be caused by the
|
||||
last freerun continuing to the end of the partition. My design accounts for
|
||||
this situation.
|
||||
|
||||
<PRE>
|
||||
/* Shows bitmap locations and free space runs */
|
||||
ARG drive . /* First parm should always be drive */
|
||||
|
||||
IF drive = '' THEN CALL HELP
|
||||
parmList = "? /? /H HELP A: B:"
|
||||
IF WordPos(drive, parmList) \= 0 THEN CALL Help
|
||||
|
||||
/* Register external DLL functions */
|
||||
CALL RxFuncAdd 'ReadSect','Sector','ReadSect'
|
||||
CALL RxFuncAdd 'RxDate','RexxDate','RxDate'
|
||||
|
||||
/* Initialise Lookup Table*/
|
||||
DO exponent = 0 TO 7
|
||||
bitValue.exponent = D2C(2**exponent)
|
||||
END exponent
|
||||
|
||||
secString = ReadSect(drive, 16) /*Read Superblk sec*/
|
||||
freespaceBmpList = C2D(Reverse(Substr(secString,25,4)))
|
||||
totalsecs = C2D(Reverse(Substr(secString,17,4)))
|
||||
|
||||
'@cls'
|
||||
SAY
|
||||
SAY "Inspecting drive" drive
|
||||
SAY
|
||||
/* Dword at byte 25 of the SuperBlock holds the bitmap-list LSN */
|
||||
CALL ShowDword " List of Bitmap secs",25
|
||||
|
||||
startOfListBlk = 0
|
||||
startOfBlk = 0
|
||||
bmpListBlk = ""
|
||||
bmpBlk = ""
|
||||
getFacingBands = 0
|
||||
runNumber = 0
|
||||
byteOffset = 0
|
||||
runNumber = 0
|
||||
/* Read in 4 secs of the list of sec-usage bmp blks */
|
||||
DO secWithinBlk = freespaceBmpList TO freespaceBmpList+3
|
||||
temp = StartOfListBlk + secWithinBlk
|
||||
bmpListBlk = bmpListBlk||ReadSect(drive, temp)
|
||||
END secWithinBlk
|
||||
|
||||
SAY
|
||||
SAY "Space-Usage Bitmap Blocks:"
|
||||
SAY " Freespace Runs:"
|
||||
|
||||
/* Use dword pointers to bmps to read in 2KB bmp blks */
|
||||
DO listOffset = 1 TO 2048 BY 4
|
||||
startDecStr = C2D(Reverse(Substr(bmpListBlk,ListOffset,4)))
|
||||
IF startDecStr = 0 THEN /* No more bmps listed */
|
||||
DO
|
||||
IF getFacingBands = 1 THEN
|
||||
DO /* Last data band had no facing data band */
|
||||
bmpSize = 2048
|
||||
CALL DetermineFreeruns
|
||||
LEAVE
|
||||
END
|
||||
|
||||
LEAVE
|
||||
END
|
||||
|
||||
/*Display a blank line when a new facing band occurs*/
|
||||
IF (ListOffset+7)//8 = 0 THEN SAY
|
||||
|
||||
CALL ShowBmpBlk listOffset
|
||||
DO secWithinBlk = 0 TO 3
|
||||
temp = StartOfBlk + secWithinBlk
|
||||
bmpBlk = bmpBlk||ReadSect(drive, temp)
|
||||
END secWithinBlk
|
||||
|
||||
getFacingBands = getFacingBands + 1
|
||||
IF getFacingBands = 2 THEN /* Wait until you get both */
|
||||
DO /* bmps for the facing data*/
|
||||
bmpSize = 4096 /* bands since maximum extent*/
|
||||
CALL DetermineFreeruns /* length is 16,380 KB */
|
||||
byteOffset = byteOffset+4096
|
||||
getFacingBands = 0
|
||||
bmpBlk = ""
|
||||
END
|
||||
END listOffset
|
||||
|
||||
EXIT /**************EXECUTION ENDS HERE**************/
|
||||
|
||||
|
||||
FourBytes2Hex: /* Given offset, return dword */
|
||||
ARG startPos
|
||||
rearranged = Reverse(Substr(secString,startPos,4))
|
||||
RETURN C2X(rearranged)
|
||||
|
||||
|
||||
ShowDword: /* Display dword and dec equivalent */
|
||||
PARSE ARG label, offset
|
||||
hexStr = FourBytes2Hex(offset)
|
||||
SAY label": 0x"hexStr "("X2D(hexStr)")"
|
||||
RETURN
|
||||
|
||||
|
||||
ShowBmpBlk:
|
||||
/* Show start-end of freespace runs in hex & dec */
|
||||
PARSE ARG offset
|
||||
endDecStr = C2D(Reverse(Substr(bmpListBlk,offset,4)))+3
|
||||
SAY " 0x"D2X(startDecStr,8)"-"D2X(endDecStr,8)
|
||||
" ("startDecStr"-"endDecStr")"
|
||||
startOfBlk = startDecStr
|
||||
RETURN
|
||||
|
||||
|
||||
DetermineFreeruns:
|
||||
runStatus = 0
|
||||
oldchar = ''
|
||||
/* Check 128 secs at a time to speed up operation */
|
||||
DO para = 1 to bmpSize BY 16
|
||||
/* 16 bytes*8 secs/byte = 128 secs per para scanned */
|
||||
char = Substr(bmpBlk,para,16)
|
||||
IF char = 'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF'x &,
|
||||
runstatus = 1 THEN ITERATE para
|
||||
IF char = '00000000000000000000000000000000'x &,
|
||||
runstatus = 0 THEN ITERATE para
|
||||
/* Part of paragraph has run start/end
|
||||
so check a byte (8 secs) at a time. */
|
||||
DO byte = para TO para + 15
|
||||
char = Substr(bmpBlk,byte,1)
|
||||
IF char > '0'x THEN /* 1 or more free secs */
|
||||
DO
|
||||
IF char = 'FF'x THEN /* 8 unoccupied secs */
|
||||
IF runStatus = 1 THEN /* Run is in progress */
|
||||
NOP
|
||||
ELSE /* Run starts on 8 sec boundary */
|
||||
DO
|
||||
startByte = byte + byteOffset
|
||||
startBitPos = 0
|
||||
runStatus = 1 /* Start run determination */
|
||||
END
|
||||
ELSE
|
||||
CALL DetermineBit /* Partial usage of 8 secs */
|
||||
END
|
||||
ELSE
|
||||
DO /* All 8 secs are used */
|
||||
IF runStatus = 1 THEN
|
||||
DO
|
||||
endByte = byte + byteOffset
|
||||
endBitPos = -1 /* Run ends with prior sec */
|
||||
CALL ShowRun
|
||||
END
|
||||
END
|
||||
END byte
|
||||
END para
|
||||
|
||||
IF runStatus = 1 THEN /* Freespace at end of part. */
|
||||
DO
|
||||
endByte = 9999999999 /* Larger than # of secs in */
|
||||
endBitPos = 0 /* max. possible part.(512GB) */
|
||||
CALL ShowRun /* so ShowRun will set runEnd */
|
||||
/* to last LSN in this part. */
|
||||
END
|
||||
RETURN
|
||||
|
||||
|
||||
DetermineBit: /* Free/occupied usage within 8 sec blk */
|
||||
DO bitPos = 0 TO 7
|
||||
IF runStatus = 0 THEN
|
||||
DO /* No run currently in progress */
|
||||
IF BitAnd(char, bitValue.bitPos) > '0'x THEN
|
||||
DO /* sec is free */
|
||||
startByte = byte + byteOffset
|
||||
startBitPos = bitPos
|
||||
runStatus = 1
|
||||
END
|
||||
END
|
||||
ELSE
|
||||
DO
|
||||
IF BitAnd(char, bitValue.bitPos) = '0'x THEN
|
||||
DO /* sec is used */
|
||||
endByte = byte + byteOffset
|
||||
/* When a run ends, the sec before the first
|
||||
used one is the last sec in the freerun. */
|
||||
endBitPos = bitPos - 1
|
||||
CALL ShowRun
|
||||
END
|
||||
END
|
||||
END bitPos
|
||||
RETURN
|
||||
|
||||
|
||||
ShowRun:
|
||||
/* Display freerun start-end secs & reset run status */
|
||||
runNumber = runNumber + 1
|
||||
runStart = (startByte - 1) * 8 + startBitPos
|
||||
runEnd = (endByte - 1) * 8 + endBitPos
|
||||
|
||||
IF runEnd > totalSecs THEN runEnd = TotalSecs - 1
|
||||
IF runStart \= runEnd THEN /* More than 1 sec is free */
|
||||
DO
|
||||
run = runStart"-"runEnd
|
||||
run = Left(run||Copies(" ",14),15)
|
||||
SAY Copies(" ",40) run "(#"runNumber":"runEnd-RunStart+1")"
|
||||
END
|
||||
ELSE
|
||||
DO
|
||||
run = Left(runStart||Copies(" ",14),15)
|
||||
SAY Copies(" ",40) run "(#"runNumber":1)"
|
||||
END
|
||||
|
||||
runStatus = 0
|
||||
RETURN
|
||||
|
||||
|
||||
Help:
|
||||
SAY
|
||||
SAY "Purpose:"
|
||||
SAY " ShowFreeruns displays the location of the
|
||||
sec-usage bitmap blocks" /* Wrapped long line */
|
||||
SAY " and the location and extent of free space runs."
|
||||
SAY
|
||||
SAY "Example:"
|
||||
SAY " ShowFreeruns C:"
|
||||
SAY
|
||||
EXIT
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 7: The ShowFreeruns.cmd REXX program. Requires SECTOR.DLL. Note that
|
||||
the long SAY line (line 40) should include the next line as well. (SAY clauses
|
||||
can't be continued on to the next line with a comma.)
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
Since a sector is mapped by a bit, the program often needs to check the status
|
||||
of a bit within a bitmap's byte. This is done using the BITAND(string1,
|
||||
string2) inbuilt function. In this design string 1 holds the byte to be
|
||||
examined and string 2 holds a character that only has the corresponding bit set.
|
||||
Rather than having to work out the character for string 2 each time BITAND() is
|
||||
used, we instead precalculate the eight characters and then store them in the
|
||||
BitValue. compound variable for later use.
|
||||
|
||||
<P>
|
||||
The next step is to read in the SuperBlock and from it get the location of the
|
||||
list of bitmap sectors and the total number of sectors. The latter value is
|
||||
required so we know when we've reached the end of the partition.
|
||||
|
||||
<P>
|
||||
We then read in the four sectors of the block holding the list of bitmaps. The
|
||||
list consists of dwords that store the starting LSN of each bitmap block. 128
|
||||
dwords can fit in each sector of the list so the four sectors of the list can
|
||||
hold 512 bitmap block LSNs. Now a bitmap block maps 8 MB of diskspace so this
|
||||
'lite' version is only good when dealing with a partition of less than 4 GB.
|
||||
(Earlier works refer to the maximum partition size as 512 GB but in the recent
|
||||
"Just Add OS/2 Warp" package, in its technical section, it is stated that the
|
||||
maximum partition size is 64 GB.) I won't be able to check this aspect of the
|
||||
design until I get a HD bigger than 4 GB and succumb to the mad urge to
|
||||
partition it as one volume.
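<P>
The 4 GB figure falls straight out of the numbers just given, as this small
fragment shows:

<PRE>
/* Why this 'lite' design tops out at about 4 GB                        */
lsnsPerSec = 512 / 4                /* 128 dword LSNs fit in one sector  */
listLsns   = 4 * lsnsPerSec         /* 512 LSNs in the 4-sector list     */
SAY 'Bitmap blocks listable:' listLsns
SAY 'Diskspace mapped:      ' listLsns * 8 'MB'    /* 4096 MB = 4 GB     */
</PRE>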
|
||||
|
||||
<P>
|
||||
The end of the list is indicated by the first occurrence of 0000h. The list of
|
||||
the 100 MB partition shown in Figure 5 contains only 13 dwords since it has 13
|
||||
data bands so, in a typical case, you should not expect to find much data stored
|
||||
in this block.
|
||||
|
||||
<P>
|
||||
A freerun can be bigger than a data band since pairs of bands face each other,
|
||||
so we consider two bands at a time, unless we reach the end of the partition
|
||||
without a facing band. Once we have a data region we call the DetermineFreeruns
|
||||
procedure. Here we examine the two, combined data bitmaps (unless it's a solo
|
||||
band at the end). In the initial design I looked at each byte in the 4 KB
|
||||
bitmap combination to see if it was either 00h (all eight sectors used) or
|
||||
FFh (all eight sectors free). Typically, you will find lots of occupied or free
|
||||
sectors together, so checking eight at a time speeds up the search. Only when
|
||||
the byte was neither of these is a bit-level search required.
|
||||
|
||||
<P>
|
||||
However, the speed of this version was poor, with the search through each byte of
|
||||
the 322 KB of bitmaps for the 161 databands in the 1.2 GB partition taking a
|
||||
total of 104 secs. The obvious solution was to extend the optimisation method
|
||||
to a second, higher level by checking more bytes first to see if they were all
|
||||
set or clear. I settled on 16 bytes which covers 128 sectors (64 KB) of
|
||||
diskspace at a time and this resulted in the final time of 17 secs. Further
|
||||
experiments with larger (64 byte) groups and also with third-level optimisation
|
||||
did not show much improvement with my mix of partitions but your situation may
|
||||
warrant further experimentation.
|
||||
|
||||
<P>
|
||||
<H2>Code Pages</H2>
|
||||
|
||||
<P>
|
||||
Different languages have different character sets. Code Pages (CPs) are used to
|
||||
map an ASCII character to the actual character. CP tables reside in
|
||||
COUNTRY.SYS. They are also present on a HPFS volume and every directory entry
|
||||
(DIRENT) includes a CP index value.
|
||||
|
||||
<P>
|
||||
CPs are used to map character case (i.e. in a foreign character set the
|
||||
relationship between lower and upper-case characters) and for collating
|
||||
sequences used when sorting. As mentioned in Part 1, HPFS directories use a
|
||||
B-tree structure which, as part of its operation, always store file/directory
|
||||
names in sorted order. Remember that HPFS is not case-sensitive (including when
|
||||
sorting) but it preserves case.
|
||||
|
||||
<P>
|
||||
The European-style languages (including English) have relatively straightforward
|
||||
Single-Byte Character Sets (SBCS) i.e. one character is represented by one
|
||||
byte. Asian character sets typically have many characters so they require two
|
||||
bytes per character (DBCS).
|
||||
|
||||
<P>
|
||||
The first 128 characters in all ASCII CPs are the same so the CP tables on the
|
||||
disk only map ASCII 128-255.
|
||||
|
||||
<P>
|
||||
The SpareBlock holds the LSN of the first CP Info sector. There is a header
|
||||
followed by up to 31 16-byte CP Info Entries. There is provision for more than
|
||||
one CP Info sector which could hold CP Info Entries 31-61 (counting from 0).
|
||||
Why so many different CPs are catered for I have no idea since I've been unable
|
||||
to have more than two loaded at a time. In Australia we typically use CP437
|
||||
(standard PC) - Country 061 and CP850 (multilingual Latin-1) - Country 000. The
|
||||
layout of a CP Info sector is shown in Figure 8.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=431 HEIGHT=400 SRC="fig8.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 8: The layout of a Code Page Information Sector.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
The CP Info Entry contains the LSN where this entry's CP mapping table is
|
||||
stored. This sector is a CP Data Sector. As well as a header there is enough
|
||||
space for up to three 128-byte CP maps per sector. Figure 9 shows the layout of
|
||||
a CP Data Sector.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=431 HEIGHT=450 SRC="fig9.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 9: The layout of a Code Page Data Sector.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
<H2>The CP.cmd Program</H2>
|
||||
|
||||
<P>
|
||||
Figure 10 shows the display produced by the REXX CP.cmd program (Figure 11).
|
||||
I've stopped it before it reached ASCII 255. Normally, the output will scroll
|
||||
off the screen, so either pause it or send it to the printer. If the mapped
|
||||
character has the same value as its ASCII value the word "same" is displayed
|
||||
instead to reduce clutter.
|
||||
|
||||
<P>
|
||||
<IMG WIDTH=430 HEIGHT=320 SRC="fig10.gif">
|
||||
|
||||
<P>
|
||||
<FONT SIZE=2>
|
||||
Figure 10: Partial output from the CP.cmd program. List continues on to ASCII
|
||||
255.
|
||||
</FONT>
|
||||
|
||||
<PRE>
|
||||
/* Decodes CP info & CP data sectors on a HPFS volume */
|
||||
ARG drive . /* First parm should always be drive */
|
||||
IF drive = '' | drive = "?" | drive = "HELP",
|
||||
| drive = "A:" | drive = "B:" THEN CALL Help
|
||||
|
||||
CALL RxFuncAdd 'ReadSect','Sector','ReadSect' /* In SECTOR.DLL */
|
||||
secString = ReadSect(drive,17) /* SpareBlock is LSN 17 */
|
||||
'@cls'
|
||||
SAY
|
||||
SAY "Inspecting drive" drive
|
||||
SAY
|
||||
|
||||
/* Offset 33 in Spareblock contains dword of CP info LSN */
|
||||
cpInfoSec = C2D(Reverse(Substr(secString,33,2)))
|
||||
secString = ReadSect(drive,cpInfoSec) /* Load CP info sec */
|
||||
numOfCodePages = C2D(Reverse(Substr(secString,5,2)))
|
||||
prevDataSec = ''
|
||||
|
||||
SAY "CODE PAGE INFORMATION (sector" cpInfoSec"):"
|
||||
SAY "Signature Dword: 0x"FourChars2Hex(1)
|
||||
SAY " CP# Ctry Code Code Page CP Data Sec Offset"
|
||||
|
||||
DO x = 0 TO numOfCodePages-1
|
||||
hexCountry = TwoChars2Hex((16*x)+17)
|
||||
decCountry = Right('00'X2D(hexCountry),3)
|
||||
cp = TwoChars2Hex((16*x)+19)
|
||||
country.x = X2D(cp)
|
||||
hexSec = FourChars2Hex((16*x)+25)
|
||||
decSec = X2D(hexSec)
|
||||
cpDataSec = decSec
|
||||
/* Since up to 3 CP tables can fit in 1 CP data sec,
|
||||
only read in a new data sec when the need arises. */
|
||||
IF cpDataSec \= prevDataSec THEN
|
||||
DO
|
||||
dataSecString = ReadSect(drive,cpDataSec)
|
||||
prevDataSec = cpDataSec
|
||||
END
|
||||
|
||||
offset = C2D(Reverse(Substr(dataSecString,(2*x)+21,2)))
|
||||
start = offset + 1
|
||||
SAY " " x " 0x"hexCountry "("decCountry") 0x"cp "("X2D(cp)") 0x"
|
||||
hexSec "("decSec") 0x"D2X(offset) "("offset")"
|
||||
/* Wrapped long line */
|
||||
/* Store table contents of each CP in an array */
|
||||
DO y = 128 TO 255
|
||||
char = Substr(dataSecString,start+6+y-18,1)
|
||||
IF C2D(char) \= y THEN
|
||||
array.x.y = Format(C2D(char),4) "("char")"
|
||||
ELSE
|
||||
array.x.y = " same "
|
||||
END y
|
||||
END x
|
||||
|
||||
/* Work out title line based on number of CPs */
|
||||
titleLine = " ASCII "
|
||||
DO x = 0 TO numOfCodePages-1
|
||||
titleLine = titleLine " CP" country.x
|
||||
END x
|
||||
SAY
|
||||
SAY titleLine
|
||||
|
||||
/* Display each table entry based on number of CPs */
|
||||
DO y = 128 TO 255
|
||||
dispLine = ''
|
||||
DO x = 0 TO numOfCodePages-1
|
||||
dispLine = dispLine" "array.x.y
|
||||
END x
|
||||
SAY "" y "("D2C(y)"):" dispLine
|
||||
END y
|
||||
|
||||
EXIT /****************EXECUTION ENDS HERE****************/
|
||||
|
||||
|
||||
FourChars2Hex:
|
||||
ARG offset
|
||||
RETURN C2X(Reverse(Substr(secString,offset,4)))
|
||||
|
||||
|
||||
TwoChars2Hex:
|
||||
ARG offset
|
||||
RETURN C2X(Reverse(Substr(secString,offset,2)))
|
||||
|
||||
|
||||
Help:
|
||||
SAY "Purpose:"
|
||||
SAY " CP decodes the CodePage Directory sector &"
|
||||
SAY " the CodePage sector on a HPFS volume"
|
||||
SAY
|
||||
SAY "Example:"
|
||||
SAY " CP C:"
|
||||
EXIT
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 11: The CP.cmd REXX program. Requires SECTOR.DLL.
|
||||
</FONT>
|
||||
|
||||
<P>
|
||||
While REXX does not support arrays it does have compound variables and I've used
|
||||
a CV called "array" to store the contents of each CP's mapping table. The
|
||||
design only deals with the first 31 CP Info entries (that should be more than
|
||||
enough anyway) and accommodates additional CPs by adding new columns to the
|
||||
display.
|
||||
|
||||
<P>
|
||||
Armed with this printout you can experiment with different collating sequences
|
||||
when switching CPs. You can check out your current CP by typing "CHCP" and then
|
||||
switch to a different CP by issuing, say, "CHCP 850". I used "REM >
|
||||
File[Alt-nnn]" to create zero-length files, with one or more high-order ASCII
|
||||
characters in their filenames, as test fodder.
|
||||
|
||||
<P>
|
||||
<H2>Conclusion</H2>
|
||||
|
||||
<P>
|
||||
In this installment you've learned how to decode the data band usage bitmaps
|
||||
contents and how to display the contents of the Code Page mapping tables. Next
|
||||
time we'll examine B-trees, DIRBLKs and DIRENTs.
|
||||
1006
study/sabre/os/files/FileSystems/HPFS/hpfs4.html
Normal file
932
study/sabre/os/files/FileSystems/HPFS/hpfs5.html
Normal file
@@ -0,0 +1,932 @@
|
||||
|
||||
<H1>Inside the High Performance File System</H1>
|
||||
<H2>Part 5: FNODEs, ALSECs and B+trees</H2>
|
||||
Written by Dan Bridges
|
||||
|
||||
<H2>Introduction</H2>
|
||||
|
||||
This article originally appeared in the August 1996 issue of Significant
|
||||
Bits, the monthly magazine of the Brisbug PC User Group Inc.
|
||||
|
||||
<P>Last month you saw how DIRENTs (directory entries) are stored in
|
||||
4-sector structures known as DIRBLKs. These blocks have limited space
|
||||
available for entries. Due to the variable length of filenames (1-254
|
||||
characters), the maximum number of entries depends on the average filename
|
||||
length. If the average name length is in the 10-13 character range, a
|
||||
DIRBLK can hold up to 44 entries.
|
||||
|
||||
<P>When there are more files in a directory than can fit in a single
|
||||
DIRBLK, other DIRBLKs will be used and the connection between these blocks
|
||||
forms a structure known as a B-tree. Since there can be many elements
|
||||
(entries) in a node (DIRBLK), a HPFS B-tree has a quick "fan-out" and a
|
||||
low height (number of levels), ensuring fast entry location.
|
||||
|
||||
<P>This time, we'll take a long look at how a file's contents are
|
||||
logically stored under HPFS. To the best of my knowledge, this topic has
|
||||
not been well-covered in the scanty information available about HPFS. You
|
||||
will find it helpful to contrast the following file-sector allocation
|
||||
methods with last month's directory entry concepts.
|
||||
|
||||
<H2>Fragging a File</H2>
|
||||
|
||||
Since HPFS is inherently fragmentation-resistant, we have to twist its arm
|
||||
a little to produce fragmented files. The method I came up with first
|
||||
fills up an empty partition with a number of files created in an ascending
|
||||
name sequence. The next step deletes every second file. Finally, I create
|
||||
a file that is approximately one-half the partition's size. This file then
|
||||
has nowhere to go except into all the discontiguous regions previously
|
||||
occupied by the deleted file entries.
|
||||
|
||||
<P>This process takes some time with a large partition (100 MB) so I
|
||||
suggest you use a very small partition (1 MB). At first glance, you may
|
||||
think that if we fill up a 1 MB partition with say 100 files, then delete
|
||||
File1, File3, ... File99, and then create a 512K file, we will end up with
|
||||
a file with exactly 50 extents (fragments). This is not so, since each
|
||||
individual file occupies a FNODE sector as well as the sectors for the
|
||||
file itself, whereas a single fragmented file still has only 1 FNODE. So
|
||||
there is slightly more space available in each gap for an extent than
|
||||
there was for a file, and a 512K file will find more than 512K of space
available. It therefore ends up occupying fewer gaps than expected, and we end up
with a smaller number of extents than was specified. For example, in the
|
||||
50-gap, 1 MB partition scenario we end up with 45 extents. There are also
|
||||
variations produced by things like the centrally located DIRBAND, the
|
||||
separate Root DIRBLK and multiple Databands to "fragment" the available
|
||||
freespace for very large files. So the number of gaps produced by deleting
|
||||
alternate files is only a rough approximation of the number of extents
|
||||
that will be produced.
|
||||
|
||||
<P>Figure 1 shows the MakeExtents.cmd REXX program. You specify the number
|
||||
of gaps you want to produce. For example, to originally produce 100 files
|
||||
on N:, delete half of them and leave 50 gaps, you would issue the command
|
||||
"MakeExtents N: 50".
|
||||
|
||||
<PRE>
|
||||
/* Produces a large, fragmented file */
|
||||
PARSE ARG numOfExts
|
||||
CALL RxFuncAdd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'
|
||||
CALL SysLoadFuncs /* Load REXXUTIL.DLL external funcs */
|
||||
CALL SysCls
|
||||
EXIT /* Safety line. Delete this when you've adjusted the
|
||||
drive to suit your system. Formats the drive. */
|
||||
'echo y | format n: /l /fs:hpfs'
|
||||
SAY
|
||||
CALL SysMkDir 'n:\test' /* REXX MD. Faster than OS/2 MD */
|
||||
currentDir = Directory() /* Store current drive/directory */
|
||||
CALL Directory 'n:\test' /* Change to test drive/directory*/
|
||||
/* Determine free space */
|
||||
PARSE VALUE SysDriveInfo('n:') WITH . free .
|
||||
|
||||
/* Determine size of each sequential file */
|
||||
fileSize = (free - (numOfExts*2*512)) % (numOfExts*2)
|
||||
secsInFile = fileSize % 512
|
||||
sectorFill = Copies('x',512) /* 512 bytes of 'x' char */
|
||||
Fill_20K = Copies(sectorFill,40) /* 20,480 bytes of 'x' */
|
||||
|
||||
/* Create string of the required length */
|
||||
CALL MakeFile secsInFile
|
||||
|
||||
DO i = 1 TO numOfExts*2 /* Produce the file sequence */
|
||||
CALL CreateFile /* Fixed-length filenames: File00001 */
|
||||
END i
|
||||
|
||||
DO i = 1 TO numOfExts*2 BY 2 /* Delete alternate files */
|
||||
CALL SysFileDelete 'n:\test\file'||Right("0000"||i,5)
|
||||
END i
|
||||
|
||||
PARSE VALUE SysDriveInfo('n:') WITH . free .
|
||||
|
||||
fragmentedFileSecs = ((free-512) % 512)-1
|
||||
CALL MakeFile fragmentedFileSecs
|
||||
|
||||
i='FRAGG' /* Fragmented filename: FileFRAGG */
|
||||
CALL CreateFile /* Create "FileFRAGG" */
|
||||
CALL Directory currentDir /* Return to original location */
|
||||
|
||||
EXIT /********************************************/
|
||||
|
||||
|
||||
MakeFile: PROCEDURE EXPOSE file sectorFill fill_20K
|
||||
ARG secs
|
||||
file = ''
|
||||
/* If final file is over 20K, speed up creation a little */
|
||||
IF secs>40 THEN
|
||||
file = Copies(fill_20K, secs%40)
|
||||
|
||||
file = file||Copies(sectorFill, secs//40)
|
||||
RETURN file
|
||||
|
||||
|
||||
CreateFile:
|
||||
CALL Charout 'n:\test\file'||Right("0000"||i,5),file,1
|
||||
CALL Stream 'n:\test\file'||Right("0000"||i,5),'C','CLOSE'
|
||||
RETURN
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 1: The MakeExtents.cmd program produces a fragmented file. When set up
|
||||
correctly, this program will wipe a partition.
|
||||
</FONT>
|
||||
|
||||
<H2>FNODEs, ALSECs, ALLEAFs and ALNODEs</H2>
|
||||
|
||||
Every file and directory on a HPFS partition has an associated FNODE,
|
||||
usually situated in the sector just before the file's first sector. The
|
||||
role of an FNODE is quite specific: to map the location of the file's
|
||||
extents (fragments) and any associated components, namely EAs (Extended
|
||||
Attributes - up to 64K of ancillary information) and ACLs (Access Control
|
||||
Lists - to do with LAN Manager).
|
||||
|
||||
<P>FNODEs and ALSECs (to be discussed shortly) contain a list of either
|
||||
ALLEAF or ALNODE entries. See Figure 2. An ALLEAF entry contains three
|
||||
dwords: logical sector offset (where the start of this run of sectors is
|
||||
within the total number of sectors in the file - the logical start sector
|
||||
is 0); run size in sectors; physical LSN (where the run starts in the
|
||||
partition). An ALLEAF entry is at the end of the B+tree. An ALNODE entry
|
||||
is an intermediate component in that it does not contain any extent
|
||||
information. Rather, it points to an ALSEC, and in turn the ALSEC can
|
||||
contain a list of either ALLEAFs (the end of the line) or ALNODEs (another
|
||||
descendant level in the B+tree).
|
||||
|
||||
<PRE>
|
||||
Offset Data Size Comment
|
||||
hex (dec) bytes
|
||||
|
||||
Header
|
||||
00h (1) Signature 4 0xF7E40AAE
|
||||
04h (5) Seq. Read History 4 Not implemented.
|
||||
08h (9) Fast Read History 4 Not Implemented.
|
||||
0Ch (13) Name Length 1 0-254.
|
||||
0Dh (14) Name 15 Last 15 chars. (Full name in DIRBLK.)
|
||||
1Ch (29) Container Dir LSN 4 FNODE of Dir that contains this one.
|
||||
20h (33) ACL Ext. Run Size 4 Secs in external ACL, if present.
|
||||
24h (37) ACL LSN 4 Location of external ACL run.
|
||||
28h (41) ACL Int. Size 2 Bytes in internal (inside FNODE) ACL.
|
||||
2Ah (43) ACL ALSEC Flag 1 >0 if ACL LSN points to an ALSEC.
|
||||
2Bh (44) History Bits Count 1 Not implemented.
|
||||
2Ch (45) EA Ext. Run Size 4
|
||||
30h (49) EA LSN 4
|
||||
34h (53) EA Int. Size 2
|
||||
36h (55) EA ALSEC Flag 1 >0 if EA LSN points to an ALSEC.
|
||||
37h (56) Dir Flag 1 Bit0 = 1 if dir FNODE, else file FNODE.
|
||||
38h (57) B+Tree Info Flag 1 0x20 (5) Parent is an FNODE, else ALSEC.
|
||||
0x80 (7) ALNODEs follow, else ALLEAFs.
|
||||
39h (58) Padding 3 Reestablish 32-bit alignment.
|
||||
3Ch (61) Free Entries 1 Number of free array entries.
|
||||
3Dh (62) Used Entries 1 Number of used array entries.
|
||||
3Eh (63) Free Ent. Offset 2 Offset to next free entry in array.
|
||||
|
||||
If ALLEAFs (Maximum of 8 in an FNODE)
|
||||
Extent #0
|
||||
40h (65) Logical LSN 4 Sec offset of this extent within file.
|
||||
The first extent has an offset of 0.
|
||||
44h (69) Run Size 4 Number of sectors in this extent.
|
||||
48h (73) Physical LSN 4 File: LSN of extent start.
|
||||
Dir: This B-tree's topmost DIRBLK LSN.
|
||||
...
|
||||
|
||||
Extent #7
|
||||
94h (149) Logical LSN 4
|
||||
98h (153) Run Size 4
|
||||
9Ch (157) Physical LSN 4
|
||||
|
||||
If ALNODEs (Maximum of 12 in an FNODE)
|
||||
Extent #0
|
||||
40h (65) End Sector Count 4 Running total of secs mapped by this
|
||||
alnode. 1-based. If EOF is within this
|
||||
alnode then field contains 0xFFFFFFFF.
|
||||
44h (69) Physical LSN 4 File: LSN of ALSEC.
|
||||
Dir: This B-tree's topmost DIRBLK LSN.
|
||||
...
|
||||
|
||||
Extent #11
|
||||
98h (153) End Sector Count 4
|
||||
9Ch (157) Physical LSN 4
|
||||
|
||||
Tail
|
||||
A0h (161) Valid File Length 4 Should be the same as File Size in DIRENT.
|
||||
A4h (165) "Needed" EAs Count 4 If any, EAs vital to the file's wellbeing.
|
||||
A8h (169) User ID 16 Not used.
|
||||
B8h (185) ACL/EA Offset 2 Offset in FNODE to first ACL, if present,
|
||||
otherwise offset to where EAs would be
|
||||
stored, if internalised.
|
||||
BAh (187) Spare 10 Unused.
|
||||
C4h (197) ACL/EA Storage 316 Only 145 bytes appear available for EAs.
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 2: Layout of an FNODE. This component can contain either an array
|
||||
of ALNODE or ALLEAF entries.
|
||||
</FONT>
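<P>
As a taste of what this month's ShowExtents program does internally, the
fragment below (a minimal sketch, not ShowExtents itself) decodes just the
first ALLEAF entry of a file FNODE using the offsets in Figure 2. The drive
letter and LSN are only examples (LSN 316 happens to be FileFRAGG's FNODE on
the test volume used later in Figures 4 and 6), and a real program should
first check the B+tree Info Flag at offset 38h to make sure the array really
holds ALLEAFs.

<PRE>
/* Minimal sketch: decode ALLEAF entry #0 from an FNODE sector.         */
CALL RxFuncAdd 'ReadSect','Sector','ReadSect'   /* SECTOR.DLL as usual   */
fnode = ReadSect('N:', 316)                     /* example FNODE LSN     */
logical  = C2D(Reverse(Substr(fnode, 65, 4)))   /* sec offset in file    */
runSize  = C2D(Reverse(Substr(fnode, 69, 4)))   /* sectors in extent     */
physical = C2D(Reverse(Substr(fnode, 73, 4)))   /* LSN of extent start   */
SAY 'Extent #0:' runSize 'sectors at LSN' physical '(file sec offset:'logical')'
</PRE>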
|
||||
|
||||
<P>Returning to the B-tree structure of DIRBLKs, you will remember that
|
||||
both intermediate and leaf components contain DIRENT data. So you may find
|
||||
the entry you're looking for in a node. This is not the case with a
|
||||
B+tree. Since an ALNODE can only point to an ALSEC, you must always
|
||||
proceed to the bottom of the tree, to a leaf, to retrieve extent
|
||||
information.
|
||||
|
||||
<P>An ALNODE entry only contains two dwords: a running total indicating
|
||||
the logical sector offset of the last sector in the ALSEC (i.e. how far we
|
||||
are through the file - this starts from 1); the physical LSN of where to
|
||||
find the ALSEC. The advantage of the smaller entry size of an ALNODE
|
||||
compared to an ALLEAF is that, in the same space, there can be more of
|
||||
them.
|
||||
|
||||
<P>An FNODE contains other data. One important piece of information is the
|
||||
last 15 characters of the filename. This comes in handy when we need to
|
||||
undelete. The last 316 bytes of the sector are also set aside for internal
|
||||
ACL/EAs (stored completely within the FNODE). In the Graham Utilities
|
||||
manual it is stated that up to 316 bytes of EAs can be stored within the
|
||||
FNODE but my experiments with OS/2 Warp v3 show that only up to 145 bytes
|
||||
of EAs can be internalised. Refer to Part 6 for further information.
|
||||
|
||||
<P>Figure 3 shows the structure of an ALSEC. You will notice that there is
|
||||
much more space in the sector devoted to ALNODE/ALLEAF entries than is
|
||||
available in an FNODE sector (480 bytes compared to 96 bytes). This leads
|
||||
to the following maximum number of entries:
|
||||
|
||||
<PRE>
|
||||
ALLEAF ALNODE
|
||||
FNODE 8 12
|
||||
ALSEC 40 60
|
||||
</PRE>
|
||||
|
||||
<PRE>
|
||||
Offset Data Size Comment
|
||||
hex (dec) bytes
|
||||
|
||||
Header
|
||||
00h (1) Signature 4 0x37E40AAE
|
||||
04h (5) This block's LSN 4 Helps when placing other blks nearby.
|
||||
08h (9) Parent's LSN 4 Points to either FNODE or another ALSEC.
|
||||
0Ch (13) Btree Flag 1 0x20 (5) Parent is an FNODE, else ALSEC.
|
||||
0x80 (7) ALNODEs follow, else ALLEAFs.
|
||||
0Dh (14) Padding 3 Reestablish dword alignment.
|
||||
10h (17) Free Entries 1 Number of free array entries.
|
||||
11h (18) Used Entries 1 Number of used array entries.
|
||||
12h (19) Free Ent. Offset 2 Offset to first free entry.
|
||||
|
||||
|
||||
If ALLEAFs (Maximum of 40 in an ALSEC)
|
||||
Extent #0
|
||||
14h (21) Logical LSN 4 Sec offset of this extent within file.
|
||||
Zero-based.
|
||||
18h (25) Run Size 4 Secs in this extent.
|
||||
1Ch (29) Physical LSN 4 File: LSN of extent start.
|
||||
Dir: This B-tree's topmost DIRBLK LSN.
|
||||
...
|
||||
|
||||
Extent #39
|
||||
1E8h (489) Logical LSN 4
|
||||
1ECh (493) Run Size 4
|
||||
1F0h (497) Physical LSN 4
|
||||
|
||||
|
||||
If ALNODEs (Maximum of 60 in an ALSEC)
|
||||
Extent #0
|
||||
14h (21) End Sector Count 4 Running total of secs mapped by this
|
||||
alnode. 1-based. If EOF is within this
|
||||
alnode then field contains 0xFFFFFFFF.
|
||||
18h (25) Physical LSN 4 File: LSN of ALSEC.
|
||||
Dir: This B-tree's topmost DIRBLK LSN.
|
||||
...
|
||||
|
||||
Extent #59
|
||||
1ECh (493) End Sector Count 4
|
||||
1F0h (497) Physical LSN 4
|
||||
|
||||
|
||||
Tail
|
||||
1F4h (501) Padding 12 Unused.
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 3: The layout of an ALSEC. This component can contain either an
|
||||
array of ALNODE or ALLEAF entries.
|
||||
</FONT>
|
||||
|
||||
<H2>Some Examples</H2>
|
||||
|
||||
The main program this month, ShowExtents.cmd (to be discussed later),
|
||||
needs to know the LSN of the FNODE or ALSEC that you want to start with.
|
||||
It would be possible to design a version that accepted the full pathname
|
||||
of a file but it would be a larger program. For the purpose of
|
||||
comprehending these structures, the requirement of having to specify a LSN
|
||||
is acceptable. To determine the file's FNODE location use last month's
|
||||
ShowBtree.cmd. Figure 4 shows ShowBtree's output on a 1 MB partition after
|
||||
"MakeExtents 7" was issued. From the information reported in Figure 4 we
|
||||
will first examine the TEST directory's FNODE. Figure 5 shows the result
|
||||
of issuing "ShowExtents N: 1033". Since there is no information in the
|
||||
allocation array area of a directory FNODE (the 96-byte region commencing
|
||||
at decimal offset 65), ShowExtents is designed to terminate early in such
|
||||
a situation.
|
||||
|
||||
<PRE>
|
||||
Root Directory:
|
||||
1016-1019 Next Byte Free: 125 Topmost DirBlk
|
||||
This directory's FNODE: 1032 (\ [level 1]) 1016->1032
|
||||
**************************************************
|
||||
SD 21 #00: .. FNODE:1032
|
||||
D 57 #01: test FNODE:1033
|
||||
E 93 #02:
|
||||
|
||||
36-39 Next Byte Free: 409 Topmost DirBlk
|
||||
This directory's FNODE: 1033 (test [level 1]) 36->1033
|
||||
**************************************************
|
||||
SD 21 #00: .. FNODE:1033
|
||||
57 #01: file00002 FNODE:432
|
||||
97 #02: file00004 FNODE:664
|
||||
137 #03: file00006 FNODE:896
|
||||
177 #04: file00008 FNODE:1154
|
||||
217 #05: file00010 FNODE:1386
|
||||
257 #06: file00012 FNODE:1618
|
||||
297 #07: file00014 FNODE:1850
|
||||
337 #08: fileFRAGG FNODE:316
|
||||
E 377 #09:
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 4: Last month's program, ShowBtree.cmd, shows the LSN of
|
||||
FileFRAGG's FNODE.
|
||||
</FONT>
|
||||
|
||||
<PRE>
|
||||
FNODE STRUCTURE
|
||||
LSN: 1033
|
||||
Signature: F7E40AAE
|
||||
Name Length: 4
|
||||
Name: test
|
||||
Container Dir LSN: 1032
|
||||
EA Ext. Run Size: 0
|
||||
EA LSN: 0
|
||||
EA Int. Size: 0
|
||||
EA ALSEC Flag: 0
|
||||
Dir Flag: Directory FNODE
|
||||
Topmost DIRBLK LSN: 36
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 5: ShowExtents' output when displaying the contents of a directory
|
||||
FNODE.
|
||||
</FONT>
|
||||
|
||||
<P>Next, we'll look at an FNODE with a full complement of 8 ALLEAF
|
||||
entries. On my system, this is produced when "MakeExtents 7" is issued.
|
||||
See Figure 6. The next free entry in the array of ALLEAF entries is at
|
||||
offset 104 dec. Since the start point for this offset is counted from 65
|
||||
dec, this means that the next entry would start at 169 dec. This is
|
||||
actually past the end of the available entry area, at the beginning of the
|
||||
tail region. This is another indication that the array is full. (The main
|
||||
indication is the "0" value in the Free Entries field.)
|
||||
|
||||
<PRE>
|
||||
FNODE STRUCTURE
|
||||
LSN: 316
|
||||
Signature: F7E40AAE
|
||||
Name Length: 9
|
||||
Name: fileFRAGG
|
||||
Container Dir LSN: 1033
|
||||
EA Ext. Run Size: 0
|
||||
EA LSN: 0
|
||||
EA Int. Size: 0
|
||||
EA ALSEC Flag: 0
|
||||
Dir Flag: File FNODE
|
||||
B+tree Info Flag: ALLEAFs follow
|
||||
Free Entries: 0
|
||||
Used Entries: 8
|
||||
Next Free Offset: 104
|
||||
Valid data size: 420352
|
||||
"Needed" EAs: 0
|
||||
EA/ACL Int. Off: 0
|
||||
|
||||
ALLEAF INFORMATION
|
||||
Extent #0: 115 sectors starting at LSN 317 (file sec offset:0)
|
||||
Extent #1: 116 sectors starting at LSN 548 (file sec off:115)
|
||||
Extent #2: 116 sectors starting at LSN 780 (file sec off:231)
|
||||
Extent #3: 116 sectors starting at LSN 1038 (file sec off:347)
|
||||
Extent #4: 116 sectors starting at LSN 1270 (file sec off:463)
|
||||
Extent #5: 116 sectors starting at LSN 1502 (file sec off:579)
|
||||
Extent #6: 116 sectors starting at LSN 1734 (file sec off:695)
|
||||
Extent #7: 10 sectors starting at LSN 1966 (file sec off:811)
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 6: A FNODE with a full ALLEAF array.
|
||||
</FONT>
|
||||
|
||||
<P>If we need to map any more extents we must switch from a FNODE (with
|
||||
ALLEAFs) structure to FNODE (with ALNODEs) -> ALSEC (with ALLEAFs). Figure
|
||||
7 shows the mapping of a 10-extent file ("MakeExtents 8"). The B+tree Info
|
||||
Flag tells us that the FNODE contains an array of ALNODEs. There is only
|
||||
one entry in this array. The End Sector Count value is not shown here but,
|
||||
in this example, you could easily check it out using Part 2's SEC.cmd
|
||||
("SEC N: 316") and then look at the four bytes at offset 40h (in the case
|
||||
of a single entry in the array). Since this is the sole entry, you will
|
||||
find FFFFFFFFh (appears to be the array End-of-Entries indicator) at this
|
||||
location.
|
||||
|
||||
<PRE>
|
||||
FNODE STRUCTURE
|
||||
LSN: 316
|
||||
Signature: F7E40AAE
|
||||
Name Length: 9
|
||||
Name: fileFRAGG
|
||||
Container Dir LSN: 1033
|
||||
EA Ext. Run Size: 0
|
||||
EA LSN: 0
|
||||
EA Int. Size: 0
|
||||
EA ALSEC Flag: 0
|
||||
Dir Flag: File FNODE
|
||||
B+tree Info Flag: ALNODEs follow
|
||||
Free Entries: 11
|
||||
Used Entries: 1
|
||||
Next Free Offset: 16
|
||||
Valid data size: 418304
|
||||
"Needed" EAs: 0
|
||||
EA/ACL Int. Off: 0
|
||||
|
||||
FNODE Entry #0
|
||||
ALSEC STRUCTURE
|
||||
Signature: 37E40AAE
|
||||
This LSN: 933
|
||||
Parent's LSN: 316
|
||||
B+tree Info Flag: Parent was an FNODE; ALLEAFs follow
|
||||
Free Entries: 30
|
||||
Used Entries: 10
|
||||
Next Free Offset: 128
|
||||
|
||||
ALLEAF INFORMATION
|
||||
Extent #0: 101 sectors starting at LSN 317 (file sec off:0)
|
||||
Extent #1: 102 sectors starting at LSN 520 (file sec off:101)
|
||||
Extent #2: 102 sectors starting at LSN 724 (file sec off:203)
|
||||
Extent #3: 102 sectors starting at LSN 1158 (file sec off:305)
|
||||
Extent #4: 102 sectors starting at LSN 1362 (file sec off:407)
|
||||
Extent #5: 102 sectors starting at LSN 1566 (file sec off:509)
|
||||
Extent #6: 102 sectors starting at LSN 1770 (file sec off:611)
|
||||
Extent #7: 42 sectors starting at LSN 1974 (file sec off:713)
|
||||
Extent #8: 5 sectors starting at LSN 928 (file sec off:755)
|
||||
Extent #9: 57 sectors starting at LSN 934 (file sec off:760)
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 7: A 10-extent file is mapped in a 1-level B+tree with a single
|
||||
ALSEC.
|
||||
</FONT>
|
||||
|
||||
<P>The next section in the display in Figure 7, labelled "FNODE Entry #0"
|
||||
shows us that the sole ALNODE entry points to LSN 933. Here we are seeing
|
||||
this ALSEC's layout. The B+tree Info Flag informs us that this ALSEC
|
||||
contains ALLEAF entries i.e. the actual mapping of the extents. Notice
|
||||
that we have 10 ALLEAF entries in the allocation array. Remember that an
|
||||
ALSEC has much more space available for array entries than an FNODE has,
|
||||
in that it can store up to 40 ALLEAF entries. You can verify this by
|
||||
adding the ALSEC's Free Entries and the Used Entries values together.
|
||||
|
||||
<P>When you try and map more than 40 extents you will exceed the capacity
|
||||
of a sole ALSEC. What happens in this case is that more ALNODE entries are
|
||||
created in the FNODE, each pointing to an ALSEC. Figure 8 shows a
|
||||
42-extent layout (produced with a parameter of "45").
|
||||
|
||||
<PRE>
|
||||
FNODE STRUCTURE
|
||||
LSN: 316
|
||||
Signature: F7E40AAE
|
||||
Name Length: 9
|
||||
Name: fileFRAGG
|
||||
Container Dir LSN: 1033
|
||||
EA Ext. Run Size: 0
|
||||
EA LSN: 0
|
||||
EA Int. Size: 0
|
||||
EA ALSEC Flag: 0
|
||||
Dir Flag: File FNODE
|
||||
B+tree Info Flag: ALNODEs follow
|
||||
Free Entries: 10
|
||||
Used Entries: 2
|
||||
Next Free Offset: 24
|
||||
Valid data size: 393192
|
||||
"Needed" EAs: 0
|
||||
EA/ACL Int. Off: 0
|
||||
|
||||
FNODE Entry #0
|
||||
ALSEC STRUCTURE
|
||||
Signature: 37E40AAE
|
||||
This LSN: 588
|
||||
Parent's LSN: 316
|
||||
B+tree Info Flag: Parent was an FNODE; ALLEAFs follow
|
||||
Free Entries: 0
|
||||
Used Entries: 40
|
||||
Next Free Offset: 232
|
||||
|
||||
ALLEAF INFORMATION
|
||||
Extent #0: 16 sectors starting at LSN 317 (file sec off:0)
|
||||
...
|
||||
Extent #39: 17 sectors starting at LSN 1668 (file sec off:720)
|
||||
|
||||
FNODE Entry #1
|
||||
ALSEC STRUCTURE
|
||||
Signature: 37E40AAE
|
||||
This LSN: 996
|
||||
Parent's LSN: 316
|
||||
B+tree Info Flag: Parent was an FNODE; ALLEAFs follow
|
||||
Free Entries: 38
|
||||
Used Entries: 2
|
||||
Next Free Offset: 32
|
||||
|
||||
ALLEAF INFORMATION
|
||||
Extent #40: 17 sectors starting at LSN 1702 (file sec off:737)
|
||||
Extent #41: 14 sectors starting at LSN 1736 (file sec off:754)
|
||||
</PRE>
|
||||
|
||||
<FONT SIZE=2>
|
||||
Figure 8: 42 extents require a 1-level B+tree with 2 ALNODE entries in the
|
||||
FNODE pointing to 2 ALSECs.
|
||||
</FONT>
|
||||
|
||||
<P>There is space in an FNODE for 12 ALNODE entries. If each of these
|
||||
points to a full ALSEC (with ALLEAFs) i.e. 40-entries each, this two-level
|
||||
structure can accommodate 480 extents (parameter "564").
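<P>
Putting the entry limits from the earlier table together gives the capacities
below. The last line is the theoretical ceiling if every level were completely
full, not a figure from my experiments:

<PRE>
/* Extent capacities implied by the FNODE/ALSEC entry limits             */
SAY 'ALLEAFs directly in the FNODE:        ' 8
SAY 'FNODE ALNODEs -> ALSECs of ALLEAFs:   ' 12 * 40        /* 480        */
SAY 'With one more ALNODE level in between:' 12 * 60 * 40   /* 28800      */
</PRE>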
|
||||
|
||||
<P>Let's see what happens when we exceed this value. Figure 9 shows a
482-extent layout ("565"). Interesting things have occurred. We now have a
2-level B+tree structure. The FNODE's ALNODE array has been adjusted to
contain a sole entry. This in turn points to an ALSEC that holds 13 ALNODE
entries. Each of these ALNODEs points to another ALSEC, which contains
ALLEAF entries. Twelve of the ALSECs (with ALLEAFs) are full (i.e. 12*40 =
480 extents) while the 13th ALSEC (with ALLEAFs) maps only 2 extents.

<PRE>
FNODE STRUCTURE
LSN: 1000
Signature: F7E40AAE
Name Length: 9
Name: fileFRAGG
Container Dir LSN: 1033
EA Ext. Run Size: 0
EA LSN: 0
EA Int. Size: 0
EA ALSEC Flag: 0
Dir Flag: File FNODE
B+tree Info Flag: ALNODEs follow
Free Entries: 11
Used Entries: 1
Next Free Offset: 16
Valid data size: 524264
"Needed" EAs: 0
EA/ACL Int. Off: 0

FNODE Entry #0
ALSEC STRUCTURE
Signature: 37E40AAE
This LSN: 1333
Parent's LSN: 1000
B+tree Info Flag: Parent was an FNODE; ALNODEs follow
Free Entries: 47
Used Entries: 13
Next Free Offset: 112

ALNODE INFORMATION
ALSEC Entry #0 situated at LSN 328 (file sec count:582)
ALSEC STRUCTURE
Signature: 37E40AAE
This LSN: 328
Parent's LSN: 1333
B+tree Info Flag: ALLEAFs follow
Free Entries: 0
Used Entries: 40
Next Free Offset: 232 ALLEAF INFORMATION Extent #0-#39

ALNODE INFORMATION
ALSEC Entry #1 situated at LSN 394 (file sec count:622)
ALSEC STRUCTURE 394 (40) ALLEAF INFORMATION Extent #40-#79

ALNODE INFORMATION
ALSEC Entry #2 situated at LSN 476 (file sec count:662)
ALSEC STRUCTURE 476 (40) ALLEAF INFORMATION Extent #80-#119

ALNODE INFORMATION
ALSEC Entry #3 situated at LSN 558 (file sec count:702)
ALSEC STRUCTURE 558 (40) ALLEAF INFORMATION Extent #120-#159

ALNODE INFORMATION
ALSEC Entry #4 situated at LSN 640 (file sec count:742)
ALSEC STRUCTURE 640 (40) ALLEAF INFORMATION Extent #160-#199

ALNODE INFORMATION
ALSEC Entry #5 situated at LSN 722 (file sec count:782)
ALSEC STRUCTURE 722 (40) ALLEAF INFORMATION Extent #200-#239

ALNODE INFORMATION
ALSEC Entry #6 situated at LSN 804 (file sec count:822)
ALSEC STRUCTURE 804 (40) ALLEAF INFORMATION Extent #240-#279

ALNODE INFORMATION
ALSEC Entry #7 situated at LSN 886 (file sec count:862)
ALSEC STRUCTURE 886 (40) ALLEAF INFORMATION Extent #280-#319

ALNODE INFORMATION
ALSEC Entry #8 situated at LSN 968 (file sec count:902)
ALSEC STRUCTURE 968 (40) ALLEAF INFORMATION Extent #320-#359

ALNODE INFORMATION
ALSEC Entry #9 situated at LSN 1085 (file sec count:942)
ALSEC STRUCTURE 1085 (40) ALLEAF INFORMATION Extent #360-#399

ALNODE INFORMATION
ALSEC Entry #10 situated at LSN 1167 (file sec count:982)
ALSEC STRUCTURE 1167 (40) ALLEAF INFORMATION Extent #400-#439

ALNODE INFORMATION
ALSEC Entry #11 situated at LSN 1249 (file sec count:1022)
ALSEC STRUCTURE 1249 (40) ALLEAF INFORMATION Extent #440-#479

ALNODE INFORMATION
ALSEC Entry #12 situated at LSN 1331 (file sec count:At end)
ALSEC STRUCTURE 1331 (2) ALLEAF INFORMATION Extent #480-#481
</PRE>

<FONT SIZE=2>
Figure 9: 482 extents are mapped by a 2-level B+tree with 1 ALNODE entry
in the FNODE pointing to 1 ALSEC, which in turn points to 13 ALSECs.
</FONT>

<P>If you look at FNODE Entry #0's Used & Free Entries values you can
verify that, in an ALSEC, there can be a maximum of 60 ALNODEs. It would
take 60*40 = 2,400 extents to fill this level up again. Going past this
would require the presence of a second FNODE entry. Since we can have up
to 12 ALNODE entries in an FNODE, this means we could map 12*60*40 =
28,800 extents before the need to insert another intermediary ALSEC level
would arise.
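
<P>To pull these capacity figures together, here is a small REXX sketch. It is
not part of the original article's listings; it simply multiplies out the entry
counts discussed so far (8 ALLEAFs directly in an FNODE's allocation array, 12
ALNODEs in an FNODE, and 40 ALLEAFs or 60 ALNODEs in an ALSEC).

<PRE>
/* ExtentCapacity.cmd - multiplies out the B+tree fan-out figures above */
fnodeALLEAFs = 8     /* ALLEAF entries that fit directly in an FNODE */
fnodeALNODEs = 12    /* ALNODE entries that fit in an FNODE          */
alsecALLEAFs = 40    /* ALLEAF entries that fit in an ALSEC          */
alsecALNODEs = 60    /* ALNODE entries that fit in an ALSEC          */
SAY 'FNODE only:         ' fnodeALLEAFs 'extents'
SAY '1 level of ALSECs:  ' fnodeALNODEs*alsecALLEAFs 'extents'
SAY '2 levels of ALSECs: ' fnodeALNODEs*alsecALNODEs*alsecALLEAFs 'extents'
SAY '3 levels of ALSECs: ' fnodeALNODEs*alsecALNODEs*alsecALNODEs*alsecALLEAFs 'extents'
</PRE>

<P>The four values it prints (8, 480, 28,800 and 1,728,000) also show why the
44,413-extent file mentioned below needed a third level of ALSECs.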

<P>On a 100 MB partition I produced a 3-level, 44,413-extent structure
("44500"). To put this discussion of B+tree fan-out in perspective, it
should be remembered that, in the fragmentation analysis performed in Part
3 on 20,800 files in 5 partitions, there were only 14 files with more than
8 extents (i.e. requiring an ALSEC), and the largest number of extents
reported was 30.

<H2>The ShowExtents Program</H2>

Figure 10 presents the ShowExtents.cmd REXX program. You will need to get
SECTOR.DLL. The program first determines whether the LSN you've specified
belongs to an FNODE or an ALSEC. (You can bypass the FNODE and commence the
examination from an ALSEC.) Once it has determined this, the next most
important consideration is: does the allocation array consist of ALLEAFs
or ALNODEs? If it contains ALLEAFs we've reached the end of the tree and
need only show the extents. If we are looking at an array of ALNODEs we
need to recurse down each ALNODE entry, loading the ALSEC pointed to by
the entry and then seeing whether it contains ALLEAFs or ALNODEs. And
so on...

<PRE>
/*Shows the layout of FNODE and ALSECs. Requires SECTOR.DLL*/
PARSE UPPER ARG drive lsn
/* There must be at least two parms supplied */
IF drive = '' | lsn = '' THEN CALL HELP
/* Register external functions */
CALL RxFuncAdd 'QDrive','sector','QDrive'
CALL RxFuncAdd 'ReadSect','sector','ReadSect'
alleafEntryCount = 0
anodeEntryCount = 0
SAY
CALL MainRoutine
EXIT /*****************EXECUTION ENDS HERE*****************/


MainRoutine:
PROCEDURE EXPOSE drive lsn alleafEntryCount anodeEntryCount
usedEntries = 0
sectorString = ReadSect(drive,lsn) /* Read in required sec */
IF FourBytes2Hex(1) = 'F7E40AAE' THEN
/* Is an FNODE */
DO
alSecIndicator = ''
CALL DisplayFnode
END
ELSE
/* Not an FNODE */
DO
IF FourBytes2Hex(1) = '37E40AAE' THEN
/* Is an ALSEC */
DO
alSecIndicator = 'Y'
CALL DisplayALSEC
END
ELSE
/* Neither an FNODE nor an ALSEC */
DO
SAY 'LSN' lsn 'is not an FNODE or ALSEC'
EXIT
END
END
RETURN


DisplayFnode:
SAY 'FNODE STRUCTURE'
SAY 'LSN: ' lsn
SAY 'Signature: ' FourBytes2Hex(1)
SAY 'Name Length: ' Bytes2Dec(13,1)
SAY 'Name: ' Substr(sectorString,14,Bytes2Dec(13,1))
SAY 'Container Dir LSN:' Bytes2Dec(29,4)
SAY 'EA Ext. Run Size: ' Bytes2Dec(45,4)
SAY 'EA LSN: ' Bytes2Dec(49,4)
SAY 'EA Int. Size: ' Bytes2Dec(53,2)
SAY 'EA ALSEC Flag: ' Bytes2Dec(55,1)
IF Bitand(Byte2Char(56),'1'x) = '1'x THEN
dirFlag = 'Directory FNODE'
ELSE
dirFlag = 'File FNODE'

SAY 'Dir Flag: ' dirFlag
IF dirFlag = 'Directory FNODE' THEN
SAY 'Topmost DIRBLK LSN:'||Bytes2Dec(73,4)
ELSE
DO
/* Is a file, so determine extents */
CALL DetermineBtreeInfo 57
SAY 'B+tree Info Flag: ' btreeInfo
SAY 'Free Entries: ' Bytes2Dec(61,1)
usedEntries = Bytes2Dec(62,1)
SAY 'Used Entries: ' usedEntries
SAY 'Next Free Offset: ' Bytes2Dec(63,2)
SAY 'Valid data size: ' Bytes2Dec(161,4)
SAY '"Needed" EAs: ' Bytes2Dec(165,4)
SAY 'EA/ACL Int. Off: ' Bytes2Dec(169,4)
CALL ShowALLEAF_or_ANODE
END
RETURN

FourBytes2Hex: /* Given offset, return Dword */
ARG startPos
rearranged = Reverse(Substr(sectorString,startPos,4))
RETURN C2X(rearranged)


Bytes2Dec:
ARG startPos,numOfChars
temp = Substr(sectorString,startPos,numOfChars)
IF C2X(temp) = 'FFFFFFFF' THEN
RETURN 'At the end'
ELSE
RETURN Format(C2D(Reverse(temp)),,0)


Byte2Char:
ARG startPos
RETURN Substr(sectorString,startPos,1)


DetermineBtreeInfo:
ARG btreeByteOffset
IF Bitand(Byte2Char(btreeByteOffset),'20'x) = '20'x THEN
btreeInfo = 'Parent was an FNODE; '
ELSE
btreeInfo = ''

IF Bitand(Byte2Char(btreeByteOffset),'80'x) = '80'x THEN
DO
btreeInfo = btreeInfo||'ALNODEs follow'
alNodeIndicator = 'Y'
END
ELSE
DO
btreeInfo = btreeInfo||'ALLEAFs follow'
alNodeIndicator = 'N'
END
RETURN


DisplayALSEC:
SAY 'ALSEC STRUCTURE'
alSecIndicator = 'Y'
SAY 'Signature: ' FourBytes2Hex(1)
lsn = Bytes2Dec(5,4)
SAY 'This LSN: ' lsn
SAY "Parent's LSN: " Bytes2Dec(9,4)
CALL DetermineBtreeInfo 13
SAY 'B+tree Info Flag: ' btreeInfo
SAY 'Free Entries: ' Bytes2Dec(17,1)
usedEntries = Bytes2Dec(18,1)
SAY 'Used Entries: ' usedEntries
SAY 'Next Free Offset: ' Bytes2Dec(19,2)
CALL ShowALLEAF_or_ANODE
RETURN


ShowALLEAF_or_ANODE: PROCEDURE EXPOSE drive lsn sectorString,
usedEntries alleafEntryCount anodeEntryCount entrySize,
alsecIndicator alnodeIndicator
IF alsecIndicator = 'Y' THEN
entryOffset = 21
ELSE
entryOffset = 65

IF alnodeIndicator \= 'Y' THEN
/* Is an ALLEAF */
DO
SAY
IF usedEntries = 0 THEN
DO
SAY 'Zero-length file'
EXIT
END

SAY 'ALLEAF INFORMATION'
entrySize = 12
DO entry = alleafEntryCount TO alleafEntryCount+usedEntries-1
fileSecOffset = Bytes2Dec(entryOffset,4)
runSize = Bytes2Dec(entryOffset+4,4)
physicalLSN = Bytes2Dec(entryOffset+8,4)
SAY 'Extent #'||entry||':' runSize 'sectors starting at LSN' ,
    physicalLSN '(file sec offset:'||fileSecOffset||')'
/* Long SAY wrapped with a REXX continuation comma */
entryOffset = entryOffset+entrySize
END entry

alleafEntryCount = entry
END
ELSE
DO
/* Is either an ALNODE in an ALSEC or in an FNODE */
entrySize = 8
IF alSecIndicator \= 'Y' THEN
/* In an FNODE */
DO entry = anodeEntryCount TO anodeEntryCount+usedEntries-1
lsn = Bytes2Dec(entryOffset+4,4)
SAY
SAY 'FNODE Entry #' || entry
CALL MainRoutine
entryOffset = entryOffset+entrySize
END entry
ELSE
DO
/* In an ALSEC */
listStart = 65
sectorString = ReadSect(drive,lsn)
DO entry = anodeEntryCount TO anodeEntryCount+usedEntries-1
SAY
SAY 'ALNODE INFORMATION'
fileSecOffset = Bytes2Dec(entryOffset,4)
lsn = Bytes2Dec(entryOffset+4,4)
SAY 'ALSEC Entry #'||entry 'situated at LSN' ,
    lsn '(file sec count:'||fileSecOffset||')'
/* Long SAY wrapped with a REXX continuation comma */
CALL MainRoutine
anodeEntryCount = entry
entryOffset = entryOffset+entrySize
END entry
END
END
RETURN


Help:
SAY 'ShowExtents shows the extents mapped by an FNODE or ALSEC'
SAY 'structure.'
SAY
SAY ' Usage: ShowExtents drive LSN_of_a_FNODE/ALSEC'
SAY ' Example: ShowExtents C: 316'
EXIT
</PRE>

<FONT SIZE=2>
Figure 10: The ShowExtents.cmd program.
</FONT>

<H2>Counting Extents</H2>

It is handy to be able to report just the number of extents in a file.
HPFS-EXT, in the Graham Utilities, can do this. It takes a filename. It is
available in the demo version of the Graham Utilities, "GULITE.xxx".

<P>The freeware FST (currently FST03F.xxx) does just about everything. You
can specify either a filename ("FST INFO N: \TEST\FILEFRAGG" - note the
space after the drive letter) or an LSN ("FST INFO N: 1000"). It will
include the height of the B+tree and the total number of extents at the
end of its display. Unfortunately, it displays a lot of other info, and
sometimes you're only interested in the number of levels and extents.

<P>I cut down ShowExtents.cmd to produce CountExtents.cmd. The design was
not amenable to showing the height, but it was a straightforward matter to
show just the number of extents. I've not bothered to present it here
since most readers will probably prefer to specify a filename. (The
FNODE LSN keeps changing as you increase the number of extents, which
makes CountExtents more awkward to use.)
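
<P>For the curious, here is a minimal sketch of what such a cut-down counter
could look like. This is not the author's CountExtents.cmd (that program was
not published); it just reuses SECTOR.DLL's ReadSect call and the signatures
and offsets from the ShowExtents.cmd listing above, and reports nothing but
the total number of extents.

<PRE>
/* CountExtents (sketch) - count the extents mapped by an FNODE or ALSEC */
/* Illustrative only; reuses the offsets from ShowExtents.cmd above.     */
PARSE UPPER ARG drive lsn
IF drive = '' | lsn = '' THEN DO
   SAY 'Usage: CountExtents drive LSN_of_a_FNODE/ALSEC'
   EXIT
END
CALL RxFuncAdd 'ReadSect','sector','ReadSect'
SAY 'Extents:' CountFrom(drive,lsn)
EXIT

CountFrom: PROCEDURE
ARG drive, lsn
sec = ReadSect(drive,lsn)                 /* read the sector at this LSN */
sig = C2X(Reverse(Substr(sec,1,4)))
SELECT
   WHEN sig = 'F7E40AAE' THEN base = 57   /* FNODE: B+tree info at offset 57 */
   WHEN sig = '37E40AAE' THEN base = 13   /* ALSEC: B+tree info at offset 13 */
   OTHERWISE DO
      SAY 'LSN' lsn 'is not an FNODE or ALSEC'
      EXIT
   END
END
used = C2D(Substr(sec,base+5,1))          /* Used Entries byte             */
IF base = 57 THEN arrayOff = 65           /* allocation array in an FNODE  */
ELSE arrayOff = 21                        /* allocation array in an ALSEC  */
IF Bitand(Substr(sec,base,1),'80'x) = '80'x THEN DO
   /* ALNODEs follow: recurse into each child ALSEC (LSN is the 2nd dword) */
   total = 0
   DO i = 0 TO used-1
      childLsn = C2D(Reverse(Substr(sec,arrayOff+i*8+4,4)))
      total = total + CountFrom(drive,childLsn)
   END
   RETURN total
END
ELSE
   /* ALLEAFs follow: each used entry maps exactly one extent */
   RETURN used
</PRE>

<P>As with ShowExtents, you still have to supply an FNODE (or ALSEC) LSN, so a
filename-driven tool such as FST or HPFS-EXT remains the more convenient way to
get the count.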

<H2>Conclusion</H2>

In this installment we have seen how a file's sectors are mapped by FNODEs
and ALSECs. These file system components can contain either an array of
ALNODE entries or an array of ALLEAF entries. By following through to the
ALLEAFs we can examine the mapping of extents.

<P>We have also seen how a B+tree is different from a B-tree. In a DIRBLK
B-tree, DIRENT information can be found in a node entry. But in an ALSEC
B+tree, extent information is not stored in node entries, only in the
leaves. The filling of nodes in an ALSEC B+tree is also much more
efficient than the utilisation of nodal space in a DIRBLK B-tree.

<P>In the next installment we'll look at Extended Attributes. While not
specifically a HPFS topic, they are well integrated into the file system
and will fit well into this series.
BIN
study/sabre/os/files/FileSystems/HPFS/hpfs_11.gif
Normal file
|
After Width: | Height: | Size: 7.0 KiB |
53
study/sabre/os/files/FileSystems/HPFS/index.html
Normal file
@@ -0,0 +1,53 @@
|
<html><head><title>Operating Systems: The HPFS Filesystem</title></head>
<body BGCOLOR=#FFFFFF TEXT=#000000 LINK="#0000FF" VLINK="#0000FF" ALINK="#107010">

<center><font face=Verdana size=7><b>HPFS FileSystem</b></font></center>
<hr><p>

This series of articles apparently originally appeared in the now-defunct OS2Zone (their page should be at http://www.os2zone.aus.net) and was written by Dan Bridges. I ran across it during my journeys of the net, and put it up here... The "original" form is <a href="hpfs.zip">available here</a>. This is a six-part series of articles on HPFS.<p>

<ul><DL>
<DT><font size=+1><a href="hpfs0.html">Part #0 - Preface</a></font><br>
<DD>This article is the initial "preface" article that explains the motivations behind the series.
It also talks about the filesystem organization scheme used by the FAT filesystem... and briefly
introduces HPFS.<p>

<DT><font size=+1><a href="hpfs1.html">Part #1 - Introduction</a></font><br>
<DD>This introductory article compares the FAT filesystem against the HPFS filesystem in terms that
a user would understand. This talks about the practical differences, such as speed, size, and
fragmentation.<p>

<DT><font size=+1><a href="hpfs2.html">Part #2 - The SuperBlock and the SpareBlock</a></font><br>
<DD>This article starts delving more deeply into HPFS' internal structures. Two REXX programs are
presented that greatly assist in the search for information. It also briefly looks at some
other HPFS-related programs. Finally, you will see the Big Picture when the major structures
of a HPFS partition are shown.<p>

<DT><font size=+1><a href="hpfs3.html">Part #3 - Fragmentation, Diskspace Bitmaps and Code Pages</a></font><br>
<DD>This article looks at how HPFS knows which sectors are occupied and which ones are free.
It examines the amount of file fragmentation on five HPFS volumes and also checks out the
fragmentation of free space. A program is presented to show free runs and some other
details. Finally, it briefly discusses Code Pages and looks at a program that displays
their contents.<p>

<DT><font size=+1><a href="hpfs4.html">Part #4 - B-Trees, DIRBLKs, and DIRENTs</a></font><br>
<DD>The most basic structures in the HPFS are DIRBLKs, DIRENTs and FNODEs. This article examines
DIRBLKs and DIRENTs, talks about the differences between binary trees and B-trees and shows
how DIRBLKs are interconnected to facilitate quick access in a large directory (one of HPFS'
strengths). To assist in this investigation, a program, ShowBtree.cmd, helps to visualise
the layout of directory and file entries in a partition.<p>

<DT><font size=+1><a href="hpfs5.html">Part #5 - FNODEs, ALSECs and B+trees</a></font><br>
<DD>This article takes a long look at how a file's contents are logically stored under HPFS.
It is helpful to contrast its file-sector allocation methods with the last article's
directory entry concepts. It also talks about fragmentation and how HPFS deals with it.<p>

<DT><font size=+1>Part #6 - ?</font><br>
<DD>This is as far as I can go... if anyone has any of the other articles that appeared in this
series, please please send them my way...<p>

</DL></ul>

<p><hr><FONT SIZE = 4><TABLE ALIGN=RIGHT BORDER=0><TR><TD><center>
Copyright © 1998 <i><a href="mailto:sabre@nondot.org">Chris Lattner</a></i><br>
Last modified: Wednesday, 13-Sep-2000 14:10:50 CDT </center></TD></TR></TABLE>