add directory study

gohigh
2024-02-19 00:25:23 -05:00
parent b1306b38b1
commit f3774e2f8c
4001 changed files with 2285787 additions and 0 deletions

<H1>Inside the High Performance File System</H1>
<H2>Part 0: Preface</H2>
Written by Dan Bridges
<H2>Introduction</H2>
<P>
I am no programmer, but I am an enthusiast interested in
finding out more about HPFS. There is so little detailed information
available on HPFS that I think you will find this modest series
instructive. The REXX programs to be presented are functional but they
are not particularly pleasing in an aesthetic sense. However they do
ferret out information and will help you to understand what is going on.
I'm sure that a programming guru, once motivated, could come up with
superior versions. Hopefully they will. This installment originally
appeared at the OS2Zone web site (http://www.os2zone.aus.net).
<P>
I've been asked [by someone else. Ed.] to write a preface to this series.
Normally I prefer to write on little-covered topics whereas much of what I'm
going to discuss in this installment often appears in a cursory examination of
the HPFS. The trouble with most of what has been written about HPFS in books on
OS/2 is that the topic is never considered very deeply. After working
your way through this series (still being written on a monthly basis, but
expected to occupy eight parts including this one) you will have a detailed
knowledge of the structures of the HPFS. Having said that, there is a place for
some initial information for readers who currently know very little about the
subject.
<P>
<H2>File Systems</H2>
<P>
A File System (FS) is a combination of hardware and software that
enables the storage and retrieval of information on removable (floppy
disk, tape, CD) and non-removable (HD) media. The File Allocation Table
FS (FAT) is used by DOS. It is also built into OS/2. Now FAT appeared
back in the days of DOS v1 in 1981 and was designed with a backward
glance to CP/M. A hierarchical directory structure arrived with DOS v2
to support the XT's 10 MB HD. OS/2 v1.x used straight FAT. OS/2 v2.x
and later provide "Super FAT". This uses the same layout of
information on the storage medium (e.g. a floppy written under OS/2 v2
can easily be read by a DOS system) but adds performance improvements to
the software used to transfer the data. Super FAT will be covered in
Part 1.
<P>
<H2>FAT</H2>
<P>
Figure 1 shows the layout of a FAT volume. There are two copies of the
FAT. These should be identical. This may seem like a safety feature
but it only works in the case of physical corruption (if a bad sector
develops in one of the sectors in a FAT, the other one is automatically
used instead) not for logical corruption. So if the FS gets confused
and the two copies are not the same there is no easy way to determine
which copy is still OK.
<P>
<IMG SRC="hpfs1.gif" WIDTH=498 HEIGHT=64>
<P>
<FONT SIZE=2>
Figure 1: The layout of a volume formatted with the FAT file system.
Note: this diagram is not to scale. The data area is quite large in
practice.
</FONT>
<P>
The root directory is made a fixed known size because the system files
are placed immediately after it. The known location for the initial
system files enables DOS or OS/2 to commence loading itself. (The boot
record, which loads first, is small and only has enough space for
code to find the initial system files at a known location.) However
this design decision also limits the number of files that can be listed
in the root directory of a FAT volume.
<P>
Entries in the root directory and in subdirectories are not ordered so
searching for a particular file can take some time, particularly if
there are many files in a directory.
<P>
The FAT and the root directory are positioned at the beginning of the
volume (on a disk this is typically on the outside). These entries are
read often, particularly in a multitasking environment, requiring a lot
of relatively slow (in CPU terms) head movement.
<P>
<H2>How Files are Stored on a FAT Volume</H2>
<P>
Files are stored on a FAT volume using the FS' minimum allocation unit,
the cluster (1-64 sectors). A 32-byte directory entry only provides
sufficient space for a 8.3 filename, file attributes, last alteration
date/time, filesize and the starting cluster. See Figure 2.
<P>
<IMG SRC="hpfs2.gif" WIDTH=388 HEIGHT=209>
<P>
<FONT SIZE=2>
Figure 2: The layout of the 32 bytes in a directory entry in a FAT
system.
</FONT>
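<P>
For the curious, the fixed offsets shown in Figure 2 are easy to pick
apart in REXX. The following is a minimal sketch with a hypothetical,
hand-built 32-byte entry (date/time decoding is left out):
<PRE>
/* Decode a 32-byte FAT directory entry (sample bytes are hand-built) */
entry = LEFT('README', 8) || LEFT('TXT', 3) || '20'x,
        || COPIES('00'x, 14) || '0200'x || 'D2040000'x
name    = STRIP(SUBSTR(entry,  1, 8))        /* 8-character base name  */
ext     = STRIP(SUBSTR(entry,  9, 3))        /* 3-character extension  */
attr    = C2X(SUBSTR(entry, 12, 1))          /* attribute byte         */
cluster = C2D(REVERSE(SUBSTR(entry, 27, 2))) /* start cluster (16-bit, */
size    = C2D(REVERSE(SUBSTR(entry, 29, 4))) /*  stored little-endian) */
SAY name'.'ext ' attr='attr ' cluster='cluster ' size='size 'bytes'
</PRE>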
<P>
The corresponding initial cluster entry in the FAT then points to the
next FAT entry for the second cluster of the file (assuming that the
file was big enough) which in turn points to the next cluster and so on.
FAT entries can be 16-bit (max. FFFFh) or 12-bit (max. FFFh) in size,
with volumes less than 16 MB using the 12-bit scheme. FAT entries can
be of four types:
<UL>
<LI>Contain 0000h if the cluster is free (available);
<LI>Contain the number of the next cluster in the chain;
<LI>If this is the last cluster in the chain, contain a special value
which signifies the end of the chain (EOF);
<LI>Contain a different special value if the cluster is bad
(unreliable).
</UL>
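<P>
These entry types chain together as follows. Here is a minimal REXX
sketch using a hypothetical three-cluster file held in an in-memory
FAT copy (a real program would, of course, read the FAT off the disk):
<PRE>
/* Walk a cluster chain through a hypothetical in-memory FAT copy */
fat. = 0                            /* all clusters free by default */
fat.2 = 3; fat.3 = 4; fat.4 = 'EOF' /* a file in clusters 2, 3, 4   */
cluster = 2                         /* start cluster from dir entry */
chain = ''
DO WHILE cluster \= 'EOF'
  chain = chain cluster
  cluster = fat.cluster             /* follow the single link       */
END
SAY 'Cluster chain:' chain
</PRE>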
<P>
The FAT FS is prone to fragmentation (i.e. a file's clusters are not in
one, contiguous chain) in a single-tasking environment because the FAT
is searched sequentially for the next free entry in the FAT when a file
is written, regardless of how much needs to be written. The situation
is even worse in a multitasking environment because you can have more
than one writing operation in progress at the same time. See Figures 3
and 4 for an example of a fragmented file under FAT.
<P>
<IMG WIDTH=391 HEIGHT=238 SRC="hpfs3.gif">
<P>
<FONT SIZE=2>
Figure 3: The layout of a contiguous file in the FAT.
</FONT>
<P>
<IMG WIDTH=458 HEIGHT=232 SRC="hpfs4.gif">
<P>
<FONT SIZE=2>
Figure 4: An example of a fragmented file under FAT in three pieces.
</FONT>
<P>
The FAT FS uses a singly-linked scheme i.e. the FAT entry points only
to the next cluster. If, for some reason, the chain is accidentally
broken (the next cluster value is corrupted) then there is no
information in the isolated next cluster to indicate what it was
previously connected to. So the FAT FS, while relatively simple, is
also rather vulnerable.
<P>
FAT was designed in the days of small disk size and today it really
shows its age. The maximum number of entries (clusters) in a 16-bit FAT
is just under 64K (due to technical reasons, the actual maximum is
65,518). Since we can't increase the number of clusters past this
limit, a large volume requires the use of large cluster sizes. So, for
example, a volume in the 1-2 GB range has 32 KB clusters. Now a cluster
is the minimum allocation unit so a 1 byte file on such a volume would
consume 32 KB of space, a 33 KB file would consume 64 KB and so on. A
rough assumption you can make is that, on average, half a cluster of
space is wasted per file. You can run CHKDSK on a FAT volume, note the
total number of files and also the allocation unit size and then
multiply these two figures together and divide the result by 2 to get
some idea of the wastage. The situation is quite different with HPFS as
you will see when you read Part 1.
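<P>
If you'd like to put a figure on it, this back-of-the-envelope
calculation is easily expressed in REXX. The file count and cluster
size below are hypothetical; substitute the figures CHKDSK reports for
your own volume:
<PRE>
/* Estimate FAT slack space: roughly half a cluster wasted per file */
files   = 10000                 /* total files reported by CHKDSK   */
cluster = 32 * 1024             /* allocation unit size in bytes    */
wastage = files * cluster % 2   /* % is REXX integer division       */
SAY 'Estimated wastage:' wastage % (1024 * 1024) 'MB'
</PRE>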
<P>
Finally, FAT under OS/2 supports Extended Attributes (EAs - up to 64 KB
of extra information associated with a file), but since there is very
little extra space in a 32-byte directory entry it is only possible to
store a pointer into an external file with all EAs on a volume being
stored in this file ("EA DATA. SF"). In general it is fair to state
that EAs are tacked on to FAT. With HPFS the integration is much
better. If the EA is small enough HPFS stores it completely within the
file's FNODE (every file and directory has an FNODE). Otherwise EAs are
stored outside the file but closely associated with it and usually
situated physically close to the file for performance reasons. Some
users have occasionally reported crosslinking of EAs under FAT. This
can be quite a serious matter requiring reinstallation of the operating
system. I've not heard of this occurring under HPFS. Note that the
WorkPlace Shell relies heavily on EAs.
<P>
<H2>HPFS</H2>
<P>
HPFS is an example of a class of file systems known as Installable File
Systems (IFS). Other types of IFS include CD support (CDFS), Network
File System (NFS), Toronto Virtual File System (TVFS - combines FS
elements of VM, namely CMS search path, with elements of UNIX, namely
symbolic link), EXT2-OS (read Linux EXT2FS partitions under OS/2) and
HPFS386 (with IBM LAN Server Advanced).
<P>
An IFS is installed at start-up time. The software to access the actual
device is specified as a device driver (usually BASEDEV=xxxxx.DMD/.ADD)
while a Dynamic Link Library (DLL) is loaded to control the format/layout
of the data (with IFS=xxxxx.IFS). OS/2 can run more than one IFS at a
time so you could, for example, copy from a CD to a HPFS volume in one
session while reading a floppy disk (FAT) in another session.
<P>
HPFS has many advantages over FAT: long filenames (254 characters
including spaces); excellent performance with directories containing
many files; designed to be fault tolerant; fragmentation resistant;
space efficient with large partitions; works well in a multitasking
environment. These topics will be explored in the series.
<P>
<H2>REXX</H2>
<P>
One of the many benefits of using OS/2 is that it comes with REXX
(providing you install it - it requires very little extra space). REXX
is a surprisingly versatile and powerful scripting language and there
are oodles of REXX programs and add-ons available, many of them free.
This series presents REXX programs that access HPFS structures and
decode their contents.
<P>
<H2>Conclusion</H2>
<P>
In this installment you have seen that the FAT FS has a number of
problems related to its ancient origins. HPFS comes from a fresh design
with one eye on likely advances in storage that would occur in the
foreseeable future and the other eye on obtaining good performance. In
the next installment we look at the many techniques HPFS uses to achieve
its better performance.
</BODY>
</HTML>

<H1>Inside the High Performance File System</H1>
<H2>Part 1: Introduction</H2>
Written by Dan Bridges
<H2>Introduction</H2>
<P>
This article originally appeared in the February 1996 issue of
Significant Bits, the monthly magazine of the Brisbug PC User Group Inc.
<P>
It is sad to think that most OS/2 users are not using HPFS. The main
reason is that unless you own the commercial program Partition Magic,
switching to HPFS involves a destructive reformat and that most users
couldn't be bothered (at least initially). Another reason is user
ignorance of the numerous technical advantages of using HPFS.
<P>
This month we start a series that delves into the structures that make
up OS/2's HPFS. It is very difficult to get any public information on
it aside from what appeared in an article written by Ray Duncan in the
September '89 issue of Microsoft Systems Journal, Vol 4 No 5. I suspect
that the IBM-Microsoft marriage break-up that occurred in 1991 may have
caused an embargo on further HPFS information. I've been searching
books and the Internet for more than a year looking for information with
very little success. You usually end up finding a superficial
description without any detailed discussion of the internal layout of
its structures.
<P>
There are three commercial utilities that I've found very useful. SEDIT
from the GammaTech Utilities v3 is a wonder. It decodes quite a bit of
the information in HPFS' structures. HPFSINFO and HPFSVIEW from the
Graham Utilities are also good. HPFSINFO lists information gleaned from
HPFS' SuperBlock and SpareBlock sectors, while HPFSVIEW provides the
best visual display I've seen of the layout of a HPFS partition. You
can receive some information on a sector by clicking on it. HPFSVIEW is
also freely available in the demo version of the Graham Utilities,
GULITE.xxx. I've also written a REXX program to assist with
cross-referencing locations between SEDIT & HPFSVIEW and to provide a
convenient means of dumping a sector.
<P>
Probably the most useful program around at the moment is freeware,
FST03F.xxx (File System Tool) written by Eberhard Mattes. This provides
lots of information and comes with source. Even if you aren't a C
programmer (I'm not) you can learn much from its definition of
structures. Unfortunately I wrote the first three instalments without
seeing this information so that made the task more difficult.
<P>
In the early stages I've had to employ a very laborious process in an
attempt to learn more. I created the smallest OS/2 HPFS partition
possible (1 MB). Then I created/altered a file or directory and
compared the changes. Sometimes I knew where the changes would occur so
I could just compare the two sectors but often I ended up comparing two
1 MB image files looking for differences and then translated the location
in the image into C/H/S (a physical address in Cylinder/Head/Sector
format) or LSN (Logical Sector Number). While more information will be
presented in this series than I've seen in the public domain, there are
still things that I've been unable to decipher.
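<P>
For anyone repeating this kind of detective work, the LSN-to-C/H/S
translation is simple arithmetic. A minimal sketch follows; the
geometry figures (heads, sectors per track) are hypothetical and must
match the drive being examined:
<PRE>
/* Convert a Logical Sector Number to Cylinder/Head/Sector */
PARSE ARG lsn .
heads = 16; spt = 63            /* assumed drive geometry       */
cyl  = lsn % (heads * spt)      /* % is REXX integer division   */
rem  = lsn // (heads * spt)     /* // gives the remainder       */
head = rem % spt
sec  = rem // spt + 1           /* sectors are counted from 1   */
SAY 'LSN' lsn '= Cyl' cyl 'Hd' head 'Sec' sec
</PRE>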
<P>
<H2>The Win95 Fizzer</H2>
<P>
For me, the most disappointing feature of Win 95 is the preservation of
the FAT (File Allocation Table) system. It's now known as VFAT but
aside from integrated 32-bit file and disk access, the structure on the
disk is basically the same as DOS v4 (circa 1988). An ungainly method
involving the volume label file attribute was used to graft long
filename support onto the file system. These engineering compromises
were made to most easily achieve backward compatibility. It's a pity
because Microsoft has an excellent file system available in NT, namely
NTFS. This file system is very robust although perhaps NTFS is overkill
for the small user.
<P>
The Presentation Manager graphical user interface (GUI) appeared in OS/2 v1.1
in 1988. The sophisticated High-Performance File System came with OS/2
v1.2 which was released way back in 1989! The powerful REXX scripting
language showed up in OS/2 v1.3 (1991). And the largely
object-orientated WPS (Work Place Shell) GUI appeared in 1992 in OS/2
v2.0. So it is hardly surprising that experienced OS/2 users were not
swept up in the general hysteria about Windows 95 being the latest and
greatest.
<P>
A positive aspect of the Win 95 craze has been that the minimum system
requirement of 8 MB RAM, 486/33 makes a good platform for OS/2 Warp. So
now the disgruntled Win 95 user will find switching OSs less daunting,
at least from a hardware viewpoint.
<P>
<H2>Dual Boot and Boot Manager</H2>
<P>
I've never used Dual Boot because it seems so limiting. I've always
reformatted and installed Boot Manager so that I could select from up to
four different Operating Systems, for example OS/2 v2.1, OS/2 Warp
Connect (peer-to-peer networking with TCP/IP and Internet support), IBM
DOS v7 and Linux.
<P>
In previous OS/2 installations, I've left a small (50 MB) FAT partition
that could be seen when I booted under either DOS or OS/2, while the
rest of the HD space (910 MB) was formatted as HPFS. Recently I
upgraded to Warp Connect and this time I dropped FAT and the separate
DOS boot partition completely. This does not mean I am unable to run
DOS programs. OS/2 has inbuilt IBM DOS v5 and you can install boot
images of other versions of DOS, or even CP/M, for near instantaneous
booting of these versions. There is no reason why you can't have
multiple flavours of DOS running at the same time as you're running
multiple OS/2 sessions. Furthermore DOS programs have no problems
reading from, writing to or running programs on HPFS partitions even
though the layout is nothing like FAT. It's all handled transparently
by OS/2. But this does mean you have to have booted OS/2 first. HPFS
is not visible if you use either Dual Boot or Boot Manager to boot
directly to DOS, but there are a number of shareware programs around to
allow read-access to HPFS drives from DOS.
<P>
DOS uses the system BIOS to access the hard disk. This is limited to
dealing with a HD that has no more than 1,024 cylinders due to 10 bits
(2^10 = 1,024) being used in the BIOS for cylinder numbering. OS/2 uses
the system BIOS at boot time but then completely replaces it in memory
with a special Advanced BIOS. This means that the boot partition and,
if you use it, Boot Manager's 1 MB partition, must be within the first
1,024 cylinders. Once you've booted OS/2, however, you can access
partitions on cylinders past the Cyl 1023 point (counting from zero)
without having to worry about LBA (Logical Block Addressing) translation
schemes.
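<P>
As an aside, the familiar BIOS capacity ceiling falls straight out of
these field widths. A minimal sketch; the text above only mentions the
10-bit cylinder field, so the 8-bit head and 6-bit sector field widths
are the conventional values and an assumption on my part:
<PRE>
/* Largest disk addressable through the traditional BIOS interface */
cyls  = 2 ** 10       /* 10-bit cylinder field, as discussed above */
heads = 2 ** 8        /* 8-bit head field (assumed)                */
spt   = 2 ** 6 - 1    /* 6-bit sector field; sectors count from 1  */
SAY 'BIOS CHS limit:' cyls * heads * spt * 512 % (1024 ** 2) 'MB'
</PRE>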
<P>
Now this can still catch you out if you boot DOS. On my old system I'd
sometimes use Boot Manager to boot a native DOS version. I'd load AMOS
(a shareware program) to see the HPFS drives. I thought there must have
been a bug in AMOS because I could only see half of F: and none of G:
until I realised that these partitions were situated on a third HD that
had 1,335 cylinders. So this was just the effect of DOS' 1,024 cylinder
limitation which the AMOS program was unable to circumvent.
<P>
<H2>Differences between an Easy and an Advanced Installation</H2>
<P>
Most new OS/2 users select the "Easy Installation" option. This is
satisfactory but it only utilises FAT, installs OS/2 on the same drive
as DOS and Windows, does not reformat the partition and Dual Boot is
installed.
<P>
If you know what you're doing, or want to take full advantage of what
OS/2 can provide, then the "Advanced Installation" option is for you.
Selecting it enables you to:
<UL>
<LI>Selectively install parts of OS/2;
<LI>Install OS/2 in a primary or logical (extended) partition other
than C:, or even on a 2nd HD (I don't know whether you can install on
higher physical drives than the 2nd one in a SCSI multi-drive setup);
<LI>Install Boot Manager;
<LI>Use HPFS if you wish;
<LI>Install onto a blank HD.
</UL>
<P>
<H2>FAT vs HPFS: If Something Goes Wrong</H2>
<P>
CHKDSK on a HPFS partition can recover from much more severe faults than
it can on a FAT system. This is because the cluster linkages in a FAT
system are one-way, pointing to the next cluster in the chain. If the
link is broken it is usually impossible to work out where the lost
clusters ("x lost clusters in y chains") should be reattached. Often
they are just artifacts of a program's use of temporary files that
haven't been cleaned up properly. But "file truncated" and
"cross-linked files" messages are usually an indication of more serious
FAT problems.
<P>
HPFS uses double linking: the allocation block of a directory or file
points back to its predecessor ("parent") as well as to the next element
("child"). Moreover, major structures contain dword (32-bit) signatures
identifying their role and each file/directory's FNODE contains the
first 15 characters of its name. So blind scanning can be performed by
CHKDSK or other utilities to rebuild much of the system after a
significant problem.
<P>
As a personal comment, I've been using HPFS since April, 1993, and I've
yet to experience any serious file system problems. I've had many OS/2
lockups while downloading with a DOS comms program and until recently
I was running a 4 MB hardware disk cache with delayed writes, yet,
aside from the lost download file, the file system has not been
permanently corrupted.
<P>
<H2>Warp, FORMAT /FS:HPFS, CHKDSK /F:3 and The Lazarus Effect</H2>
<P>
Warp, by default, does a quick format when you format a HD under either
FAT or HPFS. So FORMAT /FS:HPFS x:, which is what the installation
program performs if you decide to format the disk with HPFS, is
performed very quickly. It's almost instantaneous if you decide to
reformat with FAT (/FS:FAT). Now this speed differential does not mean
that FAT is much quicker, only that FORMAT has very little work to
perform during a quick FAT reformat since the FAT structures are so
simple compared to HPFS.
<P>
As mentioned earlier, CHKDSK has extended recovery abilities when
dealing with HPFS. It has four levels of /F:n checking/recovery. These
will be considered in greater detail in a later article in this series
when we look at fault tolerance. The default of CHKDSK /F is equivalent
to using /F:2. If you decide to use /F:3 then CHKDSK will dig deep and
recover information that existed on the partition prior to the
reformatting providing that it was previously formatted as HPFS. Using
CHKDSK /F:3 after performing a quick format on a partition that was
previously FAT but is now HPFS will not cause this, since none of the
previous data has HPFS signature words embedded at the beginning of its
sectors. However, if you ever use /F:3 after quickly reformatting a
HPFS partition you could end up with a bit of a mess since everything
would be recovered that existed on the old partition and which hadn't
been overwritten by the current contents.
<P>
To guard against this, OS/2 stores whether or not a quick format has
been performed on a HPFS partition in bit 5 (counting from zero) of byte
08h in LSN (Logical Sector Number) 17, the SpareBlock sector. This
particular byte is known as the Partition Status byte, with 20h
indicating that a quick format was performed. Bit 0 of this byte is
also used to indicate whether the partition is "clean" or "dirty" so 21h
indicates that the partition was quick formatted and is currently
"dirty" (these concepts will be covered in a later instalment).
<P>
If you attempt to perform a CHKDSK /F:3 on a quick-formatted partition,
you will receive the following warning:
<PRE>
SYS0641: Using CHKDSK /F:3 on this drive may cause files that existed
before the last FORMAT to be recovered. Proceed with CHKDSK (Y/N)?
</PRE>
<P>
If you type "HELP 641" for further information you'll see:
<PRE>
EXPLANATION: The target drive was formatted in "fast format" mode,
which does not erase all data areas. CHKDSK /F:3 searches data areas
for "lost" files. If a file existed on this drive before the last
format, CHKDSK may find it, and attempt to recover it.

ACTION: Use CHKDSK /F:2 to check this drive. If you use /F:3, be aware
that files recovered to the FOUND directories may be old files. Also,
if you format a drive using FORMAT /L, FORMAT will completely erase all
old files, and avoid this warning.
</PRE>
<P>
It seems a pity to forego the power of the CHKDSK /F:3 in the future.
As is suggested, FORMAT /L (for "Long" I presume) will completely
obliterate the prior partition's contents, but you can't specify this
during a reinstall. To perform it you need to use FORMAT /L on the
partition before reinstalling. For this to be practical you will
probably need to keep OS/2 and nothing else on a separate partition and
to have a recent tape backup of the remaining volumes' contents. Note:
in my opinion keeping OS/2 on a separate partition is the best way of
laying out a system but make sure you leave enough room for things like
extra PostScript fonts and programs that insist on putting things on C:.
<P>
<H2>Capacity</H2>
<P>
Figure 1 shows a table comparing the capacity of OS/2's FAT and HPFS
file systems. The difference in the logical drive numbers arises due to
A: and B: being assigned to floppies which are always FAT. It would
be ridiculous to put a complex, relatively large file system, which was
designed to overcome FAT's limitations with big partitions, on volumes
as small as current FDs.
<PRE>
                        FAT         HPFS
Logical drives          26          24
Num of Partitions       16          16
Max Partition Size      2 GB        64 GB
Max File Size           2 GB        2 GB
Sector Size             512 bytes   512 bytes
Cluster/Block Size      0.5-32 KB   512 bytes
</PRE>
<FONT SIZE=2>
Fig.1 Comparing the capacity of FAT and HPFS
</FONT>
<P>
The next point of interest is the much greater partition size supported by HPFS.
HPFS has a maximum possible partition size of about 2,200 GB (2^32 sectors) but
is restricted in the current implementation to 64 GB. (Note: older references
state that the maximum is 512 GB.) I don't know what imposes this limitation.
Note: the effective limitation on partition size is currently around 8 GB.
This is due to CHKDSK's inability to handle a larger partition. I presume this
limitation will be rectified soon as ultra large HDs will become common in the
next year or two.
<P>
The 2 GB maximum filesize limit is common to DOS, OS/2 and 32-bit Unix. A
32-bit file size should be able to span a range of 4 GB (2^32) but the
DosSetFilePtr API function requires that the highest bit be used for indicating
sign (forward or backward direction of movement), leaving 31 for size.
<P>
The cluster size on a 1.4 MB FD is 512 bytes. For a 100 MB HD formatted
with FAT it is 2 KB. Due to the relatively small 64K (2^16) limit on
cluster numbering, as FAT partitions get bigger the size of clusters
must also increase. So for a 1-2 GB partition you end up with whopping
32 KB clusters. Since the average wastage of HD space due to the
cluster size is half a cluster per file, storing 10,000 files on such a
partition will typically waste 160 MB (10,000 * 32 KB / 2).
<P>
HPFS has no such limitation. File space is allocated in sector-sized
blocks unlike the FAT system. A FNODE sector is also always associated
with each file. So for 10,000 files, the wastage due to sector size is
typically 2.5 MB (10,000 * 512 / 2) for the files themselves + 5 MB
consumed by the file's FNODEs = 7.5 MB. And this overhead is constant
whether the HPFS partition is 10 MB or 100 GB.
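<P>
The same style of estimate for HPFS, following the arithmetic above
(the file count is again a hypothetical figure):
<PRE>
/* Per-file space overhead under HPFS, independent of partition size */
files  = 10000
slack  = files * 512 % 2        /* half a sector of slack per file */
fnodes = files * 512            /* one FNODE sector per file       */
SAY 'Slack:' slack % 1024 'KB  FNODEs:' fnodes % 1024 'KB',
    ' Total:' (slack + fnodes) % 1024 'KB'
</PRE>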
<P>
This must be balanced against the diskspace consumed by HPFS. Since
HPFS is a sophisticated file system that is designed to accomplish a lot
more than FAT, it correspondingly requires more diskspace than FAT.
Figure 2 illustrates this. You may think that 10 MB for the file system
is too much for a 1,000 MB partition but you should consider this as a
percentage.
<PRE>
            System usage incl.     Disk space available   Allocation unit
            MBR track              to user                (+ FNODE for HPFS)
            FAT/HPFS in KB         FAT/HPFS in %          FAT/HPFS in KB
   10 MB        44/415             99.57/95.95              4/0.5+0.5
  100 MB        76/3,195           99.77/96.88              2/0.5+0.5
1,000 MB    289(est)/10,430        99.98(est)/98.98        16/0.5+0.5
</PRE>
<FONT SIZE=2>
Fig. 2: Space used by FAT and HPFS on different volumes
</FONT>
<P>
Furthermore, once cluster size wastage is also considered, then the
break-even point (as regards diskspace) for a 1,000 MB partition is
about 2,200 files which isn't very many files. This is based on a 16 KB
cluster size. In the 1,024-2,047 MB partition size range the cluster
size increases to 32 KB so the "crossover" point shifts to only 1,100
files.
<P>
I had to calculate the 1,000 MB FAT partition values since OS/2 wouldn't
let me have a FAT partition situated in the greater than Cyl 1023
region. The 4 KB cluster size of the 10 MB partition is not a misprint.
Below 16 MB, a 12-bit FAT scheme (1.5 bytes in the FAT representing 1
cluster) is used instead of a 16-bit one.
<P>
<H2>Directory Search Speed</H2>
<P>
Consider an extreme case: FAT system on a full partition which has a
maximum-sized FAT (64K entries - this is the maximum number of files a
FAT disk can hold). The size of such a partition would be 128 MB, 256
MB, 512 MB, 1 GB or 2 GB, depending on cluster size. Each FAT is 128 KB
in size. (There is a second FAT which mirrors the first.) In this
example all the files are in one subdirectory. This can't be in the
root directory because it only has space for 512 entries. (With HPFS
you can have as many files as you want in the root directory.) 64 K of
entries in a FAT directory requires 2 MB of diskspace (64K * 32
bytes/directory entry). To find a file, on average, 32 K directory
entries would need to be searched. To say that a file was not on the
disk, the full 64 K entries must be scanned before the "File not found"
message was shown. The same figures would apply if you were using a
file-finding utility to look for a file in 1,024 directories, each
containing 63 files (the subdirectory entry also consumes space).
<P>
If the directory entries were always sorted, the situation would greatly
improve. Assuming you had a quick means of getting to the file in the
sorted sequence, if it's the file you're looking for then you've found
its directory entry (and thus its starting cluster's address). If a
file greater in the sequence than the required file is found instead
then you immediately know that the file does not exist.
<P>
HPFS stores directory files in a balanced multi-branch tree structure
(B-tree) which is always sorted due to the way the branches are
assigned. This can lead to some extra HD activity, caused by adjustment
of the tree structure, when a new file is added or a file is renamed.
This is done to keep the tree balanced i.e. the total length of each
branch from the root to the leaves is the same. The extra work when
writing to the disk is hidden from the user by the use of "lazy writes"
(delayed write caching).
<P>
HPFS directory entries are stored in contiguous directory blocks of four
sectors i.e. 2 KB known as DIRBLKs. A lot of information is stored in
each variable-length (unlike FAT) file entry in a DIRBLK structure,
namely:
<UL>
<LI>The length of the entry;
<LI>File attributes;
<LI>A pointer to the HPFS structure (FNODE; usually just before the
first sector of a file) that describes the sector disposition of the
file;
<LI>Three different date/time stamps (Created, Last Accessed, Last
Modified);
<LI>Usage count. Although mentioned in the 1989 document, this does not
appear to have been implemented;
<LI>The length of the name (up to 254 characters);
<LI>A B-tree pointer to the next level of the tree structure if there
are any further levels. The pointer will be to another directory
block if the directory entries are too numerous to fit in one 2 KB
block;
</UL>
<P>
At the end of the sector there is extra ("flex") space available for
special purposes.
<P>
If the average size of the filenames is 10-13 characters, then a
directory block can store 44 of them (11 entries/sector). A two-level
B-tree arrangement can store 1,980 entries (1 * 44-entry directory root
block + 44 directory leaf blocks * 44 entries/block) while a three-level
structure could accommodate 87,164 files (the number of files in the
two-level tree + 1,936 third-level directory leaf blocks * 44
entries/block). So the 64 K of directory entries in our example can be
searched in a maximum of 3 "hits" (disk accesses). The term "maximum"
was used because it depends on what level the filename in question is
stored in the B-tree structure and what's in the disk cache.
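<P>
To make the arithmetic above concrete, here is a minimal REXX sketch,
assuming the article's figure of 44 entries per DIRBLK, that totals
the capacity of an n-level directory B-tree:
<PRE>
/* Capacity of an n-level HPFS directory B-tree,
   assuming 44 entries per DIRBLK                */
PARSE ARG levels .
IF levels = '' THEN levels = 3
total = 0
DO i = 1 TO levels
  total = total + 44 ** i   /* each level multiplies the blocks by 44 */
END
SAY levels'-level tree holds up to' total 'entries'
</PRE>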
<P>
Adding files to a directory containing many files (say 500+) under FAT
becomes an exasperating affair. I've often experienced this because a
DOS program we've installed on hundreds of our customer's machines has
648 files in a sub-sub-subdirectory. Watching the archive unpack on a
machine without disk caching is bad news and it still slows down
noticeably on machines with large SMARTDRIVE caches.
<P>
Figure 3 shows a simple REXX program you can create to investigate this
phenomenon while Figure 4 tables some results. The program creates a
large number of zero-length files in a directory. Perform this test in
a subdirectory to overcome FAT's restriction on a maximum of 512 entries
in the root directory. Reformatting and rebooting were performed before
each test to ensure consistent conditions. With both FAT and HPFS, a
1,536 KB lazy-writing cache with a maximum cacheable read/write size of
8 KB was used. Note 1: with HPFS, a "zero-length" file consumes
diskspace because there is always a FNODE sector associated with a
file/directory, regardless of the file's contents. So 1,000 empty files
consume 500 KB of space. Note 2: there is a timing slop of about 0.1
seconds due to the 55 msec timer tick uncertainty affecting both the
start time and stop time values.
<PRE>
/* Create or open a large number of empty files in a directory */
CALL Time 'R'                         /* Reset the elapsed-time clock */
DO x = 1 TO 1000
  CALL STREAM 'file'||x, 'c', 'open'  /* Creates the file if it does not exist */
  CALL STREAM 'file'||x, 'c', 'close'
END
SAY Time('E')                         /* Report elapsed time in seconds */
</PRE>
<FONT SIZE=2>
Fig 3: A REXX program to assess the directory searching and file
creation speeds of FAT and HPFS.
</FONT>
<PRE>
             Number of Files in a Directory
           125    250    500   1000   2000    4000   4001->4100
FAT        1.7    3.4    8.0   23.4   99.4   468.4      26.6
FAT (LW)   0.7    1.7    5.1   17.9   89.6   447.3      26.1
HPFS       7.4   14.7   30.7   62.9  129.0   262.6       7.5
HPFS (LW)  0.5    1.0    2.2    4.5    9.0    18.3       0.5
</PRE>
<FONT SIZE=2>
Fig 4: Timing results of the program in Figure 3. The beneficial effect
of lazy writing on performance is clearly demonstrated. Tests were
performed in an initially empty subdirectory except for the last one
which adds 100 new files to a subdirectory already containing 4,000
files.
</FONT>
<P>
To investigate further, the full data set was plotted on a graph with
logarithmic axes. Examine Figure 5. As you can see, HPFS' performance
is reasonably linear (in y = a*x^b + c, b was actually 1.1) while FAT's
performance appears to follow a third-order polynomial (y = a*x^3 +
b*x^2 + c*x + d). It is apparent that FAT's write caching becomes less
effective when many files are in a directory presumably because much
time is being spent sifting through the FAT in memory. (Disk access was
only occurring briefly about once a second based on the flashing of the
HD light). HPFS' performance was dramatically improved in this test by
the use of write caching. Again, disk access was about once a second
(due to CACHE's /MAXAGE:1000 parameter). While, typically, most disk
access will involve reading rather than writing, this graph shows how
effective lazy writing is at hiding the extra work from the user. It is
also apparent that HPFS handles large numbers of files well. We now
turn to examining how this improvement is achieved.
<P>
<A HREF="fig5.gif">
<IMG WIDTH=100 HEIGHT=57 SRC="fig5_small.gif"></A>
<P>
<FONT SIZE=2>
Fig. 5: Log-log graph comparing file system performance creating test
files in a subdirectory. Extra data points shown. Number of files was
increased using a cube-root-of-2 multiple. (Click for large version.)
</FONT>
<P>
<H2>Directory Location and Fragmentation</H2>
<P>
Subdirectories on a FAT disk are usually splattered all around it.
Similarly, entries in a subdirectory may not all be in contiguous
sectors on the disk. Searching a FAT system's directory structure can
involve a large amount of HD seeking back and forth, i.e. more time.
Sure, you can use a defragger option to move all the directories to the
front of the disk, but this usually takes a lot of time to reshuffle
everything and the next time you create a new subdirectory or add files
to an existing subdirectory there will be no free space up the front so
directory separation and fragmentation will occur again.
<P>
HPFS takes a much better approach. On typical partitions (i.e. not
very small ones) a directory band, containing many DIRBLKs, is placed at
or near the seek centre (half the maximum cylinder number). On a 100 MB
test partition the directory band starts at Cyl 48 (counting from 0) of
a volume that spans 100 cylinders. Here 1,980 contiguous Directory
sectors (just under 1 MB) were situated. Assuming 11 entries per
Directory sector (44 entries per DIRBLK), this means that the first
21,780 directory entries will be next to each other. So if a blind file
search needs to be performed this can be done with just 1 or 2 long disk
reads (assuming &lt;20,000 files and 1-2 MB disk cache). The maximum
size of the contiguous directory band appears to be 8,000 KB for about
176,000 entries with 13-character names. Once the directory band is
completely full new Directory sectors are scattered throughout the
partition but still in four-sector DIRBLKs.
<P>
Another important aspect of HPFS' directory band is its location. By
being situated near the seek centre rather than at the very beginning
(as in FAT), the average distance that the heads must traverse, when
moving between files and directories, is halved. Of course, with lazy
writing, traversals to frequently update a directory entry while writing
to a temporary file, would be much reduced anyway.
<P>
<H2>File Location and Fragmentation</H2>
<P>
HPFS expends a lot of effort to keep a file either in one piece if
possible or otherwise within a minimum number of pieces and close
together on the disk so it can be retrieved in the minimum number of
reads (remembering also that cache read-ahead can take in more than one
nearby piece in the same read). Also, the seek distance, and hence time
required to access extra pieces, is kept to an absolute minimum. The
main design philosophy of HPFS is that mechanical head movement is a
very time-consuming operation in CPU terms. So it is worthwhile doing
more work looking for a good spot on the disk to place the file. There
are many aspects to this and I'm sure there are plenty of nuances of
which I'm ignorant.
<P>
Files are stored in 8 MB contiguous runs of sectors known as data bands.
Each data band has a four-sector (2 KB) freespace bitmap situated at
either the band's beginning or end. Consecutive data bands have
tail-to-head placement of the freespace bitmaps so that maximum
contiguous filespace is 16 MB (actually 16,380 KB due to the presence of
the bitmaps within the adjoining band). See Figure 6.
<P>
<IMG WIDTH=403 HEIGHT=213 SRC="fig6.gif">
<P>
<FONT SIZE=2>
Fig. 6: The basic data layout of an HPFS volume
</FONT>
<P>
Near the start of the partition there is a list of the sectors where
each of the freespace bitmaps commences. I'm sure that this small list
would be kept loaded into memory for performance reasons. Having two
small back-to-back bitmaps adjoining a combined 16 MB data band is
advantageous when HPFS is looking for the size of each freespace region
within bands, prior to allocating a large file. But it does mean that a
fair number of seeks to different bitmaps might need to be performed on
a well-filled disk, in search of a contiguous space. Or perhaps these
bitmaps are also kept memory resident if the disk is not too big.
<P>
A 2 GB file would be split into approximately 128 chunks of 16 MB, but
these chunks are right after each other (allowing for the presence of
the intervening 4 KB of back-to-back freespace bitmaps). So to refer to
this file as "fragmented", while technically correct, would be
misleading.
<P>
As mentioned earlier, every file has an associated FNODE, usually right
before the start of the file. The pieces a file is stored in are
referred to as extents. A "zero-length" file has 0 extents; a
contiguous file has 1 extent; a file of 2-8 extents is "nearly"
contiguous (the extents should be close together).
<P>
An FNODE sector contains:
<UL>
<LI>The real filename length;
<LI>The first 15 characters of the filename;
<LI>Pointer to the directory LSN that contains this file;
<LI>EAs (Extended Attributes) are completely stored within the FNODE
structure if the total of the EAs is 145 bytes or less;
<LI>0-8 contiguous sector runs (extents), organised as eight LSN
run-starting-points (dword), run lengths (dword) and offsets into
the file (dword).
</UL>
<P>
A run can be up to 16 MB (back-to-back data bands) in size. If the file
is too big or more fragmented than can be described in 8 extents, then
an ALNODE (allocation block) is pointed to from the FNODE. In this case
the FNODE structure changes so that it now contains up to 12 ALNODE
pointers within the FNODE and each ALNODE can then point to either 40
direct sector runs (extents) or to 60 further ALNODEs, and each of these
lower-level ALNODEs could point to either... and so on.
<P>
If ALNODEs are involved then a modified balanced tree structure called a
B+tree is used with the file's FNODE forming the root of the structure.
So only a two-level B+tree would be required to completely describe a 2
GB (or smaller) file if it consists of less than 480 runs (12 ALNODEs *
40 direct runs described in each ALNODE). Otherwise a 3-level structure
would have no problems since it can handle up to 28,800 runs (12 ALNODEs
* 60 further ALNODEs * 40 direct runs). It's difficult to imagine a
situation where a four or higher level B+tree would ever be needed.
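<P>
Following the fan-out figures above (8 runs in the FNODE, 12 ALNODE
pointers, and 40 runs or 60 child ALNODEs per ALNODE), a minimal
sketch to work out the tree depth a given extent count demands:
<PRE>
/* Levels of HPFS allocation B+tree needed for a given extent count */
PARSE ARG runs .
IF runs <= 8 THEN
  SAY runs 'run(s): no ALNODEs needed, the FNODE alone suffices'
ELSE DO
  capacity = 12 * 40            /* two levels: 12 ALNODEs * 40 runs */
  level = 2
  DO WHILE runs > capacity
    capacity = capacity * 60    /* one more layer of 60-way ALNODEs */
    level = level + 1
  END
  SAY runs 'runs: a' level'-level B+tree is required'
END
</PRE>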
<P>
Consider how much disk activity would be required to work out the layout
of a 2 GB file under FAT and under HPFS. With FAT the full 128 KB of
the FAT must be read to determine the file's layout. If this layout can
be kept in the cache during the file access then fine. Otherwise the
FAT would need to be reread one or more times (probably starting from
the beginning on each reread). With HPFS, up to 361 sector reads, in a
three-level B+tree structure, and possibly up to just 13 sector reads,
in a two-level structure, would provide the information. The HPFS
figures are maximums and the actual sector-read figure would most
probably be much lower since HPFS was trying hard to reduce the number
of runs when the file was written. Also if the ALNODEs are near each
other then read-ahead would reduce the actual hits. Furthermore, OS/2
will keep the file's allocation information resident in memory while the
file is open, so no rereads would be needed.
<P>
If you've ever looked at the layout of files on a HPFS partition, you
may have been shocked to see the large gaps in the disk usage. This is
FAT-coloured thinking. There are good reasons not to use the first
available spot next to an existing file, particularly in a multitasking
environment where more than one write operation can be occurring
concurrently. HPFS uses three strategies here that I'm aware of.
First, the destination of write operations involving new files will tend
not to be near (preferably in a different band from) where an existing
file is also being updated. Otherwise, fragmentation would be highly
likely to occur.
<P>
Second, 4 KB of extra space is allocated by the file system to the end
of a file when it is created. Again the reason is to reduce the
likelihood of fragmentation from other concurrent writing tasks.
If not utilised, this space is recovered afterwards. To test this
assertion, create the REXX cmdfile shown in Figure 7 and run it on an
empty HPFS partition. (You can also do this on a partition with files
in it but it is easier on an empty one.) Run it and when the "Press any
key" message appears start up another OS/2 session and run CHKDSK (no
switches) on the partition under examination. CHKDSK will get confused
about the space allotted to the file open in the other session and will
say it is correcting an allocation error (which it really isn't doing
because you did not use the /F switch). Ignore this and notice that "4
kilobytes are in 1 user files". Switch back to the other session and
press Enter to close the file. Repeat and again run CHKDSK in the other
session. Notice this time that no extra space is allocated since the
file is being reopened rather than being created.
<PRE>
/* Test to check the space preallocated to an open file */
CALL STREAM 'zerofile', 'c', 'open'   /* Creates the file if it does not exist */
'@pause'                              /* Wait here; run CHKDSK in another session */
CALL STREAM 'zerofile', 'c', 'close'
</PRE>
<FONT SIZE=2>
Fig. 7: A simple REXX program to demonstrate how HPFS allocates 4 KB of
diskspace to a new file.
</FONT>
<P>
Third, if a program has been written to report the likely filesize to
OS/2, or if you are copying an existing file (i.e. the final filesize
is known) then HPFS will expend a great deal of effort to find a free
space big enough to accommodate the file in one extent. If that is not
possible then it looks for two free spaces half the size of the file and
so on. Again this can result in two files in a directory not being next
to each other on the disk.
<P>
Since DOS and Windows programs are not written to request preallocation
space, they tend to be more likely candidates for
fragmentation than properly written OS/2 programs. So, for example,
using a DOS comms program to download a large file will often result in
a fragmented file. Compared with FAT, though, fragmentation on heavily
used HPFS volumes is very low, usually less than 1%. We'll consider
fragmentation levels in more depth in Part 3.
<P>
<H2>Other Matters</H2>
<P>
It has also been written that the HPFS cache is smart enough to adjust
the value of its sector read-ahead for each opened file based on the
file's usage history or its type (Ray Duncan, 1989). It is claimed that
EXE files and files that typically have been fully read in the past are
given big read-aheads when next loaded. This is a fascinating concept
but unfortunately it has not been implemented.
<P>
Surprisingly, like other device drivers, HPFS is still 16-bit code. I
think this is one of the few remaining areas of 16-bit code in Warp. I
believe IBM's argument is that 32-bit code here would not help
performance much as mechanical factors are the ones imposing the limits,
at least in typical single-user scenarios.
<P>
HPFS is run as a ring 3 task in the 80x86 processor protection mechanism
i.e. at the application level. HPFS386 is a 32-bit version of HPFS
that comes only with IBM LAN SERVER Advanced Version. HPFS386 runs in
ring 0, i.e. at kernel level. This ensures the highest file system
performance in demanding network situations. It can also provide much
bigger caches than standard HPFS which is limited to 2 MB. There is a
chance that this version will appear in a later release of Warp.
<P>
OS/2 v2.x onwards also boosts the performance of FAT. This improvement,
called "Super FAT", is a combination of 32-bit executable code and the
mirroring of the FAT and directory paths in RAM. This requires a fair
bit of memory. Also, Super FAT speeds the search for free space by
keeping an in-memory bitmap of the used sectors in the FAT. This does
help the performance but I think the results in Figure 4, which were
performed using the Super FAT system, still highlight FAT's
architectural weaknesses.
<P>
You can easily tell whether a partition is formatted under HPFS or FAT. Just
run DIR in the root directory. If "." and ".." directory entries are shown
then HPFS is used [Unless the HPFS partition was formatted under Warp 4 -- Ed].
<P>
<H2>Conclusion</H2>
<P>
HPFS does require 300-400 KB of memory to implement, so it's only
suitable for OS/2 v2.1 systems with at least 12 MB or Warp systems with
at least 8 MB. For partitions of 100 MB+ it offers definite technical
advantages over FAT. By now you should have developed an understanding
of how these improvements are achieved.
<P>
In the next installment, we look at a shareware program to visually
inspect the layout of a HPFS partition and a REXX program to dump the
contents of a disk sector by specifying either decimal LSN, hexadecimal
LSN, dword byte-order-reversed hexadecimal LSN (what you see when you
look at a dword pointer in a hex dump) or Cyl/Hd/Sec coordinates. Other
REXX programs will convert the data stored in the SuperBlock and the
SpareBlock sectors into intelligible values. You should find it quite
informative.

File diff suppressed because it is too large
<H1>Inside the High Performance File System</H1>
<H2>Part 3: Fragmentation, Diskspace Bitmaps and Code Pages</H2>
Written by Dan Bridges
<H2>Introduction</H2>
<P>
This article originally appeared in the May 1996 issue of Significant
Bits, the monthly magazine of the Brisbug PC User Group Inc.
<P>
This month we look at how HPFS knows which sectors are occupied and which ones
are free. We examine the amount of file fragmentation on five HPFS volumes and
also check out the fragmentation of free space. A program will be presented to
show free runs and some other details. Finally, we'll briefly discuss Code
Pages and look at a program to display their contents.
<P>
<H2>How Sectors are Mapped on a HPFS Volume</H2>
<P>
The sector usage on a HPFS partition is mapped in data band bitmap blocks.
These blocks are 2 KB in size (four sectors) and are usually situated at either
the beginning or end of a data band. A data band is almost 8 MB. (Actually
8,190 KB since 2 KB is needed for its bitmap.) See Figure 1. The state of each
bit in the block indicates whether or not a sector (HPFS' allocation unit) is
occupied. If a bit is set (1) then its corresponding sector is free. If the
bit is not set (0) then the sector is occupied. Structures situated within the
confines of a data band, such as Code Page Info &amp; Data sectors, Hotfix
sectors, the Root Directory DIRBLK etc., are all marked as fully occupied within that
band's usage bitmap.
<P>
<IMG WIDTH=435 HEIGHT=257 SRC="fig1.gif">
<P>
<FONT SIZE=2>
Figure 1: The basic data layout of a HPFS volume.
</FONT>
<P>
Since each bit maps a sector, a byte maps eight sectors and the complete 2 KB
block maps the 16,384 sectors (including the bitmap block itself) in a 8 MB
band. And since two blocks can face each other, we arrive at the maximum
possible extent (fragment) size of 16,380 KB. Examine Figure 2 now to see
examples of file and freespace mapping.
<P>
<IMG WIDTH=429 HEIGHT=302 SRC="fig2.gif">
<P>
<FONT SIZE=2>
Figure 2: The correspondence of the first five bytes in a data band's usage
bitmap to the first 40 sectors in the band.
</FONT>
<P>
In this example we see 23 occupied sectors ("u") and 4 unoccupied areas (".")
which we will refer to as "freeruns" [of sectors]. At one extreme, the 23
sectors might belong to one file (here in four extents) while at the other
extreme we might have the FNODEs of 23 "zero-length" files. (Every file and
directory entry on a HPFS volume must have an FNODE sector.)
<P>
The advantages of the bitmap approach are twofold. First, the small allocation
unit size on a HPFS volume means greatly reduced allocation unit wastage
compared to large FAT partitions. Second, the compact mapping structure makes
it feasible for HPFS to quickly search a data band for enough free space to slot
in a file of known size, in one piece if possible. For example, as just
mentioned HPFS can map 32,760 allocation units with just 4 KB of bitmaps whereas
a 16-bit FAT structure requires 64 KB (per FAT copy) to map 32,768 allocation
units.
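<P>
As a small illustration of how cheap these bitmaps are to scan, here
is a minimal REXX sketch that counts the free sectors mapped by one
2 KB bitmap block. The bitmap contents are faked here; a real program
would read the four sectors off the disk:
<PRE>
/* Count free sectors in one data band's 2 KB usage bitmap */
bitmap = COPIES('FF'x, 2048)     /* fake block: whole band free  */
free = 0
DO i = 1 TO LENGTH(bitmap)
  bits = X2B(C2X(SUBSTR(bitmap, i, 1)))  /* 8 '0'/'1' characters */
  free = free + LENGTH(SPACE(TRANSLATE(bits, ' ', '0'), 0))
END
SAY 'Free sectors mapped by this block:' free   /* 16384 here */
</PRE>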
<P>
<H2>A Fragmentation Analysis</H2>
<P>
In this section we'll examine the level of fragmentation on the five HPFS
partitions of my first HD. Look at Figure 3. Notes:
<P>
1. A time-since-last-defrag figure of "Never" means that I've never run a
defragger across this partition since upgrading to OS/2 Warp 118 days ago. This
value is stored in the SuperBlock (LSN 16) and was determined by using the
ShowSuperSpare REXX program featured in Part 2.
<P>
2. The fragmentation levels were reported by the wondrous FST (freeware) with
"FST -n check -f C:" while the names of the fragmented files and their sizes
came from the GammaTech Utilities (commercial) "HPFSOPT C: -u -d -o1 -l
logfile". You can also use the Graham Utilities (commercial) "HPFS-EXT C: -s".
<P>
3. The high number of files with 0 data extents on C: is due to the presence of
the WPS folders on this drive. Each of these has "zero" bytes in the main file
but they usually have bytes in EAs.
<P>
4. Files with 0 or 1 extents are considered to be fully contiguous, so I've placed
them in one grouping.
<P>
5. Files with 2-8 extents are considered to be "nearly contiguous" since the
fragments will usually be placed close together on the disk and also because a
list of the location and length of up to 8 extents can be kept in a file's FNODE
sector. This list will be kept memory resident while the file is open. Note 1:
the extents themselves can not be kept memory resident since, theoretically,
they could be up to 8*16,380 KB in size. But no non-data disk reads, after the
initial read of the FNODE, would be required to work with the file. Note 2:
under some circumstances, the 8 extents, if small enough, could be kept memory
resident in the sense that they could be held in HPFS' cache. We will consider
FNODEs in detail in a later installment.
<P>
6. Files with more than 8 extents have too many fragments to be listed in their
FNODEs. Instead a B+tree allocation sector structure (an ALSEC) is used to map
the extents. The sector mappings are small enough to keep memory resident while
the file is open. ALSECs will be covered in a later installment.
<P>
7. EAs are usually not fragmented since, in the current implementation of OS/2,
the total EA size associated with any one file is only 64 KB. If a file has EAs
in 0 extents then the EA information is stored completely within the FNODE
sector. (There is space in the FNODE for up to 145 bytes of "internal" EAs.)
In all other cases on my system they are currently stored in single, external runs
of sectors. EAs will be covered in later installments.
<P>
<IMG WIDTH=443 HEIGHT=490 SRC="fig3.gif">
<P>
<FONT SIZE=2>
Figure 3: Fragmentation analysis of five HPFS partitions.
</FONT>
<P>
We now turn to the topic of what circumstances are leading to file fragmentation
on these partitions.
<P>
C: - The OS/2 system partition. I've run out of space on this drive on
occasions. Activity here occurs through the running of Fixpacks (FP 16 and then
FP 17 were run), INI maintenance utilities and driver upgrades. There is really
nothing of concern here. Most HPFS defraggers suggest not trying to defrag
files that have less than 2 or 3 extents since you run the risk of fragmenting
the free space. We will return to this topic shortly.
<P>
D: - My main work area and the location of communications files. I use the DOS
comms package TELEMATE because I've always liked its features (although OS/2 has
to work hard to handle its modem access during a file transfer - OS/2 comms
programs, in general, are much less demanding of the CPU's attention). The
other major comms package I use is OS/2 BinkleyTerm v2.60 feeding OS/2 Squish
message databases. The fragmented files consist mainly of files downloaded by
TELEMATE (DOS comms programs do not inform HPFS, ahead of time, of how much
space the downloaded file will occupy) and Squish databases (*.SQD). The drive
was defragged 53 days ago at which time no special effort was made to reduce
file fragmentation below 2-3 extents, accounting for the presence of 245 files
with two extents. This really is an insignificant amount regardless of what the
4% figure may lead you to believe.
<P>
The most fragmented file on this partition is a 150 KB BinkleyTerm logfile with
30 extents. The main reason I can see for fragmentation in this case is that
the file is frequently being updated with information while file transfers are
in progress. The Squish databases are also prone to fragmentation. Out of a
total of 25 database files there were 8, averaging 500 KB each, with an average
of 15 extents.
<P>
E: - The fragmentation here was insignificant apart from a single 2.8 MB
executable Windows program that has had a DOS patch program run over it,
resulting in 38 fragments. The 2-extent files were mainly data files that are
produced by this same Windows package (being run under WIN-OS2).
<P>
F: - Almost no fragmentation since this partition is reserved for DOS programs
and I don't use them much.
<P>
G: - My second major work partition. Fragmentation is low and unlikely to go
much lower since 2 extents is considered below the point of defragger
involvement.
<P>
The conclusion to be drawn from the above is that, if you don't get too hot
under the collar about some files having 2 or 3 extents then there will
generally be little need to worry about fragmentation under HPFS. Only certain
types of files (some comms/DOS/Windows) will be candidates. And keeping
partitions less than 80% full should help reduce general fragmentation as well.
<P>
<H2>Defragmenting Files</H2>
<P>
Since fragmentation is a relatively minor concern under HPFS there is not much
of an argument for purchasing OS/2 utilities based mainly on their ability to
defragment HPFS drives, especially since it's not hard to defragment files
yourself. You see, providing there is enough contiguous freespace on a volume,
the mere act of copying the files to a temporary directory, deleting the
originals and then moving the files back will usually eliminate, or at least
reduce, fragmentation since HPFS, knowing the original filesize, will look for a
suitably sized freespace. The success of this technique is demonstrated in
Figure 4 where 25 Squish database files (*.SQD) totalling 5.7 MB were shuffled
about on D:. Note: don't use the MOVE command to initially transfer the files
to the temp directory since this will just alter the directory entry rather than
actually rewriting the files.
<P>
<IMG WIDTH=159 HEIGHT=232 SRC="fig4.gif">
<P>
<FONT SIZE=2>
Figure 4: Number of extents in 25 SQD files before and after the defrag process
described in the text.
</FONT>
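<P>
As a rough sketch of this manual shuffle (the directory names here are
hypothetical, and you should first confirm the volume has enough contiguous
freespace to hold the copies), the whole operation can be scripted:
<PRE>
/* Sketch: manually defrag the Squish databases on D: */
'md d:\defrag'                         /* Scratch dir on same volume  */
'xcopy d:\bink\*.sqd d:\defrag\ &gt; nul' /* Copying = full rewrite      */
'del d:\bink\*.sqd'                    /* Release the fragmented runs */
'move d:\defrag\*.sqd d:\bink &gt; nul'   /* Same volume, so only the    */
                                       /* directory entries change    */
</PRE>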
<P>
I've used the GU's HPFS-EXT to report these figures. This is freely available
in the GULITE demo package. Note: the fully functional HPFSDFRG is also in
this package but I wanted to show that it's not that hard to do this by hand.
HPFSDFRG does much the same as I did except that you can specify the
optimisation threshold (minimum number of extents before a file becomes a
candidate) and it will retry the copying operation up to ten times if there are
more extents after the operation than before it (due to heavily fragmented
freespace).
<P>
<H2>The Fragmentation of Freespace</H2>
<P>
Another significant aspect of HPFS' fragmentation resistance is how well the FS
keeps disk freespace in big, contiguous chunks. If the current files on a
partition are relatively fragmentation free but the remaining freespace is
arranged in lots of small chunks then there is a good chance that new files will
be fragmented. You can check this with "FST -n info -f C:". This produces a
table that counts the number of freespace extents that are 1, 2-3, 4-7, 8-15,
... 16384-32767 sectors long. In my opinion, though, it is more important to
consider the product of the actual extent sizes and their frequencies, since the
presence of numerous single-sector free extents is not important if there are
still a number of large spaces available.
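<P>
To illustrate the point, here is a minimal sketch (the input filename is
hypothetical) that weights each size bucket by total sectors rather than by
a simple count of extents:
<PRE>
/* Sketch: size-weighted freerun distribution. RUNS.TXT is */
/* assumed to hold one freerun size, in sectors, per line. */
total. = 0                         /* All buckets start at 0 */
DO WHILE Lines('runs.txt') &gt; 0
  size = Linein('runs.txt')
  b = 0                            /* Power-of-two bucket:   */
  DO WHILE 2**(b+1) &lt;= size        /* 1, 2-3, 4-7, 8-15, ... */
    b = b + 1
  END
  total.b = total.b + size         /* Weight by run size     */
END
DO b = 0 TO 14
  SAY Right(2**b,6)"-"Right(2**(b+1)-1,6) "secs:" total.b "secs total"
END b
</PRE>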
<P>
Figure 5 shows the output of the REXX program ShowFreeruns.cmd. The partition
of 100 MB is almost empty. The display shows the location of the 2 KB block
that holds the list of the starting LSNs of each bitmap block (this figure comes
from the dword at offset 18h in the SuperBlock), the location of each bitmap
block on the left and the sector size and location of freespace on the right.
As you see, this partition has 13 data bands, six pairs of which face each other. A
version of ShowFreeruns.cmd that only outputs the run size was used to generate
a list of figures. This list was loaded into a spreadsheet, sorted and a
frequency distribution performed. See Figure 6. You can see that C: has no
large areas remaining, D: has the majority of its freespace in the 4 MB &lt; 8 MB
range and that E:, F: and G: have kept large majorities of their freespace in
very big runs. Overall, this is quite good performance.
<PRE>
Inspecting drive O:
List of Bmp Sectors: 0x00018FF0 (102384)
Space-Usage Bitmap Blocks:
Freespace Runs:
0x00000014-00000017 (20-23)
0x00007FFC-00007FFF (32764-32767)
130-32763 (#1:32634)
0x00008000-00008003 (32768-32771)
0x0000FFFC-0000FFFF (65532-65535)
32772-65531 (#2:32760)
0x00010000-00010003 (65536-65539)
0x00017FFC-00017FFF (98300-98303)
65540-81919 (#3:16380)
81926-98291 (#4:16366)
0x00018000-00018003 (98304-98307)
0x0001FFFC-0001FFFF (131068-131071)
100369-102383 (#5:2015)
102400-131067 (#6:28668)
0x00020000-00020003 (131072-131075)
0x00027FFC-00027FFF (163836-163839)
131076-163835 (#7:32760)
0x00028000-00028003 (163840-163843)
0x0002FFFC-0002FFFF (196604-196607)
163844-196603 (#8:32760)
0x00030000-00030003 (196608-196611)
196612-204767 (#9:8156)
</PRE>
<FONT SIZE=2>
Figure 5: Output from the ShowFreeruns.cmd REXX program.
</FONT>
<P>
<IMG WIDTH=429 HEIGHT=378 SRC="fig6_3.gif">
<P>
<FONT SIZE=2>
Figure 6: Freespace analysis on five HPFS partitions.
</FONT>
<P>
<H2>The ShowFreeruns Program</H2>
<P>
Like other programs in this series, ShowFreeruns.cmd (see Figure 7) uses
SECTOR.DLL to read a sector off a logical drive. I was motivated to design this
program after seeing the output of the GU's "HPFSINFO C: -F". On a one-third
full 1.2 GB partition, the program presented here takes 17 secs compared to
HPFSINFO's time of 26 secs. HPFSINFO also shows the CHS (Cyl/Hd/Sec)
coordinates of each run. I was not interested in these but instead display the
freerun's size. HPFSINFO also displays the meaning of what's in the SuperBlock
and the SpareBlock. If you want to do this, you can include the code from
ShowSuperSpare.cmd from Part 2 and it will only add an extra 0.5 secs to the
time. The performance then, for an interpreted program (REXX), is quite good and
was achieved primarily through a speed-up technique to be discussed shortly.
Moreover, HPFSINFO consistently overstates the end of each freerun by 1 and it
sometimes does not show the last run (e.g. on C: it states that there are 366
freeruns but only shows 365 of them). This last bug appears to be caused by the
last freerun continuing to the end of the partition. My design accounts for
this situation.
<PRE>
/* Shows bitmap locations and free space runs */
ARG drive . /* First parm should always be drive */
IF drive = '' THEN CALL HELP
parmList = "? /? /H HELP A: B:"
IF WordPos(drive, parmList) \= 0 THEN CALL Help
/* Register external DLL functions */
CALL RxFuncAdd 'ReadSect','Sector','ReadSect'
CALL RxFuncAdd 'RxDate','RexxDate','RxDate'
/* Initialise Lookup Table*/
DO exponent = 0 TO 7
bitValue.exponent = D2C(2**exponent)
END exponent
secString = ReadSect(drive, 16) /*Read Superblk sec*/
freespaceBmpList = C2D(Reverse(Substr(secString,25,4)))
totalsecs = C2D(Reverse(Substr(secString,17,4)))
'@cls'
SAY
SAY "Inspecting drive" drive
SAY
/* Offset 25 (18h) in the SuperBlock: LSN of the bitmap-block list */
CALL ShowDword " List of Bitmap secs",25
startOfListBlk = 0
startOfBlk = 0
bmpListBlk = ""
bmpBlk = ""
getFacingBands = 0
runNumber = 0
byteOffset = 0
/* Read in 4 secs of the list of sec-usage bmp blks */
DO secWithinBlk = freespaceBmpList TO freespaceBmpList+3
temp = StartOfListBlk + secWithinBlk
bmpListBlk = bmpListBlk||ReadSect(drive, temp)
END secWithinBlk
SAY
SAY "Space-Usage Bitmap Blocks:"
SAY " Freespace Runs:"
/* Use dword pointers to bmps to read in 2KB bmp blks */
DO listOffset = 1 TO 2048 BY 4
startDecStr = C2D(Reverse(Substr(bmpListBlk,ListOffset,4)))
IF startDecStr = 0 THEN /* No more bmps listed */
DO
IF getFacingBands = 1 THEN
DO /* Last data band had no facing data band */
bmpSize = 2048
CALL DetermineFreeruns
LEAVE
END
LEAVE
END
/*Display a blank line when a new facing band occurs*/
IF (listOffset+7)//8 = 0 THEN SAY
CALL ShowBmpBlk listOffset
DO secWithinBlk = 0 TO 3
temp = StartOfBlk + secWithinBlk
bmpBlk = bmpBlk||ReadSect(drive, temp)
END secWithinBlk
getFacingBands = getFacingBands + 1
IF getFacingBands = 2 THEN /* Wait until you get both */
DO /* bmps for the facing data*/
bmpSize = 4096 /* bands since maximum extent*/
CALL DetermineFreeruns /* length is 16,380 KB */
byteOffset = byteOffset+4096
getFacingBands = 0
bmpBlk = ""
END
END listOffset
EXIT /**************EXECUTION ENDS HERE**************/
FourBytes2Hex: /* Given offset, return dword */
ARG startPos
rearranged = Reverse(Substr(secString,startPos,4))
RETURN C2X(rearranged)
ShowDword: /* Display dword and dec equivalent */
PARSE ARG label, offset
hexStr = FourBytes2Hex(offset)
SAY label": 0x"hexStr "("X2D(hexStr)")"
RETURN
ShowBmpBlk:
/* Show start-end of freespace runs in hex &amp; dec */
PARSE ARG offset
endDecStr = C2D(Reverse(Substr(bmpListBlk,offset,4)))+3
SAY " 0x"D2X(startDecStr,8)"-"D2X(endDecStr,8)
" ("startDecStr"-"endDecStr")"
startOfBlk = startDecStr
RETURN
DetermineFreeruns:
runStatus = 0
oldchar = ''
/* Check 128 secs at a time to speed up operation */
DO para = 1 to bmpSize BY 16
/* 16 bytes*8 secs/byte = 128 secs per para scanned */
char = Substr(bmpBlk,para,16)
IF char = 'FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF'x &amp;,
runstatus = 1 THEN ITERATE para
IF char = '00000000000000000000000000000000'x &amp;,
runstatus = 0 THEN ITERATE para
/* Part of paragraph has run start/end
so check a byte (8 secs) at a time. */
DO byte = para TO para + 15
char = Substr(bmpBlk,byte,1)
IF char &gt; '0'x THEN /* 1 or more free secs */
DO
IF char = 'FF'x THEN /* 8 unoccupied secs */
IF runStatus = 1 THEN /* Run is in progress */
NOP
ELSE /* Run starts on 8 sec boundary */
DO
startByte = byte + byteOffset
startBitPos = 0
runStatus = 1 /* Start run determination */
END
ELSE
CALL DetermineBit /* Partial usage of 8 secs */
END
ELSE
DO /* All 8 secs are used */
IF runStatus = 1 THEN
DO
endByte = byte + byteOffset
endBitPos = -1 /* Run ends with prior sec */
CALL ShowRun
END
END
END byte
END para
IF runStatus = 1 THEN /* Freespace at end of part. */
DO
endByte = 9999999999 /* Larger than # of secs in */
endBitPos = 0 /* max. possible part.(512GB) */
CALL ShowRun /* so ShowRun will set runEnd */
/* to last LSN in this part. */
END
RETURN
DetermineBit: /* Free/occupied usage within 8 sec blk */
DO bitPos = 0 TO 7
IF runStatus = 0 THEN
DO /* No run currently in progress */
IF BitAnd(char, bitValue.bitPos) &gt; '0'x THEN
DO /* sec is free */
startByte = byte + byteOffset
startBitPos = bitPos
runStatus = 1
END
END
ELSE
DO
IF BitAnd(char, bitValue.bitPos) = '0'x THEN
DO /* sec is used */
endByte = byte + byteOffset
/* When a run ends, the sec before the first
used one is the last sec in the freerun. */
endBitPos = bitPos - 1
CALL ShowRun
END
END
END bitPos
RETURN
ShowRun:
/* Display freerun start-end secs &amp; reset run status */
runNumber = runNumber + 1
runStart = (startByte - 1) * 8 + startBitPos
runEnd = (endByte - 1) * 8 + endBitPos
IF runEnd &gt; totalSecs THEN runEnd = TotalSecs - 1
IF runStart \= runEnd THEN /* More than 1 sec is free */
DO
run = runStart"-"runEnd
run = Left(run||Copies(" ",14),15)
SAY Copies(" ",40) run "(#"runNumber":"runEnd-RunStart+1")"
END
ELSE
DO
run = Left(runStart||Copies(" ",14),15)
SAY Copies(" ",40) run "(#"runNumber":1)"
END
runStatus = 0
RETURN
Help:
SAY
SAY "Purpose:"
SAY " ShowFreeruns displays the location of the
sec-usage bitmap blocks" /* Wrapped long line */
SAY " and the location and extent of free space runs."
SAY
SAY "Example:"
SAY " ShowFreeruns C:"
SAY
EXIT
</PRE>
<FONT SIZE=2>
Figure 7: The ShowFreeruns.cmd REXX program. Requires SECTOR.DLL.
</FONT>
<P>
Since a sector is mapped by a bit, the program often needs to check the status
of a bit within a bitmap's byte. This is done using the BITAND(string1,
string2) inbuilt function. In this design string 1 holds the byte to be
examined and string 2 holds a character that only has the corresponding bit set.
Rather than having to work out the character for string 2 each time BITAND() is
used, we instead precalculate the eight characters and then store them in the
BitValue. compound variable for later use.
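<P>
For instance, a minimal illustration of the lookup-table test (the byte
value here is arbitrary):
<PRE>
/* Each bitmap bit maps one sector; a set bit means "free" */
DO exponent = 0 TO 7
  bitValue.exponent = D2C(2**exponent) /* '01'x,'02'x...'80'x */
END exponent
byte = 'A5'x                           /* 1010 0101           */
bitPos = 5                             /* Test bit 5 ('20'x)  */
IF BitAnd(byte, bitValue.bitPos) &gt; '0'x THEN
  SAY 'Sector is free'
</PRE>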
<P>
The next step is to read in the SuperBlock and from it get the location of the
list of bitmap sectors and the total number of sectors. The latter value is
required so we know when we've reached the end of the partition.
<P>
We then read in the four sectors of the block holding the list of bitmaps. The
list consists of dwords that store the starting LSN of each bitmap block. 128
dwords can fit in each sector of the list so the four sectors of the list can
hold 512 bitmap block LSNs. Now a bitmap block maps 8 MB of diskspace so this
'lite' version is only good when dealing with a partition of less than 4 GB.
(Earlier works refer to the maximum partition size as 512 GB but in the recent
"Just Add OS/2 Warp" package, in its technical section, it is stated that the
maximum partition size is 64 GB.) I won't be able to check this aspect of the
design until I get a HD bigger than 4 GB and succumb to the mad urge to
partition it as one volume.
<P>
The end of the list is indicated by the first zero dword. The list of
the 100 MB partition shown in Figure 5 contains only 13 dwords since it has 13
data bands so, in a typical case, you should not expect to find much data stored
in this block.
<P>
A freerun can be bigger than a data band since pairs of bands face each other,
so we consider two bands at a time, unless we reach the end of the partition
without a facing band. Once we have a data region we call the DetermineFreeruns
procedure. Here we examine the two combined data bitmaps (unless it's a solo
band at the end). In the initial design I looked at each byte in the 4 KB
bitmap combination to see if it was either 00h (all eight sectors used) or
FFh (all eight sectors free). Typically, you will find lots of occupied or free
sectors together, so checking eight at a time speeds up the search. Only when
the byte was neither of these is a bit-level search required.
<P>
However, the speed of this version was poor, with the search through each byte of
the 322 KB of bitmaps for the 161 databands in the 1.2 GB partition taking a
total of 104 secs. The obvious solution was to extend the optimisation method
to a second, higher level by checking more bytes first to see if they were all
set or clear. I settled on 16 bytes which covers 128 sectors (64 KB) of
diskspace at a time and this resulted in the final time of 17 secs. Further
experiments with larger (64 byte) groups and also with third-level optimisation
did not show much improvement with my mix of partitions but your situation may
warrant further experimentation.
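<P>
The essence of the coarse check, as an isolated sketch (bmp is assumed to
hold a 4 KB bitmap combination and inRun the current run status, 1 or 0):
<PRE>
/* Sketch: skip 16 bytes (128 sectors) at a time while the */
/* chunk merely continues the current free or used run.    */
allFree = Copies('FF'x, 16)
allUsed = Copies('00'x, 16)
DO para = 1 TO Length(bmp) BY 16
  chunk = Substr(bmp, para, 16)
  IF chunk = allFree &amp; inRun = 1 THEN ITERATE para
  IF chunk = allUsed &amp; inRun = 0 THEN ITERATE para
  /* Otherwise drop to the byte- and bit-level scan here */
END para
</PRE>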
<P>
<H2>Code Pages</H2>
<P>
Different languages have different character sets. Code Pages (CPs) are used to
map a character code to the actual character for a given language. CP tables
reside in COUNTRY.SYS. They are also present on a HPFS volume and every
directory entry (DIRENT) includes a CP index value.
<P>
CPs are used to map character case (i.e. in a foreign character set the
relationship between lower and upper-case characters) and for collating
sequences used when sorting. As mentioned in Part 1, HPFS directories use a
B-tree structure which, as part of its operation, always stores file/directory
names in sorted order. Remember that HPFS is not case-sensitive (including when
sorting) but it preserves case.
<P>
The European-style languages (including English) have relatively straightforward
Single-Byte Character Sets (SBCS) i.e. one character is represented by one
byte. Asian character sets typically have many characters so they require two
bytes per character (Double-Byte Character Sets, DBCS).
<P>
The first 128 characters in all ASCII CPs are the same so the CP tables on the
disk only map ASCII 128-255.
<P>
The SpareBlock holds the LSN of the first CP Info sector. There is a header
followed by up to 31 16-byte CP Info Entries. There is provision for more than
one CP Info sector which could hold CP Info Entries 31-61 (counting from 0).
Why so many different CPs are catered for I have no idea since I've been unable
to have more than two loaded at a time. In Australia we typically use CP437
(standard PC) - Country 061 and CP850 (multilingual Latin-1) - Country 000. The
layout of a CP Info sector is shown in Figure 8.
<P>
<IMG WIDTH=431 HEIGHT=400 SRC="fig8.gif">
<P>
<FONT SIZE=2>
Figure 8: The layout of a Code Page Information Sector.
</FONT>
<P>
The CP Info Entry contains the LSN where this entry's CP mapping table is
stored. This sector is a CP Data Sector. As well as a header there is enough
space for up to three 128-byte CP maps per sector. Figure 9 shows the layout of
a CP Data Sector.
<P>
<IMG WIDTH=431 HEIGHT=450 SRC="fig9.gif">
<P>
<FONT SIZE=2>
Figure 9: The layout of a Code Page Data Sector.
</FONT>
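<P>
Using one of these 128-byte maps is just an indexed lookup. A minimal
illustration, where map is assumed to hold a 128-byte case table already
read from a CP Data Sector:
<PRE>
/* Sketch: find the uppercase equivalent of a high-order char */
c = 129                        /* e.g. 129 is u-umlaut in CP437 */
IF c &gt;= 128 THEN
  SAY c 'maps to' C2D(Substr(map, c - 128 + 1, 1)) /* 154 in CP437 */
</PRE>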
<P>
<H2>The CP.cmd Program</H2>
<P>
Figure 10 shows the display produced by the REXX CP.cmd program (Figure 11).
I've stopped it before it reached ASCII 255. Normally, the output will scroll
off the screen, so either pause it or send it to the printer. If the mapped
character has the same value as its ASCII value the word "same" is displayed
instead to reduce clutter.
<P>
<IMG WIDTH=430 HEIGHT=320 SRC="fig10.gif">
<P>
<FONT SIZE=2>
Figure 10: Partial output from the CP.cmd program. List continues on to ASCII
255.
</FONT>
<PRE>
/* Decodes CP info &amp; CP data sectors on a HPFS volume */
ARG drive . /* First parm should always be drive */
IF drive = '' | drive = "?" | drive = "HELP",
| drive = "A:" | drive = "B:" THEN CALL Help
CALL RxFuncAdd 'ReadSect','Sector','ReadSect' /* In SECTOR.DLL */
secString = ReadSect(drive,17) /* SpareBlock is LSN 17 */
'@cls'
SAY
SAY "Inspecting drive" drive
SAY
/* Offset 33 in Spareblock contains dword of CP info LSN */
cpInfoSec = C2D(Reverse(Substr(secString,33,4)))
secString = ReadSect(drive,cpInfoSec) /* Load CP info sec */
numOfCodePages = C2D(Reverse(Substr(secString,5,2)))
prevDataSec = ''
SAY "CODE PAGE INFORMATION (sector" cpInfoSec"):"
SAY "Signature Dword: 0x"FourChars2Hex(1)
SAY " CP# Ctry Code Code Page CP Data Sec Offset"
DO x = 0 TO numOfCodePages-1
hexCountry = TwoChars2Hex((16*x)+17)
decCountry = Right('00'X2D(hexCountry),3)
cp = TwoChars2Hex((16*x)+19)
codePage.x = X2D(cp)
hexSec = FourChars2Hex((16*x)+25)
decSec = X2D(hexSec)
cpDataSec = decSec
/* Since up to 3 CP tables can fit in 1 CP data sec,
only read in a new data sec when the need arises. */
IF cpDataSec \= prevDataSec THEN
DO
dataSecString = ReadSect(drive,cpDataSec)
prevDataSec = cpDataSec
END
offset = C2D(Reverse(Substr(dataSecString,(2*x)+21,2)))
start = offset + 1
SAY " " x " 0x"hexCountry "("decCountry") 0x"cp "("X2D(cp)") 0x"
hexSec "("decSec") 0x"D2X(offset) "("offset")"
/* Wrapped long line */
/* Store table contents of each CP in an array */
DO y = 128 TO 255
char = Substr(dataSecString,start+6+y-128,1) /* 6-byte entry header */
IF C2D(char) \= y THEN
array.x.y = Format(C2D(char),4) "("char")"
ELSE
array.x.y = " same "
END y
END x
/* Work out title line based on number of CPs */
titleLine = " ASCII "
DO x = 0 TO numOfCodePages-1
titleLine = titleLine " CP" codePage.x
END x
SAY
SAY titleLine
/* Display each table entry based on number of CPs */
DO y = 128 TO 255
dispLine = ''
DO x = 0 TO numOfCodePages-1
dispLine = dispLine" "array.x.y
END x
SAY "" y "("D2C(y)"):" dispLine
END y
EXIT /****************EXECUTION ENDS HERE****************/
FourChars2Hex:
ARG offset
RETURN C2X(Reverse(Substr(secString,offset,4)))
TwoChars2Hex:
ARG offset
RETURN C2X(Reverse(Substr(secString,offset,2)))
Help:
SAY "Purpose:"
SAY " CP decodes the CodePage Directory sector &amp;"
SAY " the CodePage sector on a HPFS volume"
SAY
SAY "Example:"
SAY " CP C:"
EXIT
</PRE>
<FONT SIZE=2>
Figure 11: The CP.cmd REXX program. Requires SECTOR.DLL.
</FONT>
<P>
While REXX does not support arrays it does have compound variables and I've used
a CV called "array" to store the contents of each CP's mapping table. The
design only deals with the first 31 CP Info entries (that should be more than
enough anyway) and accommodates additional CPs by adding new columns to the
display.
<P>
Armed with this printout you can experiment with different collating sequences
when switching CPs. You can check out your current CP by typing "CHCP" and then
switch to a different CP by issuing, say, "CHCP 850". I used "REM &gt;
File[Alt-nnn]" to create zero-length files, with one or more high-order ASCII
characters in their filenames, as test fodder.
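<P>
If you'd rather script the creation of such test files, a small sketch (the
filenames and character range are arbitrary):
<PRE>
/* Sketch: create zero-length files with high-order chars */
/* in their names to observe the collating order.         */
DO c = 129 TO 135
  fn = 'FILE'||D2C(c)
  CALL Stream fn, 'C', 'OPEN WRITE' /* Creates the file */
  CALL Stream fn, 'C', 'CLOSE'
END c
</PRE>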
<P>
<H2>Conclusion</H2>
<P>
In this installment you've learned how to decode the contents of the data-band
usage bitmaps and how to display the contents of the Code Page mapping tables.
Next time we'll examine B-trees, DIRBLKs and DIRENTs.
<H1>Inside the High Performance File System</H1>
<H2>Part 5: FNODEs, ALSECs and B+trees</H2>
Written by Dan Bridges
<H2>Introduction</H2>
This article originally appeared in the August 1996 issue of Significant
Bits, the monthly magazine of the Brisbug PC User Group Inc.
<P>Last month you saw how DIRENTs (directory entries) are stored in
4-sector structures known as DIRBLKs. These blocks have limited space
available for entries. Due to the variable length of filenames (1-254
characters), the maximum number of entries depends on the average filename
length. If the average name length is in the 10-13 character range, a
DIRBLK can hold up to 44 entries.
<P>When there are more files in a directory than can fit in a single
DIRBLK, other DIRBLKs will be used and the connection between these blocks
forms a structure known as a B-tree. Since there can be many elements
(entries) in a node (DIRBLK), a HPFS B-tree has a quick "fan-out" and a
low height (number of levels), ensuring fast entry location.
<P>This time, we'll take a long look at how a file's contents are
logically stored under HPFS. To the best of my knowledge, this topic has
not been well-covered in the scanty information available about HPFS. You
will find it helpful to contrast the following file-sector allocation
methods with last month's directory entry concepts.
<H2>Fragging a File</H2>
Since HPFS is inherently fragmentation-resistant, we have to twist its arm
a little to produce fragmented files. The method I came up with first
fills up an empty partition with a number of files created in an ascending
name sequence. The next step deletes every second file. Finally, I create
a file that is approximately one-half the partition's size. This file then
has nowhere to go except into all the discontiguous regions previously
occupied by the deleted files.
<P>This process takes some time with a large partition (100 MB) so I
suggest you use a very small partition (1 MB). At first glance, you may
think that if we fill up a 1 MB partition with say 100 files, then delete
File1, File3, ... File99, and then create a 512K file, we will end up with
a file with exactly 50 extents (fragments). This is not so, since each
individual file occupies a FNODE sector as well as the sectors for the
file itself, whereas a single fragmented file still has only 1 FNODE. So
there is slightly more space available in each gap for an extent than
there was for a file. A 512K file will therefore find more than 512K of space
available, occupy fewer gaps than expected, and end up with a smaller number of
extents than was specified. For example, in the 50-gap, 1 MB partition scenario
we end up with 45 extents. There are also variations produced by things like
the centrally located DIRBAND, the separate Root DIRBLK and multiple Databands,
which "fragment" the available freespace for very large files. So the number of
gaps produced by deleting alternate files is only a rough approximation of the
number of extents that will be produced.
<P>Figure 1 shows the MakeExtents.cmd REXX program. You specify the number
of gaps you want to produce; the target drive (N: here) is hard-coded into
the program. For example, to originally produce 100 files on N:, delete half
of them and leave 50 gaps, you would issue the command "MakeExtents 50".
<PRE>
/* Produces a large, fragmented file */
PARSE ARG numOfExts
CALL RxFuncAdd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'
CALL SysLoadFuncs /* Load REXXUTIL.DLL external funcs */
CALL SysCls
EXIT /* Safety line. Delete this when you've adjusted the
drive to suit your system. Formats the drive. */
'echo y | format n: /l /fs:hpfs'
SAY
CALL SysMkDir 'n:\test' /* REXX MD. Faster than OS/2 MD */
currentDir = Directory() /* Store current drive/directory */
CALL Directory 'n:\test' /* Change to test drive/directory*/
/* Determine free space */
PARSE VALUE SysDriveInfo('n:') WITH . free .
/* Determine size of each sequential file */
fileSize = (free - (numOfExts*2*512)) % (numOfExts*2)
secsInFile = fileSize % 512
sectorFill = Copies('x',512) /* 512 bytes of 'x' char */
Fill_20K = Copies(sectorFill,40) /* 20,480 bytes of 'x' */
/* Create string of the required length */
CALL MakeFile secsInFile
DO i = 1 TO numOfExts*2 /* Produce the file sequence */
CALL CreateFile /* Fixed-length filenames: File00001 */
END i
DO i = 1 TO numOfExts*2 BY 2 /* Delete alternate files */
CALL SysFileDelete 'n:\test\file'||Right("0000"||i,5)
END i
PARSE VALUE SysDriveInfo('n:') WITH . free .
fragmentedFileSecs = ((free-512) % 512)-1
CALL MakeFile fragmentedFileSecs
i='FRAGG' /* Fragmented filename: FileFRAGG */
CALL CreateFile /* Create "FileFRAGG" */
CALL Directory currentDir /* Return to original location */
EXIT /********************************************/
MakeFile: PROCEDURE EXPOSE file sectorFill fill_20K
ARG secs
file = ''
/* If final file is over 20K, speed up creation a little */
IF secs&gt;=40 THEN
file = Copies(fill_20K, secs%40)
file = file||Copies(sectorFill, secs//40)
RETURN file
CreateFile:
CALL Charout 'n:\test\file'||Right("0000"||i,5),file,1
CALL Stream 'n:\test\file'||Right("0000"||i,5),'C','CLOSE'
RETURN
</PRE>
<FONT SIZE=2>
Figure 1: The MakeExtents.cmd program produces a fragmented file. Warning: once
the safety EXIT line is removed, this program will format (wipe) the target
partition.
</FONT>
<H2>FNODEs, ALSECs, ALLEAFs and ALNODEs</H2>
Every file and directory on a HPFS partition has an associated FNODE,
usually situated in the sector just before the file's first sector. The
role of an FNODE is quite specific: to map the location of the file's
extents (fragments) and any associated components, namely EAs (Extended
Attributes - up to 64K of ancillary information) and ACLs (Access Control
Lists - to do with LAN Manager).
<P>FNODEs and ALSECs (to be discussed shortly) contain a list of either
ALLEAF or ALNODE entries. See Figure 2. An ALLEAF entry contains three
dwords: logical sector offset (where the start of this run of sectors is
within the total number of sectors in the file - the logical start sector
is 0); run size in sectors; physical LSN (where the run starts in the
partition). An ALLEAF entry is at the end of the B+tree. An ALNODE entry
is an intermediate component in that it does not contain any extent
information. Rather, it points to an ALSEC, and in turn the ALSEC can
contain a list of either ALLEAFs (the end of the line) or ALNODEs (another
descendant level in the B+tree).
<PRE>
Offset Data Size Comment
hex (dec) bytes
Header
00h (1) Signature 4 0xF7E40AAE
04h (5) Seq. Read History 4 Not implemented.
08h (9) Fast Read History 4 Not Implemented.
0Ch (13) Name Length 1 0-254.
0Dh (14) Name 15 Last 15 chars. (Full name in DIRBLK.)
1Ch (29) Container Dir LSN 4 FNODE of Dir that contains this one.
20h (33) ACL Ext. Run Size 4 Secs in external ACL, if present.
24h (37) ACL LSN 4 Location of external ACL run.
28h (41) ACL Int. Size 2 Bytes in internal (inside FNODE) ACL.
2Ah (43) ACL ALSEC Flag 1 &gt;0 if ACL LSN points to an ALSEC.
2Bh (44) History Bits Count 1 Not implemented.
2Ch (45) EA Ext. Run Size 4
30h (49) EA LSN 4
34h (53) EA Int. Size 2
36h (55) EA ALSEC Flag 1 &gt;0 if EA LSN points to an ALSEC.
37h (56) Dir Flag 1 Bit0 = 1 if dir FNODE, else file FNODE.
38h (57) B+Tree Info Flag 1 0x20 (5) Parent is an FNODE, else ALSEC.
0x80 (7) ALNODEs follow, else ALLEAFs.
39h (58) Padding 3 Reestablish 32-bit alignment.
3Ch (61) Free Entries 1 Number of free array entries.
3Dh (62) Used Entries 1 Number of used array entries.
3Eh (63) Free Ent. Offset 2 Offset to next free entry in array.
If ALLEAFs (Maximum of 8 in an FNODE)
Extent #0
40h (65) Logical LSN 4 Sec offset of this extent within file.
The first extent has an offset of 0.
44h (69) Run Size 4 Number of sectors in this extent.
48h (73) Physical LSN 4 File: LSN of extent start.
Dir: This B-tree's topmost DIRBLK LSN.
...
Extent #7
94h (149) Logical LSN 4
98h (153) Run Size 4
9Ch (157) Physical LSN 4
If ALNODEs (Maximum of 12 in an FNODE)
Extent #0
40h (65) End Sector Count 4 Running total of secs mapped by this
alnode. 1-based. If EOF is within this
alnode then field contains 0xFFFFFFFF.
44h (69) Physical LSN 4 File: LSN of ALSEC.
Dir: This B-tree's topmost DIRBLK LSN.
...
Extent #11
98h (153) End Sector Count 4
9Ch (157) Physical LSN 4
Tail
A0h (161) Valid File Length 4 Should be the same as File Size in DIRENT.
A4h (165) "Needed" EAs Count 4 If any, EAs vital to the file's wellbeing.
A8h (169) User ID 16 Not used.
B8h (185) ACL/EA Offset 2 Offset in FNODE to first ACL, if present,
otherwise offset to where EAs would be
stored, if internalised.
BAh (187) Spare 10 Unused.
C4h (197) ACL/EA Storage 316 Only 145 bytes appear available for EAs.
</PRE>
<FONT SIZE=2>
Figure 2: Layout of an FNODE. This component can contain either an array
of ALNODE or ALLEAF entries.
</FONT>
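<P>Given this layout, resolving a logical file sector to its physical LSN is
a linear walk over the ALLEAF array. A minimal sketch, assuming fnode holds
an FNODE sector read with SECTOR.DLL's ReadSect and that its array contains
ALLEAFs (offsets as per Figure 2):
<PRE>
/* Sketch: map logical file sector 'want' to a physical LSN */
want = 250                         /* Logical sector we seek     */
used = C2D(Substr(fnode, 62, 1))   /* Used Entries field at 3Dh  */
DO e = 0 TO used - 1
  base = 65 + e*12                 /* ALLEAF array starts at 40h */
  logical  = C2D(Reverse(Substr(fnode, base,   4)))
  runSize  = C2D(Reverse(Substr(fnode, base+4, 4)))
  physical = C2D(Reverse(Substr(fnode, base+8, 4)))
  IF want &gt;= logical &amp; want &lt; logical + runSize THEN
    SAY 'Logical sector' want 'is at LSN' physical + (want - logical)
END e
</PRE>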
<P>Returning to the B-tree structure of DIRBLKs, you will remember that
both intermediate and leaf components contain DIRENT data. So you may find
the entry you're looking for in a node. This is not the case with a
B+tree. Since an ALNODE can only point to an ALSEC, you must always
proceed to the bottom of the tree, to a leaf, to retrieve extent
information.
<P>An ALNODE entry only contains two dwords: a running total indicating
the logical sector offset of the last sector in the ALSEC (i.e. how far we
are through the file - this starts from 1); the physical LSN of where to
find the ALSEC. The advantage of the smaller entry size of an ALNODE
compared to an ALLEAF is that, in the same space, there can be more of
them.
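<P>The running total is what makes the descent cheap: the ALNODE to follow
is the first one whose End Sector Count exceeds the logical sector sought.
A companion sketch, assuming alsec holds an ALSEC whose array contains
ALNODEs (want and used as in the previous sketch):
<PRE>
/* Sketch: pick the child ALSEC to descend into */
DO e = 0 TO used - 1
  endCount = C2D(Reverse(Substr(alsec, 21 + e*8,     4))) /* 14h */
  child    = C2D(Reverse(Substr(alsec, 21 + e*8 + 4, 4)))
  IF want &lt; endCount THEN LEAVE  /* The 0xFFFFFFFF sentinel also */
END e                            /* satisfies this test          */
SAY 'Descend into the ALSEC at LSN' child
</PRE>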
<P>An FNODE contains other data. One important piece of information is the
last 15 characters of the filename. This comes in handy when we need to
undelete. The last 316 bytes of the sector are also set aside for internal
ACL/EAs (stored completely within the FNODE). In the Graham Utilities
manual it is stated that up to 316 bytes of EAs can be stored within the
FNODE but my experiments with OS/2 Warp v3 show that only up to 145 bytes
of EAs can be internalised. Refer to Part 6 for further information.
<P>Figure 3 shows the structure of an ALSEC. You will notice that there is
much more space in the sector devoted to ALNODE/ALLEAF entries than is
available in an FNODE sector (480 bytes compared to 96 bytes). This leads
to the following maximum number of entries:
<PRE>
          ALLEAF   ALNODE
FNODE          8       12
ALSEC         40       60
</PRE>
<PRE>
Offset Data Size Comment
hex (dec) bytes
Header
00h (1) Signature 4 0x37E40AAE
04h (5) This block's LSN 4 Helps when placing other blks nearby.
08h (9) Parent's LSN 4 Points to either FNODE or another ALSEC.
0Ch (13) Btree Flag 1 0x20 (5) Parent is an FNODE, else ALSEC.
0x80 (7) ALNODEs follow, else ALLEAFs.
0Dh (14) Padding 3 Reestablish dword alignment.
10h (17) Free Entries 1 Number of free array entries.
11h (18) Used Entries 1 Number of used array entries.
12h (19) Free Ent. Offset 2 Offset to first free entry.
If ALLEAFs (Maximum of 40 in an ALSEC)
Extent #0
14h (21) Logical LSN 4 Sec offset of this extent within file.
Zero-based.
18h (25) Run Size 4 Secs in this extent.
1Ch (29) Physical LSN 4 File: LSN of extent start.
Dir: This B-tree's topmost DIRBLK LSN.
...
Extent #39
1E8h (489) Logical LSN 4
1ECh (493) Run Size 4
1F0h (497) Physical LSN 4
If ALNODEs (Maximum of 60 in an ALSEC)
Extent #0
14h (21) End Sector Count 4 Running total of secs mapped by this
alnode. 1-based. If EOF is within this
alnode then field contains 0xFFFFFFFF.
18h (25) Physical LSN 4 File: LSN of ALSEC.
Dir: This B-tree's topmost DIRBLK LSN.
...
Extent #59
1ECh (493) End Sector Count 4
1F0h (497) Physical LSN 4
Tail
1F4h (501) Padding 12 Unused.
</PRE>
<FONT SIZE=2>
Figure 3: The layout of an ALSEC. This component can contain either an
array of ALNODE or ALLEAF entries.
</FONT>
<H2>Some Examples</H2>
The main program this month, ShowExtents.cmd (to be discussed later),
needs to know the LSN of the FNODE or ALSEC that you want to start with.
It would be possible to design a version that accepted the full pathname
of a file but it would be a larger program. For the purpose of
comprehending these structures, the requirement of having to specify a LSN
is acceptable. To determine the file's FNODE location use last month's
ShowBtree.cmd. Figure 4 shows ShowBtree's output on a 1 MB partition after
"MakeExtents 7" was issued. From the information reported in Figure 4 we
will first examine the TEST directory's FNODE. Figure 5 shows the result
of issuing "ShowExtents N: 1033". Since there is no information in the
allocation array area of a directory FNODE (the 128 byte region commencing
at decimal offset 65), ShowExtents is designed to terminate early in such
a situation.
<PRE>
Root Directory:
1016-1019 Next Byte Free: 125 Topmost DirBlk
This directory's FNODE: 1032 (\ [level 1]) 1016-&gt;1032
**************************************************
SD 21 #00: .. FNODE:1032
D 57 #01: test FNODE:1033
E 93 #02:
36-39 Next Byte Free: 409 Topmost DirBlk
This directory's FNODE: 1033 (test [level 1]) 36-&gt;1033
**************************************************
SD 21 #00: .. FNODE:1033
57 #01: file00002 FNODE:432
97 #02: file00004 FNODE:664
137 #03: file00006 FNODE:896
177 #04: file00008 FNODE:1154
217 #05: file00010 FNODE:1386
257 #06: file00012 FNODE:1618
297 #07: file00014 FNODE:1850
337 #08: fileFRAGG FNODE:316
E 377 #09:
</PRE>
<FONT SIZE=2>
Figure 4: Last month's program, ShowBtree.cmd, shows the LSN of
FileFRAGG's FNODE.
</FONT>
<PRE>
FNODE STRUCTURE
LSN: 1033
Signature: F7E40AAE
Name Length: 4
Name: test
Container Dir LSN: 1032
EA Ext. Run Size: 0
EA LSN: 0
EA Int. Size: 0
EA ALSEC Flag: 0
Dir Flag: Directory FNODE
Topmost DIRBLK LSN: 36
</PRE>
<FONT SIZE=2>
Figure 5: ShowExtents' output when displaying the contents of a directory
FNODE.
</FONT>
<P>Next, we'll look at an FNODE with a full complement of 8 ALLEAF
entries. On my system, this is produced when "MakeExtents 7" is issued.
See Figure 6. The next free entry in the array of ALLEAF entries is at
offset 104 dec. This offset appears to be counted from the B+tree Info Flag
at 57 dec, which places the next entry at 161 dec, immediately past the end
of the available entry area, at the beginning of the tail region. This is
another indication that the array is full. (The main indication is the "0"
value in the Free Entries field.)
<PRE>
FNODE STRUCTURE
LSN: 316
Signature: F7E40AAE
Name Length: 9
Name: fileFRAGG
Container Dir LSN: 1033
EA Ext. Run Size: 0
EA LSN: 0
EA Int. Size: 0
EA ALSEC Flag: 0
Dir Flag: File FNODE
B+tree Info Flag: ALLEAFs follow
Free Entries: 0
Used Entries: 8
Next Free Offset: 104
Valid data size: 420352
"Needed" EAs: 0
EA/ACL Int. Off: 0
ALLEAF INFORMATION
Extent #0: 115 sectors starting at LSN 317 (file sec offset:0)
Extent #1: 116 sectors starting at LSN 548 (file sec off:115)
Extent #2: 116 sectors starting at LSN 780 (file sec off:231)
Extent #3: 116 sectors starting at LSN 1038 (file sec off:347)
Extent #4: 116 sectors starting at LSN 1270 (file sec off:463)
Extent #5: 116 sectors starting at LSN 1502 (file sec off:579)
Extent #6: 116 sectors starting at LSN 1734 (file sec off:695)
Extent #7: 10 sectors starting at LSN 1966 (file sec off:811)
</PRE>
<FONT SIZE=2>
Figure 6: A FNODE with a full ALLEAF array.
</FONT>
<P>If we need to map any more extents we must switch from a FNODE (with
ALLEAFs) structure to FNODE (with ALNODEs) -&gt; ALSEC (with ALLEAFs). Figure
7 shows the mapping of a 10-extent file ("MakeExtents 8"). The B+tree Info
Flag tells us that the FNODE contains an array of ALNODEs. There is only
one entry in this array. The End Sector Count value is not shown here but,
in this example, you could easily check it out using Part 2's SEC.cmd
("SEC N: 316") and then look at the four bytes at offset 40h (in the case
of a single entry in the array). Since this is the sole entry, you will
find FFFFFFFFh (appears to be the array End-of-Entries indicator) at this
location.
<PRE>
FNODE STRUCTURE
LSN: 316
Signature: F7E40AAE
Name Length: 9
Name: fileFRAGG
Container Dir LSN: 1033
EA Ext. Run Size: 0
EA LSN: 0
EA Int. Size: 0
EA ALSEC Flag: 0
Dir Flag: File FNODE
B+tree Info Flag: ALNODEs follow
Free Entries: 11
Used Entries: 1
Next Free Offset: 16
Valid data size: 418304
"Needed" EAs: 0
EA/ACL Int. Off: 0
FNODE Entry #0
ALSEC STRUCTURE
Signature: 37E40AAE
This LSN: 933
Parent's LSN: 316
B+tree Info Flag: Parent was an FNODE; ALLEAFs follow
Free Entries: 30
Used Entries: 10
Next Free Offset: 128
ALLEAF INFORMATION
Extent #0: 101 sectors starting at LSN 317 (file sec off:0)
Extent #1: 102 sectors starting at LSN 520 (file sec off:101)
Extent #2: 102 sectors starting at LSN 724 (file sec off:203)
Extent #3: 102 sectors starting at LSN 1158 (file sec off:305)
Extent #4: 102 sectors starting at LSN 1362 (file sec off:407)
Extent #5: 102 sectors starting at LSN 1566 (file sec off:509)
Extent #6: 102 sectors starting at LSN 1770 (file sec off:611)
Extent #7: 42 sectors starting at LSN 1974 (file sec off:713)
Extent #8: 5 sectors starting at LSN 928 (file sec off:755)
Extent #9: 57 sectors starting at LSN 934 (file sec off:760)
</PRE>
<FONT SIZE=2>
Figure 7: A 10-extent file is mapped in a 1-level B+tree with a single
ALSEC.
</FONT>
<P>The next section in the display in Figure 7, labelled "FNODE Entry #0"
shows us that the sole ALNODE entry points to LSN 933. Here we are seeing
this ALSEC's layout. The B+tree Info Flag informs us that this ALSEC
contains ALLEAF entries i.e. the actual mapping of the extents. Notice
that we have 10 ALLEAF entries in the allocation array. Remember that an
ALSEC has much more space available for array entries than an FNODE has,
in that it can store up to 40 ALLEAF entries. You can verify this by
adding the ALSEC's Free Entries and the Used Entries values together.
<P>When you try and map more than 40 extents you will exceed the capacity
of a sole ALSEC. What happens in this case is that more ALNODE entries are
created in the FNODE, each pointing to an ALSEC. Figure 8 shows a
42-extent layout (produced with a parameter of "45").
<PRE>
FNODE STRUCTURE
LSN: 316
Signature: F7E40AAE
Name Length: 9
Name: fileFRAGG
Container Dir LSN: 1033
EA Ext. Run Size: 0
EA LSN: 0
EA Int. Size: 0
EA ALSEC Flag: 0
Dir Flag: File FNODE
B+tree Info Flag: ALNODEs follow
Free Entries: 10
Used Entries: 2
Next Free Offset: 24
Valid data size: 393192
"Needed" EAs: 0
EA/ACL Int. Off: 0
FNODE Entry #0
ALSEC STRUCTURE
Signature: 37E40AAE
This LSN: 588
Parent's LSN: 316
B+tree Info Flag: Parent was an FNODE; ALLEAFs follow
Free Entries: 0
Used Entries: 40
Next Free Offset: 232
ALLEAF INFORMATION
Extent #0: 16 sectors starting at LSN 317 (file sec off:0)
...
Extent #39: 17 sectors starting at LSN 1668 (file sec off:720)
FNODE Entry #1
ALSEC STRUCTURE
Signature: 37E40AAE
This LSN: 996
Parent's LSN: 316
B+tree Info Flag: Parent was an FNODE; ALLEAFs follow
Free Entries: 38
Used Entries: 2
Next Free Offset: 32
ALLEAF INFORMATION
Extent #40: 17 sectors starting at LSN 1702 (file sec off:737)
Extent #41: 14 sectors starting at LSN 1736 (file sec off:754)
</PRE>
<FONT SIZE=2>
Figure 8: 42 extents require a 1-level B+tree with 2 ALNODE entries in the
FNODE pointing to 2 ALSECs.
</FONT>
<P>There is space in an FNODE for 12 ALNODE entries. If each of these
points to a full ALSEC (with ALLEAFs) i.e. 40-entries each, this two-level
structure can accommodate 480 extents (parameter "564").
<P>Let's see what happens when we exceed this value. Figure 9 shows a
482-extent layout ("565"). Interesting things have occurred. We now have a
2-level B+tree structure. The FNODE ALNODE array has been adjusted to
contain a sole entry. This in turn points to an ALSEC that has 13 ALNODE
entries. Each of these ALNODEs points to another ALSEC which contains
ALLEAF entries. 12 of the ALSECs (with ALLEAFs) are full (i.e. 12*40 = 480
extents mapped) while the 13th ALSEC (with ALLEAFs) only maps 2 extents.
<PRE>
FNODE STRUCTURE
LSN: 1000
Signature: F7E40AAE
Name Length: 9
Name: fileFRAGG
Container Dir LSN: 1033
EA Ext. Run Size: 0
EA LSN: 0
EA Int. Size: 0
EA ALSEC Flag: 0
Dir Flag: File FNODE
B+tree Info Flag: ALNODEs follow
Free Entries: 11
Used Entries: 1
Next Free Offset: 16
Valid data size: 524264
"Needed" EAs: 0
EA/ACL Int. Off: 0
FNODE Entry #0
ALSEC STRUCTURE
Signature: 37E40AAE
This LSN: 1333
Parent's LSN: 1000
B+tree Info Flag: Parent was an FNODE; ALNODEs follow
Free Entries: 47
Used Entries: 13
Next Free Offset: 112
ALNODE INFORMATION
ALSEC Entry #0 situated at LSN 328 (file sec count:582)
ALSEC STRUCTURE
Signature: 37E40AAE
This LSN: 328
Parent's LSN: 1333
B+tree Info Flag: ALLEAFs follow
Free Entries: 0
Used Entries: 40
Next Free Offset: 232 ALLEAF INFORMATION Extent #0-#39
ALNODE INFORMATION
ALSEC Entry #1 situated at LSN 394 (file sec count:622)
ALSEC STRUCTURE 394 (40) ALLEAF INFORMATION Extent #40-#79
ALNODE INFORMATION
ALSEC Entry #2 situated at LSN 476 (file sec count:662)
ALSEC STRUCTURE 476 (40) ALLEAF INFORMATION Extent #80-#119
ALNODE INFORMATION
ALSEC Entry #3 situated at LSN 558 (file sec count:702)
ALSEC STRUCTURE 558 (40) ALLEAF INFORMATION Extent #120-#159
ALNODE INFORMATION
ALSEC Entry #4 situated at LSN 640 (file sec count:742)
ALSEC STRUCTURE 640 (40) ALLEAF INFORMATION Extent #160-#199
ALNODE INFORMATION
ALSEC Entry #5 situated at LSN 722 (file sec count:782)
ALSEC STRUCTURE 722 (40) ALLEAF INFORMATION Extent #200-#239
ALNODE INFORMATION
ALSEC Entry #6 situated at LSN 804 (file sec count:822)
ALSEC STRUCTURE 804 (40) ALLEAF INFORMATION Extent #240-#279
ALNODE INFORMATION
ALSEC Entry #7 situated at LSN 886 (file sec count:862)
ALSEC STRUCTURE 886 (40) ALLEAF INFORMATION Extent #280-#319
ALNODE INFORMATION
ALSEC Entry #8 situated at LSN 968 (file sec count:902)
ALSEC STRUCTURE 968 (40) ALLEAF INFORMATION Extent #320-#359
ALNODE INFORMATION
ALSEC Entry #9 situated at LSN 1085 (file sec count:942)
ALSEC STRUCTURE 1085 (40) ALLEAF INFORMATION Extent #360-#399
ALNODE INFORMATION
ALSEC Entry #10 situated at LSN 1167 (file sec count:982)
ALSEC STRUCTURE 1167 (40) ALLEAF INFORMATION Extent #400-#439
ALNODE INFORMATION
ALSEC Entry #11 situated at LSN 1249 (file sec count:1022)
ALSEC STRUCTURE 1249 (40) ALLEAF INFORMATION Extent #440-#479
ALNODE INFORMATION
ALSEC Entry #12 situated at LSN 1331 (file sec count:At end)
ALSEC STRUCTURE 1331 (2) ALLEAF INFORMATION Extent #480-#481
</PRE>
<FONT SIZE=2>
Figure 9: 482 extents are mapped by a 2-level B+tree with 1 ALNODE entry
in the FNODE pointing to 1 ALSEC, which in turn points to 13 ALSECs.
</FONT>
<P>If you look at FNODE Entry #0's Used & Free Entries values you can
verify that, in an ALSEC, there can be a maximum of 60 ALNODEs. It would
take 60*40 = 2,400 extents to fill this level up again. Going past this
would require the presence of a second FNODE entry. Since we can have up
to 12 ALNODE entries in an FNODE, this means we could map 12*60*40 =
28,800 extents before the need to insert another intermediary ALSEC level
would arise.
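<P>Putting these limits together gives the maximum number of extents at each
B+tree height; a quick sketch of the arithmetic:
<PRE>
/* Extent capacity implied by the entry limits above */
SAY 'FNODE with ALLEAFs only:      ' 8
SAY '1 level:  12 ALNODEs * 40 =   ' 12*40       /* 480       */
SAY '2 levels: 12 * 60 * 40 =      ' 12*60*40    /* 28,800    */
SAY '3 levels: 12 * 60 * 60 * 40 = ' 12*60*60*40 /* 1,728,000 */
</PRE>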
<P>On a 100 MB partition I produced a 3-level 44,413 extent structure
("44500"). To put this discussion on B+tree fan-out in perspective, it
should be remembered that, in the fragmentation analysis performed in Part
3 on 20,800 files in 5 partitions, there were only 14 files with more than
8 extents (i.e. requiring an ALSEC) and the largest number of extents
reported was 30.
<H2>The ShowExtents Program</H2>
Figure 10 presents the ShowExtents.cmd REXX program. You will need to get
SECTOR.DLL. The program first determines if the LSN you've specified
belongs to an FNODE or ALSEC. (You can bypass the FNODE and commence the
examination from an ALSEC.) Once it has determined this, the next most
important consideration is: does the allocation array consist of ALLEAFs
or ALNODEs? If it contains ALLEAFs we've reached the end of the tree and
need only show the extents. If we are looking at an array of ALNODEs we
need to recurse down each ALNODE entry, loading the ALSEC pointed to by
the entry and then see whether it contains either ALLEAFs or ALNODEs. And
so on...
<PRE>
/*Shows the layout of FNODE and ALSECs. Requires SECTOR.DLL*/
PARSE UPPER ARG drive lsn
/* There must be at least two parms supplied */
IF drive = '' | lsn = '' THEN CALL HELP
/* Register external functions */
CALL RxFuncAdd 'QDrive','sector','QDrive'
CALL RxFuncAdd 'ReadSect','sector','ReadSect'
alleafEntryCount = 0
anodeEntryCount = 0
SAY
CALL MainRoutine
EXIT /*****************EXECUTION ENDS HERE*****************/
MainRoutine:
PROCEDURE EXPOSE drive lsn alleafEntryCount anodeEntryCount
usedEntries = 0
sectorString = ReadSect(drive,lsn) /* Read in required sec */
IF FourBytes2Hex(1) = 'F7E40AAE' THEN
/* Is an FNODE */
DO
alSecIndicator = ''
CALL DisplayFnode
END
ELSE
/* Not an FNODE */
DO
IF FourBytes2Hex(1) = '37E40AAE' THEN
/* Is an ALSEC */
DO
alSecIndicator = 'Y'
CALL DisplayALSEC
END
ELSE
/* Neither an FNODE or an ALSEC */
DO
SAY 'LSN' lsn 'is not an FNODE or ALSEC'
EXIT
END
END
RETURN
DisplayFnode:
SAY 'FNODE STRUCTURE'
SAY 'LSN: ' lsn
SAY 'Signature: ' FourBytes2Hex(1)
SAY 'Name Length: ' Bytes2Dec(13,1)
SAY 'Name: ' Substr(sectorString,14,Bytes2Dec(13,1))
SAY 'Container Dir LSN:' Bytes2Dec(29,4)
SAY 'EA Ext. Run Size: ' Bytes2Dec(45,4)
SAY 'EA LSN: ' Bytes2Dec(49,4)
SAY 'EA Int. Size: ' Bytes2Dec(53,2)
SAY 'EA ALSEC Flag: ' Bytes2Dec(55,1)
IF Bitand(Byte2Char(56),'1'x) = '1'x THEN
dirFlag = 'Directory FNODE'
ELSE
dirFlag = 'File FNODE'
SAY 'Dir Flag: ' dirFlag
IF dirFlag = 'Directory FNODE' THEN
SAY 'Topmost DIRBLK LSN:'||Bytes2Dec(73,4)
ELSE
DO
/* Is a file, so determine extents */
CALL DetermineBtreeInfo 57
SAY 'B+tree Info Flag: ' btreeInfo
SAY 'Free Entries: ' Bytes2Dec(61,1)
usedEntries = Bytes2Dec(62,1)
SAY 'Used Entries: ' usedEntries
SAY 'Next Free Offset: ' Bytes2Dec(63,2)
SAY 'Valid data size: ' Bytes2Dec(161,4)
SAY '"Needed" EAs: ' Bytes2Dec(165,4)
SAY 'EA/ACL Int. Off: ' Bytes2Dec(185,2)
CALL ShowALLEAF_or_ANODE
END
RETURN
FourBytes2Hex: /* Given offset, return Dword */
ARG startPos
rearranged = Reverse(Substr(sectorString,startPos,4))
RETURN C2X(rearranged)
Bytes2Dec:
ARG startPos,numOfChars
temp = Substr(sectorString,startPos,numOfChars)
IF C2X(temp) = 'FFFFFFFF' THEN
RETURN 'At the end'
ELSE
RETURN Format(C2D(Reverse(temp)),,0)
Byte2Char:
ARG startPos
RETURN Substr(sectorString,startPos,1)
DetermineBtreeInfo:
ARG btreeByteOffset
IF Bitand(Byte2Char(btreeByteOffset),'20'x) = '20'x THEN
btreeInfo = 'Parent was an FNODE; '
ELSE
btreeInfo = ''
IF Bitand(Byte2Char(btreeByteOffset),'80'x) = '80'x THEN
DO
btreeInfo = btreeInfo||'ALNODEs follow'
alNodeIndicator = 'Y'
END
ELSE
DO
btreeInfo = btreeInfo||'ALLEAFs follow'
alNodeIndicator = 'N'
END
RETURN
DisplayALSEC:
SAY 'ALSEC STRUCTURE'
alSecIndicator = 'Y'
SAY 'Signature: ' FourBytes2Hex(1)
lsn = Bytes2Dec(5,4)
SAY 'This LSN: ' lsn
SAY "Parent's LSN: " Bytes2Dec(9,4)
CALL DetermineBtreeInfo 13
SAY 'B+tree Info Flag: ' btreeInfo
SAY 'Free Entries: ' Bytes2Dec(17,1)
usedEntries = Bytes2Dec(18,1)
SAY 'Used Entries: ' usedEntries
SAY 'Next Free Offset: ' Bytes2Dec(19,2)
CALL ShowALLEAF_or_ANODE
RETURN
ShowALLEAF_or_ANODE: PROCEDURE EXPOSE drive lsn sectorString,
usedEntries alleafEntryCount anodeEntryCount entrySize,
alsecIndicator alnodeIndicator
IF alsecIndicator = 'Y' THEN
entryOffset = 21
ELSE
entryOffset = 65
IF alnodeIndicator \= 'Y' THEN
/* Is an ALLEAF */
DO
SAY
IF usedEntries = 0 THEN
DO
SAY 'Zero-length file'
EXIT
END
SAY 'ALLEAF INFORMATION'
entrySize = 12
DO entry = alleafEntryCount TO alleafEntryCount+usedEntries-1
fileSecOffset = Bytes2Dec(entryOffset,4)
runSize = Bytes2Dec(entryOffset+4,4)
physicalLSN = Bytes2Dec(entryOffset+8,4)
SAY 'Extent #'||entry||':' runSize 'sectors starting at LSN' physicalLSN '(file sec offset:'||fileSecOffset||')'
entryOffset = entryOffset+entrySize
END entry
alleafEntryCount = entry
END
ELSE
DO
/* Is either an ALNODE in an ALSEC or in an FNODE */
entrySize = 8
IF alSecIndicator \= 'Y' THEN
/* In an FNODE */
DO entry = anodeEntryCount TO anodeEntryCount+usedEntries-1
lsn = Bytes2Dec(entryOffset+4,4)
SAY
SAY 'FNODE Entry #' || entry
CALL MainRoutine
entryOffset = entryOffset+entrySize
END entry
ELSE
DO
/* In an ALSEC */
listStart = 65
sectorString = ReadSect(drive,lsn)
DO entry = anodeEntryCount TO anodeEntryCount+usedEntries-1
SAY
SAY 'ALNODE INFORMATION'
fileSecOffset = Bytes2Dec(entryOffset,4)
lsn = Bytes2Dec(entryOffset+4,4)
SAY 'ALSEC Entry #'||entry 'situated at LSN' lsn '(file sec count:'||fileSecOffset||')'
CALL MainRoutine
anodeEntryCount = entry
entryOffset = entryOffset+entrySize
END entry
END
END
RETURN
Help:
SAY 'ShowExtents shows the extents mapped by a FNODE or ALSEC'
SAY 'structure.'
SAY
SAY ' Usage: ShowExtents drive LSN_of_a_FNODE/ALSEC'
SAY ' Example: ShowExtents C: 316'
EXIT
</PRE>
<FONT SIZE=2>
Figure 10: The ShowExtents.cmd program.
</FONT>
<H2>Counting Extents</H2>
It is handy to be able to report just the number of extents in a file.
HPFS-EXT, in the Graham Utilities, can do this. It takes a filename. It is
available in the demo version of the GU's, "GULITE.xxx".
<P>The freeware FST (currently FST03F.xxx) does just about everything. You
can specify either a filename ("FST INFO N: \TEST\FILEFRAGG" - note the
space after the drive letter) or a LSN ("FST INFO N: 1000"). It will
include the height of the B+tree and the total number of extents at the
end of its display. Unfortunately, it displays a lot of other info, and
sometimes you're only interested in just the number of levels and
extents.
<P>I cut down ShowExtents.cmd to produce CountExtents.cmd. The design was
not amenable to showing the height but it was a straightforward matter to
show just the number of extents. I've not bothered to present it here
since most readers will probably prefer to specify the filename. (The
FNODE LSN keeps changing as you increase the number of extents so this
makes it more difficult to use CountExtents.)
<H2>Conclusion</H2>
In this installment we have seen how a file's sectors are mapped by FNODEs
and ALSECs. These file system components can contain either an array of
ALNODE or ALLEAF entries. By following through to the ALLEAFs we can
examine the mapping of extents.
<P>We have also seen how a B+tree is different from a B-tree. In a DIRBLK
B-tree, DIRENT information can be found in a node entry. But in an ALSEC
B+tree, extent information is not stored in node entries, only in the
leaves. The filling of nodes in an ALSEC B+tree is also much more
efficient than the utilisation of nodal space in a DIRBLK B-tree.
<P>When the next installment is presented we'll look at Extended
Attributes. While not specifically a HPFS topic, they are well integrated
into the file system and will fit well into this series.
<html><head><title>Operating Systems: The HPFS Filesystem</title></head>
<body BGCOLOR=#FFFFFF TEXT=#000000 LINK="#0000FF" VLINK="#0000FF" ALINK="#107010">
<center><font face=Verdana size=7><b>HPFS FileSystem</b></font></center>
<hr><p>
This series of articles apparently originally appeared in the now-defunct OS2Zone (their page was at http://www.os2zone.aus.net), written by Dan Bridges. I ran across it during my journeys of the net, and put it up here... The "original" form is <a href="hpfs.zip">available here</a>. This is a six-part series of articles on HPFS.<p>
<ul><DL>
<DT><font size=+1><a href="hpfs0.html">Part #0 - Preface</a></font><br>
<DD>This article is the initial "preface" article that explains the motivations behind the series.
It also talks about the filesystem organization scheme used by the FAT filesystem... and briefly
introduces HPFS.<p>
<DT><font size=+1><a href="hpfs1.html">Part #1 - Introduction</a></font><br>
<DD>This introductory article compares the FAT filesystem against the HPFS filesystem in terms that
a user would understand. This talks about the practical differences, such as speed, size, and
fragmentation.<p>
<DT><font size=+1><a href="hpfs2.html">Part #2 - The SuperBlock and the SpareBlock</a></font><br>
<DD>This article starts delving more deeply into HPFS' internal structures. Two REXX programs are
presented that greatly assist in the search for information. It also briefly looks at some
other HPFS-related programs. Finally, you will see the Big Picture when the major structures
of a HPFS partition are shown. <p>
<DT><font size=+1><a href="hpfs3.html">Part #3 - Fragmentation, Diskspace Bitmaps and Code Pages</a></font><br>
<DD>This article looks at how HPFS knows which sectors are occupied and which ones are free.
It examines the amount of file fragmentation on five HPFS volumes and also checks out the
fragmentation of free space. A program is presented to show free runs and some other
details. Finally, it briefly discusses Code Pages and looks at a program that displays
their contents.<p>
<DT><font size=+1><a href="hpfs4.html">Part #4 - B-Trees, DIRBLKs, and DIRENTs</a></font><br>
<DD>The most basic structures in the HPFS are DIRBLKs, DIRENTs and FNODEs. This article examines
DIRBLKs and DIRENTs, talks about the differences between binary trees and B-trees and shows
how DIRBLKs are interconnected to facilitate quick access in a large directory (one of HPFS'
strengths). To assist in this investigation, a program, ShowBtree.cmd, helps to visualise
the layout of directory and file entries in a partition.<p>
<DT><font size=+1><a href="hpfs5.html">Part #5 - FNODEs, ALSECs and B+trees</a></font><br>
<DD>This article takes a long look at how a file's contents are logically stored under HPFS.
It is helpful to contrast these file-sector allocation methods with the previous article's
directory entry concepts. It also talks about fragmentation and how HPFS deals with it.<p>
<DT><font size=+1>Part #6 - ?</font><br>
<DD>This is as far as I can go... if anyone has any of the other articles that appeared in this
series, please please send them my way...<p>
</DL></ul>
<p><hr><FONT SIZE = 4><TABLE ALIGN=RIGHT BORDER=0><TR><TD><center>
Copyright &copy; 1998 <i><a href="mailto:sabre@nondot.org">Chris Lattner</a></i><br>
Last modified: Wednesday, 13-Sep-2000 14:10:50 CDT </center></TD></TR></TABLE>