add directory study

gohigh
2024-02-19 00:25:23 -05:00
parent b1306b38b1
commit f3774e2f8c
4001 changed files with 2285787 additions and 0 deletions

Appending Files to EXEs.
After seeing several questions on appending files to EXEs, I decided to
write this text. I did NOT originate this idea. While this text describes
"a" way of implementing the technique, it may not be the best way for your
needs. I have simply attempted to supply you with a basic understanding of
the process.
WHY?
A couple of years ago, I purchased a copy of Ultima 7. After installing it,
I looked at the directory. There were a lot of files, and moving any one of
them out of the directory crashed the program. When I got Unreal by Future
Crew, all of my preconceived ideas went out the window.
1. You can't run a 2meg EXE, can you?!
2. Where are the music and graphic files?!
3. How'd they do that? (This includes the effects :)
The answer to #1 : "It runs; therefore, you must be able to do it. Idiot!"
The answer to #2 : "All the music and graphic files are contained IN the EXE."
Question #3 is a little harder to explain. I still don't know exactly what FC
did, but the technique I discuss in this file gives you similar results.
Appending a file
Before you append a file to the end of your EXE, ask yourself how you will
access it. If you're adding 10 files, how do you know where they are? This is
actually really simple once you think it through: create a directory
structure of your own and make it the very last thing you append! Use your own
structure if you want, but feel free to use mine.
Directory structure:
repeat
name - string
filepos - long int, pointer to the first byte of the file
filesize - long int
for each file being attached
long int - number of entries
Since this is similar to a WAD file, we'll call it a KAD file.
KAD = Kodiak WAD file. Get it? A KAD file.
Okay, so it wasn't that good; let's move on.
To build the KAD file all you have to do is tack one file after another INTO
a single file and add the directory to the end of it.
see packer.c
Open output file
repeat
save the output's file position in directory structure
save the input's file name, ignoring path, in directory structure
save the input's file size in directory structure
open input file
copy input file to output file
close input file
until all files are appended
save directory info
close file
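A minimal C sketch of the packer loop above, not the author's packer.c. It assumes a hypothetical layout: each directory entry is a 16-byte, zero-padded name followed by filepos and filesize as little-endian 32-bit values, and the number of entries is the last dword of the file. The name `kad_pack` is illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define KAD_NAME_LEN 16   /* hypothetical fixed-size, zero-padded name */

struct kad_entry {
    char name[KAD_NAME_LEN];
    uint32_t filepos;     /* offset of the file's first byte */
    uint32_t filesize;
};

/* Write a 32-bit value in little-endian (Intel) byte order. */
static void put32(FILE *f, uint32_t v)
{
    int i;
    for (i = 0; i < 4; i++)
        fputc((int)((v >> (8 * i)) & 0xFF), f);
}

/* Return the part of a path after the last '\', '/' or ':'. */
static const char *base_name(const char *path)
{
    const char *p = path, *s;
    for (s = path; *s; s++)
        if (*s == '\\' || *s == '/' || *s == ':')
            p = s + 1;
    return p;
}

/* Pack n input files into a new KAD file; returns 0 on success. */
int kad_pack(const char *out_name, const char *const in_names[], int n)
{
    struct kad_entry *dir = calloc(n > 0 ? (size_t)n : 1, sizeof *dir);
    FILE *out = fopen(out_name, "wb");
    int i, c, ok = dir != NULL && out != NULL;

    for (i = 0; ok && i < n; i++) {
        FILE *in = fopen(in_names[i], "rb");
        if (in == NULL) {
            ok = 0;
            break;
        }
        dir[i].filepos = (uint32_t)ftell(out);    /* output file position */
        strncpy(dir[i].name, base_name(in_names[i]), KAD_NAME_LEN - 1);
        while ((c = fgetc(in)) != EOF) {          /* copy input to output */
            fputc(c, out);
            dir[i].filesize++;                    /* size = bytes copied */
        }
        fclose(in);
    }
    for (i = 0; ok && i < n; i++) {               /* save directory info */
        fwrite(dir[i].name, 1, KAD_NAME_LEN, out);
        put32(out, dir[i].filepos);
        put32(out, dir[i].filesize);
    }
    if (ok)
        put32(out, (uint32_t)n);                  /* number of entries */
    if (out != NULL && fclose(out) != 0)
        ok = 0;
    free(dir);
    return ok ? 0 : -1;
}
```

Because every entry has the same size, the reader can find the directory by seeking backwards from the end of the file.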
Simple, ehh?
Now that you have the KAD file what do you do with it?
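To access the KAD, your code should read the directory into a memory array.
Read the last dword of the KAD (the number of entries), multiply it by the
size of one directory entry (the name field plus 2 dwords, so the name must
be stored in a fixed-length field), add 4 for the count itself, seek back
that many bytes from the end of the file, and fill your directory array
from there.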
Now if you want to load the first file from the KAD, get the file offset from
your directory array, seek to that file position, and load. What could be
simpler? How about using a pre-written function, GETFILE? :)
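Reading the directory and loading one file could be sketched in C as follows. It assumes a hypothetical layout (a 16-byte zero-padded name, then filepos and filesize as little-endian dwords, with the entry count as the final dword). `kad_read_dir` and `kad_getfile` are hypothetical stand-ins for the author's GETFILE, not its actual interface.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define KAD_NAME_LEN 16                    /* hypothetical name field */
#define KAD_ENTRY_LEN (KAD_NAME_LEN + 8)   /* name + filepos + filesize */

struct kad_entry {
    char name[KAD_NAME_LEN + 1];           /* +1 keeps it zero-terminated */
    uint32_t filepos;
    uint32_t filesize;
};

static uint32_t get32(const unsigned char *p)   /* little-endian dword */
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read the directory from the end of an open KAD (or EXE + KAD) file.
   Returns a malloc'ed array of *count entries, or NULL on error. */
struct kad_entry *kad_read_dir(FILE *f, uint32_t *count)
{
    unsigned char buf[KAD_ENTRY_LEN];
    struct kad_entry *dir;
    uint32_t i, n;

    if (fseek(f, -4L, SEEK_END) != 0 || fread(buf, 1, 4, f) != 4)
        return NULL;
    n = get32(buf);                              /* number of entries */
    /* entries * entry size + 4 bytes for the count, back from the end */
    if (fseek(f, -(long)(n * KAD_ENTRY_LEN + 4), SEEK_END) != 0)
        return NULL;
    dir = calloc(n > 0 ? n : 1, sizeof *dir);
    if (dir == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        if (fread(buf, 1, KAD_ENTRY_LEN, f) != KAD_ENTRY_LEN) {
            free(dir);
            return NULL;
        }
        memcpy(dir[i].name, buf, KAD_NAME_LEN);
        dir[i].filepos = get32(buf + KAD_NAME_LEN);
        dir[i].filesize = get32(buf + KAD_NAME_LEN + 4);
    }
    *count = n;
    return dir;
}

/* Hypothetical GETFILE: find a file by name and load it into memory. */
void *kad_getfile(FILE *f, const struct kad_entry *dir, uint32_t n,
                  const char *name, uint32_t *size)
{
    uint32_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(dir[i].name, name) == 0) {
            void *data = malloc(dir[i].filesize > 0 ? dir[i].filesize : 1);
            if (data == NULL ||
                fseek(f, (long)dir[i].filepos, SEEK_SET) != 0 ||
                fread(data, 1, dir[i].filesize, f) != dir[i].filesize) {
                free(data);
                return NULL;
            }
            *size = dir[i].filesize;
            return data;
        }
    }
    return NULL;   /* not in the directory */
}
```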
While you are developing your program, use the KAD file. Once your code is
done, you're ready for the final step. Instead of reading from the KAD file,
change the file name your program opens to the name of the program itself.
Then repack the files onto the end of your EXE.
see packer.c
Open EXE file
seek to the end of the EXE file
repeat
save the output's file position in directory structure
save the input's file name, ignoring path, in directory structure
save the input's file size in directory structure
open input file
copy input file to output file
close input file
until all files are appended
save directory info
close file
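A sketch of appending to the EXE itself, assuming a hypothetical 16-byte zero-padded name field and little-endian dword filepos/filesize fields (`kad_append_exe` is an illustrative name). The only real difference from building a standalone KAD is that the EXE is opened for update and positioned at its end first, so every saved filepos is an absolute offset inside the EXE, and a directory reader works unchanged when the program opens its own file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define KAD_NAME_LEN 16   /* hypothetical fixed-size, zero-padded name */

static void put32(FILE *f, uint32_t v)
{
    int i;
    for (i = 0; i < 4; i++)
        fputc((int)((v >> (8 * i)) & 0xFF), f);   /* little-endian */
}

/* Append n files plus a KAD directory to the end of an existing EXE. */
int kad_append_exe(const char *exe_name, const char *const in_names[], int n)
{
    FILE *exe = fopen(exe_name, "r+b");           /* keep existing contents */
    uint32_t *pos = calloc(n > 0 ? (size_t)n : 1, sizeof *pos);
    uint32_t *len = calloc(n > 0 ? (size_t)n : 1, sizeof *len);
    int i, c, ok = exe != NULL && pos != NULL && len != NULL;

    if (ok && fseek(exe, 0L, SEEK_END) != 0)      /* seek to the end */
        ok = 0;
    for (i = 0; ok && i < n; i++) {
        FILE *in = fopen(in_names[i], "rb");
        if (in == NULL) {
            ok = 0;
            break;
        }
        pos[i] = (uint32_t)ftell(exe);            /* absolute offset in EXE */
        while ((c = fgetc(in)) != EOF) {
            fputc(c, exe);
            len[i]++;
        }
        fclose(in);
    }
    for (i = 0; ok && i < n; i++) {               /* save directory info */
        char name[KAD_NAME_LEN] = {0};
        const char *base = in_names[i], *s;
        for (s = in_names[i]; *s; s++)            /* ignore any path */
            if (*s == '/' || *s == '\\' || *s == ':')
                base = s + 1;
        strncpy(name, base, KAD_NAME_LEN - 1);
        fwrite(name, 1, KAD_NAME_LEN, exe);
        put32(exe, pos[i]);
        put32(exe, len[i]);
    }
    if (ok)
        put32(exe, (uint32_t)n);                  /* number of entries */
    if (exe != NULL && fclose(exe) != 0)
        ok = 0;
    free(pos);
    free(len);
    return ok ? 0 : -1;
}
```

At run time, a DOS program would then typically open itself through argv[0], which C runtimes under DOS 3.0 and later usually fill with the program's full path.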
That's it! YOU'RE DONE!
Keep in mind that this is NOT the only way to accomplish this. I have
included a fully functional KAD system implemented for Watcom C. It includes
an LZARI decompression routine. If this file has helped you, let me know. Feel
free to use the code included, but if you do, greet me. A postcard would be
nice too. :)
NOTE: I have heard it said "you can't do this when using an EXE compression
loader like Pklite." I have one thing to say......BULL! The trick is to
compress your EXE prior to appending the KAD to it.
Coded by Kodiak of The Apollo Project
AKA Charles Jones
1122 s 32nd St #2
Omaha, NE 68105
(402)-346-8974
Email: CAD@UnOmaha.edu
IRC : #Coders (lo *, Bri_acid: I still want to be on OPPER's list)

COM Format
Intel byte order
Information from File Format List 2.0 by Max Maischein.
--------!-CONTACT_INFO----------------------
If you notice any mistakes or omissions, please let me know! It is only
with YOUR help that the list can continue to grow. Please send
all changes to me rather than distributing a modified version of the list.
This file has been authored in the style of the INTERxxy.* file list
by Ralf Brown, and uses almost the same format.
Please read the file FILEFMTS.1ST before asking me any questions. You may find
that they have already been addressed.
Max Maischein
Max Maischein, 2:244/1106.17
Max_Maischein@spam.fido.de
corion@informatik.uni-frankfurt.de
Corion on #coders@IRC
--------!-DISCLAIMER------------------------
DISCLAIMER: THIS MATERIAL IS PROVIDED "AS IS". I verify the information
contained in this list to the best of my ability, but I cannot be held
responsible for any problems caused by use or misuse of the information,
especially for those file formats foreign to the PC, like AMIGA or SUN file
formats. If a piece of information is marked "guesswork" or undocumented, you
should check it carefully to make sure your program will not break with
an unexpected value (and please let me know whether or not it works
the same way).
Information marked with "???" is known to be incomplete or guesswork.
Some file formats were not released by their creators, others are regarded
as proprietary, which means that if your programs deal with them, you might
be looking for trouble. I don't care about this.
--------------------------------------------
The COM files are raw binary executables and are a leftover from the old CP/M
machines with 64K RAM. A COM program must be smaller than one segment (64K),
including code and static data, since no fixups for segment relocation or
anything else are included. One method to check for a COM file is to check
whether the first byte in the file could be a valid jump or call opcode, but
this is a very weak test since a COM file is not required to start with a jump
or a call. In principle, a COM file is just loaded at offset 100h in the segment
and then executed.
OFFSET Count TYPE Description
0000h 1 byte ID=0E9h (near JMP)
ID=0EBh (short JMP)
These are not safe ways to determine whether a
file is a COM file or not, but most COM files
start with a jump.
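The weak first-byte test described above might be coded like this. It is a heuristic sketch: `looks_like_com` is a hypothetical helper, and the size limit assumes the whole image must fit between offset 100h and the end of a 64K segment.

```c
/* A COM image is loaded at offset 100h of a single 64K segment, so at
   most 10000h - 100h = 0FF00h bytes of it can fit. */
#define COM_MAX_SIZE 0xFF00L

/* Weak heuristic: returns 1 if the data looks like a COM program. */
int looks_like_com(const unsigned char *data, long size)
{
    if (size < 1 || size > COM_MAX_SIZE)
        return 0;                       /* empty, or too big for a segment */
    if (size >= 2 && ((data[0] == 'M' && data[1] == 'Z') ||
                      (data[0] == 'Z' && data[1] == 'M')))
        return 0;                       /* DOS would load this as an EXE */
    return data[0] == 0xE9 || data[0] == 0xEB;  /* near or short JMP */
}
```

A return value of 0 does not prove a file is not a COM program; it only means this particular guess failed.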
Further information not available.
EXTENSION:COM
OCCURENCES:PC
SEE ALSO:EXE,MZ EXE,NE EXE

Executable-File Header Format (3.1)
An executable (.EXE) file for the Windows operating system contains a
combination of code and data or a combination of code, data, and resources. The
executable file also contains two headers: an MS-DOS header and a Windows
header. The next two sections describe these headers; the third section
describes the code and data contained in a Windows executable file.
MS-DOS Header
The MS-DOS (old-style) executable-file header contains four distinct parts: a
collection of header information (such as the signature word, the file size,
and so on), a reserved section, a pointer to a Windows header (if one exists),
and a stub program. The following illustration shows the MS-DOS executable-file
header:
[Illustration: MS-DOS executable-file header]
If the word value at offset 18h is 40h or greater, the word value at
3Ch is typically an offset to a Windows header. Applications must verify this
for each executable-file header being tested, because a few applications have a
different header style. MS-DOS uses the stub program to display a message if
Windows has not been loaded when the user attempts to run a program.
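The check described above (the word at 18h, then the word at 3Ch, then the "NE" signature) as a C sketch over a buffer holding the start of the file; `find_ne_header` is a hypothetical helper name.

```c
#include <stddef.h>

static unsigned le16(const unsigned char *p)   /* little-endian word */
{
    return p[0] | (p[1] << 8);
}

/* Return the file offset of the Windows (NE) header, or 0 if there is
   none. buf holds the first len bytes of the file. */
size_t find_ne_header(const unsigned char *buf, size_t len)
{
    size_t ne;

    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;                        /* no MS-DOS header */
    if (le16(buf + 0x18) < 0x40)
        return 0;                        /* old-style: no pointer at 3Ch */
    ne = le16(buf + 0x3C);               /* offset to the Windows header */
    if (ne + 2 > len || buf[ne] != 'N' || buf[ne + 1] != 'E')
        return 0;                        /* a different header style */
    return ne;
}
```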
Windows Header
The Windows (new-style) executable-file header contains information that the
loader requires for segmented executable files. This information includes the
linker version number, data specified by the linker, data specified by the
resource compiler, tables of segment data, tables of resource data, and so on.
The following illustration shows the Windows executable-file header:
[Illustration: Windows executable-file header]
The following sections describe the entries in the Windows executable-file
header.
Information Block
The information block in the Windows header contains the linker version number,
the lengths of various tables that further describe the executable file, the
offsets from the beginning of the header to the beginning of these tables, the
heap and stack sizes, and so on. The following list summarizes the contents of
the header information block (the locations are relative to the beginning of
the block):
Loc Description
00h The signature word. The low byte contains "N" (4Eh) and the high byte
contains "E" (45h).
02h The linker version number.
03h The linker revision number.
04h The offset to the entry table (relative to the beginning of the
header).
06h The length of the entry table, in bytes.
08h Reserved.
0Ch Flags that describe the contents of the executable file. This value can
be one or more of the following bits:
Bit Meaning
0 The linker sets this bit if the executable-file format is
SINGLEDATA. An executable file with this format contains one
data segment. This bit is set if the file is a dynamic-link
library (DLL).
1 The linker sets this bit if the executable-file format is
MULTIPLEDATA. An executable file with this format contains
multiple data segments. This bit is set if the file is a
Windows application.
If neither bit 0 nor bit 1 is set, the executable-file format
is NOAUTODATA. An executable file with this format does not
contain an automatic data segment.
2 Reserved.
3 Reserved.
8 Reserved.
9 Reserved.
11 If this bit is set, the first segment in the executable file
contains code that loads the application.
13 If this bit is set, the linker detects errors at link time but
still creates an executable file.
14 Reserved.
15 If this bit is set, the executable file is a library module.
If bit 15 is set, the CS:IP registers point to an
initialization procedure called with the value in the AX
register equal to the module handle. The initialization
procedure must execute a far return to the caller. If the
procedure is successful, the value in AX is nonzero. Otherwise,
the value in AX is zero. The value in the DS register is set to
the library's data segment if SINGLEDATA is set. Otherwise, DS
is set to the data segment of the application that loads the
library.
0Eh The automatic data segment number. (0Eh is zero if the SINGLEDATA and
MULTIPLEDATA bits are cleared.)
10h The initial size, in bytes, of the local heap. This value is zero if
there is no local allocation.
12h The initial size, in bytes, of the stack. This value is zero if the SS
register value does not equal the DS register value.
14h The segment:offset value of CS:IP.
18h The segment:offset value of SS:SP.
The value specified in SS is an index to the module's segment table.
The first entry in the segment table corresponds to segment number 1.
If SS addresses the automatic data segment and SP is zero, SP is set to
the address obtained by adding the size of the automatic data segment
to the size of the stack.
1Ch The number of entries in the segment table.
1Eh The number of entries in the module-reference table.
20h The number of bytes in the nonresident-name table.
22h A relative offset from the beginning of the Windows header to the
beginning of the segment table.
24h A relative offset from the beginning of the Windows header to the
beginning of the resource table.
26h A relative offset from the beginning of the Windows header to the
beginning of the resident-name table.
28h A relative offset from the beginning of the Windows header to the
beginning of the module-reference table.
2Ah A relative offset from the beginning of the Windows header to the
beginning of the imported-name table.
2Ch A relative offset from the beginning of the file to the beginning of
the nonresident-name table.
30h The number of movable entry points.
32h A shift count that is used to align the logical sector. This count is
log2 of the segment sector size. It is typically 4, although the
default count is 9. (This value corresponds to the /alignment [/a]
linker switch. When the linker command line contains /a:16, the shift
count is 4. When the linker command line contains /a:512, the shift
count is 9.)
34h The number of resource segments.
36h The target operating system, depending on which bits are set:
Bit Meaning
0 Operating system format is unknown.
1 Reserved.
2 Operating system is Microsoft Windows.
3 Reserved.
4 Reserved.
37h Additional information about the executable file. It can be one or more
of the following values:
Bit Meaning
1 If this bit is set, the executable file contains a Windows 2.x
application that runs in version 3.x protected mode.
2 If this bit is set, the executable file contains a Windows 2.x
application that supports proportional fonts.
3 If this bit is set, the executable file contains a fast-load
area.
38h The offset, in sectors, to the beginning of the fast-load area. (Only
Windows uses this value.)
3Ah The length, in sectors, of the fast-load area. (Only Windows uses this
value.)
3Ch Reserved.
3Eh The expected version number for Windows. (Only Windows uses this
value.)
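A sketch of pulling a few information-block fields out of a buffer and turning the table offsets into absolute file offsets. Note the asymmetry the list above describes: most table offsets are relative to the Windows header, but the nonresident-name table offset at 2Ch is relative to the beginning of the file (read here as a 32-bit value, an assumption based on the next listed field starting at 30h). The struct and function names are hypothetical.

```c
#include <stdint.h>

struct ne_info {
    unsigned linker_ver, linker_rev;   /* 02h, 03h */
    unsigned flags;                    /* 0Ch; bit 15 = library module */
    unsigned segment_count;            /* 1Ch */
    unsigned align_shift;              /* 32h */
    uint32_t segment_table;            /* absolute file offsets */
    uint32_t resource_table;
    uint32_t resident_names;
    uint32_t nonresident_names;
};

static unsigned rd16(const unsigned char *p) { return p[0] | (p[1] << 8); }

static uint32_t rd32(const unsigned char *p)
{
    return rd16(p) | ((uint32_t)rd16(p + 2) << 16);
}

/* hdr points at the "NE" signature; ne_off is its offset in the file. */
void ne_parse_info(const unsigned char *hdr, uint32_t ne_off,
                   struct ne_info *ni)
{
    ni->linker_ver = hdr[0x02];
    ni->linker_rev = hdr[0x03];
    ni->flags = rd16(hdr + 0x0C);
    ni->segment_count = rd16(hdr + 0x1C);
    ni->align_shift = rd16(hdr + 0x32);
    /* These offsets are relative to the Windows header... */
    ni->segment_table = ne_off + rd16(hdr + 0x22);
    ni->resource_table = ne_off + rd16(hdr + 0x24);
    ni->resident_names = ne_off + rd16(hdr + 0x26);
    /* ...but this one is relative to the beginning of the file. */
    ni->nonresident_names = rd32(hdr + 0x2C);
}
```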
Segment Table
The segment table contains information that describes each segment in an
executable file. This information includes the segment length, segment type,
and segment-relocation data. The following list summarizes the values found in
the segment table (the locations are relative to the beginning of each entry):
Loc Description
00h The offset, in sectors, to the segment data (relative to the beginning
of the file). A value of zero means no data exists.
02h The length, in bytes, of the segment, in the file. A value of zero
indicates that the segment length is 64K, unless the selector offset is
also zero.
04h Flags that describe the contents of the segment. This value can
be one or more of the following:
Bit Meaning
0 If this bit is set, the segment is a data segment. Otherwise,
the segment is a code segment.
1 If this bit is set, the loader has allocated memory for the
segment.
2 If this bit is set, the segment is loaded.
3 Reserved.
4 If this bit is set, the segment type is MOVABLE. Otherwise, the
segment type is FIXED.
5 If this bit is set, the segment type is PURE or SHAREABLE.
Otherwise, the segment type is IMPURE or NONSHAREABLE.
6 If this bit is set, the segment type is PRELOAD. Otherwise, the
segment type is LOADONCALL.
7 If this bit is set and the segment is a code segment, the
segment type is EXECUTEONLY. If this bit is set and the segment
is a data segment, the segment type is READONLY.
8 If this bit is set, the segment contains relocation data.
9 Reserved.
10 Reserved.
11 Reserved.
12 If this bit is set, the segment is discardable.
13 Reserved.
14 Reserved.
15 Reserved.
06h The minimum allocation size of the segment, in bytes. A value of zero
indicates that the minimum allocation size is 64K.
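Decoding one segment-table entry, as a sketch: the sector offset is scaled by the alignment shift count at 32h of the information block, and zero lengths are expanded to 64K as described. Entries are assumed to be 8 bytes (the four words at 00h through 06h); all names are hypothetical.

```c
#include <stdint.h>

struct ne_segment {
    uint32_t file_offset;   /* 0 means no data in the file */
    uint32_t length;        /* bytes in the file */
    uint32_t min_alloc;     /* bytes */
    unsigned flags;
};

static unsigned rd16(const unsigned char *p) { return p[0] | (p[1] << 8); }

/* Decode one 8-byte entry; shift is the count from offset 32h. */
void ne_decode_segment(const unsigned char *e, unsigned shift,
                       struct ne_segment *s)
{
    unsigned sector = rd16(e + 0x00);
    unsigned len = rd16(e + 0x02);
    unsigned alloc = rd16(e + 0x06);

    s->file_offset = (uint32_t)sector << shift;      /* sectors -> bytes */
    /* A length of 0 means 64K, unless the sector offset is also 0. */
    s->length = (len == 0 && sector != 0) ? 0x10000UL : len;
    s->min_alloc = alloc == 0 ? 0x10000UL : alloc;   /* 0 means 64K */
    s->flags = rd16(e + 0x04);
}

int ne_segment_is_data(const struct ne_segment *s)    { return (s->flags & 0x0001) != 0; }
int ne_segment_is_movable(const struct ne_segment *s) { return (s->flags & 0x0010) != 0; }
int ne_segment_has_relocs(const struct ne_segment *s) { return (s->flags & 0x0100) != 0; }
```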
Resource Table
The resource table describes and identifies the location of each resource in
the executable file.
Following are the members in the resource table:
rscAlignShift The alignment shift count for resource data. When the shift
count is used as an exponent of 2, the resulting value
specifies the factor, in bytes, for computing the location of a
resource in the executable file.
rscTypes An array of TTYPEINFO structures containing information about
resource types. There must be one TTYPEINFO structure for each
type of resource in the executable file.
rscEndTypes The end of the resource type definitions. This member must be
zero.
rscResourceNames The names (if any) associated with the resources in this
table. Each name is stored as consecutive bytes; the first
byte specifies the number of characters in the name.
rscEndNames The end of the resource names and the end of the resource
table. This member must be zero.
Type Information
Following are the members in the TTYPEINFO structure:
rtTypeID The type identifier of the resource. This integer value is
either a resource-type value or an offset to a resource-type
name. If the high bit in this member is set (0x8000), the value
is one of the following resource-type values:
Value Resource type
RT_ACCELERATOR Accelerator table
RT_BITMAP Bitmap
RT_CURSOR Cursor
RT_DIALOG Dialog box
RT_FONT Font component
RT_FONTDIR Font directory
RT_GROUP_CURSOR Cursor directory
RT_GROUP_ICON Icon directory
RT_ICON Icon
RT_MENU Menu
RT_RCDATA Resource data
RT_STRING String table
If the high bit of the value in this member is not set, the value represents an
offset, in bytes relative to the beginning of the resource table, to a name in
the rscResourceNames member.
rtResourceCount The number of resources of this type in the executable file.
rtReserved Reserved.
rtNameInfo An array of TNAMEINFO structures containing information about
individual resources. The rtResourceCount member specifies the
number of structures in the array.
Name Information
Following are the members in the TNAMEINFO structure:
rnOffset An offset to the contents of the resource data (relative to the
beginning of the file). The offset is in terms of alignment
units specified by the rscAlignShift member at the beginning of
the resource table.
rnLength The resource length, in bytes.
rnFlags Whether the resource is fixed, preloaded, or shareable. This
member can be one or more of the following values:
Value Meaning
0x0010 Resource is movable (MOVEABLE). Otherwise, it is fixed.
0x0020 Resource can be shared (PURE).
0x0040 Resource is preloaded (PRELOAD). Otherwise, it is
loaded on demand.
rnID Specifies or points to the resource identifier. If the
identifier is an integer, the high bit is set (8000h).
Otherwise, it is an offset to a resource string, relative to
the beginning of the resource table.
rnHandle Reserved.
rnUsage Reserved.
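A sketch of locating a resource by integer type. The field widths (an 8-byte TTYPEINFO header holding rtTypeID, rtResourceCount and a 4-byte rtReserved, followed by 12-byte TNAMEINFO entries of six words) come from the Windows 3.x SDK structures rather than from the text above, so treat them as an assumption. `ne_find_resource` is a hypothetical helper, and only integer type IDs (high bit set) are handled.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned rd16(const unsigned char *p) { return p[0] | (p[1] << 8); }

/* Return the absolute file offset of the first resource whose integer
   type ID is type_id (e.g. 2 for RT_BITMAP), or 0 if there is none.
   rt points at the start of the resource table; len is its size. */
uint32_t ne_find_resource(const unsigned char *rt, size_t len,
                          unsigned type_id)
{
    unsigned shift = rd16(rt);                 /* rscAlignShift */
    size_t p = 2;                              /* first TTYPEINFO */

    while (p + 8 <= len) {
        unsigned id = rd16(rt + p);            /* rtTypeID */
        unsigned count = rd16(rt + p + 2);     /* rtResourceCount */
        if (id == 0)
            break;                             /* rscEndTypes */
        if ((id & 0x8000) && (id & 0x7FFF) == type_id && count > 0 &&
            p + 8 + 12 <= len)
            /* rnOffset of the first TNAMEINFO, in alignment units */
            return (uint32_t)rd16(rt + p + 8) << shift;
        p += 8 + (size_t)count * 12;           /* skip to the next type */
    }
    return 0;
}
```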
Resident-Name Table
The resident-name table contains strings that identify exported functions in
the executable file. As the name implies, these strings are resident in system
memory and are never discarded. The resident-name strings are case-sensitive
and are not null-terminated. The following list summarizes the values found in
the resident-name table (the locations are relative to the beginning of each
entry):
Location Description
00h The length of a string. If there are no more strings in the
table, this value is zero.
01h - xxh The resident-name text. This string is case-sensitive and is
not null-terminated.
xxh + 01h An ordinal number that identifies the string. This number is an
index into the entry table.
The first string in the resident-name table is the module name.
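The resident-name and nonresident-name tables share the same length-prefixed layout, so one lookup routine can serve both. A sketch, assuming the ordinal that follows each string is a 16-bit word; `ne_name_to_ordinal` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

static unsigned rd16(const unsigned char *p) { return p[0] | (p[1] << 8); }

/* Look up a name (case-sensitive, as the strings are) and return its
   ordinal, or -1 if it is not in the table. Layout per entry: length
   byte, text without a terminating zero, then the ordinal word; a
   length byte of 0 ends the table. */
long ne_name_to_ordinal(const unsigned char *tab, size_t len,
                        const char *name)
{
    size_t p = 0, n = strlen(name);

    while (p < len && tab[p] != 0) {
        size_t slen = tab[p];
        if (p + 1 + slen + 2 > len)
            break;                              /* truncated table */
        if (slen == n && memcmp(tab + p + 1, name, n) == 0)
            return (long)rd16(tab + p + 1 + slen);
        p += 1 + slen + 2;                      /* next entry */
    }
    return -1;
}
```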
Module-Reference Table
The module-reference table contains offsets for module names stored in the
imported-name table. Each entry in this table is 2 bytes long.
Imported-Name Table
The imported-name table contains the names of modules that the executable file
imports. Each entry contains two parts: a single byte that specifies the length
of the string and the string itself. The strings in this table are not
null-terminated.
Entry Table
The entry table contains bundles of entry points from the executable file (the
linker generates each bundle). The numbering system for these ordinal values is
1-based--that is, the ordinal value corresponding to the first entry point is
1. The linker generates the densest possible bundles under the restriction that
it cannot reorder the entry points. This restriction is necessary because other
executable files may refer to entry points within a given bundle by their
ordinal values. The entry-table data is organized by bundle, each of which
begins with a 2-byte header. The first byte of the header specifies the number
of entries in the bundle (a value of 00h designates the end of the table). The
second byte specifies whether the corresponding segment is movable or fixed. If
the value in this byte is 0FFh, the segment is movable. If the value in this
byte is 0FEh, the entry does not refer to a segment but refers, instead, to a
constant defined within the module. If the value in this byte is neither 0FFh
nor 0FEh, it is a segment index.
For movable segments, each entry consists of 6 bytes and has the following
form:
Loc Description
00h Specifies a byte value. This value can be a combination of the
following bits:
Bit(s) Meaning
0 If this bit is set, the entry is exported.
1 If this bit is set, the segment uses a global (shared) data
segment.
3-7 If the executable file contains code that performs ring
transitions, these bits specify the number of words that
compose the stack. At the time of the ring transition, these
words must be copied from one ring to the other.
01h An int 3fh instruction.
03h The segment number.
04h The segment offset.
For fixed segments, each entry consists of 3 bytes and has the following form:
Loc Description
00h Specifies a byte value. This value can be a combination of the
following bits:
Bit(s) Meaning
0 If this bit is set, the entry is exported.
1 If this bit is set, the entry uses a global (shared) data
segment. (This may be set only for SINGLEDATA library modules.)
3-7 If the executable file contains code that performs ring
transitions, these bits specify the number of words that
compose the stack. At the time of the ring transition, these
words must be copied from one ring to the other.
01h Specifies an offset.
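Resolving an ordinal through the entry-table bundles, as a sketch. Beyond what the text states, it assumes that an indicator byte of 00h marks a bundle of unused ordinals with no entry data, and that 0FEh (constant) entries use the same 3-byte form as fixed-segment entries. `ne_entry_lookup` is a hypothetical name.

```c
#include <stddef.h>

static unsigned rd16(const unsigned char *p) { return p[0] | (p[1] << 8); }

/* Resolve a 1-based ordinal to its segment and offset.
   Returns 0 on success, -1 if the ordinal is not present.
   *segment is 0FEh for a constant entry (the offset is then its value). */
int ne_entry_lookup(const unsigned char *et, size_t len, unsigned ordinal,
                    unsigned *segment, unsigned *offset)
{
    size_t p = 0;
    unsigned next = 1;                      /* ordinal of the next entry */

    if (ordinal == 0)
        return -1;
    while (p + 2 <= len && et[p] != 0) {    /* count 00h ends the table */
        unsigned count = et[p], ind = et[p + 1];
        size_t size = ind == 0xFF ? 6 : ind == 0x00 ? 0 : 3;

        p += 2;                             /* skip the bundle header */
        if (ordinal < next + count) {       /* ordinal is in this bundle */
            size_t q = p + (ordinal - next) * size;
            if (ind == 0x00 || q + size > len)
                return -1;                  /* unused entry or truncated */
            if (ind == 0xFF) {              /* flags, int 3Fh, seg, offset */
                *segment = et[q + 3];
                *offset = rd16(et + q + 4);
            } else {                        /* flags, offset */
                *segment = ind;
                *offset = rd16(et + q + 1);
            }
            return 0;
        }
        p += count * size;
        next += count;
    }
    return -1;
}
```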
Nonresident-Name Table
The nonresident-name table contains strings that identify exported functions in
the executable file. As the name implies, these strings are not always resident
in system memory and are discardable. The nonresident-name strings are
case-sensitive; they are not null-terminated. The following list summarizes the
values found in the nonresident-name table (the specified locations are
relative to the beginning of each entry):
Location Description
00h The length, in bytes, of a string. If this byte is 00h, there
are no more strings in the table.
01h - xxh The nonresident-name text. This string is case-sensitive and is
not null-terminated.
xxh + 01h An ordinal number that is an index to the entry table.
The first name that appears in the nonresident-name table is the module
description string (which was specified in the module-definition file).
Code Segments and Relocation Data
Code and data segments follow the Windows header. Some of the code segments may
contain calls to functions in other segments and may, therefore, require
relocation data to resolve those references. This relocation data is stored in
a relocation table that appears immediately after the code or data in the
segment. The first 2 bytes in this table specify the number of relocation items
the table contains. A relocation item is a collection of bytes specifying the
following information:
- Address type (segment only, offset only, segment and offset)
- Relocation type (internal reference, imported ordinal, imported name)
- Segment number or ordinal identifier (for internal references)
- Reference-table index or function ordinal number (for imported ordinals)
- Reference-table index or name-table offset (for imported names)
Each relocation item contains 8 bytes of data, the first byte of which
specifies one of the following relocation-address types:
Value Meaning
0 Low byte at the specified offset
2 16-bit selector
3 32-bit pointer
5 16-bit offset
11 48-bit pointer
13 32-bit offset
The second byte specifies one of the following relocation types:
Value Meaning
0 Internal reference
1 Imported ordinal
2 Imported name
3 OSFIXUP
The third and fourth bytes specify the offset of the relocation item within the
segment.
If the relocation type is imported ordinal, the fifth and sixth bytes specify
an index to a module's reference table and the seventh and eighth bytes specify
a function ordinal value.
If the relocation type is imported name, the fifth and sixth bytes specify an
index to a module's reference table and the seventh and eighth bytes specify an
offset to an imported-name table.
If the relocation type is internal reference and the segment is fixed, the
fifth byte specifies the segment number, the sixth byte is zero, and the
seventh and eighth bytes specify an offset to the segment. If the relocation
type is internal reference and the segment is movable, the fifth byte is
0FFh, the sixth byte is zero, and the seventh and eighth bytes specify an
ordinal value found in the segment's entry table.
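Splitting an 8-byte relocation item into its fields, as a sketch; how the last two words are interpreted depends on the relocation type, as listed above. The struct and function names are hypothetical.

```c
struct ne_reloc {
    unsigned addr_type;   /* byte 0: 0, 2, 3, 5, 11 or 13 */
    unsigned reloc_type;  /* byte 1: 0 internal, 1 ordinal, 2 name, 3 OSFIXUP */
    unsigned offset;      /* bytes 2-3: location of the item in the segment */
    unsigned a, b;        /* bytes 4-5 and 6-7: meaning depends on the type */
};

static unsigned rd16(const unsigned char *p) { return p[0] | (p[1] << 8); }

void ne_decode_reloc(const unsigned char *r, struct ne_reloc *x)
{
    x->addr_type = r[0];
    x->reloc_type = r[1];
    x->offset = rd16(r + 2);
    x->a = rd16(r + 4);
    x->b = rd16(r + 6);
}

/* Classify an item following the rules in the text. */
const char *ne_reloc_kind(const struct ne_reloc *x)
{
    switch (x->reloc_type) {
    case 0:  /* fifth byte 0FFh: movable, b is an entry-table ordinal */
        return (x->a & 0xFF) == 0xFF ? "internal, movable" : "internal, fixed";
    case 1:  return "imported ordinal";  /* a: module ref index, b: ordinal */
    case 2:  return "imported name";     /* a: module ref index, b: name offset */
    case 3:  return "OSFIXUP";
    default: return "unknown";
    }
}
```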

Advanced Linking Techniques
Part 1

How to put everything into
- ONE BIG EXE FILE -
In the good old days most programs consisted of many files. That was a
simple and convenient way to store the necessary data. But time moved on
quickly and a new trend appeared: the single-EXE method. This is more
difficult to deal with (from the coder's point of view), but it's also more
elegant. So this article discusses some system coding that is important in a
demo, but quite invisible.
I. Capabilities of EXE files
II. Link data to the executable file
III. Overlays
IV. Link EXEs together (chaining)
V. Virtual file systems
I. Concerning EXE files
-----------------------
An EXE file consists of three parts: header, body and overlay. The header
contains info about the body (the executable part). The rest of the file
is the overlay: it can be any data copied to the end of the body, for
example debug info. The body is 512-byte aligned in the file by default,
but it may be put on a 16-byte boundary to save some space. (Actually,
loading 512-aligned executable parts is faster in DOS.)
!USEFUL! For source-level debugging, assemble with the /zi switch and
link with /v. Then in Turbo Debugger select View/module and you're
debugging in your source code. You may also take a look at the end of the
file to see what the debug info looks like. (Extremely interesting ;-) Plus
one thing: Pklite doesn't kill overlays, so compressed files remain
wonderfully debuggable!
I won't discuss the structure of the EXE header in detail; its description
can be found in many places (DosRef, IntrList, TechHelp). I'll just point out
some funny things.
Some facts about the EXE header
-------------------------------
Signature: Can be either 'MZ' or 'ZM'. If a file to be executed starts with
'MZ' or 'ZM', it will be treated as an EXE file, otherwise as a .COM file.
(The extension (EXE/COM) doesn't matter at all.)
Partial Page: Equals (length of the header + length of the executable part)
mod 512. (In the following I'll refer to 'length of the file without
overlays' as 'EXE Length', so this field is EXE Length mod 512.)
PageCounter : This is NOT EXE Length div 512 and also NOT EXE Length div
512 + 1, as some docs claim. It's exactly EXE Length / 512 rounded up
(UpRound). In practice, if PartialPage = 0 it's EXE Length div 512,
otherwise EXE Length div 512 + 1.
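The PartialPage/PageCounter arithmetic in both directions, as a small C sketch (the function names are hypothetical):

```c
#include <stdint.h>

/* EXE Length = header + executable part, without overlays. */
void exe_length_to_pages(uint32_t exe_len, unsigned *partial_page,
                         unsigned *page_counter)
{
    *partial_page = (unsigned)(exe_len % 512);
    *page_counter = (unsigned)((exe_len + 511) / 512);   /* UpRound */
}

/* The inverse: recover EXE Length from the two header fields
   (assumes a valid header, i.e. page_counter > 0). */
uint32_t pages_to_exe_length(unsigned partial_page, unsigned page_counter)
{
    if (partial_page == 0)
        return (uint32_t)page_counter * 512;
    return (uint32_t)(page_counter - 1) * 512 + partial_page;
}
```

For example, an EXE Length of 1000 bytes gives PartialPage = 488 and PageCounter = 2, while 1024 bytes gives 0 and 2.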
Checksum: Nobody cares about it.
Originally it was a pad word chosen so that the sum of all the words in the
EXE file would be 0. There's no info on what should happen if the file has an
odd length... So this word can be anything.
Start of the relocation table : Tlink sets it to 3eh, but it can be placed
elsewhere.
Overlay number : Another unused area. To save space, the relocation table
can start here. According to some documentation this field doesn't belong
to the EXE header. So the shortest EXE file in the world is 26 bytes
long and consists of only a header. Its entry point is the 'int 20h'
instruction in the PSP. Executable files under 26 bytes are all .COM files,
even if they start with 'MZ'...
And the shortest .COM file is a single 'retn' instruction ;-)
!TRICK! It's fun to add some text to the beginning of the EXE file, with
a message like "Ripping is lame!" or something... Here's the technique:
a postprocessor program moves the relocation table elsewhere and copies
the message right after the header.
A freshly compiled EXE file looks like this:
"MZ" <- signature
<header data>
"<22>0jr" <- Tasm crap
<relocation table (if any)>
<Numerous pad bytes>   <- Body is 512-byte aligned
<Body>
This can be modified/compressed into:
"MZ" <- Remains
<header data>          <- New reloc. table start!
"Ripping is lame!" <- Message
<relocation table>
<Max. 12 pad bytes>    <- Body will be paragraph aligned
<Body>
So if an inquisitive dude looks at the EXE file, he is immediately confronted
with the message :-)
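The header arithmetic behind this trick, as a hypothetical sketch: it assumes the message is copied right after the formatted header fields (which end at 1Ch), the relocation table (4 bytes per entry) follows the message, and the header is padded to a paragraph boundary. Because the header length changes, the PartialPage and PageCounter fields must be recomputed afterwards as well.

```c
/* Hypothetical postprocessor layout for a header with a message. */
struct msg_layout {
    unsigned reloc_start;     /* new value for the field at 18h */
    unsigned header_paras;    /* new header size, in paragraphs */
    unsigned pad_bytes;       /* zero bytes between relocs and body */
};

struct msg_layout layout_with_message(unsigned msg_len, unsigned n_relocs)
{
    struct msg_layout m;
    unsigned used = 0x1C + msg_len + 4 * n_relocs;   /* bytes in use */

    m.reloc_start = 0x1C + msg_len;        /* relocs follow the message */
    m.header_paras = (used + 15) / 16;     /* round up to a paragraph */
    m.pad_bytes = m.header_paras * 16 - used;
    return m;
}
```

With the 16-character "Ripping is lame!" and no relocations, the table would start at 2Ch and the header would take 3 paragraphs.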
Now some general things
-----------------------
- You may copy any data to your EXE file, e.g.
'copy /b demoEXE+piccy.raw demo2EXE'
This won't affect the execution of the EXE file. It's your problem how to
access the data... see chapter 3.
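One common way to find data appended with copy /b is to compute EXE Length from the PartialPage and PageCounter header fields: everything from that offset on is the overlay. A sketch (`exe_overlay_offset` is a hypothetical name); a program would typically call it on its own EXE file, for example opened via argv[0], then seek to the result and read.

```c
#include <stdio.h>

/* Return the file offset where the overlay (data appended after header
   + body) begins, or -1 if f does not start with a valid EXE header. */
long exe_overlay_offset(FILE *f)
{
    unsigned char h[6];
    unsigned partial, pages;

    if (fseek(f, 0L, SEEK_SET) != 0 || fread(h, 1, sizeof h, f) != sizeof h)
        return -1;
    if (!((h[0] == 'M' && h[1] == 'Z') || (h[0] == 'Z' && h[1] == 'M')))
        return -1;                          /* not an EXE file */
    partial = h[2] | (h[3] << 8);           /* EXE Length mod 512 */
    pages = h[4] | (h[5] << 8);             /* UpRound(EXE Length / 512) */
    if (pages == 0)
        return -1;
    if (partial == 0)
        return (long)pages * 512L;
    return (long)(pages - 1) * 512L + (long)partial;
}
```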
- .COM to EXE conversion : All you have to do is insert a 32-byte header
before the .COM file. The only interesting thing is how to calculate the
entry point. DOS loads the body to PSP+10h:0 and adds that segment to the
relative initial CS. The result will be the program's CS. The problem is that
in .COM files the initial CS equals the PSP's segment... So the Rel. init. CS
in this case must be fff0: added to PSP+10h, it wraps around to exactly the
PSP's segment ;-) Some programs don't recognize this technique (like F-Prot
and Hacker's View), but it works anyway. The appropriate header looks like
this (fields in file order, with their offsets):
00h "MZ" <- Usual signature
02h PartPage = (COM's length+32) mod 512
04h PageCnt = UpRound((COM's length+32) / 512)
06h Number of relocations=0 <- No relos
08h Size of header=2 <- (2 paragraphs)
0Ah Minimal Memory=(ffff - COM's length) / 16
0Ch Maximal Memory=ffff
0Eh Rel. init SS=fff0 <- Same as CS
10h Initial SP=fffe <- .COM property
12h Checksum <- Anything
14h Initial IP=100h <- .COM property
16h Rel. init CS=fff0 <- It will overflow
18h Start of relocation table <- Anything
1Ah Overlay number <- Anything
1Ch 4 pad bytes <- Anything
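Building this 32-byte header in C, as a sketch. Field offsets are the standard MZ header offsets; the checksum, overlay number and pad bytes are simply zeroed, and the relocation-table start (which can be anything here) is set to 1Ch. `com_to_exe_header` is a hypothetical name, and the COM image is assumed to be smaller than 0FF00h bytes.

```c
#include <string.h>

static void put16(unsigned char *p, unsigned v)   /* little-endian word */
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
}

/* Fill hdr[32] with a header for a COM image of com_len bytes. */
void com_to_exe_header(unsigned char hdr[32], unsigned long com_len)
{
    unsigned long total = com_len + 32;       /* header + body */

    memset(hdr, 0, 32);                       /* checksum, pads: anything */
    hdr[0] = 'M';
    hdr[1] = 'Z';
    put16(hdr + 0x02, (unsigned)(total % 512));            /* PartPage */
    put16(hdr + 0x04, (unsigned)((total + 511) / 512));    /* PageCnt */
    put16(hdr + 0x06, 0);                     /* no relocations */
    put16(hdr + 0x08, 2);                     /* header = 2 paragraphs */
    put16(hdr + 0x0A, (unsigned)((0xFFFFUL - com_len) / 16));  /* min */
    put16(hdr + 0x0C, 0xFFFF);                /* max memory */
    put16(hdr + 0x0E, 0xFFF0);                /* rel. SS: wraps to PSP */
    put16(hdr + 0x10, 0xFFFE);                /* SP, as in a COM */
    put16(hdr + 0x14, 0x0100);                /* IP = 100h */
    put16(hdr + 0x16, 0xFFF0);                /* rel. CS: wraps to PSP */
    put16(hdr + 0x18, 0x1C);                  /* reloc table start */
}
```

Writing these 32 bytes followed by the unchanged COM image produces the converted EXE.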
Now comes the most important topic of
this article:
How to kick out Windows from a demo nicely and intelligently
------------------------------------------------------------
The Windows EXE file format is a superset of the DOS EXE format. How does
Windows start executing a program?
1. It checks the 1st word of the file. If it's not 'MZ', it treats the file
as a DOS program.
2. If it's 'MZ', it gets a word from the file at offset 003Ch (let's call this
word the New EXE Header Offset (NEHO)), then checks the word at NEHO. If it's
'NE', then it's a Windows EXE file, otherwise not.
Also, every Windows EXE contains a little DOS EXE (called the STUB), which
runs when somebody tries to start the program from DOS. Usually it displays
a message like 'This program requires Microsoft Windows'. Gosh! Some evil
stubs even start Windows if they find it :-( What is our goal? We want a
program which runs perfectly in DOS, and under Windows shows a message
box: 'This program requires NO Microsoft Windows.', then kills Windoze,
executes itself under DOS and restarts Windoze. The main idea is that we
change the 'stub' program of a Windooz application.
Here's the C code of the Windows
program:
#include <windows.h>
int PASCAL WinMain(HINSTANCE hInstance,
HINSTANCE hPrevInstance,
LPSTR lpCmdLine,
int nCmdShow){
char MyName[128];
MessageBox(0,"This program "
"requires NO Microsoft Windows.",
"Windooz suks", 0);
/* Full path of this EXE; its DOS stub is the demo */
GetModuleFileName(hInstance, MyName,
128);
/* Win16: quit Windows, run MyName under DOS, then restart Windows */
ExitWindowsExec(MyName, lpCmdLine);
return 0;
}
In the module-definition (.DEF) file the 'STUB' entry must be changed from
winstub.exe to demo.exe :-)
One thing must be maintained: the relocation table of the 'stub' file
must start after the NEHO field, i.e. at 0040h or later. This is not a
problem for freshly assembled or PKLITEd files.
One problem is that the 'stub' proggy must be smaller than 64k. This is
enough for protecting intros - for big demos some postprocessing is required.
II. Link data to the executable file
------------------------------------
Here are a few tips on how to put data into the program at compile time and
link time. I assume full segment declarations are used, NOT the simplified
directives like .model and .data.
1. Include method
-----------------
Let's assume we want to insert a bunch of bytes into the program (a raw
picture, for example, 'piccy.bin'). First convert it to ASCII form:
BIN2ASM piccy.bin piccy.inc
Piccy.inc will be approximately 3-4 times larger than the binary file. Now
insert the following lines into the source code (e.g. demo.asm):
piccy label byte
include piccy.inc
Wow. We've done it. The data gets into the program at compile time. This
is the simplest and slowest way. Why recompile all the data
when only the code changes? And why keep the huge include file on an
expensive hard disk?
2. Link method
--------------
Now the data gets into the proggy at link time. We'll use object (.obj)
files, so first we have to make object files from the binary files.
There's a utility (binobj), but it's nearly useless here (it doesn't handle
segment names). For the moment we'll still have to use include files... Make
piccy.inc from the binary file, then create a source file named piccy.asm:
main segment use16
public piccy
piccy label byte
include piccy.inc
main ends
end
and compile it. (The segment name should match one of the main source
module's segment names.) Now let's have a look at the main module
(demo.asm):
o equ offset
main segment use16
extrn piccy:byte
;Anything can go here...
;For example (ES and CX set up elsewhere; movsd needs .386):
mov si, o piccy
xor di,di
rep movsd
main ends
And finally put the things together:
tasm demo /m9
tasm piccy
tlink demo piccy
Basically these are the steps of object-level linking. Some extensions:
- When you want to link independent segments (segments which don't
occur in the main module), segment names alone are enough. This may
be needed when big data arrays are in use, e.g. bitmaps, and it's
unnecessary to fool with identifier names like 'piccy'. In this case you
don't have to add the 'public' and 'extrn' directives, just declare the
segment:
piccy.asm:
picture segment use16
include piccy.inc
picture ends
end
demo.asm:
main segment
mov ax,picture
mov ds,ax
xor si,si
xor di,di
rep movsd
main ends
picture segment ;Just declare segment
picture ends
- Linking arrays larger than 64k
One way is to cut the data into 64k segments... but it's better to link
it in one step. Simply change piccy.asm:
.386
picture segment use32
include piccy.inc
picture ends
end
and the 'picture' segment can be referred to as a 'normal' segment in
real mode too. In this case, link with the /3 switch.
- Never forget to delete the temporary include files. They're kinda long.
- Use makefiles instead of batch files. Makefiles track time dependencies,
so only the parts modified since the last compilation are rebuilt
(working time can be heavily reduced). Imagine what would happen if the
include files were processed at every compilation :-(
If the makefile is named 'makefile', it's enough to type 'make' at the
command prompt, otherwise
'make -fdemo.mak'. If the dates of the source files are not correct,
use the 'touch' utility. It's also useful when you want to rebuild something
even if it wasn't changed.
3. Advantages and disadvantages
-------------------------------
Advantages:
- When compressing the EXE file, the linked data will be compressed too.
- Doesn't require postprocessing.
- Data is available when the program starts.
Disadvantages:
- The amount of linkable data is limited.
- The structure of the source code is more complex.
III. Overlays
-------------
The advantage of the previous method is that all the data you've linked is
available when the program starts - no need for additional reading from the
disk. The disadvantage is that the amount of linkable data is fairly
limited because of the lovely real mode 640k barrier. Overlays allow a
practically unlimited quantity of additional data. How do overlays work? As
I mentioned before, we can copy anything after an EXE file, and it won't be
loaded into memory. Reaching that data is our problem. Let's make an overlaid
EXE file:
'copy /b demo.exe+piccy.bin demo2.exe'
The method is very simple: demo2.exe opens itself, seeks to the
beginning of the overlay data, and reads it into memory. This requires
some extra bookkeeping: we have to know the length of the overlay file(s)
in advance. Of course demo.exe alone is unusable without the overlay
data.
(Now let's assume we want to show a simple picture on the screen - 64000
bytes)
Borland Pascal version:
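Var F:File;
begin
Assign (F, paramstr(0));
Reset (F, 1);                      { record size: 1 byte }
Seek (F, Filesize(F)-64000);       { picture = last 64000 bytes }
BlockRead(F, Mem[$a000:0], 64000); { into video memory (mode 13h) }
Close (F);
end.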
Assembly version (Provided that DS points to the PSP):
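mov es,ds:[2ch] ; Get env segment
xor di,di
mov cx,0ffffh
mov al,0
get_argv0:
repne scasb ; Skip one env string
scasb ; Double zero = end of env?
jne get_argv0
add di,2 ; Skip the string count word
push es ; Open file (DS:DX = own path)
pop ds
mov dx,di
mov ax,3d20h ; Read only, deny write
int 21h
xchg bx,ax ; Seek to ovr
mov ax,4202h ; From end of file
mov cx,0ffffh ; CX:DX = -64000 (FFFF:0600h)
mov dx,-64000
int 21h
push 0a000h ; Read picture
pop ds
mov ah,3fh
mov cx,64000
xor dx,dx
int 21h
mov ah,3eh ; Close file
int 21h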
This is the 'backward' method:
We seek from the end of the file. It's good because we don't have to know the
size of the main EXE file.
IV. Chaining
------------
Perhaps this is the most interesting topic in this article... The basic
problem: we have a couple of EXE files (...a demo's parts...) and we
want ONE NICE BIG EXE file. The most convenient way is renaming these files
to *.DAT and writing a 'master' proggy which executes them one after another.
But then there are many files, which isn't so elegant... The solution: an EXE
loader must be written which stores the independent EXE files in itself
(as overlays) and executes them. Unfortunately DOS doesn't have such a
service :-(
1. Simple EXE loader
This works for non-overlaid EXE files only. The files to be executed must
NOT open themselves for reading or writing.
Structure of this big EXE file:
+--------+------------+------------+-----
| Loader |  1st EXE   |  2nd EXE   | ...
|        | (Overlay1) | (Overlay2) |
+--------+------------+------------+-----
For each 'file', the loader has to:
- Load the header to get info on the proggy
- Load the body
- Load relocation table and process it
- Jump to the beginning of the program
The detailed process:
- Reduce the loader's occupied memory to the minimum
- Open the loader file, seek to the start of the next program
- Read the header (1ah bytes)
- Create a PSP (there's a DOS function, but copying the loader's PSP will do too)
- Seek to the body's start
- Read the body into memory (Page Counter*512 bytes right after the
newly created PSP - you can ignore the Partial Page field)
- Seek to the relocation table's beginning
- Load the relocation table and relocate the body (the table can be
loaded in 4-byte steps to save space). One relocation item consists
of two words: ReloOffset followed by ReloSeg. Process for one item:
add the body's segment address to ReloSeg (this gives a segment
address, let's call it ReloSeg2), then add the body's segment to the
word at ReloSeg2:ReloOffset.
- Make the new PSP active
- Redirect DOS exit function 4ch so that the loader can catch the
terminating process
- Set DS & ES to the new PSP, FS & GS to 0, SS to the body's segment (new
PSP+10h) + Relative Initial SS, SP to Initial SP, other registers to 0
- Jump to the body's segment + Relative Initial CS : Initial IP
Of course these steps can be extended with safety and convenience services,
for example handling the TSR exit (27h) function. Let's say we have a
resident modplayer, which normally can't be removed from memory...
What should the loader do when a program wants to exit as a TSR? It's
enough to reserve the required memory for it, then create the next program's
PSP after that. And when the loader exits, it should restore the whole
interrupt table (saved at the beginning of the whole process ;-)
2. More complex EXE loader
This method allows self-overlaying files to run. The individual programs
can read/write themselves without noticing that they're not alone on the
disk but sitting in the overlay area of a loader! Of course, it requires a
lot of work... The file structure:
+--------+------+---------+------+---------+-----
| Loader | EXE1 | EXE1's  | EXE2 | EXE2's  |
|        | body | overlay | body | overlay | ...
|        +------+---------+------+---------+-----
|        |   Overlay I    |   Overlay II   | ...
+--------+----------------+----------------+-----
The 'soul' of this kind of loader is the redirected set of DOS functions.
Among others, the 'File open' function (3dh) must be hooked. If a running
program wants to open itself, an appropriate file handle must be given
back (... which should be a real, valid file handle; it initially points
into the loader's overlay area, at the beginning of the current process' EXE
header...). Other functions to take over:
seek (42h), close (3eh), read (3fh),
write (40h), TSR exit (27h),
normal exit (4ch), and exec (4bh).
Most of these must check whether the call refers to the EXE itself or to an
external file. How can we notice that a program wants to open itself? At
least two ways must be supported:
1. Check by the original filename
2. Check by the environment string
Actually I developed this system for putting HUGE demo parts together, not
simple routines...
My chainer program is able to handle self-overlaying files. It can put, for
example, the following into one file: Verses, Hell, Timeless, No!, Epsilon,
Doom, Face, and Scream Tracker. It was also able to make a single EXE from
the EXE files of Project Angel's unlinked version. (Well, with a minor
modification - right, Walken? :-) The reasons I didn't include it in
Imphobia:
a) the source is too ugly at the moment and partially uncommented,
b) the example program for Imphobia (in my opinion) should be a nice demo
effect, or at least something visible, not some creepy system code.
If you want it, mail me and I will send it to you with the source.
V. Virtual file systems
-----------------------
What to do when a lot of data files are in use? Imagine a 'master' program
which copies them into one file, then hooks DOS services (open, read, etc.)
so that other programs believe they are using the original files - actually
they use this big file! It's very similar to the complicated version
of the EXE loader. Let's have a look at the problems of this method (these
apply to the 2nd type of EXE loader too):
- Every file handle must be administered by the 'kernel'.
- The same file can be opened more than once.
- It should be impossible to read once the 'virtual file handle' has reached
the end of the appropriate 'virtual file', even though the big file is
longer...
Compressed virtual file systems
-------------------------------
It looks like a simplified version of Stacker. (You know, modrn diskk comprs
soin sftwarez ar nearli foOlproOf...) The 'master' program compresses the
necessary data files, and when the demo wants to open one, it uncompresses it
into memory and keeps it uncompressed while the 'file' is open. If the
demo reads the file, the kernel simply copies the required amount of data.
This system is not useful for handling big files because of its memory
requirements.
Ervin/Abaddon

offset length description comments
----------------------------------------------------------------------
0 word exe file signature usually 4d5a
2 word length of last used sector in file modulo 512
4 word size of file, incl. header in 512-pages
6 word number of relocation table items
8 word size of header in 16-byte paragraphs
a word min. paragraphs needed above program in 16-byte paragraphs
c word max. paragraphs needed above program in 16-byte paragraphs
e word displacement of stack segment in module rel. to start of prog.
10 word contents of SP reg. at entry
12 word checksum 2's complement
14 word contents of IP reg. at entry
16 word displacement of code module rel. to start of prog.
18 word offset to first relocation item in file rel. to start of prog.
1a word overlay number 0 for resident prog.
1c varies variable RESERVED place
varies varies relocation table
varies varies variable RESERVED place
varies varies program and data space
varies varies stack segment
The relocation table is a set of far pointers (e.g. 1234:5678h, stored as an
offset word followed by a segment word); you add the relocation factor to the
word at each of those addresses. The relocation factor is the start segment
of where the program is loaded (the pointers' segments are relative to it too).
Example:
------------------------------------------------
code segment
start:
mov ax,seg _myseg
code ends
_myseg segment
_myseg ends
end start
-------------------------------------------------
Start Stop Length Name Class
00000H 00002H 00003H CODE
00010H 00010H 00000H _MYSEG
-------------------------------------------------
Note that _MYSEG is exactly one segment above CODE.
Generated output is B8 01 00; which is "mov ax,0001"
The fixup table for this file has a single entry, 0000:0001. Thus if the start
of the program begins at segment 3562 then the "mov ax,0001" gets converted to
"mov ax,3563".

Document ID: Q79259
Product: Microsoft BASIC Compiler
Title: Microsoft Library (.LIB) Format, Created by LIB.EXE
Updated: 27-DEC-1991
Operating System Versions: 6.00 6.00B 7.00 7.10
Operating Systems: MS-DOS
Summary:
This article describes the components of the Microsoft Library Format
(for .LIB files created by the LIB.EXE Library Manager). The Microsoft
Library Format is consistent between compatible Microsoft languages.
While future library utilities will remain backward-compatible with
the older library formats, the actual library format itself is subject
to change. This information is taken from Chapter 5 of the "Microsoft
C Developer's Toolkit Reference," which contains more in-depth
information on the Microsoft Library Format.
This information applies to Microsoft QuickBasic versions 4.0, 4.0b,
and 4.5 for MS-DOS, to Microsoft Basic Compiler versions 6.0 and 6.0b
for MS-DOS, and to Microsoft Basic Professional Development System
(PDS) versions 7.0 and 7.1 for MS-DOS.
More Information:
Library Header Record
---------------------
Object code library .LIB files under MS-DOS always contain blocks of
data in multiples of 512 bytes. The first record in the library is a
library header. This record is structured the same as a Microsoft
object-module-format (MS OMF) record. That is, the first byte of the
record identifies the record's type, and the next two bytes specify
the number of bytes remaining in the record. Note that the length
field is byte-swapped (in other words, the low-order byte precedes the
high-order byte). The record type for this library header is F0 hex
(240 decimal).
Modules in a library always start at the beginning of a page. Page
size is determined by adding three (one for the record type byte and
two for the record length field itself) to the value in the record
length field; thus the library header record always occupies exactly
one page. Legal values for page size are given by 2 to the n, where n
is a value from 4 through 15 (that is, 16 through 32768 bytes).
The four bytes immediately following the length field are a byte-
swapped long integer specifying the byte offset within the library of
the first block of the dictionary. The next two bytes are a byte-
swapped word field that specifies the number of blocks in the
dictionary. (Note: The Library Manager, LIB.EXE for MS-DOS, cannot
create a library whose dictionary requires more than 251 512-byte
pages.)
The next byte contains flags describing the library. One current flag
definition is "0x01 = case sensitive". This applies to both regular
and extended dictionaries. All other values are reserved for future
use and should be 0. The remaining bytes in the library header record
are not significant. This record deviates from the typical Microsoft
OMF record in that the last byte is not used as a checksum on the rest
of the record.
Object Modules
--------------
The first object module in the library immediately follows the header.
The first object module is followed in turn by all other object
modules in the library. Each module is in Microsoft OMF. Individual
modules are aligned so that they start at the beginning of a new page.
If, as is commonly the case, a module does not occupy a number of
bytes that is exactly a multiple of the page size, then its last block
is padded with as many null bytes as are required to fill it. This
special format is covered in detail in the "C Developer's Toolkit
Reference."
Dictionary Blocks
-----------------
The remaining blocks in the library compose the dictionary. The number
of blocks in the dictionary is given in the library header. Dictionary
length is in 512-byte blocks. Detailed information on the exact
content and format of the dictionary is contained in the "C
Developer's Toolkit Reference."
Extended Dictionary
-------------------
The extended dictionary is optional and indicates dependencies between
modules in the library. Versions of LIB.EXE earlier than version 3.09
do not create an extended dictionary. The extended dictionary is
placed at the end of the library. Again, see the "C Developer's
Toolkit Reference" for details on the structure of the Extended
Dictionary.
Additional reference words: 6.00 6.00b 7.00 7.10 4.00 4.00b 4.50

Document ID: Q71891
Product: Microsoft EXEMOD, EXEPACK, or LIB Utility
Title: Dictionary Hashing Algorithm Used by the LIB Utility
Updated: 16-MAY-1991
Operating System Versions: 3.0X 3.10 3.11 3.14 3.15 3.17 3.18 | 3.1
Operating Systems: MS-DOS | OS/2
Summary:
The last part of each library produced by the Microsoft Library
Manager (LIB) contains a dictionary that holds all the public symbols
in the library. The hashing algorithm mentioned on page 63 of the
"Microsoft C Developer's Toolkit Reference" is used to place data in
the dictionary. The code required to implement the hashing algorithm
is shown at the end of this article.
More Information:
The library dictionary is divided into pages that are 512 bytes long.
Each page starts with a 37-byte bucket table, which contains 37
separate offsets to the symbols in the rest of the page. The values in
the buckets are multiplied by 2 to get the actual offset (since 1 byte
can contain only 256 different values).
The hashing algorithm analyzes a symbol's name and produces two
indexes (page index and bucket index) and two deltas (page index delta
and bucket index delta). Using the offset contained in the bucket at
bucket index in the page at page index, you must compare the symbol at
that location with the one you are looking for.
If (due to symbol collision) you have not found the correct symbol,
add the bucket index delta to the current bucket index, modulo 37, and
try again. Continue until all the buckets in the current page are
tried. Then, add the page index delta to the current page, modulo
the page count, and try all the buckets in that page starting at
bucket index. Continue this process until all of the possible page and
offset combinations have been tried.
For more information on the actual format of the symbols in the
dictionary, and information on the format for the rest of the library,
see the "Microsoft C Developer's Toolkit Reference."
Sample Code
-----------
/* This code illustrates the hashing algorithm used by LIB */
/* Compile options needed: none
*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h> /* malloc, free, exit */
#define XOR ^
#define MODULO %
char *symbol; /* Symbol to find (or to place) */
int dictlength; /* Dictionary length in pages */
int buckets; /* Number of buckets on one page */
char *pb; /* A pointer to the beginning of the symbol */
char *pe; /* A pointer to the end of the symbol */
int slength; /* Length of the symbol's name */
int page_index; /* Page Index */
int page_index_delta; /* Page Index Delta */
int bucket_index; /* Bucket Index */
int bucket_index_delta; /* Bucket Index Delta */
unsigned c;
void hash(void)
{
page_index = 0;
page_index_delta = 0;
bucket_index = 0;
bucket_index_delta = 0;
while( slength--)
{
c = *(pb++) | 32; /* Convert character to lower case */
page_index = (page_index<<2) XOR c; /* Hash */
bucket_index_delta = (bucket_index_delta>>2) XOR c; /* Hash */
c = *(pe--) | 32;
bucket_index = (bucket_index>>2) XOR c; /* Hash */
page_index_delta = (page_index_delta<<2) XOR c; /* Hash */
}
/* Calculate page index */
page_index = page_index MODULO dictlength;
/* Calculate page index delta */
if( (page_index_delta = page_index_delta MODULO dictlength) == 0)
page_index_delta = 1;
/* Calculate bucket offset */
bucket_index = bucket_index MODULO buckets;
/* Calculate bucket offset delta */
if( (bucket_index_delta = bucket_index_delta MODULO buckets) == 0)
bucket_index_delta = 1;
}
int main(void)
{
int i;
dictlength = 3;
buckets = 37;
if ( (symbol = (char *) malloc( sizeof(char) * 4 )) == NULL )
exit(1);
strcpy( symbol, "one");
for( i = 0; i < 2; i++ ) {
slength = strlen(symbol);
pb = symbol;
pe = symbol + slength ;
hash();
printf("\npage_index: %2d page_index_delta: %d",
page_index, page_index_delta);
printf("\nbucket_index: %2d bucket_index_delta: %d",
bucket_index, bucket_index_delta);
strcpy( symbol, "two");
}
printf("\n");
free(symbol);
return 0;
}

A.OUT(5) OpenBSD Programmer's Manual A.OUT(5)
NAME
a.out - format of executable binary files
SYNOPSIS
#include <a.out.h>
DESCRIPTION
The include file <a.out.h> declares three structures and several macros.
The structures describe the format of executable machine code files
(``binaries'') on the system.
A binary file consists of up to 7 sections. In order, these sections
are:
exec header Contains parameters used by the kernel to load a binary
file into memory and execute it, and by the link editor
ld(1) to combine a binary file with other binary files.
This section is the only mandatory one.
text segment Contains machine code and related data that are loaded
into memory when a program executes. May be loaded
read-only.
data segment Contains initialized data; always loaded into writable
memory.
text relocations Contains records used by the link editor to update
pointers in the text segment when combining binary
files.
data relocations Like the text relocation section, but for data segment
pointers.
symbol table Contains records used by the link editor to cross-reference
the addresses of named variables and functions
(``symbols'') between binary files.
string table Contains the character strings corresponding to the
symbol names.
Every binary file begins with an exec structure:
struct exec {
u_int32_t a_midmag;
u_int32_t a_text;
u_int32_t a_data;
u_int32_t a_bss;
u_int32_t a_syms;
u_int32_t a_entry;
u_int32_t a_trsize;
u_int32_t a_drsize;
};
The fields have the following functions:
a_midmag This field is stored in network byte-order so that binaries for
machines with alternate byte orders can be distinguished. It
has a number of sub-components accessed by the macros
N_GETFLAG(), N_GETMID(), and N_GETMAGIC(), and set by the macro
N_SETMAGIC().
The macro N_GETFLAG() returns a few flags:
EX_DYNAMIC Indicates that the executable requires the services
of the run-time link editor.
EX_PIC Indicates that the object contains position
independent code. This flag is set by as(1) when given
the -k flag and is preserved by ld(1) if necessary.
If both EX_DYNAMIC and EX_PIC are set, the object file is a
position independent executable image (e.g., a shared library),
which is to be loaded into the process address space by the
run-time link editor.
The macro N_GETMID() returns the machine-id. This indicates
which machine(s) the binary is intended to run on.
N_GETMAGIC() specifies the magic number, which uniquely
identifies binary files and distinguishes different loading
conventions. The field must contain one of the following values:
OMAGIC The text and data segments immediately follow the
header and are contiguous. The kernel loads both text and
data segments into writable memory.
NMAGIC As with OMAGIC, text and data segments immediately
follow the header and are contiguous. However, the kernel
loads the text into read-only memory and loads the data
into writable memory at the next page boundary after
the text.
ZMAGIC The kernel loads individual pages on demand from the
binary. The header, text segment and data segment are
all padded by the link editor to a multiple of the page
size. Pages that the kernel loads from the text
segment are read-only, while pages from the data segment
are writable.
a_text Contains the size of the text segment in bytes.
a_data Contains the size of the data segment in bytes.
a_bss Contains the number of bytes in the ``bss segment'' and is used
by the kernel to set the initial break (brk(2)) after the data
segment. The kernel loads the program so that this amount of
writable memory appears to follow the data segment and
initially reads as zeroes.
a_syms Contains the size in bytes of the symbol table section.
a_entry Contains the address in memory of the entry point of the
program after the kernel has loaded it; the kernel starts the
execution of the program from the machine instruction at this
address.
a_trsize Contains the size in bytes of the text relocation table.
a_drsize Contains the size in bytes of the data relocation table.
The a.out.h include file defines several macros which use an exec
structure to test consistency or to locate section offsets in the binary file.
N_BADMAG(exec) Non-zero if the a_magic field does not contain a
recognized value.
N_TXTOFF(exec) The byte offset in the binary file of the beginning of
the text segment.
N_SYMOFF(exec) The byte offset of the beginning of the symbol table.
N_STROFF(exec) The byte offset of the beginning of the string table.
Relocation records have a standard format which is described by the
relocation_info structure:
struct relocation_info {
int r_address;
unsigned int r_symbolnum : 24,
r_pcrel : 1,
r_length : 2,
r_extern : 1,
r_baserel : 1,
r_jmptable : 1,
r_relative : 1,
r_copy : 1;
};
The relocation_info fields are used as follows:
r_address Contains the byte offset of a pointer that needs to be
link-edited. Text relocation offsets are reckoned from the start
of the text segment, and data relocation offsets from the
start of the data segment. The link editor adds the value
that is already stored at this offset into the new value
that it computes using this relocation record.
r_symbolnum Contains the ordinal number of a symbol structure in the
symbol table (it is not a byte offset). After the link
editor resolves the absolute address for this symbol, it adds
that address to the pointer that is undergoing relocation.
(If the r_extern bit is clear, the situation is different;
see below.)
r_pcrel If this is set, the link editor assumes that it is updating
a pointer that is part of a machine code instruction using
pc-relative addressing. The address of the relocated
pointer is implicitly added to its value when the running program
uses it.
r_length Contains the log base 2 of the length of the pointer in
bytes; 0 for 1-byte displacements, 1 for 2-byte
displacements, 2 for 4-byte displacements.
r_extern Set if this relocation requires an external reference; the
link editor must use a symbol address to update the pointer.
When the r_extern bit is clear, the relocation is ``local'';
the link editor updates the pointer to reflect changes in
the load addresses of the various segments, rather than
changes in the value of a symbol (except when r_baserel is
also set, see below). In this case, the content of the
r_symbolnum field is an n_type value (see below); this type
field tells the link editor what segment the relocated
pointer points into.
r_baserel If set, the symbol, as identified by the r_symbolnum field,
is to be relocated to an offset into the Global Offset
Table. At run-time, the entry in the Global Offset Table at
this offset is set to be the address of the symbol.
r_jmptable If set, the symbol, as identified by the r_symbolnum field,
is to be relocated to an offset into the Procedure Linkage
Table.
r_relative If set, this relocation is relative to the (run-time) load
address of the image this object file is going to be a part
of. This type of relocation only occurs in shared objects.
r_copy If set, this relocation record identifies a symbol whose
contents should be copied to the location given in
r_address. The copying is done by the run-time link editor
from a suitable data item in a shared object.
Symbols map names to addresses (or more generally, strings to values).
Since the link editor adjusts addresses, a symbol's name must be used to
stand for its address until an absolute value has been assigned. Symbols
consist of a fixed-length record in the symbol table and a variable-
length name in the string table. The symbol table is an array of nlist
structures:
struct nlist {
union {
char *n_name;
long n_strx;
} n_un;
unsigned char n_type;
char n_other;
short n_desc;
unsigned long n_value;
};
The fields are used as follows:
n_un.n_strx Contains a byte offset into the string table for the name of
this symbol. When a program accesses a symbol table with
the nlist(3) function, this field is replaced with the
n_un.n_name field, which is a pointer to the string in memo-
ry.
n_type Used by the link editor to determine how to update the sym-
bol's value. The n_type field is broken down into three
sub-fields using bitmasks. The link editor treats symbols
with the N_EXT type bit set as ``external'' symbols and per-
mits references to them from other binary files. The N_TYPE
mask selects bits of interest to the link editor:
N_UNDF An undefined symbol. The link editor must locate an
external symbol with the same name in another binary
file to determine the absolute value of this symbol.
As a special case, if the n_value field is non-zero
and no binary file in the link-edit defines this
symbol, the link editor will resolve this symbol to
an address in the bss segment, reserving an amount
of bytes equal to n_value. If this symbol is unde-
fined in more than one binary file and the binary
files do not agree on the size, the link editor
chooses the greatest size found across all binaries.
N_ABS An absolute symbol. The link editor does not update
an absolute symbol.
N_TEXT A text symbol. This symbol's value is a text ad-
dress and the link editor will update it when it
merges binary files.
N_DATA A data symbol; similar to N_TEXT but for data ad-
dresses. The values for text and data symbols are
not file offsets but addresses; to recover the file
offsets, it is necessary to identify the loaded ad-
dress of the beginning of the corresponding section
and subtract it, then add the offset of the section.
N_BSS A bss symbol; like text or data symbols but has no
corresponding offset in the binary file.
N_FN A filename symbol. The link editor inserts this
symbol before the other symbols from a binary file
when merging binary files. The name of the symbol
is the filename given to the link editor, and its
value is the first text address from that binary
file. Filename symbols are not needed for link
editing or loading, but are useful for debuggers.
The N_STAB mask selects bits of interest to symbolic debug-
gers such as gdb(1); the values are described in stab(5).
n_other This field provides information on the nature of the symbol
independent of the symbol's location in terms of segments as
determined by the n_type field. Currently, the lower 4 bits
of the n_other field hold one of two values: AUX_FUNC and
AUX_OBJECT (see <link.h> for their definitions). AUX_FUNC
associates the symbol with a callable function, while
AUX_OBJECT associates the symbol with data, irrespective of
their locations in either the text or the data segment.
This field is intended to be used by ld(1) for the construc-
tion of dynamic executables.
n_desc Reserved for use by debuggers; passed untouched by the link
editor. Different debuggers use this field for different
purposes.
n_value Contains the value of the symbol. For text, data and bss
symbols, this is an address; for other symbols (such as de-
bugger symbols), the value may be arbitrary.
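The C sketch below decodes n_type with the masks described above. It also applies the common-symbol sizing rule from the N_UNDF entry and converts a text or data symbol's address into a file offset. This page does not give the numeric mask values. The values below are assumed from the BSD <a.out.h>/<nlist.h> headers and carry local AOUT_ names so they do not clash with a system header. All function names are hypothetical.

```c
#include <assert.h>

/* Assumed n_type values, copied from the BSD <a.out.h>/<nlist.h> headers;
 * check your own system's header before relying on them. */
#define AOUT_N_EXT   0x01u  /* external (global) bit */
#define AOUT_N_TYPE  0x1eu  /* mask for the segment type bits */
#define AOUT_N_STAB  0xe0u  /* mask for debugger (stab) bits */
#define AOUT_N_UNDF  0x00u
#define AOUT_N_ABS   0x02u
#define AOUT_N_TEXT  0x04u
#define AOUT_N_DATA  0x06u
#define AOUT_N_BSS   0x08u
#define AOUT_N_FN    0x1eu  /* filename symbol; appears with N_EXT set */

/* Short label for the kind of symbol encoded in n_type. */
static const char *symbol_kind(unsigned char n_type)
{
    if (n_type & AOUT_N_STAB)
        return "stab";              /* debugger symbol, see stab(5) */
    switch (n_type & AOUT_N_TYPE) {
    case AOUT_N_UNDF: return "undefined";
    case AOUT_N_ABS:  return "absolute";
    case AOUT_N_TEXT: return "text";
    case AOUT_N_DATA: return "data";
    case AOUT_N_BSS:  return "bss";
    case AOUT_N_FN:   return "filename";
    default:          return "other";
    }
}

static int symbol_is_external(unsigned char n_type)
{
    return (n_type & AOUT_N_EXT) != 0;
}

/* Common-symbol rule: each file's N_UNDF entry carries a size in n_value
 * (0 for a plain reference); the link editor reserves the largest. */
static unsigned long common_size(const unsigned long *n_values, int count)
{
    unsigned long max = 0;
    for (int i = 0; i < count; i++)
        if (n_values[i] > max)
            max = n_values[i];
    return max;
}

/* Text/data symbol address -> file offset: subtract the section's load
 * address, then add the section's offset in the file. */
static unsigned long symbol_file_offset(unsigned long n_value,
                                        unsigned long section_addr,
                                        unsigned long section_offset)
{
    assert(n_value >= section_addr);
    return n_value - section_addr + section_offset;
}
```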
The string table consists of a u_int32_t length followed by null-terminated
symbol strings. The length represents the size of the entire table in
bytes, so its minimum value (or the offset of the first string) is always
4 on 32-bit machines.
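A string table lookup can be sketched as follows. The sketch assumes the 4-byte length is stored little-endian and treats any n_strx below 4 as a symbol with no name. strtab_name is a hypothetical helper, not the nlist(3) interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the name for `n_strx` in an a.out string table whose first four
 * bytes hold the table size (assumed little-endian here). Returns NULL for
 * offsets outside the table or a string missing its NUL terminator. */
static const char *strtab_name(const uint8_t *strtab, uint32_t n_strx)
{
    assert(strtab != NULL);
    uint32_t size = (uint32_t)strtab[0] | (uint32_t)strtab[1] << 8 |
                    (uint32_t)strtab[2] << 16 | (uint32_t)strtab[3] << 24;

    /* The first string cannot start before offset 4. */
    if (n_strx < 4 || n_strx >= size)
        return NULL;

    /* Require a terminator before the end of the table. */
    if (memchr(strtab + n_strx, '\0', size - n_strx) == NULL)
        return NULL;

    return (const char *)(strtab + n_strx);
}
```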
SEE ALSO
as(1), gdb(1), ld(1), brk(2), execve(2), nlist(3), core(5),
link(5), stab(5)
HISTORY
The a.out.h include file appeared in Version 7 AT&T UNIX.
BUGS
Nobody seems to agree on what bss stands for.
New binary file formats may be supported in the future, and they probably
will not be compatible at any level with this ancient format.
OpenBSD 2.6 June 5, 1993 5

Binary file not shown.

@@ -0,0 +1,86 @@
===========================================================================
From: BRIAN FRASER Refer#: NONE
To: MATHIEU BOUCHARD Recvd: NO
Subj: .SYS format. Conf: (99) 80xxxProgr
---------------------------------------------------------------------------
Main Header:
00h word - Link to next driver, offset
02h word - Link to next driver, segment
04h word - Device Attribute
06h word - Strategy entry point, offset
08h word - Interrupt entry point, offset
-- Character device --
0ah 8 bytes - Logical Name
-- Block device --
0ah byte - Number of units
Header Attribute word:
bit 15 - 1= Character device; 0= Block device
bit 14 - 1= IOCTL read and write supported
-- Character device --
bit 13 - 1= Output until busy supported
-- Block device --
bit 13 - 1= Check BIOS to determine media characteristics; 0= Media ID
should be used instead
bit 12 - should be 0
bit 11 - 1= if open/close/removable media supported
bit 7-10 - 0
bit 6 - 1= if generic IOCTL and get/set logical drive supported
bit 5 - 0
bit 4 - 1= if CON driver and int 29h fast-output supported
bit 3 - 1= if current CLOCK$ device
bit 2 - 1= if current NULL device
-- Character device --
bit 1 - 1= if standard output device (stdout)
-- Block device --
bit 1 - 1= if 32bit sector addressing supported
bit 0 - 1= if current standard input device (stdin)
Strategy Request Header:
00h byte - length of request header
01h byte - unit number for this request
02h byte - request header's command code
03h word - driver's return status
05h 8 bytes - ? (reserved)
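If you want the two headers above in C, they map onto packed structs like the ones below. This is a sketch under assumptions. The struct, field, and DEVATTR_* names are made up for illustration. It uses #pragma pack(push, 1), which GCC, Clang and MSVC accept, to get the exact byte offsets; a real 16-bit driver would usually declare the header in assembler. The _Static_assert lines check the offsets and sizes against the tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
struct dos_device_header {
    uint16_t next_offset;      /* 00h: link to next driver, offset    */
    uint16_t next_segment;     /* 02h: link to next driver, segment   */
    uint16_t attribute;        /* 04h: device attribute word          */
    uint16_t strategy_entry;   /* 06h: strategy entry point, offset   */
    uint16_t interrupt_entry;  /* 08h: interrupt entry point, offset  */
    union {
        char    name[8];       /* 0ah: character device logical name  */
        uint8_t units;         /* 0ah: block device number of units   */
    } u;
};

struct dos_request_header {
    uint8_t  length;           /* 00h: length of request header       */
    uint8_t  unit;             /* 01h: unit number for this request   */
    uint8_t  command;          /* 02h: command code                   */
    uint16_t status;           /* 03h: driver's return status         */
    uint8_t  reserved[8];      /* 05h: reserved                       */
};
#pragma pack(pop)

/* Layout checks against the tables above. */
_Static_assert(offsetof(struct dos_device_header, interrupt_entry) == 0x08, "interrupt offset");
_Static_assert(offsetof(struct dos_device_header, u) == 0x0a, "name/units offset");
_Static_assert(sizeof(struct dos_device_header) == 18, "device header size");
_Static_assert(sizeof(struct dos_request_header) == 13, "request header size");

/* Hypothetical names for two attribute bits from the table. */
#define DEVATTR_CHAR   0x8000u  /* bit 15: 1 = character device */
#define DEVATTR_IOCTL  0x4000u  /* bit 14: IOCTL read/write     */

static int is_character_device(const struct dos_device_header *h)
{
    assert(h != NULL);
    return (h->attribute & DEVATTR_CHAR) != 0;
}
```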
The rest of the header varies depending on what function is being called.
I would think it's best to find a book, as I don't really want to type out all
the different headers for each function. :)
The book I am using is Advanced MS-DOS, Second Ed. Granted, it's a little out
of date, but a lot of the information is still the same. Plus, I got it for 8
bucks. Can't complain for that price! :) Check out the book list.
Here's just a little info on what the above headers are for...
There are two kinds of device drivers: character and block.
Character devices handle 1 character at a time, while block devices deal with
blocks of data. Character devices can have a logical name like "MYSYS", which
can be used like "CON" or "PRN" etc. Block devices use units (drives), which
are assigned upon install.
The main header is the first 18 (12h) bytes of the SYS file. The link to the
next driver should be -1:-1 (FFFF:FFFF) unless there is more than one driver in
this SYS file; in that case, set it to point to the next driver in the chain.
BUT, the last driver must have FFFF:FFFF as the next driver, or you have big
problems! :)
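Here's a rough C sketch of following that chain inside one loaded .SYS image. For illustration only, it assumes each link's offset word is an offset from the start of the image and it ignores the segment word. count_drivers and read_le16 are hypothetical names, and the 64-driver cap is an arbitrary guard against a looping chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit word. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Count the drivers in a .SYS image by following the "link to next driver"
 * fields until FFFF:FFFF. Assumption: the offset word is relative to the
 * start of the image. Returns -1 if a header would run past the image or
 * the chain is longer than the guard allows. */
static int count_drivers(const uint8_t *image, size_t image_size)
{
    size_t pos = 0;

    assert(image != NULL);
    for (int count = 1; count <= 64; count++) {
        if (pos + 18 > image_size)
            return -1;                /* 18-byte header would not fit */
        uint16_t next_off = read_le16(image + pos);
        uint16_t next_seg = read_le16(image + pos + 2);
        if (next_off == 0xFFFF && next_seg == 0xFFFF)
            return count;             /* FFFF:FFFF marks the last driver */
        pos = next_off;
    }
    return -1;
}
```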
The device attribute word is fairly straightforward.
The strategy routine is called by DOS with the address of the request header
in ES:BX. All this routine has to do is save that address in a local memory
location.
DOS then calls the interrupt routine, which processes the saved request
header, performs the requested function, fills in the return status, and
returns.
If you can't find a book, maybe I'll type out the return attributes and the
info for each function.
Brian

File diff suppressed because it is too large

Binary file not shown.

@@ -0,0 +1,7 @@
<html>
<head>
<meta http-equiv="refresh" content="0;url=/Linux.old/sabre/os/articles">
</head>
<body lang="zh-CN">
</body>
</html>

@@ -0,0 +1,274 @@
:gdoc sec='Copyright IBM Corp. 1991'.
:prolog.
:docprof
ldrdots='yes'
duplex='no'.
:title.
:tline.IBM OS/2 32 bit Object Module Format (OMF)
:tline.and Linear eXecutable Module Format (LX)
:tline.&rbl.
:tline.Draft 5
:etitle.
.*
:date.
.*
.*
:address.
:aline.Boca Programming Center
:aline.Boca Raton, Florida
:eaddress
:date.
:eprolog.
:frontm.
:tipage.
:lblbox.Purpose of this document
:p.
THIS DOCUMENT PROVIDED BY IBM SHALL BE PROVIDED ON AN "AS IS" BASIS
WITHOUT ANY WARRANTY OF ANY KIND EITHER EXPRESS OR IMPLIED.
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE EXPRESSLY DISCLAIMED.
:p.
FURTHERMORE, THIS DOCUMENTATION IS IN A PRELIMINARY FORM; IS NOT
COMPLETE; HAS NOT YET BEEN TESTED, VALIDATED OR REVIEWED; MAY CONTAIN
ERRORS, OMISSIONS, INACCURACIES OR THE LIKE; AND IS SUBJECT TO BEING
CHANGED, REVISED OR SUPERSEDED IN WHOLE OR IN PART BY IBM.
IBM DOES NOT ASSUME ANY RESPONSIBILITY TO NOTIFY ANY PARTIES, COMPANIES,
USERS, AND OR OTHERS OF DEFECTS, DEFICIENCIES, CHANGES, ERRORS OR OTHER
FAILINGS OR SHORTCOMING OF THE DOCUMENTATION.
:p.
RECIPIENT'S USE OF THIS DOCUMENT IS LIMITED TO RECIPIENT'S PERSONAL USE
FOR THE SOLE PURPOSE OF CREATING TOOLS FOR THE OS/2:fnref refid=ibm.
OPERATING SYSTEM.
:elblbox
:fn id=ibm.
OS/2 is a Registered Trademark of International Business Machines Corp.
:efn.
:toc.
:figlist.
:revision id=r1 char='|' run=yes
:revision id=r2 char='X' run=yes
:revision id=r3 char='B' run=yes
:revision id=r4 char='D' run=yes
.* :rev refid=r1.
.* :p.This line is marked for revision.
.* :erev refid=r1.
:body.
:lblbox.Major changes to this document
:ul
:rev refid=r1.
:li.Draft 1 = Combined information from several documents into one.
:li.Draft 2 = Added Comments from Lexington and Toronto.
:li.Draft 3 = Added the Linear Executable format (LX).
:eul
:elblbox
:erev refid=r1.
:h1.Introduction
:p.This document is intended to describe the interface that is
used by language translators and generators as their intermediate
:rev refid=r1.
output to the linker for the 32-bit OS/2 operating system.
:erev refid=r1.
The linker will generate the executable module that is used by
the loader to invoke the .EXE and .DLL programs at execution time.
:h1.THE 32-BIT OBJECT MODULE FORMAT
:fig place=inline.
:cgraphic.
Record Format:
All object records conform to the following format:
 1 byte     2 bytes          <variable length>            1 byte
+--------+----------+----------------------------------+---------+
| Record |  Record  |          Record Contents         | Chk Sum |
|  Type  |  Length  |                                  |  or 0   |
+--------+----------+----------------------------------+---------+
                    |<-------- record length in bytes --------->|
:ecgraphic
:figcap.Standard object module record format
:figdesc.
:p.The Record type field is a 1-byte field containing the
hexadecimal number that identifies the type of object record.
The format is determined by the least significant bit of the
RecTyp field.
Note that this does not govern Use32/Use16
segment attributes; it simply specifies the size of certain numeric
fields within the record.
An odd RecTyp indicates that 32-bit
values are present; the fields affected are described with each
record.
:p.
:rev refid=r1.
An entire record occupies RecLength + 3 bytes.
:erev refid=r1.
The record length does not include the count for the record type and
record length fields.
Unless otherwise noted within the record definition, the record length
should not exceed 1024 bytes.
:p.
The byte sum over the entire record, ignoring overflow, is zero.
:p.
The record contents are determined by the record type.
:efig.
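:p.As an illustration only, the record length and checksum rules above can be sketched in C. The function names are hypothetical. Treating a checksum byte of 0 as &odq.not computed&cdq. is one reading of the &odq.Chk Sum or 0&cdq. box in the figure, not a rule stated in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Whole record size: type byte + 2-byte length + RecLength bytes
 * (contents plus checksum), i.e. RecLength + 3. */
static size_t omf_record_size(const uint8_t *rec)
{
    uint16_t rec_length = (uint16_t)(rec[1] | (rec[2] << 8)); /* Intel order */
    return (size_t)rec_length + 3;
}

/* Set the last byte so the byte sum over the record, ignoring overflow,
 * is zero. */
static void omf_set_checksum(uint8_t *rec)
{
    size_t size = omf_record_size(rec);
    uint8_t sum = 0;

    assert(size >= 4);                 /* RecLength must cover the checksum */
    for (size_t i = 0; i + 1 < size; i++)
        sum = (uint8_t)(sum + rec[i]);
    rec[size - 1] = (uint8_t)(0x100 - sum);
}

/* Validate a record; a checksum byte of 0 is accepted (assumption above). */
static int omf_checksum_ok(const uint8_t *rec)
{
    size_t size = omf_record_size(rec);
    uint8_t sum = 0;

    assert(size >= 4);
    if (rec[size - 1] == 0)
        return 1;
    for (size_t i = 0; i < size; i++)
        sum = (uint8_t)(sum + rec[i]);
    return sum == 0;
}
```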
:h2.Frequent Object Record Subfields
:p.
Certain subfields appear frequently; the format of such fields is
described next.
:p.
:h3.Names
:p.Name strings are encoded as an 8-bit unsigned count followed by a
string of &odq.count&cdq. characters. The character set is usually
some ASCII subset. A null name is specified by a single byte
of 0 (indicating a string of length zero).
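:p.A minimal C sketch of reading a name field, assuming the caller supplies the output buffer. omf_read_name is a hypothetical helper. It returns the number of record bytes consumed so that a parser can step to the next field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a count-prefixed OMF name into `out` as a C string. Returns the
 * bytes consumed (count + 1), or 0 if `out` is too small. A count of 0 is
 * the null name and yields an empty string. */
static size_t omf_read_name(const uint8_t *p, char *out, size_t out_size)
{
    assert(p != NULL && out != NULL);
    size_t count = p[0];
    if (count + 1 > out_size)
        return 0;
    memcpy(out, p + 1, count);
    out[count] = '\0';
    return count + 1;
}
```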
:p.
:h3.Indexed References
:p.Certain items are ordered by occurrence, and referenced by index
(starting index is 1). Index fields can contain 0, indicating
not-present, or values from 1 through 7FFF. The index is encoded
as 1 or 2 bytes and a 16-bit value is obtained as follows&colon.
.fo off
.in +10
if (first_byte & 0x80)
   index_word = (first_byte & 0x7F) * 0x100 + second_byte;
else
   index_word = first_byte;
.in -10
.fo on
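:p.The same encoding can be written as a pair of C functions for reading and writing index fields. This is a sketch. The names omf_read_index and omf_write_index are hypothetical. The writer uses the one-byte form whenever the value is below 80H; that choice is an assumption about how a translator would pick between the two forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 1- or 2-byte index; returns the number of bytes consumed. */
static size_t omf_read_index(const uint8_t *p, uint16_t *index)
{
    assert(p != NULL && index != NULL);
    if (p[0] & 0x80) {
        /* High bit set: low 7 bits are the high byte of the value. */
        *index = (uint16_t)(((p[0] & 0x7F) << 8) | p[1]);
        return 2;
    }
    *index = p[0];
    return 1;
}

/* Encode an index in the range 0 through 7FFF; returns bytes written. */
static size_t omf_write_index(uint8_t *p, uint16_t index)
{
    assert(index <= 0x7FFF);
    if (index < 0x80) {
        p[0] = (uint8_t)index;
        return 1;
    }
    p[0] = (uint8_t)(0x80 | (index >> 8));
    p[1] = (uint8_t)(index & 0xFF);
    return 2;
}
```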
:p.
:h4.Type indices
:p.
The type index is treated as an index field when a record is parsed
(occupies one or two bytes, occurs in PUBDEF, COMDEF,
EXTDEF records).
They are encoded as described under indexed references.
:p.
NOTE&colon. At present, no type checking is done by the linker.
If any link-time semantics are defined, that information will
be recorded somewhere within this document.
:p.
:h4.Ordered Collections
:p.
Certain records and record groups are ordered; the ordering is
obtained from the order of the record types within the file together
with the ordering of repeated fields within these records.
Such ordered collections are referenced by index, counting from 1
(index 0 indicates unknown or decline-to-state).
:p.The ordered collections are&colon.
:ul.
:li.NAMES: ordered by LNAMES record and names within each.
Referenced as a Name Index.
:li.LOGICAL SEGMENTS: ordered by SEGDEF records in file.
Referenced as a Segment Index.
:li.GROUPS: ordered by GRPDEF records in file.
Referenced as a Group Index.
:rev refid=r1.
:li.External symbols: ordered by EXTDEF and COMDEF
:erev refid=r1.
records and symbols within each.
Referenced as an External Index (in FIXUPs).
:eul.
:p.
:h3.Numeric 2 and 4 byte fields
:p.Words and double words (16 and 32 bit quantities) are stored
in Intel byte order (lowest address is least significant).
:p.Certain records, notably SEGDEF, PUBDEF, LINNUM, LEDATA,
LIDATA, FIXUPP and MODEND, contain size, offset, and
displacement values which may be 32 bit quantities for Use32 segments.
The encoding is as follows.
:ul.
:li.When the least significant bit of the record type byte is
set (i.e., the record type is an odd number), the numeric fields are 4 bytes.
:li.When the least significant bit of the record type byte is
clear, the fields occupy 2 bytes (16 bit Object Module Format).
The values are zero-extended when applied to Use32 segments.
:eul.
:p.See the description of SEGDEF records for an explanation of
Use16/Use32 segments.
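:p.A C sketch of reading one of these numeric fields, with the width chosen from the low bit of the record type. omf_read_numeric is a hypothetical name. The caller is assumed to know which fields of a given record are affected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a size/offset/displacement field in Intel byte order: 4 bytes when
 * the record type is odd, 2 bytes (zero-extended) when it is even.
 * Returns the number of bytes consumed. */
static size_t omf_read_numeric(uint8_t rec_type, const uint8_t *p,
                               uint32_t *value)
{
    assert(p != NULL && value != NULL);
    size_t width = (rec_type & 1) ? 4 : 2;
    uint32_t v = 0;

    for (size_t i = 0; i < width; i++)
        v |= (uint32_t)p[i] << (8 * i);   /* lowest address is least significant */
    *value = v;
    return width;
}
```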
.***************
:h2.Order of records
:p.
The record order is chosen so that the number of bind/link passes through an
object module is minimized. This differs from the previous
less specific ordering in that all symbolic information (in particular,
all export and public symbols) must occur at the start of the object
module.
This order is recommended but not mandatory.
.cp 1i
:ol.
:lp.:hp1.Identifier record(s)&colon.:ehp1.
:li.:hp2.Must be the first record.:ehp2.
:rev refid=r1.
:li.THEADR
:erev refid=r1.
.sk
:lp.:hp1.Records processed by Link Pass one&colon.:ehp1.
:li.:hp2.May occur in any order but must precede the Link pass separator
if it is present.:ehp2.
:li.COMENT identifying object format and extensions
:li.COMENT any, other than link pass separator comment.
:li.LNAMES providing ordered name list
:li.SEGDEF providing ordered list of program segments
:li.GRPDEF providing ordered list of logical segments
:li.PUBDEF locating and naming public symbols
:li.COMDEF and EXTDEF records.
:ul.
:li.This group of records is indexed together, so External Index fields
in FIXUPP records may refer to any of the record types listed.
:eul.
.sk
:lp.:hp1.Link pass separator (optional)&colon.:ehp1.
:li.COMENT class A2 indicating that pass 1 of the linker is complete.
When this record is encountered, LINK immediately starts Pass 2; no
records after this comment are read in Pass 1.
All the above listed records must come before this comment record.
:p.For greater linking speed, all LIDATA, LEDATA, FIXUPP and
LINNUM records should come after the A2 comment record, but this is
not required.
In LINK, Pass 2 begins again at the start of the object module, so
LIDATA records, etc., are processed in Pass 2 no matter where they
are placed in the object module.
.sk
:lp.:hp1.Records ignored by link pass one and processed by link pass
two&colon.:ehp1.
:li.LIDATA, LEDATA or COMDAT records followed by applicable FIXUPP records.
:li.FIXUPPs containing THREADs only.
:li.LINNUM providing line number to program code
or data association.
.br
:lp.:hp1.Terminator:ehp1.
:li.MODEND indicating end of module with optional start address.
:eol.
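:p.The list above states some ordering rules as requirements: THEADR must be first, and pass-one records must precede the optional A2 separator. With MODEND as the terminating record, these rules can be checked with a sketch like the following. This section does not give the record type values. The ones below are assumed from the standard OMF numbering used by the included record definitions. struct omf_rec and omf_order_ok are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed record type values (standard OMF numbering). The low bit only
 * selects 16/32-bit fields, so it is masked off before comparing. */
#define OMF_THEADR 0x80
#define OMF_COMENT 0x88
#define OMF_MODEND 0x8A
#define OMF_EXTDEF 0x8C
#define OMF_PUBDEF 0x90
#define OMF_LNAMES 0x96
#define OMF_SEGDEF 0x98
#define OMF_GRPDEF 0x9A
#define OMF_COMDEF 0xB0

/* One parsed record: its type and, for COMENT, its comment class. */
struct omf_rec {
    uint8_t type;
    uint8_t comment_class;   /* meaningful only for COMENT */
};

/* Records that must be seen in link pass one. */
static int is_pass1_record(uint8_t type)
{
    switch (type & 0xFE) {
    case OMF_LNAMES: case OMF_SEGDEF: case OMF_GRPDEF:
    case OMF_PUBDEF: case OMF_EXTDEF: case OMF_COMDEF:
        return 1;
    default:
        return 0;
    }
}

/* Returns 1 if THEADR is first, MODEND is last, and no pass-one record
 * follows a COMENT class A2 link pass separator. */
static int omf_order_ok(const struct omf_rec *recs, size_t n)
{
    int after_separator = 0;

    assert(recs != NULL);
    if (n < 2 || recs[0].type != OMF_THEADR ||
        (recs[n - 1].type & 0xFE) != OMF_MODEND)
        return 0;
    for (size_t i = 1; i + 1 < n; i++) {
        if (recs[i].type == OMF_COMENT && recs[i].comment_class == 0xA2)
            after_separator = 1;
        else if (after_separator && is_pass1_record(recs[i].type))
            return 0;
    }
    return 1;
}
```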
.********************
.** embedded record types in type sequence
.im theadr
.im coment
.im modend
.im extdef
.im pubdef
.im linnum
.im lnames
.im segdef
.im grpdef
.im fixupp
.im ledata
.im lidata
.im comdef
.im comdat
.im exe1
.im exe2
.im exe3
.pa
.pa
:egdoc.