add directory study

gohigh
2024-02-19 00:25:23 -05:00
parent b1306b38b1
commit f3774e2f8c
4001 changed files with 2285787 additions and 0 deletions


@@ -0,0 +1,43 @@
# @(#) Makefile, Ver. 2.1 created 00:00:00 87/09/01
# Makefile for 8088 symbolic disassembler
# Copyright (C) 1987 G. M. Harding, all rights reserved.
# Permission to copy and redistribute is hereby granted,
# provided full source code, with all copyright notices,
# accompanies any redistribution.
# This Makefile automates the process of compiling and linking
# a symbolic object-file disassembler program for the Intel
# 8088 CPU. Relatively machine-independent code is contained in
# the file dismain.c; lookup tables and handler routines, which
# are by their nature machine-specific, are contained in two
# files named distabs.c and dishand.c, respectively. (A third
# machine-specific file, disfp.c, contains handler routines for
# floating-point coprocessor opcodes.) A header file, dis.h,
# attempts to mediate between the machine-specific and machine-
# independent portions of the code. An attempt has been made to
# isolate machine dependencies and to deal with them in fairly
# straightforward ways. Thus, it should be possible to target a
# different CPU by rewriting the handler routines and changing
# the initialization data in the lookup tables. It should not
# be necessary to alter the formats of the tables.
OBJ = disrel.s dismain.s distabs.s dishand.s disfp.s
dis88 : $(OBJ)
	cc -o dis88 $(OBJ)
disrel.s : disrel.c
dismain.s : dismain.c dis.h
distabs.s : distabs.c dis.h
dishand.s : dishand.c dis.h
disfp.s : disfp.c dis.h
clean:
	@rm -f *.bak *.s dis88


@@ -0,0 +1,239 @@
dis88
Beta Release
87/09/01
---
G. M. HARDING
POB 4142
Santa Clara CA 95054-0142
"Dis88" is a symbolic disassembler for the Intel 8088 CPU,
designed to run under the PC/IX operating system on an IBM XT
or fully-compatible clone. Its output is in the format of, and
is completely compatible with, the PC/IX assembler, "as". The
program is copyrighted by its author, but may be copied and re-
distributed freely provided that complete source code, with all
copyright notices, accompanies any distribution. This provision
also applies to any modifications you may make. You are urged
to comment such changes, giving, as a minimum, your name and
complete address.
This release of the program is a beta release, which means
that it has been extensively, but not exhaustively, tested.
User comments, recommendations, and bug fixes are welcome. The
principal features of the current release are:
(a) The ability to disassemble any file in PC/IX object
format, making full use of symbol and relocation information if
it is present, regardless of whether the file is executable or
linkable, and regardless of whether it has continuous or split
I/D space;
(b) Automatic generation of synthetic labels when no sym-
bol table is available; and
(c) Optional output of address and object-code informa-
tion as assembler comment text.
Limitations of the current release are:
(a) Numeric co-processor (i.e., 8087) mnemonics are not
supported. Instructions for the co-processor are disassembled
as CPU escape sequences, or as interrupts, depending on how
they were assembled in the first place. This limitation will be
addressed in a future release.
(b) Symbolic references within the object file's data
segment are not supported. Thus, for example, if a data segment
location is initialized to point to a text segment address, no
reference to a text segment symbol will be detected. This limi-
tation is likely to remain in future releases, because object
code does not, in most cases, contain sufficient information to
allow meaningful interpretation of pure data. (Note, however,
that symbolic references to the data segment from within the
text segment are always supported.)
As a final caveat, be aware that the PC/IX assembler does
not recognize the "esc" mnemonic, even though it refers to a
completely valid CPU operation which is documented in all the
Intel literature. Thus, the corresponding opcodes (0xd8 through
0xdf) are disassembled as .byte directives. For reference, how-
ever, the syntactically-correct "esc" instruction is output as
a comment.
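The treatment of the "esc" family described above can be sketched in C as follows. This is only an illustration, not the program's own handler: the names is_esc() and format_esc() are hypothetical, and the operand string passed in is an assumed placeholder. The opcode range (0xd8 through 0xdf) and the "|" comment character follow the text and the output format used in disfp.c.

```c
#include <stdio.h>

/* First and last opcodes of the "esc" family, as given in the text. */
#define ESC_FIRST 0xd8
#define ESC_LAST  0xdf

/* Return nonzero if the opcode belongs to the esc family. */
static int is_esc(unsigned opcode)
{
    return opcode >= ESC_FIRST && opcode <= ESC_LAST;
}

/*
 * Hypothetical helper: format one esc opcode as a .byte directive,
 * with the mnemonic and operand appended as an assembler comment.
 * Returns the number of characters snprintf() would write, or -1 if
 * the opcode is not in the esc family.
 */
static int format_esc(char *buf, size_t len, unsigned opcode,
                      const char *operand)
{
    if (!is_esc(opcode))
        return -1;
    return snprintf(buf, len, "\t.byte\t0x%02x\t\t| esc\t%s",
                    opcode, operand);
}
```

A caller would pass the opcode byte and an already-formatted operand string, then print the resulting line; any opcode outside the esc range is rejected so that it can be handled by the ordinary mnemonic path.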
To build the disassembler program, transfer all the source
files, together with the Makefile, to a suitable (preferably
empty) PC/IX directory. Then, simply type "make".
To use dis88, place it in a directory which appears in
your $PATH list. It may then be invoked by name from whatever
directory you happen to be in. As a minimum, the program must
be invoked with one command-line argument: the name of the ob-
ject file to be disassembled. (Dis88 will complain if the file
specified is not an object file.) Optionally, you may specify
an output file; stdout is the default. One command-line switch
is available: "-o", which makes the program display addresses
and object code along with its mnemonic disassembly.
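The invocation rules above (an optional "-o" switch, a mandatory input file, and an optional output file) might be parsed along the following lines. This is a minimal sketch under stated assumptions: struct dis_args and parse_args() are hypothetical names, and the sketch assumes "-o" must come first, as the synopsis "dis88 [ -o ] ifile [ ofile ]" suggests. It is not the program's actual argument handling.

```c
#include <string.h>

/* Hypothetical container for the parsed command line. */
struct dis_args {
    int show_object;      /* nonzero if "-o" was given           */
    const char *ifile;    /* required input object file          */
    const char *ofile;    /* optional output; NULL means stdout  */
};

/*
 * Parse "dis88 [ -o ] ifile [ ofile ]".
 * Returns 0 on success, -1 on a usage error.
 */
static int parse_args(int argc, char **argv, struct dis_args *out)
{
    int i = 1;

    out->show_object = 0;
    out->ifile = NULL;
    out->ofile = NULL;

    if (i < argc && strcmp(argv[i], "-o") == 0) {
        out->show_object = 1;
        ++i;
    }
    if (i >= argc)
        return -1;                  /* the input file is mandatory */
    out->ifile = argv[i++];
    if (i < argc)
        out->ofile = argv[i++];
    return (i == argc) ? 0 : -1;    /* reject trailing arguments   */
}
```

When ofile comes back as NULL, the caller would simply keep writing to stdout, which matches the default described above.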
The "-o" option is useful primarily for verifying the cor-
rectness of the program's output. In particular, it may be used
to check the accuracy of local relative jump opcodes. These
jumps often target local labels, which are lost at assembly
time; thus, the disassembly may contain cryptic instructions
like "jnz .+39". As a user convenience, all relative jump and
call opcodes are output with a comment which identifies the
physical target address.
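The target address that accompanies a short relative jump can be derived as sketched below. On the 8088, the one-byte displacement of a short jump is signed and is applied to the address of the instruction that follows the jump. The function name short_jump_target() is hypothetical and is not taken from the dis88 sources; the sketch also assumes offsets wrap within 16 bits.

```c
#include <stdint.h>

/*
 * Compute the target of a short relative jump.
 *   next_pc: offset of the instruction following the jump
 *   disp:    raw displacement byte from the object code
 * The displacement is sign-extended, and the sum wraps to 16 bits.
 */
static uint16_t short_jump_target(uint16_t next_pc, uint8_t disp)
{
    /* Sign-extend by hand to avoid implementation-defined casts. */
    int d = (disp >= 0x80) ? (int)disp - 0x100 : (int)disp;

    return (uint16_t)(next_pc + d);
}
```

For example, a two-byte jump at offset 0x0100 with displacement byte 0x25 has next_pc equal to 0x0102 and therefore targets 0x0127, while a displacement byte of 0xfe at the same place jumps back to 0x0100.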
By convention, the release level of the program as a whole
is the SID of the file disrel.c, and this SID string appears in
each disassembly. Release 2.1 of the program is the first beta
release to be distributed on Usenet.
.TH dis88 1 LOCAL
.SH "NAME"
dis88 \- 8088 symbolic disassembler
.SH "SYNOPSIS"
\fBdis88\fP [ -o ] ifile [ ofile ]
.SH "DESCRIPTION"
Dis88 reads ifile, which must be in PC/IX a.out format.
It interprets the binary opcodes and data locations, and
writes corresponding assembler source code to stdout, or
to ofile if specified. The program's output is in the
format of, and fully compatible with, the PC/IX assembler,
as(1). If a symbol table is present in ifile, labels and
references will be symbolic in the output. If the input
file lacks a symbol table, the fact will be noted, and the
disassembly will proceed, with the disassembler generating
synthetic labels as needed. If the input file has split
I/D space, or if it is executable, the disassembler will
make all necessary adjustments in address-reference calculations.
.PP
If the "-o" option appears, object code will be included
in comments during disassembly of the text segment. This
feature is used primarily for debugging the disassembler
itself, but may provide information of passing interest
to users.
.PP
The program always outputs the current machine address
before disassembling an opcode. If a symbol table is
present, this address is output as an assembler comment;
otherwise, it is incorporated into the synthetic label
which is generated internally. Since relative jumps,
especially short ones, may target unlabelled locations,
the program always outputs the physical target address
as a comment, to assist the user in following the code.
.PP
The text segment of an object file is always padded to
an even machine address. In addition, if the file has
split I/D space, the text segment will be padded to a
paragraph boundary (i.e., an address divisible by 16).
As a result of this padding, the disassembler may produce
a few spurious, but harmless, instructions at the
end of the text segment.
.PP
Disassembly of the data segment is a difficult matter.
The information to which initialized data refers cannot
be inferred from context, except in the special case
of an external data or address reference, which will be
reflected in the relocation table. Internal data and
address references will already be resolved in the object file,
and cannot be recreated. Therefore, the data
segment is disassembled as a byte stream, with long
stretches of null data represented by an appropriate
".zerow" pseudo-op. This limitation notwithstanding,
labels (as opposed to symbolic references) are always
output at appropriate points within the data segment.
.PP
If disassembly of the data segment is difficult, disassembly of the
bss segment is quite easy, because uninitialized data is all
zero by definition. No data
is output in the bss segment, but symbolic labels are
output as appropriate.
.PP
For each opcode which takes an operand, a particular
symbol type (text, data, or bss) is appropriate. This
tidy correspondence is complicated somewhat, however,
by the existence of assembler symbolic constants and
segment override opcodes. Therefore, the disassembler's
symbol lookup routine attempts to apply a certain amount
of intelligence when it is asked to find a symbol. If
it cannot match on a symbol of the preferred type, it
may return a symbol of some other type, depending on
preassigned (and somewhat arbitrary) rankings within
each type. Finally, if all else fails, it returns a
string containing the address sought as a hex constant;
this behavior allows calling routines to use the output
of the lookup function regardless of the success of its
search.
.PP
It is worth noting, at this point, that the symbol lookup
routine operates linearly, and has not been optimized in
any way. Execution time is thus likely to grow faster than
linearly (roughly quadratically) with input file size. The disassembler is
internally limited to 1500 symbol table entries and 1500
relocation table entries; while these limits are generous
(/unix, itself, has fewer than 800 symbols), they are not
guaranteed to be adequate in all cases. If the symbol
table or the relocation table overflows, the disassembly
aborts.
.PP
Finally, users should be aware of a bug in the assembler,
which causes it not to parse the "esc" mnemonic, even
though "esc" is a completely legitimate opcode which is
documented in all the Intel literature. To accommodate
this deficiency, the disassembler translates opcodes of
the "esc" family to .byte directives, but notes the
correct mnemonic in a comment for reference.
.PP
In all cases, it should be possible to submit the output
of the disassembler program to the assembler, and assemble
it without error. In most cases, the resulting object
code will be identical to the original; in any event, it
will be functionally equivalent.
.SH "SEE ALSO"
adb(1), as(1), cc(1), ld(1).
.br
"Assembler Reference Manual" in the PC/IX Programmer's
Guide.
.SH "DIAGNOSTICS"
"can't access input file" if the input file cannot be
found, opened, or read.
.sp
"can't open output file" if the output file cannot be
created.
.sp
"warning: host/cpu clash" if the program is run on a
machine with a different CPU.
.sp
"input file not in object format" if the magic number
does not correspond to that of a PC/IX object file.
.sp
"not an 8086/8088 object file" if the CPU ID of the
file header is incorrect.
.sp
"reloc table overflow" if there are more than 1500
entries in the relocation table.
.sp
"symbol table overflow" if there are more than 1500
entries in the symbol table.
.sp
"lseek error" if the input file is corrupted (should
never happen).
.sp
"warning: no symbols" if the symbol table is missing.
.sp
"can't reopen input file" if the input file is removed
or altered during program execution (should never happen).
.SH "BUGS"
Numeric co-processor (i.e., 8087) mnemonics are not currently supported.
Instructions for the co-processor are
disassembled as CPU escape sequences, or as interrupts,
depending on how they were assembled in the first place.
.sp
Despite the program's best efforts, a symbol retrieved
from the symbol table may sometimes be different from
the symbol used in the original assembly.
.sp
The disassembler's internal tables are of fixed size,
and the program aborts if they overflow.


@@ -0,0 +1,210 @@
/*
** @(#) dis.h, Ver. 2.1 created 00:00:00 87/09/01
*/
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* *
* Copyright (C) 1987 G. M. Harding, all rights reserved *
* *
* Permission to copy and redistribute is hereby granted, *
* provided full source code, with all copyright notices, *
* accompanies any redistribution. *
* *
* This file contains declarations and definitions used by *
* the 8088 disassembler program. The program was designed *
* for execution on a machine of its own type (i.e., it is *
* not designed as a cross-disassembler); consequently, A *
* SIXTEEN-BIT INTEGER SIZE HAS BEEN ASSUMED. This assump- *
* tion is not particularly important, however, except in *
* the machine-specific portions of the code (i.e., the *
* handler routines and the optab[] array). It should be *
* possible to override this assumption, for execution on *
* 32-bit machines, by use of a pre-processor directive *
* (see below); however, this has not been tested. *
* *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
#include <sys/types.h>
#include <a.out.h> /* Object file format definitions */
#include <fcntl.h> /* System file-control definitions */
#include <unistd.h>
#include <string.h>
#include <stdio.h> /* System standard I/O definitions */
#if i8086 || i8088 /* For CPU's with 16-bit integers */
#undef int
#else /* Defaults (for 32-bit CPU types) */
#define int short
#endif
#define MAXSYM 1500 /* Maximum entries in symbol table */
extern struct nlist /* Array to hold the symbol table */
symtab[MAXSYM];
extern struct reloc /* Array to hold relocation table */
relo[MAXSYM];
extern int symptr; /* Index into the symtab[] array */
extern int relptr; /* Index into the relo[] array */
struct opcode /* Format for opcode data records */
{
    char *text; /* Pointer to mnemonic text */
    void (*func)(); /* Pointer to handler routine */
    unsigned min; /* Minimum # of object bytes */
    unsigned max; /* Maximum # of object bytes */
};
extern struct opcode /* Array to hold the opcode table */
optab[256];
/*
+---------------------------------------------
| The following functions are the specialized
| handlers for each opcode group. They are, of
| course, highly MACHINE-SPECIFIC. Each entry
| in the optab[] array contains a pointer to
| one of these handlers. The handlers in the
| first group are in dishand.c; those in the
| second group are in disfp.c.
+---------------------------------------------
*/
extern void dfhand(), /* Default handler routine */
sbhand(), /* Single-byte handler */
aohand(), /* Arithmetic-op handler */
sjhand(), /* Short-jump handler */
imhand(), /* Immediate-operand handler */
mvhand(), /* Simple move handler */
mshand(), /* Segreg-move handler */
pohand(), /* Pop memory/reg handler */
cihand(), /* Intersegment call handler */
mihand(), /* Immediate-move handler */
mqhand(), /* Quick-move handler */
tqhand(), /* Quick-test handler */
rehand(), /* Return handler */
mmhand(), /* Move-to-memory handler */
srhand(), /* Shift and rotate handler */
aahand(), /* ASCII-adjust handler */
iohand(), /* Immediate port I/O handler */
ljhand(), /* Long-jump handler */
mahand(), /* Misc. arithmetic handler */
mjhand(); /* Miscellaneous jump handler */
extern void eshand(), /* Bus-escape opcode handler */
fphand(), /* Floating-point handler */
inhand(); /* Interrupt-opcode handler */
extern char *REGS[]; /* Table of register names */
extern char *REGS0[]; /* Mode 0 register name table */
extern char *REGS1[]; /* Mode 1 register name table */
#define AL REGS[0] /* CPU register manifests */
#define CL REGS[1]
#define DL REGS[2]
#define BL REGS[3]
#define AH REGS[4]
#define CH REGS[5]
#define DH REGS[6]
#define BH REGS[7]
#define AX REGS[8]
#define CX REGS[9]
#define DX REGS[10]
#define BX REGS[11]
#define SP REGS[12]
#define BP REGS[13]
#define SI REGS[14]
#define DI REGS[15]
#define ES REGS[16]
#define CS REGS[17]
#define SS REGS[18]
#define DS REGS[19]
#define BX_SI REGS0[0]
#define BX_DI REGS0[1]
#define BP_SI REGS0[2]
#define BP_DI REGS0[3]
extern int symrank[6][6]; /* Symbol type/rank matrix */
extern unsigned long PC; /* Current program counter */
extern int segflg; /* Flag: segment override in effect */
extern int objflg; /* Flag: output object as a comment */
#define OBJMAX 8 /* Size of the object code buffer */
extern unsigned char /* Internal buffer for object code */
objbuf[OBJMAX];
extern void objini(), /* Object-buffer init routine */
objout(); /* Object-code output routine */
extern int objptr; /* Index into the objbuf[] array */
extern void badseq(); /* Bad-code-sequence function */
extern char *getnam(); /* Symbol-name string function */
extern char *lookup(); /* Symbol-table lookup function */
extern int lookext(); /* Extern-definition lookup routine */
extern char *mtrans(); /* Interpreter for the mode byte */
extern void mtrunc(); /* Mode string truncator function */
extern char ADD[], /* Opcode family mnemonic strings */
OR[],
ADC[],
SBB[],
AND[],
SUB[],
XOR[],
CMP[],
NOT[],
NEG[],
MUL[],
DIV[],
MOV[],
ESC[],
TEST[],
AMBIG[];
extern char *OPFAM[]; /* Indexed mnemonic family table */
extern struct exec HDR; /* Holds the object file's header */
#define LOOK_ABS 0 /* Arguments to lookup() function */
#define LOOK_REL 1
#define LOOK_LNG 2
#define TR_STD 0 /* Arguments to mtrans() function */
#define TR_SEG 8
/* Macro for byte input primitive */
#define FETCH(p) \
(++PC, (p) = getchar() & 0xff, objbuf[objptr++] = (p))
#ifdef OBSOLETE /* Declarations to use if headers */
/* are inadequate. sprintf() and */
/* strlen() may have the wrong type.*/
extern int close(); /* System file-close primitive */
extern long lseek(); /* System file-position primitive */
extern int open(); /* System file-open primitive */
extern int read(); /* System file-read primitive */
extern char *strcat(); /* Library string-join function */
extern char *strcpy(); /* Library string-copy function */
extern int strlen(); /* Library string-length function */
#endif
/* extern int sprintf(); /* Library string-output function */
/* extern int printf(); /* Library output-format function */
/* extern int fprintf(); /* Library file-output function */
/* * * * * * * * * * * END OF dis.h * * * * * * * * * * */


@@ -0,0 +1,157 @@
static char *sccsid =
"@(#) disfp.c, Ver. 2.1 created 00:00:00 87/09/01";
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* *
* Copyright (C) 1987 G. M. Harding, all rights reserved *
* *
* Permission to copy and redistribute is hereby granted, *
* provided full source code, with all copyright notices, *
* accompanies any redistribution. *
* *
* This file contains handler routines for the numeric op- *
* codes of the 8087 co-processor, as well as a few other *
* opcodes which are related to 8087 emulation. *
* *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
#include "dis.h" /* Disassembler declarations */
#define FPINT0 0xd8 /* Floating-point interrupts */
#define FPINT1 0xd9
#define FPINT2 0xda
#define FPINT3 0xdb
#define FPINT4 0xdc
#define FPINT5 0xdd
#define FPINT6 0xde
#define FPINT7 0xdf
/* Test for floating opcodes */
#define ISFLOP(x) \
(((x) >= FPINT0) && ((x) <= FPINT7))
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* *
* This is the handler for the escape family of opcodes. *
* These opcodes place the contents of a specified memory *
* location on the system bus, for access by a peripheral *
* or by a co-processor such as the 8087. (The 8087 NDP is *
* accessed only via bus escapes.) Due to a bug in the *
* PC/IX assembler, the "esc" mnemonic is not recognized; *
* consequently, escape opcodes are disassembled as .byte *
* directives, with the appropriate mnemonic and operand *
* included as a comment. FOR NOW, those escape sequences *
* corresponding to 8087 opcodes are treated as simple *
* escapes. *
* *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
void
eshand(j)
    register int j; /* Pointer to optab[] entry */
{/* * * * * * * * * * START OF eshand() * * * * * * * * * */
    register char *a;
    register int k;

    objini(j);
    FETCH(k);
    a = mtrans((j & 0xfd),(k & 0xc7),TR_STD);
    mtrunc(a);
    printf("\t.byte\t0x%02.2x\t\t| esc\t%s\n",j,a);
    for (k = 1; k < objptr; ++k)
        printf("\t.byte\t0x%02.2x\n",objbuf[k]);
}/* * * * * * * * * * * END OF eshand() * * * * * * * * * * */
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* *
* This is the handler routine for floating-point opcodes. *
* Since PC/IX must accommodate systems with and without *
* 8087 co-processors, it allows floating-point operations *
* to be initiated in either of two ways: by a software *
* interrupt whose type is in the range 0xd8 through 0xdf, *
* or by a CPU escape sequence, which is invoked by an op- *
* code in the same range. In either case, the subsequent *
* byte determines the actual numeric operation to be per- *
* formed. However, depending on the method of access, *
* either one or two code bytes will precede that byte, *
* and the fphand() routine has no way of knowing whether *
* it was invoked by interrupt or by an escape sequence. *
* Therefore, unlike all of the other handler routines ex- *
* cept dfhand(), fphand() does not initialize the object *
* buffer, leaving that chore to the caller. *
* *
* FOR NOW, fphand() does not disassemble floating-point *
* opcodes to floating mnemonics, but simply outputs the *
* object code as .byte directives. *
* *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
void
fphand(j)
    register int j; /* Pointer to optab[] entry */
{/* * * * * * * * * * START OF fphand() * * * * * * * * * */
    register int k;

    segflg = 0;
    FETCH(k);
    printf("\t.byte\t0x%02.2x\t\t| 8087 code sequence\n",
        objbuf[0]);
    for (k = 1; k < objptr; ++k)
        printf("\t.byte\t0x%02.2x\n",objbuf[k]);
    /* objout(); FOR NOW */
}/* * * * * * * * * * * END OF fphand() * * * * * * * * * * */
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* *
* This is the handler for variable software interrupt *
* opcodes. It is included in this file because PC/IX im- *
* plements its software floating-point emulation by means *
* of interrupts. Any interrupt in the range 0xd8 through *
* 0xdf is an NDP-emulation interrupt, and is specially *
* handled by the assembler. *
* *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
void
inhand(j)
    register int j; /* Pointer to optab[] entry */
{/* * * * * * * * * * START OF inhand() * * * * * * * * * */
    register int k;

    objini(j);
    FETCH(k);
    if (ISFLOP(k))
    {
        fphand(k);
        return;
    }
    printf("%s\t%d\n",optab[j].text,k);
    objout();
}/* * * * * * * * * * * END OF inhand() * * * * * * * * * * */


@@ -0,0 +1,30 @@
static char *copyright =
"@(#) Copyright (C) 1987 G. M. Harding, all rights reserved";
static char *sccsid =
"@(#) disrel.c, Ver. 2.1 created 00:00:00 87/09/01";
char *release =
"release 2.1 (MINIX)";
/*
**
** This file documents the major revisions to the 8088 sym-
** bolic disassembler. It also contains the release string
** which is output at the head of each disassembly, and the
** copyright string which must be incorporated in any code
** distribution.
**
** Permission to copy and redistribute is hereby granted,
** provided full source code, with all copyright notices,
** accompanies any redistribution.
**
** REVISION HISTORY:
**
** SEP 87:
** After internal shakeout, released on Usenet.
**
** JUN 88:
** Ported to MINIX
*/