Files
oldlinux-files/gnu/gcc/gcc-2.2.2/README.DWARF
2024-02-19 00:24:47 -05:00

483 lines
22 KiB
Plaintext

Notes on the GNU Implementation of DWARF Debugging Information
--------------------------------------------------------------
Last Updated: Sun Jun 3 09:56:42 1992 by rfg@ncd.com
-----------------------------------------------------
This file describes special and unique aspects of the GNU implementation
of the DWARF debugging information language, as provided in the GCC Version
2 compiler.
For general information about the DWARF debugging information language,
you should obtain the latest DWARF draft specification document developed
by the UNIX International Programming Languages Special Interest Group.
A copy of the this document (in PostScript form) may be obtained either
from me <rfg@ncd.com> or from UNIX International. (See below.) The file
you are looking at now only describes known deviations from the current
UI/PLSIG DWARF specification document (1.0.1) together with those things
which are allowed by the current DWARF specification but which are known
to cause interoperability problems (e.g. with svr4 SDB).
To obtain a copy of the latest DWARF specification document from UNIX
International, use the following procedure:
---------------------------------------------------------------------------
Send mail to archive@ui.org containing the following:
path yourname@your.site
send PUBLIC/dwarf.v1.mm
for the troff source, or
send PUBLIC/dwarf.v1.ps
for the postscript. If you system supports uncompress and uudecode,
you can request that the data be compressed by placing the command
'compress' in the message.
If you have any questions about the archive service, please contact
Shane P. McCarron, UI Project Manager, <s.mccarron@ui.org>.
---------------------------------------------------------------------------
The generation of DWARF debugging information in GCC v2.x has now been
tested rather extensively (in conjunction with the C language front end
only) for m88k, i386, i860, and Sparc targets. GCC's DWARF output (for C)
appears to interoperate well with the standard svr4 SDB debugger on these
kinds of target systems (but of course, there are no guarrantees).
The DWARF generation enhancement for GCC was initially donated to the
Free Software Foundation by Network Computing Devices. (Thanks NCD!)
Additional development and maintenance of dwarfout.c has been largely
supported (i.e. funded) by Intel Corporation. (Thanks Intel!) The code
in dwarfout.c is (of course) covered under the GNU General Public
License (aka `copyleft') just as the rest of GCC is.
If you have questions or comments about the DWARF generation feature
of GCC, please send mail to me <rfg@ncd.com>. I will be happy to
investigate any bugs reported and I may even provide fixes (but of
course, I can make no promises).
The DWARF debugging information produced by GCC may deviate in a few minor
(but perhaps significant) respects from the DWARF debugging information
currently produced by other C compilers. A serious attempt has been made
however to conform to the published specifications, to existing practice,
and to generally accepted norms in the GNU implementation of DWARF.
If you are interested in obtaining more information about DWARF or in
participating in the continuing evolution of DWARF within the UI/PLSIG
group, please contact either myself or the UI/PLSIG chairman, Dan Oldman
<oldman@dg-rtp.dg.com>. The UI/PLSIG welcomes and encourages the
participation of new members who might be interested in discussing debugging
issues in general, and DWARF in particular. There are no dues and you
DO NOT have to be a UI member in order to join the UI/PLSIG. The UI/PLSIG
operates an E-mail mailing list and holds regular meeting in various cities.
If you don't have time to participate actively, but would like to be kept
abrest of recent developments, you con join the UI/PLSIG mailing list and
just listen in on our lively discussions.
(Footnote: Within this file, the term `Debugging Information Entry' will
be abbreviated as `DIE'.)
Release Notes (aka known bugs)
-------------------------------
The AT_bit_offset value for bit-fields whose high-order bit lies smack up
against an "alignment unit" boundary will probably be incorrect. For
example, compiling:
struct S {
int foo:9;
short bar:7;
int last:16;
};
... for an x86 target yields a situation where `bar' lies smack up against
the high-order end of the hypothetical type `short' object which contains it.
In such cases, the AT_bit_offset should be zero, but it may instead be equal
to the size of the containing object (which, for this example, is 16 bits).
--------------------------------
At this time, GCC does not know how to handle the GNU C "nested functions"
extension. (See the GCC manual for more info on this extension to ANSI C.)
--------------------------------
At this time, GCC does not represent inlined instances of inline functions
as called for by the current DWARF draft specification. Support for inlined
instances of inline functions is still "under construction".
Recently, a new approach to the representation of inlined functions (within
DWARF) has been proposed (by me) to the UI/PLSIG. This proposal was well
received, but has not yet been formally approved. It is hoped that this
proposal will be approved by the UI/PLSIG sometime in the future. A future
release of GCC may incorporate the new approach for representing inlined
functions (which I have already implemented). Contact me for further details.
--------------------------------
At this time, GCC does not generate the kind of really precise information
about the exact declared types of entities with signed integral types which
is required by the current DWARF draft specification.
Specifically, the current DWARF draft specification seems to require that
the type of an non-unsigned integral bit-field member of a struct or union
type be represented as either a "signed" type or as a "plain" type,
depending upon the the exact set of keywords that were used in the
type specification for the given bit-field member. It was felt (by the
UI/PLSIG) that this distinction between "plain" and "signed" integral types
could have some significance (in the case of bit-fields) because ANSI C
does not constrain the signedness of a plain bit-field, whereas it does
constrain the signedness of an explicitly "signed" bit-field. For this
reason, the current DWARF specification calls for compilers to produce
type information (for *all* integral typed entities... not just bit-fields)
which explicitly indicates the signedness of the relevant type to be
"signed" or "plain" or "unsigned".
Unfortunately, the GNU DWARF implementation is currently incapable of making
such distinctions.
--------------------------------
Full DWARF support for the GNU C++ language/front-end (aka g++) is not
implemented at this time.
--------------------------------
Known Interoperability Problems
-------------------------------
Although the GNU implementation of DWARF conforms (for the most part) with
the current UI/PLSIG DWARF draft specification, there are a few known cases
where GCC's DWARF output causes confusion in System V Release 4 SDB debuggers
anyway. There cases are described in this section.
--------------------------------
The current DWARF draft specification includes the fundamental type codes
FT_ext_prec_float, FT_complex, FT_dbl_prec_complex, and FT_ext_prec_complex.
Since GNU C is only a C compiler (and since C doesn't provide any "complex"
data types) the only one of these fundamental type codes which GCC ever
generates is FT_ext_prec_float. This fundamental type code is generated
by GCC for the `long double' data type. Unfortunately, due to an apparent
bug, SVR4 SDB can become very confused wherever any attempt is made to
print a variable, parameter, or field whose type was given in terms of
FT_ext_prec_float.
(Actually, SVR4 SDB fails to understand *any* of the four fundamental type
codes mentioned here. This will fact will cause additional problems when
there is a GNU FORTRAN front-end.)
--------------------------------
In general, it appears that SVR4 SDB is not able to effectively ignore
fundamental type codes in the "implementation defined" range. This can
cause problems when a program being debugged uses the `long long' data
type (or the signed or unsigned varieties thereof) because these types
are not defined by ANSI C, and thus, GCC must use its own private fundamental
type codes (from the implementation-defined range) to represent these types.
--------------------------------
General GNU DWARF extensions
----------------------------
In the current DWARF draft specification, no provision is made for providing
accurate information about executable lines which came into the current
compilation unit by way of an include file.
Recently, a scheme for providing accurate information about code in include
files was proposed (by me) to the UI/PLSIG. This scheme was rejected by the
UI/PLSIG for inclusion into the DWARF Version 1 specification, but GNU DWARF
implements this extension anyway.
To understand this GNU DWARF extension, imagine that the sequence of entries
in the .lines section is broken up into several subsections. Each contiguous
sequence of .line entries which relates to a sequence of lines (or statements)
from one particular file (either a `base' file or an `include' file) could
be called a `line entries chunk' (LEC).
For each LEC there is one entry in the .debug_srcinfo section.
Each normal entry in the .debug_srcinfo section consists of two 4-byte
words of data as follows:
(1) The starting address (relative to the entire .line section)
of the first .line entry in the relevant LEC.
(2) The starting address (relative to the entire .debug_sfnames
section) of a NUL terminated string representing the
relevant filename. (This filename name be either a
relative or an absolute filename, depending upon how the
given source file was located during compilation.)
Obviously, each .debug_srcinfo entry allows you to find the relevant filename,
and it also points you to the first .line entry that was generated as a result
of having compiled a given source line from the given source file.
Each subsequent .line entry should also be assumed to have been produced
as a result of compiling yet more lines from the same file. The end of
any given LEC is easily found by looking at the first 4-byte pointer in
the *next* .debug_srcinfo entry. That next .debug_srcinfo entry points
to a new and different LEC, so the preceding LEC (implicitly) must have
ended with the last .line section entry which occurs at the 2 1/2 words
just before the address given in the first pointer of the new .debug_srcinfo
entry.
The following picture may help to clarify this feature. Let's assume that
`LE' stands for `.line entry'. Also, assume that `* 'stands for a pointer.
.line section .debug_srcinfo section .debug_sfnames section
----------------------------------------------------------------
LE <---------------------- *
LE * -----------------> "foobar.c" <---
LE |
LE |
LE <---------------------- * |
LE * -----------------> "foobar.h" <| |
LE | |
LE | |
LE <---------------------- * | |
LE * -----------------> "inner.h" | |
LE | |
LE <---------------------- * | |
LE * ------------------------------- |
LE |
LE |
LE |
LE |
LE <---------------------- * |
LE * -----------------------------------
LE
LE
LE
In effect, each entry in the .debug_srcinfo section points to *both* a
filename (in the .debug_sfnames section) and to the start of a block of
consecutive LEs (in the .line section).
Note that just like in the .line section, there are specialized first and
last entries in the .debug_srcinfo section for each object file. These
special first and last entries for the .debug_srcinfo section are very
different from the normal .debug_srcinfo section entries. They provide
additional information which may be helpful to a debugger when it is
interpreting the data in the .debug_srcinfo, .debug_sfnames, and .line
sections.
The first entry in the .debug_srcinfo section for each compilation unit
consists of five 4-byte words of data. The contents of these five words
should be interpreted (by debuggers) as follows:
(1) The starting address (relative to the entire .line section)
of the .line section for this compilation unit.
(2) The starting address (relative to the entire .debug_sfnames
section) of the .debug_sfnames section for this compilation
unit.
(3) The starting address (in the execution virtual address space)
of the .text section for this compilation unit.
(4) The ending address plus one (in the execution virtual address
space) of the .text section for this compilation unit.
(5) The date/time (in seconds since midnight 1/1/70) at which the
compilation of this compilation unit occurred. This value
should be interpreted as an unsigned quantity because gcc
might be configured to generate a default value of 0xffffffff
in this field (in cases where it is desired to have object
files created at different times from identical source files
be byte-for-byte identical).
Note that the first string placed into the .debug_sfnames section for each
compilation unit is the name of the directory in which compilation occurred.
This string ends with a `/' (to help indicate that it is the pathname of a
directory). Thus, the second word of each specialized initial .debug_srcinfo
entry for each compilation unit may be used as a pointer to the (string)
name of the compilation directory, and that string may in trun be used to
"absolutize" any relative pathnames which may appear later on in the
.debug_sfnames section entries for the same compilation unit.
The fifth and last word of each specialized starting entry for a compilation
unit in the .debug_srcinfo section indicates the date/time of compilation,
and this may be used (by the debugger) to determine if any of the source
files which contributed code to this compilation unit are newer than the
object code for the compilation unit itself. If so, the debugger may wish
to print an "out-of-date" warning about the compilation unit.
The .debug_srcinfo section associated with each compilation will also have
a specialized terminating entry. This terminating .debug_srcinfo section
entry will consist of the following two 4-byte words of data:
(1) The offset, measured from the start of the .line section to
the beginning of the terminating entry for the .line section.
(2) A word containing the value 0xffffffff.
--------------------------------
In the current DWARF draft specification, no provision is made for recording
any information about macro definitions and un-definitions.
Recently, a scheme for providing accurate information about macro definitions
an un-definitions was proposed (by me) to the UI/PLSIG. This scheme was
rejected by the UI/PLSIG for inclusion into the DWARF Version 1 specification,
but GNU DWARF implements this extension anyway (when the -g3 option is used).
GCC records information about macro definitions and undefinitions primarily
in a section called the .debug_macinfo section. Normal entries in the
.debug_macinfo section consist of the following three parts:
(1) A special "type" byte.
(2) A 3-byte line-number/filename-offset field.
(3) A NUL terminated string.
The interpretation of the second and third parts is dependent upon the
value of the leading (type) byte.
The type byte may have one of four values depending upon the type of the
.debug_macinfo entry which follows. The 1-byte MACINFO type codes presently
used, and their meanings are as follows:
MACINFO_start A base file or an include file starts here.
MACINFO_resume The current base or include file ends here.
MACINFO_define A #define directive occurs here.
MACINFO_undef A #undef directive occur here.
(Note that the MACINFO_... codes mentioned here are simply symbolic names
for constants which are defined in the GNU dwarf.h file.)
For MACINFO_define and MACINFO_undef entries, the second (3-byte) field
contains the number of the source line (relative to the start of the current
base source file or the current include files) when the #define or #undef
directive appears. For a MACINFO_define entry, the following string field
contains the name of the macro which is defined, followed by its definition.
Note that the definition is always separated from the name of the macro
by at least one whitespace character. For a MACINFO_undef entry, the
string which follows the 3-byte line number field contains just the name
of the macro which is being undef'ed.
For a MACINFO_start entry, the 3-byte field following the type byte contains
the offset, relative to the start of the .debug_sfnames section for the
current compilation unit, of a string which names the new source file which
is beginning its inclusion at this point. Following that 3-byte field,
each MACINFO_start entry always contains a zero length NUL terminated
string.
For a MACINFO_resume entry, the 3-byte field following the type byte contains
the line number WITHIN THE INCLUDING FILE at which the inclusion of the
current file (whose inclusion ends here) was initiated. Following that
3-byte field, each MACINFO_resume entry always contains a zero length NUL
terminated string.
Each set of .debug_macinfo entries for each compilation unit is terminated
by a special .debug_macinfo entry consisting of a 4-byte zero value followed
by a single NUL byte.
--------------------------------
In the current DWARF draft specification, no provision is made for providing
a separate level of (limited) debugging information necessary to support
tracebacks (only) through fully-debugged code (e.g. code in system libraries).
Recently, a proposal to define such a level was submitted (by me) to the
UI/PLSIG. This proposal was rejected by the UI/PLSIG for inclusion into
the DWARF Version 1 specification because it was felt that the issues
involved in supporting a "traceback only" subset of dwarf were not yet
well understood. Nonetheless, the GNU implementation of DWARF provides
this extension anyway (when the -g1 option is used).
--------------------------------
GNU DWARF Representation of GNU C Extensions to ANSI C
------------------------------------------------------
The file dwarfout.c has been designed and implemented so as to provide
some reasonable DWARF representation for each and every declarative
construct which is accepted by the GNU C compiler. Since the GNU C
compiler accepts a superset of ANSI C, this means that there are some
cases in which the DWARF information produced by GCC must take some
liberties in improvising DWARF representations for declarations which
are only valid in (extended) GNU C.
In particular, GNU C provides at least three significant extensions to
ANSI C when it comes to declarations. These are (1) inline functions,
and (2) dynamic arrays, and (3) incomplete enum types. (See the GCC
manual for more information on these GNU extensions to ANSI C.) When
used, these GNU C extensions are represented (in the generated DWARF
output of GCC) in the most natural and intuitively obvious ways.
In the case of inline functions, the DWARF representation is exactly as
called for (in the current UI/PLSIG DWARF draft specification) for an
identical function written in C++; i.e. we "reuse" the representation
of inline functions defined for C++ to support this GNU C extension.
In the case of dynamic arrays, we use the most obvious representational
mechanism available; i.e. an array type in which the upper bound of
some dimension (usually the first and only dimension) is a variable
rather than a constant. See the UI/PLSIG DWARF draft specification
for more details.
In the case of incomplete enum types, such types are represented simply
as TAG_enumeration_type DIEs which DO NOT contain either AT_byte_size
attributes or AT_element_list attributes.
--------------------------------
Future Directions
-----------------
The codes, formats, and other paraphernalia necessary to provide proper
support for symbolic debugging for the C++ language have now been defined
and accepted by the UI/PLSIG. Support for C++ (i.e. the g++ front-end)
in dwarfout.c has not been fully implemented yet however.
Likewise, the UI/PLSIG has defined what is believed to be a complete and
sufficient set of codes and rules for adequately representing all of
FORTRAN 77, and most of Fortran 90 in DWARF. While some support for
this has been implemented in dwarfout.c, further implementation and
testing will have to await the arrival of the GNU Fortran front-end.
DWARF support for other languages (i.e. Pascal and Modula) currently has
a number of known problems as far as the current UI/PLSIG DWARF draft
specification is concerned. Hopefully, A more complete form of DWARF
which can handle *all* of the symbolic debugging requirements for Pascal,
Modula, and Ada will evolve in the future. Efforts are currently underway
to develop DWARF more fully for these and other languages under the auspices
of the UI/PLSIG. Contact the Chairman, Dan Oldman <oldman@dg-rtp.dg.com>
for further information.
As currently defined, DWARF only describes a (binary) language which can
be used to communicate symbolic debugging information from a compiler
through an assembler and a linker, to a debugger. There is no clear
specification of what processing should be (or must be) done by the
assembler and/or the linker. Fortunately, the role of the assembler
is easily inferred (by anyone knowledgeable about assemblers) just by
looking at examples of assembly-level DWARF code. Sadly though, the
allowable (or required) processing steps performed by a linker are
harder to infer and (perhaps) even harder to agree upon. There are
several forms of very useful `post-processing' steps which intelligent
linkers *could* (in theory) perform on object files containing DWARF,
but any and all such link-time transformations are currently both disallowed
and unspecified.
In particular, possible link-time transformations of DWARF code which could
provide significant benefits include (but are not limited to):
Commonization of duplicate DIEs obtained from multiple input
(object) files.
Cross-compilation type checking based upon DWARF type information
for objects and functions.
Other possible `compacting' transformations designed to save disk
space and to reduce linker & debugger I/O activity.