<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
from ../texi/as.texinfo on 24 April 1999 -->
<TITLE>Using as - Sections and Relocation</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="as_1.html">first</A>, <A HREF="as_3.html">previous</A>, <A HREF="as_5.html">next</A>, <A HREF="as_27.html">last</A> section, <A HREF="as_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC39" HREF="as_toc.html#TOC39">Sections and Relocation</A></H1>
<P>
<A NAME="IDX167"></A>
<A NAME="IDX168"></A>
</P>
<H2><A NAME="SEC40" HREF="as_toc.html#TOC40">Background</A></H2>
<P>
Roughly, a section is a range of addresses, with no gaps; all data
"in" those addresses is treated the same for some particular purpose.
For example, there may be a "read only" section.
</P>
<P>
<A NAME="IDX169"></A>
<A NAME="IDX170"></A>
The linker <CODE>ld</CODE> reads many object files (partial programs) and
combines their contents to form a runnable program. When <CODE>as</CODE>
emits an object file, the partial program is assumed to start at address 0.
<CODE>ld</CODE> assigns the final addresses for the partial program, so that
different partial programs do not overlap. This is actually an
oversimplification, but it suffices to explain how <CODE>as</CODE> uses
sections.
</P>
<P>
<CODE>ld</CODE> moves blocks of bytes of your program to their run-time
addresses. These blocks slide to their run-time addresses as rigid
units; their length does not change and neither does the order of bytes
within them. Such a rigid unit is called a <EM>section</EM>. Assigning
run-time addresses to sections is called <EM>relocation</EM>. It includes
the task of adjusting mentions of object-file addresses so they refer to
the proper run-time addresses.
For the H8/300 and H8/500,
and for the Hitachi SH,
<CODE>as</CODE> pads sections if needed to
ensure they end on a word (sixteen-bit) boundary.
</P>
<P>
<A NAME="IDX171"></A>
An object file written by <CODE>as</CODE> has at least three sections, any
of which may be empty. These are named <EM>text</EM>, <EM>data</EM> and
<EM>bss</EM> sections.
</P>
<P>
When it generates COFF output,
<CODE>as</CODE> can also generate whatever other named sections you specify
using the <SAMP>`.section'</SAMP> directive (see section <A HREF="as_7.html#SEC119"><CODE>.section <VAR>name</VAR></CODE></A>).
If you do not use any directives that place output in the <SAMP>`.text'</SAMP>
or <SAMP>`.data'</SAMP> sections, these sections still exist, but are empty.
</P>
<P>
When <CODE>as</CODE> generates SOM or ELF output for the HPPA,
<CODE>as</CODE> can also generate whatever other named sections you
specify using the <SAMP>`.space'</SAMP> and <SAMP>`.subspace'</SAMP> directives. See
<CITE>HP9000 Series 800 Assembly Language Reference Manual</CITE>
(HP 92432-90001) for details on the <SAMP>`.space'</SAMP> and <SAMP>`.subspace'</SAMP>
assembler directives.
</P>
<P>
Additionally, <CODE>as</CODE> uses different names for the standard
text, data, and bss sections when generating SOM output. Program text
is placed into the <SAMP>`$CODE$'</SAMP> section, data into <SAMP>`$DATA$'</SAMP>, and
BSS into <SAMP>`$BSS$'</SAMP>.
</P>
<P>
Within the object file, the text section starts at address <CODE>0</CODE>, the
data section follows, and the bss section follows the data section.
</P>
<P>
When generating either SOM or ELF output files on the HPPA, the text
section starts at address <CODE>0</CODE>, the data section at address
<CODE>0x4000000</CODE>, and the bss section follows the data section.
</P>
<P>
To let <CODE>ld</CODE> know which data changes when the sections are
relocated, and how to change that data, <CODE>as</CODE> also writes to the
object file details of the relocation needed. To perform relocation
<CODE>ld</CODE> must know, each time an address in the object
file is mentioned:
<UL>
<LI>
Where in the object file is the beginning of this reference to
an address?
<LI>
How long (in bytes) is this reference?
<LI>
Which section does the address refer to? What is the numeric value of
<PRE>
(<VAR>address</VAR>) - (<VAR>start-address of section</VAR>)?
</PRE>
<LI>
Is the reference to an address "Program-Counter relative"?
</UL>
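<P>
As a rough illustration of these questions, consider the fragment below.
It is a sketch only: the labels <CODE>start</CODE>, <CODE>after</CODE> and
<CODE>ptr</CODE> are invented for this example, and it assumes a target where
<SAMP>`.long'</SAMP> emits four bytes and <SAMP>`#'</SAMP> begins a comment.
</P>
<PRE>
        .text
start:  .long   0       # four bytes of text; start is offset 0 in text
after:  .long   0       # after is offset 4 in text
        .data
ptr:    .long   after   # a reference to an address in the text section
</PRE>
<P>
Under those assumptions, the reference begins at offset 0 of the data
section, is four bytes long, refers to the text section with
(<VAR>address</VAR>) - (<VAR>start-address of section</VAR>) equal to 4, and is
not Program-Counter relative.
</P>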
<P>
<A NAME="IDX172"></A>
<A NAME="IDX173"></A>
In fact, every address <CODE>as</CODE> ever uses is expressed as
<PRE>
(<VAR>section</VAR>) + (<VAR>offset into section</VAR>)
</PRE>
<P>
Further, most expressions <CODE>as</CODE> computes have this section-relative
nature.
(For some object formats, such as SOM for the HPPA, some expressions are
symbol-relative instead.)
</P>
<P>
In this manual we use the notation {<VAR>secname</VAR> <VAR>N</VAR>} to mean "offset
<VAR>N</VAR> into section <VAR>secname</VAR>."
</P>
<P>
Apart from text, data and bss sections you need to know about the
<EM>absolute</EM> section. When <CODE>ld</CODE> mixes partial programs,
addresses in the absolute section remain unchanged. For example, address
<CODE>{absolute 0}</CODE> is "relocated" to run-time address 0 by
<CODE>ld</CODE>. Although the linker never arranges two partial programs'
data sections with overlapping addresses after linking, <EM>by definition</EM>
their absolute sections must overlap. Address <CODE>{absolute 239}</CODE> in one
part of a program is always the same address when the program is running as
address <CODE>{absolute 239}</CODE> in any other part of the program.
</P>
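<P>
A minimal sketch of an absolute address follows. The names <CODE>VECTOR</CODE>
and <CODE>where</CODE> are hypothetical, and the fragment assumes a target where
<SAMP>`.long'</SAMP> emits four bytes and <SAMP>`#'</SAMP> begins a comment.
</P>
<PRE>
VECTOR = 239            # an absolute symbol: {absolute 239}
        .data
where:  .long   VECTOR  # holds 239 in the object file and after linking
</PRE>
<P>
Because <CODE>VECTOR</CODE> belongs to the absolute section, <CODE>ld</CODE> has
nothing to adjust in the word stored at <CODE>where</CODE>.
</P>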
<P>
The idea of sections is extended to the <EM>undefined</EM> section. Any
address whose section is unknown at assembly time is by definition
rendered {undefined <VAR>U</VAR>}, where <VAR>U</VAR> is filled in later.
Since numbers are always defined, the only way to generate an undefined
address is to mention an undefined symbol. A reference to a named
common block would be such a symbol: its value is unknown at assembly
time so it has section <EM>undefined</EM>.
</P>
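<P>
The fragment below sketches how such an address can arise. The names
<CODE>handle</CODE> and <CODE>elsewhere</CODE> are invented for this example,
which assumes that <SAMP>`#'</SAMP> begins a comment on the target.
</P>
<PRE>
        .data
handle: .long   elsewhere   # elsewhere is defined nowhere in this file
</PRE>
<P>
Since <CODE>elsewhere</CODE> has no definition at assembly time, its address
has section <EM>undefined</EM>, and its value is supplied later, when some
other partial program defines it.
</P>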
<P>
By analogy the word <EM>section</EM> is used to describe groups of sections in
the linked program. <CODE>ld</CODE> puts all partial programs' text
sections in contiguous addresses in the linked program. It is
customary to refer to the <EM>text section</EM> of a program, meaning all
the addresses of all partial programs' text sections. Likewise for
data and bss sections.
</P>
<P>
Some sections are manipulated by <CODE>ld</CODE>; others are invented for
use of <CODE>as</CODE> and have no meaning except during assembly.
</P>
<H2><A NAME="SEC41" HREF="as_toc.html#TOC41">Linker Sections</A></H2>
<P>
<CODE>ld</CODE> deals with just four kinds of sections, summarized below.
</P>
<DL COMPACT>
<DT><STRONG>named sections</STRONG>
<DD>
<A NAME="IDX174"></A>
<A NAME="IDX175"></A>
<A NAME="IDX176"></A>
<A NAME="IDX177"></A>
<DT><STRONG>text section</STRONG>
<DD>
<DT><STRONG>data section</STRONG>
<DD>
These sections hold your program. <CODE>as</CODE> and <CODE>ld</CODE> treat them as
separate but equal sections. Anything you can say of one section is
true of another.
When the program is running, however, it is
customary for the text section to be unalterable. The
text section is often shared among processes: it contains
instructions, constants and the like. The data section of a running
program is usually alterable: for example, C variables would be stored
in the data section.
<A NAME="IDX178"></A>
<DT><STRONG>bss section</STRONG>
<DD>
This section contains zeroed bytes when your program begins running. It
is used to hold uninitialized variables or common storage. The length of
each partial program's bss section is important, but because it starts
out containing zeroed bytes there is no need to store explicit zero
bytes in the object file. The bss section was invented to eliminate
those explicit zeros from object files.
<A NAME="IDX179"></A>
<DT><STRONG>absolute section</STRONG>
<DD>
Address 0 of this section is always "relocated" to run-time address 0.
This is useful if you want to refer to an address that <CODE>ld</CODE> must
not change when relocating. In this sense we speak of absolute
addresses being "unrelocatable": they do not change during relocation.
<A NAME="IDX180"></A>
<DT><STRONG>undefined section</STRONG>
<DD>
This "section" is a catch-all for address references to objects not in
the preceding sections.
</DL>
<P>
<A NAME="IDX181"></A>
An idealized example of three relocatable sections, using the traditional
section names <SAMP>`.text'</SAMP> and <SAMP>`.data'</SAMP> with memory addresses on
the horizontal axis, belongs here; the diagram is missing from this HTML version.
</P>
<H2><A NAME="SEC42" HREF="as_toc.html#TOC42">Assembler Internal Sections</A></H2>
<P>
<A NAME="IDX182"></A>
<A NAME="IDX183"></A>
These sections are meant only for the internal use of <CODE>as</CODE>. They
have no meaning at run-time. You do not really need to know about these
sections for most purposes; but they can be mentioned in <CODE>as</CODE>
warning messages, so it might be helpful to have an idea of their
meanings to <CODE>as</CODE>. These sections are used to permit the
value of every expression in your assembly language program to be a
section-relative address.
</P>
<DL COMPACT>
<DT><B>ASSEMBLER-INTERNAL-LOGIC-ERROR!</B>
<DD>
<A NAME="IDX184"></A>
An internal assembler logic error has been found. This means there is a
bug in the assembler.
<A NAME="IDX185"></A>
<DT><B>expr section</B>
<DD>
The assembler stores complex expressions internally as combinations of
symbols. When it needs to represent an expression as a symbol, it puts
it in the expr section.
</DL>
<H2><A NAME="SEC43" HREF="as_toc.html#TOC43">Sub-Sections</A></H2>
<P>
<A NAME="IDX186"></A>
<A NAME="IDX187"></A>
Assembled bytes conventionally fall into two sections: text and data.
You may have separate groups of data in named sections that you want to
end up near each other in the object file, even though they
are not contiguous in the assembler source. <CODE>as</CODE> allows you to
use <EM>subsections</EM> for this purpose. Within each section, there can be
numbered subsections with values from 0 to 8192. Objects assembled into the
same subsection go into the object file together with other objects in the same
subsection. For example, a compiler might want to store constants in the text
section, but might not want to have them interspersed with the program being
assembled. In this case, the compiler could issue a <SAMP>`.text 0'</SAMP> before each
section of code being output, and a <SAMP>`.text 1'</SAMP> before each group of
constants being output.
</P>
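<P>
The compiler scenario just described might look roughly like the sketch
below. The labels are hypothetical, <SAMP>`.long 0'</SAMP> merely stands in for
real instructions, and <SAMP>`#'</SAMP> is assumed to begin a comment.
</P>
<PRE>
        .text 0
f:      .long   0                       # stands in for the code of f
        .text 1
f_msg:  .ascii  "constant used by f"
        .text 0
g:      .long   0                       # stands in for the code of g
        .text 1
g_msg:  .ascii  "constant used by g"
</PRE>
<P>
In the object file the contents at <CODE>f</CODE> and <CODE>g</CODE> come first,
followed by both constants, even though the source alternates between them.
</P>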
<P>
Subsections are optional. If you do not use subsections, everything
goes in subsection number zero.
</P>
<P>
Each subsection is zero-padded up to a multiple of four bytes.
(Subsections may be padded a different amount on different flavors
of <CODE>as</CODE>.)
</P>
<P>
Subsections appear in your object file in numeric order, lowest numbered
to highest. (All this to be compatible with other people's assemblers.)
The object file contains no representation of subsections; <CODE>ld</CODE> and
other programs that manipulate object files see no trace of them.
They just see all your text subsections as a text section, and all your
data subsections as a data section.
</P>
<P>
To specify which subsection you want subsequent statements assembled
into, use a numeric argument to specify it, in a <SAMP>`.text
<VAR>expression</VAR>'</SAMP> or a <SAMP>`.data <VAR>expression</VAR>'</SAMP> statement.
When generating COFF output, you
can also use an extra subsection
argument with arbitrary named sections: <SAMP>`.section <VAR>name</VAR>,
<VAR>expression</VAR>'</SAMP>.
<VAR>Expression</VAR> should be an absolute expression.
(See section <A HREF="as_6.html#SEC60">Expressions</A>.) If you just say <SAMP>`.text'</SAMP> then <SAMP>`.text 0'</SAMP>
is assumed. Likewise <SAMP>`.data'</SAMP> means <SAMP>`.data 0'</SAMP>. Assembly
begins in <CODE>text 0</CODE>. For instance:
<PRE>
.text 0 # The default subsection is text 0 anyway.
.ascii "This lives in the first text subsection. *"
.text 1
.ascii "But this lives in the second text subsection."
.data 0
.ascii "This lives in the data section,"
.ascii "in the first data subsection."
.text 0
.ascii "This lives in the first text subsection,"
.ascii "immediately following the asterisk (*)."
</PRE>
<P>
Each section has a <EM>location counter</EM> incremented by one for every byte
assembled into that section. Because subsections are merely a convenience
restricted to <CODE>as</CODE> there is no concept of a subsection location
counter. There is no way to directly manipulate a location counter, but the
<CODE>.align</CODE> directive changes it, and any label definition captures its
current value. The location counter of the section where statements are being
assembled is said to be the <EM>active</EM> location counter.
</P>
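<P>
A small sketch of labels capturing the location counter follows. It
assumes the data section is empty beforehand and that <SAMP>`#'</SAMP> begins a
comment; the labels are invented. It uses <CODE>.balign</CODE>, whose argument
is always a byte count, because the meaning of the argument to
<CODE>.align</CODE> differs between targets.
</P>
<PRE>
        .data
first:  .byte   1       # first captures the value 0; the counter becomes 1
        .balign 4       # pads until the counter is a multiple of 4
second: .byte   2       # second captures the value 4
</PRE>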
<H2><A NAME="SEC44" HREF="as_toc.html#TOC44">bss Section</A></H2>
<P>
<A NAME="IDX188"></A>
<A NAME="IDX189"></A>
The bss section is used for local common variable storage.
You may allocate address space in the bss section, but you may
not dictate data to load into it before your program executes. When
your program starts running, all the contents of the bss
section are zeroed bytes.
</P>
<P>
The <CODE>.lcomm</CODE> pseudo-op defines a symbol in the bss section; see
section <A HREF="as_7.html#SEC101"><CODE>.lcomm <VAR>symbol</VAR> , <VAR>length</VAR></CODE></A>.
</P>
<P>
The <CODE>.comm</CODE> pseudo-op may be used to declare a common symbol, which is
another form of uninitialized symbol; see section <A HREF="as_7.html#SEC76"><CODE>.comm <VAR>symbol</VAR> , <VAR>length</VAR></CODE></A>.
</P>
<P>
When assembling for a target which supports multiple sections, such as ELF or
COFF, you may switch into the <CODE>.bss</CODE> section and define symbols as usual;
see section <A HREF="as_7.html#SEC119"><CODE>.section <VAR>name</VAR></CODE></A>. You may only assemble zero values into the
section. Typically the section will only contain symbol definitions and
<CODE>.skip</CODE> directives (see section <A HREF="as_7.html#SEC125"><CODE>.skip <VAR>size</VAR> , <VAR>fill</VAR></CODE></A>).
</P>
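<P>
The three approaches above can be sketched together as follows. The symbol
names and sizes are hypothetical, the last two lines assume an ELF or COFF
target, and <SAMP>`#'</SAMP> is assumed to begin a comment.
</P>
<PRE>
        .lcomm  scratch, 16     # 16 bytes reserved in bss for scratch
        .comm   shared, 32      # a common symbol of 32 bytes
        .section .bss           # switch into the bss section by name
table:  .skip   128             # reserve 128 bytes; only zero fill is allowed here
</PRE>
<P>
None of these statements stores bytes in the object file; each only
reserves address space that holds zeroed bytes when the program starts.
</P>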
</BODY>
</HTML>