
<html><head><title>NASM Manual</title></head>
<body><h1 align=center>The Netwide Assembler: NASM</h1>

<p align=center><a href="nasmdoc9.html">Next Chapter</a> |
<a href="nasmdoc7.html">Previous Chapter</a> |
<a href="nasmdoc0.html">Contents</a> |
<a href="nasmdoci.html">Index</a>
<h2><a name="chapter-8">Chapter 8: Writing 32-bit Code (Unix, Win32, DJGPP)</a></h2>
<p>This chapter attempts to cover some of the common issues involved when
writing 32-bit code, to run under Win32 or Unix, or to be linked with C
code generated by a Unix-style C compiler such as DJGPP. It covers how to
write assembly code to interface with 32-bit C routines, and how to write
position-independent code for shared libraries.
<p>Almost all 32-bit code, and in particular all code running under
<code><nobr>Win32</nobr></code>, <code><nobr>DJGPP</nobr></code> or any of
the PC Unix variants, runs in <em>flat</em> memory model. This means that
the segment registers and paging have already been set up to give you the
same 32-bit 4GB address space no matter what segment you work relative to,
and that you should ignore all segment registers completely. When writing
flat-model application code, you never need to use a segment override or
modify any segment register, and the code-section addresses you pass to
<code><nobr>CALL</nobr></code> and <code><nobr>JMP</nobr></code> live in
the same address space as the data-section addresses you access your
variables by and the stack-section addresses you access local variables and
procedure parameters by. Every address is 32 bits long and contains only an
offset part.
<h3><a name="section-8.1">8.1 Interfacing to 32-bit C Programs</a></h3>
<p>A lot of the discussion in <a href="nasmdoc7.html#section-7.4">section
7.4</a>, about interfacing to 16-bit C programs, still applies when working
in 32 bits. The absence of memory models or segmentation worries simplifies
things a lot.
<h4><a name="section-8.1.1">8.1.1 External Symbol Names</a></h4>
<p>Most 32-bit C compilers share the convention used by 16-bit compilers,
that the names of all global symbols (functions or data) they define are
formed by prefixing an underscore to the name as it appears in the C
program. However, not all of them do: the <code><nobr>ELF</nobr></code>
specification states that C symbols do <em>not</em> have a leading
underscore on their assembly-language names.
<p>The older Linux <code><nobr>a.out</nobr></code> C compiler, all
<code><nobr>Win32</nobr></code> compilers, <code><nobr>DJGPP</nobr></code>,
and <code><nobr>NetBSD</nobr></code> and <code><nobr>FreeBSD</nobr></code>,
all use the leading underscore; for these compilers, the macros
<code><nobr>cextern</nobr></code> and <code><nobr>cglobal</nobr></code>, as
given in <a href="nasmdoc7.html#section-7.4.1">section 7.4.1</a>, will
still work. For <code><nobr>ELF</nobr></code>, though, the leading
underscore should not be used.
<p>See also <a href="nasmdoc2.html#section-2.1.21">section 2.1.21</a>.
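<p>One way to keep a single source file working under both conventions is to let the preprocessor supply the underscore. The sketch below assumes a macro name <code><nobr>UNDERSCORE</nobr></code> that you define yourself with NASM's <code><nobr>-d</nobr></code> option when targeting an underscore-prefixing platform. The name is hypothetical, not something NASM predefines.
<p><pre>
; Assemble with "nasm -dUNDERSCORE ..." for a.out, Win32 or DJGPP
; targets, and without it for ELF. UNDERSCORE is a hypothetical name.
%ifdef UNDERSCORE
  %define printf _printf        ; every later "printf" becomes "_printf"
%endif

extern  printf

        ; ... further down, the same source line suits both conventions:
        call    printf
</pre>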
<h4><a name="section-8.1.2">8.1.2 Function Definitions and Function Calls</a></h4>
<p>The C calling convention in 32-bit programs is
as follows. In the following description, the words <em>caller</em> and
<em>callee</em> are used to denote the function doing the calling and the
function which gets called.
<ul>
<li>The caller pushes the function's parameters on the stack, one after
another, in reverse order (right to left, so that the first argument
specified to the function is pushed last).
<li>The caller then executes a near <code><nobr>CALL</nobr></code>
instruction to pass control to the callee.
<li>The callee receives control, and typically (although this is not
actually necessary, in functions which do not need to access their
parameters) starts by saving the value of <code><nobr>ESP</nobr></code> in
<code><nobr>EBP</nobr></code> so as to be able to use
<code><nobr>EBP</nobr></code> as a base pointer to find its parameters on
the stack. However, the caller was probably doing this too, so part of the
calling convention states that <code><nobr>EBP</nobr></code> must be
preserved by any C function. Hence the callee, if it is going to set up
<code><nobr>EBP</nobr></code> as a frame pointer, must push the previous
value first.
<li>The callee may then access its parameters relative to
<code><nobr>EBP</nobr></code>. The doubleword at
<code><nobr>[EBP]</nobr></code> holds the previous value of
<code><nobr>EBP</nobr></code> as it was pushed; the next doubleword, at
<code><nobr>[EBP+4]</nobr></code>, holds the return address, pushed
implicitly by <code><nobr>CALL</nobr></code>. The parameters start after
that, at <code><nobr>[EBP+8]</nobr></code>. The leftmost parameter of the
function, since it was pushed last, is accessible at this offset from
<code><nobr>EBP</nobr></code>; the others follow, at successively greater
offsets. Thus, in a function such as <code><nobr>printf</nobr></code> which
takes a variable number of parameters, the pushing of the parameters in
reverse order means that the function knows where to find its first
parameter, which tells it the number and type of the remaining ones.
<li>The callee may also wish to decrease <code><nobr>ESP</nobr></code>
further, so as to allocate space on the stack for local variables, which
will then be accessible at negative offsets from
<code><nobr>EBP</nobr></code>.
<li>The callee, if it wishes to return a value to the caller, should leave
the value in <code><nobr>AL</nobr></code>, <code><nobr>AX</nobr></code> or
<code><nobr>EAX</nobr></code> depending on the size of the value.
Floating-point results are typically returned in
<code><nobr>ST0</nobr></code>.
<li>Once the callee has finished processing, it restores
<code><nobr>ESP</nobr></code> from <code><nobr>EBP</nobr></code> if it had
allocated local stack space, then pops the previous value of
<code><nobr>EBP</nobr></code>, and returns via
<code><nobr>RET</nobr></code> (equivalently,
<code><nobr>RETN</nobr></code>).
<li>When the caller regains control from the callee, the function
parameters are still on the stack, so it typically adds an immediate
constant to <code><nobr>ESP</nobr></code> to remove them (instead of
executing a number of slow <code><nobr>POP</nobr></code> instructions).
Thus, if a function is accidentally called with the wrong number of
parameters due to a prototype mismatch, the stack will still be returned to
a sensible state since the caller, which <em>knows</em> how many parameters
it pushed, does the removing.
</ul>
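<p>The argument layout described above is what makes variable-argument functions work. The following is a sketch of a hypothetical function with the C prototype <code><nobr>int sum_ints(int count, ...)</nobr></code>. It finds <code><nobr>count</nobr></code> at <code><nobr>[EBP+8]</nobr></code> and then walks upwards through the remaining parameters. It touches only <code><nobr>EAX</nobr></code>, <code><nobr>ECX</nobr></code> and <code><nobr>EDX</nobr></code>, which 32-bit C compilers conventionally treat as scratch registers that a function need not preserve.
<p><pre>
; Hypothetical C prototype: int sum_ints(int count, ...);
; e.g. sum_ints(3, 10, 20, 30) returns 60.
global  _sum_ints

_sum_ints:
        push    ebp
        mov     ebp,esp
        mov     ecx,[ebp+8]     ; count: the leftmost parameter, pushed last
        lea     edx,[ebp+12]    ; address of the first variable parameter
        xor     eax,eax         ; running total, returned in EAX
.next:  jecxz   .done           ; stop when no parameters remain
        add     eax,[edx]
        add     edx,4           ; the next parameter is 4 bytes further up
        dec     ecx
        jmp     .next
.done:  pop     ebp
        ret
</pre>
<p>A caller would push 30, 20, 10 and 3 in that order, call <code><nobr>_sum_ints</nobr></code>, and then add 16 to <code><nobr>ESP</nobr></code> to discard the four parameters.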
<p>There is an alternative calling convention used by Win32 programs for
Windows API calls, and also for functions called <em>by</em> the Windows
API such as window procedures: they follow what Microsoft calls the
<code><nobr>__stdcall</nobr></code> convention. This is slightly closer to
the Pascal convention, in that the callee clears the stack by passing a
parameter to the <code><nobr>RET</nobr></code> instruction. However, the
parameters are still pushed in right-to-left order.
<p>Thus, you would define a function in C style in the following way:
<p><pre>
global  _myfunc

_myfunc:
        push    ebp
        mov     ebp,esp
        sub     esp,0x40        ; 64 bytes of local stack space
        mov     eax,[ebp+8]     ; first parameter to function

        ; some more code

        leave                   ; mov esp,ebp / pop ebp
        ret
</pre>
<p>At the other end of the process, to call a C function from your assembly
code, you would do something like this:
<p><pre>
extern  _printf

        ; and then, further down...

        push    dword [myint]   ; one of my integer variables
        push    dword mystring  ; pointer into my data segment
        call    _printf
        add     esp,byte 8      ; `byte' saves space

        ; then those data items...

segment _DATA

myint       dd   1234
mystring    db   'This number -&gt; %d &lt;- should be 1234',10,0
</pre>
<p>This piece of code is the assembly equivalent of the C code
<p><pre>
    int myint = 1234;
    printf("This number -&gt; %d &lt;- should be 1234\n", myint);
</pre>
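<p>For comparison with the C-convention examples above, here is a sketch of a <code><nobr>__stdcall</nobr></code> function taking two doubleword parameters. The <code><nobr>_add2@8</nobr></code> spelling follows the decoration that Microsoft's compilers apply to <code><nobr>__stdcall</nobr></code> names: a leading underscore, then <code><nobr>@</nobr></code> and the number of bytes of parameters. Check what your own compiler expects. The function itself is hypothetical.
<p><pre>
; Hypothetical C prototype: int __stdcall add2(int a, int b);
global  _add2@8

_add2@8:
        push    ebp
        mov     ebp,esp
        mov     eax,[ebp+8]     ; a (leftmost, pushed last)
        add     eax,[ebp+12]    ; b
        pop     ebp
        ret     8               ; the callee removes the 8 bytes of parameters

        ; A caller therefore does not adjust ESP afterwards:
        ;       push    dword 2
        ;       push    dword 1
        ;       call    _add2@8 ; EAX = 3, parameters already removed
</pre>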
<h4><a name="section-8.1.3">8.1.3 Accessing Data Items</a></h4>
<p>To get at the contents of C variables, or to declare variables which C
can access, you need only declare the names as
<code><nobr>GLOBAL</nobr></code> or <code><nobr>EXTERN</nobr></code>.
(Again, the names require leading underscores on the platforms that use them, as stated in
<a href="#section-8.1.1">section 8.1.1</a>.) Thus, a C variable declared as
<code><nobr>int i</nobr></code> can be accessed from assembler as
<p><pre>
          extern _i
          mov eax,[_i]
</pre>
<p>And to declare your own integer variable which C programs can access as
<code><nobr>extern int j</nobr></code>, you do this (making sure you are
assembling in the <code><nobr>_DATA</nobr></code> segment, if necessary):
<p><pre>
          global _j
_j        dd 0
</pre>
<p>To access a C array, you need to know the size of the components of the
array. For example, <code><nobr>int</nobr></code> variables are four bytes
long, so if a C program declares an array as
<code><nobr>int a[10]</nobr></code>, you can access
<code><nobr>a[3]</nobr></code> by coding
<code><nobr>mov eax,[_a+12]</nobr></code>. (The byte offset 12 is obtained
by multiplying the desired array index, 3, by the size of the array
element, 4.) The sizes of the C base types in 32-bit compilers are: 1 for
<code><nobr>char</nobr></code>, 2 for <code><nobr>short</nobr></code>, 4
for <code><nobr>int</nobr></code>, <code><nobr>long</nobr></code> and
<code><nobr>float</nobr></code>, and 8 for
<code><nobr>double</nobr></code>. Pointers, being 32-bit addresses, are
also 4 bytes long.
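<p>When the index is only known at run time, you can fold the same multiplication into the addressing mode with a scale factor. The following is a minimal sketch. It assumes the C program defines <code><nobr>int a[10]</nobr></code> and that the routine below, a hypothetical name, is called from C as <code><nobr>int sum_a(void)</nobr></code>.
<p><pre>
extern  _a                      ; the C array int a[10]
global  _sum_a

_sum_a:
        xor     eax,eax         ; running total, returned in EAX
        xor     ecx,ecx         ; array index i
.loop:  add     eax,[_a+ecx*4]  ; a[i]: byte offset i*4, since sizeof(int) is 4
        inc     ecx
        cmp     ecx,10
        jb      .loop
        ret
</pre>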
<p>To access a C data structure, you need to know the offset from the base
of the structure to the field you are interested in. You can either do this
by converting the C structure definition into a NASM structure definition
(using <code><nobr>STRUC</nobr></code>), or by calculating the one offset
and using just that.
<p>To do either of these, you should read your C compiler's manual to find
out how it organises data structures. NASM gives no special alignment to
structure members in its own <code><nobr>STRUC</nobr></code> macro, so you
have to specify alignment yourself if the C compiler generates it.
Typically, you might find that a structure like
<p><pre>
struct {
    char c;
    int i;
} foo;
</pre>
<p>might be eight bytes long rather than five, since the
<code><nobr>int</nobr></code> field would be aligned to a four-byte
boundary. However, this sort of feature is sometimes a configurable option
in the C compiler, either using command-line options or
<code><nobr>#pragma</nobr></code> lines, so you have to find out how your
own compiler does it.
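<p>Suppose your compiler does align the <code><nobr>int</nobr></code> field as described. A matching NASM definition then has to spell out the three padding bytes explicitly. In the sketch below, the structure name <code><nobr>foo_t</nobr></code> is hypothetical, and the C variable <code><nobr>foo</nobr></code> is assumed to be reachable as <code><nobr>_foo</nobr></code>.
<p><pre>
struc   foo_t
.c:     resb    1               ; char c, at offset 0
        resb    3               ; padding added by the C compiler
.i:     resd    1               ; int i, at offset 4
endstruc                        ; also defines foo_t_size, here 8

extern  _foo

        mov     al,[_foo+foo_t.c]
        mov     eax,[_foo+foo_t.i]
</pre>
<p>If the compiler is told to pack the structure instead, drop the padding line, and <code><nobr>foo_t.i</nobr></code> becomes 1.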
<h4><a name="section-8.1.4">8.1.4 <code><nobr>c32.mac</nobr></code>: Helper Macros for the 32-bit C Interface</a></h4>
<p>Included in the NASM archives, in the <code><nobr>misc</nobr></code>
directory, is a file <code><nobr>c32.mac</nobr></code> of macros. It
defines three macros: <code><nobr>proc</nobr></code>,
<code><nobr>arg</nobr></code> and <code><nobr>endproc</nobr></code>. These
are intended to be used for C-style procedure definitions, and they
automate a lot of the work involved in keeping track of the calling
convention.
<p>An example of an assembly function using the macro set is given here:
<p><pre>
proc    _proc32

%$i     arg
%$j     arg
        mov     eax,[ebp + %$i]
        mov     ecx,[ebp + %$j]
        add     eax,[ecx]

endproc
</pre>
<p>This defines <code><nobr>_proc32</nobr></code> to be a procedure taking
two arguments, the first (<code><nobr>i</nobr></code>) an integer and the
second (<code><nobr>j</nobr></code>) a pointer to an integer. It returns
<code><nobr>i + *j</nobr></code>.
<p>Note that the <code><nobr>arg</nobr></code> macro has an
<code><nobr>EQU</nobr></code> as the first line of its expansion, and since
the label before the macro call gets prepended to the first line of the
expanded macro, the <code><nobr>EQU</nobr></code> works, defining
<code><nobr>%$i</nobr></code> to be an offset from
<code><nobr>EBP</nobr></code>. A context-local variable is used, local to
the context pushed by the <code><nobr>proc</nobr></code> macro and popped
by the <code><nobr>endproc</nobr></code> macro, so that the same argument
name can be used in later procedures. Of course, you don't <em>have</em> to
use context-local names.
<p><code><nobr>arg</nobr></code> can take an optional parameter, giving the
size of the argument. If no size is given, 4 is assumed, since it is likely
that many function parameters will be of type <code><nobr>int</nobr></code>
or pointers.
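<p>A <code><nobr>double</nobr></code> parameter, for instance, occupies eight bytes on the stack, so it needs an explicit size. Below is a sketch of a hypothetical function <code><nobr>double scale(double x, int n)</nobr></code>. It assumes the <code><nobr>c32.mac</nobr></code> macros behave as described above, and it leaves its result in <code><nobr>ST0</nobr></code>.
<p><pre>
%include "c32.mac"              ; from the misc directory of the NASM archive

proc    _scale

%$x     arg     8               ; double x: offset 8, eight bytes
%$n     arg                     ; int n: offset 16, default four bytes
        fld     qword [ebp + %$x]
        fimul   dword [ebp + %$n]       ; ST0 = x * n, the return value

endproc
</pre>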
<h3><a name="section-8.2">8.2 Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF Shared Libraries</a></h3>
<p><code><nobr>ELF</nobr></code> replaced the older
<code><nobr>a.out</nobr></code> object file format under Linux because it
contains support for position-independent code (PIC), which makes writing
shared libraries much easier. NASM supports the
<code><nobr>ELF</nobr></code> position-independent code features, so you
can write Linux <code><nobr>ELF</nobr></code> shared libraries in NASM.
<p>NetBSD, and its close cousins FreeBSD and OpenBSD, take a different
approach by hacking PIC support into the <code><nobr>a.out</nobr></code>
format. NASM supports this as the <code><nobr>aoutb</nobr></code> output
format, so you can write BSD shared libraries in NASM too.
<p>The operating system loads a PIC shared library by memory-mapping the
library file at an arbitrarily chosen point in the address space of the
running process. The contents of the library's code section must therefore
not depend on where it is loaded in memory.
<p>Therefore, you cannot get at your variables by writing code like this:
<p><pre>
        mov     eax,[myvar]             ; WRONG
</pre>
<p>Instead, the linker provides an area of memory called the <em>global
offset table</em>, or GOT; the GOT is situated at a constant distance from
your library's code, so if you can find out where your library is loaded
(which is typically done using a <code><nobr>CALL</nobr></code> and
<code><nobr>POP</nobr></code> combination), you can obtain the address of
the GOT, and you can then load the addresses of your variables out of
linker-generated entries in the GOT.
<p>The <em>data</em> section of a PIC shared library does not have these
restrictions: since the data section is writable, it has to be copied into
memory anyway rather than just paged in from the library file, so as long
as it's being copied it can be relocated too. So you can put ordinary types
of relocation in the data section without too much worry (but see
<a href="#section-8.2.4">section 8.2.4</a> for a caveat).
<h4><a name="section-8.2.1">8.2.1 Obtaining the Address of the GOT</a></h4>
<p>Each code module in your shared library should define the GOT as an
external symbol:
<p><pre>
extern  _GLOBAL_OFFSET_TABLE_   ; in ELF
extern  __GLOBAL_OFFSET_TABLE_  ; in BSD a.out
</pre>
<p>At the beginning of any function in your shared library which plans to
access your data or BSS sections, you must first calculate the address of
the GOT. This is typically done by writing the function in this form:
<p><pre>
func:   push    ebp
        mov     ebp,esp
        push    ebx
        call    .get_GOT
.get_GOT:
        pop     ebx
        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc

        ; the function body comes here

        mov     ebx,[ebp-4]
        mov     esp,ebp
        pop     ebp
        ret
</pre>
<p>(For BSD, again, the symbol
<code><nobr>_GLOBAL_OFFSET_TABLE_</nobr></code> requires a second leading
underscore.)
<p>The first two lines of this function are simply the standard C prologue
to set up a stack frame, and the last three lines are the standard C function
epilogue. The third line and the fourth-to-last line save and restore the
<code><nobr>EBX</nobr></code> register, because PIC shared libraries use
this register to store the address of the GOT.
<p>The interesting bit is the <code><nobr>CALL</nobr></code> instruction
and the following two lines. The <code><nobr>CALL</nobr></code> and
<code><nobr>POP</nobr></code> combination obtains the address of the label
<code><nobr>.get_GOT</nobr></code>, without having to know in advance where
the program was loaded (since the <code><nobr>CALL</nobr></code>
instruction is encoded relative to the current position). The
<code><nobr>ADD</nobr></code> instruction makes use of one of the special
PIC relocation types: GOTPC relocation. With the
<code><nobr>WRT ..gotpc</nobr></code> qualifier specified, the symbol
referenced (here <code><nobr>_GLOBAL_OFFSET_TABLE_</nobr></code>, the
special symbol assigned to the GOT) is given as an offset from the
beginning of the section. (Actually, <code><nobr>ELF</nobr></code> encodes
it as the offset from the operand field of the
<code><nobr>ADD</nobr></code> instruction, but NASM simplifies this
deliberately, so you do things the same way for both
<code><nobr>ELF</nobr></code> and <code><nobr>BSD</nobr></code>.) So the
instruction then <em>adds</em> the beginning of the section, to get the
real address of the GOT, and subtracts the value of
<code><nobr>.get_GOT</nobr></code> which it knows is in
<code><nobr>EBX</nobr></code>. Therefore, by the time that instruction has
finished, <code><nobr>EBX</nobr></code> contains the address of the GOT.
<p>If you didn't follow that, don't worry: it's never necessary to obtain
the address of the GOT by any other means, so you can put those three
instructions into a macro and safely ignore them:
<p><pre>
%macro  get_GOT 0

        call    %%getgot
  %%getgot:
        pop     ebx
        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc

%endmacro
</pre>
<h4><a name="section-8.2.2">8.2.2 Finding Your Local Data Items</a></h4>
<p>Having got the GOT, you can then use it to obtain the addresses of your
data items. Most variables will reside in the sections you have declared;
they can be accessed using the <code><nobr>..gotoff</nobr></code> special
<code><nobr>WRT</nobr></code> type. The way this works is like this:
<p><pre>
        lea     eax,[ebx+myvar wrt ..gotoff]
</pre>
<p>The expression <code><nobr>myvar wrt ..gotoff</nobr></code> is
calculated, when the shared library is linked, to be the offset to the
local variable <code><nobr>myvar</nobr></code> from the beginning of the
GOT. Therefore, adding it to <code><nobr>EBX</nobr></code> as above will
place the real address of <code><nobr>myvar</nobr></code> in
<code><nobr>EAX</nobr></code>.
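<p>Combined with the prologue from section 8.2.1, a complete function that returns the value of a variable in the library's own data section might look like the sketch below. It targets <code><nobr>ELF</nobr></code>, hence no leading underscores. It also assumes that the <code><nobr>get_GOT</nobr></code> macro from section 8.2.1 is defined earlier in the file. The names <code><nobr>counter</nobr></code> and <code><nobr>get_counter</nobr></code> are hypothetical.
<p><pre>
extern  _GLOBAL_OFFSET_TABLE_

section .data
counter dd      0               ; local to the library: reached via ..gotoff

section .text
global  get_counter:function

get_counter:
        push    ebp
        mov     ebp,esp
        push    ebx
        get_GOT                 ; EBX = address of the GOT
        mov     eax,[ebx+counter wrt ..gotoff]  ; load the value directly
        mov     ebx,[ebp-4]
        mov     esp,ebp
        pop     ebp
        ret
</pre>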
<p>If you declare variables as <code><nobr>GLOBAL</nobr></code> without
specifying a size for them, they are shared between code modules in the
library, but do not get exported from the library to the program that
loaded it. They will still be in your ordinary data and BSS sections, so
you can access them in the same way as local variables, using the above
<code><nobr>..gotoff</nobr></code> mechanism.
<p>Note that due to a peculiarity of the way BSD
<code><nobr>a.out</nobr></code> format handles this relocation type, there
must be at least one non-local symbol in the same section as the address
you're trying to access.
<h4><a name="section-8.2.3">8.2.3 Finding External and Common Data Items</a></h4>
<p>If your library needs to get at an external variable (external to the
<em>library</em>, not just to one of the modules within it), you must use
the <code><nobr>..got</nobr></code> type to get at it. The
<code><nobr>..got</nobr></code> type, instead of giving you the offset from
the GOT base to the variable, gives you the offset from the GOT base to a
GOT <em>entry</em> containing the address of the variable. The linker will
set up this GOT entry when it builds the library, and the dynamic linker
will place the correct address in it at load time. So to obtain the address
of an external variable <code><nobr>extvar</nobr></code> in
<code><nobr>EAX</nobr></code>, you would code
<p><pre>
        mov     eax,[ebx+extvar wrt ..got]
</pre>
<p>This loads the address of <code><nobr>extvar</nobr></code> out of an
entry in the GOT. The linker, when it builds the shared library, collects
together every relocation of type <code><nobr>..got</nobr></code>, and
builds the GOT so as to ensure it has every necessary entry present.
<p>Common variables must also be accessed in this way.
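<p>Because <code><nobr>..got</nobr></code> yields the variable's <em>address</em>, reading its value takes a second load. The following short sketch assumes <code><nobr>EBX</nobr></code> already holds the GOT address as in section 8.2.1. It also uses a hypothetical common variable, <code><nobr>shared_buf</nobr></code>.
<p><pre>
extern  extvar                  ; defined outside this library
common  shared_buf 256          ; a common variable: also needs ..got

        mov     eax,[ebx+extvar wrt ..got]      ; EAX = address of extvar
        mov     eax,[eax]                       ; EAX = value of extvar
        mov     edx,[ebx+shared_buf wrt ..got]  ; EDX = address of shared_buf
        mov     byte [edx],0                    ; store into its first byte
</pre>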
<h4><a name="section-8.2.4">8.2.4 Exporting Symbols to the Library User</a></h4>
<p>If you want to export symbols to the user of the library, you have to
declare whether they are functions or data, and if they are data, you have
to give the size of the data item. This is because the dynamic linker has
to build procedure linkage table entries for any exported functions, and
also has to move exported data items away from the library's data section in
which they were declared.
<p>So to export a function to users of the library, you must use
<p><pre>
global  func:function           ; declare it as a function

func:   push    ebp

        ; etc.
</pre>
<p>And to export a data item such as an array, you would have to code
<p><pre>
global  array:data array.end-array      ; give the size too

array:  resd    128
.end:
</pre>
<p>Be careful: If you export a variable to the library user, by declaring
it as <code><nobr>GLOBAL</nobr></code> and supplying a size, the variable
will end up living in the data section of the main program, rather than in
your library's data section, where you declared it. So you will have to
access your own global variable with the <code><nobr>..got</nobr></code>
mechanism rather than <code><nobr>..gotoff</nobr></code>, as if it were
external (which, effectively, it has become).
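<p>For example, code inside the library would read the exported <code><nobr>array</nobr></code> declared above as shown in this sketch. Again, it assumes <code><nobr>EBX</nobr></code> holds the GOT address.
<p><pre>
        ; not lea eax,[ebx+array wrt ..gotoff]: that would point at the
        ; library's own copy, which the program no longer uses
        mov     eax,[ebx+array wrt ..got]       ; address of the exported array
        mov     edx,[eax+5*4]                   ; array[5], four bytes per dword
</pre>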
<p>Equally, if you need to store the address of an exported global in one
of your data sections, you can't do it by means of the standard sort of
code:
<p><pre>
dataptr:        dd      global_data_item        ; WRONG
</pre>
<p>NASM will interpret this code as an ordinary relocation, in which
<code><nobr>global_data_item</nobr></code> is merely an offset from the
beginning of the <code><nobr>.data</nobr></code> section (or whatever); so
this reference will end up pointing at your data section instead of at the
exported global which resides elsewhere.
<p>Instead of the above code, then, you must write
<p><pre>
dataptr:        dd      global_data_item wrt ..sym
</pre>
<p>which makes use of the special <code><nobr>WRT</nobr></code> type
<code><nobr>..sym</nobr></code> to instruct NASM to search the symbol table
for a particular symbol at that address, rather than just relocating by
section base.
<p>Either method will work for functions: referring to one of your
functions by means of
<p><pre>
funcptr:        dd      my_function
</pre>
<p>will give the user the address of the code you wrote, whereas
<p><pre>
funcptr:        dd      my_function wrt ..sym
</pre>
<p>will give the address of the procedure linkage table entry for the function,
which is where the calling program will <em>believe</em> the function
lives. Either address is a valid way to call the function.
<h4><a name="section-8.2.5">8.2.5 Calling Procedures Outside the Library</a></h4>
<p>Calling procedures outside your shared library has to be done by means
of a <em>procedure linkage table</em>, or PLT. The PLT is placed at a known
offset from where the library is loaded, so the library code can make calls
to the PLT in a position-independent way. Within the PLT there is code to
jump to offsets contained in the GOT, so function calls to other shared
libraries or to routines in the main program can be transparently passed
off to their real destinations.
<p>To call an external routine, you must use another special PIC relocation
type, <code><nobr>WRT ..plt</nobr></code>. This is much easier than the
GOT-based ones: you simply replace calls such as
<code><nobr>CALL printf</nobr></code> with the PLT-relative version
<code><nobr>CALL printf WRT ..plt</nobr></code>.
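<p>The sketch below puts the pieces together for <code><nobr>ELF</nobr></code>: a hypothetical exported function <code><nobr>show(int n)</nobr></code> that passes a format string from its own data section to <code><nobr>printf</nobr></code> through the PLT. It assumes that the <code><nobr>get_GOT</nobr></code> macro from section 8.2.1 is defined earlier in the file. <code><nobr>EBX</nobr></code> still holds the GOT address at the call, and this matters: the PLT entries the linker builds for an i386 <code><nobr>ELF</nobr></code> shared library locate the GOT through <code><nobr>EBX</nobr></code>.
<p><pre>
extern  _GLOBAL_OFFSET_TABLE_
extern  printf                  ; in the C library, outside this library

section .data
fmt     db      'n = %d',10,0

section .text
global  show:function           ; hypothetical: void show(int n)

show:
        push    ebp
        mov     ebp,esp
        push    ebx
        get_GOT                         ; EBX = GOT, kept there for the PLT
        push    dword [ebp+8]           ; n
        lea     eax,[ebx+fmt wrt ..gotoff]
        push    eax                     ; address of the format string
        call    printf wrt ..plt
        add     esp,8                   ; remove the two parameters
        mov     ebx,[ebp-4]
        mov     esp,ebp
        pop     ebp
        ret
</pre>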
<h4><a name="section-8.2.6">8.2.6 Generating the Library File</a></h4>
<p>Having written some code modules and assembled them to
<code><nobr>.o</nobr></code> files, you then generate your shared library
with a command such as
<p><pre>
ld -shared -o library.so module1.o module2.o       # for ELF
ld -Bshareable -o library.so module1.o module2.o   # for BSD
</pre>
<p>For ELF, if your shared library is going to reside in system directories
such as <code><nobr>/usr/lib</nobr></code> or
<code><nobr>/lib</nobr></code>, it is usually worth using the
<code><nobr>-soname</nobr></code> flag to the linker, to store the final
library file name, with a version number, into the library:
<p><pre>
ld -shared -soname library.so.1 -o library.so.1.2 *.o
</pre>
<p>You would then copy <code><nobr>library.so.1.2</nobr></code> into the
library directory, and create <code><nobr>library.so.1</nobr></code> as a
symbolic link to it.
<p align=center><a href="nasmdoc9.html">Next Chapter</a> |
<a href="nasmdoc7.html">Previous Chapter</a> |
<a href="nasmdoc0.html">Contents</a> |
<a href="nasmdoci.html">Index</a>
</body></html>