Files
oldlinux-files/study/Ref-docs/manual as/as_3.html
2024-02-19 00:25:23 -05:00

611 lines
18 KiB
HTML

<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.52
from ../texi/as.texinfo on 24 April 1999 -->
<TITLE>Using as - Syntax</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="as_1.html">first</A>, <A HREF="as_2.html">previous</A>, <A HREF="as_4.html">next</A>, <A HREF="as_27.html">last</A> section, <A HREF="as_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC25" HREF="as_toc.html#TOC25">Syntax</A></H1>
<P>
<A NAME="IDX91"></A>
<A NAME="IDX92"></A>
This chapter describes the machine-independent syntax allowed in a
source file. <CODE>as</CODE> syntax is similar to what many other
assemblers use; it is inspired by the BSD 4.2
assembler, except that <CODE>as</CODE> does not assemble Vax bit-fields.
</P>
<H2><A NAME="SEC26" HREF="as_toc.html#TOC26">Preprocessing</A></H2>
<P>
<A NAME="IDX93"></A>
The <CODE>as</CODE> internal preprocessor:
<UL>
<LI>
<A NAME="IDX94"></A>
adjusts and removes extra whitespace. It leaves one space or tab before
the keywords on a line, and turns any other whitespace on the line into
a single space.
<A NAME="IDX95"></A>
<LI>
removes all comments, replacing them with a single space, or an
appropriate number of newlines.
<A NAME="IDX96"></A>
<LI>
converts character constants into the appropriate numeric values.
</UL>
<P>
It does not do macro processing, include file handling, or
anything else you may get from your C compiler's preprocessor. You can
do include file processing with the <CODE>.include</CODE> directive
(see section <A HREF="as_7.html#SEC97"><CODE>.include "<VAR>file</CODE>"</VAR></A>). You can use the GNU C compiler driver
to get other "CPP" style preprocessing, by giving the input file a
<SAMP>`.S'</SAMP> suffix. See section `Options Controlling the Kind of Output' in <CITE>Using GNU CC</CITE>.
</P>
<P>
Excess whitespace, comments, and character constants
cannot be used in the portions of the input text that are not
preprocessed.
</P>
<P>
<A NAME="IDX97"></A>
<A NAME="IDX98"></A>
<A NAME="IDX99"></A>
<A NAME="IDX100"></A>
If the first line of an input file is <CODE>#NO_APP</CODE> or if you use the
<SAMP>`-f'</SAMP> option, whitespace and comments are not removed from the input file.
Within an input file, you can ask for whitespace and comment removal in
specific portions of the by putting a line that says <CODE>#APP</CODE> before the
text that may contain whitespace or comments, and putting a line that says
<CODE>#NO_APP</CODE> after this text. This feature is mainly intend to support
<CODE>asm</CODE> statements in compilers whose output is otherwise free of comments
and whitespace.
</P>
<H2><A NAME="SEC27" HREF="as_toc.html#TOC27">Whitespace</A></H2>
<P>
<A NAME="IDX101"></A>
<EM>Whitespace</EM> is one or more blanks or tabs, in any order.
Whitespace is used to separate symbols, and to make programs neater for
people to read. Unless within character constants
(see section <A HREF="as_3.html#SEC32">Character Constants</A>), any whitespace means the same
as exactly one space.
</P>
<H2><A NAME="SEC28" HREF="as_toc.html#TOC28">Comments</A></H2>
<P>
<A NAME="IDX102"></A>
There are two ways of rendering comments to <CODE>as</CODE>. In both
cases the comment is equivalent to one space.
</P>
<P>
Anything from <SAMP>`/*'</SAMP> through the next <SAMP>`*/'</SAMP> is a comment.
This means you may not nest these comments.
</P>
<PRE>
/*
The only way to include a newline ('\n') in a comment
is to use this sort of comment.
*/
/* This sort of comment does not nest. */
</PRE>
<P>
<A NAME="IDX103"></A>
Anything from the <EM>line comment</EM> character to the next newline
is considered a comment and is ignored. The line comment character is
<SAMP>`;'</SAMP> for the AMD 29K family;
<SAMP>`;'</SAMP> on the ARC;
<SAMP>`;'</SAMP> for the H8/300 family;
<SAMP>`!'</SAMP> for the H8/500 family;
<SAMP>`;'</SAMP> for the HPPA;
<SAMP>`#'</SAMP> on the i960;
<SAMP>`!'</SAMP> for the Hitachi SH;
<SAMP>`!'</SAMP> on the SPARC;
<SAMP>`#'</SAMP> on the m32r;
<SAMP>`|'</SAMP> on the 680x0;
<SAMP>`#'</SAMP> on the Vax;
<SAMP>`!'</SAMP> for the Z8000;
<SAMP>`#'</SAMP> on the V850;
see section <A HREF="as_8.html#SEC138">Machine Dependent Features</A>.
</P>
<P>
On some machines there are two different line comment characters. One
character only begins a comment if it is the first non-whitespace character on
a line, while the other always begins a comment.
</P>
<P>
The V850 assembler also supports a double dash as starting a comment that
extends to the end of the line.
</P>
<P>
<SAMP>`--'</SAMP>;
</P>
<P>
<A NAME="IDX104"></A>
<A NAME="IDX105"></A>
<A NAME="IDX106"></A>
To be compatible with past assemblers, lines that begin with <SAMP>`#'</SAMP> have a
special interpretation. Following the <SAMP>`#'</SAMP> should be an absolute
expression (see section <A HREF="as_6.html#SEC60">Expressions</A>): the logical line number of the <EM>next</EM>
line. Then a string (see section <A HREF="as_3.html#SEC33">Strings</A>) is allowed: if present it is a
new logical file name. The rest of the line, if any, should be whitespace.
</P>
<P>
If the first non-whitespace characters on the line are not numeric,
the line is ignored. (Just like a comment.)
</P>
<PRE>
# This is an ordinary comment.
# 42-6 "new_file_name" # New logical file name
# This is logical line # 36.
</PRE>
<P>
This feature is deprecated, and may disappear from future versions
of <CODE>as</CODE>.
</P>
<H2><A NAME="SEC29" HREF="as_toc.html#TOC29">Symbols</A></H2>
<P>
<A NAME="IDX107"></A>
A <EM>symbol</EM> is one or more characters chosen from the set of all
letters (both upper and lower case), digits and the three characters
<SAMP>`_.$'</SAMP>.
On most machines, you can also use <CODE>$</CODE> in symbol names; exceptions
are noted in section <A HREF="as_8.html#SEC138">Machine Dependent Features</A>.
No symbol may begin with a digit. Case is significant.
There is no length limit: all characters are significant. Symbols are
delimited by characters not in that set, or by the beginning of a file
(since the source program must end with a newline, the end of a file is
not a possible symbol delimiter). See section <A HREF="as_5.html#SEC45">Symbols</A>.
<A NAME="IDX108"></A>
</P>
<H2><A NAME="SEC30" HREF="as_toc.html#TOC30">Statements</A></H2>
<P>
<A NAME="IDX109"></A>
<A NAME="IDX110"></A>
<A NAME="IDX111"></A>
A <EM>statement</EM> ends at a newline character (<SAMP>`\n'</SAMP>) or an "at"
sign (<SAMP>`@'</SAMP>). The newline or at sign is considered part of the
preceding statement. Newlines and at signs within character constants
are an exception: they do not end statements.
A <EM>statement</EM> ends at a newline character (<SAMP>`\n'</SAMP>) or an exclamation
point (<SAMP>`!'</SAMP>). The newline or exclamation point is considered part of the
preceding statement. Newlines and exclamation points within character
constants are an exception: they do not end statements.
A <EM>statement</EM> ends at a newline character (<SAMP>`\n'</SAMP>); or (for the
H8/300) a dollar sign (<SAMP>`$'</SAMP>); or (for the
Hitachi-SH or the
H8/500) a semicolon
(<SAMP>`;'</SAMP>). The newline or separator character is considered part of
the preceding statement. Newlines and separators within character
constants are an exception: they do not end statements.
A <EM>statement</EM> ends at a newline character (<SAMP>`\n'</SAMP>) or line
separator character. (The line separator is usually <SAMP>`;'</SAMP>, unless
this conflicts with the comment character; see section <A HREF="as_8.html#SEC138">Machine Dependent Features</A>.) The
newline or separator character is considered part of the preceding
statement. Newlines and separators within character constants are an
exception: they do not end statements.
</P>
<P>
<A NAME="IDX112"></A>
<A NAME="IDX113"></A>
It is an error to end any statement with end-of-file: the last
character of any input file should be a newline.
</P>
<P>
<A NAME="IDX114"></A>
<A NAME="IDX115"></A>
<A NAME="IDX116"></A>
You may write a statement on more than one line if you put a
backslash (<KBD>\</KBD>) immediately in front of any newlines within the
statement. When <CODE>as</CODE> reads a backslashed newline both
characters are ignored. You can even put backslashed newlines in
the middle of symbol names without changing the meaning of your
source program.
</P>
<P>
An empty statement is allowed, and may include whitespace. It is ignored.
</P>
<P>
<A NAME="IDX117"></A>
<A NAME="IDX118"></A>
A statement begins with zero or more labels, optionally followed by a
key symbol which determines what kind of statement it is. The key
symbol determines the syntax of the rest of the statement. If the
symbol begins with a dot <SAMP>`.'</SAMP> then the statement is an assembler
directive: typically valid for any computer. If the symbol begins with
a letter the statement is an assembly language <EM>instruction</EM>: it
assembles into a machine language instruction.
Different versions of <CODE>as</CODE> for different computers
recognize different instructions. In fact, the same symbol may
represent a different instruction in a different computer's assembly
language.
</P>
<P>
<A NAME="IDX119"></A>
<A NAME="IDX120"></A>
A label is a symbol immediately followed by a colon (<CODE>:</CODE>).
Whitespace before a label or after a colon is permitted, but you may not
have whitespace between a label's symbol and its colon. See section <A HREF="as_5.html#SEC46">Labels</A>.
</P>
<P>
For HPPA targets, labels need not be immediately followed by a colon, but
the definition of a label must begin in column zero. This also implies that
only one label may be defined on each line.
</P>
<PRE>
label: .directive followed by something
another_label: # This is an empty statement.
instruction operand_1, operand_2, ...
</PRE>
<H2><A NAME="SEC31" HREF="as_toc.html#TOC31">Constants</A></H2>
<P>
<A NAME="IDX121"></A>
A constant is a number, written so that its value is known by
inspection, without knowing any context. Like this:
<PRE>
.byte 74, 0112, 092, 0x4A, 0X4a, 'J, '\J # All the same value.
.ascii "Ring the bell\7" # A string constant.
.octa 0x123456789abcdef0123456789ABCDEF0 # A bignum.
.float 0f-314159265358979323846264338327\
95028841971.693993751E-40 # - pi, a flonum.
</PRE>
<H3><A NAME="SEC32" HREF="as_toc.html#TOC32">Character Constants</A></H3>
<P>
<A NAME="IDX122"></A>
<A NAME="IDX123"></A>
There are two kinds of character constants. A <EM>character</EM> stands
for one character in one byte and its value may be used in
numeric expressions. String constants (properly called string
<EM>literals</EM>) are potentially many bytes and their values may not be
used in arithmetic expressions.
</P>
<H4><A NAME="SEC33" HREF="as_toc.html#TOC33">Strings</A></H4>
<P>
<A NAME="IDX124"></A>
<A NAME="IDX125"></A>
A <EM>string</EM> is written between double-quotes. It may contain
double-quotes or null characters. The way to get special characters
into a string is to <EM>escape</EM> these characters: precede them with
a backslash <SAMP>`\'</SAMP> character. For example <SAMP>`\\'</SAMP> represents
one backslash: the first <CODE>\</CODE> is an escape which tells
<CODE>as</CODE> to interpret the second character literally as a backslash
(which prevents <CODE>as</CODE> from recognizing the second <CODE>\</CODE> as an
escape character). The complete list of escapes follows.
</P>
<P>
<A NAME="IDX126"></A>
<A NAME="IDX127"></A>
<DL COMPACT>
<DT><KBD>\b</KBD>
<DD>
<A NAME="IDX128"></A>
<A NAME="IDX129"></A>
Mnemonic for backspace; for ASCII this is octal code 010.
<A NAME="IDX130"></A>
<A NAME="IDX131"></A>
<DT><KBD>\f</KBD>
<DD>
Mnemonic for FormFeed; for ASCII this is octal code 014.
<A NAME="IDX132"></A>
<A NAME="IDX133"></A>
<DT><KBD>\n</KBD>
<DD>
Mnemonic for newline; for ASCII this is octal code 012.
<A NAME="IDX134"></A>
<A NAME="IDX135"></A>
<DT><KBD>\r</KBD>
<DD>
Mnemonic for carriage-Return; for ASCII this is octal code 015.
<A NAME="IDX136"></A>
<A NAME="IDX137"></A>
<DT><KBD>\t</KBD>
<DD>
Mnemonic for horizontal Tab; for ASCII this is octal code 011.
<A NAME="IDX138"></A>
<A NAME="IDX139"></A>
<DT><KBD>\ <VAR>digit</VAR> <VAR>digit</VAR> <VAR>digit</VAR></KBD>
<DD>
An octal character code. The numeric code is 3 octal digits.
For compatibility with other Unix systems, 8 and 9 are accepted as digits:
for example, <CODE>\008</CODE> has the value 010, and <CODE>\009</CODE> the value 011.
<A NAME="IDX140"></A>
<A NAME="IDX141"></A>
<DT><KBD>\<CODE>x</CODE> <VAR>hex-digits...</VAR></KBD>
<DD>
A hex character code. All trailing hex digits are combined. Either upper or
lower case <CODE>x</CODE> works.
<A NAME="IDX142"></A>
<A NAME="IDX143"></A>
<DT><KBD>\\</KBD>
<DD>
Represents one <SAMP>`\'</SAMP> character.
<A NAME="IDX144"></A>
<A NAME="IDX145"></A>
<DT><KBD>\"</KBD>
<DD>
Represents one <SAMP>`"'</SAMP> character. Needed in strings to represent
this character, because an unescaped <SAMP>`"'</SAMP> would end the string.
<DT><KBD>\ <VAR>anything-else</VAR></KBD>
<DD>
Any other character when escaped by <KBD>\</KBD> gives a warning, but
assembles as if the <SAMP>`\'</SAMP> was not present. The idea is that if
you used an escape sequence you clearly didn't want the literal
interpretation of the following character. However <CODE>as</CODE> has no
other interpretation, so <CODE>as</CODE> knows it is giving you the wrong
code and warns you of the fact.
</DL>
<P>
Which characters are escapable, and what those escapes represent,
varies widely among assemblers. The current set is what we think
the BSD 4.2 assembler recognizes, and is a subset of what most C
compilers recognize. If you are in doubt, do not use an escape
sequence.
</P>
<H4><A NAME="SEC34" HREF="as_toc.html#TOC34">Characters</A></H4>
<P>
<A NAME="IDX146"></A>
<A NAME="IDX147"></A>
<A NAME="IDX148"></A>
A single character may be written as a single quote immediately
followed by that character. The same escapes apply to characters as
to strings. So if you want to write the character backslash, you
must write <KBD>'\\</KBD> where the first <CODE>\</CODE> escapes the second
<CODE>\</CODE>. As you can see, the quote is an acute accent, not a
grave accent. A newline
(or at sign <SAMP>`@'</SAMP>)
(or dollar sign <SAMP>`$'</SAMP>, for the H8/300; or semicolon <SAMP>`;'</SAMP> for the
Hitachi SH or
H8/500)
immediately following an acute accent is taken as a literal character
and does not count as the end of a statement. The value of a character
constant in a numeric expression is the machine's byte-wide code for
that character. <CODE>as</CODE> assumes your character code is ASCII:
<KBD>'A</KBD> means 65, <KBD>'B</KBD> means 66, and so on.
</P>
<H3><A NAME="SEC35" HREF="as_toc.html#TOC35">Number Constants</A></H3>
<P>
<A NAME="IDX149"></A>
<A NAME="IDX150"></A>
<CODE>as</CODE> distinguishes three kinds of numbers according to how they
are stored in the target machine. <EM>Integers</EM> are numbers that
would fit into an <CODE>int</CODE> in the C language. <EM>Bignums</EM> are
integers, but they are stored in more than 32 bits. <EM>Flonums</EM>
are floating point numbers, described below.
</P>
<H4><A NAME="SEC36" HREF="as_toc.html#TOC36">Integers</A></H4>
<P>
<A NAME="IDX151"></A>
<A NAME="IDX152"></A>
</P>
<P>
<A NAME="IDX153"></A>
<A NAME="IDX154"></A>
A binary integer is <SAMP>`0b'</SAMP> or <SAMP>`0B'</SAMP> followed by zero or more of
the binary digits <SAMP>`01'</SAMP>.
</P>
<P>
<A NAME="IDX155"></A>
<A NAME="IDX156"></A>
An octal integer is <SAMP>`0'</SAMP> followed by zero or more of the octal
digits (<SAMP>`01234567'</SAMP>).
</P>
<P>
<A NAME="IDX157"></A>
<A NAME="IDX158"></A>
A decimal integer starts with a non-zero digit followed by zero or
more digits (<SAMP>`0123456789'</SAMP>).
</P>
<P>
<A NAME="IDX159"></A>
<A NAME="IDX160"></A>
A hexadecimal integer is <SAMP>`0x'</SAMP> or <SAMP>`0X'</SAMP> followed by one or
more hexadecimal digits chosen from <SAMP>`0123456789abcdefABCDEF'</SAMP>.
</P>
<P>
Integers have the usual values. To denote a negative integer, use
the prefix operator <SAMP>`-'</SAMP> discussed under expressions
(see section <A HREF="as_6.html#SEC65">Prefix Operator</A>).
</P>
<H4><A NAME="SEC37" HREF="as_toc.html#TOC37">Bignums</A></H4>
<P>
<A NAME="IDX161"></A>
<A NAME="IDX162"></A>
A <EM>bignum</EM> has the same syntax and semantics as an integer
except that the number (or its negative) takes more than 32 bits to
represent in binary. The distinction is made because in some places
integers are permitted while bignums are not.
</P>
<H4><A NAME="SEC38" HREF="as_toc.html#TOC38">Flonums</A></H4>
<P>
<A NAME="IDX163"></A>
<A NAME="IDX164"></A>
<A NAME="IDX165"></A>
</P>
<P>
<A NAME="IDX166"></A>
A <EM>flonum</EM> represents a floating point number. The translation is
indirect: a decimal floating point number from the text is converted by
<CODE>as</CODE> to a generic binary floating point number of more than
sufficient precision. This generic floating point number is converted
to a particular computer's floating point format (or formats) by a
portion of <CODE>as</CODE> specialized to that computer.
</P>
<P>
A flonum is written by writing (in order)
<UL>
<LI>
The digit <SAMP>`0'</SAMP>.
(<SAMP>`0'</SAMP> is optional on the HPPA.)
<LI>
A letter, to tell <CODE>as</CODE> the rest of the number is a flonum.
<KBD>e</KBD> is recommended. Case is not important.
On the H8/300, H8/500,
Hitachi SH,
and AMD 29K architectures, the letter must be
one of the letters <SAMP>`DFPRSX'</SAMP> (in upper or lower case).
On the ARC, the letter must be one of the letters <SAMP>`DFRS'</SAMP>
(in upper or lower case).
On the Intel 960 architecture, the letter must be
one of the letters <SAMP>`DFT'</SAMP> (in upper or lower case).
On the HPPA architecture, the letter must be <SAMP>`E'</SAMP> (upper case only).
<LI>
An optional sign: either <SAMP>`+'</SAMP> or <SAMP>`-'</SAMP>.
<LI>
An optional <EM>integer part</EM>: zero or more decimal digits.
<LI>
An optional <EM>fractional part</EM>: <SAMP>`.'</SAMP> followed by zero
or more decimal digits.
<LI>
An optional exponent, consisting of:
<UL>
<LI>
An <SAMP>`E'</SAMP> or <SAMP>`e'</SAMP>.
<LI>
Optional sign: either <SAMP>`+'</SAMP> or <SAMP>`-'</SAMP>.
<LI>
One or more decimal digits.
</UL>
</UL>
<P>
At least one of the integer part or the fractional part must be
present. The floating point number has the usual base-10 value.
</P>
<P>
<CODE>as</CODE> does all processing using integers. Flonums are computed
independently of any floating point hardware in the computer running
<CODE>as</CODE>.
</P>
<P><HR><P>
Go to the <A HREF="as_1.html">first</A>, <A HREF="as_2.html">previous</A>, <A HREF="as_4.html">next</A>, <A HREF="as_27.html">last</A> section, <A HREF="as_toc.html">table of contents</A>.
</BODY>
</HTML>