<html><head><title>NASM Manual</title></head>
|
|
<body><h1 align=center>The Netwide Assembler: NASM</h1>
|
|
|
|
<p align=center><a href="nasmdoca.html">Previous Chapter</a> |
|
|
<a href="nasmdoc0.html">Contents</a> |
|
|
<a href="nasmdoci.html">Index</a>
|
|
<h2><a name="appendix-B">Appendix B: x86 Instruction Reference</a></h2>
|
|
<p>This appendix provides a complete list of the machine instructions which
|
|
NASM will assemble, and a short description of the function of each one.
|
|
<p>It is not intended to be exhaustive documentation on the fine details of
|
|
the instructions' function, such as which exceptions they can trigger: for
|
|
such documentation, you should go to Intel's Web site,
|
|
<a href="http://developer.intel.com/design/Pentium4/manuals/"><code><nobr>http://developer.intel.com/design/Pentium4/manuals/</nobr></code></a>.
|
|
<p>Instead, this appendix is intended primarily to provide documentation on
|
|
the way the instructions may be used within NASM. For example, looking up
|
|
<code><nobr>LOOP</nobr></code> will tell you that NASM allows
|
|
<code><nobr>CX</nobr></code> or <code><nobr>ECX</nobr></code> to be
|
|
specified as an optional second argument to the
|
|
<code><nobr>LOOP</nobr></code> instruction, to enforce which of the two
|
|
possible counter registers should be used if the default is not the one
|
|
desired.
|
|
<p>The instructions are not quite listed in alphabetical order, since
|
|
groups of instructions with similar functions are lumped together in the
|
|
same entry. Most of them don't move very far from their alphabetic position
|
|
because of this.
|
|
<h3><a name="section-B.1">B.1 Key to Operand Specifications</a></h3>
|
|
<p>The instruction descriptions in this appendix specify their operands
|
|
using the following notation:
|
|
<ul>
|
|
<li>Registers: <code><nobr>reg8</nobr></code> denotes an 8-bit general
|
|
purpose register, <code><nobr>reg16</nobr></code> denotes a 16-bit general
|
|
purpose register, and <code><nobr>reg32</nobr></code> a 32-bit one.
|
|
<code><nobr>fpureg</nobr></code> denotes one of the eight FPU stack
|
|
registers, <code><nobr>mmxreg</nobr></code> denotes one of the eight 64-bit
|
|
MMX registers, and <code><nobr>segreg</nobr></code> denotes a segment
|
|
register. In addition, some registers (such as
|
|
<code><nobr>AL</nobr></code>, <code><nobr>DX</nobr></code> or
|
|
<code><nobr>ECX</nobr></code>) may be specified explicitly.
|
|
<li>Immediate operands: <code><nobr>imm</nobr></code> denotes a generic
|
|
immediate operand. <code><nobr>imm8</nobr></code>,
|
|
<code><nobr>imm16</nobr></code> and <code><nobr>imm32</nobr></code> are
|
|
used when the operand is intended to be a specific size. For some of these
|
|
instructions, NASM needs an explicit specifier: for example,
|
|
<code><nobr>ADD ESP,16</nobr></code> could be interpreted as either
|
|
<code><nobr>ADD r/m32,imm32</nobr></code> or
|
|
<code><nobr>ADD r/m32,imm8</nobr></code>. NASM chooses the former by
|
|
default, and so you must specify <code><nobr>ADD ESP,BYTE 16</nobr></code>
|
|
for the latter.
|
|
<li>Memory references: <code><nobr>mem</nobr></code> denotes a generic
|
|
memory reference; <code><nobr>mem8</nobr></code>,
|
|
<code><nobr>mem16</nobr></code>, <code><nobr>mem32</nobr></code>,
|
|
<code><nobr>mem64</nobr></code> and <code><nobr>mem80</nobr></code> are
|
|
used when the operand needs to be a specific size. Again, a specifier is
|
|
needed in some cases: <code><nobr>DEC [address]</nobr></code> is ambiguous
|
|
and will be rejected by NASM. You must specify
|
|
<code><nobr>DEC BYTE [address]</nobr></code>,
|
|
<code><nobr>DEC WORD [address]</nobr></code> or
|
|
<code><nobr>DEC DWORD [address]</nobr></code> instead.
|
|
<li>Restricted memory references: one form of the
|
|
<code><nobr>MOV</nobr></code> instruction allows a memory address to be
|
|
specified <em>without</em> allowing the normal range of register
|
|
combinations and effective address processing. This is denoted by
|
|
<code><nobr>memoffs8</nobr></code>, <code><nobr>memoffs16</nobr></code> and
|
|
<code><nobr>memoffs32</nobr></code>.
|
|
<li>Register or memory choices: many instructions can accept either a
|
|
register <em>or</em> a memory reference as an operand.
|
|
<code><nobr>r/m8</nobr></code> is a shorthand for
|
|
<code><nobr>reg8/mem8</nobr></code>; similarly
|
|
<code><nobr>r/m16</nobr></code> and <code><nobr>r/m32</nobr></code>.
|
|
<code><nobr>r/m64</nobr></code> is MMX-related, and is a shorthand for
|
|
<code><nobr>mmxreg/mem64</nobr></code>.
|
|
</ul>
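<p>As a rough illustration of the <code><nobr>ADD ESP,16</nobr></code>
example above, the following Python sketch builds both candidate encodings by
hand. It assumes <code><nobr>BITS 32</nobr></code> (so no operand-size prefix
is needed) and uses the opcode forms <code><nobr>81 /0 id</nobr></code> and
<code><nobr>83 /0 ib</nobr></code> given for <code><nobr>ADD</nobr></code> in
this appendix. The function names are invented for this sketch and are not
part of NASM.

```python
import struct

ESP = 4  # register value of ESP (see section B.2.1)

def modrm(mod, spare, rm):
    """Pack the three ModR/M fields into one byte."""
    return (mod << 6) | (spare << 3) | rm

def add_r32_imm32(reg, value):
    # ADD r/m32,imm32 is 81 /0 id: opcode, ModR/M with spare=0, 4-byte immediate
    return bytes([0x81, modrm(3, 0, reg)]) + struct.pack("<I", value & 0xFFFFFFFF)

def add_r32_imm8(reg, value):
    # ADD r/m32,imm8 is 83 /0 ib: opcode, ModR/M with spare=0, 1-byte signed immediate
    return bytes([0x83, modrm(3, 0, reg)]) + struct.pack("<b", value)

print(add_r32_imm32(ESP, 16).hex())  # the form described for ADD ESP,16
print(add_r32_imm8(ESP, 16).hex())   # the form described for ADD ESP,BYTE 16
```

<p>The two results differ by three bytes, which is why the shorter
<code><nobr>imm8</nobr></code> form is worth requesting when the value fits
in a signed byte.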
|
|
<h3><a name="section-B.2">B.2 Key to Opcode Descriptions</a></h3>
|
|
<p>This appendix also provides the opcodes which NASM will generate for
|
|
each form of each instruction. The opcodes are listed in the following way:
|
|
<ul>
|
|
<li>A hex number, such as <code><nobr>3F</nobr></code>, indicates a fixed
|
|
byte containing that number.
|
|
<li>A hex number followed by <code><nobr>+r</nobr></code>, such as
|
|
<code><nobr>C8+r</nobr></code>, indicates that one of the operands to the
|
|
instruction is a register, and the `register value' of that register should
|
|
be added to the hex number to produce the generated byte. For example, EDX
|
|
has register value 2, so the code <code><nobr>C8+r</nobr></code>, when the
|
|
register operand is EDX, generates the hex byte
|
|
<code><nobr>CA</nobr></code>. Register values for specific registers are
|
|
given in <a href="#section-B.2.1">section B.2.1</a>.
|
|
<li>A hex number followed by <code><nobr>+cc</nobr></code>, such as
|
|
<code><nobr>40+cc</nobr></code>, indicates that the instruction name has a
|
|
condition code suffix, and the numeric representation of the condition code
|
|
should be added to the hex number to produce the generated byte. For
|
|
example, the code <code><nobr>40+cc</nobr></code>, when the instruction
|
|
contains the <code><nobr>NE</nobr></code> condition, generates the hex byte
|
|
<code><nobr>45</nobr></code>. Condition codes and their numeric
|
|
representations are given in <a href="#section-B.2.2">section B.2.2</a>.
|
|
<li>A slash followed by a digit, such as <code><nobr>/2</nobr></code>,
|
|
indicates that one of the operands to the instruction is a memory address
|
|
or register (denoted <code><nobr>mem</nobr></code> or
|
|
<code><nobr>r/m</nobr></code>, with an optional size). This is to be
|
|
encoded as an effective address, with a ModR/M byte, an optional SIB byte,
|
|
and an optional displacement, and the spare (register) field of the ModR/M
|
|
byte should be the digit given (which will be from 0 to 7, so it fits in
|
|
three bits). The encoding of effective addresses is given in
|
|
<a href="#section-B.2.5">section B.2.5</a>.
|
|
<li>The code <code><nobr>/r</nobr></code> combines the above two: it
|
|
indicates that one of the operands is a memory address or
|
|
<code><nobr>r/m</nobr></code>, and another is a register, and that an
|
|
effective address should be generated with the spare (register) field in
|
|
the ModR/M byte being equal to the `register value' of the register
|
|
operand. The encoding of effective addresses is given in
|
|
<a href="#section-B.2.5">section B.2.5</a>; register values are given in
|
|
<a href="#section-B.2.1">section B.2.1</a>.
|
|
<li>The codes <code><nobr>ib</nobr></code>, <code><nobr>iw</nobr></code>
|
|
and <code><nobr>id</nobr></code> indicate that one of the operands to the
|
|
instruction is an immediate value, and that this is to be encoded as a
|
|
byte, little-endian word or little-endian doubleword respectively.
|
|
<li>The codes <code><nobr>rb</nobr></code>, <code><nobr>rw</nobr></code>
|
|
and <code><nobr>rd</nobr></code> indicate that one of the operands to the
|
|
instruction is an immediate value, and that the <em>difference</em> between
|
|
this value and the address of the end of the instruction is to be encoded
|
|
as a byte, word or doubleword respectively. Where the form
|
|
<code><nobr>rw/rd</nobr></code> appears, it indicates that either
|
|
<code><nobr>rw</nobr></code> or <code><nobr>rd</nobr></code> should be used
|
|
according to whether assembly is being performed in
|
|
<code><nobr>BITS 16</nobr></code> or <code><nobr>BITS 32</nobr></code>
|
|
state respectively.
|
|
<li>The codes <code><nobr>ow</nobr></code> and <code><nobr>od</nobr></code>
|
|
indicate that one of the operands to the instruction is a reference to the
|
|
contents of a memory address specified as an immediate value: this encoding
|
|
is used in some forms of the <code><nobr>MOV</nobr></code> instruction in
|
|
place of the standard effective-address mechanism. The displacement is
|
|
encoded as a word or doubleword. Again, <code><nobr>ow/od</nobr></code>
|
|
denotes that <code><nobr>ow</nobr></code> or <code><nobr>od</nobr></code>
|
|
should be chosen according to the <code><nobr>BITS</nobr></code> setting.
|
|
<li>The codes <code><nobr>o16</nobr></code> and
|
|
<code><nobr>o32</nobr></code> indicate that the given form of the
|
|
instruction should be assembled with operand size 16 or 32 bits. In other
|
|
words, <code><nobr>o16</nobr></code> indicates a
|
|
<code><nobr>66</nobr></code> prefix in <code><nobr>BITS 32</nobr></code>
|
|
state, but generates no code in <code><nobr>BITS 16</nobr></code> state;
|
|
and <code><nobr>o32</nobr></code> indicates a <code><nobr>66</nobr></code>
|
|
prefix in <code><nobr>BITS 16</nobr></code> state but generates nothing in
|
|
<code><nobr>BITS 32</nobr></code>.
|
|
<li>The codes <code><nobr>a16</nobr></code> and
|
|
<code><nobr>a32</nobr></code>, similarly to <code><nobr>o16</nobr></code>
|
|
and <code><nobr>o32</nobr></code>, indicate the address size of the given
|
|
form of the instruction. Where this does not match the
|
|
<code><nobr>BITS</nobr></code> setting, a <code><nobr>67</nobr></code>
|
|
prefix is required.
|
|
</ul>
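<p>Several of the codes above are simple arithmetic on bytes, and a short
Python sketch may make them more concrete. The helper names below are
hypothetical (they do not correspond to anything inside NASM), and the sketch
covers only the <code><nobr>+r</nobr></code>, <code><nobr>+cc</nobr></code>,
<code><nobr>o16</nobr></code>/<code><nobr>o32</nobr></code> and
<code><nobr>rb</nobr></code>/<code><nobr>rw</nobr></code>/<code><nobr>rd</nobr></code>
codes.

```python
import struct

def plus_r(base, reg_value):
    """C8+r style codes: add the register value (0..7) to the base byte."""
    return base + reg_value

def plus_cc(base, cc_value):
    """40+cc style codes: add the condition code number (0..15) to the base byte."""
    return base + cc_value

def operand_size_prefix(operand_bits, bits_setting):
    """o16/o32: a 66 prefix is needed only when the operand size differs from BITS."""
    return b"" if operand_bits == bits_setting else b"\x66"

def rel(target, end_of_instruction, fmt):
    """rb/rw/rd: encode the target minus the address just past the instruction.

    fmt is a struct format: "<b" for rb, "<h" for rw, "<i" for rd.
    """
    return struct.pack(fmt, target - end_of_instruction)

print(hex(plus_r(0xC8, 2)))   # EDX has register value 2
print(hex(plus_cc(0x40, 5)))  # NE has condition code 5
```

<p>The two printed values correspond to the <code><nobr>CA</nobr></code> and
<code><nobr>45</nobr></code> bytes worked out in the list above.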
|
|
<h4><a name="section-B.2.1">B.2.1 Register Values</a></h4>
|
|
<p>Where an instruction requires a register value, it is already implicit
|
|
in the encoding of the rest of the instruction what type of register is
|
|
intended: an 8-bit general-purpose register, a segment register, a debug
|
|
register, an MMX register, or whatever. Therefore there is no problem with
|
|
registers of different types sharing an encoding value.
|
|
<p>The encodings for the various classes of register are:
|
|
<ul>
|
|
<li>8-bit general registers: <code><nobr>AL</nobr></code> is 0,
|
|
<code><nobr>CL</nobr></code> is 1, <code><nobr>DL</nobr></code> is 2,
|
|
<code><nobr>BL</nobr></code> is 3, <code><nobr>AH</nobr></code> is 4,
|
|
<code><nobr>CH</nobr></code> is 5, <code><nobr>DH</nobr></code> is 6, and
|
|
<code><nobr>BH</nobr></code> is 7.
|
|
<li>16-bit general registers: <code><nobr>AX</nobr></code> is 0,
|
|
<code><nobr>CX</nobr></code> is 1, <code><nobr>DX</nobr></code> is 2,
|
|
<code><nobr>BX</nobr></code> is 3, <code><nobr>SP</nobr></code> is 4,
|
|
<code><nobr>BP</nobr></code> is 5, <code><nobr>SI</nobr></code> is 6, and
|
|
<code><nobr>DI</nobr></code> is 7.
|
|
<li>32-bit general registers: <code><nobr>EAX</nobr></code> is 0,
|
|
<code><nobr>ECX</nobr></code> is 1, <code><nobr>EDX</nobr></code> is 2,
|
|
<code><nobr>EBX</nobr></code> is 3, <code><nobr>ESP</nobr></code> is 4,
|
|
<code><nobr>EBP</nobr></code> is 5, <code><nobr>ESI</nobr></code> is 6, and
|
|
<code><nobr>EDI</nobr></code> is 7.
|
|
<li>Segment registers: <code><nobr>ES</nobr></code> is 0,
|
|
<code><nobr>CS</nobr></code> is 1, <code><nobr>SS</nobr></code> is 2,
|
|
<code><nobr>DS</nobr></code> is 3, <code><nobr>FS</nobr></code> is 4, and
|
|
<code><nobr>GS</nobr></code> is 5.
|
|
<li>Floating-point registers: <code><nobr>ST0</nobr></code> is 0,
|
|
<code><nobr>ST1</nobr></code> is 1, <code><nobr>ST2</nobr></code> is 2,
|
|
<code><nobr>ST3</nobr></code> is 3, <code><nobr>ST4</nobr></code> is 4,
|
|
<code><nobr>ST5</nobr></code> is 5, <code><nobr>ST6</nobr></code> is 6, and
|
|
<code><nobr>ST7</nobr></code> is 7.
|
|
<li>64-bit MMX registers: <code><nobr>MM0</nobr></code> is 0,
|
|
<code><nobr>MM1</nobr></code> is 1, <code><nobr>MM2</nobr></code> is 2,
|
|
<code><nobr>MM3</nobr></code> is 3, <code><nobr>MM4</nobr></code> is 4,
|
|
<code><nobr>MM5</nobr></code> is 5, <code><nobr>MM6</nobr></code> is 6, and
|
|
<code><nobr>MM7</nobr></code> is 7.
|
|
<li>Control registers: <code><nobr>CR0</nobr></code> is 0,
|
|
<code><nobr>CR2</nobr></code> is 2, <code><nobr>CR3</nobr></code> is 3, and
|
|
<code><nobr>CR4</nobr></code> is 4.
|
|
<li>Debug registers: <code><nobr>DR0</nobr></code> is 0,
|
|
<code><nobr>DR1</nobr></code> is 1, <code><nobr>DR2</nobr></code> is 2,
|
|
<code><nobr>DR3</nobr></code> is 3, <code><nobr>DR6</nobr></code> is 6, and
|
|
<code><nobr>DR7</nobr></code> is 7.
|
|
<li>Test registers: <code><nobr>TR3</nobr></code> is 3,
|
|
<code><nobr>TR4</nobr></code> is 4, <code><nobr>TR5</nobr></code> is 5,
|
|
<code><nobr>TR6</nobr></code> is 6, and <code><nobr>TR7</nobr></code> is 7.
|
|
</ul>
|
|
<p>(Note that wherever a register name contains a number, that number is
|
|
also the register value for that register.)
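<p>The encodings above can be collected into a small lookup, sketched here in
Python. The table layout and the function name
<code><nobr>register_value</nobr></code> are assumptions made for this
illustration. Numbered registers are handled by reading the number out of the
name, as the note above suggests; the sketch does not check that a numbered
register actually exists (it would accept a name such as
<code><nobr>CR1</nobr></code>).

```python
# Named registers: the position in each list is the register value.
NAMED_REGISTERS = {
    "reg8":   ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"],
    "reg16":  ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"],
    "reg32":  ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"],
    "segreg": ["ES", "CS", "SS", "DS", "FS", "GS"],
}

NUMBERED_PREFIXES = ("ST", "MM", "CR", "DR", "TR")

def register_value(name):
    """Return the encoding value of a register given by name."""
    name = name.upper()
    for names in NAMED_REGISTERS.values():
        if name in names:
            return names.index(name)
    # Numbered registers: the number in the name is the register value.
    if name[:2] in NUMBERED_PREFIXES and name[2:].isdigit():
        return int(name[2:])
    raise ValueError("unknown register: " + name)

print(register_value("EDX"), register_value("DL"), register_value("MM2"))
```

<p>All three lookups give the same value, which illustrates why the register
class has to be implied by the rest of the instruction encoding.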
|
|
<h4><a name="section-B.2.2">B.2.2 Condition Codes</a></h4>
|
|
<p>The available condition codes are given here, along with their numeric
|
|
representations as part of opcodes. Many of these condition codes have
|
|
synonyms, so several will be listed at a time.
|
|
<p>In the following descriptions, the word `either', when applied to two
|
|
possible trigger conditions, is used to mean `either or both'. If `either
|
|
but not both' is meant, the phrase `exactly one of' is used.
|
|
<ul>
|
|
<li><code><nobr>O</nobr></code> is 0 (trigger if the overflow flag is set);
|
|
<code><nobr>NO</nobr></code> is 1.
|
|
<li><code><nobr>B</nobr></code>, <code><nobr>C</nobr></code> and
|
|
<code><nobr>NAE</nobr></code> are 2 (trigger if the carry flag is set);
|
|
<code><nobr>AE</nobr></code>, <code><nobr>NB</nobr></code> and
|
|
<code><nobr>NC</nobr></code> are 3.
|
|
<li><code><nobr>E</nobr></code> and <code><nobr>Z</nobr></code> are 4
|
|
(trigger if the zero flag is set); <code><nobr>NE</nobr></code> and
|
|
<code><nobr>NZ</nobr></code> are 5.
|
|
<li><code><nobr>BE</nobr></code> and <code><nobr>NA</nobr></code> are 6
|
|
(trigger if either of the carry or zero flags is set);
|
|
<code><nobr>A</nobr></code> and <code><nobr>NBE</nobr></code> are 7.
|
|
<li><code><nobr>S</nobr></code> is 8 (trigger if the sign flag is set);
|
|
<code><nobr>NS</nobr></code> is 9.
|
|
<li><code><nobr>P</nobr></code> and <code><nobr>PE</nobr></code> are 10
|
|
(trigger if the parity flag is set); <code><nobr>NP</nobr></code> and
|
|
<code><nobr>PO</nobr></code> are 11.
|
|
<li><code><nobr>L</nobr></code> and <code><nobr>NGE</nobr></code> are 12
|
|
(trigger if exactly one of the sign and overflow flags is set);
|
|
<code><nobr>GE</nobr></code> and <code><nobr>NL</nobr></code> are 13.
|
|
<li><code><nobr>LE</nobr></code> and <code><nobr>NG</nobr></code> are 14
|
|
(trigger if either the zero flag is set, or exactly one of the sign and
|
|
overflow flags is set); <code><nobr>G</nobr></code> and
|
|
<code><nobr>NLE</nobr></code> are 15.
|
|
</ul>
|
|
<p>Note that in all cases, the sense of a condition code may be reversed by
|
|
changing the low bit of the numeric representation.
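<p>The list above, together with the low-bit rule, can be expressed as a
short Python sketch. The names <code><nobr>CONDITION_CODES</nobr></code>,
<code><nobr>invert</nobr></code> and
<code><nobr>condition_holds</nobr></code> are made up for this illustration;
the flag tests are taken directly from the descriptions in the list.

```python
CONDITION_CODES = {
    "O": 0, "NO": 1,
    "B": 2, "C": 2, "NAE": 2, "AE": 3, "NB": 3, "NC": 3,
    "E": 4, "Z": 4, "NE": 5, "NZ": 5,
    "BE": 6, "NA": 6, "A": 7, "NBE": 7,
    "S": 8, "NS": 9,
    "P": 10, "PE": 10, "NP": 11, "PO": 11,
    "L": 12, "NGE": 12, "GE": 13, "NL": 13,
    "LE": 14, "NG": 14, "G": 15, "NLE": 15,
}

def invert(cc):
    """Reverse the sense of a condition by flipping the low bit."""
    return cc ^ 1

def condition_holds(cc, cf=False, zf=False, sf=False, of=False, pf=False):
    """Evaluate condition code cc (0..15) against the given flag values."""
    even_cases = [
        of,                # 0:  O
        cf,                # 2:  B
        zf,                # 4:  E
        cf or zf,          # 6:  BE
        sf,                # 8:  S
        pf,                # 10: P
        sf != of,          # 12: L  (exactly one of SF and OF)
        zf or (sf != of),  # 14: LE
    ]
    result = bool(even_cases[cc >> 1])
    # Odd codes are the negation of the even code just below them.
    return result != bool(cc & 1)

print(invert(CONDITION_CODES["NE"]) == CONDITION_CODES["E"])
print(condition_holds(CONDITION_CODES["L"], sf=True, of=False))
```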
|
|
<p>For details of when an instruction sets each of the status flags, see
|
|
the individual instruction, plus the Status Flags reference in
|
|
<a href="#section-B.2.4">section B.2.4</a>.
|
|
<h4><a name="section-B.2.3">B.2.3 SSE Condition Predicates</a></h4>
|
|
<p>The condition predicates for SSE comparison instructions are the codes
|
|
used as part of the opcode, to determine what form of comparison is being
|
|
carried out. In each case, the imm8 value is the final byte of the opcode
|
|
encoding, and the predicate is the code used as part of the mnemonic for
|
|
the instruction (equivalent to the "cc" in an integer instruction that used
|
|
a condition code). The instructions that use these predicates give details
of the various mnemonics; this table is intended to help you work out the
details of what is happening.
|
|
<p><pre>
Predi-  imm8    Description     Relation where:    Emula-     Result    QNaN
cate    Encod-                  A Is 1st Operand   tion       if NaN    Signal
        ing                     B Is 2nd Operand              Operand   Invalid

EQ      000B    equal           A = B                         False     No

LT      001B    less-than       A < B                         False     Yes

LE      010B    less-than-      A <= B                        False     Yes
                or-equal

---     ----    greater         A > B              Swap       False     Yes
                than                               Operands,
                                                   Use LT

---     ----    greater-        A >= B             Swap       False     Yes
                than-or-equal                      Operands,
                                                   Use LE

UNORD   011B    unordered       A, B = Unordered              True      No

NEQ     100B    not-equal       A != B                        True      No

NLT     101B    not-less-       NOT(A < B)                    True      Yes
                than

NLE     110B    not-less-       NOT(A <= B)                   True      Yes
                than-or-
                equal

---     ----    not-greater     NOT(A > B)         Swap       True      Yes
                than                               Operands,
                                                   Use NLT

---     ----    not-greater     NOT(A >= B)        Swap       True      Yes
                than-                              Operands,
                or-equal                           Use NLE

ORD     111B    ordered         A, B = Ordered                False     No
</pre>
|
|
<p>The unordered relationship is true when at least one of the two values
|
|
being compared is a NaN or in an unsupported format.
|
|
<p>Note that the comparisons which are listed as not having a predicate or
|
|
encoding can only be achieved through software emulation, as described in
|
|
the "emulation" column. Note in particular that a comparison such as
|
|
<code><nobr>greater-than</nobr></code> is not the same as
|
|
<code><nobr>NLE</nobr></code>, as, unlike with the
|
|
<code><nobr>CMP</nobr></code> instruction, it has to take into account the
|
|
possibility of one operand containing a NaN or an unsupported numeric
|
|
format.
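<p>The difference between an emulated comparison and the negated predicate
can be seen in a small Python model of the table. This is only a sketch: it
models each predicate as a true/false result on two Python floats, treats NaN
as the only unordered case, and ignores both unsupported formats and the
signalling of invalid-operation exceptions. The names
<code><nobr>PREDICATES</nobr></code>, <code><nobr>compare</nobr></code> and
<code><nobr>greater_than</nobr></code> are invented for the example.

```python
import math

def unordered(a, b):
    """True when at least one operand is a NaN."""
    return math.isnan(a) or math.isnan(b)

# imm8 encoding -> (predicate name, comparison on two Python floats)
PREDICATES = {
    0b000: ("EQ",    lambda a, b: not unordered(a, b) and a == b),
    0b001: ("LT",    lambda a, b: not unordered(a, b) and a < b),
    0b010: ("LE",    lambda a, b: not unordered(a, b) and a <= b),
    0b011: ("UNORD", unordered),
    0b100: ("NEQ",   lambda a, b: unordered(a, b) or a != b),
    0b101: ("NLT",   lambda a, b: unordered(a, b) or not a < b),
    0b110: ("NLE",   lambda a, b: unordered(a, b) or not a <= b),
    0b111: ("ORD",   lambda a, b: not unordered(a, b)),
}

def compare(imm8, a, b):
    """Evaluate the predicate with the given imm8 encoding on A=a, B=b."""
    return PREDICATES[imm8][1](a, b)

def greater_than(a, b):
    # No predicate of its own: swap the operands and use LT.
    return compare(0b001, b, a)

nan = float("nan")
print(greater_than(nan, 1.0), compare(0b110, nan, 1.0))
```

<p>With a NaN operand the emulated greater-than is false while
<code><nobr>NLE</nobr></code> is true, so the script prints
<code><nobr>False True</nobr></code>.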
|
|
<h4><a name="section-B.2.4">B.2.4 Status Flags</a></h4>
|
|
<p>The status flags provide some information about the result of the
|
|
arithmetic instructions. This information can be used by conditional
|
|
instructions (such as <code><nobr>Jcc</nobr></code> and
|
|
<code><nobr>CMOVcc</nobr></code>) as well as by some of the other
|
|
instructions (such as <code><nobr>ADC</nobr></code> and
|
|
<code><nobr>INTO</nobr></code>).
|
|
<p>There are 6 status flags:
|
|
<p><pre>
|
|
CF - Carry flag.
|
|
</pre>
|
|
<p>Set if an arithmetic operation generates a carry or a borrow out of the
|
|
most-significant bit of the result; cleared otherwise. This flag indicates
|
|
an overflow condition for unsigned-integer arithmetic. It is also used in
|
|
multiple-precision arithmetic.
|
|
<p><pre>
|
|
PF - Parity flag.
|
|
</pre>
|
|
<p>Set if the least-significant byte of the result contains an even number
|
|
of 1 bits; cleared otherwise.
|
|
<p><pre>
|
|
AF - Adjust flag.
|
|
</pre>
|
|
<p>Set if an arithmetic operation generates a carry or a borrow out of bit
|
|
3 of the result; cleared otherwise. This flag is used in binary-coded
|
|
decimal (BCD) arithmetic.
|
|
<p><pre>
|
|
ZF - Zero flag.
|
|
</pre>
|
|
<p>Set if the result is zero; cleared otherwise.
|
|
<p><pre>
|
|
SF - Sign flag.
|
|
</pre>
|
|
<p>Set equal to the most-significant bit of the result, which is the sign
|
|
bit of a signed integer. (0 indicates a positive value and 1 indicates a
|
|
negative value.)
|
|
<p><pre>
|
|
OF - Overflow flag.
|
|
</pre>
|
|
<p>Set if the integer result is too large a positive number or too small a
|
|
negative number (excluding the sign-bit) to fit in the destination operand;
|
|
cleared otherwise. This flag indicates an overflow condition for
|
|
signed-integer (two's complement) arithmetic.
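<p>To make the six definitions above concrete, here is a Python sketch that
computes the flags for an 8-bit addition. The function name
<code><nobr>add8_flags</nobr></code> and the dictionary it returns are
assumptions of this example, and the sketch covers addition only, not the
other arithmetic instructions.

```python
def add8_flags(a, b, carry_in=0):
    """Add two 8-bit values and return (result, flags) for the addition."""
    a &= 0xFF
    b &= 0xFF
    total = a + b + carry_in
    result = total & 0xFF
    flags = {
        # carry out of the most-significant bit
        "CF": total > 0xFF,
        # even number of 1 bits in the least-significant byte
        "PF": bin(result).count("1") % 2 == 0,
        # carry out of bit 3
        "AF": (a & 0x0F) + (b & 0x0F) + carry_in > 0x0F,
        "ZF": result == 0,
        # copy of the most-significant bit of the result
        "SF": bool(result & 0x80),
        # both operands have the same sign and the result's sign differs
        "OF": bool((a ^ result) & (b ^ result) & 0x80),
    }
    return result, flags

result, flags = add8_flags(0x7F, 0x01)
print(hex(result), flags)
```

<p>Adding 1 to <code><nobr>7F</nobr></code> hex gives
<code><nobr>80</nobr></code> hex: there is no unsigned carry, but the signed
result has overflowed, so OF and SF are set while CF is clear.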
|
|
<h4><a name="section-B.2.5">B.2.5 Effective Address Encoding: ModR/M and SIB</a></h4>
|
|
<p>An effective address is encoded in up to three parts: a ModR/M byte, an
|
|
optional SIB byte, and an optional byte, word or doubleword displacement
|
|
field.
|
|
<p>The ModR/M byte consists of three fields: the
|
|
<code><nobr>mod</nobr></code> field, ranging from 0 to 3, in the upper two
|
|
bits of the byte, the <code><nobr>r/m</nobr></code> field, ranging from 0
|
|
to 7, in the lower three bits, and the spare (register) field in the middle
|
|
(bit 3 to bit 5). The spare field is not relevant to the effective address
|
|
being encoded, and either contains an extension to the instruction opcode
|
|
or the register value of another operand.
|
|
<p>The ModR/M system can be used to encode a direct register reference
|
|
rather than a memory access. This is always done by setting the
|
|
<code><nobr>mod</nobr></code> field to 3 and the
|
|
<code><nobr>r/m</nobr></code> field to the register value of the register
|
|
in question (it must be a general-purpose register, and the size of the
|
|
register must already be implicit in the encoding of the rest of the
|
|
instruction). In this case, the SIB byte and displacement field are both
|
|
absent.
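<p>The layout of the ModR/M byte described above can be sketched as a pair of
Python helpers. The names <code><nobr>pack_modrm</nobr></code> and
<code><nobr>unpack_modrm</nobr></code> are hypothetical. The example packs a
direct reference to <code><nobr>EBX</nobr></code> (register value 3) with an
opcode extension of <code><nobr>/2</nobr></code> in the spare field.

```python
def pack_modrm(mod, spare, rm):
    """Build a ModR/M byte: mod in bits 6-7, spare in bits 3-5, r/m in bits 0-2."""
    if not (0 <= mod <= 3 and 0 <= spare <= 7 and 0 <= rm <= 7):
        raise ValueError("field out of range")
    return (mod << 6) | (spare << 3) | rm

def unpack_modrm(byte):
    """Split a ModR/M byte back into (mod, spare, r/m)."""
    return (byte >> 6) & 3, (byte >> 3) & 7, byte & 7

# mod=3 selects a direct register reference; r/m=3 is EBX; spare=2 is "/2".
byte = pack_modrm(3, 2, 3)
print(hex(byte), unpack_modrm(byte))
```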
|
|
<p>In 16-bit addressing mode (either <code><nobr>BITS 16</nobr></code> with
|
|
no <code><nobr>67</nobr></code> prefix, or
|
|
<code><nobr>BITS 32</nobr></code> with a <code><nobr>67</nobr></code>
|
|
prefix), the SIB byte is never used. The general rules for
|
|
<code><nobr>mod</nobr></code> and <code><nobr>r/m</nobr></code> (there is
|
|
an exception, given below) are:
|
|
<ul>
|
|
<li>The <code><nobr>mod</nobr></code> field gives the length of the
|
|
displacement field: 0 means no displacement, 1 means one byte, and 2 means
|
|
two bytes.
|
|
<li>The <code><nobr>r/m</nobr></code> field encodes the combination of
|
|
registers to be added to the displacement to give the accessed address: 0
|
|
means <code><nobr>BX+SI</nobr></code>, 1 means
|
|
<code><nobr>BX+DI</nobr></code>, 2 means <code><nobr>BP+SI</nobr></code>, 3
|
|
means <code><nobr>BP+DI</nobr></code>, 4 means <code><nobr>SI</nobr></code>
|
|
only, 5 means <code><nobr>DI</nobr></code> only, 6 means
|
|
<code><nobr>BP</nobr></code> only, and 7 means <code><nobr>BX</nobr></code>
|
|
only.
|
|
</ul>
|
|
<p>However, there is a special case:
|
|
<ul>
|
|
<li>If <code><nobr>mod</nobr></code> is 0 and <code><nobr>r/m</nobr></code>
|
|
is 6, the effective address encoded is not <code><nobr>[BP]</nobr></code>
|
|
as the above rules would suggest, but instead
|
|
<code><nobr>[disp16]</nobr></code>: the displacement field is present and
|
|
is two bytes long, and no registers are added to the displacement.
|
|
</ul>
|
|
<p>Therefore the effective address <code><nobr>[BP]</nobr></code> cannot be
|
|
encoded as efficiently as <code><nobr>[BX]</nobr></code>; so if you code
|
|
<code><nobr>[BP]</nobr></code> in a program, NASM adds a notional 8-bit
|
|
zero displacement, and sets <code><nobr>mod</nobr></code> to 1,
|
|
<code><nobr>r/m</nobr></code> to 6, and the one-byte displacement field to
|
|
0.
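<p>The 16-bit rules, including the <code><nobr>[BP]</nobr></code> special
case, can be put together in a short Python sketch. The function
<code><nobr>encode_ea16</nobr></code> and its calling convention are
assumptions of this example. It always picks the shortest displacement that
fits, treating the one-byte displacement as a signed value; NASM's own choice
of displacement size may differ in some situations.

```python
import struct

RM16 = {
    ("BX", "SI"): 0, ("BX", "DI"): 1, ("BP", "SI"): 2, ("BP", "DI"): 3,
    ("SI",): 4, ("DI",): 5, ("BP",): 6, ("BX",): 7,
}

def encode_ea16(regs, disp=0, spare=0):
    """Return the ModR/M byte plus displacement for a 16-bit effective address.

    regs is a tuple such as ("BX", "SI") or ("BP",), or () for a bare [disp16].
    """
    if not regs:
        # Special case: mod=0 with r/m=6 means [disp16], with no registers.
        return bytes([(spare << 3) | 6]) + struct.pack("<H", disp & 0xFFFF)
    rm = RM16[regs]
    if disp == 0 and rm != 6:
        mod, tail = 0, b""
    elif -128 <= disp <= 127:
        # Also covers [BP], which has to be written as [BP+0].
        mod, tail = 1, struct.pack("<b", disp)
    else:
        mod, tail = 2, struct.pack("<H", disp & 0xFFFF)
    return bytes([(mod << 6) | (spare << 3) | rm]) + tail

print(encode_ea16(("BX",)).hex())  # [BX]
print(encode_ea16(("BP",)).hex())  # [BP], one byte longer than [BX]
```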
|
|
<p>In 32-bit addressing mode (either <code><nobr>BITS 16</nobr></code> with
|
|
a <code><nobr>67</nobr></code> prefix, or <code><nobr>BITS 32</nobr></code>
|
|
with no <code><nobr>67</nobr></code> prefix) the general rules (again,
|
|
there are exceptions) for <code><nobr>mod</nobr></code> and
|
|
<code><nobr>r/m</nobr></code> are:
|
|
<ul>
|
|
<li>The <code><nobr>mod</nobr></code> field gives the length of the
|
|
displacement field: 0 means no displacement, 1 means one byte, and 2 means
|
|
four bytes.
|
|
<li>If only one register is to be added to the displacement, and it is not
|
|
<code><nobr>ESP</nobr></code>, the <code><nobr>r/m</nobr></code> field
|
|
gives its register value, and the SIB byte is absent. If the
|
|
<code><nobr>r/m</nobr></code> field is 4 (which would encode
|
|
<code><nobr>ESP</nobr></code>), the SIB byte is present and gives the
|
|
combination and scaling of registers to be added to the displacement.
|
|
</ul>
|
|
<p>If the SIB byte is present, it describes the combination of registers
|
|
(an optional base register, and an optional index register scaled by
|
|
multiplication by 1, 2, 4 or 8) to be added to the displacement. The SIB
|
|
byte is divided into the <code><nobr>scale</nobr></code> field, in the top
|
|
two bits, the <code><nobr>index</nobr></code> field in the next three, and
|
|
the <code><nobr>base</nobr></code> field in the bottom three. The general
|
|
rules are:
|
|
<ul>
|
|
<li>The <code><nobr>base</nobr></code> field encodes the register value of
|
|
the base register.
|
|
<li>The <code><nobr>index</nobr></code> field encodes the register value of
|
|
the index register, unless it is 4, in which case no index register is used
|
|
(so <code><nobr>ESP</nobr></code> cannot be used as an index register).
|
|
<li>The <code><nobr>scale</nobr></code> field encodes the multiplier by
|
|
which the index register is scaled before adding it to the base and
|
|
displacement: 0 encodes a multiplier of 1, 1 encodes 2, 2 encodes 4 and 3
|
|
encodes 8.
|
|
</ul>
|
|
<p>The exceptions to the 32-bit encoding rules are:
|
|
<ul>
|
|
<li>If <code><nobr>mod</nobr></code> is 0 and <code><nobr>r/m</nobr></code>
|
|
is 5, the effective address encoded is not <code><nobr>[EBP]</nobr></code>
|
|
as the above rules would suggest, but instead
|
|
<code><nobr>[disp32]</nobr></code>: the displacement field is present and
|
|
is four bytes long, and no registers are added to the displacement.
|
|
<li>If <code><nobr>mod</nobr></code> is 0, <code><nobr>r/m</nobr></code> is
|
|
4 (meaning the SIB byte is present) and <code><nobr>base</nobr></code> is
|
|
5, the effective address encoded is not
|
|
<code><nobr>[EBP+index]</nobr></code> as the above rules would suggest, but
|
|
instead <code><nobr>[disp32+index]</nobr></code>: the displacement field is
|
|
present and is four bytes long, and there is no base register (but the
|
|
index register is still processed in the normal way).
|
|
</ul>
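<p>The 32-bit rules and their exceptions can be sketched in Python as
follows. The function <code><nobr>encode_ea32</nobr></code> and its keyword
arguments are invented for this illustration. The sketch chooses the shortest
displacement that fits, and for an address with no base register it uses the
value 5 (the register value of <code><nobr>EBP</nobr></code>) in the
<code><nobr>r/m</nobr></code> or <code><nobr>base</nobr></code> field
together with <code><nobr>mod</nobr></code> 0.

```python
import struct

REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
SCALE = {1: 0, 2: 1, 4: 2, 8: 3}

def encode_ea32(base=None, index=None, scale=1, disp=0, spare=0):
    """Return ModR/M, optional SIB and displacement bytes for a 32-bit address."""
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    need_sib = index is not None or base == "ESP"
    if base is None:
        # No base register: mod=0 with r/m=5 (or SIB base=5) means disp32.
        mod, tail = 0, struct.pack("<I", disp & 0xFFFFFFFF)
    elif disp == 0 and base != "EBP":
        mod, tail = 0, b""
    elif -128 <= disp <= 127:
        # Also covers [EBP], which has to be written as [EBP+0].
        mod, tail = 1, struct.pack("<b", disp)
    else:
        mod, tail = 2, struct.pack("<I", disp & 0xFFFFFFFF)
    base_value = 5 if base is None else REG32[base]
    if not need_sib:
        return bytes([(mod << 6) | (spare << 3) | base_value]) + tail
    index_value = 4 if index is None else REG32[index]  # 4 means "no index"
    sib = (SCALE[scale] << 6) | (index_value << 3) | base_value
    # r/m=4 announces that a SIB byte follows.
    return bytes([(mod << 6) | (spare << 3) | 4, sib]) + tail

print(encode_ea32(base="ESP").hex())                        # [ESP]
print(encode_ea32(index="ESI", scale=4, disp=0x10).hex())   # [ESI*4+0x10]
```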
|
|
<h3><a name="section-B.3">B.3 Key to Instruction Flags</a></h3>
|
|
<p>Given along with each instruction in this appendix is a set of flags,
|
|
denoting the type of the instruction. The types are as follows:
|
|
<ul>
|
|
<li><code><nobr>8086</nobr></code>, <code><nobr>186</nobr></code>,
|
|
<code><nobr>286</nobr></code>, <code><nobr>386</nobr></code>,
|
|
<code><nobr>486</nobr></code>, <code><nobr>PENT</nobr></code> and
|
|
<code><nobr>P6</nobr></code> denote the lowest processor type that supports
|
|
the instruction. Most instructions run on all processors above the given
|
|
type; those that do not are documented. The Pentium II contains no
|
|
additional instructions beyond the P6 (Pentium Pro); from the point of view
|
|
of its instruction set, it can be thought of as a P6 with MMX capability.
|
|
<li><code><nobr>3DNOW</nobr></code> indicates that the instruction is a
|
|
3DNow! one, and will run on the AMD K6-2 and later processors. ATHLON
|
|
extensions to the 3DNow! instruction set are documented as such.
|
|
<li><code><nobr>CYRIX</nobr></code> indicates that the instruction is
|
|
specific to Cyrix processors, for example the extra MMX instructions in the
|
|
Cyrix extended MMX instruction set.
|
|
<li><code><nobr>FPU</nobr></code> indicates that the instruction is a
|
|
floating-point one, and will only run on machines with a coprocessor
|
|
(automatically including 486DX, Pentium and above).
|
|
<li><code><nobr>KATMAI</nobr></code> indicates that the instruction was
|
|
introduced as part of the Katmai New Instruction set. These instructions
|
|
are available on the Pentium III and later processors. Those which are not
|
|
specifically SSE instructions are also available on the AMD Athlon.
|
|
<li><code><nobr>MMX</nobr></code> indicates that the instruction is an MMX
|
|
one, and will run on MMX-capable Pentium processors and the Pentium II.
|
|
<li><code><nobr>PRIV</nobr></code> indicates that the instruction is a
|
|
protected-mode management instruction. Many of these may only be used in
|
|
protected mode, or only at privilege level zero.
|
|
<li><code><nobr>SSE</nobr></code> and <code><nobr>SSE2</nobr></code>
|
|
indicate that the instruction is a Streaming SIMD Extension instruction.
|
|
These instructions operate on multiple values in a single operation. SSE
|
|
was introduced with the Pentium III and SSE2 was introduced with the
|
|
Pentium 4.
|
|
<li><code><nobr>UNDOC</nobr></code> indicates that the instruction is an
|
|
undocumented one, and not part of the official Intel Architecture; it may
|
|
or may not be supported on any given machine.
|
|
<li><code><nobr>WILLAMETTE</nobr></code> indicates that the instruction was
|
|
introduced as part of the new instruction set in the Pentium 4 and Intel
|
|
Xeon processors. These instructions are also known as SSE2 instructions.
|
|
</ul>
|
|
<h3><a name="section-B.4">B.4 x86 Instruction Set</a></h3>
|
|
<h4><a name="section-B.4.1">B.4.1 <code><nobr>AAA</nobr></code>, <code><nobr>AAS</nobr></code>, <code><nobr>AAM</nobr></code>, <code><nobr>AAD</nobr></code>: ASCII Adjustments</a></h4>
|
|
<p><pre>
AAA                           ; 37                   [8086]
</pre>
<p><pre>
AAS                           ; 3F                   [8086]
</pre>
<p><pre>
AAD                           ; D5 0A                [8086]
AAD imm                       ; D5 ib                [8086]
</pre>
<p><pre>
AAM                           ; D4 0A                [8086]
AAM imm                       ; D4 ib                [8086]
</pre>
|
|
<p>These instructions are used in conjunction with the add, subtract,
|
|
multiply and divide instructions to perform binary-coded decimal arithmetic
|
|
in <em>unpacked</em> (one BCD digit per byte - easy to translate to and
|
|
from <code><nobr>ASCII</nobr></code>, hence the instruction names) form.
|
|
There are also packed BCD instructions <code><nobr>DAA</nobr></code> and
|
|
<code><nobr>DAS</nobr></code>: see <a href="#section-B.4.57">section
|
|
B.4.57</a>.
|
|
<ul>
|
|
<li><code><nobr>AAA</nobr></code> (ASCII Adjust After Addition) should be
|
|
used after a one-byte <code><nobr>ADD</nobr></code> instruction whose
|
|
destination was the <code><nobr>AL</nobr></code> register: by means of
|
|
examining the value in the low nibble of <code><nobr>AL</nobr></code> and
|
|
also the auxiliary carry flag <code><nobr>AF</nobr></code>, it determines
|
|
whether the addition has overflowed, and adjusts it (and sets the carry
|
|
flag) if so. You can add long BCD strings together by doing
|
|
<code><nobr>ADD</nobr></code>/<code><nobr>AAA</nobr></code> on the low
|
|
digits, then doing
|
|
<code><nobr>ADC</nobr></code>/<code><nobr>AAA</nobr></code> on each
|
|
subsequent digit.
|
|
<li><code><nobr>AAS</nobr></code> (ASCII Adjust AL After Subtraction) works
|
|
similarly to <code><nobr>AAA</nobr></code>, but is for use after
|
|
<code><nobr>SUB</nobr></code> instructions rather than
|
|
<code><nobr>ADD</nobr></code>.
|
|
<li><code><nobr>AAM</nobr></code> (ASCII Adjust AX After Multiply) is for
|
|
use after you have multiplied two decimal digits together and left the
|
|
result in <code><nobr>AL</nobr></code>: it divides
|
|
<code><nobr>AL</nobr></code> by ten and stores the quotient in
|
|
<code><nobr>AH</nobr></code>, leaving the remainder in
|
|
<code><nobr>AL</nobr></code>. The divisor 10 can be changed by specifying
|
|
an operand to the instruction: a particularly handy use of this is
|
|
<code><nobr>AAM 16</nobr></code>, causing the two nibbles in
|
|
<code><nobr>AL</nobr></code> to be separated into
|
|
<code><nobr>AH</nobr></code> and <code><nobr>AL</nobr></code>.
|
|
<li><code><nobr>AAD</nobr></code> (ASCII Adjust AX Before Division)
|
|
performs the inverse operation to <code><nobr>AAM</nobr></code>: it
|
|
multiplies <code><nobr>AH</nobr></code> by ten, adds it to
|
|
<code><nobr>AL</nobr></code>, and sets <code><nobr>AH</nobr></code> to
|
|
zero. Again, the multiplier 10 can be changed.
|
|
</ul>
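<p>The arithmetic performed by <code><nobr>AAM</nobr></code> and
<code><nobr>AAD</nobr></code> is simple enough to model directly. The Python
functions below are a sketch of the behaviour described above, not an exact
model of the processor: they work on plain integers, return the new
<code><nobr>AH</nobr></code> and <code><nobr>AL</nobr></code> values as a
pair, and do not model the flags. The function names are chosen for this
example only.

```python
def aam(al, base=10):
    """AAM: divide AL by base; quotient goes to AH, remainder stays in AL."""
    return al // base, al % base

def aad(ah, al, base=10):
    """AAD: AL = AL + AH * base (kept to 8 bits), and AH is set to zero."""
    return 0, (al + ah * base) & 0xFF

print(aam(63))        # 7 * 9 = 63 left in AL after a multiply
print(aam(0xAB, 16))  # AAM 16 separates the two nibbles of AL
print(aad(6, 3))      # the inverse of the first example
```

<p>The first call gives the unpacked digits 6 and 3, the second gives the
nibbles <code><nobr>0A</nobr></code> and <code><nobr>0B</nobr></code> hex,
and the third recombines 6 and 3 into 63.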
|
|
<h4><a name="section-B.4.2">B.4.2 <code><nobr>ADC</nobr></code>: Add with Carry</a></h4>
|
|
<p><pre>
ADC r/m8,reg8                 ; 10 /r                [8086]
ADC r/m16,reg16               ; o16 11 /r            [8086]
ADC r/m32,reg32               ; o32 11 /r            [386]
</pre>
<p><pre>
ADC reg8,r/m8                 ; 12 /r                [8086]
ADC reg16,r/m16               ; o16 13 /r            [8086]
ADC reg32,r/m32               ; o32 13 /r            [386]
</pre>
<p><pre>
ADC r/m8,imm8                 ; 80 /2 ib             [8086]
ADC r/m16,imm16               ; o16 81 /2 iw         [8086]
ADC r/m32,imm32               ; o32 81 /2 id         [386]
</pre>
<p><pre>
ADC r/m16,imm8                ; o16 83 /2 ib         [8086]
ADC r/m32,imm8                ; o32 83 /2 ib         [386]
</pre>
<p><pre>
ADC AL,imm8                   ; 14 ib                [8086]
ADC AX,imm16                  ; o16 15 iw            [8086]
ADC EAX,imm32                 ; o32 15 id            [386]
</pre>
|
|
<p><code><nobr>ADC</nobr></code> performs integer addition: it adds its two
|
|
operands together, plus the value of the carry flag, and leaves the result
|
|
in its destination (first) operand. The destination operand can be a
|
|
register or a memory location. The source operand can be a register, a
|
|
memory location or an immediate value.
|
|
<p>The flags are set according to the result of the operation: in
|
|
particular, the carry flag is affected and can be used by a subsequent
|
|
<code><nobr>ADC</nobr></code> instruction.
|
|
<p>In the forms with an 8-bit immediate second operand and a longer first
|
|
operand, the second operand is considered to be signed, and is
|
|
sign-extended to the length of the first operand. In these cases, the
|
|
<code><nobr>BYTE</nobr></code> qualifier is necessary to force NASM to
|
|
generate this form of the instruction.
|
|
<p>To add two numbers without also adding the contents of the carry flag,
|
|
use <code><nobr>ADD</nobr></code> (<a href="#section-B.4.3">section
|
|
B.4.3</a>).
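<p>The usual reason to follow <code><nobr>ADD</nobr></code> with
<code><nobr>ADC</nobr></code> is multi-word addition. The C sketch below
models adding two 64-bit values held as 32-bit halves; the type
<code><nobr>u64_limbs</nobr></code> and the function name are hypothetical.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves (hypothetical type). */
typedef struct { uint32_t lo, hi; } u64_limbs;

/* Models:  ADD lo_a, lo_b   followed by   ADC hi_a, hi_b  */
static u64_limbs add64_via_adc(u64_limbs a, u64_limbs b)
{
    u64_limbs r;
    uint32_t carry;

    r.lo = a.lo + b.lo;          /* ADD: result wraps modulo 2^32      */
    carry = (r.lo < a.lo);       /* carry flag left behind by the ADD  */
    r.hi = a.hi + b.hi + carry;  /* ADC: adds the carry flag as well   */
    return r;
}
```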
|
|
<h4><a name="section-B.4.3">B.4.3 <code><nobr>ADD</nobr></code>: Add Integers</a></h4>
|
|
<p><pre>
|
|
ADD r/m8,reg8 ; 00 /r [8086]
|
|
ADD r/m16,reg16 ; o16 01 /r [8086]
|
|
ADD r/m32,reg32 ; o32 01 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
ADD reg8,r/m8 ; 02 /r [8086]
|
|
ADD reg16,r/m16 ; o16 03 /r [8086]
|
|
ADD reg32,r/m32 ; o32 03 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
ADD r/m8,imm8 ; 80 /0 ib [8086]
|
|
ADD r/m16,imm16 ; o16 81 /0 iw [8086]
|
|
ADD r/m32,imm32 ; o32 81 /0 id [386]
|
|
</pre>
|
|
<p><pre>
|
|
ADD r/m16,imm8 ; o16 83 /0 ib [8086]
|
|
ADD r/m32,imm8 ; o32 83 /0 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
ADD AL,imm8 ; 04 ib [8086]
|
|
ADD AX,imm16 ; o16 05 iw [8086]
|
|
ADD EAX,imm32 ; o32 05 id [386]
|
|
</pre>
|
|
<p><code><nobr>ADD</nobr></code> performs integer addition: it adds its two
|
|
operands together, and leaves the result in its destination (first)
|
|
operand. The destination operand can be a register or a memory location.
|
|
The source operand can be a register, a memory location or an immediate
|
|
value.
|
|
<p>The flags are set according to the result of the operation: in
|
|
particular, the carry flag is affected and can be used by a subsequent
|
|
<code><nobr>ADC</nobr></code> instruction.
|
|
<p>In the forms with an 8-bit immediate second operand and a longer first
|
|
operand, the second operand is considered to be signed, and is
|
|
sign-extended to the length of the first operand. In these cases, the
|
|
<code><nobr>BYTE</nobr></code> qualifier is necessary to force NASM to
|
|
generate this form of the instruction.
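<p>The effect of the sign extension in the
<code><nobr>ADD r/m32,imm8</nobr></code> form can be sketched in C as
follows. The function name <code><nobr>add_r32_imm8</nobr></code> is
hypothetical, and only the arithmetic result is modelled, not the flags.

```c
#include <stdint.h>

/* Models ADD r/m32,imm8: the byte is sign-extended to 32 bits
   before it is added to the destination. */
static uint32_t add_r32_imm8(uint32_t dst, uint8_t imm8)
{
    /* Replicate bit 7 of the immediate into bits 8-31. */
    uint32_t ext = (imm8 & 0x80u) ? (0xFFFFFF00u | imm8) : imm8;
    return dst + ext;
}
```

<p>An immediate byte of <code><nobr>0xFF</nobr></code> therefore acts as
-1, so adding it to 10 gives 9 rather than 265.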
|
|
<h4><a name="section-B.4.4">B.4.4 <code><nobr>ADDPD</nobr></code>: ADD Packed Double-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
ADDPD xmm1,xmm2/mem128 ; 66 0F 58 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>ADDPD</nobr></code> performs addition on each of two packed
|
|
double-precision FP value pairs.
|
|
<p><pre>
|
|
dst[0-63] := dst[0-63] + src[0-63],
|
|
dst[64-127] := dst[64-127] + src[64-127].
|
|
</pre>
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<h4><a name="section-B.4.5">B.4.5 <code><nobr>ADDPS</nobr></code>: ADD Packed Single-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
ADDPS xmm1,xmm2/mem128 ; 0F 58 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>ADDPS</nobr></code> performs addition on each of four packed
|
|
single-precision FP value pairs.
|
|
<p><pre>
|
|
dst[0-31] := dst[0-31] + src[0-31],
|
|
dst[32-63] := dst[32-63] + src[32-63],
|
|
dst[64-95] := dst[64-95] + src[64-95],
|
|
dst[96-127] := dst[96-127] + src[96-127].
|
|
</pre>
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<h4><a name="section-B.4.6">B.4.6 <code><nobr>ADDSD</nobr></code>: ADD Scalar Double-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
ADDSD xmm1,xmm2/mem64 ; F2 0F 58 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>ADDSD</nobr></code> adds the low double-precision FP values
|
|
from the source and destination operands and stores the double-precision FP
|
|
result in the destination operand.
|
|
<p><pre>
|
|
dst[0-63] := dst[0-63] + src[0-63],
|
|
dst[64-127] remains unchanged.
|
|
</pre>
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a 64-bit
|
|
memory location.
|
|
<h4><a name="section-B.4.7">B.4.7 <code><nobr>ADDSS</nobr></code>: ADD Scalar Single-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
ADDSS xmm1,xmm2/mem32 ; F3 0F 58 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>ADDSS</nobr></code> adds the low single-precision FP values
|
|
from the source and destination operands and stores the single-precision FP
|
|
result in the destination operand.
|
|
<p><pre>
|
|
dst[0-31] := dst[0-31] + src[0-31],
|
|
dst[32-127] remains unchanged.
|
|
</pre>
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a 32-bit
|
|
memory location.
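<p>The difference between the packed and scalar single-precision forms
can be sketched in C. The type <code><nobr>xmm_ps</nobr></code> and both
function names are hypothetical; element 0 stands for bits 0-31 of the
register.

```c
/* Hypothetical model of an XMM register holding four floats;
   f[0] corresponds to bits 0-31. */
typedef struct { float f[4]; } xmm_ps;

/* ADDPS: all four element pairs are added. */
static xmm_ps model_addps(xmm_ps dst, xmm_ps src)
{
    for (int i = 0; i < 4; i++)
        dst.f[i] = dst.f[i] + src.f[i];
    return dst;
}

/* ADDSS: only the low element is added; bits 32-127 of the
   destination are left as they were. */
static xmm_ps model_addss(xmm_ps dst, xmm_ps src)
{
    dst.f[0] = dst.f[0] + src.f[0];
    return dst;
}
```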
|
|
<h4><a name="section-B.4.8">B.4.8 <code><nobr>AND</nobr></code>: Bitwise AND</a></h4>
|
|
<p><pre>
|
|
AND r/m8,reg8 ; 20 /r [8086]
|
|
AND r/m16,reg16 ; o16 21 /r [8086]
|
|
AND r/m32,reg32 ; o32 21 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
AND reg8,r/m8 ; 22 /r [8086]
|
|
AND reg16,r/m16 ; o16 23 /r [8086]
|
|
AND reg32,r/m32 ; o32 23 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
AND r/m8,imm8 ; 80 /4 ib [8086]
|
|
AND r/m16,imm16 ; o16 81 /4 iw [8086]
|
|
AND r/m32,imm32 ; o32 81 /4 id [386]
|
|
</pre>
|
|
<p><pre>
|
|
AND r/m16,imm8 ; o16 83 /4 ib [8086]
|
|
AND r/m32,imm8 ; o32 83 /4 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
AND AL,imm8 ; 24 ib [8086]
|
|
AND AX,imm16 ; o16 25 iw [8086]
|
|
AND EAX,imm32 ; o32 25 id [386]
|
|
</pre>
|
|
<p><code><nobr>AND</nobr></code> performs a bitwise AND operation between
|
|
its two operands (i.e. each bit of the result is 1 if and only if the
|
|
corresponding bits of the two inputs were both 1), and stores the result in
|
|
the destination (first) operand. The destination operand can be a register
|
|
or a memory location. The source operand can be a register, a memory
|
|
location or an immediate value.
|
|
<p>In the forms with an 8-bit immediate second operand and a longer first
|
|
operand, the second operand is considered to be signed, and is
|
|
sign-extended to the length of the first operand. In these cases, the
|
|
<code><nobr>BYTE</nobr></code> qualifier is necessary to force NASM to
|
|
generate this form of the instruction.
|
|
<p>The <code><nobr>MMX</nobr></code> instruction
|
|
<code><nobr>PAND</nobr></code> (see <a href="#section-B.4.202">section
|
|
B.4.202</a>) performs the same operation on the 64-bit
|
|
<code><nobr>MMX</nobr></code> registers.
|
|
<h4><a name="section-B.4.9">B.4.9 <code><nobr>ANDNPD</nobr></code>: Bitwise Logical AND NOT of Packed Double-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
ANDNPD xmm1,xmm2/mem128 ; 66 0F 55 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>ANDNPD</nobr></code> inverts the bits of the two
|
|
double-precision floating-point values in the destination register, and
|
|
then performs a logical AND between the two double-precision floating-point
|
|
values in the source operand and the temporary inverted result, storing the
|
|
result in the destination register.
|
|
<p><pre>
|
|
dst[0-63] := src[0-63] AND NOT dst[0-63],
|
|
dst[64-127] := src[64-127] AND NOT dst[64-127].
|
|
</pre>
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
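<p>Note that it is the destination, not the source, that is inverted. A
C sketch of one 64-bit lane follows; the function name is hypothetical,
and the example assumes IEEE 754 double-precision values, which is what
the instruction operates on.

```c
#include <stdint.h>
#include <string.h>

/* Models one 64-bit lane of ANDNPD: result := src AND NOT dst. */
static double model_andnpd_lane(double dst, double src)
{
    uint64_t d, s, r;
    double out;

    memcpy(&d, &dst, sizeof d);   /* reinterpret the bits, no conversion */
    memcpy(&s, &src, sizeof s);
    r = s & ~d;
    memcpy(&out, &r, sizeof out);
    return out;
}
```

<p>If the destination lane holds -0.0 (only the sign bit set), the result
is the source with its sign bit cleared, i.e. its absolute value.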
|
|
<h4><a name="section-B.4.10">B.4.10 <code><nobr>ANDNPS</nobr></code>: Bitwise Logical AND NOT of Packed Single-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
ANDNPS xmm1,xmm2/mem128 ; 0F 55 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>ANDNPS</nobr></code> inverts the bits of the four
|
|
single-precision floating-point values in the destination register, and
|
|
then performs a logical AND between the four single-precision
|
|
floating-point values in the source operand and the temporary inverted
|
|
result, storing the result in the destination register.
|
|
<p><pre>
|
|
dst[0-31] := src[0-31] AND NOT dst[0-31],
|
|
dst[32-63] := src[32-63] AND NOT dst[32-63],
|
|
dst[64-95] := src[64-95] AND NOT dst[64-95],
|
|
dst[96-127] := src[96-127] AND NOT dst[96-127].
|
|
</pre>
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<h4><a name="section-B.4.11">B.4.11 <code><nobr>ANDPD</nobr></code>: Bitwise Logical AND For Double FP</a></h4>
|
|
<p><pre>
|
|
ANDPD xmm1,xmm2/mem128 ; 66 0F 54 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>ANDPD</nobr></code> performs a bitwise logical AND of the
|
|
two double-precision floating point values in the source and destination
|
|
operand, and stores the result in the destination register.
|
|
<p><pre>
|
|
dst[0-63] := src[0-63] AND dst[0-63],
|
|
dst[64-127] := src[64-127] AND dst[64-127].
|
|
</pre>
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<h4><a name="section-B.4.12">B.4.12 <code><nobr>ANDPS</nobr></code>: Bitwise Logical AND For Single FP</a></h4>
|
|
<p><pre>
|
|
ANDPS xmm1,xmm2/mem128 ; 0F 54 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>ANDPS</nobr></code> performs a bitwise logical AND of the
|
|
four single-precision floating point values in the source and destination
|
|
operand, and stores the result in the destination register.
|
|
<p><pre>
|
|
dst[0-31] := src[0-31] AND dst[0-31],
|
|
dst[32-63] := src[32-63] AND dst[32-63],
|
|
dst[64-95] := src[64-95] AND dst[64-95],
|
|
dst[96-127] := src[96-127] AND dst[96-127].
|
|
</pre>
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<h4><a name="section-B.4.13">B.4.13 <code><nobr>ARPL</nobr></code>: Adjust RPL Field of Selector</a></h4>
|
|
<p><pre>
|
|
ARPL r/m16,reg16 ; 63 /r [286,PRIV]
|
|
</pre>
|
|
<p><code><nobr>ARPL</nobr></code> expects its two word operands to be
|
|
segment selectors. It adjusts the <code><nobr>RPL</nobr></code> (requested
|
|
privilege level - stored in the bottom two bits of the selector) field of
|
|
the destination (first) operand to ensure that it is no less than (i.e. no more
|
|
privileged than) the <code><nobr>RPL</nobr></code> field of the source
|
|
operand. The zero flag is set if and only if a change had to be made.
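<p>The adjustment can be sketched in C as follows. The names
<code><nobr>arpl_result</nobr></code> and
<code><nobr>model_arpl</nobr></code> are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result type: adjusted selector plus the zero flag. */
typedef struct { uint16_t dest; int zf; } arpl_result;

/* Models ARPL dest,src on two 16-bit selectors. */
static arpl_result model_arpl(uint16_t dest, uint16_t src)
{
    arpl_result r;
    r.dest = dest;
    r.zf = 0;
    if ((dest & 3u) < (src & 3u)) {
        /* Raise the destination RPL to match the source RPL. */
        r.dest = (uint16_t)((dest & ~3u) | (src & 3u));
        r.zf = 1;
    }
    return r;
}
```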
|
|
<h4><a name="section-B.4.14">B.4.14 <code><nobr>BOUND</nobr></code>: Check Array Index against Bounds</a></h4>
|
|
<p><pre>
|
|
BOUND reg16,mem ; o16 62 /r [186]
|
|
BOUND reg32,mem ; o32 62 /r [386]
|
|
</pre>
|
|
<p><code><nobr>BOUND</nobr></code> expects its second operand to point to
|
|
an area of memory containing two signed values of the same size as its
|
|
first operand (i.e. two words for the 16-bit form; two doublewords for the
|
|
32-bit form). It performs two signed comparisons: if the value in the
|
|
register passed as its first operand is less than the first of the
|
|
in-memory values, or is greater than the second, it throws a
|
|
<code><nobr>BR</nobr></code> exception. Otherwise, it does nothing.
|
|
<h4><a name="section-B.4.15">B.4.15 <code><nobr>BSF</nobr></code>, <code><nobr>BSR</nobr></code>: Bit Scan</a></h4>
|
|
<p><pre>
|
|
BSF reg16,r/m16 ; o16 0F BC /r [386]
|
|
BSF reg32,r/m32 ; o32 0F BC /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
BSR reg16,r/m16 ; o16 0F BD /r [386]
|
|
BSR reg32,r/m32 ; o32 0F BD /r [386]
|
|
</pre>
|
|
<ul>
|
|
<li><code><nobr>BSF</nobr></code> searches for the least significant set
|
|
bit in its source (second) operand, and if it finds one, stores the index
|
|
in its destination (first) operand. If no set bit is found, the contents of
|
|
the destination operand are undefined. If the source operand is zero, the
|
|
zero flag is set.
|
|
<li><code><nobr>BSR</nobr></code> performs the same function, but searches
|
|
from the top instead, so it finds the most significant set bit.
|
|
</ul>
|
|
<p>Bit indices are from 0 (least significant) to 15 or 31 (most
|
|
significant). The destination operand can only be a register. The source
|
|
operand can be a register or a memory location.
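<p>A C sketch of the 32-bit forms is given below. The type
<code><nobr>bitscan_result</nobr></code> and the function names are
hypothetical; the <code><nobr>index</nobr></code> field is meaningful only
when <code><nobr>zf</nobr></code> is 0, mirroring the undefined destination
described above.

```c
#include <stdint.h>

/* Hypothetical result: zero flag plus the bit index found.
   index is only meaningful when zf == 0. */
typedef struct { int zf; unsigned index; } bitscan_result;

/* BSF: scan upwards from bit 0 for the first set bit. */
static bitscan_result model_bsf32(uint32_t src)
{
    bitscan_result r = { 1, 0 };
    for (unsigned i = 0; i < 32; i++) {
        if (src & (1u << i)) { r.zf = 0; r.index = i; break; }
    }
    return r;
}

/* BSR: scan downwards from bit 31 for the first set bit. */
static bitscan_result model_bsr32(uint32_t src)
{
    bitscan_result r = { 1, 0 };
    for (unsigned i = 32; i-- > 0; ) {
        if (src & (1u << i)) { r.zf = 0; r.index = i; break; }
    }
    return r;
}
```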
|
|
<h4><a name="section-B.4.16">B.4.16 <code><nobr>BSWAP</nobr></code>: Byte Swap</a></h4>
|
|
<p><pre>
|
|
BSWAP reg32 ; o32 0F C8+r [486]
|
|
</pre>
|
|
<p><code><nobr>BSWAP</nobr></code> swaps the order of the four bytes of a
|
|
32-bit register: bits 0-7 exchange places with bits 24-31, and bits 8-15
|
|
swap with bits 16-23. There is no explicit 16-bit equivalent: to byte-swap
|
|
<code><nobr>AX</nobr></code>, <code><nobr>BX</nobr></code>,
|
|
<code><nobr>CX</nobr></code> or <code><nobr>DX</nobr></code>,
|
|
<code><nobr>XCHG</nobr></code> can be used. When
|
|
<code><nobr>BSWAP</nobr></code> is used with a 16-bit register, the result
|
|
is undefined.
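<p>The byte reordering can be sketched in C. Both function names are
hypothetical; the second models swapping the two halves of a 16-bit
register, as an <code><nobr>XCHG</nobr></code> of its high and low bytes
would.

```c
#include <stdint.h>

/* Models BSWAP on a 32-bit register. */
static uint32_t model_bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Models exchanging the high and low bytes of a 16-bit register. */
static uint16_t model_swap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}
```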
|
|
<h4><a name="section-B.4.17">B.4.17 <code><nobr>BT</nobr></code>, <code><nobr>BTC</nobr></code>, <code><nobr>BTR</nobr></code>, <code><nobr>BTS</nobr></code>: Bit Test</a></h4>
|
|
<p><pre>
|
|
BT r/m16,reg16 ; o16 0F A3 /r [386]
|
|
BT r/m32,reg32 ; o32 0F A3 /r [386]
|
|
BT r/m16,imm8 ; o16 0F BA /4 ib [386]
|
|
BT r/m32,imm8 ; o32 0F BA /4 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
BTC r/m16,reg16 ; o16 0F BB /r [386]
|
|
BTC r/m32,reg32 ; o32 0F BB /r [386]
|
|
BTC r/m16,imm8 ; o16 0F BA /7 ib [386]
|
|
BTC r/m32,imm8 ; o32 0F BA /7 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
BTR r/m16,reg16 ; o16 0F B3 /r [386]
|
|
BTR r/m32,reg32 ; o32 0F B3 /r [386]
|
|
BTR r/m16,imm8 ; o16 0F BA /6 ib [386]
|
|
BTR r/m32,imm8 ; o32 0F BA /6 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
BTS r/m16,reg16 ; o16 0F AB /r [386]
|
|
BTS r/m32,reg32 ; o32 0F AB /r [386]
|
|
BTS r/m16,imm8 ; o16 0F BA /5 ib [386]
BTS r/m32,imm8 ; o32 0F BA /5 ib [386]
|
|
</pre>
|
|
<p>These instructions all test one bit of their first operand, whose index
|
|
is given by the second operand, and store the value of that bit into the
|
|
carry flag. Bit indices are from 0 (least significant) to 15 or 31 (most
|
|
significant).
|
|
<p>In addition to storing the original value of the bit into the carry
|
|
flag, <code><nobr>BTR</nobr></code> also resets (clears) the bit in the
|
|
operand itself. <code><nobr>BTS</nobr></code> sets the bit, and
|
|
<code><nobr>BTC</nobr></code> complements the bit.
|
|
<code><nobr>BT</nobr></code> does not modify its operands.
|
|
<p>The destination can be a register or a memory location. The source can
|
|
be a register or an immediate value.
|
|
<p>If the destination operand is a register, the bit offset should be in
|
|
the range 0-15 (for 16-bit operands) or 0-31 (for 32-bit operands). An
|
|
immediate value outside these ranges will be taken modulo 16/32 by the
|
|
processor.
|
|
<p>If the destination operand is a memory location, then an immediate bit
|
|
offset follows the same rules as for a register. If the bit offset is in a
|
|
register, then it can be anything within the signed range of the register
|
|
used (i.e. for a 32-bit operand, it can be -2^31 to 2^31 - 1).
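<p>For a 32-bit register destination, the four instructions can be
modelled in C as shown below. The enumeration, the result type and the
function name are hypothetical; the memory form with a register bit offset
is not modelled.

```c
#include <stdint.h>

/* Hypothetical names for the four operations. */
enum bt_op { BT_TEST, BT_SET, BT_RESET, BT_COMPLEMENT };

/* New operand value plus the carry flag. */
typedef struct { uint32_t value; int cf; } bt_result;

/* Models BT/BTS/BTR/BTC with a 32-bit register destination:
   the bit index is taken modulo 32. */
static bt_result model_bt32(uint32_t value, uint32_t index, enum bt_op op)
{
    uint32_t mask = 1u << (index & 31u);
    bt_result r;

    r.cf = (value & mask) != 0;      /* original bit goes to CF */
    r.value = value;
    switch (op) {
    case BT_SET:        r.value = value | mask;  break;
    case BT_RESET:      r.value = value & ~mask; break;
    case BT_COMPLEMENT: r.value = value ^ mask;  break;
    case BT_TEST:       break;       /* BT leaves the operand alone */
    }
    return r;
}
```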
|
|
<h4><a name="section-B.4.18">B.4.18 <code><nobr>CALL</nobr></code>: Call Subroutine</a></h4>
|
|
<p><pre>
|
|
CALL imm ; E8 rw/rd [8086]
|
|
CALL imm:imm16 ; o16 9A iw iw [8086]
|
|
CALL imm:imm32 ; o32 9A id iw [386]
|
|
CALL FAR mem16 ; o16 FF /3 [8086]
|
|
CALL FAR mem32 ; o32 FF /3 [386]
|
|
CALL r/m16 ; o16 FF /2 [8086]
|
|
CALL r/m32 ; o32 FF /2 [386]
|
|
</pre>
|
|
<p><code><nobr>CALL</nobr></code> calls a subroutine, by means of pushing
|
|
the current instruction pointer (<code><nobr>IP</nobr></code>) and
|
|
optionally <code><nobr>CS</nobr></code> as well on the stack, and then
|
|
jumping to a given address.
|
|
<p><code><nobr>CS</nobr></code> is pushed as well as
|
|
<code><nobr>IP</nobr></code> if and only if the call is a far call, i.e. a
|
|
destination segment address is specified in the instruction. The forms
|
|
involving two colon-separated arguments are far calls; so are the
|
|
<code><nobr>CALL FAR mem</nobr></code> forms.
|
|
<p>The immediate near call takes one of two forms
|
|
(<code><nobr>call imm16/imm32</nobr></code>), determined by the current
|
|
segment size limit. For 16-bit operands, you would use
|
|
<code><nobr>CALL 0x1234</nobr></code>, and for 32-bit operands you would
|
|
use <code><nobr>CALL 0x12345678</nobr></code>. The value passed as an
|
|
operand is a relative offset.
|
|
<p>You can choose between the two immediate far call forms
|
|
(<code><nobr>CALL imm:imm</nobr></code>) by the use of the
|
|
<code><nobr>WORD</nobr></code> and <code><nobr>DWORD</nobr></code>
|
|
keywords: <code><nobr>CALL WORD 0x1234:0x5678</nobr></code> or
|
|
<code><nobr>CALL DWORD 0x1234:0x56789abc</nobr></code>.
|
|
<p>The <code><nobr>CALL FAR mem</nobr></code> forms execute a far call by
|
|
loading the destination address out of memory. The address loaded consists
|
|
of 16 or 32 bits of offset (depending on the operand size), and 16 bits of
|
|
segment. The operand size may be overridden using
|
|
<code><nobr>CALL WORD FAR mem</nobr></code> or
|
|
<code><nobr>CALL DWORD FAR mem</nobr></code>.
|
|
<p>The <code><nobr>CALL r/m</nobr></code> forms execute a near call (within
|
|
the same segment), loading the destination address out of memory or out of
|
|
a register. The keyword <code><nobr>NEAR</nobr></code> may be specified,
|
|
for clarity, in these forms, but is not necessary. Again, operand size can
|
|
be overridden using <code><nobr>CALL WORD mem</nobr></code> or
|
|
<code><nobr>CALL DWORD mem</nobr></code>.
|
|
<p>As a convenience, NASM does not require you to call a far procedure
|
|
symbol by coding the cumbersome
|
|
<code><nobr>CALL SEG routine:routine</nobr></code>, but instead allows the
|
|
easier synonym <code><nobr>CALL FAR routine</nobr></code>.
|
|
<p>The <code><nobr>CALL r/m</nobr></code> forms given above are near calls;
|
|
NASM will accept the <code><nobr>NEAR</nobr></code> keyword (e.g.
|
|
<code><nobr>CALL NEAR [address]</nobr></code>), even though it is not
|
|
strictly necessary.
|
|
<h4><a name="section-B.4.19">B.4.19 <code><nobr>CBW</nobr></code>, <code><nobr>CWD</nobr></code>, <code><nobr>CDQ</nobr></code>, <code><nobr>CWDE</nobr></code>: Sign Extensions</a></h4>
|
|
<p><pre>
|
|
CBW ; o16 98 [8086]
|
|
CWDE ; o32 98 [386]
|
|
</pre>
|
|
<p><pre>
|
|
CWD ; o16 99 [8086]
|
|
CDQ ; o32 99 [386]
|
|
</pre>
|
|
<p>All these instructions sign-extend a short value into a longer one, by
|
|
replicating the top bit of the original value to fill the extended one.
|
|
<p><code><nobr>CBW</nobr></code> extends <code><nobr>AL</nobr></code> into
|
|
<code><nobr>AX</nobr></code> by repeating the top bit of
|
|
<code><nobr>AL</nobr></code> in every bit of <code><nobr>AH</nobr></code>.
|
|
<code><nobr>CWDE</nobr></code> extends <code><nobr>AX</nobr></code> into
|
|
<code><nobr>EAX</nobr></code>. <code><nobr>CWD</nobr></code> extends
|
|
<code><nobr>AX</nobr></code> into <code><nobr>DX:AX</nobr></code> by
|
|
repeating the top bit of <code><nobr>AX</nobr></code> throughout
|
|
<code><nobr>DX</nobr></code>, and <code><nobr>CDQ</nobr></code> extends
|
|
<code><nobr>EAX</nobr></code> into <code><nobr>EDX:EAX</nobr></code>.
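<p>The bit replication can be sketched in C. All names below are
hypothetical.

```c
#include <stdint.h>

/* CBW: AL -> AX, filling AH with copies of bit 7 of AL. */
static uint16_t model_cbw(uint8_t al)
{
    return (al & 0x80u) ? (uint16_t)(0xFF00u | al) : al;
}

/* CWDE: AX -> EAX, filling bits 16-31 with copies of bit 15. */
static uint32_t model_cwde(uint16_t ax)
{
    return (ax & 0x8000u) ? (0xFFFF0000u | ax) : ax;
}

/* CWD: AX -> DX:AX, filling DX with copies of bit 15 of AX. */
typedef struct { uint16_t dx, ax; } dx_ax;

static dx_ax model_cwd(uint16_t ax)
{
    dx_ax r;
    r.dx = (ax & 0x8000u) ? 0xFFFFu : 0u;
    r.ax = ax;
    return r;
}
```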
|
|
<h4><a name="section-B.4.20">B.4.20 <code><nobr>CLC</nobr></code>, <code><nobr>CLD</nobr></code>, <code><nobr>CLI</nobr></code>, <code><nobr>CLTS</nobr></code>: Clear Flags</a></h4>
|
|
<p><pre>
|
|
CLC ; F8 [8086]
|
|
CLD ; FC [8086]
|
|
CLI ; FA [8086]
|
|
CLTS ; 0F 06 [286,PRIV]
|
|
</pre>
|
|
<p>These instructions clear various flags. <code><nobr>CLC</nobr></code>
|
|
clears the carry flag; <code><nobr>CLD</nobr></code> clears the direction
|
|
flag; <code><nobr>CLI</nobr></code> clears the interrupt flag (thus
|
|
disabling interrupts); and <code><nobr>CLTS</nobr></code> clears the
|
|
task-switched (<code><nobr>TS</nobr></code>) flag in
|
|
<code><nobr>CR0</nobr></code>.
|
|
<p>To set the carry, direction, or interrupt flags, use the
|
|
<code><nobr>STC</nobr></code>, <code><nobr>STD</nobr></code> and
|
|
<code><nobr>STI</nobr></code> instructions
|
|
(<a href="#section-B.4.301">section B.4.301</a>). To invert the carry flag,
|
|
use <code><nobr>CMC</nobr></code> (<a href="#section-B.4.22">section
|
|
B.4.22</a>).
|
|
<h4><a name="section-B.4.21">B.4.21 <code><nobr>CLFLUSH</nobr></code>: Flush Cache Line</a></h4>
|
|
<p><pre>
|
|
CLFLUSH mem ; 0F AE /7 [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CLFLUSH</nobr></code> invalidates the cache line that
|
|
contains the linear address specified by the source operand from all levels
|
|
of the processor cache hierarchy (data and instruction). If, at any level
|
|
of the cache hierarchy, the line is inconsistent with memory (dirty) it is
|
|
written to memory before invalidation. The source operand points to a
|
|
byte-sized memory location.
|
|
<p>Although <code><nobr>CLFLUSH</nobr></code> is flagged
|
|
<code><nobr>SSE2</nobr></code> and above, it may not be present on all
|
|
processors which have <code><nobr>SSE2</nobr></code> support, and it may be
|
|
supported on other processors; the <code><nobr>CPUID</nobr></code>
|
|
instruction (<a href="#section-B.4.34">section B.4.34</a>) will return a
|
|
bit which indicates support for the <code><nobr>CLFLUSH</nobr></code>
|
|
instruction.
|
|
<h4><a name="section-B.4.22">B.4.22 <code><nobr>CMC</nobr></code>: Complement Carry Flag</a></h4>
|
|
<p><pre>
|
|
CMC ; F5 [8086]
|
|
</pre>
|
|
<p><code><nobr>CMC</nobr></code> changes the value of the carry flag: if it
|
|
was 0, it sets it to 1, and vice versa.
|
|
<h4><a name="section-B.4.23">B.4.23 <code><nobr>CMOVcc</nobr></code>: Conditional Move</a></h4>
|
|
<p><pre>
|
|
CMOVcc reg16,r/m16 ; o16 0F 40+cc /r [P6]
|
|
CMOVcc reg32,r/m32 ; o32 0F 40+cc /r [P6]
|
|
</pre>
|
|
<p><code><nobr>CMOV</nobr></code> moves its source (second) operand into
|
|
its destination (first) operand if the given condition code is satisfied;
|
|
otherwise it does nothing.
|
|
<p>For a list of condition codes, see <a href="#section-B.2.2">section
|
|
B.2.2</a>.
|
|
<p>Although the <code><nobr>CMOV</nobr></code> instructions are flagged
|
|
<code><nobr>P6</nobr></code> and above, they may not be supported by all
|
|
Pentium Pro processors; the <code><nobr>CPUID</nobr></code> instruction
|
|
(<a href="#section-B.4.34">section B.4.34</a>) will return a bit which
|
|
indicates whether conditional moves are supported.
|
|
<h4><a name="section-B.4.24">B.4.24 <code><nobr>CMP</nobr></code>: Compare Integers</a></h4>
|
|
<p><pre>
|
|
CMP r/m8,reg8 ; 38 /r [8086]
|
|
CMP r/m16,reg16 ; o16 39 /r [8086]
|
|
CMP r/m32,reg32 ; o32 39 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
CMP reg8,r/m8 ; 3A /r [8086]
|
|
CMP reg16,r/m16 ; o16 3B /r [8086]
|
|
CMP reg32,r/m32 ; o32 3B /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
CMP r/m8,imm8 ; 80 /7 ib [8086]
CMP r/m16,imm16 ; o16 81 /7 iw [8086]
CMP r/m32,imm32 ; o32 81 /7 id [386]
|
|
</pre>
|
|
<p><pre>
|
|
CMP r/m16,imm8 ; o16 83 /7 ib [8086]
CMP r/m32,imm8 ; o32 83 /7 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
CMP AL,imm8 ; 3C ib [8086]
|
|
CMP AX,imm16 ; o16 3D iw [8086]
|
|
CMP EAX,imm32 ; o32 3D id [386]
|
|
</pre>
|
|
<p><code><nobr>CMP</nobr></code> performs a `mental' subtraction of its
|
|
second operand from its first operand, and affects the flags as if the
|
|
subtraction had taken place, but does not store the result of the
|
|
subtraction anywhere.
|
|
<p>In the forms with an 8-bit immediate second operand and a longer first
|
|
operand, the second operand is considered to be signed, and is
|
|
sign-extended to the length of the first operand. In these cases, the
|
|
<code><nobr>BYTE</nobr></code> qualifier is necessary to force NASM to
|
|
generate this form of the instruction.
|
|
<p>The destination operand can be a register or a memory location. The
|
|
source can be a register, memory location or an immediate value of the same
|
|
size as the destination.
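<p>A C sketch of how the main arithmetic flags follow from the discarded
subtraction is given below, for 32-bit operands. The type
<code><nobr>cmp_flags</nobr></code> and the function name are hypothetical,
and the parity and auxiliary-carry flags are left out.

```c
#include <stdint.h>

/* Hypothetical flag record: zero, carry, sign, overflow. */
typedef struct { int zf, cf, sf, of; } cmp_flags;

/* Models CMP a,b on 32-bit operands: computes a - b, keeps only the flags. */
static cmp_flags model_cmp32(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;            /* result is discarded by CMP */
    cmp_flags f;

    f.zf = (diff == 0);               /* operands equal              */
    f.cf = (a < b);                   /* unsigned borrow             */
    f.sf = (int)((diff >> 31) & 1u);  /* top bit of the difference   */
    /* Signed overflow: operands differ in sign and the result's
       sign differs from the first operand's. */
    f.of = (int)((((a ^ b) & (a ^ diff)) >> 31) & 1u);
    return f;
}
```

<p>Unsigned conditions such as below read <code><nobr>cf</nobr></code>,
while signed less-than corresponds to <code><nobr>sf</nobr></code> and
<code><nobr>of</nobr></code> being different.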
|
|
<h4><a name="section-B.4.25">B.4.25 <code><nobr>CMPccPD</nobr></code>: Packed Double-Precision FP Compare </a></h4>
|
|
<p><pre>
|
|
CMPPD xmm1,xmm2/mem128,imm8 ; 66 0F C2 /r ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
CMPEQPD xmm1,xmm2/mem128 ; 66 0F C2 /r 00 [WILLAMETTE,SSE2]
|
|
CMPLTPD xmm1,xmm2/mem128 ; 66 0F C2 /r 01 [WILLAMETTE,SSE2]
|
|
CMPLEPD xmm1,xmm2/mem128 ; 66 0F C2 /r 02 [WILLAMETTE,SSE2]
|
|
CMPUNORDPD xmm1,xmm2/mem128 ; 66 0F C2 /r 03 [WILLAMETTE,SSE2]
|
|
CMPNEQPD xmm1,xmm2/mem128 ; 66 0F C2 /r 04 [WILLAMETTE,SSE2]
|
|
CMPNLTPD xmm1,xmm2/mem128 ; 66 0F C2 /r 05 [WILLAMETTE,SSE2]
|
|
CMPNLEPD xmm1,xmm2/mem128 ; 66 0F C2 /r 06 [WILLAMETTE,SSE2]
|
|
CMPORDPD xmm1,xmm2/mem128 ; 66 0F C2 /r 07 [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p>The <code><nobr>CMPccPD</nobr></code> instructions compare the two
|
|
packed double-precision FP values in the source and destination operands,
|
|
and return the result of the comparison in the destination register. The
|
|
result of each comparison is a quadword mask of all 1s (comparison true) or
|
|
all 0s (comparison false).
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
can be either an <code><nobr>XMM</nobr></code> register or a 128-bit memory
|
|
location.
|
|
<p>The third operand is an 8-bit immediate value, of which the low 3 bits
|
|
define the type of comparison. For ease of programming, the 8 two-operand
|
|
pseudo-instructions are provided, with the third operand already filled in.
|
|
The <code><nobr>Condition Predicates</nobr></code> are:
|
|
<p><pre>
|
|
EQ 0 Equal
|
|
LT 1 Less-than
|
|
LE 2 Less-than-or-equal
|
|
UNORD 3 Unordered
|
|
NEQ 4 Not-equal
|
|
NLT 5 Not-less-than
|
|
NLE 6 Not-less-than-or-equal
|
|
ORD 7 Ordered
|
|
</pre>
|
|
<p>For more details of the comparison predicates, and details of how to
|
|
emulate the "greater-than" equivalents, see
|
|
<a href="#section-B.2.3">section B.2.3</a>.
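<p>One 64-bit lane of the comparison can be modelled in C as below. The
function name is hypothetical; the predicate numbers are those of the
table above, and the handling of NaN operands follows the ordinary IEEE
comparison rules that C also uses.

```c
#include <stdint.h>
#include <math.h>

/* Models one lane of CMPPD dst,src,imm8: returns an all-ones or
   all-zeros quadword mask. Only the low 3 bits of imm8 are used. */
static uint64_t model_cmppd_lane(double dst, double src, unsigned imm8)
{
    int unordered = isnan(dst) || isnan(src);
    int truth;

    switch (imm8 & 7u) {
    case 0:  truth = (dst == src);   break;  /* equal                  */
    case 1:  truth = (dst <  src);   break;  /* less-than              */
    case 2:  truth = (dst <= src);   break;  /* less-than-or-equal     */
    case 3:  truth = unordered;      break;  /* unordered              */
    case 4:  truth = !(dst == src);  break;  /* not-equal              */
    case 5:  truth = !(dst <  src);  break;  /* not-less-than          */
    case 6:  truth = !(dst <= src);  break;  /* not-less-than-or-equal */
    default: truth = !unordered;     break;  /* ordered                */
    }
    return truth ? UINT64_MAX : 0;
}
```

<p>A greater-than test can be obtained from this model by swapping the two
operands and using predicate 1.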
|
|
<h4><a name="section-B.4.26">B.4.26 <code><nobr>CMPccPS</nobr></code>: Packed Single-Precision FP Compare </a></h4>
|
|
<p><pre>
|
|
CMPPS xmm1,xmm2/mem128,imm8 ; 0F C2 /r ib [KATMAI,SSE]
|
|
</pre>
|
|
<p><pre>
|
|
CMPEQPS xmm1,xmm2/mem128 ; 0F C2 /r 00 [KATMAI,SSE]
|
|
CMPLTPS xmm1,xmm2/mem128 ; 0F C2 /r 01 [KATMAI,SSE]
|
|
CMPLEPS xmm1,xmm2/mem128 ; 0F C2 /r 02 [KATMAI,SSE]
|
|
CMPUNORDPS xmm1,xmm2/mem128 ; 0F C2 /r 03 [KATMAI,SSE]
|
|
CMPNEQPS xmm1,xmm2/mem128 ; 0F C2 /r 04 [KATMAI,SSE]
|
|
CMPNLTPS xmm1,xmm2/mem128 ; 0F C2 /r 05 [KATMAI,SSE]
|
|
CMPNLEPS xmm1,xmm2/mem128 ; 0F C2 /r 06 [KATMAI,SSE]
|
|
CMPORDPS xmm1,xmm2/mem128 ; 0F C2 /r 07 [KATMAI,SSE]
|
|
</pre>
|
|
<p>The <code><nobr>CMPccPS</nobr></code> instructions compare the four
|
|
packed single-precision FP values in the source and destination operands,
|
|
and return the result of the comparison in the destination register. The
|
|
result of each comparison is a doubleword mask of all 1s (comparison true)
|
|
or all 0s (comparison false).
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
can be either an <code><nobr>XMM</nobr></code> register or a 128-bit memory
|
|
location.
|
|
<p>The third operand is an 8-bit immediate value, of which the low 3 bits
|
|
define the type of comparison. For ease of programming, the 8 two-operand
|
|
pseudo-instructions are provided, with the third operand already filled in.
|
|
The <code><nobr>Condition Predicates</nobr></code> are:
|
|
<p><pre>
|
|
EQ 0 Equal
|
|
LT 1 Less-than
|
|
LE 2 Less-than-or-equal
|
|
UNORD 3 Unordered
|
|
NEQ 4 Not-equal
|
|
NLT 5 Not-less-than
|
|
NLE 6 Not-less-than-or-equal
|
|
ORD 7 Ordered
|
|
</pre>
|
|
<p>For more details of the comparison predicates, and details of how to
|
|
emulate the "greater-than" equivalents, see
|
|
<a href="#section-B.2.3">section B.2.3</a>.
|
|
<h4><a name="section-B.4.27">B.4.27 <code><nobr>CMPSB</nobr></code>, <code><nobr>CMPSW</nobr></code>, <code><nobr>CMPSD</nobr></code>: Compare Strings</a></h4>
|
|
<p><pre>
|
|
CMPSB ; A6 [8086]
|
|
CMPSW ; o16 A7 [8086]
|
|
CMPSD ; o32 A7 [386]
|
|
</pre>
|
|
<p><code><nobr>CMPSB</nobr></code> compares the byte at
|
|
<code><nobr>[DS:SI]</nobr></code> or <code><nobr>[DS:ESI]</nobr></code>
|
|
with the byte at <code><nobr>[ES:DI]</nobr></code> or
|
|
<code><nobr>[ES:EDI]</nobr></code>, and sets the flags accordingly. It then
|
|
increments or decrements (depending on the direction flag: increments if
|
|
the flag is clear, decrements if it is set) <code><nobr>SI</nobr></code>
|
|
and <code><nobr>DI</nobr></code> (or <code><nobr>ESI</nobr></code> and
|
|
<code><nobr>EDI</nobr></code>).
|
|
<p>The registers used are <code><nobr>SI</nobr></code> and
|
|
<code><nobr>DI</nobr></code> if the address size is 16 bits, and
|
|
<code><nobr>ESI</nobr></code> and <code><nobr>EDI</nobr></code> if it is 32
|
|
bits. If you need to use an address size not equal to the current
|
|
<code><nobr>BITS</nobr></code> setting, you can use an explicit
|
|
<code><nobr>a16</nobr></code> or <code><nobr>a32</nobr></code> prefix.
|
|
<p>The segment register used to load from <code><nobr>[SI]</nobr></code> or
|
|
<code><nobr>[ESI]</nobr></code> can be overridden by using a segment
|
|
register name as a prefix (for example,
|
|
<code><nobr>ES CMPSB</nobr></code>). The use of
|
|
<code><nobr>ES</nobr></code> for the load from
|
|
<code><nobr>[DI]</nobr></code> or <code><nobr>[EDI]</nobr></code> cannot be
|
|
overridden.
|
|
<p><code><nobr>CMPSW</nobr></code> and <code><nobr>CMPSD</nobr></code> work
|
|
in the same way, but they compare a word or a doubleword instead of a byte,
|
|
and increment or decrement the addressing registers by 2 or 4 instead of 1.
|
|
<p>The <code><nobr>REPE</nobr></code> and <code><nobr>REPNE</nobr></code>
|
|
prefixes (equivalently, <code><nobr>REPZ</nobr></code> and
|
|
<code><nobr>REPNZ</nobr></code>) may be used to repeat the instruction up
|
|
to <code><nobr>CX</nobr></code> (or <code><nobr>ECX</nobr></code> - again,
|
|
the address size chooses which) times until the first unequal or equal byte
|
|
is found.
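<p>The behaviour of <code><nobr>REPE CMPSB</nobr></code> with the
direction flag clear can be sketched in C. The type
<code><nobr>cmps_state</nobr></code> and the function name are
hypothetical; the two pointers stand for the bytes addressed through
<code><nobr>DS:SI</nobr></code> and <code><nobr>ES:DI</nobr></code>, and
only the zero flag is modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state after the repeated compare: how far SI and DI
   advanced, what is left in CX, and the zero flag. */
typedef struct { size_t si, di, cx; int zf; } cmps_state;

/* Models REPE CMPSB with the direction flag clear. zf_in is the zero
   flag on entry; it is returned unchanged when cx is 0, because no
   comparison is performed in that case. */
static cmps_state model_repe_cmpsb(const uint8_t *src, const uint8_t *dst,
                                   size_t cx, int zf_in)
{
    cmps_state s = { 0, 0, cx, zf_in };

    while (s.cx != 0) {
        s.zf = (src[s.si] == dst[s.di]);  /* compare and set flags   */
        s.si++;                           /* both pointers advance   */
        s.di++;
        s.cx--;                           /* count one repetition    */
        if (!s.zf)
            break;                        /* REPE stops on mismatch  */
    }
    return s;
}
```

<p>Note that the pointers have already moved past the mismatching byte
when the loop stops, so the mismatch is at offset
<code><nobr>si - 1</nobr></code> in this model.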
|
|
<h4><a name="section-B.4.28">B.4.28 <code><nobr>CMPccSD</nobr></code>: Scalar Double-Precision FP Compare </a></h4>
|
|
<p><pre>
|
|
CMPSD xmm1,xmm2/mem64,imm8 ; F2 0F C2 /r ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
CMPEQSD xmm1,xmm2/mem64 ; F2 0F C2 /r 00 [WILLAMETTE,SSE2]
|
|
CMPLTSD xmm1,xmm2/mem64 ; F2 0F C2 /r 01 [WILLAMETTE,SSE2]
|
|
CMPLESD xmm1,xmm2/mem64 ; F2 0F C2 /r 02 [WILLAMETTE,SSE2]
|
|
CMPUNORDSD xmm1,xmm2/mem64 ; F2 0F C2 /r 03 [WILLAMETTE,SSE2]
|
|
CMPNEQSD xmm1,xmm2/mem64 ; F2 0F C2 /r 04 [WILLAMETTE,SSE2]
|
|
CMPNLTSD xmm1,xmm2/mem64 ; F2 0F C2 /r 05 [WILLAMETTE,SSE2]
|
|
CMPNLESD xmm1,xmm2/mem64 ; F2 0F C2 /r 06 [WILLAMETTE,SSE2]
|
|
CMPORDSD xmm1,xmm2/mem64 ; F2 0F C2 /r 07 [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p>The <code><nobr>CMPccSD</nobr></code> instructions compare the low-order
|
|
double-precision FP values in the source and destination operands, and
|
|
return the result of the comparison in the destination register. The
|
|
result of each comparison is a quadword mask of all 1s (comparison true) or
|
|
all 0s (comparison false).
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
can be either an <code><nobr>XMM</nobr></code> register or a 64-bit memory
|
|
location.
|
|
<p>The third operand is an 8-bit immediate value, of which the low 3 bits
|
|
define the type of comparison. For ease of programming, the 8 two-operand
|
|
pseudo-instructions are provided, with the third operand already filled in.
|
|
The <code><nobr>Condition Predicates</nobr></code> are:
|
|
<p><pre>
|
|
EQ 0 Equal
|
|
LT 1 Less-than
|
|
LE 2 Less-than-or-equal
|
|
UNORD 3 Unordered
|
|
NEQ 4 Not-equal
|
|
NLT 5 Not-less-than
|
|
NLE 6 Not-less-than-or-equal
|
|
ORD 7 Ordered
|
|
</pre>
|
|
<p>For more details of the comparison predicates, and details of how to
|
|
emulate the "greater-than" equivalents, see
|
|
<a href="#section-B.2.3">section B.2.3</a>.
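<p>The following fragment is only a sketch of how the pseudo-instructions
might be used; <code><nobr>a</nobr></code> and <code><nobr>b</nobr></code> are
hypothetical quadword variables holding double-precision values. Since there
is no greater-than predicate, the second half tests
<code><nobr>a</nobr></code> greater than <code><nobr>b</nobr></code> by
swapping the operands, which is one possible emulation.
<p><pre>
        movsd   xmm0,[a]        ; low quadword of xmm0 = a
        cmpltsd xmm0,[b]        ; low quadword = all 1s if a is less than b,
                                ; else all 0s (same as: cmpsd xmm0,[b],1)

        movsd   xmm1,[b]        ; to test a greater than b, swap the operands:
        cmpltsd xmm1,[a]        ; low quadword = all 1s if b is less than a
</pre>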
|
|
<h4><a name="section-B.4.29">B.4.29 <code><nobr>CMPccSS</nobr></code>: Scalar Single-Precision FP Compare </a></h4>
|
|
<p><pre>
|
|
CMPSS xmm1,xmm2/mem32,imm8 ; F3 0F C2 /r ib [KATMAI,SSE]
|
|
</pre>
|
|
<p><pre>
|
|
CMPEQSS xmm1,xmm2/mem32 ; F3 0F C2 /r 00 [KATMAI,SSE]
|
|
CMPLTSS xmm1,xmm2/mem32 ; F3 0F C2 /r 01 [KATMAI,SSE]
|
|
CMPLESS xmm1,xmm2/mem32 ; F3 0F C2 /r 02 [KATMAI,SSE]
|
|
CMPUNORDSS xmm1,xmm2/mem32 ; F3 0F C2 /r 03 [KATMAI,SSE]
|
|
CMPNEQSS xmm1,xmm2/mem32 ; F3 0F C2 /r 04 [KATMAI,SSE]
|
|
CMPNLTSS xmm1,xmm2/mem32 ; F3 0F C2 /r 05 [KATMAI,SSE]
|
|
CMPNLESS xmm1,xmm2/mem32 ; F3 0F C2 /r 06 [KATMAI,SSE]
|
|
CMPORDSS xmm1,xmm2/mem32 ; F3 0F C2 /r 07 [KATMAI,SSE]
|
|
</pre>
|
|
<p>The <code><nobr>CMPccSS</nobr></code> instructions compare the low-order
|
|
single-precision FP values in the source and destination operands, and
|
|
return the result of the comparison in the destination register. The
|
|
result of each comparison is a doubleword mask of all 1s (comparison true)
|
|
or all 0s (comparison false).
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
can be either an <code><nobr>XMM</nobr></code> register or a 32-bit memory
|
|
location.
|
|
<p>The third operand is an 8-bit immediate value, of which the low 3 bits
|
|
define the type of comparison. For ease of programming, the 8 two-operand
|
|
pseudo-instructions are provided, with the third operand already filled in.
|
|
The <code><nobr>Condition Predicates</nobr></code> are:
|
|
<p><pre>
|
|
EQ 0 Equal
|
|
LT 1 Less-than
|
|
LE 2 Less-than-or-equal
|
|
UNORD 3 Unordered
|
|
NEQ 4 Not-equal
|
|
NLT 5 Not-less-than
|
|
NLE 6 Not-less-than-or-equal
|
|
ORD 7 Ordered
|
|
</pre>
|
|
<p>For more details of the comparison predicates, and details of how to
|
|
emulate the "greater-than" equivalents, see
|
|
<a href="#section-B.2.3">section B.2.3</a>.
|
|
<h4><a name="section-B.4.30">B.4.30 <code><nobr>CMPXCHG</nobr></code>, <code><nobr>CMPXCHG486</nobr></code>: Compare and Exchange</a></h4>
|
|
<p><pre>
|
|
CMPXCHG r/m8,reg8 ; 0F B0 /r [PENT]
|
|
CMPXCHG r/m16,reg16 ; o16 0F B1 /r [PENT]
|
|
CMPXCHG r/m32,reg32 ; o32 0F B1 /r [PENT]
|
|
</pre>
|
|
<p><pre>
|
|
CMPXCHG486 r/m8,reg8 ; 0F A6 /r [486,UNDOC]
|
|
CMPXCHG486 r/m16,reg16 ; o16 0F A7 /r [486,UNDOC]
|
|
CMPXCHG486 r/m32,reg32 ; o32 0F A7 /r [486,UNDOC]
|
|
</pre>
|
|
<p>These two instructions perform exactly the same operation; however,
|
|
apparently some (not all) 486 processors support it under a non-standard
|
|
opcode, so NASM provides the undocumented
|
|
<code><nobr>CMPXCHG486</nobr></code> form to generate the non-standard
|
|
opcode.
|
|
<p><code><nobr>CMPXCHG</nobr></code> compares its destination (first)
|
|
operand to the value in <code><nobr>AL</nobr></code>,
|
|
<code><nobr>AX</nobr></code> or <code><nobr>EAX</nobr></code> (depending on
|
|
the operand size of the instruction). If they are equal, it copies its
|
|
source (second) operand into the destination and sets the zero flag.
|
|
Otherwise, it clears the zero flag and copies the destination operand into
<code><nobr>AL</nobr></code>, <code><nobr>AX</nobr></code> or <code><nobr>EAX</nobr></code>.
|
|
<p>The destination can be either a register or a memory location. The
|
|
source is a register.
|
|
<p><code><nobr>CMPXCHG</nobr></code> is intended to be used for atomic
|
|
operations in multitasking or multiprocessor environments. To safely update
|
|
a value in shared memory, for example, you might load the value into
|
|
<code><nobr>EAX</nobr></code>, load the updated value into
|
|
<code><nobr>EBX</nobr></code>, and then execute the instruction
|
|
<code><nobr>LOCK CMPXCHG [value],EBX</nobr></code>. If
|
|
<code><nobr>value</nobr></code> has not changed since being loaded, it is
|
|
updated with your desired new value, and the zero flag is set to let you
|
|
know it has worked. (The <code><nobr>LOCK</nobr></code> prefix prevents
|
|
another processor doing anything in the middle of this operation: it
|
|
guarantees atomicity.) However, if another processor has modified the value
|
|
in between your load and your attempted store, the store does not happen,
|
|
and you are notified of the failure by a cleared zero flag, so you can go
|
|
round and try again.
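<p>The retry loop described above could be sketched as follows for an atomic
increment. The labels <code><nobr>value</nobr></code> and
<code><nobr>retry</nobr></code> are hypothetical, and the increment stands in
for whatever update is wanted. Because a failed
<code><nobr>CMPXCHG</nobr></code> leaves the current memory contents in
<code><nobr>EAX</nobr></code>, the loop does not need to reload the value
itself.
<p><pre>
        mov     eax,[value]         ; load the current value
retry:  mov     ebx,eax             ; compute the desired new value:
        inc     ebx                 ; here, the old value plus one
        lock cmpxchg [value],ebx    ; store EBX only if [value] still equals EAX
        jnz     retry               ; ZF clear: EAX now holds the fresh value
</pre>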
|
|
<h4><a name="section-B.4.31">B.4.31 <code><nobr>CMPXCHG8B</nobr></code>: Compare and Exchange Eight Bytes</a></h4>
|
|
<p><pre>
|
|
CMPXCHG8B mem ; 0F C7 /1 [PENT]
|
|
</pre>
|
|
<p>This is a larger and more unwieldy version of
|
|
<code><nobr>CMPXCHG</nobr></code>: it compares the 64-bit (eight-byte)
|
|
value stored at <code><nobr>[mem]</nobr></code> with the value in
|
|
<code><nobr>EDX:EAX</nobr></code>. If they are equal, it sets the zero flag
|
|
and stores <code><nobr>ECX:EBX</nobr></code> into the memory area. If they
|
|
are unequal, it clears the zero flag and stores the memory contents into
|
|
<code><nobr>EDX:EAX</nobr></code>.
|
|
<p><code><nobr>CMPXCHG8B</nobr></code> can be used with the
|
|
<code><nobr>LOCK</nobr></code> prefix, to allow atomic execution. This is
|
|
useful in multi-processor and multi-tasking environments.
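<p>As an illustration only, a 64-bit counter might be incremented atomically
on a 32-bit processor as shown below; <code><nobr>counter</nobr></code> and
<code><nobr>again</nobr></code> are hypothetical labels. The two initial loads
are not atomic as a pair, but if they observe an inconsistent value the
comparison fails and <code><nobr>EDX:EAX</nobr></code> is reloaded by the
instruction itself.
<p><pre>
        mov     eax,[counter]       ; low doubleword of the expected value
        mov     edx,[counter+4]     ; high doubleword of the expected value
again:  mov     ebx,eax
        mov     ecx,edx
        add     ebx,1               ; ECX:EBX = EDX:EAX + 1
        adc     ecx,0
        lock cmpxchg8b [counter]    ; store ECX:EBX if [counter] = EDX:EAX
        jnz     again               ; on failure EDX:EAX holds the current value
</pre>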
|
|
<h4><a name="section-B.4.32">B.4.32 <code><nobr>COMISD</nobr></code>: Scalar Ordered Double-Precision FP Compare and Set EFLAGS</a></h4>
|
|
<p><pre>
|
|
COMISD xmm1,xmm2/mem64 ; 66 0F 2F /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>COMISD</nobr></code> compares the low-order double-precision
|
|
FP value in the two source operands. ZF, PF and CF are set according to the
|
|
result. OF, SF and AF are cleared. The unordered result is returned if
|
|
either source is a NaN (QNaN or SNaN).
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
64-bit memory location.
|
|
<p>The flags are set according to the following rules:
|
|
<p><pre>
|
|
Result Flags Values
|
|
</pre>
|
|
<p><pre>
|
|
UNORDERED: ZF,PF,CF <-- 111;
|
|
GREATER_THAN: ZF,PF,CF <-- 000;
|
|
LESS_THAN: ZF,PF,CF <-- 001;
|
|
EQUAL: ZF,PF,CF <-- 100;
|
|
</pre>
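<p>Because the result is reported in ZF, PF and CF, it can be tested with the
unsigned-style conditional jumps. The sketch below assumes hypothetical
quadword variables <code><nobr>x</nobr></code> and
<code><nobr>y</nobr></code> and hypothetical branch targets; the parity test
comes first so that a NaN is not mistaken for one of the ordered results.
<p><pre>
        movsd   xmm0,[x]
        comisd  xmm0,[y]        ; compare x with y
        jp      unordered       ; PF = 1: at least one operand is a NaN
        ja      greater         ; ZF = 0 and CF = 0: x is greater than y
        jb      less            ; CF = 1: x is less than y
                                ; fall through: x equals y
</pre>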
|
|
<h4><a name="section-B.4.33">B.4.33 <code><nobr>COMISS</nobr></code>: Scalar Ordered Single-Precision FP Compare and Set EFLAGS</a></h4>
|
|
<p><pre>
|
|
COMISS xmm1,xmm2/mem32 ; 0F 2F /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>COMISS</nobr></code> compares the low-order single-precision
|
|
FP value in the two source operands. ZF, PF and CF are set according to the
|
|
result. OF, SF and AF are cleared. The unordered result is returned if
|
|
either source is a NaN (QNaN or SNaN).
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
32-bit memory location.
|
|
<p>The flags are set according to the following rules:
|
|
<p><pre>
|
|
Result Flags Values
|
|
</pre>
|
|
<p><pre>
|
|
UNORDERED: ZF,PF,CF <-- 111;
|
|
GREATER_THAN: ZF,PF,CF <-- 000;
|
|
LESS_THAN: ZF,PF,CF <-- 001;
|
|
EQUAL: ZF,PF,CF <-- 100;
|
|
</pre>
|
|
<h4><a name="section-B.4.34">B.4.34 <code><nobr>CPUID</nobr></code>: Get CPU Identification Code</a></h4>
|
|
<p><pre>
|
|
CPUID ; 0F A2 [PENT]
|
|
</pre>
|
|
<p><code><nobr>CPUID</nobr></code> returns various information about the
|
|
processor it is being executed on. It fills the four registers
|
|
<code><nobr>EAX</nobr></code>, <code><nobr>EBX</nobr></code>,
|
|
<code><nobr>ECX</nobr></code> and <code><nobr>EDX</nobr></code> with
|
|
information, which varies depending on the input contents of
|
|
<code><nobr>EAX</nobr></code>.
|
|
<p><code><nobr>CPUID</nobr></code> also acts as a barrier to serialise
|
|
instruction execution: executing the <code><nobr>CPUID</nobr></code>
|
|
instruction guarantees that all the effects (memory modification, flag
|
|
modification, register modification) of previous instructions have been
|
|
completed before the next instruction gets fetched.
|
|
<p>The information returned is as follows:
|
|
<ul>
|
|
<li>If <code><nobr>EAX</nobr></code> is zero on input,
|
|
<code><nobr>EAX</nobr></code> on output holds the maximum acceptable input
|
|
value of <code><nobr>EAX</nobr></code>, and
|
|
<code><nobr>EBX:EDX:ECX</nobr></code> contain the string
|
|
<code><nobr>"GenuineIntel"</nobr></code> (or not, if you have a clone
|
|
processor). That is to say, <code><nobr>EBX</nobr></code> contains
|
|
<code><nobr>"Genu"</nobr></code> (in NASM's own sense of character
|
|
constants, described in <a href="nasmdoc3.html#section-3.4.2">section
|
|
3.4.2</a>), <code><nobr>EDX</nobr></code> contains
|
|
<code><nobr>"ineI"</nobr></code> and <code><nobr>ECX</nobr></code> contains
|
|
<code><nobr>"ntel"</nobr></code>.
|
|
<li>If <code><nobr>EAX</nobr></code> is one on input,
|
|
<code><nobr>EAX</nobr></code> on output contains version information about
|
|
the processor, and <code><nobr>EDX</nobr></code> contains a set of feature
|
|
flags, showing the presence and absence of various features. For example,
|
|
bit 8 is set if the <code><nobr>CMPXCHG8B</nobr></code> instruction
|
|
(<a href="#section-B.4.31">section B.4.31</a>) is supported, bit 15 is set
|
|
if the conditional move instructions (<a href="#section-B.4.23">section
|
|
B.4.23</a> and <a href="#section-B.4.72">section B.4.72</a>) are supported,
|
|
and bit 23 is set if <code><nobr>MMX</nobr></code> instructions are
|
|
supported.
|
|
<li>If <code><nobr>EAX</nobr></code> is two on input,
|
|
<code><nobr>EAX</nobr></code>, <code><nobr>EBX</nobr></code>,
|
|
<code><nobr>ECX</nobr></code> and <code><nobr>EDX</nobr></code> all contain
|
|
information about caches and TLBs (Translation Lookaside Buffers).
|
|
</ul>
|
|
<p>For more information on the data returned from
|
|
<code><nobr>CPUID</nobr></code>, see the documentation from Intel and other
|
|
processor manufacturers.
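<p>A minimal sketch of the first two cases follows. It assumes that
<code><nobr>vendor</nobr></code> is a hypothetical 12-byte buffer, that
<code><nobr>no_mmx</nobr></code> is a hypothetical label, and that the
processor is already known to support <code><nobr>CPUID</nobr></code>.
<p><pre>
        xor     eax,eax             ; input value 0
        cpuid
        mov     [vendor],ebx        ; "Genu" on an Intel processor
        mov     [vendor+4],edx      ; "ineI"
        mov     [vendor+8],ecx      ; "ntel"
        cmp     eax,1               ; EAX = maximum acceptable input value
        jb      no_mmx              ; input value 1 is not available

        mov     eax,1               ; input value 1: version and feature flags
        cpuid
        test    edx,0x00800000      ; bit 23 of EDX: MMX support
        jz      no_mmx
</pre>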
|
|
<h4><a name="section-B.4.35">B.4.35 <code><nobr>CVTDQ2PD</nobr></code>: Packed Signed INT32 to Packed Double-Precision FP Conversion</a></h4>
|
|
<p><pre>
|
|
CVTDQ2PD xmm1,xmm2/mem64 ; F3 0F E6 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTDQ2PD</nobr></code> converts two packed signed
|
|
doublewords from the source operand to two packed double-precision FP
|
|
values in the destination operand.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
64-bit memory location. If the source is a register, the packed integers
|
|
are in the low quadword.
|
|
<h4><a name="section-B.4.36">B.4.36 <code><nobr>CVTDQ2PS</nobr></code>: Packed Signed INT32 to Packed Single-Precision FP Conversion</a></h4>
|
|
<p><pre>
|
|
CVTDQ2PS xmm1,xmm2/mem128 ; 0F 5B /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTDQ2PS</nobr></code> converts four packed signed
|
|
doublewords from the source operand to four packed single-precision FP
|
|
values in the destination operand.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.37">B.4.37 <code><nobr>CVTPD2DQ</nobr></code>: Packed Double-Precision FP to Packed Signed INT32 Conversion</a></h4>
|
|
<p><pre>
|
|
CVTPD2DQ xmm1,xmm2/mem128 ; F2 0F E6 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTPD2DQ</nobr></code> converts two packed double-precision
|
|
FP values from the source operand to two packed signed doublewords in the
|
|
low quadword of the destination operand. The high quadword of the
|
|
destination is set to all 0s.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.38">B.4.38 <code><nobr>CVTPD2PI</nobr></code>: Packed Double-Precision FP to Packed Signed INT32 Conversion</a></h4>
|
|
<p><pre>
|
|
CVTPD2PI mm,xmm/mem128 ; 66 0F 2D /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTPD2PI</nobr></code> converts two packed double-precision
|
|
FP values from the source operand to two packed signed doublewords in the
|
|
destination operand.
|
|
<p>The destination operand is an <code><nobr>MMX</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.39">B.4.39 <code><nobr>CVTPD2PS</nobr></code>: Packed Double-Precision FP to Packed Single-Precision FP Conversion</a></h4>
|
|
<p><pre>
|
|
CVTPD2PS xmm1,xmm2/mem128 ; 66 0F 5A /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTPD2PS</nobr></code> converts two packed double-precision
|
|
FP values from the source operand to two packed single-precision FP values
|
|
in the low quadword of the destination operand. The high quadword of the
|
|
destination is set to all 0s.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.40">B.4.40 <code><nobr>CVTPI2PD</nobr></code>: Packed Signed INT32 to Packed Double-Precision FP Conversion</a></h4>
|
|
<p><pre>
|
|
CVTPI2PD xmm,mm/mem64 ; 66 0F 2A /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTPI2PD</nobr></code> converts two packed signed
|
|
doublewords from the source operand to two packed double-precision FP
|
|
values in the destination operand.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>MMX</nobr></code> register or a
|
|
64-bit memory location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.41">B.4.41 <code><nobr>CVTPI2PS</nobr></code>: Packed Signed INT32 to Packed Single-Precision FP Conversion</a></h4>
|
|
<p><pre>
|
|
CVTPI2PS xmm,mm/mem64 ; 0F 2A /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>CVTPI2PS</nobr></code> converts two packed signed
|
|
doublewords from the source operand to two packed single-precision FP
|
|
values in the low quadword of the destination operand. The high quadword of
|
|
the destination remains unchanged.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>MMX</nobr></code> register or a
|
|
64-bit memory location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.42">B.4.42 <code><nobr>CVTPS2DQ</nobr></code>: Packed Single-Precision FP to Packed Signed INT32 Conversion</a></h4>
|
|
<p><pre>
|
|
CVTPS2DQ xmm1,xmm2/mem128 ; 66 0F 5B /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTPS2DQ</nobr></code> converts four packed single-precision
|
|
FP values from the source operand to four packed signed doublewords in the
|
|
destination operand.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.43">B.4.43 <code><nobr>CVTPS2PD</nobr></code>: Packed Single-Precision FP to Packed Double-Precision FP Conversion</a></h4>
|
|
<p><pre>
|
|
CVTPS2PD xmm1,xmm2/mem64 ; 0F 5A /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTPS2PD</nobr></code> converts two packed single-precision
|
|
FP values from the source operand to two packed double-precision FP values
|
|
in the destination operand.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
64-bit memory location. If the source is a register, the input values are
|
|
in the low quadword.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.44">B.4.44 <code><nobr>CVTPS2PI</nobr></code>: Packed Single-Precision FP to Packed Signed INT32 Conversion</a></h4>
|
|
<p><pre>
|
|
CVTPS2PI mm,xmm/mem64 ; 0F 2D /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>CVTPS2PI</nobr></code> converts two packed single-precision
|
|
FP values from the source operand to two packed signed doublewords in the
|
|
destination operand.
|
|
<p>The destination operand is an <code><nobr>MMX</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
64-bit memory location. If the source is a register, the input values are
|
|
in the low quadword.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.45">B.4.45 <code><nobr>CVTSD2SI</nobr></code>: Scalar Double-Precision FP to Signed INT32 Conversion</a></h4>
|
|
<p><pre>
|
|
CVTSD2SI reg32,xmm/mem64 ; F2 0F 2D /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTSD2SI</nobr></code> converts a double-precision FP value
|
|
from the source operand to a signed doubleword in the destination operand.
|
|
<p>The destination operand is a general purpose register. The source can be
|
|
either an <code><nobr>XMM</nobr></code> register or a 64-bit memory
|
|
location. If the source is a register, the input value is in the low
|
|
quadword.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.46">B.4.46 <code><nobr>CVTSD2SS</nobr></code>: Scalar Double-Precision FP to Scalar Single-Precision FP Conversion</a></h4>
|
|
<p><pre>
|
|
CVTSD2SS xmm1,xmm2/mem64 ; F2 0F 5A /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTSD2SS</nobr></code> converts a double-precision FP value
|
|
from the source operand to a single-precision FP value in the low
|
|
doubleword of the destination operand. The upper 3 doublewords are left
|
|
unchanged.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
64-bit memory location. If the source is a register, the input value is in
|
|
the low quadword.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.47">B.4.47 <code><nobr>CVTSI2SD</nobr></code>: Signed INT32 to Scalar Double-Precision FP Conversion</a></h4>
|
|
<p><pre>
|
|
CVTSI2SD xmm,r/m32 ; F2 0F 2A /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTSI2SD</nobr></code> converts a signed doubleword from the
|
|
source operand to a double-precision FP value in the low quadword of the
|
|
destination operand. The high quadword is left unchanged.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either a general purpose register or a 32-bit memory
|
|
location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.48">B.4.48 <code><nobr>CVTSI2SS</nobr></code>: Signed INT32 to Scalar Single-Precision FP Conversion</a></h4>
|
|
<p><pre>
|
|
CVTSI2SS xmm,r/m32 ; F3 0F 2A /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>CVTSI2SS</nobr></code> converts a signed doubleword from the
|
|
source operand to a single-precision FP value in the low doubleword of the
|
|
destination operand. The upper 3 doublewords are left unchanged.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either a general purpose register or a 32-bit memory
|
|
location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.49">B.4.49 <code><nobr>CVTSS2SD</nobr></code>: Scalar Single-Precision FP to Scalar Double-Precision FP Conversion</a></h4>
|
|
<p><pre>
|
|
CVTSS2SD xmm1,xmm2/mem32 ; F3 0F 5A /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTSS2SD</nobr></code> converts a single-precision FP value
|
|
from the source operand to a double-precision FP value in the low quadword
|
|
of the destination operand. The upper quadword is left unchanged.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
32-bit memory location. If the source is a register, the input value is
|
|
contained in the low doubleword.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.50">B.4.50 <code><nobr>CVTSS2SI</nobr></code>: Scalar Single-Precision FP to Signed INT32 Conversion</a></h4>
|
|
<p><pre>
|
|
CVTSS2SI reg32,xmm/mem32 ; F3 0F 2D /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>CVTSS2SI</nobr></code> converts a single-precision FP value
|
|
from the source operand to a signed doubleword in the destination operand.
|
|
<p>The destination operand is a general purpose register. The source can be
|
|
either an <code><nobr>XMM</nobr></code> register or a 32-bit memory
|
|
location. If the source is a register, the input value is in the low
|
|
doubleword.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.51">B.4.51 <code><nobr>CVTTPD2DQ</nobr></code>: Packed Double-Precision FP to Packed Signed INT32 Conversion with Truncation</a></h4>
|
|
<p><pre>
|
|
CVTTPD2DQ xmm1,xmm2/mem128 ; 66 0F E6 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTTPD2DQ</nobr></code> converts two packed double-precision
|
|
FP values in the source operand to two packed signed doublewords in the low quadword of
|
|
the destination operand. If the result is inexact, it is truncated (rounded
|
|
toward zero). The high quadword is set to all 0s.
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.52">B.4.52 <code><nobr>CVTTPD2PI</nobr></code>: Packed Double-Precision FP to Packed Signed INT32 Conversion with Truncation</a></h4>
|
|
<p><pre>
|
|
CVTTPD2PI mm,xmm/mem128 ; 66 0F 2C /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTTPD2PI</nobr></code> converts two packed double-precision
|
|
FP values in the source operand to two packed signed doublewords in
|
|
the destination operand. If the result is inexact, it is truncated (rounded
|
|
toward zero).
|
|
<p>The destination operand is an <code><nobr>MMX</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.53">B.4.53 <code><nobr>CVTTPS2DQ</nobr></code>: Packed Single-Precision FP to Packed Signed INT32 Conversion with Truncation</a></h4>
|
|
<p><pre>
|
|
CVTTPS2DQ xmm1,xmm2/mem128 ; F3 0F 5B /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTTPS2DQ</nobr></code> converts four packed
|
|
single-precision FP values in the source operand to four packed signed
|
|
doublewords in the destination operand. If the result is inexact, it is
|
|
truncated (rounded toward zero).
|
|
<p>The destination operand is an <code><nobr>XMM</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.54">B.4.54 <code><nobr>CVTTPS2PI</nobr></code>: Packed Single-Precision FP to Packed Signed INT32 Conversion with Truncation</a></h4>
|
|
<p><pre>
|
|
CVTTPS2PI mm,xmm/mem64 ; 0F 2C /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>CVTTPS2PI</nobr></code> converts two packed single-precision
|
|
FP values in the source operand to two packed signed doublewords in the
|
|
destination operand. If the result is inexact, it is truncated (rounded
|
|
toward zero).
|
|
<p>The destination operand is an <code><nobr>MMX</nobr></code> register.
|
|
The source can be either an <code><nobr>XMM</nobr></code> register or a
|
|
64-bit memory location. If the source is a register, the input values are in
|
|
the low quadword.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.55">B.4.55 <code><nobr>CVTTSD2SI</nobr></code>: Scalar Double-Precision FP to Signed INT32 Conversion with Truncation</a></h4>
|
|
<p><pre>
|
|
CVTTSD2SI reg32,xmm/mem64 ; F2 0F 2C /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>CVTTSD2SI</nobr></code> converts a double-precision FP value
|
|
in the source operand to a signed doubleword in the destination operand. If
|
|
the result is inexact, it is truncated (rounded toward zero).
|
|
<p>The destination operand is a general purpose register. The source can be
|
|
either an <code><nobr>XMM</nobr></code> register or a 64-bit memory
|
|
location. If the source is a register, the input value is in the low
|
|
quadword.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
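<p>The difference between the truncating and non-truncating conversions can
be sketched as below. The variable <code><nobr>val</nobr></code> is
hypothetical, and the result shown for <code><nobr>CVTSD2SI</nobr></code>
assumes that the rounding control in <code><nobr>MXCSR</nobr></code> still
has its default setting of round-to-nearest.
<p><pre>
val:    dq      2.7             ; double-precision constant (in a data section)

        cvtsd2si  eax,[val]     ; EAX = 3: rounded according to MXCSR
        cvttsd2si ebx,[val]     ; EBX = 2: always truncated toward zero
</pre>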
|
|
<h4><a name="section-B.4.56">B.4.56 <code><nobr>CVTTSS2SI</nobr></code>: Scalar Single-Precision FP to Signed INT32 Conversion with Truncation</a></h4>
|
|
<p><pre>
|
|
CVTTSS2SI reg32,xmm/mem32 ; F3 0F 2C /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>CVTTSS2SI</nobr></code> converts a single-precision FP value
|
|
in the source operand to a signed doubleword in the destination operand. If
|
|
the result is inexact, it is truncated (rounded toward zero).
|
|
<p>The destination operand is a general purpose register. The source can be
|
|
either an <code><nobr>XMM</nobr></code> register or a 32-bit memory
|
|
location. If the source is a register, the input value is in the low
|
|
doubleword.
|
|
<p>For more details of this instruction, see the Intel Processor manuals.
|
|
<h4><a name="section-B.4.57">B.4.57 <code><nobr>DAA</nobr></code>, <code><nobr>DAS</nobr></code>: Decimal Adjustments</a></h4>
|
|
<p><pre>
|
|
DAA ; 27 [8086]
|
|
DAS ; 2F [8086]
|
|
</pre>
|
|
<p>These instructions are used in conjunction with the add and subtract
|
|
instructions to perform binary-coded decimal arithmetic in <em>packed</em>
|
|
(one BCD digit per nibble) form. For the unpacked equivalents, see
|
|
<a href="#section-B.4.1">section B.4.1</a>.
|
|
<p><code><nobr>DAA</nobr></code> should be used after a one-byte
|
|
<code><nobr>ADD</nobr></code> instruction whose destination was the
|
|
<code><nobr>AL</nobr></code> register: by means of examining the value in
|
|
<code><nobr>AL</nobr></code> and also the auxiliary carry flag
|
|
<code><nobr>AF</nobr></code>, it determines whether either digit of the
|
|
addition has overflowed, and adjusts it (and sets the carry and
|
|
auxiliary-carry flags) if so. You can add long BCD strings together by
|
|
doing <code><nobr>ADD</nobr></code>/<code><nobr>DAA</nobr></code> on the
|
|
low two digits, then doing
|
|
<code><nobr>ADC</nobr></code>/<code><nobr>DAA</nobr></code> on each
|
|
subsequent pair of digits.
|
|
<p><code><nobr>DAS</nobr></code> works similarly to
|
|
<code><nobr>DAA</nobr></code>, but is for use after
|
|
<code><nobr>SUB</nobr></code> instructions rather than
|
|
<code><nobr>ADD</nobr></code>.
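<p>As a worked sketch, the first fragment below adds the packed BCD values 38
and 45. The second shows how the carry might be propagated through a longer
number; <code><nobr>num1</nobr></code>, <code><nobr>num2</nobr></code> and
<code><nobr>sum</nobr></code> are hypothetical byte arrays holding two digits
per byte, least significant byte first.
<p><pre>
        mov     al,0x38         ; packed BCD 38
        add     al,0x45         ; binary sum is 0x7D
        daa                     ; AL = 0x83 (packed BCD 83), carry flag clear

        mov     al,[num1]       ; low two digits
        add     al,[num2]
        daa
        mov     [sum],al        ; MOV leaves the carry flag untouched
        mov     al,[num1+1]     ; next two digits
        adc     al,[num2+1]     ; include the decimal carry from below
        daa
        mov     [sum+1],al
</pre>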
|
|
<h4><a name="section-B.4.58">B.4.58 <code><nobr>DEC</nobr></code>: Decrement Integer</a></h4>
|
|
<p><pre>
|
|
DEC reg16 ; o16 48+r [8086]
|
|
DEC reg32 ; o32 48+r [386]
|
|
DEC r/m8 ; FE /1 [8086]
|
|
DEC r/m16 ; o16 FF /1 [8086]
|
|
DEC r/m32 ; o32 FF /1 [386]
|
|
</pre>
|
|
<p><code><nobr>DEC</nobr></code> subtracts 1 from its operand. It does
|
|
<em>not</em> affect the carry flag: to affect the carry flag, use
|
|
<code><nobr>SUB something,1</nobr></code> (see
|
|
<a href="#section-B.4.305">section B.4.305</a>).
|
|
<code><nobr>DEC</nobr></code> affects all the other flags according to the
|
|
result.
|
|
<p>This instruction can be used with a <code><nobr>LOCK</nobr></code>
|
|
prefix to allow atomic execution.
|
|
<p>See also <code><nobr>INC</nobr></code>
|
|
(<a href="#section-B.4.120">section B.4.120</a>).
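<p>One place where the untouched carry flag is useful is a multi-precision
addition loop, sketched here under the assumption that
<code><nobr>ESI</nobr></code> and <code><nobr>EDI</nobr></code> point to two
little-endian arrays of doublewords and <code><nobr>ECX</nobr></code> holds
their (non-zero) length. The label <code><nobr>next</nobr></code> is
hypothetical.
<p><pre>
        clc                     ; no carry into the lowest doubleword
next:   mov     eax,[esi]
        adc     [edi],eax       ; add with the carry from the previous step
        lea     esi,[esi+4]     ; LEA advances the pointers without
        lea     edi,[edi+4]     ; changing any flags
        dec     ecx             ; updates ZF but preserves CF for the next ADC
        jnz     next
</pre>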
|
|
<h4><a name="section-B.4.59">B.4.59 <code><nobr>DIV</nobr></code>: Unsigned Integer Divide</a></h4>
|
|
<p><pre>
|
|
DIV r/m8 ; F6 /6 [8086]
|
|
DIV r/m16 ; o16 F7 /6 [8086]
|
|
DIV r/m32 ; o32 F7 /6 [386]
|
|
</pre>
|
|
<p><code><nobr>DIV</nobr></code> performs unsigned integer division. The
|
|
explicit operand provided is the divisor; the dividend and destination
|
|
operands are implicit, in the following way:
|
|
<ul>
|
|
<li>For <code><nobr>DIV r/m8</nobr></code>, <code><nobr>AX</nobr></code> is
|
|
divided by the given operand; the quotient is stored in
|
|
<code><nobr>AL</nobr></code> and the remainder in
|
|
<code><nobr>AH</nobr></code>.
|
|
<li>For <code><nobr>DIV r/m16</nobr></code>,
|
|
<code><nobr>DX:AX</nobr></code> is divided by the given operand; the
|
|
quotient is stored in <code><nobr>AX</nobr></code> and the remainder in
|
|
<code><nobr>DX</nobr></code>.
|
|
<li>For <code><nobr>DIV r/m32</nobr></code>,
|
|
<code><nobr>EDX:EAX</nobr></code> is divided by the given operand; the
|
|
quotient is stored in <code><nobr>EAX</nobr></code> and the remainder in
|
|
<code><nobr>EDX</nobr></code>.
|
|
</ul>
|
|
<p>Signed integer division is performed by the
|
|
<code><nobr>IDIV</nobr></code> instruction: see
|
|
<a href="#section-B.4.117">section B.4.117</a>.
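<p>A short worked example of the 32-bit form, with arbitrarily chosen values:
the high half of the dividend has to be cleared explicitly when the value to
be divided fits in <code><nobr>EAX</nobr></code> alone.
<p><pre>
        mov     eax,1000        ; low half of the dividend
        xor     edx,edx         ; high half of the dividend = 0
        mov     ecx,7           ; divisor
        div     ecx             ; EAX = 142 (quotient), EDX = 6 (remainder)
</pre>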
|
|
<h4><a name="section-B.4.60">B.4.60 <code><nobr>DIVPD</nobr></code>: Packed Double-Precision FP Divide</a></h4>
|
|
<p><pre>
|
|
DIVPD xmm1,xmm2/mem128 ; 66 0F 5E /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>DIVPD</nobr></code> divides the two packed double-precision
|
|
FP values in the destination operand by the two packed double-precision FP
|
|
values in the source operand, and stores the packed double-precision
|
|
results in the destination register.
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<p><pre>
|
|
dst[0-63] := dst[0-63] / src[0-63],
|
|
dst[64-127] := dst[64-127] / src[64-127].
|
|
</pre>
|
|
<h4><a name="section-B.4.61">B.4.61 <code><nobr>DIVPS</nobr></code>: Packed Single-Precision FP Divide</a></h4>
|
|
<p><pre>
|
|
DIVPS xmm1,xmm2/mem128 ; 0F 5E /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>DIVPS</nobr></code> divides the four packed single-precision
|
|
FP values in the destination operand by the four packed single-precision FP
|
|
values in the source operand, and stores the packed single-precision
|
|
results in the destination register.
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a
|
|
128-bit memory location.
|
|
<p><pre>
|
|
dst[0-31] := dst[0-31] / src[0-31],
|
|
dst[32-63] := dst[32-63] / src[32-63],
|
|
dst[64-95] := dst[64-95] / src[64-95],
|
|
dst[96-127] := dst[96-127] / src[96-127].
|
|
</pre>
|
|
<h4><a name="section-B.4.62">B.4.62 <code><nobr>DIVSD</nobr></code>: Scalar Double-Precision FP Divide</a></h4>
|
|
<p><pre>
|
|
DIVSD xmm1,xmm2/mem64 ; F2 0F 5E /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>DIVSD</nobr></code> divides the low-order double-precision
|
|
FP value in the destination operand by the low-order double-precision FP
|
|
value in the source operand, and stores the double-precision result in the
|
|
destination register.
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a 64-bit
|
|
memory location.
|
|
<p><pre>
|
|
dst[0-63] := dst[0-63] / src[0-63],
|
|
dst[64-127] remains unchanged.
|
|
</pre>
|
|
<h4><a name="section-B.4.63">B.4.63 <code><nobr>DIVSS</nobr></code>: Scalar Single-Precision FP Divide</a></h4>
|
|
<p><pre>
|
|
DIVSS xmm1,xmm2/mem32 ; F3 0F 5E /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>DIVSS</nobr></code> divides the low-order single-precision
|
|
FP value in the destination operand by the low-order single-precision FP
|
|
value in the source operand, and stores the single-precision result in the
|
|
destination register.
|
|
<p>The destination is an <code><nobr>XMM</nobr></code> register. The source
|
|
operand can be either an <code><nobr>XMM</nobr></code> register or a 32-bit
|
|
memory location.
|
|
<p><pre>
|
|
dst[0-31] := dst[0-31] / src[0-31],
|
|
dst[32-127] remains unchanged.
|
|
</pre>
|
|
<h4><a name="section-B.4.64">B.4.64 <code><nobr>EMMS</nobr></code>: Empty MMX State</a></h4>
|
|
<p><pre>
|
|
EMMS ; 0F 77 [PENT,MMX]
|
|
</pre>
|
|
<p><code><nobr>EMMS</nobr></code> sets the FPU tag word (marking which
|
|
floating-point registers are available) to all ones, meaning all registers
|
|
are available for the FPU to use. It should be used after executing
|
|
<code><nobr>MMX</nobr></code> instructions and before executing any
|
|
subsequent floating-point operations.
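<p>A short sketch of the intended ordering, assuming hypothetical data labels
<code><nobr>a</nobr></code>, <code><nobr>b</nobr></code>,
<code><nobr>sum</nobr></code> and <code><nobr>x</nobr></code>:
<p><pre>
        bits 32

        section .data
a:      dw 1, 2, 3, 4
b:      dw 10, 20, 30, 40
x:      dd 1.5

        section .bss
sum:    resw 4

        section .text
        movq    mm0,[a]             ; MMX code: load four 16-bit words
        paddw   mm0,[b]             ; add them element by element
        movq    [sum],mm0           ; sum = 11, 22, 33, 44
        emms                        ; mark all FPU registers as available
        fld     dword [x]           ; floating-point code may now follow
</pre>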
|
|
<h4><a name="section-B.4.65">B.4.65 <code><nobr>ENTER</nobr></code>: Create Stack Frame</a></h4>
|
|
<p><pre>
|
|
ENTER imm,imm ; C8 iw ib [186]
|
|
</pre>
|
|
<p><code><nobr>ENTER</nobr></code> constructs a
|
|
<code><nobr>stack frame</nobr></code> for a high-level language procedure
|
|
call. The first operand (the <code><nobr>iw</nobr></code> in the opcode
|
|
definition above refers to the first operand) gives the amount of stack
|
|
space to allocate for local variables; the second (the
|
|
<code><nobr>ib</nobr></code> above) gives the nesting level of the
|
|
procedure (for languages like Pascal, with nested procedures).
|
|
<p>The function of <code><nobr>ENTER</nobr></code>, with a nesting level of
|
|
zero, is equivalent to
|
|
<p><pre>
|
|
PUSH EBP ; or PUSH BP in 16 bits
|
|
MOV EBP,ESP ; or MOV BP,SP in 16 bits
|
|
SUB ESP,operand1 ; or SUB SP,operand1 in 16 bits
|
|
</pre>
|
|
<p>This creates a stack frame with the procedure parameters accessible
|
|
upwards from <code><nobr>EBP</nobr></code>, and local variables accessible
|
|
downwards from <code><nobr>EBP</nobr></code>.
|
|
<p>With a nesting level of one, the stack frame created is 4 (or 2) bytes
|
|
bigger, and the value of the final frame pointer
|
|
<code><nobr>EBP</nobr></code> is accessible in memory at
|
|
<code><nobr>[EBP-4]</nobr></code>.
|
|
<p>This allows <code><nobr>ENTER</nobr></code>, when called with a nesting
|
|
level of two, to look at the stack frame described by the <em>previous</em>
|
|
value of <code><nobr>EBP</nobr></code>, find the frame pointer at offset -4
|
|
from that, and push it along with its new frame pointer, so that when a
|
|
level-two procedure is called from within a level-one procedure,
|
|
<code><nobr>[EBP-4]</nobr></code> holds the frame pointer of the most
|
|
recent level-one procedure call and <code><nobr>[EBP-8]</nobr></code> holds
|
|
that of the most recent level-two call. And so on, for nesting levels up to
|
|
31.
|
|
<p>Stack frames created by <code><nobr>ENTER</nobr></code> can be destroyed
|
|
by the <code><nobr>LEAVE</nobr></code> instruction: see
|
|
<a href="#section-B.4.136">section B.4.136</a>.
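<p>The following is an illustrative sketch only, assuming 32-bit code, a near
call, and one parameter pushed by the caller. The procedure name
<code><nobr>myproc</nobr></code> is hypothetical.
<p><pre>
        bits 32

        section .text
myproc: enter   8,0                 ; PUSH EBP / MOV EBP,ESP / SUB ESP,8
        mov     eax,[ebp+8]         ; first stack parameter
        mov     [ebp-4],eax         ; first local variable
        mov     [ebp-8],eax         ; second local variable
        leave                       ; destroy the stack frame
        ret
</pre>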
|
|
<h4><a name="section-B.4.66">B.4.66 <code><nobr>F2XM1</nobr></code>: Calculate 2**X-1</a></h4>
|
|
<p><pre>
|
|
F2XM1 ; D9 F0 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>F2XM1</nobr></code> raises 2 to the power of
|
|
<code><nobr>ST0</nobr></code>, subtracts one, and stores the result back
|
|
into <code><nobr>ST0</nobr></code>. The initial contents of
|
|
<code><nobr>ST0</nobr></code> must be a number in the range -1.0 to +1.0.
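<p>One common use is to compute 2**X itself by adding the one back on. The
sketch below assumes a hypothetical variable <code><nobr>x</nobr></code>
holding a value within the permitted range.
<p><pre>
        bits 32

        section .data
x:      dd 0.5

        section .text
        fld     dword [x]           ; ST0 = x, in the range -1.0 to +1.0
        f2xm1                       ; ST0 = 2**x - 1
        fld1                        ; ST0 = 1.0, ST1 = 2**x - 1
        faddp   st1                 ; ST0 = 2**x
</pre>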
|
|
<h4><a name="section-B.4.67">B.4.67 <code><nobr>FABS</nobr></code>: Floating-Point Absolute Value</a></h4>
|
|
<p><pre>
|
|
FABS ; D9 E1 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FABS</nobr></code> computes the absolute value of
|
|
<code><nobr>ST0</nobr></code>, by clearing the sign bit, and stores the
|
|
result back in <code><nobr>ST0</nobr></code>.
|
|
<h4><a name="section-B.4.68">B.4.68 <code><nobr>FADD</nobr></code>, <code><nobr>FADDP</nobr></code>: Floating-Point Addition</a></h4>
|
|
<p><pre>
|
|
FADD mem32 ; D8 /0 [8086,FPU]
|
|
FADD mem64 ; DC /0 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FADD fpureg ; D8 C0+r [8086,FPU]
|
|
FADD ST0,fpureg ; D8 C0+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FADD TO fpureg ; DC C0+r [8086,FPU]
|
|
FADD fpureg,ST0 ; DC C0+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FADDP fpureg ; DE C0+r [8086,FPU]
|
|
FADDP fpureg,ST0 ; DE C0+r [8086,FPU]
|
|
</pre>
|
|
<ul>
|
|
<li><code><nobr>FADD</nobr></code>, given one operand, adds the operand to
|
|
<code><nobr>ST0</nobr></code> and stores the result back in
|
|
<code><nobr>ST0</nobr></code>. If the operand has the
|
|
<code><nobr>TO</nobr></code> modifier, the result is stored in the register
|
|
given rather than in <code><nobr>ST0</nobr></code>.
|
|
<li><code><nobr>FADDP</nobr></code> performs the same function as
|
|
<code><nobr>FADD TO</nobr></code>, but pops the register stack after
|
|
storing the result.
|
|
</ul>
|
|
<p>The given two-operand forms are synonyms for the one-operand forms.
|
|
<p>To add an integer value to <code><nobr>ST0</nobr></code>, use the
|
|
<code><nobr>FIADD</nobr></code> instruction (<a href="#section-B.4.80">section B.4.80</a>).
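<p>The register forms can be compared side by side. This is only a sketch;
the constants loaded are arbitrary and serve purely to give the registers
known contents.
<p><pre>
        bits 32

        section .text
        fld1                        ; ST0 = 1.0
        fldpi                       ; ST0 = pi, ST1 = 1.0
        fadd    st1                 ; ST0 = pi + 1.0, ST1 unchanged
        fadd    to st1              ; ST1 = ST1 + ST0 = pi + 2.0
        faddp   st1                 ; ST1 = ST1 + ST0, then pop:
                                    ;   ST0 = 2*pi + 3.0
</pre>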
|
|
<h4><a name="section-B.4.69">B.4.69 <code><nobr>FBLD</nobr></code>, <code><nobr>FBSTP</nobr></code>: BCD Floating-Point Load and Store</a></h4>
|
|
<p><pre>
|
|
FBLD mem80 ; DF /4 [8086,FPU]
|
|
FBSTP mem80 ; DF /6 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FBLD</nobr></code> loads an 80-bit (ten-byte) packed
|
|
binary-coded decimal number from the given memory address, converts it to a
|
|
real, and pushes it on the register stack. <code><nobr>FBSTP</nobr></code>
|
|
stores the value of <code><nobr>ST0</nobr></code>, in packed BCD, at the
|
|
given address and then pops the register stack.
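<p>As a sketch of the packed BCD layout, the fragment below loads the decimal
number 1234 and stores it back as a binary integer. The labels
<code><nobr>bcd</nobr></code> and <code><nobr>n</nobr></code> are hypothetical.
The ten bytes hold two decimal digits each, least significant byte first, with
the sign in the top bit of the last byte.
<p><pre>
        bits 32

        section .data
bcd:    db 0x34,0x12,0,0,0,0,0,0,0,0    ; +1234 in packed BCD

        section .bss
n:      resd 1

        section .text
        fbld    tword [bcd]         ; push 1234.0 on the register stack
        fistp   dword [n]           ; n = 1234, and pop
</pre>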
|
|
<h4><a name="section-B.4.70">B.4.70 <code><nobr>FCHS</nobr></code>: Floating-Point Change Sign</a></h4>
|
|
<p><pre>
|
|
FCHS ; D9 E0 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FCHS</nobr></code> negates the number in
|
|
<code><nobr>ST0</nobr></code>, by inverting the sign bit: negative numbers
|
|
become positive, and vice versa.
|
|
<h4><a name="section-B.4.71">B.4.71 <code><nobr>FCLEX</nobr></code>, <code><nobr>FNCLEX</nobr></code>: Clear Floating-Point Exceptions</a></h4>
|
|
<p><pre>
|
|
FCLEX ; 9B DB E2 [8086,FPU]
|
|
FNCLEX ; DB E2 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FCLEX</nobr></code> clears any floating-point exceptions
|
|
which may be pending. <code><nobr>FNCLEX</nobr></code> does the same thing
|
|
but doesn't wait for previous floating-point operations (including the
|
|
<em>handling</em> of pending exceptions) to finish first.
|
|
<h4><a name="section-B.4.72">B.4.72 <code><nobr>FCMOVcc</nobr></code>: Floating-Point Conditional Move</a></h4>
|
|
<p><pre>
|
|
FCMOVB fpureg ; DA C0+r [P6,FPU]
|
|
FCMOVB ST0,fpureg ; DA C0+r [P6,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCMOVE fpureg ; DA C8+r [P6,FPU]
|
|
FCMOVE ST0,fpureg ; DA C8+r [P6,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCMOVBE fpureg ; DA D0+r [P6,FPU]
|
|
FCMOVBE ST0,fpureg ; DA D0+r [P6,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCMOVU fpureg ; DA D8+r [P6,FPU]
|
|
FCMOVU ST0,fpureg ; DA D8+r [P6,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCMOVNB fpureg ; DB C0+r [P6,FPU]
|
|
FCMOVNB ST0,fpureg ; DB C0+r [P6,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCMOVNE fpureg ; DB C8+r [P6,FPU]
|
|
FCMOVNE ST0,fpureg ; DB C8+r [P6,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCMOVNBE fpureg ; DB D0+r [P6,FPU]
|
|
FCMOVNBE ST0,fpureg ; DB D0+r [P6,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCMOVNU fpureg ; DB D8+r [P6,FPU]
|
|
FCMOVNU ST0,fpureg ; DB D8+r [P6,FPU]
|
|
</pre>
|
|
<p>The <code><nobr>FCMOV</nobr></code> instructions perform conditional
|
|
move operations: each of them moves the contents of the given register into
|
|
<code><nobr>ST0</nobr></code> if its condition is satisfied, and does
|
|
nothing if not.
|
|
<p>The conditions are not the same as the standard condition codes used
|
|
with conditional jump instructions. The conditions
|
|
<code><nobr>B</nobr></code>, <code><nobr>BE</nobr></code>,
|
|
<code><nobr>NB</nobr></code>, <code><nobr>NBE</nobr></code>,
|
|
<code><nobr>E</nobr></code> and <code><nobr>NE</nobr></code> are exactly as
|
|
normal, but none of the other standard ones are supported. Instead, the
|
|
condition <code><nobr>U</nobr></code> and its counterpart
|
|
<code><nobr>NU</nobr></code> are provided; the <code><nobr>U</nobr></code>
|
|
condition is satisfied if the last two floating-point numbers compared were
|
|
<em>unordered</em>, i.e. they were not equal but neither one could be said
|
|
to be greater than the other, for example if they were NaNs. (The flag
|
|
state which signals this is the setting of the parity flag: so the
|
|
<code><nobr>U</nobr></code> condition is notionally equivalent to
|
|
<code><nobr>PE</nobr></code>, and <code><nobr>NU</nobr></code> is
|
|
equivalent to <code><nobr>PO</nobr></code>.)
|
|
<p>The <code><nobr>FCMOV</nobr></code> conditions test the main processor's
|
|
status flags, not the FPU status flags, so using
|
|
<code><nobr>FCMOV</nobr></code> directly after
|
|
<code><nobr>FCOM</nobr></code> will not work. Instead, you should either
|
|
use <code><nobr>FCOMI</nobr></code> which writes directly to the main CPU
|
|
flags word, or use <code><nobr>FSTSW</nobr></code> to extract the FPU
|
|
flags.
|
|
<p>Although the <code><nobr>FCMOV</nobr></code> instructions are flagged
|
|
<code><nobr>P6</nobr></code> above, they may not be supported by all
|
|
Pentium Pro processors; the <code><nobr>CPUID</nobr></code> instruction
|
|
(<a href="#section-B.4.34">section B.4.34</a>) will return a bit which
|
|
indicates whether conditional moves are supported.
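<p>A minimal sketch of a branch-free maximum using
<code><nobr>FCOMI</nobr></code> followed by <code><nobr>FCMOVB</nobr></code>,
assuming a processor which supports both. The labels
<code><nobr>a</nobr></code>, <code><nobr>b</nobr></code> and
<code><nobr>result</nobr></code> are hypothetical, and neither input is
assumed to be a NaN.
<p><pre>
        bits 32

        section .data
a:      dq 1.25
b:      dq 2.5

        section .bss
result: resq 1

        section .text
        fld     qword [b]           ; ST0 = b
        fld     qword [a]           ; ST0 = a, ST1 = b
        fcomi   st0,st1             ; compare a with b, setting the CPU flags
        fcmovb  st0,st1             ; if a is below b, ST0 = b
        fstp    qword [result]      ; result = larger of a and b, and pop
        fstp    st0                 ; discard the remaining copy of b
</pre>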
|
|
<h4><a name="section-B.4.73">B.4.73 <code><nobr>FCOM</nobr></code>, <code><nobr>FCOMP</nobr></code>, <code><nobr>FCOMPP</nobr></code>, <code><nobr>FCOMI</nobr></code>, <code><nobr>FCOMIP</nobr></code>: Floating-Point Compare</a></h4>
|
|
<p><pre>
|
|
FCOM mem32 ; D8 /2 [8086,FPU]
|
|
FCOM mem64 ; DC /2 [8086,FPU]
|
|
FCOM fpureg ; D8 D0+r [8086,FPU]
|
|
FCOM ST0,fpureg ; D8 D0+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCOMP mem32 ; D8 /3 [8086,FPU]
|
|
FCOMP mem64 ; DC /3 [8086,FPU]
|
|
FCOMP fpureg ; D8 D8+r [8086,FPU]
|
|
FCOMP ST0,fpureg ; D8 D8+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCOMPP ; DE D9 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCOMI fpureg ; DB F0+r [P6,FPU]
|
|
FCOMI ST0,fpureg ; DB F0+r [P6,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FCOMIP fpureg ; DF F0+r [P6,FPU]
|
|
FCOMIP ST0,fpureg ; DF F0+r [P6,FPU]
|
|
</pre>
|
|
<p><code><nobr>FCOM</nobr></code> compares <code><nobr>ST0</nobr></code>
|
|
with the given operand, and sets the FPU flags accordingly.
|
|
<code><nobr>ST0</nobr></code> is treated as the left-hand side of the
|
|
comparison, so that the carry flag is set (for a `less-than' result) if
|
|
<code><nobr>ST0</nobr></code> is less than the given operand.
|
|
<p><code><nobr>FCOMP</nobr></code> does the same as
|
|
<code><nobr>FCOM</nobr></code>, but pops the register stack afterwards.
|
|
<code><nobr>FCOMPP</nobr></code> compares <code><nobr>ST0</nobr></code>
|
|
with <code><nobr>ST1</nobr></code> and then pops the register stack twice.
|
|
<p><code><nobr>FCOMI</nobr></code> and <code><nobr>FCOMIP</nobr></code>
|
|
work like the corresponding forms of <code><nobr>FCOM</nobr></code> and
|
|
<code><nobr>FCOMP</nobr></code>, but write their results directly to the
|
|
CPU flags register rather than the FPU status word, so they can be
|
|
immediately followed by conditional jump or conditional move instructions.
|
|
<p>The <code><nobr>FCOM</nobr></code> instructions differ from the
|
|
<code><nobr>FUCOM</nobr></code> instructions
|
|
(<a href="#section-B.4.108">section B.4.108</a>) only in the way they
|
|
handle quiet NaNs: <code><nobr>FUCOM</nobr></code> will handle them
|
|
silently and set the condition code flags to an `unordered' result, whereas
|
|
<code><nobr>FCOM</nobr></code> will generate an exception.
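<p>On processors without <code><nobr>FCOMI</nobr></code>, the FPU flags are
usually transferred to the CPU flags by hand before branching. The sketch
below shows one conventional way of doing so; it assumes a 286 or later (for
the <code><nobr>AX</nobr></code> form of <code><nobr>FNSTSW</nobr></code>), and
the labels are hypothetical.
<p><pre>
        bits 32

        section .data
x:      dd 1.0
limit:  dd 2.0

        section .text
check:  fld     dword [x]           ; ST0 = x
        fcomp   dword [limit]       ; compare x with limit, then pop
        fnstsw  ax                  ; copy the FPU status word to AX
        sahf                        ; C0 to CF, C2 to PF, C3 to ZF
        jp      .unordered          ; PF set: an operand was a NaN
        jb      .below              ; CF set: x is less than limit
        ; falls through when x is greater than or equal to limit
.below:
.unordered:
        ret
</pre>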
|
|
<h4><a name="section-B.4.74">B.4.74 <code><nobr>FCOS</nobr></code>: Cosine</a></h4>
|
|
<p><pre>
|
|
FCOS ; D9 FF [386,FPU]
|
|
</pre>
|
|
<p><code><nobr>FCOS</nobr></code> computes the cosine of
|
|
<code><nobr>ST0</nobr></code> (in radians), and stores the result in
|
|
<code><nobr>ST0</nobr></code>. The absolute value of
|
|
<code><nobr>ST0</nobr></code> must be less than 2**63.
|
|
<p>See also <code><nobr>FSINCOS</nobr></code>
|
|
(<a href="#section-B.4.100">section B.4.100</a>).
|
|
<h4><a name="section-B.4.75">B.4.75 <code><nobr>FDECSTP</nobr></code>: Decrement Floating-Point Stack Pointer</a></h4>
|
|
<p><pre>
|
|
FDECSTP ; D9 F6 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FDECSTP</nobr></code> decrements the `top' field in the
|
|
floating-point status word. This has the effect of rotating the FPU
|
|
register stack by one, as if the contents of <code><nobr>ST7</nobr></code>
|
|
had been pushed on the stack. See also <code><nobr>FINCSTP</nobr></code>
|
|
(<a href="#section-B.4.85">section B.4.85</a>).
|
|
<h4><a name="section-B.4.76">B.4.76 <code><nobr>FxDISI</nobr></code>, <code><nobr>FxENI</nobr></code>: Disable and Enable Floating-Point Interrupts</a></h4>
|
|
<p><pre>
|
|
FDISI ; 9B DB E1 [8086,FPU]
|
|
FNDISI ; DB E1 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FENI ; 9B DB E0 [8086,FPU]
|
|
FNENI ; DB E0 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FDISI</nobr></code> and <code><nobr>FENI</nobr></code>
|
|
disable and enable floating-point interrupts. These instructions are only
|
|
meaningful on original 8087 processors: the 287 and above treat them as
|
|
no-operation instructions.
|
|
<p><code><nobr>FNDISI</nobr></code> and <code><nobr>FNENI</nobr></code> do
|
|
the same thing as <code><nobr>FDISI</nobr></code> and
|
|
<code><nobr>FENI</nobr></code> respectively, but without waiting for the
|
|
floating-point processor to finish what it was doing first.
|
|
<h4><a name="section-B.4.77">B.4.77 <code><nobr>FDIV</nobr></code>, <code><nobr>FDIVP</nobr></code>, <code><nobr>FDIVR</nobr></code>, <code><nobr>FDIVRP</nobr></code>: Floating-Point Division</a></h4>
|
|
<p><pre>
|
|
FDIV mem32 ; D8 /6 [8086,FPU]
|
|
FDIV mem64 ; DC /6 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FDIV fpureg ; D8 F0+r [8086,FPU]
|
|
FDIV ST0,fpureg ; D8 F0+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FDIV TO fpureg ; DC F8+r [8086,FPU]
|
|
FDIV fpureg,ST0 ; DC F8+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FDIVR mem32 ; D8 /7 [8086,FPU]
|
|
FDIVR mem64 ; DC /7 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FDIVR fpureg ; D8 F8+r [8086,FPU]
|
|
FDIVR ST0,fpureg ; D8 F8+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FDIVR TO fpureg ; DC F0+r [8086,FPU]
|
|
FDIVR fpureg,ST0 ; DC F0+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FDIVP fpureg ; DE F8+r [8086,FPU]
|
|
FDIVP fpureg,ST0 ; DE F8+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FDIVRP fpureg ; DE F0+r [8086,FPU]
|
|
FDIVRP fpureg,ST0 ; DE F0+r [8086,FPU]
|
|
</pre>
|
|
<ul>
|
|
<li><code><nobr>FDIV</nobr></code> divides <code><nobr>ST0</nobr></code> by
|
|
the given operand and stores the result back in
|
|
<code><nobr>ST0</nobr></code>, unless the <code><nobr>TO</nobr></code>
|
|
qualifier is given, in which case it divides the given operand by
|
|
<code><nobr>ST0</nobr></code> and stores the result in the operand.
|
|
<li><code><nobr>FDIVR</nobr></code> does the same thing, but does the
|
|
division the other way up: so if <code><nobr>TO</nobr></code> is not given,
|
|
it divides the given operand by <code><nobr>ST0</nobr></code> and stores
|
|
the result in <code><nobr>ST0</nobr></code>, whereas if
|
|
<code><nobr>TO</nobr></code> is given it divides
|
|
<code><nobr>ST0</nobr></code> by its operand and stores the result in the
|
|
operand.
|
|
<li><code><nobr>FDIVP</nobr></code> operates like
|
|
<code><nobr>FDIV TO</nobr></code>, but pops the register stack once it has
|
|
finished.
|
|
<li><code><nobr>FDIVRP</nobr></code> operates like
|
|
<code><nobr>FDIVR TO</nobr></code>, but pops the register stack once it has
|
|
finished.
|
|
</ul>
|
|
<p>For FP/Integer divisions, see <code><nobr>FIDIV</nobr></code>
|
|
(<a href="#section-B.4.82">section B.4.82</a>).
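<p>The direction of each division is easy to get wrong, so a short sketch may
help. The labels <code><nobr>two</nobr></code> and
<code><nobr>three</nobr></code> are hypothetical.
<p><pre>
        bits 32

        section .data
two:    dd 2.0
three:  dd 3.0

        section .text
        fld     dword [three]       ; ST0 = 3.0
        fdiv    dword [two]         ; ST0 = 3.0 / 2.0 = 1.5
        fld     dword [three]       ; ST0 = 3.0, ST1 = 1.5
        fdivr   dword [two]         ; ST0 = 2.0 / 3.0
        fdivp   st1                 ; ST1 = ST1 / ST0, then pop:
                                    ;   ST0 = 1.5 / (2.0 / 3.0), about 2.25
</pre>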
|
|
<h4><a name="section-B.4.78">B.4.78 <code><nobr>FEMMS</nobr></code>: Faster Enter/Exit of the MMX or floating-point state</a></h4>
|
|
<p><pre>
|
|
FEMMS ; 0F 0E [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>FEMMS</nobr></code> can be used in place of the
|
|
<code><nobr>EMMS</nobr></code> instruction on processors which support the
|
|
3DNow! instruction set. Following execution of
|
|
<code><nobr>FEMMS</nobr></code>, the state of the
|
|
<code><nobr>MMX/FP</nobr></code> registers is undefined, and this allows a
|
|
faster context switch between <code><nobr>FP</nobr></code> and
|
|
<code><nobr>MMX</nobr></code> instructions. The
|
|
<code><nobr>FEMMS</nobr></code> instruction can also be used
|
|
<em>before</em> executing <code><nobr>MMX</nobr></code> instructions.
|
|
<h4><a name="section-B.4.79">B.4.79 <code><nobr>FFREE</nobr></code>: Flag Floating-Point Register as Unused</a></h4>
|
|
<p><pre>
|
|
FFREE fpureg ; DD C0+r [8086,FPU]
|
|
FFREEP fpureg ; DF C0+r [286,FPU,UNDOC]
|
|
</pre>
|
|
<p><code><nobr>FFREE</nobr></code> marks the given register as being empty.
|
|
<p><code><nobr>FFREEP</nobr></code> marks the given register as being
|
|
empty, and then pops the register stack.
|
|
<h4><a name="section-B.4.80">B.4.80 <code><nobr>FIADD</nobr></code>: Floating-Point/Integer Addition</a></h4>
|
|
<p><pre>
|
|
FIADD mem16 ; DE /0 [8086,FPU]
|
|
FIADD mem32 ; DA /0 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FIADD</nobr></code> adds the 16-bit or 32-bit integer stored
|
|
in the given memory location to <code><nobr>ST0</nobr></code>, storing the
|
|
result in <code><nobr>ST0</nobr></code>.
|
|
<h4><a name="section-B.4.81">B.4.81 <code><nobr>FICOM</nobr></code>, <code><nobr>FICOMP</nobr></code>: Floating-Point/Integer Compare</a></h4>
|
|
<p><pre>
|
|
FICOM mem16 ; DE /2 [8086,FPU]
|
|
FICOM mem32 ; DA /2 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FICOMP mem16 ; DE /3 [8086,FPU]
|
|
FICOMP mem32 ; DA /3 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FICOM</nobr></code> compares <code><nobr>ST0</nobr></code>
|
|
with the 16-bit or 32-bit integer stored in the given memory location, and
|
|
sets the FPU flags accordingly. <code><nobr>FICOMP</nobr></code> does the
|
|
same, but pops the register stack afterwards.
|
|
<h4><a name="section-B.4.82">B.4.82 <code><nobr>FIDIV</nobr></code>, <code><nobr>FIDIVR</nobr></code>: Floating-Point/Integer Division</a></h4>
|
|
<p><pre>
|
|
FIDIV mem16 ; DE /6 [8086,FPU]
|
|
FIDIV mem32 ; DA /6 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FIDIVR mem16 ; DE /7 [8086,FPU]
|
|
FIDIVR mem32 ; DA /7 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FIDIV</nobr></code> divides <code><nobr>ST0</nobr></code> by
|
|
the 16-bit or 32-bit integer stored in the given memory location, and
|
|
stores the result in <code><nobr>ST0</nobr></code>.
|
|
<code><nobr>FIDIVR</nobr></code> does the division the other way up: it
|
|
divides the integer by <code><nobr>ST0</nobr></code>, but still stores the
|
|
result in <code><nobr>ST0</nobr></code>.
|
|
<h4><a name="section-B.4.83">B.4.83 <code><nobr>FILD</nobr></code>, <code><nobr>FIST</nobr></code>, <code><nobr>FISTP</nobr></code>: Floating-Point/Integer Conversion</a></h4>
|
|
<p><pre>
|
|
FILD mem16 ; DF /0 [8086,FPU]
|
|
FILD mem32 ; DB /0 [8086,FPU]
|
|
FILD mem64 ; DF /5 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FIST mem16 ; DF /2 [8086,FPU]
|
|
FIST mem32 ; DB /2 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FISTP mem16 ; DF /3 [8086,FPU]
|
|
FISTP mem32 ; DB /3 [8086,FPU]
|
|
FISTP mem64 ; DF /7 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FILD</nobr></code> loads an integer out of a memory
|
|
location, converts it to a real, and pushes it on the FPU register stack.
|
|
<code><nobr>FIST</nobr></code> converts <code><nobr>ST0</nobr></code> to an
|
|
integer and stores that in memory; <code><nobr>FISTP</nobr></code> does the
|
|
same as <code><nobr>FIST</nobr></code>, but pops the register stack
|
|
afterwards.
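<p>A small round-trip sketch, with hypothetical labels. Note that the
conversion back to an integer rounds according to the rounding mode currently
set in the FPU control word; the result shown assumes the default
round-to-nearest mode.
<p><pre>
        bits 32

        section .data
n:      dd 7
half:   dd 0.5

        section .bss
m:      resd 1

        section .text
        fild    dword [n]           ; ST0 = 7.0
        fmul    dword [half]        ; ST0 = 3.5
        fistp   dword [m]           ; m = 4 under round-to-nearest, and pop
</pre>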
|
|
<h4><a name="section-B.4.84">B.4.84 <code><nobr>FIMUL</nobr></code>: Floating-Point/Integer Multiplication</a></h4>
|
|
<p><pre>
|
|
FIMUL mem16 ; DE /1 [8086,FPU]
|
|
FIMUL mem32 ; DA /1 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FIMUL</nobr></code> multiplies <code><nobr>ST0</nobr></code>
|
|
by the 16-bit or 32-bit integer stored in the given memory location, and
|
|
stores the result in <code><nobr>ST0</nobr></code>.
|
|
<h4><a name="section-B.4.85">B.4.85 <code><nobr>FINCSTP</nobr></code>: Increment Floating-Point Stack Pointer</a></h4>
|
|
<p><pre>
|
|
FINCSTP ; D9 F7 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FINCSTP</nobr></code> increments the `top' field in the
|
|
floating-point status word. This has the effect of rotating the FPU
|
|
register stack by one, as if the register stack had been popped; however,
|
|
unlike the popping of the stack performed by many FPU instructions, it does
|
|
not flag the new <code><nobr>ST7</nobr></code> (previously
|
|
<code><nobr>ST0</nobr></code>) as empty. See also
|
|
<code><nobr>FDECSTP</nobr></code> (<a href="#section-B.4.75">section
|
|
B.4.75</a>).
|
|
<h4><a name="section-B.4.86">B.4.86 <code><nobr>FINIT</nobr></code>, <code><nobr>FNINIT</nobr></code>: Initialise Floating-Point Unit</a></h4>
|
|
<p><pre>
|
|
FINIT ; 9B DB E3 [8086,FPU]
|
|
FNINIT ; DB E3 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FINIT</nobr></code> initialises the FPU to its default
|
|
state. It flags all registers as empty, without actually changing their
|
|
values, and clears the top of stack pointer. <code><nobr>FNINIT</nobr></code>
|
|
does the same, without first waiting for pending exceptions to clear.
|
|
<h4><a name="section-B.4.87">B.4.87 <code><nobr>FISUB</nobr></code>, <code><nobr>FISUBR</nobr></code>: Floating-Point/Integer Subtraction</a></h4>
|
|
<p><pre>
|
|
FISUB mem16 ; DE /4 [8086,FPU]
|
|
FISUB mem32 ; DA /4 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FISUBR mem16 ; DE /5 [8086,FPU]
|
|
FISUBR mem32 ; DA /5 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FISUB</nobr></code> subtracts the 16-bit or 32-bit integer
|
|
stored in the given memory location from <code><nobr>ST0</nobr></code>, and
|
|
stores the result in <code><nobr>ST0</nobr></code>.
|
|
<code><nobr>FISUBR</nobr></code> does the subtraction the other way round,
|
|
i.e. it subtracts <code><nobr>ST0</nobr></code> from the given integer, but
|
|
still stores the result in <code><nobr>ST0</nobr></code>.
|
|
<h4><a name="section-B.4.88">B.4.88 <code><nobr>FLD</nobr></code>: Floating-Point Load</a></h4>
|
|
<p><pre>
|
|
FLD mem32 ; D9 /0 [8086,FPU]
|
|
FLD mem64 ; DD /0 [8086,FPU]
|
|
FLD mem80 ; DB /5 [8086,FPU]
|
|
FLD fpureg ; D9 C0+r [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FLD</nobr></code> loads a floating-point value out of the
|
|
given register or memory location, and pushes it on the FPU register stack.
|
|
<h4><a name="section-B.4.89">B.4.89 <code><nobr>FLDxx</nobr></code>: Floating-Point Load Constants</a></h4>
|
|
<p><pre>
|
|
FLD1 ; D9 E8 [8086,FPU]
|
|
FLDL2E ; D9 EA [8086,FPU]
|
|
FLDL2T ; D9 E9 [8086,FPU]
|
|
FLDLG2 ; D9 EC [8086,FPU]
|
|
FLDLN2 ; D9 ED [8086,FPU]
|
|
FLDPI ; D9 EB [8086,FPU]
|
|
FLDZ ; D9 EE [8086,FPU]
|
|
</pre>
|
|
<p>These instructions push specific standard constants on the FPU register
|
|
stack.
|
|
<p><pre>
|
|
Instruction Constant pushed
|
|
</pre>
|
|
<p><pre>
|
|
FLD1 1
|
|
FLDL2E base-2 logarithm of e
|
|
FLDL2T base-2 log of 10
|
|
FLDLG2 base-10 log of 2
|
|
FLDLN2 base-e log of 2
|
|
FLDPI pi
|
|
FLDZ zero
|
|
</pre>
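<p>As an illustration of using one of these constants, the sketch below
computes the area of a circle. The label <code><nobr>r</nobr></code> is
hypothetical.
<p><pre>
        bits 32

        section .data
r:      dd 2.0

        section .text
        fld     dword [r]           ; ST0 = r
        fmul    st0,st0             ; ST0 = r*r
        fldpi                       ; ST0 = pi, ST1 = r*r
        fmulp   st1                 ; ST0 = pi*r*r
</pre>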
|
|
<h4><a name="section-B.4.90">B.4.90 <code><nobr>FLDCW</nobr></code>: Load Floating-Point Control Word</a></h4>
|
|
<p><pre>
|
|
FLDCW mem16 ; D9 /5 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FLDCW</nobr></code> loads a 16-bit value out of memory and
|
|
stores it into the FPU control word (governing things like the rounding
|
|
mode, the precision, and the exception masks). See also
|
|
<code><nobr>FSTCW</nobr></code> (<a href="#section-B.4.103">section
|
|
B.4.103</a>). If the new control word would unmask a pending exception and you don't want to generate one,
|
|
use <code><nobr>FCLEX</nobr></code> or <code><nobr>FNCLEX</nobr></code>
|
|
(<a href="#section-B.4.71">section B.4.71</a>) before loading the new
|
|
control word.
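<p>A typical use is to change the rounding mode temporarily. The following is
a sketch only, with hypothetical labels <code><nobr>oldcw</nobr></code> and
<code><nobr>newcw</nobr></code>; it relies on the rounding-control field
occupying bits 10 and 11 of the control word, with both bits set selecting
rounding towards zero.
<p><pre>
        bits 32

        section .bss
oldcw:  resw 1
newcw:  resw 1

        section .text
        fnstcw  [oldcw]             ; save the current control word
        mov     ax,[oldcw]
        or      ax,0x0C00           ; rounding control = 11b: truncate
        mov     [newcw],ax
        fldcw   [newcw]             ; FIST, FISTP and FRNDINT now truncate
        ; ... code which relies on truncation goes here ...
        fldcw   [oldcw]             ; restore the previous rounding mode
</pre>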
|
|
<h4><a name="section-B.4.91">B.4.91 <code><nobr>FLDENV</nobr></code>: Load Floating-Point Environment</a></h4>
|
|
<p><pre>
|
|
FLDENV mem ; D9 /4 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FLDENV</nobr></code> loads the FPU operating environment
|
|
(control word, status word, tag word, instruction pointer, data pointer and
|
|
last opcode) from memory. The memory area is 14 or 28 bytes long, depending
|
|
on the CPU mode at the time. See also <code><nobr>FSTENV</nobr></code>
|
|
(<a href="#section-B.4.104">section B.4.104</a>).
|
|
<h4><a name="section-B.4.92">B.4.92 <code><nobr>FMUL</nobr></code>, <code><nobr>FMULP</nobr></code>: Floating-Point Multiply</a></h4>
|
|
<p><pre>
|
|
FMUL mem32 ; D8 /1 [8086,FPU]
|
|
FMUL mem64 ; DC /1 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FMUL fpureg ; D8 C8+r [8086,FPU]
|
|
FMUL ST0,fpureg ; D8 C8+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FMUL TO fpureg ; DC C8+r [8086,FPU]
|
|
FMUL fpureg,ST0 ; DC C8+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FMULP fpureg ; DE C8+r [8086,FPU]
|
|
FMULP fpureg,ST0 ; DE C8+r [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FMUL</nobr></code> multiplies <code><nobr>ST0</nobr></code>
|
|
by the given operand, and stores the result in
|
|
<code><nobr>ST0</nobr></code>, unless the <code><nobr>TO</nobr></code>
|
|
qualifier is used in which case it stores the result in the operand.
|
|
<code><nobr>FMULP</nobr></code> performs the same operation as
|
|
<code><nobr>FMUL TO</nobr></code>, and then pops the register stack.
|
|
<h4><a name="section-B.4.93">B.4.93 <code><nobr>FNOP</nobr></code>: Floating-Point No Operation</a></h4>
|
|
<p><pre>
|
|
FNOP ; D9 D0 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FNOP</nobr></code> does nothing.
|
|
<h4><a name="section-B.4.94">B.4.94 <code><nobr>FPATAN</nobr></code>, <code><nobr>FPTAN</nobr></code>: Arctangent and Tangent</a></h4>
|
|
<p><pre>
|
|
FPATAN ; D9 F3 [8086,FPU]
|
|
FPTAN ; D9 F2 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FPATAN</nobr></code> computes the arctangent, in radians, of
|
|
the result of dividing <code><nobr>ST1</nobr></code> by
|
|
<code><nobr>ST0</nobr></code>, stores the result in
|
|
<code><nobr>ST1</nobr></code>, and pops the register stack. It works like
|
|
the C <code><nobr>atan2</nobr></code> function, in that changing the sign
|
|
of both <code><nobr>ST0</nobr></code> and <code><nobr>ST1</nobr></code>
|
|
changes the output value by pi (so it performs true rectangular-to-polar
|
|
coordinate conversion, with <code><nobr>ST1</nobr></code> being the Y
|
|
coordinate and <code><nobr>ST0</nobr></code> being the X coordinate, not
|
|
merely an arctangent).
|
|
<p><code><nobr>FPTAN</nobr></code> computes the tangent of the value in
|
|
<code><nobr>ST0</nobr></code> (in radians), and stores the result back into
|
|
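<code><nobr>ST0</nobr></code>. It then pushes the constant 1.0 on the register stack, so the tangent ends up in <code><nobr>ST1</nobr></code>.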
|
|
<p>The absolute value of <code><nobr>ST0</nobr></code> must be less than
|
|
2**63.
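<p>The operand order for <code><nobr>FPATAN</nobr></code> can be sketched as
follows, with hypothetical labels <code><nobr>y</nobr></code> and
<code><nobr>x</nobr></code>: the Y coordinate is loaded first, so that it ends
up in <code><nobr>ST1</nobr></code>.
<p><pre>
        bits 32

        section .data
y:      dq 1.0
x:      dq -1.0

        section .text
        fld     qword [y]           ; ST0 = y
        fld     qword [x]           ; ST0 = x, ST1 = y
        fpatan                      ; ST0 = angle of the point (x,y),
                                    ;   here 3*pi/4
</pre>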
|
|
<h4><a name="section-B.4.95">B.4.95 <code><nobr>FPREM</nobr></code>, <code><nobr>FPREM1</nobr></code>: Floating-Point Partial Remainder</a></h4>
|
|
<p><pre>
|
|
FPREM ; D9 F8 [8086,FPU]
|
|
FPREM1 ; D9 F5 [386,FPU]
|
|
</pre>
|
|
<p>These instructions both produce the remainder obtained by dividing
|
|
<code><nobr>ST0</nobr></code> by <code><nobr>ST1</nobr></code>. This is
|
|
calculated, notionally, by dividing <code><nobr>ST0</nobr></code> by
|
|
<code><nobr>ST1</nobr></code>, rounding the result to an integer,
|
|
multiplying by <code><nobr>ST1</nobr></code> again, and computing the value
|
|
which would need to be added back on to the result to get back to the
|
|
original value in <code><nobr>ST0</nobr></code>.
|
|
<p>The two instructions differ in the way the notional round-to-integer
|
|
operation is performed. <code><nobr>FPREM</nobr></code> does it by rounding
|
|
towards zero, so that the remainder it returns always has the same sign as
|
|
the original value in <code><nobr>ST0</nobr></code>;
|
|
<code><nobr>FPREM1</nobr></code> does it by rounding to the nearest
|
|
integer, so that the remainder always has at most half the magnitude of
|
|
<code><nobr>ST1</nobr></code>.
|
|
<p>Both instructions calculate <em>partial</em> remainders, meaning that
|
|
they may not manage to provide the final result, but might leave
|
|
intermediate results in <code><nobr>ST0</nobr></code> instead. If this
|
|
happens, they will set the C2 flag in the FPU status word; therefore, to
|
|
calculate a remainder, you should repeatedly execute
|
|
<code><nobr>FPREM</nobr></code> or <code><nobr>FPREM1</nobr></code> until
|
|
C2 becomes clear.
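<p>The repeat-until-clear loop described above might be sketched like this.
The labels are hypothetical, and the AX form of
<code><nobr>FNSTSW</nobr></code> is assumed to be available (286 and above).
<p><pre>
        bits 32

        section .data
dividend: dq 10.0
divisor:  dq 3.0

        section .text
remainder:
        fld     qword [divisor]     ; ST0 = divisor
        fld     qword [dividend]    ; ST0 = dividend, ST1 = divisor
.again: fprem                       ; ST0 = partial remainder of ST0 / ST1
        fnstsw  ax                  ; copy the FPU status word to AX
        test    ah,0x04             ; C2 is bit 10 of the word, bit 2 of AH
        jnz     .again              ; C2 set: reduction incomplete, repeat
        fstp    st1                 ; drop the divisor; ST0 = 1.0 here
        ret
</pre>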
|
|
<h4><a name="section-B.4.96">B.4.96 <code><nobr>FRNDINT</nobr></code>: Floating-Point Round to Integer</a></h4>
|
|
<p><pre>
|
|
FRNDINT ; D9 FC [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FRNDINT</nobr></code> rounds the contents of
|
|
<code><nobr>ST0</nobr></code> to an integer, according to the current
|
|
rounding mode set in the FPU control word, and stores the result back in
|
|
<code><nobr>ST0</nobr></code>.
|
|
<h4><a name="section-B.4.97">B.4.97 <code><nobr>FSAVE</nobr></code>, <code><nobr>FRSTOR</nobr></code>: Save/Restore Floating-Point State</a></h4>
|
|
<p><pre>
|
|
FSAVE mem ; 9B DD /6 [8086,FPU]
|
|
FNSAVE mem ; DD /6 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FRSTOR mem ; DD /4 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FSAVE</nobr></code> saves the entire floating-point unit
|
|
state, including all the information saved by
|
|
<code><nobr>FSTENV</nobr></code> (<a href="#section-B.4.104">section
|
|
B.4.104</a>) plus the contents of all the registers, to a 94 or 108 byte
|
|
area of memory (depending on the CPU mode).
|
|
<code><nobr>FRSTOR</nobr></code> restores the floating-point state from the
|
|
same area of memory.
|
|
<p><code><nobr>FNSAVE</nobr></code> does the same as
|
|
<code><nobr>FSAVE</nobr></code>, without first waiting for pending
|
|
floating-point exceptions to clear.
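<p>A minimal sketch of saving and restoring the state around code which is
free to use the FPU. The label <code><nobr>state</nobr></code> is hypothetical,
and the buffer is sized for the larger of the two layouts. Bear in mind that
<code><nobr>FSAVE</nobr></code> also reinitialises the FPU after storing its
state.
<p><pre>
        bits 32

        section .bss
state:  resb 108                    ; large enough for either layout

        section .text
        fsave   [state]             ; save the whole FPU state
        ; ... code which uses the FPU freely goes here ...
        frstor  [state]             ; put everything back
</pre>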
|
|
<h4><a name="section-B.4.98">B.4.98 <code><nobr>FSCALE</nobr></code>: Scale Floating-Point Value by Power of Two</a></h4>
|
|
<p><pre>
|
|
FSCALE ; D9 FD [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FSCALE</nobr></code> scales a number by a power of two: it
|
|
rounds <code><nobr>ST1</nobr></code> towards zero to obtain an integer,
|
|
then multiplies <code><nobr>ST0</nobr></code> by two to the power of that
|
|
integer, and stores the result in <code><nobr>ST0</nobr></code>.
|
|
<h4><a name="section-B.4.99">B.4.99 <code><nobr>FSETPM</nobr></code>: Set Protected Mode</a></h4>
|
|
<p><pre>
|
|
FSETPM ; DB E4 [286,FPU]
|
|
</pre>
|
|
<p>This instruction initialises protected mode on the 287 floating-point
|
|
coprocessor. It is only meaningful on that processor: the 387 and above
|
|
treat the instruction as a no-operation.
|
|
<h4><a name="section-B.4.100">B.4.100 <code><nobr>FSIN</nobr></code>, <code><nobr>FSINCOS</nobr></code>: Sine and Cosine</a></h4>
|
|
<p><pre>
|
|
FSIN ; D9 FE [386,FPU]
|
|
FSINCOS ; D9 FB [386,FPU]
|
|
</pre>
|
|
<p><code><nobr>FSIN</nobr></code> calculates the sine of
|
|
<code><nobr>ST0</nobr></code> (in radians) and stores the result in
|
|
<code><nobr>ST0</nobr></code>. <code><nobr>FSINCOS</nobr></code> does the
|
|
same, but then pushes the cosine of the same value on the register stack,
|
|
so that the sine ends up in <code><nobr>ST1</nobr></code> and the cosine in
|
|
<code><nobr>ST0</nobr></code>. <code><nobr>FSINCOS</nobr></code> is faster
|
|
than executing <code><nobr>FSIN</nobr></code> and
|
|
<code><nobr>FCOS</nobr></code> (see <a href="#section-B.4.74">section
|
|
B.4.74</a>) in succession.
|
|
<p>The absolute value of <code><nobr>ST0</nobr></code> must be less than
|
|
2**63.
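<p>A short illustration of the resulting stack layout, assuming
hypothetical double-precision variables <code><nobr>angle</nobr></code>,
<code><nobr>cosv</nobr></code> and <code><nobr>sinv</nobr></code>:
<p><pre>
        fld     qword [angle]   ; ST0 = angle in radians
        fsincos                 ; ST0 = cosine, ST1 = sine
        fstp    qword [cosv]    ; store the cosine and pop
        fstp    qword [sinv]    ; store the sine and pop
</pre>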
|
|
<h4><a name="section-B.4.101">B.4.101 <code><nobr>FSQRT</nobr></code>: Floating-Point Square Root</a></h4>
|
|
<p><pre>
|
|
FSQRT ; D9 FA [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FSQRT</nobr></code> calculates the square root of
|
|
<code><nobr>ST0</nobr></code> and stores the result in
|
|
<code><nobr>ST0</nobr></code>.
|
|
<h4><a name="section-B.4.102">B.4.102 <code><nobr>FST</nobr></code>, <code><nobr>FSTP</nobr></code>: Floating-Point Store</a></h4>
|
|
<p><pre>
|
|
FST mem32 ; D9 /2 [8086,FPU]
|
|
FST mem64 ; DD /2 [8086,FPU]
|
|
FST fpureg ; DD D0+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FSTP mem32 ; D9 /3 [8086,FPU]
|
|
FSTP mem64 ; DD /3 [8086,FPU]
|
|
FSTP mem80 ; DB /7 [8086,FPU]
|
|
FSTP fpureg ; DD D8+r [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FST</nobr></code> stores the value in
|
|
<code><nobr>ST0</nobr></code> into the given memory location or other FPU
|
|
register. <code><nobr>FSTP</nobr></code> does the same, but then pops the
|
|
register stack.
|
|
<h4><a name="section-B.4.103">B.4.103 <code><nobr>FSTCW</nobr></code>: Store Floating-Point Control Word</a></h4>
|
|
<p><pre>
|
|
FSTCW mem16 ; 9B D9 /7 [8086,FPU]
|
|
FNSTCW mem16 ; D9 /7 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FSTCW</nobr></code> stores the <code><nobr>FPU</nobr></code>
|
|
control word (governing things like the rounding mode, the precision, and
|
|
the exception masks) into a 2-byte memory area. See also
|
|
<code><nobr>FLDCW</nobr></code> (<a href="#section-B.4.90">section
|
|
B.4.90</a>).
|
|
<p><code><nobr>FNSTCW</nobr></code> does the same thing as
|
|
<code><nobr>FSTCW</nobr></code>, without first waiting for pending
|
|
floating-point exceptions to clear.
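<p>A common use of the control word is to change the rounding mode
temporarily. The sketch below assumes hypothetical word variables
<code><nobr>saved_cw</nobr></code> and <code><nobr>trunc_cw</nobr></code>
and a doubleword <code><nobr>result</nobr></code>; it sets the
rounding-control field (bits 10 and 11) to round towards zero, converts
<code><nobr>ST0</nobr></code> to an integer, and then restores the original
setting.
<p><pre>
        fnstcw  [saved_cw]      ; save the current control word
        mov     ax,[saved_cw]
        or      ax,0x0C00       ; rounding control = 11b (towards zero)
        mov     [trunc_cw],ax
        fldcw   [trunc_cw]      ; activate truncation
        fistp   dword [result]  ; store ST0 as a truncated integer and pop
        fldcw   [saved_cw]      ; restore the original rounding mode
</pre>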
|
|
<h4><a name="section-B.4.104">B.4.104 <code><nobr>FSTENV</nobr></code>: Store Floating-Point Environment</a></h4>
|
|
<p><pre>
|
|
FSTENV mem ; 9B D9 /6 [8086,FPU]
|
|
FNSTENV mem ; D9 /6 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FSTENV</nobr></code> stores the
|
|
<code><nobr>FPU</nobr></code> operating environment (control word, status
|
|
word, tag word, instruction pointer, data pointer and last opcode) into
|
|
memory. The memory area is 14 or 28 bytes long, depending on the CPU mode
|
|
at the time. See also <code><nobr>FLDENV</nobr></code>
|
|
(<a href="#section-B.4.91">section B.4.91</a>).
|
|
<p><code><nobr>FNSTENV</nobr></code> does the same thing as
|
|
<code><nobr>FSTENV</nobr></code>, without first waiting for pending
|
|
floating-point exceptions to clear.
|
|
<h4><a name="section-B.4.105">B.4.105 <code><nobr>FSTSW</nobr></code>: Store Floating-Point Status Word</a></h4>
|
|
<p><pre>
|
|
FSTSW mem16 ; 9B DD /7 [8086,FPU]
|
|
FSTSW AX ; 9B DF E0 [286,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FNSTSW mem16 ; DD /7 [8086,FPU]
|
|
FNSTSW AX ; DF E0 [286,FPU]
|
|
</pre>
|
|
<p><code><nobr>FSTSW</nobr></code> stores the <code><nobr>FPU</nobr></code>
|
|
status word into <code><nobr>AX</nobr></code> or into a 2-byte memory area.
|
|
<p><code><nobr>FNSTSW</nobr></code> does the same thing as
|
|
<code><nobr>FSTSW</nobr></code>, without first waiting for pending
|
|
floating-point exceptions to clear.
|
|
<h4><a name="section-B.4.106">B.4.106 <code><nobr>FSUB</nobr></code>, <code><nobr>FSUBP</nobr></code>, <code><nobr>FSUBR</nobr></code>, <code><nobr>FSUBRP</nobr></code>: Floating-Point Subtract</a></h4>
|
|
<p><pre>
|
|
FSUB mem32 ; D8 /4 [8086,FPU]
|
|
FSUB mem64 ; DC /4 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FSUB fpureg ; D8 E0+r [8086,FPU]
|
|
FSUB ST0,fpureg ; D8 E0+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FSUB TO fpureg ; DC E8+r [8086,FPU]
|
|
FSUB fpureg,ST0 ; DC E8+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FSUBR mem32 ; D8 /5 [8086,FPU]
|
|
FSUBR mem64 ; DC /5 [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FSUBR fpureg ; D8 E8+r [8086,FPU]
|
|
FSUBR ST0,fpureg ; D8 E8+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FSUBR TO fpureg ; DC E0+r [8086,FPU]
|
|
FSUBR fpureg,ST0 ; DC E0+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FSUBP fpureg ; DE E8+r [8086,FPU]
|
|
FSUBP fpureg,ST0 ; DE E8+r [8086,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FSUBRP fpureg ; DE E0+r [8086,FPU]
|
|
FSUBRP fpureg,ST0 ; DE E0+r [8086,FPU]
|
|
</pre>
|
|
<ul>
|
|
<li><code><nobr>FSUB</nobr></code> subtracts the given operand from
|
|
<code><nobr>ST0</nobr></code> and stores the result back in
|
|
<code><nobr>ST0</nobr></code>, unless the <code><nobr>TO</nobr></code>
|
|
qualifier is given, in which case it subtracts
|
|
<code><nobr>ST0</nobr></code> from the given operand and stores the result
|
|
in the operand.
|
|
<li><code><nobr>FSUBR</nobr></code> does the same thing, but does the
|
|
subtraction the other way up: so if <code><nobr>TO</nobr></code> is not
|
|
given, it subtracts <code><nobr>ST0</nobr></code> from the given operand
|
|
and stores the result in <code><nobr>ST0</nobr></code>, whereas if
|
|
<code><nobr>TO</nobr></code> is given it subtracts its operand from
|
|
<code><nobr>ST0</nobr></code> and stores the result in the operand.
|
|
<li><code><nobr>FSUBP</nobr></code> operates like
|
|
<code><nobr>FSUB TO</nobr></code>, but pops the register stack once it has
|
|
finished.
|
|
<li><code><nobr>FSUBRP</nobr></code> operates like
|
|
<code><nobr>FSUBR TO</nobr></code>, but pops the register stack once it has
|
|
finished.
|
|
</ul>
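<p>To make the operand order concrete, here is a small sketch using
hypothetical single-precision variables <code><nobr>a</nobr></code> and
<code><nobr>b</nobr></code>:
<p><pre>
        fld     dword [a]       ; ST0 = a
        fsub    dword [b]       ; ST0 = a - b

        fld     dword [a]       ; ST0 = a
        fsubr   dword [b]       ; ST0 = b - a

        fld     dword [b]       ; ST0 = b
        fld     dword [a]       ; ST0 = a, ST1 = b
        fsubp   st1,st0         ; ST1 = b - a, then pop: result in ST0
</pre>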
|
|
<h4><a name="section-B.4.107">B.4.107 <code><nobr>FTST</nobr></code>: Test <code><nobr>ST0</nobr></code> Against Zero</a></h4>
|
|
<p><pre>
|
|
FTST ; D9 E4 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FTST</nobr></code> compares <code><nobr>ST0</nobr></code>
|
|
with zero and sets the FPU flags accordingly. <code><nobr>ST0</nobr></code>
|
|
is treated as the left-hand side of the comparison, so that a `less-than'
|
|
result is generated if <code><nobr>ST0</nobr></code> is negative.
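<p>One conventional way to act on the result is to copy the FPU status word
into the CPU flags with <code><nobr>FNSTSW AX</nobr></code> and
<code><nobr>SAHF</nobr></code>, which maps <code><nobr>C0</nobr></code> to
the carry flag, <code><nobr>C2</nobr></code> to the parity flag and
<code><nobr>C3</nobr></code> to the zero flag. The labels in this sketch
are hypothetical. The unordered case is tested first, because it sets all
three condition bits.
<p><pre>
        ftst                    ; compare ST0 with 0.0
        fnstsw  ax              ; AX = FPU status word
        sahf                    ; CF = C0, PF = C2, ZF = C3
        jp      is_unordered    ; ST0 is a NaN or unsupported format
        jb      is_negative     ; ST0 is less than zero
        je      is_zero         ; ST0 equals zero
                                ; otherwise ST0 is greater than zero
</pre>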
|
|
<h4><a name="section-B.4.108">B.4.108 <code><nobr>FUCOMxx</nobr></code>: Floating-Point Unordered Compare</a></h4>
|
|
<p><pre>
|
|
FUCOM fpureg ; DD E0+r [386,FPU]
|
|
FUCOM ST0,fpureg ; DD E0+r [386,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FUCOMP fpureg ; DD E8+r [386,FPU]
|
|
FUCOMP ST0,fpureg ; DD E8+r [386,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FUCOMPP ; DA E9 [386,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FUCOMI fpureg ; DB E8+r [P6,FPU]
|
|
FUCOMI ST0,fpureg ; DB E8+r [P6,FPU]
|
|
</pre>
|
|
<p><pre>
|
|
FUCOMIP fpureg ; DF E8+r [P6,FPU]
|
|
FUCOMIP ST0,fpureg ; DF E8+r [P6,FPU]
|
|
</pre>
|
|
<ul>
|
|
<li><code><nobr>FUCOM</nobr></code> compares <code><nobr>ST0</nobr></code>
|
|
with the given operand, and sets the FPU flags accordingly.
|
|
<code><nobr>ST0</nobr></code> is treated as the left-hand side of the
|
|
comparison, so that the carry flag is set (for a `less-than' result) if
|
|
<code><nobr>ST0</nobr></code> is less than the given operand.
|
|
<li><code><nobr>FUCOMP</nobr></code> does the same as
|
|
<code><nobr>FUCOM</nobr></code>, but pops the register stack afterwards.
|
|
<code><nobr>FUCOMPP</nobr></code> compares <code><nobr>ST0</nobr></code>
|
|
with <code><nobr>ST1</nobr></code> and then pops the register stack twice.
|
|
<li><code><nobr>FUCOMI</nobr></code> and <code><nobr>FUCOMIP</nobr></code>
|
|
work like the corresponding forms of <code><nobr>FUCOM</nobr></code> and
|
|
<code><nobr>FUCOMP</nobr></code>, but write their results directly to the
|
|
CPU flags register rather than the FPU status word, so they can be
|
|
immediately followed by conditional jump or conditional move instructions.
|
|
</ul>
|
|
<p>The <code><nobr>FUCOM</nobr></code> instructions differ from the
|
|
<code><nobr>FCOM</nobr></code> instructions
|
|
(<a href="#section-B.4.73">section B.4.73</a>) only in the way they handle
|
|
quiet NaNs: <code><nobr>FUCOM</nobr></code> will handle them silently and
|
|
set the condition code flags to an `unordered' result, whereas
|
|
<code><nobr>FCOM</nobr></code> will generate an exception.
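<p>As an illustration of the <code><nobr>FUCOMIP</nobr></code> form, the
following sketch compares two hypothetical double-precision variables
<code><nobr>a</nobr></code> and <code><nobr>b</nobr></code> and branches on
the CPU flags directly. It assumes a P6 or later processor; the labels are
invented for the example.
<p><pre>
        fld     qword [b]       ; ST0 = b
        fld     qword [a]       ; ST0 = a, ST1 = b
        fucomip st0,st1         ; compare a with b, set ZF/PF/CF, pop a
        fstp    st0             ; discard b (does not alter the CPU flags)
        jp      unordered       ; at least one operand was a NaN
        ja      a_is_greater    ; a is greater than b
        jb      a_is_less       ; a is less than b
                                ; otherwise a equals b
</pre>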
|
|
<h4><a name="section-B.4.109">B.4.109 <code><nobr>FXAM</nobr></code>: Examine Class of Value in <code><nobr>ST0</nobr></code></a></h4>
|
|
<p><pre>
|
|
FXAM ; D9 E5 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FXAM</nobr></code> sets the FPU flags
|
|
<code><nobr>C3</nobr></code>, <code><nobr>C2</nobr></code> and
|
|
<code><nobr>C0</nobr></code> depending on the type of value stored in
|
|
<code><nobr>ST0</nobr></code>:
|
|
<p><pre>
|
|
Register contents Flags
|
|
</pre>
|
|
<p><pre>
|
|
Unsupported format 000
|
|
NaN 001
|
|
Finite number 010
|
|
Infinity 011
|
|
Zero 100
|
|
Empty register 101
|
|
Denormal 110
|
|
</pre>
|
|
<p>Additionally, the <code><nobr>C1</nobr></code> flag is set to the sign
|
|
of the number.
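<p>The flags can be read back with <code><nobr>FNSTSW AX</nobr></code>. In
the status word, <code><nobr>C0</nobr></code> is bit 8,
<code><nobr>C2</nobr></code> is bit 10 and <code><nobr>C3</nobr></code> is
bit 14, so a mask of <code><nobr>0x4500</nobr></code> isolates the three
class bits. The following sketch, with hypothetical labels, tests for two
of the classes in the table above:
<p><pre>
        fxam                    ; classify ST0
        fnstsw  ax              ; AX = FPU status word
        and     ax,0x4500       ; keep only C3, C2 and C0
        cmp     ax,0x0100       ; C3:C2:C0 = 001
        je      is_nan
        cmp     ax,0x4000       ; C3:C2:C0 = 100
        je      is_zero
</pre>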
|
|
<h4><a name="section-B.4.110">B.4.110 <code><nobr>FXCH</nobr></code>: Floating-Point Exchange</a></h4>
|
|
<p><pre>
|
|
FXCH ; D9 C9 [8086,FPU]
|
|
FXCH fpureg ; D9 C8+r [8086,FPU]
|
|
FXCH fpureg,ST0 ; D9 C8+r [8086,FPU]
|
|
FXCH ST0,fpureg ; D9 C8+r [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FXCH</nobr></code> exchanges <code><nobr>ST0</nobr></code>
|
|
with a given FPU register. The no-operand form exchanges
|
|
<code><nobr>ST0</nobr></code> with <code><nobr>ST1</nobr></code>.
|
|
<h4><a name="section-B.4.111">B.4.111 <code><nobr>FXRSTOR</nobr></code>: Restore <code><nobr>FP</nobr></code>, <code><nobr>MMX</nobr></code> and <code><nobr>SSE</nobr></code> State</a></h4>
|
|
<p><pre>
|
|
FXRSTOR memory ; 0F AE /1 [P6,SSE,FPU]
|
|
</pre>
|
|
<p>The <code><nobr>FXRSTOR</nobr></code> instruction reloads the
|
|
<code><nobr>FPU</nobr></code>, <code><nobr>MMX</nobr></code> and
|
|
<code><nobr>SSE</nobr></code> state (environment and registers), from the
|
|
512 byte memory area defined by the source operand. This data should have
|
|
been written by a previous <code><nobr>FXSAVE</nobr></code>.
|
|
<h4><a name="section-B.4.112">B.4.112 <code><nobr>FXSAVE</nobr></code>: Store <code><nobr>FP</nobr></code>, <code><nobr>MMX</nobr></code> and <code><nobr>SSE</nobr></code> State</a></h4>
|
|
<p><pre>
|
|
FXSAVE memory ; 0F AE /0 [P6,SSE,FPU]
|
|
</pre>
|
|
<p>The <code><nobr>FXSAVE</nobr></code> instruction writes the
|
|
current <code><nobr>FPU</nobr></code>, <code><nobr>MMX</nobr></code> and
|
|
<code><nobr>SSE</nobr></code> technology states (environment and
|
|
registers), to the 512 byte memory area defined by the destination operand.
|
|
It does this without checking for pending unmasked floating-point
|
|
exceptions (similar to the operation of <code><nobr>FNSAVE</nobr></code>).
|
|
<p>Unlike the <code><nobr>FSAVE/FNSAVE</nobr></code> instructions, the
|
|
processor retains the contents of the <code><nobr>FPU</nobr></code>,
|
|
<code><nobr>MMX</nobr></code> and <code><nobr>SSE</nobr></code> state in
|
|
the processor after the state has been saved. This instruction has been
|
|
optimised to maximise floating-point save performance.
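<p>A minimal sketch of a save and restore pair follows. The label
<code><nobr>fxarea</nobr></code> is hypothetical. The processor requires
the 512 byte area to be aligned on a 16-byte boundary, which is arranged
here with <code><nobr>ALIGNB</nobr></code>.
<p><pre>
        section .bss
        alignb  16
fxarea: resb    512             ; hypothetical 16-byte aligned save area

        section .text
        fxsave  [fxarea]        ; save FPU, MMX and SSE state
        ; ... code that alters the floating-point state ...
        fxrstor [fxarea]        ; reload the saved state
</pre>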
|
|
<h4><a name="section-B.4.113">B.4.113 <code><nobr>FXTRACT</nobr></code>: Extract Exponent and Significand</a></h4>
|
|
<p><pre>
|
|
FXTRACT ; D9 F4 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FXTRACT</nobr></code> separates the number in
|
|
<code><nobr>ST0</nobr></code> into its exponent and significand (mantissa),
|
|
stores the exponent back into <code><nobr>ST0</nobr></code>, and then
|
|
pushes the significand on the register stack (so that the significand ends
|
|
up in <code><nobr>ST0</nobr></code>, and the exponent in
|
|
<code><nobr>ST1</nobr></code>).
|
|
<h4><a name="section-B.4.114">B.4.114 <code><nobr>FYL2X</nobr></code>, <code><nobr>FYL2XP1</nobr></code>: Compute Y times Log2(X) or Log2(X+1)</a></h4>
|
|
<p><pre>
|
|
FYL2X ; D9 F1 [8086,FPU]
|
|
FYL2XP1 ; D9 F9 [8086,FPU]
|
|
</pre>
|
|
<p><code><nobr>FYL2X</nobr></code> multiplies <code><nobr>ST1</nobr></code>
|
|
by the base-2 logarithm of <code><nobr>ST0</nobr></code>, stores the result
|
|
in <code><nobr>ST1</nobr></code>, and pops the register stack (so that the
|
|
result ends up in <code><nobr>ST0</nobr></code>).
|
|
<code><nobr>ST0</nobr></code> must be non-zero and positive.
|
|
<p><code><nobr>FYL2XP1</nobr></code> works the same way, but replacing the
|
|
base-2 log of <code><nobr>ST0</nobr></code> with that of
|
|
<code><nobr>ST0</nobr></code> plus one. This time,
|
|
<code><nobr>ST0</nobr></code> must have magnitude no greater than 1 minus
|
|
half the square root of two.
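<p>Because the result is scaled by <code><nobr>ST1</nobr></code>, loading a
suitable constant first converts the base-2 logarithm to another base. As a
sketch, with <code><nobr>x</nobr></code> a hypothetical positive
double-precision variable, the natural logarithm can be obtained like this:
<p><pre>
        fldln2                  ; ST0 = ln(2)
        fld     qword [x]       ; ST0 = x, ST1 = ln(2)
        fyl2x                   ; ST0 = ln(2) * log2(x) = ln(x)
</pre>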
|
|
<h4><a name="section-B.4.115">B.4.115 <code><nobr>HLT</nobr></code>: Halt Processor</a></h4>
|
|
<p><pre>
|
|
HLT ; F4 [8086,PRIV]
|
|
</pre>
|
|
<p><code><nobr>HLT</nobr></code> puts the processor into a halted state,
|
|
where it will perform no more operations until restarted by an interrupt or
|
|
a reset.
|
|
<p>On the 286 and later processors, this is a privileged instruction.
|
|
<h4><a name="section-B.4.116">B.4.116 <code><nobr>IBTS</nobr></code>: Insert Bit String</a></h4>
|
|
<p><pre>
|
|
IBTS r/m16,reg16 ; o16 0F A7 /r [386,UNDOC]
|
|
IBTS r/m32,reg32 ; o32 0F A7 /r [386,UNDOC]
|
|
</pre>
|
|
<p>The implied operation of this instruction is:
|
|
<p><pre>
|
|
IBTS r/m16,AX,CL,reg16
|
|
IBTS r/m32,EAX,CL,reg32
|
|
</pre>
|
|
<p>Writes a bit string from the source operand to the destination.
|
|
<code><nobr>CL</nobr></code> indicates the number of bits to be copied,
|
|
from the low bits of the source. <code><nobr>(E)AX</nobr></code> indicates
|
|
the low order bit offset in the destination that is written to. For
|
|
example, if <code><nobr>CL</nobr></code> is set to 4 and
|
|
<code><nobr>AX</nobr></code> (for 16-bit code) is set to 5, bits 0-3 of
|
|
<code><nobr>src</nobr></code> will be copied to bits 5-8 of
|
|
<code><nobr>dst</nobr></code>. This instruction is very poorly documented,
|
|
and I have been unable to find any official source of documentation on it.
|
|
<p><code><nobr>IBTS</nobr></code> is supported only on the early Intel
|
|
386s, and conflicts with the opcodes for
|
|
<code><nobr>CMPXCHG486</nobr></code> (on early Intel 486s). NASM supports
|
|
it only for completeness. Its counterpart is <code><nobr>XBTS</nobr></code>
|
|
(see <a href="#section-B.4.332">section B.4.332</a>).
|
|
<h4><a name="section-B.4.117">B.4.117 <code><nobr>IDIV</nobr></code>: Signed Integer Divide</a></h4>
|
|
<p><pre>
|
|
IDIV r/m8 ; F6 /7 [8086]
|
|
IDIV r/m16 ; o16 F7 /7 [8086]
|
|
IDIV r/m32 ; o32 F7 /7 [386]
|
|
</pre>
|
|
<p><code><nobr>IDIV</nobr></code> performs signed integer division. The
|
|
explicit operand provided is the divisor; the dividend and destination
|
|
operands are implicit, in the following way:
|
|
<ul>
|
|
<li>For <code><nobr>IDIV r/m8</nobr></code>, <code><nobr>AX</nobr></code>
|
|
is divided by the given operand; the quotient is stored in
|
|
<code><nobr>AL</nobr></code> and the remainder in
|
|
<code><nobr>AH</nobr></code>.
|
|
<li>For <code><nobr>IDIV r/m16</nobr></code>,
|
|
<code><nobr>DX:AX</nobr></code> is divided by the given operand; the
|
|
quotient is stored in <code><nobr>AX</nobr></code> and the remainder in
|
|
<code><nobr>DX</nobr></code>.
|
|
<li>For <code><nobr>IDIV r/m32</nobr></code>,
|
|
<code><nobr>EDX:EAX</nobr></code> is divided by the given operand; the
|
|
quotient is stored in <code><nobr>EAX</nobr></code> and the remainder in
|
|
<code><nobr>EDX</nobr></code>.
|
|
</ul>
|
|
<p>Unsigned integer division is performed by the
|
|
<code><nobr>DIV</nobr></code> instruction: see
|
|
<a href="#section-B.4.59">section B.4.59</a>.
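<p>Since the dividend is twice the width of the divisor, the upper half
must hold the sign extension of the lower half before a normal signed
division. A short sketch for the 32-bit form, using hypothetical variables
<code><nobr>dividend</nobr></code> and <code><nobr>divisor</nobr></code>:
<p><pre>
        mov     eax,[dividend]  ; e.g. -7
        cdq                     ; sign-extend EAX into EDX:EAX
        idiv    dword [divisor] ; e.g. 2: EAX = -3, EDX = -1
</pre>
<p>The quotient is truncated towards zero, and the remainder takes the sign
of the dividend.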
|
|
<h4><a name="section-B.4.118">B.4.118 <code><nobr>IMUL</nobr></code>: Signed Integer Multiply</a></h4>
|
|
<p><pre>
|
|
IMUL r/m8 ; F6 /5 [8086]
|
|
IMUL r/m16 ; o16 F7 /5 [8086]
|
|
IMUL r/m32 ; o32 F7 /5 [386]
|
|
</pre>
|
|
<p><pre>
|
|
IMUL reg16,r/m16 ; o16 0F AF /r [386]
|
|
IMUL reg32,r/m32 ; o32 0F AF /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
IMUL reg16,imm8 ; o16 6B /r ib [186]
|
|
IMUL reg16,imm16 ; o16 69 /r iw [186]
|
|
IMUL reg32,imm8 ; o32 6B /r ib [386]
|
|
IMUL reg32,imm32 ; o32 69 /r id [386]
|
|
</pre>
|
|
<p><pre>
|
|
IMUL reg16,r/m16,imm8 ; o16 6B /r ib [186]
|
|
IMUL reg16,r/m16,imm16 ; o16 69 /r iw [186]
|
|
IMUL reg32,r/m32,imm8 ; o32 6B /r ib [386]
|
|
IMUL reg32,r/m32,imm32 ; o32 69 /r id [386]
|
|
</pre>
|
|
<p><code><nobr>IMUL</nobr></code> performs signed integer multiplication.
|
|
For the single-operand form, the other operand and destination are
|
|
implicit, in the following way:
|
|
<ul>
|
|
<li>For <code><nobr>IMUL r/m8</nobr></code>, <code><nobr>AL</nobr></code>
|
|
is multiplied by the given operand; the product is stored in
|
|
<code><nobr>AX</nobr></code>.
|
|
<li>For <code><nobr>IMUL r/m16</nobr></code>, <code><nobr>AX</nobr></code>
|
|
is multiplied by the given operand; the product is stored in
|
|
<code><nobr>DX:AX</nobr></code>.
|
|
<li>For <code><nobr>IMUL r/m32</nobr></code>, <code><nobr>EAX</nobr></code>
|
|
is multiplied by the given operand; the product is stored in
|
|
<code><nobr>EDX:EAX</nobr></code>.
|
|
</ul>
|
|
<p>The two-operand form multiplies its two operands and stores the result
|
|
in the destination (first) operand. The three-operand form multiplies its
|
|
last two operands and stores the result in the first operand.
|
|
<p>The two-operand form with an immediate second operand is in fact a
|
|
shorthand for the three-operand form, as can be seen by examining the
|
|
opcode descriptions: in the two-operand form, the code
|
|
<code><nobr>/r</nobr></code> takes both its register and
|
|
<code><nobr>r/m</nobr></code> parts from the same operand (the first one).
|
|
<p>In the forms with an 8-bit immediate operand and another longer source
|
|
operand, the immediate operand is considered to be signed, and is
|
|
sign-extended to the length of the other source operand. In these cases,
|
|
the <code><nobr>BYTE</nobr></code> qualifier is necessary to force NASM to
|
|
generate this form of the instruction.
|
|
<p>Unsigned integer multiplication is performed by the
|
|
<code><nobr>MUL</nobr></code> instruction: see
|
|
<a href="#section-B.4.184">section B.4.184</a>.
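<p>The multi-operand forms might be written as follows in 32-bit code. The
variable <code><nobr>count</nobr></code> is hypothetical.
<p><pre>
        imul    eax,ebx             ; EAX = EAX * EBX
        imul    ecx,[count],byte 10 ; ECX = [count] * 10, 8-bit immediate
        imul    edx,byte -3         ; shorthand for IMUL EDX,EDX,BYTE -3
</pre>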
|
|
<h4><a name="section-B.4.119">B.4.119 <code><nobr>IN</nobr></code>: Input from I/O Port</a></h4>
|
|
<p><pre>
|
|
IN AL,imm8 ; E4 ib [8086]
|
|
IN AX,imm8 ; o16 E5 ib [8086]
|
|
IN EAX,imm8 ; o32 E5 ib [386]
|
|
IN AL,DX ; EC [8086]
|
|
IN AX,DX ; o16 ED [8086]
|
|
IN EAX,DX ; o32 ED [386]
|
|
</pre>
|
|
<p><code><nobr>IN</nobr></code> reads a byte, word or doubleword from the
|
|
specified I/O port, and stores it in the given destination register. The
|
|
port number may be specified as an immediate value if it is between 0 and
|
|
255, and otherwise must be stored in <code><nobr>DX</nobr></code>. See also
|
|
<code><nobr>OUT</nobr></code> (<a href="#section-B.4.194">section
|
|
B.4.194</a>).
|
|
<h4><a name="section-B.4.120">B.4.120 <code><nobr>INC</nobr></code>: Increment Integer</a></h4>
|
|
<p><pre>
|
|
INC reg16 ; o16 40+r [8086]
|
|
INC reg32 ; o32 40+r [386]
|
|
INC r/m8 ; FE /0 [8086]
|
|
INC r/m16 ; o16 FF /0 [8086]
|
|
INC r/m32 ; o32 FF /0 [386]
|
|
</pre>
|
|
<p><code><nobr>INC</nobr></code> adds 1 to its operand. It does
|
|
<em>not</em> affect the carry flag: to affect the carry flag, use
|
|
<code><nobr>ADD something,1</nobr></code> (see
|
|
<a href="#section-B.4.3">section B.4.3</a>). <code><nobr>INC</nobr></code>
|
|
affects all the other flags according to the result.
|
|
<p>This instruction can be used with a <code><nobr>LOCK</nobr></code>
|
|
prefix to allow atomic execution.
|
|
<p>See also <code><nobr>DEC</nobr></code>
|
|
(<a href="#section-B.4.58">section B.4.58</a>).
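<p>The fact that the carry flag is untouched matters in multi-word
arithmetic. As a sketch, incrementing a 64-bit value held in two
hypothetical doublewords <code><nobr>lo</nobr></code> and
<code><nobr>hi</nobr></code> needs <code><nobr>ADD</nobr></code> rather
than <code><nobr>INC</nobr></code>, so that the carry can propagate:
<p><pre>
        add     dword [lo],1    ; sets the carry flag on wrap-around
        adc     dword [hi],0    ; propagate the carry into the high half
</pre>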
|
|
<h4><a name="section-B.4.121">B.4.121 <code><nobr>INSB</nobr></code>, <code><nobr>INSW</nobr></code>, <code><nobr>INSD</nobr></code>: Input String from I/O Port</a></h4>
|
|
<p><pre>
|
|
INSB ; 6C [186]
|
|
INSW ; o16 6D [186]
|
|
INSD ; o32 6D [386]
|
|
</pre>
|
|
<p><code><nobr>INSB</nobr></code> inputs a byte from the I/O port specified
|
|
in <code><nobr>DX</nobr></code> and stores it at
|
|
<code><nobr>[ES:DI]</nobr></code> or <code><nobr>[ES:EDI]</nobr></code>. It
|
|
then increments or decrements (depending on the direction flag: increments
|
|
if the flag is clear, decrements if it is set) <code><nobr>DI</nobr></code>
|
|
or <code><nobr>EDI</nobr></code>.
|
|
<p>The register used is <code><nobr>DI</nobr></code> if the address size is
|
|
16 bits, and <code><nobr>EDI</nobr></code> if it is 32 bits. If you need to
|
|
use an address size not equal to the current <code><nobr>BITS</nobr></code>
|
|
setting, you can use an explicit <code><nobr>a16</nobr></code> or
|
|
<code><nobr>a32</nobr></code> prefix.
|
|
<p>Segment override prefixes have no effect for this instruction: the use
|
|
of <code><nobr>ES</nobr></code> for the load from
|
|
<code><nobr>[DI]</nobr></code> or <code><nobr>[EDI]</nobr></code> cannot be
|
|
overridden.
|
|
<p><code><nobr>INSW</nobr></code> and <code><nobr>INSD</nobr></code> work
|
|
in the same way, but they input a word or a doubleword instead of a byte,
|
|
and increment or decrement the addressing register by 2 or 4 instead of 1.
|
|
<p>The <code><nobr>REP</nobr></code> prefix may be used to repeat the
|
|
instruction <code><nobr>CX</nobr></code> (or <code><nobr>ECX</nobr></code>
|
|
- again, the address size chooses which) times.
|
|
<p>See also <code><nobr>OUTSB</nobr></code>,
|
|
<code><nobr>OUTSW</nobr></code> and <code><nobr>OUTSD</nobr></code>
|
|
(<a href="#section-B.4.195">section B.4.195</a>).
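<p>A typical repeated use in 16-bit code is sketched below. The port number
and the label <code><nobr>buffer</nobr></code> are purely illustrative, and
<code><nobr>ES</nobr></code> is assumed to address the segment containing
the buffer already.
<p><pre>
        mov     dx,0x1F0        ; illustrative port number
        mov     di,buffer       ; ES:DI = destination (hypothetical label)
        mov     cx,256          ; number of words to read
        cld                     ; clear direction flag: DI counts upwards
        rep insw                ; read 256 words from the port into memory
</pre>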
|
|
<h4><a name="section-B.4.122">B.4.122 <code><nobr>INT</nobr></code>: Software Interrupt</a></h4>
|
|
<p><pre>
|
|
INT imm8 ; CD ib [8086]
|
|
</pre>
|
|
<p><code><nobr>INT</nobr></code> causes a software interrupt through a
|
|
specified vector number from 0 to 255.
|
|
<p>The code generated by the <code><nobr>INT</nobr></code> instruction is
|
|
always two bytes long: although there are short forms for some
|
|
<code><nobr>INT</nobr></code> instructions, NASM does not generate them
|
|
when it sees the <code><nobr>INT</nobr></code> mnemonic. In order to
|
|
generate single-byte breakpoint instructions, use the
|
|
<code><nobr>INT3</nobr></code> or <code><nobr>INT1</nobr></code>
|
|
instructions (see <a href="#section-B.4.123">section B.4.123</a>) instead.
|
|
<h4><a name="section-B.4.123">B.4.123 <code><nobr>INT3</nobr></code>, <code><nobr>INT1</nobr></code>, <code><nobr>ICEBP</nobr></code>, <code><nobr>INT01</nobr></code>: Breakpoints</a></h4>
|
|
<p><pre>
|
|
INT1 ; F1 [P6]
|
|
ICEBP ; F1 [P6]
|
|
INT01 ; F1 [P6]
|
|
</pre>
|
|
<p><pre>
|
|
INT3 ; CC [8086]
|
|
INT03 ; CC [8086]
|
|
</pre>
|
|
<p><code><nobr>INT1</nobr></code> and <code><nobr>INT3</nobr></code> are
|
|
short one-byte forms of the instructions <code><nobr>INT 1</nobr></code>
|
|
and <code><nobr>INT 3</nobr></code> (see <a href="#section-B.4.122">section
|
|
B.4.122</a>). They perform a similar function to their longer counterparts,
|
|
but take up less code space. They are used as breakpoints by debuggers.
|
|
<ul>
|
|
<li><code><nobr>INT1</nobr></code>, and its alternative synonyms
|
|
<code><nobr>INT01</nobr></code> and <code><nobr>ICEBP</nobr></code>, is an
|
|
instruction used by in-circuit emulators (ICEs). It is present, though not
|
|
documented, on some processors down to the 286, but is only documented for
|
|
the Pentium Pro. <code><nobr>INT3</nobr></code> is the instruction normally
|
|
used as a breakpoint by debuggers.
|
|
<li><code><nobr>INT3</nobr></code>, and its synonym
|
|
<code><nobr>INT03</nobr></code>, is not precisely equivalent to
|
|
<code><nobr>INT 3</nobr></code>: the short form, since it is designed to be
|
|
used as a breakpoint, bypasses the normal <code><nobr>IOPL</nobr></code>
|
|
checks in virtual-8086 mode, and also does not go through interrupt
|
|
redirection.
|
|
</ul>
|
|
<h4><a name="section-B.4.124">B.4.124 <code><nobr>INTO</nobr></code>: Interrupt if Overflow</a></h4>
|
|
<p><pre>
|
|
INTO ; CE [8086]
|
|
</pre>
|
|
<p><code><nobr>INTO</nobr></code> performs an
|
|
<code><nobr>INT 4</nobr></code> software interrupt (see
|
|
<a href="#section-B.4.122">section B.4.122</a>) if and only if the overflow
|
|
flag is set.
|
|
<h4><a name="section-B.4.125">B.4.125 <code><nobr>INVD</nobr></code>: Invalidate Internal Caches</a></h4>
|
|
<p><pre>
|
|
INVD ; 0F 08 [486]
|
|
</pre>
|
|
<p><code><nobr>INVD</nobr></code> invalidates and empties the processor's
|
|
internal caches, and causes the processor to instruct external caches to do
|
|
the same. It does not write the contents of the caches back to memory
|
|
first: any modified data held in the caches will be lost. To write the data
|
|
back first, use <code><nobr>WBINVD</nobr></code>
|
|
(<a href="#section-B.4.328">section B.4.328</a>).
|
|
<h4><a name="section-B.4.126">B.4.126 <code><nobr>INVLPG</nobr></code>: Invalidate TLB Entry</a></h4>
|
|
<p><pre>
|
|
INVLPG mem ; 0F 01 /7 [486]
|
|
</pre>
|
|
<p><code><nobr>INVLPG</nobr></code> invalidates the translation lookaside
|
|
buffer (TLB) entry associated with the supplied memory address.
|
|
<h4><a name="section-B.4.127">B.4.127 <code><nobr>IRET</nobr></code>, <code><nobr>IRETW</nobr></code>, <code><nobr>IRETD</nobr></code>: Return from Interrupt</a></h4>
|
|
<p><pre>
|
|
IRET ; CF [8086]
|
|
IRETW ; o16 CF [8086]
|
|
IRETD ; o32 CF [386]
|
|
</pre>
|
|
<p><code><nobr>IRET</nobr></code> returns from an interrupt (hardware or
|
|
software) by means of popping <code><nobr>IP</nobr></code> (or
|
|
<code><nobr>EIP</nobr></code>), <code><nobr>CS</nobr></code> and the flags
|
|
off the stack and then continuing execution from the new
|
|
<code><nobr>CS:IP</nobr></code>.
|
|
<p><code><nobr>IRETW</nobr></code> pops <code><nobr>IP</nobr></code>,
|
|
<code><nobr>CS</nobr></code> and the flags as 2 bytes each, taking 6 bytes
|
|
off the stack in total. <code><nobr>IRETD</nobr></code> pops
|
|
<code><nobr>EIP</nobr></code> as 4 bytes, pops a further 4 bytes of which
|
|
the top two are discarded and the bottom two go into
|
|
<code><nobr>CS</nobr></code>, and pops the flags as 4 bytes as well, taking
|
|
12 bytes off the stack.
|
|
<p><code><nobr>IRET</nobr></code> is a shorthand for either
|
|
<code><nobr>IRETW</nobr></code> or <code><nobr>IRETD</nobr></code>,
|
|
depending on the default <code><nobr>BITS</nobr></code> setting at the
|
|
time.
|
|
<h4><a name="section-B.4.128">B.4.128 <code><nobr>Jcc</nobr></code>: Conditional Branch</a></h4>
|
|
<p><pre>
|
|
Jcc imm ; 70+cc rb [8086]
|
|
Jcc NEAR imm ; 0F 80+cc rw/rd [386]
|
|
</pre>
|
|
<p>The conditional jump instructions execute a near (same segment) jump if
|
|
and only if their conditions are satisfied. For example,
|
|
<code><nobr>JNZ</nobr></code> jumps only if the zero flag is not set.
|
|
<p>The ordinary form of the instructions has only a 128-byte range; the
|
|
<code><nobr>NEAR</nobr></code> form is a 386 extension to the instruction
|
|
set, and can span the full size of a segment. NASM will not override your
|
|
choice of jump instruction: if you want <code><nobr>Jcc NEAR</nobr></code>,
|
|
you have to use the <code><nobr>NEAR</nobr></code> keyword.
|
|
<p>The <code><nobr>SHORT</nobr></code> keyword is allowed on the first form
|
|
of the instruction, for clarity, but is not necessary.
|
|
<p>For details of the condition codes, see <a href="#section-B.2.2">section
|
|
B.2.2</a>.
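<p>For instance, with hypothetical labels <code><nobr>nearby</nobr></code>
(within short range) and <code><nobr>distant</nobr></code> (elsewhere in
the segment), the two forms would be chosen like this:
<p><pre>
        cmp     ax,bx
        je      nearby          ; 70+cc rb: short form
        jne     near distant    ; 0F 80+cc: 386 or later
</pre>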
|
|
<h4><a name="section-B.4.129">B.4.129 <code><nobr>JCXZ</nobr></code>, <code><nobr>JECXZ</nobr></code>: Jump if CX/ECX Zero</a></h4>
|
|
<p><pre>
|
|
JCXZ imm ; a16 E3 rb [8086]
|
|
JECXZ imm ; a32 E3 rb [386]
|
|
</pre>
|
|
<p><code><nobr>JCXZ</nobr></code> performs a short jump (with maximum range
|
|
128 bytes) if and only if the contents of the <code><nobr>CX</nobr></code>
|
|
register is 0. <code><nobr>JECXZ</nobr></code> does the same thing, but
|
|
with <code><nobr>ECX</nobr></code>.
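<p>A common use is to guard a <code><nobr>LOOP</nobr></code> against a zero
count, since <code><nobr>LOOP</nobr></code> decrements
<code><nobr>CX</nobr></code> before testing it and would otherwise run
65536 times. The variable <code><nobr>count</nobr></code> and the labels in
this sketch are hypothetical.
<p><pre>
        mov     cx,[count]
        jcxz    done            ; nothing to do if the count is zero
next:   add     ax,[si]         ; loop body: sum a table of words
        add     si,2
        loop    next            ; decrement CX, repeat while non-zero
done:
</pre>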
|
|
<h4><a name="section-B.4.130">B.4.130 <code><nobr>JMP</nobr></code>: Jump</a></h4>
|
|
<p><pre>
|
|
JMP imm ; E9 rw/rd [8086]
|
|
JMP SHORT imm ; EB rb [8086]
|
|
JMP imm:imm16 ; o16 EA iw iw [8086]
|
|
JMP imm:imm32 ; o32 EA id iw [386]
|
|
JMP FAR mem ; o16 FF /5 [8086]
|
|
JMP FAR mem32 ; o32 FF /5 [386]
|
|
JMP r/m16 ; o16 FF /4 [8086]
|
|
JMP r/m32 ; o32 FF /4 [386]
|
|
</pre>
|
|
<p><code><nobr>JMP</nobr></code> jumps to a given address. The address may
|
|
be specified as an absolute segment and offset, or as a relative jump
|
|
within the current segment.
|
|
<p><code><nobr>JMP SHORT imm</nobr></code> has a maximum range of 128
|
|
bytes, since the displacement is specified as only 8 bits, but takes up
|
|
less code space. NASM does not choose when to generate
|
|
<code><nobr>JMP SHORT</nobr></code> for you: you must explicitly code
|
|
<code><nobr>SHORT</nobr></code> every time you want a short jump.
|
|
<p>You can choose between the two immediate far jump forms
|
|
(<code><nobr>JMP imm:imm</nobr></code>) by the use of the
|
|
<code><nobr>WORD</nobr></code> and <code><nobr>DWORD</nobr></code>
|
|
keywords: <code><nobr>JMP WORD 0x1234:0x5678</nobr></code> or
|
|
<code><nobr>JMP DWORD 0x1234:0x56789abc</nobr></code>.
|
|
<p>The <code><nobr>JMP FAR mem</nobr></code> forms execute a far jump by
|
|
loading the destination address out of memory. The address loaded consists
|
|
of 16 or 32 bits of offset (depending on the operand size), and 16 bits of
|
|
segment. The operand size may be overridden using
|
|
<code><nobr>JMP WORD FAR mem</nobr></code> or
|
|
<code><nobr>JMP DWORD FAR mem</nobr></code>.
|
|
<p>The <code><nobr>JMP r/m</nobr></code> forms execute a near jump (within
|
|
the same segment), loading the destination address out of memory or out of
|
|
a register. The keyword <code><nobr>NEAR</nobr></code> may be specified,
|
|
for clarity, in these forms, but is not necessary. Again, operand size can
|
|
be overridden using <code><nobr>JMP WORD mem</nobr></code> or
|
|
<code><nobr>JMP DWORD mem</nobr></code>.
|
|
<p>As a convenience, NASM does not require you to jump to a far symbol by
|
|
coding the cumbersome <code><nobr>JMP SEG routine:routine</nobr></code>,
|
|
but instead allows the easier synonym
|
|
<code><nobr>JMP FAR routine</nobr></code>.
|
|
<p>The <code><nobr>JMP r/m</nobr></code> forms given above are near jumps;
NASM will accept the <code><nobr>NEAR</nobr></code> keyword (e.g.
<code><nobr>JMP NEAR [address]</nobr></code>), even though it is not
strictly necessary.
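<p>The following sketch collects the main forms as they might appear in
16-bit code. All labels (<code><nobr>skip</nobr></code>,
<code><nobr>target</nobr></code>, <code><nobr>vector</nobr></code> and
<code><nobr>table</nobr></code>) are hypothetical.
<p><pre>
        jmp     short skip          ; EB rb: 8-bit displacement
        jmp     target              ; E9 rw: near relative jump
        jmp     word 0x1234:0x5678  ; EA: immediate far jump, 16-bit offset
        jmp     far [vector]        ; FF /5: far pointer read from memory
        jmp     word [table+bx]     ; FF /4: near jump through a table entry
</pre>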
|
|
<h4><a name="section-B.4.131">B.4.131 <code><nobr>LAHF</nobr></code>: Load AH from Flags</a></h4>
|
|
<p><pre>
|
|
LAHF ; 9F [8086]
|
|
</pre>
|
|
<p><code><nobr>LAHF</nobr></code> sets the <code><nobr>AH</nobr></code>
|
|
register according to the contents of the low byte of the flags word.
|
|
<p>The operation of <code><nobr>LAHF</nobr></code> is:
|
|
<p><pre>
|
|
AH &lt;-- SF:ZF:0:AF:0:PF:1:CF
|
|
</pre>
|
|
<p>See also <code><nobr>SAHF</nobr></code>
|
|
(<a href="#section-B.4.282">section B.4.282</a>).
|
|
<h4><a name="section-B.4.132">B.4.132 <code><nobr>LAR</nobr></code>: Load Access Rights</a></h4>
|
|
<p><pre>
|
|
LAR reg16,r/m16 ; o16 0F 02 /r [286,PRIV]
|
|
LAR reg32,r/m32 ; o32 0F 02 /r [286,PRIV]
|
|
</pre>
|
|
<p><code><nobr>LAR</nobr></code> takes the segment selector specified by
|
|
its source (second) operand, finds the corresponding segment descriptor in
|
|
the GDT or LDT, and loads the access-rights byte of the descriptor into its
|
|
destination (first) operand.
|
|
<h4><a name="section-B.4.133">B.4.133 <code><nobr>LDMXCSR</nobr></code>: Load Streaming SIMD Extension Control/Status</a></h4>
|
|
<p><pre>
|
|
LDMXCSR mem32 ; 0F AE /2 [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>LDMXCSR</nobr></code> loads 32 bits of data from the
|
|
specified memory location into the <code><nobr>MXCSR</nobr></code>
|
|
control/status register. <code><nobr>MXCSR</nobr></code> is used to enable
|
|
masked/unmasked exception handling, to set rounding modes, to set
|
|
flush-to-zero mode, and to view exception status flags.
|
|
<p>For details of the <code><nobr>MXCSR</nobr></code> register, see the
|
|
Intel processor docs.
|
|
<p>See also <code><nobr>STMXCSR</nobr></code>
|
|
(<a href="#section-B.4.302">section B.4.302</a>).
|
|
<h4><a name="section-B.4.134">B.4.134 <code><nobr>LDS</nobr></code>, <code><nobr>LES</nobr></code>, <code><nobr>LFS</nobr></code>, <code><nobr>LGS</nobr></code>, <code><nobr>LSS</nobr></code>: Load Far Pointer</a></h4>
|
|
<p><pre>
|
|
LDS reg16,mem ; o16 C5 /r [8086]
|
|
LDS reg32,mem ; o32 C5 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
LES reg16,mem ; o16 C4 /r [8086]
|
|
LES reg32,mem ; o32 C4 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
LFS reg16,mem ; o16 0F B4 /r [386]
|
|
LFS reg32,mem ; o32 0F B4 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
LGS reg16,mem ; o16 0F B5 /r [386]
|
|
LGS reg32,mem ; o32 0F B5 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
LSS reg16,mem ; o16 0F B2 /r [386]
|
|
LSS reg32,mem ; o32 0F B2 /r [386]
|
|
</pre>
|
|
<p>These instructions load an entire far pointer (16 or 32 bits of offset,
|
|
plus 16 bits of segment) out of memory in one go.
|
|
<code><nobr>LDS</nobr></code>, for example, loads 16 or 32 bits from the
|
|
given memory address into the given register (depending on the size of the
|
|
register), then loads the <em>next</em> 16 bits from memory into
|
|
<code><nobr>DS</nobr></code>. <code><nobr>LES</nobr></code>,
|
|
<code><nobr>LFS</nobr></code>, <code><nobr>LGS</nobr></code> and
|
|
<code><nobr>LSS</nobr></code> work in the same way but use the other
|
|
segment registers.
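<p>The memory layout these instructions expect can be sketched as follows,
using <code><nobr>LES</nobr></code> in 16-bit code. The label
<code><nobr>farptr</nobr></code> and the values stored at it are
hypothetical, chosen only to show that the offset comes first and the
segment second.
<p><pre>
        BITS 16
        LES     DI,[farptr]     ; DI = 0x0010 (offset), ES = 0xB800 (segment)
        MOV     AL,[ES:DI]      ; read one byte through the far pointer
        RET

farptr: DW      0x0010          ; offset word comes first in memory
        DW      0xB800          ; segment word follows it
</pre>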
|
|
<h4><a name="section-B.4.135">B.4.135 <code><nobr>LEA</nobr></code>: Load Effective Address</a></h4>
|
|
<p><pre>
|
|
LEA reg16,mem ; o16 8D /r [8086]
|
|
LEA reg32,mem ; o32 8D /r [386]
|
|
</pre>
|
|
<p><code><nobr>LEA</nobr></code>, despite its syntax, does not access
|
|
memory. It calculates the effective address specified by its second operand
|
|
as if it were going to load or store data from it, but instead it stores
|
|
the calculated address into the register specified by its first operand.
|
|
This can be used to perform quite complex calculations (e.g.
|
|
<code><nobr>LEA EAX,[EBX+ECX*4+100]</nobr></code>) in one instruction.
|
|
<p><code><nobr>LEA</nobr></code>, despite being a purely arithmetic
|
|
instruction which accesses no memory, still requires square brackets around
|
|
its second operand, as if it were a memory reference.
|
|
<p>The size of the calculation is the current <em>address</em> size, and
|
|
the size that the result is stored as is the current <em>operand</em> size.
|
|
If the address and operand sizes differ, then with a 32-bit address size
|
|
only the low 16 bits are stored, and with a 16-bit address size the
|
|
result is zero-extended to 32 bits before storing.
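<p>A few typical uses are sketched below. The label
<code><nobr>table</nobr></code> is an assumption made for the example.
Because <code><nobr>LEA</nobr></code> does not modify the flags, it is
often used where an <code><nobr>ADD</nobr></code> or
<code><nobr>IMUL</nobr></code> would disturb them.
<p><pre>
        BITS 32
        LEA     EAX,[EBX+ECX*4+100]     ; EAX = EBX + ECX*4 + 100
        LEA     EDX,[EDX+EDX*4]         ; EDX = EDX*5, no memory is read
        LEA     ESI,[table+EDI*8]       ; ESI = address of 8-byte entry EDI
        RET

table:  TIMES 4 DQ 0                    ; hypothetical table of 8-byte entries
</pre>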
|
|
<h4><a name="section-B.4.136">B.4.136 <code><nobr>LEAVE</nobr></code>: Destroy Stack Frame</a></h4>
|
|
<p><pre>
|
|
LEAVE ; C9 [186]
|
|
</pre>
|
|
<p><code><nobr>LEAVE</nobr></code> destroys a stack frame of the form
|
|
created by the <code><nobr>ENTER</nobr></code> instruction (see
|
|
<a href="#section-B.4.65">section B.4.65</a>). It is functionally
|
|
equivalent to <code><nobr>MOV ESP,EBP</nobr></code> followed by
|
|
<code><nobr>POP EBP</nobr></code> (or <code><nobr>MOV SP,BP</nobr></code>
|
|
followed by <code><nobr>POP BP</nobr></code> in 16-bit mode).
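<p>As an illustration, the hypothetical 32-bit routine below builds its
stack frame by hand and uses <code><nobr>LEAVE</nobr></code> to tear it
down again. The name <code><nobr>myfunc</nobr></code> and the size of the
local area are assumptions made for the example.
<p><pre>
        BITS 32
myfunc: PUSH    EBP
        MOV     EBP,ESP
        SUB     ESP,8           ; reserve two doublewords of locals
        MOV     DWORD [EBP-4],0 ; use a local variable
        LEAVE                   ; same effect as MOV ESP,EBP then POP EBP
        RET
</pre>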
|
|
<h4><a name="section-B.4.137">B.4.137 <code><nobr>LFENCE</nobr></code>: Load Fence</a></h4>
|
|
<p><pre>
|
|
LFENCE ; 0F AE /5 [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>LFENCE</nobr></code> performs a serialising operation on all
|
|
loads from memory that were issued before the
|
|
<code><nobr>LFENCE</nobr></code> instruction. This guarantees that all
|
|
memory reads before the <code><nobr>LFENCE</nobr></code> instruction are
|
|
visible before any reads after the <code><nobr>LFENCE</nobr></code>
|
|
instruction.
|
|
<p><code><nobr>LFENCE</nobr></code> is ordered with respect to other
|
|
<code><nobr>LFENCE</nobr></code> instructions,
|
|
<code><nobr>MFENCE</nobr></code>, any memory read and any other serialising
|
|
instruction (such as <code><nobr>CPUID</nobr></code>).
|
|
<p>Weakly ordered memory types can be used to achieve higher processor
|
|
performance through such techniques as out-of-order issue and speculative
|
|
reads. The degree to which a consumer of data recognizes or knows that the
|
|
data is weakly ordered varies among applications and may be unknown to the
|
|
producer of this data. The <code><nobr>LFENCE</nobr></code> instruction
|
|
provides a performance-efficient way of ensuring load ordering between
|
|
routines that produce weakly-ordered results and routines that consume that
|
|
data.
|
|
<p><code><nobr>LFENCE</nobr></code> uses the following ModRM encoding:
|
|
<p><pre>
|
|
Mod (7:6) = 11B
|
|
Reg/Opcode (5:3) = 101B
|
|
R/M (2:0) = 000B
|
|
</pre>
|
|
<p>All other ModRM encodings are defined to be reserved, and use of these
|
|
encodings risks incompatibility with future processors.
|
|
<p>See also <code><nobr>SFENCE</nobr></code>
|
|
(<a href="#section-B.4.288">section B.4.288</a>) and
|
|
<code><nobr>MFENCE</nobr></code> (<a href="#section-B.4.151">section
|
|
B.4.151</a>).
|
|
<h4><a name="section-B.4.138">B.4.138 <code><nobr>LGDT</nobr></code>, <code><nobr>LIDT</nobr></code>, <code><nobr>LLDT</nobr></code>: Load Descriptor Tables</a></h4>
|
|
<p><pre>
|
|
LGDT mem ; 0F 01 /2 [286,PRIV]
|
|
LIDT mem ; 0F 01 /3 [286,PRIV]
|
|
LLDT r/m16 ; 0F 00 /2 [286,PRIV]
|
|
</pre>
|
|
<p><code><nobr>LGDT</nobr></code> and <code><nobr>LIDT</nobr></code> both
|
|
take a 6-byte memory area as an operand: they load a 32-bit linear address
|
|
and a 16-bit size limit from that area (in the opposite order) into the
|
|
<code><nobr>GDTR</nobr></code> (global descriptor table register) or
|
|
<code><nobr>IDTR</nobr></code> (interrupt descriptor table register). These
|
|
are the only instructions which directly use <em>linear</em> addresses,
|
|
rather than segment/offset pairs.
|
|
<p><code><nobr>LLDT</nobr></code> takes a segment selector as an operand.
|
|
The processor looks up that selector in the GDT and stores the limit and
|
|
base address given there into the <code><nobr>LDTR</nobr></code> (local
|
|
descriptor table register).
|
|
<p>See also <code><nobr>SGDT</nobr></code>, <code><nobr>SIDT</nobr></code>
|
|
and <code><nobr>SLDT</nobr></code> (<a href="#section-B.4.289">section
|
|
B.4.289</a>).
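<p>The 6-byte operand of <code><nobr>LGDT</nobr></code> might be laid out
as in the sketch below, with the 16-bit limit stored before the 32-bit
base. All label names are illustrative, and the example assumes that the
offset of <code><nobr>gdt</nobr></code> equals its linear address (a flat
segment whose base is zero), which will not hold in every environment.
<p><pre>
        BITS 32
        LGDT    [gdt_desc]              ; load GDTR from the 6-byte area below
        RET

gdt:    DQ      0                       ; null descriptor in entry 0
        DQ      0x00CF9A000000FFFF      ; flat 4 GB code segment
        DQ      0x00CF92000000FFFF      ; flat 4 GB data segment
gdt_end:

gdt_desc:
        DW      gdt_end - gdt - 1       ; 16-bit limit is stored first
        DD      gdt                     ; 32-bit linear base follows
</pre>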
|
|
<h4><a name="section-B.4.139">B.4.139 <code><nobr>LMSW</nobr></code>: Load Machine Status Word</a></h4>
|
|
<p><pre>
|
|
LMSW r/m16 ; 0F 01 /6 [286,PRIV]
|
|
</pre>
|
|
<p><code><nobr>LMSW</nobr></code> loads the bottom four bits of the source
|
|
operand into the bottom four bits of the <code><nobr>CR0</nobr></code>
|
|
control register (or the Machine Status Word, on 286 processors). See also
|
|
<code><nobr>SMSW</nobr></code> (<a href="#section-B.4.296">section
|
|
B.4.296</a>).
|
|
<h4><a name="section-B.4.140">B.4.140 <code><nobr>LOADALL</nobr></code>, <code><nobr>LOADALL286</nobr></code>: Load Processor State</a></h4>
|
|
<p><pre>
|
|
LOADALL ; 0F 07 [386,UNDOC]
|
|
LOADALL286 ; 0F 05 [286,UNDOC]
|
|
</pre>
|
|
<p>This instruction, in its two different-opcode forms, is apparently
|
|
supported on most 286 processors, some 386 and possibly some 486. The
|
|
opcode differs between the 286 and the 386.
|
|
<p>The function of the instruction is to load all information relating to
|
|
the state of the processor out of a block of memory: on the 286, this block
|
|
is located implicitly at absolute address <code><nobr>0x800</nobr></code>,
|
|
and on the 386 and 486 it is at <code><nobr>[ES:EDI]</nobr></code>.
|
|
<h4><a name="section-B.4.141">B.4.141 <code><nobr>LODSB</nobr></code>, <code><nobr>LODSW</nobr></code>, <code><nobr>LODSD</nobr></code>: Load from String</a></h4>
|
|
<p><pre>
|
|
LODSB ; AC [8086]
|
|
LODSW ; o16 AD [8086]
|
|
LODSD ; o32 AD [386]
|
|
</pre>
|
|
<p><code><nobr>LODSB</nobr></code> loads a byte from
|
|
<code><nobr>[DS:SI]</nobr></code> or <code><nobr>[DS:ESI]</nobr></code>
|
|
into <code><nobr>AL</nobr></code>. It then increments or decrements
|
|
(depending on the direction flag: increments if the flag is clear,
|
|
decrements if it is set) <code><nobr>SI</nobr></code> or
|
|
<code><nobr>ESI</nobr></code>.
|
|
<p>The register used is <code><nobr>SI</nobr></code> if the address size is
|
|
16 bits, and <code><nobr>ESI</nobr></code> if it is 32 bits. If you need to
|
|
use an address size not equal to the current <code><nobr>BITS</nobr></code>
|
|
setting, you can use an explicit <code><nobr>a16</nobr></code> or
|
|
<code><nobr>a32</nobr></code> prefix.
|
|
<p>The segment register used to load from <code><nobr>[SI]</nobr></code> or
|
|
<code><nobr>[ESI]</nobr></code> can be overridden by using a segment
|
|
register name as a prefix (for example,
|
|
<code><nobr>ES LODSB</nobr></code>).
|
|
<p><code><nobr>LODSW</nobr></code> and <code><nobr>LODSD</nobr></code> work
|
|
in the same way, but they load a word or a doubleword instead of a byte,
|
|
and increment or decrement the addressing registers by 2 or 4 instead of 1.
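<p>A minimal sketch of a typical <code><nobr>LODSB</nobr></code> loop
follows: it counts the bytes of a zero-terminated string. The labels
<code><nobr>strlen_sketch</nobr></code> and <code><nobr>msg</nobr></code>
are hypothetical, and the direction flag is cleared explicitly so that
<code><nobr>ESI</nobr></code> moves upwards.
<p><pre>
        BITS 32
strlen_sketch:
        CLD                     ; clear DF so LODSB increments ESI
        MOV     ESI,msg
        XOR     ECX,ECX         ; ECX counts the bytes seen so far
.next:  LODSB                   ; AL = [DS:ESI], then ESI = ESI + 1
        TEST    AL,AL
        JZ      .done           ; stop at the terminating zero byte
        INC     ECX
        JMP     .next
.done:  RET                     ; ECX = 5 for the string below

msg:    DB      'hello',0
</pre>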
|
|
<h4><a name="section-B.4.142">B.4.142 <code><nobr>LOOP</nobr></code>, <code><nobr>LOOPE</nobr></code>, <code><nobr>LOOPZ</nobr></code>, <code><nobr>LOOPNE</nobr></code>, <code><nobr>LOOPNZ</nobr></code>: Loop with Counter</a></h4>
|
|
<p><pre>
|
|
LOOP imm ; E2 rb [8086]
|
|
LOOP imm,CX ; a16 E2 rb [8086]
|
|
LOOP imm,ECX ; a32 E2 rb [386]
|
|
</pre>
|
|
<p><pre>
|
|
LOOPE imm ; E1 rb [8086]
|
|
LOOPE imm,CX ; a16 E1 rb [8086]
|
|
LOOPE imm,ECX ; a32 E1 rb [386]
|
|
LOOPZ imm ; E1 rb [8086]
|
|
LOOPZ imm,CX ; a16 E1 rb [8086]
|
|
LOOPZ imm,ECX ; a32 E1 rb [386]
|
|
</pre>
|
|
<p><pre>
|
|
LOOPNE imm ; E0 rb [8086]
|
|
LOOPNE imm,CX ; a16 E0 rb [8086]
|
|
LOOPNE imm,ECX ; a32 E0 rb [386]
|
|
LOOPNZ imm ; E0 rb [8086]
|
|
LOOPNZ imm,CX ; a16 E0 rb [8086]
|
|
LOOPNZ imm,ECX ; a32 E0 rb [386]
|
|
</pre>
|
|
<p><code><nobr>LOOP</nobr></code> decrements its counter register (either
|
|
<code><nobr>CX</nobr></code> or <code><nobr>ECX</nobr></code> - if one is
|
|
not specified explicitly, the <code><nobr>BITS</nobr></code> setting
|
|
dictates which is used) by one, and if the counter does not become zero as
|
|
a result of this operation, it jumps to the given label. The jump has a
|
|
range of -128 to +127 bytes, since the target is encoded as a signed 8-bit displacement.
|
|
<p><code><nobr>LOOPE</nobr></code> (or its synonym
|
|
<code><nobr>LOOPZ</nobr></code>) adds the additional condition that it only
|
|
jumps if the counter is nonzero <em>and</em> the zero flag is set.
|
|
Similarly, <code><nobr>LOOPNE</nobr></code> (and
|
|
<code><nobr>LOOPNZ</nobr></code>) jumps only if the counter is nonzero and
|
|
the zero flag is clear.
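<p>The fragment below is a small sketch of both the default and the
explicit counter forms; the labels are illustrative. Note that the counter
is decremented before it is tested, so entering a
<code><nobr>LOOP</nobr></code> with a counter of zero wraps it around
instead of skipping the loop. The <code><nobr>JECXZ</nobr></code> guard
shows one way to avoid that when the count is not known in advance.
<p><pre>
        BITS 32
        XOR     EAX,EAX
        MOV     ECX,10
        JECXZ   skip            ; guard against a zero count
top:    ADD     EAX,ECX         ; adds 10, 9, ..., 1
        LOOP    top             ; counter is ECX because of BITS 32
skip:                           ; EAX = 55 at this point

        MOV     CX,3
again:  NOP
        LOOP    again,CX        ; a16 prefix makes CX the counter
</pre>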
|
|
<h4><a name="section-B.4.143">B.4.143 <code><nobr>LSL</nobr></code>: Load Segment Limit</a></h4>
|
|
<p><pre>
|
|
LSL reg16,r/m16 ; o16 0F 03 /r [286,PRIV]
|
|
LSL reg32,r/m32 ; o32 0F 03 /r [286,PRIV]
|
|
</pre>
|
|
<p><code><nobr>LSL</nobr></code> is given a segment selector in its source
|
|
(second) operand; it computes the segment limit value by loading the
|
|
segment limit field from the associated segment descriptor in the
|
|
<code><nobr>GDT</nobr></code> or <code><nobr>LDT</nobr></code>. (This
|
|
involves shifting left by 12 bits if the segment limit is page-granular,
|
|
and not if it is byte-granular; so you end up with a byte limit in either
|
|
case.) The segment limit obtained is then loaded into the destination
|
|
(first) operand.
|
|
<h4><a name="section-B.4.144">B.4.144 <code><nobr>LTR</nobr></code>: Load Task Register</a></h4>
|
|
<p><pre>
|
|
LTR r/m16 ; 0F 00 /3 [286,PRIV]
|
|
</pre>
|
|
<p><code><nobr>LTR</nobr></code> looks up the segment base and limit in the
|
|
GDT or LDT descriptor specified by the segment selector given as its
|
|
operand, and loads them into the Task Register.
|
|
<h4><a name="section-B.4.145">B.4.145 <code><nobr>MASKMOVDQU</nobr></code>: Byte Mask Write</a></h4>
|
|
<p><pre>
|
|
MASKMOVDQU xmm1,xmm2 ; 66 0F F7 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MASKMOVDQU</nobr></code> stores data from xmm1 to the
|
|
location specified by <code><nobr>DS:(E)DI</nobr></code>. The size of the
|
|
store depends on the address-size attribute. The most significant bit in
|
|
each byte of the mask register xmm2 is used to selectively write the data
|
|
(0 = no write, 1 = write) on a per-byte basis.
|
|
<h4><a name="section-B.4.146">B.4.146 <code><nobr>MASKMOVQ</nobr></code>: Byte Mask Write</a></h4>
|
|
<p><pre>
|
|
MASKMOVQ mm1,mm2 ; 0F F7 /r [KATMAI,MMX]
|
|
</pre>
|
|
<p><code><nobr>MASKMOVQ</nobr></code> stores data from mm1 to the location
|
|
specified by <code><nobr>DS:(E)DI</nobr></code>. The size of the store
|
|
depends on the address-size attribute. The most significant bit in each
|
|
byte of the mask register mm2 is used to selectively write the data (0 = no
|
|
write, 1 = write) on a per-byte basis.
|
|
<h4><a name="section-B.4.147">B.4.147 <code><nobr>MAXPD</nobr></code>: Return Packed Double-Precision FP Maximum</a></h4>
|
|
<p><pre>
|
|
MAXPD xmm1,xmm2/m128 ; 66 0F 5F /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MAXPD</nobr></code> performs a SIMD compare of the packed
|
|
double-precision FP numbers from xmm1 and xmm2/mem, and stores the maximum
|
|
values of each pair of values in xmm1. If the values being compared are
|
|
both zeroes, source2 (xmm2/m128) is returned. If source2 (xmm2/m128)
|
|
is an SNaN, this SNaN is forwarded unchanged to the destination (i.e., a
|
|
QNaN version of the SNaN is not returned).
|
|
<h4><a name="section-B.4.148">B.4.148 <code><nobr>MAXPS</nobr></code>: Return Packed Single-Precision FP Maximum</a></h4>
|
|
<p><pre>
|
|
MAXPS xmm1,xmm2/m128 ; 0F 5F /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MAXPS</nobr></code> performs a SIMD compare of the packed
|
|
single-precision FP numbers from xmm1 and xmm2/mem, and stores the maximum
|
|
values of each pair of values in xmm1. If the values being compared are
|
|
both zeroes, source2 (xmm2/m128) is returned. If source2 (xmm2/m128)
|
|
is an SNaN, this SNaN is forwarded unchanged to the destination (i.e., a
|
|
QNaN version of the SNaN is not returned).
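<p>As a small sketch, <code><nobr>MAXPS</nobr></code> can clamp negative
values to zero. Because source2 is returned when both values are zeroes, a
lane holding -0.0 in <code><nobr>XMM0</nobr></code> comes out as the +0.0
taken from <code><nobr>XMM1</nobr></code>. The choice of registers is
arbitrary.
<p><pre>
        BITS 32
        XORPS   XMM1,XMM1       ; XMM1 = four +0.0 values
        MAXPS   XMM0,XMM1       ; each lane: XMM0 = max(XMM0, +0.0)
</pre>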
|
|
<h4><a name="section-B.4.149">B.4.149 <code><nobr>MAXSD</nobr></code>: Return Scalar Double-Precision FP Maximum</a></h4>
|
|
<p><pre>
|
|
MAXSD xmm1,xmm2/m64 ; F2 0F 5F /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MAXSD</nobr></code> compares the low-order double-precision
|
|
FP numbers from xmm1 and xmm2/mem, and stores the maximum value in xmm1. If
|
|
the values being compared are both zeroes, source2 (xmm2/m64) is
|
|
returned. If source2 (xmm2/m64) is an SNaN, this SNaN is forwarded
|
|
unchanged to the destination (i.e., a QNaN version of the SNaN is not
|
|
returned). The high quadword of the destination is left unchanged.
|
|
<h4><a name="section-B.4.150">B.4.150 <code><nobr>MAXSS</nobr></code>: Return Scalar Single-Precision FP Maximum</a></h4>
|
|
<p><pre>
|
|
MAXSS xmm1,xmm2/m32 ; F3 0F 5F /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MAXSS</nobr></code> compares the low-order single-precision
|
|
FP numbers from xmm1 and xmm2/mem, and stores the maximum value in xmm1. If
|
|
the values being compared are both zeroes, source2 (xmm2/m32) is
|
|
returned. If source2 (xmm2/m32) is an SNaN, this SNaN is forwarded
|
|
unchanged to the destination (i.e., a QNaN version of the SNaN is not
|
|
returned). The high three doublewords of the destination are left
|
|
unchanged.
|
|
<h4><a name="section-B.4.151">B.4.151 <code><nobr>MFENCE</nobr></code>: Memory Fence</a></h4>
|
|
<p><pre>
|
|
MFENCE ; 0F AE /6 [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MFENCE</nobr></code> performs a serialising operation on all
|
|
loads from memory and writes to memory that were issued before the
|
|
<code><nobr>MFENCE</nobr></code> instruction. This guarantees that all
|
|
memory reads and writes before the <code><nobr>MFENCE</nobr></code>
|
|
instruction are completed before any reads and writes after the
|
|
<code><nobr>MFENCE</nobr></code> instruction.
|
|
<p><code><nobr>MFENCE</nobr></code> is ordered with respect to other
|
|
<code><nobr>MFENCE</nobr></code> instructions,
|
|
<code><nobr>LFENCE</nobr></code>, <code><nobr>SFENCE</nobr></code>, any
|
|
memory read and any other serialising instruction (such as
|
|
<code><nobr>CPUID</nobr></code>).
|
|
<p>Weakly ordered memory types can be used to achieve higher processor
|
|
performance through such techniques as out-of-order issue, speculative
|
|
reads, write-combining, and write-collapsing. The degree to which a
|
|
consumer of data recognizes or knows that the data is weakly ordered varies
|
|
among applications and may be unknown to the producer of this data. The
|
|
<code><nobr>MFENCE</nobr></code> instruction provides a
|
|
performance-efficient way of ensuring load and store ordering between
|
|
routines that produce weakly-ordered results and routines that consume that
|
|
data.
|
|
<p><code><nobr>MFENCE</nobr></code> uses the following ModRM encoding:
|
|
<p><pre>
|
|
Mod (7:6) = 11B
|
|
Reg/Opcode (5:3) = 110B
|
|
R/M (2:0) = 000B
|
|
</pre>
|
|
<p>All other ModRM encodings are defined to be reserved, and use of these
|
|
encodings risks incompatibility with future processors.
|
|
<p>See also <code><nobr>LFENCE</nobr></code>
|
|
(<a href="#section-B.4.137">section B.4.137</a>) and
|
|
<code><nobr>SFENCE</nobr></code> (<a href="#section-B.4.288">section
|
|
B.4.288</a>).
|
|
<h4><a name="section-B.4.152">B.4.152 <code><nobr>MINPD</nobr></code>: Return Packed Double-Precision FP Minimum</a></h4>
|
|
<p><pre>
|
|
MINPD xmm1,xmm2/m128 ; 66 0F 5D /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MINPD</nobr></code> performs a SIMD compare of the packed
|
|
double-precision FP numbers from xmm1 and xmm2/mem, and stores the minimum
|
|
values of each pair of values in xmm1. If the values being compared are
|
|
both zeroes, source2 (xmm2/m128) is returned. If source2 (xmm2/m128)
|
|
is an SNaN, this SNaN is forwarded unchanged to the destination (i.e., a
|
|
QNaN version of the SNaN is not returned).
|
|
<h4><a name="section-B.4.153">B.4.153 <code><nobr>MINPS</nobr></code>: Return Packed Single-Precision FP Minimum</a></h4>
|
|
<p><pre>
|
|
MINPS xmm1,xmm2/m128 ; 0F 5D /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MINPS</nobr></code> performs a SIMD compare of the packed
|
|
single-precision FP numbers from xmm1 and xmm2/mem, and stores the minimum
|
|
values of each pair of values in xmm1. If the values being compared are
|
|
both zeroes, source2 (xmm2/m128) is returned. If source2 (xmm2/m128)
|
|
is an SNaN, this SNaN is forwarded unchanged to the destination (i.e., a
|
|
QNaN version of the SNaN is not returned).
|
|
<h4><a name="section-B.4.154">B.4.154 <code><nobr>MINSD</nobr></code>: Return Scalar Double-Precision FP Minimum</a></h4>
|
|
<p><pre>
|
|
MINSD xmm1,xmm2/m64 ; F2 0F 5D /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MINSD</nobr></code> compares the low-order double-precision
|
|
FP numbers from xmm1 and xmm2/mem, and stores the minimum value in xmm1. If
|
|
the values being compared are both zeroes, source2 (xmm2/m64) is
|
|
returned. If source2 (xmm2/m64) is an SNaN, this SNaN is forwarded
|
|
unchanged to the destination (i.e., a QNaN version of the SNaN is not
|
|
returned). The high quadword of the destination is left unchanged.
|
|
<h4><a name="section-B.4.155">B.4.155 <code><nobr>MINSS</nobr></code>: Return Scalar Single-Precision FP Minimum</a></h4>
|
|
<p><pre>
|
|
MINSS xmm1,xmm2/m32 ; F3 0F 5D /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MINSS</nobr></code> compares the low-order single-precision
|
|
FP numbers from xmm1 and xmm2/mem, and stores the minimum value in xmm1. If
|
|
the values being compared are both zeroes, source2 (xmm2/m32) is
|
|
returned. If source2 (xmm2/m32) is an SNaN, this SNaN is forwarded
|
|
unchanged to the destination (i.e., a QNaN version of the SNaN is not
|
|
returned). The high three doublewords of the destination are left
|
|
unchanged.
|
|
<h4><a name="section-B.4.156">B.4.156 <code><nobr>MOV</nobr></code>: Move Data</a></h4>
|
|
<p><pre>
|
|
MOV r/m8,reg8 ; 88 /r [8086]
|
|
MOV r/m16,reg16 ; o16 89 /r [8086]
|
|
MOV r/m32,reg32 ; o32 89 /r [386]
|
|
MOV reg8,r/m8 ; 8A /r [8086]
|
|
MOV reg16,r/m16 ; o16 8B /r [8086]
|
|
MOV reg32,r/m32 ; o32 8B /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
MOV reg8,imm8 ; B0+r ib [8086]
|
|
MOV reg16,imm16 ; o16 B8+r iw [8086]
|
|
MOV reg32,imm32 ; o32 B8+r id [386]
|
|
MOV r/m8,imm8 ; C6 /0 ib [8086]
|
|
MOV r/m16,imm16 ; o16 C7 /0 iw [8086]
|
|
MOV r/m32,imm32 ; o32 C7 /0 id [386]
|
|
</pre>
|
|
<p><pre>
|
|
MOV AL,memoffs8 ; A0 ow/od [8086]
|
|
MOV AX,memoffs16 ; o16 A1 ow/od [8086]
|
|
MOV EAX,memoffs32 ; o32 A1 ow/od [386]
|
|
MOV memoffs8,AL ; A2 ow/od [8086]
|
|
MOV memoffs16,AX ; o16 A3 ow/od [8086]
|
|
MOV memoffs32,EAX ; o32 A3 ow/od [386]
|
|
</pre>
|
|
<p><pre>
|
|
MOV r/m16,segreg ; o16 8C /r [8086]
|
|
MOV r/m32,segreg ; o32 8C /r [386]
|
|
MOV segreg,r/m16 ; o16 8E /r [8086]
|
|
MOV segreg,r/m32 ; o32 8E /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
MOV reg32,CR0/2/3/4 ; 0F 20 /r [386]
|
|
MOV reg32,DR0/1/2/3/6/7 ; 0F 21 /r [386]
|
|
MOV reg32,TR3/4/5/6/7 ; 0F 24 /r [386]
|
|
MOV CR0/2/3/4,reg32 ; 0F 22 /r [386]
|
|
MOV DR0/1/2/3/6/7,reg32 ; 0F 23 /r [386]
|
|
MOV TR3/4/5/6/7,reg32 ; 0F 26 /r [386]
|
|
</pre>
|
|
<p><code><nobr>MOV</nobr></code> copies the contents of its source (second)
|
|
operand into its destination (first) operand.
|
|
<p>In all forms of the <code><nobr>MOV</nobr></code> instruction, the two
|
|
operands are the same size, except for moving between a segment register
|
|
and an <code><nobr>r/m32</nobr></code> operand. These instructions are
|
|
treated exactly like the corresponding 16-bit equivalent (so that, for
|
|
example, <code><nobr>MOV DS,EAX</nobr></code> functions identically to
|
|
<code><nobr>MOV DS,AX</nobr></code> but saves a prefix when in 32-bit
|
|
mode), except that when a segment register is moved into a 32-bit
|
|
destination, the top two bytes of the result are undefined.
|
|
<p><code><nobr>MOV</nobr></code> may not use <code><nobr>CS</nobr></code>
|
|
as a destination.
|
|
<p><code><nobr>CR4</nobr></code> is only a supported register on the
|
|
Pentium and above.
|
|
<p>Test registers are supported on 386/486 processors and on some non-Intel
|
|
Pentium class processors.
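<p>The segment register forms can be sketched as follows. None of the forms
listed above takes an immediate as the source for a segment register, so
the value goes through a general purpose register first. The selector
value <code><nobr>0x10</nobr></code> is an assumption made for the
example.
<p><pre>
        BITS 32
        MOV     EAX,0x10        ; hypothetical selector value
        MOV     DS,EAX          ; treated like MOV DS,AX
        MOV     EBX,DS          ; low 16 bits of EBX = DS
        AND     EBX,0xFFFF      ; discard the undefined upper 16 bits
</pre>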
|
|
<h4><a name="section-B.4.157">B.4.157 <code><nobr>MOVAPD</nobr></code>: Move Aligned Packed Double-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
MOVAPD xmm1,xmm2/mem128 ; 66 0F 28 /r [WILLAMETTE,SSE2]
|
|
MOVAPD xmm1/mem128,xmm2 ; 66 0F 29 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVAPD</nobr></code> moves a double quadword containing 2
|
|
packed double-precision FP values from the source operand to the
|
|
destination. When the source or destination operand is a memory location,
|
|
it must be aligned on a 16-byte boundary.
|
|
<p>To move data in and out of memory locations that are not known to be on
|
|
16-byte boundaries, use the <code><nobr>MOVUPD</nobr></code> instruction
|
|
(<a href="#section-B.4.182">section B.4.182</a>).
|
|
<h4><a name="section-B.4.158">B.4.158 <code><nobr>MOVAPS</nobr></code>: Move Aligned Packed Single-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
MOVAPS xmm1,xmm2/mem128 ; 0F 28 /r [KATMAI,SSE]
|
|
MOVAPS xmm1/mem128,xmm2 ; 0F 29 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MOVAPS</nobr></code> moves a double quadword containing 4
|
|
packed single-precision FP values from the source operand to the
|
|
destination. When the source or destination operand is a memory location,
|
|
it must be aligned on a 16-byte boundary.
|
|
<p>To move data in and out of memory locations that are not known to be on
|
|
16-byte boundaries, use the <code><nobr>MOVUPS</nobr></code> instruction
|
|
(<a href="#section-B.4.183">section B.4.183</a>).
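<p>The alignment requirement can be illustrated with the sketch below. The
label <code><nobr>vec</nobr></code> is hypothetical, and the example
assumes an output format that honours the
<code><nobr>align=16</nobr></code> section qualifier.
<p><pre>
        BITS 32
        SECTION .data align=16
vec:    DD      1.0, 2.0, 3.0, 4.0, 5.0

        SECTION .text
        MOVAPS  XMM0,[vec]      ; vec lies on a 16-byte boundary
        MOVUPS  XMM1,[vec+4]    ; vec+4 does not, so MOVUPS is used
</pre>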
|
|
<h4><a name="section-B.4.159">B.4.159 <code><nobr>MOVD</nobr></code>: Move Doubleword to/from MMX Register</a></h4>
|
|
<p><pre>
|
|
MOVD mm,r/m32 ; 0F 6E /r [PENT,MMX]
|
|
MOVD r/m32,mm ; 0F 7E /r [PENT,MMX]
|
|
MOVD xmm,r/m32 ; 66 0F 6E /r [WILLAMETTE,SSE2]
|
|
MOVD r/m32,xmm ; 66 0F 7E /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVD</nobr></code> copies 32 bits from its source (second)
|
|
operand into its destination (first) operand. When the destination is a
|
|
64-bit <code><nobr>MMX</nobr></code> register or a 128-bit
|
|
<code><nobr>XMM</nobr></code> register, the input value is zero-extended to
|
|
fill the destination register.
|
|
<h4><a name="section-B.4.160">B.4.160 <code><nobr>MOVDQ2Q</nobr></code>: Move Quadword from XMM to MMX register.</a></h4>
|
|
<p><pre>
|
|
MOVDQ2Q mm,xmm ; F2 0F D6 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVDQ2Q</nobr></code> moves the low quadword from the source
|
|
operand to the destination operand.
|
|
<h4><a name="section-B.4.161">B.4.161 <code><nobr>MOVDQA</nobr></code>: Move Aligned Double Quadword</a></h4>
|
|
<p><pre>
|
|
MOVDQA xmm1,xmm2/m128 ; 66 0F 6F /r [WILLAMETTE,SSE2]
|
|
MOVDQA xmm1/m128,xmm2 ; 66 0F 7F /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVDQA</nobr></code> moves a double quadword from the source
|
|
operand to the destination operand. When the source or destination operand
|
|
is a memory location, it must be aligned to a 16-byte boundary.
|
|
<p>To move a double quadword to or from unaligned memory locations, use the
|
|
<code><nobr>MOVDQU</nobr></code> instruction
|
|
(<a href="#section-B.4.162">section B.4.162</a>).
|
|
<h4><a name="section-B.4.162">B.4.162 <code><nobr>MOVDQU</nobr></code>: Move Unaligned Double Quadword</a></h4>
|
|
<p><pre>
|
|
MOVDQU xmm1,xmm2/m128 ; F3 0F 6F /r [WILLAMETTE,SSE2]
|
|
MOVDQU xmm1/m128,xmm2 ; F3 0F 7F /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVDQU</nobr></code> moves a double quadword from the source
|
|
operand to the destination operand. When the source or destination operand
|
|
is a memory location, the memory may be unaligned.
|
|
<p>To move a double quadword to or from known aligned memory locations, use
|
|
the <code><nobr>MOVDQA</nobr></code> instruction
|
|
(<a href="#section-B.4.161">section B.4.161</a>).
|
|
<h4><a name="section-B.4.163">B.4.163 <code><nobr>MOVHLPS</nobr></code>: Move Packed Single-Precision FP High to Low</a></h4>
|
|
<p><pre>
|
|
MOVHLPS xmm1,xmm2 ; 0F 12 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MOVHLPS</nobr></code> moves the two packed single-precision
|
|
FP values from the high quadword of the source register xmm2 to the low
|
|
quadword of the destination register, xmm1. The upper quadword of xmm1 is
|
|
left unchanged.
|
|
<p>The operation of this instruction is:
|
|
<p><pre>
|
|
dst[0-63] := src[64-127],
|
|
dst[64-127] remains unchanged.
|
|
</pre>
|
|
<h4><a name="section-B.4.164">B.4.164 <code><nobr>MOVHPD</nobr></code>: Move High Packed Double-Precision FP</a></h4>
|
|
<p><pre>
|
|
MOVHPD xmm,m64 ; 66 0F 16 /r [WILLAMETTE,SSE2]
|
|
MOVHPD m64,xmm ; 66 0F 17 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVHPD</nobr></code> moves a double-precision FP value
|
|
between the source and destination operands. One of the operands is a
|
|
64-bit memory location, the other is the high quadword of an
|
|
<code><nobr>XMM</nobr></code> register.
|
|
<p>The operation of this instruction is:
|
|
<p><pre>
|
|
mem[0-63] := xmm[64-127];
|
|
</pre>
|
|
<p>or
|
|
<p><pre>
|
|
xmm[0-63] remains unchanged;
|
|
xmm[64-127] := mem[0-63].
|
|
</pre>
|
|
<h4><a name="section-B.4.165">B.4.165 <code><nobr>MOVHPS</nobr></code>: Move High Packed Single-Precision FP</a></h4>
|
|
<p><pre>
|
|
MOVHPS xmm,m64 ; 0F 16 /r [KATMAI,SSE]
|
|
MOVHPS m64,xmm ; 0F 17 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MOVHPS</nobr></code> moves two packed single-precision FP
|
|
values between the source and destination operands. One of the operands is
|
|
a 64-bit memory location, the other is the high quadword of an
|
|
<code><nobr>XMM</nobr></code> register.
|
|
<p>The operation of this instruction is:
|
|
<p><pre>
|
|
mem[0-63] := xmm[64-127];
|
|
</pre>
|
|
<p>or
|
|
<p><pre>
|
|
xmm[0-63] remains unchanged;
|
|
xmm[64-127] := mem[0-63].
|
|
</pre>
|
|
<h4><a name="section-B.4.166">B.4.166 <code><nobr>MOVLHPS</nobr></code>: Move Packed Single-Precision FP Low to High</a></h4>
|
|
<p><pre>
|
|
MOVLHPS xmm1,xmm2 ; 0F 16 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MOVLHPS</nobr></code> moves the two packed single-precision
|
|
FP values from the low quadword of the source register xmm2 to the high
|
|
quadword of the destination register, xmm1. The low quadword of xmm1 is
|
|
left unchanged.
|
|
<p>The operation of this instruction is:
|
|
<p><pre>
|
|
dst[0-63] remains unchanged;
|
|
dst[64-127] := src[0-63].
|
|
</pre>
|
|
<h4><a name="section-B.4.167">B.4.167 <code><nobr>MOVLPD</nobr></code>: Move Low Packed Double-Precision FP</a></h4>
|
|
<p><pre>
|
|
MOVLPD xmm,m64 ; 66 0F 12 /r [WILLAMETTE,SSE2]
|
|
MOVLPD m64,xmm ; 66 0F 13 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVLPD</nobr></code> moves a double-precision FP value
|
|
between the source and destination operands. One of the operands is a
|
|
64-bit memory location, the other is the low quadword of an
|
|
<code><nobr>XMM</nobr></code> register.
|
|
<p>The operation of this instruction is:
|
|
<p><pre>
|
|
mem(0-63) := xmm(0-63);
|
|
</pre>
|
|
<p>or
|
|
<p><pre>
|
|
xmm(0-63) := mem(0-63);
|
|
xmm(64-127) remains unchanged.
|
|
</pre>
|
|
<h4><a name="section-B.4.168">B.4.168 <code><nobr>MOVLPS</nobr></code>: Move Low Packed Single-Precision FP</a></h4>
|
|
<p><pre>
|
|
MOVLPS xmm,m64 ; 0F 12 /r [KATMAI,SSE]
|
|
MOVLPS m64,xmm ; 0F 13 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MOVLPS</nobr></code> moves two packed single-precision FP
|
|
values between the source and destination operands. One of the operands is
|
|
a 64-bit memory location, the other is the low quadword of an
|
|
<code><nobr>XMM</nobr></code> register.
|
|
<p>The operation of this instruction is:
|
|
<p><pre>
|
|
mem(0-63) := xmm(0-63);
|
|
</pre>
|
|
<p>or
|
|
<p><pre>
|
|
xmm(0-63) := mem(0-63);
|
|
xmm(64-127) remains unchanged.
|
|
</pre>
|
|
<h4><a name="section-B.4.169">B.4.169 <code><nobr>MOVMSKPD</nobr></code>: Extract Packed Double-Precision FP Sign Mask</a></h4>
|
|
<p><pre>
|
|
MOVMSKPD reg32,xmm ; 66 0F 50 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVMSKPD</nobr></code> inserts a 2-bit mask in r32, formed
|
|
of the most significant bits of each double-precision FP number of the
|
|
source operand.
|
|
<h4><a name="section-B.4.170">B.4.170 <code><nobr>MOVMSKPS</nobr></code>: Extract Packed Single-Precision FP Sign Mask</a></h4>
|
|
<p><pre>
|
|
MOVMSKPS reg32,xmm ; 0F 50 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MOVMSKPS</nobr></code> inserts a 4-bit mask in r32, formed
|
|
of the most significant bits of each single-precision FP number of the
|
|
source operand.
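<p>A common use, sketched below, is to branch on the signs of all four
elements at once. The label <code><nobr>has_sign</nobr></code> is
illustrative. Note that the sign bit is also set for -0.0, so this tests
the sign bits themselves and not whether a value is less than zero.
<p><pre>
        BITS 32
        MOVMSKPS EAX,XMM0       ; EAX bits 3..0 = sign bits of the four floats
        TEST    EAX,EAX
        JNZ     has_sign        ; at least one element has its sign bit set
        NOP                     ; all sign bits clear: handled here
has_sign:
</pre>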
|
|
<h4><a name="section-B.4.171">B.4.171 <code><nobr>MOVNTDQ</nobr></code>: Move Double Quadword Non Temporal</a></h4>
|
|
<p><pre>
|
|
MOVNTDQ m128,xmm ; 66 0F E7 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVNTDQ</nobr></code> moves the double quadword from the
|
|
<code><nobr>XMM</nobr></code> source register to the destination memory
|
|
location, using a non-temporal hint. This store instruction minimizes cache
|
|
pollution.
|
|
<h4><a name="section-B.4.172">B.4.172 <code><nobr>MOVNTI</nobr></code>: Move Doubleword Non Temporal</a></h4>
|
|
<p><pre>
|
|
MOVNTI m32,reg32 ; 0F C3 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVNTI</nobr></code> moves the doubleword in the source
|
|
register to the destination memory location, using a non-temporal hint.
|
|
This store instruction minimizes cache pollution.
|
|
<h4><a name="section-B.4.173">B.4.173 <code><nobr>MOVNTPD</nobr></code>: Move Aligned Two Packed Double-Precision FP Values Non Temporal</a></h4>
|
|
<p><pre>
|
|
MOVNTPD m128,xmm ; 66 0F 2B /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVNTPD</nobr></code> moves the double quadword from the
|
|
<code><nobr>XMM</nobr></code> source register to the destination memory
|
|
location, using a non-temporal hint. This store instruction minimizes cache
|
|
pollution. The memory location must be aligned to a 16-byte boundary.
|
|
<h4><a name="section-B.4.174">B.4.174 <code><nobr>MOVNTPS</nobr></code>: Move Aligned Four Packed Single-Precision FP Values Non Temporal</a></h4>
|
|
<p><pre>
|
|
MOVNTPS m128,xmm ; 0F 2B /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MOVNTPS</nobr></code> moves the double quadword from the
|
|
<code><nobr>XMM</nobr></code> source register to the destination memory
|
|
location, using a non-temporal hint. This store instruction minimizes cache
|
|
pollution. The memory location must be aligned to a 16-byte boundary.
|
|
<h4><a name="section-B.4.175">B.4.175 <code><nobr>MOVNTQ</nobr></code>: Move Quadword Non Temporal</a></h4>
|
|
<p><pre>
|
|
MOVNTQ m64,mm ; 0F E7 /r [KATMAI,MMX]
|
|
</pre>
|
|
<p><code><nobr>MOVNTQ</nobr></code> moves the quadword in the
|
|
<code><nobr>MMX</nobr></code> source register to the destination memory
|
|
location, using a non-temporal hint. This store instruction minimizes cache
|
|
pollution.
|
|
<h4><a name="section-B.4.176">B.4.176 <code><nobr>MOVQ</nobr></code>: Move Quadword to/from MMX Register</a></h4>
|
|
<p><pre>
|
|
MOVQ mm1,mm2/m64 ; 0F 6F /r [PENT,MMX]
|
|
MOVQ mm1/m64,mm2 ; 0F 7F /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
MOVQ xmm1,xmm2/m64 ; F3 0F 7E /r [WILLAMETTE,SSE2]
|
|
MOVQ xmm1/m64,xmm2 ; 66 0F D6 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVQ</nobr></code> copies 64 bits from its source (second)
|
|
operand into its destination (first) operand. When the source is an
|
|
<code><nobr>XMM</nobr></code> register, the low quadword is moved. When the
|
|
destination is an <code><nobr>XMM</nobr></code> register, the destination
|
|
is the low quadword, and the high quadword is cleared.
|
|
<h4><a name="section-B.4.177">B.4.177 <code><nobr>MOVQ2DQ</nobr></code>: Move Quadword from MMX to XMM register.</a></h4>
|
|
<p><pre>
|
|
MOVQ2DQ xmm,mm ; F3 0F D6 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVQ2DQ</nobr></code> moves the quadword from the source
|
|
operand to the low quadword of the destination operand, and clears the high
|
|
quadword.
|
|
<h4><a name="section-B.4.178">B.4.178 <code><nobr>MOVSB</nobr></code>, <code><nobr>MOVSW</nobr></code>, <code><nobr>MOVSD</nobr></code>: Move String</a></h4>
|
|
<p><pre>
|
|
MOVSB ; A4 [8086]
|
|
MOVSW ; o16 A5 [8086]
|
|
MOVSD ; o32 A5 [386]
|
|
</pre>
|
|
<p><code><nobr>MOVSB</nobr></code> copies the byte at
|
|
<code><nobr>[DS:SI]</nobr></code> or <code><nobr>[DS:ESI]</nobr></code> to
|
|
<code><nobr>[ES:DI]</nobr></code> or <code><nobr>[ES:EDI]</nobr></code>. It
|
|
then increments or decrements (depending on the direction flag: increments
|
|
if the flag is clear, decrements if it is set) <code><nobr>SI</nobr></code>
|
|
and <code><nobr>DI</nobr></code> (or <code><nobr>ESI</nobr></code> and
|
|
<code><nobr>EDI</nobr></code>).
|
|
<p>The registers used are <code><nobr>SI</nobr></code> and
|
|
<code><nobr>DI</nobr></code> if the address size is 16 bits, and
|
|
<code><nobr>ESI</nobr></code> and <code><nobr>EDI</nobr></code> if it is 32
|
|
bits. If you need to use an address size not equal to the current
|
|
<code><nobr>BITS</nobr></code> setting, you can use an explicit
|
|
<code><nobr>a16</nobr></code> or <code><nobr>a32</nobr></code> prefix.
|
|
<p>The segment register used to load from <code><nobr>[SI]</nobr></code> or
|
|
<code><nobr>[ESI]</nobr></code> can be overridden by using a segment
|
|
register name as a prefix (for example,
|
|
<code><nobr>es movsb</nobr></code>). The use of
|
|
<code><nobr>ES</nobr></code> for the store to
|
|
<code><nobr>[DI]</nobr></code> or <code><nobr>[EDI]</nobr></code> cannot be
|
|
overridden.
|
|
<p><code><nobr>MOVSW</nobr></code> and <code><nobr>MOVSD</nobr></code> work
|
|
in the same way, but they copy a word or a doubleword instead of a byte,
|
|
and increment or decrement the addressing registers by 2 or 4 instead of 1.
|
|
<p>The <code><nobr>REP</nobr></code> prefix may be used to repeat the
|
|
instruction <code><nobr>CX</nobr></code> (or <code><nobr>ECX</nobr></code>
|
|
- again, the address size chooses which) times.
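<p>The combined effect of <code><nobr>REP MOVSB</nobr></code> and the
direction flag can be sketched in C. This is a simplified model, not NASM
syntax: the name <code><nobr>rep_movsb_model</nobr></code>, the flat
<code><nobr>mem</nobr></code> array, and the parameter names are
hypothetical, and segment registers and address-size wraparound are not
modelled.
<p><pre>
/* Hypothetical C model of REP MOVSB on a flat byte array "mem".
   si, di: starting offsets; count: value of CX/ECX;
   df: direction flag (0 = increment, 1 = decrement). */
void rep_movsb_model(unsigned char *mem, unsigned int si, unsigned int di,
                     unsigned int count, int df)
{
    while (count != 0) {
        mem[di] = mem[si];          /* copy one byte, as MOVSB does */
        if (df) { si--; di--; }     /* DF set: step downwards */
        else    { si++; di++; }     /* DF clear: step upwards */
        count--;                    /* REP decrements the counter */
    }
}
</pre>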
|
|
<h4><a name="section-B.4.179">B.4.179 <code><nobr>MOVSD</nobr></code>: Move Scalar Double-Precision FP Value</a></h4>
|
|
<p><pre>
|
|
MOVSD xmm1,xmm2/m64 ; F2 0F 10 /r [WILLAMETTE,SSE2]
|
|
MOVSD xmm1/m64,xmm2 ; F2 0F 11 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVSD</nobr></code> moves a double-precision FP value from
|
|
the source operand to the destination operand. When the source or
|
|
destination is a register, the low-order FP value is read or written.
|
|
<h4><a name="section-B.4.180">B.4.180 <code><nobr>MOVSS</nobr></code>: Move Scalar Single-Precision FP Value</a></h4>
|
|
<p><pre>
|
|
MOVSS xmm1,xmm2/m32 ; F3 0F 10 /r [KATMAI,SSE]
|
|
MOVSS xmm1/m32,xmm2 ; F3 0F 11 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MOVSS</nobr></code> moves a single-precision FP value from
|
|
the source operand to the destination operand. When the source or
|
|
destination is a register, the low-order FP value is read or written.
|
|
<h4><a name="section-B.4.181">B.4.181 <code><nobr>MOVSX</nobr></code>, <code><nobr>MOVZX</nobr></code>: Move Data with Sign or Zero Extend</a></h4>
|
|
<p><pre>
|
|
MOVSX reg16,r/m8 ; o16 0F BE /r [386]
|
|
MOVSX reg32,r/m8 ; o32 0F BE /r [386]
|
|
MOVSX reg32,r/m16 ; o32 0F BF /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
MOVZX reg16,r/m8 ; o16 0F B6 /r [386]
|
|
MOVZX reg32,r/m8 ; o32 0F B6 /r [386]
|
|
MOVZX reg32,r/m16 ; o32 0F B7 /r [386]
|
|
</pre>
|
|
<p><code><nobr>MOVSX</nobr></code> sign-extends its source (second) operand
|
|
to the length of its destination (first) operand, and copies the result
|
|
into the destination operand. <code><nobr>MOVZX</nobr></code> does the
|
|
same, but zero-extends rather than sign-extending.
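<p>The difference between the two extensions can be sketched in C for the
<code><nobr>reg32,r/m8</nobr></code> forms. The function names are
hypothetical, the code is a model rather than NASM syntax, and
<code><nobr>unsigned int</nobr></code> is assumed to be 32 bits wide.
<p><pre>
/* Hypothetical C models of MOVZX reg32,r/m8 and MOVSX reg32,r/m8.
   Both return the 32-bit destination value as an unsigned bit pattern. */
unsigned int movzx8_model(unsigned char src)
{
    return (unsigned int)src;               /* upper 24 bits become 0 */
}

unsigned int movsx8_model(unsigned char src)
{
    if (src >= 0x80)                        /* bit 7 set: negative */
        return 0xFFFFFF00u | src;           /* upper 24 bits become 1 */
    return (unsigned int)src;               /* otherwise same as MOVZX */
}
</pre>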
|
|
<h4><a name="section-B.4.182">B.4.182 <code><nobr>MOVUPD</nobr></code>: Move Unaligned Packed Double-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
MOVUPD xmm1,xmm2/mem128 ; 66 0F 10 /r [WILLAMETTE,SSE2]
|
|
MOVUPD xmm1/mem128,xmm2 ; 66 0F 11 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MOVUPD</nobr></code> moves a double quadword containing 2
|
|
packed double-precision FP values from the source operand to the
|
|
destination. This instruction makes no assumptions about alignment of
|
|
memory operands.
|
|
<p>To move data in and out of memory locations that are known to be on
|
|
16-byte boundaries, use the <code><nobr>MOVAPD</nobr></code> instruction
|
|
(<a href="#section-B.4.157">section B.4.157</a>).
|
|
<h4><a name="section-B.4.183">B.4.183 <code><nobr>MOVUPS</nobr></code>: Move Unaligned Packed Single-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
MOVUPS xmm1,xmm2/mem128 ; 0F 10 /r [KATMAI,SSE]
|
|
MOVUPS xmm1/mem128,xmm2 ; 0F 11 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MOVUPS</nobr></code> moves a double quadword containing 4
|
|
packed single-precision FP values from the source operand to the
|
|
destination. This instruction makes no assumptions about alignment of
|
|
memory operands.
|
|
<p>To move data in and out of memory locations that are known to be on
|
|
16-byte boundaries, use the <code><nobr>MOVAPS</nobr></code> instruction
|
|
(<a href="#section-B.4.158">section B.4.158</a>).
|
|
<h4><a name="section-B.4.184">B.4.184 <code><nobr>MUL</nobr></code>: Unsigned Integer Multiply</a></h4>
|
|
<p><pre>
|
|
MUL r/m8 ; F6 /4 [8086]
|
|
MUL r/m16 ; o16 F7 /4 [8086]
|
|
MUL r/m32 ; o32 F7 /4 [386]
|
|
</pre>
|
|
<p><code><nobr>MUL</nobr></code> performs unsigned integer multiplication.
|
|
The other operand to the multiplication, and the destination operand, are
|
|
implicit, in the following way:
|
|
<ul>
|
|
<li>For <code><nobr>MUL r/m8</nobr></code>, <code><nobr>AL</nobr></code> is
|
|
multiplied by the given operand; the product is stored in
|
|
<code><nobr>AX</nobr></code>.
|
|
<li>For <code><nobr>MUL r/m16</nobr></code>, <code><nobr>AX</nobr></code>
|
|
is multiplied by the given operand; the product is stored in
|
|
<code><nobr>DX:AX</nobr></code>.
|
|
<li>For <code><nobr>MUL r/m32</nobr></code>, <code><nobr>EAX</nobr></code>
|
|
is multiplied by the given operand; the product is stored in
|
|
<code><nobr>EDX:EAX</nobr></code>.
|
|
</ul>
|
|
<p>Signed integer multiplication is performed by the
|
|
<code><nobr>IMUL</nobr></code> instruction: see
|
|
<a href="#section-B.4.118">section B.4.118</a>.
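<p>As a sketch of the implicit operands, the 32-bit form can be modelled in
C as follows. The function and parameter names are hypothetical, flags are
not modelled, and the code assumes a 32-bit
<code><nobr>unsigned int</nobr></code> and a 64-bit
<code><nobr>unsigned long long</nobr></code>.
<p><pre>
/* Hypothetical C model of MUL r/m32: EAX times the operand, with the
   64-bit product split across EDX (high half) and EAX (low half). */
void mul32_model(unsigned int *eax, unsigned int *edx, unsigned int operand)
{
    unsigned long long product = (unsigned long long)*eax * operand;

    *eax = (unsigned int)product;           /* low 32 bits */
    *edx = (unsigned int)(product >> 32);   /* high 32 bits */
}
</pre>
<p>The 8-bit and 16-bit forms follow the same pattern, with the product
landing in <code><nobr>AX</nobr></code> and
<code><nobr>DX:AX</nobr></code> respectively.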
|
|
<h4><a name="section-B.4.185">B.4.185 <code><nobr>MULPD</nobr></code>: Packed Double-FP Multiply</a></h4>
|
|
<p><pre>
|
|
MULPD xmm1,xmm2/mem128 ; 66 0F 59 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MULPD</nobr></code> performs a SIMD multiply of the packed
|
|
double-precision FP values in both operands, and stores the results in the
|
|
destination register.
|
|
<h4><a name="section-B.4.186">B.4.186 <code><nobr>MULPS</nobr></code>: Packed Single-FP Multiply</a></h4>
|
|
<p><pre>
|
|
MULPS xmm1,xmm2/mem128 ; 0F 59 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MULPS</nobr></code> performs a SIMD multiply of the packed
|
|
single-precision FP values in both operands, and stores the results in the
|
|
destination register.
|
|
<h4><a name="section-B.4.187">B.4.187 <code><nobr>MULSD</nobr></code>: Scalar Double-FP Multiply</a></h4>
|
|
<p><pre>
|
|
MULSD xmm1,xmm2/mem64 ; F2 0F 59 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>MULSD</nobr></code> multiplies the lowest double-precision
|
|
FP values of both operands, and stores the result in the low quadword of
|
|
xmm1.
|
|
<h4><a name="section-B.4.188">B.4.188 <code><nobr>MULSS</nobr></code>: Scalar Single-FP Multiply</a></h4>
|
|
<p><pre>
|
|
MULSS xmm1,xmm2/mem32 ; F3 0F 59 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>MULSS</nobr></code> multiplies the lowest single-precision
|
|
FP values of both operands, and stores the result in the low doubleword of
|
|
xmm1.
|
|
<h4><a name="section-B.4.189">B.4.189 <code><nobr>NEG</nobr></code>, <code><nobr>NOT</nobr></code>: Two's and One's Complement</a></h4>
|
|
<p><pre>
|
|
NEG r/m8 ; F6 /3 [8086]
|
|
NEG r/m16 ; o16 F7 /3 [8086]
|
|
NEG r/m32 ; o32 F7 /3 [386]
|
|
</pre>
|
|
<p><pre>
|
|
NOT r/m8 ; F6 /2 [8086]
|
|
NOT r/m16 ; o16 F7 /2 [8086]
|
|
NOT r/m32 ; o32 F7 /2 [386]
|
|
</pre>
|
|
<p><code><nobr>NEG</nobr></code> replaces the contents of its operand by
|
|
the two's complement negation (invert all the bits and then add one) of the
|
|
original value. <code><nobr>NOT</nobr></code>, similarly, performs one's
|
|
complement (inverts all the bits).
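<p>For the 8-bit forms, the two operations can be sketched in C as shown
here. The function names are hypothetical, and the effect on the flags is
not modelled.
<p><pre>
/* Hypothetical C models of NOT r/m8 and NEG r/m8. */
unsigned char not8_model(unsigned char x)
{
    return (unsigned char)~x;               /* invert all eight bits */
}

unsigned char neg8_model(unsigned char x)
{
    return (unsigned char)(~x + 1);         /* invert, then add one */
}
</pre>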
|
|
<h4><a name="section-B.4.190">B.4.190 <code><nobr>NOP</nobr></code>: No Operation</a></h4>
|
|
<p><pre>
|
|
NOP ; 90 [8086]
|
|
</pre>
|
|
<p><code><nobr>NOP</nobr></code> performs no operation. Its opcode is the
|
|
same as that generated by <code><nobr>XCHG AX,AX</nobr></code> or
|
|
<code><nobr>XCHG EAX,EAX</nobr></code> (depending on the processor mode;
|
|
see <a href="#section-B.4.333">section B.4.333</a>).
|
|
<h4><a name="section-B.4.191">B.4.191 <code><nobr>OR</nobr></code>: Bitwise OR</a></h4>
|
|
<p><pre>
|
|
OR r/m8,reg8 ; 08 /r [8086]
|
|
OR r/m16,reg16 ; o16 09 /r [8086]
|
|
OR r/m32,reg32 ; o32 09 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
OR reg8,r/m8 ; 0A /r [8086]
|
|
OR reg16,r/m16 ; o16 0B /r [8086]
|
|
OR reg32,r/m32 ; o32 0B /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
OR r/m8,imm8 ; 80 /1 ib [8086]
|
|
OR r/m16,imm16 ; o16 81 /1 iw [8086]
|
|
OR r/m32,imm32 ; o32 81 /1 id [386]
|
|
</pre>
|
|
<p><pre>
|
|
OR r/m16,imm8 ; o16 83 /1 ib [8086]
|
|
OR r/m32,imm8 ; o32 83 /1 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
OR AL,imm8 ; 0C ib [8086]
|
|
OR AX,imm16 ; o16 0D iw [8086]
|
|
OR EAX,imm32 ; o32 0D id [386]
|
|
</pre>
|
|
<p><code><nobr>OR</nobr></code> performs a bitwise OR operation between its
|
|
two operands (i.e. each bit of the result is 1 if and only if at least one
|
|
of the corresponding bits of the two inputs was 1), and stores the result
|
|
in the destination (first) operand.
|
|
<p>In the forms with an 8-bit immediate second operand and a longer first
|
|
operand, the second operand is considered to be signed, and is
|
|
sign-extended to the length of the first operand. In these cases, the
|
|
<code><nobr>BYTE</nobr></code> qualifier is necessary to force NASM to
|
|
generate this form of the instruction.
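<p>The effect of the sign extension is easy to overlook, so here is a small
C model of <code><nobr>OR r/m32,imm8</nobr></code>. The function name is
hypothetical, flags are not modelled, and
<code><nobr>unsigned int</nobr></code> is assumed to be 32 bits wide.
<p><pre>
/* Hypothetical C model of OR r/m32,imm8: the 8-bit immediate is
   sign-extended to 32 bits before the OR is performed. */
unsigned int or32_imm8_model(unsigned int dst, unsigned char imm8)
{
    unsigned int ext = imm8;

    if (imm8 >= 0x80)                       /* bit 7 set: negative */
        ext = 0xFFFFFF00u | imm8;           /* fill the upper 24 bits */
    return dst | ext;
}
</pre>
<p>In this model an immediate of <code><nobr>80h</nobr></code> sets the
upper 24 bits of the destination as well as bit 7.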
|
|
<p>The MMX instruction <code><nobr>POR</nobr></code> (see
|
|
<a href="#section-B.4.247">section B.4.247</a>) performs the same operation
|
|
on the 64-bit MMX registers.
|
|
<h4><a name="section-B.4.192">B.4.192 <code><nobr>ORPD</nobr></code>: Bit-wise Logical OR of Double-Precision FP Data</a></h4>
|
|
<p><pre>
|
|
ORPD xmm1,xmm2/m128 ; 66 0F 56 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>ORPD</nobr></code> performs a bit-wise logical OR between xmm1
|
|
and xmm2/mem, and stores the result in xmm1. If the source operand is a
|
|
memory location, it must be aligned to a 16-byte boundary.
|
|
<h4><a name="section-B.4.193">B.4.193 <code><nobr>ORPS</nobr></code>: Bit-wise Logical OR of Single-Precision FP Data</a></h4>
|
|
<p><pre>
|
|
ORPS xmm1,xmm2/m128 ; 0F 56 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>ORPS</nobr></code> performs a bit-wise logical OR between xmm1
|
|
and xmm2/mem, and stores the result in xmm1. If the source operand is a
|
|
memory location, it must be aligned to a 16-byte boundary.
|
|
<h4><a name="section-B.4.194">B.4.194 <code><nobr>OUT</nobr></code>: Output Data to I/O Port</a></h4>
|
|
<p><pre>
|
|
OUT imm8,AL ; E6 ib [8086]
|
|
OUT imm8,AX ; o16 E7 ib [8086]
|
|
OUT imm8,EAX ; o32 E7 ib [386]
|
|
OUT DX,AL ; EE [8086]
|
|
OUT DX,AX ; o16 EF [8086]
|
|
OUT DX,EAX ; o32 EF [386]
|
|
</pre>
|
|
<p><code><nobr>OUT</nobr></code> writes the contents of the given source
|
|
register to the specified I/O port. The port number may be specified as an
|
|
immediate value if it is between 0 and 255, and otherwise must be stored in
|
|
<code><nobr>DX</nobr></code>. See also <code><nobr>IN</nobr></code>
|
|
(<a href="#section-B.4.119">section B.4.119</a>).
|
|
<h4><a name="section-B.4.195">B.4.195 <code><nobr>OUTSB</nobr></code>, <code><nobr>OUTSW</nobr></code>, <code><nobr>OUTSD</nobr></code>: Output String to I/O Port</a></h4>
|
|
<p><pre>
|
|
OUTSB ; 6E [186]
|
|
OUTSW ; o16 6F [186]
|
|
OUTSD ; o32 6F [386]
|
|
</pre>
|
|
<p><code><nobr>OUTSB</nobr></code> loads a byte from
|
|
<code><nobr>[DS:SI]</nobr></code> or <code><nobr>[DS:ESI]</nobr></code> and
|
|
writes it to the I/O port specified in <code><nobr>DX</nobr></code>. It
|
|
then increments or decrements (depending on the direction flag: increments
|
|
if the flag is clear, decrements if it is set) <code><nobr>SI</nobr></code>
|
|
or <code><nobr>ESI</nobr></code>.
|
|
<p>The register used is <code><nobr>SI</nobr></code> if the address size is
|
|
16 bits, and <code><nobr>ESI</nobr></code> if it is 32 bits. If you need to
|
|
use an address size not equal to the current <code><nobr>BITS</nobr></code>
|
|
setting, you can use an explicit <code><nobr>a16</nobr></code> or
|
|
<code><nobr>a32</nobr></code> prefix.
|
|
<p>The segment register used to load from <code><nobr>[SI]</nobr></code> or
|
|
<code><nobr>[ESI]</nobr></code> can be overridden by using a segment
|
|
register name as a prefix (for example,
|
|
<code><nobr>es outsb</nobr></code>).
|
|
<p><code><nobr>OUTSW</nobr></code> and <code><nobr>OUTSD</nobr></code> work
|
|
in the same way, but they output a word or a doubleword instead of a byte,
|
|
and increment or decrement the addressing registers by 2 or 4 instead of 1.
|
|
<p>The <code><nobr>REP</nobr></code> prefix may be used to repeat the
|
|
instruction <code><nobr>CX</nobr></code> (or <code><nobr>ECX</nobr></code>
|
|
- again, the address size chooses which) times.
|
|
<h4><a name="section-B.4.196">B.4.196 <code><nobr>PACKSSDW</nobr></code>, <code><nobr>PACKSSWB</nobr></code>, <code><nobr>PACKUSWB</nobr></code>: Pack Data</a></h4>
|
|
<p><pre>
|
|
PACKSSDW mm1,mm2/m64 ; 0F 6B /r [PENT,MMX]
|
|
PACKSSWB mm1,mm2/m64 ; 0F 63 /r [PENT,MMX]
|
|
PACKUSWB mm1,mm2/m64 ; 0F 67 /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PACKSSDW xmm1,xmm2/m128 ; 66 0F 6B /r [WILLAMETTE,SSE2]
|
|
PACKSSWB xmm1,xmm2/m128 ; 66 0F 63 /r [WILLAMETTE,SSE2]
|
|
PACKUSWB xmm1,xmm2/m128 ; 66 0F 67 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p>Each of these instructions starts by combining the source and destination
|
|
operands, then splits the result into smaller sections, which it
|
|
packs into the destination register. The <code><nobr>MMX</nobr></code>
|
|
versions pack two 64-bit operands into one 64-bit register, while the
|
|
<code><nobr>SSE</nobr></code> versions pack two 128-bit operands into one
|
|
128-bit register.
|
|
<ul>
|
|
<li><code><nobr>PACKSSWB</nobr></code> splits the combined value into
|
|
words, and then reduces the words to bytes, using signed saturation. It
|
|
then packs the bytes into the destination register in the same order the
|
|
words were in.
|
|
<li><code><nobr>PACKSSDW</nobr></code> performs the same operation as
|
|
<code><nobr>PACKSSWB</nobr></code>, except that it reduces doublewords to
|
|
words, then packs them into the destination register.
|
|
<li><code><nobr>PACKUSWB</nobr></code> performs the same operation as
|
|
<code><nobr>PACKSSWB</nobr></code>, except that it uses unsigned saturation
|
|
when reducing the size of the elements.
|
|
</ul>
|
|
<p>To perform signed saturation, a number that is too large to fit is
|
|
replaced by the largest signed number (<code><nobr>7FFFh</nobr></code> or
|
|
<code><nobr>7Fh</nobr></code>) that <em>will</em> fit, and a number that is too
|
|
small is replaced by the smallest signed number
|
|
(<code><nobr>8000h</nobr></code> or <code><nobr>80h</nobr></code>) that
|
|
will fit. To perform unsigned saturation, the input is still read as a signed
|
|
number: a value that is too large is replaced by the largest unsigned number that will fit, and a negative value is replaced by zero.
|
|
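<p>The per-element narrowing step can be sketched in C for the word-to-byte
case. The function names are hypothetical and the code is a model, not NASM
syntax. The unsigned variant follows Intel's documented behaviour for
<code><nobr>PACKUSWB</nobr></code>, in which the input words are signed and
negative inputs saturate to zero.
<p><pre>
/* Hypothetical C models of the per-element narrowing step. */

/* Signed saturation, as used by PACKSSWB: signed word to signed byte. */
signed char sat_s16_to_s8(short w)
{
    if (w > 127)  return 127;               /* too large: 7Fh */
    if (w < -128) return -128;              /* too small: 80h */
    return (signed char)w;
}

/* Unsigned saturation, as used by PACKUSWB: signed word to unsigned byte. */
unsigned char sat_s16_to_u8(short w)
{
    if (w > 255) return 255;                /* too large: FFh */
    if (w < 0)   return 0;                  /* negative: 00h */
    return (unsigned char)w;
}
</pre>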
<h4><a name="section-B.4.197">B.4.197 <code><nobr>PADDB</nobr></code>, <code><nobr>PADDW</nobr></code>, <code><nobr>PADDD</nobr></code>: Add Packed Integers</a></h4>
|
|
<p><pre>
|
|
PADDB mm1,mm2/m64 ; 0F FC /r [PENT,MMX]
|
|
PADDW mm1,mm2/m64 ; 0F FD /r [PENT,MMX]
|
|
PADDD mm1,mm2/m64 ; 0F FE /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PADDB xmm1,xmm2/m128 ; 66 0F FC /r [WILLAMETTE,SSE2]
|
|
PADDW xmm1,xmm2/m128 ; 66 0F FD /r [WILLAMETTE,SSE2]
|
|
PADDD xmm1,xmm2/m128 ; 66 0F FE /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PADDx</nobr></code> performs packed addition of the two
|
|
operands, storing the result in the destination (first) operand.
|
|
<ul>
|
|
<li><code><nobr>PADDB</nobr></code> treats the operands as packed bytes,
|
|
and adds each byte individually;
|
|
<li><code><nobr>PADDW</nobr></code> treats the operands as packed words;
|
|
<li><code><nobr>PADDD</nobr></code> treats its operands as packed
|
|
doublewords.
|
|
</ul>
|
|
<p>When an individual result is too large to fit in its destination, it is
|
|
wrapped around and the low bits are stored, with the carry bit discarded.
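<p>The wraparound behaviour can be sketched in C for the 64-bit form of
<code><nobr>PADDB</nobr></code>. The function name and the representation of
each operand as an array of eight bytes are assumptions made for this
example.
<p><pre>
/* Hypothetical C model of PADDB on 64-bit MMX operands, each held as
   an array of eight bytes. */
void paddb_model(unsigned char dst[8], const unsigned char src[8])
{
    int i;

    for (i = 0; i != 8; i++)
        dst[i] = (unsigned char)(dst[i] + src[i]);  /* carry is dropped */
}
</pre>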
|
|
<h4><a name="section-B.4.198">B.4.198 <code><nobr>PADDQ</nobr></code>: Add Packed Quadword Integers</a></h4>
|
|
<p><pre>
|
|
PADDQ mm1,mm2/m64 ; 0F D4 /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PADDQ xmm1,xmm2/m128 ; 66 0F D4 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PADDQ</nobr></code> adds the quadwords in the source and
|
|
destination operands, and stores the result in the destination register.
|
|
<p>When an individual result is too large to fit in its destination, it is
|
|
wrapped around and the low bits are stored, with the carry bit discarded.
|
|
<h4><a name="section-B.4.199">B.4.199 <code><nobr>PADDSB</nobr></code>, <code><nobr>PADDSW</nobr></code>: Add Packed Signed Integers With Saturation</a></h4>
|
|
<p><pre>
|
|
PADDSB mm1,mm2/m64 ; 0F EC /r [PENT,MMX]
|
|
PADDSW mm1,mm2/m64 ; 0F ED /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PADDSB xmm1,xmm2/m128 ; 66 0F EC /r [WILLAMETTE,SSE2]
|
|
PADDSW xmm1,xmm2/m128 ; 66 0F ED /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PADDSx</nobr></code> performs packed addition of the two
|
|
operands, storing the result in the destination (first) operand.
|
|
<code><nobr>PADDSB</nobr></code> treats the operands as packed bytes, and
|
|
adds each byte individually; and <code><nobr>PADDSW</nobr></code> treats
|
|
the operands as packed words.
|
|
<p>When an individual result is too large to fit in its destination, a
|
|
saturated value is stored. The resulting value is the value with the
|
|
largest magnitude of the same sign as the result which will fit in the
|
|
available space.
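<p>A single byte lane of <code><nobr>PADDSB</nobr></code> can be modelled in
C as follows. The function name is hypothetical, and the sketch covers one
lane only.
<p><pre>
/* Hypothetical C model of one PADDSB lane. */
signed char paddsb_lane(signed char a, signed char b)
{
    int sum = a + b;                        /* exact, in a wider type */

    if (sum > 127)  return 127;             /* positive overflow */
    if (sum < -128) return -128;            /* negative overflow */
    return (signed char)sum;
}
</pre>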
|
|
<h4><a name="section-B.4.200">B.4.200 <code><nobr>PADDSIW</nobr></code>: MMX Packed Addition to Implicit Destination</a></h4>
|
|
<p><pre>
|
|
PADDSIW mmxreg,r/m64 ; 0F 51 /r [CYRIX,MMX]
|
|
</pre>
|
|
<p><code><nobr>PADDSIW</nobr></code>, specific to the Cyrix extensions to
|
|
the MMX instruction set, performs the same function as
|
|
<code><nobr>PADDSW</nobr></code>, except that the result is placed in an
|
|
implied register.
|
|
<p>To work out the implied register, invert the lowest bit in the register
|
|
number. So <code><nobr>PADDSIW MM0,MM2</nobr></code> would put the result
|
|
in <code><nobr>MM1</nobr></code>, but
|
|
<code><nobr>PADDSIW MM1,MM2</nobr></code> would put the result in
|
|
<code><nobr>MM0</nobr></code>.
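<p>The rule for the implied register amounts to flipping bit 0 of the
register number, which can be written as a one-line C helper. The function
name is hypothetical.
<p><pre>
/* Hypothetical helper: index (0-7) of the implied destination register
   for a given first-operand register index. */
int implied_mmx_reg(int reg)
{
    return reg ^ 1;                         /* flip the lowest bit */
}
</pre>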
|
|
<h4><a name="section-B.4.201">B.4.201 <code><nobr>PADDUSB</nobr></code>, <code><nobr>PADDUSW</nobr></code>: Add Packed Unsigned Integers With Saturation</a></h4>
|
|
<p><pre>
|
|
PADDUSB mm1,mm2/m64 ; 0F DC /r [PENT,MMX]
|
|
PADDUSW mm1,mm2/m64 ; 0F DD /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PADDUSB xmm1,xmm2/m128 ; 66 0F DC /r [WILLAMETTE,SSE2]
|
|
PADDUSW xmm1,xmm2/m128 ; 66 0F DD /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PADDUSx</nobr></code> performs packed addition of the two
|
|
operands, storing the result in the destination (first) operand.
|
|
<code><nobr>PADDUSB</nobr></code> treats the operands as packed bytes, and
|
|
adds each byte individually; and <code><nobr>PADDUSW</nobr></code> treats
|
|
the operands as packed words.
|
|
<p>When an individual result is too large to fit in its destination, a
|
|
saturated value is stored. The resulting value is the maximum value that
|
|
will fit in the available space.
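<p>A single byte lane of <code><nobr>PADDUSB</nobr></code> can be modelled
in C as follows. The function name is hypothetical, and the sketch covers
one lane only.
<p><pre>
/* Hypothetical C model of one PADDUSB lane. */
unsigned char paddusb_lane(unsigned char a, unsigned char b)
{
    int sum = a + b;                        /* exact, in a wider type */

    if (sum > 255) return 255;              /* clamp to the maximum */
    return (unsigned char)sum;
}
</pre>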
|
|
<h4><a name="section-B.4.202">B.4.202 <code><nobr>PAND</nobr></code>, <code><nobr>PANDN</nobr></code>: MMX Bitwise AND and AND-NOT</a></h4>
|
|
<p><pre>
|
|
PAND mm1,mm2/m64 ; 0F DB /r [PENT,MMX]
|
|
PANDN mm1,mm2/m64 ; 0F DF /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PAND xmm1,xmm2/m128 ; 66 0F DB /r [WILLAMETTE,SSE2]
|
|
PANDN xmm1,xmm2/m128 ; 66 0F DF /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PAND</nobr></code> performs a bitwise AND operation between
|
|
its two operands (i.e. each bit of the result is 1 if and only if the
|
|
corresponding bits of the two inputs were both 1), and stores the result in
|
|
the destination (first) operand.
|
|
<p><code><nobr>PANDN</nobr></code> performs the same operation, but
|
|
performs a one's complement operation on the destination (first) operand
|
|
first.
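<p>For the 64-bit forms, the two operations can be sketched in C. The
function names are hypothetical, and
<code><nobr>unsigned long long</nobr></code> is assumed to be 64 bits wide.
<p><pre>
/* Hypothetical C models of PAND and PANDN on 64-bit MMX operands. */
unsigned long long pand_model(unsigned long long dst, unsigned long long src)
{
    return dst & src;
}

unsigned long long pandn_model(unsigned long long dst, unsigned long long src)
{
    return ~dst & src;                      /* complement dst, then AND */
}
</pre>
<p>Note that it is the destination, not the source, that is inverted.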
|
|
<h4><a name="section-B.4.203">B.4.203 <code><nobr>PAUSE</nobr></code>: Spin Loop Hint</a></h4>
|
|
<p><pre>
|
|
PAUSE ; F3 90 [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PAUSE</nobr></code> provides a hint to the processor that
|
|
the following code is a spin loop. This improves processor performance by
|
|
bypassing possible memory order violations. On older processors, this
|
|
instruction operates as a <code><nobr>NOP</nobr></code>.
|
|
<h4><a name="section-B.4.204">B.4.204 <code><nobr>PAVEB</nobr></code>: MMX Packed Average</a></h4>
|
|
<p><pre>
|
|
PAVEB mmxreg,r/m64 ; 0F 50 /r [CYRIX,MMX]
|
|
</pre>
|
|
<p><code><nobr>PAVEB</nobr></code>, specific to the Cyrix MMX extensions,
|
|
treats its two operands as vectors of eight unsigned bytes, and calculates
|
|
the average of the corresponding bytes in the operands. The resulting
|
|
vector of eight averages is stored in the first operand.
|
|
<p>This opcode maps to <code><nobr>MOVMSKPS r32, xmm</nobr></code> on
|
|
processors that support the SSE instruction set.
|
|
<h4><a name="section-B.4.205">B.4.205 <code><nobr>PAVGB</nobr></code>, <code><nobr>PAVGW</nobr></code>: Average Packed Integers</a></h4>
|
|
<p><pre>
|
|
PAVGB mm1,mm2/m64 ; 0F E0 /r [KATMAI,MMX]
|
|
PAVGW mm1,mm2/m64 ; 0F E3 /r [KATMAI,MMX,SM]
|
|
</pre>
|
|
<p><pre>
|
|
PAVGB xmm1,xmm2/m128 ; 66 0F E0 /r [WILLAMETTE,SSE2]
|
|
PAVGW xmm1,xmm2/m128 ; 66 0F E3 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PAVGB</nobr></code> and <code><nobr>PAVGW</nobr></code> add
|
|
the unsigned data elements of the source operand to the unsigned data
|
|
elements of the destination register, then add 1 to the temporary results.
|
|
The results of the add are then each independently right-shifted by one bit
|
|
position. The high order bits of each element are filled with the carry
|
|
bits of the corresponding sum.
|
|
<ul>
|
|
<li><code><nobr>PAVGB</nobr></code> operates on packed unsigned bytes, and
|
|
<li><code><nobr>PAVGW</nobr></code> operates on packed unsigned words.
|
|
</ul>
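<p>The rounding average computed for each byte lane can be sketched in C.
The function name is hypothetical; the wider temporary stands in for the
carry bit described above.
<p><pre>
/* Hypothetical C model of one PAVGB lane. */
unsigned char pavgb_lane(unsigned char a, unsigned char b)
{
    unsigned int sum = (unsigned int)a + b + 1;   /* 9-bit temporary */

    return (unsigned char)(sum >> 1);             /* halve, rounding up */
}
</pre>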
|
|
<h4><a name="section-B.4.206">B.4.206 <code><nobr>PAVGUSB</nobr></code>: Average of unsigned packed 8-bit values</a></h4>
|
|
<p><pre>
|
|
PAVGUSB mm1,mm2/m64 ; 0F 0F /r BF [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PAVGUSB</nobr></code> adds the unsigned data elements of the
|
|
source operand to the unsigned data elements of the destination register,
|
|
then adds 1 to the temporary results. The results of the add are then each
|
|
independently right-shifted by one bit position. The high order bits of
|
|
each element are filled with the carry bits of the corresponding sum.
|
|
<p>This instruction performs exactly the same operations as the
|
|
<code><nobr>PAVGB</nobr></code> <code><nobr>MMX</nobr></code> instruction
|
|
(<a href="#section-B.4.205">section B.4.205</a>).
|
|
<h4><a name="section-B.4.207">B.4.207 <code><nobr>PCMPxx</nobr></code>: Compare Packed Integers.</a></h4>
|
|
<p><pre>
|
|
PCMPEQB mm1,mm2/m64 ; 0F 74 /r [PENT,MMX]
|
|
PCMPEQW mm1,mm2/m64 ; 0F 75 /r [PENT,MMX]
|
|
PCMPEQD mm1,mm2/m64 ; 0F 76 /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PCMPGTB mm1,mm2/m64 ; 0F 64 /r [PENT,MMX]
|
|
PCMPGTW mm1,mm2/m64 ; 0F 65 /r [PENT,MMX]
|
|
PCMPGTD mm1,mm2/m64 ; 0F 66 /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PCMPEQB xmm1,xmm2/m128 ; 66 0F 74 /r [WILLAMETTE,SSE2]
|
|
PCMPEQW xmm1,xmm2/m128 ; 66 0F 75 /r [WILLAMETTE,SSE2]
|
|
PCMPEQD xmm1,xmm2/m128 ; 66 0F 76 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PCMPGTB xmm1,xmm2/m128 ; 66 0F 64 /r [WILLAMETTE,SSE2]
|
|
PCMPGTW xmm1,xmm2/m128 ; 66 0F 65 /r [WILLAMETTE,SSE2]
|
|
PCMPGTD xmm1,xmm2/m128 ; 66 0F 66 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p>The <code><nobr>PCMPxx</nobr></code> instructions all treat their
|
|
operands as vectors of bytes, words, or doublewords; corresponding elements
|
|
of the source and destination are compared, and the corresponding element
|
|
of the destination (first) operand is set to all zeros or all ones
|
|
depending on the result of the comparison.
|
|
<ul>
|
|
<li><code><nobr>PCMPxxB</nobr></code> treats the operands as vectors of
|
|
bytes;
|
|
<li><code><nobr>PCMPxxW</nobr></code> treats the operands as vectors of
|
|
words;
|
|
<li><code><nobr>PCMPxxD</nobr></code> treats the operands as vectors of
|
|
doublewords;
|
|
<li><code><nobr>PCMPEQx</nobr></code> sets the corresponding element of the
|
|
destination operand to all ones if the two elements compared are equal;
|
|
<li><code><nobr>PCMPGTx</nobr></code> sets the destination element to all
|
|
ones if the element of the first (destination) operand is greater (treated
|
|
as a signed integer) than that of the second (source) operand.
|
|
</ul>
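<p>For a single byte lane, the two kinds of comparison can be sketched in C.
The function names are hypothetical, and the code is a model rather than
NASM syntax.
<p><pre>
/* Hypothetical C models of one lane of PCMPEQB and PCMPGTB. */
unsigned char pcmpeqb_lane(unsigned char dst, unsigned char src)
{
    return (dst == src) ? 0xFF : 0x00;      /* all ones if equal */
}

unsigned char pcmpgtb_lane(signed char dst, signed char src)
{
    return (dst > src) ? 0xFF : 0x00;       /* signed greater-than */
}
</pre>
<p>Because <code><nobr>PCMPGTx</nobr></code> is a signed comparison, a byte
of <code><nobr>FFh</nobr></code> (minus one) is not greater than a byte of
<code><nobr>01h</nobr></code>.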
|
|
<h4><a name="section-B.4.208">B.4.208 <code><nobr>PDISTIB</nobr></code>: MMX Packed Distance and Accumulate with Implied Register</a></h4>
|
|
<p><pre>
|
|
PDISTIB mm,m64 ; 0F 54 /r [CYRIX,MMX]
|
|
</pre>
|
|
<p><code><nobr>PDISTIB</nobr></code>, specific to the Cyrix MMX extensions,
|
|
treats its two input operands as vectors of eight unsigned bytes. For each
|
|
byte position, it finds the absolute difference between the bytes in that
|
|
position in the two input operands, and adds that value to the byte in the
|
|
same position in the implied output register. The addition is saturated to
|
|
an unsigned byte in the same way as <code><nobr>PADDUSB</nobr></code>.
|
|
<p>To work out the implied register, invert the lowest bit in the register
|
|
number. So <code><nobr>PDISTIB MM0,M64</nobr></code> would put the result
|
|
in <code><nobr>MM1</nobr></code>, but
|
|
<code><nobr>PDISTIB MM1,M64</nobr></code> would put the result in
|
|
<code><nobr>MM0</nobr></code>.
|
|
<p>Note that <code><nobr>PDISTIB</nobr></code> cannot take a register as
|
|
its second source operand.
|
|
<p>Operation:
|
|
<p><pre>
|
|
dstI[0-7] := dstI[0-7] + ABS(src0[0-7] - src1[0-7]),
|
|
dstI[8-15] := dstI[8-15] + ABS(src0[8-15] - src1[8-15]),
|
|
.......
|
|
.......
|
|
dstI[56-63] := dstI[56-63] + ABS(src0[56-63] - src1[56-63]).
|
|
</pre>
|
|
<h4><a name="section-B.4.209">B.4.209 <code><nobr>PEXTRW</nobr></code>: Extract Word</a></h4>
|
|
<p><pre>
|
|
PEXTRW reg32,mm,imm8 ; 0F C5 /r ib [KATMAI,MMX]
|
|
PEXTRW reg32,xmm,imm8 ; 66 0F C5 /r ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PEXTRW</nobr></code> moves the word in the source register
|
|
(second operand) that is pointed to by the count operand (third operand),
|
|
into the lower half of a 32-bit general purpose register. The upper half of
|
|
the register is cleared to all 0s.
|
|
<p>When the source operand is an <code><nobr>MMX</nobr></code> register,
|
|
the two least significant bits of the count specify the source word. When
|
|
it is an <code><nobr>SSE</nobr></code> register, the three least
|
|
significant bits specify the word location.
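<p>For the <code><nobr>MMX</nobr></code> form, the word selection can be
sketched in C. The function name is hypothetical, and the code assumes a
32-bit <code><nobr>unsigned int</nobr></code> and a 64-bit
<code><nobr>unsigned long long</nobr></code>.
<p><pre>
/* Hypothetical C model of PEXTRW reg32,mm,imm8. */
unsigned int pextrw_mmx_model(unsigned long long src, unsigned int imm8)
{
    unsigned int index = imm8 & 3;          /* two low bits select the word */

    /* Shift the chosen word down; the upper half of the result is zero. */
    return (unsigned int)(src >> (16 * index)) & 0xFFFFu;
}
</pre>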
|
|
<h4><a name="section-B.4.210">B.4.210 <code><nobr>PF2ID</nobr></code>: Packed Single-Precision FP to Integer Convert</a></h4>
|
|
<p><pre>
|
|
PF2ID mm1,mm2/m64 ; 0F 0F /r 1D [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PF2ID</nobr></code> converts two single-precision FP values
|
|
in the source operand to signed 32-bit integers, using truncation, and
|
|
stores them in the destination operand. Source values that are outside the
|
|
range supported by the destination are saturated to the largest absolute
|
|
value of the same sign.
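<p>The conversion of one lane can be sketched in C as follows. The function
name is hypothetical, <code><nobr>int</nobr></code> is assumed to be 32 bits
wide, and NaN inputs are not modelled.
<p><pre>
/* Hypothetical C model of one PF2ID lane. */
int pf2id_lane(float f)
{
    if (f >= 2147483648.0f)  return 2147483647;        /* saturate high */
    if (f <= -2147483648.0f) return -2147483647 - 1;   /* saturate low */
    return (int)f;                  /* C cast truncates toward zero */
}
</pre>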
|
|
<h4><a name="section-B.4.211">B.4.211 <code><nobr>PF2IW</nobr></code>: Packed Single-Precision FP to Integer Word Convert</a></h4>
|
|
<p><pre>
|
|
PF2IW mm1,mm2/m64 ; 0F 0F /r 1C [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PF2IW</nobr></code> converts two single-precision FP values
|
|
in the source operand to signed 16-bit integers, using truncation, and
|
|
stores them in the destination operand. Source values that are outside the
|
|
range supported by the destination are saturated to the largest absolute
|
|
value of the same sign.
|
|
<ul>
|
|
<li>In the K6-2 and K6-III, the 16-bit value is zero-extended to 32 bits
|
|
before storing.
|
|
<li>In the K6-2+, K6-III+ and Athlon processors, the value is sign-extended
|
|
to 32 bits before storing.
|
|
</ul>
|
|
<h4><a name="section-B.4.212">B.4.212 <code><nobr>PFACC</nobr></code>: Packed Single-Precision FP Accumulate</a></h4>
|
|
<p><pre>
|
|
PFACC mm1,mm2/m64 ; 0F 0F /r AE [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFACC</nobr></code> adds the two single-precision FP values
|
|
from the destination operand together, then adds the two single-precision
|
|
FP values from the source operand, and places the results in the low and
|
|
high doublewords of the destination operand.
|
|
<p>The operation is:
|
|
<p><pre>
|
|
dst[0-31] := dst[0-31] + dst[32-63],
|
|
dst[32-63] := src[0-31] + src[32-63].
|
|
</pre>
|
|
<h4><a name="section-B.4.213">B.4.213 <code><nobr>PFADD</nobr></code>: Packed Single-Precision FP Addition</a></h4>
|
|
<p><pre>
|
|
PFADD mm1,mm2/m64 ; 0F 0F /r 9E [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFADD</nobr></code> performs addition on each of two packed
|
|
single-precision FP value pairs.
|
|
<p><pre>
|
|
dst[0-31] := dst[0-31] + src[0-31],
|
|
dst[32-63] := dst[32-63] + src[32-63].
|
|
</pre>
|
|
<h4><a name="section-B.4.214">B.4.214 <code><nobr>PFCMPxx</nobr></code>: Packed Single-Precision FP Compare</a></h4>
|
|
<p><pre>
|
|
PFCMPEQ mm1,mm2/m64 ; 0F 0F /r B0 [PENT,3DNOW]
|
|
PFCMPGE mm1,mm2/m64 ; 0F 0F /r 90 [PENT,3DNOW]
|
|
PFCMPGT mm1,mm2/m64 ; 0F 0F /r A0 [PENT,3DNOW]
|
|
</pre>
|
|
<p>The <code><nobr>PFCMPxx</nobr></code> instructions compare the packed
|
|
single-precision FP values in the source and destination operands, and set the
|
|
destination according to the result. If the condition is true, the
|
|
destination is set to all 1s, otherwise it's set to all 0s.
|
|
<ul>
|
|
<li><code><nobr>PFCMPEQ</nobr></code> tests whether dst == src;
|
|
<li><code><nobr>PFCMPGE</nobr></code> tests whether dst >= src;
|
|
<li><code><nobr>PFCMPGT</nobr></code> tests whether dst > src.
|
|
</ul>
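<p>Because the result is an all-1s or all-0s mask per element, it can be
combined with the MMX logical instructions to select between two values
without branching. The following is a sketch only, assuming 3DNow! support
and hypothetical labels <code><nobr>a</nobr></code> and
<code><nobr>b</nobr></code>, each holding two single-precision values; it
keeps the larger element of each pair.
<p><pre>
        movq    mm0,[a]         ; mm0 = a1 | a0
        movq    mm1,[b]         ; mm1 = b1 | b0
        movq    mm2,mm0
        pfcmpgt mm2,mm1         ; mask: all 1s where a > b, else all 0s
        pand    mm0,mm2         ; keep a where the mask is set
        pandn   mm2,mm1         ; mm2 = (NOT mask) AND b
        por     mm0,mm2         ; merge the two selections
</pre>
<p><code><nobr>PFMAX</nobr></code> gives this particular result in one
instruction; the point here is the mask-and-merge idiom, which works for
any selection.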
|
|
<h4><a name="section-B.4.215">B.4.215 <code><nobr>PFMAX</nobr></code>: Packed Single-Precision FP Maximum</a></h4>
|
|
<p><pre>
|
|
PFMAX mm1,mm2/m64 ; 0F 0F /r A4 [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFMAX</nobr></code> returns the higher of each pair of
|
|
single-precision FP values. If the higher value is zero, it is returned as
|
|
positive zero.
|
|
<h4><a name="section-B.4.216">B.4.216 <code><nobr>PFMIN</nobr></code>: Packed Single-Precision FP Minimum</a></h4>
|
|
<p><pre>
|
|
PFMIN mm1,mm2/m64 ; 0F 0F /r 94 [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFMIN</nobr></code> returns the lower of each pair of
|
|
single-precision FP values. If the lower value is zero, it is returned as
|
|
positive zero.
|
|
<h4><a name="section-B.4.217">B.4.217 <code><nobr>PFMUL</nobr></code>: Packed Single-Precision FP Multiply</a></h4>
|
|
<p><pre>
|
|
PFMUL mm1,mm2/m64 ; 0F 0F /r B4 [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFMUL</nobr></code> returns the product of each pair of
|
|
single-precision FP values.
|
|
<p><pre>
|
|
dst[0-31] := dst[0-31] * src[0-31],
|
|
dst[32-63] := dst[32-63] * src[32-63].
|
|
</pre>
|
|
<h4><a name="section-B.4.218">B.4.218 <code><nobr>PFNACC</nobr></code>: Packed Single-Precision FP Negative Accumulate</a></h4>
|
|
<p><pre>
|
|
PFNACC mm1,mm2/m64 ; 0F 0F /r 8A [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFNACC</nobr></code> performs a negative accumulate of the
|
|
two single-precision FP values in the source and destination registers. The
|
|
result of the accumulate from the destination register is stored in the low
|
|
doubleword of the destination, and the result of the source accumulate is
|
|
stored in the high doubleword of the destination register.
|
|
<p>The operation is:
|
|
<p><pre>
|
|
dst[0-31] := dst[0-31] - dst[32-63],
|
|
dst[32-63] := src[0-31] - src[32-63].
|
|
</pre>
|
|
<h4><a name="section-B.4.219">B.4.219 <code><nobr>PFPNACC</nobr></code>: Packed Single-Precision FP Mixed Accumulate</a></h4>
|
|
<p><pre>
|
|
PFPNACC mm1,mm2/m64 ; 0F 0F /r 8E [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFPNACC</nobr></code> performs a positive accumulate of the
|
|
two single-precision FP values in the source register and a negative
|
|
accumulate of the destination register. The result of the accumulate from
|
|
the destination register is stored in the low doubleword of the
|
|
destination, and the result of the source accumulate is stored in the high
|
|
doubleword of the destination register.
|
|
<p>The operation is:
|
|
<p><pre>
|
|
dst[0-31] := dst[0-31] - dst[32-63],
|
|
dst[32-63] := src[0-31] + src[32-63].
|
|
</pre>
|
|
<h4><a name="section-B.4.220">B.4.220 <code><nobr>PFRCP</nobr></code>: Packed Single-Precision FP Reciprocal Approximation</a></h4>
|
|
<p><pre>
|
|
PFRCP mm1,mm2/m64 ; 0F 0F /r 96 [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFRCP</nobr></code> performs a low precision estimate of the
|
|
reciprocal of the low-order single-precision FP value in the source
|
|
operand, storing the result in both halves of the destination register. The
|
|
result is accurate to 14 bits.
|
|
<p>For higher precision reciprocals, this instruction should be followed by
|
|
two more instructions: <code><nobr>PFRCPIT1</nobr></code>
|
|
(<a href="#section-B.4.221">section B.4.221</a>) and
|
|
<code><nobr>PFRCPIT2</nobr></code> (<a href="#section-B.4.222">section
B.4.222</a>). This yields 24-bit accuracy. For more details, see
|
|
the AMD 3DNow! technology manual.
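<p>The three-instruction sequence described above might look like the
following sketch. It assumes 3DNow! support; the labels
<code><nobr>x</nobr></code> and <code><nobr>recip</nobr></code> are
hypothetical, and the operand order follows the descriptions of
<code><nobr>PFRCPIT1</nobr></code> and <code><nobr>PFRCPIT2</nobr></code>
in this appendix.
<p><pre>
        movd    mm0,[x]         ; mm0 = 0 | x            (hypothetical label)
        pfrcp   mm1,mm0         ; mm1 = estimate of 1/x in both halves
        punpckldq mm0,mm0       ; mm0 = x | x
        pfrcpit1 mm0,mm1        ; first step: original value, PFRCP result
        pfrcpit2 mm0,mm1        ; final step: PFRCPIT1 result, PFRCP result
        movd    [recip],mm0     ; store 1/x              (hypothetical label)
        femms
</pre>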
|
|
<h4><a name="section-B.4.221">B.4.221 <code><nobr>PFRCPIT1</nobr></code>: Packed Single-Precision FP Reciprocal, First Iteration Step</a></h4>
|
|
<p><pre>
|
|
PFRCPIT1 mm1,mm2/m64 ; 0F 0F /r A6 [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFRCPIT1</nobr></code> performs the first intermediate step
|
|
in the calculation of the reciprocal of a single-precision FP value. The
|
|
first source value (<code><nobr>mm1</nobr></code>) is the original value,
|
|
and the second source value (<code><nobr>mm2/m64</nobr></code>) is the
|
|
result of a <code><nobr>PFRCP</nobr></code> instruction.
|
|
<p>For the final step in a reciprocal, returning the full 24-bit accuracy
|
|
of a single-precision FP value, see <code><nobr>PFRCPIT2</nobr></code>
|
|
(<a href="#section-B.4.222">section B.4.222</a>). For more details, see the
|
|
AMD 3DNow! technology manual.
|
|
<h4><a name="section-B.4.222">B.4.222 <code><nobr>PFRCPIT2</nobr></code>: Packed Single-Precision FP Reciprocal/Reciprocal Square Root, Second Iteration Step</a></h4>
|
|
<p><pre>
|
|
PFRCPIT2 mm1,mm2/m64 ; 0F 0F /r B6 [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFRCPIT2</nobr></code> performs the second and final
|
|
intermediate step in the calculation of a reciprocal or reciprocal square
|
|
root, refining the values returned by the <code><nobr>PFRCP</nobr></code>
|
|
and <code><nobr>PFRSQRT</nobr></code> instructions, respectively.
|
|
<p>The first source value (<code><nobr>mm1</nobr></code>) is the output of
|
|
either a <code><nobr>PFRCPIT1</nobr></code> or a
|
|
<code><nobr>PFRSQIT1</nobr></code> instruction, and the second source is
|
|
the output of either the <code><nobr>PFRCP</nobr></code> or the
|
|
<code><nobr>PFRSQRT</nobr></code> instruction. For more details, see the
|
|
AMD 3DNow! technology manual.
|
|
<h4><a name="section-B.4.223">B.4.223 <code><nobr>PFRSQIT1</nobr></code>: Packed Single-Precision FP Reciprocal Square Root, First Iteration Step</a></h4>
|
|
<p><pre>
|
|
PFRSQIT1 mm1,mm2/m64 ; 0F 0F /r A7 [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFRSQIT1</nobr></code> performs the first intermediate step
|
|
in the calculation of the reciprocal square root of a single-precision FP
|
|
value. The first source value (<code><nobr>mm1</nobr></code>) is the square
|
|
of the result of a <code><nobr>PFRSQRT</nobr></code> instruction, and the
|
|
second source value (<code><nobr>mm2/m64</nobr></code>) is the original
|
|
value.
|
|
<p>For the final step in a calculation, returning the full 24-bit accuracy
|
|
of a single-precision FP value, see <code><nobr>PFRCPIT2</nobr></code>
|
|
(<a href="#section-B.4.222">section B.4.222</a>). For more details, see the
|
|
AMD 3DNow! technology manual.
|
|
<h4><a name="section-B.4.224">B.4.224 <code><nobr>PFRSQRT</nobr></code>: Packed Single-Precision FP Reciprocal Square Root Approximation</a></h4>
|
|
<p><pre>
|
|
PFRSQRT mm1,mm2/m64 ; 0F 0F /r 97 [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFRSQRT</nobr></code> performs a low precision estimate of
|
|
the reciprocal square root of the low-order single-precision FP value in
|
|
the source operand, storing the result in both halves of the destination
|
|
register. The result is accurate to 15 bits.
|
|
<p>For higher-precision reciprocal square roots, this instruction should be followed by
|
|
two more instructions: <code><nobr>PFRSQIT1</nobr></code>
|
|
(<a href="#section-B.4.223">section B.4.223</a>) and
|
|
<code><nobr>PFRCPIT2</nobr></code> (<a href="#section-B.4.222">section
B.4.222</a>). This yields 24-bit accuracy. For more details, see
|
|
the AMD 3DNow! technology manual.
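<p>A sketch of the full-precision sequence follows, under the same
assumptions as any 3DNow! code (a processor that supports it) and with
hypothetical labels <code><nobr>x</nobr></code> and
<code><nobr>rsqrt</nobr></code>. The estimate has to be squared before
<code><nobr>PFRSQIT1</nobr></code>, and a copy of it is kept for
<code><nobr>PFRCPIT2</nobr></code>. Only the low doubleword of the result
is meaningful here, since the high half of the input is zero.
<p><pre>
        movd    mm0,[x]         ; mm0 = 0 | x            (hypothetical label)
        pfrsqrt mm1,mm0         ; mm1 = estimate of 1/sqrt(x) in both halves
        movq    mm2,mm1         ; keep the estimate for the final step
        pfmul   mm1,mm1         ; mm1 = square of the estimate
        pfrsqit1 mm1,mm0        ; first step: squared estimate, original value
        pfrcpit2 mm1,mm2        ; final step: PFRSQIT1 result, PFRSQRT result
        movd    [rsqrt],mm1     ; store 1/sqrt(x)        (hypothetical label)
        femms
</pre>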
|
|
<h4><a name="section-B.4.225">B.4.225 <code><nobr>PFSUB</nobr></code>: Packed Single-Precision FP Subtract</a></h4>
|
|
<p><pre>
|
|
PFSUB mm1,mm2/m64 ; 0F 0F /r 9A [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFSUB</nobr></code> subtracts the single-precision FP values
|
|
in the source from those in the destination, and stores the result in the
|
|
destination operand.
|
|
<p><pre>
|
|
dst[0-31] := dst[0-31] - src[0-31],
|
|
dst[32-63] := dst[32-63] - src[32-63].
|
|
</pre>
|
|
<h4><a name="section-B.4.226">B.4.226 <code><nobr>PFSUBR</nobr></code>: Packed Single-Precision FP Reverse Subtract</a></h4>
|
|
<p><pre>
|
|
PFSUBR mm1,mm2/m64 ; 0F 0F /r AA [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PFSUBR</nobr></code> subtracts the single-precision FP
|
|
values in the destination from those in the source, and stores the result
|
|
in the destination operand.
|
|
<p><pre>
|
|
dst[0-31] := src[0-31] - dst[0-31],
|
|
dst[32-63] := src[32-63] - dst[32-63].
|
|
</pre>
|
|
<h4><a name="section-B.4.227">B.4.227 <code><nobr>PI2FD</nobr></code>: Packed Doubleword Integer to Single-Precision FP Convert</a></h4>
|
|
<p><pre>
|
|
PI2FD mm1,mm2/m64 ; 0F 0F /r 0D [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PI2FD</nobr></code> converts two signed 32-bit integers in
|
|
the source operand to single-precision FP values, using truncation of
|
|
significant digits, and stores them in the destination operand.
|
|
<h4><a name="section-B.4.228">B.4.228 <code><nobr>PI2FW</nobr></code>: Packed Word Integer to Single-Precision FP Convert</a></h4>
|
|
<p><pre>
|
|
PI2FW mm1,mm2/m64 ; 0F 0F /r 0C [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PI2FW</nobr></code> converts two signed 16-bit integers in
|
|
the source operand to single-precision FP values, and stores them in the
|
|
destination operand. The input values are in the low word of each
|
|
doubleword.
|
|
<h4><a name="section-B.4.229">B.4.229 <code><nobr>PINSRW</nobr></code>: Insert Word</a></h4>
|
|
<p><pre>
|
|
PINSRW mm,r16/r32/m16,imm8 ; 0F C4 /r ib [KATMAI,MMX]
|
|
PINSRW xmm,r16/r32/m16,imm8 ; 66 0F C4 /r ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PINSRW</nobr></code> loads a word from a 16-bit register (or
|
|
the low half of a 32-bit register), or from memory, and inserts it at the
word position in the destination register selected by the count operand
(third operand). If the destination is an <code><nobr>MMX</nobr></code>
register, the low two bits of the count byte are used; if it is an
<code><nobr>XMM</nobr></code> register, the low three bits are used. The
|
|
insertion is done in such a way that the other words from the destination
|
|
register are left untouched.
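<p>As a small sketch (register choices are arbitrary), the following
replaces word 2 of an <code><nobr>MMX</nobr></code> register and leaves
the other three words alone. The comments show the four words from high
to low.
<p><pre>
        pxor    mm0,mm0         ; mm0 = 0000 0000 0000 0000
        mov     eax,0x1234
        pinsrw  mm0,eax,2       ; mm0 = 0000 1234 0000 0000
        emms
</pre>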
|
|
<h4><a name="section-B.4.230">B.4.230 <code><nobr>PMACHRIW</nobr></code>: Packed Multiply and Accumulate with Rounding</a></h4>
|
|
<p><pre>
|
|
PMACHRIW mm,m64 ; 0F 5E /r [CYRIX,MMX]
|
|
</pre>
|
|
<p><code><nobr>PMACHRIW</nobr></code> takes two packed 16-bit integer
|
|
inputs, multiplies the values in the inputs, rounds on bit 15 of each
|
|
result, then adds bits 15-30 of each result to the corresponding position
|
|
of the <em>implied</em> destination register.
|
|
<p>The operation of this instruction is:
|
|
<p><pre>
|
|
dstI[0-15] := dstI[0-15] + (mm[0-15] *m64[0-15]
|
|
+ 0x00004000)[15-30],
|
|
dstI[16-31] := dstI[16-31] + (mm[16-31]*m64[16-31]
|
|
+ 0x00004000)[15-30],
|
|
dstI[32-47] := dstI[32-47] + (mm[32-47]*m64[32-47]
|
|
+ 0x00004000)[15-30],
|
|
dstI[48-63] := dstI[48-63] + (mm[48-63]*m64[48-63]
|
|
+ 0x00004000)[15-30].
|
|
</pre>
|
|
<p>Note that <code><nobr>PMACHRIW</nobr></code> cannot take a register as
|
|
its second source operand.
|
|
<h4><a name="section-B.4.231">B.4.231 <code><nobr>PMADDWD</nobr></code>: MMX Packed Multiply and Add</a></h4>
|
|
<p><pre>
|
|
PMADDWD mm1,mm2/m64 ; 0F F5 /r [PENT,MMX]
|
|
PMADDWD xmm1,xmm2/m128 ; 66 0F F5 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PMADDWD</nobr></code> treats its two inputs as vectors of
|
|
signed words. It multiplies corresponding elements of the two operands,
|
|
giving doubleword results. These are then added together in pairs and
|
|
stored in the destination operand.
|
|
<p>The operation of this instruction is:
|
|
<p><pre>
|
|
dst[0-31] := (dst[0-15] * src[0-15])
|
|
+ (dst[16-31] * src[16-31]);
|
|
dst[32-63] := (dst[32-47] * src[32-47])
|
|
+ (dst[48-63] * src[48-63]);
|
|
</pre>
|
|
<p>The following additionally applies to the <code><nobr>SSE2</nobr></code> version of the
|
|
instruction:
|
|
<p><pre>
|
|
dst[64-95] := (dst[64-79] * src[64-79])
|
|
+ (dst[80-95] * src[80-95]);
|
|
dst[96-127] := (dst[96-111] * src[96-111])
|
|
+ (dst[112-127] * src[112-127]).
|
|
</pre>
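<p>A common use of <code><nobr>PMADDWD</nobr></code> is a dot product of
16-bit vectors. The sketch below assumes two hypothetical labels,
<code><nobr>a</nobr></code> and <code><nobr>b</nobr></code>, each holding
four signed words, and that the sum fits in 32 bits.
<p><pre>
        movq    mm0,[a]         ; four signed words a3..a0
        pmaddwd mm0,[b]         ; mm0 = a3*b3+a2*b2 | a1*b1+a0*b0
        movq    mm1,mm0
        psrlq   mm1,32          ; bring the high doubleword down
        paddd   mm0,mm1         ; low doubleword = complete dot product
        movd    eax,mm0         ; EAX = a0*b0 + a1*b1 + a2*b2 + a3*b3
        emms
</pre>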
|
|
<h4><a name="section-B.4.232">B.4.232 <code><nobr>PMAGW</nobr></code>: MMX Packed Magnitude</a></h4>
|
|
<p><pre>
|
|
PMAGW mm1,mm2/m64 ; 0F 52 /r [CYRIX,MMX]
|
|
</pre>
|
|
<p><code><nobr>PMAGW</nobr></code>, specific to the Cyrix MMX extensions,
|
|
treats both its operands as vectors of four signed words. It compares the
|
|
absolute values of the words in corresponding positions, and sets each word
|
|
of the destination (first) operand to whichever of the two words in that
|
|
position had the larger absolute value.
|
|
<h4><a name="section-B.4.233">B.4.233 <code><nobr>PMAXSW</nobr></code>: Packed Signed Integer Word Maximum</a></h4>
|
|
<p><pre>
|
|
PMAXSW mm1,mm2/m64 ; 0F EE /r [KATMAI,MMX]
|
|
PMAXSW xmm1,xmm2/m128 ; 66 0F EE /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PMAXSW</nobr></code> compares each pair of words in the two
|
|
source operands, and for each pair it stores the maximum value in the
|
|
destination register.
|
|
<h4><a name="section-B.4.234">B.4.234 <code><nobr>PMAXUB</nobr></code>: Packed Unsigned Integer Byte Maximum</a></h4>
|
|
<p><pre>
|
|
PMAXUB mm1,mm2/m64 ; 0F DE /r [KATMAI,MMX]
|
|
PMAXUB xmm1,xmm2/m128 ; 66 0F DE /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PMAXUB</nobr></code> compares each pair of bytes in the two
|
|
source operands, and for each pair it stores the maximum value in the
|
|
destination register.
|
|
<h4><a name="section-B.4.235">B.4.235 <code><nobr>PMINSW</nobr></code>: Packed Signed Integer Word Minimum</a></h4>
|
|
<p><pre>
|
|
PMINSW mm1,mm2/m64 ; 0F EA /r [KATMAI,MMX]
|
|
PMINSW xmm1,xmm2/m128 ; 66 0F EA /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PMINSW</nobr></code> compares each pair of words in the two
|
|
source operands, and for each pair it stores the minimum value in the
|
|
destination register.
|
|
<h4><a name="section-B.4.236">B.4.236 <code><nobr>PMINUB</nobr></code>: Packed Unsigned Integer Byte Minimum</a></h4>
|
|
<p><pre>
|
|
PMINUB mm1,mm2/m64 ; 0F DA /r [KATMAI,MMX]
|
|
PMINUB xmm1,xmm2/m128 ; 66 0F DA /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PMINUB</nobr></code> compares each pair of bytes in the two
|
|
source operands, and for each pair it stores the minimum value in the
|
|
destination register.
|
|
<h4><a name="section-B.4.237">B.4.237 <code><nobr>PMOVMSKB</nobr></code>: Move Byte Mask To Integer</a></h4>
|
|
<p><pre>
|
|
PMOVMSKB reg32,mm ; 0F D7 /r [KATMAI,MMX]
|
|
PMOVMSKB reg32,xmm ; 66 0F D7 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PMOVMSKB</nobr></code> returns an 8-bit or 16-bit mask
|
|
formed of the most significant bits of each byte of the source operand (8 bits
for an <code><nobr>MMX</nobr></code> register, 16 bits for an
|
|
<code><nobr>XMM</nobr></code> register).
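<p>One typical use is turning the result of a packed compare into an
index. This is a sketch only: <code><nobr>buf</nobr></code> and
<code><nobr>pattern</nobr></code> are hypothetical labels, with
<code><nobr>pattern</nobr></code> assumed to hold the byte being searched
for, repeated eight times.
<p><pre>
        movq    mm0,[buf]       ; eight bytes to search
        pcmpeqb mm0,[pattern]   ; 0FFh in every byte that matches
        pmovmskb eax,mm0        ; bit n of EAX = 1 if byte n matched
        bsf     ecx,eax         ; ECX = index of first match (ZF set if none)
        emms
</pre>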
|
|
<h4><a name="section-B.4.238">B.4.238 <code><nobr>PMULHRWC</nobr></code>, <code><nobr>PMULHRIW</nobr></code>: Multiply Packed 16-bit Integers With Rounding, and Store High Word</a></h4>
|
|
<p><pre>
|
|
PMULHRWC mm1,mm2/m64 ; 0F 59 /r [CYRIX,MMX]
|
|
PMULHRIW mm1,mm2/m64 ; 0F 5D /r [CYRIX,MMX]
|
|
</pre>
|
|
<p>These instructions take two packed 16-bit integer inputs, multiply the
|
|
values in the inputs, round on bit 15 of each result, then store bits 15-30
|
|
of each result to the corresponding position of the destination register.
|
|
<ul>
|
|
<li>For <code><nobr>PMULHRWC</nobr></code>, the destination is the first
|
|
source operand.
|
|
<li>For <code><nobr>PMULHRIW</nobr></code>, the destination is an implied
|
|
register (worked out as described for <code><nobr>PADDSIW</nobr></code>
|
|
(<a href="#section-B.4.200">section B.4.200</a>)).
|
|
</ul>
|
|
<p>The operation of these instructions is:
|
|
<p><pre>
|
|
dst[0-15] := (src1[0-15] *src2[0-15] + 0x00004000)[15-30]
|
|
dst[16-31] := (src1[16-31]*src2[16-31] + 0x00004000)[15-30]
|
|
dst[32-47] := (src1[32-47]*src2[32-47] + 0x00004000)[15-30]
|
|
dst[48-63] := (src1[48-63]*src2[48-63] + 0x00004000)[15-30]
|
|
</pre>
|
|
<p>See also <code><nobr>PMULHRWA</nobr></code>
|
|
(<a href="#section-B.4.239">section B.4.239</a>) for a 3DNow! version of
|
|
this instruction.
|
|
<h4><a name="section-B.4.239">B.4.239 <code><nobr>PMULHRWA</nobr></code>: Multiply Packed 16-bit Integers With Rounding, and Store High Word</a></h4>
|
|
<p><pre>
|
|
PMULHRWA mm1,mm2/m64 ; 0F 0F /r B7 [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PMULHRWA</nobr></code> takes two packed 16-bit integer
|
|
inputs, multiplies the values in the inputs, rounds on bit 16 of each
|
|
result, then stores bits 16-31 of each result to the corresponding position
|
|
of the destination register.
|
|
<p>The operation of this instruction is:
|
|
<p><pre>
|
|
dst[0-15] := (src1[0-15] *src2[0-15] + 0x00008000)[16-31];
|
|
dst[16-31] := (src1[16-31]*src2[16-31] + 0x00008000)[16-31];
|
|
dst[32-47] := (src1[32-47]*src2[32-47] + 0x00008000)[16-31];
|
|
dst[48-63] := (src1[48-63]*src2[48-63] + 0x00008000)[16-31].
|
|
</pre>
|
|
<p>See also <code><nobr>PMULHRWC</nobr></code>
|
|
(<a href="#section-B.4.238">section B.4.238</a>) for a Cyrix version of
|
|
this instruction.
|
|
<h4><a name="section-B.4.240">B.4.240 <code><nobr>PMULHUW</nobr></code>: Multiply Packed 16-bit Integers, and Store High Word</a></h4>
|
|
<p><pre>
|
|
PMULHUW mm1,mm2/m64 ; 0F E4 /r [KATMAI,MMX]
|
|
PMULHUW xmm1,xmm2/m128 ; 66 0F E4 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PMULHUW</nobr></code> takes two packed unsigned 16-bit
|
|
integer inputs, multiplies the values in the inputs, then stores bits 16-31
|
|
of each result to the corresponding position of the destination register.
|
|
<h4><a name="section-B.4.241">B.4.241 <code><nobr>PMULHW</nobr></code>, <code><nobr>PMULLW</nobr></code>: Multiply Packed 16-bit Integers, and Store</a></h4>
|
|
<p><pre>
|
|
PMULHW mm1,mm2/m64 ; 0F E5 /r [PENT,MMX]
|
|
PMULLW mm1,mm2/m64 ; 0F D5 /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PMULHW xmm1,xmm2/m128 ; 66 0F E5 /r [WILLAMETTE,SSE2]
|
|
PMULLW xmm1,xmm2/m128 ; 66 0F D5 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PMULxW</nobr></code> takes two packed signed 16-bit
|
|
integer inputs, and multiplies the values in the inputs, forming doubleword
|
|
results.
|
|
<ul>
|
|
<li><code><nobr>PMULHW</nobr></code> then stores the top 16 bits of each
|
|
doubleword in the destination (first) operand;
|
|
<li><code><nobr>PMULLW</nobr></code> stores the bottom 16 bits of each
|
|
doubleword in the destination operand.
|
|
</ul>
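<p>The two halves can be recombined into full 32-bit products with the
unpack instructions. The following is a sketch, assuming hypothetical
labels <code><nobr>a</nobr></code> and <code><nobr>b</nobr></code> that
each hold four words; note that <code><nobr>PMULHW</nobr></code> performs
a signed multiply.
<p><pre>
        movq    mm0,[a]         ; a3 a2 a1 a0
        movq    mm1,mm0
        pmullw  mm0,[b]         ; low 16 bits of each product
        pmulhw  mm1,[b]         ; high 16 bits of each product
        movq    mm2,mm0
        punpcklwd mm0,mm1       ; mm0 = a1*b1 | a0*b0 as doublewords
        punpckhwd mm2,mm1       ; mm2 = a3*b3 | a2*b2 as doublewords
        emms
</pre>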
|
|
<h4><a name="section-B.4.242">B.4.242 <code><nobr>PMULUDQ</nobr></code>: Multiply Packed Unsigned 32-bit Integers, and Store</a></h4>
|
|
<p><pre>
|
|
PMULUDQ mm1,mm2/m64 ; 0F F4 /r [WILLAMETTE,SSE2]
|
|
PMULUDQ xmm1,xmm2/m128 ; 66 0F F4 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PMULUDQ</nobr></code> takes two packed unsigned 32-bit
|
|
integer inputs, and multiplies the values in the inputs, forming quadword
|
|
results. The source is either an unsigned doubleword in the low doubleword
|
|
of a 64-bit operand, or two unsigned doublewords in the first and
|
|
third doublewords of a 128-bit operand. This produces either one or two
|
|
64-bit results, which are stored in the respective quadword locations of
|
|
the destination register.
|
|
<p>The operation is:
|
|
<p><pre>
|
|
dst[0-63] := dst[0-31] * src[0-31];
|
|
dst[64-127] := dst[64-95] * src[64-95].
|
|
</pre>
|
|
<h4><a name="section-B.4.243">B.4.243 <code><nobr>PMVccZB</nobr></code>: MMX Packed Conditional Move</a></h4>
|
|
<p><pre>
|
|
PMVZB mmxreg,mem64 ; 0F 58 /r [CYRIX,MMX]
|
|
PMVNZB mmxreg,mem64 ; 0F 5A /r [CYRIX,MMX]
|
|
PMVLZB mmxreg,mem64 ; 0F 5B /r [CYRIX,MMX]
|
|
PMVGEZB mmxreg,mem64 ; 0F 5C /r [CYRIX,MMX]
|
|
</pre>
|
|
<p>These instructions, specific to the Cyrix MMX extensions, perform
|
|
parallel conditional moves. The two input operands are treated as vectors
|
|
of eight bytes. Each byte of the destination (first) operand is either
|
|
written from the corresponding byte of the source (second) operand, or left
|
|
alone, depending on the value of the byte in the <em>implied</em> operand
|
|
(specified in the same way as <code><nobr>PADDSIW</nobr></code>, in
|
|
<a href="#section-B.4.200">section B.4.200</a>).
|
|
<ul>
|
|
<li><code><nobr>PMVZB</nobr></code> performs each move if the corresponding
|
|
byte in the implied operand is zero;
|
|
<li><code><nobr>PMVNZB</nobr></code> moves if the byte is non-zero;
|
|
<li><code><nobr>PMVLZB</nobr></code> moves if the byte is less than zero;
|
|
<li><code><nobr>PMVGEZB</nobr></code> moves if the byte is greater than or
|
|
equal to zero.
|
|
</ul>
|
|
<p>Note that these instructions cannot take a register as their second
|
|
source operand.
|
|
<h4><a name="section-B.4.244">B.4.244 <code><nobr>POP</nobr></code>: Pop Data from Stack</a></h4>
|
|
<p><pre>
|
|
POP reg16 ; o16 58+r [8086]
|
|
POP reg32 ; o32 58+r [386]
|
|
</pre>
|
|
<p><pre>
|
|
POP r/m16 ; o16 8F /0 [8086]
|
|
POP r/m32 ; o32 8F /0 [386]
|
|
</pre>
|
|
<p><pre>
|
|
POP CS ; 0F [8086,UNDOC]
|
|
POP DS ; 1F [8086]
|
|
POP ES ; 07 [8086]
|
|
POP SS ; 17 [8086]
|
|
POP FS ; 0F A1 [386]
|
|
POP GS ; 0F A9 [386]
|
|
</pre>
|
|
<p><code><nobr>POP</nobr></code> loads a value from the stack (from
|
|
<code><nobr>[SS:SP]</nobr></code> or <code><nobr>[SS:ESP]</nobr></code>)
|
|
and then increments the stack pointer.
|
|
<p>The address-size attribute of the instruction determines whether
|
|
<code><nobr>SP</nobr></code> or <code><nobr>ESP</nobr></code> is used as
|
|
the stack pointer: to deliberately override the default given by the
|
|
<code><nobr>BITS</nobr></code> setting, you can use an
|
|
<code><nobr>a16</nobr></code> or <code><nobr>a32</nobr></code> prefix.
|
|
<p>The operand-size attribute of the instruction determines whether the
|
|
stack pointer is incremented by 2 or 4: this means that segment register
|
|
pops in <code><nobr>BITS 32</nobr></code> mode will pop 4 bytes off the
|
|
stack and discard the upper two of them. If you need to override that, you
|
|
can use an <code><nobr>o16</nobr></code> or <code><nobr>o32</nobr></code>
|
|
prefix.
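<p>For example, in <code><nobr>BITS 32</nobr></code> code the difference
between the default and the overridden operand size looks like this
(a sketch; the choice of segment registers is arbitrary):
<p><pre>
        bits 32
        push    ds              ; pushes 4 bytes
        pop     es              ; pops 4 bytes, discards the upper two
        o16 push ds             ; operand-size prefix: pushes only 2 bytes
        o16 pop es              ; operand-size prefix: pops only 2 bytes
</pre>
<p>Each push must be matched by a pop of the same operand size, or the
stack pointer will end up off by two.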
|
|
<p>The above opcode listings give two forms for general-purpose register
|
|
pop instructions: for example, <code><nobr>POP BX</nobr></code> has the two
|
|
forms <code><nobr>5B</nobr></code> and <code><nobr>8F C3</nobr></code>.
|
|
NASM will always generate the shorter form when given
|
|
<code><nobr>POP BX</nobr></code>. NDISASM will disassemble both.
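<p>If the longer form is wanted for some reason, it can be emitted by
hand. In this sketch the second line spells out the
<code><nobr>8F /0</nobr></code> encoding with a ModRM byte of
<code><nobr>C3</nobr></code> (register-direct, <code><nobr>BX</nobr></code>):
<p><pre>
        bits 16
        pop     bx              ; NASM emits 5B
        db      0x8F,0xC3       ; the same instruction in its longer form
</pre>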
|
|
<p><code><nobr>POP CS</nobr></code> is not a documented instruction, and is
|
|
not supported on any processor above the 8086 (since they use
|
|
<code><nobr>0Fh</nobr></code> as an opcode prefix for instruction set
|
|
extensions). However, at least some 8086 processors do support it, and so
|
|
NASM generates it for completeness.
|
|
<h4><a name="section-B.4.245">B.4.245 <code><nobr>POPAx</nobr></code>: Pop All General-Purpose Registers</a></h4>
|
|
<p><pre>
|
|
POPA ; 61 [186]
|
|
POPAW ; o16 61 [186]
|
|
POPAD ; o32 61 [386]
|
|
</pre>
|
|
<ul>
|
|
<li><code><nobr>POPAW</nobr></code> pops a word from the stack into each
|
|
of, successively, <code><nobr>DI</nobr></code>,
|
|
<code><nobr>SI</nobr></code>, <code><nobr>BP</nobr></code>, nothing (it
|
|
discards a word from the stack which was a placeholder for
|
|
<code><nobr>SP</nobr></code>), <code><nobr>BX</nobr></code>,
|
|
<code><nobr>DX</nobr></code>, <code><nobr>CX</nobr></code> and
|
|
<code><nobr>AX</nobr></code>. It is intended to reverse the operation of
|
|
<code><nobr>PUSHAW</nobr></code> (see <a href="#section-B.4.264">section
|
|
B.4.264</a>), but it ignores the value for <code><nobr>SP</nobr></code>
|
|
that was pushed on the stack by <code><nobr>PUSHAW</nobr></code>.
|
|
<li><code><nobr>POPAD</nobr></code> pops twice as much data, and places the
|
|
results in <code><nobr>EDI</nobr></code>, <code><nobr>ESI</nobr></code>,
|
|
<code><nobr>EBP</nobr></code>, nothing (placeholder for
|
|
<code><nobr>ESP</nobr></code>), <code><nobr>EBX</nobr></code>,
|
|
<code><nobr>EDX</nobr></code>, <code><nobr>ECX</nobr></code> and
|
|
<code><nobr>EAX</nobr></code>. It reverses the operation of
|
|
<code><nobr>PUSHAD</nobr></code>.
|
|
</ul>
|
|
<p><code><nobr>POPA</nobr></code> is an alias mnemonic for either
|
|
<code><nobr>POPAW</nobr></code> or <code><nobr>POPAD</nobr></code>,
|
|
depending on the current <code><nobr>BITS</nobr></code> setting.
|
|
<p>Note that the registers are popped in reverse order of their numeric
|
|
values in opcodes (see <a href="#section-B.2.1">section B.2.1</a>).
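<p>Since <code><nobr>EAX</nobr></code> is the last register popped, its
saved copy sits at the highest address of the block. A routine that wants
to save everything but still return a value in
<code><nobr>EAX</nobr></code> can overwrite that slot before popping.
The following is a sketch for <code><nobr>BITS 32</nobr></code> code; the
routine name is hypothetical.
<p><pre>
        bits 32
my_func:                        ; hypothetical routine
        pushad                  ; save all eight general-purpose registers
        ; ... body leaves its result in EAX ...
        mov     [esp+28],eax    ; overwrite the saved EAX (7 slots * 4 bytes up)
        popad                   ; EAX = result, other registers restored
        ret
</pre>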
|
|
<h4><a name="section-B.4.246">B.4.246 <code><nobr>POPFx</nobr></code>: Pop Flags Register</a></h4>
|
|
<p><pre>
|
|
POPF ; 9D [8086]
|
|
POPFW ; o16 9D [8086]
|
|
POPFD ; o32 9D [386]
|
|
</pre>
|
|
<ul>
|
|
<li><code><nobr>POPFW</nobr></code> pops a word from the stack and stores
|
|
it in the bottom 16 bits of the flags register (or the whole flags
|
|
register, on processors below a 386).
|
|
<li><code><nobr>POPFD</nobr></code> pops a doubleword and stores it in the
|
|
entire flags register.
|
|
</ul>
|
|
<p><code><nobr>POPF</nobr></code> is an alias mnemonic for either
|
|
<code><nobr>POPFW</nobr></code> or <code><nobr>POPFD</nobr></code>,
|
|
depending on the current <code><nobr>BITS</nobr></code> setting.
|
|
<p>See also <code><nobr>PUSHF</nobr></code>
|
|
(<a href="#section-B.4.265">section B.4.265</a>).
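<p>A well-known use of the <code><nobr>PUSHFD</nobr></code> and
<code><nobr>POPFD</nobr></code> pair is testing whether the ID flag
(bit 21 of <code><nobr>EFLAGS</nobr></code>) can be changed, which
indicates that <code><nobr>CPUID</nobr></code> is available. This is a
sketch for <code><nobr>BITS 32</nobr></code> code on a 386 or later:
<p><pre>
        bits 32
        pushfd
        pop     eax             ; EAX = current EFLAGS
        mov     ecx,eax         ; keep a copy
        xor     eax,0x00200000  ; flip the ID flag (bit 21)
        push    eax
        popfd                   ; try to load the modified flags
        pushfd
        pop     eax             ; read back what the processor kept
        push    ecx
        popfd                   ; restore the original flags
        xor     eax,ecx         ; bit 21 set if the flag could be changed
</pre>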
|
|
<h4><a name="section-B.4.247">B.4.247 <code><nobr>POR</nobr></code>: MMX Bitwise OR</a></h4>
|
|
<p><pre>
|
|
POR mm1,mm2/m64 ; 0F EB /r [PENT,MMX]
|
|
POR xmm1,xmm2/m128 ; 66 0F EB /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>POR</nobr></code> performs a bitwise OR operation between
|
|
its two operands (i.e. each bit of the result is 1 if and only if at least
|
|
one of the corresponding bits of the two inputs was 1), and stores the
|
|
result in the destination (first) operand.
|
|
<h4><a name="section-B.4.248">B.4.248 <code><nobr>PREFETCH</nobr></code>: Prefetch Data Into Caches</a></h4>
|
|
<p><pre>
|
|
PREFETCH mem8 ; 0F 0D /0 [PENT,3DNOW]
|
|
PREFETCHW mem8 ; 0F 0D /1 [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PREFETCH</nobr></code> and
|
|
<code><nobr>PREFETCHW</nobr></code> fetch the line of data from memory that
|
|
contains the specified byte. <code><nobr>PREFETCHW</nobr></code> performs
|
|
differently on the Athlon than on earlier processors.
|
|
<p>For more details, see the 3DNow! Technology Manual.
|
|
<h4><a name="section-B.4.249">B.4.249 <code><nobr>PREFETCHh</nobr></code>: Prefetch Data Into Caches</a></h4>
|
|
<p><pre>
|
|
PREFETCHNTA m8 ; 0F 18 /0 [KATMAI]
|
|
PREFETCHT0 m8 ; 0F 18 /1 [KATMAI]
|
|
PREFETCHT1 m8 ; 0F 18 /2 [KATMAI]
|
|
PREFETCHT2 m8 ; 0F 18 /3 [KATMAI]
|
|
</pre>
|
|
<p>The <code><nobr>PREFETCHh</nobr></code> instructions fetch the line of
|
|
data from memory that contains the specified byte. It is placed in the
|
|
cache according to rules specified by locality hints
|
|
<code><nobr>h</nobr></code>:
|
|
<p>The hints are:
|
|
<ul>
|
|
<li><code><nobr>T0</nobr></code> (temporal data) - prefetch data into all
|
|
levels of the cache hierarchy.
|
|
<li><code><nobr>T1</nobr></code> (temporal data with respect to first level
|
|
cache) - prefetch data into level 2 cache and higher.
|
|
<li><code><nobr>T2</nobr></code> (temporal data with respect to second
|
|
level cache) - prefetch data into level 2 cache and higher.
|
|
<li><code><nobr>NTA</nobr></code> (non-temporal data with respect to all
|
|
cache levels) - prefetch data into non-temporal cache structure and into a
|
|
location close to the processor, minimizing cache pollution.
|
|
</ul>
|
|
<p>Note that this group of instructions doesn't provide a guarantee that
|
|
the data will be in the cache when it is needed. For more details, see the
|
|
Intel IA32 Software Developer Manual, Volume 2.
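<p>A sketch of a streaming loop that issues a hint some distance ahead of
the data it is working on. The register assignments, the 64-byte block
size and the 256-byte prefetch distance are all assumptions chosen for
illustration; a suitable distance depends on the processor and the work
done per block.
<p><pre>
        ; ESI = source pointer, ECX = number of 64-byte blocks (non-zero)
next_block:
        prefetchnta [esi+256]   ; hint only: data a few blocks ahead
        movq    mm0,[esi]       ; work on the current block
        ; ... process the rest of the block ...
        add     esi,64
        dec     ecx
        jnz     next_block
        emms
</pre>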
|
|
<h4><a name="section-B.4.250">B.4.250 <code><nobr>PSADBW</nobr></code>: Packed Sum of Absolute Differences</a></h4>
|
|
<p><pre>
|
|
PSADBW mm1,mm2/m64 ; 0F F6 /r [KATMAI,MMX]
|
|
PSADBW xmm1,xmm2/m128 ; 66 0F F6 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PSADBW</nobr></code> computes the
|
|
absolute value of the difference of the packed unsigned bytes in the two
|
|
source operands. These differences are then summed to produce a word result
|
|
in the lower 16-bit field of the destination register; the rest of the
|
|
register is cleared. The destination operand is an
|
|
<code><nobr>MMX</nobr></code> or an <code><nobr>XMM</nobr></code> register.
|
|
The source operand can either be a register or a memory operand.
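<p>With a zeroed second operand, each absolute difference is just the
byte itself, so the instruction can be used to add up eight unsigned
bytes. This is a sketch; <code><nobr>bytes</nobr></code> is a
hypothetical label.
<p><pre>
        pxor    mm1,mm1         ; mm1 = 0
        movq    mm0,[bytes]     ; eight unsigned bytes b7..b0
        psadbw  mm0,mm1         ; low word = b0 + b1 + ... + b7
        movd    eax,mm0         ; EAX = sum (at most 8 * 255 = 2040)
        emms
</pre>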
|
|
<h4><a name="section-B.4.251">B.4.251 <code><nobr>PSHUFD</nobr></code>: Shuffle Packed Doublewords</a></h4>
|
|
<p><pre>
|
|
PSHUFD xmm1,xmm2/m128,imm8 ; 66 0F 70 /r ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PSHUFD</nobr></code> shuffles the doublewords in the source
|
|
(second) operand according to the encoding specified by imm8, and stores
|
|
the result in the destination (first) operand.
|
|
<p>Bits 0 and 1 of imm8 encode the source position of the doubleword to be
|
|
copied to position 0 in the destination operand. Bits 2 and 3 encode for
|
|
position 1, bits 4 and 5 encode for position 2, and bits 6 and 7 encode for
|
|
position 3. For example, an encoding of 10 in bits 0 and 1 of imm8
|
|
indicates that the doubleword at bits 64-95 of the source operand will be
|
|
copied to bits 0-31 of the destination.
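<p>A few immediate values worked out from the rule above (a sketch; the
register choices are arbitrary). Each comment shows the four two-bit
fields of imm8, from bits 7-6 down to bits 1-0.
<p><pre>
        pshufd  xmm0,xmm1,0x00  ; 00 00 00 00: doubleword 0 copied to all four
        pshufd  xmm0,xmm1,0x1B  ; 00 01 10 11: order of doublewords reversed
        pshufd  xmm0,xmm1,0xE4  ; 11 10 01 00: plain copy, nothing moved
</pre>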
|
|
<h4><a name="section-B.4.252">B.4.252 <code><nobr>PSHUFHW</nobr></code>: Shuffle Packed High Words</a></h4>
|
|
<p><pre>
|
|
PSHUFHW xmm1,xmm2/m128,imm8 ; F3 0F 70 /r ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PSHUFHW</nobr></code> shuffles the words in the high quadword
|
|
of the source (second) operand according to the encoding specified by imm8,
|
|
and stores the result in the high quadword of the destination (first)
|
|
operand.
|
|
<p>The operation of this instruction is similar to the
|
|
<code><nobr>PSHUFW</nobr></code> instruction, except that the source and
|
|
destination are the top quadword of a 128-bit operand, instead of being
|
|
64-bit operands. The low quadword is copied from the source to the
|
|
destination without any changes.
|
|
<h4><a name="section-B.4.253">B.4.253 <code><nobr>PSHUFLW</nobr></code>: Shuffle Packed Low Words</a></h4>
|
|
<p><pre>
|
|
PSHUFLW xmm1,xmm2/m128,imm8 ; F2 0F 70 /r ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PSHUFLW</nobr></code> shuffles the words in the low quadword
|
|
of the source (second) operand according to the encoding specified by imm8,
|
|
and stores the result in the low quadword of the destination (first)
|
|
operand.
|
|
<p>The operation of this instruction is similar to the
|
|
<code><nobr>PSHUFW</nobr></code> instruction, except that the source and
|
|
destination are the low quadword of a 128-bit operand, instead of being
|
|
64-bit operands. The high quadword is copied from the source to the
|
|
destination without any changes.
|
|
<h4><a name="section-B.4.254">B.4.254 <code><nobr>PSHUFW</nobr></code>: Shuffle Packed Words</a></h4>
|
|
<p><pre>
|
|
PSHUFW mm1,mm2/m64,imm8 ; 0F 70 /r ib [KATMAI,MMX]
|
|
</pre>
|
|
<p><code><nobr>PSHUFW</nobr></code> shuffles the words in the source
|
|
(second) operand according to the encoding specified by imm8, and stores
|
|
the result in the destination (first) operand.
|
|
<p>Bits 0 and 1 of imm8 encode the source position of the word to be copied
|
|
to position 0 in the destination operand. Bits 2 and 3 encode for position
|
|
1, bits 4 and 5 encode for position 2, and bits 6 and 7 encode for position
|
|
3. For example, the binary value 10 (decimal 2) in bits 1 and 0 of imm8 indicates that
|
|
the word at bits 32-47 of the source operand will be copied to bits 0-15 of
|
|
the destination.
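<p>As a rough illustration of the imm8 encoding, the following Python sketch models the shuffle on a 64-bit integer. The function name <code><nobr>pshufw</nobr></code> and the sample values are assumptions made for the example; this is not NASM syntax.

```python
def pshufw(src: int, imm8: int) -> int:
    """Model of PSHUFW: src is a 64-bit value holding four 16-bit words."""
    # words[0] is bits 0-15, words[3] is bits 48-63
    words = [(src >> (16 * i)) & 0xFFFF for i in range(4)]
    dst = 0
    for pos in range(4):
        sel = (imm8 >> (2 * pos)) & 0b11      # source word for this position
        dst |= words[sel] << (16 * pos)
    return dst


src = 0x3333222211110000
# Selector 10 (binary) in bits 1 and 0 copies the word at bits 32-47
# of the source into bits 0-15 of the destination.
low_word = pshufw(src, 0b00000010) & 0xFFFF
print(hex(low_word))
```

<p>An imm8 of <code><nobr>0xE4</nobr></code> (binary 11 10 01 00) selects each word from its own position, so the model returns the source unchanged.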
|
|
<h4><a name="section-B.4.255">B.4.255 <code><nobr>PSLLx</nobr></code>: Packed Data Bit Shift Left Logical</a></h4>
|
|
<p><pre>
|
|
PSLLW mm1,mm2/m64 ; 0F F1 /r [PENT,MMX]
|
|
PSLLW mm,imm8 ; 0F 71 /6 ib [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PSLLW xmm1,xmm2/m128 ; 66 0F F1 /r [WILLAMETTE,SSE2]
|
|
PSLLW xmm,imm8 ; 66 0F 71 /6 ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PSLLD mm1,mm2/m64 ; 0F F2 /r [PENT,MMX]
|
|
PSLLD mm,imm8 ; 0F 72 /6 ib [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PSLLD xmm1,xmm2/m128 ; 66 0F F2 /r [WILLAMETTE,SSE2]
|
|
PSLLD xmm,imm8 ; 66 0F 72 /6 ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PSLLQ mm1,mm2/m64 ; 0F F3 /r [PENT,MMX]
|
|
PSLLQ mm,imm8 ; 0F 73 /6 ib [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PSLLQ xmm1,xmm2/m128 ; 66 0F F3 /r [WILLAMETTE,SSE2]
|
|
PSLLQ xmm,imm8 ; 66 0F 73 /6 ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PSLLDQ xmm1,imm8 ; 66 0F 73 /7 ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PSLLx</nobr></code> performs logical left shifts of the data
|
|
elements in the destination (first) operand, moving each bit in the
|
|
separate elements left by the number of bits specified in the source
|
|
(second) operand, clearing the low-order bits as they are vacated.
|
|
<code><nobr>PSLLDQ</nobr></code> shifts bytes, not bits.
|
|
<ul>
|
|
<li><code><nobr>PSLLW</nobr></code> shifts word sized elements.
|
|
<li><code><nobr>PSLLD</nobr></code> shifts doubleword sized elements.
|
|
<li><code><nobr>PSLLQ</nobr></code> shifts quadword sized elements.
|
|
<li><code><nobr>PSLLDQ</nobr></code> shifts double quadword sized elements.
|
|
</ul>
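<p>The per-element behaviour can be sketched in Python as follows. The helper names <code><nobr>psll</nobr></code> and <code><nobr>pslldq</nobr></code> are hypothetical, operands are plain integers, and the treatment of counts that are at least the element size (the result is all zeros) follows Intel's documented behaviour rather than anything stated in this section.

```python
def psll(dst: int, count: int, elem_bits: int, total_bits: int = 64) -> int:
    """Model of PSLLW/PSLLD/PSLLQ (elem_bits = 16, 32 or 64)."""
    mask = (1 << elem_bits) - 1
    result = 0
    for i in range(total_bits // elem_bits):
        elem = (dst >> (i * elem_bits)) & mask
        # Bits shifted out of an element are lost; they never spill
        # into the neighbouring element.
        shifted = (elem << count) & mask if count < elem_bits else 0
        result |= shifted << (i * elem_bits)
    return result


def pslldq(dst: int, byte_count: int) -> int:
    """Model of PSLLDQ: shifts the whole 128-bit value left by bytes."""
    if byte_count > 15:
        return 0
    return (dst << (8 * byte_count)) & ((1 << 128) - 1)


print(hex(psll(0x8001000200030004, 1, 16)))   # PSLLW by 1 on an MMX value
print(hex(pslldq(0x1122, 1)))                 # PSLLDQ by one byte
```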
|
|
<h4><a name="section-B.4.256">B.4.256 <code><nobr>PSRAx</nobr></code>: Packed Data Bit Shift Right Arithmetic</a></h4>
|
|
<p><pre>
|
|
PSRAW mm1,mm2/m64 ; 0F E1 /r [PENT,MMX]
|
|
PSRAW mm,imm8 ; 0F 71 /4 ib [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PSRAW xmm1,xmm2/m128 ; 66 0F E1 /r [WILLAMETTE,SSE2]
|
|
PSRAW xmm,imm8 ; 66 0F 71 /4 ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PSRAD mm1,mm2/m64 ; 0F E2 /r [PENT,MMX]
|
|
PSRAD mm,imm8 ; 0F 72 /4 ib [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PSRAD xmm1,xmm2/m128 ; 66 0F E2 /r [WILLAMETTE,SSE2]
|
|
PSRAD xmm,imm8 ; 66 0F 72 /4 ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PSRAx</nobr></code> performs arithmetic right shifts of the
|
|
data elements in the destination (first) operand, moving each bit in the
|
|
separate elements right by the number of bits specified in the source
|
|
(second) operand, setting the high-order bits to the value of the original
|
|
sign bit.
|
|
<ul>
|
|
<li><code><nobr>PSRAW</nobr></code> shifts word sized elements.
|
|
<li><code><nobr>PSRAD</nobr></code> shifts doubleword sized elements.
|
|
</ul>
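<p>The sign-filling behaviour can be illustrated with a short Python model. The function name <code><nobr>psra</nobr></code> is hypothetical and operands are represented as plain integers; the sketch relies on Python's <code><nobr>&gt;&gt;</nobr></code> being an arithmetic shift on negative integers.

```python
def psra(dst: int, count: int, elem_bits: int, total_bits: int = 64) -> int:
    """Model of PSRAW/PSRAD (elem_bits = 16 or 32)."""
    mask = (1 << elem_bits) - 1
    sign = 1 << (elem_bits - 1)
    result = 0
    for i in range(total_bits // elem_bits):
        elem = (dst >> (i * elem_bits)) & mask
        # Reinterpret the element as a signed value so that the shift
        # replicates the original sign bit into the vacated positions.
        signed = elem - (1 << elem_bits) if elem & sign else elem
        result |= ((signed >> count) & mask) << (i * elem_bits)
    return result


print(hex(psra(0x80007FFFFFFE0004, 1, 16)))   # PSRAW by 1
```

<p>A logical right shift of the same input would instead produce <code><nobr>0x4000</nobr></code> in the top word, since it clears the vacated bit rather than copying the sign.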
|
|
<h4><a name="section-B.4.257">B.4.257 <code><nobr>PSRLx</nobr></code>: Packed Data Bit Shift Right Logical</a></h4>
|
|
<p><pre>
|
|
PSRLW mm1,mm2/m64 ; 0F D1 /r [PENT,MMX]
|
|
PSRLW mm,imm8 ; 0F 71 /2 ib [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PSRLW xmm1,xmm2/m128 ; 66 0F D1 /r [WILLAMETTE,SSE2]
|
|
PSRLW xmm,imm8 ; 66 0F 71 /2 ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PSRLD mm1,mm2/m64 ; 0F D2 /r [PENT,MMX]
|
|
PSRLD mm,imm8 ; 0F 72 /2 ib [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PSRLD xmm1,xmm2/m128 ; 66 0F D2 /r [WILLAMETTE,SSE2]
|
|
PSRLD xmm,imm8 ; 66 0F 72 /2 ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PSRLQ mm1,mm2/m64 ; 0F D3 /r [PENT,MMX]
|
|
PSRLQ mm,imm8 ; 0F 73 /2 ib [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PSRLQ xmm1,xmm2/m128 ; 66 0F D3 /r [WILLAMETTE,SSE2]
|
|
PSRLQ xmm,imm8 ; 66 0F 73 /2 ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PSRLDQ xmm1,imm8 ; 66 0F 73 /3 ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PSRLx</nobr></code> performs logical right shifts of the
|
|
data elements in the destination (first) operand, moving each bit in the
|
|
separate elements right by the number of bits specified in the source
|
|
(second) operand, clearing the high-order bits as they are vacated.
|
|
<code><nobr>PSRLDQ</nobr></code> shifts bytes, not bits.
|
|
<ul>
|
|
<li><code><nobr>PSRLW</nobr></code> shifts word sized elements.
|
|
<li><code><nobr>PSRLD</nobr></code> shifts doubleword sized elements.
|
|
<li><code><nobr>PSRLQ</nobr></code> shifts quadword sized elements.
|
|
<li><code><nobr>PSRLDQ</nobr></code> shifts double quadword sized elements.
|
|
</ul>
|
|
<h4><a name="section-B.4.258">B.4.258 <code><nobr>PSUBx</nobr></code>: Subtract Packed Integers</a></h4>
|
|
<p><pre>
|
|
PSUBB mm1,mm2/m64 ; 0F F8 /r [PENT,MMX]
|
|
PSUBW mm1,mm2/m64 ; 0F F9 /r [PENT,MMX]
|
|
PSUBD mm1,mm2/m64 ; 0F FA /r [PENT,MMX]
|
|
PSUBQ mm1,mm2/m64 ; 0F FB /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PSUBB xmm1,xmm2/m128 ; 66 0F F8 /r [WILLAMETTE,SSE2]
|
|
PSUBW xmm1,xmm2/m128 ; 66 0F F9 /r [WILLAMETTE,SSE2]
|
|
PSUBD xmm1,xmm2/m128 ; 66 0F FA /r [WILLAMETTE,SSE2]
|
|
PSUBQ xmm1,xmm2/m128 ; 66 0F FB /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PSUBx</nobr></code> subtracts packed integers in the source
|
|
operand from those in the destination operand. It doesn't differentiate
|
|
between signed and unsigned integers, and doesn't set any of the flags.
|
|
<ul>
|
|
<li><code><nobr>PSUBB</nobr></code> operates on byte sized elements.
|
|
<li><code><nobr>PSUBW</nobr></code> operates on word sized elements.
|
|
<li><code><nobr>PSUBD</nobr></code> operates on doubleword sized elements.
|
|
<li><code><nobr>PSUBQ</nobr></code> operates on quadword sized elements.
|
|
</ul>
|
|
<h4><a name="section-B.4.259">B.4.259 <code><nobr>PSUBSx</nobr></code>, <code><nobr>PSUBUSx</nobr></code>: Subtract Packed Integers With Saturation</a></h4>
|
|
<p><pre>
|
|
PSUBSB mm1,mm2/m64 ; 0F E8 /r [PENT,MMX]
|
|
PSUBSW mm1,mm2/m64 ; 0F E9 /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PSUBSB xmm1,xmm2/m128 ; 66 0F E8 /r [WILLAMETTE,SSE2]
|
|
PSUBSW xmm1,xmm2/m128 ; 66 0F E9 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PSUBUSB mm1,mm2/m64 ; 0F D8 /r [PENT,MMX]
|
|
PSUBUSW mm1,mm2/m64 ; 0F D9 /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PSUBUSB xmm1,xmm2/m128 ; 66 0F D8 /r [WILLAMETTE,SSE2]
|
|
PSUBUSW xmm1,xmm2/m128 ; 66 0F D9 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PSUBSx</nobr></code> and <code><nobr>PSUBUSx</nobr></code>
|
|
subtract packed integers in the source operand from those in the
|
|
destination operand, and use saturation for results that are outside the
|
|
range supported by the destination operand.
|
|
<ul>
|
|
<li><code><nobr>PSUBSB</nobr></code> operates on signed bytes, and uses
|
|
signed saturation on the results.
|
|
<li><code><nobr>PSUBSW</nobr></code> operates on signed words, and uses
|
|
signed saturation on the results.
|
|
<li><code><nobr>PSUBUSB</nobr></code> operates on unsigned bytes, and uses
|
|
unsigned saturation on the results.
|
|
<li><code><nobr>PSUBUSW</nobr></code> operates on unsigned words, and uses
|
|
unsigned saturation on the results.
|
|
</ul>
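<p>Saturation means that a result outside the representable range is clamped to the nearest limit instead of wrapping round. The Python sketch below models this for both families; the function name <code><nobr>psub_sat</nobr></code> and its <code><nobr>signed</nobr></code> flag are assumptions made for the example.

```python
def psub_sat(dst: int, src: int, elem_bits: int, signed: bool,
             total_bits: int = 64) -> int:
    """Model of PSUBSB/PSUBSW (signed=True) and PSUBUSB/PSUBUSW (signed=False)."""
    mask = (1 << elem_bits) - 1
    sign = 1 << (elem_bits - 1)
    if signed:
        lo, hi = -sign, sign - 1              # e.g. -128..127 for bytes
    else:
        lo, hi = 0, mask                      # e.g. 0..255 for bytes

    def decode(v: int) -> int:
        return v - (1 << elem_bits) if signed and v & sign else v

    result = 0
    for i in range(total_bits // elem_bits):
        a = decode((dst >> (i * elem_bits)) & mask)
        b = decode((src >> (i * elem_bits)) & mask)
        diff = max(lo, min(hi, a - b))        # clamp instead of wrapping
        result |= (diff & mask) << (i * elem_bits)
    return result


print(hex(psub_sat(0x0510, 0x0120, 8, signed=False)))  # PSUBUSB
print(hex(psub_sat(0x80, 0x01, 8, signed=True)))       # PSUBSB
```

<p>In the first call the low byte computes 0x10 - 0x20, which would be negative, so unsigned saturation clamps it to zero; in the second, -128 - 1 is clamped to -128.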
|
|
<h4><a name="section-B.4.260">B.4.260 <code><nobr>PSUBSIW</nobr></code>: MMX Packed Subtract with Saturation to Implied Destination</a></h4>
|
|
<p><pre>
|
|
PSUBSIW mm1,mm2/m64 ; 0F 55 /r [CYRIX,MMX]
|
|
</pre>
|
|
<p><code><nobr>PSUBSIW</nobr></code>, specific to the Cyrix extensions to
|
|
the MMX instruction set, performs the same function as
|
|
<code><nobr>PSUBSW</nobr></code>, except that the result is not placed in
|
|
the register specified by the first operand, but instead in the implied
|
|
destination register, specified as for <code><nobr>PADDSIW</nobr></code>
|
|
(<a href="#section-B.4.200">section B.4.200</a>).
|
|
<h4><a name="section-B.4.261">B.4.261 <code><nobr>PSWAPD</nobr></code>: Swap Packed Data </a></h4>
|
|
<p><pre>
|
|
PSWAPD mm1,mm2/m64 ; 0F 0F /r BB [PENT,3DNOW]
|
|
</pre>
|
|
<p><code><nobr>PSWAPD</nobr></code> swaps the packed doublewords in the
|
|
source operand, and stores the result in the destination operand.
|
|
<p>In the <code><nobr>K6-2</nobr></code> and
|
|
<code><nobr>K6-III</nobr></code> processors, this opcode uses the mnemonic
|
|
<code><nobr>PSWAPW</nobr></code>, and it swaps the order of words when
|
|
copying from the source to the destination.
|
|
<p>The operation in the <code><nobr>K6-2</nobr></code> and
|
|
<code><nobr>K6-III</nobr></code> processors is
|
|
<p><pre>
|
|
dst[0-15] = src[48-63];
|
|
dst[16-31] = src[32-47];
|
|
dst[32-47] = src[16-31];
|
|
dst[48-63] = src[0-15].
|
|
</pre>
|
|
<p>The operation in the <code><nobr>K6-x+</nobr></code>,
|
|
<code><nobr>ATHLON</nobr></code> and later processors is:
|
|
<p><pre>
|
|
dst[0-31] = src[32-63];
|
|
dst[32-63] = src[0-31].
|
|
</pre>
|
|
<h4><a name="section-B.4.262">B.4.262 <code><nobr>PUNPCKxxx</nobr></code>: Unpack and Interleave Data</a></h4>
|
|
<p><pre>
|
|
PUNPCKHBW mm1,mm2/m64 ; 0F 68 /r [PENT,MMX]
|
|
PUNPCKHWD mm1,mm2/m64 ; 0F 69 /r [PENT,MMX]
|
|
PUNPCKHDQ mm1,mm2/m64 ; 0F 6A /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PUNPCKHBW xmm1,xmm2/m128 ; 66 0F 68 /r [WILLAMETTE,SSE2]
|
|
PUNPCKHWD xmm1,xmm2/m128 ; 66 0F 69 /r [WILLAMETTE,SSE2]
|
|
PUNPCKHDQ xmm1,xmm2/m128 ; 66 0F 6A /r [WILLAMETTE,SSE2]
|
|
PUNPCKHQDQ xmm1,xmm2/m128 ; 66 0F 6D /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><pre>
|
|
PUNPCKLBW mm1,mm2/m32 ; 0F 60 /r [PENT,MMX]
|
|
PUNPCKLWD mm1,mm2/m32 ; 0F 61 /r [PENT,MMX]
|
|
PUNPCKLDQ mm1,mm2/m32 ; 0F 62 /r [PENT,MMX]
|
|
</pre>
|
|
<p><pre>
|
|
PUNPCKLBW xmm1,xmm2/m128 ; 66 0F 60 /r [WILLAMETTE,SSE2]
|
|
PUNPCKLWD xmm1,xmm2/m128 ; 66 0F 61 /r [WILLAMETTE,SSE2]
|
|
PUNPCKLDQ xmm1,xmm2/m128 ; 66 0F 62 /r [WILLAMETTE,SSE2]
|
|
PUNPCKLQDQ xmm1,xmm2/m128 ; 66 0F 6C /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PUNPCKxx</nobr></code> all treat their operands as vectors,
|
|
and produce a new vector generated by interleaving elements from the two
|
|
inputs. The <code><nobr>PUNPCKHxx</nobr></code> instructions start by
|
|
throwing away the bottom half of each input operand, and the
|
|
<code><nobr>PUNPCKLxx</nobr></code> instructions throw away the top half.
|
|
<p>The remaining elements are then interleaved into the destination,
|
|
alternating elements from the second (source) operand and the first
|
|
(destination) operand: so the leftmost part of each element in the result
|
|
always comes from the second operand, and the rightmost from the
|
|
destination.
|
|
<ul>
|
|
<li><code><nobr>PUNPCKxBW</nobr></code> works a byte at a time, producing
|
|
word sized output elements.
|
|
<li><code><nobr>PUNPCKxWD</nobr></code> works a word at a time, producing
|
|
doubleword sized output elements.
|
|
<li><code><nobr>PUNPCKxDQ</nobr></code> works a doubleword at a time,
|
|
producing quadword sized output elements.
|
|
<li><code><nobr>PUNPCKxQDQ</nobr></code> works a quadword at a time,
|
|
producing double quadword sized output elements.
|
|
</ul>
|
|
<p>So, for example, for <code><nobr>MMX</nobr></code> operands, if the
|
|
first operand held <code><nobr>0x7A6A5A4A3A2A1A0A</nobr></code> and the
|
|
second held <code><nobr>0x7B6B5B4B3B2B1B0B</nobr></code>, then:
|
|
<ul>
|
|
<li><code><nobr>PUNPCKHBW</nobr></code> would return
|
|
<code><nobr>0x7B7A6B6A5B5A4B4A</nobr></code>.
|
|
<li><code><nobr>PUNPCKHWD</nobr></code> would return
|
|
<code><nobr>0x7B6B7A6A5B4B5A4A</nobr></code>.
|
|
<li><code><nobr>PUNPCKHDQ</nobr></code> would return
|
|
<code><nobr>0x7B6B5B4B7A6A5A4A</nobr></code>.
|
|
<li><code><nobr>PUNPCKLBW</nobr></code> would return
|
|
<code><nobr>0x3B3A2B2A1B1A0B0A</nobr></code>.
|
|
<li><code><nobr>PUNPCKLWD</nobr></code> would return
|
|
<code><nobr>0x3B2B3A2A1B0B1A0A</nobr></code>.
|
|
<li><code><nobr>PUNPCKLDQ</nobr></code> would return
|
|
<code><nobr>0x3B2B1B0B3A2A1A0A</nobr></code>.
|
|
</ul>
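<p>The worked values above can be reproduced with a small Python model of the interleaving. This is a sketch only: the function name <code><nobr>punpck</nobr></code> and its parameters are hypothetical, and operands are represented as plain integers.

```python
def punpck(dst: int, src: int, elem_bits: int, high: bool,
           total_bits: int = 64) -> int:
    """Model of PUNPCKH*/PUNPCKL*; elem_bits is the input element size."""
    mask = (1 << elem_bits) - 1
    half = total_bits // 2
    if high:                                  # PUNPCKHxx: keep the top halves
        dst >>= half
        src >>= half
    result = 0
    for i in range(half // elem_bits):
        d = (dst >> (i * elem_bits)) & mask
        s = (src >> (i * elem_bits)) & mask
        # Destination element goes in the low part of each output element,
        # source element in the high part.
        result |= d << (2 * i * elem_bits)
        result |= s << ((2 * i + 1) * elem_bits)
    return result


first = 0x7A6A5A4A3A2A1A0A
second = 0x7B6B5B4B3B2B1B0B
print(hex(punpck(first, second, 8, high=True)))    # PUNPCKHBW
print(hex(punpck(first, second, 16, high=False)))  # PUNPCKLWD
```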
|
|
<h4><a name="section-B.4.263">B.4.263 <code><nobr>PUSH</nobr></code>: Push Data on Stack</a></h4>
|
|
<p><pre>
|
|
PUSH reg16 ; o16 50+r [8086]
|
|
PUSH reg32 ; o32 50+r [386]
|
|
</pre>
|
|
<p><pre>
|
|
PUSH r/m16 ; o16 FF /6 [8086]
|
|
PUSH r/m32 ; o32 FF /6 [386]
|
|
</pre>
|
|
<p><pre>
|
|
PUSH CS ; 0E [8086]
|
|
PUSH DS ; 1E [8086]
|
|
PUSH ES ; 06 [8086]
|
|
PUSH SS ; 16 [8086]
|
|
PUSH FS ; 0F A0 [386]
|
|
PUSH GS ; 0F A8 [386]
|
|
</pre>
|
|
<p><pre>
|
|
PUSH imm8 ; 6A ib [186]
|
|
PUSH imm16 ; o16 68 iw [186]
|
|
PUSH imm32 ; o32 68 id [386]
|
|
</pre>
|
|
<p><code><nobr>PUSH</nobr></code> decrements the stack pointer
|
|
(<code><nobr>SP</nobr></code> or <code><nobr>ESP</nobr></code>) by 2 or 4,
|
|
and then stores the given value at <code><nobr>[SS:SP]</nobr></code> or
|
|
<code><nobr>[SS:ESP]</nobr></code>.
|
|
<p>The address-size attribute of the instruction determines whether
|
|
<code><nobr>SP</nobr></code> or <code><nobr>ESP</nobr></code> is used as
|
|
the stack pointer: to deliberately override the default given by the
|
|
<code><nobr>BITS</nobr></code> setting, you can use an
|
|
<code><nobr>a16</nobr></code> or <code><nobr>a32</nobr></code> prefix.
|
|
<p>The operand-size attribute of the instruction determines whether the
|
|
stack pointer is decremented by 2 or 4: this means that segment register
|
|
pushes in <code><nobr>BITS 32</nobr></code> mode will push 4 bytes on the
|
|
stack, of which the upper two are undefined. If you need to override that,
|
|
you can use an <code><nobr>o16</nobr></code> or
|
|
<code><nobr>o32</nobr></code> prefix.
|
|
<p>The above opcode listings give two forms for general-purpose register
|
|
push instructions: for example, <code><nobr>PUSH BX</nobr></code> has the
|
|
two forms <code><nobr>53</nobr></code> and <code><nobr>FF F3</nobr></code>.
|
|
NASM will always generate the shorter form when given
|
|
<code><nobr>PUSH BX</nobr></code>. NDISASM will disassemble both.
|
|
<p>Unlike the undocumented and barely supported
|
|
<code><nobr>POP CS</nobr></code>, <code><nobr>PUSH CS</nobr></code> is a
|
|
perfectly valid and sensible instruction, supported on all processors.
|
|
<p>The instruction <code><nobr>PUSH SP</nobr></code> may be used to
|
|
distinguish an 8086 from later processors: on an 8086, the value of
|
|
<code><nobr>SP</nobr></code> stored is the value it has <em>after</em> the
|
|
push instruction, whereas on later processors it is the value
|
|
<em>before</em> the push instruction.
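<p>The difference can be made concrete with a tiny Python model of a 16-bit <code><nobr>PUSH SP</nobr></code>. The function name <code><nobr>push_sp</nobr></code> and the <code><nobr>is_8086</nobr></code> flag are hypothetical; the sketch only tracks the stack pointer and the value written, not memory.

```python
def push_sp(sp: int, is_8086: bool):
    """Return (new SP, value stored on the stack) for a 16-bit PUSH SP."""
    new_sp = (sp - 2) & 0xFFFF                # SP is decremented by 2
    stored = new_sp if is_8086 else sp        # 8086 stores the updated SP
    return new_sp, stored


for cpu, flag in (("8086", True), ("later", False)):
    new_sp, stored = push_sp(0x0100, flag)
    # On an 8086 the stored value equals the new SP; on later
    # processors it differs from it by 2.
    print(cpu, hex(new_sp), hex(stored), stored == new_sp)
```

<p>A detection routine would therefore push <code><nobr>SP</nobr></code>, pop the value back into another register, and compare it with <code><nobr>SP</nobr></code>.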
|
|
<h4><a name="section-B.4.264">B.4.264 <code><nobr>PUSHAx</nobr></code>: Push All General-Purpose Registers</a></h4>
|
|
<p><pre>
|
|
PUSHA ; 60 [186]
|
|
PUSHAD ; o32 60 [386]
|
|
PUSHAW ; o16 60 [186]
|
|
</pre>
|
|
<p><code><nobr>PUSHAW</nobr></code> pushes, in succession,
|
|
<code><nobr>AX</nobr></code>, <code><nobr>CX</nobr></code>,
|
|
<code><nobr>DX</nobr></code>, <code><nobr>BX</nobr></code>,
|
|
<code><nobr>SP</nobr></code>, <code><nobr>BP</nobr></code>,
|
|
<code><nobr>SI</nobr></code> and <code><nobr>DI</nobr></code> on the stack,
|
|
decrementing the stack pointer by a total of 16.
|
|
<p><code><nobr>PUSHAD</nobr></code> pushes, in succession,
|
|
<code><nobr>EAX</nobr></code>, <code><nobr>ECX</nobr></code>,
|
|
<code><nobr>EDX</nobr></code>, <code><nobr>EBX</nobr></code>,
|
|
<code><nobr>ESP</nobr></code>, <code><nobr>EBP</nobr></code>,
|
|
<code><nobr>ESI</nobr></code> and <code><nobr>EDI</nobr></code> on the
|
|
stack, decrementing the stack pointer by a total of 32.
|
|
<p>In both cases, the value of <code><nobr>SP</nobr></code> or
|
|
<code><nobr>ESP</nobr></code> pushed is its <em>original</em> value, as it
|
|
had before the instruction was executed.
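<p>The push order and the treatment of <code><nobr>SP</nobr></code> can be sketched in Python as follows. The names <code><nobr>pushaw</nobr></code> and <code><nobr>PUSH_ORDER</nobr></code> are hypothetical, and the stack is modelled as a dictionary mapping word addresses to values rather than as real memory.

```python
PUSH_ORDER = ("AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI")


def pushaw(regs: dict, stack: dict) -> int:
    """Model of PUSHAW: pushes eight 16-bit registers, returns the new SP."""
    original_sp = regs["SP"]
    sp = original_sp
    for name in PUSH_ORDER:
        # The SP slot receives the value SP had before the instruction.
        value = original_sp if name == "SP" else regs[name]
        sp = (sp - 2) & 0xFFFF
        stack[sp] = value & 0xFFFF
    regs["SP"] = sp
    return sp


regs = {"AX": 1, "CX": 2, "DX": 3, "BX": 4,
        "SP": 0x0100, "BP": 6, "SI": 7, "DI": 8}
stack = {}
new_sp = pushaw(regs, stack)
print(hex(new_sp), hex(stack[0x00F6]))        # SP slot holds the original SP
```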
|
|
<p><code><nobr>PUSHA</nobr></code> is an alias mnemonic for either
|
|
<code><nobr>PUSHAW</nobr></code> or <code><nobr>PUSHAD</nobr></code>,
|
|
depending on the current <code><nobr>BITS</nobr></code> setting.
|
|
<p>Note that the registers are pushed in order of their numeric values in
|
|
opcodes (see <a href="#section-B.2.1">section B.2.1</a>).
|
|
<p>See also <code><nobr>POPA</nobr></code>
|
|
(<a href="#section-B.4.245">section B.4.245</a>).
|
|
<h4><a name="section-B.4.265">B.4.265 <code><nobr>PUSHFx</nobr></code>: Push Flags Register</a></h4>
|
|
<p><pre>
|
|
PUSHF ; 9C [8086]
|
|
PUSHFD ; o32 9C [386]
|
|
PUSHFW ; o16 9C [8086]
|
|
</pre>
|
|
<ul>
|
|
<li><code><nobr>PUSHFW</nobr></code> pushes the bottom 16 bits of the flags
register (or the whole flags register, on processors below a 386) onto the
stack.
<li><code><nobr>PUSHFD</nobr></code> pushes the entire flags register onto
the stack.
|
|
</ul>
|
|
<p><code><nobr>PUSHF</nobr></code> is an alias mnemonic for either
|
|
<code><nobr>PUSHFW</nobr></code> or <code><nobr>PUSHFD</nobr></code>,
|
|
depending on the current <code><nobr>BITS</nobr></code> setting.
|
|
<p>See also <code><nobr>POPF</nobr></code>
|
|
(<a href="#section-B.4.246">section B.4.246</a>).
|
|
<h4><a name="section-B.4.266">B.4.266 <code><nobr>PXOR</nobr></code>: MMX Bitwise XOR</a></h4>
|
|
<p><pre>
|
|
PXOR mm1,mm2/m64 ; 0F EF /r [PENT,MMX]
|
|
PXOR xmm1,xmm2/m128 ; 66 0F EF /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>PXOR</nobr></code> performs a bitwise XOR operation between
|
|
its two operands (i.e. each bit of the result is 1 if and only if exactly
|
|
one of the corresponding bits of the two inputs was 1), and stores the
|
|
result in the destination (first) operand.
|
|
<h4><a name="section-B.4.267">B.4.267 <code><nobr>RCL</nobr></code>, <code><nobr>RCR</nobr></code>: Bitwise Rotate through Carry Bit</a></h4>
|
|
<p><pre>
|
|
RCL r/m8,1 ; D0 /2 [8086]
|
|
RCL r/m8,CL ; D2 /2 [8086]
|
|
RCL r/m8,imm8 ; C0 /2 ib [186]
|
|
RCL r/m16,1 ; o16 D1 /2 [8086]
|
|
RCL r/m16,CL ; o16 D3 /2 [8086]
|
|
RCL r/m16,imm8 ; o16 C1 /2 ib [186]
|
|
RCL r/m32,1 ; o32 D1 /2 [386]
|
|
RCL r/m32,CL ; o32 D3 /2 [386]
|
|
RCL r/m32,imm8 ; o32 C1 /2 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
RCR r/m8,1 ; D0 /3 [8086]
|
|
RCR r/m8,CL ; D2 /3 [8086]
|
|
RCR r/m8,imm8 ; C0 /3 ib [186]
|
|
RCR r/m16,1 ; o16 D1 /3 [8086]
|
|
RCR r/m16,CL ; o16 D3 /3 [8086]
|
|
RCR r/m16,imm8 ; o16 C1 /3 ib [186]
|
|
RCR r/m32,1 ; o32 D1 /3 [386]
|
|
RCR r/m32,CL ; o32 D3 /3 [386]
|
|
RCR r/m32,imm8 ; o32 C1 /3 ib [386]
|
|
</pre>
|
|
<p><code><nobr>RCL</nobr></code> and <code><nobr>RCR</nobr></code> perform
|
|
a 9-bit, 17-bit or 33-bit bitwise rotation operation, involving the given
|
|
source/destination (first) operand and the carry bit. Thus, for example, in
|
|
the operation <code><nobr>RCL AL,1</nobr></code>, a 9-bit rotation is
|
|
performed in which <code><nobr>AL</nobr></code> is shifted left by 1, the
|
|
top bit of <code><nobr>AL</nobr></code> moves into the carry flag, and the
|
|
original value of the carry flag is placed in the low bit of
|
|
<code><nobr>AL</nobr></code>.
|
|
<p>The number of bits to rotate by is given by the second operand. Only the
|
|
bottom five bits of the rotation count are considered by processors above
|
|
the 8086.
|
|
<p>You can force the longer (186 and upwards, beginning with a
|
|
<code><nobr>C1</nobr></code> byte) form of
|
|
<code><nobr>RCL foo,1</nobr></code> by using a
|
|
<code><nobr>BYTE</nobr></code> prefix:
|
|
<code><nobr>RCL foo,BYTE 1</nobr></code>. Similarly with
|
|
<code><nobr>RCR</nobr></code>.
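<p>Treating the carry flag as an extra top bit makes the rotation easy to model. The Python sketch below does this for any operand width; the function names <code><nobr>rcl</nobr></code> and <code><nobr>rcr</nobr></code> are hypothetical, only the carry flag is modelled, and the count is reduced modulo the rotation width, which gives the same result as rotating by the masked count directly.

```python
def rcl(value: int, carry: int, count: int, bits: int = 8):
    """Model of RCL: returns (new value, new carry flag)."""
    width = bits + 1                          # operand plus the carry bit
    count = (count & 0x1F) % width            # only the low five bits count
    wide = (carry << bits) | value            # carry sits above the top bit
    wide = ((wide << count) | (wide >> (width - count))) & ((1 << width) - 1)
    return wide & ((1 << bits) - 1), wide >> bits


def rcr(value: int, carry: int, count: int, bits: int = 8):
    """Model of RCR: returns (new value, new carry flag)."""
    width = bits + 1
    count = (count & 0x1F) % width
    wide = (carry << bits) | value
    wide = ((wide >> count) | (wide << (width - count))) & ((1 << width) - 1)
    return wide & ((1 << bits) - 1), wide >> bits


print(rcl(0x80, 0, 1))    # top bit of AL moves into the carry flag
print(rcl(0x01, 1, 1))    # old carry arrives in the low bit of AL
```

<p>Rotating an 8-bit operand through carry nine times returns both the operand and the carry flag to their original values, which is why the operation is described as a 9-bit rotation.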
|
|
<h4><a name="section-B.4.268">B.4.268 <code><nobr>RCPPS</nobr></code>: Packed Single-Precision FP Reciprocal</a></h4>
|
|
<p><pre>
|
|
RCPPS xmm1,xmm2/m128 ; 0F 53 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>RCPPS</nobr></code> returns an approximation of the
|
|
reciprocal of the packed single-precision FP values from xmm2/m128. The
|
|
maximum error for this approximation is: |Error| &lt;= 1.5 x 2^-12
|
|
<h4><a name="section-B.4.269">B.4.269 <code><nobr>RCPSS</nobr></code>: Scalar Single-Precision FP Reciprocal</a></h4>
|
|
<p><pre>
|
|
RCPSS xmm1,xmm2/m32 ; F3 0F 53 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>RCPSS</nobr></code> returns an approximation of the
|
|
reciprocal of the lower single-precision FP value from xmm2/m32; the upper
|
|
three fields are passed through from xmm1. The maximum error for this
|
|
approximation is: |Error| &lt;= 1.5 x 2^-12
|
|
<h4><a name="section-B.4.270">B.4.270 <code><nobr>RDMSR</nobr></code>: Read Model-Specific Registers</a></h4>
|
|
<p><pre>
|
|
RDMSR ; 0F 32 [PENT,PRIV]
|
|
</pre>
|
|
<p><code><nobr>RDMSR</nobr></code> reads the processor Model-Specific
|
|
Register (MSR) whose index is stored in <code><nobr>ECX</nobr></code>, and
|
|
stores the result in <code><nobr>EDX:EAX</nobr></code>. See also
|
|
<code><nobr>WRMSR</nobr></code> (<a href="#section-B.4.329">section
|
|
B.4.329</a>).
|
|
<h4><a name="section-B.4.271">B.4.271 <code><nobr>RDPMC</nobr></code>: Read Performance-Monitoring Counters</a></h4>
|
|
<p><pre>
|
|
RDPMC ; 0F 33 [P6]
|
|
</pre>
|
|
<p><code><nobr>RDPMC</nobr></code> reads the processor
|
|
performance-monitoring counter whose index is stored in
|
|
<code><nobr>ECX</nobr></code>, and stores the result in
|
|
<code><nobr>EDX:EAX</nobr></code>.
|
|
<p>This instruction is available on P6 and later processors and on MMX
|
|
class processors.
|
|
<h4><a name="section-B.4.272">B.4.272 <code><nobr>RDSHR</nobr></code>: Read SMM Header Pointer Register</a></h4>
|
|
<p><pre>
|
|
RDSHR r/m32 ; 0F 36 /0 [386,CYRIX,SMM]
|
|
</pre>
|
|
<p><code><nobr>RDSHR</nobr></code> reads the contents of the SMM header
|
|
pointer register and saves them to the destination operand, which can be
|
|
either a 32 bit memory location or a 32 bit register.
|
|
<p>See also <code><nobr>WRSHR</nobr></code>
|
|
(<a href="#section-B.4.330">section B.4.330</a>).
|
|
<h4><a name="section-B.4.273">B.4.273 <code><nobr>RDTSC</nobr></code>: Read Time-Stamp Counter</a></h4>
|
|
<p><pre>
|
|
RDTSC ; 0F 31 [PENT]
|
|
</pre>
|
|
<p><code><nobr>RDTSC</nobr></code> reads the processor's time-stamp counter
|
|
into <code><nobr>EDX:EAX</nobr></code>.
|
|
<h4><a name="section-B.4.274">B.4.274 <code><nobr>RET</nobr></code>, <code><nobr>RETF</nobr></code>, <code><nobr>RETN</nobr></code>: Return from Procedure Call</a></h4>
|
|
<p><pre>
|
|
RET ; C3 [8086]
|
|
RET imm16 ; C2 iw [8086]
|
|
</pre>
|
|
<p><pre>
|
|
RETF ; CB [8086]
|
|
RETF imm16 ; CA iw [8086]
|
|
</pre>
|
|
<p><pre>
|
|
RETN ; C3 [8086]
|
|
RETN imm16 ; C2 iw [8086]
|
|
</pre>
|
|
<ul>
|
|
<li><code><nobr>RET</nobr></code>, and its exact synonym
|
|
<code><nobr>RETN</nobr></code>, pop <code><nobr>IP</nobr></code> or
|
|
<code><nobr>EIP</nobr></code> from the stack and transfer control to the
|
|
new address. Optionally, if a numeric operand is provided, they
|
|
increment the stack pointer by a further <code><nobr>imm16</nobr></code>
|
|
bytes after popping the return address.
|
|
<li><code><nobr>RETF</nobr></code> executes a far return: after popping
|
|
<code><nobr>IP</nobr></code>/<code><nobr>EIP</nobr></code>, it then pops
|
|
<code><nobr>CS</nobr></code>, and <em>then</em> increments the stack
|
|
pointer by the optional argument if present.
|
|
</ul>
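<p>The order of operations can be sketched in Python for the 16-bit case. The function names <code><nobr>retn</nobr></code> and <code><nobr>retf</nobr></code> are hypothetical, the stack is modelled as a dictionary of word addresses, and a 16-bit operand and stack size is assumed throughout.

```python
def retn(stack: dict, sp: int, imm16: int = 0):
    """Model of a 16-bit near return: returns (IP, new SP)."""
    ip = stack[sp]                            # pop the return offset
    sp = (sp + 2 + imm16) & 0xFFFF            # then discard imm16 more bytes
    return ip, sp


def retf(stack: dict, sp: int, imm16: int = 0):
    """Model of a 16-bit far return: returns (CS, IP, new SP)."""
    ip = stack[sp]                            # pop IP first
    cs = stack[(sp + 2) & 0xFFFF]             # then CS
    sp = (sp + 4 + imm16) & 0xFFFF            # and only then add imm16
    return cs, ip, sp


stack = {0x0FFA: 0x1234, 0x0FFC: 0x0800}
print(retn(stack, 0x0FFA, 4))                 # callee removes 4 argument bytes
print(retf(stack, 0x0FFA, 2))
```

<p>The immediate form is what lets a callee remove its own arguments from the stack, as in the Pascal and stdcall calling conventions.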
|
|
<h4><a name="section-B.4.275">B.4.275 <code><nobr>ROL</nobr></code>, <code><nobr>ROR</nobr></code>: Bitwise Rotate</a></h4>
|
|
<p><pre>
|
|
ROL r/m8,1 ; D0 /0 [8086]
|
|
ROL r/m8,CL ; D2 /0 [8086]
|
|
ROL r/m8,imm8 ; C0 /0 ib [186]
|
|
ROL r/m16,1 ; o16 D1 /0 [8086]
|
|
ROL r/m16,CL ; o16 D3 /0 [8086]
|
|
ROL r/m16,imm8 ; o16 C1 /0 ib [186]
|
|
ROL r/m32,1 ; o32 D1 /0 [386]
|
|
ROL r/m32,CL ; o32 D3 /0 [386]
|
|
ROL r/m32,imm8 ; o32 C1 /0 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
ROR r/m8,1 ; D0 /1 [8086]
|
|
ROR r/m8,CL ; D2 /1 [8086]
|
|
ROR r/m8,imm8 ; C0 /1 ib [186]
|
|
ROR r/m16,1 ; o16 D1 /1 [8086]
|
|
ROR r/m16,CL ; o16 D3 /1 [8086]
|
|
ROR r/m16,imm8 ; o16 C1 /1 ib [186]
|
|
ROR r/m32,1 ; o32 D1 /1 [386]
|
|
ROR r/m32,CL ; o32 D3 /1 [386]
|
|
ROR r/m32,imm8 ; o32 C1 /1 ib [386]
|
|
</pre>
|
|
<p><code><nobr>ROL</nobr></code> and <code><nobr>ROR</nobr></code> perform
|
|
a bitwise rotation operation on the given source/destination (first)
|
|
operand. Thus, for example, in the operation
|
|
<code><nobr>ROL AL,1</nobr></code>, an 8-bit rotation is performed in which
|
|
<code><nobr>AL</nobr></code> is shifted left by 1 and the original top bit
|
|
of <code><nobr>AL</nobr></code> moves round into the low bit.
|
|
<p>The number of bits to rotate by is given by the second operand. Only the
|
|
bottom five bits of the rotation count are considered by processors above
|
|
the 8086.
|
|
<p>You can force the longer (186 and upwards, beginning with a
|
|
<code><nobr>C1</nobr></code> byte) form of
|
|
<code><nobr>ROL foo,1</nobr></code> by using a
|
|
<code><nobr>BYTE</nobr></code> prefix:
|
|
<code><nobr>ROL foo,BYTE 1</nobr></code>. Similarly with
|
|
<code><nobr>ROR</nobr></code>.
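<p>For comparison with the rotate-through-carry instructions, a plain rotation can be modelled as below. The function names <code><nobr>rol</nobr></code> and <code><nobr>ror</nobr></code> are hypothetical, and the sketch deliberately ignores the effect on the flags.

```python
def rol(value: int, count: int, bits: int = 8) -> int:
    """Model of ROL on a bits-wide operand (flags not modelled)."""
    count = (count & 0x1F) % bits             # low five bits, then wrap
    mask = (1 << bits) - 1
    return ((value << count) | (value >> (bits - count))) & mask


def ror(value: int, count: int, bits: int = 8) -> int:
    """Model of ROR on a bits-wide operand (flags not modelled)."""
    count = (count & 0x1F) % bits
    mask = (1 << bits) - 1
    return ((value >> count) | (value << (bits - count))) & mask


print(hex(rol(0x81, 1)))   # top bit moves round into the low bit
print(hex(ror(0x81, 1)))   # low bit moves round into the top bit
```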
|
|
<h4><a name="section-B.4.276">B.4.276 <code><nobr>RSDC</nobr></code>: Restore Segment Register and Descriptor</a></h4>
|
|
<p><pre>
|
|
RSDC segreg,m80 ; 0F 79 /r [486,CYRIX,SMM]
|
|
</pre>
|
|
<p><code><nobr>RSDC</nobr></code> restores a segment register (DS, ES, FS,
|
|
GS, or SS) from mem80, and sets up its descriptor.
|
|
<h4><a name="section-B.4.277">B.4.277 <code><nobr>RSLDT</nobr></code>: Restore LDTR and Descriptor</a></h4>
|
|
<p><pre>
|
|
RSLDT m80 ; 0F 7B /0 [486,CYRIX,SMM]
|
|
</pre>
|
|
<p><code><nobr>RSLDT</nobr></code> restores the Local Descriptor Table
Register (LDTR) from mem80.
|
|
<h4><a name="section-B.4.278">B.4.278 <code><nobr>RSM</nobr></code>: Resume from System-Management Mode</a></h4>
|
|
<p><pre>
|
|
RSM ; 0F AA [PENT]
|
|
</pre>
|
|
<p><code><nobr>RSM</nobr></code> returns the processor to its normal
|
|
operating mode when it was in System-Management Mode.
|
|
<h4><a name="section-B.4.279">B.4.279 <code><nobr>RSQRTPS</nobr></code>: Packed Single-Precision FP Square Root Reciprocal</a></h4>
|
|
<p><pre>
|
|
RSQRTPS xmm1,xmm2/m128 ; 0F 52 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>RSQRTPS</nobr></code> computes the approximate reciprocals
|
|
of the square roots of the packed single-precision floating-point values in
|
|
the source and stores the results in xmm1. The maximum error for this
|
|
approximation is: |Error| &lt;= 1.5 x 2^-12
|
|
<h4><a name="section-B.4.280">B.4.280 <code><nobr>RSQRTSS</nobr></code>: Scalar Single-Precision FP Square Root Reciprocal</a></h4>
|
|
<p><pre>
|
|
RSQRTSS xmm1,xmm2/m32 ; F3 0F 52 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>RSQRTSS</nobr></code> returns an approximation of the
|
|
reciprocal of the square root of the lowest order single-precision FP value
|
|
from the source, and stores it in the low doubleword of the destination
|
|
register. The upper three fields of xmm1 are preserved. The maximum error
|
|
for this approximation is: |Error| &lt;= 1.5 x 2^-12
|
|
<h4><a name="section-B.4.281">B.4.281 <code><nobr>RSTS</nobr></code>: Restore TSR and Descriptor</a></h4>
|
|
<p><pre>
|
|
RSTS m80 ; 0F 7D /0 [486,CYRIX,SMM]
|
|
</pre>
|
|
<p><code><nobr>RSTS</nobr></code> restores the Task State Register (TSR) from
|
|
mem80.
|
|
<h4><a name="section-B.4.282">B.4.282 <code><nobr>SAHF</nobr></code>: Store AH to Flags</a></h4>
|
|
<p><pre>
|
|
SAHF ; 9E [8086]
|
|
</pre>
|
|
<p><code><nobr>SAHF</nobr></code> sets the low byte of the flags word
|
|
according to the contents of the <code><nobr>AH</nobr></code> register.
|
|
<p>The operation of <code><nobr>SAHF</nobr></code> is:
|
|
<p><pre>
|
|
AH --> SF:ZF:0:AF:0:PF:1:CF
|
|
</pre>
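<p>The bit layout above can be expressed as a mask. In the Python sketch below, the name <code><nobr>sahf</nobr></code> and the constant <code><nobr>SAHF_MASK</nobr></code> are hypothetical; the positions marked 0 and 1 in the diagram are treated as fixed, so the corresponding bits of <code><nobr>AH</nobr></code> are ignored.

```python
# Bit positions in the low flags byte: SF=7, ZF=6, AF=4, PF=2, CF=0.
SAHF_MASK = 0b11010101


def sahf(flags: int, ah: int) -> int:
    """Model of SAHF: returns the flags register after loading from AH."""
    low = (ah & SAHF_MASK) | 0b00000010       # bit 1 always reads as 1
    return (flags & ~0xFF) | low              # upper flag bits are untouched


print(hex(sahf(0x0F00, 0xFF)))
print(hex(sahf(0x0000, 0x41)))                # sets ZF and CF
```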
|
|
<p>See also <code><nobr>LAHF</nobr></code>
|
|
(<a href="#section-B.4.131">section B.4.131</a>).
|
|
<h4><a name="section-B.4.283">B.4.283 <code><nobr>SAL</nobr></code>, <code><nobr>SAR</nobr></code>: Bitwise Arithmetic Shifts</a></h4>
|
|
<p><pre>
|
|
SAL r/m8,1 ; D0 /4 [8086]
|
|
SAL r/m8,CL ; D2 /4 [8086]
|
|
SAL r/m8,imm8 ; C0 /4 ib [186]
|
|
SAL r/m16,1 ; o16 D1 /4 [8086]
|
|
SAL r/m16,CL ; o16 D3 /4 [8086]
|
|
SAL r/m16,imm8 ; o16 C1 /4 ib [186]
|
|
SAL r/m32,1 ; o32 D1 /4 [386]
|
|
SAL r/m32,CL ; o32 D3 /4 [386]
|
|
SAL r/m32,imm8 ; o32 C1 /4 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
SAR r/m8,1 ; D0 /7 [8086]
|
|
SAR r/m8,CL ; D2 /7 [8086]
|
|
SAR r/m8,imm8 ; C0 /7 ib [186]
|
|
SAR r/m16,1 ; o16 D1 /7 [8086]
|
|
SAR r/m16,CL ; o16 D3 /7 [8086]
|
|
SAR r/m16,imm8 ; o16 C1 /7 ib [186]
|
|
SAR r/m32,1 ; o32 D1 /7 [386]
|
|
SAR r/m32,CL ; o32 D3 /7 [386]
|
|
SAR r/m32,imm8 ; o32 C1 /7 ib [386]
|
|
</pre>
|
|
<p><code><nobr>SAL</nobr></code> and <code><nobr>SAR</nobr></code> perform
|
|
an arithmetic shift operation on the given source/destination (first)
|
|
operand. The vacated bits are filled with zero for
|
|
<code><nobr>SAL</nobr></code>, and with copies of the original high bit of
|
|
the source operand for <code><nobr>SAR</nobr></code>.
|
|
<p><code><nobr>SAL</nobr></code> is a synonym for
|
|
<code><nobr>SHL</nobr></code> (see <a href="#section-B.4.290">section
|
|
B.4.290</a>). NASM will assemble either one to the same code, but NDISASM
|
|
will always disassemble that code as <code><nobr>SHL</nobr></code>.
|
|
<p>The number of bits to shift by is given by the second operand. Only the
|
|
bottom five bits of the shift count are considered by processors above the
|
|
8086.
|
|
<p>You can force the longer (186 and upwards, beginning with a
|
|
<code><nobr>C1</nobr></code> byte) form of
|
|
<code><nobr>SAL foo,1</nobr></code> by using a
|
|
<code><nobr>BYTE</nobr></code> prefix:
|
|
<code><nobr>SAL foo,BYTE 1</nobr></code>. Similarly with
|
|
<code><nobr>SAR</nobr></code>.
|
|
<h4><a name="section-B.4.284">B.4.284 <code><nobr>SALC</nobr></code>: Set AL from Carry Flag</a></h4>
|
|
<p><pre>
|
|
SALC ; D6 [8086,UNDOC]
|
|
</pre>
|
|
<p><code><nobr>SALC</nobr></code> is an early undocumented instruction
|
|
similar in concept to <code><nobr>SETcc</nobr></code>
|
|
(<a href="#section-B.4.287">section B.4.287</a>). Its function is to set
|
|
<code><nobr>AL</nobr></code> to zero if the carry flag is clear, or to
|
|
<code><nobr>0xFF</nobr></code> if it is set.
|
|
<h4><a name="section-B.4.285">B.4.285 <code><nobr>SBB</nobr></code>: Subtract with Borrow</a></h4>
|
|
<p><pre>
|
|
SBB r/m8,reg8 ; 18 /r [8086]
|
|
SBB r/m16,reg16 ; o16 19 /r [8086]
|
|
SBB r/m32,reg32 ; o32 19 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
SBB reg8,r/m8 ; 1A /r [8086]
|
|
SBB reg16,r/m16 ; o16 1B /r [8086]
|
|
SBB reg32,r/m32 ; o32 1B /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
SBB r/m8,imm8 ; 80 /3 ib [8086]
|
|
SBB r/m16,imm16 ; o16 81 /3 iw [8086]
|
|
SBB r/m32,imm32 ; o32 81 /3 id [386]
|
|
</pre>
|
|
<p><pre>
|
|
SBB r/m16,imm8 ; o16 83 /3 ib [8086]
|
|
SBB r/m32,imm8 ; o32 83 /3 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
SBB AL,imm8 ; 1C ib [8086]
|
|
SBB AX,imm16 ; o16 1D iw [8086]
|
|
SBB EAX,imm32 ; o32 1D id [386]
|
|
</pre>
|
|
<p><code><nobr>SBB</nobr></code> performs integer subtraction: it subtracts
|
|
its second operand, plus the value of the carry flag, from its first, and
|
|
leaves the result in its destination (first) operand. The flags are set
|
|
according to the result of the operation: in particular, the carry flag is
|
|
affected and can be used by a subsequent <code><nobr>SBB</nobr></code>
|
|
instruction.
|
|
<p>In the forms with an 8-bit immediate second operand and a longer first
|
|
operand, the second operand is considered to be signed, and is
|
|
sign-extended to the length of the first operand. In these cases, the
|
|
<code><nobr>BYTE</nobr></code> qualifier is necessary to force NASM to
|
|
generate this form of the instruction.
|
|
<p>To subtract one number from another without also subtracting the
|
|
contents of the carry flag, use <code><nobr>SUB</nobr></code>
|
|
(<a href="#section-B.4.305">section B.4.305</a>).
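<p>The usual reason to chain the carry flag in this way is multi-word subtraction. The Python sketch below subtracts two 32-bit numbers using 16-bit steps; the helper names <code><nobr>sbb</nobr></code> and <code><nobr>sub32_via_16</nobr></code> are hypothetical, and only the carry flag is modelled.

```python
def sbb(a: int, b: int, carry: int, bits: int = 16):
    """Model of SBB: returns (result, new carry flag)."""
    full = a - b - carry                      # second operand plus carry
    return full & ((1 << bits) - 1), 1 if full < 0 else 0


def sub32_via_16(a: int, b: int):
    """Subtract two 32-bit values using two 16-bit steps."""
    # Low words: carry-in of 0, which is what a plain SUB does.
    lo, cf = sbb(a & 0xFFFF, b & 0xFFFF, 0)
    # High words: SBB consumes the borrow produced by the low words.
    hi, cf = sbb(a >> 16, b >> 16, cf)
    return (hi << 16) | lo, cf


print(tuple(hex(x) for x in sub32_via_16(0x00010000, 0x00000001)))
```

<p>Here the low-word step borrows, and the high-word step absorbs that borrow, giving <code><nobr>0x0000FFFF</nobr></code> with the carry flag clear.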
|
|
<h4><a name="section-B.4.286">B.4.286 <code><nobr>SCASB</nobr></code>, <code><nobr>SCASW</nobr></code>, <code><nobr>SCASD</nobr></code>: Scan String</a></h4>
|
|
<p><pre>
|
|
SCASB ; AE [8086]
|
|
SCASW ; o16 AF [8086]
|
|
SCASD ; o32 AF [386]
|
|
</pre>
|
|
<p><code><nobr>SCASB</nobr></code> compares the byte in
|
|
<code><nobr>AL</nobr></code> with the byte at
|
|
<code><nobr>[ES:DI]</nobr></code> or <code><nobr>[ES:EDI]</nobr></code>,
|
|
and sets the flags accordingly. It then increments or decrements (depending
|
|
on the direction flag: increments if the flag is clear, decrements if it is
|
|
set) <code><nobr>DI</nobr></code> (or <code><nobr>EDI</nobr></code>).
|
|
<p>The register used is <code><nobr>DI</nobr></code> if the address size is
|
|
16 bits, and <code><nobr>EDI</nobr></code> if it is 32 bits. If you need to
|
|
use an address size not equal to the current <code><nobr>BITS</nobr></code>
|
|
setting, you can use an explicit <code><nobr>a16</nobr></code> or
|
|
<code><nobr>a32</nobr></code> prefix.
|
|
<p>Segment override prefixes have no effect for this instruction: the use
|
|
of <code><nobr>ES</nobr></code> for the load from
|
|
<code><nobr>[DI]</nobr></code> or <code><nobr>[EDI]</nobr></code> cannot be
|
|
overridden.
|
|
<p><code><nobr>SCASW</nobr></code> and <code><nobr>SCASD</nobr></code> work
|
|
in the same way, but they compare a word to <code><nobr>AX</nobr></code> or
|
|
a doubleword to <code><nobr>EAX</nobr></code> instead of a byte to
|
|
<code><nobr>AL</nobr></code>, and increment or decrement the addressing
|
|
registers by 2 or 4 instead of 1.
|
|
<p>The <code><nobr>REPE</nobr></code> and <code><nobr>REPNE</nobr></code>
|
|
prefixes (equivalently, <code><nobr>REPZ</nobr></code> and
|
|
<code><nobr>REPNZ</nobr></code>) may be used to repeat the instruction up
|
|
to <code><nobr>CX</nobr></code> (or <code><nobr>ECX</nobr></code> - again,
|
|
the address size chooses which) times, or until the first unequal (for REPE) or equal (for REPNE) byte
|
|
is found.
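<p>As a rough illustration of the repeat behaviour, the following Python sketch models <code><nobr>REPNE SCASB</nobr></code> with the direction flag clear. The function name and the use of a <code><nobr>bytes</nobr></code> object for memory are assumptions made for the example; segmentation and the other flags are ignored.

```python
def repne_scasb(memory, di, cx, al):
    """Model REPNE SCASB with the direction flag clear.

    Returns (di, cx, zf) as they would stand after the instruction.
    Simplification: zf starts at 0, whereas real hardware leaves the
    flags untouched when the count is already zero.
    """
    zf = 0
    while cx != 0:
        cx -= 1
        zf = int(memory[di] == al)   # compare AL with the byte at [ES:DI]
        di += 1                      # DF clear, so DI is incremented
        if zf:                       # REPNE stops once an equal byte is found
            break
    return di, cx, zf
```

Scanning for a zero byte this way leaves <code><nobr>DI</nobr></code> one past the terminator, which is why string-length routines built on it subtract one from the result.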
|
|
<h4><a name="section-B.4.287">B.4.287 <code><nobr>SETcc</nobr></code>: Set Register from Condition</a></h4>
|
|
<p><pre>
|
|
SETcc r/m8 ; 0F 90+cc /2 [386]
|
|
</pre>
|
|
<p><code><nobr>SETcc</nobr></code> sets the given 8-bit operand to zero if
|
|
its condition is not satisfied, and to 1 if it is.
|
|
<h4><a name="section-B.4.288">B.4.288 <code><nobr>SFENCE</nobr></code>: Store Fence</a></h4>
|
|
<p><pre>
|
|
SFENCE ; 0F AE /7 [KATMAI]
|
|
</pre>
|
|
<p><code><nobr>SFENCE</nobr></code> performs a serialising operation on all
|
|
writes to memory that were issued before the
|
|
<code><nobr>SFENCE</nobr></code> instruction. This guarantees that all
|
|
memory writes before the <code><nobr>SFENCE</nobr></code> instruction are
|
|
visible before any writes after the <code><nobr>SFENCE</nobr></code>
|
|
instruction.
|
|
<p><code><nobr>SFENCE</nobr></code> is ordered with respect to other
|
|
<code><nobr>SFENCE</nobr></code> instructions,
|
|
<code><nobr>MFENCE</nobr></code>, any memory write and any other
|
|
serialising instruction (such as <code><nobr>CPUID</nobr></code>).
|
|
<p>Weakly ordered memory types can be used to achieve higher processor
|
|
performance through such techniques as out-of-order issue, write-combining,
|
|
and write-collapsing. The degree to which a consumer of data recognizes or
|
|
knows that the data is weakly ordered varies among applications and may be
|
|
unknown to the producer of this data. The <code><nobr>SFENCE</nobr></code>
|
|
instruction provides a performance-efficient way of ensuring store ordering
|
|
between routines that produce weakly-ordered results and routines that
|
|
consume this data.
|
|
<p><code><nobr>SFENCE</nobr></code> uses the following ModRM encoding:
|
|
<p><pre>
|
|
Mod (7:6) = 11B
|
|
Reg/Opcode (5:3) = 111B
|
|
R/M (2:0) = 000B
|
|
</pre>
|
|
<p>All other ModRM encodings are defined to be reserved, and use of these
|
|
encodings risks incompatibility with future processors.
|
|
<p>See also <code><nobr>LFENCE</nobr></code>
|
|
(<a href="#section-B.4.137">section B.4.137</a>) and
|
|
<code><nobr>MFENCE</nobr></code> (<a href="#section-B.4.151">section
|
|
B.4.151</a>).
|
|
<h4><a name="section-B.4.289">B.4.289 <code><nobr>SGDT</nobr></code>, <code><nobr>SIDT</nobr></code>, <code><nobr>SLDT</nobr></code>: Store Descriptor Table Pointers</a></h4>
|
|
<p><pre>
|
|
SGDT mem ; 0F 01 /0 [286,PRIV]
|
|
SIDT mem ; 0F 01 /1 [286,PRIV]
|
|
SLDT r/m16 ; 0F 00 /0 [286,PRIV]
|
|
</pre>
|
|
<p><code><nobr>SGDT</nobr></code> and <code><nobr>SIDT</nobr></code> both
|
|
take a 6-byte memory area as an operand: they store the contents of the
|
|
GDTR (global descriptor table register) or IDTR (interrupt descriptor table
|
|
register) into that area as a 16-bit size limit followed by a 32-bit linear
|
|
base address (in that order). These are the only instructions which
|
|
directly use <em>linear</em> addresses, rather than segment/offset pairs.
|
|
<p><code><nobr>SLDT</nobr></code> stores the segment selector corresponding
|
|
to the LDT (local descriptor table) into the given operand.
|
|
<p>See also <code><nobr>LGDT</nobr></code>, <code><nobr>LIDT</nobr></code>
|
|
and <code><nobr>LLDT</nobr></code> (<a href="#section-B.4.138">section
|
|
B.4.138</a>).
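<p>To make the 6-byte layout concrete, here is a small Python sketch that splits such an area into its two fields. It assumes the standard x86 layout, with the 16-bit limit in the two lowest bytes and the 32-bit linear base address in the four bytes after it; the function name is hypothetical.

```python
import struct

def parse_descriptor_table_pointer(area):
    """Split a 6-byte SGDT/SIDT result into (limit, base)."""
    if len(area) != 6:
        raise ValueError("expected a 6-byte memory area")
    # Little-endian, no padding: 16-bit limit first, then the 32-bit base.
    limit, base = struct.unpack("<HI", area)
    return limit, base
```

For instance, the bytes <code><nobr>FF 03 00 10 00 80</nobr></code> decode to a limit of <code><nobr>0x03FF</nobr></code> and a base of <code><nobr>0x80001000</nobr></code>.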
|
|
<h4><a name="section-B.4.290">B.4.290 <code><nobr>SHL</nobr></code>, <code><nobr>SHR</nobr></code>: Bitwise Logical Shifts</a></h4>
|
|
<p><pre>
|
|
SHL r/m8,1 ; D0 /4 [8086]
|
|
SHL r/m8,CL ; D2 /4 [8086]
|
|
SHL r/m8,imm8 ; C0 /4 ib [186]
|
|
SHL r/m16,1 ; o16 D1 /4 [8086]
|
|
SHL r/m16,CL ; o16 D3 /4 [8086]
|
|
SHL r/m16,imm8 ; o16 C1 /4 ib [186]
|
|
SHL r/m32,1 ; o32 D1 /4 [386]
|
|
SHL r/m32,CL ; o32 D3 /4 [386]
|
|
SHL r/m32,imm8 ; o32 C1 /4 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
SHR r/m8,1 ; D0 /5 [8086]
|
|
SHR r/m8,CL ; D2 /5 [8086]
|
|
SHR r/m8,imm8 ; C0 /5 ib [186]
|
|
SHR r/m16,1 ; o16 D1 /5 [8086]
|
|
SHR r/m16,CL ; o16 D3 /5 [8086]
|
|
SHR r/m16,imm8 ; o16 C1 /5 ib [186]
|
|
SHR r/m32,1 ; o32 D1 /5 [386]
|
|
SHR r/m32,CL ; o32 D3 /5 [386]
|
|
SHR r/m32,imm8 ; o32 C1 /5 ib [386]
|
|
</pre>
|
|
<p><code><nobr>SHL</nobr></code> and <code><nobr>SHR</nobr></code> perform
|
|
a logical shift operation on the given source/destination (first) operand.
|
|
The vacated bits are filled with zero.
|
|
<p>A synonym for <code><nobr>SHL</nobr></code> is
|
|
<code><nobr>SAL</nobr></code> (see <a href="#section-B.4.283">section
|
|
B.4.283</a>). NASM will assemble either one to the same code, but NDISASM
|
|
will always disassemble that code as <code><nobr>SHL</nobr></code>.
|
|
<p>The number of bits to shift by is given by the second operand. Only the
|
|
bottom five bits of the shift count are considered by processors above the
|
|
8086.
|
|
<p>You can force the longer (186 and upwards, beginning with a
|
|
<code><nobr>C1</nobr></code> byte) form of
|
|
<code><nobr>SHL foo,1</nobr></code> by using a
|
|
<code><nobr>BYTE</nobr></code> prefix:
|
|
<code><nobr>SHL foo,BYTE 1</nobr></code>. Similarly with
|
|
<code><nobr>SHR</nobr></code>.
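<p>The masking of the shift count is easy to overlook, so here is a minimal Python model of both shifts as performed by processors above the 8086. The function names and the <code><nobr>width</nobr></code> parameter are illustrative only, and the flags are not modelled.

```python
def shl(value, count, width=32):
    """Logical left shift of a width-bit operand."""
    count &= 0x1F                              # only the bottom five bits of the count are used
    return (value << count) & ((1 << width) - 1)

def shr(value, count, width=32):
    """Logical right shift; vacated high bits are filled with zero."""
    count &= 0x1F
    return (value & ((1 << width) - 1)) >> count
```

One consequence is that a count of 32 behaves as a count of 0, so a 32-bit operand cannot be cleared with a single shift by 32.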
|
|
<h4><a name="section-B.4.291">B.4.291 <code><nobr>SHLD</nobr></code>, <code><nobr>SHRD</nobr></code>: Bitwise Double-Precision Shifts</a></h4>
|
|
<p><pre>
|
|
SHLD r/m16,reg16,imm8 ; o16 0F A4 /r ib [386]
|
|
SHLD r/m32,reg32,imm8 ; o32 0F A4 /r ib [386]
|
|
SHLD r/m16,reg16,CL ; o16 0F A5 /r [386]
|
|
SHLD r/m32,reg32,CL ; o32 0F A5 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
SHRD r/m16,reg16,imm8 ; o16 0F AC /r ib [386]
|
|
SHRD r/m32,reg32,imm8 ; o32 0F AC /r ib [386]
|
|
SHRD r/m16,reg16,CL ; o16 0F AD /r [386]
|
|
SHRD r/m32,reg32,CL ; o32 0F AD /r [386]
|
|
</pre>
|
|
<ul>
|
|
<li><code><nobr>SHLD</nobr></code> performs a double-precision left shift.
|
|
It notionally places its second operand to the right of its first, then
|
|
shifts the entire bit string thus generated to the left by a number of bits
|
|
specified in the third operand. It then updates only the <em>first</em>
|
|
operand according to the result of this. The second operand is not
|
|
modified.
|
|
<li><code><nobr>SHRD</nobr></code> performs the corresponding right shift:
|
|
it notionally places the second operand to the <em>left</em> of the first,
|
|
shifts the whole bit string right, and updates only the first operand.
|
|
</ul>
|
|
<p>For example, if <code><nobr>EAX</nobr></code> holds
|
|
<code><nobr>0x01234567</nobr></code> and <code><nobr>EBX</nobr></code>
|
|
holds <code><nobr>0x89ABCDEF</nobr></code>, then the instruction
|
|
<code><nobr>SHLD EAX,EBX,4</nobr></code> would update
|
|
<code><nobr>EAX</nobr></code> to hold <code><nobr>0x12345678</nobr></code>.
|
|
Under the same conditions, <code><nobr>SHRD EAX,EBX,4</nobr></code> would
|
|
update <code><nobr>EAX</nobr></code> to hold
|
|
<code><nobr>0xF0123456</nobr></code>.
|
|
<p>The number of bits to shift by is given by the third operand. Only the
|
|
bottom five bits of the shift count are considered.
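<p>The "notional bit string" description can be checked with a short Python model of the 32-bit forms. This is a sketch with invented function names; it reproduces the two results quoted in the example above and ignores the flags.

```python
MASK32 = 0xFFFFFFFF

def shld32(first, second, count):
    """Model of SHLD r/m32,reg32,count: returns the new first operand."""
    count &= 0x1F
    combined = (first << 32) | second      # second operand sits to the right of the first
    return ((combined << count) >> 32) & MASK32

def shrd32(first, second, count):
    """Model of SHRD r/m32,reg32,count: returns the new first operand."""
    count &= 0x1F
    combined = (second << 32) | first      # second operand sits to the left of the first
    return (combined >> count) & MASK32
```

With <code><nobr>first = 0x01234567</nobr></code> and <code><nobr>second = 0x89ABCDEF</nobr></code>, a count of 4 yields <code><nobr>0x12345678</nobr></code> from <code><nobr>shld32</nobr></code> and <code><nobr>0xF0123456</nobr></code> from <code><nobr>shrd32</nobr></code>.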
|
|
<h4><a name="section-B.4.292">B.4.292 <code><nobr>SHUFPD</nobr></code>: Shuffle Packed Double-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
SHUFPD xmm1,xmm2/m128,imm8 ; 66 0F C6 /r ib [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>SHUFPD</nobr></code> moves one of the packed
|
|
double-precision FP values from the destination operand into the low
|
|
quadword of the destination operand; the upper quadword is generated by
|
|
moving one of the double-precision FP values from the source operand into
|
|
the destination. The select (third) operand selects which of the values are
|
|
moved to the destination register.
|
|
<p>The select operand is an 8-bit immediate: bit 0 selects which value is
|
|
moved from the destination operand to the result (where 0 selects the low
|
|
quadword and 1 selects the high quadword) and bit 1 selects which value is
|
|
moved from the source operand to the result. Bits 2 through 7 of the
|
|
shuffle operand are reserved.
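<p>The selection rule can be written out as a tiny Python model. Here each register is represented as a two-element list <code><nobr>[low, high]</nobr></code>; that representation and the function name are assumptions made purely for illustration.

```python
def shufpd(dst, src, imm8):
    """dst and src are [low, high] pairs of doubles; returns the new dst."""
    low = dst[imm8 & 1]            # bit 0 picks the quadword taken from the destination
    high = src[(imm8 >> 1) & 1]    # bit 1 picks the quadword taken from the source
    return [low, high]
```

A select value of 1 therefore places the destination's high quadword in the low half of the result and the source's low quadword in the high half.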
|
|
<h4><a name="section-B.4.293">B.4.293 <code><nobr>SHUFPS</nobr></code>: Shuffle Packed Single-Precision FP Values</a></h4>
|
|
<p><pre>
|
|
SHUFPS xmm1,xmm2/m128,imm8 ; 0F C6 /r ib [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>SHUFPS</nobr></code> moves two of the packed
|
|
single-precision FP values from the destination operand into the low
|
|
quadword of the destination operand; the upper quadword is generated by
|
|
moving two of the single-precision FP values from the source operand into
|
|
the destination. The select (third) operand selects which of the values are
|
|
moved to the destination register.
|
|
<p>The select operand is an 8-bit immediate: bits 0 and 1 select the value
|
|
to be moved from the destination operand to the low doubleword of the result,
|
|
bits 2 and 3 select the value to be moved from the destination operand to the
|
|
second doubleword of the result, bits 4 and 5 select the value to be moved
|
|
from the source operand to the third doubleword of the result, and bits 6 and
|
|
7 select the value to be moved from the source operand to the high
|
|
doubleword of the result.
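<p>Decoding the select byte by hand is error-prone, so the following Python sketch spells out the four 2-bit fields. Registers are modelled as four-element lists ordered from the low doubleword to the high one; the function name and that representation are illustrative assumptions.

```python
def shufps(dst, src, imm8):
    """dst and src are 4-element lists (index 0 = low doubleword); returns the new dst."""
    return [
        dst[imm8 & 3],          # bits 1:0 -> low doubleword, taken from the destination
        dst[(imm8 >> 2) & 3],   # bits 3:2 -> second doubleword, taken from the destination
        src[(imm8 >> 4) & 3],   # bits 5:4 -> third doubleword, taken from the source
        src[(imm8 >> 6) & 3],   # bits 7:6 -> high doubleword, taken from the source
    ]
```

When both operands are the same register, a select value of <code><nobr>0x1B</nobr></code> reverses the four elements and <code><nobr>0x00</nobr></code> broadcasts the low element to all four positions.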
|
|
<h4><a name="section-B.4.294">B.4.294 <code><nobr>SMI</nobr></code>: System Management Interrupt</a></h4>
|
|
<p><pre>
|
|
SMI ; F1 [386,UNDOC]
|
|
</pre>
|
|
<p><code><nobr>SMI</nobr></code> puts some AMD processors into SMM mode. It
|
|
is available on some 386 and 486 processors, and is only available when DR7
|
|
bit 12 is set, otherwise it generates an Int 1.
|
|
<h4><a name="section-B.4.295">B.4.295 <code><nobr>SMINT</nobr></code>, <code><nobr>SMINTOLD</nobr></code>: Software SMM Entry (CYRIX)</a></h4>
|
|
<p><pre>
|
|
SMINT ; 0F 38 [PENT,CYRIX]
|
|
SMINTOLD ; 0F 7E [486,CYRIX]
|
|
</pre>
|
|
<p><code><nobr>SMINT</nobr></code> puts the processor into SMM mode. The
|
|
CPU state information is saved in the SMM memory header, and then execution
|
|
begins at the SMM base address.
|
|
<p><code><nobr>SMINTOLD</nobr></code> is the same as
|
|
<code><nobr>SMINT</nobr></code>, but was the opcode used on the 486.
|
|
<p>This pair of opcodes is specific to the Cyrix and compatible range of
|
|
processors (Cyrix, IBM, Via).
|
|
<h4><a name="section-B.4.296">B.4.296 <code><nobr>SMSW</nobr></code>: Store Machine Status Word</a></h4>
|
|
<p><pre>
|
|
SMSW r/m16 ; 0F 01 /4 [286,PRIV]
|
|
</pre>
|
|
<p><code><nobr>SMSW</nobr></code> stores the bottom half of the
|
|
<code><nobr>CR0</nobr></code> control register (or the Machine Status Word,
|
|
on 286 processors) into the destination operand. See also
|
|
<code><nobr>LMSW</nobr></code> (<a href="#section-B.4.139">section
|
|
B.4.139</a>).
|
|
<p>For 32-bit code, this would use the low 16 bits of the specified
|
|
register (or a 16-bit memory location), without needing an operand size
|
|
override byte.
|
|
<h4><a name="section-B.4.297">B.4.297 <code><nobr>SQRTPD</nobr></code>: Packed Double-Precision FP Square Root</a></h4>
|
|
<p><pre>
|
|
SQRTPD xmm1,xmm2/m128 ; 66 0F 51 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>SQRTPD</nobr></code> calculates the square root of the
|
|
packed double-precision FP values in the source operand, and stores the
|
|
double-precision results in the destination register.
|
|
<h4><a name="section-B.4.298">B.4.298 <code><nobr>SQRTPS</nobr></code>: Packed Single-Precision FP Square Root</a></h4>
|
|
<p><pre>
|
|
SQRTPS xmm1,xmm2/m128 ; 0F 51 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>SQRTPS</nobr></code> calculates the square root of the
|
|
packed single-precision FP values in the source operand, and stores the
|
|
single-precision results in the destination register.
|
|
<h4><a name="section-B.4.299">B.4.299 <code><nobr>SQRTSD</nobr></code>: Scalar Double-Precision FP Square Root</a></h4>
|
|
<p><pre>
|
|
SQRTSD xmm1,xmm2/m128 ; F2 0F 51 /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>SQRTSD</nobr></code> calculates the square root of the
|
|
low-order double-precision FP value from the source operand, and stores the
|
|
double-precision result in the destination register. The high-quadword
|
|
remains unchanged.
|
|
<h4><a name="section-B.4.300">B.4.300 <code><nobr>SQRTSS</nobr></code>: Scalar Single-Precision FP Square Root</a></h4>
|
|
<p><pre>
|
|
SQRTSS xmm1,xmm2/m128 ; F3 0F 51 /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>SQRTSS</nobr></code> calculates the square root of the
|
|
low-order single-precision FP value from the source operand, and stores the
|
|
single-precision result in the destination register. The three high
|
|
doublewords remain unchanged.
|
|
<h4><a name="section-B.4.301">B.4.301 <code><nobr>STC</nobr></code>, <code><nobr>STD</nobr></code>, <code><nobr>STI</nobr></code>: Set Flags</a></h4>
|
|
<p><pre>
|
|
STC ; F9 [8086]
|
|
STD ; FD [8086]
|
|
STI ; FB [8086]
|
|
</pre>
|
|
<p>These instructions set various flags. <code><nobr>STC</nobr></code> sets
|
|
the carry flag; <code><nobr>STD</nobr></code> sets the direction flag; and
|
|
<code><nobr>STI</nobr></code> sets the interrupt flag (thus enabling
|
|
interrupts).
|
|
<p>To clear the carry, direction, or interrupt flags, use the
|
|
<code><nobr>CLC</nobr></code>, <code><nobr>CLD</nobr></code> and
|
|
<code><nobr>CLI</nobr></code> instructions
|
|
(<a href="#section-B.4.20">section B.4.20</a>). To invert the carry flag,
|
|
use <code><nobr>CMC</nobr></code> (<a href="#section-B.4.22">section
|
|
B.4.22</a>).
|
|
<h4><a name="section-B.4.302">B.4.302 <code><nobr>STMXCSR</nobr></code>: Store Streaming SIMD Extension Control/Status</a></h4>
|
|
<p><pre>
|
|
STMXCSR m32 ; 0F AE /3 [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>STMXCSR</nobr></code> stores the contents of the
|
|
<code><nobr>MXCSR</nobr></code> control/status register to the specified
|
|
memory location. <code><nobr>MXCSR</nobr></code> is used to enable
|
|
masked/unmasked exception handling, to set rounding modes, to set
|
|
flush-to-zero mode, and to view exception status flags. The reserved bits
|
|
in the <code><nobr>MXCSR</nobr></code> register are stored as 0s.
|
|
<p>For details of the <code><nobr>MXCSR</nobr></code> register, see the
|
|
Intel processor docs.
|
|
<p>See also <code><nobr>LDMXCSR</nobr></code>
|
|
(<a href="#section-B.4.133">section B.4.133</a>).
|
|
<h4><a name="section-B.4.303">B.4.303 <code><nobr>STOSB</nobr></code>, <code><nobr>STOSW</nobr></code>, <code><nobr>STOSD</nobr></code>: Store Byte to String</a></h4>
|
|
<p><pre>
|
|
STOSB ; AA [8086]
|
|
STOSW ; o16 AB [8086]
|
|
STOSD ; o32 AB [386]
|
|
</pre>
|
|
<p><code><nobr>STOSB</nobr></code> stores the byte in
|
|
<code><nobr>AL</nobr></code> at <code><nobr>[ES:DI]</nobr></code> or
|
|
<code><nobr>[ES:EDI]</nobr></code>; the flags are not affected. It then
|
|
increments or decrements (depending on the direction flag: increments if
|
|
the flag is clear, decrements if it is set) <code><nobr>DI</nobr></code>
|
|
(or <code><nobr>EDI</nobr></code>).
|
|
<p>The register used is <code><nobr>DI</nobr></code> if the address size is
|
|
16 bits, and <code><nobr>EDI</nobr></code> if it is 32 bits. If you need to
|
|
use an address size not equal to the current <code><nobr>BITS</nobr></code>
|
|
setting, you can use an explicit <code><nobr>a16</nobr></code> or
|
|
<code><nobr>a32</nobr></code> prefix.
|
|
<p>Segment override prefixes have no effect for this instruction: the use
|
|
of <code><nobr>ES</nobr></code> for the store to
|
|
<code><nobr>[DI]</nobr></code> or <code><nobr>[EDI]</nobr></code> cannot be
|
|
overridden.
|
|
<p><code><nobr>STOSW</nobr></code> and <code><nobr>STOSD</nobr></code> work
|
|
in the same way, but they store the word in <code><nobr>AX</nobr></code> or
|
|
the doubleword in <code><nobr>EAX</nobr></code> instead of the byte in
|
|
<code><nobr>AL</nobr></code>, and increment or decrement the addressing
|
|
registers by 2 or 4 instead of 1.
|
|
<p>The <code><nobr>REP</nobr></code> prefix may be used to repeat the
|
|
instruction <code><nobr>CX</nobr></code> (or <code><nobr>ECX</nobr></code>
|
|
- again, the address size chooses which) times.
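<p>A common use of the repeated form is filling a buffer. The Python sketch below models <code><nobr>REP STOSD</nobr></code> with the direction flag clear, using a <code><nobr>bytearray</nobr></code> as a stand-in for memory; the function name and that stand-in are assumptions for the example.

```python
def rep_stosd(memory, di, cx, eax):
    """Model REP STOSD with DF clear on a bytearray; returns (di, cx)."""
    while cx != 0:
        memory[di:di + 4] = eax.to_bytes(4, "little")   # store EAX at [ES:DI]
        di += 4                                         # advance by the element size
        cx -= 1
    return di, cx
```

After the loop the count register is zero and <code><nobr>DI</nobr></code> points just past the last doubleword written.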
|
|
<h4><a name="section-B.4.304">B.4.304 <code><nobr>STR</nobr></code>: Store Task Register</a></h4>
|
|
<p><pre>
|
|
STR r/m16 ; 0F 00 /1 [286,PRIV]
|
|
</pre>
|
|
<p><code><nobr>STR</nobr></code> stores the segment selector corresponding
|
|
to the contents of the Task Register into its operand. When the destination
|
|
is a 32-bit register, the upper 16 bits are cleared to 0s. When the
|
|
destination operand is a memory location, 16 bits are written regardless of
|
|
the operand size.
|
|
<h4><a name="section-B.4.305">B.4.305 <code><nobr>SUB</nobr></code>: Subtract Integers</a></h4>
|
|
<p><pre>
|
|
SUB r/m8,reg8 ; 28 /r [8086]
|
|
SUB r/m16,reg16 ; o16 29 /r [8086]
|
|
SUB r/m32,reg32 ; o32 29 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
SUB reg8,r/m8 ; 2A /r [8086]
|
|
SUB reg16,r/m16 ; o16 2B /r [8086]
|
|
SUB reg32,r/m32 ; o32 2B /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
SUB r/m8,imm8 ; 80 /5 ib [8086]
|
|
SUB r/m16,imm16 ; o16 81 /5 iw [8086]
|
|
SUB r/m32,imm32 ; o32 81 /5 id [386]
|
|
</pre>
|
|
<p><pre>
|
|
SUB r/m16,imm8 ; o16 83 /5 ib [8086]
|
|
SUB r/m32,imm8 ; o32 83 /5 ib [386]
|
|
</pre>
|
|
<p><pre>
|
|
SUB AL,imm8 ; 2C ib [8086]
|
|
SUB AX,imm16 ; o16 2D iw [8086]
|
|
SUB EAX,imm32 ; o32 2D id [386]
|
|
</pre>
|
|
<p><code><nobr>SUB</nobr></code> performs integer subtraction: it subtracts
|
|
its second operand from its first, and leaves the result in its destination
|
|
(first) operand. The flags are set according to the result of the
|
|
operation: in particular, the carry flag is affected and can be used by a
|
|
subsequent <code><nobr>SBB</nobr></code> instruction
|
|
(<a href="#section-B.4.285">section B.4.285</a>).
|
|
<p>In the forms with an 8-bit immediate second operand and a longer first
|
|
operand, the second operand is considered to be signed, and is
|
|
sign-extended to the length of the first operand. In these cases, the
|
|
<code><nobr>BYTE</nobr></code> qualifier is necessary to force NASM to
|
|
generate this form of the instruction.
|
|
<h4><a name="section-B.4.306">B.4.306 <code><nobr>SUBPD</nobr></code>: Packed Double-Precision FP Subtract</a></h4>
|
|
<p><pre>
|
|
SUBPD xmm1,xmm2/m128 ; 66 0F 5C /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>SUBPD</nobr></code> subtracts the packed double-precision FP
|
|
values of the source operand from those of the destination operand, and
|
|
stores the result in the destination operand.
|
|
<h4><a name="section-B.4.307">B.4.307 <code><nobr>SUBPS</nobr></code>: Packed Single-Precision FP Subtract</a></h4>
|
|
<p><pre>
|
|
SUBPS xmm1,xmm2/m128 ; 0F 5C /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>SUBPS</nobr></code> subtracts the packed single-precision FP
|
|
values of the source operand from those of the destination operand, and
|
|
stores the result in the destination operand.
|
|
<h4><a name="section-B.4.308">B.4.308 <code><nobr>SUBSD</nobr></code>: Scalar Double-FP Subtract</a></h4>
|
|
<p><pre>
|
|
SUBSD xmm1,xmm2/m128 ; F2 0F 5C /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>SUBSD</nobr></code> subtracts the low-order double-precision
|
|
FP value of the source operand from that of the destination operand, and
|
|
stores the result in the destination operand. The high quadword is
|
|
unchanged.
|
|
<h4><a name="section-B.4.309">B.4.309 <code><nobr>SUBSS</nobr></code>: Scalar Single-FP Subtract</a></h4>
|
|
<p><pre>
|
|
SUBSS xmm1,xmm2/m128 ; F3 0F 5C /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>SUBSS</nobr></code> subtracts the low-order single-precision
|
|
FP value of the source operand from that of the destination operand, and
|
|
stores the result in the destination operand. The three high doublewords
|
|
are unchanged.
|
|
<h4><a name="section-B.4.310">B.4.310 <code><nobr>SVDC</nobr></code>: Save Segment Register and Descriptor</a></h4>
|
|
<p><pre>
|
|
SVDC m80,segreg ; 0F 78 /r [486,CYRIX,SMM]
|
|
</pre>
|
|
<p><code><nobr>SVDC</nobr></code> saves a segment register (DS, ES, FS, GS,
|
|
or SS) and its descriptor to mem80.
|
|
<h4><a name="section-B.4.311">B.4.311 <code><nobr>SVLDT</nobr></code>: Save LDTR and Descriptor</a></h4>
|
|
<p><pre>
|
|
SVLDT m80 ; 0F 7A /0 [486,CYRIX,SMM]
|
|
</pre>
|
|
<p><code><nobr>SVLDT</nobr></code> saves the Local Descriptor Table Register (LDTR)
|
|
to mem80.
|
|
<h4><a name="section-B.4.312">B.4.312 <code><nobr>SVTS</nobr></code>: Save TSR and Descriptor</a></h4>
|
|
<p><pre>
|
|
SVTS m80 ; 0F 7C /0 [486,CYRIX,SMM]
|
|
</pre>
|
|
<p><code><nobr>SVTS</nobr></code> saves the Task State Register (TSR) to
|
|
mem80.
|
|
<h4><a name="section-B.4.313">B.4.313 <code><nobr>SYSCALL</nobr></code>: Call Operating System</a></h4>
|
|
<p><pre>
|
|
SYSCALL ; 0F 05 [P6,AMD]
|
|
</pre>
|
|
<p><code><nobr>SYSCALL</nobr></code> provides a fast method of transferring
|
|
control to a fixed entry point in an operating system.
|
|
<ul>
|
|
<li>The <code><nobr>EIP</nobr></code> register is copied into the
|
|
<code><nobr>ECX</nobr></code> register.
|
|
<li>Bits [31-0] of the 64-bit SYSCALL/SYSRET Target Address Register
|
|
(<code><nobr>STAR</nobr></code>) are copied into the
|
|
<code><nobr>EIP</nobr></code> register.
|
|
<li>Bits [47-32] of the <code><nobr>STAR</nobr></code> register specify the
|
|
selector that is copied into the <code><nobr>CS</nobr></code> register.
|
|
<li>Bits [47-32]+1000b of the <code><nobr>STAR</nobr></code> register
|
|
specify the selector that is copied into the SS register.
|
|
</ul>
|
|
<p>The <code><nobr>CS</nobr></code> and <code><nobr>SS</nobr></code>
|
|
registers should not be modified by the operating system between the
|
|
execution of the <code><nobr>SYSCALL</nobr></code> instruction and its
|
|
corresponding <code><nobr>SYSRET</nobr></code> instruction.
|
|
<p>For more information, see the
|
|
<code><nobr>SYSCALL and SYSRET Instruction Specification</nobr></code> (AMD
|
|
document number 21086.pdf).
|
|
<h4><a name="section-B.4.314">B.4.314 <code><nobr>SYSENTER</nobr></code>: Fast System Call</a></h4>
|
|
<p><pre>
|
|
SYSENTER ; 0F 34 [P6]
|
|
</pre>
|
|
<p><code><nobr>SYSENTER</nobr></code> executes a fast call to a level 0
|
|
system procedure or routine. Before using this instruction, various MSRs
|
|
need to be set up:
|
|
<ul>
|
|
<li><code><nobr>SYSENTER_CS_MSR</nobr></code> contains the 32-bit segment
|
|
selector for the privilege level 0 code segment. (This value is also used
|
|
to compute the segment selector of the privilege level 0 stack segment.)
|
|
<li><code><nobr>SYSENTER_EIP_MSR</nobr></code> contains the 32-bit offset
|
|
into the privilege level 0 code segment to the first instruction of the
|
|
selected operating procedure or routine.
|
|
<li><code><nobr>SYSENTER_ESP_MSR</nobr></code> contains the 32-bit stack
|
|
pointer for the privilege level 0 stack.
|
|
</ul>
|
|
<p><code><nobr>SYSENTER</nobr></code> performs the following sequence of
|
|
operations:
|
|
<ul>
|
|
<li>Loads the segment selector from the
|
|
<code><nobr>SYSENTER_CS_MSR</nobr></code> into the
|
|
<code><nobr>CS</nobr></code> register.
|
|
<li>Loads the instruction pointer from the
|
|
<code><nobr>SYSENTER_EIP_MSR</nobr></code> into the
|
|
<code><nobr>EIP</nobr></code> register.
|
|
<li>Adds 8 to the value in <code><nobr>SYSENTER_CS_MSR</nobr></code> and
|
|
loads it into the <code><nobr>SS</nobr></code> register.
|
|
<li>Loads the stack pointer from the
|
|
<code><nobr>SYSENTER_ESP_MSR</nobr></code> into the
|
|
<code><nobr>ESP</nobr></code> register.
|
|
<li>Switches to privilege level 0.
|
|
<li>Clears the <code><nobr>VM</nobr></code> flag in the
|
|
<code><nobr>EFLAGS</nobr></code> register, if the flag is set.
|
|
<li>Begins executing the selected system procedure.
|
|
</ul>
|
|
<p>In particular, note that this instruction does not save the values of
|
|
<code><nobr>CS</nobr></code> or <code><nobr>(E)IP</nobr></code>. If you
|
|
need to return to the calling code, you need to write your code to cater
|
|
for this.
|
|
<p>For more information, see the Intel Architecture Software Developer's
|
|
Manual, Volume 2.
|
|
<h4><a name="section-B.4.315">B.4.315 <code><nobr>SYSEXIT</nobr></code>: Fast Return From System Call</a></h4>
|
|
<p><pre>
|
|
SYSEXIT ; 0F 35 [P6,PRIV]
|
|
</pre>
|
|
<p><code><nobr>SYSEXIT</nobr></code> executes a fast return to privilege
|
|
level 3 user code. This instruction is a companion instruction to the
|
|
<code><nobr>SYSENTER</nobr></code> instruction, and can only be executed by
|
|
privilege level 0 code. Various registers need to be set up before calling
|
|
this instruction:
|
|
<ul>
|
|
<li><code><nobr>SYSENTER_CS_MSR</nobr></code> contains the 32-bit segment
|
|
selector for the privilege level 0 code segment in which the processor is
|
|
currently executing. (This value is used to compute the segment selectors
|
|
for the privilege level 3 code and stack segments.)
|
|
<li><code><nobr>EDX</nobr></code> contains the 32-bit offset into the
|
|
privilege level 3 code segment to the first instruction to be executed in
|
|
the user code.
|
|
<li><code><nobr>ECX</nobr></code> contains the 32-bit stack pointer for the
|
|
privilege level 3 stack.
|
|
</ul>
|
|
<p><code><nobr>SYSEXIT</nobr></code> performs the following sequence of
|
|
operations:
|
|
<ul>
|
|
<li>Adds 16 to the value in <code><nobr>SYSENTER_CS_MSR</nobr></code> and
|
|
loads the sum into the <code><nobr>CS</nobr></code> selector register.
|
|
<li>Loads the instruction pointer from the <code><nobr>EDX</nobr></code>
|
|
register into the <code><nobr>EIP</nobr></code> register.
|
|
<li>Adds 24 to the value in <code><nobr>SYSENTER_CS_MSR</nobr></code> and
|
|
loads the sum into the <code><nobr>SS</nobr></code> selector register.
|
|
<li>Loads the stack pointer from the <code><nobr>ECX</nobr></code> register
|
|
into the <code><nobr>ESP</nobr></code> register.
|
|
<li>Switches to privilege level 3.
|
|
<li>Begins executing the user code at the <code><nobr>EIP</nobr></code>
|
|
address.
|
|
</ul>
|
|
<p>For more information on the use of the
|
|
<code><nobr>SYSENTER</nobr></code> and <code><nobr>SYSEXIT</nobr></code>
|
|
instructions, see the Intel Architecture Software Developer's Manual,
|
|
Volume 2.
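<p>The selector arithmetic used by <code><nobr>SYSENTER</nobr></code> and <code><nobr>SYSEXIT</nobr></code> implies a fixed ordering of four descriptors. The Python sketch below computes the selectors exactly as the steps above list them; the function names are invented, and any further adjustment a processor may make to the selectors (such as the requested privilege level) is deliberately left out.

```python
def sysenter_selectors(sysenter_cs_msr):
    """Selectors loaded by SYSENTER, per the steps listed for that instruction."""
    return {"CS": sysenter_cs_msr, "SS": sysenter_cs_msr + 8}

def sysexit_selectors(sysenter_cs_msr):
    """Selectors loaded by SYSEXIT, per the steps listed for that instruction."""
    return {"CS": sysenter_cs_msr + 16, "SS": sysenter_cs_msr + 24}
```

With <code><nobr>SYSENTER_CS_MSR</nobr></code> set to <code><nobr>0x08</nobr></code>, for example, the four descriptors sit at selectors <code><nobr>0x08</nobr></code>, <code><nobr>0x10</nobr></code>, <code><nobr>0x18</nobr></code> and <code><nobr>0x20</nobr></code>: ring 0 code, ring 0 stack, ring 3 code, ring 3 stack.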
|
|
<h4><a name="section-B.4.316">B.4.316 <code><nobr>SYSRET</nobr></code>: Return From Operating System</a></h4>
|
|
<p><pre>
|
|
SYSRET ; 0F 07 [P6,AMD,PRIV]
|
|
</pre>
|
|
<p><code><nobr>SYSRET</nobr></code> is the return instruction used in
|
|
conjunction with the <code><nobr>SYSCALL</nobr></code> instruction to
|
|
provide fast entry/exit to an operating system.
|
|
<ul>
|
|
<li>The <code><nobr>ECX</nobr></code> register, which points to the next
|
|
sequential instruction after the corresponding
|
|
<code><nobr>SYSCALL</nobr></code> instruction, is copied into the
|
|
<code><nobr>EIP</nobr></code> register.
|
|
<li>Bits [63-48] of the <code><nobr>STAR</nobr></code> register specify the
|
|
selector that is copied into the <code><nobr>CS</nobr></code> register.
|
|
<li>Bits [63-48]+1000b of the <code><nobr>STAR</nobr></code> register
|
|
specify the selector that is copied into the <code><nobr>SS</nobr></code>
|
|
register.
|
|
<li>Bits [1-0] of the <code><nobr>SS</nobr></code> register are set to 11b
|
|
(RPL of 3) regardless of the value of bits [49-48] of the
|
|
<code><nobr>STAR</nobr></code> register.
|
|
</ul>
|
|
<p>The <code><nobr>CS</nobr></code> and <code><nobr>SS</nobr></code>
|
|
registers should not be modified by the operating system between the
|
|
execution of the <code><nobr>SYSCALL</nobr></code> instruction and its
|
|
corresponding <code><nobr>SYSRET</nobr></code> instruction.
|
|
<p>For more information, see the
|
|
<code><nobr>SYSCALL and SYSRET Instruction Specification</nobr></code> (AMD
|
|
document number 21086.pdf).
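<p>The fields of the <code><nobr>STAR</nobr></code> register used by <code><nobr>SYSCALL</nobr></code> and <code><nobr>SYSRET</nobr></code> can be summarised in a short Python sketch. It follows the bit ranges given in the lists for the two instructions and nothing more; the function and key names are hypothetical.

```python
def decode_star(star):
    """Split a 64-bit STAR value into the fields used by SYSCALL and SYSRET."""
    syscall_cs = (star >> 32) & 0xFFFF      # bits 47-32
    sysret_cs = (star >> 48) & 0xFFFF       # bits 63-48
    return {
        "syscall_eip": star & 0xFFFFFFFF,   # bits 31-0
        "syscall_cs": syscall_cs,
        "syscall_ss": (syscall_cs + 0b1000) & 0xFFFF,
        "sysret_cs": sysret_cs,
        # SYSRET forces bits 1-0 of SS to 11b (RPL of 3).
        "sysret_ss": ((sysret_cs + 0b1000) & 0xFFFF) | 0b11,
    }
```

In both directions the stack selector is derived from the code selector by adding 8, so each pair of descriptors must be adjacent in the descriptor table.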
|
|
<h4><a name="section-B.4.317">B.4.317 <code><nobr>TEST</nobr></code>: Test Bits (notional bitwise AND)</a></h4>
|
|
<p><pre>
|
|
TEST r/m8,reg8 ; 84 /r [8086]
|
|
TEST r/m16,reg16 ; o16 85 /r [8086]
|
|
TEST r/m32,reg32 ; o32 85 /r [386]
|
|
</pre>
|
|
<p><pre>
|
|
TEST r/m8,imm8 ; F6 /0 ib [8086]
|
|
TEST r/m16,imm16 ; o16 F7 /0 iw [8086]
|
|
TEST r/m32,imm32 ; o32 F7 /0 id [386]
|
|
</pre>
|
|
<p><pre>
|
|
TEST AL,imm8 ; A8 ib [8086]
|
|
TEST AX,imm16 ; o16 A9 iw [8086]
|
|
TEST EAX,imm32 ; o32 A9 id [386]
|
|
</pre>
|
|
<p><code><nobr>TEST</nobr></code> performs a `mental' bitwise AND of its
|
|
two operands, and affects the flags as if the operation had taken place,
|
|
but does not store the result of the operation anywhere.
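<p>As an illustration of what "affects the flags" means here, the following Python sketch computes the main status flags that a <code><nobr>TEST</nobr></code> would produce. The function name and the <code><nobr>width</nobr></code> parameter are assumptions; the flag behaviour modelled is that of a bitwise AND, which clears the carry and overflow flags.

```python
def flags_after_test(a, b, width=8):
    """Flags produced by TEST a,b on width-bit operands; neither operand is changed."""
    result = a & b & ((1 << width) - 1)
    return {
        "ZF": int(result == 0),                             # set when no tested bit is set
        "SF": (result >> (width - 1)) & 1,                  # top bit of the result
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even parity of the low byte
        "CF": 0,
        "OF": 0,
    }
```

The usual idiom follows directly: testing a value against a mask and then branching on the zero flag asks whether any of the masked bits is set.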
|
|
<h4><a name="section-B.4.318">B.4.318 <code><nobr>UCOMISD</nobr></code>: Unordered Scalar Double-Precision FP compare and set EFLAGS</a></h4>
|
|
<p><pre>
|
|
UCOMISD xmm1,xmm2/m128 ; 66 0F 2E /r [WILLAMETTE,SSE2]
|
|
</pre>
|
|
<p><code><nobr>UCOMISD</nobr></code> compares the low-order
|
|
double-precision FP numbers in the two operands, and sets the
|
|
<code><nobr>ZF</nobr></code>, <code><nobr>PF</nobr></code> and
|
|
<code><nobr>CF</nobr></code> bits in the <code><nobr>EFLAGS</nobr></code>
|
|
register. In addition, the <code><nobr>OF</nobr></code>,
|
|
<code><nobr>SF</nobr></code> and <code><nobr>AF</nobr></code> bits in the
|
|
<code><nobr>EFLAGS</nobr></code> register are zeroed out. The unordered
|
|
predicate (<code><nobr>ZF</nobr></code>, <code><nobr>PF</nobr></code> and
|
|
<code><nobr>CF</nobr></code> all set) is returned if either source operand
|
|
is a <code><nobr>NaN</nobr></code> (<code><nobr>qNaN</nobr></code> or
|
|
<code><nobr>sNaN</nobr></code>).
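<p>The flag settings for the ordered cases are not spelled out above, so here is a small Python model of the full result. It assumes the usual x86 convention for this instruction family (all three flags clear for "greater than", carry set for "less than", zero set for "equal"), with <code><nobr>a</nobr></code> standing for the first operand; the function name is invented.

```python
import math

def ucomisd_flags(a, b):
    """Flags after comparing the low double of operand 1 (a) with operand 2 (b)."""
    if math.isnan(a) or math.isnan(b):
        zf, pf, cf = 1, 1, 1      # unordered
    elif a > b:
        zf, pf, cf = 0, 0, 0
    elif a < b:
        zf, pf, cf = 0, 0, 1
    else:
        zf, pf, cf = 1, 0, 0      # equal
    # OF, SF and AF are always zeroed.
    return {"ZF": zf, "PF": pf, "CF": cf, "OF": 0, "SF": 0, "AF": 0}
```

Because an unordered result also sets the zero and carry flags, code that must distinguish a NaN from "equal" or "less than" has to check the parity flag first.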
|
|
<h4><a name="section-B.4.319">B.4.319 <code><nobr>UCOMISS</nobr></code>: Unordered Scalar Single-Precision FP compare and set EFLAGS</a></h4>
|
|
<p><pre>
|
|
UCOMISS xmm1,xmm2/m128 ; 0F 2E /r [KATMAI,SSE]
|
|
</pre>
|
|
<p><code><nobr>UCOMISS</nobr></code> compares the low-order
|
|
single-precision FP numbers in the two operands, and sets the
|
|
<code><nobr>ZF</nobr></code>, <code><nobr>PF</nobr></code> and
|
|
<code><nobr>CF</nobr></code> bits in the <code><nobr>EFLAGS</nobr></code>
|
|
register. In addition, the <code><nobr>OF</nobr></code>,
|
|
<code><nobr>SF</nobr></code> and <code><nobr>AF</nobr></code> bits in the
|
|
<code><nobr>EFLAGS</nobr></code> register are zeroed out. The unordered
|
|
predicate (<code><nobr>ZF</nobr></code>, <code><nobr>PF</nobr></code> and
|
|
<code><nobr>CF</nobr></code> all set) is returned if either source operand
|
|
is a <code><nobr>NaN</nobr></code> (<code><nobr>qNaN</nobr></code> or
|
|
<code><nobr>sNaN</nobr></code>).
|
|
<h4><a name="section-B.4.320">B.4.320 <code><nobr>UD0</nobr></code>, <code><nobr>UD1</nobr></code>, <code><nobr>UD2</nobr></code>: Undefined Instruction</a></h4>
<p><pre>
UD0                           ; 0F FF                [186,UNDOC]
UD1                           ; 0F B9                [186,UNDOC]
UD2                           ; 0F 0B                [186]
</pre>
<p><code><nobr>UDx</nobr></code> can be used to generate an invalid opcode
exception, for testing purposes.
<p><code><nobr>UD0</nobr></code> is specifically documented by AMD as being
reserved for this purpose.
<p><code><nobr>UD1</nobr></code> is documented by Intel as being available
for this purpose.
<p><code><nobr>UD2</nobr></code> is specifically documented by Intel as
being reserved for this purpose. Intel documents this as the preferred
method of generating an invalid opcode exception.
<p>All these opcodes can be used to generate invalid opcode exceptions on
all currently available processors.
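<p>One plausible use, sketched below on the assumption that the routine
reached through <code><nobr>EAX</nobr></code> never returns, is to plant
<code><nobr>UD2</nobr></code> on a path that should be unreachable, so that
a logic error faults at once instead of running on into whatever bytes
happen to follow:
<p><pre>
        call    eax             ; assumed: the target never returns
        ud2                     ; if it does, raise an invalid opcode
                                ; exception at this exact point
</pre>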
<h4><a name="section-B.4.321">B.4.321 <code><nobr>UMOV</nobr></code>: User Move Data</a></h4>
<p><pre>
UMOV r/m8,reg8                ; 0F 10 /r             [386,UNDOC]
UMOV r/m16,reg16              ; o16 0F 11 /r         [386,UNDOC]
UMOV r/m32,reg32              ; o32 0F 11 /r         [386,UNDOC]
</pre>
<p><pre>
UMOV reg8,r/m8                ; 0F 12 /r             [386,UNDOC]
UMOV reg16,r/m16              ; o16 0F 13 /r         [386,UNDOC]
UMOV reg32,r/m32              ; o32 0F 13 /r         [386,UNDOC]
</pre>
<p>This undocumented instruction is used by in-circuit emulators to access
user memory (as opposed to host memory). It is used just like an ordinary
memory/register or register/register <code><nobr>MOV</nobr></code>
instruction, but accesses user space.
<p>This instruction is only available on some AMD and IBM 386 and 486
processors.
<h4><a name="section-B.4.322">B.4.322 <code><nobr>UNPCKHPD</nobr></code>: Unpack and Interleave High Packed Double-Precision FP Values</a></h4>
<p><pre>
UNPCKHPD xmm1,xmm2/m128       ; 66 0F 15 /r          [WILLAMETTE,SSE2]
</pre>
<p><code><nobr>UNPCKHPD</nobr></code> performs an interleaved unpack of the
high-order data elements of the source and destination operands, saving the
result in <code><nobr>xmm1</nobr></code>. It ignores the lower half of the
sources.
<p>The operation of this instruction is:
<p><pre>
dst[63-0]   := dst[127-64];
dst[127-64] := src[127-64].
</pre>
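<p>A sketch of one common use, assuming the two doubles to be summed are in
<code><nobr>XMM0</nobr></code> and that <code><nobr>XMM1</nobr></code> is
free as scratch: giving <code><nobr>UNPCKHPD</nobr></code> the same
register for both operands copies the high double down into the low
position, where the scalar instructions can reach it.
<p><pre>
        movapd   xmm1,xmm0      ; xmm1 := copy of both doubles
        unpckhpd xmm1,xmm1      ; xmm1[63-0] := xmm1[127-64]
        addsd    xmm0,xmm1      ; xmm0[63-0] := low double + high double
</pre>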
<h4><a name="section-B.4.323">B.4.323 <code><nobr>UNPCKHPS</nobr></code>: Unpack and Interleave High Packed Single-Precision FP Values</a></h4>
<p><pre>
UNPCKHPS xmm1,xmm2/m128       ; 0F 15 /r             [KATMAI,SSE]
</pre>
<p><code><nobr>UNPCKHPS</nobr></code> performs an interleaved unpack of the
high-order data elements of the source and destination operands, saving the
result in <code><nobr>xmm1</nobr></code>. It ignores the lower half of the
sources.
<p>The operation of this instruction is:
<p><pre>
dst[31-0]   := dst[95-64];
dst[63-32]  := src[95-64];
dst[95-64]  := dst[127-96];
dst[127-96] := src[127-96].
</pre>
<h4><a name="section-B.4.324">B.4.324 <code><nobr>UNPCKLPD</nobr></code>: Unpack and Interleave Low Packed Double-Precision FP Data</a></h4>
<p><pre>
UNPCKLPD xmm1,xmm2/m128       ; 66 0F 14 /r          [WILLAMETTE,SSE2]
</pre>
<p><code><nobr>UNPCKLPD</nobr></code> performs an interleaved unpack of the
low-order data elements of the source and destination operands, saving the
result in <code><nobr>xmm1</nobr></code>. It ignores the upper half of the
sources.
<p>The operation of this instruction is:
<p><pre>
dst[63-0]   := dst[63-0];
dst[127-64] := src[63-0].
</pre>
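<p>As an illustration only, <code><nobr>UNPCKLPD</nobr></code> with the
same register for both operands duplicates the low double into the high
position. The sketch below uses that to scale both doubles in
<code><nobr>XMM1</nobr></code> by one factor; the label
<code><nobr>factor</nobr></code> and its value are hypothetical.
<p><pre>
        movsd    xmm0,[factor]  ; xmm0[63-0] := factor, upper half zeroed
        unpcklpd xmm0,xmm0      ; xmm0[127-64] := xmm0[63-0]
        mulpd    xmm1,xmm0      ; multiply both doubles in xmm1 by factor

; elsewhere, in the data area:
factor: dq 2.5                  ; hypothetical scale factor
</pre>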
<h4><a name="section-B.4.325">B.4.325 <code><nobr>UNPCKLPS</nobr></code>: Unpack and Interleave Low Packed Single-Precision FP Data</a></h4>
<p><pre>
UNPCKLPS xmm1,xmm2/m128       ; 0F 14 /r             [KATMAI,SSE]
</pre>
<p><code><nobr>UNPCKLPS</nobr></code> performs an interleaved unpack of the
low-order data elements of the source and destination operands, saving the
result in <code><nobr>xmm1</nobr></code>. It ignores the upper half of the
sources.
<p>The operation of this instruction is:
<p><pre>
dst[31-0]   := dst[31-0];
dst[63-32]  := src[31-0];
dst[95-64]  := dst[63-32];
dst[127-96] := src[63-32].
</pre>
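<p>To make the lane movement concrete, here is a small sketch that
interleaves two vectors using <code><nobr>UNPCKLPS</nobr></code> together
with <code><nobr>UNPCKHPS</nobr></code>
(<a href="#section-B.4.323">section B.4.323</a>). It assumes
<code><nobr>XMM0</nobr></code> holds the singles a0,a1,a2,a3 and
<code><nobr>XMM1</nobr></code> holds b0,b1,b2,b3, both listed from the
low-order element upwards; the register assignment is an arbitrary choice
for the example.
<p><pre>
        movaps   xmm2,xmm0      ; xmm2 := a0,a1,a2,a3 (copy for the high half)
        unpcklps xmm0,xmm1      ; xmm0 := a0,b0,a1,b1
        unpckhps xmm2,xmm1      ; xmm2 := a2,b2,a3,b3
</pre>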
<h4><a name="section-B.4.326">B.4.326 <code><nobr>VERR</nobr></code>, <code><nobr>VERW</nobr></code>: Verify Segment Readability/Writability</a></h4>
<p><pre>
VERR r/m16                    ; 0F 00 /4             [286,PRIV]
</pre>
<p><pre>
VERW r/m16                    ; 0F 00 /5             [286,PRIV]
</pre>
<ul>
<li><code><nobr>VERR</nobr></code> sets the zero flag if the segment
specified by the selector in its operand can be read from at the current
privilege level. Otherwise it is cleared.
<li><code><nobr>VERW</nobr></code> sets the zero flag if the segment can be
written.
</ul>
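<p>A minimal sketch of testing a selector before using it follows. The
routine name, and the convention of reporting the result in the carry flag,
are invented for this example.
<p><pre>
; Hypothetical helper: AX = selector to test.
; Returns with carry clear if the segment is readable, set if not.
check_read:
        verr    ax              ; ZF := 1 if readable at this privilege level
        jnz     .denied         ; ZF=0: not readable
        clc
        ret
.denied:
        stc
        ret
</pre>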
<h4><a name="section-B.4.327">B.4.327 <code><nobr>WAIT</nobr></code>: Wait for Floating-Point Processor</a></h4>
<p><pre>
WAIT                          ; 9B                   [8086]
FWAIT                         ; 9B                   [8086]
</pre>
<p><code><nobr>WAIT</nobr></code>, on 8086 systems with a separate 8087
FPU, waits for the FPU to have finished any operation it is engaged in
before continuing main processor operations, so that (for example) an FPU
store to main memory can be guaranteed to have completed before the CPU
tries to read the result back out.
<p>On later processors, <code><nobr>WAIT</nobr></code> is unnecessary for
this purpose, and it has the alternative purpose of ensuring that any
pending unmasked FPU exceptions have happened before execution continues.
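<p>The 8086/8087 case described above might look like the following
sketch, in which the variable <code><nobr>result</nobr></code> is
hypothetical:
<p><pre>
        fistp   word [result]   ; the 8087 begins storing the integer result
        fwait                   ; hold the CPU until the store has completed
        mov     ax,[result]     ; now the value can be read back

; elsewhere, in the data area:
result: dw 0                    ; hypothetical variable
</pre>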
<h4><a name="section-B.4.328">B.4.328 <code><nobr>WBINVD</nobr></code>: Write Back and Invalidate Cache</a></h4>
<p><pre>
WBINVD                        ; 0F 09                [486]
</pre>
<p><code><nobr>WBINVD</nobr></code> invalidates and empties the processor's
internal caches, and causes the processor to instruct external caches to do
the same. It writes the contents of the caches back to memory first, so no
data is lost. To flush the caches quickly without bothering to write the
data back first, use <code><nobr>INVD</nobr></code>
(<a href="#section-B.4.125">section B.4.125</a>).
<h4><a name="section-B.4.329">B.4.329 <code><nobr>WRMSR</nobr></code>: Write Model-Specific Registers</a></h4>
<p><pre>
WRMSR                         ; 0F 30                [PENT]
</pre>
<p><code><nobr>WRMSR</nobr></code> writes the value in
<code><nobr>EDX:EAX</nobr></code> to the processor Model-Specific Register
(MSR) whose index is stored in <code><nobr>ECX</nobr></code>. See also
<code><nobr>RDMSR</nobr></code> (<a href="#section-B.4.270">section
B.4.270</a>).
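<p>A sketch of the register set-up follows. The MSR index
<code><nobr>10h</nobr></code> is an assumption made for the example (it is
the time-stamp counter on Pentium-class processors); MSR numbers and their
write behaviour are model-specific, so check the documentation for the
processor in question. The code is assumed to be running at privilege
level 0.
<p><pre>
        mov     ecx,10h         ; assumed MSR index: time-stamp counter
        xor     eax,eax         ; low 32 bits of the value to write
        xor     edx,edx         ; high 32 bits of the value to write
        wrmsr                   ; MSR[ECX] := EDX:EAX
</pre>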
<h4><a name="section-B.4.330">B.4.330 <code><nobr>WRSHR</nobr></code>: Write SMM Header Pointer Register</a></h4>
<p><pre>
WRSHR r/m32                   ; 0F 37 /0             [386,CYRIX,SMM]
</pre>
<p><code><nobr>WRSHR</nobr></code> loads the contents of either a 32-bit
memory location or a 32-bit register into the SMM header pointer register.
<p>See also <code><nobr>RDSHR</nobr></code>
(<a href="#section-B.4.272">section B.4.272</a>).
<h4><a name="section-B.4.331">B.4.331 <code><nobr>XADD</nobr></code>: Exchange and Add</a></h4>
<p><pre>
XADD r/m8,reg8                ; 0F C0 /r             [486]
XADD r/m16,reg16              ; o16 0F C1 /r         [486]
XADD r/m32,reg32              ; o32 0F C1 /r         [486]
</pre>
<p><code><nobr>XADD</nobr></code> exchanges the values in its two operands,
and then adds them together and writes the result into the destination
(first) operand. This instruction can be used with a
<code><nobr>LOCK</nobr></code> prefix for multi-processor synchronisation
purposes.
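<p>The synchronisation use mentioned above is usually an atomic
fetch-and-add. In the sketch below the variable
<code><nobr>counter</nobr></code> is hypothetical; after the instruction,
<code><nobr>EAX</nobr></code> holds the value the counter had before the
increment.
<p><pre>
        mov     eax,1           ; amount to add
        lock xadd [counter],eax ; [counter] := [counter] + 1,
                                ; eax := previous value of [counter]

; elsewhere, in the data area:
counter: dd 0                   ; hypothetical shared counter
</pre>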
<h4><a name="section-B.4.332">B.4.332 <code><nobr>XBTS</nobr></code>: Extract Bit String</a></h4>
<p><pre>
XBTS reg16,r/m16              ; o16 0F A6 /r         [386,UNDOC]
XBTS reg32,r/m32              ; o32 0F A6 /r         [386,UNDOC]
</pre>
<p>The implied operation of this instruction is:
<p><pre>
XBTS reg16,r/m16,AX,CL
XBTS reg32,r/m32,EAX,CL
</pre>
<p><code><nobr>XBTS</nobr></code> writes a bit string from the source
operand to the destination.
<code><nobr>CL</nobr></code> indicates the number of bits to be copied, and
<code><nobr>(E)AX</nobr></code> indicates the low order bit offset in the
source. The bits are written to the low order bits of the destination
register. For example, if <code><nobr>CL</nobr></code> is set to 4 and
<code><nobr>AX</nobr></code> (for 16-bit code) is set to 5, bits 5-8 of
<code><nobr>src</nobr></code> will be copied to bits 0-3 of
<code><nobr>dst</nobr></code>. This instruction is very poorly documented,
and I have been unable to find any official source of documentation on it.
<p><code><nobr>XBTS</nobr></code> is supported only on the early Intel
386s, and conflicts with the opcodes for
<code><nobr>CMPXCHG486</nobr></code> (on early Intel 486s). NASM supports
it only for completeness. Its counterpart is <code><nobr>IBTS</nobr></code>
(see <a href="#section-B.4.116">section B.4.116</a>).
<h4><a name="section-B.4.333">B.4.333 <code><nobr>XCHG</nobr></code>: Exchange</a></h4>
<p><pre>
XCHG reg8,r/m8                ; 86 /r                [8086]
XCHG reg16,r/m16              ; o16 87 /r            [8086]
XCHG reg32,r/m32              ; o32 87 /r            [386]
</pre>
<p><pre>
XCHG r/m8,reg8                ; 86 /r                [8086]
XCHG r/m16,reg16              ; o16 87 /r            [8086]
XCHG r/m32,reg32              ; o32 87 /r            [386]
</pre>
<p><pre>
XCHG AX,reg16                 ; o16 90+r             [8086]
XCHG EAX,reg32                ; o32 90+r             [386]
XCHG reg16,AX                 ; o16 90+r             [8086]
XCHG reg32,EAX                ; o32 90+r             [386]
</pre>
<p><code><nobr>XCHG</nobr></code> exchanges the values in its two operands.
It can be used with a <code><nobr>LOCK</nobr></code> prefix for purposes of
multi-processor synchronisation.
<p><code><nobr>XCHG AX,AX</nobr></code> or
<code><nobr>XCHG EAX,EAX</nobr></code> (depending on the
<code><nobr>BITS</nobr></code> setting) generates the opcode
<code><nobr>90h</nobr></code>, and so is a synonym for
<code><nobr>NOP</nobr></code> (<a href="#section-B.4.190">section
B.4.190</a>).
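<p>As an illustration of the synchronisation use, a very simple spin lock
can be built on <code><nobr>XCHG</nobr></code>. This is a sketch only: the
routine and variable names are invented, and a production lock would
normally back off rather than spin flat out. The explicit
<code><nobr>LOCK</nobr></code> prefix is shown to match the text above,
although processors from the 386 onwards lock the bus automatically when
<code><nobr>XCHG</nobr></code> has a memory operand.
<p><pre>
; Hypothetical spin lock: lockvar is 0 when free, 1 when held.
acquire:
        mov     al,1            ; value meaning "held"
.spin:  lock xchg [lockvar],al  ; store 1, fetch the previous state into AL
        test    al,al           ; was it already held?
        jnz     .spin           ; yes (AL is still 1): try again
        ret                     ; no: the lock is now ours

release:
        mov     byte [lockvar],0 ; mark the lock as free
        ret

lockvar: db 0
</pre>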
<h4><a name="section-B.4.334">B.4.334 <code><nobr>XLATB</nobr></code>: Translate Byte in Lookup Table</a></h4>
<p><pre>
XLAT                          ; D7                   [8086]
XLATB                         ; D7                   [8086]
</pre>
<p><code><nobr>XLATB</nobr></code> adds the value in
<code><nobr>AL</nobr></code>, treated as an unsigned byte, to
<code><nobr>BX</nobr></code> or <code><nobr>EBX</nobr></code>, and loads
the byte from the resulting address (in the segment specified by
<code><nobr>DS</nobr></code>) back into <code><nobr>AL</nobr></code>.
<p>The base register used is <code><nobr>BX</nobr></code> if the address
size is 16 bits, and <code><nobr>EBX</nobr></code> if it is 32 bits. If you
need to use an address size not equal to the current
<code><nobr>BITS</nobr></code> setting, you can use an explicit
<code><nobr>a16</nobr></code> or <code><nobr>a32</nobr></code> prefix.
<p>The segment register used to load from <code><nobr>[BX+AL]</nobr></code>
or <code><nobr>[EBX+AL]</nobr></code> can be overridden by using a segment
register name as a prefix (for example,
<code><nobr>es xlatb</nobr></code>).
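<p>A typical table lookup is sketched below, converting a value of 0 to 15
into its ASCII hexadecimal digit. The routine and table names are invented
for the example, and 32-bit code is assumed, so the base register is
<code><nobr>EBX</nobr></code>.
<p><pre>
        bits 32

; Hypothetical helper: converts the low nibble of AL to an ASCII hex digit.
; Overwrites EBX.
nibble2hex:
        and     al,0Fh          ; keep only the low four bits as the index
        mov     ebx,hextab      ; EBX := base address of the lookup table
        xlatb                   ; AL := byte at DS:[EBX+AL]
        ret

hextab: db '0123456789ABCDEF'
</pre>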
<h4><a name="section-B.4.335">B.4.335 <code><nobr>XOR</nobr></code>: Bitwise Exclusive OR</a></h4>
<p><pre>
XOR r/m8,reg8                 ; 30 /r                [8086]
XOR r/m16,reg16               ; o16 31 /r            [8086]
XOR r/m32,reg32               ; o32 31 /r            [386]
</pre>
<p><pre>
XOR reg8,r/m8                 ; 32 /r                [8086]
XOR reg16,r/m16               ; o16 33 /r            [8086]
XOR reg32,r/m32               ; o32 33 /r            [386]
</pre>
<p><pre>
XOR r/m8,imm8                 ; 80 /6 ib             [8086]
XOR r/m16,imm16               ; o16 81 /6 iw         [8086]
XOR r/m32,imm32               ; o32 81 /6 id         [386]
</pre>
<p><pre>
XOR r/m16,imm8                ; o16 83 /6 ib         [8086]
XOR r/m32,imm8                ; o32 83 /6 ib         [386]
</pre>
<p><pre>
XOR AL,imm8                   ; 34 ib                [8086]
XOR AX,imm16                  ; o16 35 iw            [8086]
XOR EAX,imm32                 ; o32 35 id            [386]
</pre>
<p><code><nobr>XOR</nobr></code> performs a bitwise XOR operation between
its two operands (i.e. each bit of the result is 1 if and only if exactly
one of the corresponding bits of the two inputs was 1), and stores the
result in the destination (first) operand.
<p>In the forms with an 8-bit immediate second operand and a longer first
operand, the second operand is considered to be signed, and is
sign-extended to the length of the first operand. In these cases, the
<code><nobr>BYTE</nobr></code> qualifier is necessary to force NASM to
generate this form of the instruction.
<p>The <code><nobr>MMX</nobr></code> instruction
<code><nobr>PXOR</nobr></code> (see <a href="#section-B.4.266">section
B.4.266</a>) performs the same operation on the 64-bit
<code><nobr>MMX</nobr></code> registers.
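<p>A few illustrative uses follow, assuming 32-bit code. The encodings in
the comments are those given by the forms listed above; the second line
shows the <code><nobr>BYTE</nobr></code> qualifier selecting the
sign-extended 8-bit immediate form.
<p><pre>
        bits 32

        xor     eax,eax         ; a register XORed with itself becomes zero
        xor     ecx,byte -1     ; 83 F1 FF: imm8 FFh is sign-extended to
                                ; FFFFFFFFh, so every bit of ECX is inverted
        xor     al,20h          ; 34 20: toggle bit 5 of AL
</pre>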
<h4><a name="section-B.4.336">B.4.336 <code><nobr>XORPD</nobr></code>: Bitwise Logical XOR of Double-Precision FP Values</a></h4>
<p><pre>
XORPD xmm1,xmm2/m128          ; 66 0F 57 /r          [WILLAMETTE,SSE2]
</pre>
<p><code><nobr>XORPD</nobr></code> returns a bit-wise logical XOR between
the source and destination operands, storing the result in the destination
operand.
<h4><a name="section-B.4.337">B.4.337 <code><nobr>XORPS</nobr></code>: Bitwise Logical XOR of Single-Precision FP Values</a></h4>
<p><pre>
XORPS xmm1,xmm2/m128          ; 0F 57 /r             [KATMAI,SSE]
</pre>
<p><code><nobr>XORPS</nobr></code> returns a bit-wise logical XOR between
the source and destination operands, storing the result in the destination
operand.
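<p>Two common idioms are sketched here: clearing a register, and negating
four packed singles by flipping their sign bits. The label
<code><nobr>signmask</nobr></code> is hypothetical; the memory operand must
be aligned on a 16-byte boundary, hence the
<code><nobr>ALIGN</nobr></code> directive.
<p><pre>
        xorps   xmm0,xmm0       ; all bits zero: four singles of +0.0
        xorps   xmm1,[signmask] ; flip bit 31 of each single: negate all four

; elsewhere, in the data area:
        align 16
signmask: dd 80000000h,80000000h,80000000h,80000000h
</pre>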
<p align=center><a href="nasmdoca.html">Previous Chapter</a> |
<a href="nasmdoc0.html">Contents</a> |
<a href="nasmdoci.html">Index</a>
</body></html>