[ Article crossposted from comp.lang.asm.x86 ]
[ Author was Jerzy Tarasiuk ]
[ Posted on 25 May 1995 17:37:03 +0200 ]

Real and Protected Modes.

Beginning with the 80286, Intel CPUs can work in Protected Mode (older
CPUs have Real Mode only). For compatibility reasons, all CPUs start
in Real Mode after reset. The main differences between Real Mode and
the protected modes of Intel CPUs are presented below. Note there are
three modes: Real Mode, Protected Mode and Virtual 8086 Mode (they will
frequently be called RM, PM and VM86, respectively; also, 286+ (386+)
will mean Intel 80286 (80386) or better).

The modes differ in memory addressing (PM can address all memory, while
RM can't unless this is first set up in PM on a 386+, and VM86 can't
unless the PM code supporting it remaps memory - this is how EMM386
works); instruction set (some instructions are not allowed in RM);
privileges (some operations can be forbidden in PM for less privileged
code, and many operations are forbidden in VM86); and interrupt
handling. PM supports multitasking, and PM can run tasks in VM86 (VM86
cannot function alone, it must have PM code supporting it; it works
like an 8086 CPU with a few enhancements, except for interrupt
servicing, which goes through PM). PM code cannot store data to a code
segment (except by aliasing; MOV CS:[BX],AX is illegal in PM). VM86 and
PM on 386+ can have selective I/O port access restrictions (some ports
can be accessed without causing an exception and others can't).


Memory addressing and Paging.

In any mode, an opcode defines the segment and offset of the memory
address it references, e.g. mov ax,es:[bx+si+1] uses segment es and
offset bx+si+1; push si uses segment ss and offset sp-2; the opcode
itself is fetched from segment cs at offset ip. The address is
translated to a Linear Address by adding the offset to the base of the
segment, and the Linear Address is then translated to a Physical
Address, which the CPU outputs on its address pins.

In RM or VM86, the base is segment*10h; in PM the base is taken from a
descriptor table (LDT or GDT) and can have any value.
In PM the value in a segment register is called a "selector": its bits
15-3 give the offset into the LDT or GDT (selector AND 0FFF8h, always a
multiple of 8), bit 2 is 0 for the GDT and 1 for the LDT, and bits 1-0
specify the RPL (Requested Privilege Level).

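The arithmetic above can be sketched in a few lines of Python. This is
only an illustration of the two rules just stated; the function names
real_mode_linear and decode_selector are invented for the example.

```python
def real_mode_linear(segment, offset):
    """Linear address in RM or VM86: the segment base is segment*10h."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def decode_selector(selector):
    """Split a 16-bit PM selector into (table, table_offset, rpl)."""
    table = "LDT" if selector & 0x04 else "GDT"  # bit 2 selects the table
    table_offset = selector & 0xFFF8             # bits 15-3, multiple of 8
    rpl = selector & 0x03                        # bits 1-0
    return table, table_offset, rpl
```

For example, real_mode_linear(0xFFFF, 0xFFFF) gives 10FFEFh, an address
just above 1MB, and decode_selector(0x001B) gives GDT offset 18h with
RPL 3.
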
Unless Paging (possible in PM and VM86, on 386+ only) is enabled, the
Physical Address equals the Linear Address. With Paging, the low 12
bits of the Linear Address go to the Physical Address unchanged and the
others are used as indexes into two-level page tables (first, bits
31-22 select a page directory entry, then bits 21-12 select a page).

Paging can also restrict access to some pages (so that non-privileged
code can only read them or has no access at all), or define non-present
pages, which the system assigns physical addresses and brings into
memory, in a way transparent to the program, when an access to their
Linear Address is attempted. Note the Linear Address space is 4GB on
386+ and probably no system has that much physical memory: Paging lets
the system behave as if it had.

A segment also has a limit. Initially, the limit is 0FFFFh for all
segment registers and cannot be changed in RM or VM86. In PM it is
loaded from the LDT or GDT when the segment register is loaded. On a
286 in PM the limit can be up to 0FFFFh; on 386+ in PM it can be up to
0FFFFFFFFh.
Also, PM allows "expand down" segments, which allow access from offset
limit+1 up to the maximum possible value of the limit (which depends on
the segment type).

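A minimal sketch of the limit check for both kinds of segment, assuming
the access is `size` bytes starting at `offset`. The name
offset_allowed and its parameters are invented for the example; `big`
stands for a segment type whose maximum limit is 0FFFFFFFFh rather than
0FFFFh.

```python
def offset_allowed(offset, size, limit, expand_down=False, big=False):
    """Return True if bytes offset..offset+size-1 lie inside the segment."""
    last = offset + size - 1
    if not expand_down:
        # Normal (expand up) segment: valid offsets are 0..limit.
        return last <= limit
    # Expand down segment: valid offsets are limit+1..top.
    top = 0xFFFFFFFF if big else 0xFFFF
    return offset > limit and last <= top
```

With limit 0FFFh, offset 1000h is valid only in the expand down case,
and a word access at offset 0FFFFh of a 64kB expand up segment fails
because its second byte lies beyond the limit.
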

Privilege Levels and Rules.

In RM, the CPU has full privileges. In PM and VM86, they can be
restricted. This reduces the damage that bad code can do.

Base rules: code cannot access data more privileged than itself, nor
call code less privileged than itself (although it can return to less
privileged code).
Additional rules: a call to more privileged code cannot use any target
address the caller wants, it can use only addresses specified by the
system; a call to more privileged code must switch stacks to make sure
enough stack space is available for the called code (so the caller
cannot cause a crash in it).

There are 4 levels: level 0 has full privileges (except for the Debug
Registers, which can be protected from access even from level 0; some
instructions are reserved for level 0 only); the bigger the level, the
fewer privileges it has. A few terms are used for Privilege Levels:
CPL - Current PL, DPL - Descriptor PL, RPL - Requested PL (in a
selector), IOPL (in flags) - the maximum CPL at which I/O sensitive
opcodes are allowed (IN, INS, OUT, OUTS, CLI, STI in PM; CLI, STI,
PUSHF, POPF, INT n, IRET in VM86).

Unless a Conforming Code segment is accessed, the privilege rules
require max(CPL,RPL) <= DPL. To execute code directly (by FAR CALL or
JMP), DPL <= CPL is needed (and unless the segment is Conforming, it
must be DPL = CPL and RPL <= CPL) - a less privileged procedure cannot
be called, for example. To transfer control to code with a lower PL
(more privileged), one must CALL via a call gate. In such a case
max(CPL,RPL) <= gate_DPL is needed, but the code the gate refers to may
have code_DPL lower than gate_DPL; the gate is an entry in the GDT or
LDT; the privilege rules also require target_code_DPL <= CPL for CALL
and target_code_DPL = CPL for JMP. Such a call also requires TR to
point to a valid TSS, because it switches stacks: the old SS:[E]SP are
pushed on the new stack, then the parameters (as many as defined in the
call gate) are copied, and finally CS:[E]IP are pushed. On return from
the call, the CPU detects that the RPL of the CS on the stack is > CPL
and switches the stack back (if it is = CPL there is no stack switch;
a lower value is inhibited by the privilege rules); for proper
functioning, the parameter counts on RET and in the call gate must
match. For a stack segment, DPL must be equal to CPL (so in a more
privileged mode no crash is possible due to an incorrect stack setting
made in a less privileged one, and the less privileged mode has no
access to the more privileged mode's stack).

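The rules above can be written as three predicates. This is a sketch
under simplifying assumptions, not the CPU's full check: it ignores
present bits, descriptor types and the stack switch, and the function
names are invented for the example.

```python
def data_access_ok(cpl, rpl, dpl):
    """Loading a data segment: needs max(CPL,RPL) <= DPL."""
    return max(cpl, rpl) <= dpl


def direct_far_transfer_ok(cpl, rpl, dpl, conforming):
    """FAR CALL or JMP straight to a code segment (no gate)."""
    if conforming:
        return dpl <= cpl
    return dpl == cpl and rpl <= cpl


def call_gate_ok(cpl, rpl, gate_dpl, code_dpl, is_jmp=False):
    """Transfer through a call gate to a non-conforming code segment."""
    if max(cpl, rpl) > gate_dpl:
        return False          # the gate itself is not accessible
    if is_jmp:
        return code_dpl == cpl
    return code_dpl <= cpl
```

For instance, user code at CPL 3 cannot reach DPL 0 code directly, but
call_gate_ok(3, 3, 3, 0) holds: a gate with DPL 3 may lead to code with
DPL 0.
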
The RPL lets the system block an attempt to pass it a pointer from user
code which is invalid in user mode but valid in system mode: the system
uses the pointer with the RPL of the user code and gets an access
violation error in such a case.
This can be done using the ARPL opcode, which adjusts the RPL of a
selector and sets ZF if it was changed (to inform the OS that an
invalid access might have been attempted).
The OS uses it to set the RPL of the pointer to the CPL of the
application code.

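The effect of ARPL can be modelled as below. This is a sketch of the
instruction's result only; the function name arpl and the tuple it
returns are conventions chosen for the example.

```python
def arpl(dest_selector, src_selector):
    """Model of ARPL: return (new_dest_selector, zf).

    If the RPL of dest is numerically lower (more privileged) than the
    RPL of src, it is raised to the RPL of src and ZF is set.
    """
    dest_rpl = dest_selector & 3
    src_rpl = src_selector & 3
    if dest_rpl < src_rpl:
        return (dest_selector & 0xFFFC) | src_rpl, True
    return dest_selector, False
```

Here src would typically be the caller's CS selector, so a pointer
selector 0008h passed by code running with CS=001Bh becomes 000Bh and
ZF is set.
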
It is possible to check what access one has to a segment using opcodes
like VERR, VERW, LAR and LSL. They all set ZF if access is allowed and
clear it if not. The first two simply verify read or write access, LAR
gets the bits defining the access rights of a segment, and LSL gives
the segment limit value. These opcodes allow checking what would cause
an access violation instead of getting the error.

Some instructions are allowed at CPL=0 only. They are:
Clear Task-Switched Flag (CLTS), Halt Processor (HLT), loading some
system registers (GDTR,IDTR,LDTR,MSW,TR), any access to CRx,DRx,TRx.
Some others require CPL <= IOPL. They are: IN, INS, OUT, OUTS, CLI,
STI.
Also, POPF behavior depends on CPL: if CPL>0, IOPL isn't changed by
POPF, and if CPL>IOPL, IF (interrupt enable) isn't changed. VM is never
changed by POPF.


Interrupts.

In every mode, there is an array containing information about what
action is to be taken in case of an interrupt. Its first entry
corresponds to INT 0, the next to INT 1, and so on. It is called the
IDT (Interrupt Descriptor Table).
In RM, each entry in the IDT is simply the far address of an interrupt
service routine. Initially the IDT is located at address 0 and has 100h
entries (400h bytes; some CPUs have its limit set to 0FFFFh, but the
remainder isn't accessible in RM); on pre-80286 CPUs the IDT address
and size cannot be changed, on 286+ they can be loaded and stored using
the LIDT and SIDT opcodes.

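A small sketch of the RM lookup, assuming `memory` is a bytes-like
image of physical memory (a made-up stand-in for the example) and that
each 4-byte entry stores the offset word first and the segment word
second, as far addresses are stored on x86.

```python
def rm_vector(memory, int_no, idt_base=0):
    """Return (segment, offset) of the RM handler for INT int_no."""
    entry = idt_base + 4 * int_no                 # 4 bytes per entry
    offset = int.from_bytes(memory[entry:entry + 2], "little")
    segment = int.from_bytes(memory[entry + 2:entry + 4], "little")
    return segment, offset
```

So the handler address for INT 21h is read from bytes 84h..87h when the
IDT is at its initial address 0.
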
In PM the IDT has 8-byte entries, which can be interrupt, trap or task
gates. A trap gate differs from an interrupt gate by leaving the
interrupt flag the same as in the interrupted code. A task gate causes
another task to be called. They all have DPLs, and an interrupt
instruction causes a General Protection error if CPL > DPL of the
interrupt or trap gate. However, other interrupt sources have "CPL 0" -
they can access any gate needed.

Some conditions can cause an Exception. They are (for the 80386):

 0      divide error
 1      debug exceptions
 2      non-maskable interrupt
 3      breakpoint
 4      overflow (on INTO opcode)
 5      bounds check (on BOUND opcode)
 6      invalid opcode
 7      coprocessor not available
 8  E   double fault
 9  P   coprocessor segment overrun
10  PE  invalid TSS
11  PE  segment not present
12  E   stack error
13  E   general protection error
14  PE  page fault
16      coprocessor error

Those marked with P can occur in PM and VM86 only; those marked with E
push an error code on the stack if they occur in PM or VM86 (so the
stack is: error code, IP, CS, flags). The error code is usually either
0 or the selector causing the exception (in case the selector is
invalid or non-accessible), with flags in its low order bits: bit 0
means an external source, bit 1 an IDT selector, bit 2 an LDT selector.
For a page fault it is a set of flags (bits 3-31 undefined): bit 0 is
set if it was a page protection violation, bit 1 if writing, bit 2 if
in user mode. Most exceptions push the IP of the opcode causing them,
except 3, 4 and 9, which push the IP of the next opcode.
Note: an interrupt cannot be serviced at PL>CPL (unless via a task
switch); an attempt to do so causes a General Protection error.

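The two error code layouts can be decoded as follows. This is a sketch
only; the function names and dictionary keys are invented for the
example, and "index" is simply bits 15-3 of the code (the descriptor
index, or the vector number when the IDT bit is set).

```python
def decode_selector_error(code):
    """Decode the error code pushed by exceptions 8 and 10-13."""
    return {
        "external": bool(code & 1),   # bit 0: external source
        "idt": bool(code & 2),        # bit 1: refers to the IDT
        "ldt": bool(code & 4),        # bit 2: refers to the LDT
        "index": (code & 0xFFFF) >> 3,
    }


def decode_page_fault_error(code):
    """Decode the error code pushed by a page fault (exception 14)."""
    return {
        "protection_violation": bool(code & 1),  # else page not present
        "write": bool(code & 2),                 # else read
        "user_mode": bool(code & 4),             # else system
    }
```

For example, a page fault error code of 6 means a write from user mode
to a page that is not present.
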
Interrupt processing in PM is more complicated when the interrupt
handler has a Privilege Level other than that of the current code. It
is handled similarly to a CALL via a gate: the stack is switched, the
new SS:SP are taken from the TSS, the old SS:SP are pushed on the new
stack, then flags, CS, IP and possibly an error code (for some
exceptions) are pushed.
In VM86 an interrupt pushes GS,FS,DS,ES,SS,ESP,EFLAGS,CS,EIP (an
exception also pushes an error code) onto the PL 0 stack. The VM bit in
the pushed EFLAGS is set to tell that the interrupt occurred in VM86.
Note the IDT must then contain only task gates and 80386 trap or
interrupt gates pointing to a non-conforming code segment with DPL=0 -
interrupt service must go through PL 0 or a task switch.
VM86 itself has CPL 3 and is allowed in a 386 task only.


Descriptor Tables (PM only).

The Global Descriptor Table (GDT) can contain descriptors of any type
except interrupt and trap gates. It is necessary for PM. The first
entry in the GDT isn't used - it corresponds to the null selector,
which can be loaded into a segment register but causes an exception if
used for memory addressing.

A Local Descriptor Table (LDT) can contain only "normal" segment
descriptors (not e.g. TSS) and call or task gates. Usually every task
has its own LDT (changed on task switch). The LDT must have a
descriptor in the GDT.

The Interrupt Descriptor Table (IDT) was discussed in the "Interrupts"
section.

"Normal" segment descriptors are referenced when a segment register is
loaded; they describe a memory area and give some access to it.
Bit 2 of the selector used selects the table: 0 means GDT, 1 means LDT.
Other descriptors can be a Task State Segment (TSS) or gates. They can
be referenced "as a code segment", e.g. by a far jump or call, and they
cause control to be transferred to the task or code segment they refer
to. It is a kind of indirect jump or call (they contain the target
selector). A TSS, or a gate pointing to a TSS, causes a task switch. A
gate can be used to transfer control to more privileged code not
accessible directly.
A TSS can also be referenced by the LTR (Load Task Register) opcode,
which is done once during PM initialization. An LDT descriptor can be
loaded into LDTR (a register) by the LLDT opcode, and usually this is
done once.


Segment and System Descriptors.

The following segment types (in byte [descriptor+5]) are supported
(for all of them, bit 7 means present in memory and bits 5-6 keep the
DPL, which says what is the maximum CPL that can access the descriptor;
the restriction applies to all descriptors, not to segments only,
except for conforming segments):

10h+flags - data: bit 1 - writable, bit 2 - expand down
18h+flags - code: bit 1 - readable, bit 2 - conforming

For both, bit 0 is set by any access. The descriptor also contains the
limit in word [0] (in 386 segments extended by bits 0-3 of byte [6])
and the base in bytes [2..4] (in 386 segments extended by byte [7]).
Byte [6] keeps a few additional flags: bit 7 - granularity (the limit
is in 4kB pages; e.g. limit 0 means 0..0FFFh is accessible), bit 6 -
32-bit addressing (applies to code and stack - use EIP, ESP; makes the
upper limit of an expand down segment 4GB), bit 5 must be 0, bit 4 is
for the programmer.

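The byte layout just described can be exercised with a short sketch.
The function names pack_descriptor and effective_limit are invented for
the example; `access` is byte [5] and `flags` are the upper four bits
of byte [6].

```python
def pack_descriptor(base, limit, access, flags=0):
    """Build the 8 bytes of a code or data segment descriptor."""
    byte6 = (flags & 0xF0) | ((limit >> 16) & 0x0F)  # flags + limit 19-16
    return (
        (limit & 0xFFFF).to_bytes(2, "little")       # word [0]: limit 15-0
        + (base & 0xFFFFFF).to_bytes(3, "little")    # bytes [2..4]: base 23-0
        + bytes([access & 0xFF, byte6, (base >> 24) & 0xFF])
    )


def effective_limit(descriptor):
    """Return the last valid offset of an expand up segment, in bytes."""
    limit = int.from_bytes(descriptor[0:2], "little")
    limit |= (descriptor[6] & 0x0F) << 16
    if descriptor[6] & 0x80:          # granularity: limit is in 4kB pages
        limit = (limit << 12) | 0xFFF
    return limit
```

For example, pack_descriptor(0, 0xFFFFF, 0x9A, 0xC0) describes a
present, readable, 32-bit code segment with DPL 0 that covers the whole
4GB space; its bytes are FF FF 00 00 00 9A CF 00.
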
01h+flags - TSS: bit 1 - busy, bit 3 - 386 TSS
02h - LDT
04h+flags - call gate
05h - task gate
06h+flags - interrupt gate: bit 0 - trap, bit 3 - 386.

For all gates, word [2] keeps the selector, and words [0] and [6] keep
the offset of the called code (ignored for a task gate); byte [4] keeps
the word count (0-31) for copying in case of an inter-level call (call
gate only, else ignored). TSS and LDT descriptors have base and limit
in the same form as code and data segments have; they can have bit 7
set in byte [6] to specify the limit in pages.
Word [6] should be 0 for the descriptor to mean the same on 286/386.

An LDT is similar to the GDT, except that not all descriptor types are
allowed.
A TSS holds the entire task state (all registers: general, segment,
flags, ip, ldtr); it also keeps a link to the caller's TSS (valid if
the task was activated by INT or CALL) and stacks (SS and [E]SP) for PL
0,1,2 (they are used when more privileged code is invoked via a gate
from less privileged code). A 386 TSS also has a debug trap bit (if
set, it causes INT 1 on a task switch to the TSS), an I/O bit map
(saying which I/O addresses can be accessed when CPL>IOPL without a
General Protection exception), and the CR3 value for the task (so
memory can be remapped on a task switch).

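A sketch of how the I/O bit map is consulted, using details the text
above does not spell out: the map has one bit per port, a clear bit
permits access, every port touched by a multi-byte access must be
permitted, and ports beyond the end of the map are denied. The function
name and the `bitmap` byte array are made up for the example.

```python
def io_access_allowed(bitmap, port, width=1):
    """True if all `width` ports starting at `port` have their bit clear."""
    for p in range(port, port + width):
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(bitmap):
            return False              # beyond the map: access denied
        if bitmap[byte_index] & (1 << bit):
            return False              # bit set: access denied
    return True
```

A map of all ones with only the bit for port 60h cleared therefore
allows a byte access to port 60h but not a word access, which would
also touch port 61h.
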

Page tables:

Both page directory and page table entries keep the referenced address
in bits 31-12, have bits 11-9 reserved for the programmer, and must
have bits 8,7,4,3 set to 0. Bit 5 is called A (accessed): it is set by
the CPU on access to the entry. Bit 6 is called D (dirty): it is set if
the referenced memory is written. Bit 0 is called P (present): all the
others are ignored if it is not set. Bit 2 allows user (CPL=3) access
if set, and bit 1 allows the user to write (only together with bit 2);
for CPL 0-2, read and write are allowed for any setting of bits 1 and 2
(there is no protection against the system this way).
Note the page table entries in use are usually cached by the CPU:
modifying them in memory may cause no mapping change until the cache is
reloaded. The cache is flushed every time CR3 (which points to the
first page directory entry) is loaded. Bits 0-11 of CR3 must be 0 (the
directory is page-aligned).
Addressing through page tables: CR3+((Linear_Address SHR 20) AND 0FFCh)
is the address in the Page Directory, and the entry at that address
contains the Page Table address; Page Table address+((Linear_Address
SHR 10) AND 0FFCh) is the address in the Page Table, and the entry at
that address contains the base address of the page; combine it with
bits 11-0 of Linear_Address and the result is the Physical Address. In
case of any error, CR2 is set to the Linear Address causing the error
and the error code explains what the error was.
Note: if Paging is enabled, CR3 and the page directory and page table
entries must keep Physical Addresses, and all other addresses (e.g.
descriptor table bases) are Linear Addresses.

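The address computation above can be sketched as a table walk over a
byte array standing in for physical memory. The names translate,
read_dword and PAGE_PRESENT are invented for the example, and the
sketch checks only the P bit, not the user and write bits.

```python
PAGE_PRESENT = 0x001


def read_dword(memory, address):
    """Read a little-endian 32-bit value from the memory image."""
    return int.from_bytes(memory[address:address + 4], "little")


def translate(memory, cr3, linear):
    """Walk the two-level tables; return the physical address or None."""
    pde = read_dword(memory, cr3 + ((linear >> 20) & 0xFFC))
    if not pde & PAGE_PRESENT:
        return None                   # page table not present
    table = pde & 0xFFFFF000
    pte = read_dword(memory, table + ((linear >> 10) & 0xFFC))
    if not pte & PAGE_PRESENT:
        return None                   # page not present
    return (pte & 0xFFFFF000) | (linear & 0xFFF)
```

With a page directory at 1000h whose entry 1 points to a page table at
2000h, and entry 3 of that table pointing to a page at 3000h, linear
address 00403ABCh translates to physical address 3ABCh.
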

Switching to Protected Mode or back to Real Mode:

First: to get control back in case of a crash, store in dword [0467h]
the address where control is to be passed, and put 0Ah in CMOS register
0Fh (by CLI; MOV AL,8Fh; OUT 70h,AL; (1us delay) MOV AL,0Ah; OUT
71h,AL;).
Also: normally, some circuitry in PC compatibles disables address line
A20; it must be enabled. If you use HIMEM, it can be enabled by a
request to HIMEM. If you also have DOS=HIGH, it is usually enabled
already, as it is enabled by any DOS call. In other cases, you must
send an output port value to the keyboard controller to enable it
before switching to PM.

Switch to PM: loading GDTR is required; then protection can be enabled
by setting bit 0 of CR0/MSW (MOV EAX,CR0; OR AL,1; MOV CR0,EAX; or SMSW
AX; OR AL,1; LMSW AX; the first on 386+, the second on 286+). It is
recommended to load IDTR immediately before or after the mode switch
(the same IDT can't be valid in both modes). Immediately after the mode
change a JMP should be executed to flush the prefetch queue, which may
be partially decoded (the decoding may be mode dependent). CS and SS
need to be loaded - they contain invalid selectors, and e.g. an
interrupt causes them to be put on the stack and a crash on IRET. It is
also recommended to load all the other segment registers (they can be
loaded with 0 to contain an invalid selector and cause an exception if
any of them is used to address memory) and LDTR. Before the first task
switch TR must be loaded (with the selector of a valid free TSS
descriptor; the TSS will be used to store the state on the task
switch).

There is also a BIOS call which switches to PM and changes the external
interrupt vector mapping (normally the 1st controller has 08h..0Fh and
the 2nd 70h..77h; the 1st conflicts with some CPU exceptions, although
it is easy to distinguish an external interrupt from an exception); it
also enables address line A20. See the INT 15h, AH=89h description.

Returning to RM: it can be done by clearing bit 0 in CR0, but it needs
some preparation: disable paging (go to code/stack which has linear
addresses the same as physical, clear the PG bit in CR0, clear CR3), go
to a code segment with limit=64k, and load all segment registers except
CS with a valid descriptor of a 64kB read/write expand-up
byte-granular present segment (attribute byte=93h, extended
attribute=0) - otherwise you can get RM with e.g. a read-only or 32kB
ES, which will soon cause a crash. After clearing bit 0 of CR0, execute
a far jump to load CS and flush the prefetch queue, and load the
segment registers for RM.

This is not available on the 80286, which has no CR0 register (the
Protect Enable bit cannot be cleared by LMSW). The only way to get to
RM again is to reset the CPU. It can be done by the following code:
CLI; XOR CX,CX; wait_kbd_ctrlr_input_empty: IN AL,64h; TEST AL,2; LOOPNZ
wait_kbd_ctrlr_input_empty; MOV AL,0FEh; OUT 64h,AL; HLT; or by a CPU
shutdown (which results from an exception while servicing a double
fault).

Note most programs running the system in VM86 provide an interface to
switch to PM and back to VM86; it is called VCPI (Virtual Control
Program Interface), and it can be tested for presence and invoked by
INT 67h,AH=0DEh. It requires 3 entries in the GDT to be reserved for
the VCPI provider.