<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html> <head> <title>A Memory Allocator</title> </head>
<body bgcolor="#ffffee" vlink="#0000aa" link="#cc0000">
<h1>A Memory Allocator</h1>
<p> by <a href="http://g.oswego.edu">Doug Lea</a>
<p>
[A German adaptation and translation of this article appears
in <b>unix/mail</b> December, 1996.]
<h2>Introduction</h2>
<p>
Memory allocators form interesting case studies in the engineering
of infrastructure software. I started writing one in 1987, and
have maintained and evolved it (with the help of many volunteer
contributors) ever since. This allocator provides implementations
of the standard C routines <code>malloc()</code>,
<code>free()</code>, and <code>realloc()</code>, as well as a few
auxiliary utility routines. The allocator has never been given a
specific name. Most people just call it <em>Doug Lea's
Malloc</em>, or <em>dlmalloc</em> for short.
<p>
The code for this allocator
has been placed in the public domain (available from
<a href="ftp://g.oswego.edu/pub/misc/malloc.c">
ftp://g.oswego.edu/pub/misc/malloc.c</a>), and is apparently
widely used: It serves as the default native version of malloc in
some versions of Linux; it is compiled into several commonly
available software packages (overriding the native malloc), and
has been used in various PC environments as well as in embedded
systems, and surely many other places I don't even know about.
<p>
I wrote the first version of the allocator after writing some C++
programs that almost exclusively relied on allocating dynamic
memory. I found that they ran much more slowly and/or with much
more total memory consumption than I expected them to. This was
due to characteristics of the memory allocators on the systems I
was running on (mainly the then-current versions of SunOS and
BSD). To counter this, at first I wrote a number of special-purpose
allocators in C++, normally by overloading <code>operator
new</code> for various classes. Some of these are described in a
paper on C++ allocation techniques that was adapted into the 1989
<em>C++ Report</em> article <a
href="ftp://g.oswego.edu/pub/papers/C++Report89.txt"> <em>Some
storage allocation techniques for container classes</em></a>.
<p>
However, I soon realized that building a special allocator for
each new class that tended to be dynamically allocated and heavily
used was not a good strategy when building the kinds of
general-purpose programming support classes I was writing at the
time. (From 1986 to 1991, I was the primary author of <A
HREF="http://g.oswego.edu/dl/libg++paper/libg++/libg++.html">
libg++ </A>, the GNU C++ library.) A broader solution was needed --
to write an allocator that was good enough under normal C++ and C
loads so that programmers would not be tempted to write
special-purpose allocators except under very special conditions.
<p>
This article presents a description of some of the main design
goals, algorithms, and implementation considerations for this
allocator. More detailed documentation can be found with the code
distribution.
<h2>Goals</h2>
A good memory allocator needs to balance a number of goals:
<dl>
<dt>Maximizing Compatibility
<dd>An allocator should be plug-compatible with others; in particular
it should obey ANSI/POSIX conventions.
<dt> Maximizing Portability
<dd> Reliance on as few system-dependent features (such as system calls)
as possible, while still providing optional support for other useful
features found only on some systems; conformance
to all known system constraints on alignment and addressing rules.
<dt> Minimizing Space
<dd> The allocator should not waste space: It should obtain as little
memory from the system as possible, and should maintain memory in ways
that minimize <em>fragmentation</em> -- ``holes'' in contiguous chunks
of memory that are not used by the program.
<dt> Minimizing Time
<dd> The <code>malloc()</code>, <code>free()</code>, and <code>realloc()</code>
routines should be as fast as possible in the average case.
<dt> Maximizing Tunability
<dd> Optional features and behavior should be controllable by users
either statically (via <code>#define</code> and the like) or
dynamically (via control commands such as <code>mallopt</code>).
<dt> Maximizing Locality
<dd> Allocating chunks of memory that are typically
used together near each other. This helps minimize page and cache misses
during program execution.
<dt> Maximizing Error Detection
<dd> It does not seem possible for a general-purpose allocator to
also serve as a general-purpose memory-error testing tool
such as <em>Purify</em>. However,
allocators should provide some means for detecting corruption due
to overwriting memory, multiple frees, and so on.
<dt>Minimizing Anomalies
<dd>An allocator configured using default settings should perform well
across a wide range of real loads that depend heavily on
dynamic allocation -- windowing toolkits, GUI applications, compilers,
interpreters, development tools, network (packet)-intensive programs,
graphics-intensive packages, web browsers,
string-processing applications, and so on.
</dl>
<p>
Paul Wilson and colleagues have written an excellent survey
paper on allocation techniques that discusses some of these goals
in more detail. See Paul R. Wilson, Mark S. Johnstone, Michael
Neely, and David Boles, ``Dynamic Storage Allocation: A Survey
and Critical Review'' in <em>International Workshop on Memory
Management</em>, September 1995 (also
available via <a href=
"ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps"> ftp</a>).
(Note that the version of my allocator they describe is
<em>not</em> the most current one however.)
<p>
As they discuss,
minimizing space by minimizing wastage (generally due to
fragmentation) must be the primary goal in any allocator.
<p>
For an extreme example, among the fastest possible versions of
<code>malloc()</code> is one that always allocates the next
sequential memory location available on the system, and the
corresponding fastest version of <code>free()</code> is a no-op.
However, such an implementation is hardly ever acceptable: it will
cause a program to run out of memory quickly since it never
reclaims unused space. Wastages seen in some allocators used in
practice can be almost this extreme under some loads. As Wilson
also notes, wastage can be measured monetarily: Considered
globally, poor allocation schemes cost people perhaps even
billions of dollars in memory chips.
<p>
While time-space issues dominate, the set of trade-offs and compromises
is nearly endless. Here are just a few of the many examples:
<ul>
<li> Accommodating worst-case alignment requirements increases
wastage by forcing the allocator to skip over bytes in order
to align chunks.
<li> Most provisions for dynamic tunability (such as setting
a <em>debug</em> mode) can seriously impact time efficiency
by adding levels of indirection and increasing numbers of branches.
<li> Some provisions designed to catch errors limit the range of
applicability. For example, regardless of platform, the
current malloc internally handles allocation size arguments as if they
were signed 32-bit integers, and treats nonpositive arguments
as if they were requests for a size of zero. This is considered
by nearly all users as a feature rather than a bug: A negative
32-bit argument or a huge 64-bit argument is essentially always
a programming mistake. Returning a minimally-sized chunk will
help catch this error.
<li> Accommodating the oddities of other allocators to remain
plug-compatible with them can reduce flexibility and performance.
For the oddest example, some early versions of Unix allocators
allowed programmers to <code>realloc</code>
memory that had already been <code>freed</code>. Until 1993,
I allowed this for the sake of compatibility.
(However, no one at all complained when this ``feature'' was dropped.)
<li> Some (but by no means all) heuristics that improve time and/or
space for small programs cause unacceptably
worse time and/or space characteristics for larger programs that
dominate the load on typical systems these days.
</ul>
<p>
No set of compromises along these lines can be
perfect. However, over the years, the allocator has
evolved to make trade-offs that the majority of users find to
be acceptable. The driving forces that continue to impact the
evolution of this malloc include:
<ol>
<li> Empirical studies of malloc performance by others
(including the above-mentioned paper by Wilson et al, as well
as others that it in turn cites). These papers find that
versions of this malloc increasingly rank as simultaneously
among the most time- and space-efficient memory allocators
available. However, each reveals weaknesses or opportunities
for further improvements.
<li> Changes in target workloads. The nature of the kinds of
programs that are most sensitive to malloc implementations
continually change. For perhaps the primary example, the
memory characteristics of <em>X</em> and other windowing
systems increasingly dominate.
<li> Changes in systems and processors. Implementation details
and fine-tunings that try to make code readily optimizable for
typical processors change across time. Additionally, operating
systems (including Linux and Solaris) have themselves evolved,
for example to make memory mapping an occasionally-wise choice
for system-level allocation.
<li> Suggestions, experience reports, and code from users and
contributors. The code has evolved with the help of
several regular volunteer contributors.
The majority of recent changes were instigated
by people using the version supplied in Linux, and were
implemented in large part by Wolfram Gloger for the Linux
version and then integrated by me.
</ol>
<h2>Algorithms</h2>
The two core elements of the malloc algorithm have remained
unchanged since the earliest versions:
<p>
<dl>
<dt> Boundary Tags
<dd> Chunks of memory carry around with them size information
fields both before and after the chunk. This allows for
two important capabilities:
<ul>
<li> Two bordering unused chunks can be coalesced into
one larger chunk. This minimizes the number of unusable
small chunks.
<li> All chunks can be traversed starting from any known
chunk in either a forward or backward direction.
</ul>
<p>
<img src="malloc1.gif">
<p>
The original versions implemented boundary tags exactly in
this fashion. More recent versions omit trailer
fields on chunks that are in use by the program. This
is itself a minor trade-off: The fields are not ever used
while chunks are active so need not be present. Eliminating them decreases
overhead and wastage. However,
lack of these fields weakens error detection a bit by
making it impossible to check if users mistakenly overwrite
fields that should have known values.
<dt>Binning
<dd> Available chunks are maintained in bins, grouped by size.
There are a surprisingly large number (128) of fixed-width
bins, approximately logarithmically spaced in size. Bins for
sizes less than 512 bytes each hold only exactly one size
(spaced 8 bytes apart, simplifying enforcement of 8-byte alignment).
Searches for available chunks are processed in smallest-first,
<em>best-fit</em> order. As shown by Wilson et al, best-fit
schemes (of various kinds and approximations) tend to produce
the least fragmentation on real loads
compared to other general approaches such as first-fit.
<p>
<img src="malloc2.gif">
<p>
Until the versions released in 1995, chunks were left unsorted
within bins, so that the best-fit strategy was only approximate.
More recent versions instead sort chunks by size within bins, with
ties broken by an oldest-first rule. (This was done after finding that
the minor time investment was worth it to avoid observed bad cases.)
</dl>
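<p>
The boundary-tag layout described above can be sketched in C. This is a
simplified model for illustration only -- the real allocator packs
status bits into its size fields and uses different structure names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified boundary-tag model: each chunk records its total size
   both before (header) and after (trailer) its usable space, so
   neighbors can be found in either direction and bordering free
   chunks can be coalesced.  Illustrative only. */

enum { TAG = sizeof(size_t) };

static void set_chunk(unsigned char *p, size_t size) {
    memcpy(p, &size, TAG);              /* header tag  */
    memcpy(p + size - TAG, &size, TAG); /* trailer tag */
}

static size_t chunk_size(const unsigned char *p) {
    size_t s;
    memcpy(&s, p, TAG);
    return s;
}

/* Forward traversal: the next chunk starts right after this one. */
static unsigned char *next_chunk(unsigned char *p) {
    return p + chunk_size(p);
}

/* Backward traversal: the previous chunk's trailer tag sits just
   below our header, giving its size and hence its start address. */
static unsigned char *prev_chunk(unsigned char *p) {
    size_t prev_size;
    memcpy(&prev_size, p - TAG, TAG);
    return p - prev_size;
}

/* Coalesce two bordering chunks into one larger chunk. */
static void coalesce(unsigned char *p) {
    set_chunk(p, chunk_size(p) + chunk_size(next_chunk(p)));
}
```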
<p>
Thus, the general categorization of this algorithm is
<em>best-first with coalescing</em>: Freed chunks are
coalesced with neighboring ones, and held in bins that are
searched in size order.
<p>
This approach leads to fixed
bookkeeping overhead per chunk. Because both size information
and bin links must be held in each available chunk, the
smallest allocatable chunk is 16 bytes in systems with 32-bit
pointers and 24 bytes in systems with 64-bit pointers. These
minimum sizes are larger than most people would like to see --
they can lead to significant wastage for example in
applications allocating many tiny linked-list nodes. However,
the 16-byte minimum at least is characteristic of
<em>any</em> system requiring 8-byte alignment in which there
is <em>any</em> malloc bookkeeping overhead.
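<p>
The small-bin arithmetic described above reduces to a few shifts and
masks. A hypothetical sketch (the real allocator's macros differ in
detail):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical small-bin indexing following the scheme in the text:
   sizes below 512 bytes map to exact-size bins spaced 8 bytes apart,
   and no chunk is smaller than the 16-byte minimum. */

enum { MIN_CHUNK = 16, MAX_SMALL = 512, ALIGNMENT = 8 };

/* Round a request up to an 8-byte-aligned chunk size. */
static size_t request2size(size_t req) {
    size_t sz = (req + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    return sz < MIN_CHUNK ? MIN_CHUNK : sz;
}

/* Exact-fit bin index for small chunks: one bin per size. */
static int smallbin_index(size_t sz) {
    return (int)(sz >> 3);  /* 8-byte spacing => shift by 3 */
}
```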
<p>
This basic algorithm can be made to be very fast. Even though
it rests upon a search mechanism to find best fits, the use
of indexing techniques, exploitation of special cases, and
careful coding lead to average cases requiring only a few
dozen instructions, depending of course on the machine and the
allocation pattern.
<p>
While coalescing via boundary tags and best-fit via binning
represent the main ideas of the algorithm, further
considerations lead to a number of heuristic
improvements. They include locality preservation, wilderness
preservation, memory mapping, and caching.
<h3>Locality preservation</h3>
Chunks allocated at about the same time by a program tend to have
similar reference patterns and coexistent lifetimes. Maintaining
locality minimizes page faults and cache misses, which can have
a dramatic effect on performance on modern processors.
If locality
were the <em>only</em> goal, an allocator might always allocate
each successive chunk as close to the previous one as possible.
However, this <em>nearest-fit</em> (often approximated by <em>next-fit</em>)
strategy can lead to very bad fragmentation. In the current
version of malloc, a version of next-fit is used only in a
restricted context that maintains locality in those cases where
it conflicts the least with other goals: If a chunk of the
exact desired size is not available, the most recently split-off
space is used (and resplit) if it is big enough; otherwise
best-fit is used. This restricted use eliminates cases where
a perfectly usable existing chunk fails to be allocated; thus
eliminating at least this form of fragmentation. And, because this form
of next-fit is faster than best-fit bin-search, it speeds up
the average <code>malloc</code>.
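<p>
The policy just described -- exact fit first, then the most recently
split-off remainder, then best fit -- can be sketched as a decision
procedure over a toy model. This is hypothetical and greatly
simplified; the real allocator searches linked bins, not arrays:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the restricted next-fit policy described above. */
typedef struct {
    size_t sizes[16];   /* sizes of available free chunks */
    int n;
    int last_remainder; /* index of most recently split-off chunk, or -1 */
} toy_arena;

/* Return the index of the chunk to use for a request of nb bytes:
   1. an exact fit, if one exists;
   2. else the last remainder, if big enough (preserves locality);
   3. else the smallest chunk that fits (best fit); -1 if none. */
static int pick_chunk(const toy_arena *a, size_t nb) {
    int best = -1;
    for (int i = 0; i < a->n; i++) {
        if (a->sizes[i] == nb)
            return i;                       /* exact fit */
        if (a->sizes[i] > nb &&
            (best < 0 || a->sizes[i] < a->sizes[best]))
            best = i;
    }
    if (a->last_remainder >= 0 && a->sizes[a->last_remainder] > nb)
        return a->last_remainder;           /* reuse split-off space */
    return best;                            /* fall back to best fit */
}
```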
<h3>Wilderness Preservation</h3>
The ``wilderness'' (so named by Kiem-Phong Vo) chunk represents
the space bordering the topmost address allocated from the
system. Because it is at the border, it is the only chunk that
can be arbitrarily extended
(via <code>sbrk</code> in Unix) to be bigger than it is (unless
of course <code>sbrk</code> fails because all memory has been
exhausted).
<p>
One way to deal with the wilderness chunk is to
handle it about the same way as any other chunk. (This
technique was used in most versions of this malloc until 1994).
While this simplifies and speeds up implementation, without care
it can lead to some very bad worst-case space characteristics:
Among other problems, if the wilderness chunk is used when
another available chunk exists, you increase the chances that a
later request will cause an otherwise preventable
<code>sbrk</code>.
<p>
A better strategy is currently used: treat the wilderness
chunk as ``bigger'' than all others, since it can be made so
(up to system limitations) and use it as such in a best-first
scan. This results in the wilderness chunk always being used
only if no other chunk exists, further avoiding preventable
fragmentation.
<h3>Memory Mapping</h3>
<p>
In addition to extending general-purpose allocation regions
via <code>sbrk</code>, most versions of Unix support system
calls such as <code>mmap</code> that allocate a separate
non-contiguous region of memory for use by a program. This
provides a second option within <code>malloc</code> for
satisfying a memory request. Requesting and returning a
<code>mmap</code>ed chunk can further reduce downstream
fragmentation, since a released memory map does not create a
``hole'' that would need to be managed. However, because of
built-in limitations and overheads associated with
<code>mmap</code>, it is only worth doing this in very
restricted situations. For example, in all current systems,
mapped regions must be page-aligned. Also, invoking
<code>mmap</code> and <code>mfree</code> is much slower than
carving out an existing chunk of memory. For these reasons,
the current version of malloc relies on <code>mmap</code> only
if (1) the request is greater than a (dynamically adjustable)
threshold size (currently by default 1MB) and (2) the space
requested is not already available in the existing arena so
would have to be obtained via <code>sbrk</code>.
<p>
In part because <code>mmap</code> is not always applicable in most
programs, the current version of malloc also supports
<em>trimming</em> of the main arena, which achieves one of the effects
of memory mapping -- releasing unused space back to the system. When
long-lived programs contain brief peaks where they allocate large
amounts of memory, followed by longer valleys where they have more
modest requirements, system performance as a whole can be improved
by releasing unused parts of the <em>wilderness</em> chunk back to
the system. (In nearly all versions of Unix, <code>sbrk</code> can
be used with negative arguments to achieve this effect.) Releasing
space allows the underlying operating system to cut down on swap
space requirements and reuse memory mapping tables. However, as with
<code>mmap</code>, the call itself can be expensive, so is only attempted
if trailing unused memory exceeds a tunable threshold.
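<p>
Both the <code>mmap</code> and trimming decisions reduce to tests
against tunable thresholds. A sketch with illustrative names; the
trim default shown is an assumption, not quoted from the text:

```c
#include <assert.h>
#include <stddef.h>

static size_t mmap_threshold = 1024 * 1024; /* default 1MB, adjustable */
static size_t trim_threshold = 128 * 1024;  /* illustrative default */

/* mmap only for large requests that cannot be carved from the
   existing arena (and so would otherwise force an sbrk). */
static int should_mmap(size_t request, int fits_in_arena) {
    return request > mmap_threshold && !fits_in_arena;
}

/* Trim (sbrk with a negative argument) only when enough unused space
   trails at the top of the arena to repay the cost of the call. */
static int should_trim(size_t top_chunk_size) {
    return top_chunk_size > trim_threshold;
}
```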
<h3>Caching</h3>
<p>
In the most straightforward version of the basic algorithm,
each freed chunk is immediately coalesced with neighbors to
form the largest possible unused chunk. Similarly, chunks
are created (by splitting larger chunks) only when
explicitly requested.
<p>
Operations to split and to coalesce chunks take time. This time
overhead can sometimes be avoided by using either or both of
two <em>caching</em> strategies:
<dl>
<dt> Deferred Coalescing
<dd> Rather than coalescing freed chunks, leave them at their
current sizes in hopes that another request for the same size
will come along soon. This saves a coalesce, a later split,
and the time it would take to find a non-exactly-matching chunk
to split.
<dt> Preallocation
<dd> Rather than splitting out new chunks one-by one, pre-split
many at once. This is normally faster than doing it one-at-a-time.
</dl>
Because the basic data structures in the allocator permit
coalescing at any time, in any of <code>malloc</code>,
<code>free</code>, or <code>realloc</code>, corresponding caching
heuristics are easy to apply.
<p>
The effectiveness of caching obviously depends on the costs of
splitting, coalescing, and searching relative to the work
needed to track cached chunks. Additionally, effectiveness
less obviously depends on the policy used in deciding when
to cache chunks versus coalesce them.
<p>
Caching can be a good idea in programs that continuously
allocate and release chunks of only a few sizes.
For example, if you write a program that
allocates and frees many tree nodes, you might decide that it is
worth it to cache some nodes, assuming you know of a fast way
to do this. However, without knowledge of the program,
<code>malloc</code> cannot know whether it would be a good
idea to coalesce cached small chunks in order to satisfy a
larger request, or whether that larger request should be taken
from somewhere else. And it is difficult for the allocator to
make more informed guesses about this matter. For example, it
is just as costly for an allocator to determine how much total
contiguous space would be gained by coalescing chunks as it
would be to just coalesce them and then resplit them.
<p>
Previous versions of the allocator used a few
search-ordering heuristics that made adequate guesses about
caching, although with occasionally bad worst-case
results. But across time, these heuristics appear to be
decreasingly effective under real loads. This is probably because
actual programs that rely heavily on malloc increasingly tend
to use a larger variety of chunk sizes. For example, in C++
programs, this probably corresponds to a trend for programs to
use an increasing number of classes. Different classes tend to
have different sizes.
<p>
As a consequence, the current version <em>never</em> caches
chunks. It appears to be more effective to concentrate
efforts on further reducing the costs of handling non-cached
chunks than to rely on policies and heuristics that are of
decreasing utility. However, the issue is still open for further
experimentation.
<h3>Lookasides</h3>
<p>
There remains one kind of caching that is highly desirable in
some applications but not implemented in this allocator --
lookasides for very small chunks. As mentioned above, the
basic algorithm imposes a minimum chunk size that can be
very wasteful for very small requests. For example, a linked
list on a system with 4-byte pointers might allocate nodes
holding only, say, two pointers, requiring only 8 bytes.
Since the minimum chunk size is 16 bytes, user programs
allocating only list nodes suffer 100% overhead.
<p>
Eliminating this problem while still maintaining portable
alignment would require that the allocator not impose
<em>any</em> overhead. Techniques for carrying this out
exist. For example, chunks could be checked to see if they
belong to a larger aggregated space via address
comparisons. However, doing so can impose significant costs;
in fact the cost would be unacceptable in this allocator.
Chunks are not otherwise tracked by address, so unless
arbitrarily limited, checking might lead to random searches
through memory. Additionally, support requires the adoption of
one or more policies controlling whether and how to ever
coalesce small chunks.
<p>
Such issues and limitations lead to one of the very few kinds
of situations in which programmers should routinely write their
own special-purpose memory management routines (for example, by
overloading <code>operator new()</code> in C++). Programs relying
on large but approximately known numbers of very small chunks
may find it profitable to build very simple allocators. For
example, chunks can be allocated out of a fixed array with
an embedded freelist, along with a provision to rely on
<code>malloc</code> as a backup if the array becomes exhausted.
Somewhat more flexibly, these can be based on the C or C++
versions of <em>obstack</em> available with GNU gcc and libg++.
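<p>
Such a special-purpose allocator can be very small. A minimal sketch
of the fixed-array-with-embedded-freelist approach suggested above,
with <code>malloc()</code> as the backup (all names illustrative):

```c
#include <assert.h>
#include <stdlib.h>

/* A tiny node allocator: a fixed array with an embedded free list,
   falling back on malloc() when the array is exhausted. */

typedef struct node { struct node *next; int value; } node;

#define POOL_SIZE 64
static node pool[POOL_SIZE];
static node *free_list = NULL;
static int pool_used = 0;          /* slots handed out from the array */

static node *node_alloc(void) {
    if (free_list) {               /* reuse a previously freed node */
        node *n = free_list;
        free_list = n->next;
        return n;
    }
    if (pool_used < POOL_SIZE)     /* carve the next array slot */
        return &pool[pool_used++];
    return malloc(sizeof(node));   /* backup: general-purpose malloc */
}

static void node_free(node *n) {
    if (n >= pool && n < pool + POOL_SIZE) {
        n->next = free_list;       /* push back onto the embedded list */
        free_list = n;
    } else {
        free(n);                   /* came from malloc() */
    }
}
```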
<hr>
<address><a href="mailto:dl@gee.cs.oswego.edu">Doug Lea</a></address>
<!-- Created: Fri Oct 25 19:07:46 EDT 1996 -->
<!-- hhmts start -->
Last modified: Wed Dec 4 12:20:31 EST
<!-- hhmts end -->
</body>
</html>

eXtended Memory Specification (XMS), ver 2.0
July 19, 1988
Copyright (c) 1988, Microsoft Corporation, Lotus Development
Corporation, Intel Corporation, and AST Research, Inc.
Microsoft Corporation
Box 97017
16011 NE 36th Way
Redmond, WA 98073
LOTUS (r)
INTEL (r)
MICROSOFT (r)
AST (r) Research
This specification was jointly developed by Microsoft Corporation,
Lotus Development Corporation, Intel Corporation, and AST Research,
Inc. Although it has been released into the public domain and is not
confidential or proprietary, the specification is still the copyright
and property of Microsoft Corporation, Lotus Development Corporation,
Intel Corporation, and AST Research, Inc.
Disclaimer of Warranty
MICROSOFT CORPORATION, LOTUS DEVELOPMENT CORPORATION, INTEL
CORPORATION, AND AST RESEARCH, INC., EXCLUDE ANY AND ALL IMPLIED
WARRANTIES, INCLUDING WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. NEITHER MICROSOFT NOR LOTUS NOR INTEL NOR AST
RESEARCH MAKE ANY WARRANTY OR REPRESENTATION, EITHER EXPRESS OR
IMPLIED, WITH RESPECT TO THIS SPECIFICATION, ITS QUALITY,
PERFORMANCE, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
NEITHER MICROSOFT NOR LOTUS NOR INTEL NOR AST RESEARCH SHALL HAVE ANY
LIABILITY FOR SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING
OUT OF OR RESULTING FROM THE USE OR MODIFICATION OF THIS
SPECIFICATION.
This specification uses the following trademarks:
Intel is a registered trademark of Intel Corporation, Microsoft is a
registered trademark of Microsoft Corporation, Lotus is a registered
trademark of Lotus Development Corporation, and AST is a registered
trademark of AST Research, Inc.
Extended Memory Specification
=============================
The purpose of this document is to define the Extended Memory
Specification (XMS) version 2.00 for MS-DOS. XMS allows DOS programs to utilize
additional memory found in Intel's 80286 and 80386 based machines in
a consistent, machine independent manner. With some restrictions, XMS adds
almost 64K to the 640K which DOS programs can access directly. Depending on
available hardware, XMS may provide even more memory to DOS programs. XMS
also provides DOS programs with a standard method of storing data in extended
memory.
DEFINITIONS:
------------
Extended
Memory - Memory in 80286 and 80386 based machines which is located
above the 1MB address boundary.
High Memory
Area (HMA) - The first 64K of extended memory. The High Memory
Area is unique because code can be executed in it while
in real mode. The HMA officially starts at FFFF:10h
and ends at FFFF:FFFFh making it 64K-16 bytes in length.
Upper Memory
Blocks (UMBs)- Blocks of memory available on some 80x86 based machines
which are located between DOS's 640K limit and the
1MB address boundary. The number, size, and location
of these blocks vary widely depending upon the types
of hardware adapter cards installed in the machine.
Extended Memory
Blocks (EMBs)- Blocks of extended memory located above the HMA which
can only be used for data storage.
A20 Line - The 21st address line of 80x86 CPUs. Enabling the A20
line allows access to the HMA.
XMM - An Extended Memory Manager. A DOS device driver which
implements XMS. XMMs are machine specific but allow
programs to use extended memory in a machine-independent
manner.
HIMEM.SYS - The Extended Memory Manager currently being distributed
by Microsoft.
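The HMA's bounds follow from real-mode address arithmetic (linear
address = segment * 16 + offset). A small C check of the figures
above (the helper name is illustrative):

```c
#include <assert.h>

/* Real-mode addresses are segment*16 + offset.  The HMA runs from
   FFFF:0010 through FFFF:FFFF, which starts exactly at the 1MB line
   and spans 64K-16 bytes. */

static unsigned long linear(unsigned seg, unsigned off) {
    return (unsigned long)seg * 16UL + off;
}
```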
Helpful Diagram:
| | Top of Memory
| |
| |
| /\ |
| /||\ |
| || |
| || |
|.......................................................|
| |
| |
| Possible Extended Memory Block |
| |
| |
|.......................................................|
| || |
| || |
| \||/ |
| \/ |
| |
| |
| Other EMBs could exist above 1088K (1MB+64K) |
| |
| |
|-------------------------------------------------------| 1088K
| |
| |
| The High Memory Area |
| |
| |
|=======================================================| 1024K or 1MB
| |
| /\ |
| /||\ |
| || |
| || |
|.......................................................|
| |
| Possible Upper Memory Block |
|.......................................................|
| || |
| || |
| \||/ |
| \/ |
| |
| Other UMBs could exist between 640K and 1MB |
| |
|-------------------------------------------------------| 640K
| |
| |
| |
| Conventional or DOS Memory |
| |
| |
| |
| |
| |
+-------------------------------------------------------+ 0K
DRIVER INSTALLATION:
--------------------
An XMS driver is installed by including a DEVICE= statement in the
machine's CONFIG.SYS file. It must be installed prior to any other
devices or TSRs which use it. An optional parameter after the driver's
name (suggested name "/HMAMIN=") indicates the minimum amount of space in
the HMA a program can use. Programs which use less than the minimum will
not be placed in the HMA. See "Prioritizing HMA Usage" below for more
information. A second optional parameter (suggested name "/NUMHANDLES=")
allows users to specify the maximum number of extended memory blocks which
may be allocated at any time.
NOTE: XMS requires DOS 3.00 or above.
THE PROGRAMMING API:
--------------------
The XMS API Functions are accessed via the XMS driver's Control Function.
The address of the Control Function is determined via INT 2Fh. First, a
program should determine if an XMS driver is installed. Next, it should
retrieve the address of the driver's Control Function. It can then use any
of the available XMS functions. The functions are divided into several
groups:
1. Driver Information Functions (0h)
2. HMA Management Functions (1h-2h)
3. A20 Management Functions (3h-7h)
4. Extended Memory Management Functions (8h-Fh)
5. Upper Memory Management Functions (10h-11h)
DETERMINING IF AN XMS DRIVER IS INSTALLED:
------------------------------------------
The recommended way of determining if an XMS driver is installed is to
set AH=43h and AL=00h and then execute INT 2Fh. If an XMS driver is available,
80h will be returned in AL.
Example:
; Is an XMS driver installed?
mov ax,4300h
int 2Fh
cmp al,80h
jne NoXMSDriver
CALLING THE API FUNCTIONS:
--------------------------
Programs can execute INT 2Fh with AH=43h and AL=10h to obtain the address
of the driver's control function. The address is returned in ES:BX. This
function is called to access all of the XMS functions. It should be called
with AH set to the number of the API function requested. The API function
will put a success code of 0001h or 0000h in AX. If the function succeeded
(AX=0001h), additional information may be passed back in BX and DX. If the
function failed (AX=0000h), an error code may be returned in BL. Valid
error codes have their high bit set. Developers should keep in mind that
some of the XMS API functions may not be implemented by all drivers; such
unimplemented functions will return failure in all cases.
Example:
; Get the address of the driver's control function
mov ax,4310h
int 2Fh
mov word ptr [XMSControl],bx ; XMSControl is a DWORD
mov word ptr [XMSControl+2],es
; Get the XMS driver's version number
mov ah,00h
call [XMSControl] ; Get XMS Version Number
NOTE: Programs should make sure that at least 256 bytes of stack space
is available before calling XMS API functions.
API FUNCTION DESCRIPTIONS:
--------------------------
The following XMS API functions are available:
0h) Get XMS Version Number
1h) Request High Memory Area
2h) Release High Memory Area
3h) Global Enable A20
4h) Global Disable A20
5h) Local Enable A20
6h) Local Disable A20
7h) Query A20
8h) Query Free Extended Memory
9h) Allocate Extended Memory Block
Ah) Free Extended Memory Block
Bh) Move Extended Memory Block
Ch) Lock Extended Memory Block
Dh) Unlock Extended Memory Block
Eh) Get Handle Information
Fh) Reallocate Extended Memory Block
10h) Request Upper Memory Block
11h) Release Upper Memory Block
Each is described below.
Get XMS Version Number (Function 00h):
--------------------------------------
ARGS: AH = 00h
RETS: AX = XMS version number
BX = Driver internal revision number
DX = 0001h if the HMA exists, 0000h otherwise
ERRS: None
This function returns with AX equal to a 16-bit BCD number representing
the revision of the DOS Extended Memory Specification which the driver
implements (e.g. AX=0235h would mean that the driver implemented XMS version
2.35). BX is set equal to the driver's internal revision number mainly for
debugging purposes. DX indicates the existence of the HMA (not its
availability) and is intended mainly for installation programs.
NOTE: This document defines version 2.00 of the specification.
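Decoding the BCD version number returned in AX is simple nibble
arithmetic; a sketch in C (helper names are illustrative):

```c
#include <assert.h>

/* Each nibble of the 16-bit BCD value is one decimal digit:
   AX=0235h decodes to version 2.35. */

static unsigned bcd_major(unsigned ax) {
    return ((ax >> 12) & 0xF) * 10 + ((ax >> 8) & 0xF);
}

static unsigned bcd_minor(unsigned ax) {
    return ((ax >> 4) & 0xF) * 10 + (ax & 0xF);
}
```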
Request High Memory Area (Function 01h):
----------------------------------------
ARGS: AH = 01h
If the caller is a TSR or device driver,
DX = Space needed in the HMA by the caller in bytes
If the caller is an application program,
DX = FFFFh
RETS: AX = 0001h if the HMA is assigned to the caller, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = 90h if the HMA does not exist
BL = 91h if the HMA is already in use
BL = 92h if DX is less than the /HMAMIN= parameter
This function attempts to reserve the 64K-16 byte high memory area for
the caller. If the HMA is currently unused, the caller's size parameter is
compared to the /HMAMIN= parameter on the driver's command line. If the
value passed by the caller is greater than or equal to the amount specified
by the driver's parameter, the request succeeds. This provides the ability
to ensure that programs which use the HMA efficiently have priority over
those which do not.
NOTE: See the sections "Prioritizing HMA Usage" and "High Memory Area
Restrictions" below for more information.
Release High Memory Area (Function 02h):
----------------------------------------
ARGS: AH = 02h
RETS: AX = 0001h if the HMA is successfully released, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = 90h if the HMA does not exist
BL = 93h if the HMA was not allocated
This function releases the high memory area and allows other programs to
use it. Programs which allocate the HMA must release it before exiting.
When the HMA has been released, any code or data stored in it becomes invalid
and should not be accessed.
Global Enable A20 (Function 03h):
---------------------------------
ARGS: AH = 03h
RETS: AX = 0001h if the A20 line is enabled, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = 82h if an A20 error occurs
This function attempts to enable the A20 line. It should only be used
by programs which have control of the HMA. The A20 line should be turned
off via Function 04h (Global Disable A20) before a program releases control
of the system.
NOTE: On many machines, toggling the A20 line is a relatively slow
operation.
Global Disable A20 (Function 04h):
----------------------------------
ARGS: AH = 04h
RETS: AX = 0001h if the A20 line is disabled, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = 82h if an A20 error occurs
BL = 94h if the A20 line is still enabled
This function attempts to disable the A20 line. It should only be used
by programs which have control of the HMA. The A20 line should be disabled
before a program releases control of the system.
NOTE: On many machines, toggling the A20 line is a relatively slow
operation.
Local Enable A20 (Function 05h):
--------------------------------
ARGS: AH = 05h
RETS: AX = 0001h if the A20 line is enabled, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = 82h if an A20 error occurs
This function attempts to enable the A20 line. It should only be used
by programs which need direct access to extended memory. Programs which use
this function should call Function 06h (Local Disable A20) before releasing
control of the system.
NOTE: On many machines, toggling the A20 line is a relatively slow
operation.
Local Disable A20 (Function 06h):
---------------------------------
ARGS: AH = 06h
RETS: AX = 0001h if the function succeeds, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = 82h if an A20 error occurs
BL = 94h if the A20 line is still enabled
This function cancels a previous call to Function 05h (Local Enable
A20). It should only be used by programs which need direct access to
extended memory. Previous calls to Function 05h must be canceled before
releasing control of the system.
NOTE: On many machines, toggling the A20 line is a relatively slow
operation.
Query A20 (Function 07h):
-------------------------
ARGS: AH = 07h
RETS: AX = 0001h if the A20 line is physically enabled, 0000h otherwise
ERRS: BL = 00h if the function succeeds
BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
    This function checks to see if the A20 line is physically enabled.  It
does this in a hardware-independent manner by checking whether "memory wrap"
occurs.
Query Free Extended Memory (Function 08h):
------------------------------------------
ARGS: AH = 08h
RETS: AX = Size of the largest free extended memory block in K-bytes
DX = Total amount of free extended memory in K-bytes
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = A0h if all extended memory is allocated
This function returns the size of the largest available extended memory
block in the system.
NOTE: The 64K HMA is not included in the returned value even if it is
not in use.
Allocate Extended Memory Block (Function 09h):
----------------------------------------------
ARGS: AH = 09h
DX = Amount of extended memory being requested in K-bytes
RETS: AX = 0001h if the block is allocated, 0000h otherwise
DX = 16-bit handle to the allocated block
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = A0h if all available extended memory is allocated
BL = A1h if all available extended memory handles are in use
This function attempts to allocate a block of the given size out of the
pool of free extended memory. If a block is available, it is reserved
for the caller and a 16-bit handle to that block is returned. The handle
should be used in all subsequent extended memory calls. If no memory was
allocated, the returned handle is null.
NOTE: Extended memory handles are scarce resources. Programs should
try to allocate as few as possible at any one time. When all
of a driver's handles are in use, any free extended memory is
unavailable.
Free Extended Memory Block (Function 0Ah):
------------------------------------------
ARGS: AH = 0Ah
DX = Handle to the allocated block which should be freed
RETS: AX = 0001h if the block is successfully freed, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = A2h if the handle is invalid
BL = ABh if the handle is locked
This function frees a block of extended memory which was previously
allocated using Function 09h (Allocate Extended Memory Block). Programs
which allocate extended memory should free their memory blocks before
exiting. When an extended memory buffer is freed, its handle and all data
stored in it become invalid and should not be accessed.
Move Extended Memory Block (Function 0Bh):
------------------------------------------
ARGS: AH = 0Bh
DS:SI = Pointer to an Extended Memory Move Structure (see below)
RETS: AX = 0001h if the move is successful, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = 82h if an A20 error occurs
BL = A3h if the SourceHandle is invalid
BL = A4h if the SourceOffset is invalid
BL = A5h if the DestHandle is invalid
BL = A6h if the DestOffset is invalid
BL = A7h if the Length is invalid
BL = A8h if the move has an invalid overlap
BL = A9h if a parity error occurs
    Extended Memory Move Structure Definition:

        ExtMemMoveStruct    struc
            Length          dd  ?   ; 32-bit number of bytes to transfer
            SourceHandle    dw  ?   ; Handle of source block
            SourceOffset    dd  ?   ; 32-bit offset into source block
            DestHandle      dw  ?   ; Handle of destination block
            DestOffset      dd  ?   ; 32-bit offset into destination block
        ExtMemMoveStruct    ends
    This function attempts to transfer a block of data from one location to
another.  It is primarily intended for moving blocks of data between
conventional memory and extended memory; however, it can also be used for
moving blocks within conventional memory and within extended memory.
NOTE: If SourceHandle is set to 0000h, the SourceOffset is interpreted
as a standard segment:offset pair which refers to memory that is
directly accessible by the processor. The segment:offset pair
is stored in Intel DWORD notation. The same is true for DestHandle
and DestOffset.
SourceHandle and DestHandle do not have to refer to locked memory
blocks.
          Length must be even.  Although not required, WORD-aligned moves
          can be significantly faster on most machines.  DWORD-aligned moves
          can be even faster on 80386 machines.
If the source and destination blocks overlap, only forward moves
(i.e. where the source base is less than the destination base) are
guaranteed to work properly.
Programs should not enable the A20 line before calling this
function. The state of the A20 line is preserved.
This function is guaranteed to provide a reasonable number of
interrupt windows during long transfers.
Lock Extended Memory Block (Function 0Ch):
------------------------------------------
ARGS: AH = 0Ch
DX = Extended memory block handle to lock
RETS: AX = 0001h if the block is locked, 0000h otherwise
DX:BX = 32-bit linear address of the locked block
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = A2h if the handle is invalid
BL = ACh if the block's lock count overflows
BL = ADh if the lock fails
This function locks an extended memory block and returns its base
address as a 32-bit linear address. Locked memory blocks are guaranteed not
to move. The 32-bit pointer is only valid while the block is locked.
Locked blocks should be unlocked as soon as possible.
NOTE: A block does not have to be locked before using Function 0Bh (Move
Extended Memory Block).
"Lock counts" are maintained for EMBs.
Unlock Extended Memory Block (Function 0Dh):
--------------------------------------------
ARGS: AH = 0Dh
DX = Extended memory block handle to unlock
RETS: AX = 0001h if the block is unlocked, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = A2h if the handle is invalid
BL = AAh if the block is not locked
This function unlocks a locked extended memory block. Any 32-bit
pointers into the block become invalid and should no longer be used.
Get EMB Handle Information (Function 0Eh):
------------------------------------------
ARGS: AH = 0Eh
DX = Extended memory block handle
RETS: AX = 0001h if the block's information is found, 0000h otherwise
BH = The block's lock count
BL = Number of free EMB handles in the system
DX = The block's length in K-bytes
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = A2h if the handle is invalid
This function returns additional information about an extended memory
block to the caller.
NOTE: To get the block's base address, use Function 0Ch (Lock Extended
Memory Block).
Reallocate Extended Memory Block (Function 0Fh):
------------------------------------------------
ARGS: AH = 0Fh
BX = New size for the extended memory block in K-bytes
DX = Unlocked extended memory block handle to reallocate
RETS: AX = 0001h if the block is reallocated, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = 81h if a VDISK device is detected
BL = A0h if all available extended memory is allocated
BL = A1h if all available extended memory handles are in use
BL = A2h if the handle is invalid
BL = ABh if the block is locked
This function attempts to reallocate an unlocked extended memory block
so that it becomes the newly specified size. If the new size is smaller
than the old block's size, all data at the upper end of the old block is
lost.
Request Upper Memory Block (Function 10h):
------------------------------------------
ARGS: AH = 10h
DX = Size of requested memory block in paragraphs
RETS: AX = 0001h if the request is granted, 0000h otherwise
BX = Segment number of the upper memory block
If the request is granted,
DX = Actual size of the allocated block in paragraphs
otherwise,
DX = Size of the largest available UMB in paragraphs
ERRS: BL = 80h if the function is not implemented
BL = B0h if a smaller UMB is available
BL = B1h if no UMBs are available
This function attempts to allocate an upper memory block to the caller.
If the function fails, the size of the largest free UMB is returned in DX.
NOTE: By definition UMBs are located below the 1MB address boundary.
The A20 Line does not need to be enabled before accessing an
allocated UMB.
UMBs are paragraph aligned.
To determine the size of the largest available UMB, attempt to
allocate one with a size of FFFFh.
UMBs are unaffected by EMS calls.
Release Upper Memory Block (Function 11h):
------------------------------------------
ARGS: AH = 11h
DX = Segment number of the upper memory block
RETS: AX = 0001h if the block was released, 0000h otherwise
ERRS: BL = 80h if the function is not implemented
BL = B2h if the UMB segment number is invalid
This function frees a previously allocated upper memory block. When an
UMB has been released, any code or data stored in it becomes invalid and
should not be accessed.
PRIORITIZING HMA USAGE:
-----------------------
For DOS users to receive the maximum benefit from the High Memory Area,
programs which use the HMA must store as much of their resident code in it as
is possible. It is very important that developers realize that the HMA is
allocated as a single unit.
For example, a TSR program which grabs the HMA and puts 10K of code into
it may prevent a later TSR from putting 62K into the HMA. Obviously, regular
DOS programs would have more memory available to them below the 640K line if
the 62K TSR was moved into the HMA instead of the 10K one.
The first method for dealing with conflicts such as this is to require
programs which use the HMA to provide a command line option for disabling
this feature. It is crucial that TSRs which do not make full use of the HMA
provide such a switch on their own command line (suggested name "/NOHMA").
The second method for optimizing HMA usage is through the /HMAMIN=
parameter on the XMS device driver line. The number after the parameter
is defined to be the minimum amount of HMA space (in K-bytes) used by any
driver or TSR. For example, if "DEVICE=HIMEM.SYS /HMAMIN=48" is in a
user's CONFIG.SYS file, only programs which request at least 48K would be
allowed to allocate the HMA. This number can be adjusted either by
installation programs or by the user himself. If this parameter is not
specified, the default value of 0 is used causing the HMA to be allocated
on a first come, first served basis.
Note that this problem does not impact application programs. If the HMA
is available when an application program starts, the application is free to
use as much or as little of the HMA as it wants. For this reason,
applications should pass FFFFh in DX when calling Function 01h.
HIGH MEMORY AREA RESTRICTIONS:
------------------------------
- Far pointers to data located in the HMA cannot be passed to DOS. DOS
normalizes any pointer which is passed into it. This will cause data
addresses in the HMA to be invalidated.
- Disk I/O directly into the HMA (via DOS, INT 13h, or otherwise) is not
recommended.
- Programs, especially drivers and TSRs, which use the HMA *MUST* use
  as much of it as possible.  If a driver or TSR is unable to use at
  least 90% of the available HMA (typically ~58K), it must provide
  a command line switch for disabling its HMA usage.  This will allow
  the user to configure his machine for optimum use of the HMA.
- Device drivers and TSRs must not leave the A20 line permanently turned
  on.  Several applications rely on 1MB memory wrap and will overwrite the
  HMA if the A20 line is left enabled, potentially causing a system crash.
- Interrupt vectors must not point into the HMA.  This is a consequence of
  the previous restriction.  Note, however, that interrupt vectors may
  point into any allocated upper memory block.
ERROR CODE INDEX:
-----------------
If AX=0000h when a function returns and the high bit of BL is set,
BL=80h if the function is not implemented
81h if a VDISK device is detected
82h if an A20 error occurs
8Eh if a general driver error occurs
8Fh if an unrecoverable driver error occurs
90h if the HMA does not exist
91h if the HMA is already in use
92h if DX is less than the /HMAMIN= parameter
93h if the HMA is not allocated
94h if the A20 line is still enabled
A0h if all extended memory is allocated
A1h if all available extended memory handles are in use
A2h if the handle is invalid
A3h if the SourceHandle is invalid
A4h if the SourceOffset is invalid
A5h if the DestHandle is invalid
A6h if the DestOffset is invalid
A7h if the Length is invalid
A8h if the move has an invalid overlap
A9h if a parity error occurs
AAh if the block is not locked
ABh if the block is locked
ACh if the block's lock count overflows
ADh if the lock fails
B0h if a smaller UMB is available
B1h if no UMBs are available
B2h if the UMB segment number is invalid
IMPLEMENTATION NOTES FOR DOS XMS DRIVERS:
-----------------------------------------
- A DOS XMS driver's control function must begin with code similar to the
following:
    XMMControl  proc    far
                jmp     short XCControlEntry    ; For "hookability"
                nop                             ; NOTE: The jump must be a short
                nop                             ;   jump to indicate the end of
                nop                             ;   any hook chain.  The nop's
                                                ;   allow a far jump to be
                                                ;   patched in.
    XCControlEntry:
- XMS drivers must preserve all registers except those containing
returned values across any function call.
- XMS drivers are required to hook INT 15h and watch for calls to
functions 87h (Block Move) and 88h (Extended Memory Available). The
INT 15h Block Move function must be hooked so that the state of the A20
line is preserved across the call. The INT 15h Extended Memory
Available function must be hooked to return 0h to protect the HMA.
- In order to maintain compatibility with existing device drivers, DOS XMS
drivers must not hook INT 15h until the first non-Version Number call
to the control function is made.
- XMS drivers are required to check for the presence of drivers which
  use the IBM VDISK allocation scheme.  Note that it is not sufficient to
  check for VDISK users only at installation time; the check must also be
  made when the HMA is first allocated.  If a VDISK user is detected, the
  HMA must not be allocated.  Microsoft will publish a standard method for
  detecting drivers which use the VDISK allocation scheme.
- XMS drivers which have a fixed number of extended memory handles (most
  do) should implement a command line parameter for adjusting that number
  (suggested name "/NUMHANDLES=").
- XMS drivers should make sure that the major DOS version number is
greater than or equal to 3 before installing themselves.
- UMBs cannot occupy memory addresses that can be banked by EMS 4.0.
EMS 4.0 takes precedence over UMBs for physically addressable memory.
- All driver functions must be re-entrant. Care should be taken to not
leave interrupts disabled for long periods of time.
- Allocation of a zero length extended memory buffer is allowed. Programs
which hook XMS drivers may need to reserve a handle for private use via
this method. Programs which hook an XMS driver should pass all requests
for zero length EMBs to the next driver in the chain.
- Drivers should control the A20 line via an "enable count." Local En-
able only enables the A20 line if the count is zero. It then increments
the count. Local Disable only disables A20 if the count is one. It
then decrements the count. Global Enable/Disable keeps a flag which
indicates the state of A20. They use Local Enable/Disable to actually
change the state.
IMPLEMENTATION NOTES FOR HIMEM.SYS:
-----------------------------------
- HIMEM.SYS currently supports true AT-compatibles, 386 AT machines, IBM
PS/2s, AT&T 6300 Plus systems and Hewlett Packard Vectras.
- If HIMEM finds that it cannot properly control the A20 line or if there
is no extended memory available when HIMEM.SYS is invoked, the driver
does not install itself. HIMEM.SYS displays the message "High Memory
Area Unavailable" when this situation occurs.
- If HIMEM finds that the A20 line is already enabled when it is invoked,
it will NOT change the A20 line's state. The assumption is that whoever
enabled it knew what they were doing. HIMEM.SYS displays the message "A20
Line Permanently Enabled" when this situation occurs.
- HIMEM.SYS is incompatible with IBM's VDISK.SYS driver and other drivers
which use the VDISK scheme for allocating extended memory. However,
HIMEM does attempt to detect these drivers and will not allocate the
HMA if one is found.
- HIMEM.SYS supports the optional "/HMAMIN=" parameter. The valid values
are decimal numbers between 0 and 63.
- By default, HIMEM.SYS has 32 extended memory handles available for use.
This number may be adjusted with the "/NUMHANDLES=" parameter. The
maximum value for this parameter is 128 and the minimum is 0. Each
handle currently requires 6 bytes of resident space.
Copyright (c) 1988, Microsoft Corporation
