add directory study

Files added in this commit:

  BIN  study/sabre/os/files/Misc/AddressTranslation.pdf   (new binary file, not shown)
  BIN  study/sabre/os/files/Misc/BSD_VM/fig1.gif          (new binary file, 464 B)
  BIN  study/sabre/os/files/Misc/BSD_VM/fig2.gif          (new binary file, 697 B)
  BIN  study/sabre/os/files/Misc/BSD_VM/fig3.gif          (new binary file, 1008 B)
  BIN  study/sabre/os/files/Misc/BSD_VM/fig4.gif          (new binary file, 942 B)
  870  study/sabre/os/files/Misc/BSD_VM/index.html        (new file, 870 lines)

study/sabre/os/files/Misc/BSD_VM/index.html
@@ -0,0 +1,870 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>OS/RC: Design Elements of the FreeBSD VM System</title>
</head>

<body bgcolor="#ffffff">

<p align=right>Back to the <a href="..">OS/RC</a></p>

<h2>Design Elements of the FreeBSD VM System</h2>
<h3>By Matthew Dillon <a href="mailto:dillon@apollo.backplane.com">dillon@apollo.backplane.com</a></h3>

<p class="Normal">
The title is really just a fancy way of saying that I am going to attempt to
describe the whole VM enchilada, hopefully in a way that everyone can follow.
For the last year I have concentrated on a number of major kernel subsystems
within FreeBSD, with the VM and Swap subsystems being the most interesting
and NFS being 'a necessary chore'.  I rewrote only small portions of the
code.  In the VM arena the only major rewrite I have done is to the swap
subsystem.  Most of my work was cleanup and maintenance, with only moderate
code rewriting and no major algorithmic adjustments within the VM subsystem.
The bulk of the VM subsystem's theoretical base remains unchanged and a lot
of the credit for the modernization effort in the last few years belongs to
John Dyson and David Greenman.  Not being a historian like Kirk I will not
attempt to tag all the various features with people's names, since I will
invariably get it wrong.
</p>
<p class="Normal">
Before moving along to the actual design let's spend a little time on the
necessity of maintaining and modernizing any long-living codebase.  In the
programming world, algorithms tend to be more important than code and it is
precisely due to BSD's academic roots that a great deal of attention was paid
to algorithm design from the beginning.  More attention paid to the design
generally leads to a clean and flexible codebase that can be fairly easily
modified, extended, or replaced over time.  While BSD is considered an 'old'
operating system by some people, those of us who work on it tend to view it
more as a 'mature' codebase which has had various components modified,
extended, or replaced with modern code.  It has evolved, and FreeBSD is at
the bleeding edge no matter how old some of the code might be.  This is an
important distinction to make and one that is unfortunately lost to many
people.  The biggest error a programmer can make is to not learn from
history, and this is precisely the error that many other modern operating
systems have made.  NT is the best example of this, and the consequences
have been dire.  Linux also makes this mistake to some degree -- enough that
we BSD folk can make small jokes about it every once in a while, anyway
(grin).  Linux's problem is simply one of a lack of experience and history
to compare ideas against, a problem that is easily and rapidly being
addressed by the Linux community in the same way it has been addressed in
the BSD community -- by continuous code development.  The NT folk, on the
other hand, repeatedly make the same mistakes solved by UNIX decades ago and
then spend years fixing them.  Over and over again.  They have a severe case
of 'not designed here' and 'we are always right because our marketing
department says so'.  I have little tolerance for anyone who cannot learn
from history.
</p>
<p class="Normal">
Much of the apparent complexity of the FreeBSD design, especially in the
VM/Swap subsystem, is a direct result of having to solve serious performance
issues that occur under various conditions.  These issues are not due to bad
algorithmic design but instead arise from environmental factors.  In any
direct comparison between platforms, these issues become most apparent when
system resources begin to get stressed.  As I describe FreeBSD's VM/Swap
subsystem the reader should always keep two points in mind.  First, the most
important aspect of performance design is what is known as "Optimizing the
Critical Path".  It is often the case that performance optimizations add a
little bloat to the code in order to make the critical path perform better.
Second, a solid, generalized design outperforms a heavily-optimized design
over the long run.  While a generalized design may end up being slower than
a heavily-optimized design when they are first implemented, the generalized
design tends to be easier to adapt to changing conditions and the
heavily-optimized design winds up having to be thrown away.  Any codebase
that will survive and be maintainable for years must therefore be designed
properly from the beginning even if it costs some performance.  Twenty years
ago people were still arguing that programming in assembly was better than
programming in a high-level language because it produced code that was ten
times as fast.  Today, the fallibility of that argument is obvious -- as are
the parallels to algorithmic design and code generalization.
</p>
<p class="Normal">
<strong>VM Objects</strong>
</p>

<p class="Normal">
The best way to begin describing the FreeBSD VM system is to look at it from
the perspective of a user-level process.  Each user process sees a single,
private, contiguous VM address space containing several types of memory
objects.  These objects have various characteristics.  Program code and
program data are effectively a single memory-mapped file (the binary file
being run), but program code is read-only while program data is
copy-on-write.  Program BSS is just memory allocated and filled with zeros
on demand, called demand zero page fill.  Arbitrary files can be
memory-mapped into the address space as well, which is how the shared
library mechanism works.  Such mappings can require modifications to remain
private to the process making them.  The fork system call adds an entirely
new dimension to the VM management problem on top of the complexity already
given.
</p>
<p class="Normal">
A program binary data page (which is a basic copy-on-write page) illustrates
the complexity.  A program binary contains a preinitialized data section
which is initially mapped directly from the program file.  When a program is
loaded into a process's VM space, this area is initially memory-mapped and
backed by the program binary itself, allowing the VM system to free/reuse
the page and later load it back in from the binary.  The moment a process
modifies this data, however, the VM system must make a private copy of the
page for that process.  Since the private copy has been modified, the VM
system may no longer free it, because there is no longer any way to restore
it later on.
</p>
<p class="Normal">
You will notice immediately that what was originally a simple file mapping
has become much more complex.  Data may be modified on a page-by-page basis
whereas the file mapping encompasses many pages at once.  The complexity
further increases when a process forks.  When a process forks, the result is
two processes -- each with its own private address space, including any
modifications made by the original process prior to the call to fork().  It
would be silly for the VM system to make a complete copy of the data at the
time of the fork() because it is quite possible that at least one of the two
processes will only need to read from that page from then on, allowing the
original page to continue to be used.  What was a private page is made
copy-on-write again, since each process (parent and child) expects its own
personal post-fork modifications to remain private to itself and not affect
the other.
</p>
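<p class="Normal">
As a purely illustrative aside (not taken from the original article), the
copy-on-write behavior described above is easy to observe from user space:
after fork(), parent and child write to the same initialized data page, yet
each sees only its own modification, because the first write forces a
private copy of the page.
</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/wait.h&gt;
#include &lt;unistd.h&gt;

/* A page of initialized program data -- mapped copy-on-write after fork(). */
static char shared_data[4096] = "original";

int
main(void)
{
        pid_t pid = fork();

        if (pid == -1) {
                perror("fork");
                return 1;
        }
        if (pid == 0) {
                /* Child: the first write forces a private copy of the page. */
                strcpy(shared_data, "child's copy");
                printf("child  sees: %s\n", shared_data);
                _exit(0);
        }
        waitpid(pid, NULL, 0);
        /* Parent still sees its own, unmodified page. */
        printf("parent sees: %s\n", shared_data);
        return 0;
}
</pre>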
<p class="Normal">
FreeBSD manages all of this with a layered VM Object model.  The original
binary program file winds up being the lowest VM Object layer.  A
copy-on-write layer is pushed on top of that to hold those pages which had
to be copied from the original file.  If the program modifies a data page
belonging to the original file the VM system takes a fault and makes a copy
of the page in the higher layer.  When a process forks, additional VM Object
layers are pushed on.
</p>

<p class="Normal">
This might make a little more sense with a fairly basic example.  A fork()
is a common operation for any *BSD system, so this example will consider a
program that starts up, and forks.  When the process starts, the VM system
creates an object layer, let's call this A:
</p>

<center><img src="fig1.gif"></center>

<p class="Normal">
A represents the file--pages may be paged in and out of the file's physical
media as necessary.  Paging in from the disk is reasonable for a program,
but we really don't want to page back out and overwrite the executable.  The
VM system therefore creates a second layer, B, that will be physically
backed by swap space:
</p>

<center><img src="fig2.gif"></center>

<p class="Normal">
On the first write to a page after this, a new page is created in B, and its
contents are initialized from A.  All pages in B can be paged in or out to a
swap device.  When the program forks, the VM system creates two new object
layers--C1 for the parent, and C2 for the child--that rest on top of B:
</p>

<center><img src="fig3.gif"></center>

<p class="Normal">
In this case, let's say a page in B is modified by the original parent
process.  The process will take a copy-on-write fault and duplicate the page
in C1, leaving the original page in B untouched.  Now, let's say the same
page in B is modified by the child process.  The process will take a
copy-on-write fault and duplicate the page in C2.  The original page in B is
now completely hidden since both C1 and C2 have a copy and B could
theoretically be destroyed (if it does not represent a 'real' file).
However, this sort of optimization is not trivial to make because it is so
fine-grained.  FreeBSD does not make this optimization.
</p>

<p class="Normal">
Now, suppose (as is often the case) that the child process does an exec().
Its current address space is usually replaced by a new address space
representing a new file.  In this case, the C2 layer is destroyed:
</p>

<center><img src="fig4.gif"></center>

<p class="Normal">
In this case, the number of children of B drops to one, and all accesses to
B now go through C1.  This means that B and C1 can be collapsed together.
Any pages in B that also exist in C1 are deleted from B during the collapse.
Thus, even though the optimization in the previous step could not be made,
we can recover the dead pages when either of the processes exits or calls
exec().
</p>
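<p class="Normal">
To make the layering a little more tangible, here is a rough sketch
(invented names, not the actual FreeBSD vm_object code) of how a fault
handler might search such a chain of layers, starting at the top object and
falling back to the layers beneath it:
</p>
<pre>
#include &lt;stddef.h&gt;

/* Simplified stand-ins for the real kernel structures. */
struct page {
        unsigned long index;            /* page index within its object */
        struct page  *next;             /* next page owned by this layer */
};

struct object {
        struct page   *pages;           /* pages this layer actually owns */
        struct object *backing;         /* lower layer, e.g. C1 -> B -> A */
};

/* Find the page backing 'index', searching the top layer first. */
static struct page *
object_lookup(struct object *top, unsigned long index)
{
        for (struct object *obj = top; obj != NULL; obj = obj->backing)
                for (struct page *p = obj->pages; p != NULL; p = p->next)
                        if (p->index == index)
                                return p;
        return NULL;    /* not resident: page it in from the file or swap */
}
</pre>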
<p class="Normal">
This model creates a number of potential problems.  The first is that you
can wind up with a relatively deep stack of layered VM Objects which can
cost scanning time and memory when you take a fault.  Deep layering can
occur when processes fork and then fork again (either parent or child).  The
second problem is that you can wind up with dead, inaccessible pages deep in
the stack of VM Objects.  In our last example if both the parent and child
processes modify the same page, they both get their own private copies of
the page and the original page in B is no longer accessible by anyone.  That
page in B can be freed.
</p>
<p class="Normal">
FreeBSD solves the deep layering problem with a special optimization called
the "All Shadowed Case".  This case occurs if either C1 or C2 take
sufficient COW faults to completely shadow all pages in B.  Let's say that
C1 achieves this.  C1 can now bypass B entirely, so rather than have
C1->B->A and C2->B->A we now have C1->A and C2->B->A.  But look what also
happened -- now B has only one reference (C2), so we can collapse B and C2
together.  The end result is that B is deleted entirely and we have C1->A
and C2->A.  It is often the case that B will contain a large number of pages
and neither C1 nor C2 will be able to completely overshadow it.  If we fork
again and create a set of D layers, however, it is much more likely that one
of the D layers will eventually be able to completely overshadow the much
smaller dataset represented by C1 or C2.  The same optimization will work at
any point in the graph and the grand result of this is that even on a
heavily forked machine VM Object stacks tend not to get much deeper than 4.
This is true of both the parent and the children and true whether the parent
is doing the forking or whether the children cascade forks.
</p>
<p class="Normal">
The dead page problem still exists in the case where C1 or C2 do not
completely overshadow B.  Due to our other optimizations this case does not
represent much of a problem and we simply allow the pages to be dead.  If
the system runs low on memory it will swap them out, eating a little swap,
but that's it.
</p>
<p class="Normal">
The advantage to the VM Object model is that fork() is extremely fast, since
no real data copying need take place.  The disadvantage is that you can
build a relatively complex VM Object layering that slows page fault handling
down a little, and you spend memory managing the VM Object structures.  The
optimizations FreeBSD makes prove to reduce the problems enough that they
can be ignored, leaving no real disadvantage.
</p>
<p class="Normal">
<strong>SWAP Layers</strong>
</p>

<p class="Normal">
Private data pages are initially either copy-on-write or zero-fill pages.
When a change, and therefore a copy, is made, the original backing object
(usually a file) can no longer be used to save a copy of the page when the
VM system needs to reuse it for other purposes.  This is where SWAP comes
in.  SWAP is allocated to create backing store for memory that does not
otherwise have it.  FreeBSD allocates the swap management structure for a VM
Object only when it is actually needed.  However, the swap management
structure has had problems historically.
</p>
<p class="Normal">
Under FreeBSD 3.x the swap management structure preallocates an array that
encompasses the entire object requiring swap backing store -- even if only a
few pages of that object are swap-backed.  This creates a kernel memory
fragmentation problem when large objects are mapped, or processes with large
runsizes (RSS) fork.  Also, in order to keep track of swap space, a 'list of
holes' is kept in kernel memory, and this tends to get severely fragmented
as well.  Since the 'list of holes' is a linear list, the swap allocation
and freeing performance is a non-optimal O(n)-per-page.  It also requires
kernel memory allocations to take place during the swap freeing process, and
that creates low memory deadlock problems.  The problem is further
exacerbated by holes created due to the interleaving algorithm.  Also, the
swap block map can become fragmented fairly easily, resulting in
non-contiguous allocations.  Kernel memory must also be allocated on the fly
for additional swap management structures when a swapout occurs.  It is
evident that there was plenty of room for improvement.
</p>
<p class="Normal">
For FreeBSD 4.x, I completely rewrote the swap subsystem.  With this
rewrite, swap management structures are allocated through a hash table
rather than a linear array, giving them a fixed allocation size and much
finer granularity.  Rather than using a linearly linked list to keep track
of swap space reservations, it now uses a bitmap of swap blocks arranged in
a radix tree structure with free-space hinting in the radix node structures.
This effectively makes swap allocation and freeing an O(1) operation.  The
entire radix tree bitmap is also preallocated in order to avoid having to
allocate kernel memory during critical low memory swapping operations.
After all, the system tends to swap when it is low on memory so we should
avoid allocating kernel memory at such times in order to avoid potential
deadlocks.  Finally, to reduce fragmentation the radix tree is capable of
allocating large contiguous chunks at once, skipping over smaller fragmented
chunks.  I did not take the final step of having an 'allocating hint
pointer' that would trundle through a portion of swap as allocations were
made in order to further guarantee contiguous allocations or at least
locality of reference, but I ensured that such an addition could be made.
</p>
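<p class="Normal">
As a hedged illustration of the free-space-hinting idea (a toy two-level
version with invented names, not the real radix tree code), the hints let
the allocator skip exhausted regions without scanning their bitmaps:
</p>
<pre>
#include &lt;stdint.h&gt;

/*
 * Toy two-level version of the idea: leaf bitmaps of swap blocks plus a
 * per-leaf free count used as a hint, so exhausted leaves are skipped
 * without scanning their bits.  (The real 4.x code keeps such hints in
 * every node of a deeper radix tree.)
 */
#define LEAVES          1024
#define BLOCKS_PER_LEAF 32

static uint32_t leaf_bitmap[LEAVES];    /* bit set = swap block free */
static int      leaf_free[LEAVES];      /* hint: free blocks in this leaf */

static void
swap_init(void)                         /* mark all swap blocks free */
{
        for (int i = 0; i &lt; LEAVES; i++) {
                leaf_bitmap[i] = 0xffffffffu;
                leaf_free[i] = BLOCKS_PER_LEAF;
        }
}

static int
swap_alloc(void)                        /* returns a block number, or -1 */
{
        for (int i = 0; i &lt; LEAVES; i++) {
                if (leaf_free[i] == 0)
                        continue;       /* hint says: nothing free here */
                for (int b = 0; b &lt; BLOCKS_PER_LEAF; b++) {
                        if (leaf_bitmap[i] & (1u &lt;&lt; b)) {
                                leaf_bitmap[i] &= ~(1u &lt;&lt; b);
                                leaf_free[i]--;
                                return i * BLOCKS_PER_LEAF + b;
                        }
                }
        }
        return -1;
}

static void
swap_free(int blk)
{
        leaf_bitmap[blk / BLOCKS_PER_LEAF] |= 1u &lt;&lt; (blk % BLOCKS_PER_LEAF);
        leaf_free[blk / BLOCKS_PER_LEAF]++;
}
</pre>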
<p class="Normal">
<strong>When To Free a Page</strong>
</p>

<p class="Normal">
Since the VM system uses all available memory for disk caching, there are
usually very few truly-free pages.  The VM system depends on being able to
properly choose pages which are not in use to reuse for new allocations.
Selecting the optimal pages to free is possibly the single-most important
function any VM system can perform because if it makes a poor selection, the
VM system may be forced to unnecessarily retrieve pages from disk, seriously
degrading system performance.
</p>

<p class="Normal">
How much overhead are we willing to suffer in the critical path to avoid
freeing the wrong page?  Each wrong choice we make will cost us hundreds of
thousands of CPU cycles and a noticeable stall of the affected processes, so
we are willing to endure a significant amount of overhead in order to be
sure that the right page is chosen.  This is why FreeBSD tends to outperform
other systems when memory resources become stressed.
</p>
<p class="Normal">
<table border="0" cellspacing="5" cellpadding="5">
<tr>
<td width="65%" valign="top"><p class="Normal">
The free page determination algorithm is built upon a history of the use of
memory pages.  To acquire this history, the system takes advantage of a
page-used bit feature that most hardware page tables have.
</p>
<p class="Normal">
In any case, the page-used bit is cleared and at some later point the VM
system comes across the page again and sees that the page-used bit has been
set.  This indicates that the page is still being actively used.  If the bit
is still clear it is an indication that the page is not being actively used.
By testing this bit periodically, a use history (in the form of a counter)
for the physical page is developed.  When the VM system later needs to free
up some pages, checking this history becomes the cornerstone of determining
the best candidate page to reuse.
</p>
</td>
<td width="35%" valign="top" bgcolor="#dadada"><font size="-1"><center><strong>What if
the hardware has no page-used bit?</strong></center><br>
<p class="Normal">For those platforms that do not have this feature, the
system actually emulates a page-used bit.  It unmaps or protects a page,
forcing a page fault if the page is accessed again.  When the page fault is
taken, the system simply marks the page as having been used and unprotects
the page so that it may be used.  While taking such page faults just to
determine if a page is being used appears to be an expensive proposition, it
is much less expensive than reusing the page for some other purpose only to
find that a process needs it back and then have to go to disk.</p></font>
</td>
</tr>
</table>
</p>
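<p class="Normal">
A simplified sketch of what such a periodic aging pass might look like (the
names are illustrative, and pmap_ts_referenced() is assumed to be a
test-and-clear helper in the spirit of the pmap interface, not quoted from
the kernel):
</p>
<pre>
/*
 * Illustrative aging pass (invented names, not the real vm_pageout code):
 * sample and clear the hardware 'referenced' bit periodically and keep a
 * small activity counter per physical page.
 */
#define ACT_MAX      64
#define ACT_ADVANCE   3

struct vm_page_s {
        int act_count;                  /* accumulated usage history */
};

/* Assumed helper: test and clear the page-used (referenced) bit. */
extern int pmap_ts_referenced(struct vm_page_s *pg);

static void
age_page(struct vm_page_s *pg)
{
        if (pmap_ts_referenced(pg)) {
                /* Touched since the last scan: raise its score. */
                pg->act_count += ACT_ADVANCE;
                if (pg->act_count > ACT_MAX)
                        pg->act_count = ACT_MAX;
        } else if (pg->act_count > 0) {
                /* Untouched: decay.  Low scores become reuse candidates. */
                pg->act_count--;
        }
}
</pre>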
<p class="Normal">
FreeBSD makes use of several page queues to further refine the selection of
pages to reuse as well as to determine when dirty pages must be flushed to
their backing store.  Since page tables are dynamic entities under FreeBSD,
it costs virtually nothing to unmap a page from the address space of any
processes using it.  When a page candidate has been chosen based on the
page-use counter, this is precisely what is done.  The system must make a
distinction between clean pages which can theoretically be freed up at any
time, and dirty pages which must first be written to their backing store
before being reusable.  When a page candidate has been found it is moved to
the inactive queue if it is dirty, or the cache queue if it is clean.  A
separate algorithm based on the dirty-to-clean page ratio determines when
dirty pages in the inactive queue must be flushed to disk.  Once this is
accomplished, the flushed pages are moved from the inactive queue to the
cache queue.  At this point, pages in the cache queue can still be
reactivated by a VM fault at relatively low cost.  However, pages in the
cache queue are considered to be 'immediately freeable' and will be reused
in an LRU (least-recently used) fashion when the system needs to allocate
new memory.
</p>
<p class="Normal">
It is important to note that the FreeBSD VM system attempts to separate
clean and dirty pages for the express reason of avoiding unnecessary flushes
of dirty pages (which eats I/O bandwidth), nor does it move pages between
the various page queues gratuitously when the memory subsystem is not being
stressed.  This is why you will see some systems with very low cache queue
counts and high active queue counts when doing a 'systat -vm' command.  As
the VM system becomes more stressed, it makes a greater effort to maintain
the various page queues at the levels determined to be the most effective.
An urban myth has circulated for years that Linux did a better job avoiding
swapouts than FreeBSD, but this in fact is not true.  What was actually
occurring was that FreeBSD was proactively paging out unused pages in order
to make room for more disk cache while Linux was keeping unused pages in
core and leaving less memory available for cache and process pages.  I don't
know whether this is still true today.
</p>
<p class="Normal">
<strong>Pre-Faulting and Zeroing Optimizations</strong>
</p>

<p class="Normal">
Taking a VM fault is not expensive if the underlying page is already in core
and can simply be mapped into the process, but it can become expensive if
you take a whole lot of them on a regular basis.  A good example of this is
running a program such as 'ls' or 'ps' over and over again.  If the program
binary is mapped into memory but not mapped into the page table, then all
the pages that will be accessed by the program will have to be faulted in
every time the program is run.  This is unnecessary when the pages in
question are already in the VM Cache, so FreeBSD will attempt to
pre-populate a process's page tables with those pages that are already in
the VM Cache.  One thing that FreeBSD does not yet do is pre-copy-on-write
certain pages on exec.  For example, if you run the /bin/ls program while
running 'vmstat 1' you will notice that it always takes a certain number of
page faults, even when you run it over and over again.  These are zero-fill
faults, not program code faults (which were pre-faulted in already).
Pre-copying pages on exec or fork is an area that could use more study.
</p>
<p class="Normal">
A large percentage of page faults that occur are zero-fill faults.  You can
usually see this by observing the 'vmstat -s' output.  These occur when a
process accesses pages in its BSS area.  The BSS area is expected to be
initially zero but the VM system does not bother to allocate any memory at
all until the process actually accesses it.  When a fault occurs the VM
system must not only allocate a new page, it must zero it as well.  To
optimize the zeroing operation the VM system has the ability to pre-zero
pages and mark them as such, and to request pre-zeroed pages when zero-fill
faults occur.  The pre-zeroing occurs whenever the CPU is idle but the
number of pages the system pre-zeros is limited in order to avoid blowing
away the memory caches.  This is an excellent example of adding complexity
to the VM system in order to optimize the critical path.
</p>
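<p class="Normal">
A tiny user-level illustration of demand-zero BSS (not from the article;
resource-usage units vary by platform): the 64 MB array below costs nothing
until it is touched, and only the touched pages ever take zero-fill faults.
</p>
<pre>
#include &lt;stdio.h&gt;
#include &lt;sys/resource.h&gt;

/*
 * 64 MB of BSS costs nothing until touched.  Each write below takes one
 * zero-fill fault; pages that are never touched are never allocated.
 * (Watch the zero-fill counter with 'vmstat -s'.)
 */
#define SIZE (64UL * 1024 * 1024)

static char big_bss[SIZE];              /* demand-zero, not stored in the binary */

int
main(void)
{
        struct rusage ru;

        for (unsigned long off = 0; off &lt; SIZE; off += 1024 * 1024)
                big_bss[off] = 1;       /* touch one page per megabyte */

        getrusage(RUSAGE_SELF, &ru);
        printf("max RSS: %ld (platform-dependent units)\n", (long)ru.ru_maxrss);
        return 0;
}
</pre>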
<p class="Normal">
<strong>Page Table Optimizations</strong>
</p>

<p class="Normal">
The page table optimizations make up the most contentious part of the
FreeBSD VM design and they have shown some strain with the advent of serious
use of mmap().  I think this is actually a feature of most BSDs though I am
not sure when it was first introduced.  There are two major optimizations.
The first is that hardware page tables do not contain persistent state but
instead can be thrown away at any time with only a minor amount of
management overhead.  The second is that every active page table entry in
the system has a governing pv_entry structure which is tied into the vm_page
structure.  FreeBSD can simply iterate through those mappings that are known
to exist while Linux must check all page tables that *might* contain a
specific mapping to see if it does, which can achieve O(n^2) overhead in
certain situations.  It is because of this that FreeBSD tends to make better
choices on which pages to reuse or swap when memory is stressed, giving it
better performance under load.  However, FreeBSD requires kernel tuning to
accommodate large-shared-address-space situations such as those that can
occur in a news system because it may run out of pv_entry structures.
</p>
<p class="Normal">
Both Linux and FreeBSD need work in this area.  FreeBSD is trying to
maximize the advantage of a potentially sparse active-mapping model (not all
processes need to map all pages of a shared library, for example), whereas
Linux is trying to simplify its algorithms.  FreeBSD generally has the
performance advantage here at the cost of wasting a little extra memory, but
FreeBSD breaks down in the case where a large file is massively shared
across hundreds of processes.  Linux, on the other hand, breaks down in the
case where many processes are sparsely-mapping the same shared library and
also runs non-optimally when trying to determine whether a page can be
reused or not.
</p>
<p class="Normal">
<strong>Page Coloring</strong>
</p>

<p class="Normal">
We'll end with the page coloring optimizations.  Page coloring is a
performance optimization designed to ensure that accesses to contiguous
pages in virtual memory make the best use of the processor cache.  In
ancient times (i.e. 10+ years ago) processor caches tended to map virtual
memory rather than physical memory.  This led to a huge number of problems
including having to clear the cache on every context switch in some cases,
and problems with data aliasing in the cache.  Modern processor caches map
physical memory precisely to solve those problems.  This means that two
side-by-side pages in a process's address space may not correspond to two
side-by-side pages in the cache.  In fact, if you aren't careful
side-by-side pages in virtual memory could wind up using the same page in
the processor cache -- leading to cacheable data being thrown away
prematurely and reducing CPU performance.  This is true even with multi-way
set-associative caches (though the effect is mitigated somewhat).
</p>
<p class="Normal">
FreeBSD's memory allocation code implements page coloring optimizations,
which means that the memory allocation code will attempt to locate free
pages that are contiguous from the point of view of the cache.  For example,
if page 16 of physical memory is assigned to page 0 of a process's virtual
memory and the cache can hold 4 pages, the page coloring code will not
assign page 20 of physical memory to page 1 of a process's virtual memory.
It would, instead, assign page 21 of physical memory.  The page coloring
code attempts to avoid assigning page 20 because this maps over the same
cache memory as page 16 and would result in non-optimal caching.  This code
adds a significant amount of complexity to the VM memory allocation
subsystem as you can well imagine, but the result is well worth the effort.
Page Coloring makes VM memory as deterministic as physical memory in regards
to cache performance.
</p>
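<p class="Normal">
The arithmetic behind that example can be written down directly; a minimal
sketch, assuming a direct-mapped cache that spans PAGE_COLORS pages:
</p>
<pre>
/*
 * Minimal sketch of the arithmetic: with a cache that spans PAGE_COLORS
 * pages, physical pages whose page numbers are congruent modulo PAGE_COLORS
 * share the same cache lines.  A color-aware allocator hands out a free
 * physical page whose color matches that of the virtual page being mapped.
 */
#define PAGE_COLORS 4   /* cache size / page size; 4 in the example above */

static inline int
page_color(unsigned long phys_pageno)
{
        return (int)(phys_pageno % PAGE_COLORS);
}

/*
 * Example from the text: virtual page 0 got physical page 16 (color 0), so
 * virtual page 1 wants a page of color 1 -- physical page 21 qualifies,
 * physical page 20 (also color 0) does not.
 */
</pre>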
<p class="Normal">
<strong>Conclusion</strong>
</p>

<p class="Normal">
Virtual memory in modern operating systems must address a number of
different issues efficiently and for many different usage patterns.  The
modular and algorithmic approach that BSD has historically taken allows us
to study and understand the current implementation as well as relatively
cleanly replace large sections of the code.  There have been a number of
improvements to the FreeBSD VM system in the last several years, and work is
ongoing.
</p>
<hr>

<h2>A Bonus Question and Answer Session by Allen Briggs
<a href="mailto:briggs@ninthwonder.com">&lt;briggs@ninthwonder.com&gt;</a></h2>

<p class="Normal">
Q: What is "the interleaving algorithm" that you refer to in your listing of
the ills of the FreeBSD 3.x swap arrangements?
</p>
<blockquote>
<p class="Normal">
A: FreeBSD uses a fixed swap interleave which defaults to 4.  This means
that FreeBSD reserves space for four swap areas even if you only have one,
two, or three.  Since swap is interleaved the linear address space
representing the 'four swap areas' will be fragmented if you don't actually
have four swap areas.  For example, if you have two swap areas A and B,
FreeBSD's address space representation for that swap area will be
interleaved in blocks of 16 pages:
</p>

<p class="Normal">
A B C D A B C D A B C D A B C D
</p>

<p class="Normal">
FreeBSD 3.x uses a 'sequential list of free regions' approach to accounting
for the free swap areas.  The idea is that large blocks of free linear space
can be represented with a single list node (kern/subr_rlist.c).  But due to
the fragmentation the sequential list winds up being insanely fragmented.
In the above example, completely unused swap will have A and B shown as
'free' and C and D shown as 'all allocated'.  Each A-B sequence requires a
list node to account for because C and D are holes, so the list node cannot
be combined with the next A-B sequence.
</p>
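<p class="Normal">
The interleave itself is simple arithmetic.  A hedged sketch (invented
names) of how a linear swap page number would map onto the four interleaved
slots:
</p>
<pre>
/*
 * Sketch of the interleave arithmetic: the linear swap address space is
 * chopped into 16-page stripes that rotate across NSWAPDEV slots (the
 * A B C D pattern above), whether or not a device exists in every slot.
 */
#define SWB_NPAGES 16                   /* pages per interleave stripe */
#define NSWAPDEV    4                   /* fixed interleave factor */

struct swloc {
        int           dev;              /* which swap slot, 0..NSWAPDEV-1 */
        unsigned long off;              /* page offset within that device */
};

static struct swloc
swap_locate(unsigned long linear_page)
{
        unsigned long stripe = linear_page / SWB_NPAGES;
        struct swloc  loc;

        loc.dev = (int)(stripe % NSWAPDEV);
        loc.off = (stripe / NSWAPDEV) * SWB_NPAGES + linear_page % SWB_NPAGES;
        return loc;
}
</pre>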
<p class="Normal">
Why do we interleave our swap space instead of just tacking swap areas onto
the end and doing something fancier?  Because it's a whole lot easier to
allocate linear swaths of an address space and have the result automatically
be interleaved across multiple disks than it is to try to put that
sophistication elsewhere.
</p>

<p class="Normal">
The fragmentation causes other problems.  Being a linear list under 3.x, and
having such a huge amount of inherent fragmentation, allocating and freeing
swap winds up being an O(N) algorithm instead of an O(1) algorithm.  Combine
that with other factors (heavy swapping) and you start getting into O(N^2)
and O(N^3) levels of overhead, which is bad.  The 3.x system may also need
to allocate KVM during a swap operation to create a new list node, which can
lead to a deadlock if the system is trying to page out pages in a low-memory
situation.
</p>
<p class="Normal">
Under 4.x we do not use a sequential list.  Instead we use a radix tree and
bitmaps of swap blocks rather than ranged list nodes.  We take the hit of
preallocating all the bitmaps required for the entire swap area up front,
but it winds up wasting less memory due to the use of a bitmap (one bit per
block) instead of a linked list of nodes.  The use of a radix tree instead
of a sequential list gives us nearly O(1) performance no matter how
fragmented the tree becomes.
</p>
</blockquote>

<p class="Normal">
Q: I don't get the following:
</p>
<p class="Normal">
It is important to note that the FreeBSD VM system attempts to separate
clean and dirty pages for the express reason of avoiding unnecessary flushes
of dirty pages (which eats I/O bandwidth), nor does it move pages between
the various page queues gratuitously when the memory subsystem is not being
stressed.  This is why you will see some systems with very low cache queue
counts and high active queue counts when doing a 'systat -vm' command.
</p>

<p class="Normal">
Q: How is the separation of clean and dirty (inactive) pages related to the
situation where you see low cache queue counts and high active queue counts
in 'systat -vm'?  Do the systat stats roll the active and dirty pages
together for the active queue count?
</p>

<blockquote>
<p class="Normal">
A: Yes, that is confusing.  The relationship is "goal" versus "reality".
Our goal is to separate the pages but the reality is that if we are not in a
memory crunch, we don't really have to.
</p>

<p class="Normal">
What this means is that FreeBSD will not try very hard to separate out dirty
pages (inactive queue) from clean pages (cache queue) when the system is not
being stressed, nor will it try to deactivate pages (active queue ->
inactive queue) when the system is not being stressed, even if they aren't
being used.
</p>
</blockquote>
<p class="Normal">
Q: In the /bin/ls / 'vmstat 1' example, wouldn't some of the page faults be
data page faults (COW from executable file to private page)?  I.e., I would
expect the page faults to be some zero-fill and some program data.  Or are
you implying that FreeBSD does do pre-COW for the program data?
</p>

<blockquote>
<p class="Normal">
A: A COW fault can be either zero-fill or program-data.  The mechanism is
the same either way because the backing program-data is almost certainly
already in the cache.  I am indeed lumping the two together.  FreeBSD does
not pre-COW program data or zero-fill, but it *does* pre-map pages that
exist in its cache.
</p>
</blockquote>
<p class="Normal">
Q: In your section on page table optimizations, can you give a little more
detail about pv_entry and vm_page (or should vm_page be vm_pmap -- as in
4.4, cf. pp. 180-181 of McKusick, Bostic, Karels, Quarterman)?
Specifically, what kind of operation/reaction would require scanning the
mappings?
</p>

<p class="Normal">
How does Linux do in the case where FreeBSD breaks down (sharing a large
file mapping over many processes)?
</p>

<blockquote>
<p class="Normal">
A: A vm_page represents an (object,index#) tuple.  A pv_entry represents a
hardware page table entry (pte).  If you have five processes sharing the
same physical page, and three of those processes' page tables actually map
the page, that page will be represented by a single vm_page structure and
three pv_entry structures.
</p>

<p class="Normal">
pv_entry structures only represent pages mapped by the MMU (one pv_entry
represents one pte).  This means that when we need to remove all hardware
references to a vm_page (in order to reuse the page for something else, page
it out, clear it, dirty it, and so forth) we can simply scan the linked list
of pv_entry's associated with that vm_page to remove or modify the pte's
from their page tables.
</p>
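<p class="Normal">
A simplified sketch of that relationship (the field and helper names below
are illustrative, not the real kernel definitions):
</p>
<pre>
#include &lt;stddef.h&gt;

struct pmap;                            /* one per process address space */

struct pv_entry {
        struct pmap     *pmap;          /* whose page tables map the page */
        unsigned long    va;            /* virtual address of the mapping */
        struct pv_entry *next;          /* next mapping of the same page */
};

struct vm_page {
        struct pv_entry *pv_list;       /* one entry per pte mapping this page */
};

/* Assumed helper: invalidate one pte in one process's page tables. */
extern void pte_remove(struct pmap *pmap, unsigned long va);

/* Remove every hardware mapping of 'pg' by walking only its pv_entry list. */
static void
page_remove_all(struct vm_page *pg)
{
        for (struct pv_entry *pv = pg->pv_list; pv != NULL; pv = pv->next)
                pte_remove(pv->pmap, pv->va);
        pg->pv_list = NULL;
}
</pre>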
<p class="Normal">
Under Linux there is no such linked list.  In order to remove all the
hardware page table mappings for a vm_page Linux must index into every VM
object that *might* have mapped the page.  For example, if you have 50
processes all mapping the same shared library and want to get rid of page X
in that library, you need to index into the page table for each of those 50
processes even if only 10 of them have actually mapped the page.  So Linux
is trading off the simplicity of its design against performance.  Many VM
algorithms which are O(1) or (small N) under FreeBSD wind up being O(N),
O(N^2), or worse under Linux.  Since the pte's representing a particular
page in an object tend to be at the same offset in all the page tables they
are mapped in, reducing the number of accesses into the page tables at the
same pte offset will often avoid blowing away the L1 cache line for that
offset, which can lead to better performance.
</p>
<p class="Normal">
FreeBSD has added complexity (the pv_entry scheme) in order to increase
performance (to limit page table accesses to *only* those pte's that need to
be modified).
</p>

<p class="Normal">
But FreeBSD has a scaling problem that Linux does not in that there are a
limited number of pv_entry structures and this causes problems when you have
massive sharing of data.  In this case you may run out of pv_entry
structures even though there is plenty of free memory available.  This can
be fixed easily enough by bumping up the number of pv_entry structures in
the kernel config, but we really need to find a better way to do it.
</p>

<p class="Normal">
In regards to the memory overhead of a page table versus the pv_entry
scheme:  Linux uses 'permanent' page tables that are not thrown away, but
does not need a pv_entry for each potentially mapped pte.  FreeBSD uses
'throw away' page tables but adds in a pv_entry structure for each
actually-mapped pte.  I think memory utilization winds up being about the
same, giving FreeBSD an algorithmic advantage with its ability to throw away
page tables at will with very low overhead.
</p>
</blockquote>
<p class="Normal">
Q: Finally, in the page coloring section, it might help to have a little
more description of what you mean here.  I didn't quite follow it.
</p>

<blockquote>
<p class="Normal">
A: Do you know how an L1 hardware memory cache works?  I'll explain:
Consider a machine with 16MB of main memory but only 128K of L1 cache.
Generally the way this cache works is that each 128K block of main memory
uses the *same* 128K of cache.  If you access offset 0 in main memory and
then offset 128K in main memory you can wind up throwing away the cached
data you read from offset 0!
</p>
<p class="Normal">
Now, I am simplifying things greatly.  What I just described is what is
called a 'direct mapped' hardware memory cache.  Most modern caches are what
are called 2-way-set-associative or 4-way-set-associative caches.  The
set-associativity allows you to access up to N different memory regions that
overlap the same cache memory without destroying the previously cached data.
But only N.
</p>

<p class="Normal">
So if I have a 4-way set associative cache I can access offset 0, offset
128K, 256K and offset 384K and still be able to access offset 0 again and
have it come from the L1 cache.  If I then access offset 512K, however, one
of the four previously cached data objects will be thrown away by the cache.
</p>
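<p class="Normal">
One way to model that arithmetic (a hedged sketch; it assumes the 128K cache
is split into four 32K ways, so addresses 128K apart land in the same set):
</p>
<pre>
#define CACHE_SIZE (128 * 1024)
#define ASSOC      4
#define WAY_SIZE   (CACHE_SIZE / ASSOC)         /* 32K per way */
#define LINE_SIZE  32

static inline unsigned long
cache_set(unsigned long addr)
{
        return (addr % WAY_SIZE) / LINE_SIZE;
}

/*
 * cache_set(0), cache_set(128 * 1024), cache_set(256 * 1024),
 * cache_set(384 * 1024) and cache_set(512 * 1024) are all equal, so with
 * four ways the fifth access evicts one of the earlier four -- exactly the
 * behaviour described above.
 */
</pre>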
<p class="Normal">
It is extremely important... EXTREMELY important for most of a processor's
memory accesses to be able to come from the L1 cache, because the L1 cache
operates at the processor frequency.  The moment you have an L1 cache miss
and have to go to the L2 cache or to main memory, the processor will stall
and potentially sit twiddling its fingers for *hundreds* of instructions
worth of time waiting for a read from main memory to complete.  Main memory
(the dynamic ram you stuff into a computer) is S.L.O.W., capitalized and
boldfaced, when compared to the speed of a modern processor core.
</p>
<p class="Normal">
Ok, so now onto page coloring:  All modern memory caches are what are known
as *physical* caches.  They cache physical memory addresses, not virtual
memory addresses.  This allows the cache to be left alone across a process
context switch, which is very important.
</p>

<p class="Normal">
But in the UNIX world you are dealing with virtual address spaces, not
physical address spaces.  Any program you write will see the virtual address
space given to it.  The actual *physical* pages underlying that virtual
address space are not necessarily physically contiguous!  In fact, you might
have two pages that are side by side in a process's address space which wind
up being at offset 0 and offset 128K in *physical* memory.
</p>

<p class="Normal">
A program normally assumes that two side-by-side pages will be optimally
cached.  That is, that you can access data objects in both pages without
having them blow away each other's cache entry.  But this is only true if
the physical pages underlying the virtual address space are contiguous
(insofar as the cache is concerned).
</p>
<p class="Normal">
This is what page coloring does.  Instead of assigning *random* physical
pages to virtual addresses, which may result in non-optimal cache
performance, page coloring assigns *reasonably-contiguous* physical pages to
virtual addresses.  Thus programs can be written under the assumption that
the characteristics of the underlying hardware cache are the same for their
virtual address space as they would be if the program had been run directly
in a physical address space.
</p>

<p class="Normal">
Note that I say 'reasonably' contiguous rather than simply 'contiguous'.
From the point of view of a 128K direct mapped cache, the physical address 0
is the same as the physical address 128K.  So two side-by-side pages in your
virtual address space may wind up being offset 128K and offset 132K in
physical memory, but could also easily be offset 128K and offset 4K in
physical memory and still retain the same cache performance characteristics.
So page-coloring does *NOT* have to assign truly contiguous pages of
physical memory to contiguous pages of virtual memory, it just needs to make
sure it assigns contiguous pages from the point of view of cache performance
and operation.
</p>

<p class="Normal">
Oops, that was a bit longer explanation than I intended.
</p>

<p class="Normal">
-Matt
</p>
</blockquote>
<hr noshade color="#dadada"><br>
<font class="Small">Author maintains all copyrights on this article.<br></font>
<p align=right>Back to the <a href="..">OS/RC</a></p>
</body>
</html>

@@ -0,0 +1,401 @@
<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>

<head>
<title>Bringing SMP to Your UP Operating System</title>
<meta http-equiv="Cache-Control" content="must-revalidate">
</head>

<body bgcolor="#ffffff" vlink="#0000ff">

<table width=600 align=center>
<tr><td>

<center>
<h1>Bringing SMP to Your UP Operating System</h1>
Sidney Cammeresi
</center>
<br><br><br>

<blockquote>
<h3>Overview</h3>
<ol>
<li><a href="#preface">Preface</a></li>
<li><a href="#terms">Terminology</a></li>
<li><a href="#config">MP Configuration Tables</a></li>
<li><a href="#apic">Using Local APICs</a></li>
<li><a href="#booting">Booting Application Processors</a></li>
<li><a href="#protmode">Switching from Real Mode to Protected Mode</a></li>
<li><a href="#ipis">Interprocessor Interrupts</a></li>
<li><a href="#other">Other Considerations<br><br></a></li>

<li>Scheduling</li>
<li>Using the I/O APIC<br><br></li>
<!--
<li><a href="#scheduling">Scheduling</a></li>
<li><a href="#ioapic">Using the I/O APIC</a></li>
-->

<li><a href="#references">References</a></li>
<li><a href="#revsions">Revision History</a></li>
</ol>
</blockquote>
<br><br><br>
<h3>
<a name="preface">Preface</a></h3>

This tutorial is intended as a supplement to the
<a href="http://www.acm.uiuc.edu/sigops/roll_your_own">SigOps OS
Tutorial</a> to teach the fundamentals of symmetric multiprocessing
using Intel MP compliant hardware.  Knowledge of the concepts and
implementations of basic operating system parts such as managing virtual
memory and multitasking is assumed, and such topics will not be discussed
except as they relate to multiprocessing.  Knowledge equivalent to an
intermediate or advanced computer architecture college course will be
helpful in understanding scheduling issues, but is not required.
<p>

This tutorial is not intended to be a complete explanation of how to
implement an SMP-capable operating system, nor as a replacement for
Intel's documentation.  Rather it is designed to give an overview of the
things I learned in writing SMP support for
<a href="http://www.frotz.net/openblt">OpenBLT</a>, a freely
redistributable microkernel-based operating system under the BSD licence.
Particularly, some tedious hardware aspects will not be discussed in detail
when the reader could just as easily read official Intel documentation.
The interested reader should refer to the references for more detailed
information.  For code examples, the reader should refer to the source
code of <a href="http://www.frotz.net/openblt">OpenBLT</a> or
<a href="http://www.freebsd.org">FreeBSD</a>.  The Linux kernel source
code might be helpful, although it is under the GPL.
<p>

This tutorial is a work in progress.  If you see an error or something
that needs clarification, please <a href="mailto:cammeres@uiuc.edu">e-mail
me</a>.
<p>
<h3>
<a name="terms">Terminology</a>
</h3>

<dl>
<dt>AP
<dd>application processor.  A processor that is not the BSP.  All APs are
    in a halted state when the BIOS first gives control to the operating
    system.
<dt>APIC
<dd>Advanced Programmable Interrupt Controller.  Either a local APIC or
    an I/O APIC.  It is attached to the APIC bus.
<dt>APIC bus
<dd>A special non-architectural bus on which the APICs in the system
    send messages.
<dt>BSP
<dd>bootstrap processor.  The processor which is given control after the
    BIOS finishes its POST.
<dt>I/O APIC
<dd>A special APIC for receiving and distributing interrupts from external
    devices which is backward compatible with the PIC.  There is generally
    only one per computer.
<dt>IPI
<dd>interprocessor interrupt.  A special interrupt sent to a processor by
    the originating processor programming its APIC with a target or logical
    target ID, and an interrupt vector.
<dt>Local APIC
<dd>an APIC built in to the processor.  It is responsible for dispatching
    interrupts sent over the APIC bus to its processor core and sending
    interrupts to other processors over the APIC bus.
<dt>MP
<dd>Intel's MultiProcessor Specification, a standard which defines how
    SMP hardware should be presented to the operating system and how the
    operating system should interact with this hardware.
<dt>serialisation
<dd>The act of executing a certain instruction which causes the processor
    to pause to retire all instructions currently being executed before
    proceeding to the next instruction in the stream.  For example, before
    switching to protected mode, the processor must retire all instructions
    that began executing in real mode before beginning any in protected
    mode.
<dt>SMP
<dd>symmetric multiprocessing.  Using multiple processors which share the
    same physical memory in the same computer at the same time.  You are
    probably reading this tutorial with the hope that your operating system
    will become SMP-capable.
<dt>UP
<dd>uniprocessor.  Your operating system to date is a UP operating system.
</dl>
<h3>
<a name="config">MP Detection and Configuration</a>
</h3>

When the system first starts, the BIOS detects the hardware installed in
the system using electrical means and then creates structures to describe
this hardware to the operating system.  There are two such tables.
The first is the MP Floating Pointer Structure, which is required.
The second is the MP Configuration Table, which is optional.  If the
configuration table does not exist, the operating system should set up
the default configuration indicated in the floating pointer structure.
Some data in the tables is in ASCII.  Strings are padded with spaces
and are not null-terminated.
<p>

First, you need to find the floating pointer structure.  According to
the spec, it can be in one of four places: (1) in the first kilobyte
of the extended BIOS data area, (2) the last kilobyte of base memory,
(3) the top of physical memory, or (4) the BIOS read-only memory space
between 0xe0000 and 0xfffff.  You need to search these areas for the
four-byte signature "_MP_" which denotes the start of the floating
pointer structure.  Absence of this structure indicates that the system
is not MP compliant.  At this point your operating system can either halt,
or it can fall back into a UP setup.
<p>
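A sketch of that search (illustrative code, not taken from OpenBLT): the
floating pointer structure is 16-byte aligned, carries its length in
16-byte paragraphs at offset 8, and its bytes must sum to zero modulo 256.
<pre>
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

static const uint8_t *
mp_search(const uint8_t *base, uint32_t len)
{
        for (uint32_t off = 0; off + 16 &lt;= len; off += 16) {
                const uint8_t *p = base + off;
                uint32_t size;
                uint8_t  sum = 0;

                if (p[0] != '_' || p[1] != 'M' || p[2] != 'P' || p[3] != '_')
                        continue;
                size = p[8] * 16;               /* length field, in paragraphs */
                if (size == 0)
                        continue;               /* bogus length field */
                for (uint32_t i = 0; i &lt; size; i++)
                        sum += p[i];
                if (sum == 0)
                        return p;               /* checksum ok: found it */
        }
        return NULL;
}
</pre>
You would run this over each of the candidate areas listed above in turn.
<p>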
You should checksum the structure to make sure it has not been corrupted.
There is not much of interest in the floating pointer structure, unless
your system does not have a configuration table.  In this case, you will
need to get the number of the default configuration your system adheres to
and set up the system for SMP using those parameters.  Otherwise, you will
need to get the address of the configuration table and begin parsing that.
<p>

The configuration table is divided into three parts: a header, a base
section, and an extended section.  The header begins with the four-byte
signature "PCMP", although you do not have to search for it.  Once you
find it, checksum it.  At this point, you can print the OEM and product
ID strings in the configuration table if you want.  You should get the
address of the local APIC from this and store it.  Then, proceed to
parse the base section.
<p>

The base section consists of a set of entries that describe either
processors, system busses, I/O APICs, I/O interrupt assignments, or
local interrupt assignments.  All entries are eight bytes in length,
save processor entries which are twenty bytes.  The first byte of each
entry denotes the type of the entry.  Look through each entry.  You will
probably want to generate quite a few OS-specific data structures here.
In particular, you will want to note the APIC ID of each processor in
the system, its version, and its type as well as the address of the
system's I/O APIC.
<p>
<h3>
<a name="apic">Using Local APICs</a>
</h3>

MP systems have a special bus to which all APICs in the system are
connected.  This bus is one of the ways the processors can communicate
with one another (the other, of course, is shared memory).  APICs (both
local and I/O) are memory mapped devices.  The default location for
the local APIC is at 0xfee00000 in physical memory.  The local APIC
will appear in the same place for each processor, but each processor
will reference its own APIC; the APIC intercepts memory references to
its registers, and those references will not generate bus cycles on
some systems.  Since APICs are mapped in high memory, the APs will have
to switch to protected mode before they can initialise their local APICs.
If you like, you can map the APIC to a different address using the paging
unit, but be sure to disable caching in the page table entry since some
registers can change between accesses.  For this reason, pointers to
APIC registers should be volatile.  To initialise the BSP's local APIC,
set the enable bit in the spurious interrupt vector register and set
the error interrupt vector in the local vector table.
<p>
||||
|
||||
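By way of illustration only (this is not OpenBLT's actual code), local APIC
access might be wrapped up like this. The register offsets come from the
Intel manuals; the vector numbers (0xff spurious, 0xfe error) are arbitrary
examples.

<pre>
/* Minimal local APIC access through a volatile pointer. */

#define LAPIC_BASE    0xfee00000UL      /* default physical address              */
#define LAPIC_ID      0x020
#define LAPIC_VER     0x030
#define LAPIC_SVR     0x0f0             /* spurious interrupt vector register    */
#define LAPIC_ESR     0x280             /* error status register                 */
#define LAPIC_ICR_LO  0x300             /* interrupt command register, low word  */
#define LAPIC_ICR_HI  0x310             /* interrupt command register, high word */
#define LAPIC_LVT_ERR 0x370             /* error entry in the local vector table */

static volatile unsigned int *lapic = (volatile unsigned int *) LAPIC_BASE;

static unsigned int lapic_read(unsigned int reg)
{
    return lapic[reg / 4];
}

static void lapic_write(unsigned int reg, unsigned int val)
{
    lapic[reg / 4] = val;
}

void lapic_init(void)
{
    /* software-enable the APIC (bit 8 of the SVR) and set the spurious vector */
    lapic_write(LAPIC_SVR, lapic_read(LAPIC_SVR) | 0x100 | 0xff);

    /* route APIC error interrupts to vector 0xfe */
    lapic_write(LAPIC_LVT_ERR, 0xfe);
}
</pre>
<p>
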
<h3>
<a name="booting">Booting Application Processors</a>
</h3>

Once you have detected the processors in the system, set up your local
APIC, and verified that you can communicate with it (hint: read the
APIC version register), it's time to boot the APs. Note that it is good
practice not to try to boot the BSP here. That would be bad.
<p>

Since the APs will wake up in real mode, everything they need to get
started should be in low memory (below 0x100000, or one megabyte).
First, set the shutdown code by writing 0xa to CMOS register 0x0f. Then,
grab a page of memory for the AP's stack. You will also need space
to store the 'trampoline' code, i.e. the code the processor executes
after waking up to switch to protected mode and jump to the kernel.
You can either use the same page of code for each processor or store
the code at the bottom of the processor's stack. Note that the start of
the code must be at a page-aligned address. Copy the code there, then
set the warm reset vector at address 40:67 to the start of this code.
Next, you should reset a flag in the kernel which the processor will use
to signal that it has booted and finished initialisation, and clear any
pending APIC error by writing a zero to the error status register. If you
need to pass any parameters or data to the AP, now would be a good time to
set that up. For example, since OpenBLT's kernel runs in high memory,
I have to pass the address of the page directory in memory so that the
AP can load it and enable paging before calling the kernel.
<p>

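Put together, that housekeeping might look like the sketch below. It reuses
lapic_write() from the previous section; outb() is the usual port-output
wrapper, and trampoline_phys is wherever you copied the trampoline (a
page-aligned address below one megabyte).

<pre>
/* Housekeeping before waking one AP. */

volatile int ap_ready;                   /* set by the AP once it is running */

static inline void outb(unsigned short port, unsigned char val)
{
    asm volatile("outb %0, %1" : : "a"(val), "Nd"(port));
}

void prepare_ap_boot(unsigned long trampoline_phys)
{
    /* CMOS shutdown code 0x0a: warm start via the 40:67 vector */
    outb(0x70, 0x0f);
    outb(0x71, 0x0a);

    /* warm reset vector at 40:67 (physical 0x467), stored as offset:segment */
    *(volatile unsigned short *) 0x467 = 0;
    *(volatile unsigned short *) 0x469 = (unsigned short) (trampoline_phys >> 4);

    ap_ready = 0;                        /* the AP sets this when it is up */
    lapic_write(LAPIC_ESR, 0);           /* clear any pending APIC error   */
}
</pre>
<p>
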
Now you can actually boot the processor. The procedure consists of
sending a sequence of interrupts to the processor. The incremental effect
of each is undefined, but at the end of the sequence, the processor will
be booted. First, send an INIT IPI. Assert the INIT signal by writing
the target processor's APIC ID to the high word of the interrupt command
register (ICR). Then write the low word with bits set for the INIT
delivery mode, level triggered, and the interrupt asserted. Deassert INIT
by repeating the procedure with the assert bit cleared. Now, wait 10 ms.
Use of the APIC timer for the delay is suggested.
<p>

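A sketch of the INIT half, reusing the lapic helpers from earlier; delay_ms()
stands in for your own (for example APIC-timer based) delay routine.

<pre>
/* The INIT half of the boot sequence.  ICR bit values follow the Intel SDM. */

extern void delay_ms(int ms);            /* your own timer-based delay */

void send_init_ipi(unsigned char apic_id)
{
    /* assert INIT: INIT delivery mode, level triggered, level = 1 */
    lapic_write(LAPIC_ICR_HI, (unsigned int) apic_id &lt;&lt; 24);
    lapic_write(LAPIC_ICR_LO, 0x0000c500);

    /* deassert INIT: the same, with the assert bit clear */
    lapic_write(LAPIC_ICR_HI, (unsigned int) apic_id &lt;&lt; 24);
    lapic_write(LAPIC_ICR_LO, 0x00008500);

    delay_ms(10);
}
</pre>
<p>
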
If the local APIC is not an 82489DX, you need to send two STARTUP IPIs.
Clear APIC errors, set the target APIC ID in the ICR, then send the
interrupt by writing the low word of the ICR with bits set for the STARTUP
delivery mode and with the code vector in the low byte. The code vector
is the physical page number at which the processor should start executing,
i.e. the start of your trampoline code. Wait 200 ms, then check the low
word of the ICR to make sure the delivery status bit (bit 12) is clear,
indicating the interrupt was dispatched, before sending the second STARTUP.
After sending it, spin and wait for the AP to set its ready flag in memory.
You may want to set a timeout of 5 seconds, after which you assume the
processor did not wake up.
<p>

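And a sketch of the STARTUP half, including the ready-flag wait; the helpers
(lapic_read(), lapic_write(), delay_ms(), ap_ready) are the ones assumed in
the previous sketches.

<pre>
/* The STARTUP half.  'vector' is the physical page number of the
 * trampoline, e.g. 0x08 for code copied to 0x8000. */

int send_startup_ipis(unsigned char apic_id, unsigned char vector)
{
    int i, ms;

    for (i = 0; i != 2; i++) {
        lapic_write(LAPIC_ESR, 0);                       /* clear APIC errors */
        lapic_write(LAPIC_ICR_HI, (unsigned int) apic_id &lt;&lt; 24);
        lapic_write(LAPIC_ICR_LO, 0x00000600 | vector);  /* STARTUP delivery  */
        delay_ms(200);

        /* wait for the delivery status bit (bit 12) to clear */
        while (lapic_read(LAPIC_ICR_LO) & 0x1000)
            ;
    }

    /* give the AP about five seconds to set its ready flag */
    for (ms = 0; ms != 5000; ms++) {
        if (ap_ready)
            return 1;
        delay_ms(1);
    }
    return 0;                                            /* AP never woke up */
}
</pre>
<p>
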
<h3>
<a name="protmode">Switching from Real Mode to Protected Mode</a>
</h3>

Provided you did everything right above, the processor at some point woke
up in real mode and started executing the code you told it to. First,
execute a "cli" instruction to turn off interrupts, just in case. Now,
begin the switch to protected mode. Load an appropriate value into GDTR.
This can point either to the actual GDT or, in my case, to a temporary GDT.
If you need to activate paging, load the address of a page directory
into cr3. Then set bit zero in cr0 to enable protected mode, as well as
bit 31 if you need to enable paging to get into the kernel. Then do an
ljmp to the kernel text segment with an offset that points to the next
instruction to serialise the processor. Now that you're in protected
mode, load appropriate descriptors into the segment registers, then
execute a "cld", which is reportedly what gcc expects. Then, jump to
the starting address of your kernel.
<p>

Don't reference any symbols in this code, since it
will be running at an address for which it was not linked;
all memory references must be absolute. Since your kernel is above
one megabyte in memory, you can't access any global variables in real
mode. Also be careful when specifying the offset address for the ljmp
instruction: in the instruction that jumps into the kernel, specify the
numeric address of the start of your kernel, not a symbol.
Jumping to a symbol doesn't seem to work. For details,
see OpenBLT's kernel/trampoline.S.
<p>

Debugging this part is really not too bad. What you have to do is
establish some communication space in low memory, have the AP write
bytes to that memory to explain what it is doing, and print these out on
the BSP.
<p>

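Something as simple as the following sketch is usually enough; the status
address and kprintf() are placeholders for whatever your kernel provides,
and ap_ready is the flag from the boot sketches above.

<pre>
/* Crude progress reporting for AP bringup.  0x7c0 is an arbitrary choice
 * of free low memory. */

#define AP_STATUS  ((volatile unsigned char *) 0x7c0)

extern void kprintf(const char *fmt, ...);

/* called by the AP at interesting points in the trampoline / early C code */
void ap_checkpoint(unsigned char stage)
{
    *AP_STATUS = stage;
}

/* called by the BSP while it waits for the AP to come up */
void watch_ap(void)
{
    unsigned char last = 0, now;

    while (!ap_ready) {
        now = *AP_STATUS;
        if (now != last) {
            kprintf("AP reached stage %d\n", now);
            last = now;
        }
    }
}
</pre>
<p>
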
<h3>
<a name="ipis">Interprocessor Interrupts</a>
</h3>

IPIs are used to maintain synchronisation between the processors.
For example, if a kernel page table entry changes, both processors must
either flush their TLBs or invalidate that particular page table entry.
Whichever processor changed the mapping knows to do this automatically,
but the other processor does not; therefore, the processor which changed
the mapping must send an IPI to the other processor to tell it to flush
its TLB or invalidate the page table entry.
<p>

Using the local APIC, you can send interrupts to all processors, all
processors but the one sending the interrupt, or a specific processor
or logical address, as well as self-interrupts. To send an IPI, write
the destination APIC ID, if needed, into the high word of the ICR, then
write the low word of the ICR with the destination shorthand and interrupt
vector set to send the IPI. Be sure to wrap these functions in spinlocks.
You might want to turn off interrupts as well while sending IPIs.
<p>

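As an example, a TLB-shootdown IPI to every other processor could be sent
like this. The vector number is arbitrary, the lapic helpers are the ones
from the earlier sketch, and the spinlock is the lock-prefixed test-and-set
sketched in the next section.

<pre>
/* Send a fixed IPI to every processor except this one, e.g. to ask the
 * other CPUs to flush their TLBs. */

#define TLB_FLUSH_VECTOR  0xfd

static volatile unsigned int icr_lock;   /* protects the ICR */

void send_tlb_flush_ipi(void)
{
    asm volatile("cli");        /* a real kernel would save and restore EFLAGS */
    spin_lock(&icr_lock);

    /* destination shorthand 11 (all excluding self), fixed delivery, assert */
    lapic_write(LAPIC_ICR_LO, 0x000c4000 | TLB_FLUSH_VECTOR);

    /* wait for the delivery status bit to clear before releasing the ICR */
    while (lapic_read(LAPIC_ICR_LO) & 0x1000)
        ;

    spin_unlock(&icr_lock);
    asm volatile("sti");
}
</pre>
<p>
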
<h3>
<a name="other">Other Considerations</a>
</h3>

One thing to note is that semaphores (a.k.a. spinlocks) may need to
be done differently under SMP. Consider a scenario where semaphores
are acquired with a "bts" instruction. If both processors hit that
instruction at the same time while the semaphore is free, they might
both think they have acquired it. For this reason, you need to
use a "lock" prefix on that instruction to lock the system bus and
maintain synchronisation.
<p>

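A minimal spinlock of that sort, written as GCC inline assembly, might look
like the following sketch (this is an illustration, not OpenBLT's code).

<pre>
/* A test-and-set spinlock using a lock-prefixed bts, as described above. */

typedef volatile unsigned int spinlock_t;

static inline void spin_lock(spinlock_t *lock)
{
    int busy;

    do {
        /* bts copies bit 0 into CF and then sets it; sbb turns CF into 0/-1 */
        asm volatile("lock; btsl $0, %1; sbbl %0, %0"
                     : "=r" (busy), "+m" (*lock)
                     :
                     : "memory", "cc");
    } while (busy);
}

static inline void spin_unlock(spinlock_t *lock)
{
    asm volatile("" ::: "memory");      /* keep critical-section accesses above */
    *lock = 0;
}
</pre>

The lock prefix makes the read-modify-write performed by bts atomic with
respect to the other processor, which is exactly the property the plain
instruction lacks.
<p>
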
<h3>
<a name="scheduling">Scheduling</a>
</h3>

Not yet.

<h3>
<a name="ioapic">Using the I/O APIC</a>
</h3>

Not yet.

<h3>
<a name="references">References</a>
</h3>

All of these documents are available from Intel's developer web site at
<a href="http://developer.intel.com">developer.intel.com</a>. Supposedly,
by request, Intel will also send you printed documentation by post.

<ul>
<li>82378ZB System I/O and 82379AB System I/O APIC
<ul>
<li>§3.7: APIC Registers</li>
</ul>
</li>
<li>MultiProcessor Specification, Version 1.4
<ul>
<li>§4: MP Configuration Table</li>
<li>§B: Operating System Programming Guidelines</li>
</ul>
</li>
<li>Intel Architecture Software Developer's Manual, Volume 2: Instruction
Set Reference</li>
<li>Intel Architecture Software Developer's Manual, Volume 3: System
Programming Guide
<ul>
<li>§7.1: Locked Atomic Operations</li>
<li>§7.4: The APIC</li>
<li>§8.7: Software Initialization for Protected Mode</li>
<li>§8.8: Mode Switching</li>
<li>§B: MP Bootup Sequence</li>
</ul>
</li>
</ul>

<h3>
<a name="revisions">Revision History</a>
</h3>

<pre>
07 Nov 1998 - initial version, most sections filled out.
</pre>

<br clear=all>
<hr>
<i>Bringing SMP to Your UP Operating System</i> is Copyright © 1998 by
Sidney Cammeresi in its entirety. All rights reserved.<p>
Permission is granted to make verbatim copies of this tutorial for
non-commercial use provided this notice remains intact on all copies.

</td></tr>
</table>

<pre>$Id: smp.html 1.3 Thu, 01 Jul 1999 10:51:51 -0500 sac $</pre>

</body>

</html>
BIN
study/sabre/os/files/Misc/Implementing Lightweight Threads.pdf
Normal file
BIN
study/sabre/os/files/Misc/Implementing Lightweight Threads.pdf
Normal file
Binary file not shown.
BIN
study/sabre/os/files/Misc/Thread_Segment_Stacks.pdf
Normal file
BIN
study/sabre/os/files/Misc/Thread_Segment_Stacks.pdf
Normal file
Binary file not shown.
BIN
study/sabre/os/files/Misc/bios-asm.zip
Normal file
BIN
study/sabre/os/files/Misc/bios-asm.zip
Normal file
Binary file not shown.
7
study/sabre/os/files/Misc/index.html
Normal file
7
study/sabre/os/files/Misc/index.html
Normal file
@@ -0,0 +1,7 @@
<html>
<head>
<meta http-equiv="refresh" content="0;url=/Linux.old/sabre/os/articles">
</head>
<body lang="zh-CN">
</body>
</html>
280
study/sabre/os/files/Misc/os-faq/index.html
Normal file
280
study/sabre/os/files/Misc/os-faq/index.html
Normal file
@@ -0,0 +1,280 @@
<HTML>
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
<TITLE>Write Your Own Operating System</TITLE>
</HEAD>

<BODY>
<hr color="#0000FF" noshade>

<p>

<div align="center">
<h1>Write Your Own Operating System [FAQ]</h1>
<h5>by <A href="mailto:dfiber@mega-tokyo.com?subject=OS-FAQ">Stuart 'Dark Fiber' George</A> &lt;dfiber@mega-tokyo.com&gt;</h5>
<h5>Last Updated : Tuesday 23rd October 2000</h5>
</div>
</P>

<P title="" align="CENTER">Download a .zip'ed version of the os-faq <A href="ftp://ftp.mega-tokyo.com/pub/operating-systems/os-faq.zip">here</A></P>

<hr color="#0000FF" noshade>

<p>
<UL>
<LI>Introductions, Overview
<UL>
<LI><A href="os-faq-intro.html#introduction">Introduction</a>
<LI><A href="os-faq-intro.html#getting_started">Getting Started</a>
<LI><A href="os-faq-intro.html#what_bits_cant_i_make_in_c">What bits can't I make in C</a>
<LI><A href="os-faq-intro.html#what_order_should_i_make_things_in">What order should I make things in</a>
</UL>

<LI>Kernel Questions
<UL>
<LI><A href="os-faq-kernel.html#load_kernel">How do I make a kernel file I can load</a>
<LI><A href="os-faq-kernel.html#32bit_files">Help! When I load my kernel my machine resets</a>
<LI><A href="os-faq-boot.html#easier_load">Is there an easier way to boot my kernel</a>
</UL>

<LI>Boot Loaders and Boot Menus
<UL>
<LI><A href="os-faq-boot.html#boot_loadmenu">Tell me about bootloaders and bootmenus</a>
<li><a href="os-faq-bsmbr.html#boot_sector">Boot Sectors</a>
<li><a href="os-faq-bsmbr.html#mbr">Master Boot Record</a>
<LI>GRUB
<UL>
<LI><A href="os-faq-grub.html#whats_grub">What is GRUB</a>
<LI><A href="os-faq-grub.html#get_grub">Where can I get GRUB</a>
<LI><A href="os-faq-grub.html#grub_aout">GRUB and DJGPP A.OUT</a>
<LI><A href="os-faq-grub.html#grub_nasm">GRUB and NASM ELF files</a>
<LI><A href="os-faq-grub.html#grub_watcom">GRUB and Watcom ELF files</a>
</UL>

<LI>LILO
<UL>
<LI><A href="os-faq-lilo.html#lilo">What is LILO</a>
<LI>Where can I get LILO
</UL>

<li>XOSL
<li>System Commander
<li>Boot Magic
</UL>
<LI>Compilers
<UL>
<LI>DJGPP
<UL>
<LI><A href="os-faq-elf.html#elf_files">Can DJGPP output ELF files</a>
</UL>
<!-- TODO -->
<LI>Watcom C/C++
<LI>Visual C/C++
</UL>

<LI>Hardware
<UL>
<LI>CPU
<UL>
<LI><A href="os-faq-v86.html#whats_v8086">What is v8086 mode?</a>
<LI><A href="os-faq-v86.html#detect_v86">How do I detect v8086 mode?</a>
<li>AMD K6
<ul>
<LI><A href="os-faq-cpu-amdk6.html#k6_writeback">AMD K6 WriteBack Optimisations</a>
</ul>

</UL>

<LI>Memory
<UL>
<LI>The A20
<UL>
<LI><A href="os-faq-memory.html#what_is_a20">What is the A20 line?</a>
<LI><A href="os-faq-memory.html#access_my_memory">Why can't I access all my memory?</a>
<LI><A href="os-faq-memory.html#enable_a20">How do I enable the A20 line?</a>
</UL>

<LI>Memory Sizing
<UL>
<LI><A href="os-faq-memory.html#determine_memory">How do I determine the amount of RAM?</a>
<LI><A href="os-faq-memory.html#determine_memory_bios">How do I determine the amount of RAM with the BIOS?</a>
<LI><A href="os-faq-memory.html#determine_memory_probe">How do I determine the amount of RAM with direct probing?</a>
</UL>
</UL>
<LI>IRQs and Exceptions, PIC, NMI, APIC, OPIC
<UL>
<LI><A href="os-faq-pics.html#irq_exception">How do I know if an IRQ or exception is firing?</a>
<LI><A href="os-faq-pics.html#what_pic">What is the PIC?</a>
<LI><A href="os-faq-pics.html#remap_pic">Can I remap the PIC?</a>
<LI><A href="os-faq-pics.html#nmi">So what's the NMI then?</a>
<LI><A href="os-faq-pics.html#apic">Tell me about APIC</a>
<LI><A href="os-faq-pics.html#opic">Tell me about OPIC</a>
</UL>

<LI>Interrupt Service Routines (ISRs)
<UL>
<LI><A href="os-faq-isr.html#isr">What's an ISR?</a>
<LI><A href="os-faq-isr.html#normal_v_isr">What's the difference between an ISR and a normal routine?</a>
<LI><A href="os-faq-isr.html#gcc_isr">So how do I do an ISR with GCC?</a>
</UL>

<LI>Video
<UL>
<LI><A href="os-faq-console.html#text_mode">How do I output text to the screen in protected mode?</a>
<LI><A href="os-faq-console.html#detect_text_screen">How do I detect if I have a colour or monochrome monitor?</a>
<LI><A href="os-faq-console.html#moving_cursor">How do I move the cursor when I print?</a>
</UL>

<LI>Plug and Play
<UL>
<LI><A href="os-faq-pnp.html#prog_pnp">Where can I find programming info on PNP?</a>
<LI><A href="os-faq-pnp.html#pnp_pmode">I heard you can do PNP calls with the BIOS in Protected Mode?</a>
</UL>

<LI>PCI
<UL>
<LI><A href="os-faq-pci.html#prog_pci">Where can I find programming info on PCI?</a>
<LI><A href="os-faq-pci.html#pci_pmode">I heard you can do PCI calls with the BIOS in Protected Mode?</a>
</UL>
</UL>
<LI>C Programming
<UL>
<LI><A href="os-faq-libc.html#no_printf">Where did my printf go?</a>
<LI><A href="os-faq-libc.html#libc">What's this LIBC thing?</a>
<LI><A href="os-faq-libc.html#existing_libc">What C libraries exist for me to use?</a>
</UL>

<LI>C++ Programming
<UL>
<LI><A href="os-faq-cpp.html#start">Doing a kernel in C++</a>
<LI><A href="os-faq-cpp.html#rtti">Aiyah! What's RTTI? (Run Time Type Info)</a>
<LI><A href="os-faq-cpp.html#disable_rtti">How do I disable RTTI in GCC?</a>
<LI><A href="os-faq-cpp.html#new_delete">Can I use NEW and DELETE in my kernel?</a>
</UL>

<LI>Linkers
<UL>
<LI><a href="os-faq-linker.html#linkers">Linker Info!</a>
<LI><a href="os-faq-linker.html#linkers_jloc">JLoc</a>
<LI><a href="os-faq-linker.html#linkers_alink">ALink</a>
<LI><a href="os-faq-linker.html#linkers_ld">LD (GNU)</a>
<LI><a href="os-faq-linker.html#linkers_tlink">TLink / TLink32 (Borland)</a>
<LI><a href="os-faq-linker.html#linkers_link">Link / NLink (Microsoft)</a>
<LI><a href="os-faq-linker.html#linkers_val">VAL</a>
<LI><a href="os-faq-linker.html#linkers_wlink">WLink (Watcom)</a>
<LI><a href="os-faq-linker.html#linkers_comp">A Comparison</a>
</UL>

<LI>Executable File Types
<UL>
<LI><A href="os-faq-exec.html#exec_files">Executable Files</a>
<LI><A href="os-faq-exec.html#exec_exe">EXE (DOS "MZ")</a>
<LI><A href="os-faq-exec.html#exec_ne">EXE (Win16 "NE")</a>
<LI><A href="os-faq-exec.html#exec_le">EXE (OS/2 "LE/LX")</a>
<LI><A href="os-faq-exec.html#exec_pe">EXE (Win32 "PE")</a>
<LI><A href="os-faq-exec.html#exec_elf">ELF</a>
<LI><A href="os-faq-exec.html#exec_coff">COFF</a>
<LI><A href="os-faq-exec.html#exec_aout">A.OUT</a>
</UL>
<LI>Filesystems
<UL>
<LI><A href="os-faq-fs.html#file_systems">Tell me about filesystems</a>
<LI><A href="os-faq-fs.html#fs_fat">FAT</a>
<LI><A href="os-faq-fs.html#fs_vfat">VFAT</a>
<LI><A href="os-faq-fs.html#fs_fat32">FAT32</a>
<LI><A href="os-faq-fs.html#fs_hpfs">HPFS (High Performance File System)</a>
<LI><A href="os-faq-fs.html#fs_ntfs">NTFS (New Technology File System)</a>
<LI><A href="os-faq-fs.html#fs_ext2fs">ext2fs (2nd extended file system)</a>
<LI><A href="os-faq-fs.html#fs_befs">BeFS</a>
<LI><A href="os-faq-fs.html#fs_ffs_amiga">FFS (Amiga)</a>
<LI><A href="os-faq-fs.html#fs_ffs_bsd">FFS (BSD)</a>
<LI><A href="os-faq-fs.html#fs_nfs">NFS</a>
<LI><A href="os-faq-fs.html#fs_afs">AFS</a>
<LI><a href="os-faq-fs.html#fs_rfs">RFS</a>
<LI><A href="os-faq-fs.html#fs_xfs">XFS</a>
</UL>

<LI>Resources
<UL>
<LI>Books
<ul>
<LI><A href="os-faq-books.html#books">Reference Books</a>
<LI><A href="os-faq-books.html#book_0">The Indispensable PC Hardware Book</A>
<LI><A href="os-faq-books.html#book_1">Operating System Concepts</A>
<LI><A href="os-faq-books.html#book_2">Operating Systems: Design and Implementation</A>
<LI><A href="os-faq-books.html#book_3">Operating Systems: Internals and Design Principles</A>
<LI><a href="os-faq-books.html#book_4">Distributed Operating Systems</a>
<LI><a href="os-faq-books.html#book_5">Inside Windows NT, Second Edition</a>
<LI><a href="os-faq-books.html#book_6">Lions' Commentary on UNIX 6th Edition, with Source Code</a>
<LI><a href="os-faq-books.html#book_7">UNIX Internals: The New Frontiers</a>
</ul>
<LI><A href="os-faq-links.html#small_free_kernels">Some small kernels with source</a>
<LI><A href="os-faq-acronyms.html#acronyms">Chip Numbers, Acronyms and Things</A>
</UL>

<LI>Third Party Tools
<UL>
<LI><a href="os-faq-3rd.html#vmware">VMware PC Emulator</a>
<LI><a href="os-faq-3rd.html#bochs">Bochs (i386) PC emulator</a>
<LI><a href="os-faq-3rd.html#mtools">MTools (DOS disk image tools)</a>
<LI><a href="os-faq-3rd.html#simics">SimICS (SunSPARC Simulator)</a>
</UL>

<LI>Contributors
<UL>
<LI><A href="os-faq-contributors.html#contributors">Who helped with the FAQ</a>
</UL>

<LI>Todo
<UL>
<LI><A href="os-faq-todo.html#todo">The TODO list</a>
</UL>

</UL>
<!-- *************** DHTML Outline (end) ***************** -->

<hr color="#0000FF" noshade>

<p>
What's New!
<ul>
<li>Trying to add more material on various C/C++ compilers
<li>Added VMware to the tools list
<li>Removed the link to the free Intel Developer CDs (offer is no longer valid)
<li>More info on some boot managers (XOSL, System Commander, Boot Magic, etc.)
<li>Fixed those nasty link colours
</ul>

<p>

<hr color="#0000FF" noshade>

<P>
<TABLE border="0" cellspacing="1" cellpadding="10" align="CENTER">
<CAPTION>The OS-FAQ is a member of the OS Web Ring
</CAPTION>
<TR>
<TD><A href="http://www.webring.org/cgi-bin/webring?ring=os&id=31&next" target="_top">Next</A>
</TD>
<TD><A href="http://www.webring.org/cgi-bin/webring?ring=os&id=31&skip" target="_top">Skip Next</A>
</TD>
<TD><A href="http://www.webring.org/cgi-bin/webring?ring=os&id=31&next5" target="_top">Next 5</A><BR>
</TD>
<TD><A href="http://www.webring.org/cgi-bin/webring?ring=os&id=31&list" target="_top">List Sites</A>
</TD>
</TR>
</TABLE></P>

</BODY>
</HTML>
BIN
study/sabre/os/files/Misc/os-faq/os-faq.zip
Normal file
BIN
study/sabre/os/files/Misc/os-faq/os-faq.zip
Normal file
Binary file not shown.
BIN
study/sabre/os/files/Misc/vade.mecum.2.pdf
Normal file
BIN
study/sabre/os/files/Misc/vade.mecum.2.pdf
Normal file
Binary file not shown.