<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link type="text/css" rel="stylesheet" href="style.css"><!-- Generated by The Open Group's rhtm tool v1.2.1 -->
<!-- Copyright (c) 2001 The Open Group, All Rights Reserved -->
<title>Rationale</title>
</head>
<body>
<basefont size="3">

<center><font size="2">The Open Group Base Specifications Issue 6<br>
IEEE Std 1003.1-2001<br>
Copyright © 2001 The IEEE and The Open Group</font></center>

<hr size="2" noshade>
|
|
<h3><a name="tag_03_02"></a>General Information</h3>
|
|
|
|
<h4><a name="tag_03_02_01"></a>Use and Implementation of Functions</h4>
|
|
|
|
<p>The information concerning the use of functions was adapted from a description in the ISO C standard. Here is an example of
|
|
how an application program can protect itself from functions that may or may not be macros, rather than true functions:</p>
|
|
|
|
<p>The <a href="../functions/atoi.html"><i>atoi</i>()</a> function may be used in any of several ways:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>By use of its associated header (possibly generating a macro expansion):</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>#include <stdlib.h>
|
|
/* ... */
|
|
i = atoi(str);
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
</li>
|
|
|
|
<li>
|
|
<p>By use of its associated header (assuredly generating a true function call):</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>#include <stdlib.h>
|
|
#undef atoi
|
|
/* ... */
|
|
i = atoi(str);
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>or:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>#include <stdlib.h>
|
|
/* ... */
|
|
i = (atoi) (str);
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
</li>
|
|
|
|
<li>
|
|
<p>By explicit declaration:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>extern int atoi (const char *);
|
|
/* ... */
|
|
i = atoi(str);
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
</li>
|
|
|
|
<li>
|
|
<p>By implicit declaration:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>/* ... */
|
|
i = atoi(str);
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>(Assuming no function prototype is in scope. This is not allowed by the ISO C standard for functions with variable
|
|
arguments; furthermore, parameter type conversion "widening" is subject to different rules in this case.)</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>Note that the ISO C standard reserves names starting with <tt>'_'</tt> for the compiler. Therefore, the compiler could, for
|
|
example, implement an intrinsic, built-in function <i>_asm_builtin_atoi</i>(), which it recognized and expanded into inline
|
|
assembly code. Then, in <a href="../basedefs/stdlib.h.html"><i><stdlib.h></i></a>, there could be the following:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>#define atoi(X) _asm_builtin_atoi(X)
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>The user's "normal" call to <a href="../functions/atoi.html"><i>atoi</i>()</a> would then be expanded inline, but the
|
|
implementor would also be required to provide a callable function named <a href="../functions/atoi.html"><i>atoi</i>()</a> for use
|
|
when the application requires it; for example, if its address is to be stored in a function pointer variable.</p>
|
|
|
|
<h4><a name="tag_03_02_02"></a>The Compilation Environment</h4>
|
|
|
|
<h5><a name="tag_03_02_02_01"></a>POSIX.1 Symbols</h5>
|
|
|
|
<p>This and the following section address the issue of "name space pollution". The ISO C standard requires that the name
|
|
space beyond what it reserves not be altered except by explicit action of the application writer. This section defines the actions
|
|
to add the POSIX.1 symbols for those headers where both the ISO C standard and POSIX.1 need to define symbols, and also where
|
|
the XSI extension extends the base standard.</p>
|
|
|
|
<p>When headers are used to provide symbols, there is a potential for introducing symbols that the application writer cannot
|
|
predict. Ideally, each header should only contain one set of symbols, but this is not practical for historical reasons. Thus, the
|
|
concept of feature test macros is included. Two feature test macros are explicitly defined by IEEE Std 1003.1-2001; it is
|
|
expected that future revisions may add to this. <basefont size="2"></p>
|
|
|
|
<dl>
|
|
<dt><b>Note:</b></dt>
|
|
|
|
<dd>Feature test macros allow an application to announce to the implementation its desire to have certain symbols and prototypes
|
|
exposed. They should not be confused with the version test macros and constants for options in <a href=
|
|
"../basedefs/unistd.h.html"><i><unistd.h></i></a> which are the implementation's way of announcing functionality to the
|
|
application.</dd>
|
|
</dl>
|
|
|
|
<basefont size="3">
|
|
|
|
<p>It is further intended that these feature test macros apply only to the headers specified by IEEE Std 1003.1-2001.
|
|
Implementations are expressly permitted to make visible symbols not specified by IEEE Std 1003.1-2001, within both
|
|
POSIX.1 and other headers, under the control of feature test macros that are not defined by IEEE Std 1003.1-2001.</p>
|
|
|
|
<h5><a name="tag_03_02_02_02"></a>The _POSIX_C_SOURCE Feature Test Macro</h5>
|
|
|
|
<p>Since _POSIX_SOURCE specified by the POSIX.1-1990 standard did not have a value associated with it, the _POSIX_C_SOURCE macro
|
|
replaces it, allowing an application to inform the system of the revision of the standard to which it conforms. This symbol will
|
|
allow implementations to support various revisions of IEEE Std 1003.1-2001 simultaneously. For instance, when either
|
|
_POSIX_SOURCE is defined or _POSIX_C_SOURCE is defined as 1, the system should make visible the same name space as permitted and
|
|
required by the POSIX.1-1990 standard. When _POSIX_C_SOURCE is defined, the state of _POSIX_SOURCE is completely irrelevant.</p>
|
|
|
|
<p>It is expected that C bindings to future POSIX standards will define new values for _POSIX_C_SOURCE, with each new value
|
|
reserving the name space for that new standard, plus all earlier POSIX standards.</p>
|
|
|
|
<h5><a name="tag_03_02_02_03"></a>The _XOPEN_SOURCE Feature Test Macro</h5>
|
|
|
|
<p>The feature test macro _XOPEN_SOURCE is provided as the announcement mechanism for the application that it requires
|
|
functionality from the Single UNIX Specification. _XOPEN_SOURCE must be defined to the value 600 before the inclusion of any header
|
|
to enable the functionality in the Single UNIX Specification. Its definition subsumes the use of _POSIX_SOURCE and
|
|
_POSIX_C_SOURCE.</p>
|
|
|
|
<p>An extract of code from a conforming application, that appears before any <b>#include</b> statements, is given below:</p>
|
|
|
|
<pre>
|
|
<tt>#define _XOPEN_SOURCE 600 /* Single UNIX Specification, Version 3 */
|
|
<br>
|
|
#include ...
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>Note that the definition of _XOPEN_SOURCE with the value 600 makes the definition of _POSIX_C_SOURCE redundant and it can safely
|
|
be omitted.</p>
|
|
|
|
<h5><a name="tag_03_02_02_04"></a>The Name Space</h5>
|
|
|
|
<p>The reservation of identifiers is paraphrased from the ISO C standard. The text is included because it needs to be part of
|
|
IEEE Std 1003.1-2001, regardless of possible changes in future versions of the ISO C standard.</p>
|
|
|
|
<p>These identifiers may be used by implementations, particularly for feature test macros. Implementations should not use feature
|
|
test macro names that might be reasonably used by a standard.</p>
|
|
|
|
<p>Including headers more than once is a reasonably common practice, and it should be carried forward from the ISO C standard.
|
|
More significantly, having definitions in more than one header is explicitly permitted. Where the potential declaration is
|
|
"benign" (the same definition twice) the declaration can be repeated, if that is permitted by the compiler. (This is usually true
|
|
of macros, for example.) In those situations where a repetition is not benign (for example, <b>typedef</b>s), conditional
|
|
compilation must be used. The situation actually occurs both within the ISO C standard and within POSIX.1: <b>time_t</b>
|
|
should be in <a href="../basedefs/sys/types.h.html"><i><sys/types.h></i></a>, and the ISO C standard mandates that it be
|
|
in <a href="../basedefs/time.h.html"><i><time.h></i></a>.</p>
|
|
|
|
<p>The area of name space pollution <i>versus</i> additions to structures is difficult because of the macro structure of C. The
|
|
following discussion summarizes all the various problems with and objections to the issue.</p>
|
|
|
|
<p>Note the phrase "user-defined macro". Users are not permitted to define macro names (or any other name) beginning with
|
|
<tt>"_[A-Z_]"</tt> . Thus, the conflict cannot occur for symbols reserved to the vendor's name space, and the permission to add
|
|
fields automatically applies, without qualification, to those symbols.</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>Data structures (and unions) need to be defined in headers by implementations to meet certain requirements of POSIX.1 and the
|
|
ISO C standard.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The structures defined by POSIX.1 are typically minimal, and any practical implementation would wish to add fields to these
|
|
structures either to hold additional related information or for backwards-compatibility (or both). Future standards (and <i>de
|
|
facto</i> standards) would also wish to add to these structures. Issues of field alignment make it impractical (at least in the
|
|
general case) to simply omit fields when they are not defined by the particular standard involved.</p>
|
|
|
|
<p>The <b>dirent</b> structure is an example of such a minimal structure (although one could argue about whether the other fields
|
|
need visible names). The <i>st_rdev</i> field of most implementations' <b>stat</b> structure is a common example where extension is
|
|
needed and where a conflict could occur.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Fields in structures are in an independent name space, so the addition of such fields presents no problem to the C language
|
|
itself in that such names cannot interact with identically named user symbols because access is qualified by the specific structure
|
|
name.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>There is an exception to this: macro processing is done at a lexical level. Thus, symbols added to a structure might be
|
|
recognized as user-provided macro names at the location where the structure is declared. This only can occur if the user-provided
|
|
name is declared as a macro before the header declaring the structure is included. The user's use of the name after the declaration
|
|
cannot interfere with the structure because the symbol is hidden and only accessible through access to the structure. Presumably,
|
|
the user would not declare such a macro if there was an intention to use that field name.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Macros from the same or a related header might use the additional fields in the structure, and those field names might also
|
|
collide with user macros. Although this is a less frequent occurrence, since macros are expanded at the point of use, no constraint
|
|
on the order of use of names can apply.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>An "obvious" solution of using names in the reserved name space and then redefining them as macros when they should be visible
|
|
does not work because this has the effect of exporting the symbol into the general name space. For example, given a (hypothetical)
|
|
system-provided header <i><h.h></i>, and two parts of a C program in <b>a.c</b> and <b>b.c</b>, in header
|
|
<i><h.h></i>:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>struct foo {
    int __i;
};
<br>
#ifdef _FEATURE_TEST
#define i __i
#endif
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>In file <b>a.c</b>:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>#include <h.h>
extern int i;
...
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>In file <b>b.c</b>:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>extern int i;
|
|
...
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>The symbol that the user thinks of as <i>i</i> in both files has an external name of <i>__i</i> in <b>a.c</b>; the same symbol
|
|
<i>i</i> in <b>b.c</b> has an external name <i>i</i> (ignoring any hidden manipulations the compiler might perform on the names).
|
|
This would cause a mysterious name resolution problem when <b>a.o</b> and <b>b.o</b> are linked.</p>
|
|
|
|
<p>Simply avoiding definition then causes alignment problems in the structure.</p>
|
|
|
|
<p>A structure of the form:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>struct foo {
    union {
        int __i;
#ifdef _FEATURE_TEST
        int i;
#endif
    } __ii;
};
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>does not work because the name of the logical field <i>i</i> is <i>__ii.i</i>, and introduction of a macro to restore the
|
|
logical name immediately reintroduces the problem discussed previously (although its manifestation might be more immediate because
|
|
a syntax error would result if a recursive macro did not cause it to fail first).</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>A more workable solution would be to declare the structure:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>struct foo {
#ifdef _FEATURE_TEST
    int i;
#else
    int __i;
#endif
};
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>However, if a macro (particularly one required by a standard) is to be defined that uses this field, two must be defined: one
|
|
that uses <i>i</i>, the other that uses <i>__i</i>. If more than one additional field is used in a macro and they are conditional
|
|
on distinct combinations of features, the complexity goes up as 2<small><sup><i>n</i></sup></small>.</p>
|
|
</li>
|
|
</ol>
|
|
|
|
<p>All this leaves a difficult situation: vendors must provide very complex headers to deal with what is conceptually simple and
safe: adding a field to a structure. It is the possibility of user-provided macros with the same name that makes this difficult.</p>
|
|
|
|
<p>Several alternatives were proposed that involved constraining the user's access to part of the name space available to the user
|
|
(as specified by the ISO C standard). In some cases, this was only until all the headers had been included. There were two
|
|
proposals discussed that failed to achieve consensus:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>Limiting it for the whole program.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Restricting the use of identifiers containing only uppercase letters until after all system headers had been included. It was
|
|
also pointed out that because macros might wish to access fields of a structure (and macro expansion occurs totally at point of
|
|
use) restricting names in this way would not protect the macro expansion, and thus the solution was inadequate.</p>
|
|
</li>
|
|
</ol>
|
|
|
|
<p>It was finally decided that reservation of symbols would occur, but as constrained.</p>
|
|
|
|
<p>The current wording also allows the addition of fields to a structure, but requires that user macros of the same name not
|
|
interfere. This allows vendors to do one of the following:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Not create the situation (do not extend the structures with user-accessible names or use the solution in (7) above)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Extend their compilers to allow some way of adding names to structures and macros safely</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>There are at least two ways that the compiler might be extended: add new preprocessor directives that turn off and on macro
|
|
expansion for certain symbols (without changing the value of the macro) and a function or lexical operation that suppresses
|
|
expansion of a word. The latter seems more flexible, particularly because it addresses the problem in macros as well as in
|
|
declarations.</p>
|
|
|
|
<p>The following seems to be a possible implementation extension to the C language that will do this: any token that during macro
|
|
expansion is found to be preceded by three <tt>'#'</tt> symbols shall not be further expanded in exactly the same way as described
|
|
for macros that expand to their own name as in Section 3.8.3.4 of the ISO C standard. A vendor may also wish to implement this
|
|
as an operation that is lexically a function, which might be implemented as:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>#define __safe_name(x) ###x
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>Using a function notation would insulate vendors from changes in standards until such a functionality is standardized (if ever).
|
|
Standardization of such a function would be valuable because it would then permit third parties to take advantage of it portably in
|
|
software they may supply.</p>
|
|
|
|
<p>The symbols that are "explicitly permitted, but not required by IEEE Std 1003.1-2001" include those classified
|
|
below. (That is, the symbols classified below might, but are not required to, be present when _POSIX_C_SOURCE is defined to have
|
|
the value 200112L.)</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Symbols in <a href="../basedefs/limits.h.html"><i><limits.h></i></a> and <a href=
|
|
"../basedefs/unistd.h.html"><i><unistd.h></i></a> that are defined to indicate support for options or limits that are
|
|
constant at compile-time</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Symbols in the name space reserved for the implementation by the ISO C standard</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Symbols in a name space reserved for a particular type of extension (for example, type names ending with <b>_t</b> in <a href=
|
|
"../basedefs/sys/types.h.html"><i><sys/types.h></i></a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Additional members of structures or unions whose names do not reduce the name space reserved for applications</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>Since both implementations and future revisions of IEEE Std 1003.1 and other POSIX standards may use symbols in the
|
|
reserved spaces described in these tables, there is a potential for name space clashes. To avoid future name space clashes when
|
|
adding symbols, implementations should not use the posix_, POSIX_, or _POSIX_ prefixes.</p>
|
|
|
|
<h4><a name="tag_03_02_03"></a>Error Numbers</h4>
|
|
|
|
<p>It was the consensus of the standard developers that to allow the conformance document to state that an error occurs and under
|
|
what conditions, but to disallow a statement that it never occurs, does not make sense. It could be implied by the current wording
|
|
that this is allowed, but to reduce the possibility of future interpretation requests, it is better to make an explicit
|
|
statement.</p>
|
|
|
|
<p>The ISO C standard requires that <i>errno</i> be an assignable lvalue. Originally, the definition in POSIX.1 was stricter
|
|
than that in the ISO C standard, <b>extern int</b> <i>errno</i>, in order to support historical usage. In a multi-threaded
|
|
environment, implementing <i>errno</i> as a global variable results in non-deterministic results when accessed. It is required,
|
|
however, that <i>errno</i> work as a per-thread error reporting mechanism. In order to do this, a separate <i>errno</i> value has
|
|
to be maintained for each thread. The following section discusses the various alternative solutions that were considered.</p>
|
|
|
|
<p>In order to avoid this problem altogether for new functions, these functions avoid using <i>errno</i> and, instead, return the
|
|
error number directly as the function return value; a return value of zero indicates that no error was detected.</p>
|
|
|
|
<p>For any function that can return errors, the function return value is not used for any purpose other than for reporting errors.
|
|
Even when the output of the function is scalar, it is passed through a function argument. While it might have been possible to
|
|
allow some scalar outputs to be coded as negative function return values and mixed in with positive error status returns, this was
rejected: using the return value for a mixed purpose was judged to be of limited use and error prone.</p>
|
|
|
|
<p>Checking the value of <i>errno</i> alone is not sufficient to determine the existence or type of an error, since it is not
|
|
required that a successful function call clear <i>errno</i>. The variable <i>errno</i> should only be examined when the return
|
|
value of a function indicates that the value of <i>errno</i> is meaningful. In that case, the function is required to set the
|
|
variable to something other than zero.</p>
|
|
|
|
<p>The variable <i>errno</i> is never set to zero by any function call; to do so would contradict the ISO C standard.</p>
|
|
|
|
<p>POSIX.1 requires (in the ERRORS sections of function descriptions) certain error values to be set in certain conditions because
|
|
many existing applications depend on them. Some error numbers, such as [EFAULT], are entirely implementation-defined and are noted
|
|
as such in their description in the ERRORS section. This section otherwise allows wide latitude to the implementation in handling
|
|
error reporting.</p>
|
|
|
|
<p>Some of the ERRORS sections in IEEE Std 1003.1-2001 have two subsections. The first:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
"The function shall fail if:"
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>could be called the "mandatory" section.</p>
|
|
|
|
<p>The second:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
"The function may fail if:"
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>could be informally known as the "optional" section.</p>
|
|
|
|
<p>Attempting to infer the quality of an implementation based on whether it detects optional error conditions is not useful.</p>
|
|
|
|
<p>Following each one-word symbolic name for an error, there is a description of the error. The rationale for some of the symbolic
|
|
names follows:</p>
|
|
|
|
<dl compact>
|
|
<dt>[ECANCELED]</dt>
|
|
|
|
<dd>This spelling was chosen as being more common.</dd>
|
|
|
|
<dt>[EFAULT]</dt>
|
|
|
|
<dd>Most historical implementations do not catch an error and set <i>errno</i> when an invalid address is given to the functions <a
|
|
href="../functions/wait.html"><i>wait</i>()</a>, <a href="../functions/time.html"><i>time</i>()</a>, or <a href=
|
|
"../functions/times.html"><i>times</i>()</a>. Some implementations cannot reliably detect an invalid address. And most systems that
|
|
detect invalid addresses will do so only for a system call, not for a library routine.</dd>
|
|
|
|
<dt>[EFTYPE]</dt>
|
|
|
|
<dd>This error code was proposed in earlier proposals as "Inappropriate operation for file type", meaning that the operation
|
|
requested is not appropriate for the file specified in the function call. This code was proposed, although the same idea was
|
|
covered by [ENOTTY], because the connotations of the name would be misleading. It was pointed out that the <a href=
|
|
"../functions/fcntl.html"><i>fcntl</i>()</a> function uses the error code [EINVAL] for this notion, and hence all instances of
|
|
[EFTYPE] were changed to this code.</dd>
|
|
|
|
<dt>[EINTR]</dt>
|
|
|
|
<dd>POSIX.1 prohibits conforming implementations from restarting interrupted system calls of conforming applications unless the
|
|
SA_RESTART flag is in effect for the signal. However, it does not require that [EINTR] be returned when another legitimate value
|
|
may be substituted; for example, a partial transfer count when <a href="../functions/read.html"><i>read</i>()</a> or <a href=
|
|
"../functions/write.html"><i>write</i>()</a> are interrupted. This is only given when the signal-catching function returns normally
|
|
as opposed to returns by mechanisms like <a href="../functions/longjmp.html"><i>longjmp</i>()</a> or <a href=
|
|
"../functions/siglongjmp.html"><i>siglongjmp</i>()</a>.</dd>
|
|
|
|
<dt>[ELOOP]</dt>
|
|
|
|
<dd>In specifying conditions under which implementations would generate this error, the following goals were considered:
|
|
|
|
<ul>
|
|
<li>
|
|
<p>To ensure that actual loops are detected, including loops that result from symbolic links across distributed file systems.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>To ensure that during pathname resolution an application can rely on the ability to follow at least {SYMLOOP_MAX} symbolic links
|
|
in the absence of a loop.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>To allow implementations to provide the capability of traversing more than {SYMLOOP_MAX} symbolic links in the absence of a
|
|
loop.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>To allow implementations to detect loops and generate the error prior to encountering {SYMLOOP_MAX} symbolic links.</p>
|
|
</li>
|
|
</ul>
|
|
</dd>
|
|
|
|
<dt>[ENAMETOOLONG]</dt>
|
|
|
|
<dd>
|
|
When a symbolic link is encountered during pathname resolution, the contents of that symbolic link are used to create a new
|
|
pathname. The standard developers intended to allow, but not require, that implementations enforce the restriction of {PATH_MAX} on
|
|
the result of this pathname substitution.</dd>
|
|
|
|
<dt>[ENOMEM]</dt>
|
|
|
|
<dd>The term "main memory" is not used in POSIX.1 because it is implementation-defined.</dd>
|
|
|
|
<dt>[ENOTSUP]</dt>
|
|
|
|
<dd>This error code is to be used when an implementation chooses to implement the required functionality of
|
|
IEEE Std 1003.1-2001 but does not support optional facilities defined by IEEE Std 1003.1-2001. The return of
|
|
[ENOSYS] is to be taken to indicate that the function of the interface is not supported at all; the function will always fail with
|
|
this error code.</dd>
|
|
|
|
<dt>[ENOTTY]</dt>
|
|
|
|
<dd>The symbolic name for this error is derived from a time when device control was done by <a href=
|
|
"../functions/ioctl.html"><i>ioctl</i>()</a> and that operation was only permitted on a terminal interface. The term "TTY" is
|
|
derived from "teletypewriter", the devices to which this error originally applied.</dd>
|
|
|
|
<dt>[EOVERFLOW]</dt>
|
|
|
|
<dd>Most of the uses of this error code are related to large file support. Typically, these cases occur on systems which support
|
|
multiple programming environments with different sizes for <b>off_t</b>, but they may also occur in connection with remote file
|
|
systems.
|
|
|
|
<p>In addition, when different programming environments have different widths for types such as <b>int</b> and <b>uid_t</b>,
|
|
several functions may encounter a condition where a value in a particular environment is too wide to be represented. In that case,
|
|
this error should be raised. For example, suppose the currently running process has 64-bit <b>int</b>, and file descriptor
|
|
9223372036854775807 is open and does not have the close-on-<i>exec</i> flag set. If the process then uses <a href=
|
|
"../functions/execl.html"><i>execl</i>()</a> to <i>exec</i> a file compiled in a programming environment with 32-bit <b>int</b>,
|
|
the call to <a href="../functions/execl.html"><i>execl</i>()</a> can fail with <i>errno</i> set to [EOVERFLOW]. A similar failure
|
|
can occur with <a href="../functions/execl.html"><i>execl</i>()</a> if any of the user IDs or any of the group IDs to be assigned
|
|
to the new process image are out of range for the executed file's programming environment.</p>
|
|
|
|
<p>Note, however, that this condition cannot occur for functions that are explicitly described as always being successful, such as
|
|
<a href="../functions/getpid.html"><i>getpid</i>()</a>.</p>
|
|
</dd>
|
|
|
|
<dt>[EPIPE]</dt>
|
|
|
|
<dd>This condition normally generates the signal SIGPIPE; the error is returned if the signal does not terminate the process.</dd>
|
|
|
|
<dt>[EROFS]</dt>
|
|
|
|
<dd>In historical implementations, attempting to <a href="../functions/unlink.html"><i>unlink</i>()</a> or <a href=
|
|
"../functions/rmdir.html"><i>rmdir</i>()</a> a mount point would generate an [EBUSY] error. An implementation could be envisioned
|
|
where such an operation could be performed without error. In this case, if <i>either</i> the directory entry or the actual data
|
|
structures reside on a read-only file system, [EROFS] is the appropriate error to generate. (For example, changing the link count
|
|
of a file on a read-only file system could not be done, as is required by <a href="../functions/unlink.html"><i>unlink</i>()</a>,
|
|
and thus an error should be reported.)</dd>
|
|
</dl>
|
|
|
|
<p>Three error numbers, [EDOM], [EILSEQ], and [ERANGE], were added to this section primarily for consistency with the ISO C
|
|
standard.</p>
|
|
|
|
<h5><a name="tag_03_02_03_01"></a>Alternative Solutions for Per-Thread errno</h5>
|
|
|
|
<p>The usual implementation of <i>errno</i> as a single global variable does not work in a multi-threaded environment. In such an
|
|
environment, a thread may make a POSIX.1 call and get a -1 error return, but before that thread can check the value of
|
|
<i>errno</i>, another thread might have made a second POSIX.1 call that also set <i>errno</i>. This behavior is unacceptable in
|
|
robust programs. There were a number of alternatives that were considered for handling the <i>errno</i> problem:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Implement <i>errno</i> as a per-thread integer variable.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Implement <i>errno</i> as a service that can access the per-thread error number.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Change all POSIX.1 calls to accept an extra status argument and avoid setting <i>errno</i>.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Change all POSIX.1 calls to raise a language exception.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>The first option offers the highest level of compatibility with existing practice but requires special support in the linker,
|
|
compiler, and/or virtual memory system to support the new concept of thread private variables. When compared with current practice,
|
|
the third and fourth options are much cleaner, more efficient, and encourage a more robust programming style, but they require new
|
|
versions of all of the POSIX.1 functions that might detect an error. The second option offers compatibility with existing code that
|
|
uses the <a href="../basedefs/errno.h.html"><i><errno.h></i></a> header to define the symbol <i>errno</i>. In this option,
|
|
<i>errno</i> may be a macro defined:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>#define errno (*__errno())
|
|
extern int *__errno();
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>This option may be implemented as a per-thread variable whereby an <i>errno</i> field is allocated in the user space object
|
|
representing a thread, and whereby the function <i>__errno</i>() makes a system call to determine the location of its user space
|
|
object and returns the address of the <i>errno</i> field of that object. Another implementation, one that avoids calling the
|
|
kernel, involves allocating stacks in chunks. The stack allocator keeps a side table indexed by chunk number containing a pointer
|
|
to the thread object that uses that chunk. The <i>__errno</i>() function then looks at the stack pointer, determines the chunk
|
|
number, and uses that as an index into the chunk table to find its thread object and thus its private value of <i>errno</i>. On
|
|
most architectures, this can be done in four to five instructions. Some compilers may wish to implement <i>__errno</i>() inline to
|
|
improve performance.</p>
|
|
|
|
<h5><a name="tag_03_02_03_02"></a>Disallowing Return of the [EINTR] Error Code</h5>

<p>Many blocking interfaces defined by IEEE Std 1003.1-2001 may return [EINTR] if interrupted during their execution by a
signal handler. Blocking interfaces introduced under the Threads option do not have this property. Instead, they require that the
interface appear to be atomic with respect to interruption. In particular, clients of blocking interfaces need not handle any
possible [EINTR] return as a special case since it will never occur. If it is necessary to restart operations or complete
incomplete operations following the execution of a signal handler, this is handled by the implementation, rather than by the
application.</p>

<p>Requiring applications to handle [EINTR] errors on blocking interfaces has been shown to be a frequent source of often
unreproducible bugs, and it adds no compelling value to the available functionality. Thus, blocking interfaces introduced for use
by multi-threaded programs do not use this paradigm. In particular, in none of the functions <a href=
"../functions/flockfile.html"><i>flockfile</i>()</a>, <a href=
"../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a>, <a href=
"../functions/pthread_cond_wait.html"><i>pthread_cond_wait</i>()</a>, <a href=
"../functions/pthread_join.html"><i>pthread_join</i>()</a>, <a href=
"../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a>, and <a href=
"../functions/sigwait.html"><i>sigwait</i>()</a> did providing [EINTR] returns add value, or even particularly make sense. Thus,
these functions do not provide for an [EINTR] return, even when interrupted by a signal handler. The same arguments can be applied
to <a href="../functions/sem_wait.html"><i>sem_wait</i>()</a>, <a href="../functions/sem_trywait.html"><i>sem_trywait</i>()</a>, <a
href="../functions/sigwaitinfo.html"><i>sigwaitinfo</i>()</a>, and <a href=
"../functions/sigtimedwait.html"><i>sigtimedwait</i>()</a>, but implementations are permitted to return [EINTR] error codes for
these functions for compatibility with earlier versions of IEEE Std 1003.1. Applications cannot rely on calls to these
functions returning [EINTR] error codes when signals are delivered to the calling thread, but they should allow for the
possibility.</p>
|
|
|
|
<h5><a name="tag_03_02_03_03"></a>Additional Error Numbers</h5>

<p>The ISO C standard defines the name space for implementations to add additional error numbers.</p>

<h4><a name="tag_03_02_04"></a>Signal Concepts</h4>

<p>Historical implementations of signals, using the <a href="../functions/signal.html"><i>signal</i>()</a> function, have
shortcomings that make them unreliable for many application uses. Because of this, a new signal mechanism, based very closely on
the one of 4.2 BSD and 4.3 BSD, was added to POSIX.1.</p>
|
|
|
|
<h5><a name="tag_03_02_04_01"></a>Signal Names</h5>

<p>The restriction on the actual type used for <b>sigset_t</b> is intended to guarantee that these objects can always be assigned,
have their address taken, and be passed as parameters by value. It is not intended that this type be a structure including pointers
to other data structures, as that could impact the portability of applications performing such operations. A reasonable
implementation could be a structure containing an array of some integer type.</p>
|
|
|
|
<p>The signals described in IEEE Std 1003.1-2001 must have unique values so that they may be named as parameters of
<b>case</b> statements in the body of a C-language <b>switch</b> clause. However, implementation-defined signals may have values
that overlap with each other or with signals specified in IEEE Std 1003.1-2001. An example of this is SIGABRT, which
traditionally overlaps some other signal, such as SIGIOT.</p>

<p>SIGKILL, SIGTERM, SIGUSR1, and SIGUSR2 are ordinarily generated only through the explicit use of the <a href=
"../functions/kill.html"><i>kill</i>()</a> function, although some implementations generate SIGKILL under extraordinary
circumstances. SIGTERM is traditionally the default signal sent by the <a href="../utilities/kill.html"><i>kill</i></a>
command.</p>

<p>The signals SIGBUS, SIGEMT, SIGIOT, SIGTRAP, and SIGSYS were omitted from POSIX.1 because their behavior is
implementation-defined and could not be adequately categorized. Conforming implementations may deliver these signals, but must
document the circumstances under which they are delivered and note any restrictions concerning their delivery. The signals SIGFPE,
SIGILL, and SIGSEGV are similar in that they also generally result only from programming errors. They were included in POSIX.1
because they do indicate three relatively well-categorized conditions. They are all defined by the ISO C standard and thus
would have to be defined by any system with an ISO C standard binding, even if not explicitly included in POSIX.1.</p>

<p>There is very little that a Conforming POSIX.1 Application can do by catching, ignoring, or masking any of the signals SIGILL,
SIGTRAP, SIGIOT, SIGEMT, SIGBUS, SIGSEGV, SIGSYS, or SIGFPE. They will generally be generated by the system only in cases of
programming errors. While it may be desirable for some robust code (for example, a library routine) to be able to detect and
recover from programming errors in other code, these signals are not nearly sufficient for that purpose. One portable use that does
exist for these signals is that a command interpreter can recognize them as the cause of a process' termination (with <a href=
"../functions/wait.html"><i>wait</i>()</a>) and print an appropriate message. The mnemonic tags for these signals are derived from
their PDP-11 origin.</p>
|
|
|
|
<p>The signals SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU, and SIGCONT are provided for job control and are unchanged from 4.2 BSD. The
signal SIGCHLD is also typically used by job control shells to detect children that have terminated or, as in 4.2 BSD, stopped.</p>

<p>Some implementations, including System V, have a signal named SIGCLD, which is similar to SIGCHLD in 4.2 BSD. POSIX.1
permits implementations to have a single signal with both names. POSIX.1 carefully specifies ways in which conforming applications
can avoid the semantic differences between the two different implementations. The name SIGCHLD was chosen for POSIX.1 because most
current application usages of it can remain unchanged in conforming applications. SIGCLD in System V has more cases of
semantics that POSIX.1 does not specify, and thus applications using it are more likely to require changes in addition to the name
change.</p>

<p>The signals SIGUSR1 and SIGUSR2 are commonly used by applications for notification of exceptional behavior and are described as
"reserved as application-defined" so that such use is not prohibited. Implementations should not generate SIGUSR1 or SIGUSR2,
except when explicitly requested by <a href="../functions/kill.html"><i>kill</i>()</a>. It is recommended that libraries not use
these two signals, as such use in libraries could interfere with their use by applications calling the libraries. If such use is
unavoidable, it should be documented. It is prudent for non-portable libraries to use non-standard signals to avoid conflicts with
use of standard signals by portable libraries.</p>

<p>There is no portable way for an application to catch or ignore non-standard signals. Some implementations define the range of
signal numbers, so applications can install signal-catching functions for all of them. Unfortunately, implementation-defined
signals often cause problems when caught or ignored by applications that do not understand the reason for the signal. While the
desire exists for an application to be more robust by handling all possible signals (even those only generated by <a href=
"../functions/kill.html"><i>kill</i>()</a>), no existing mechanism was found to be sufficiently portable to include in POSIX.1. The
value of such a mechanism, if included, would be diminished given that SIGKILL would still not be catchable.</p>
|
|
|
|
<p>A number of new signal numbers are reserved for applications because the two user signals defined by POSIX.1 are insufficient
for many realtime applications. A range of signal numbers is specified, rather than an enumeration of additional reserved signal
names, because different applications and application profiles will require a different number of application signals. It is not
desirable to burden all application domains and therefore all implementations with the maximum number of signals required by all
possible applications. Note that in this context, signal numbers are essentially different signal priorities.</p>

<p>The relatively small number of required additional signals, {_POSIX_RTSIG_MAX}, was chosen so as not to require an unreasonably
large signal mask/set. While this number of signals defined in POSIX.1 will fit in a single 32-bit word signal mask, it is
recognized that most existing implementations define many more signals than are specified in POSIX.1 and, in fact, many
implementations have already exceeded 32 signals (including the "null signal"). Support of {_POSIX_RTSIG_MAX} additional signals
may push some implementations over the single 32-bit word line, but is unlikely to push any implementations that are already over
that line beyond the 64-signal line.</p>
|
|
|
|
<h5><a name="tag_03_02_04_02"></a>Signal Generation and Delivery</h5>

<p>The terms defined in this section are not used consistently in documentation of historical systems. Each signal can be
considered to have a lifetime beginning with generation and ending with delivery or acceptance. The POSIX.1 definition of
"delivery" does not exclude ignored signals; this is considered a more consistent definition. The revised text in several parts
of IEEE Std 1003.1-2001 clarifies the distinct semantics of asynchronous signal delivery and synchronous signal
acceptance. The previous wording attempted to categorize both under the term "delivery", which led to conflicts over whether the
effects of asynchronous signal delivery applied to synchronous signal acceptance.</p>

<p>Signals generated for a process are delivered to only one thread. Thus, if more than one thread is eligible to receive a signal,
one has to be chosen. The choice of threads is left entirely up to the implementation both to allow the widest possible range of
conforming implementations and to give implementations the freedom to deliver the signal to the "easiest possible" thread should
there be differences in ease of delivery between different threads.</p>

<p>Note that should multiple delivery among cooperating threads be required by an application, this can be trivially constructed
out of the provided single-delivery semantics. The construction of a <i>sigwait_multiple</i>() function that accomplishes this goal
is presented with the rationale for <a href="../functions/sigwaitinfo.html"><i>sigwaitinfo</i>()</a>.</p>

<p>Implementations should deliver unblocked signals as soon after they are generated as possible. However, it is difficult for
POSIX.1 to make specific requirements about this, beyond those in <a href="../functions/kill.html"><i>kill</i>()</a> and <a href=
"../functions/sigprocmask.html"><i>sigprocmask</i>()</a>. Even on systems with prompt delivery, scheduling of higher priority
processes is always likely to cause delays.</p>

<p>In general, the interval between the generation and delivery of unblocked signals cannot be detected by an application. Thus,
references to pending signals generally apply to blocked, pending signals. An implementation registers a signal as pending on the
process when no thread has the signal unblocked and there are no threads blocked in a <a href=
"../functions/sigwait.html"><i>sigwait</i>()</a> function for that signal. Thereafter, the implementation delivers the signal to
the first thread that unblocks the signal or calls a <a href="../functions/sigwait.html"><i>sigwait</i>()</a> function on a signal
set containing this signal, rather than choosing the recipient thread at the time the signal is sent.</p>
|
|
|
|
<p>In the 4.3 BSD system, signals that are blocked and set to SIG_IGN are discarded immediately upon generation. For a signal that
is ignored as its default action, if the action is SIG_DFL and the signal is blocked, a generated signal remains pending. In the
4.1 BSD system and in System V Release 3 (two other implementations that support a somewhat similar signal mechanism), all
ignored blocked signals remain pending if generated. Because it is not normally useful for an application to simultaneously ignore
and block the same signal, it was unnecessary for POSIX.1 to specify behavior that would invalidate any of the historical
implementations.</p>

<p>There is one case in some historical implementations where an unblocked, pending signal does not remain pending until it is
delivered. In the System V implementation of <a href="../functions/signal.html"><i>signal</i>()</a>, pending signals are
discarded when the action is set to SIG_DFL or a signal-catching routine (as well as to SIG_IGN). Except in the case of setting
SIGCHLD to SIG_DFL, implementations that do this do not conform completely to POSIX.1. Some earlier proposals for POSIX.1
explicitly stated this, but these statements were redundant due to the requirement that functions defined by POSIX.1 not change
attributes of processes defined by POSIX.1 except as explicitly stated.</p>

<p>POSIX.1 specifically states that the order in which multiple, simultaneously pending signals are delivered is unspecified. This
order has not been explicitly specified in historical implementations, but has remained quite consistent and been known to those
familiar with the implementations. Thus, there have been cases where applications (usually system utilities) have been written with
explicit or implicit dependencies on this order. Implementors and others porting existing applications may need to be aware of such
dependencies.</p>

<p>When there are multiple pending signals that are not blocked, implementations should arrange for the delivery of all signals at
once, if possible. Some implementations stack calls to all pending signal-catching routines, making it appear that each
signal-catcher was interrupted by the next signal. In this case, the implementation should ensure that this stacking of signals
does not violate the semantics of the signal masks established by <a href="../functions/sigaction.html"><i>sigaction</i>()</a>.
Other implementations process at most one signal when the operating system is entered, with remaining signals saved for later
delivery. Although this practice is widespread, this behavior is neither standardized nor endorsed. In either case, implementations
should attempt to deliver signals associated with the current state of the process (for example, SIGFPE) before other signals, if
possible.</p>
|
|
|
|
<p>In 4.2 BSD and 4.3 BSD, it is not permissible to ignore or explicitly block SIGCONT, because if blocking or ignoring this signal
prevented it from continuing a stopped process, such a process could never be continued (only killed by SIGKILL). However, 4.2 BSD
and 4.3 BSD do block SIGCONT during execution of its signal-catching function when it is caught, creating exactly this problem. A
proposal was considered to disallow catching SIGCONT in addition to ignoring and blocking it, but this limitation led to
objections. The consensus was to require that SIGCONT always continue a stopped process when generated. This removed the need to
disallow ignoring or explicit blocking of the signal; note that SIG_IGN and SIG_DFL are equivalent for SIGCONT.</p>

<h5><a name="tag_03_02_04_03"></a>Realtime Signal Generation and Delivery</h5>

<p>The Realtime Signals Extension option to POSIX.1 signal generation and delivery behavior is required for the following
reasons:</p>
|
|
|
|
<ul>
<li>
<p>The <b>sigevent</b> structure is used by other POSIX.1 functions that result in asynchronous event notifications to specify the
notification mechanism to use and other information needed by the notification mechanism. IEEE Std 1003.1-2001 defines
only three symbolic values for the notification mechanism. SIGEV_NONE is used to indicate that no notification is required when the
event occurs. This is useful for applications that use asynchronous I/O with polling for completion. SIGEV_SIGNAL indicates that a
signal is generated when the event occurs. SIGEV_THREAD provides for "callback functions" for asynchronous notifications done by
a function call within the context of a new thread. This provides a multi-threaded process with a more natural means of
notification than signals. The primary difficulty with previous notification approaches has been to specify the environment of the
notification routine.</p>

<ul>
<li>
<p>One approach is to limit the notification routine to call only functions permitted in a signal handler. While the list of
permissible functions is clearly stated, this is overly restrictive.</p>
</li>

<li>
<p>A second approach is to define a new list of functions or classes of functions that are explicitly permitted or not permitted.
This would give a programmer more lists to deal with, which would be awkward.</p>
</li>

<li>
<p>The third approach is to define completely the environment for execution of the notification function. A clear definition of an
execution environment for notification is provided by executing the notification function in the environment of a newly created
thread.</p>
</li>
</ul>

<p>Implementations may support additional notification mechanisms by defining new values for <i>sigev_notify</i>.</p>
|
|
|
|
<p>For a notification type of SIGEV_SIGNAL, the other members of the <b>sigevent</b> structure defined by
IEEE Std 1003.1-2001 specify the realtime signal (that is, the signal number and the application-defined value that
differentiates between occurrences of signals with the same number) that will be generated when the event occurs. The structure is
defined in <a href="../basedefs/signal.h.html"><i><signal.h></i></a>, even though the structure is not directly used by any
of the signal functions, because it is part of the signals interface used by the POSIX.1b "client functions". When the client
functions include <a href="../basedefs/signal.h.html"><i><signal.h></i></a> to define the signal names, the <b>sigevent</b>
structure will also be defined.</p>
|
|
|
|
<p>An application-defined value passed to the signal handler is used to differentiate between different "events" instead of
requiring that the application use different signal numbers for several reasons:</p>

<ul>
<li>
<p>Realtime applications potentially handle a very large number of different events. Requiring that implementations support a
correspondingly large number of distinct signal numbers will adversely impact the performance of signal delivery because the signal
masks to be manipulated on entry and exit to the handlers will become large.</p>
</li>

<li>
<p>Event notifications are prioritized by signal number (the rationale for this is explained in the following paragraphs) and the
use of different signal numbers to differentiate between the different event notifications overloads the signal number more than
has already been done. It also requires that the application writer make arbitrary assignments of priority to events that are
logically of equal priority.</p>
</li>
</ul>

<p>A union is defined for the application-defined value so that either an integer constant or a pointer can be portably passed to
the signal-catching function. On some architectures a pointer cannot be cast to an <b>int</b> and <i>vice versa</i>.</p>
|
|
|
|
<p>Use of a structure here with an explicit notification type discriminant, rather than explicit parameters to realtime functions
or fields embedded in other realtime structures, provides for future extensions to IEEE Std 1003.1-2001. Additional, perhaps
more efficient, notification mechanisms can be supported for existing realtime function interfaces, such as timers and asynchronous
I/O, by extending the <b>sigevent</b> structure appropriately. The existing realtime function interfaces will not have to be
modified to use any such new notification mechanism. The revised text concerning the SIGEV_SIGNAL value makes consistent the
semantics of the members of the <b>sigevent</b> structure, particularly in the definitions of <a href=
"../functions/lio_listio.html"><i>lio_listio</i>()</a> and <a href="../functions/aio_fsync.html"><i>aio_fsync</i>()</a>. For
uniformity, other revisions cause this specification to be referred to rather than inaccurately duplicated in the descriptions of
functions and structures using the <b>sigevent</b> structure. The revised wording does not relax the requirement that the signal
number be in the range SIGRTMIN to SIGRTMAX to guarantee queuing and passing of the application value, since that requirement is
still implied by the signal names.</p>
</li>

<li>
<p>IEEE Std 1003.1-2001 is intentionally vague on whether "non-realtime" signal-generating mechanisms can result in a
<b>siginfo_t</b> being supplied to the handler on delivery. In one existing implementation, a <b>siginfo_t</b> is posted on signal
generation, even though the implementation does not support queuing of multiple occurrences of a signal. It is not the intent of
IEEE Std 1003.1-2001 to preclude this, independent of the mandate to define signals that do support queuing. Any
interpretation that appears to preclude this is a mistake in the reading or writing of the standard.</p>
</li>

<li>
<p>Signals handled by realtime signal handlers might be generated by functions or conditions that do not allow the specification of
an application-defined value and do not queue. IEEE Std 1003.1-2001 specifies the <i>si_code</i> member of the
<b>siginfo_t</b> structure used in existing practice and defines additional codes so that applications can detect whether an
application-defined value is present or not. The code SI_USER for <a href="../functions/kill.html"><i>kill</i>()</a>-generated
signals is adopted from existing practice.</p>
</li>
|
|
|
|
<li>
<p>The <a href="../functions/sigaction.html"><i>sigaction</i>()</a> <i>sa_flags</i> value SA_SIGINFO tells the implementation that
the signal-catching function expects two additional arguments. When the flag is not set, a single argument, the signal number, is
passed as specified by IEEE Std 1003.1-2001. Although IEEE Std 1003.1-2001 does not explicitly allow the
<i>info</i> argument to the handler function to be NULL, this is existing practice. This provides for compatibility with programs
whose signal-catching functions are not prepared to accept the additional arguments. IEEE Std 1003.1-2001 is explicitly
unspecified as to whether signals actually queue when SA_SIGINFO is not set for a signal, as there appear to be no benefits to
applications in specifying one behavior or another. One existing implementation queues a <b>siginfo_t</b> on each signal
generation, unless the signal is already pending, in which case the implementation discards the new <b>siginfo_t</b>; that is, the
queue length is never greater than one. This implementation only examines SA_SIGINFO on signal delivery, discarding the queued
<b>siginfo_t</b> if its delivery was not requested.</p>
|
|
|
|
<p>IEEE Std 1003.1-2001 specifies several new values for the <i>si_code</i> member of the <b>siginfo_t</b> structure. In
existing practice, a <i>si_code</i> value of less than or equal to zero indicates that the signal was generated by a process via
the <a href="../functions/kill.html"><i>kill</i>()</a> function. In existing practice, values of <i>si_code</i> that provide
additional information for implementation-generated signals, such as SIGFPE or SIGSEGV, are all positive. Thus, if implementations
define the new constants specified in IEEE Std 1003.1-2001 to be negative numbers, programs written to use existing
practice will not break. IEEE Std 1003.1-2001 chose not to attempt to specify existing practice values of <i>si_code</i>
other than SI_USER, both because it was deemed beyond the scope of IEEE Std 1003.1-2001 and because many of the values in
existing practice appear to be platform and implementation-defined. But IEEE Std 1003.1-2001 does specify that if an
implementation (for example, one that does not have existing practice in this area) chooses to define additional values for
<i>si_code</i>, these values have to be different from the values of the symbols specified by IEEE Std 1003.1-2001. This
will allow conforming applications to differentiate between signals generated by one of the POSIX.1b asynchronous events and those
generated by other implementation events in a manner compatible with existing practice.</p>

<p>The unique values of <i>si_code</i> for the POSIX.1b asynchronous events have implications for implementations of, for example,
asynchronous I/O or message passing in user space library code. Such an implementation will be required to provide a hidden
interface to the signal generation mechanism that allows the library to specify the standard values of <i>si_code</i>.</p>

<p>Existing practice also defines additional members of <b>siginfo_t</b>, such as the process ID and user ID of the sending process
for <a href="../functions/kill.html"><i>kill</i>()</a>-generated signals. These members were deemed not necessary to meet the
requirements of realtime applications and are not specified by IEEE Std 1003.1-2001. Neither are they precluded.</p>

<p>The third argument to the signal-catching function, <i>context</i>, is left undefined by IEEE Std 1003.1-2001, but is
specified in the interface because it matches existing practice for the SA_SIGINFO flag. It was considered undesirable to require a
separate implementation for SA_SIGINFO for POSIX conformance on implementations that already support the two additional
parameters.</p>
</li>
|
|
|
|
<li>
<p>The requirement to deliver lower numbered signals in the range SIGRTMIN to SIGRTMAX first, when multiple unblocked signals are
pending, results from several considerations:</p>

<ul>
<li>
<p>A method is required to prioritize event notifications. The signal number was chosen instead of, for instance, associating a
separate priority with each request, because an implementation has to check pending signals at various points and select one for
delivery when more than one is pending. Specifying a selection order is the minimal additional semantic that will achieve
prioritized delivery. If a separate priority were to be associated with queued signals, it would be necessary for an implementation
to search all non-empty, non-blocked signal queues and select from among them the pending signal with the highest priority. This
would significantly increase the cost of and decrease the determinism of signal delivery.</p>
</li>

<li>
<p>Given the specified selection of the lowest numeric unblocked pending signal, preemptive priority signal delivery can be
achieved using signal numbers and signal masks by ensuring that the <i>sa_mask</i> for each signal number blocks all signals with a
higher numeric value.</p>

<p>For realtime applications that want to use only the newly defined realtime signal numbers without interference from the standard
signals, this can be achieved by blocking all of the standard signals in the process signal mask and in the <i>sa_mask</i>
installed by the signal action for the realtime signal handlers.</p>
</li>
</ul>
|
|
|
|
<p>IEEE Std 1003.1-2001 explicitly leaves unspecified the ordering of signals outside of the range of realtime signals
and the ordering of signals within this range with respect to those outside the range. It was believed that this would unduly
constrain implementations or standards in the future definition of new signals.</p>
</li>
</ul>
|
|
|
|
<h5><a name="tag_03_02_04_04"></a>Signal Actions</h5>

<p>Early proposals mentioned SIGCONT as a second exception to the rule that signals are not delivered to stopped processes until
continued. Because IEEE Std 1003.1-2001 now specifies that SIGCONT causes the stopped process to continue when it is
generated, delivery of SIGCONT is not prevented because a process is stopped, even without an explicit exception to this rule.</p>

<p>Ignoring a signal by setting the action to SIG_IGN (or SIG_DFL for signals whose default action is to ignore) is not the same as
installing a signal-catching function that simply returns. Invoking such a function will interrupt certain system functions that
block processes (for example, <a href="../functions/wait.html"><i>wait</i>()</a>, <a href=
"../functions/sigsuspend.html"><i>sigsuspend</i>()</a>, <a href="../functions/pause.html"><i>pause</i>()</a>, <a href=
"../functions/read.html"><i>read</i>()</a>, <a href="../functions/write.html"><i>write</i>()</a>), while ignoring a signal has no
such effect on the process.</p>
|
|
|
|
<p>Historical implementations discard pending signals when the action is set to SIG_IGN. However, they do not always do the same
when the action is set to SIG_DFL and the default action is to ignore the signal. IEEE Std 1003.1-2001 requires this for
the sake of consistency and also for completeness, since the only signal this applies to is SIGCHLD, and
IEEE Std 1003.1-2001 disallows setting its action to SIG_IGN.</p>

<p>Some implementations (System V, for example) assign different semantics for SIGCLD depending on whether the action is set
to SIG_IGN or SIG_DFL. Since POSIX.1 requires that the default action for SIGCHLD be to ignore the signal, applications that want
the signal ignored should always set its action to SIG_DFL rather than SIG_IGN, in order to avoid the implementation-defined
SIG_IGN semantics.</p>

<p>Whether or not an implementation allows SIG_IGN as a SIGCHLD disposition to be inherited across a call to one of the <i>exec</i>
family of functions or <a href="../functions/posix_spawn.html"><i>posix_spawn</i>()</a> is explicitly left as unspecified. This
change was made as a result of IEEE PASC Interpretation 1003.1 #132, and permits the implementation to decide between the following
alternatives:</p>
|
|
|
|
<ul>
<li>
<p>Unconditionally leave SIGCHLD set to SIG_IGN, in which case the implementation would not allow applications that assume
inheritance of SIG_DFL to conform to IEEE Std 1003.1-2001 without change. The implementation would, however, retain an
ability to control applications that create child processes but never call on the <i>wait</i> family of functions, potentially
filling up the process table.</p>
</li>

<li>
<p>Unconditionally reset SIGCHLD to SIG_DFL, in which case the implementation would allow applications that assume inheritance of
SIG_DFL to conform. The implementation would, however, lose an ability to control applications that spawn child processes but never
reap them.</p>
</li>

<li>
<p>Provide some mechanism, not specified in IEEE Std 1003.1-2001, to control inherited SIGCHLD dispositions.</p>
</li>
</ul>
|
|
|
|
<p>Some implementations (System V, for example) will deliver a SIGCLD signal immediately when a process establishes a signal-catching function for SIGCLD when that process has a child that has already terminated. Other implementations, such as 4.3 BSD, do not generate a new SIGCHLD signal in this way. In general, a process should not attempt to alter the signal action for the SIGCHLD signal while it has any outstanding children. However, it is not always possible for a process to avoid this; for example, shells sometimes start up processes in pipelines with other processes from the pipeline as children. Processes that cannot ensure that they have no children when altering the signal action for SIGCHLD thus need to be prepared for, but not depend on, generation of an immediate SIGCHLD signal.</p>
<p>The default action of the stop signals (SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU) is to stop a process that is executing. If a stop signal is delivered to a process that is already stopped, it has no effect. In fact, if a stop signal is generated for a stopped process whose signal mask blocks the signal, the signal will never be delivered to the process since the process must receive a SIGCONT, which discards all pending stop signals, in order to continue executing.</p>
<p>The SIGCONT signal continues a stopped process even if SIGCONT is blocked (or ignored). However, if a signal-catching routine has been established for SIGCONT, it will not be entered until SIGCONT is unblocked.</p>
<p>If a process in an orphaned process group stops, it is no longer under the control of a job control shell and hence would not normally ever be continued. Because of this, orphaned processes that receive terminal-related stop signals (SIGTSTP, SIGTTIN, SIGTTOU, but not SIGSTOP) must not be allowed to stop. The goal is to prevent stopped processes from languishing forever. (As SIGSTOP is sent only via <a href="../functions/kill.html"><i>kill</i>()</a>, it is assumed that the process or user sending a SIGSTOP can send a SIGCONT when desired.) Instead, the system must discard the stop signal. As an extension, it may also deliver another signal in its place. 4.3 BSD sends a SIGKILL, which is overly effective because SIGKILL is not catchable. Another possible choice is SIGHUP. 4.3 BSD also does this for orphaned processes (processes whose parent has terminated) rather than for members of orphaned process groups; this is less desirable because job control shells manage process groups. POSIX.1 also prevents SIGTTIN and SIGTTOU signals from being generated for processes in orphaned process groups as a direct result of activity on a terminal, preventing infinite loops when <a href="../functions/read.html"><i>read</i>()</a> and <a href="../functions/write.html"><i>write</i>()</a> calls generate signals that are discarded; see <a href="xbd_chap11.html#tag_01_11_01_04"><i>Terminal Access Control</i></a>. A similar restriction on the generation of SIGTSTP was considered, but that would be unnecessary and more difficult to implement due to its asynchronous nature.</p>
<p>Although POSIX.1 requires that signal-catching functions be called with only one argument, there is nothing to prevent conforming implementations from extending POSIX.1 to pass additional arguments, as long as Strictly Conforming POSIX.1 Applications continue to compile and execute correctly. Most historical implementations do, in fact, pass additional, signal-specific arguments to certain signal-catching routines.</p>
<p>There was a proposal to change the declared type of the signal handler to:</p>

<blockquote>
<pre>
<tt>void</tt> <i>func</i> <tt>(int</tt> <i>sig</i><tt>, ...);
</tt>
</pre>
</blockquote>
<p>The usage of ellipses ( <tt>"..."</tt> ) is ISO C standard syntax to indicate a variable number of arguments. Its use was intended to allow the implementation to pass additional information to the signal handler in a standard manner.</p>

<p>Unfortunately, this construct would require all signal handlers to be defined with this syntax because the ISO C standard allows implementations to use a different parameter passing mechanism for variable parameter lists than for non-variable parameter lists. Thus, all existing signal handlers in all existing applications would have to be changed to use the variable syntax in order to be standard and portable. This is in conflict with the goal of Minimal Changes to Existing Application Code.</p>
<p>When terminating a process from a signal-catching function, processes should be aware of any interpretation that their parent may make of the status returned by <a href="../functions/wait.html"><i>wait</i>()</a> or <a href="../functions/waitpid.html"><i>waitpid</i>()</a>. In particular, a signal-catching function should not call <i>exit</i>(0) or <i>_exit</i>(0) unless it wants to indicate successful termination. A non-zero argument to <a href="../functions/exit.html"><i>exit</i>()</a> or <a href="../functions/_exit.html"><i>_exit</i>()</a> can be used to indicate unsuccessful termination. Alternatively, the process can use <a href="../functions/kill.html"><i>kill</i>()</a> to send itself a fatal signal (first ensuring that the signal is set to the default action and not blocked). See also the RATIONALE section of the <a href="../functions/_exit.html"><i>_exit</i>()</a> function.</p>
<p>The behavior of <i>unsafe</i> functions, as defined by this section, is undefined when they are invoked from signal-catching functions in certain circumstances. The behavior of reentrant functions, as defined by this section, is as specified by POSIX.1, regardless of invocation from a signal-catching function. This is the only intended meaning of the statement that reentrant functions may be used in signal-catching functions without restriction. Applications must still consider all effects of such functions on such things as data structures, files, and process state. In particular, application writers need to consider the restrictions on interactions when interrupting <a href="../functions/sleep.html"><i>sleep</i>()</a> (see <a href="../functions/sleep.html"><i>sleep</i>()</a>) and interactions among multiple handles for a file description. The fact that any specific function is listed as reentrant does not necessarily mean that invocation of that function from a signal-catching function is recommended.</p>
<p>In order to prevent errors arising from interrupting non-reentrant function calls, applications should protect calls to these functions either by blocking the appropriate signals or through the use of some programmatic semaphore. POSIX.1 does not address the more general problem of synchronizing access to shared data structures. Note in particular that even the "safe" functions may modify the global variable <i>errno</i>; the signal-catching function may want to save and restore its value. The same principles apply to the reentrancy of application routines and asynchronous data access.</p>
<p>Note that <a href="../functions/longjmp.html"><i>longjmp</i>()</a> and <a href="../functions/siglongjmp.html"><i>siglongjmp</i>()</a> are not in the list of reentrant functions. This is because the code executing after <a href="../functions/longjmp.html"><i>longjmp</i>()</a> or <a href="../functions/siglongjmp.html"><i>siglongjmp</i>()</a> can call any unsafe functions with the same danger as calling those unsafe functions directly from the signal handler. Applications that use <a href="../functions/longjmp.html"><i>longjmp</i>()</a> or <a href="../functions/siglongjmp.html"><i>siglongjmp</i>()</a> out of signal handlers require rigorous protection in order to be portable. Many of the other functions that are excluded from the list are traditionally implemented using either the C language <a href="../functions/malloc.html"><i>malloc</i>()</a> or <a href="../functions/free.html"><i>free</i>()</a> functions or the ISO C standard I/O library, both of which traditionally use data structures in a non-reentrant manner. Because any combination of different functions using a common data structure can cause reentrancy problems, POSIX.1 does not define the behavior when any unsafe function is called in a signal handler that interrupts any unsafe function.</p>
<p>The only realtime extension to signal actions is the addition of the additional parameters to the signal-catching function. This extension has been explained and motivated in the previous section. In making this extension, though, developers of POSIX.1b ran into issues relating to function prototypes. In response to input from the POSIX.1 standard developers, members were added to the <b>sigaction</b> structure to specify function prototypes for the newer signal-catching function specified by POSIX.1b. These members follow changes that are being made to POSIX.1. Note that IEEE Std 1003.1-2001 explicitly states that these fields may overlap so that a union can be defined. This enabled existing implementations of POSIX.1 to maintain binary-compatibility when these extensions were added.</p>
<p>The <b>siginfo_t</b> structure was adopted for passing the application-defined value to match existing practice, but the existing practice has no provision for an application-defined value, so this was added. Note that POSIX normally reserves the "_t" type designation for opaque types. The <b>siginfo_t</b> structure breaks with this convention to follow existing practice and thus promote portability. Standardization of the existing practice for the other members of this structure may be addressed in the future.</p>
<p>Although it is not explicitly visible to applications, there are additional semantics for signal actions implied by queued signals and their interaction with other POSIX.1b realtime functions. Specifically:</p>

<ul>
<li>
<p>It is not necessary to queue signals whose action is SIG_IGN.</p>
</li>

<li>
<p>For implementations that support POSIX.1b timers, some interaction with the timer functions at signal delivery is implied to manage the timer overrun count.</p>
</li>
</ul>
<h5><a name="tag_03_02_04_05"></a>Signal Effects on Other Functions</h5>
<p>The most common behavior of an interrupted function after a signal-catching function returns is for the interrupted function to give an [EINTR] error unless the SA_RESTART flag is in effect for the signal. However, there are a number of specific exceptions, including <a href="../functions/sleep.html"><i>sleep</i>()</a> and certain situations with <a href="../functions/read.html"><i>read</i>()</a> and <a href="../functions/write.html"><i>write</i>()</a>.</p>
<p>The historical implementations of many functions defined by IEEE Std 1003.1-2001 are not interruptible, but delay delivery of signals generated during their execution until after they complete. This is never a problem for functions that are guaranteed to complete in a short (imperceptible to a human) period of time. It is normally those functions that can suspend a process indefinitely or for long periods of time (for example, <a href="../functions/wait.html"><i>wait</i>()</a>, <a href="../functions/pause.html"><i>pause</i>()</a>, <a href="../functions/sigsuspend.html"><i>sigsuspend</i>()</a>, <a href="../functions/sleep.html"><i>sleep</i>()</a>, or <a href="../functions/read.html"><i>read</i>()</a>/<a href="../functions/write.html"><i>write</i>()</a> on a slow device like a terminal) that are interruptible. This permits applications to respond to interactive signals or to set timeouts on calls to most such functions with <a href="../functions/alarm.html"><i>alarm</i>()</a>. Therefore, implementations should generally make such functions (including ones defined as extensions) interruptible.</p>
<p>Functions not mentioned explicitly as interruptible may be so on some implementations, possibly as an extension where the function gives an [EINTR] error. There are several functions (for example, <a href="../functions/getpid.html"><i>getpid</i>()</a>, <a href="../functions/getuid.html"><i>getuid</i>()</a>) that are specified as never returning an error, which can thus never be extended in this way.</p>
<p>If a signal-catching function returns while the SA_RESTART flag is in effect, an interrupted function is restarted at the point it was interrupted. Conforming applications cannot make assumptions about the internal behavior of interrupted functions, even if the functions are async-signal-safe. For example, suppose the <a href="../functions/read.html"><i>read</i>()</a> function is interrupted with SA_RESTART in effect, the signal-catching function closes the file descriptor being read from and returns, and the <a href="../functions/read.html"><i>read</i>()</a> function is then restarted; in this case the application cannot assume that the <a href="../functions/read.html"><i>read</i>()</a> function will give an [EBADF] error, since <a href="../functions/read.html"><i>read</i>()</a> might have checked the file descriptor for validity before being interrupted.</p>
<h4><a name="tag_03_02_05"></a>Standard I/O Streams</h4>

<h5><a name="tag_03_02_05_01"></a>Interaction of File Descriptors and Standard I/O Streams</h5>

<p>There is no additional rationale provided for this section.</p>

<h5><a name="tag_03_02_05_02"></a>Stream Orientation and Encoding Rules</h5>

<p>There is no additional rationale provided for this section.</p>

<h4><a name="tag_03_02_06"></a>STREAMS</h4>
<p>STREAMS are introduced into IEEE Std 1003.1-2001 as part of the alignment with the Single UNIX Specification, but marked as an option in recognition that not all systems may wish to implement the facility. The option within IEEE Std 1003.1-2001 is denoted by the XSR margin marker. The standard developers made this option independent of the XSI option.</p>
<p>STREAMS are a method of implementing network services and other character-based input/output mechanisms, with the STREAM being a full-duplex connection between a process and a device. STREAMS provides direct access to protocol modules, and optional protocol modules can be interposed between the process-end of the STREAM and the device-driver at the device-end of the STREAM. Pipes can be implemented using the STREAMS mechanism, so they can provide process-to-process as well as process-to-device communications.</p>

<p>This section introduces STREAMS I/O, the message types used to control them, an overview of the priority mechanism, and the interfaces used to access them.</p>
<h5><a name="tag_03_02_06_01"></a>Accessing STREAMS</h5>

<p>There is no additional rationale provided for this section.</p>

<h4><a name="tag_03_02_07"></a>XSI Interprocess Communication</h4>
<p>There are two forms of IPC supported as options in IEEE Std 1003.1-2001. The traditional System V IPC routines derived from the SVID (that is, the <i>msg*</i>(), <i>sem*</i>(), and <i>shm*</i>() interfaces) are mandatory on XSI-conformant systems. Thus, all XSI-conformant systems provide the same mechanisms for manipulating messages, shared memory, and semaphores.</p>

<p>In addition, the POSIX Realtime Extension provides an alternate set of routines for those systems supporting the appropriate options.</p>

<p>The application writer is presented with a choice: the System V interfaces or the POSIX interfaces (loosely derived from the Berkeley interfaces). The XSI profile prefers the System V interfaces, but the POSIX interfaces may be more suitable for realtime or other performance-sensitive applications.</p>
<h5><a name="tag_03_02_07_01"></a>IPC General Information</h5>

<p>General information that is shared by all three mechanisms is described in this section. The common permissions mechanism is briefly introduced, describing the mode bits, and how they are used to determine whether or not a process has access to read or write/alter the appropriate instance of one of the IPC mechanisms. All other relevant information is contained in the reference pages themselves.</p>

<p>The semaphore type of IPC allows processes to communicate through the exchange of semaphore values. A semaphore is a positive integer. Since many applications require the use of more than one semaphore, XSI-conformant systems have the ability to create sets or arrays of semaphores.</p>
<p>Calls to support semaphores include:</p>
|
|
|
|
<blockquote><a href="../functions/semctl.html"><i>semctl</i>()</a>, <a href="../functions/semget.html"><i>semget</i>()</a>, <a
|
|
href="../functions/semop.html"><i>semop</i>()</a></blockquote>
|
|
|
|
<p>Semaphore sets are created by using the <a href="../functions/semget.html"><i>semget</i>()</a> function.</p>
|
|
|
|
<p>The message type of IPC allows processes to communicate through the exchange of data stored in buffers. This data is transmitted
|
|
between processes in discrete portions known as messages.</p>
|
|
|
|
<p>Calls to support message queues include:</p>

<blockquote><a href="../functions/msgctl.html"><i>msgctl</i>()</a>, <a href="../functions/msgget.html"><i>msgget</i>()</a>, <a href="../functions/msgrcv.html"><i>msgrcv</i>()</a>, <a href="../functions/msgsnd.html"><i>msgsnd</i>()</a></blockquote>

<p>The shared memory type of IPC allows two or more processes to share memory and consequently the data contained therein. This is done by allowing processes to set up access to a common memory address space. This sharing of memory provides a fast means of exchange of data between processes.</p>

<p>Calls to support shared memory include:</p>

<blockquote><a href="../functions/shmctl.html"><i>shmctl</i>()</a>, <a href="../functions/shmdt.html"><i>shmdt</i>()</a>, <a href="../functions/shmget.html"><i>shmget</i>()</a></blockquote>

<p>The <a href="../functions/ftok.html"><i>ftok</i>()</a> interface is also provided.</p>
<h4><a name="tag_03_02_08"></a>Realtime</h4>

<h5><a name="tag_03_02_08_01"></a>Advisory Information</h5>
<p>POSIX.1b contains an Informative Annex with proposed interfaces for "realtime files". These interfaces could determine groups of the exact parameters required to do "direct I/O" or "extents". These interfaces were objected to by a significant portion of the balloting group as too complex. A conforming application had little chance of correctly navigating the large parameter space to match its desires to the system. In addition, they only applied to a new type of file (realtime files) and they told the implementation exactly what to do as opposed to advising the implementation on application behavior and letting it optimize for the system the (portable) application was running on. For example, it was not clear how a system that had a disk array should set its parameters.</p>

<p>There seemed to be several overall goals:</p>
<ul>
<li>
<p>Optimizing sequential access</p>
</li>

<li>
<p>Optimizing caching behavior</p>
</li>

<li>
<p>Optimizing I/O data transfer</p>
</li>

<li>
<p>Preallocation</p>
</li>
</ul>
<p>The advisory interfaces, <a href="../functions/posix_fadvise.html"><i>posix_fadvise</i>()</a> and <a href="../functions/posix_madvise.html"><i>posix_madvise</i>()</a>, satisfy the first two goals. The POSIX_FADV_SEQUENTIAL and POSIX_MADV_SEQUENTIAL advice tells the implementation to expect serial access. Typically the system will prefetch the next several serial accesses in order to overlap I/O. It may also free previously accessed serial data if memory is tight. If the application is not doing serial access it can use POSIX_FADV_WILLNEED and POSIX_MADV_WILLNEED to accomplish I/O overlap, as required. When the application advises POSIX_FADV_RANDOM or POSIX_MADV_RANDOM behavior, the implementation usually tries to fetch a minimum amount of data with each request and it does not expect much locality. POSIX_FADV_DONTNEED and POSIX_MADV_DONTNEED allow the system to free up caching resources as the data will not be required in the near future.</p>

<p>POSIX_FADV_NOREUSE tells the system that caching the specified data is not optimal. For file I/O, the transfer should go directly to the user buffer instead of being cached internally by the implementation. To portably perform direct disk I/O on all systems, the application must perform its I/O transfers according to the following rules:</p>
<ol>
<li>
<p>The user buffer should be aligned according to the {POSIX_REC_XFER_ALIGN} <a href="../functions/pathconf.html"><i>pathconf</i>()</a> variable.</p>
</li>

<li>
<p>The number of bytes transferred in an I/O operation should be a multiple of the {POSIX_ALLOC_SIZE_MIN} <a href="../functions/pathconf.html"><i>pathconf</i>()</a> variable.</p>
</li>

<li>
<p>The offset into the file at the start of an I/O operation should be a multiple of the {POSIX_ALLOC_SIZE_MIN} <a href="../functions/pathconf.html"><i>pathconf</i>()</a> variable.</p>
</li>

<li>
<p>The application should ensure that all threads which open a given file specify POSIX_FADV_NOREUSE to be sure that there is no unexpected interaction between threads using buffered I/O and threads using direct I/O to the same file.</p>
</li>
</ol>
<p>In some cases, a user buffer must be properly aligned in order to be transferred directly to/from the device. The {POSIX_REC_XFER_ALIGN} <a href="../functions/pathconf.html"><i>pathconf</i>()</a> variable tells the application the proper alignment.</p>
<p>The preallocation goal is met by the space control function, <a href="../functions/posix_fallocate.html"><i>posix_fallocate</i>()</a>. The application can use <a href="../functions/posix_fallocate.html"><i>posix_fallocate</i>()</a> to guarantee no [ENOSPC] errors and to improve performance by prepaying any overhead required for block allocation.</p>
<p>Implementations may use information conveyed by a previous <a href="../functions/posix_fadvise.html"><i>posix_fadvise</i>()</a> call to influence the manner in which allocation is performed. For example, if an application did the following calls:</p>

<blockquote>
<pre>
<tt>fd = open("file", O_RDWR);
posix_fadvise(fd, offset, len, POSIX_FADV_SEQUENTIAL);
posix_fallocate(fd, offset, len);
</tt>
</pre>
</blockquote>

<p>an implementation might allocate the file contiguously on disk.</p>
<p>Finally, the <a href="../functions/pathconf.html"><i>pathconf</i>()</a> variables {POSIX_REC_MIN_XFER_SIZE}, {POSIX_REC_MAX_XFER_SIZE}, and {POSIX_REC_INCR_XFER_SIZE} tell the application a range of transfer sizes that are recommended for best I/O performance.</p>

<p>Where bounded response time is required, the vendor can supply the appropriate settings of the advisories to achieve a guaranteed performance level.</p>
<p>The interfaces meet the goals while allowing applications using regular files to take advantage of performance optimizations. The interfaces tell the implementation expected application behavior which the implementation can use to optimize performance on a particular system with a particular dynamic load.</p>

<p>The <a href="../functions/posix_memalign.html"><i>posix_memalign</i>()</a> function was added to allow for the allocation of specifically aligned buffers; for example, for {POSIX_REC_XFER_ALIGN}.</p>

<p>The working group also considered the alternative of adding a function which would return an aligned pointer to memory within a user-supplied buffer. This was not considered to be the best method, because it potentially wastes large amounts of memory when buffers need to be aligned on large alignment boundaries.</p>
<h5><a name="tag_03_02_08_02"></a>Message Passing</h5>

<p>This section provides the rationale for the definition of the message passing interface in IEEE Std 1003.1-2001. This is presented in terms of the objectives, models, and requirements imposed upon this interface.</p>
<ul>
<li>
<p>Objectives</p>

<p>Many applications, including both realtime and database applications, require a means of passing arbitrary amounts of data between cooperating processes comprising the overall application on one or more processors. Many conventional interfaces for interprocess communication are insufficient for realtime applications in that efficient and deterministic data passing methods cannot be implemented. This has prompted the definition of message passing interfaces providing these facilities:</p>

<ul>
<li>
<p>Open a message queue.</p>
</li>

<li>
<p>Send a message to a message queue.</p>
</li>

<li>
<p>Receive a message from a queue, either synchronously or asynchronously.</p>
</li>

<li>
<p>Alter message queue attributes for flow and resource control.</p>
</li>
</ul>

<p>It is assumed that an application may consist of multiple cooperating processes and that these processes may wish to communicate and coordinate their activities. The message passing facility described in IEEE Std 1003.1-2001 allows processes to communicate through system-wide queues. These message queues are accessed through names that may be pathnames. A message queue can be opened for use by multiple sending and/or multiple receiving processes.</p>
</li>
<li>
<p>Background on Embedded Applications</p>

<p>Interprocess communication utilizing message passing is a key facility for the construction of deterministic, high-performance realtime applications. The facility is present in all realtime systems and is the framework upon which the application is constructed. The performance of the facility is usually a direct indication of the performance of the resulting application.</p>

<p>Realtime applications, especially for embedded systems, are typically designed around the performance constraints imposed by the message passing mechanisms. Applications for embedded systems are typically very tightly constrained. Application writers expect to design and control the entire system. In order to minimize system costs, the writer will attempt to use all resources to their utmost and minimize the requirement to add additional memory or processors.</p>

<p>The embedded applications usually share address spaces and only a simple message passing mechanism is required. The application can readily access common data incurring only mutual-exclusion overheads. The models desired are the simplest possible with the application building higher-level facilities only when needed.</p>
</li>
<li>
<p>Requirements</p>

<p>The following requirements determined the features of the message passing facilities defined in IEEE Std 1003.1-2001:</p>

<ul>
<li>
<p>Naming of Message Queues</p>

<p>The mechanism for gaining access to a message queue is a pathname evaluated in a context that is allowed to be a file system name space, or it can be independent of any file system. This is a specific attempt to allow implementations based on either method in order to address both embedded systems and to also allow implementation in larger systems.</p>

<p>The interface of <a href="../functions/mq_open.html"><i>mq_open</i>()</a> is defined to allow but not require the access control and name conflicts resulting from utilizing a file system for name resolution. All required behavior is specified for the access control case. Yet a conforming implementation, such as an embedded system kernel, may define that there are no distinctions between users and may define that all processes have all access privileges.</p>
</li>
<li>
<p>Embedded System Naming</p>

<p>Embedded systems need to be able to utilize independent name spaces for accessing the various system objects. They typically do not have a file system, precluding its utilization as a common name resolution mechanism. The modularity of an embedded system limits the connections between separate mechanisms that can be allowed.</p>

<p>Embedded systems typically do not have any access protection. Since the system does not support the mixing of applications from different areas, and usually does not even have the concept of an authorization entity, access control is not useful.</p>
</li>
<li>
<p>Large System Naming</p>

<p>On systems with more functionality, the name resolution must support the ability to use the file system as the name resolution mechanism/object storage medium and to have control over access to the objects. Utilizing the pathname space can result in further errors when the names conflict with other objects.</p>
</li>
<li>
<p>Fixed Size of Messages</p>

<p>The interfaces impose a fixed upper bound on the size of messages that can be sent to a specific message queue. The size is set on an individual queue basis and cannot be changed dynamically.</p>

<p>The purpose of the fixed size is to increase the ability of the system to optimize the implementation of <a href="../functions/mq_send.html"><i>mq_send</i>()</a> and <a href="../functions/mq_receive.html"><i>mq_receive</i>()</a>. With fixed sizes of messages and fixed numbers of messages, specific message blocks can be pre-allocated. This eliminates a significant amount of checking for errors and boundary conditions. Additionally, an implementation can optimize data copying to maximize performance. Finally, with a restricted range of message sizes, an implementation is better able to provide deterministic operations.</p>
</li>
<li>
<p>Prioritization of Messages</p>

<p>Message prioritization allows the application to determine the order in which messages are received. Prioritization of messages is a key facility that is provided by most realtime kernels and is heavily utilized by the applications. The major purpose of having priorities in message queues is to avoid priority inversions in the message system, where a high-priority message is delayed behind one or more lower-priority messages. This allows the applications to be designed so that they do not need to be interrupted in order to change the flow of control when exceptional conditions occur. The prioritization does add additional overhead to the message operations in those cases where it is actually used, but a clever implementation can optimize for the FIFO case to make that more efficient.</p>
</li>
<li>
|
|
<p>Asynchronous Notification</p>
|
|
|
|
<p>The interface supports the ability to have a task asynchronously notified of the availability of a message on the queue. The
|
|
purpose of this facility is to allow the task to perform other functions and yet still be notified that a message has become
|
|
available on the queue.</p>
|
|
|
|
<p>To understand the requirement for this function, it is useful to understand two models of application design: a single task
|
|
performing multiple functions and multiple tasks performing a single function. Each of these models has advantages.</p>
|
|
|
|
<p>Asynchronous notification is required to build the model of a single task performing multiple operations. This model typically
|
|
results from either the expectation that interruption is less expensive than utilizing a separate task or from the growth of the
|
|
application to include additional functions.</p>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
<h5><a name="tag_03_02_08_03"></a>Semaphores</h5>

<p>Semaphores are a high-performance process synchronization mechanism. Semaphores are named by null-terminated strings of characters.</p>

<p>A semaphore is created using the <a href="../functions/sem_init.html"><i>sem_init</i>()</a> function or the <a href="../functions/sem_open.html"><i>sem_open</i>()</a> function with the O_CREAT flag set in <i>oflag</i>.</p>

<p>To use a semaphore, a process has to first initialize the semaphore or inherit an open descriptor for the semaphore via <a href="../functions/fork.html"><i>fork</i>()</a>.</p>

<p>A semaphore preserves its state when the last reference is closed. For example, if a semaphore has a value of 13 when the last reference is closed, it will have a value of 13 when it is next opened.</p>

<p>When a semaphore is created, an initial state for the semaphore has to be provided. This value is a non-negative integer. Negative values are not possible since they indicate the presence of blocked processes. The persistence of any of these objects across a system crash or a system reboot is undefined. Conforming applications must not depend on any sort of persistence across a system reboot or a system crash.</p>
<ul>
<li>
<p>Models and Requirements</p>

<p>A realtime system requires synchronization and communication between the processes comprising the overall application. An efficient and reliable synchronization mechanism has to be provided in a realtime system that will allow more than one schedulable process mutually-exclusive access to the same resource. This synchronization mechanism has to allow for the optimal implementation of synchronization or systems implementors will define other, more cost-effective methods.</p>

<p>At issue are the methods whereby multiple processes (tasks) can be designed and implemented to work together in order to perform a single function. This requires interprocess communication and synchronization. A semaphore mechanism is the lowest level of synchronization that can be provided by an operating system.</p>

<p>A semaphore is defined as an object that has an integral value and a set of blocked processes associated with it. If the value is positive or zero, then the set of blocked processes is empty; otherwise, the size of the set is equal to the absolute value of the semaphore value. The value of the semaphore can be incremented or decremented by any process with access to the semaphore and must be done as an indivisible operation. When a semaphore value is less than or equal to zero, any process that attempts to lock it again will block or be informed that it is not possible to perform the operation.</p>

<p>A semaphore may be used to guard access to any resource accessible by more than one schedulable task in the system. It is a global entity and not associated with any particular process. As such, a method of obtaining access to the semaphore has to be provided by the operating system. A process that wants access to a critical resource (section) has to wait on the semaphore that guards that resource. When the semaphore is locked on behalf of a process, it knows that it can utilize the resource without interference by any other cooperating process in the system. When the process finishes its operation on the resource, leaving it in a well-defined state, it posts the semaphore, indicating that some other process may now obtain the resource associated with that semaphore.</p>

<p>In this section, mutexes and condition variables are specified as the synchronization mechanisms between threads.</p>

<p>These primitives are typically used for synchronizing threads that share memory in a single process. However, this section provides an option allowing the use of these synchronization interfaces and objects between processes that share memory, regardless of the method for sharing memory.</p>

<p>Much experience with semaphores shows that there are two distinct uses of synchronization: locking, which is typically of short duration; and waiting, which is typically of long or unbounded duration. These distinct usages map directly onto mutexes and condition variables, respectively.</p>

<p>Semaphores are provided in IEEE Std 1003.1-2001 primarily to provide a means of synchronization for processes; these processes may or may not share memory. Mutexes and condition variables are specified as synchronization mechanisms between threads; these threads always share (some) memory. Both are synchronization paradigms that have been in widespread use for a number of years. Each set of primitives is particularly well matched to certain problems.</p>

<p>With respect to binary semaphores, experience has shown that condition variables and mutexes are easier to use for many synchronization problems than binary semaphores. The primary reason for this is the explicit appearance of a Boolean predicate that specifies when the condition wait is satisfied. This Boolean predicate terminates a loop, including the call to <a href="../functions/pthread_cond_wait.html"><i>pthread_cond_wait</i>()</a>. As a result, extra wakeups are benign since the predicate governs whether the thread will actually proceed past the condition wait. With stateful primitives, such as binary semaphores, the wakeup in itself typically means that the wait is satisfied. The burden of ensuring correctness for such waits is thus placed on <i>all</i> signalers of the semaphore rather than on an <i>explicitly coded</i> Boolean predicate located at the condition wait. Experience has shown that the latter creates a major improvement in safety and ease-of-use.</p>
<p>Counting semaphores are well matched to dealing with producer/consumer problems, including those that might exist between threads of different processes, or between a signal handler and a thread. In the former case, there may be little or no memory shared by the processes; in the latter case, one is not communicating between co-equal threads, but between a thread and an interrupt-like entity. It is for these reasons that IEEE Std 1003.1-2001 allows semaphores to be used by threads.</p>
<p>Mutexes and condition variables have been effectively used with and without priority inheritance, priority ceiling, and other attributes to synchronize threads that share memory. The efficiency of their implementation is comparable to or better than that of other synchronization primitives that are sometimes harder to use (for example, binary semaphores). Furthermore, there is at least one known implementation of Ada tasking that uses these primitives. Mutexes and condition variables together constitute an appropriate, sufficient, and complete set of inter-thread synchronization primitives.</p>

<p>Efficient multi-threaded applications require high-performance synchronization primitives. Considerations of efficiency and generality require a small set of primitives upon which more sophisticated synchronization functions can be built.</p>
</li>

<li>
<p>Standardization Issues</p>

<p>It is possible to implement very high-performance semaphores using test-and-set instructions on shared memory locations. The library routines that implement such a high-performance interface have to properly ensure that a <a href="../functions/sem_wait.html"><i>sem_wait</i>()</a> or <a href="../functions/sem_trywait.html"><i>sem_trywait</i>()</a> operation that cannot be performed will issue a blocking semaphore system call or properly report the condition to the application. The same interface to the application program would be provided by a high-performance implementation.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_04"></a>Realtime Signals</h5>

<h5><a name="tag_03_02_08_05"></a>Realtime Signals Extension</h5>

<p>This portion of the rationale presents models, requirements, and standardization issues relevant to the Realtime Signals Extension. This extension provides the capability required to support reliable, deterministic, asynchronous notification of events. While a new mechanism, unencumbered by the historical usage and semantics of POSIX.1 signals, might allow for a more efficient implementation, the application requirements for event notification can be met with a small number of extensions to signals. Therefore, a minimal set of extensions to signals to support the application requirements is specified.</p>

<p>The realtime signal extensions specified in this section are used by other realtime functions requiring asynchronous notification:</p>

<ul>
<li>
<p>Models</p>

<p>The model supported is one of multiple cooperating processes, each of which handles multiple asynchronous external events. Events represent occurrences that are generated as the result of some activity in the system. Examples of occurrences that can constitute an event include:</p>

<ul>
<li>
<p>Completion of an asynchronous I/O request</p>
</li>

<li>
<p>Expiration of a POSIX.1b timer</p>
</li>

<li>
<p>Arrival of an interprocess message</p>
</li>

<li>
<p>Generation of a user-defined event</p>
</li>
</ul>

<p>Processing of these events may occur synchronously via polling for event notifications or asynchronously via a software interrupt mechanism. Existing practice for this model is well established for traditional proprietary realtime operating systems, realtime executives, and realtime extended POSIX-like systems.</p>

<p>A contrasting model is that of "cooperating sequential processes" where each process handles a single priority of events via polling. Each process blocks while waiting for events, and each process depends on the preemptive, priority-based process scheduling mechanism to arbitrate between events of different priority that need to be processed concurrently. Existing practice for this model is also well established for small realtime executives that typically execute in an unprotected physical address space, but it is just emerging in the context of a fuller function operating system with multiple virtual address spaces.</p>

<p>It could be argued that the cooperating sequential process model, and the facilities supported by the POSIX Threads Extension, obviate a software interrupt model. But, even with the cooperating sequential process model, the need has been recognized for a software interrupt model to handle exceptional conditions and process aborting, so the mechanism must be supported in any case. Furthermore, it is not the purview of IEEE Std 1003.1-2001 to attempt to convince realtime practitioners that their current application models based on software interrupts are "broken" and should be replaced by the cooperating sequential process model. Rather, it is the charter of IEEE Std 1003.1-2001 to provide standard extensions to mechanisms that support existing realtime practice.</p>
</li>

<li>
<p>Requirements</p>

<p>This section discusses the following realtime application requirements for asynchronous event notification:</p>

<ul>
<li>
<p>Reliable delivery of asynchronous event notification</p>

<p>The event notification mechanism guarantees delivery of an event notification. Asynchronous operations (such as asynchronous I/O and timers) that complete significantly after they are invoked have to guarantee that delivery of the event notification can occur at the time of completion.</p>
</li>

<li>
<p>Prioritized handling of asynchronous event notifications</p>

<p>The event notification mechanism supports the assigning of a user function as an event notification handler. Furthermore, the mechanism supports the preemption of an event handler function by a higher priority event notification and supports the selection of the highest priority pending event notification when multiple notifications (of different priority) are pending simultaneously.</p>

<p>The model here is based on hardware interrupts. Asynchronous event handling allows the application to ensure that time-critical events are immediately processed when delivered, without the indeterminism of being at a random location within a polling loop. Use of handler priority allows the specification of how handlers are interrupted by other higher priority handlers.</p>
</li>

<li>
<p>Differentiation between multiple occurrences of event notifications of the same type</p>

<p>The event notification mechanism passes an application-defined value to the event handler function. This value can be used for a variety of purposes, such as enabling the application to identify which of several possible events of the same type (for example, timer expirations) has occurred.</p>
</li>

<li>
<p>Polled reception of asynchronous event notifications</p>

<p>The event notification mechanism supports blocking and non-blocking polls for asynchronous event notification.</p>

<p>The polled mode of operation is often preferred over the interrupt mode by those practitioners accustomed to this model. Providing support for this model facilitates the porting of applications based on this model to POSIX.1b conforming systems.</p>
</li>

<li>
<p>Deterministic response to asynchronous event notifications</p>

<p>The event notification mechanism does not preclude implementations that provide deterministic event dispatch latency, and it minimizes the number of system calls needed to use the event facilities during realtime processing.</p>
</li>
</ul>
</li>

<li>
<p>Rationale for Extension</p>

<p>POSIX.1 signals have many of the characteristics necessary to support the asynchronous handling of event notifications, and the Realtime Signals Extension addresses the following deficiencies in the POSIX.1 signal mechanism:</p>

<ul>
<li>
<p>Signals do not support reliable delivery of event notification. Subsequent occurrences of a pending signal are not guaranteed to be delivered.</p>
</li>

<li>
<p>Signals do not support prioritized delivery of event notifications. The order of signal delivery when multiple unblocked signals are pending is undefined.</p>
</li>

<li>
<p>Signals do not support the differentiation between multiple signals of the same type.</p>
</li>
</ul>
</li>
</ul>
<h5><a name="tag_03_02_08_06"></a>Asynchronous I/O</h5>

<p>Many applications need to interact with the I/O subsystem in an asynchronous manner. The asynchronous I/O mechanism provides the ability to overlap application processing and I/O operations initiated by the application. The asynchronous I/O mechanism allows a single process to perform I/O simultaneously to a single file multiple times or to multiple files multiple times.</p>

<h5><a name="tag_03_02_08_07"></a>Overview</h5>

<p>Asynchronous I/O operations proceed in logical parallel with the processing done by the application after the asynchronous I/O has been initiated. Other than this difference, asynchronous I/O behaves similarly to normal I/O using <a href="../functions/read.html"><i>read</i>()</a>, <a href="../functions/write.html"><i>write</i>()</a>, <a href="../functions/lseek.html"><i>lseek</i>()</a>, and <a href="../functions/fsync.html"><i>fsync</i>()</a>. The effect of issuing an asynchronous I/O request is as if a separate thread of execution were to perform atomically the implied <a href="../functions/lseek.html"><i>lseek</i>()</a> operation, if any, and then the requested I/O operation (either <a href="../functions/read.html"><i>read</i>()</a>, <a href="../functions/write.html"><i>write</i>()</a>, or <a href="../functions/fsync.html"><i>fsync</i>()</a>). There is no seek implied with a call to <a href="../functions/aio_fsync.html"><i>aio_fsync</i>()</a>. Concurrent asynchronous operations and synchronous operations applied to the same file update the file as if the I/O operations had proceeded serially.</p>

<p>When asynchronous I/O completes, a signal can be delivered to the application to indicate the completion of the I/O. This signal can be used to indicate that buffers and control blocks used for asynchronous I/O can be reused. Signal delivery is not required for an asynchronous operation and may be turned off on a per-operation basis by the application. Signals may also be synchronously polled using <a href="../functions/aio_suspend.html"><i>aio_suspend</i>()</a>, <a href="../functions/sigtimedwait.html"><i>sigtimedwait</i>()</a>, or <a href="../functions/sigwaitinfo.html"><i>sigwaitinfo</i>()</a>.</p>

<p>Normal I/O has a return value and an error status associated with it. Asynchronous I/O returns a value and an error status when the operation is first submitted, but that only relates to whether the operation was successfully queued up for servicing. The I/O operation itself also has a return status and an error value. To allow the application to retrieve the return status and the error value, functions are provided that, given the address of an asynchronous I/O control block, yield the return and error status associated with the operation. Until an asynchronous I/O operation is done, its error status is [EINPROGRESS]. Thus, an application can poll for completion of an asynchronous I/O operation by waiting for the error status to become equal to a value other than [EINPROGRESS]. The return status of an asynchronous I/O operation is undefined so long as the error status is equal to [EINPROGRESS].</p>
<p>Storage for asynchronous operation return and error status may be limited. Submission of asynchronous I/O operations may fail if this storage is exceeded. When an application retrieves the return status of a given asynchronous operation, therefore, any system-maintained storage used for this status and the error status may be reclaimed for use by other asynchronous operations.</p>

<p>Asynchronous I/O can be performed on file descriptors that have been enabled for POSIX.1b synchronized I/O. In this case, the I/O operation still occurs asynchronously, as defined herein; however, the asynchronous I/O operation in this case is not completed until the I/O has reached either the state of synchronized I/O data integrity completion or synchronized I/O file integrity completion, depending on the sort of synchronized I/O that is enabled on the file descriptor.</p>

<h5><a name="tag_03_02_08_08"></a>Models</h5>

<p>Three models illustrate the use of asynchronous I/O: a journalization model, a data acquisition model, and a model of the use of asynchronous I/O in supercomputing applications.</p>

<ul>
<li>
<p>Journalization Model</p>

<p>Many realtime applications perform low-priority journalizing functions. Journalizing requires that logging records be queued for output without blocking the initiating process.</p>
</li>

<li>
<p>Data Acquisition Model</p>

<p>A data acquisition process may also serve as a model. The process has two or more channels delivering intermittent data that must be read within a certain time. The process issues one asynchronous read on each channel. When one of the channels needs data collection, the process reads the data and posts it through an asynchronous write to secondary memory for future processing.</p>
</li>

<li>
<p>Supercomputing Model</p>

<p>The supercomputing community has used asynchronous I/O much like that specified in POSIX.1 for many years. This community requires the ability to perform multiple I/O operations to multiple devices with a minimal number of entries to "the system"; each entry to "the system" provokes a major delay in operations when compared to the normal progress made by the application. This existing practice motivated the use of combined <a href="../functions/lseek.html"><i>lseek</i>()</a> and <a href="../functions/read.html"><i>read</i>()</a> or <a href="../functions/write.html"><i>write</i>()</a> calls, as well as the <a href="../functions/lio_listio.html"><i>lio_listio</i>()</a> call. Another common practice is to disable signal notification for I/O completion, and simply poll for I/O completion at some interval by which the I/O should be completed. Likewise, interfaces like <a href="../functions/aio_cancel.html"><i>aio_cancel</i>()</a> have been in successful commercial use for many years. Note also that an underlying implementation of asynchronous I/O will require the ability, at least internally, to cancel outstanding asynchronous I/O, at least when the process exits. (Consider an asynchronous read from a terminal, when the process intends to exit immediately.)</p>
</li>
</ul>
<h5><a name="tag_03_02_08_09"></a>Requirements</h5>

<p>Asynchronous input and output for realtime implementations have these requirements:</p>

<ul>
<li>
<p>The ability to queue multiple asynchronous read and write operations to a single open instance. Both sequential and random access should be supported.</p>
</li>

<li>
<p>The ability to queue asynchronous read and write operations to multiple open instances.</p>
</li>

<li>
<p>The ability to obtain completion status information by polling and/or asynchronous event notification.</p>
</li>

<li>
<p>Asynchronous event notification on asynchronous I/O completion is optional.</p>
</li>

<li>
<p>It has to be possible for the application to associate the event with the <i>aiocbp</i> for the operation that generated the event.</p>
</li>

<li>
<p>The ability to cancel queued requests.</p>
</li>

<li>
<p>The ability to wait upon asynchronous I/O completion in conjunction with other types of events.</p>
</li>

<li>
<p>The ability to accept an <a href="../functions/aio_read.html"><i>aio_read</i>()</a> and an <a href="../functions/aio_cancel.html"><i>aio_cancel</i>()</a> for a device that accepts a <a href="../functions/read.html"><i>read</i>()</a>, and the ability to accept an <a href="../functions/aio_write.html"><i>aio_write</i>()</a> and an <a href="../functions/aio_cancel.html"><i>aio_cancel</i>()</a> for a device that accepts a <a href="../functions/write.html"><i>write</i>()</a>. This does not imply that the operation is asynchronous.</p>
</li>
</ul>

<h5><a name="tag_03_02_08_10"></a>Standardization Issues</h5>

<p>The following issues are addressed by the standardization of asynchronous I/O:</p>

<ul>
<li>
<p>Rationale for New Interface</p>

<p>Non-blocking I/O does not satisfy the needs of either realtime or high-performance computing models; these models require that a process overlap program execution and I/O processing. Realtime applications will often make use of direct I/O to or from the address space of the process, or require synchronized (unbuffered) I/O; they also require the ability to overlap this I/O with other computation. In addition, asynchronous I/O allows an application to keep a device busy at all times, possibly achieving greater throughput. Supercomputing and database architectures will often have specialized hardware that can provide true asynchrony underlying the logical asynchrony provided by this interface. In addition, asynchronous I/O should be supported by all types of files and devices in the same manner.</p>
</li>

<li>
<p>Effect of Buffering</p>

<p>If asynchronous I/O is performed on a file that is buffered prior to being actually written to the device, it is possible that asynchronous I/O will offer no performance advantage over normal I/O; the cycles <i>stolen</i> to perform the asynchronous I/O will be taken away from the running process and the I/O will occur at interrupt time. This potential lack of gain in performance in no way obviates the need for asynchronous I/O by realtime applications, which very often will use specialized hardware support, multiple processors, and/or unbuffered, synchronized I/O.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_11"></a>Memory Management</h5>

<p>All memory management and shared memory definitions are located in the <a href="../basedefs/sys/mman.h.html"><i><sys/mman.h></i></a> header. This is for alignment with historical practice.</p>

<h5><a name="tag_03_02_08_12"></a>Memory Locking Functions</h5>

<p>This portion of the rationale presents models, requirements, and standardization issues relevant to process memory locking.</p>

<ul>
<li>
<p>Models</p>

<p>Realtime systems that conform to IEEE Std 1003.1-2001 are expected (and desired) to be supported on systems with demand-paged virtual memory management, non-paged swapping memory management, and physical memory systems with no memory management hardware. The general case, however, is the demand-paged, virtual memory system with each POSIX process running in a virtual address space. Note that this includes architectures where each process resides in its own virtual address space and architectures where the address space of each process is only a portion of a larger global virtual address space.</p>

<p>The concept of memory locking is introduced to eliminate the indeterminacy introduced by paging and swapping, and to support an upper bound on the time required to access the memory mapped into the address space of a process. Ideally, this upper bound will be the same as the time required for the processor to access "main memory", including any address translation and cache miss overheads. But some implementations-primarily on mainframes-will not actually force locked pages to be loaded and held resident in main memory. Rather, they will handle locked pages so that accesses to these pages will meet the performance metrics for locked process memory in the implementation. Also, although it is not, for example, the intention that this interface, as specified, be used to lock process memory into "cache", it is conceivable that an implementation could support a large static RAM memory and define this as "main memory" and use a large[r] dynamic RAM as "backing store". These interfaces could then be interpreted as supporting the locking of process memory into the static RAM. Support for multiple levels of backing store would require extensions to these interfaces.</p>

<p>Implementations may also use memory locking to guarantee a fixed translation between virtual and physical addresses where such is beneficial to improving determinacy for direct-to/from-process input/output. IEEE Std 1003.1-2001 does not guarantee to the application that the virtual-to-physical address translations, if such exist, are fixed, because such behavior would not be implementable on all architectures on which implementations of IEEE Std 1003.1-2001 are expected. But IEEE Std 1003.1-2001 does mandate that an implementation define, for the benefit of potential users, whether or not locking guarantees fixed translations.</p>

<p>Memory locking is defined with respect to the address space of a process. Only the pages mapped into the address space of a process may be locked by the process, and when the pages are no longer mapped into the address space-for whatever reason-the locks established with respect to that address space are removed. Shared memory areas warrant special mention, as they may be mapped into more than one address space or mapped more than once into the address space of a process; locks may be established on pages within these areas with respect to several of these mappings. In such a case, the lock state of the underlying physical pages is the logical OR of the lock state with respect to each of the mappings. Only when all such locks have been removed are the shared pages considered unlocked.</p>

<p>In recognition of the page granularity of Memory Management Units (MMU), and in order to support locking of ranges of address space, memory locking is defined in terms of "page" granularity. That is, for the interfaces that support an address and size specification for the region to be locked, the address must be on a page boundary, and all pages mapped by the specified range are locked, if valid. This means that the length is implicitly rounded up to a multiple of the page size. The page size is implementation-defined and is available to applications as a compile-time symbolic constant or at runtime via <a href="../functions/sysconf.html"><i>sysconf</i>()</a>.</p>
<p>A "real memory" POSIX.1b implementation that has no MMU could elect not to support these interfaces, returning [ENOSYS]. But an application could easily interpret this as meaning that the implementation would unconditionally page or swap the application when such is not the case. It is the intention of IEEE Std 1003.1-2001 that such a system could define these interfaces as "NO-OPs", returning success without actually performing any function except for mandated argument checking.</p>
</li>

<li>
<p>Requirements</p>

<p>For realtime applications, memory locking is generally considered to be required as part of application initialization. This locking is performed after an application has been loaded (that is, <i>exec</i>'d) and the program remains locked for its entire lifetime. But to support applications that undergo major mode changes where, in one mode, locking is required, but in another it is not, the specified interfaces allow repeated locking and unlocking of memory within the lifetime of a process.</p>

<p>When a realtime application locks its address space, it should not be necessary for the application to then "touch" all of the pages in the address space to guarantee that they are resident or else suffer potential paging delays the first time the page is referenced. Thus, IEEE Std 1003.1-2001 requires that the pages locked by the specified interfaces be resident when the locking functions return successfully.</p>

<p>Many architectures support system-managed stacks that grow automatically when the current extent of the stack is exceeded. A realtime application has a requirement to be able to "preallocate" sufficient stack space and lock it down so that it will not suffer page faults to grow the stack during critical realtime operation. There was no consensus on a portable way to specify how much stack space is needed, so IEEE Std 1003.1-2001 supports no specific interface for preallocating stack space. But an application can portably lock down a specific amount of stack space by specifying MCL_FUTURE in a call to <a href="../functions/mlockall.html"><i>mlockall</i>()</a> and then calling a dummy function that declares an automatic array of the desired size.</p>
<p>Memory locking for realtime applications is also generally considered to be an "all or nothing" proposition. That is, the
entire process, or none, is locked down. But, for applications that have well-defined sections that need to be locked and others
that do not, IEEE Std 1003.1-2001 supports an optional set of interfaces to lock or unlock a range of process addresses.
Reasons for locking down a specific range include:</p>

<ul>
<li>
<p>An asynchronous event handler function that must respond to external events in a deterministic manner such that page faults
cannot be tolerated</p>
</li>

<li>
<p>An input/output "buffer" area that is the target for direct-to-process I/O, and the overhead of implicit locking and unlocking
for each I/O call cannot be tolerated</p>
</li>
</ul>

<p>Finally, locking is generally viewed as an "application-wide" function. That is, the application is globally aware of which
regions are locked and which are not over time. This is in contrast to a function that is used temporarily within a "third party"
library routine whose function is unknown to the application, and therefore must have no "side effects". The specified
interfaces, therefore, do not support "lock stacking" or "lock nesting" within a process. But, for pages that are shared
between processes or mapped more than once into a process address space, "lock stacking" is essentially mandated by the
requirement that unlocking of pages that are mapped by more than one process or more than once by the same process does not affect
locks established on the other mappings.</p>

<p>There was some support for "lock stacking" so that locking could be transparently used in functions or opaque modules. But the
consensus was not to burden all implementations with lock stacking (and reference counting), and an implementation option was
proposed. There were strong objections to the option because applications would have to support both options in order to remain
portable. The consensus was to eliminate lock stacking altogether, primarily through overwhelming support for the System V
"m[un]lock[all]" interface on which IEEE Std 1003.1-2001 is now based.</p>
<p>Locks are not inherited across <a href="../functions/fork.html"><i>fork</i>()</a>s because some implementations implement <a
href="../functions/fork.html"><i>fork</i>()</a> by creating new address spaces for the child. In such an implementation, requiring
locks to be inherited would lead to new situations in which a fork would fail due to the inability of the system to lock sufficient
memory to lock both the parent and the child. The consensus was that there was no benefit to such inheritance. Note that this does
not mean that locks are removed when, for instance, a thread is created in the same address space.</p>

<p>Similarly, locks are not inherited across <i>exec</i> because some implementations implement <i>exec</i> by unmapping all of the
pages in the address space (which, by definition, removes the locks on these pages), and mapping in pages of the <i>exec</i>'d
image. In such an implementation, requiring locks to be inherited would lead to new situations in which <i>exec</i> would fail.
This failure would be very cumbersome to detect in time to report to the calling process, and no appropriate mechanism exists for
informing the <i>exec</i>'d process of its status.</p>

<p>It was determined that, if the newly loaded application required locking, it was the responsibility of that application to
establish the locks. This is also in keeping with the general view that it is the responsibility of the application to be aware of
all locks that are established.</p>

<p>There was one request to allow (not mandate) locks to be inherited across <a href="../functions/fork.html"><i>fork</i>()</a>,
and a request for a flag, MCL_INHERIT, that would specify inheritance of memory locks across <i>exec</i>s. Given the difficulties
raised by this and the general lack of support for the feature in IEEE Std 1003.1-2001, it was not added.
IEEE Std 1003.1-2001 does not preclude an implementation from providing this feature for administrative purposes, such as
a "run" command that will lock down and execute a specified application. Additionally, the rationale for the objection equated <a
href="../functions/fork.html"><i>fork</i>()</a> with creating a thread in the address space. IEEE Std 1003.1-2001 does
not mandate releasing locks when creating additional threads in an existing process.</p>
</li>
<li>
<p>Standardization Issues</p>

<p>One goal of IEEE Std 1003.1-2001 is to define a set of primitives that provide the necessary functionality for
realtime applications, with consideration for the needs of other application domains where such were identified, which is based to
the extent possible on existing industry practice.</p>

<p>The Memory Locking option is required by many realtime applications to tune performance. Such a facility is accomplished by
placing constraints on the virtual memory system to limit paging of the process or of critical sections of the process.
This facility should not be used by most non-realtime applications.</p>

<p>Optional features provided in IEEE Std 1003.1-2001 allow applications to lock selected address ranges with the caveat
that the process is responsible for being aware of the page granularity of locking and the unnested nature of the locks.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_13"></a>Mapped Files Functions</h5>

<p>The Memory Mapped Files option provides a mechanism that allows a process to access files by directly incorporating file data
into its address space. Once a file is "mapped" into a process address space, the data can be manipulated by instructions as
memory. The use of mapped files can significantly reduce I/O data movement since file data does not have to be copied into process
data buffers as in <a href="../functions/read.html"><i>read</i>()</a> and <a href="../functions/write.html"><i>write</i>()</a>. If
more than one process maps a file, its contents are shared among them. This provides a low overhead mechanism by which processes
can synchronize and communicate.</p>
<ul>
<li>
<p>Historical Perspective</p>

<p>Realtime applications have historically been implemented using a collection of cooperating processes or tasks. In early systems,
these processes ran on bare hardware (that is, without an operating system) with no memory relocation or protection. The
application paradigms that arose from this environment involve the sharing of data between the processes.</p>

<p>When realtime systems were implemented on top of vendor-supplied operating systems, the paradigm and performance benefits of
direct access to data by multiple processes were still deemed necessary. As a result, operating systems that claim to support
realtime applications must support the shared memory paradigm.</p>

<p>Additionally, a number of realtime systems provide the ability to map specific sections of the physical address space into the
address space of a process. This ability is required if an application is to obtain direct access to memory locations that have
specific properties (for example, refresh buffers or display devices, dual ported memory locations, DMA target locations). The use
of this ability is common enough to warrant some degree of standardization of its interface. This ability overlaps the general
paradigm of shared memory in that, in both instances, common global objects are made addressable by individual processes or
tasks.</p>

<p>Finally, a number of systems also provide the ability to map process addresses to files. This provides both a general means of
sharing persistent objects, and a means of using files in a manner that optimizes memory and swapping space usage.</p>

<p>Simple shared memory is clearly a special case of the more general file mapping capability. In addition, there is relatively
widespread agreement on and implementation of the file mapping interface. In these systems, many different types of objects can be
mapped (for example, files, memory, devices, and so on) using the same mapping interfaces. This approach both minimizes interface
proliferation and maximizes the generality of programs using the mapping interfaces.</p>
</li>
<li>
<p>Memory Mapped Files Usage</p>

<p>A memory object can be concurrently mapped into the address space of one or more processes. The <a href=
"../functions/mmap.html"><i>mmap</i>()</a> and <a href="../functions/munmap.html"><i>munmap</i>()</a> functions allow a process to
manipulate its address space by mapping portions of memory objects into it and removing them from it. When multiple processes map
the same memory object, they can share access to the underlying data. Implementations may restrict the size and alignment of
mappings to be on <i>page</i>-size boundaries. The page size, in bytes, is the value of the system-configurable variable
{PAGESIZE}, typically accessed by calling <a href="../functions/sysconf.html"><i>sysconf</i>()</a> with a <i>name</i> argument of
_SC_PAGESIZE. If an implementation has no restrictions on size or alignment, it may specify a 1-byte page size.</p>
<p>To map memory, a process first opens a memory object. The <a href="../functions/ftruncate.html"><i>ftruncate</i>()</a> function
can be used to contract or extend the size of the memory object even when the object is currently mapped. If the memory object is
extended, the contents of the extended areas are zeros.</p>

<p>After opening a memory object, the application maps the object into its address space using the <a href=
"../functions/mmap.html"><i>mmap</i>()</a> function call. Once a mapping has been established, it remains mapped until unmapped
with <a href="../functions/munmap.html"><i>munmap</i>()</a>, even if the memory object is closed. The <a href=
"../functions/mprotect.html"><i>mprotect</i>()</a> function can be used to change the memory protections initially established by
<a href="../functions/mmap.html"><i>mmap</i>()</a>.</p>
<p>A <a href="../functions/close.html"><i>close</i>()</a> of the file descriptor, while invalidating the file descriptor itself,
does not unmap any mappings established for the memory object. The address space, including all mapped regions, is inherited on <a
href="../functions/fork.html"><i>fork</i>()</a>. The entire address space is unmapped on process termination or by successful calls
to any of the <i>exec</i> family of functions.</p>

<p>The <a href="../functions/msync.html"><i>msync</i>()</a> function is used to force mapped file data to permanent storage.</p>
</li>
<li>
<p>Effects on Other Functions</p>

<p>When the Memory Mapped Files option is supported, the operation of the <a href="../functions/open.html"><i>open</i>()</a>, <a
href="../functions/creat.html"><i>creat</i>()</a>, and <a href="../functions/unlink.html"><i>unlink</i>()</a> functions is a
natural result of using the file system name space to map the global names for memory objects.</p>

<p>The <a href="../functions/ftruncate.html"><i>ftruncate</i>()</a> function can be used to set the length of a sharable memory
object.</p>

<p>The meaning of <a href="../functions/stat.html"><i>stat</i>()</a> fields other than the size and protection information is
undefined on implementations where memory objects are not implemented using regular files. When regular files are used, the times
reflect when the implementation updated the file image of the data, not when a process updated the data in memory.</p>

<p>The operations of <a href="../functions/fdopen.html"><i>fdopen</i>()</a>, <a href="../functions/write.html"><i>write</i>()</a>,
<a href="../functions/read.html"><i>read</i>()</a>, and <a href="../functions/lseek.html"><i>lseek</i>()</a> were made unspecified
for objects opened with <a href="../functions/shm_open.html"><i>shm_open</i>()</a>, so that implementations that did not implement
memory objects as regular files would not have to support the operation of these functions on shared memory objects.</p>

<p>The behavior of memory objects with respect to <a href="../functions/close.html"><i>close</i>()</a>, <a href=
"../functions/dup.html"><i>dup</i>()</a>, <a href="../functions/dup2.html"><i>dup2</i>()</a>, <a href=
"../functions/open.html"><i>open</i>()</a>, <a href=
"../functions/fork.html"><i>fork</i>()</a>, <a href="../functions/_exit.html"><i>_exit</i>()</a>, and the <i>exec</i> family of
functions is the same as the behavior of the existing practice of the <a href="../functions/mmap.html"><i>mmap</i>()</a>
function.</p>
<p>A memory object can still be referenced after a close. That is, any mappings made to the file are still in effect, and reads and
writes that are made to those mappings are still valid and are shared with other processes that have the same mapping. Likewise,
the memory object can still be used if any references remain after its name(s) have been deleted. Any references that remain after
a close must not appear to the application as file descriptors.</p>

<p>This is existing practice for <a href="../functions/mmap.html"><i>mmap</i>()</a> and <a href=
"../functions/close.html"><i>close</i>()</a>. In addition, there are already mappings present (text, data, stack) that do not have
open file descriptors. The text mapping in particular is considered a reference to the file containing the text. The desire was to
treat all mappings by the process uniformly. Also, many modern implementations use <a href=
"../functions/mmap.html"><i>mmap</i>()</a> to implement shared libraries, and it would not be desirable to keep file descriptors
for each of the many libraries an application can use. It was felt there were many other existing programs that used this behavior
to free a file descriptor, and thus IEEE Std 1003.1-2001 could not forbid it and still claim to be using existing
practice.</p>
<p>For implementations that implement memory objects using memory only, memory objects will retain the memory allocated to the file
after the last close and will use that same memory on the next open. Note that closing the memory object is not the same as
deleting the name, since the memory object is still defined in the memory object name space.</p>

<p>The locks of <a href="../functions/fcntl.html"><i>fcntl</i>()</a> do not block any read or write operation, including read or
write access to shared memory or mapped files. In addition, implementations that only support shared memory objects should not be
required to implement record locks. The reference to <a href="../functions/fcntl.html"><i>fcntl</i>()</a> is added to make this
point explicit. The other <a href="../functions/fcntl.html"><i>fcntl</i>()</a> commands are useful with shared memory
objects.</p>

<p>The size of pages that mapping hardware may be able to support may be a configurable value, or it may change based on hardware
implementations. The addition of the _SC_PAGESIZE parameter to the <a href="../functions/sysconf.html"><i>sysconf</i>()</a>
function is provided for determining the mapping page size at runtime.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_14"></a>Shared Memory Functions</h5>

<p>Implementations may support the Shared Memory Objects option without supporting a general Memory Mapped Files option. Shared
memory objects are named regions of storage that may be independent of the file system and can be mapped into the address space of
one or more processes to allow them to share the associated memory.</p>
<ul>
<li>
<p>Requirements</p>

<p>Shared memory is used to share data among several processes, each potentially running at different priority levels, responding
to different inputs, or performing separate tasks. Shared memory is not simply a means of providing common access to data; it
provides the fastest possible communication between the processes. With one memory write operation, a process can pass information
to as many processes as have the memory region mapped.</p>

<p>As a result, shared memory provides a mechanism that can be used for all other interprocess communication facilities. It may
also be used by an application for implementing more sophisticated mechanisms than semaphores and message queues.</p>

<p>The need for a shared memory interface is obvious for virtual memory systems, where the operating system is directly preventing
processes from accessing each other's data. However, in unprotected systems, such as those found in some embedded controllers, a
shared memory interface is needed to provide a portable mechanism to allocate a region of memory to be shared and then to
communicate the address of that region to other processes.</p>

<p>This, then, provides the minimum functionality that a shared memory interface must have in order to support realtime
applications: to allocate and name an object to be mapped into memory for potential sharing ( <a href=
"../functions/open.html"><i>open</i>()</a> or <a href="../functions/shm_open.html"><i>shm_open</i>()</a>), and to make the memory
object available within the address space of a process ( <a href="../functions/mmap.html"><i>mmap</i>()</a>). To complete the
interface, a mechanism to release the claim of a process on a shared memory object ( <a href=
"../functions/munmap.html"><i>munmap</i>()</a>) is also needed, as well as a mechanism for deleting the name of a sharable object
that was previously created ( <a href="../functions/unlink.html"><i>unlink</i>()</a> or <a href=
"../functions/shm_unlink.html"><i>shm_unlink</i>()</a>).</p>

<p>After a mapping has been established, an implementation should not have to provide services to maintain that mapping. All memory
writes into that area will appear immediately in the memory mapping of that region by any other processes.</p>

<p>Thus, requirements include:</p>
<ul>
<li>
<p>Support creation of sharable memory objects and the mapping of these objects into the address space of a process.</p>
</li>

<li>
<p>Sharable memory objects should be accessed by global names accessible from all processes.</p>
</li>

<li>
<p>Support the mapping of specific sections of physical address space (such as a memory mapped device) into the address space of a
process. This should not be done by the process specifying the actual address, but again by an implementation-defined global name
(such as a special device name) dedicated to this purpose.</p>
</li>

<li>
<p>Support the mapping of discrete portions of these memory objects.</p>
</li>

<li>
<p>Support for minimum hardware configurations that contain no physical media on which to store shared memory contents
permanently.</p>
</li>

<li>
<p>The ability to preallocate the entire shared memory region so that minimum hardware configurations without virtual memory
support can guarantee contiguous space.</p>
</li>

<li>
<p>The maximizing of performance by not requiring functionality that would require implementation interaction beyond creating the
shared memory area and returning the mapping.</p>
</li>
</ul>
<p>Note that the above requirements do not preclude:</p>

<ul>
<li>
<p>The sharable memory object from being implemented using actual files on an actual file system.</p>
</li>

<li>
<p>The global name that is accessible from all processes being restricted to a file system area that is dedicated to handling
shared memory.</p>
</li>

<li>
<p>An implementation not providing implementation-defined global names for the purpose of physical address mapping.</p>
</li>
</ul>
</li>
<li>
<p>Shared Memory Objects Usage</p>

<p>If the Shared Memory Objects option is supported, a shared memory object may be created, or opened if it already exists, with
the <a href="../functions/shm_open.html"><i>shm_open</i>()</a> function. If the shared memory object is created, it has a length of
zero. The <a href="../functions/ftruncate.html"><i>ftruncate</i>()</a> function can be used to set the size of the shared memory
object after creation. The <a href="../functions/shm_unlink.html"><i>shm_unlink</i>()</a> function removes the name for a shared
memory object created by <a href="../functions/shm_open.html"><i>shm_open</i>()</a>.</p>
</li>

<li>
<p>Shared Memory Overview</p>

<p>The shared memory facility defined by IEEE Std 1003.1-2001 usually results in memory locations being added to the
address space of the process. The implementation returns the address of the new space to the application by means of a pointer.
This works well in languages like C. However, in languages without pointer types it will not work. In the bindings for such a
language, either a special COMMON section will need to be defined (which is unlikely), or the binding will have to allow existing
structures to be mapped. The implementation will likely have to place restrictions on the size and alignment of such structures or
will have to map a suitable region of the address space of the process into the memory object, and thus into other processes. These
are issues for that particular language binding. For IEEE Std 1003.1-2001, however, the practice will not be forbidden,
merely undefined.</p>

<p>Two potentially different name spaces are used for naming objects that may be mapped into process address spaces. When the
Memory Mapped Files option is supported, files may be accessed via <a href="../functions/open.html"><i>open</i>()</a>. When the
Shared Memory Objects option is supported, sharable memory objects that might not be files may be accessed via the <a href=
"../functions/shm_open.html"><i>shm_open</i>()</a> function. These options are not mutually exclusive.</p>
<p>Some implementations supporting the Shared Memory Objects option may choose to implement the shared memory object name space as
part of the file system name space. There are several reasons for this:</p>

<ul>
<li>
<p>It allows applications to prevent name conflicts by use of the directory structure.</p>
</li>

<li>
<p>It uses an existing mechanism for accessing global objects and prevents the creation of a new mechanism for naming global
objects.</p>
</li>
</ul>

<p>In such implementations, memory objects can be implemented using regular files, if that is what the implementation chooses. The
<a href="../functions/shm_open.html"><i>shm_open</i>()</a> function can be implemented as an <a href=
"../functions/open.html"><i>open</i>()</a> call in a fixed directory followed by a call to <a href=
"../functions/fcntl.html"><i>fcntl</i>()</a> to set FD_CLOEXEC. The <a href="../functions/shm_unlink.html"><i>shm_unlink</i>()</a>
function can be implemented as an <a href="../functions/unlink.html"><i>unlink</i>()</a> call.</p>
<p>On the other hand, it is also expected that small embedded systems that support the Shared Memory Objects option may wish to
implement shared memory without having any file systems present. In this case, the implementations may choose to use a simple
string-valued name space for shared memory regions. The <a href="../functions/shm_open.html"><i>shm_open</i>()</a> function permits
either type of implementation.</p>

<p>Some implementations have hardware that supports protection of mapped data from certain classes of access and some do not.
Systems that supply this functionality can support the Memory Protection option.</p>

<p>Some implementations restrict size, alignment, and protections to be on <i>page</i>-size boundaries. If an implementation has no
restrictions on size or alignment, it may specify a 1-byte page size. Applications on implementations that do support larger pages
must be cognizant of the page size since this is the alignment and protection boundary.</p>

<p>Simple embedded implementations may have a 1-byte page size and only support the Shared Memory Objects option. This provides
simple shared memory between processes without requiring mapping hardware.</p>

<p>IEEE Std 1003.1-2001 specifically allows a memory object to remain referenced after a close because that is existing
practice for the <a href="../functions/mmap.html"><i>mmap</i>()</a> function.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_15"></a>Typed Memory Functions</h5>

<p>Implementations may support the Typed Memory Objects option without supporting either the Shared Memory option or the Memory
Mapped Files option. Typed memory objects are pools of specialized storage, different from the main memory resource normally used
by a processor to hold code and data, that can be mapped into the address space of one or more processes.</p>

<ul>
<li>
<p>Model</p>

<p>Realtime systems conforming to one of the POSIX.13 realtime profiles are expected (and desired) to be supported on systems with
more than one type or pool of memory (for example, SRAM, DRAM, ROM, EPROM, EEPROM), where each type or pool of memory may be
accessible by one or more processors via one or more busses (ports). Memory mapped files, shared memory objects, and the
language-specific storage allocation operators ( <a href="../functions/malloc.html"><i>malloc</i>()</a> for the ISO C
standard, <i>new</i> for ISO Ada) fail to provide application program interfaces versatile enough to allow applications to control
their utilization of such diverse memory resources. The typed memory interfaces <a href=
"../functions/posix_typed_mem_open.html"><i>posix_typed_mem_open</i>()</a>, <a href=
"../functions/posix_mem_offset.html"><i>posix_mem_offset</i>()</a>, <a href=
"../functions/posix_typed_mem_get_info.html"><i>posix_typed_mem_get_info</i>()</a>, <a href=
"../functions/mmap.html"><i>mmap</i>()</a>, and <a href="../functions/munmap.html"><i>munmap</i>()</a> defined herein support the
model of typed memory described below.</p>

<p>For purposes of this model, a system comprises several processors (for example, P<sub><small>1</small></sub> and
P<sub><small>2</small></sub>), several physical memory pools (for example, M<sub><small>1</small></sub>,
M<sub><small>2</small></sub>, M<sub><small>2a</small></sub>, M<sub><small>2b</small></sub>, M<sub><small>3</small></sub>,
M<sub><small>4</small></sub>, and M<sub><small>5</small></sub>), and several busses or "ports" (for example,
B<sub><small>1</small></sub>, B<sub><small>2</small></sub>, B<sub><small>3</small></sub>, and B<sub><small>4</small></sub>)
interconnecting the various processors and memory pools in some system-specific way. Notice that some memory pools may be contained
in others (for example, M<sub><small>2a</small></sub> and M<sub><small>2b</small></sub> are contained in
M<sub><small>2</small></sub>).</p>
<p><a href="#tagfcjh_1">Example of a System with Typed Memory</a> shows an example of such a model. In a system like this, an
application should be able to perform the following operations:</p>

<dl compact>
<dt></dt>
<dd><img src=".././Figures/b-1.gif"></dd>
</dl>

<center><b><a name="tagfcjh_1"></a> Figure: Example of a System with Typed Memory</b></center>

<ul>
<li>
<p>Typed Memory Allocation</p>

<p>An application should be able to allocate memory dynamically from the desired pool using the desired bus, and map it into a
process' address space. For example, processor P<sub><small>1</small></sub> can allocate some portion of memory pool
M<sub><small>1</small></sub> through port B<sub><small>1</small></sub>, treating all unmapped subareas of
M<sub><small>1</small></sub> as a heap-storage resource from which memory may be allocated. This portion of memory is mapped into
the process' address space, and subsequently deallocated when unmapped from all processes.</p>
</li>
<li>
<p>Using the Same Storage Region from Different Busses</p>

<p>An application process with a mapped region of storage that is accessed from one bus should be able to map that same storage
area at another address (subject to page size restrictions detailed in <a href="../functions/mmap.html"><i>mmap</i>()</a>), to
allow it to be accessed from another bus. For example, processor P<sub><small>1</small></sub> may wish to access the same region of
memory pool M<sub><small>2b</small></sub> both through ports B<sub><small>1</small></sub> and B<sub><small>2</small></sub>.</p>
</li>

<li>
<p>Sharing Typed Memory Regions</p>

<p>Several application processes running on the same or different processors may wish to share a particular region of a typed
memory pool. Each process or processor may wish to access this region through different busses. For example, processor
P<sub><small>1</small></sub> may want to share a region of memory pool M<sub><small>4</small></sub> with processor
P<sub><small>2</small></sub>, and they may be required to use busses B<sub><small>2</small></sub> and B<sub><small>3</small></sub>,
respectively, to minimize bus contention. A problem arises here when a process allocates and maps a portion of fragmented memory
and then wants to share this region of memory with another process, either in the same processor or different processors. The
solution adopted is to allow the first process to find out the memory map (offsets and lengths) of all the different fragments of
memory that were mapped into its address space, by repeatedly calling <a href=
"../functions/posix_mem_offset.html"><i>posix_mem_offset</i>()</a>. Then, this process can pass the offsets and lengths obtained to
the second process, which can then map the same memory fragments into its address space.</p>
</li>
<li>
<p>Contiguous Allocation</p>

<p>The problem of finding the memory map of the different fragments of the memory pool that were mapped into logically contiguous
addresses of a given process can be solved by requesting contiguous allocation. For example, a process in
P<sub><small>1</small></sub> can allocate 10 Kbytes of physically contiguous memory from
M<sub><small>3</small></sub>-B<sub><small>1</small></sub>, and obtain the offset (within pool M<sub><small>3</small></sub>) of this
block of memory. Then, it can pass this offset (and the length) to a process in P<sub><small>2</small></sub> using some
interprocess communication mechanism. The second process can map the same block of memory by using the offset transferred and
specifying M<sub><small>3</small></sub>-B<sub><small>2</small></sub>.</p>
</li>

<li>
<p>Unallocated Mapping</p>

<p>Any subarea of a memory pool that is mapped to a process, either as the result of an allocation request or an explicit mapping,
is normally unavailable for allocation. Special processes such as debuggers, however, may need to map large areas of a typed memory
pool, yet leave those areas available for allocation.</p>
</li>
</ul>
|
|
|
|
<p>Typed memory allocation and mapping has to coexist with storage allocation operators like <a href="../functions/malloc.html"><i>malloc</i>()</a>, but systems are free to choose how to implement this coexistence. For example, it
may be system configuration-dependent whether all available system memory is made part of one of the typed memory pools or whether
some part is reserved for conventional allocation operators. Equally system configuration-dependent may be the availability of
operators like <a href="../functions/malloc.html"><i>malloc</i>()</a> to allocate storage from certain typed memory pools. It is
also possible to configure a system such that a given named pool, P<sub><small>1</small></sub>, is in turn split into
non-overlapping named subpools. For example, M<sub><small>1</small></sub>-B<sub><small>1</small></sub>,
M<sub><small>2</small></sub>-B<sub><small>1</small></sub>, and M<sub><small>3</small></sub>-B<sub><small>1</small></sub> could also
be accessed as one common pool M<sub><small>123</small></sub>-B<sub><small>1</small></sub>. A call to <a href="../functions/malloc.html"><i>malloc</i>()</a> on P<sub><small>1</small></sub> could work on such a larger pool, while full
optimization of memory usage by P<sub><small>1</small></sub> would require typed memory allocation at the subpool level.</p>
</li>
<li>
<p>Existing Practice</p>

<p>OS-9 provides for the naming (numbering) and prioritization of memory types by a system administrator. It then provides APIs to
request memory allocation of typed (colored) memory by number, and to generate a bus address from a mapped memory address
(translate). When requesting colored memory, the user can specify type 0 to signify allocation from the first available type in
priority order.</p>

<p>HP-RT presents interfaces to map different kinds of storage regions that are visible through a VME bus, although it does not
provide allocation operations. It also provides functions to perform address translation between VME addresses and virtual
addresses. It represents a VME-bus unique solution to the general problem.</p>

<p>The PSOS approach is similar (that is, based on a pre-established mapping of bus address ranges to specific memories) with a
concept of segments and regions (regions dynamically allocated from a heap, which is a special segment). Therefore, PSOS does not
fully address the general allocation problem either. PSOS does not have a "process"-based model, but more of a
"thread"-only-based model of multi-tasking, so mapping to a process address space is not an issue.</p>

<p>QNX uses the System V approach of opening specially named devices (shared memory segments) and using <a href="../functions/mmap.html"><i>mmap</i>()</a> to then gain access from the process. QNX does not address allocation directly, but once
typed shared memory can be mapped, an "allocation manager" process could be written to handle requests for allocation.</p>

<p>The System V approach also included allocation, implemented by opening yet other special "devices" which allocate, rather
than appearing as a whole memory object.</p>

<p>The Orkid realtime kernel interface definition has operations to manage memory "regions" and "pools", which are areas of
memory that may reflect the differing physical nature of the memory. Operations to allocate memory from these regions and pools are
also provided.</p>
</li>
<li>
<p>Requirements</p>

<p>Existing practice in SVID-derived UNIX systems relies on functionality similar to <a href="../functions/mmap.html"><i>mmap</i>()</a> and its related interfaces to achieve mapping and allocation of typed memory. However,
the issue of sharing typed memory (allocated or mapped) and the complication of multiple ports are not addressed in any consistent
way by existing UNIX system practice. Part of this functionality is existing practice in specialized realtime operating systems. In
order to solidify the capabilities implied by the model above, the following requirements are imposed on the interface:</p>

<ul>
<li>
<p>Identification of Typed Memory Pools and Ports</p>

<p>All processes (running in all processors) in the system are able to identify a particular (system configured) typed memory pool
accessed through a particular (system configured) port by a name. That name is a member of a name space common to all these
processes, but need not be the same name space as that containing ordinary filenames. The association between memory pools/ports
and corresponding names is typically established when the system is configured. The "open" operation for typed memory objects
should be distinct from the <a href="../functions/open.html"><i>open</i>()</a> function, for consistency with other similar
services, but implementable on top of <a href="../functions/open.html"><i>open</i>()</a>. This implies that the handle for a typed
memory object will be a file descriptor.</p>
</li>
<li>
<p>Allocation and Mapping of Typed Memory</p>

<p>Once a typed memory object has been identified by a process, it is possible to both map user-selected subareas of that object
into process address space and to map system-selected (that is, dynamically allocated) subareas of that object, with user-specified
length, into process address space. It is also possible to determine the maximum length of memory allocation that may be requested
from a given typed memory object.</p>
</li>

<li>
<p>Sharing Typed Memory</p>

<p>Two or more processes are able to share portions of typed memory, either user-selected or dynamically allocated. This
requirement applies also to dynamically allocated regions of memory that are composed of several non-contiguous pieces.</p>
</li>

<li>
<p>Contiguous Allocation</p>

<p>For dynamic allocation, it is the user's option whether the system is required to allocate a contiguous subarea within the typed
memory object, or whether it is permitted to allocate discontiguous fragments which appear contiguous in the process mapping.
Contiguous allocation simplifies the process of sharing allocated typed memory, while discontiguous allocation allows for
potentially better recovery of deallocated typed memory.</p>
</li>
<li>
<p>Accessing Typed Memory Through Different Ports</p>

<p>Once a subarea of a typed memory object has been mapped, it is possible to determine the location and length corresponding to a
user-selected portion of that object within the memory pool. This location and length can then be used to remap that portion of
memory for access from another port. If the referenced portion of typed memory was allocated discontiguously, the length thus
determined may be shorter than anticipated, and the user code must adapt to the value returned.</p>
</li>

<li>
<p>Deallocation</p>

<p>When a previously mapped subarea of typed memory is no longer mapped by any process in the system (as a result of a call or calls
to <a href="../functions/munmap.html"><i>munmap</i>()</a>), that subarea becomes potentially reusable for dynamic allocation; actual
reuse of the subarea is a function of the dynamic typed memory allocation policy.</p>
</li>

<li>
<p>Unallocated Mapping</p>

<p>It must be possible to map user-selected subareas of a typed memory object without marking that subarea as unavailable for
allocation. This option is not the default behavior, and requires appropriate privilege.</p>
</li>
</ul>
</li>
<li>
<p>Scenario</p>

<p>The following scenario will serve to clarify the use of the typed memory interfaces.</p>

<p>Process A running on P<sub><small>1</small></sub> (see <a href="#tagfcjh_1">Example of a System with Typed Memory</a>) wants to
allocate some memory from memory pool M<sub><small>2</small></sub>, and it wants to share this portion of memory with process B
running on P<sub><small>2</small></sub>. Since P<sub><small>2</small></sub> only has access to the lower part of
M<sub><small>2</small></sub>, both processes will use the memory pool named M<sub><small>2b</small></sub>, which is the part of
M<sub><small>2</small></sub> that is accessible from both P<sub><small>1</small></sub> and P<sub><small>2</small></sub>. The
operations that both processes need to perform are shown below:</p>

<ul>
<li>
<p>Allocating Typed Memory</p>
<p>Process A calls <a href="../functions/posix_typed_mem_open.html"><i>posix_typed_mem_open</i>()</a> with the name
<b>/typed.m2b-b1</b> and a <i>tflag</i> of POSIX_TYPED_MEM_ALLOCATE to get a file descriptor usable for allocating from pool
M<sub><small>2b</small></sub> accessed through port B<sub><small>1</small></sub>. It then calls <a href="../functions/mmap.html"><i>mmap</i>()</a> with this file descriptor requesting a length of 4096 bytes. The system allocates two
discontiguous blocks of sizes 1024 and 3072 bytes within M<sub><small>2b</small></sub>. The <a href="../functions/mmap.html"><i>mmap</i>()</a> function returns a pointer to a 4096-byte array in process A's logical address space,
mapping the allocated blocks contiguously. Process A can then utilize the array, and store data in it.</p>
</li>

<li>
<p>Determining the Location of the Allocated Blocks</p>

<p>Process A can determine the lengths and offsets (relative to M<sub><small>2b</small></sub>) of the two blocks allocated, by
using the following procedure: First, process A calls <a href="../functions/posix_mem_offset.html"><i>posix_mem_offset</i>()</a>
with the address of the first element of the array and length 4096. Upon return, the offset and length (1024 bytes) of the first
block are returned. A second call to <a href="../functions/posix_mem_offset.html"><i>posix_mem_offset</i>()</a> is then made using
the address of the first element of the array plus 1024 (the length of the first block), and a new length of 4096-1024 (that is,
3072). If there were more fragments allocated, this procedure could be continued within a loop until the offsets and lengths of all
the blocks were obtained. Notice that this relatively complex procedure can be avoided if contiguous allocation is requested (by
opening the typed memory object with the <i>tflag</i> POSIX_TYPED_MEM_ALLOCATE_CONTIG).</p>
</li>

<li>
<p>Sharing Data Across Processes</p>

<p>Process A passes the two offset values and lengths obtained from the <a href="../functions/posix_mem_offset.html"><i>posix_mem_offset</i>()</a> calls to process B running on P<sub><small>2</small></sub>, via
some form of interprocess communication. Process B can gain access to process A's data by calling <a href="../functions/posix_typed_mem_open.html"><i>posix_typed_mem_open</i>()</a> with the name <b>/typed.m2b-b2</b> and a <i>tflag</i> of
zero, then using two <a href="../functions/mmap.html"><i>mmap</i>()</a> calls on the resulting file descriptor to map the two
subareas of that typed memory object to its own address space.</p>
</li>
</ul>
</li>
<li>
<p>Rationale for no <i>mem_alloc</i>() and <i>mem_free</i>()</p>

<p>The standard developers had originally proposed a pair of new flags to <a href="../functions/mmap.html"><i>mmap</i>()</a> which,
when applied to a typed memory object descriptor, would cause <a href="../functions/mmap.html"><i>mmap</i>()</a> to allocate
dynamically from an unallocated and unmapped area of the typed memory object. Deallocation was similarly accomplished through the
use of <a href="../functions/munmap.html"><i>munmap</i>()</a>. This was rejected by the ballot group because it excessively
complicated the (already rather complex) <a href="../functions/mmap.html"><i>mmap</i>()</a> interface and introduced semantics
useful only for typed memory, to a function which must also map shared memory and files. They felt that a memory allocator should
be built on top of <a href="../functions/mmap.html"><i>mmap</i>()</a> instead of being incorporated within the same interface, much
as the ISO C standard libraries build <a href="../functions/malloc.html"><i>malloc</i>()</a> on top of the virtual memory
mapping functions <i>brk</i>() and <i>sbrk</i>(). This would eliminate the complicated semantics involved with unmapping only part
of an allocated block of typed memory.</p>

<p>To attempt to achieve ballot group consensus, typed memory allocation and deallocation was first migrated from <a href="../functions/mmap.html"><i>mmap</i>()</a> and <a href="../functions/munmap.html"><i>munmap</i>()</a> to a pair of complementary
functions modeled on the ISO C standard <a href="../functions/malloc.html"><i>malloc</i>()</a> and <a href="../functions/free.html"><i>free</i>()</a>. The <i>mem_alloc</i>() function specified explicitly the typed memory object (typed
memory pool/access port) from which allocation takes place, unlike <a href="../functions/malloc.html"><i>malloc</i>()</a> where the
memory pool and port are unspecified. The <i>mem_free</i>() function handled deallocation. These new semantics still met all of the
requirements detailed above without modifying the behavior of <a href="../functions/mmap.html"><i>mmap</i>()</a> except to allow it
to map specified areas of typed memory objects. An implementation would have been free to implement <i>mem_alloc</i>() and
<i>mem_free</i>() over <a href="../functions/mmap.html"><i>mmap</i>()</a>, through <a href="../functions/mmap.html"><i>mmap</i>()</a>, or independently but cooperating with <a href="../functions/mmap.html"><i>mmap</i>()</a>.</p>
<p>The ballot group was queried to see if this was an acceptable alternative, and while there was some agreement that it achieved
the goal of removing the complicated semantics of allocation from the <a href="../functions/mmap.html"><i>mmap</i>()</a> interface,
several balloters realized that it just created two additional functions that behaved, in great part, like <a href="../functions/mmap.html"><i>mmap</i>()</a>. These balloters proposed an alternative which has been implemented here in place of a
separate <i>mem_alloc</i>() and <i>mem_free</i>(). This alternative is based on four specific suggestions:</p>
<ol>
<li>
<p>The <a href="../functions/posix_typed_mem_open.html"><i>posix_typed_mem_open</i>()</a> function should provide a flag which
specifies "allocate on <a href="../functions/mmap.html"><i>mmap</i>()</a>" (otherwise, <a href="../functions/mmap.html"><i>mmap</i>()</a> just maps the underlying object). This allows things roughly similar to <b>/dev/zero</b>
<i>versus</i> <b>/dev/swap</b>. Two such flags have been implemented, one of which forces contiguous allocation.</p>
</li>

<li>
<p>The <a href="../functions/posix_mem_offset.html"><i>posix_mem_offset</i>()</a> function is acceptable because it can be applied
usefully to mapped objects in general. It should return the file descriptor of the underlying object.</p>
</li>

<li>
<p>The <i>mem_get_info</i>() function in an earlier draft should be renamed <a href="../functions/posix_typed_mem_get_info.html"><i>posix_typed_mem_get_info</i>()</a> because it is not generally applicable to memory
objects. It should probably return the file descriptor's allocation attribute. The renaming of the function has been implemented,
but having it return a piece of information which is readily known by an application without this function has been rejected. Its
whole purpose is to query the typed memory object for attributes that are not user-specified, but determined by the
implementation.</p>
</li>

<li>
<p>There should be no separate <i>mem_alloc</i>() or <i>mem_free</i>() functions. Instead, using <a href="../functions/mmap.html"><i>mmap</i>()</a> on a typed memory object opened with an "allocate on <a href="../functions/mmap.html"><i>mmap</i>()</a>" flag should be used to force allocation. These are precisely the semantics defined in
the current draft.</p>
</li>
</ol>
</li>
<li>
<p>Rationale for no Typed Memory Access Management</p>

<p>The working group had originally defined an additional interface (and an additional kind of object: typed memory master) to
establish and dissolve mappings to typed memory on behalf of devices or processors which were independent of the operating system
and had no inherent capability to directly establish mappings on their own. This was to have provided functionality similar to
device driver interfaces such as <i>physio</i>() and their underlying bus-specific interfaces (for example, <i>mballoc</i>()) which
serve to set up and break down DMA pathways, and derive mapped addresses for use by hardware devices and processor cards.</p>

<p>The ballot group felt that this was beyond the scope of POSIX.1 and its amendments. Furthermore, the removal of interrupt
handling interfaces from a preceding amendment (the IEEE Std 1003.1d-1999) during its balloting process renders these
typed memory access management interfaces an incomplete solution to portable device management from a user process; it would be
possible to initiate a device transfer to/from typed memory, but impossible to handle the transfer-complete interrupt in a portable
way.</p>

<p>To achieve ballot group consensus, all references to typed memory access management capabilities were removed. The concept of
portable interfaces from a device driver to both operating system and hardware is being addressed by the Uniform Driver Interface
(UDI) industry forum, with formal standardization deferred until proof of concept and industry-wide acceptance and
implementation.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_16"></a>Process Scheduling</h5>

<p>IEEE PASC Interpretation 1003.1 #96 has been applied, adding the <a href="../functions/pthread_setschedprio.html"><i>pthread_setschedprio</i>()</a> function. It was added because previously there was no
way for a thread to lower its own priority without going to the tail of the threads list for its new priority. This capability is
necessary to bound the duration of priority inversion encountered by a thread.</p>
<p>The following portion of the rationale presents models, requirements, and standardization issues relevant to process scheduling;
see also <a href="#tag_03_02_09_11">Thread Scheduling</a>.</p>

<p>In an operating system supporting multiple concurrent processes, the system determines the order in which processes execute to
meet implementation-defined goals. For time-sharing systems, the goal is to enhance system throughput and promote fairness; the
application is provided with little or no control over this sequencing function. While this is acceptable and desirable behavior in
a time-sharing system, it is inappropriate in a realtime system; realtime applications must specifically control the execution
sequence of their concurrent processes in order to meet externally defined response requirements.</p>

<p>In IEEE Std 1003.1-2001, the control over process sequencing is provided using a concept of scheduling policies. These
policies, described in detail in this section, define the behavior of the system whenever processor resources are to be allocated
to competing processes. Only the behavior of the policy is defined; conforming implementations are free to use any mechanism
desired to achieve the described behavior.</p>
<ul>
<li>
<p>Models</p>

<p>In an operating system supporting multiple concurrent processes, the system determines the order in which processes execute and
might force long-running processes to yield to other processes at certain intervals. Typically, the scheduling code is executed
whenever an event occurs that might alter the process to be executed next.</p>

<p>The simplest scheduling strategy is a "first-in, first-out" (FIFO) dispatcher. Whenever a process becomes runnable, it is
placed on the end of a ready list. The process at the front of the ready list is executed until it exits or becomes blocked, at
which point it is removed from the list. This scheduling technique is also known as "run-to-completion" or "run-to-block".</p>

<p>A natural extension to this scheduling technique is the assignment of a "non-migrating priority" to each process. This policy
differs from strict FIFO scheduling in only one respect: whenever a process becomes runnable, it is placed at the end of the list
of processes runnable at that priority level. When selecting a process to run, the system always selects the first process from the
highest priority queue with a runnable process. Thus, when a process becomes unblocked, it will preempt a running process of lower
priority without otherwise altering the ready list. Further, if a process elects to alter its priority, it is removed from the
ready list and reinserted, using its new priority, according to the policy above.</p>

<p>While the above policy might be considered unfriendly in a time-sharing environment in which multiple users require more
balanced resource allocation, it could be ideal in a realtime environment for several reasons. The most important of these is that
it is deterministic: the highest-priority process is always run and, among processes of equal priority, the process that has been
runnable for the longest time is executed first. Because of this determinism, cooperating processes can implement more complex
scheduling simply by altering their priority. For instance, if processes at a single priority were to reschedule themselves at
fixed time intervals, a time-slice policy would result.</p>

<p>In a dedicated operating system in which all processes are well-behaved realtime applications, non-migrating priority scheduling
is sufficient. However, many existing implementations provide for more complex scheduling policies.</p>

<p>IEEE Std 1003.1-2001 specifies a linear scheduling model. In this model, every process in the system has a priority.
The system scheduler always dispatches a process that has the highest (generally the most time-critical) priority among all
runnable processes in the system. As long as there is only one such process, the dispatching policy is trivial. When multiple
processes of equal priority are eligible to run, they are ordered according to a strict run-to-completion (FIFO) policy.</p>

<p>The priority is represented as a positive integer and is inherited from the parent process. For processes running under a fixed
priority scheduling policy, the priority is never altered except by an explicit function call.</p>

<p>It was determined arbitrarily that larger integers correspond to "higher priorities".</p>

<p>Certain implementations might impose restrictions on the priority ranges to which processes can be assigned. There also can be
restrictions on the set of policies to which processes can be set.</p>
</li>

<li>
<p>Requirements</p>

<p>Realtime processes require that scheduling be fast and deterministic, and that it guarantees to preempt lower priority
processes.</p>

<p>Thus, given the linear scheduling model, realtime processes require that they be run at a priority that is higher than other
processes. Within this framework, realtime processes are free to yield execution resources to each other in a completely portable
and implementation-defined manner.</p>

<p>As there is a generally perceived requirement for processes at the same priority level to share processor resources more
equitably, provisions are made by providing a scheduling policy (that is, SCHED_RR) intended to provide a timeslice-like facility.
<basefont size="2"></p>
<dl>
<dt><b>Note:</b></dt>

<dd>The following topics assume that low numeric priority implies low scheduling criticality and <i>vice versa</i>.</dd>
</dl>

<basefont size="3"></li>
<li>
<p>Rationale for New Interface</p>

<p>Realtime applications need to be able to determine when processes will run in relation to each other. It must be possible to
guarantee that a critical process will run whenever it is runnable; that is, whenever it wants to for as long as it needs.
SCHED_FIFO satisfies this requirement. Additionally, SCHED_RR was defined to meet a realtime requirement for a well-defined
time-sharing policy for processes at the same priority.</p>

<p>It would be possible to use the BSD <a href="../functions/setpriority.html"><i>setpriority</i>()</a> and <a href="../functions/getpriority.html"><i>getpriority</i>()</a> functions by redefining the meaning of the "nice" parameter according to
the scheduling policy currently in use by the process. The System V <a href="../functions/nice.html"><i>nice</i>()</a>
interface was felt to be undesirable for realtime because it specifies an adjustment to the "nice" value, rather than setting it
to an explicit value. Realtime applications will usually want to set priority to an explicit value. Also, System V <a href="../functions/nice.html"><i>nice</i>()</a> does not allow for changing the priority of another process.</p>

<p>With the POSIX.1b interfaces, the traditional "nice" value does not affect the SCHED_FIFO or SCHED_RR scheduling policies. If
a "nice" value is supported, it is implementation-defined whether it affects the SCHED_OTHER policy.</p>

<p>An important aspect of IEEE Std 1003.1-2001 is the explicit description of the queuing and preemption rules. It is
critical, to achieve deterministic scheduling, that such rules be stated clearly in IEEE Std 1003.1-2001.</p>
<p>IEEE Std 1003.1-2001 does not address the interaction between priority and swapping. The issues involved with swapping
and virtual memory paging are extremely implementation-defined and would be nearly impossible to standardize at this point. The
proposed scheduling paradigm, however, fully describes the scheduling behavior of runnable processes, of which one criterion is
that the working set be resident in memory. Assuming the existence of a portable interface for locking portions of a process in
memory, paging behavior need not affect the scheduling of realtime processes.</p>

<p>IEEE Std 1003.1-2001 also does not address the priorities of "system" processes. In general, these processes should
always execute in low-priority ranges to avoid conflict with other realtime processes. Implementations should document the priority
ranges in which system processes run.</p>

<p>The default scheduling policy is not defined. The effect of I/O interrupts and other system processing activities is not
defined. The temporary lending of priority from one process to another (such as for the purpose of freeing resources) by
the system is not addressed. Preemption of resources is not addressed. Restrictions on the ability of a process to affect other
processes beyond a certain level (influence levels) are not addressed.</p>

<p>The rationale used to justify the simple time-quantum scheduler is that it is common practice to depend upon this type of
scheduling to ensure "fair" distribution of processor resources among portions of the application that must interoperate in a
serial fashion. Note that IEEE Std 1003.1-2001 is silent with respect to the setting of this time quantum, or whether it
is a system-wide value or a per-process value, although it appears that the prevailing realtime practice is for it to be a
system-wide value.</p>

<p>In a system with <i>N</i> processes at a given priority, all processor-bound, in which the time quantum is equal for all
processes at a specific priority level, the following assumptions are made of such a scheduling policy:</p>
<ol>
<li>
<p>A time quantum <i>Q</i> exists, and the current process will own control of the processor for at least a duration of <i>Q</i>,
and will have the processor for a duration of <i>Q</i>.</p>
</li>

<li>
<p>The <i>N</i>th process at that priority will control a processor within a duration of (<i>N</i>-1) × <i>Q</i>.</p>
</li>
</ol>

<p>These assumptions are necessary to provide equal access to the processor and bounded response from the application.</p>
<p>The assumptions hold for the described scheduling policy only if no system overhead, such as interrupt servicing, is present. If
the interrupt servicing load is non-zero, then one of the two assumptions becomes fallacious, based upon how <i>Q</i> is measured
by the system.</p>

<p>If <i>Q</i> is measured by clock time, then the assumption that the process obtains a duration <i>Q</i> of processor time is false
if interrupt overhead exists. Indeed, a scenario can be constructed with <i>N</i> processes in which a single process undergoes
complete processor starvation if a peripheral device, such as an analog-to-digital converter, generates significant interrupt
activity periodically with a period of <i>N</i> × <i>Q</i>.</p>

<p>If <i>Q</i> is measured as actual processor time, then the assumption that the <i>N</i>th process runs within the duration
(<i>N</i>-1) × <i>Q</i> is false.</p>
<p>It should be noted that SCHED_FIFO suffers from interrupt-based delay as well. However, for SCHED_FIFO, the implied response of
the system is "as soon as possible", so that the interrupt load for this case is a vendor selection and not a compliance
issue.</p>

<p>With this in mind, it is necessary either to complete the definition by including bounds on the interrupt load, or to modify the
assumptions that can be made about the scheduling policy.</p>

<p>Since the motivation of inclusion of the policy is common usage, and since current applications do not enjoy the luxury of
bounded interrupt load, item (2) above is sufficient to express existing application needs and is less restrictive in the standard
definition. No difference in interface is necessary.</p>
<p>In an implementation in which the time quantum is equal for all processes at a specific priority, our assumptions can then be
restated as:</p>

<ul>
<li>
<p>A time quantum <i>Q</i> exists, and a processor-bound process will be rescheduled after a duration of, at most, <i>Q</i>. Time
quantum <i>Q</i> may be defined in either wall clock time or execution time.</p>
</li>

<li>
<p>In general, the <i>N</i>th process of a priority level should wait no longer than (<i>N</i>-1) × <i>Q</i> time to
execute, assuming no processes exist at higher priority levels.</p>
</li>

<li>
<p>No process should wait indefinitely.</p>
</li>
</ul>

<p>For implementations supporting per-process time quanta, these assumptions can be readily extended.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_17"></a>Sporadic Server Scheduling Policy</h5>

<p>The sporadic server is a mechanism defined for scheduling aperiodic activities in time-critical realtime systems. This mechanism
reserves a certain bounded amount of execution capacity for processing aperiodic events at a high priority level. Any aperiodic
events that cannot be processed within the bounded amount of execution capacity are executed in the background at a low priority
level. Thus, a certain amount of execution capacity can be guaranteed to be available for processing periodic tasks, even under
burst conditions in the arrival of aperiodic processing requests (that is, a large number of requests in a short time interval).
The sporadic server also simplifies the schedulability analysis of the realtime system, because it allows aperiodic processes or
threads to be treated as if they were periodic. The sporadic server was first described by Sprunt, et al.</p>
|
|
|
|
<p>The key concept of the sporadic server is to provide and limit a certain amount of computation capacity for processing aperiodic
|
|
events at their assigned normal priority, during a time interval called the "replenishment period". Once the entity controlled by
|
|
the sporadic server mechanism is initialized with its period and execution-time budget attributes, it preserves its execution
|
|
capacity until an aperiodic request arrives. The request will be serviced (if there are no higher priority activities pending) as
|
|
long as there is execution capacity left. If the request is completed, the actual execution time used to service it is subtracted
|
|
from the capacity, and a replenishment of this amount of execution time is scheduled to happen one replenishment period after the
|
|
arrival of the aperiodic request. If the request is not completed, because there is no execution capacity left, then the aperiodic
|
|
process or thread is assigned a lower background priority. For each portion of consumed execution capacity the execution time used
|
|
is replenished after one replenishment period. At the time of replenishment, if the sporadic server was executing at a background
|
|
priority level, its priority is elevated to the normal level. Other similar replenishment policies have been defined, but the one
|
|
presented here represents a compromise between efficiency and implementation complexity.</p>
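<p>The budget bookkeeping described above can be sketched in plain C. This is a simplified, hypothetical model (the names <i>ss_init</i>, <i>ss_consume</i>, and <i>ss_replenish</i> are illustrative, not part of any standard interface); only the capacity arithmetic and the normal/background priority transition are modeled, not the scheduling itself:</p>

```c
#include <assert.h>

/* Hypothetical, simplified model of the replenishment rules described
   above.  Only the budget arithmetic and the normal/background
   priority transition are modeled; actual scheduling is not. */

typedef struct {
    long capacity;         /* remaining execution-time budget */
    int  at_low_priority;  /* nonzero while demoted to the background */
} sporadic_server;

void ss_init(sporadic_server *s, long initial_budget)
{
    s->capacity = initial_budget;
    s->at_low_priority = 0;
}

/* Service an aperiodic request that wants 'wanted' time units; returns
   the amount actually charged, which is also the amount that must be
   replenished one replenishment period after the request arrived. */
long ss_consume(sporadic_server *s, long wanted)
{
    long charged = wanted < s->capacity ? wanted : s->capacity;
    s->capacity -= charged;
    if (s->capacity == 0)
        s->at_low_priority = 1;   /* budget exhausted: background priority */
    return charged;
}

/* A replenishment period has elapsed: restore the charged amount and,
   if capacity is again available, return to the normal priority. */
void ss_replenish(sporadic_server *s, long amount)
{
    s->capacity += amount;
    if (s->capacity > 0)
        s->at_low_priority = 0;
}
```

<p>A server initialized with a budget of 10 units that is asked for 4 and then 8 units is charged 4 and 6, drops to the background on exhaustion, and returns to its normal priority when the first replenishment arrives.</p>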
<p>The interface that appears in this section defines a new scheduling policy for threads and processes that behaves according to
the rules of the sporadic server mechanism. Scheduling attributes are defined and functions are provided to allow the user to set
and get the parameters that control the scheduling behavior of this mechanism, namely the normal and low priority, the
replenishment period, the maximum number of pending replenishment operations, and the initial execution-time budget.</p>
<ul>
<li>
<p>Scheduling Aperiodic Activities</p>

<p>Virtually all realtime applications are required to process aperiodic activities. In many cases, there are tight timing
constraints that the response to the aperiodic events must meet. Usual timing requirements imposed on the response to these events
are:</p>

<ul>
<li>
<p>The effects of an aperiodic activity on the response time of lower priority activities must be controllable and predictable.</p>
</li>

<li>
<p>The system must provide the fastest possible response time to aperiodic events.</p>
</li>

<li>
<p>It must be possible to take advantage of all the available processing bandwidth not needed by time-critical activities to
enhance average-case response times to aperiodic events.</p>
</li>
</ul>
<p>Traditional methods for scheduling aperiodic activities are background processing, polling tasks, and direct event
execution:</p>

<ul>
<li>
<p>Background processing consists of assigning a very low priority to the processing of aperiodic events. It utilizes all the
available bandwidth in the system that has not been consumed by higher priority threads. However, it is very difficult, or
impossible, to meet requirements on average-case response time, because the aperiodic entity has to wait for the execution of all
other entities which have higher priority.</p>
</li>

<li>
<p>Polling consists of creating a periodic process or thread for servicing aperiodic requests. At regular intervals, the polling
entity is started and it services accumulated pending aperiodic requests. If no aperiodic requests are pending, the polling entity
suspends itself until its next period. Polling allows the aperiodic requests to be processed at a higher priority level. However,
worst and average-case response times of polling entities are a direct function of the polling period, and there is execution
overhead for each polling period, even if no event has arrived. If the deadline of the aperiodic activity is short compared to the
inter-arrival time, the polling frequency must be increased to guarantee meeting the deadline. For this case, the increase in
frequency can dramatically reduce the efficiency of the system and, therefore, its capacity to meet all deadlines. Yet, polling
represents a good way to handle a large class of practical problems because it preserves system predictability, and because the
amortized overhead drops as load increases.</p>
</li>
<li>
<p>Direct event execution consists of executing the aperiodic events at a high fixed-priority level. Typically, the aperiodic event
is processed by an interrupt service routine as soon as it arrives. This technique provides predictable response times for
aperiodic events, but makes the response times of all lower priority activities completely unpredictable under burst arrival
conditions. Therefore, if the density of aperiodic event arrivals is unbounded, it may be a dangerous technique for time-critical
systems. Yet, for those cases in which the physics of the system imposes a bound on the event arrival rate, it is probably the most
efficient technique.</p>
</li>

<li>
<p>The sporadic server scheduling algorithm combines the predictability of the polling approach with the short response times of
direct event execution. Thus, it allows systems to meet an important class of application requirements that cannot be met by
using the traditional approaches. Multiple sporadic servers with different attributes can be applied to the scheduling of multiple
classes of aperiodic events, each with different kinds of timing requirements, such as individual deadlines, average response
times, and so on. It also has many other interesting applications for realtime, such as scheduling producer/consumer tasks in
time-critical systems, limiting the effects of faults on the estimation of task execution-time requirements, and so on.</p>
</li>
</ul>
</li>
<li>
<p>Existing Practice</p>

<p>The sporadic server has been used in different kinds of applications, including military avionics, robot control systems,
industrial automation systems, and so on. There are examples of many systems that cannot be successfully scheduled using the
classic approaches, such as direct event execution or polling, but that are schedulable using a sporadic server scheduler. The
sporadic server algorithm itself can successfully schedule all systems that can be scheduled with direct event execution or
polling.</p>

<p>The sporadic server scheduling policy has been implemented as a commercial product in the run-time system of the Verdix Ada
compiler. There are also many applications that have used a much less efficient application-level sporadic server. These realtime
applications would benefit from a sporadic server scheduler implemented at the scheduler level.</p>
</li>
<li>
<p>Library-Level <i>versus</i> Kernel-Level Implementation</p>

<p>The sporadic server interface described in this section requires the sporadic server policy to be implemented at the same level
as the scheduler. This means that the process sporadic server must be implemented at the kernel level and the thread sporadic
server policy implemented at the same level as the thread scheduler; that is, kernel or library level.</p>

<p>In an earlier interface for the sporadic server, this mechanism was implementable at a different level than the scheduler. This
feature allowed the implementor to choose between an efficient scheduler-level implementation, or a simpler user or library-level
implementation. However, the working group considered that this interface made the use of sporadic servers more complex, and that
library-level implementations would lack some of the important functionality of the sporadic server, namely the limitation of the
actual execution time of aperiodic activities. The working group also felt that the interface described in this chapter does not
preclude library-level implementations of threads intended to provide efficient low-overhead scheduling for those threads that are
not scheduled under the sporadic server policy.</p>
</li>
<li>
<p>Range of Scheduling Priorities</p>

<p>Each of the scheduling policies supported in IEEE Std 1003.1-2001 has an associated range of priorities. The priority
ranges for each policy might or might not overlap with the priority ranges of other policies. For time-critical realtime
applications it is usual for periodic and aperiodic activities to be scheduled together on the same processor. Periodic activities
will usually be scheduled using the SCHED_FIFO scheduling policy, while aperiodic activities may be scheduled using SCHED_SPORADIC.
Since the application developer will require complete control over the relative priorities of these activities in order to meet the
timing requirements, it would be desirable for the priority ranges of SCHED_FIFO and SCHED_SPORADIC to overlap completely.
Therefore, although IEEE Std 1003.1-2001 does not require any particular relationship between the different priority
ranges, it is recommended that these two ranges coincide.</p>
</li>
<li>
<p>Dynamically Setting the Sporadic Server Policy</p>

<p>Several members of the working group requested that implementations should not be required to support dynamically setting the
sporadic server scheduling policy for a thread. The reason is that this policy may have a high overhead for library-level
implementations of threads, and if threads are allowed to dynamically set this policy, this overhead can be experienced even if the
thread does not use that policy. By disallowing the dynamic setting of the sporadic server scheduling policy, these implementations
can accomplish efficient scheduling for threads using other policies. If a strictly conforming application needs to use the
sporadic server policy, and is therefore willing to pay the overhead, it must set this policy at the time of thread creation.</p>
</li>
<li>
<p>Limitation of the Number of Pending Replenishments</p>

<p>The number of simultaneously pending replenishment operations must be limited for each sporadic server for two reasons. First,
an unlimited number of replenishment operations would need an unlimited number of system resources to store all the pending
replenishment operations. Second, in some implementations each replenishment operation will represent a source of priority
inversion (just for the duration of the replenishment operation), and thus the maximum number of replenishments must be bounded to
guarantee bounded response times. The number of replenishments is bounded by lowering the priority of the sporadic server to
<i>sched_ss_low_priority</i> when the number of pending replenishments has reached its limit. In this way, no new replenishments
are scheduled until the number of pending replenishments decreases.</p>

<p>In the sporadic server scheduling policy defined in IEEE Std 1003.1-2001, the application can specify the maximum
number of pending replenishment operations for a single sporadic server by setting the value of the <i>sched_ss_max_repl</i>
scheduling parameter. This value must be between one and {SS_REPL_MAX}, which is a maximum limit imposed by the implementation. The
limit {SS_REPL_MAX} must be greater than or equal to {_POSIX_SS_REPL_MAX}, which is defined to be four in
IEEE Std 1003.1-2001. The minimum limit of four was chosen so that an application can at least guarantee that four
different aperiodic events can be processed during each interval of length equal to the replenishment period.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_18"></a>Clocks and Timers</h5>

<ul>
<li>
<p>Clocks</p>

<p>IEEE Std 1003.1-2001 and the ISO C standard both define functions for obtaining system time. Implicit behind
these functions is a mechanism for measuring the passage of time. This specification makes this mechanism explicit and calls it a
clock. The CLOCK_REALTIME clock required by IEEE Std 1003.1-2001 is a higher resolution version of the clock that
maintains POSIX.1 system time. This is a "system-wide" clock, in that it is visible to all processes and, were it possible for
multiple processes to all read the clock at the same time, they would see the same value.</p>

<p>An extensible interface was defined, with the ability for implementations to define additional clocks. This was done because of
the observation that many realtime platforms support multiple clocks, and it was desired to fit this model within the standard
interface. But implementation-defined clocks need not represent actual hardware devices, nor are they necessarily system-wide.</p>
</li>
<li>
<p>Timers</p>

<p>Two timer types are required for a system to support realtime applications:</p>

<ol>
<li>
<p>One-shot</p>

<p>A one-shot timer is a timer that is armed with an initial expiration time, either relative to the current time or at an absolute
time (based on some timing base, such as time in seconds and nanoseconds since the Epoch). The timer expires once and then is
disarmed. With the specified facilities, this is accomplished by setting the <i>it_value</i> member of the <i>value</i> argument to
the desired expiration time and the <i>it_interval</i> member to zero.</p>
</li>

<li>
<p>Periodic</p>

<p>A periodic timer is a timer that is armed with an initial expiration time, again either relative or absolute, and a repetition
interval. When the initial expiration occurs, the timer is reloaded with the repetition interval and continues counting. With the
specified facilities, this is accomplished by setting the <i>it_value</i> member of the <i>value</i> argument to the desired
initial expiration time and the <i>it_interval</i> member to the desired repetition interval.</p>
</li>
</ol>

<p>For both of these types of timers, the time of the initial timer expiration can be specified in two ways:</p>

<ol>
<li>
<p>Relative (to the current time)</p>
</li>

<li>
<p>Absolute</p>
</li>
</ol>
</li>
<li>
<p>Examples of Using Realtime Timers</p>

<p>In the diagrams below, <i>S</i> indicates a program schedule, <i>R</i> shows a schedule method request, and <i>E</i> suggests an
internal operating system event.</p>

<ul>
<li>
<p>Periodic Timer: Data Logging</p>

<p>During an experiment, it might be necessary to log realtime data periodically to an internal buffer or to a mass storage device.
With a periodic scheduling method, a logging module can be started automatically at fixed time intervals to log the data.</p>

<p>Program schedule is requested every 10 seconds.</p>

<blockquote>
<pre>
<tt> R       S         S         S         S         S
----+----+----+----+----+----+----+----+----+----+----+--->
    5   10   15   20   25   30   35   40   45   50   55
</tt>
</pre>
</blockquote>

<p>[Time (in Seconds)]</p>

<p>To achieve this type of scheduling using the specified facilities, one would allocate a per-process timer based on clock ID
CLOCK_REALTIME. Then the timer would be armed via a call to <a href="../functions/timer_settime.html"><i>timer_settime</i>()</a>
with the TIMER_ABSTIME flag reset, and with an initial expiration value and a repetition interval of 10 seconds.</p>
</li>
<li>
<p>One-shot Timer (Relative Time): Device Initialization</p>

<p>In an emission test environment, large sample bags are used to capture the exhaust from a vehicle. The exhaust is purged from
these bags before each and every test. With a one-shot timer, a module could initiate the purge function and then suspend itself
for a predetermined period of time while the sample bags are prepared.</p>

<p>Program schedule requested 20 seconds after the call is issued.</p>

<blockquote>
<pre>
<tt> R                 S
----+----+----+----+----+----+----+----+----+----+----+--->
    5   10   15   20   25   30   35   40   45   50   55
</tt>
</pre>
</blockquote>

<p>[Time (in Seconds)]</p>

<p>To achieve this type of scheduling using the specified facilities, one would allocate a per-process timer based on clock ID
CLOCK_REALTIME. Then the timer would be armed via a call to <a href="../functions/timer_settime.html"><i>timer_settime</i>()</a>
with the TIMER_ABSTIME flag reset, and with an initial expiration value of 20 seconds and a repetition interval of zero.</p>

<p>Note that if the program wishes merely to suspend itself for the specified interval, it could more easily use <a href=
"../functions/nanosleep.html"><i>nanosleep</i>()</a>.</p>
</li>
<li>
<p>One-shot Timer (Absolute Time): Data Transmission</p>

<p>The results from an experiment are often moved to a different system within a network for postprocessing or archiving. With an
absolute one-shot timer, a module that moves data from a test-cell computer to a host computer can be automatically scheduled on a
daily basis.</p>

<p>Program schedule requested for 2:30 a.m.</p>

<blockquote>
<pre>
<tt>  R                                          S
-----+-----+-----+-----+-----+-----+-----+-----+-----+----->
23:00 23:30 24:00 00:30 01:00 01:30 02:00 02:30 03:00
</tt>
</pre>
</blockquote>

<p>[Time of Day]</p>

<p>To achieve this type of scheduling using the specified facilities, a per-process timer would be allocated based on clock ID
CLOCK_REALTIME. Then the timer would be armed via a call to <a href="../functions/timer_settime.html"><i>timer_settime</i>()</a>
with the TIMER_ABSTIME flag set, and an initial expiration value equal to 2:30 a.m. of the next day.</p>
</li>
<li>
<p>Periodic Timer (Relative Time): Signal Stabilization</p>

<p>Some measurement devices, such as emission analyzers, do not respond instantaneously to an introduced sample. With a periodic
timer with a relative initial expiration time, a module that introduces a sample and records the average response could suspend
itself for a predetermined period of time while the signal is stabilized and then sample at a fixed rate.</p>

<p>Program schedule requested 15 seconds after the call is issued and every 2 seconds thereafter.</p>

<blockquote>
<pre>
<tt> R            S S S S S S S S S S S S S S S S S S S S S
----+----+----+----+----+----+----+----+----+----+----+--->
    5   10   15   20   25   30   35   40   45   50   55
</tt>
</pre>
</blockquote>

<p>[Time (in Seconds)]</p>

<p>To achieve this type of scheduling using the specified facilities, one would allocate a per-process timer based on clock ID
CLOCK_REALTIME. Then the timer would be armed via a call to <a href="../functions/timer_settime.html"><i>timer_settime</i>()</a>
with the TIMER_ABSTIME flag reset, and with an initial expiration value of 15 seconds and a repetition interval of 2 seconds.</p>
</li>
<li>
<p>Periodic Timer (Absolute Time): Work Shift-related Processing</p>

<p>Resource utilization data is useful when time to perform experiments is being scheduled at a facility. With a periodic timer
with an absolute initial expiration time, a module can be scheduled at the beginning of a work shift to gather resource utilization
data throughout the shift. This data can be used to allocate resources effectively to minimize bottlenecks and delays and maximize
facility throughput.</p>

<p>Program schedule requested for 2:00 a.m. and every 15 minutes thereafter.</p>

<blockquote>
<pre>
<tt>  R                                   S  S  S  S  S  S
-----+-----+-----+-----+-----+-----+-----+-----+-----+----->
23:00 23:30 24:00 00:30 01:00 01:30 02:00 02:30 03:00
</tt>
</pre>
</blockquote>

<p>[Time of Day]</p>

<p>To achieve this type of scheduling using the specified facilities, one would allocate a per-process timer based on clock ID
CLOCK_REALTIME. Then the timer would be armed via a call to <a href="../functions/timer_settime.html"><i>timer_settime</i>()</a>
with the TIMER_ABSTIME flag set, and with an initial expiration value equal to 2:00 a.m. and a repetition interval equal to 15
minutes.</p>
</li>
</ul>
</li>

<li>
<p>Relationship of Timers to Clocks</p>

<p>The relationship between clocks and timers armed with an absolute time is straightforward: a timer expiration signal is
requested when the associated clock reaches or exceeds the specified time. The relationship between clocks and timers armed with a
relative time (an interval) is less obvious, but not unintuitive. In this case, a timer expiration signal is requested when the
specified interval, <i>as measured by the associated clock</i>, has passed. For the required CLOCK_REALTIME clock, this allows
timer expiration signals to be requested at specified "wall clock" times (absolute), or when a specified interval of "realtime"
has passed (relative). For an implementation-defined clock (say, a process virtual time clock), timer expirations could be
requested when the process has used a specified total amount of virtual time (absolute), or when it has used a specified
<i>additional</i> amount of virtual time (relative).</p>

<p>The interfaces also allow flexibility in the implementation of the functions. For example, an implementation could convert all
absolute times to intervals by subtracting the clock value at the time of the call from the requested expiration time and
"counting down" at the supported resolution. Or it could convert all relative times to absolute expiration times by adding in the
clock value at the time of the call and comparing the clock value to the expiration time at the supported resolution. Or it might
even choose to maintain absolute times as absolute and compare them to the clock value at the supported resolution for absolute
timers, and maintain relative times as intervals and count them down at the resolution supported for relative timers. The choice
will be driven by efficiency considerations and the underlying hardware or software clock implementation.</p>
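<p>The first conversion strategy mentioned above (turning an absolute expiration time into an interval to count down) can be sketched as follows; the helper name <i>abs_to_rel</i> is illustrative only, not a standard function:</p>

```c
#include <assert.h>
#include <time.h>

/* Hypothetical sketch of one implementation strategy described above:
   convert an absolute expiration time into a relative interval by
   subtracting the current clock value, clamping to zero if the
   expiration time has already passed. */
struct timespec abs_to_rel(struct timespec abs_exp, struct timespec now)
{
    struct timespec rel;

    rel.tv_sec  = abs_exp.tv_sec - now.tv_sec;
    rel.tv_nsec = abs_exp.tv_nsec - now.tv_nsec;
    if (rel.tv_nsec < 0) {                 /* borrow from the seconds */
        rel.tv_nsec += 1000000000L;
        rel.tv_sec  -= 1;
    }
    if (rel.tv_sec < 0) {                  /* already expired: clamp */
        rel.tv_sec = 0;
        rel.tv_nsec = 0;
    }
    return rel;
}
```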
</li>

<li>
<p>Data Definitions for Clocks and Timers</p>

<p>IEEE Std 1003.1-2001 uses a time representation capable of supporting nanosecond resolution timers for the following
reasons:</p>

<ul>
<li>
<p>To enable IEEE Std 1003.1-2001 to represent those computer systems already using nanosecond or submicrosecond
resolution clocks.</p>
</li>

<li>
<p>To accommodate those per-process timers that might need nanoseconds to specify an absolute value of system-wide clocks, even
though the resolution of the per-process timer may only be milliseconds, or <i>vice versa</i>.</p>
</li>

<li>
<p>Because the number of nanoseconds in a second can be represented in 32 bits.</p>
</li>
</ul>

<p>Time values are represented in the <b>timespec</b> structure. The <i>tv_sec</i> member is of type <b>time_t</b> so that this
member is compatible with time values used by POSIX.1 functions and the ISO C standard. The <i>tv_nsec</i> member is a
<b>signed long</b> in order to simplify and clarify code that decrements or finds differences of time values. Note that because 1
billion (the number of nanoseconds per second) is less than half of the value representable by a signed 32-bit value, it is always
possible to add two valid fractional seconds represented as integral nanoseconds without overflowing the signed 32-bit value.</p>
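<p>The overflow argument above can be made concrete with a sketch of <b>timespec</b> addition (the helper name <i>timespec_add</i> is illustrative, not a standard function). Since each valid <i>tv_nsec</i> value is below 1,000,000,000, the sum of two of them is below 2,000,000,000 and still fits a signed 32-bit value, so a single conditional carry normalizes the result:</p>

```c
#include <assert.h>
#include <time.h>

/* Hypothetical sketch: add two normalized timespec values.  The
   intermediate tv_nsec sum cannot overflow a signed long, and at most
   one carry into tv_sec is ever needed. */
struct timespec timespec_add(struct timespec a, struct timespec b)
{
    struct timespec sum;

    sum.tv_sec  = a.tv_sec + b.tv_sec;
    sum.tv_nsec = a.tv_nsec + b.tv_nsec;      /* < 2,000,000,000: safe */
    if (sum.tv_nsec >= 1000000000L) {         /* at most one carry needed */
        sum.tv_nsec -= 1000000000L;
        sum.tv_sec  += 1;
    }
    return sum;
}
```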
<p>A maximum allowable resolution for the CLOCK_REALTIME clock of 20 ms (1/50 seconds) was chosen to allow line frequency clocks in
European countries to be conforming. 60 Hz clocks in the U.S. will also be conforming, as will finer granularity clocks, although a
Strictly Conforming Application cannot assume a granularity of less than 20 ms (1/50 seconds).</p>
<p>The minimum allowable maximum time for the CLOCK_REALTIME clock, the <a href=
"../functions/nanosleep.html"><i>nanosleep</i>()</a> function, and timers created with <i>clock_id</i>=CLOCK_REALTIME is determined
by the fact that the <i>tv_sec</i> member is of type <b>time_t</b>.</p>

<p>IEEE Std 1003.1-2001 specifies that timer expirations must not be delivered early, and <a href=
"../functions/nanosleep.html"><i>nanosleep</i>()</a> must not return early due to quantization error.
IEEE Std 1003.1-2001 discusses the various implementations of <a href="../functions/alarm.html"><i>alarm</i>()</a> in the
rationale and states that implementations that do not allow alarm signals to occur early are the most appropriate, but refrained
from mandating this behavior. Because of the importance of predictability to realtime applications, IEEE Std 1003.1-2001
takes a stronger stance.</p>

<p>The developers of IEEE Std 1003.1-2001 considered using a time representation that differs from POSIX.1b in the second
32 bits of the 64-bit value. Whereas POSIX.1b defines this field as a fractional second in nanoseconds, the other methodology
defines it as a binary fraction of one second, with the radix point assumed before the most significant bit.</p>

<p>POSIX.1b is a software, source-level standard and most of the benefits of the alternate representation are enjoyed by hardware
implementations of clocks and algorithms. It was felt that mandating this format for POSIX.1b clocks and timers would unnecessarily
burden the application writer with writing, possibly non-portable, multiple-precision arithmetic packages to perform conversion
between binary fractions and integral units such as nanoseconds, milliseconds, and so on.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_19"></a>Rationale for the Monotonic Clock</h5>

<p>For those applications that use time services to achieve realtime behavior, changing the value of the clock on which these
services rely may cause erroneous timing behavior. For these applications, it is necessary to have a monotonic clock which cannot
run backwards, and which has a maximum clock jump that is required to be documented by the implementation. Additionally, it is
desirable (but not required by IEEE Std 1003.1-2001) that the monotonic clock increases its value uniformly. This clock
should not be affected by changes to the system time; for example, to synchronize the clock with an external source or to account
for leap seconds. Such changes would cause errors in the measurement of time intervals for those time services that use the
absolute value of the clock.</p>

<p>One could argue that by defining the behavior of time services when the value of a clock is changed, deterministic realtime
behavior can be achieved. For example, one could specify that relative time services should be unaffected by changes in the value
of a clock. However, there are time services that are based upon an absolute time, but that are essentially intended as relative
time services. For example, <a href="../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> uses an absolute
time to allow it to wake up after the required interval despite spurious wakeups. Although sometimes the <a href=
"../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> timeouts are absolute in nature, there are many
occasions in which they are relative, and their absolute value is determined from the current time plus a relative time interval.
In this latter case, if the clock changes while the thread is waiting, the wait interval will not be the expected length. If a <a
href="../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> function were created that would take a
relative time, it would not solve the problem because to retain the intended "deadline" a thread would need to compensate for
latency due to the spurious wakeup, and preemption between wakeup and the next wait.</p>
<p>The solution is to create a new monotonic clock, whose value does not change except for the regular ticking of the clock, and
use this clock for implementing the various relative timeouts that appear in the different POSIX interfaces, as well as to allow <a
href="../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> to choose this new clock for its timeout. A new
<a href="../functions/clock_nanosleep.html"><i>clock_nanosleep</i>()</a> function is created to allow an application to take
advantage of this newly defined clock. Notice that the monotonic clock may be implemented using the same hardware clock as the
system clock.</p>

<p>Relative timeouts for <a href="../functions/sigtimedwait.html"><i>sigtimedwait</i>()</a> and <a href=
"../functions/aio_suspend.html"><i>aio_suspend</i>()</a> have been redefined to use the monotonic clock, if present. The <a href=
"../functions/alarm.html"><i>alarm</i>()</a> function has not been redefined, because the same effect but with better resolution
can be achieved by creating a timer (for which the appropriate clock may be chosen).</p>

<p>The <a href="../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> function has been treated in a
different way, compared to other functions with absolute timeouts, because it is used to wait for an event, and thus it may have a
deadline, while the other timeouts are generally used as an error recovery mechanism, and for them the use of the monotonic clock
is not so important. Since the desired timeout for the <a href=
"../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> function may either be a relative interval or an
absolute time of day deadline, a new initialization attribute has been created for condition variables to specify the clock that is
used for measuring the timeout in a call to <a href="../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a>.
In this way, if a relative timeout is desired, the monotonic clock will be used; if an absolute deadline is required instead,
CLOCK_REALTIME or another appropriate clock may be used. This capability has not been added to other functions with absolute
timeouts because for those functions the expected use of the timeout is mostly to prevent errors, and not so often to meet precise
deadlines. As a consequence, the complexity of adding this capability is not justified by its perceived application usage.</p>
<p>The <a href="../functions/nanosleep.html"><i>nanosleep</i>()</a> function has not been modified with the introduction of the
monotonic clock. Instead, a new <a href="../functions/clock_nanosleep.html"><i>clock_nanosleep</i>()</a> function has been created,
in which the desired clock may be specified in the function call.</p>
<ul>
<li>
<p>History of Resolution Issues</p>

<p>Due to the shift from relative to absolute timeouts in IEEE Std 1003.1d-1999, the amendments to the <a href=
"../functions/sem_timedwait.html"><i>sem_timedwait</i>()</a>, <a href=
"../functions/pthread_mutex_timedlock.html"><i>pthread_mutex_timedlock</i>()</a>, <a href=
"../functions/mq_timedreceive.html"><i>mq_timedreceive</i>()</a>, and <a href=
"../functions/mq_timedsend.html"><i>mq_timedsend</i>()</a> functions of that standard have been removed. Those amendments specified
that CLOCK_MONOTONIC would be used for the (relative) timeouts if the Monotonic Clock option was supported.</p>

<p>Having these functions continue to be tied solely to CLOCK_MONOTONIC would not work. Since the absolute value of a time value
obtained from CLOCK_MONOTONIC is unspecified, under the absolute timeouts interface, applications would behave differently
depending on whether the Monotonic Clock option was supported or not (because the absolute value of the clock would have different
meanings in either case).</p>

<p>Two options were considered:</p>

<ol>
<li>
<p>Leave the current behavior unchanged, which specifies the CLOCK_REALTIME clock for these (absolute) timeouts, to allow
portability of applications between implementations regardless of whether they support the Monotonic Clock option.</p>
</li>

<li>
<p>Modify these functions in the way that <a href="../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a>
was modified to allow a choice of clock, so that an application could use CLOCK_REALTIME when it is trying to achieve an absolute
timeout and CLOCK_MONOTONIC when it is trying to achieve a relative timeout.</p>
</li>
</ol>

<p>It was decided that the features of CLOCK_MONOTONIC are not as critical to these functions as they are to <a href=
"../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a>. The <a href=
"../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> function is given a relative timeout; the timeout
may represent a deadline for an event. When these functions are given relative timeouts, the timeouts are typically for error
recovery purposes and need not be so precise.</p>

<p>Therefore, it was decided that these functions should be tied to CLOCK_REALTIME and not complicated by being given a choice of
clock.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_20"></a>Execution Time Monitoring</h5>

<ul>
<li>
<p>Introduction</p>

<p>The main goals of the execution time monitoring facilities defined in this chapter are to measure the execution time of
processes and threads and to allow an application to establish CPU time limits for these entities.</p>
<p>The analysis phase of time-critical realtime systems often relies on the measurement of execution times of individual threads or
processes to determine whether the timing requirements will be met. Also, performance analysis techniques for soft deadline
realtime systems rely heavily on the determination of these execution times. The execution time monitoring functions provide
application developers with the ability to measure these execution times online and open the possibility of dynamic execution-time
analysis and system reconfiguration, if required.</p>

<p>The second goal, allowing an application to establish execution time limits for individual processes or threads and to detect
when they overrun, increases program robustness by enabling online checking of the execution times.</p>

<p>If errors are detected, possibly because of erroneous program constructs, errors in the analysis phase, or a burst of event
arrivals, online detection and recovery are possible in a portable way. This feature can be extremely important for
many time-critical applications. Other applications require trapping CPU-time errors as a normal way to exit an algorithm; for
instance, some realtime artificial intelligence applications trigger a number of independent inference processes of varying
accuracy and speed, limit how long they can run, and pick the best answer available when time runs out. In many periodic systems,
overrun processes are simply restarted in the next resource period, after necessary end-of-period actions have been taken. This
allows algorithms that are inherently data-dependent to be made predictable.</p>
<p>The interface that appears in this chapter defines a new type of clock, the CPU-time clock, which measures execution time. Each
process or thread can invoke the clock and timer functions defined in POSIX.1 to use them. Functions are also provided to access
the CPU-time clock of other processes or threads to enable remote monitoring of these clocks. Monitoring of threads of other
processes is not supported, since these threads are not visible from outside of their own process with the interfaces defined in
POSIX.1.</p>
</li>
<li>
<p>Execution Time Monitoring Interface</p>

<p>The clock and timer interface defined in POSIX.1 historically only defined one clock, which measures wall-clock time. The
requirements for measuring execution time of processes and threads, and setting limits to their execution time by detecting when
they overrun, can be accomplished with that interface if a new kind of clock is defined. These new clocks measure execution time,
and one is associated with each process and with each thread. The clock functions currently defined in POSIX.1 can be used to read
and set these CPU-time clocks, and timers can be created using these clocks as their timing base. These timers can then be used to
send a signal when some specified execution time has been exceeded. The CPU-time clocks of each process or thread can be accessed
by using the symbols CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID.</p>
<p>The clock and timer interface defined in POSIX.1 and extended with the new kind of CPU-time clock would only allow processes or
threads to access their own CPU-time clocks. However, many realtime systems require the possibility of monitoring the execution
time of processes or threads from independent monitoring entities. In order to allow applications to construct independent
monitoring entities that do not require cooperation from or modification of the monitored entities, two functions have been added:
<a href="../functions/clock_getcpuclockid.html"><i>clock_getcpuclockid</i>()</a>, for accessing CPU-time clocks of other processes,
and <a href="../functions/pthread_getcpuclockid.html"><i>pthread_getcpuclockid</i>()</a>, for accessing CPU-time clocks of other
threads. These functions return the clock identifier associated with the process or thread specified in the call. These clock IDs
can then be used in the rest of the clock function calls.</p>
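<p>For example, an independent monitor might periodically sample the CPU time consumed by another process. A minimal sketch (the wrapper name is ours; passing a <i>pid</i> of 0 names the calling process):</p>

```c
#include <sys/types.h>
#include <time.h>

/* Obtain the CPU-time clock ID of the process identified by pid and
 * read its current value.  Returns 0 on success; otherwise an error
 * number from clock_getcpuclockid() (e.g. ESRCH or EPERM) or -1. */
static int read_cpu_clock_of(pid_t pid, struct timespec *ts)
{
    clockid_t cid;
    int err;

    if ((err = clock_getcpuclockid(pid, &cid)) != 0)
        return err;
    return clock_gettime(cid, ts);
}
```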
<p>The clocks accessed through these functions could also be used as a timing base for the creation of timers, thereby allowing
independent monitoring entities to limit the CPU time consumed by other entities. However, this possibility would imply additional
complexity and overhead because of the need to maintain a timer queue for each process or thread, to store the different expiration
times associated with timers created by different processes or threads. The working group decided this additional overhead was not
justified by application requirements. Therefore, creation of timers attached to the CPU-time clocks of other processes or threads
has been specified as implementation-defined.</p>
</li>
<li>
<p>Overhead Considerations</p>

<p>The measurement of execution time may introduce additional overhead in the thread scheduling, because of the need to keep track
of the time consumed by each of these entities. In library-level implementations of threads, the efficiency of scheduling could be
somewhat compromised because of the need to make a kernel call, at each context switch, to read the process CPU-time clock.
Consequently, a thread creation attribute called <i>cpu-clock-requirement</i> was defined, to allow threads to disconnect their
respective CPU-time clocks. However, the Ballot Group considered that this attribute itself introduced some overhead, and that in
current implementations it was not worth the effort. Therefore, the attribute was deleted, and thus thread CPU-time clocks are
required for all threads if the Thread CPU-Time Clocks option is supported.</p>
</li>
<li>
<p>Accuracy of CPU-Time Clocks</p>

<p>The mechanism used to measure the execution time of processes and threads is specified in IEEE Std 1003.1-2001 as
implementation-defined. The reason for this is that both the underlying hardware and the implementation architecture have a very
strong influence on the accuracy achievable for measuring CPU time. For some implementations, the specification of strict accuracy
requirements would represent very large overheads, or even the impossibility of being implemented.</p>

<p>Since the mechanism for measuring execution time is implementation-defined, realtime applications will be able to take advantage
of accurate implementations using a portable interface. Of course, strictly conforming applications cannot rely on any particular
degree of accuracy, in the same way as they cannot rely on a very accurate measurement of wall clock time. There will always exist
applications whose accuracy or efficiency requirements on the implementation are more rigid than the values defined in
IEEE Std 1003.1-2001 or any other standard.</p>

<p>In any case, there is a minimum set of characteristics that realtime applications would expect from most implementations. One
such characteristic is that the sum of all the execution times of all the threads in a process equals the process execution time,
when no CPU-time clocks are disabled. This need not always be the case because implementations may differ in how they account for
time during context switches. Another characteristic is that the sum of the execution times of all processes in a system equals the
number of processors, multiplied by the elapsed time, assuming that no processor is idle during that elapsed time. However, in some
implementations it might not be possible to relate CPU time to elapsed time. For example, in a heterogeneous multi-processor system
in which each processor runs at a different speed, an implementation may choose to define each "second" of CPU time to be a
certain number of "cycles" that a CPU has executed.</p>
</li>
<li>
<p>Existing Practice</p>

<p>Measuring and limiting the execution time of each concurrent activity are common features of most industrial implementations of
realtime systems. Almost all critical realtime systems are currently built upon a cyclic executive. With this approach, a regular
timer interrupt kicks off the next sequence of computations. It also checks that the current sequence has completed. If it has not,
then some error recovery action can be undertaken (or at least an overrun is avoided). Current software engineering principles and
the increasing complexity of software are driving application developers to implement these systems on multi-threaded or
multi-process operating systems. Therefore, if a POSIX operating system is to be used for this type of application, then it must
offer the same level of protection.</p>
<p>Execution time clocks are also common in most UNIX implementations, although these clocks usually have requirements different
from those of realtime applications. The POSIX.1 <a href="../functions/times.html"><i>times</i>()</a> function supports the
measurement of the execution time of the calling process, and its terminated child processes. This execution time is measured in
clock ticks and is supplied as two different values with the user and system execution times, respectively. BSD supports the
function <a href="../functions/getrusage.html"><i>getrusage</i>()</a>, which allows the calling process to get information about
the resources used by itself and/or all of its terminated child processes. The resource usage includes user and system CPU time.
Some UNIX systems have options to specify high resolution (up to one microsecond) CPU-time clocks using the <a href=
"../functions/times.html"><i>times</i>()</a> or the <a href="../functions/getrusage.html"><i>getrusage</i>()</a> functions.</p>
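<p>The two historical interfaces can be compared side by side. A minimal sketch (the wrapper name is ours) reads the calling process's user and system CPU time with both: <i>times</i>() reports clock ticks scaled by <tt>sysconf(_SC_CLK_TCK)</tt>, while <i>getrusage</i>() reports <b>timeval</b> values directly:</p>

```c
#include <sys/times.h>
#include <sys/resource.h>
#include <unistd.h>

/* Fetch user and system CPU seconds for the calling process using
 * getrusage(); times() is called as well to show the coarser,
 * tick-based view of the same quantities. */
static int cpu_seconds(double *user, double *sys)
{
    struct tms t;
    struct rusage ru;
    long ticks = sysconf(_SC_CLK_TCK);

    if (ticks <= 0 || times(&t) == (clock_t)-1)
        return -1;
    if (getrusage(RUSAGE_SELF, &ru) == -1)
        return -1;

    /* getrusage() has microsecond resolution; t.tms_utime / ticks
     * would give the same totals in clock-tick units. */
    *user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    *sys  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
    return 0;
}
```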
<p>The <a href="../functions/times.html"><i>times</i>()</a> and <a href="../functions/getrusage.html"><i>getrusage</i>()</a>
interfaces do not meet important realtime requirements, such as the possibility of monitoring execution time from a different
process or thread, or the possibility of detecting an execution time overrun. The latter requirement is supported in some UNIX
implementations that are able to send a signal when the execution time of a process has exceeded some specified value. For example,
BSD defines the functions <a href="../functions/getitimer.html"><i>getitimer</i>()</a> and <a href=
"../functions/setitimer.html"><i>setitimer</i>()</a>, which can operate either on a realtime clock (wall-clock), or on virtual-time
or profile-time clocks which measure CPU time in two different ways. These functions do not support access to the execution time of
other processes.</p>

<p>IBM's MVS operating system supports per-process and per-thread execution time clocks. It also supports limiting the execution
time of a given process.</p>
<p>Given all this existing practice, the working group considered that the POSIX.1 clocks and timers interface was appropriate to
meet most of the requirements that realtime applications have for execution time clocks. Functions were added to get the CPU time
clock IDs, and to allow/disallow the thread CPU-time clocks (in order to preserve the efficiency of some implementations of
threads).</p>
</li>
<li>
<p>Clock Constants</p>

<p>The definition of the manifest constants CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID allows processes or threads,
respectively, to access their own execution-time clocks. However, given a process or thread, access to its own execution-time clock
is also possible if the clock ID of this clock is obtained through a call to <a href=
"../functions/clock_getcpuclockid.html"><i>clock_getcpuclockid</i>()</a> or <a href=
"../functions/pthread_getcpuclockid.html"><i>pthread_getcpuclockid</i>()</a>. Therefore, these constants are not necessary and
could be deleted to make the interface simpler. Their existence saves one system call in the first access to the CPU-time clock of
each process or thread. The working group considered this issue and decided to leave the constants in
IEEE Std 1003.1-2001 because they are closer to the POSIX.1b use of clock identifiers.</p>
</li>
<li>
<p>Library Implementations of Threads</p>

<p>In library implementations of threads, kernel entities and library threads can coexist. In this case, if the CPU-time clocks are
supported, most of the clock and timer functions will need to have two implementations: one in the thread library, and one in the
system calls library. The main difference between these two implementations is that the thread library implementation will have to
deal with clocks and timers that reside in the thread space, while the kernel implementation will operate on timers and clocks that
reside in kernel space. In the library implementation, if the clock ID refers to a clock that resides in the kernel, a kernel call
will have to be made. The correct version of the function can be chosen by specifying the appropriate order for the libraries
during the link process.</p>
</li>
<li>
<p>History of Resolution Issues: Deletion of the <i>enable</i> Attribute</p>

<p>In early proposals, consideration was given to inclusion of an attribute called <i>enable</i> for CPU-time clocks. This would
allow implementations to avoid the overhead of measuring execution time for those processes or threads for which this measurement
was not required. However, this is unnecessary since processes are already required to measure execution time by the POSIX.1 <a
href="../functions/times.html"><i>times</i>()</a> function. Consequently, the <i>enable</i> attribute is not present.</p>
</li>
</ul>
<h5><a name="tag_03_02_08_21"></a>Rationale Relating to Timeouts</h5>

<ul>
<li>
<p>Requirements for Timeouts</p>

<p>Realtime systems which must operate reliably over extended periods without human intervention are characteristic of embedded
applications such as avionics, machine control, and space exploration, as well as more mundane applications such as cable TV,
security systems, and plant automation. A multi-tasking paradigm, in which many independent and/or cooperating software functions
relinquish the processor(s) while waiting for a specific stimulus, resource, condition, or operation completion, is very useful in
producing well engineered programs for such systems. For such systems to be robust and fault-tolerant, expected occurrences that
are unduly delayed or that never occur must be detected so that appropriate recovery actions may be taken. This is difficult if
there is no way for a task to regain control of a processor once it has relinquished control (blocked) awaiting an occurrence
which, perhaps because of corrupted code, hardware malfunction, or latent software bugs, will not happen when expected. Therefore,
the common practice in realtime operating systems is to provide a capability to time out such blocking services. Although there are
several methods to achieve this already defined by POSIX, none are as reliable or efficient as initiating a timeout simultaneously
with initiating a blocking service. This is especially critical in hard-realtime embedded systems because the processors typically
have little time reserve, and allowed fault recovery times are measured in milliseconds rather than seconds.</p>
<p>The working group largely agreed that such timeouts were necessary and ought to become part of IEEE Std 1003.1-2001;
vendors of realtime operating systems, whose customers had already expressed a strong need for timeouts, felt this most strongly.
There was some resistance to inclusion of timeouts in IEEE Std 1003.1-2001 because the desired effect, fault tolerance, could, in
theory, be achieved using existing facilities and alternative software designs, but there was no compelling evidence that realtime
system designers would embrace such designs at the sacrifice of performance and/or simplicity.</p>
</li>
<li>
<p>Which Services should be Timed Out?</p>

<p>Originally, the working group considered the prospect of providing timeouts on all blocking services, including those currently
existing in POSIX.1, POSIX.1b, and POSIX.1c, and future interfaces to be defined by other working groups, as sort of a general
policy. This was rather quickly rejected because of the scope of such a change, and the fact that many of those services would not
normally be used in a realtime context. More traditional timesharing solutions to timeout would suffice for most of the POSIX.1
interfaces, while others had asynchronous alternatives which, while more complex to utilize, would be adequate for some realtime
and all non-realtime applications.</p>

<p>The list of potential candidates for timeouts was narrowed to the following for further consideration:</p>
<ul>
<li>
<p>POSIX.1b</p>

<ul>
<li>
<p><a href="../functions/sem_wait.html"><i>sem_wait</i>()</a></p>
</li>

<li>
<p><a href="../functions/mq_receive.html"><i>mq_receive</i>()</a></p>
</li>

<li>
<p><a href="../functions/mq_send.html"><i>mq_send</i>()</a></p>
</li>

<li>
<p><a href="../functions/lio_listio.html"><i>lio_listio</i>()</a></p>
</li>

<li>
<p><a href="../functions/aio_suspend.html"><i>aio_suspend</i>()</a></p>
</li>

<li>
<p><a href="../functions/sigwait.html"><i>sigwait</i>()</a> (timeout already implemented by <a href=
"../functions/sigtimedwait.html"><i>sigtimedwait</i>()</a>)</p>
</li>
</ul>
</li>

<li>
<p>POSIX.1c</p>

<ul>
<li>
<p><a href="../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a></p>
</li>

<li>
<p><a href="../functions/pthread_join.html"><i>pthread_join</i>()</a></p>
</li>

<li>
<p><a href="../functions/pthread_cond_wait.html"><i>pthread_cond_wait</i>()</a> (timeout already implemented by <a href=
"../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a>)</p>
</li>
</ul>
</li>

<li>
<p>POSIX.1</p>

<ul>
<li>
<p><a href="../functions/read.html"><i>read</i>()</a></p>
</li>

<li>
<p><a href="../functions/write.html"><i>write</i>()</a></p>
</li>
</ul>
</li>
</ul>
<p>After further review by the working group, the <a href="../functions/lio_listio.html"><i>lio_listio</i>()</a>, <a href=
"../functions/read.html"><i>read</i>()</a>, and <a href="../functions/write.html"><i>write</i>()</a> functions (all forms of
blocking synchronous I/O) were eliminated from the list because of the following:</p>

<ul>
<li>
<p>Asynchronous alternatives exist</p>
</li>

<li>
<p>Timeouts can be implemented, albeit non-portably, in device drivers</p>
</li>

<li>
<p>A strong desire not to introduce modifications to POSIX.1 interfaces</p>
</li>
</ul>
<p>The working group ultimately rejected <a href="../functions/pthread_join.html"><i>pthread_join</i>()</a> since both that
interface and a timed variant of that interface are non-minimal and may be implemented as a function. See below for a library
implementation of <a href="../functions/pthread_join.html"><i>pthread_join</i>()</a>.</p>

<p>Thus, there was a consensus among the working group members to add timeouts to 4 of the remaining 5 functions (the timeout for
<a href="../functions/aio_suspend.html"><i>aio_suspend</i>()</a> was ultimately added directly to POSIX.1b, while the others were
added by POSIX.1d). However, <a href="../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a> remained
contentious.</p>
<p>Many feel that <a href="../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a> falls into the same class as the
other functions; that is, it is desirable to time out a mutex lock because a mutex may fail to be unlocked due to errant or
corrupted code in a critical section (looping or branching outside of the unlock code), and therefore is equally in need of a
reliable, simple, and efficient timeout. In fact, since mutexes are intended to guard small critical sections, most <a href=
"../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a> calls would be expected to obtain the lock without blocking
or utilizing any kernel service, even in implementations of threads with global contention scope; the timeout alternative need
only be considered after it is determined that the thread must block.</p>
<p>Those opposed to timing out mutexes feel that the very simplicity of the mutex is compromised by adding a timeout semantic, and
that to do so is senseless. They claim that if a timed mutex is really deemed useful by a particular application, then it can be
constructed from the facilities already in POSIX.1b and POSIX.1c. The following two C-language library implementations of mutex
locking with timeout represent the solutions offered (in both implementations, the timeout parameter is specified as absolute time,
not relative time as in the proposed POSIX.1c interfaces).</p>
</li>
<li>
<p>Spinlock Implementation</p>

<blockquote>
<pre>
<tt>#include <pthread.h>
#include <sched.h>
#include <time.h>
#include <errno.h>
<br>
/* Compare two timespecs: negative, zero, or positive, like strcmp().
   This helper was assumed by the original example; a definition is
   supplied here so the example is self-contained. */
static int timespec_cmp(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}
<br>
int pthread_mutex_timedlock(pthread_mutex_t *mutex,
                            const struct timespec *timeout)
{
    struct timespec timenow;
<br>
    while (pthread_mutex_trylock(mutex) == EBUSY)
    {
        clock_gettime(CLOCK_REALTIME, &timenow);
        if (timespec_cmp(&timenow, timeout) >= 0)
        {
            return ETIMEDOUT;
        }
        sched_yield();   /* sched_yield() is the standard yield interface */
    }
    return 0;
}
</tt>
</pre>
</blockquote>
<p>The Spinlock implementation is generally unsuitable for any application using priority-based thread scheduling policies such as
SCHED_FIFO or SCHED_RR, since the mutex could currently be held by a thread of lower priority within the same allocation domain,
but since the waiting thread never blocks, only threads of equal or higher priority will ever run, and the mutex cannot be
unlocked. Setting priority inheritance or priority ceiling protocol on the mutex does not solve this problem, since the priority of
a mutex owning thread is only boosted if higher priority threads are blocked waiting for the mutex; clearly not the case for this
spinlock.</p>
</li>
<li>
<p>Condition Wait Implementation</p>

<pre>
<tt>#include <pthread.h>
#include <time.h>
#include <errno.h>
<br>
#define FALSE 0
#define TRUE 1
<br>
struct timed_mutex
{
    int locked;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
};
typedef struct timed_mutex timed_mutex_t;
<br>
int timed_mutex_lock(timed_mutex_t *tm,
                     const struct timespec *timeout)
{
    int timedout = FALSE;
    int error_status;
<br>
    pthread_mutex_lock(&tm->mutex);
<br>
    while (tm->locked && !timedout)
    {
        if ((error_status = pthread_cond_timedwait(&tm->cond,
                                                   &tm->mutex,
                                                   timeout)) != 0)
        {
            if (error_status == ETIMEDOUT) timedout = TRUE;
        }
    }
<br>
    if (timedout)
    {
        pthread_mutex_unlock(&tm->mutex);
        return ETIMEDOUT;
    }
    else
    {
        tm->locked = TRUE;
        pthread_mutex_unlock(&tm->mutex);
        return 0;
    }
}
<br>
void timed_mutex_unlock(timed_mutex_t *tm)
{
    pthread_mutex_lock(&tm->mutex);  /* in case assignment is not atomic */
    tm->locked = FALSE;
    pthread_mutex_unlock(&tm->mutex);
    pthread_cond_signal(&tm->cond);
}
</tt>
</pre>
<p>The Condition Wait implementation effectively substitutes the <a href=
"../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> function (which is currently timed out) for the
desired <a href="../functions/pthread_mutex_timedlock.html"><i>pthread_mutex_timedlock</i>()</a>. Since waits on condition
variables currently do not include protocols which avoid priority inversion, this method is generally unsuitable for realtime
applications because it does not provide the same priority inversion protection as the untimed <a href=
"../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a>. Also, for any given implementations of the current mutex
and condition variable primitives, this library implementation has a performance cost at least 2.5 times that of the untimed <a
href="../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a> even in the case where the timed mutex is readily
locked without blocking. Even in uniprocessors or where assignment is
atomic, at least an additional <a href="../functions/pthread_cond_signal.html"><i>pthread_cond_signal</i>()</a> is required. <a
href="../functions/pthread_mutex_timedlock.html"><i>pthread_mutex_timedlock</i>()</a> could be implemented at effectively no
performance penalty in this case because the timeout parameters need only be considered after it is determined that the mutex
cannot be locked immediately.</p>

<p>Thus it has not yet been shown that the full semantics of mutex locking with timeout can be efficiently and reliably achieved
using existing interfaces. Even if the existence of an acceptable library implementation were proven, it is difficult to justify
why the interface itself should not be made portable, especially considering approval for the other four timeouts.<br>
</p>
</li>
<li>
<p>Rationale for Library Implementation of <i>pthread_timedjoin</i>()</p>

<p>Library implementation of <i>pthread_timedjoin</i>():</p>
<pre>
<tt>/*
 * Construct a thread variety entirely from existing functions
 * with which a join can be done, allowing the join to time out.
 */
#include <pthread.h>
#include <time.h>
#include <stdlib.h>
<br>
#define FALSE 0
#define TRUE 1
<br>
struct timed_thread {
    pthread_t t;
    pthread_mutex_t m;
    int exiting;
    pthread_cond_t exit_c;
    void *(*start_routine)(void *arg);
    void *arg;
    void *status;
};
<br>
typedef struct timed_thread *timed_thread_t;
static pthread_key_t timed_thread_key;
static pthread_once_t timed_thread_once = PTHREAD_ONCE_INIT;
<br>
void timed_thread_exit(void *status);     /* forward declaration */
<br>
static void timed_thread_init()
{
    pthread_key_create(&timed_thread_key, NULL);
}
<br>
static void *timed_thread_start_routine(void *args)
<br>
/*
 * Routine to establish thread-specific data value and run the actual
 * thread start routine which was supplied to timed_thread_create().
 */
{
    timed_thread_t tt = (timed_thread_t) args;
<br>
    pthread_once(&timed_thread_once, timed_thread_init);
    pthread_setspecific(timed_thread_key, (void *)tt);
    timed_thread_exit((tt->start_routine)(tt->arg));
    return NULL;    /* not reached; timed_thread_exit() does not return */
}
<br>
int timed_thread_create(timed_thread_t *ttp, const pthread_attr_t *attr,
    void *(*start_routine)(void *), void *arg)
<br>
/*
 * Allocate a thread which can be used with timed_thread_join().
 */
{
    timed_thread_t tt;
    int result;
<br>
    tt = (timed_thread_t) malloc(sizeof(struct timed_thread));
    pthread_mutex_init(&tt->m, NULL);
    tt->exiting = FALSE;
    pthread_cond_init(&tt->exit_c, NULL);
    tt->start_routine = start_routine;
    tt->arg = arg;
    tt->status = NULL;
<br>
    if ((result = pthread_create(&tt->t, attr,
            timed_thread_start_routine, (void *)tt)) != 0) {
        free(tt);
        return result;
    }
<br>
    pthread_detach(tt->t);
    *ttp = tt;
    return 0;
}
<br>
int timed_thread_join(timed_thread_t tt,
                      struct timespec *timeout,
                      void **status)
{
    int result;
<br>
    pthread_mutex_lock(&tt->m);
    result = 0;
    /*
     * Wait until the thread announces that it is exiting,
     * or until timeout.
     */
    while (result == 0 && !tt->exiting) {
        result = pthread_cond_timedwait(&tt->exit_c, &tt->m, timeout);
    }
    pthread_mutex_unlock(&tt->m);
    if (result == 0 && tt->exiting) {
        *status = tt->status;
        free((void *)tt);
        return result;
    }
    return result;
}
<br>
void timed_thread_exit(void *status)
{
    timed_thread_t tt;
    void *specific;
<br>
    if ((specific = pthread_getspecific(timed_thread_key)) == NULL) {
        /*
         * Handle cases which won't happen with correct usage.
         */
        pthread_exit(NULL);
    }
    tt = (timed_thread_t) specific;
    pthread_mutex_lock(&tt->m);
    /*
     * Tell a joiner that we're exiting.
     */
    tt->status = status;
    tt->exiting = TRUE;
    pthread_cond_signal(&tt->exit_c);
    pthread_mutex_unlock(&tt->m);
    /*
     * Call pthread_exit() to call destructors and really
     * exit the thread.
     */
    pthread_exit(NULL);
}
</tt>
</pre>
<p>The <a href="../functions/pthread_join.html"><i>pthread_join</i>()</a> C-language example shown above demonstrates that it is
possible, using existing pthreads facilities, to construct a kind of thread that can be joined, but whose join operation can time
out. It does this by using <a href="../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> to wait for the
thread to exit. A <b>timed_thread_t</b> descriptor structure is used to pass parameters from the creating thread to the created
thread, and from the exiting thread to the joining thread. This implementation is roughly equivalent to what a normal <a href=
"../functions/pthread_join.html"><i>pthread_join</i>()</a> implementation would do, with the single change being that <a href=
"../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> is used in place of a simple <a href=
"../functions/pthread_cond_wait.html"><i>pthread_cond_wait</i>()</a>.</p>

<p>Since it is possible to implement such a facility entirely from existing pthreads interfaces, and with roughly the same
efficiency and complexity as an implementation provided directly by a pthreads implementation, it was the consensus of the
working group members that any <i>pthread_timedjoin</i>() facility would be unnecessary and should not be provided.</p>
</li>

<li>
<p>Form of the Timeout Interfaces</p>

<p>The working group considered a number of alternative ways to add timeouts to blocking services. At first, it considered a system
interface that would specify a one-shot or persistent timeout to be applied to subsequent blocking services invoked by the calling
process or thread, because it allowed all blocking services to be timed out in a uniform manner with a single additional
interface; this was rather quickly rejected because it could easily result in the wrong services being timed out.</p>

<p>It was suggested that a timeout value might be specified as an attribute of the object (semaphore, mutex, message queue, and so
on), but there was no consensus on this, either on a case-by-case basis or for all timeouts.</p>

<p>Looking at the two existing timeouts for blocking services indicates that the working group members favor a separate interface
for the timed version of a function. However, <a href=
"../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> utilizes an absolute timeout value while <a href=
"../functions/sigtimedwait.html"><i>sigtimedwait</i>()</a> uses a relative timeout value. The working group members agreed that
relative timeout values are appropriate where the timeout mechanism's primary use is to deal with an unexpected or error
situation, but they are inappropriate when the timeout must expire at a particular time, or before a specific deadline. For the
timeouts being introduced in IEEE Std 1003.1-2001, the working group considered allowing both relative and absolute
timeouts as is done with POSIX.1b timers, but ultimately favored the simpler absolute timeout form.</p>

<p>An absolute time measure can be easily implemented on top of an interface that specifies relative time, by reading the clock,
calculating the difference between the current time and the desired wake-up time, and issuing a relative timeout call. But there is
a race condition with this approach because the thread could be preempted after reading the clock, but before making the timed-out
call; in this case, the thread would be awakened later than it should be and, thus, if the wake-up time represented a deadline, it
would miss it.</p>

<p>There is also a race condition when trying to build a relative timeout on top of an interface that specifies absolute timeouts.
In this case, the clock would have to be read to calculate the absolute wake-up time as the sum of the current time plus the
relative timeout interval. If the thread is then preempted after reading the clock but before making the timed-out call, the
thread would be awakened earlier than desired.</p>

<p>But the race condition with the absolute timeout interface is not as bad as the one that happens with the relative timeout
interface, because there are simple workarounds. For the absolute timeout interface, if the timing requirement is a deadline, the
deadline can still be met because the thread woke up earlier than the deadline. If the timeout is just used as an error recovery
mechanism, the precision of timing is not really important. If the timing requirement is that between actions A and B a minimum
interval of time must elapse, the absolute timeout interface can be safely used by reading the clock after action A has been
started. It could be argued that, since the call with the absolute timeout is atomic from the application point of view, it is not
possible to read the clock after action A, if this action is part of the timed-out call. But looking at the nature of the calls for
which timeouts are specified (locking a mutex, waiting for a semaphore, waiting for a message, or waiting until there is space in a
message queue), the timeouts that an application would build on these actions would not be triggered by these actions themselves,
but by some other external action. For example, if waiting for a message to arrive at a message queue, and waiting for at least 20
milliseconds, this time interval would start to be counted from some event that would trigger both the action that produces the
message, as well as the action that waits for the message to arrive, and not by the wait-for-message operation itself. In this
case, the workaround proposed above could be used.</p>

<p>For these reasons, the absolute timeout is preferred over the relative timeout interface.</p>
</li>
</ul>
<h4><a name="tag_03_02_09"></a>Threads</h4>

<p>Threads will normally be more expensive than subroutines (or functions, routines, and so on) if specialized hardware support is
not provided. Nevertheless, threads should be sufficiently efficient to encourage their use as a medium- to fine-grained structuring
mechanism for parallelism in an application. Structuring an application using threads then allows it to take immediate advantage of
any underlying parallelism available in the host environment. This means implementors are encouraged to optimize for fast execution
at the possible expense of efficient utilization of storage. For example, a common thread creation technique is to cache
appropriate thread data structures. That is, rather than releasing system resources, the implementation retains these resources and
reuses them when the program next asks to create a new thread. If this reuse of thread resources is to be possible, there has to be
very little unique state associated with each thread, because any such state has to be reset when the thread is reused.</p>

<h5><a name="tag_03_02_09_01"></a>Thread Creation Attributes</h5>

<p>Attributes objects are provided for threads, mutexes, and condition variables as a mechanism to support probable future
standardization in these areas without requiring that the interface itself be changed.</p>

<p>Attributes objects provide clean isolation of the configurable aspects of threads. For example, "stack size" is an important
attribute of a thread, but it cannot be expressed portably. When porting a threaded program, stack sizes often need to be adjusted.
The use of attributes objects can help by allowing the changes to be isolated in a single place, rather than being spread across
every instance of thread creation.</p>

<p>Attributes objects can be used to set up <i>classes</i> of threads with similar attributes; for example, "threads with large
stacks and high priority" or "threads with minimal stacks". These classes can be defined in a single place and then referenced
wherever threads need to be created. Changes to "class" decisions become straightforward, and detailed analysis of each <a href=
"../functions/pthread_create.html"><i>pthread_create</i>()</a> call is not required.</p>
<p>The attributes objects are defined as opaque types as an aid to extensibility. If these objects had been specified as
structures, adding new attributes would force recompilation of all multi-threaded programs when the attributes objects are
extended; this might not be possible if different program components were supplied by different vendors.</p>

<p>Additionally, opaque attributes objects present opportunities for improving performance. Argument validity can be checked once
when attributes are set, rather than each time a thread is created. Implementations will often need to cache kernel objects that
are expensive to create. Opaque attributes objects provide an efficient mechanism to detect when cached objects become invalid due
to attribute changes.</p>

<p>Because assignment is not necessarily defined on a given opaque type, implementation-defined default values cannot be defined in
a portable way. The solution to this problem is to allow attributes objects to be initialized dynamically by attributes object
initialization functions, so that default values can be supplied automatically by the implementation.</p>

<p>The following proposal was provided as a suggested alternative to the supplied attributes:</p>

<ol>
<li>
<p>Maintain the style of passing a parameter formed by the bitwise-inclusive OR of flags to the initialization routines (<a href=
"../functions/pthread_create.html"><i>pthread_create</i>()</a>, <a href=
"../functions/pthread_mutex_init.html"><i>pthread_mutex_init</i>()</a>, <a href=
"../functions/pthread_cond_init.html"><i>pthread_cond_init</i>()</a>). The parameter containing the flags should be an opaque type
for extensibility. If no flags are set in the parameter, then the objects are created with default characteristics. An
implementation may specify implementation-defined flag values and associated behavior.</p>
</li>

<li>
<p>If further specialization of mutexes and condition variables is necessary, implementations may specify additional procedures
that operate on the <b>pthread_mutex_t</b> and <b>pthread_cond_t</b> objects (instead of on attributes objects).</p>
</li>
</ol>

<p>The difficulties with this solution are:</p>

<ol>
<li>
<p>A bitmask is not opaque if bits have to be set into bit-vector attributes objects using explicitly-coded bitwise-inclusive OR
operations. If the set of options exceeds an <b>int</b>, application programmers need to know the location of each bit. If bits are
set or read by encapsulation (that is, <i>get*</i>() or <i>set*</i>() functions), then the bitmask is merely an implementation of
attributes objects as currently defined and should not be exposed to the programmer.</p>
</li>

<li>
<p>Many attributes are not Boolean or very small integral values. For example, scheduling policy may be placed in 3 or 4 bits,
but priority requires 5 bits or more, thereby taking up at least 8 bits out of a possible 16 bits on machines with 16-bit integers.
Because of this, the bitmask can only reasonably control whether particular attributes are set or not, and it cannot serve as the
repository of the value itself. The value needs to be specified as a function parameter (which is non-extensible), by setting a
structure field (which is non-opaque), or by <i>get*</i>() and <i>set*</i>() functions (making the bitmask a redundant addition to
the attributes objects).</p>
</li>
</ol>
<p>Stack size is defined as an optional attribute because the very notion of a stack is inherently machine-dependent. Some
implementations may not be able to change the size of the stack, for example, and others may not need to because stack pages may be
discontiguous and can be allocated and released on demand.</p>

<p>The attribute mechanism has been designed in large measure for extensibility. Future extensions to the attribute mechanism or to
any attributes object defined in IEEE Std 1003.1-2001 have to be done with care so as not to affect binary compatibility.</p>

<p>Attributes objects, even if allocated by means of dynamic allocation functions such as <a href=
"../functions/malloc.html"><i>malloc</i>()</a>, may have their size fixed at compile time. This means, for example, that a <a href=
"../functions/pthread_create.html"><i>pthread_create</i>()</a> in an implementation with extensions to the <b>pthread_attr_t</b>
cannot look beyond the area that the binary application assumes is valid. This suggests that implementations should maintain a size
field in the attributes object, as well as possibly version information, if extensions in different directions (possibly by
different vendors) are to be accommodated.</p>

<h5><a name="tag_03_02_09_02"></a>Thread Implementation Models</h5>

<p>There are various thread implementation models. At one end of the spectrum is the "library-thread model". In such a model, the
threads of a process are not visible to the operating system kernel, and the threads are not kernel-scheduled entities. The process
is the only kernel-scheduled entity. The process is scheduled onto the processor by the kernel according to the scheduling
attributes of the process. The threads are scheduled onto the single kernel-scheduled entity (the process) by the runtime library
according to the scheduling attributes of the threads. A problem with this model is that it constrains concurrency. Since there is
only one kernel-scheduled entity (namely, the process), only one thread per process can execute at a time. If the thread that is
executing blocks on I/O, then the whole process blocks.</p>

<p>At the other end of the spectrum is the "kernel-thread model". In this model, all threads are visible to the operating system
kernel. Thus, all threads are kernel-scheduled entities, and all threads can concurrently execute. The threads are scheduled onto
processors by the kernel according to the scheduling attributes of the threads. The drawback to this model is that the creation and
management of the threads entail operating system calls, as opposed to subroutine calls, which makes kernel threads heavier weight
than library threads.</p>

<p>Hybrids of these two models are common. A hybrid model offers the speed of library threads and the concurrency of kernel
threads. In hybrid models, a process has some (relatively small) number of kernel-scheduled entities associated with it. It also
has a potentially much larger number of library threads associated with it. Some library threads may be bound to kernel-scheduled
entities, while the other library threads are multiplexed onto the remaining kernel-scheduled entities. There are two levels of
thread scheduling:</p>

<ol>
<li>
<p>The runtime library manages the scheduling of (unbound) library threads onto kernel-scheduled entities.</p>
</li>

<li>
<p>The kernel manages the scheduling of kernel-scheduled entities onto processors.</p>
</li>
</ol>

<p>For this reason, a hybrid model is referred to as a two-level threads scheduling model. In this model, the process can have
multiple concurrently executing threads; specifically, it can have as many concurrently executing threads as it has
kernel-scheduled entities.</p>
<h5><a name="tag_03_02_09_03"></a>Thread-Specific Data</h5>

<p>Many applications require that a certain amount of context be maintained on a per-thread basis across procedure calls. A common
example is a multi-threaded library routine that allocates resources from a common pool and maintains an active resource list for
each thread. The thread-specific data interface provided to meet these needs may be viewed as a two-dimensional array of values
with keys serving as the row index and thread IDs as the column index (although the implementation need not work this way).</p>

<ul>
<li>
<p>Models</p>

<p>Three possible thread-specific data models were considered:</p>

<ol>
<li>
<p>No Explicit Support</p>

<p>A standard thread-specific data interface is not strictly necessary to support applications that require per-thread context. One
could, for example, provide a hash function that converted a <b>pthread_t</b> into an integer value that could then be used to
index into a global array of per-thread data pointers. This hash function, in conjunction with <a href=
"../functions/pthread_self.html"><i>pthread_self</i>()</a>, would be all the interface required to support a mechanism of this
sort. Unfortunately, this technique is cumbersome. It can lead to duplicated code as each set of cooperating modules implements
its own per-thread data management scheme.</p>
</li>

<li>
<p>Single (<b>void</b> *) Pointer</p>

<p>Another technique would be to provide a single word of per-thread storage and a pair of functions to fetch and store the value
of this word. The word could then hold a pointer to a block of per-thread memory. The allocation, partitioning, and general use of
this memory would be entirely up to the application. Although this method is not as problematic as technique 1, it suffers from
interoperability problems. For example, all modules using the per-thread pointer would have to agree on a common usage
protocol.</p>
</li>

<li>
<p>Key/Value Mechanism</p>

<p>This method associates an opaque key (for example, stored in a variable of type <b>pthread_key_t</b>) with each per-thread
datum. These keys play the role of identifiers for per-thread data. This technique is the most generic and avoids the problems
noted above, albeit at the cost of some complexity.</p>
</li>
</ol>

<p>The primary advantage of the third model is its information hiding properties. Modules using this model are free to create and
use their own key(s) independent of all other such usage, whereas the other models require that all modules that use
thread-specific context explicitly cooperate with all other such modules. The data independence provided by the third model is
worth the additional interface.</p>
</li>

<li>
<p>Requirements</p>

<p>It is important that it be possible to implement the thread-specific data interface without the use of thread private memory. To
do otherwise would increase the weight of each thread, thereby limiting the range of applications for which the threads interfaces
provided by IEEE Std 1003.1-2001 are appropriate.</p>

<p>The values that one binds to the key via <a href="../functions/pthread_setspecific.html"><i>pthread_setspecific</i>()</a> may,
in fact, be pointers to shared storage locations available to all threads. It is only the key/value bindings that are maintained on
a per-thread basis, and these can be kept in any portion of the address space that is reserved for use by the calling thread (for
example, on the stack). Thus, no per-thread MMU state is required to implement the interface. On the other hand, there is nothing
in the interface specification to preclude the use of per-thread MMU state if it is available (for example, the key values
returned by <a href="../functions/pthread_key_create.html"><i>pthread_key_create</i>()</a> could be thread private memory
addresses).</p>
</li>

<li>
<p>Standardization Issues</p>

<p>Thread-specific data is a requirement for a usable thread interface. The binding described in this section provides a portable
thread-specific data mechanism for languages that do not directly support a thread-specific storage class. A binding to
IEEE Std 1003.1-2001 for a language that does include such a storage class need not provide this specific interface.</p>

<p>If a language were to include the notion of thread-specific storage, it would be desirable (but <i>not</i> required) to provide
an implementation of the pthreads thread-specific data interface based on the language feature. For example, assume that a compiler
for a C-like language supports a <i>private</i> storage class that provides thread-specific storage. Something similar to the
following macros might be used to effect a compatible implementation:</p>

<blockquote>
<pre>
<tt>#define pthread_key_t private void *
#define pthread_key_create(key) /* no-op */
#define pthread_setspecific(key,value) (key)=(value)
#define pthread_getspecific(key) (key)
</tt>
</pre>
</blockquote>

<basefont size="2">

<dl>
<dt><b>Note:</b></dt>

<dd>For the sake of clarity, this example ignores destructor functions. A correct implementation would have to support them.</dd>
</dl>

<basefont size="3"></li>
</ul>
<h5><a name="tag_03_02_09_04"></a>Barriers</h5>

<ul>
<li>
<p>Background</p>

<p>Barriers are typically used in parallel DO/FOR loops to ensure that all threads have reached a particular stage in a parallel
computation before allowing any to proceed to the next stage. Highly efficient implementation is possible on machines which support
a "Fetch and Add" operation, as described in the referenced Almasi and Gottlieb (1989).</p>

<p>The use of the return value PTHREAD_BARRIER_SERIAL_THREAD is shown in the following example:</p>

<blockquote>
<pre>
<tt>if ((status = pthread_barrier_wait(&barrier)) ==
        PTHREAD_BARRIER_SERIAL_THREAD) {
    ...serial section
}
else if (status != 0) {
    ...error processing
}
status = pthread_barrier_wait(&barrier);
...
</tt>
</pre>
</blockquote>

<p>This behavior allows a serial section of code to be executed by one thread as soon as all threads reach the first barrier. The
second barrier prevents the other threads from proceeding until the serial section being executed by the one thread has
completed.</p>
<p>Although barriers can be implemented with mutexes and condition variables, the referenced Almasi and Gottlieb (1989) provides
ample illustration that such implementations are significantly less efficient than is possible. While the relative efficiency of
barriers may well vary by implementation, it is important that they be recognized in IEEE Std 1003.1-2001 to
facilitate application portability while providing the necessary freedom to implementors.</p>
</li>

<li>
<p>Lack of Timeout Feature</p>

<p>Alternate versions of most blocking routines have been provided to support watchdog timeouts. No alternate interface of this
sort has been provided for barrier waits for the following reasons:</p>

<ul>
<li>
<p>Multiple threads may use different timeout values, some of which may be indefinite. It is not clear which threads should break
through the barrier with a timeout error if and when these timeouts expire.</p>
</li>

<li>
<p>The barrier may become unusable once a thread breaks out of a <a href=
"../functions/pthread_barrier_wait.html"><i>pthread_barrier_wait</i>()</a> with a timeout error. There is, in general, no way to
guarantee the consistency of a barrier's internal data structures once a thread has timed out of a <a href=
"../functions/pthread_barrier_wait.html"><i>pthread_barrier_wait</i>()</a>. Even the inclusion of a special barrier
reinitialization function would not help much, since it is not clear how this function would affect the behavior of threads that
reach the barrier between the original timeout and the call to the reinitialization function.</p>
</li>
</ul>
</li>
</ul>

<h5><a name="tag_03_02_09_05"></a>Spin Locks</h5>

<ul>
<li>
<p>Background</p>

<p>Spin locks represent an extremely low-level synchronization mechanism suitable primarily for use on shared memory
multi-processors. A spin lock is typically an atomically modified Boolean value that is set to one when the lock is held and to
zero when the lock is freed.</p>

<p>When a caller requests a spin lock that is already held, it typically spins in a loop testing whether the lock has become
available. Such spinning wastes processor cycles, so the lock should only be held for short durations and not across sleep/block
operations. Callers should unlock spin locks before calling sleep operations.</p>

<p>Spin locks are available on a variety of systems. The functions included in IEEE Std 1003.1-2001 are an attempt to
standardize that existing practice.</p>
</li>

<li>
<p>Lack of Timeout Feature</p>

<p>Alternate versions of most blocking routines have been provided to support watchdog timeouts. No alternate interface of this
sort has been provided for spin locks for the following reasons:</p>

<ul>
<li>
<p>It is impossible to determine appropriate timeout intervals for spin locks in a portable manner. The amount of time one can
expect to spend spin-waiting is inversely proportional to the degree of parallelism provided by the system.</p>

<p>It can vary from a few cycles, when each competing thread is running on its own processor, to an indefinite amount of time when
all threads are multiplexed on a single processor (which is why spin locking is not advisable on uniprocessors).</p>
</li>

<li>
<p>When used properly, the amount of time the calling thread spends waiting on a spin lock should be considerably less than the
time required to set up a corresponding watchdog timer. Since the primary purpose of spin locks is to provide a low-overhead
synchronization mechanism for multi-processors, the overhead of a timeout mechanism was deemed unacceptable.</p>
</li>
</ul>

<p>It was also suggested that an additional <i>count</i> argument be provided (on the <a href=
"../functions/pthread_spin_lock.html"><i>pthread_spin_lock</i>()</a> call) in lieu of a true timeout so that a spin lock
call could fail gracefully if it was unable to acquire the lock after <i>count</i> attempts. This idea was rejected because it is
not existing practice. Furthermore, the same effect can be obtained with <a href=
"../functions/pthread_spin_trylock.html"><i>pthread_spin_trylock</i>()</a>, as illustrated below:</p>

<blockquote>
<pre>
<tt>int n = MAX_SPIN;
<br>
while (--n >= 0)
{
    if (!pthread_spin_trylock(...))
        break;
}
if (n >= 0)
{
    /* Successfully acquired the lock */
}
else
{
    /* Unable to acquire the lock */
}
</tt>
</pre>
</blockquote>
|
|
</li>
|
|
|
|
<li>
|
|
<p><i>process-shared</i> Attribute</p>
|
|
|
|
<p>The initialization functions associated with most POSIX synchronization objects (for example, mutexes, barriers, and read-write
|
|
locks) take an attributes object with a <i>process-shared</i> attribute that specifies whether or not the object is to be shared
|
|
across processes. In the draft corresponding to the first balloting round, two separate initialization functions are provided for
|
|
spin locks, however: one for spin locks that were to be shared across processes ( <i>spin_init</i>()), and one for locks that were
|
|
only used by multiple threads within a single process ( <a href=
|
|
"../functions/pthread_spin_init.html"><i>pthread_spin_init</i>()</a>). This was done so as to keep the overhead associated with
|
|
spin waiting to an absolute minimum. However, the balloting group requested that, since the overhead associated to a bit check was
|
|
small, spin locks should be consistent with the rest of the synchronization primitives, and thus the <i>process-shared</i>
|
|
attribute was introduced for spin locks.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Spin Locks <i>versus</i> Mutexes</p>
|
|
|
|
<p>It has been suggested that mutexes are an adequate synchronization mechanism and spin locks are not necessary. Locking
|
|
mechanisms typically must trade off the processor resources consumed while setting up to block the thread and the processor
|
|
resources consumed by the thread while it is blocked. Spin locks require very little resources to set up the blocking of a thread.
|
|
Existing practice is to simply loop, repeating the atomic locking operation until the lock is available. While the resources
|
|
consumed to set up blocking of the thread are low, the thread continues to consume processor resources while it is waiting.</p>
|
|
|
|
<p>On the other hand, mutexes may be implemented such that the processor resources consumed to block the thread are large relative
|
|
to a spin lock. After detecting that the mutex lock is not available, the thread must alter its scheduling state, add itself to a
set of waiting threads, and, when the lock becomes available again, undo all of this before taking over ownership of the mutex.
However, while a thread is blocked by a mutex, no processor resources are consumed.</p>

<p>Therefore, spin locks and mutexes may be implemented to have different characteristics. Spin locks may have lower overall
overhead for very short-term blocking, and mutexes may have lower overall overhead when a thread will be blocked for longer periods
of time. The presence of both interfaces allows implementations with these two different characteristics, both of which may be
useful to a particular application.</p>

<p>It has also been suggested that applications can build their own spin locks from the <a href=
"../functions/pthread_mutex_trylock.html"><i>pthread_mutex_trylock</i>()</a> function:</p>

<blockquote>
<pre>
<tt>while (pthread_mutex_trylock(&mutex));
</tt>
</pre>
</blockquote>

<p>The apparent simplicity of this construct is somewhat deceptive, however. While the actual wait is quite efficient, various
guarantees on the integrity of mutex objects (for example, priority inheritance rules) may add overhead to the successful path of
the trylock operation that is not required of spin locks. One could, of course, add an attribute to the mutex to bypass such
overhead, but the very act of finding and testing this attribute represents more overhead than is found in the typical spin
lock.</p>

<p>The need to hold spin lock overhead to an absolute minimum also makes it impossible to provide guarantees against starvation
similar to those provided for mutexes or read-write locks. The overhead required to implement such guarantees (for example,
disabling preemption before spinning) may well exceed the overhead of the spin wait itself by many orders of magnitude. If a
"safe" spin wait seems desirable, it can always be provided (albeit at some performance cost) via appropriate mutex
attributes.</p>
</li>
</ul>
<h5><a name="tag_03_02_09_06"></a>XSI Supported Functions</h5>

<p>On XSI-conformant systems, the following symbolic constants are always defined:</p>

<blockquote>
<pre>
_POSIX_READER_WRITER_LOCKS
_POSIX_THREAD_ATTR_STACKADDR
_POSIX_THREAD_ATTR_STACKSIZE
_POSIX_THREAD_PROCESS_SHARED
_POSIX_THREADS
</pre>
</blockquote>
<p>Therefore, the following threads functions are always supported:</p>
<table cellpadding="3">
<tr valign="top">
<td align="left">
<p class="tent"><br>
<a href="../functions/pthread_atfork.html"><i>pthread_atfork</i>()</a><br>
<a href="../functions/pthread_attr_destroy.html"><i>pthread_attr_destroy</i>()</a><br>
<a href="../functions/pthread_attr_getdetachstate.html"><i>pthread_attr_getdetachstate</i>()</a><br>
<a href="../functions/pthread_attr_getguardsize.html"><i>pthread_attr_getguardsize</i>()</a><br>
<a href="../functions/pthread_attr_getschedparam.html"><i>pthread_attr_getschedparam</i>()</a><br>
<a href="../functions/pthread_attr_getstack.html"><i>pthread_attr_getstack</i>()</a><br>
<a href="../functions/pthread_attr_getstackaddr.html"><i>pthread_attr_getstackaddr</i>()</a><br>
<a href="../functions/pthread_attr_getstacksize.html"><i>pthread_attr_getstacksize</i>()</a><br>
<a href="../functions/pthread_attr_init.html"><i>pthread_attr_init</i>()</a><br>
<a href="../functions/pthread_attr_setdetachstate.html"><i>pthread_attr_setdetachstate</i>()</a><br>
<a href="../functions/pthread_attr_setguardsize.html"><i>pthread_attr_setguardsize</i>()</a><br>
<a href="../functions/pthread_attr_setschedparam.html"><i>pthread_attr_setschedparam</i>()</a><br>
<a href="../functions/pthread_attr_setstack.html"><i>pthread_attr_setstack</i>()</a><br>
<a href="../functions/pthread_attr_setstackaddr.html"><i>pthread_attr_setstackaddr</i>()</a><br>
<a href="../functions/pthread_attr_setstacksize.html"><i>pthread_attr_setstacksize</i>()</a><br>
<a href="../functions/pthread_cancel.html"><i>pthread_cancel</i>()</a><br>
<a href="../functions/pthread_cleanup_pop.html"><i>pthread_cleanup_pop</i>()</a><br>
<a href="../functions/pthread_cleanup_push.html"><i>pthread_cleanup_push</i>()</a><br>
<a href="../functions/pthread_cond_broadcast.html"><i>pthread_cond_broadcast</i>()</a><br>
<a href="../functions/pthread_cond_destroy.html"><i>pthread_cond_destroy</i>()</a><br>
<a href="../functions/pthread_cond_init.html"><i>pthread_cond_init</i>()</a><br>
<a href="../functions/pthread_cond_signal.html"><i>pthread_cond_signal</i>()</a><br>
<a href="../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a><br>
<a href="../functions/pthread_cond_wait.html"><i>pthread_cond_wait</i>()</a><br>
<a href="../functions/pthread_condattr_destroy.html"><i>pthread_condattr_destroy</i>()</a><br>
<a href="../functions/pthread_condattr_getpshared.html"><i>pthread_condattr_getpshared</i>()</a><br>
<a href="../functions/pthread_condattr_init.html"><i>pthread_condattr_init</i>()</a><br>
<a href="../functions/pthread_condattr_setpshared.html"><i>pthread_condattr_setpshared</i>()</a><br>
<a href="../functions/pthread_create.html"><i>pthread_create</i>()</a><br>
<a href="../functions/pthread_detach.html"><i>pthread_detach</i>()</a><br>
<a href="../functions/pthread_equal.html"><i>pthread_equal</i>()</a><br>
<a href="../functions/pthread_exit.html"><i>pthread_exit</i>()</a><br>
<a href="../functions/pthread_getconcurrency.html"><i>pthread_getconcurrency</i>()</a><br>
<a href="../functions/pthread_getspecific.html"><i>pthread_getspecific</i>()</a><br>
<a href="../functions/pthread_join.html"><i>pthread_join</i>()</a><br>
</p>
</td>

<td align="left">
<p class="tent"><br>
<a href="../functions/pthread_key_create.html"><i>pthread_key_create</i>()</a><br>
<a href="../functions/pthread_key_delete.html"><i>pthread_key_delete</i>()</a><br>
<a href="../functions/pthread_kill.html"><i>pthread_kill</i>()</a><br>
<a href="../functions/pthread_mutex_destroy.html"><i>pthread_mutex_destroy</i>()</a><br>
<a href="../functions/pthread_mutex_init.html"><i>pthread_mutex_init</i>()</a><br>
<a href="../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a><br>
<a href="../functions/pthread_mutex_trylock.html"><i>pthread_mutex_trylock</i>()</a><br>
<a href="../functions/pthread_mutex_unlock.html"><i>pthread_mutex_unlock</i>()</a><br>
<a href="../functions/pthread_mutexattr_destroy.html"><i>pthread_mutexattr_destroy</i>()</a><br>
<a href="../functions/pthread_mutexattr_getpshared.html"><i>pthread_mutexattr_getpshared</i>()</a><br>
<a href="../functions/pthread_mutexattr_gettype.html"><i>pthread_mutexattr_gettype</i>()</a><br>
<a href="../functions/pthread_mutexattr_init.html"><i>pthread_mutexattr_init</i>()</a><br>
<a href="../functions/pthread_mutexattr_setpshared.html"><i>pthread_mutexattr_setpshared</i>()</a><br>
<a href="../functions/pthread_mutexattr_settype.html"><i>pthread_mutexattr_settype</i>()</a><br>
<a href="../functions/pthread_once.html"><i>pthread_once</i>()</a><br>
<a href="../functions/pthread_rwlock_destroy.html"><i>pthread_rwlock_destroy</i>()</a><br>
<a href="../functions/pthread_rwlock_init.html"><i>pthread_rwlock_init</i>()</a><br>
<a href="../functions/pthread_rwlock_rdlock.html"><i>pthread_rwlock_rdlock</i>()</a><br>
<a href="../functions/pthread_rwlock_tryrdlock.html"><i>pthread_rwlock_tryrdlock</i>()</a><br>
<a href="../functions/pthread_rwlock_trywrlock.html"><i>pthread_rwlock_trywrlock</i>()</a><br>
<a href="../functions/pthread_rwlock_unlock.html"><i>pthread_rwlock_unlock</i>()</a><br>
<a href="../functions/pthread_rwlock_wrlock.html"><i>pthread_rwlock_wrlock</i>()</a><br>
<a href="../functions/pthread_rwlockattr_destroy.html"><i>pthread_rwlockattr_destroy</i>()</a><br>
<a href="../functions/pthread_rwlockattr_getpshared.html"><i>pthread_rwlockattr_getpshared</i>()</a><br>
<a href="../functions/pthread_rwlockattr_init.html"><i>pthread_rwlockattr_init</i>()</a><br>
<a href="../functions/pthread_rwlockattr_setpshared.html"><i>pthread_rwlockattr_setpshared</i>()</a><br>
<a href="../functions/pthread_self.html"><i>pthread_self</i>()</a><br>
<a href="../functions/pthread_setcancelstate.html"><i>pthread_setcancelstate</i>()</a><br>
<a href="../functions/pthread_setcanceltype.html"><i>pthread_setcanceltype</i>()</a><br>
<a href="../functions/pthread_setconcurrency.html"><i>pthread_setconcurrency</i>()</a><br>
<a href="../functions/pthread_setspecific.html"><i>pthread_setspecific</i>()</a><br>
<a href="../functions/pthread_sigmask.html"><i>pthread_sigmask</i>()</a><br>
<a href="../functions/pthread_testcancel.html"><i>pthread_testcancel</i>()</a><br>
<a href="../functions/sigwait.html"><i>sigwait</i>()</a><br>
</p>
</td>
</tr>
</table>
<p>On XSI-conformant systems, the symbolic constant _POSIX_THREAD_SAFE_FUNCTIONS is always defined. Therefore, the following
functions are always supported:</p>

<table cellpadding="3">
<tr valign="top">
<td align="left">
<p class="tent"><br>
<a href="../functions/asctime_r.html"><i>asctime_r</i>()</a><br>
<a href="../functions/ctime_r.html"><i>ctime_r</i>()</a><br>
<a href="../functions/flockfile.html"><i>flockfile</i>()</a><br>
<a href="../functions/ftrylockfile.html"><i>ftrylockfile</i>()</a><br>
<a href="../functions/funlockfile.html"><i>funlockfile</i>()</a><br>
<a href="../functions/getc_unlocked.html"><i>getc_unlocked</i>()</a><br>
<a href="../functions/getchar_unlocked.html"><i>getchar_unlocked</i>()</a><br>
<a href="../functions/getgrgid_r.html"><i>getgrgid_r</i>()</a><br>
<a href="../functions/getgrnam_r.html"><i>getgrnam_r</i>()</a><br>
<a href="../functions/getpwnam_r.html"><i>getpwnam_r</i>()</a><br>
</p>
</td>

<td align="left">
<p class="tent"><br>
<a href="../functions/getpwuid_r.html"><i>getpwuid_r</i>()</a><br>
<a href="../functions/gmtime_r.html"><i>gmtime_r</i>()</a><br>
<a href="../functions/localtime_r.html"><i>localtime_r</i>()</a><br>
<a href="../functions/putc_unlocked.html"><i>putc_unlocked</i>()</a><br>
<a href="../functions/putchar_unlocked.html"><i>putchar_unlocked</i>()</a><br>
<a href="../functions/rand_r.html"><i>rand_r</i>()</a><br>
<a href="../functions/readdir_r.html"><i>readdir_r</i>()</a><br>
<a href="../functions/strerror_r.html"><i>strerror_r</i>()</a><br>
<a href="../functions/strtok_r.html"><i>strtok_r</i>()</a><br>
</p>
</td>
</tr>
</table>
<p>The following threads functions are only supported on XSI-conformant systems if the Realtime Threads Option Group is
supported:</p>

<table cellpadding="3">
<tr valign="top">
<td align="left">
<p class="tent"><br>
<a href="../functions/pthread_attr_getinheritsched.html"><i>pthread_attr_getinheritsched</i>()</a><br>
<a href="../functions/pthread_attr_getschedpolicy.html"><i>pthread_attr_getschedpolicy</i>()</a><br>
<a href="../functions/pthread_attr_getscope.html"><i>pthread_attr_getscope</i>()</a><br>
<a href="../functions/pthread_attr_setinheritsched.html"><i>pthread_attr_setinheritsched</i>()</a><br>
<a href="../functions/pthread_attr_setschedpolicy.html"><i>pthread_attr_setschedpolicy</i>()</a><br>
<a href="../functions/pthread_attr_setscope.html"><i>pthread_attr_setscope</i>()</a><br>
<a href="../functions/pthread_getschedparam.html"><i>pthread_getschedparam</i>()</a><br>
</p>
</td>

<td align="left">
<p class="tent"><br>
<a href="../functions/pthread_mutex_getprioceiling.html"><i>pthread_mutex_getprioceiling</i>()</a><br>
<a href="../functions/pthread_mutex_setprioceiling.html"><i>pthread_mutex_setprioceiling</i>()</a><br>
<a href="../functions/pthread_mutexattr_getprioceiling.html"><i>pthread_mutexattr_getprioceiling</i>()</a><br>
<a href="../functions/pthread_mutexattr_getprotocol.html"><i>pthread_mutexattr_getprotocol</i>()</a><br>
<a href="../functions/pthread_mutexattr_setprioceiling.html"><i>pthread_mutexattr_setprioceiling</i>()</a><br>
<a href="../functions/pthread_mutexattr_setprotocol.html"><i>pthread_mutexattr_setprotocol</i>()</a><br>
<a href="../functions/pthread_setschedparam.html"><i>pthread_setschedparam</i>()</a><br>
</p>
</td>
</tr>
</table>
<h5><a name="tag_03_02_09_07"></a>XSI Threads Extensions</h5>

<p>The following XSI extensions to POSIX.1c are now supported in IEEE Std 1003.1-2001 as part of the alignment with the
Single UNIX Specification:</p>

<ul>
<li>
<p>Extended mutex attribute types</p>
</li>

<li>
<p>Read-write locks and attributes (also introduced by the IEEE Std 1003.1j-2000 amendment)</p>
</li>

<li>
<p>Thread concurrency level</p>
</li>

<li>
<p>Thread stack guard size</p>
</li>

<li>
<p>Parallel I/O</p>
</li>
</ul>

<p>A total of 19 new functions were added.</p>

<p>These extensions carefully follow the threads programming model specified in POSIX.1c. As with POSIX.1c, all the new functions
return zero if successful; otherwise, an error number is returned to indicate the error.</p>

<p>The concept of attribute objects was introduced in POSIX.1c to allow implementations to extend IEEE Std 1003.1-2001
without changing the existing interfaces. Attribute objects were defined for threads, mutexes, and condition variables. Attribute
objects are defined as implementation-defined opaque types to aid extensibility, and functions are defined to allow attributes to
be set or retrieved. This model has been followed when adding the new type attribute of <b>pthread_mutexattr_t</b> or the new
read-write lock attributes object <b>pthread_rwlockattr_t</b>.</p>
<ul>
<li>
<p>Extended Mutex Attributes</p>

<p>POSIX.1c defines a mutex attributes object as an implementation-defined opaque object of type <b>pthread_mutexattr_t</b>, and
specifies a number of attributes which this object must have and a number of functions which manipulate these attributes. These
attributes include <i>process-shared</i>, <i>protocol</i>, and <i>prioceiling</i>.</p>
<p>The System Interfaces volume of IEEE Std 1003.1-2001 specifies another mutex attribute called <i>type</i>. The
<i>type</i> attribute allows applications to specify the behavior of mutex locking operations in situations where POSIX.1c behavior
is undefined. The OSF DCE threads implementation, based on Draft 4 of POSIX.1c, specified a similar attribute. Note that the names
of the attributes have changed somewhat from the OSF DCE threads implementation.</p>

<p>The System Interfaces volume of IEEE Std 1003.1-2001 also extends the specification of the following POSIX.1c
functions which manipulate mutexes:</p>

<blockquote>
<pre>
<a href="../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a>
<a href="../functions/pthread_mutex_trylock.html"><i>pthread_mutex_trylock</i>()</a>
<a href="../functions/pthread_mutex_unlock.html"><i>pthread_mutex_unlock</i>()</a>
</pre>
</blockquote>

<p>to take account of the new mutex attribute type and to specify behavior which was declared as undefined in POSIX.1c. How a
calling thread acquires or releases a mutex now depends upon the mutex <i>type</i> attribute.</p>

<p>The <i>type</i> attribute can have the following values:</p>

<dl compact>
<dt>PTHREAD_MUTEX_NORMAL</dt>

<dd><br>
Basic mutex with no specific error checking built in. Does not report a deadlock error.</dd>

<dt>PTHREAD_MUTEX_RECURSIVE</dt>

<dd><br>
Allows any thread to recursively lock a mutex. The mutex must be unlocked an equal number of times to release the mutex.</dd>

<dt>PTHREAD_MUTEX_ERRORCHECK</dt>

<dd><br>
Detects and reports simple usage errors; that is, an attempt to unlock a mutex that is not locked by the calling thread or that is
not locked at all, or an attempt to relock a mutex the thread already owns.</dd>

<dt>PTHREAD_MUTEX_DEFAULT</dt>

<dd><br>
The default mutex type. May be mapped to any of the above mutex types or may be an implementation-defined type.</dd>
</dl>
<p><i>Normal</i> mutexes do not detect deadlock conditions; for example, a thread will hang if it tries to relock a normal mutex
that it already owns. Attempting to unlock a mutex locked by another thread, or unlocking an unlocked mutex, results in undefined
behavior. Normal mutexes will usually be the fastest type of mutex available on a platform but provide the least error
checking.</p>

<p><i>Recursive</i> mutexes are useful for converting old code where it is difficult to establish clear boundaries of
synchronization. A thread can relock a recursive mutex without first unlocking it. The relocking deadlock which can occur with
normal mutexes cannot occur with this type of mutex. However, multiple locks of a recursive mutex require the same number of
unlocks to release the mutex before another thread can acquire the mutex. Furthermore, this type of mutex maintains the concept of
an owner. Thus, a thread attempting to unlock a recursive mutex which another thread has locked returns with an error. A thread
attempting to unlock a recursive mutex that is not locked returns with an error. Never use a recursive mutex with condition
variables because the implicit unlock performed by <a href="../functions/pthread_cond_wait.html"><i>pthread_cond_wait</i>()</a> or
<a href="../functions/pthread_cond_timedwait.html"><i>pthread_cond_timedwait</i>()</a> will not actually release the mutex if it
had been locked multiple times.</p>

<p><i>Errorcheck</i> mutexes provide error checking and are useful primarily as a debugging aid. A thread attempting to relock an
errorcheck mutex without first unlocking it returns with an error. Again, this type of mutex maintains the concept of an owner.
Thus, a thread attempting to unlock an errorcheck mutex which another thread has locked returns with an error. A thread attempting
to unlock an errorcheck mutex that is not locked also returns with an error. It should be noted that errorcheck mutexes will almost
always be much slower than normal mutexes due to the extra state checks performed.</p>

<p>The default mutex type provides implementation-defined error checking. The default mutex may be mapped to one of the other
defined types or may be something entirely different. This enables each vendor to provide the mutex semantics which the vendor
feels will be most useful to their target users. Most vendors will probably choose to make normal mutexes the default so as to give
applications the benefit of the fastest type of mutexes available on their platform. Check your implementation's documentation.</p>

<p>An application developer can use any of the mutex types almost interchangeably as long as the application does not depend upon
the implementation detecting (or failing to detect) any particular errors. Note that a recursive mutex can be used with condition
variable waits as long as the application never recursively locks the mutex.</p>

<p>Two functions are provided for manipulating the <i>type</i> attribute of a mutex attributes object. This attribute is set or
returned in the <i>type</i> parameter of these functions. The <a href=
"../functions/pthread_mutexattr_settype.html"><i>pthread_mutexattr_settype</i>()</a> function is used to set a specific type value
while <a href="../functions/pthread_mutexattr_gettype.html"><i>pthread_mutexattr_gettype</i>()</a> is used to return the type of
the mutex. Setting the <i>type</i> attribute of a mutex attributes object affects only mutexes initialized using that mutex
attributes object. Changing the <i>type</i> attribute does not affect mutexes previously initialized using that mutex attributes
object.</p>
</li>
<li>
<p>Read-Write Locks and Attributes</p>

<p>The read-write locks introduced have been harmonized with those in IEEE Std 1003.1j-2000; see also <a href=
"#tag_03_02_09_26">Thread Read-Write Locks</a>.</p>

<p>Read-write locks (also known as reader-writer locks) allow a thread to exclusively lock some shared data while updating that
data, or allow any number of threads to have simultaneous read-only access to the data.</p>

<p>Unlike a mutex, a read-write lock distinguishes between reading data and writing data. A mutex excludes all other threads. A
read-write lock allows other threads access to the data, providing no thread is modifying the data. Thus, a read-write lock is less
primitive than either a mutex-condition variable pair or a semaphore.</p>

<p>Application developers should consider using a read-write lock rather than a mutex to protect data that is frequently referenced
but seldom modified. Most threads (readers) will be able to read the data without waiting and will only have to block when some
other thread (a writer) is in the process of modifying the data. Conversely, a thread that wants to change the data is forced to
wait until there are no readers. This type of lock is often used to facilitate parallel access to data on multi-processor platforms
or to avoid context switches on single processor platforms where multiple threads access the same data.</p>

<p>If a read-write lock becomes unlocked and there are multiple threads waiting to acquire the write lock, the implementation's
scheduling policy determines which thread acquires the read-write lock for writing. If there are multiple threads blocked on a
read-write lock for both read locks and write locks, it is unspecified whether the readers or a writer acquire the lock first.
However, for performance reasons, implementations often favor writers over readers to avoid potential writer starvation.</p>

<p>A read-write lock object is an implementation-defined opaque object of type <b>pthread_rwlock_t</b> as defined in <a href=
"../basedefs/pthread.h.html"><i>&lt;pthread.h&gt;</i></a>. There are two different sorts of locks associated with a read-write
lock: a read lock and a write lock.</p>

<p>The <a href="../functions/pthread_rwlockattr_init.html"><i>pthread_rwlockattr_init</i>()</a> function initializes a read-write
lock attributes object with the default value for all the attributes defined in the implementation. After a read-write lock
attributes object has been used to initialize one or more read-write locks, changes to the read-write lock attributes object,
including destruction, do not affect previously initialized read-write locks.</p>

<p>Implementations must provide at least the read-write lock attribute <i>process-shared</i>. This attribute can have the following
values:</p>

<dl compact>
<dt>PTHREAD_PROCESS_SHARED</dt>

<dd><br>
Any thread of any process that has access to the memory where the read-write lock resides can manipulate the read-write lock.</dd>

<dt>PTHREAD_PROCESS_PRIVATE</dt>

<dd><br>
Only threads created within the same process as the thread that initialized the read-write lock can manipulate the read-write lock.
This is the default value.</dd>
</dl>

<p>The <a href="../functions/pthread_rwlockattr_setpshared.html"><i>pthread_rwlockattr_setpshared</i>()</a> function is used to set
the <i>process-shared</i> attribute of an initialized read-write lock attributes object while the function <a href=
"../functions/pthread_rwlockattr_getpshared.html"><i>pthread_rwlockattr_getpshared</i>()</a> obtains the current value of the
<i>process-shared</i> attribute.</p>

<p>A read-write lock attributes object is destroyed using the <a href=
"../functions/pthread_rwlockattr_destroy.html"><i>pthread_rwlockattr_destroy</i>()</a> function. The effect of subsequent use of
the read-write lock attributes object is undefined.</p>

<p>A thread creates a read-write lock using the <a href="../functions/pthread_rwlock_init.html"><i>pthread_rwlock_init</i>()</a>
function. The attributes of the read-write lock can be specified by the application developer; otherwise, the default
implementation-defined read-write lock attributes are used if the pointer to the read-write lock attributes object is NULL. In
cases where the default attributes are appropriate, the PTHREAD_RWLOCK_INITIALIZER macro can be used to initialize statically
allocated read-write locks.</p>

<p>A thread which wants to apply a read lock to the read-write lock can use either <a href=
"../functions/pthread_rwlock_rdlock.html"><i>pthread_rwlock_rdlock</i>()</a> or <a href=
"../functions/pthread_rwlock_tryrdlock.html"><i>pthread_rwlock_tryrdlock</i>()</a>. If <a href=
"../functions/pthread_rwlock_rdlock.html"><i>pthread_rwlock_rdlock</i>()</a> is used, the thread acquires a read lock if a writer
does not hold the write lock and there are no writers blocked on the write lock. If a read lock is not acquired, the calling thread
blocks until it can acquire a lock. However, if <a href=
"../functions/pthread_rwlock_tryrdlock.html"><i>pthread_rwlock_tryrdlock</i>()</a> is used, the function returns immediately with
the error [EBUSY] if any thread holds a write lock or there are blocked writers waiting for the write lock.</p>

<p>A thread which wants to apply a write lock to the read-write lock can use either of two functions: <a href=
"../functions/pthread_rwlock_wrlock.html"><i>pthread_rwlock_wrlock</i>()</a> or <a href=
"../functions/pthread_rwlock_trywrlock.html"><i>pthread_rwlock_trywrlock</i>()</a>. If <a href=
"../functions/pthread_rwlock_wrlock.html"><i>pthread_rwlock_wrlock</i>()</a> is used, the thread acquires the write lock if no
other reader or writer threads hold the read-write lock. If the write lock is not acquired, the thread blocks until it can acquire
the write lock. However, if <a href="../functions/pthread_rwlock_trywrlock.html"><i>pthread_rwlock_trywrlock</i>()</a> is used, the
function returns immediately with the error [EBUSY] if any thread is holding either a read or a write lock.</p>

<p>The <a href="../functions/pthread_rwlock_unlock.html"><i>pthread_rwlock_unlock</i>()</a> function is used to unlock a read-write
lock object held by the calling thread. Results are undefined if the read-write lock is not held by the calling thread. If there
are other read locks currently held on the read-write lock object, the read-write lock object remains in the read locked state but
without the current thread as one of its owners. If this function releases the last read lock for this read-write lock object, the
read-write lock object is put in the unlocked state. If this function is called to release a write lock for this read-write
lock object, the read-write lock object is put in the unlocked state.</p>
</li>
<li>
<p>Thread Concurrency Level</p>

<p>On threads implementations that multiplex user threads onto a smaller set of kernel execution entities, the system attempts to
create a reasonable number of kernel execution entities for the application upon application startup.</p>

<p>On some implementations, these kernel entities are retained by user threads that block in the kernel. Other implementations do
not <i>timeslice</i> user threads, so a compute-bound user thread can retain a kernel thread that would otherwise be shared among
several user threads. On such implementations, some applications may use up all the available kernel execution entities before
their user-space threads are used up. The process may be left with user threads capable of doing work for the application but with
no way to schedule them.</p>

<p>The <a href="../functions/pthread_setconcurrency.html"><i>pthread_setconcurrency</i>()</a> function enables an application to
request more kernel entities; that is, specify a desired concurrency level. However, this function merely provides a hint to the
implementation. The implementation is free to ignore this request or to provide some other number of kernel entities. If an
implementation does not multiplex user threads onto a smaller number of kernel execution entities, the <a href=
"../functions/pthread_setconcurrency.html"><i>pthread_setconcurrency</i>()</a> function has no effect.</p>

<p>The <a href="../functions/pthread_setconcurrency.html"><i>pthread_setconcurrency</i>()</a> function may also have an effect on
implementations where the kernel mode and user mode schedulers cooperate to ensure that ready user threads are not prevented from
running by other threads blocked in the kernel.</p>

<p>The <a href="../functions/pthread_getconcurrency.html"><i>pthread_getconcurrency</i>()</a> function always returns the value set
by a previous call to <a href="../functions/pthread_setconcurrency.html"><i>pthread_setconcurrency</i>()</a>. However, if <a href=
"../functions/pthread_setconcurrency.html"><i>pthread_setconcurrency</i>()</a> was not previously called, this function returns
zero to indicate that the threads implementation is maintaining the concurrency level.</p>
</li>
<li>
<p>Thread Stack Guard Size</p>

<p>DCE threads introduced the concept of a "thread stack guard size". Most thread implementations add a region of protected
memory to a thread's stack, commonly known as a "guard region", as a safety measure to prevent stack pointer overflow in one
thread from corrupting the contents of another thread's stack. The default size of the guard region is {PAGESIZE} bytes and is
implementation-defined.</p>

<p>Some application developers may wish to change the stack guard size. When an application creates a large number of threads, the
extra page allocated for each stack may strain system resources. In addition to the extra page of memory, the kernel's memory
manager has to keep track of the different protections on adjoining pages. When this is a problem, the application developer may
request a guard size of 0 bytes to conserve system resources by eliminating stack overflow protection.</p>

<p>Conversely, an application that allocates large data structures such as arrays on the stack may wish to increase the default
guard size in order to detect stack overflow. If a thread allocates two pages for a data array, a single guard page provides little
protection against thread stack overflows since the thread can corrupt adjoining memory beyond the guard page.</p>

<p>The System Interfaces volume of IEEE Std 1003.1-2001 defines a new attribute of a thread attributes object; that is,
the <i>guardsize</i> attribute which allows applications to specify the size of the guard region of a thread's stack.</p>

<p>Two functions are provided for manipulating a thread's stack guard size. The <a href=
"../functions/pthread_attr_setguardsize.html"><i>pthread_attr_setguardsize</i>()</a> function sets the thread <i>guardsize</i>
attribute, and the <a href="../functions/pthread_attr_getguardsize.html"><i>pthread_attr_getguardsize</i>()</a> function retrieves
the current value.</p>

<p>An implementation may round up the requested guard size to a multiple of the configurable system variable {PAGESIZE}. In this
case, <a href="../functions/pthread_attr_getguardsize.html"><i>pthread_attr_getguardsize</i>()</a> returns the guard size specified
by the previous <a href="../functions/pthread_attr_setguardsize.html"><i>pthread_attr_setguardsize</i>()</a> function call and not
the rounded-up value.</p>

<p>If an application is managing its own thread stacks using the <i>stackaddr</i> attribute, the <i>guardsize</i> attribute is
ignored and no stack overflow protection is provided. In this case, it is the responsibility of the application to manage stack
overflow along with stack allocation.</p>
</li>
<li>
<p>Parallel I/O</p>

<p>Suppose two or more threads independently issue read requests on the same file. To read specific data from a file, a thread must
first call <a href="../functions/lseek.html"><i>lseek</i>()</a> to seek to the proper offset in the file, and then call <a href=
"../functions/read.html"><i>read</i>()</a> to retrieve the required data. If more than one thread does this at the same time, the
first thread may complete its seek call, but before it gets a chance to issue its read call a second thread may complete its seek
call, resulting in the first thread accessing incorrect data when it issues its read call. One workaround is to lock the file
descriptor while seeking and reading or writing, but this reduces parallelism and adds overhead.</p>

<p>Instead, the System Interfaces volume of IEEE Std 1003.1-2001 provides two functions to make seek/read and seek/write
operations atomic. The file descriptor's current offset is unchanged, thus allowing multiple read and write operations to proceed
in parallel. This improves the I/O performance of threaded applications. The <a href="../functions/pread.html"><i>pread</i>()</a>
function is used to do an atomic read of data from a file into a buffer. Conversely, the <a href=
"../functions/pwrite.html"><i>pwrite</i>()</a> function does an atomic write of data from a buffer to a file.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<h5><a name="tag_03_02_09_08"></a>Thread-Safety</h5>
|
|
|
|
<p>All functions required by IEEE Std 1003.1-2001 need to be thread-safe. Implementations have to provide internal
|
|
synchronization when necessary in order to achieve this goal. In certain cases (for example, most floating-point
implementations), context switch code may have to manage the writable shared state.</p>
|
|
|
|
<p>While a read from a pipe of {PIPE_BUF}*2 bytes may not generate a single atomic and thread-safe stream of bytes, it should
|
|
generate "several" (individually atomic) thread-safe streams of bytes. Similarly, while reading from a terminal device may not
|
|
generate a single atomic and thread-safe stream of bytes, it should generate some finite number of (individually atomic) and
|
|
thread-safe streams of bytes. That is, concurrent calls to read for a pipe, FIFO, or terminal device are not allowed to result in
|
|
corrupting the stream of bytes or other internal data. However, <a href="../functions/read.html"><i>read</i>()</a>, in these cases,
|
|
is not required to return a single contiguous and atomic stream of bytes.</p>
|
|
|
|
<p>It is not required that all functions provided by IEEE Std 1003.1-2001 be either async-cancel-safe or
|
|
async-signal-safe.</p>
|
|
|
|
<p>As it turns out, some functions are inherently not thread-safe; that is, their interface specifications preclude reentrancy. For
|
|
example, some functions (such as <a href="../functions/asctime.html"><i>asctime</i>()</a>) return a pointer to a result stored in
|
|
memory space allocated by the function on a per-process basis. Such a function is not thread-safe, because its result can be
|
|
overwritten by successive invocations. Other functions, while not inherently non-thread-safe, may be implemented in ways that lead
|
|
to them not being thread-safe. For example, some functions (such as <a href="../functions/rand.html"><i>rand</i>()</a>) store state
|
|
information (such as a seed value, which survives multiple function invocations) in memory space allocated by the function on a
|
|
per-process basis. The implementation of such a function is not thread-safe if the implementation fails to synchronize invocations
|
|
of the function and thus fails to protect the state information. The problem is that when the state information is not protected,
|
|
concurrent invocations can interfere with one another (for example, applications using <a href=
|
|
"../functions/rand.html"><i>rand</i>()</a> may see the same seed value).</p>
|
|
|
|
<p><i>Thread-Safety and Locking of Existing Functions</i></p>
|
|
|
|
<p>Originally, POSIX.1 was not designed to work in a multi-threaded environment, and some implementations of some existing
|
|
functions will not work properly when executed concurrently. To provide routines that will work correctly in an environment with
|
|
threads ("thread-safe"), two problems need to be solved:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>Routines that maintain or return pointers to static areas internal to the routine (which may now be shared) need to be modified.
|
|
The routines <a href="../functions/ttyname.html"><i>ttyname</i>()</a> and <a href=
|
|
"../functions/localtime.html"><i>localtime</i>()</a> are examples.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Routines that access data space shared by more than one thread need to be modified. The <a href=
|
|
"../functions/malloc.html"><i>malloc</i>()</a> function and the <i>stdio</i> family routines are examples.</p>
|
|
</li>
|
|
</ol>
|
|
|
|
<p>There are a variety of constraints on these changes. The first is compatibility with the existing versions of these
|
|
functions; non-thread-safe functions will continue to be in use for some time, as the original interfaces are used by existing code.
|
|
Another is that the new thread-safe versions of these functions represent as small a change as possible over the familiar
|
|
interfaces provided by the existing non-thread-safe versions. The new interfaces should be independent of any particular threads
|
|
implementation. In particular, they should be thread-safe without depending on explicit thread-specific memory. Finally, there
|
|
should be minimal performance penalty due to the changes made to the functions.</p>
|
|
|
|
<p>It is intended that the list of functions from POSIX.1 that cannot be made thread-safe and for which corrected versions are
|
|
provided be complete.</p>
|
|
|
|
<p><i>Thread-Safety and Locking Solutions</i></p>
|
|
|
|
<p>Many of the POSIX.1 functions were thread-safe and did not change at all. However, some functions (for example, the math
|
|
functions typically found in <b>libm</b>) are not thread-safe because of writable shared global state. For instance, in
|
|
IEEE Std 754-1985 floating-point implementations, the computation modes and flags are global and shared.</p>
|
|
|
|
<p>Some functions are not thread-safe because a particular implementation is not reentrant, typically because of a non-essential
|
|
use of static storage. These require only a new implementation.</p>
|
|
|
|
<p>Thread-safe libraries are useful in a wide range of parallel (and asynchronous) programming environments, not just within
|
|
pthreads. In order to be used outside the context of pthreads, however, such libraries still have to use some synchronization
|
|
method. These could either be independent of the pthread synchronization operations, or they could be a subset of the pthread
|
|
interfaces. Either method results in thread-safe library implementations that can be used without the rest of pthreads.</p>
|
|
|
|
<p>Some functions, such as the <i>stdio</i> family interface and dynamic memory allocation functions such as <a href=
|
|
"../functions/malloc.html"><i>malloc</i>()</a>, are inter-dependent routines that share resources (for example, buffers) across
|
|
related calls. These require synchronization to work correctly, but they do not require any change to their external (user-visible)
|
|
interfaces.</p>
|
|
|
|
<p>In some cases, such as <a href="../functions/getc.html"><i>getc</i>()</a> and <a href=
|
|
"../functions/putc.html"><i>putc</i>()</a>, adding synchronization is likely to create an unacceptable performance impact. In this
|
|
case, slower thread-safe synchronized functions are to be provided, but the original, faster (but unsafe) functions (which may be
|
|
implemented as macros) are retained under new names. Some additional special-purpose synchronization facilities are necessary for
|
|
these macros to be usable in multi-threaded programs. This also requires changes in <a href=
|
|
"../basedefs/stdio.h.html"><i><stdio.h></i></a>.</p>
|
|
|
|
<p>The other common reason that functions are unsafe is that they return a pointer to static storage, making the functions
|
|
non-thread-safe. This has to be changed, and there are three natural choices:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>Return a pointer to thread-specific storage</p>
|
|
|
|
<p>This could incur a severe performance penalty on those architectures with a costly implementation of the thread-specific data
|
|
interface.</p>
|
|
|
|
<p>A variation on this technique is to use <a href="../functions/malloc.html"><i>malloc</i>()</a> to allocate storage for the
|
|
function output and return a pointer to this storage. This technique may also have an undesirable performance impact, however, and
|
|
a simplistic implementation requires that the user program explicitly free the storage object when it is no longer needed. This
|
|
technique is used by some existing POSIX.1 functions. With careful implementation for infrequently used functions, there may be
|
|
little or no performance or storage penalty, and the maintenance of already-standardized interfaces is a significant benefit.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Return the actual value computed by the function</p>
|
|
|
|
<p>This technique can only be used with functions that return pointers to structures; routines that return character strings would
|
|
have to wrap their output in an enclosing structure in order to return the output on the stack. There is also a negative
|
|
performance impact inherent in this solution in that the output value has to be copied twice before it can be used by the calling
|
|
function: once from the called routine's local buffers to the top of the stack, then from the top of the stack to the assignment
|
|
target. Finally, many older compilers cannot support this technique due to a historical tendency to use internal static buffers to
|
|
deliver the results of structure-valued functions.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Have the caller pass the address of a buffer to contain the computed value</p>
|
|
|
|
<p>The only disadvantage of this approach is that extra arguments have to be provided by the calling program. It represents the
|
|
most efficient solution to the problem, however, and, unlike the <a href="../functions/malloc.html"><i>malloc</i>()</a> technique,
|
|
it is semantically clear.</p>
|
|
</li>
|
|
</ol>
|
|
|
|
<p>There are some routines (often groups of related routines) whose interfaces are inherently non-thread-safe because they
|
|
communicate across multiple function invocations by means of static memory locations. The solution is to redesign the calls so that
|
|
they are thread-safe, typically by passing the needed data as extra parameters. Unfortunately, this may require major changes to
|
|
the interface as well.</p>
|
|
|
|
<p>A floating-point implementation using IEEE Std 754-1985 is a case in point. A less problematic example is the
|
|
<i>rand48</i> family of pseudo-random number generators. The functions <a href="../functions/getgrgid.html"><i>getgrgid</i>()</a>,
|
|
<a href="../functions/getgrnam.html"><i>getgrnam</i>()</a>, <a href="../functions/getpwnam.html"><i>getpwnam</i>()</a>, and <a
|
|
href="../functions/getpwuid.html"><i>getpwuid</i>()</a> are another such case.</p>
|
|
|
|
<p>The problems with <i>errno</i> are discussed in <a href="#tag_03_02_03_01">Alternative Solutions for Per-Thread errno</a>.</p>
|
|
|
|
<p>Some functions can be thread-safe or not, depending on their arguments. These include the <a href=
|
|
"../functions/tmpnam.html"><i>tmpnam</i>()</a> and <a href="../functions/ctermid.html"><i>ctermid</i>()</a> functions. These
|
|
functions have pointers to character strings as arguments. If the pointers are not NULL, the functions store their results in the
|
|
character string; however, if the pointers are NULL, the functions store their results in an area that may be static and thus
|
|
subject to overwriting by successive calls. These should only be called by multi-thread applications when their arguments are
|
|
non-NULL.</p>
|
|
|
|
<p><i>Asynchronous Safety and Thread-Safety</i></p>
|
|
|
|
<p>A floating-point implementation has many modes that affect rounding and other aspects of computation. Functions in some math
|
|
library implementations may change the computation modes for the duration of a function call. If such a function call is
|
|
interrupted by a signal or cancelation, the floating-point state is not required to be protected.</p>
|
|
|
|
<p>There is a significant cost to make floating-point operations async-cancel-safe or async-signal-safe; accordingly, neither form
|
|
of async safety is required.</p>
|
|
|
|
<p><i>Functions Returning Pointers to Static Storage</i></p>
|
|
|
|
<p>For those functions that are not thread-safe because they return values in fixed size statically allocated structures, alternate
|
|
"_r" forms are provided that pass a pointer to an explicit result structure. Those that return pointers into library-allocated
|
|
buffers have forms provided with explicit buffer and length parameters.</p>
|
|
|
|
<p>For functions that return pointers to library-allocated buffers, it makes sense to provide "_r" versions that allow the
|
|
application control over allocation of the storage in which results are returned. This allows the state used by these functions to
|
|
be managed on an application-specific basis, supporting per-thread, per-process, or other application-specific sharing
|
|
relationships.</p>
|
|
|
|
<p>Early proposals had provided "_r" versions for functions that returned pointers to variable-size buffers without providing a
|
|
means for determining the required buffer size. This would have made using such functions exceedingly clumsy, potentially requiring
|
|
iteratively calling them with increasingly larger guesses for the amount of storage required. Hence, <a href=
|
|
"../functions/sysconf.html"><i>sysconf</i>()</a> variables have been provided for such functions that return the maximum required
|
|
buffer size.</p>
|
|
|
|
<p>Thus, the rule that has been followed by IEEE Std 1003.1-2001 when adapting single-threaded non-thread-safe functions
|
|
is as follows: all functions returning pointers to library-allocated storage should have "_r" versions provided, allowing the
|
|
application control over the storage allocation. Those with variable-sized return values accept both a buffer address and a length
|
|
parameter. The <a href="../functions/sysconf.html"><i>sysconf</i>()</a> variables are provided to supply the appropriate buffer
|
|
sizes when required. Implementors are encouraged to apply the same rule when adapting their own existing functions to a pthreads
|
|
environment.</p>
|
|
|
|
<h5><a name="tag_03_02_09_09"></a>Thread IDs</h5>
|
|
|
|
<p>Separate applications should communicate through well-defined interfaces and should not depend on each other's implementation.
|
|
For example, if a programmer decides to rewrite the <a href="../utilities/sort.html"><i>sort</i></a> utility using multiple
|
|
threads, it should be easy to do this so that the interface to the <a href="../utilities/sort.html"><i>sort</i></a> utility does
|
|
not change. Consider that if the user causes SIGINT to be generated while the <a href="../utilities/sort.html"><i>sort</i></a>
|
|
utility is running, keeping the same interface means that the entire <a href="../utilities/sort.html"><i>sort</i></a> utility is
|
|
killed, not just one of its threads. As another example, consider a realtime application that manages a reactor. Such an
|
|
application may wish to allow other applications to control the priority at which it watches the control rods. One technique to
|
|
accomplish this is to write the ID of the thread watching the control rods into a file and allow other programs to change the
|
|
priority of that thread as they see fit. A simpler technique is to have the reactor process accept IPCs (Interprocess Communication
|
|
messages) from other processes, telling it at a semantic level what priority the program should assign to watching the control
|
|
rods. This allows the programmer greater flexibility in the implementation. For example, the programmer can change the
|
|
implementation from having one thread per rod to having one thread watching all of the rods without changing the interface. Having
|
|
threads live inside the process means that the implementation of a process is invisible to outside processes (excepting debuggers
|
|
and system management tools).</p>
|
|
|
|
<p>Threads do not provide a protection boundary. Every thread model allows threads to share memory with other threads and
|
|
encourages this sharing to be widespread. This means that one thread can wipe out memory that is needed for the correct functioning
|
|
of other threads that are sharing its memory. Consequently, providing each thread with its own user and/or group IDs would not
|
|
provide a protection boundary between threads sharing memory.</p>
|
|
|
|
<h5><a name="tag_03_02_09_10"></a>Thread Mutexes</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_09_11"></a>Thread Scheduling</h5>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Scheduling Implementation Models</p>
|
|
|
|
<p>The following scheduling implementation models are presented in terms of threads and "kernel entities". This is to simplify
|
|
exposition of the models, and it does not imply that an implementation actually has an identifiable "kernel entity".</p>
|
|
|
|
<p>A kernel entity is not defined beyond the fact that it has scheduling attributes that are used to resolve contention with other
|
|
kernel entities for execution resources. A kernel entity may be thought of as an envelope that holds a thread or a separate kernel
|
|
thread. It is not a conventional process, although it shares with the process the attribute that it has a single thread of control;
|
|
it does not necessarily imply an address space, open files, and so on. It is better thought of as a primitive facility upon which
|
|
conventional processes and threads may be constructed.</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>System Thread Scheduling Model</p>
|
|
|
|
<p>This model consists of one thread per kernel entity. The kernel entity is solely responsible for scheduling thread execution on
|
|
one or more processors. This model schedules all threads against all other threads in the system using the scheduling attributes of
|
|
the thread.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Process Scheduling Model</p>
|
|
|
|
<p>A generalized process scheduling model consists of two levels of scheduling. A threads library creates a pool of kernel
|
|
entities, as required, and schedules threads to run on them using the scheduling attributes of the threads. Typically, the size of
|
|
the pool is a function of the simultaneously runnable threads, not the total number of threads. The kernel then schedules the
|
|
kernel entities onto processors according to their scheduling attributes, which are managed by the threads library. This set model
|
|
potentially allows a wide range of mappings between threads and kernel entities.</p>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
|
|
<li>
|
|
<p>System and Process Scheduling Model Performance</p>
|
|
|
|
<p>There are a number of important implications on the performance of applications using these scheduling models. The process
|
|
scheduling model potentially provides lower overhead for making scheduling decisions, since there is no need to access kernel-level
|
|
information or functions and the set of schedulable entities is smaller (only the threads within the process).</p>
|
|
|
|
<p>On the other hand, since the kernel is also making scheduling decisions regarding the system resources under its control (for
|
|
example, CPU(s), I/O devices, memory), decisions that do not take thread scheduling parameters into account can result in
|
|
unspecified delays for realtime application threads, causing them to miss maximum response time limits.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Rate Monotonic Scheduling</p>
|
|
|
|
<p>Rate monotonic scheduling was considered, but rejected for standardization in the context of pthreads. A sporadic server policy
|
|
is included.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Scheduling Options</p>
|
|
|
|
<p>In IEEE Std 1003.1-2001, the basic thread scheduling functions are defined under the Threads option, so that they are
|
|
required of all threads implementations. However, there are no specific scheduling policies required by this option to allow for
|
|
conforming thread implementations that are not targeted to realtime applications.</p>
|
|
|
|
<p>Specific standard scheduling policies are defined to be under the Thread Execution Scheduling option, and they are specifically
|
|
designed to support realtime applications by providing predictable resource-sharing sequences. The name of this option was chosen
|
|
to emphasize that this functionality is defined as appropriate for realtime applications that require simple priority-based
|
|
scheduling.</p>
|
|
|
|
<p>It is recognized that these policies are not necessarily satisfactory for some multi-processor implementations, and work is
|
|
ongoing to address a wider range of scheduling behaviors. The interfaces have been chosen to create abundant opportunity for future
|
|
scheduling policies to be implemented and standardized based on this interface. In order to standardize a new scheduling policy,
|
|
all that is required (from the standpoint of thread scheduling attributes) is to define a new policy name, new members of the
|
|
thread attributes object, and functions to set these members when the scheduling policy is equal to the new value.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<h5><a name="tag_03_02_09_12"></a>Scheduling Contention Scope</h5>
|
|
|
|
<p>In order to accommodate the requirement for realtime response, each thread has a scheduling contention scope attribute. Threads
|
|
with a system scheduling contention scope have to be scheduled with respect to all other threads in the system. These threads are
|
|
usually bound to a single kernel entity that reflects their scheduling attributes and are directly scheduled by the kernel.</p>
|
|
|
|
<p>Threads with a process scheduling contention scope need be scheduled only with respect to the other threads in the process.
|
|
These threads may be scheduled within the process onto a pool of kernel entities. The implementation is also free to bind these
|
|
threads directly to kernel entities and let them be scheduled by the kernel. Process scheduling contention scope allows the
|
|
implementation the most flexibility and is the default if both contention scopes are supported and none is specified.</p>
|
|
|
|
<p>Thus, the choice by implementors to provide one or the other (or both) of these scheduling models is driven by the need of their
|
|
supported application domains for worst-case (that is, realtime) response, or average-case (non-realtime) response.</p>
|
|
|
|
<h5><a name="tag_03_02_09_13"></a>Scheduling Allocation Domain</h5>
|
|
|
|
<p>The SCHED_FIFO and SCHED_RR scheduling policies take on different characteristics on a multi-processor. Other scheduling
|
|
policies are also subject to changed behavior when executed on a multi-processor. The concept of scheduling allocation domain
|
|
determines the set of processors on which the threads of an application may run. By considering the application's processor
|
|
scheduling allocation domain for its threads, scheduling policies can be defined in terms of their behavior for varying processor
|
|
scheduling allocation domain values. It is conceivable that not all scheduling allocation domain sizes make sense for all
|
|
scheduling policies on all implementations. The concept of scheduling allocation domain, however, is a useful tool for the
|
|
description of multi-processor scheduling policies.</p>
|
|
|
|
<p>The "process control" approach to scheduling obtains significant performance advantages from dynamic scheduling allocation
|
|
domain sizes when it is applicable.</p>
|
|
|
|
<p>Non-Uniform Memory Access (NUMA) multi-processors may use a system scheduling structure that involves reassignment of threads
|
|
among scheduling allocation domains. In NUMA machines, a natural model of scheduling is to match scheduling allocation domains to
|
|
clusters of processors. Load balancing in such an environment requires changing the scheduling allocation domain to which a thread
|
|
is assigned.</p>
|
|
|
|
<h5><a name="tag_03_02_09_14"></a>Scheduling Documentation</h5>
|
|
|
|
<p>Implementation-provided scheduling policies need to be completely documented in order to be useful. This documentation includes
|
|
a description of the attributes required for the policy, the scheduling interaction of threads running under this policy and all
|
|
other supported policies, and the effects of all possible values for processor scheduling allocation domain. Note that for the
|
|
implementor wishing to be minimally-compliant, it is (minimally) acceptable to define the behavior as undefined.</p>
|
|
|
|
<h5><a name="tag_03_02_09_15"></a>Scheduling Contention Scope Attribute</h5>
|
|
|
|
<p>The scheduling contention scope defines how threads compete for resources. Within IEEE Std 1003.1-2001, scheduling
|
|
contention scope is used to describe only how threads are scheduled in relation to one another in the system. That is, either they
|
|
are scheduled against all other threads in the system ("system scope") or only against those threads in the process ("process
scope"). In fact, scheduling contention scope may apply to additional resources, including virtual timers and profiling, which are
|
|
not currently considered by IEEE Std 1003.1-2001.</p>
|
|
|
|
<h5><a name="tag_03_02_09_16"></a>Mixed Scopes</h5>
|
|
|
|
<p>If only one scheduling contention scope is supported, the scheduling decision is straightforward. To perform the processor
|
|
scheduling decision in a mixed scope environment, it is necessary to map the scheduling attributes of the thread with process-wide
|
|
contention scope to the same attribute space as the thread with system-wide contention scope.</p>
|
|
|
|
<p>Since a conforming implementation has to support one and may support both scopes, it is useful to discuss the effects of such
|
|
choices with respect to example applications. If an implementation supports both scopes, mixing scopes provides a means of better
|
|
managing system-level (that is, kernel-level) and library-level resources. In general, threads with system scope will require the
|
|
resources of a separate kernel entity in order to guarantee the scheduling semantics. On the other hand, threads with process scope
|
|
can share the resources of a kernel entity while maintaining the scheduling semantics.</p>
|
|
|
|
<p>The application is free to create threads with dedicated kernel resources, and other threads that multiplex kernel resources.
|
|
Consider the example of a window server. The server allocates two threads per widget: one thread manages the widget user interface
|
|
(including drawing), while the other thread takes any required application action. This allows the widget to be "active" while
|
|
the application is computing. A screen image may be built from thousands of widgets. If each of these threads had been created with
|
|
system scope, then most of the kernel-level resources might be wasted, since only a few widgets are active at any one time. In
|
|
addition, mixed scope is particularly useful in a window server where one thread with high priority and system scope handles the
|
|
mouse so that it tracks well. As another example, consider a database server. For each of the hundreds or thousands of clients
|
|
supported by a large server, an equivalent number of threads will have to be created. If each of these threads were system scope,
|
|
the consequences would be the same as for the window server example above. However, the server could be constructed so that actual
|
|
retrieval of data is done by several dedicated threads. Dedicated threads that do work for all clients frequently justify the added
|
|
expense of system scope. If it were not permissible to mix system and process threads in the same process, this type of solution
|
|
would not be possible.</p>
|
|
|
|
<h5><a name="tag_03_02_09_17"></a>Dynamic Thread Scheduling Parameters Access</h5>
|
|
|
|
<p>In many time-constrained applications, there is no need to change the scheduling attributes dynamically during thread or process
|
|
execution, since the general use of these attributes is to reflect directly the time constraints of the application. Since these
|
|
time constraints are generally imposed to meet higher-level system requirements, such as accuracy or availability, they frequently
|
|
should remain unchanged during application execution.</p>
|
|
|
|
<p>However, there are important situations in which the scheduling attributes should be changed. Generally, this will occur when
|
|
external environmental conditions exist in which the time constraints change. Consider, for example, a space vehicle major mode
|
|
change, such as the change from ascent to descent mode, or the change from the space environment to the atmospheric environment. In
|
|
such cases, the frequency with which many of the sensors or actuators need to be read or written will change, which will
|
|
necessitate a priority change. In other cases, even the existence of a time constraint might be temporary, necessitating not just a
|
|
priority change, but also a policy change for ongoing threads or processes. For this reason, it is critical that the interface
|
|
should provide functions to change the scheduling parameters dynamically, but, as with many of the other realtime functions, it is
|
|
important that applications use them properly to avoid the possibility of unnecessarily degrading performance.</p>
|
|
|
|
<p>In providing functions for dynamically changing the scheduling behavior of threads, there were two options: provide functions to
|
|
get and set the individual scheduling parameters of threads, or provide a single interface to get and set all the scheduling
|
|
parameters for a given thread simultaneously. Both approaches have merit. Access functions for individual parameters allow simpler
|
|
control of thread scheduling for simple thread scheduling parameters. However, a single function for setting all the parameters for
|
|
a given scheduling policy is required when first setting that scheduling policy. Since the single all-encompassing functions are
|
|
required, it was decided to leave the interface as minimal as possible. Note that simpler functions (such as
|
|
<i>pthread_setprio</i>() for threads running under the priority-based schedulers) can be easily defined in terms of the
|
|
all-encompassing functions.</p>
|
|
|
|
<p>If the <a href="../functions/pthread_setschedparam.html"><i>pthread_setschedparam</i>()</a> function executes successfully, it
|
|
will have set all of the scheduling parameter values indicated in <i>param</i>; otherwise, none of the scheduling parameters will
|
|
have been modified. This is necessary to ensure that the scheduling of this and all other threads continues to be consistent in the
|
|
presence of an erroneous scheduling parameter.</p>
|
|
|
|
<p>The [EPERM] error value is included in the list of possible <a href=
|
|
"../functions/pthread_setschedparam.html"><i>pthread_setschedparam</i>()</a> error returns as a reflection of the fact that the
|
|
ability to change scheduling parameters increases risks to the implementation and application performance if the scheduling
|
|
parameters are changed improperly. For this reason, and based on some existing practice, it was felt that some implementations
|
|
would probably choose to define specific permissions for changing either a thread's own or another thread's scheduling parameters.
|
|
IEEE Std 1003.1-2001 does not include portable methods for setting or retrieving permissions, so any such use of
|
|
permissions is completely unspecified.</p>
|
|
|
|
<h5><a name="tag_03_02_09_18"></a>Mutex Initialization Scheduling Attributes</h5>
|
|
|
|
<p>In a priority-driven environment, a direct use of traditional primitives like mutexes and condition variables can lead to
|
|
unbounded priority inversion, where a higher priority thread can be blocked by a lower priority thread, or set of threads, for an
|
|
unbounded duration of time. As a result, it becomes impossible to guarantee thread deadlines. Priority inversion can be bounded and
|
|
minimized by the use of priority inheritance protocols. This allows thread deadlines to be guaranteed even in the presence of
|
|
synchronization requirements.</p>
|
|
|
|
<p>Two useful but simple members of the family of priority inheritance protocols are the basic priority inheritance protocol and
|
|
the priority ceiling protocol emulation. Under the Basic Priority Inheritance protocol (governed by the Thread Priority Inheritance
|
|
option), a thread that is blocking higher priority threads executes at the priority of the highest priority thread that it blocks.
|
|
This simple mechanism allows priority inversion to be bounded by the duration of critical sections and makes timing analysis
|
|
possible.</p>
|
|
|
|
<p>Under the Priority Ceiling Protocol Emulation (governed by the Thread Priority Protection option), each mutex has a
|
|
priority ceiling, usually defined as the priority of the highest priority thread that can lock the mutex. When a thread is
|
|
executing inside critical sections, its priority is unconditionally increased to the highest of the priority ceilings of all the
|
|
mutexes owned by the thread. This protocol has two very desirable properties in uni-processor systems. First, a thread can be
|
|
blocked by a lower priority thread for at most the duration of one single critical section. Furthermore, when the protocol is
|
|
correctly used in a single processor, and if threads do not become blocked while owning mutexes, mutual deadlocks are
|
|
prevented.</p>
|
|
|
|
<p>The priority ceiling emulation can be extended to multiple processor environments, in which case the values of the priority
|
|
ceilings will be assigned depending on the kind of mutex that is being used: local to only one processor, or global, shared by
|
|
several processors. Local priority ceilings will be assigned in the usual way, equal to the priority of the highest priority thread
|
|
that may lock that mutex. Global priority ceilings will usually be assigned a priority level higher than all the priorities
|
|
assigned to any of the threads that reside in the involved processors to avoid the effect called remote blocking.</p>
|
|
|
|
<h5><a name="tag_03_02_09_19"></a>Change the Priority Ceiling of a Mutex</h5>
|
|
|
|
<p>In order for the priority protect protocol to exhibit its desired properties of bounding priority inversion and avoidance of
|
|
deadlock, it is critical that the ceiling priority of a mutex be the same as the priority of the highest thread that can ever hold
|
|
it, or higher. Thus, if the priorities of the threads using such mutexes never change dynamically, there is no need ever to change
|
|
the priority ceiling of a mutex.</p>
|
|
|
|
<p>However, if a major system mode change results in an altered response time requirement for one or more application threads,
|
|
their priority has to change to reflect it. It will occasionally be the case that the priority ceilings of mutexes held also need
|
|
to change. While changing priority ceilings should generally be avoided, it is important that IEEE Std 1003.1-2001
|
|
provide these interfaces for those cases in which it is necessary.</p>
|
|
|
|
<h5><a name="tag_03_02_09_20"></a>Thread Cancelation</h5>
|
|
|
|
<p>Many existing threads packages have facilities for canceling an operation or canceling a thread. These facilities are used for
|
|
implementing user requests (such as the CANCEL button in a window-based application), for implementing OR parallelism (for example,
|
|
telling the other threads to stop working once one thread has found a forced mate in a parallel chess program), or for implementing
|
|
the ABORT mechanism in Ada.</p>
|
|
|
|
<p>POSIX programs traditionally have used the signal mechanism combined with either <a href=
|
|
"../functions/longjmp.html"><i>longjmp</i>()</a> or polling to cancel operations. Many POSIX programmers have trouble using these
|
|
facilities to solve their problems efficiently in a single-threaded process. With the introduction of threads, these solutions
|
|
become even more difficult to use.</p>
|
|
|
|
<p>The main issues with implementing a cancelation facility are specifying the operation to be canceled, cleanly releasing any
|
|
resources allocated to that operation, controlling when the target notices that it has been canceled, and defining the interaction
|
|
between asynchronous signals and cancelation.</p>
|
|
|
|
<h5><a name="tag_03_02_09_21"></a>Specifying the Operation to Cancel</h5>
|
|
|
|
<p>Consider a thread that calls through five distinct levels of program abstraction and then, inside the lowest-level abstraction,
|
|
calls a function that suspends the thread. (An abstraction boundary is a layer at which the client of the abstraction sees only the
|
|
service being provided and can remain ignorant of the implementation. Abstractions are often layered, each level of abstraction
|
|
being a client of the lower-level abstraction and implementing a higher-level abstraction.) Depending on the semantics of each
|
|
abstraction, one could imagine wanting to cancel only the call that causes suspension, only the bottom two levels, or the operation
|
|
being done by the entire thread. Canceling operations at a finer grain than the entire thread is difficult because threads are
|
|
active and they may be run in parallel on a multi-processor. By the time one thread can make a request to cancel an operation, the
|
|
thread performing the operation may have completed that operation and gone on to start another operation whose cancelation is not
|
|
desired. Thread IDs are not reused until the thread has exited, and either it was created with the <i>Attr detachstate</i>
|
|
attribute set to PTHREAD_CREATE_DETACHED or the <a href="../functions/pthread_join.html"><i>pthread_join</i>()</a> or <a href=
|
|
"../functions/pthread_detach.html"><i>pthread_detach</i>()</a> function has been called for that thread. Consequently, a thread
|
|
cancelation will never be misdirected when the thread terminates. For these reasons, the canceling of operations is done at the
|
|
granularity of the thread. Threads are designed to be inexpensive enough so that a separate thread may be created to perform each
|
|
separately cancelable operation; for example, each possibly long running user request.</p>
|
|
|
|
<p>For cancelation to be used in existing code, cancelation scopes and handlers will have to be established for code that needs to
|
|
release resources upon cancelation, so that it follows the programming discipline described in the text.</p>
|
|
|
|
<h5><a name="tag_03_02_09_22"></a>A Special Signal Versus a Special Interface</h5>
|
|
|
|
<p>Two different mechanisms were considered for providing the cancelation interfaces. The first was to provide an interface to
|
|
direct signals at a thread and then to define a special signal that had the required semantics. The other alternative was to use a
|
|
special interface that delivered the correct semantics to the target thread.</p>
|
|
|
|
<p>The solution using signals produced a number of problems. It required the implementation to provide cancelation in terms of
signals, whereas a perfectly valid (and possibly more efficient) implementation could have layered both facilities on a low-level
set of primitives. There were so many exceptions to the special signal (it cannot be used with <a href=
"../functions/kill.html"><i>kill</i>()</a>, no POSIX.1 interfaces can be used with it) that it was clearly not a valid signal. Its
semantics on delivery were also completely different from any existing POSIX.1 signal. As such, a special interface that did not
mandate the implementation and did not confuse the semantics of signals and cancelation was felt to be the better solution.</p>
|
|
|
|
<h5><a name="tag_03_02_09_23"></a>Races Between Cancelation and Resuming Execution</h5>
|
|
|
|
<p>Due to the nature of cancelation, there is generally no synchronization between the thread requesting the cancelation of a
|
|
blocked thread and events that may cause that thread to resume execution. For this reason, and because excess serialization hurts
|
|
performance, when both an event that a thread is waiting for has occurred and a cancelation request has been made and cancelation
|
|
is enabled, IEEE Std 1003.1-2001 explicitly allows the implementation to choose between returning from the blocking call
|
|
or acting on the cancelation request.</p>
|
|
|
|
<h5><a name="tag_03_02_09_24"></a>Interaction of Cancelation with Asynchronous Signals</h5>
|
|
|
|
<p>A typical use of cancelation is to acquire a lock on some resource and to establish a cancelation cleanup handler for releasing
|
|
the resource when and if the thread is canceled.</p>
|
|
|
|
<p>A correct and complete implementation of cancelation in the presence of asynchronous signals requires considerable care. An
|
|
implementation has to push a cancelation cleanup handler on the cancelation cleanup stack while maintaining the integrity of the
|
|
stack data structure. If an asynchronously-generated signal is posted to the thread during a stack operation, the signal handler
|
|
cannot manipulate the cancelation cleanup stack. As a consequence, asynchronous signal handlers may not cancel threads or otherwise
|
|
manipulate the cancelation state of a thread. Threads may, of course, be canceled by another thread that used a <a href=
|
|
"../functions/sigwait.html"><i>sigwait</i>()</a> function to wait synchronously for an asynchronous signal.</p>
|
|
|
|
<p>In order for cancelation to function correctly, it is required that asynchronous signal handlers not change the cancelation
|
|
state. This requires that some elements of existing practice, such as using <a href=
|
|
"../functions/longjmp.html"><i>longjmp</i>()</a> to exit from an asynchronous signal handler implicitly, be prohibited in cases
|
|
where the integrity of the cancelation state of the interrupt thread cannot be ensured.</p>
|
|
|
|
<h5><a name="tag_03_02_09_25"></a>Thread Cancelation Overview</h5>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Cancelability States</p>
|
|
|
|
<p>The three possible cancelability states (disabled, deferred, and asynchronous) are encoded into two separate bits ((disable,
|
|
enable) and (deferred, asynchronous)) to allow them to be changed and restored independently. For instance, short code sequences
|
|
that will not block sometimes disable cancelability on entry and restore the previous state upon exit. Likewise, long or unbounded
|
|
code sequences containing no convenient explicit cancelation points will sometimes set the cancelability type to asynchronous on
|
|
entry and restore the previous value upon exit.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Cancelation Points</p>
|
|
|
|
<p>Cancelation points are points inside certain functions where a thread has to act on any pending cancelation request when
|
|
cancelability is enabled, if the function would block. As with checking for signals, operations need only check for pending
|
|
cancelation requests when the operation is about to block indefinitely.</p>
|
|
|
|
<p>The idea of allowing implementations to define whether blocking calls such as <a href=
"../functions/read.html"><i>read</i>()</a> should be cancelation points was considered. It was decided that it would adversely
affect the design of conforming applications if blocking calls were not cancelation points, because threads could be left blocked
in an uncancelable state.</p>
|
|
|
|
<p>There are several important blocking routines that are specifically not made cancelation points:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p><a href="../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a></p>
|
|
|
|
<p>If <a href="../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a> were a cancelation point, every routine that
|
|
called it would also become a cancelation point (that is, any routine that touched shared state would automatically become a
|
|
cancelation point). For example, <a href="../functions/malloc.html"><i>malloc</i>()</a>, <a href=
|
|
"../functions/free.html"><i>free</i>()</a>, and <a href="../functions/rand.html"><i>rand</i>()</a> would become cancelation points
|
|
under this scheme. Having too many cancelation points makes programming very difficult, leading to either much disabling and
|
|
restoring of cancelability or much difficulty in trying to arrange for reliable cleanup at every possible place.</p>
|
|
|
|
<p>Since <a href="../functions/pthread_mutex_lock.html"><i>pthread_mutex_lock</i>()</a> is not a cancelation point, threads could
end up blocked uninterruptibly for long periods of time if mutexes were used as a general synchronization mechanism. As this is
normally not acceptable, mutexes should only be used to protect resources that are held for small fixed lengths of time, where not
being able to be canceled will not be a problem. Resources that need to be held exclusively for long periods of time should be
protected with condition variables.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p><a href="../functions/pthread_barrier_wait.html"><i>pthread_barrier_wait</i>()</a></p>
|
|
|
|
<p>Canceling a barrier wait would render the barrier unusable. As with a barrier timeout (which the standard developers rejected),
there is no way to guarantee the consistency of a barrier's internal data structures if a barrier wait is canceled.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p><a href="../functions/pthread_spin_lock.html"><i>pthread_spin_lock</i>()</a></p>
|
|
|
|
<p>As with mutexes, spin locks should only be used to protect resources that are held for small fixed lengths of time where not
|
|
being cancelable will not be a problem.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>Every library routine should specify whether or not it includes any cancelation points. Typically, only those routines that may
|
|
block or compute indefinitely need to include cancelation points.</p>
|
|
|
|
<p>Correctly coded routines only reach cancelation points after having set up a cancelation cleanup handler to restore invariants
|
|
if the thread is canceled at that point. Being cancelable only at specified cancelation points allows programmers to keep track of
|
|
actions needed in a cancelation cleanup handler more easily. A thread should only be made asynchronously cancelable when it is not
|
|
in the process of acquiring or releasing resources or otherwise in a state from which it would be difficult or impossible to
|
|
recover.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Thread Cancelation Cleanup Handlers</p>
|
|
|
|
<p>The cancelation cleanup handlers provide a portable mechanism, easy to implement, for releasing resources and restoring
|
|
invariants. They are easier to use than signal handlers because they provide a stack of cancelation cleanup handlers rather than a
|
|
single handler, and because they have an argument that can be used to pass context information to the handler.</p>
|
|
|
|
<p>The alternative to providing these simple cancelation cleanup handlers (whose only use is for cleaning up when a thread is
|
|
canceled) is to define a general exception package that could be used for handling and cleaning up after hardware traps and
|
|
software-detected errors. This was too far removed from the charter of providing threads to handle asynchrony. However, it is an
|
|
explicit goal of IEEE Std 1003.1-2001 to be compatible with existing exception facilities and languages having
|
|
exceptions.</p>
|
|
|
|
<p>The interaction of this facility and other procedure-based or language-level exception facilities is unspecified in this version
|
|
of IEEE Std 1003.1-2001. However, it is intended that it be possible for an implementation to define the relationship
|
|
between these cancelation cleanup handlers and Ada, C++, or other language-level exception handling facilities.</p>
|
|
|
|
<p>It was suggested that the cancelation cleanup handlers should also be called when the process exits or calls the <i>exec</i>
|
|
function. This was rejected partly due to the performance problem caused by having to call the cancelation cleanup handlers of
|
|
every thread before the operation could continue. The other reason was that the only state expected to be cleaned up by the
|
|
cancelation cleanup handlers would be the intraprocess state. Any handlers that are to clean up the interprocess state would be
|
|
registered with <a href="../functions/atexit.html"><i>atexit</i>()</a>. There is the orthogonal problem that the <i>exec</i>
|
|
functions do not honor the <a href="../functions/atexit.html"><i>atexit</i>()</a> handlers, but resolving this is beyond the scope
|
|
of IEEE Std 1003.1-2001.<br>
|
|
</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Async-Cancel Safety</p>
|
|
|
|
<p>A function is said to be async-cancel-safe if it is written in such a way that entering the function with asynchronous
|
|
cancelability enabled will not cause any invariants to be violated, even if a cancelation request is delivered at any arbitrary
|
|
instruction. Functions that are async-cancel-safe are often written in such a way that they need to acquire no resources for their
|
|
operation and the visible variables that they may write are strictly limited.</p>
|
|
|
|
<p>Any routine that gets a resource as a side effect cannot be made async-cancel-safe (for example, <a href=
|
|
"../functions/malloc.html"><i>malloc</i>()</a>). If such a routine were called with asynchronous cancelability enabled, it might
|
|
acquire the resource successfully, but as it was returning to the client, it could act on a cancelation request. In such a case,
|
|
the application would have no way of knowing whether the resource was acquired or not.</p>
|
|
|
|
<p>Indeed, because many interesting routines cannot be made async-cancel-safe, most library routines in general are not
|
|
async-cancel-safe. Every library routine should specify whether or not it is async-cancel-safe so that programmers know which
|
|
routines can be called from code that is asynchronously cancelable.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<h5><a name="tag_03_02_09_26"></a>Thread Read-Write Locks</h5>
|
|
|
|
<h5><a name="tag_03_02_09_27"></a>Background</h5>
|
|
|
|
<p>Read-write locks are often used to allow parallel access to data on multi-processors, to avoid context switches on
|
|
uni-processors when multiple threads access the same data, and to protect data structures that are frequently accessed (that is,
|
|
read) but rarely updated (that is, written). The in-core representation of a file system directory is a good example of such a data
|
|
structure. One would like to achieve as much concurrency as possible when searching directories, but limit concurrent access when
|
|
adding or deleting files.</p>
|
|
|
|
<p>Although read-write locks can be implemented with mutexes and condition variables, such implementations are significantly less
|
|
efficient than is possible. Therefore, this synchronization primitive is included in IEEE Std 1003.1-2001 for the purpose
|
|
of allowing more efficient implementations in multi-processor systems.</p>
|
|
|
|
<h5><a name="tag_03_02_09_28"></a>Queuing of Waiting Threads</h5>
|
|
|
|
<p>The <a href="../functions/pthread_rwlock_unlock.html"><i>pthread_rwlock_unlock</i>()</a> function description states that one
|
|
writer or one or more readers must acquire the lock if it is no longer held by any thread as a result of the call. However, the
|
|
function does not specify which thread(s) acquire the lock, unless the Thread Execution Scheduling option is supported.</p>
|
|
|
|
<p>The standard developers considered the issue of scheduling with respect to the queuing of threads blocked on a read-write lock.
|
|
The question turned out to be whether IEEE Std 1003.1-2001 should require priority scheduling of read-write locks for
|
|
threads whose execution scheduling policy is priority-based (for example, SCHED_FIFO or SCHED_RR). There are tradeoffs between
|
|
priority scheduling, the amount of concurrency achievable among readers, and the prevention of writer and/or reader starvation.</p>
|
|
|
|
<p>For example, suppose one or more readers hold a read-write lock and the following threads request the lock in the listed
|
|
order:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>pthread_rwlock_wrlock() - Low priority thread writer_a
|
|
pthread_rwlock_rdlock() - High priority thread reader_a
|
|
pthread_rwlock_rdlock() - High priority thread reader_b
|
|
pthread_rwlock_rdlock() - High priority thread reader_c
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>When the lock becomes available, should <i>writer_a</i> block the high priority readers? Or, suppose a read-write lock becomes
|
|
available and the following are queued:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>pthread_rwlock_rdlock() - Low priority thread reader_a
|
|
pthread_rwlock_rdlock() - Low priority thread reader_b
|
|
pthread_rwlock_rdlock() - Low priority thread reader_c
|
|
pthread_rwlock_wrlock() - Medium priority thread writer_a
|
|
pthread_rwlock_rdlock() - High priority thread reader_d
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>If priority scheduling is applied then <i>reader_d</i> would acquire the lock and <i>writer_a</i> would block the remaining
|
|
readers. But should the remaining readers also acquire the lock to increase concurrency? The solution adopted takes into account
|
|
that when the Thread Execution Scheduling option is supported, high priority threads may in fact starve low priority threads (the
|
|
application developer is responsible in this case for designing the system in such a way that this starvation is avoided).
|
|
Therefore, IEEE Std 1003.1-2001 specifies that high priority readers take precedence over lower priority writers.
|
|
However, to prevent writer starvation from threads of the same or lower priority, writers take precedence over readers of the same
|
|
or lower priority.</p>
|
|
|
|
<p>Priority inheritance mechanisms are non-trivial in the context of read-write locks. When a high priority writer is forced to
|
|
wait for multiple readers, for example, it is not clear which subset of the readers should inherit the writer's priority.
|
|
Furthermore, the internal data structures that record the inheritance must be accessible to all readers, and this implies some sort
|
|
of serialization that could negate any gain in parallelism achieved through the use of multiple readers in the first place.
|
|
Finally, existing practice does not support the use of priority inheritance for read-write locks. Therefore, no specification of
|
|
priority inheritance or priority ceiling is attempted. If reliable priority-scheduled synchronization is absolutely required, it
|
|
can always be obtained through the use of mutexes.</p>
|
|
|
|
<h5><a name="tag_03_02_09_29"></a>Comparison to fcntl() Locks</h5>
|
|
|
|
<p>The read-write locks and the <a href="../functions/fcntl.html"><i>fcntl</i>()</a> locks in IEEE Std 1003.1-2001 share
|
|
a common goal: increasing concurrency among readers, thus increasing throughput and decreasing delay.</p>
|
|
|
|
<p>However, the read-write locks have two features not present in the <a href="../functions/fcntl.html"><i>fcntl</i>()</a> locks.
|
|
First, under priority scheduling, read-write locks are granted in priority order. Second, also under priority scheduling, writer
|
|
starvation is prevented by giving writers preference over readers of equal or lower priority.</p>
|
|
|
|
<p>Also, read-write locks can be used in systems lacking a file system, such as those conforming to the minimal realtime system
|
|
profile of IEEE Std 1003.13-1998.</p>
|
|
|
|
<h5><a name="tag_03_02_09_30"></a>History of Resolution Issues</h5>
|
|
|
|
<p>Based upon some balloting objections, early drafts specified the behavior of threads waiting on a read-write lock during the
execution of a signal handler as if the thread had not called the lock operation. However, this specified behavior would require
implementations to establish internal signal handlers even though the situation would arise rarely, or never, in many programs.
This would introduce an unacceptable performance cost in comparison to the little additional functionality gained. Therefore, the
behavior of read-write locks with respect to signals was reverted to its previous mutex-like specification.</p>
|
|
|
|
<h5><a name="tag_03_02_09_31"></a>Thread Interactions with Regular File Operations</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h4><a name="tag_03_02_10"></a>Sockets</h4>
|
|
|
|
<p>The base document for the sockets interfaces in IEEE Std 1003.1-2001 is the XNS, Issue 5.2 specification. This was
|
|
primarily chosen as it aligns with IPv6. Additional material has been added from IEEE Std 1003.1g-2000, notably socket
|
|
concepts, raw sockets, the <a href="../functions/pselect.html"><i>pselect</i>()</a> function, the <a href=
|
|
"../functions/sockatmark.html"><i>sockatmark</i>()</a> function, and the <a href=
|
|
"../basedefs/sys/select.h.html"><i><sys/select.h></i></a> header.</p>
|
|
|
|
<h5><a name="tag_03_02_10_01"></a>Address Families</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_02"></a>Addressing</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_03"></a>Protocols</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_04"></a>Routing</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_05"></a>Interfaces</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_06"></a>Socket Types</h5>
|
|
|
|
<p>The type <b>socklen_t</b> was invented to cover the range of implementations seen in the field. The intent of <b>socklen_t</b>
|
|
is to be the type for all lengths that are naturally bounded in size; that is, that they are the length of a buffer which cannot
|
|
sensibly become of massive size: network addresses, host names, string representations of these, ancillary data, control messages,
|
|
and socket options are examples. Truly boundless sizes are represented by <b>size_t</b> as in <a href=
|
|
"../functions/read.html"><i>read</i>()</a>, <a href="../functions/write.html"><i>write</i>()</a>, and so on.</p>
|
|
|
|
<p>All <b>socklen_t</b> types were originally (in BSD UNIX) of type <b>int</b>. During the development of
|
|
IEEE Std 1003.1-2001, it was decided to change all buffer lengths to <b>size_t</b>, which appears at face value to make
|
|
sense. When dual mode 32/64-bit systems came along, this choice unnecessarily complicated system interfaces because <b>size_t</b>
|
|
(like <b>long</b>) was a different size under ILP32 and LP64 models. Reverting to <b>int</b> would have happened except that some
|
|
implementations had already shipped 64-bit-only interfaces. The compromise was a type which could be defined to be any size by the
|
|
implementation: <b>socklen_t</b>.</p>
|
|
|
|
<h5><a name="tag_03_02_10_07"></a>Socket I/O Mode</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_08"></a>Socket Owner</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_09"></a>Socket Queue Limits</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_10"></a>Pending Error</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_11"></a>Socket Receive Queue</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_12"></a>Socket Out-of-Band Data State</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_13"></a>Connection Indication Queue</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_14"></a>Signals</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_15"></a>Asynchronous Errors</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_16"></a>Use of Options</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_17"></a>Use of Sockets for Local UNIX Connections</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_18"></a>Use of Sockets over Internet Protocols</h5>
|
|
|
|
<p>A raw socket allows privileged users direct access to a protocol; for example, raw access to the IP and ICMP protocols is
|
|
possible through raw sockets. Raw sockets are intended for knowledgeable applications that wish to take advantage of some protocol
|
|
feature not directly accessible through the other sockets interfaces.</p>
|
|
|
|
<h5><a name="tag_03_02_10_19"></a>Use of Sockets over Internet Protocols Based on IPv4</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_03_02_10_20"></a>Use of Sockets over Internet Protocols Based on IPv6</h5>
|
|
|
|
<p>The Open Group Base Resolution bwg2001-012 is applied, clarifying that IPv6 implementations are required to support use of
|
|
AF_INET6 sockets over IPv4.</p>
|
|
|
|
<h4><a name="tag_03_02_11"></a>Tracing</h4>
|
|
|
|
<p>The organization of the tracing rationale differs from the traditional rationale in that this tracing rationale text is written
|
|
against the trace interface as a whole, rather than against the individual components of the trace interface or the normative
|
|
section in which those components are defined. Therefore the sections below do not parallel the sections of normative text in
|
|
IEEE Std 1003.1-2001.</p>
|
|
|
|
<h5><a name="tag_03_02_11_01"></a>Objectives</h5>
|
|
|
|
<p>The intended uses of tracing are application-system debugging during system development, as a "flight recorder" for
|
|
maintenance of fielded systems, and as a performance measurement tool. In all of these intended uses, the vendor-supplied computer
|
|
system and its software are, for this discussion, assumed error-free; the intent being to debug the user-written and/or third-party
|
|
application code, and their interactions. Clearly, problems with the vendor-supplied system and its software will be uncovered from
|
|
time to time, but this is a byproduct of the primary activity, debugging user code.</p>
|
|
|
|
<p>Another need for defining a trace interface in POSIX stems from the objective to provide an efficient portable way to perform
|
|
benchmarks. Existing practice shows that such interfaces are commonly used in a variety of systems but with little commonality. As
|
|
part of the benchmarking needs, two aspects within the trace interface must be considered.</p>
|
|
|
|
<p>The first, and perhaps more important one, is the qualitative aspect.</p>
|
|
|
|
<p>The second is the quantitative aspect.</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Qualitative Aspect</p>
|
|
|
|
<p>To better understand this aspect, let us consider an example. Suppose that you want to organize a number of actions to be
performed during the day. Some of these actions are known at the beginning of the day. Some others, which may be more or less
important, will be triggered by reading your mail. During the day you will make some phone calls and synchronously receive some
more information. Finally, you will receive asynchronous phone calls that also trigger actions. If you, or somebody else, later
examine your day at work, it may turn out that you did not organize your work efficiently. For instance, regarding the phone calls
you made, would it have been preferable to make some of them early in the morning, or to delay others until the end of the day?
Regarding the phone calls you received, you might find that somebody you called in the morning called you back 10 times while you
were performing some important work. To examine your day at work afterwards, you record in sequence all the trace events relative
to your work. This gives you a chance of organizing your next day at work better.</p>
|
|
|
|
<p>This is the qualitative aspect of the trace interface. The user of a system needs to keep a trace of particular points the
application passes through, so that they can later make changes in the application and/or system configuration to give the
application a chance of running more efficiently.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Quantitative Aspect</p>
|
|
|
|
<p>This aspect primarily concerns realtime applications, where missed deadlines can be undesirable. Although there are, in
IEEE Std 1003.1-2001, some interfaces useful for such applications (timeouts, execution time monitoring, and so on),
there are no APIs to aid in the tuning of a realtime application's behavior (<b>timespec</b> in timeouts, length of message
queues, duration of driver interrupt service routines, and so on). The tuning of an application needs a means of recording
timestamped important trace events during execution, in order to analyze them offline and eventually tune some realtime features
(redesign the system with reduced functionality, readjust timeouts, redesign driver interrupts, and so on).</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<h5><a name="tag_03_02_11_02"></a>Detailed Objectives</h5>
|
|
|
|
<p>Objectives were defined to build the trace interface and are kept here for historical interest. Although some objectives are not
fully met by this trace interface, the concept of the POSIX trace interface assumes the following points:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>It must be possible to trace both system and user trace events concurrently.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>It must be possible to trace per-process trace events and also to trace system trace events which are unrelated to any
|
|
particular process. A per-process trace event is either user-initiated or system-initiated.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>It must be possible to control tracing on a per-process basis from either inside or outside the process.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>It must be possible to control tracing on a per-thread basis from inside the enclosing process.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Trace points must be controllable by trace event type ID from inside and outside of the process. Multiple trace points can have
|
|
the same trace event type ID, and will be controlled jointly.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Recording of trace events is dependent on both trace event type ID and the process/thread. Both must be enabled in order to
|
|
record trace events. System trace events may or may not be handled differently.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The API must not mandate the ability to control tracing for more than one process at the same time.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>There is no objective for trace control on anything bigger than a process; for example, group or session.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Trace propagation and control:</p>
|
|
|
|
<ol type="a">
|
|
<li>
|
|
<p>Trace propagation across <a href="../functions/fork.html"><i>fork</i>()</a> is optional; the default is to not trace a child
|
|
process.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Trace control must span <a href="../functions/pthread_create.html"><i>pthread_create</i>()</a> operations; that is, if a process
|
|
is being traced, any thread will be traced as well if this thread allows tracing. The default is to allow tracing.</p>
|
|
</li>
|
|
</ol>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Trace control must not span <i>exec</i> or <a href="../functions/posix_spawn.html"><i>posix_spawn</i>()</a> operations.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>A triggering API is not required. The triggering API is the ability to command or stop tracing based on the occurrence of a
|
|
specific trace event other than a POSIX_TRACE_START trace event or a POSIX_TRACE_STOP trace event.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Trace log entries must have timestamps of implementation-defined resolution. Implementations are exhorted to support at least
|
|
microsecond resolution. When a trace log entry is retrieved, it must include the timestamp, PC address, PID, and TID of the entity that
|
|
generated the trace event.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Independently developed code should be able to use trace facilities without coordination and without conflict.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Even if the trace points in the trace calls are not unique, the trace log entries (after any processing) must be uniquely
|
|
identified as to trace point.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>There must be a standard API to read the trace stream.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The format of the trace stream and the trace log is opaque and unspecified.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>It must be possible to read a completed trace, if recorded on some suitable non-volatile storage, even subsequent to a power
|
|
cycle or subsequent cold boot of the system.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Support of analysis of a trace log while it is being formed is implementation-defined.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The API must allow the application to write trace stream identification information into the trace stream and to be able to
|
|
retrieve it, without it being overwritten by trace entries, even if the trace stream is full.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>It must be possible to specify the destination of trace data produced by trace events.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>It must be possible to have different trace streams, and for the tracing enabled by one trace stream to be completely
|
|
independent of the tracing of another trace stream.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>It must be possible to trace events from threads in different CPUs.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The API must support one or more trace streams per-system, and one or more trace streams per-process, up to an
|
|
implementation-defined set of per-system and per-process maximums.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>It must be possible to determine the order in which the trace events happened, without necessarily depending on the clock, up to
|
|
an implementation-defined time resolution.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>For performance reasons, the trace event point call(s) must be implementable as a macro (see the ISO POSIX-1:1996 standard,
|
|
1.3.4, Statement 2).</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>IEEE Std 1003.1-2001 must not define the trace points which a conforming system must implement, except for trace
|
|
points used in the control of tracing.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The APIs must be thread-safe, and trace points should be lock-free (that is, not require a lock to gain exclusive access to some
|
|
resource).</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The user-provided information associated with a trace event is variable-sized, up to some maximum size.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Bounds on record and trace stream sizes:</p>
|
|
|
|
<ol type="a">
|
|
<li>
|
|
<p>The API must permit the application to declare the upper bounds on the length of an application data record. The system must
|
|
return the limit it used. The limit used may be smaller than requested.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The API must permit the application to declare the upper bounds on the size of trace streams. The system must return the limit
|
|
it used. The limit used may be different, either larger or smaller, than requested.</p>
|
|
</li>
|
|
</ol>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The API must be able to pass any fundamental data type, and a structured data type composed only of fundamental types. The API
|
|
must be able to pass data by reference, given only as an address and a length. Fundamental types are the POSIX.1 types (see the <a
|
|
href="../basedefs/sys/types.h.html"><i><sys/types.h></i></a> header) plus those defined in the ISO C standard.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The API must apply the POSIX notions of ownership and permission to recorded trace data, corresponding to the sources of that
|
|
data.</p>
|
|
</li>
|
|
</ol>
|
|
|
|
<h5><a name="tag_03_02_11_03"></a>Comments on Objectives</h5>
|
|
|
|
<basefont size="2">
|
|
|
|
<dl>
|
|
<dt><b>Note:</b></dt>
|
|
|
|
<dd>In the following comments, numbers in square brackets refer to the above objectives.</dd>
|
|
</dl>
|
|
|
|
<basefont size="3">
|
|
|
|
<p>It is necessary to be able to obtain a trace stream for a complete activity. Thus there is a requirement to be able to trace
|
|
both application and system trace events. A per-process trace event is either user-initiated, like the <a href=
|
|
"../functions/write.html"><i>write</i>()</a> function, or system-initiated, like a timer expiration. There is also a need to be
|
|
able to trace an entire process' activity even when it has threads in multiple CPUs. To avoid excess trace activity, it is
|
|
necessary to be able to control tracing on a trace event type basis.<br>
|
|
[Objectives 1,2,5,22]</p>
|
|
|
|
<p>There is a need to be able to control tracing on a per-process basis, both from inside and outside the process; that is, a
|
|
process can start a trace activity on itself or any other process. There is also the perceived need to allow the definition of a
|
|
maximum number of trace streams per system.<br>
|
|
[Objectives 3,23]</p>
|
|
|
|
<p>From within a process, it is necessary to be able to control tracing on a per-thread basis. This provides an additional
|
|
filtering capability to keep the amount of traced data to a minimum. It also allows for less ambiguity as to the origin of trace
|
|
events. It is recognized that thread-level control is only valid from within the process itself. It is also desirable to know the
|
|
maximum number of trace streams per process that can be started. The API should not require thread synchronization or mandate
|
|
priority inversions that would cause the thread to block. However, the API must be thread-safe.<br>
|
|
[Objectives 4,23,24,27]</p>
|
|
|
|
<p>There was no perceived objective to control tracing on anything larger than a process; for example, a group or session. Also,
|
|
the ability to start or stop a trace activity on multiple processes atomically may be very difficult or cumbersome in some
|
|
implementations.<br>
|
|
[Objectives 6,8]</p>
|
|
|
|
<p>It is also necessary to be able to control tracing by trace event type identifier, sometimes called a trace hook ID. However,
|
|
there is no mandated set of system trace events, since such trace points are implementation-defined. The API must not require from
|
|
the operating system facilities that are not standard.<br>
|
|
[Objectives 6,26]</p>
|
|
|
|
<p>Trace control must span <a href="../functions/fork.html"><i>fork</i>()</a> and <a href=
|
|
"../functions/pthread_create.html"><i>pthread_create</i>()</a>. If not, there will be no way to ensure that an application's
|
|
activity is entirely traced. The newly forked child would not be able to turn on its tracing until after it obtained control after
|
|
the fork, and trace control externally would be even more problematic.<br>
|
|
[Objective 9]</p>
|
|
|
|
<p>Since <i>exec</i> and <a href="../functions/posix_spawn.html"><i>posix_spawn</i>()</a> represent a complete change in the
|
|
execution of a task (a new program), trace control need not persist over an <i>exec</i> or <a href=
|
|
"../functions/posix_spawn.html"><i>posix_spawn</i>()</a>.<br>
|
|
[Objective 10]</p>
|
|
|
|
<p>Where trace activities are started on multiple processes, these trace activities should not interfere with each other.<br>
|
|
[Objective 21]</p>
|
|
|
|
<p>There is no need for a triggering objective, primarily for performance reasons; see also <a href="#tag_03_02_11_32">Rationale on
Triggering</a>.<br>
[Objective 11]</p>
|
|
|
|
<p>It must be possible to determine the origin of each traced event. The process and thread identifiers for each trace event are
|
|
needed. Also there was a perceived need for a user-specifiable origin, but it was felt that this would create too much
|
|
overhead.<br>
|
|
[Objectives 12,14]</p>
|
|
|
|
<p>An allowance must be made for trace points to come embedded in software components from several different sources and vendors
|
|
without requiring coordination.<br>
|
|
[Objective 13]</p>
|
|
|
|
<p>There is a requirement to be able to uniquely identify trace points that may have the same trace stream identifier. This is only
|
|
necessary when a trace report is produced.<br>
|
|
[Objectives 12,14]</p>
|
|
|
|
<p>Tracing is a very performance-sensitive activity, and will therefore likely be implemented at a low level within the system.
|
|
Hence the interface must not mandate any particular buffering or storage method. Therefore, a standard API is needed to read a
|
|
trace stream. Also the interface must not mandate the format of the trace data, and the interface must not assume a trace storage
|
|
method. Due to the possibility of a monolithic kernel and the possible presence of multiple processes capable of running trace
|
|
activities, the two kinds of trace events may be stored in two separate streams for performance reasons. A mandatory dump
|
|
mechanism, common in some existing practice, has been avoided to allow the implementation of this set of functions on small
|
|
realtime profiles for which the concept of a file system is not defined. The trace API calls should be implementable as macros.<br>
|
|
[Objectives 15,16,25,30]</p>
|
|
|
|
<p>Since a trace facility is a valuable service tool, the output (or log) of a completed trace stream that is written to permanent
|
|
storage must be readable on other systems of the type that produced the trace log. Note that there is no objective to be able to
|
|
interpret a trace log that was not successfully completed.<br>
|
|
[Objectives 17,18,19]</p>
|
|
|
|
<p>For trace streams written to permanent storage, a way to specify the destination of the trace stream is needed.<br>
|
|
[Objective 20]</p>
|
|
|
|
<p>There is a requirement to be able to depend on the ordering of trace events down to some implementation-defined time interval;
that is, there is a need to know the minimum separation in time below which the ordering of trace events is unspecified. Events
that occur within an interval smaller than this resolution may or may not be read back in the correct order.<br>
[Objective 24]</p>
|
|
|
|
<p>The application should be able to know how much data can be traced. When trace event types can be filtered, the application
|
|
should be able to specify the approximate maximum amount of data that will be traced in a trace event so resources can be more
|
|
efficiently allocated.<br>
|
|
[Objectives 28,29]</p>
|
|
|
|
<p>Users should not be able to trace data to which they would not normally have access. System trace events corresponding to a
|
|
process/thread should be associated with the ownership of that process/thread.<br>
|
|
[Objective 31]<br>
|
|
</p>
|
|
|
|
<h5><a name="tag_03_02_11_04"></a>Trace Model</h5>
|
|
|
|
<h5><a name="tag_03_02_11_05"></a>Introduction</h5>
|
|
|
|
<p>The model is based on two base entities, the "Trace Stream" and the "Trace Log", and on a recorded unit called the "Trace
Event". The possibility of using Trace Streams and Trace Logs separately gives two dimensions of use and addresses both the
performance issue and the full-information issue. In the case of a trace stream without a log, specific information, although
reduced in quantity, must be recorded, possibly in a small realtime system, with as little overhead as possible; making the Trace
Log a separate option accommodates such small realtime systems. In the case of a trace stream with a log, considerable complex
application-specific information needs to be collected.</p>
|
|
|
|
<h5><a name="tag_03_02_11_06"></a>Trace Model Description</h5>
|
|
|
|
<p>The trace model can be examined for three different subfunctions: Application Instrumentation, Trace Operation Control, and
|
|
Trace Analysis.</p>
|
|
|
|
<dl compact>
|
|
<dt></dt>
|
|
|
|
<dd><img src=".././Figures/b-2.gif"></dd>
|
|
</dl>
|
|
|
|
<center><b><a name="tagfcjh_2"></a> Figure: Trace System Overview: for Offline Analysis</b></center>
|
|
|
|
<p>Each of these subfunctions requires specific characteristics of the trace mechanism API.</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Application Instrumentation</p>
|
|
|
|
<p>When instrumenting an application, the programmer is not concerned with the future use of the trace events in the trace stream
or the trace log, the full policy of the trace stream, or the eventual pre-filtering of trace events, but rather with the correct
determination of the specific trace event type identifier, regardless of how many independent libraries are used in the same user
application; see <a href="#tagfcjh_2">Trace System Overview: for Offline Analysis</a> and <a href="#tagfcjh_3">Trace
System Overview: for Online Analysis</a>.</p>
|
|
|
|
<p>This trace API provides the necessary operations to accomplish this subfunction. This is done by providing functions to
|
|
associate a programmer-defined name with an implementation-defined trace event type identifier (see the <a href=
|
|
"../functions/posix_trace_eventid_open.html"><i>posix_trace_eventid_open</i>()</a> function), and to send this trace event into a
|
|
potential trace stream (see the <a href="../functions/posix_trace_event.html"><i>posix_trace_event</i>()</a> function).<br>
|
|
</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Trace Operation Control</p>
|
|
|
|
<p>When controlling the recording of trace events in a trace stream, the programmer is concerned with the correct initialization of
|
|
the trace mechanism (that is, the sizing of the trace stream), the correct retention of trace events in a permanent storage, the
|
|
correct dynamic recording of trace events, and so on.</p>
|
|
|
|
<p>This trace API provides the means to do this efficiently. This is done by providing functions to initialize a
|
|
new trace stream, and optionally a trace log:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Trace Stream Attributes Object Initialization (see <a href=
|
|
"../functions/posix_trace_attr_init.html"><i>posix_trace_attr_init</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Functions to Retrieve or Set Information About a Trace Stream (see <a href=
|
|
"../functions/posix_trace_attr_getgenversion.html"><i>posix_trace_attr_getgenversion</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Functions to Retrieve or Set the Behavior of a Trace Stream (see <a href=
|
|
"../functions/posix_trace_attr_getinherited.html"><i>posix_trace_attr_getinherited</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Functions to Retrieve or Set Trace Stream Size Attributes (see <a href=
|
|
"../functions/posix_trace_attr_getmaxusereventsize.html"><i>posix_trace_attr_getmaxusereventsize</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Trace Stream Initialization, Flush, and Shutdown from a Process (see <a href=
|
|
"../functions/posix_trace_create.html"><i>posix_trace_create</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Clear Trace Stream and Trace Log (see <a href="../functions/posix_trace_clear.html"><i>posix_trace_clear</i>()</a>)</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>To select the trace event types that are to be traced:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Manipulate Trace Event Type Identifier (see <a href=
|
|
"../functions/posix_trace_trid_eventid_open.html"><i>posix_trace_trid_eventid_open</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Iterate over a Mapping of Trace Event Type (see <a href=
|
|
"../functions/posix_trace_eventtypelist_getnext_id.html"><i>posix_trace_eventtypelist_getnext_id</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Manipulate Trace Event Type Sets (see <a href=
|
|
"../functions/posix_trace_eventset_empty.html"><i>posix_trace_eventset_empty</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Set Filter of an Initialized Trace Stream (see <a href=
|
|
"../functions/posix_trace_set_filter.html"><i>posix_trace_set_filter</i>()</a>)</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>To control the execution of an active trace stream:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Trace Start and Stop (see <a href="../functions/posix_trace_start.html"><i>posix_trace_start</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Functions to Retrieve the Trace Attributes or Trace Statuses (see <a href=
|
|
"../functions/posix_trace_get_attr.html"><i>posix_trace_get_attr</i>()</a>)</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<img src=".././Figures/b-3.gif">
|
|
|
|
<center><b><a name="tagfcjh_3"></a> Figure: Trace System Overview: for Online Analysis</b></center>
|
|
|
|
<br>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Trace Analysis</p>
|
|
|
|
<p>Once correctly recorded, whether on permanent storage or not, the final activity consists of the analysis of the recorded
information. If the recorded data is on permanent storage, a specific open operation is required to associate a trace stream with
a trace log.</p>
|
|
|
|
<p>The first intent of the group was to require the presence of a system identification structure in the trace stream attributes,
to give the application a portable way to process the recorded information. However, there is no requirement that the
<b>utsname</b> structure, on which this system identification was based, be portable from one machine to another, so the contents
of the attribute could not be interpreted correctly by an application conforming to IEEE Std 1003.1-2001.</p>
|
|
|
|
<p>The modification that was incorporated instead requires that some unspecified information be recorded in the trace log, so that
opening the log fails if the analysis process and the controller process were running on different types of machines; it does not
require that this information be accessible to the application. This change implied a modification of the <a href=
"../functions/posix_trace_open.html"><i>posix_trace_open</i>()</a> function's error code returns.</p>
|
|
|
|
<p>This trace API provides functions to:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Extract trace stream identification attributes (see <a href=
|
|
"../functions/posix_trace_attr_getgenversion.html"><i>posix_trace_attr_getgenversion</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Extract trace stream behavior attributes (see <a href=
|
|
"../functions/posix_trace_attr_getinherited.html"><i>posix_trace_attr_getinherited</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Extract trace event, stream, and log size attributes (see <a href=
|
|
"../functions/posix_trace_attr_getmaxusereventsize.html"><i>posix_trace_attr_getmaxusereventsize</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Look up trace event type names (see <a href=
|
|
"../functions/posix_trace_eventid_get_name.html"><i>posix_trace_eventid_get_name</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Iterate over trace event type identifiers (see <a href=
|
|
"../functions/posix_trace_eventtypelist_getnext_id.html"><i>posix_trace_eventtypelist_getnext_id</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Open, rewind, and close a trace log (see <a href="../functions/posix_trace_open.html"><i>posix_trace_open</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Read trace stream attributes and status (see <a href=
|
|
"../functions/posix_trace_get_attr.html"><i>posix_trace_get_attr</i>()</a>)</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Read trace events (see <a href="../functions/posix_trace_getnext_event.html"><i>posix_trace_getnext_event</i>()</a>)</p>
|
|
</li>
|
|
</ul>
|
|
</li>
|
|
</ul>
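<p>As an illustration of the application instrumentation subfunction described above, the following fragment sketches how an
instrumented application might register a trace event name and later fire the corresponding trace point. It is a sketch only,
assuming a system supporting the Trace option; the event name, payload, and function names are arbitrary examples, and error
checks are omitted as in the other examples.</p>

<pre>
<tt>/* Caution. Error checks omitted. Sketch only. */
static trace_event_id_t checkpoint_eventid;
<br>
void
init_instrumentation(void)
{
    /* Associate a programmer-defined name with an
       implementation-defined trace event type identifier */
    posix_trace_eventid_open("my_app_checkpoint", &checkpoint_eventid);
}
<br>
void
checkpoint(int sequence_number)
{
    /* Send the trace event into a potential trace stream; it is
       recorded only if a trace stream traces this process, is
       running, and does not filter the event out */
    posix_trace_event(checkpoint_eventid, &sequence_number,
        sizeof sequence_number);
}
</tt>
</pre>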
|
|
|
|
<p>For the following two reasons:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>The trace system must not add unacceptable overhead to the traced process, and so trace event point execution must be
fast</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The traced application does not care about tracing errors</p>
|
|
</li>
|
|
</ol>
|
|
|
|
<p>the trace system cannot return any internal error to the application. Internal error conditions can range from unrecoverable
|
|
errors that will force the active trace stream to abort, to small errors that can affect the quality of tracing without aborting
|
|
the trace stream. The group decided to define a system trace event to report to the analysis process such internal errors. It is
|
|
not the intention of IEEE Std 1003.1-2001 to require an implementation to report an internal error that corrupts or
|
|
terminates tracing operation. The implementor is free to decide which internal documented errors, if any, the trace system is able
|
|
to report.<br>
|
|
</p>
|
|
|
|
<h5><a name="tag_03_02_11_07"></a>States of a Trace Stream</h5>
|
|
|
|
<dl compact>
|
|
<dt></dt>
|
|
|
|
<dd><img src=".././Figures/b-4.gif"></dd>
|
|
</dl>
|
|
|
|
<center><b><a name="tagfcjh_4"></a> Figure: Trace System Overview: States of a Trace Stream</b></center>
|
|
|
|
<p><a href="#tagfcjh_4">Trace System Overview: States of a Trace Stream</a> shows the different states an active trace stream
|
|
passes through. After the <a href="../functions/posix_trace_create.html"><i>posix_trace_create</i>()</a> function call, a trace
|
|
stream becomes CREATED and a trace stream is associated for the future collection of trace events. The status of the trace stream
|
|
is POSIX_TRACE_SUSPENDED. The state becomes STARTED after a call to the <a href=
|
|
"../functions/posix_trace_start.html"><i>posix_trace_start</i>()</a> function, and the status becomes POSIX_TRACE_RUNNING. In this
|
|
state, all trace events that are not filtered out will be stored into the trace stream. After a call to <a href=
|
|
"../functions/posix_trace_stop.html"><i>posix_trace_stop</i>()</a>, the trace stream becomes STOPPED (and the status
|
|
POSIX_TRACE_SUSPENDED). In this state, no new trace events will be recorded in the trace stream, but previously recorded trace
|
|
events may continue to be read.</p>
|
|
|
|
<p>After a call to <a href="../functions/posix_trace_shutdown.html"><i>posix_trace_shutdown</i>()</a>, the trace stream is in the
|
|
state COMPLETED. The trace stream no longer exists but, if the Trace Log option is supported, all the information contained in it
|
|
has been logged. If a log object has not been associated with the trace stream at creation time, it is the responsibility of the
|
|
trace controller process to not shut the trace stream down while trace events remain to be read in the stream.</p>
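<p>A trace controller process can observe these states through the status reported by the <a href=
"../functions/posix_trace_get_attr.html"><i>posix_trace_get_status</i>()</a> function. The following fragment is a sketch of the
full life cycle on a system supporting the Trace option; the <i>traced_process_pid</i> variable is an arbitrary example, and error
checks are omitted as in the other examples.</p>

<pre>
<tt>/* Caution. Error checks omitted. Sketch only. */
{
    trace_attr_t attr;
    trace_id_t trid;
    struct posix_trace_status_info si;
<br>
    posix_trace_attr_init(&attr);
    posix_trace_create(traced_process_pid, &attr, &trid); /* CREATED */
    posix_trace_get_status(trid, &si);
    /* si.posix_stream_status == POSIX_TRACE_SUSPENDED */
<br>
    posix_trace_start(trid);                              /* STARTED */
    posix_trace_get_status(trid, &si);
    /* si.posix_stream_status == POSIX_TRACE_RUNNING */
<br>
    posix_trace_stop(trid);                               /* STOPPED */
    /* Previously recorded trace events may still be read here */
    posix_trace_shutdown(trid);                           /* COMPLETED */
    posix_trace_attr_destroy(&attr);
}
</tt>
</pre>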
|
|
|
|
<h5><a name="tag_03_02_11_08"></a>Tracing All Processes</h5>
|
|
|
|
<p>Some implementations have a tracing subsystem with the ability to trace all processes. This is useful to debug some types of
|
|
device drivers such as those for ATM or X.25 adapters. These types of adapters are used by several independent processes that do
not descend from a common parent process.</p>
|
|
|
|
<p>The POSIX trace interface does not define any constant or option to create a trace stream tracing all processes. POSIX.1 does
|
|
not prevent this type of implementation and an implementor is free to add this capability. Nevertheless, the trace interface allows
|
|
tracing of all the system trace events and of all the processes descended from the same process.</p>
|
|
|
|
<p>If such a tracing capability is to be implemented, it is recommended that, when a trace stream is created, a constant
named POSIX_TRACE_ALLPROC be used instead of the process identifier in the argument of the <a href=
"../functions/posix_trace_create.html"><i>posix_trace_create</i>()</a> or <a href=
"../functions/posix_trace_create_withlog.html"><i>posix_trace_create_withlog</i>()</a> function. A possible value for
POSIX_TRACE_ALLPROC is -1, which cannot be a real process identifier.</p>
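<p>Under this recommendation, the creation of such an all-processes trace stream might look as follows. This is a hypothetical
sketch: POSIX_TRACE_ALLPROC is an implementation-defined extension, not a constant defined by IEEE Std 1003.1-2001, and
error checks are omitted as in the other examples.</p>

<pre>
<tt>/* Hypothetical sketch; POSIX_TRACE_ALLPROC is an
   implementation-defined extension */
{
    trace_attr_t attr;
    trace_id_t trid;
<br>
    posix_trace_attr_init(&attr);
    /* Trace all processes instead of a single target process */
    posix_trace_create(POSIX_TRACE_ALLPROC, &attr, &trid);
    posix_trace_start(trid);
    - - - - - -
    posix_trace_shutdown(trid);
    posix_trace_attr_destroy(&attr);
}
</tt>
</pre>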
|
|
|
|
<p>The implementor has to be aware that there is some impact on the tracing behavior as defined in the POSIX trace interface. For
|
|
example:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>If the default value for the inheritance attribute is set to POSIX_TRACE_CLOSE_FOR_CHILD, the implementation has to stop tracing
|
|
for the child process.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The trace controller which is creating this type of trace stream must have the appropriate privilege to trace all the
|
|
processes.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<h5><a name="tag_03_02_11_09"></a>Trace Storage</h5>
|
|
|
|
<p>The model is based on two types of trace events: system trace events and user-defined trace events. The internal representation
|
|
of trace events is implementation-defined, and so the implementor is free to choose the most suitable, practical, and efficient way
|
|
to design the internal management of trace events. For the timestamping operation, the model does not impose the CLOCK_REALTIME or
|
|
any other clock. The buffering allocation and operation follow the same principle. The implementor is free to use one or more
|
|
buffers to record trace events; the interface assumes only a logical trace stream of sequentially recorded trace events. Regarding
|
|
flushing of trace events, the interface allows the definition of a trace log object, which typically can be a file. But the group
was also careful to define functions that permit the use of this interface in small realtime systems, which may not have general
file system capabilities. For instance, the three functions <a href=
|
|
"../functions/posix_trace_getnext_event.html"><i>posix_trace_getnext_event</i>()</a> (blocking), <a href=
|
|
"../functions/posix_trace_timedgetnext_event.html"><i>posix_trace_timedgetnext_event</i>()</a> (blocking with timeout), and <a
|
|
href="../functions/posix_trace_trygetnext_event.html"><i>posix_trace_trygetnext_event</i>()</a> (non-blocking) are proposed to read
|
|
the recorded trace events.</p>
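<p>The following fragment sketches how an analysis process might drain a previously recorded trace log using the retrieval
functions described above. It is a sketch only, assuming the Trace Log option; the log pathname is an arbitrary example, and error
checks are omitted as in the other examples.</p>

<pre>
<tt>/* Caution. Error checks omitted. Sketch only. */
{
    int fd;
    trace_id_t trid;
    struct posix_trace_event_info event;
    char data[256];
    size_t data_len;
    int unavailable;
<br>
    /* Associate a trace stream with a previously recorded trace log */
    fd = open("/tmp/mytracelog", O_RDONLY);
    posix_trace_open(fd, &trid);
<br>
    for (;;) {
        posix_trace_getnext_event(trid, &event, data,
            sizeof data, &data_len, &unavailable);
        if (unavailable)
            break;    /* no trace event remains in the log */
        /* event.posix_event_id, event.posix_pid, event.posix_timestamp,
           and the first data_len bytes of data describe one event */
    }
<br>
    posix_trace_close(trid);
    close(fd);
}
</tt>
</pre>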
|
|
|
|
<p>The policy to be used when the trace stream becomes full also relies on common practice:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>For an active trace stream, the POSIX_TRACE_LOOP trace stream policy permits automatic overrun (overwrite of oldest trace
|
|
events) while waiting for some user-defined condition to cause tracing to stop. By contrast, the POSIX_TRACE_UNTIL_FULL trace
|
|
stream policy requires the system to stop tracing when the trace stream is full. However, if the trace stream that is full is at
|
|
least partially emptied by a call to the <a href="../functions/posix_trace_flush.html"><i>posix_trace_flush</i>()</a> function or
|
|
by calls to the <a href="../functions/posix_trace_getnext_event.html"><i>posix_trace_getnext_event</i>()</a> function, the trace
|
|
system will automatically resume tracing.</p>
|
|
|
|
<p>If the Trace Log option is supported, the operation of the POSIX_TRACE_FLUSH policy is an extension of the
|
|
POSIX_TRACE_UNTIL_FULL policy. The automatic free operation (by flushing to the associated trace log) is added.</p>
|
|
</li>

<li>
<p>If a log is associated with the trace stream and this log is a regular file, these policies also apply for the log. One more
policy, POSIX_TRACE_APPEND, is defined to allow indefinite extension of the log. Since the log destination can be any device or
pseudo-device, the implementation may not be able to manipulate the destination as required by IEEE Std 1003.1-2001. For
this reason, the behavior of the log full policy may be unspecified depending on the trace log type.</p>

<p>The current trace interface does not define a service to preallocate space for a trace log file, because this space can be
preallocated by means of a call to the <a href="../functions/posix_fallocate.html"><i>posix_fallocate</i>()</a> function. This
function could be called after the file has been opened, but before the trace stream is created. The <a href="../functions/posix_fallocate.html"><i>posix_fallocate</i>()</a> function ensures that any required storage for regular file data
is allocated on the file system storage media. If <a href="../functions/posix_fallocate.html"><i>posix_fallocate</i>()</a> returns
successfully, subsequent writes to the specified file data will not fail due to the lack of free space on the file system storage
media. Besides trace events, a trace stream also includes trace attributes and the mapping from trace event names to trace event
type identifiers. The implementor is free to choose how to store the trace attributes and the trace event type map, but must ensure
that this information is not lost when a trace stream overrun occurs.</p>
</li>
</ul>
<h5><a name="tag_03_02_11_10"></a>Trace Programming Examples</h5>

<p>Several programming examples are presented to show the code of the different possible subfunctions using a trace subsystem. All
these programs need to include the <a href="../basedefs/trace.h.html"><i><trace.h></i></a> header. In the examples shown,
error checking is omitted for simplicity.</p>
<h5><a name="tag_03_02_11_11"></a>Trace Operation Control</h5>

<p>These examples show the creation of a trace stream for another process, one which is already instrumented for tracing. All the
default trace stream attributes are used to simplify programming in the first example. The second example shows more
possibilities.</p>
<h5><a name="tag_03_02_11_12"></a>First Example</h5>

<pre>
<tt>/* Caution. Error checks omitted */
{
    trace_attr_t attr;
    pid_t pid = traced_process_pid;
    int fd;
    trace_id_t trid;
<br>
    - - - - - -
    /* Initialize trace stream attributes */
    posix_trace_attr_init(&attr);
    /* Open a trace log */
    fd=open("/tmp/mytracelog",...);
    /*
     * Create a new trace associated with a log
     * and with default attributes
     */
<br>
    posix_trace_create_withlog(pid, &attr, fd, &trid);
<br>
    /* Trace attribute structure can now be destroyed */
    posix_trace_attr_destroy(&attr);
    /* Start of trace event recording */
    posix_trace_start(trid);
    - - - - - -
    - - - - - -
    /* Duration of tracing */
    - - - - - -
    - - - - - -
    /* Stop and shutdown of trace activity */
    posix_trace_shutdown(trid);
    - - - - - -
}
</tt>
</pre>
<h5><a name="tag_03_02_11_13"></a>Second Example</h5>

<p>Between the initialization of the trace stream attributes and the creation of the trace stream, these trace stream attributes
may be modified; see <a href="#tag_03_02_11_19">Trace Stream Attribute Manipulation</a> for a specific programming example. Between
the creation and the start of the trace stream, the event filter may be set; after the trace stream is started, the event filter
may be changed. The setting of an event set and the change of a filter are shown in <a href="#tag_03_02_11_20">Create a Trace Event
Type Set and Change the Trace Event Type Filter</a>.</p>
<pre>
<tt>/* Caution. Error checks omitted */
{
    trace_attr_t attr;
    pid_t pid = traced_process_pid;
    int fd;
    trace_id_t trid;
    - - - - - -
    /* Initialize trace stream attributes */
    posix_trace_attr_init(&attr);
    /* Attr default may be changed at this place; see example */
    - - - - - -
    /* Create and open a trace log with R/W user access */
    fd=open("/tmp/mytracelog",O_WRONLY|O_CREAT,S_IRUSR|S_IWUSR);
    /* Create a new trace associated with a log */
    posix_trace_create_withlog(pid, &attr, fd, &trid);
    /*
     * If the Trace Filter option is supported
     * trace event type filter default may be changed at this place;
     * see example about changing the trace event type filter
     */
    posix_trace_start(trid);
    - - - - - -
<br>
    /*
     * If you have an uninteresting part of the application
     * you can stop temporarily.
     *
     * posix_trace_stop(trid);
     * - - - - - -
     * - - - - - -
     * posix_trace_start(trid);
     */
    - - - - - -
    /*
     * If the Trace Filter option is supported
     * the current trace event type filter can be changed
     * at any time (see example about how to set
     * a trace event type filter)
     */
    - - - - - -
<br>
    /* Stop the recording of trace events */
    posix_trace_stop(trid);
    /* Shutdown the trace stream */
    posix_trace_shutdown(trid);
    /*
     * Destroy trace stream attributes; attr structure may have
     * been used during tracing to fetch the attributes
     */
    posix_trace_attr_destroy(&attr);
    - - - - - -
}
</tt>
</pre>
<h5><a name="tag_03_02_11_14"></a>Application Instrumentation</h5>

<p>This example shows an instrumented application. The code is included in a block of instructions, perhaps a function from a
library. Possibly in an initialization part of the instrumented application, two user trace event names are mapped to two trace
event type identifiers (function <a href="../functions/posix_trace_eventid_open.html"><i>posix_trace_eventid_open</i>()</a>). Then
two trace points are programmed.</p>
<pre>
<tt>/* Caution. Error checks omitted */
{
    trace_event_id_t eventid1, eventid2;
    - - - - - -
    /* Initialization of two trace event type ids */
    posix_trace_eventid_open("my_first_event",&eventid1);
    posix_trace_eventid_open("my_second_event",&eventid2);
    - - - - - -
    - - - - - -
    - - - - - -
    /* Trace point */
    posix_trace_event(eventid1,NULL,0);
    - - - - - -
    /* Trace point */
    posix_trace_event(eventid2,NULL,0);
    - - - - - -
}
</tt>
</pre>
<h5><a name="tag_03_02_11_15"></a>Trace Analyzer</h5>

<p>This example shows the manipulation of a trace log resulting from the dumping of a completed trace stream. All the default
attributes are used to simplify programming, and data associated with a trace event is not shown in the first example. The second
example shows more possibilities.</p>
<h5><a name="tag_03_02_11_16"></a>First Example</h5>

<pre>
<tt>/* Caution. Error checks omitted */
{
    int fd;
    trace_id_t trid;
    struct posix_trace_event_info trace_event;
    char trace_event_name[TRACE_EVENT_NAME_MAX];
    int return_value;
    size_t returndatasize;
<br>
    - - - - - -
<br>
    /* Open an existing trace log */
    fd=open("/tmp/tracelog", O_RDONLY);
    /* Open a trace stream on the open log */
    posix_trace_open(fd, &trid);
    /* Read a trace event */
    posix_trace_getnext_event(trid, &trace_event,
        NULL, 0, &returndatasize, &return_value);
<br>
    /* Read and print all trace event names out in a loop */
    while (return_value == 0)
    {
        /*
         * Get the name of the trace event associated
         * with trid trace ID
         */
        posix_trace_eventid_get_name(trid, trace_event.event_id,
            trace_event_name);
        /* Print the trace event name out */
        printf("%s\n",trace_event_name);
        /* Read a trace event */
        posix_trace_getnext_event(trid, &trace_event,
            NULL, 0, &returndatasize, &return_value);
    }
<br>
    /* Close the trace stream */
    posix_trace_close(trid);
    /* Close the trace log */
    close(fd);
}
</tt>
</pre>
<h5><a name="tag_03_02_11_17"></a>Second Example</h5>

<p>The complete example includes the two other examples in <a href="#tag_03_02_11_21">Retrieve Information from a Trace Log</a> and
in <a href="#tag_03_02_11_22">Retrieve the List of Trace Event Types Used in a Trace Log</a>. For example, the <i>maxdatasize</i>
variable is set in <a href="#tag_03_02_11_22">Retrieve the List of Trace Event Types Used in a Trace Log</a>.</p>
<pre>
<tt>/* Caution. Error checks omitted */
{
    int fd;
    trace_id_t trid;
    struct posix_trace_event_info trace_event;
    char trace_event_name[TRACE_EVENT_NAME_MAX];
    char * data;
    size_t maxdatasize=1024, returndatasize;
    int return_value;
    - - - - - -
<br>
    /* Open an existing trace log */
    fd=open("/tmp/tracelog", O_RDONLY);
    /* Open a trace stream on the open log */
    posix_trace_open(fd, &trid);
    /*
     * Retrieve information about the trace stream which
     * was dumped in this trace log (see example)
     */
    - - - - - -
<br>
    /* Allocate a buffer for trace event data */
    data=(char *)malloc(maxdatasize);
    /*
     * Retrieve the list of trace events used in this
     * trace log (see example)
     */
    - - - - - -
<br>
    /* Read and print all trace event names and data out in a loop */
    while (1)
    {
        posix_trace_getnext_event(trid, &trace_event,
            data, maxdatasize, &returndatasize, &return_value);
        if (return_value != 0) break;
        /*
         * Get the name of the trace event type associated
         * with trid trace ID
         */
        posix_trace_eventid_get_name(trid, trace_event.event_id,
            trace_event_name);
        {
            int i;
<br>
            /* Print the trace event name out */
            printf("%s: ", trace_event_name);
            /* Print the trace event data out */
            for (i=0; i<returndatasize; i++) printf("%02X",
                (unsigned char)data[i]);
            printf("\n");
        }
    }
<br>
    /* Close the trace stream */
    posix_trace_close(trid);
    /* The buffer data is deallocated */
    free(data);
    /* Now the file can be closed */
    close(fd);
}
</tt>
</pre>
<h5><a name="tag_03_02_11_18"></a>Several Programming Manipulations</h5>

<p>The following examples show some typical sets of operations needed in some contexts.</p>

<h5><a name="tag_03_02_11_19"></a>Trace Stream Attribute Manipulation</h5>

<p>This example shows the manipulation of a trace stream attribute object in order to change the default value provided by a
previous <a href="../functions/posix_trace_attr_init.html"><i>posix_trace_attr_init</i>()</a> call.</p>
<pre>
<tt>/* Caution. Error checks omitted */
{
    trace_attr_t attr;
    size_t logsize=100000;
    - - - - - -
    /* Initialize trace stream attributes */
    posix_trace_attr_init(&attr);
    /* Set the trace name in the attributes structure */
    posix_trace_attr_setname(&attr, "my_trace");
    /* Set the trace full policy */
    posix_trace_attr_setstreamfullpolicy(&attr, POSIX_TRACE_LOOP);
    /* Set the trace log size */
    posix_trace_attr_setlogsize(&attr, logsize);
    - - - - - -
}
</tt>
</pre>
<h5><a name="tag_03_02_11_20"></a>Create a Trace Event Type Set and Change the Trace Event Type Filter</h5>

<p>This example is valid only if the Trace Event Filter option is supported. This example shows the manipulation of a trace event
type set in order to change the trace event type filter for an existing active trace stream, which may be just-created, running, or
suspended. Some sets of trace event types are well known, such as the set of trace event types not associated with a process; other
trace event types are ones just built for this trace stream; and one trace event type, the predefined trace event error type, is
deleted from the trace event type set.</p>
<pre>
<tt>/* Caution. Error checks omitted */
{
    trace_id_t trid = existing_trace;
    trace_event_set_t set;
    trace_event_id_t trace_event1, trace_event2;
    - - - - - -
    /* Initialize to an empty set of trace event types */
    /* (not strictly required because posix_trace_eventset_fill() */
    /* will ignore the prior contents of the event set.) */
    posix_trace_eventset_empty(&set);
    /*
     * Fill the set with all system trace events
     * not associated with a process
     */
    posix_trace_eventset_fill(&set, POSIX_TRACE_WOPID_EVENTS);
<br>
    /*
     * Get the trace event type identifier of the known trace event name
     * my_first_event for the trid trace stream
     */
    posix_trace_trid_eventid_open(trid, "my_first_event", &trace_event1);
    /* Add this trace event type identifier to the set */
    posix_trace_eventset_add(trace_event1, &set);
    /*
     * Get the trace event type identifier of the known trace event name
     * my_second_event for the trid trace stream
     */
<br>
    posix_trace_trid_eventid_open(trid, "my_second_event", &trace_event2);
    /* Add this trace event type identifier to the set */
    posix_trace_eventset_add(trace_event2, &set);
    - - - - - -
    /* Delete the system trace event POSIX_TRACE_ERROR from the set */
    posix_trace_eventset_del(POSIX_TRACE_ERROR, &set);
    - - - - - -
<br>
    /* Modify the trace stream filter making it equal to the new set */
    posix_trace_set_filter(trid, &set, POSIX_TRACE_SET_EVENTSET);
    - - - - - -
    /*
     * Now trace_event1, trace_event2, and all system trace event types
     * not associated with a process, except for the POSIX_TRACE_ERROR
     * system trace event type, are filtered out of (not recorded in) the
     * existing trace stream.
     */
}
</tt>
</pre>
<h5><a name="tag_03_02_11_21"></a>Retrieve Information from a Trace Log</h5>

<p>This example shows how to extract information from a trace log, the dump of a trace stream. This code:</p>

<ul>
<li>
<p>Asks if the trace stream has lost trace events</p>
</li>

<li>
<p>Extracts the information about the version of the trace subsystem which generated this trace log</p>
</li>

<li>
<p>Retrieves the maximum size of trace event data; this may be used to dynamically allocate an array for extracting trace event
data from the trace log without overflow</p>
</li>
</ul>
<pre>
<tt>/* Caution. Error checks omitted */
{
    struct posix_trace_status_info statusinfo;
    trace_attr_t attr;
    trace_id_t trid = existing_trace;
    size_t maxdatasize;
    char genversion[TRACE_NAME_MAX];
    - - - - - -
    /* Get the trace stream status */
    posix_trace_get_status(trid, &statusinfo);
    /* Detect an overrun condition */
    if (statusinfo.posix_stream_overrun_status == POSIX_TRACE_OVERRUN)
        printf("trace events have been lost\n");
<br>
    /* Get attributes from the trid trace stream */
    posix_trace_get_attr(trid, &attr);
    /* Get the trace generation version from the attributes */
    posix_trace_attr_getgenversion(&attr, genversion);
    /* Print the trace generation version out */
    printf("Information about Trace Generator:%s\n",genversion);
<br>
    /* Get the trace event max data size from the attributes */
    posix_trace_attr_getmaxdatasize(&attr, &maxdatasize);
    /* Print the trace event max data size out */
    printf("Maximum size of associated data:%lu\n",
        (unsigned long)maxdatasize);
    /* Destroy the trace stream attributes */
    posix_trace_attr_destroy(&attr);
}
</tt>
</pre>
<h5><a name="tag_03_02_11_22"></a>Retrieve the List of Trace Event Types Used in a Trace Log</h5>

<p>This example shows the retrieval of a trace stream's trace event type list. This operation may be very useful if you are
interested only in tracking the type of trace events in a trace log.</p>
<pre>
<tt>/* Caution. Error checks omitted */
{
    trace_id_t trid = existing_trace;
    trace_event_id_t event_id;
    char event_name[TRACE_EVENT_NAME_MAX];
    int return_value;
    - - - - - -
<br>
    /*
     * In a loop print all existing trace event names out
     * for the trid trace stream
     */
    while (1)
    {
        posix_trace_eventtypelist_getnext_id(trid, &event_id,
            &return_value);
        if (return_value != 0) break;
        /*
         * Get the name of the trace event associated
         * with trid trace ID
         */
        posix_trace_eventid_get_name(trid, event_id, event_name);
        /* Print the name out */
        printf("%s\n", event_name);
    }
}
</tt>
</pre>
<br>
<h5><a name="tag_03_02_11_23"></a>Rationale on Trace for Debugging</h5>

<dl compact>
<dt></dt>

<dd><img src=".././Figures/b-5.gif"></dd>
</dl>

<center><b><a name="tagfcjh_5"></a> Figure: Trace Another Process</b></center>
<p>Among the different possibilities offered by the trace interface defined in IEEE Std 1003.1-2001, the debugging of an
application is the most interesting one. Typical operations in the controlling debugger process are to filter trace event types, to
get trace events from the trace stream, to stop the trace stream when the debugged process is executing uninteresting code, to
start the trace stream when some interesting point is reached, and so on. The interface defined in IEEE Std 1003.1-2001
should define all the necessary base functions to allow this dynamic debug handling.</p>

<p><a href="#tagfcjh_5">Trace Another Process</a> shows an example in which the trace stream is created after the call to the <a href="../functions/fork.html"><i>fork</i>()</a> function. If the user does not want to lose trace events, some synchronization
mechanism (represented in the figure) may be needed before calling the <i>exec</i> function, to give the parent a chance to create
the trace stream before the child begins the execution of its trace points.</p>
<h5><a name="tag_03_02_11_24"></a>Rationale on Trace Event Type Name Space</h5>

<p>At first, the working group was in favor of representing a trace event type by an integer (<i>event_name</i>). Existing
practice, however, shows the weakness of such a representation. The collision of trace event types is the main problem that cannot
be simply resolved using this sort of representation. Suppose, for example, that a third party designs an instrumented library. The
user does not have the source of this library and wants to trace an application which uses the third-party library in some parts.
There is no way for the user to know which trace event types are used in the instrumented library, so there is some chance of
duplicating some of them and thus obtaining a contaminated tracing of the application.</p>
<dl compact>
<dt></dt>

<dd><img src=".././Figures/b-6.gif"></dd>
</dl>

<center><b><a name="tagfcjh_6"></a> Figure: Trace Name Space Overview: With Third-Party Library</b></center>
<p>There are requirements to allow program images containing pieces from various vendors to be traced without requiring those
vendors to coordinate their uses of the trace facility, and especially the naming of their various trace event types and trace
point IDs. The chosen solution is to provide a very large name space, large enough so that the individual vendors can give their
trace types and tracepoint IDs sufficiently long and descriptive names, making the occurrence of collisions quite unlikely. The
probability of collision is thus made sufficiently low that the problem may, as a practical matter, be ignored. By requirement, the
consequence of collisions will be a slight ambiguity in the trace streams; tracing will continue in spite of collisions and
ambiguities. "The show must go on". The <i>posix_prog_address</i> member of the <b>posix_trace_event_info</b>
structure is used to allow trace streams to be unambiguously interpreted, despite the fact that trace event types and trace event
names need not be unique.</p>
<p>The <a href="../functions/posix_trace_eventid_open.html"><i>posix_trace_eventid_open</i>()</a> function is required to allow the
instrumented third-party library to get a valid trace event type identifier for its trace event names. This operation is, in
effect, an allocation, and the working group considered proposing a deallocation mechanism that the instrumented application could
use to recover the resources used by a trace event type identifier. This would have given the instrumented application the benefit
of being able to reuse a minimal set of trace event type identifiers, but also the drawback of possibly having, in the same trace
stream, one trace event type identifier identifying two different trace event types. After some discussion, the group decided not
to define such a function, which would have made this API thicker for little benefit; the user always has the possibility of adding
identification information in the <i>data</i> member of the trace event structure.</p>
<p>The set of the trace event type identifiers the controlling process wants to filter out is initialized in the trace mechanism
using the function <a href="../functions/posix_trace_set_filter.html"><i>posix_trace_set_filter</i>()</a>, setting the arguments
according to the definitions explained in <a href="../functions/posix_trace_set_filter.html"><i>posix_trace_set_filter</i>()</a>.
This operation can be done statically (when the trace is in the STOPPED state) or dynamically (when the trace is in the STARTED
state). The preparation of the filter is normally done using the function defined in <a href="../functions/posix_trace_eventtypelist_getnext_id.html"><i>posix_trace_eventtypelist_getnext_id</i>()</a> and possibly the
function <a href="../functions/posix_trace_eventtypelist_rewind.html"><i>posix_trace_eventtypelist_rewind</i>()</a> in order to
know (before the recording) the list of the potential set of trace event types that can be recorded. In the case of an active trace
stream, this list may not be exhaustive, because the target process may not yet have called the function <a href="../functions/posix_trace_eventid_open.html"><i>posix_trace_eventid_open</i>()</a>. But it is a common practice, for a controlling
process, to prepare the filtering of a future trace stream before its start. Therefore the user must have a way to get the trace
event type identifier corresponding to a well-known trace event name before its future association by the pre-cited function. This
is done by calling the <a href="../functions/posix_trace_trid_eventid_open.html"><i>posix_trace_trid_eventid_open</i>()</a>
function, described hereafter, passing the trace stream identifier and the trace event name. Because this trace event type
identifier is associated with a trace stream identifier, when a unique process has initialized two or more trace streams, the
implementation is expected to return the same trace event type identifier for successive calls to <a href="../functions/posix_trace_trid_eventid_open.html"><i>posix_trace_trid_eventid_open</i>()</a> with different trace stream
identifiers. The <a href="../functions/posix_trace_eventid_get_name.html"><i>posix_trace_eventid_get_name</i>()</a> function is
used by the controller process to identify, by name, the trace event type returned by a call to the <a href="../functions/posix_trace_eventtypelist_getnext_id.html"><i>posix_trace_eventtypelist_getnext_id</i>()</a> function.</p>
<p>Afterwards, the set of trace event types is constructed using the functions defined in <a href="../functions/posix_trace_eventset_empty.html"><i>posix_trace_eventset_empty</i>()</a>, <a href="../functions/posix_trace_eventset_fill.html"><i>posix_trace_eventset_fill</i>()</a>, <a href="../functions/posix_trace_eventset_add.html"><i>posix_trace_eventset_add</i>()</a>, and <a href="../functions/posix_trace_eventset_del.html"><i>posix_trace_eventset_del</i>()</a>.</p>

<p>A set of functions devoted to the manipulation of trace event type identifiers and names for an active trace stream is
provided. All these functions require the trace stream identifier argument as the first parameter. The opacity of the trace event
type identifier implies that the user cannot directly associate a well-known trace event name with the system-associated trace
event type identifier.</p>
<p>The <a href="../functions/posix_trace_trid_eventid_open.html"><i>posix_trace_trid_eventid_open</i>()</a> function allows the
application to get the system trace event type identifier back from the system, given its well-known trace event name. This
function is useful only when a controlling process needs to specify specific events to be filtered.</p>

<p>The <a href="../functions/posix_trace_eventid_get_name.html"><i>posix_trace_eventid_get_name</i>()</a> function allows the
application to obtain a trace event name given its trace event type identifier. One possible use of this function is to identify
the type of a trace event retrieved from the trace stream, and print it. The easiest way to implement this requirement is to use a
single trace event type map for all the processes whose maps are required to be identical. A more difficult way is to attempt to
keep multiple maps identical at every call to <a href="../functions/posix_trace_eventid_open.html"><i>posix_trace_eventid_open</i>()</a> and <a href="../functions/posix_trace_trid_eventid_open.html"><i>posix_trace_trid_eventid_open</i>()</a>.</p>
<h5><a name="tag_03_02_11_25"></a>Rationale on Trace Event Type Filtering</h5>

<p>The most basic rationale for runtime and pre-registration filtering (selection/rejection) of trace event types is to prevent
choking of the trace collection facility, and/or overloading of the computer system. Any worthwhile trace facility can bring even
the largest computer to its knees. Otherwise, everything would be recorded and filtered after the fact; that would be much simpler,
but impractical.</p>

<p>To achieve debugging, measurement, or whatever the purpose of tracing, the filtering of trace event types is an important part
of trace analysis. Due to the fact that the trace events are put into a trace stream and probably logged afterwards into a file,
different levels of filtering (that is, rejection of trace event types) are possible.</p>
<h5><a name="tag_03_02_11_26"></a>Filtering of Trace Event Types Before Tracing</h5>

<p>This function, represented by the <a href="../functions/posix_trace_set_filter.html"><i>posix_trace_set_filter</i>()</a>
function in IEEE Std 1003.1-2001 (see <a href="../functions/posix_trace_set_filter.html"><i>posix_trace_set_filter</i>()</a>), selects, before or during tracing, the set of
trace event types to be filtered out. It should also be possible (as OSF suggested in their ETAP trace specifications) to select
the kernel trace event types to be traced in a system-wide fashion. These two functionalities are called the pre-filtering of trace
event types.</p>

<p>The restriction on the actual type used for the <b>trace_event_set_t</b> type is intended to guarantee that these objects can
always be assigned, have their address taken, and be passed by value as parameters. It is not intended that this type be a
structure including pointers to other data structures, as that could impact the portability of applications performing such
operations. A reasonable implementation could be a structure containing an array of integer types.</p>
<h5><a name="tag_03_02_11_27"></a>Filtering of Trace Event Types at Runtime</h5>

<p>It is possible to build this functionality using the <a href="../functions/posix_trace_set_filter.html"><i>posix_trace_set_filter</i>()</a> function. A privileged process or a privileged
thread can get trace events from the trace stream of another process or thread, and thus specify the type of trace events to record
into a file, using implementation-defined methods and interfaces. This functionality, called inline filtering of trace event types,
is used for runtime analysis of trace streams.</p>
<h5><a name="tag_03_02_11_28"></a>Post-Mortem Filtering of Trace Event Types</h5>

<p>The word "post-mortem" is used here to indicate that some unanticipated situation occurs during execution that does not permit
pre- or inline filtering of trace events, and that it is necessary to record all trace event types to have a chance to discover the
problem afterwards. When the program stops, all the trace events recorded previously can be analyzed in order to find the solution.
This functionality could be named the post-filtering of trace event types.</p>
<h5><a name="tag_03_02_11_29"></a>Discussions about Trace Event Type Filtering</h5>

<p>After long discussions with the parties involved in the process of defining the trace interface, it seems that the sensitivity
to the filtering problem differs, but everybody agrees that the level of the overhead introduced during the tracing operation
depends on the filtering method elected. If the time that it takes for a trace event to be recorded can be neglected, the overhead
introduced by the filtering process can be classified as follows:</p>

<dl compact>
<dt>Pre-filtering</dt>

<dd>System and process/thread-level overhead</dd>

<dt>Inline-filtering</dt>

<dd>Process/thread-level overhead</dd>

<dt>Post-filtering</dt>

<dd>No overhead; done offline</dd>
</dl>
<p>The pre-filtering could be named "critical realtime" filtering in the sense that the filtering of trace event types is
manageable at the user level, so the user can reduce the filtering overhead to a minimum: at some user-selected level of priority
for the inline filtering, or by delaying the filtering until after execution for the post-filtering. The counterpart of this
solution is that the size of the trace stream must be sufficient to record all the trace events. The advantage of the pre-filtering
is that the utilization of the trace stream is optimized.</p>

<p>Only pre-filtering is defined by IEEE Std 1003.1-2001. However, great care must be taken in specifying pre-filtering,
so that it does not impose unacceptable overhead. Moreover, it is necessary to isolate all the functionality relative to the
pre-filtering.</p>

<p>The result of this rationale is to define a new option, the Trace Event Filter option, not necessarily implemented in small
realtime systems, where system overhead is minimized to the extent possible.</p>
<h5><a name="tag_03_02_11_30"></a>Tracing, pthread API</h5>

<p>The objective to be able to control tracing for individual threads may be in conflict with the efficiency expected in threads
with a <i>contentionscope</i> attribute of PTHREAD_SCOPE_PROCESS. For these threads, context switches from one thread that has
tracing enabled to another thread that has tracing disabled may require a kernel call to inform the kernel whether it has to trace
system events executed by that thread or not. For this reason, it was proposed that the ability to enable or disable tracing for
PTHREAD_SCOPE_PROCESS threads be made optional, through the introduction of a Trace Scope Process option. A trace implementation
which did not implement the Trace Scope Process option would not honor the tracing-state attribute of a thread with
PTHREAD_SCOPE_PROCESS; it would, however, honor the tracing-state attribute of a thread with PTHREAD_SCOPE_SYSTEM. This proposal
was rejected as:</p>
<ol>
<li>
<p>Removing desired functionality (per-thread trace control)</p>
</li>

<li>
<p>Introducing counter-intuitive behavior for the tracing-state attribute</p>
</li>

<li>
<p>Mixing logically orthogonal ideas (thread scheduling and thread tracing)<br>
[Objective 4]</p>
</li>
</ol>
<p>Finally, to solve this complex issue, this API does not provide <i>pthread_gettracingstate</i>(),
<i>pthread_settracingstate</i>(), <i>pthread_attr_gettracingstate</i>(), and <i>pthread_attr_settracingstate</i>() interfaces.
These interfaces would force the thread implementation to add to the weight of the thread and cause a revision of the threads
libraries, just to support tracing. Worse yet, <a href="../functions/posix_trace_event.html"><i>posix_trace_event</i>()</a> would
always have to test this per-thread variable, even in the common case where it is not used at all. Per-thread tracing is easy to
implement using existing interfaces where necessary; see the following example.</p>
<h5><a name="tag_03_02_11_31"></a>Example</h5>
<pre>
<tt>/* Caution. Error checks omitted */
static pthread_key_t my_key;
static trace_event_id_t my_event_id;
static pthread_once_t my_once = PTHREAD_ONCE_INIT;
<br>
void my_init(void)
{
    (void) pthread_key_create(&my_key, NULL);
    (void) posix_trace_eventid_open("my", &my_event_id);
}
<br>
int get_trace_flag(void)
{
    pthread_once(&my_once, my_init);
    return (pthread_getspecific(my_key) != NULL);
}
<br>
void set_trace_flag(int f)
{
    pthread_once(&my_once, my_init);
    pthread_setspecific(my_key, f ? &my_event_id : NULL);
}
<br>
void fn(void)
{
    if (get_trace_flag())
        posix_trace_event(my_event_id, ...);
}
</tt>
</pre>
<p>The above example does not implement third-party state setting.</p>
<p>Lastly, per-thread tracing works poorly for threads with PTHREAD_SCOPE_PROCESS contention scope. These "library" threads have
minimal interaction with the kernel and would have to explicitly set the attributes whenever they are context switched to a new
kernel thread in order to trace system events. Such state was explicitly avoided in POSIX threads to keep PTHREAD_SCOPE_PROCESS
threads lightweight.</p>

<p>Keeping PTHREAD_SCOPE_PROCESS threads lightweight is important because such threads can be used not just for simple
multi-processors but also for co-routine style programming (such as discrete event simulation) without inventing a new threads
paradigm. Adding extra runtime cost to thread context switches would make using POSIX threads less attractive in these
situations.</p>
<h5><a name="tag_03_02_11_32"></a>Rationale on Triggering</h5>

<p>The ability to start or stop tracing based on the occurrence of specific trace event types has been proposed as a parallel to
similar functionality appearing in logic analyzers. Such triggering, in order to be very useful, should be based not only on the
trace event type, but on trace event-specific data, including tests of user-specified fields for matching or threshold values.</p>

<p>Such a facility is unnecessary where the buffering of the stream is not a constraint, since such checks can be performed offline
during post-mortem analysis.</p>

<p>For example, a large system could incorporate a daemon utility to collect the trace records from memory buffers and spool them
to secondary storage for later analysis. In the instances where resources are truly limited, such as embedded applications, it is
usually straightforward for the application to incorporate code that tests the circumstances of a trace event and calls the trace
point only if needed.</p>

<p>For performance reasons, the <a href="../functions/posix_trace_event.html"><i>posix_trace_event</i>()</a> function should be
implemented using a macro, so that if tracing is inactive, the trace event point calls are latent code and must cost no more than a
scalar test.</p>
<p>The API proposed in IEEE Std 1003.1-2001 does not include any triggering functionality.</p>
<h5><a name="tag_03_02_11_33"></a>Rationale on Timestamp Clock</h5>

<p>It has been suggested that the tracing mechanism should include the possibility of specifying the clock to be used in
timestamping the trace events. When application trace events must be correlated to remote trace events, such a facility could
provide a global time reference not available from a local clock. Further, the application may be driven by timers based on a clock
different from that used for the timestamp, and the correlation of the trace to those untraced timer activities could be an
important part of the analysis of the application.</p>

<p>However, the tracing mechanism needs to be fast and just the provision of such an option can materially affect its performance.
Leaving aside the performance costs of reading some clocks, this notion is also ill-defined when kernel trace events are to be
traced by two applications making use of different tracing clocks. This can even happen within a single application where different
parts of the application are served by different clocks. Another complication can occur when a clock is maintained strictly at the
user level and is unavailable at the kernel level.</p>

<p>It is felt that the benefits of a selectable trace clock do not match its costs. Applications that wish to correlate clocks
other than the default tracing clock can include trace events with sample values of those other clocks, allowing correlation of
timestamps from the various independent clocks. In any case, such a technique would be required when applications are sensitive to
multiple clocks.</p>
<h5><a name="tag_03_02_11_34"></a>Rationale on Different Overrun Conditions</h5>

<p>The analysis of the dynamic behavior of the trace mechanism shows that different overrun conditions may occur. The API must
provide a means to manage such conditions in a portable way.</p>
<h5><a name="tag_03_02_11_35"></a>Overrun in Trace Streams Initialized with POSIX_TRACE_LOOP Policy</h5>

<p>In this case, the user of the trace mechanism is interested in using the trace stream with POSIX_TRACE_LOOP policy to record
trace events continuously, but ideally without losing any trace events. The online analyzer process must get the trace events at a
mean speed equivalent to the recording speed. Should the trace stream become full, a trace stream overrun occurs. This condition is
detected by getting the status of the active trace stream (function <a href=
"../functions/posix_trace_get_status.html"><i>posix_trace_get_status</i>()</a>) and looking at the member
<i>posix_stream_overrun_status</i> of the read <b>posix_stream_status</b> structure. In addition, two predefined trace event types
are defined:</p>

<ol>
<li>
<p>The beginning of a trace overflow, to locate the beginning of an overflow when reading a trace stream</p>
</li>

<li>
<p>The end of a trace overflow, to locate the end of an overflow, when reading a trace stream</p>
</li>
</ol>

<p>As a timestamp is associated with these predefined trace events, it is possible to know the duration of the overflow.</p>
<h5><a name="tag_03_02_11_36"></a>Overrun in Dumping Trace Streams into Trace Logs</h5>

<p>The user lets the trace mechanism dump the trace stream initialized with POSIX_TRACE_FLUSH policy automatically into a trace
log. If the dump operation is slower than the recording of trace events, the trace stream can overrun. This condition is detected
by getting the status of the active trace stream (function <a href=
"../functions/posix_trace_get_status.html"><i>posix_trace_get_status</i>()</a>) and looking at the member
<i>posix_log_overrun_status</i> of the read <b>posix_stream_status</b> structure. This overrun indicates that the trace mechanism
is not able to operate in this mode at this speed. It is the responsibility of the user to modify one of the trace parameters (the
stream size or the trace event type filter, for instance) if overruns are to be prevented. The same predefined trace event types
(see <a href="#tag_03_02_11_35">Overrun in Trace Streams Initialized with POSIX_TRACE_LOOP Policy</a> ) are used to detect an
overflow and to determine its duration.</p>
<h5><a name="tag_03_02_11_37"></a>Reading an Active Trace Stream</h5>

<p>Although this trace API allows one to read an active trace stream with log while it is tracing, this feature can make it
ambiguous whether an overflow originated in the trace log or with the reader of the trace stream. Reading from an active trace
stream with log is thus non-portable, and has been left unspecified.</p>
<h4><a name="tag_03_02_12"></a>Data Types</h4>

<p>The requirement that additional types defined in this section end in "_t" was prompted by the problem of name space pollution.
It is difficult to define a type (where that type is not one defined by IEEE Std 1003.1-2001) in one header file and use
it in another without adding symbols to the name space of the program. To allow implementors to provide their own types, all
conforming applications are required to avoid symbols ending in "_t", which permits the implementor to provide additional types.
Because a major use of types is in the definition of structure members, which can (and in many cases must) be added to the
structures defined in IEEE Std 1003.1-2001, the need for additional types is compelling.</p>

<p>The types, such as <b>ushort</b> and <b>ulong</b>, which are in common usage, are not defined in IEEE Std 1003.1-2001
(although <b>ushort_t</b> would be permitted as an extension). They can be added to <a href=
"../basedefs/sys/types.h.html"><i><sys/types.h></i></a> using a feature test macro (see <a href="#tag_03_02_02_01">POSIX.1
Symbols</a> ). A suggested symbol for these is _SYSIII. Similarly, the types like <b>u_short</b> would probably be best controlled
by _BSD.</p>

<p>Some of these symbols may appear in other headers; see <a href="#tag_03_02_02_04">The Name Space</a> .</p>
<dl compact>
<dt><b>dev_t</b></dt>

<dd>This type may be made large enough to accommodate host-locality considerations of networked systems.

<p>This type must be arithmetic. Earlier proposals allowed this to be non-arithmetic (such as a structure) and provided a
<i>samefile</i>() function for comparison.</p>
</dd>
<dt><b>gid_t</b></dt>

<dd>Some implementations had separated <b>gid_t</b> from <b>uid_t</b> before POSIX.1 was completed. It would have been difficult
for them to coalesce the two types when doing so was unnecessary. Additionally, it is quite possible that user IDs might be
different from group IDs, because the user ID might wish to span a heterogeneous network, while the group ID might not.

<p>For current implementations, the cost of having a separate <b>gid_t</b> will be only lexical.</p>
</dd>
<dt><b>mode_t</b></dt>

<dd>This type was chosen so that implementations could choose the appropriate integer type, and for compatibility with the
ISO C standard. 4.3 BSD uses <b>unsigned short</b> and the SVID uses <b>ushort</b>, which is the same. Historically, only the
low-order sixteen bits are significant.</dd>
<dt><b>nlink_t</b></dt>

<dd>This type was introduced in place of <b>short</b> for <i>st_nlink</i> (see the <a href=
"../basedefs/sys/stat.h.html"><i><sys/stat.h></i></a> header) in response to an objection that <b>short</b> was too
small.</dd>
<dt><b>off_t</b></dt>

<dd>This type is used only in <a href="../functions/lseek.html"><i>lseek</i>()</a>, <a href=
"../functions/fcntl.html"><i>fcntl</i>()</a>, and <a href="../basedefs/sys/stat.h.html"><i><sys/stat.h></i></a>. Many
implementations would have difficulties if it were defined as anything other than <b>long</b>. Requiring an integer type limits the
capabilities of <a href="../functions/lseek.html"><i>lseek</i>()</a> to four gigabytes. The ISO C standard supplies routines
that use larger types; see <a href="../functions/fgetpos.html"><i>fgetpos</i>()</a> and <a href=
"../functions/fsetpos.html"><i>fsetpos</i>()</a>. XSI-conformant systems provide the <a href=
"../functions/fseeko.html"><i>fseeko</i>()</a> and <a href="../functions/ftello.html"><i>ftello</i>()</a> functions that use larger
types.</dd>
<dt><b>pid_t</b></dt>

<dd>The inclusion of this symbol was controversial because it is tied to the issue of the representation of a process ID as a
number. From the point of view of a conforming application, process IDs should be "magic cookies"<a href=
"#tag_foot_1"><sup><small>1</small></sup></a> that are produced by calls such as <a href=
"../functions/fork.html"><i>fork</i>()</a>, used by calls such as <a href="../functions/waitpid.html"><i>waitpid</i>()</a> or <a
href="../functions/kill.html"><i>kill</i>()</a>, and not otherwise analyzed (except that the sign is used as a flag for certain
operations).

<p>The concept of a {PID_MAX} value interacted with this in early proposals. Treating process IDs as an opaque type both removes
the requirement for {PID_MAX} and allows systems to be more flexible in providing process IDs that span a large range of values, or
a small one.</p>

<p>Since the values in <b>uid_t</b>, <b>gid_t</b>, and <b>pid_t</b> will be numbers generally, and potentially both large in
magnitude and sparse, applications that are based on arrays of objects of this type are unlikely to be fully portable in any case.
Solutions that treat them as magic cookies will be portable.</p>

<p>{CHILD_MAX} precludes the possibility of a "toy implementation", where there would only be one process.</p>
</dd>
<dt><b>ssize_t</b></dt>

<dd>This is intended to be a signed analog of <b>size_t</b>. The wording is such that an implementation may either choose to use a
longer type or simply to use the signed version of the type that underlies <b>size_t</b>. All functions that return <b>ssize_t</b>
( <a href="../functions/read.html"><i>read</i>()</a> and <a href="../functions/write.html"><i>write</i>()</a>) describe as
"implementation-defined" the result of an input exceeding {SSIZE_MAX}. It is recognized that some implementations might have
<b>int</b>s that are smaller than <b>size_t</b>. A conforming application would be constrained not to perform I/O in pieces larger
than {SSIZE_MAX}, but a conforming application using extensions would be able to use the full range if the implementation provided
an extended range, while still having a single type-compatible interface.

<p>The symbols <b>size_t</b> and <b>ssize_t</b> are also required in <a href=
"../basedefs/unistd.h.html"><i><unistd.h></i></a> to minimize the changes needed for calls to <a href=
"../functions/read.html"><i>read</i>()</a> and <a href="../functions/write.html"><i>write</i>()</a>. Implementors are reminded that
it must be possible to include both <a href="../basedefs/sys/types.h.html"><i><sys/types.h></i></a> and <a href=
"../basedefs/unistd.h.html"><i><unistd.h></i></a> in the same program (in either order) without error.</p>
</dd>
<dt><b>uid_t</b></dt>

<dd>Before the addition of this type, the data types used to represent these values varied throughout early proposals. The <a href=
"../basedefs/sys/stat.h.html"><i><sys/stat.h></i></a> header defined these values as type <b>short</b>, the
<i><passwd.h></i> file (now <a href="../basedefs/pwd.h.html"><i><pwd.h></i></a> and <a href=
"../basedefs/grp.h.html"><i><grp.h></i></a>) used an <b>int</b>, and <a href="../functions/getuid.html"><i>getuid</i>()</a>
returned an <b>int</b>. In response to a strong objection to the inconsistent definitions, all the types were switched to
<b>uid_t</b>.

<p>In practice, those historical implementations that use varying types of this sort can typedef <b>uid_t</b> to <b>short</b> with
no serious consequences.</p>

<p>The problem associated with this change concerns object compatibility after structure size changes. Since most implementations
will define <b>uid_t</b> as a short, the only substantive change will be a reduction in the size of the <b>passwd</b> structure.
Consequently, implementations with an overriding concern for object compatibility can pad the structure back to its current size.
For that reason, this problem was not considered critical enough to warrant the addition of a separate type to POSIX.1.</p>

<p>The types <b>uid_t</b> and <b>gid_t</b> are magic cookies. There is no {UID_MAX} defined by POSIX.1, and no structure imposed on
<b>uid_t</b> and <b>gid_t</b> other than that they be positive arithmetic types. (In fact, they could be <b>unsigned char</b>.)
There is no maximum or minimum specified for the number of distinct user or group IDs.</p>
</dd>
</dl>
<hr>
<h4><a name="tag_03_02_13"></a>Footnotes</h4>

<dl compact>
<dt><a name="tag_foot_1">1.</a></dt>

<dd>An historical term meaning: "An opaque object, or token, of determinate size, whose significance is known only to the entity
which created it. An entity receiving such a token from the generating entity may only make such use of the `cookie' as is defined
and permitted by the supplying entity."</dd>
</dl>
<hr size="2" noshade>
<center><font size="2"><!--footer start-->
UNIX ® is a registered Trademark of The Open Group.<br>
POSIX ® is a registered Trademark of The IEEE.<br>
[ <a href="../mindex.html">Main Index</a> | <a href="../basedefs/contents.html">XBD</a> | <a href=
"../utilities/contents.html">XCU</a> | <a href="../functions/contents.html">XSH</a> | <a href="../xrat/contents.html">XRAT</a>
]</font></center>

<!--footer end-->
<hr size="2" noshade>
</body>
</html>