902 lines
38 KiB
HTML
902 lines
38 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<html>
|
|
<head>
|
|
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<link type="text/css" rel="stylesheet" href="style.css"><!-- Generated by The Open Group's rhtm tool v1.2.1 -->
|
|
<!-- Copyright (c) 2001 The Open Group, All Rights Reserved -->
|
|
<title>Rationale</title>
|
|
</head>
|
|
<body>
|
|
|
|
<basefont size="3">
|
|
|
|
<center><font size="2">The Open Group Base Specifications Issue 6<br>
|
|
IEEE Std 1003.1-2001<br>
|
|
Copyright © 2001 The IEEE and The Open Group</font></center>
|
|
|
|
<hr size="2" noshade>
|
|
<h3><a name="tag_01_07"></a>Locale</h3>
|
|
|
|
<h4><a name="tag_01_07_01"></a>General</h4>
|
|
|
|
<p>The description of locales is based on work performed in the UniForum Technical Committee, Subcommittee on Internationalization.
|
|
Wherever appropriate, keywords are taken from the ISO C standard or the X/Open Portability Guide.</p>
|
|
|
|
<p>The value used to specify a locale with environment variables is the name specified as the <i>name</i> operand to the <a href=
|
|
"../utilities/localedef.html"><i>localedef</i></a> utility when the locale was created. This provides a verifiable method to create
|
|
and invoke a locale.</p>
|
|
|
|
<p>The "object" definitions need not be portable, as long as "source" definitions are. Strictly speaking, source definitions
|
|
are portable only between implementations using the same character set(s). Such source definitions, if they use symbolic names
|
|
only, easily can be ported between systems using different codesets, as long as the characters in the portable character set (see
|
|
the Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap06.html#tag_06_01">Section 6.1,
|
|
Portable Character Set</a>) have common values between the codesets; this is frequently the case in historical implementations. Of
|
|
source, this requires that the symbolic names used for characters outside the portable character set be identical between character
|
|
sets. The definition of symbolic names for characters is outside the scope of IEEE Std 1003.1-2001, but is certainly
|
|
within the scope of other standards organizations.</p>
|
|
|
|
<p>Applications can select the desired locale by invoking the <a href="../functions/setlocale.html"><i>setlocale</i>()</a> function
|
|
(or equivalent) with the appropriate value. If the function is invoked with an empty string, the value of the corresponding
|
|
environment variable is used. If the environment variable is not set or is set to the empty string, the implementation sets the
|
|
appropriate environment as defined in the Base Definitions volume of IEEE Std 1003.1-2001, <a href=
|
|
"../basedefs/xbd_chap08.html">Chapter 8, Environment Variables</a>.</p>
|
|
|
|
<h4><a name="tag_01_07_02"></a>POSIX Locale</h4>
|
|
|
|
<p>The POSIX locale is equal to the C locale. To avoid being classified as a C-language function, the name has been changed to the
|
|
POSIX locale; the environment variable value can be either <tt>"POSIX"</tt> or, for historical reasons, <tt>"C"</tt> .</p>
|
|
|
|
<p>The POSIX definitions mirror the historical UNIX system behavior.</p>
|
|
|
|
<p>The use of symbolic names for characters in the tables does not imply that the POSIX locale must be described using symbolic
|
|
character names, but merely that it may be advantageous to do so.</p>
|
|
|
|
<h4><a name="tag_01_07_03"></a>Locale Definition</h4>
|
|
|
|
<p>The decision to separate the file format from the <a href="../utilities/localedef.html"><i>localedef</i></a> utility description
|
|
was only partially editorial. Implementations may provide other interfaces than <a href=
|
|
"../utilities/localedef.html"><i>localedef</i></a>. Requirements on "the utility", mostly concerning error messages, are
|
|
described in this way because they are meant to affect the other interfaces implementations may provide as well as <a href=
|
|
"../utilities/localedef.html"><i>localedef</i></a>.</p>
|
|
|
|
<p>The text about POSIX2_LOCALEDEF does not mean that internationalization is optional; only that the functionality of the <a href=
|
|
"../utilities/localedef.html"><i>localedef</i></a> utility is. REs, for instance, must still be able to recognize, for example,
|
|
character class expressions such as <tt>"[[:alpha:]]"</tt> . A possible analogy is with an applications development environment;
|
|
while all conforming implementations must be capable of executing applications, not all need to have the development environment
|
|
installed. The assumption is that the capability to modify the behavior of utilities (and applications) via locale settings must be
|
|
supported. If the <a href="../utilities/localedef.html"><i>localedef</i></a> utility is not present, then the only choice is to
|
|
select an existing (presumably implementation-documented) locale. An implementation could, for example, choose to support only the
|
|
POSIX locale, which would in effect limit the amount of changes from historical implementations quite drastically. The <a href=
|
|
"../utilities/localedef.html"><i>localedef</i></a> utility is still required, but would always terminate with an exit code
|
|
indicating that no locale could be created. Supported locales must be documented using the syntax defined in this chapter. (This
|
|
ensures that users can accurately determine what capabilities are provided. If the implementation decides to provide additional
|
|
capabilities to the ones in this chapter, that is already provided for.)</p>
|
|
|
|
<p>If the option is present (that is, locales can be created), then the <a href="../utilities/localedef.html"><i>localedef</i></a>
|
|
utility must be capable of creating locales based on the syntax and rules defined in this chapter. This does not mean that the
|
|
implementation cannot also provide alternate means for creating locales.</p>
|
|
|
|
<p>The octal, decimal, and hexadecimal notations are the same employed by the charmap facility (see the Base Definitions volume of
|
|
IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap06.html#tag_06_04">Section 6.4, Character Set Description File</a>).
|
|
To avoid confusion between an octal constant and a back-reference, the octal, hexadecimal, and decimal constants must contain at
|
|
least two digits. As single-digit constants are relatively rare, this should not impose any significant hardship. Provision is made
|
|
for more digits to account for systems in which the byte size is larger than 8 bits. For example, a Unicode (see the
|
|
ISO/IEC 10646-1:2000 standard) system that has defined 16-bit bytes may require six octal, four hexadecimal, and five decimal
|
|
digits. As with the charmap file, multi-byte characters are described in the locale definition file using "big-endian" notation
|
|
for reasons of portability. There is no requirement that the internal representation in the computer memory be in this same
|
|
order.</p>
|
|
|
|
<p>One of the guidelines used for the development of this volume of IEEE Std 1003.1-2001 is that characters outside the
|
|
invariant part of the ISO/IEC 646:1991 standard should not be used in portable specifications. The backslash character is not
|
|
in the invariant part; the number sign is, but with multiple representations: as a number sign, and as a pound sign. As far as
|
|
general usage of these symbols, they are covered by the "grandfather clause", but for newly defined interfaces, the WG15 POSIX
|
|
working group has requested that POSIX provide alternate representations. Consequently, while the default escape character remains
|
|
the backslash and the default comment character is the number sign, implementations are required to recognize alternative
|
|
representations, identified in the applicable source file via the <b><escape_char></b> and <b><comment_char></b>
|
|
keywords.</p>
|
|
|
|
<h5><a name="tag_01_07_03_01"></a>LC_CTYPE</h5>
|
|
|
|
<p>The <i>LC_CTYPE</i> category is primarily used to define the encoding-independent aspects of a character set, such as character
|
|
classification. In addition, certain encoding-dependent characteristics are also defined for an application via the <i>LC_CTYPE</i>
|
|
category. IEEE Std 1003.1-2001 does not mandate that the encoding used in the locale is the same as the one used by the
|
|
application because an implementation may decide that it is advantageous to define locales in a system-wide encoding rather than
|
|
having multiple, logically identical locales in different encodings, and to convert from the application encoding to the
|
|
system-wide encoding on usage. Other implementations could require encoding-dependent locales.</p>
|
|
|
|
<p>In either case, the <i>LC_CTYPE</i> attributes that are directly dependent on the encoding, such as <b><mb_cur_max></b>
|
|
and the display width of characters, are not user-specifiable in a locale source and are consequently not defined as keywords.</p>
|
|
|
|
<p>Implementations may define additional keywords or extend the <i>LC_CTYPE</i> mechanism to allow application-defined
|
|
keywords.</p>
|
|
|
|
<p>The text "The ellipsis specification shall only be valid within a single encoded character set" is present because it is
|
|
possible to have a locale supported by multiple character encodings, as explained in the rationale for the Base Definitions volume
|
|
of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap06.html#tag_06_01">Section 6.1, Portable Character Set</a>. An
|
|
example given there is of a possible Japanese-based locale supported by a mixture of the character sets JIS X 0201 Roman,
|
|
JIS X 0208, and JIS X 0201 Katakana. Attempting to express a range of characters across these sets is not
|
|
logical and the implementation is free to reject such attempts.</p>
|
|
|
|
<p>As the <i>LC_CTYPE</i> character classes are based on the ISO C standard character class definition, the category does not
|
|
support multi-character elements. For instance, the German character <sharp-s> is traditionally classified as a lowercase
|
|
letter. There is no corresponding uppercase letter; in proper capitalization of German text, the <sharp-s> will be replaced
|
|
by <tt>"SS"</tt> ; that is, by two characters. This kind of conversion is outside the scope of the <b>toupper</b> and
|
|
<b>tolower</b> keywords.</p>
|
|
|
|
<p>Where IEEE Std 1003.1-2001 specifies that only certain characters can be specified, as for the keywords <b>digit</b>
|
|
and <b>xdigit</b>, the specified characters must be from the portable character set, as shown. As an example, only the Arabic
|
|
digits 0 through 9 are acceptable as digits.</p>
|
|
|
|
<p>The character classes <b>digit</b>, <b>xdigit</b>, <b>lower</b>, <b>upper</b>, and <b>space</b> have a set of automatically
|
|
included characters. These only need to be specified if the character values (that is, encoding) differs from the implementation
|
|
default values. It is not possible to define a locale without these automatically included characters unless some implementation
|
|
extension is used to prevent their inclusion. Such a definition would not be a proper superset of the C locale, and thus, it might
|
|
not be possible for the standard utilities to be implemented as programs conforming to the ISO C standard.</p>
|
|
|
|
<p>The definition of character class <b>digit</b> requires that only ten characters-the ones defining digits-can be specified;
|
|
alternate digits (for example, Hindi or Kanji) cannot be specified here. However, the encoding may vary if an implementation
|
|
supports more than one encoding.</p>
|
|
|
|
<p>The definition of character class <b>xdigit</b> requires that the characters included in character class <b>digit</b> are
|
|
included here also and allows for different symbols for the hexadecimal digits 10 through 15.</p>
|
|
|
|
<p>The inclusion of the <b>charclass</b> keyword satisfies the following requirement from the ISO POSIX-2:1993 standard, Annex
|
|
H.1:</p>
|
|
|
|
<dl compact>
|
|
<dt>(3)</dt>
|
|
|
|
<dd><i>The</i> LC_CTYPE (2.5.2.1) locale definition should be enhanced to allow user-specified additional character classes,
|
|
similar in concept to the ISO C standard Multibyte Support Extension (MSE) <a href=
|
|
"../functions/iswctype.html"><i>iswctype</i>()</a> function.</dd>
|
|
</dl>
|
|
|
|
<p>This keyword was previously included in The Open Group specifications and is now mandated in the Shell and Utilities volume of
|
|
IEEE Std 1003.1-2001.</p>
|
|
|
|
<p>The symbolic constant {CHARCLASS_NAME_MAX} was also adopted from The Open Group specifications. Applications portability is
|
|
enhanced by the use of symbolic constants.</p>
|
|
|
|
<h5><a name="tag_01_07_03_02"></a>LC_COLLATE</h5>
|
|
|
|
<p>The rules governing collation depend to some extent on the use. At least five different levels of increasingly complex collation
|
|
rules can be distinguished:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p><i>Byte/machine code order</i>: This is the historical collation order in the UNIX system and many proprietary operating
|
|
systems. Collation is here performed character by character, without any regard to context. The primary virtue is that it usually
|
|
is quite fast and also completely deterministic; it works well when the native machine collation sequence matches the user
|
|
expectations.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p><i>Character order</i>: On this level, collation is also performed character by character, without regard to context. The order
|
|
between characters is, however, not determined by the code values, but on the expectations by the user of the "correct" order
|
|
between characters. In addition, such a (simple) collation order can specify that certain characters collate equally (for example,
|
|
uppercase and lowercase letters).</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p><i>String ordering</i>: On this level, entire strings are compared based on relatively straightforward rules. Several "passes''
|
|
may be required to determine the order between two strings. Characters may be ignored in some passes, but not in others; the
|
|
strings may be compared in different directions; and simple string substitutions may be performed before strings are compared. This
|
|
level is best described as "dictionary" ordering; it is based on the spelling, not the pronunciation, or meaning, of the
|
|
words.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p><i>Text search ordering</i>: This is a further refinement of the previous level, best described as "telephone book ordering'';
|
|
some common homonyms (words spelled differently but with the same pronunciation) are collated together; numbers are collated as if
|
|
they were spelled out, and so on.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p><i>Semantic-level ordering</i>: Words and strings are collated based on their meaning; entire words (such as "the") are
|
|
eliminated; the ordering is not deterministic. This usually requires special software and is highly dependent on the intended
|
|
use.</p>
|
|
</li>
|
|
</ol>
|
|
|
|
<p>While the historical collation order formally is at level 1, for the English language it corresponds roughly to elements at
|
|
level 2. The user expects to see the output from the <a href="../utilities/ls.html"><i>ls</i></a> utility sorted very much as it
|
|
would be in a dictionary. While telephone book ordering would be an optimal goal for standard collation, this was ruled out as the
|
|
order would be language-dependent. Furthermore, a requirement was that the order must be determined solely from the text string and
|
|
the collation rules; no external information (for example, "pronunciation dictionaries") could be required.</p>
|
|
|
|
<p>As a result, the goal for the collation support is at level 3. This also matches the requirements for the Canadian collation
|
|
order, as well as other, known collation requirements for alphabetic scripts. It specifically rules out collation based on
|
|
pronunciation rules or based on semantic analysis of the text.</p>
|
|
|
|
<p>The syntax for the <i>LC_COLLATE</i> category source meets the requirements for level 3 and has been verified to produce the
|
|
correct result with examples based on French, Canadian, and Danish collation order. Because it supports multi-character collating
|
|
elements, it is also capable of supporting collation in codesets where a character is expressed using non-spacing characters
|
|
followed by the base character (such as the ISO/IEC 6937:1994 standard).</p>
|
|
|
|
<p>The directives that can be specified in an operand to the <b>order_start</b> keyword are based on the requirements specified in
|
|
several proposed standards and in customary use. The following is a rephrasing of rules defined for "lexical ordering in English
|
|
and French" by the Canadian Standards Association (the text in square brackets is rephrased):</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Once special characters [punctuation] have been removed from original strings, the ordering is determined by scanning forwards
|
|
(left to right) [disregarding case and diacriticals].</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>In case of equivalence, special characters are once again removed from original strings and the ordering is determined by
|
|
scanning backwards (starting from the rightmost character of the string and back), character by character [disregarding case but
|
|
considering diacriticals].</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>In case of repeated equivalence, special characters are removed again from original strings and the ordering is determined by
|
|
scanning forwards, character by character [considering both case and diacriticals].</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>If there is still an ordering equivalence after the first three rules have been applied, then only special characters and the
|
|
position they occupy in the string are considered to determine ordering. The string that has a special character in the lowest
|
|
position comes first. If two strings have a special character in the same position, the character [with the lowest collation value]
|
|
comes first. In case of equality, the other special characters are considered until there is a difference or until all special
|
|
characters have been exhausted.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>It is estimated that this part of IEEE Std 1003.1-2001 covers the requirements for all European languages, and no
|
|
particular problems are anticipated with Slavic or Middle East character sets.</p>
|
|
|
|
<p>The Far East (particularly Japanese/Chinese) collations are often based on contextual information and pronunciation rules (the
|
|
same ideogram can have different meanings and different pronunciations). Such collation, in general, falls outside the desired goal
|
|
of IEEE Std 1003.1-2001. There are, however, several other collation rules (stroke/radical or "most common
|
|
pronunciation") that can be supported with the mechanism described here.</p>
|
|
|
|
<p>The character order is defined by the order in which characters and elements are specified between the <b>order_start</b> and
|
|
<b>order_end</b> keywords. Weights assigned to the characters and elements define the collation sequence; in the absence of
|
|
weights, the character order is also the collation sequence.</p>
|
|
|
|
<p>The <b>position</b> keyword provides the capability to consider, in a compare, the relative position of characters not subject
|
|
to <b>IGNORE</b>. As an example, consider the two strings <tt>"o-ring"</tt> and <tt>"or-ing"</tt> . Assuming the hyphen is subject
|
|
to <b>IGNORE</b> on the first pass, the two strings compare equal, and the position of the hyphen is immaterial. On second pass,
|
|
all characters except the hyphen are subject to <b>IGNORE</b>, and in the normal case the two strings would again compare equal. By
|
|
taking position into account, the first collates before the second.</p>
|
|
|
|
<h5><a name="tag_01_07_03_03"></a>LC_MONETARY</h5>
|
|
|
|
<p>The currency symbol does not appear in <i>LC_MONETARY</i> because it is not defined in the C locale of the ISO C
|
|
standard.</p>
|
|
|
|
<p>The ISO C standard limits the size of decimal points and thousands delimiters to single-byte values. In locales based on
|
|
multi-byte coded character sets, this cannot be enforced; IEEE Std 1003.1-2001 does not prohibit such characters, but
|
|
makes the behavior unspecified (in the text "In contexts where other standards ...").</p>
|
|
|
|
<p>The grouping specification is based on, but not identical to, the ISO C standard. The -1 indicates that no further grouping
|
|
is performed; the equivalent of {CHAR_MAX} in the ISO C standard.</p>
|
|
|
|
<p>The text "the value is not available in the locale" is taken from the ISO C standard and is used instead of the
|
|
"unspecified" text in early proposals. There is no implication that omitting these keywords or assigning them values of
|
|
<tt>""</tt> or -1 produces unspecified results; such omissions or assignments eliminate the effects described for the keyword or
|
|
produce zero-length strings, as appropriate.</p>
|
|
|
|
<p>The locale definition is an extension of the ISO C standard <a href="../functions/localeconv.html"><i>localeconv</i>()</a>
|
|
specification. In particular, rules on how <b>currency_symbol</b> is treated are extended to also cover <b>int_curr_symbol</b>, and
|
|
<b>p_set_by_space</b> and <b>n_sep_by_space</b> have been augmented with the value 2, which places a <space> between the sign
|
|
and the symbol (if they are adjacent; otherwise, it should be treated as a 0). The following table shows the result of various
|
|
combinations:</p>
|
|
|
|
<center>
|
|
<table border="1" cellpadding="3" align="center">
|
|
<tr valign="top">
|
|
<th align="left">
|
|
<p class="tent"> </p>
|
|
</th>
|
|
<th align="left">
|
|
<p class="tent"> </p>
|
|
</th>
|
|
<th colspan="3" align="center">
|
|
<p class="tent"><b>p_sep_by_space</b></p>
|
|
</th>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<th align="left">
|
|
<p class="tent"> </p>
|
|
</th>
|
|
<th align="left">
|
|
<p class="tent"> </p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>2</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>1</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>0</b></p>
|
|
</th>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b>p_cs_precedes</b> = 1</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>p_sign_posn</b> = 0</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">($1.25)</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">($ 1.25)</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">($1.25)</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"> </p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>p_sign_posn</b> = 1</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">+ $1.25</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">+$ 1.25</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">+$1.25</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"> </p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>p_sign_posn</b> = 2</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">$1.25 +</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">$ 1.25+</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">$1.25+</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"> </p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>p_sign_posn</b> = 3</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">+ $1.25</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">+$ 1.25</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">+$1.25</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"> </p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>p_sign_posn</b> = 4</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">$ +1.25</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">$+ 1.25</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">$+1.25</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b>p_cs_precedes</b> = 0</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>p_sign_posn</b> = 0</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">(1.25 $)</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">(1.25 $)</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">(1.25$)</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"> </p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>p_sign_posn</b> = 1</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">+1.25 $</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">+1.25 $</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">+1.25$</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"> </p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>p_sign_posn</b> = 2</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1.25$ +</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1.25 $+</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1.25$+</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"> </p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>p_sign_posn</b> = 3</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1.25+ $</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1.25 +$</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1.25+$</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"> </p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>p_sign_posn</b> = 4</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1.25$ +</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1.25 $+</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1.25$+</p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
|
|
|
|
<p>The following is an example of the interpretation of the <b>mon_grouping</b> keyword. Assuming that the value to be formatted is
|
|
123456789 and the <b>mon_thousands_sep</b> is <tt>'"</tt> , then the following table shows the result. The third column shows the
|
|
equivalent string in the ISO C standard that would be used by the <a href=
|
|
"../functions/localeconv.html"><i>localeconv</i>()</a> function to accommodate this grouping.</p>
|
|
|
|
<center>
|
|
<table border="1" cellpadding="3" align="center">
|
|
<tr valign="top">
|
|
<th align="center">
|
|
<p class="tent"><b>mon_grouping</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>Formatted Value</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>ISO C String</b></p>
|
|
</th>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">3;-1</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">123456'789</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">"\3\177"</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">3</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">123'456'789</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">"\3"</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">3;2;-1</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1234'56'789</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">"\3\2\177"</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">3;2</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">12'34'56'789</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">"\3\2"</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">-1</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">123456789</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">"\177"</p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
|
|
|
|
<p>In these examples, the octal value of {CHAR_MAX} is 177.</p>
|
|
|
|
<h5><a name="tag_01_07_03_04"></a>LC_NUMERIC</h5>
|
|
|
|
<p>See the rationale for <i>LC_MONETARY</i> for a description of the behavior of grouping.</p>
|
|
|
|
<h5><a name="tag_01_07_03_05"></a>LC_TIME</h5>
|
|
|
|
<p>Although certain of the conversion specifications in the POSIX locale (such as the name of the month) are shown with initial
|
|
capital letters, this need not be the case in other locales. Programs using these conversion specifications may need to adjust the
|
|
capitalization if the output is going to be used at the beginning of a sentence.</p>
|
|
|
|
<p>The <i>LC_TIME</i> descriptions of <b>abday</b>, <b>day</b>, <b>mon</b>, and <b>abmon</b> imply a Gregorian style calendar
|
|
(7-day weeks, 12-month years, leap years, and so on). Formatting time strings for other types of calendars is outside the scope of
|
|
IEEE Std 1003.1-2001.</p>
|
|
|
|
<p>While the ISO 8601:2000 standard numbers the weekdays starting with Monday, historical practice is to use the Sunday as the
|
|
first day. Rather than change the order and introduce potential confusion, the days must be specified beginning with Sunday;
|
|
previous references to "first day" have been removed. Note also that the Shell and Utilities volume of
|
|
IEEE Std 1003.1-2001 <a href="../utilities/date.html"><i>date</i></a> utility supports numbering compliant with the
|
|
ISO 8601:2000 standard.</p>
|
|
|
|
<p>As specified under <a href="../utilities/date.html"><i>date</i></a> in the Shell and Utilities volume of
|
|
IEEE Std 1003.1-2001 and <a href="../functions/strftime.html"><i>strftime</i>()</a> in the System Interfaces volume of
|
|
IEEE Std 1003.1-2001, the conversion specifications corresponding to the optional keywords consist of a modifier followed
|
|
by a traditional conversion specification (for instance, <tt>%Ex</tt> ). If the optional keywords are not supported by the
|
|
implementation or are unspecified for the current locale, these modified conversion specifications are treated as the traditional
|
|
conversion specifications. For example, assume the following keywords:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>alt_digits "0th";"1st";"2nd";"3rd";"4th";"5th";\
|
|
"6th";"7th";"8th";"9th";"10th"
|
|
<br>
|
|
d_fmt "The %Od day of %B in %Y"
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>On July 4th 1776, the <tt>%x</tt> conversion specifications would result in <tt>"The 4th day of July in 1776"</tt> , while on
|
|
July 14th 1789 it would result in <tt>"The 14 day of July in 1789"</tt> . It can be noted that the above example is for
|
|
illustrative purposes only; the <tt>%O</tt> modifier is primarily intended to provide for Kanji or Hindi digits in <a href=
|
|
"../utilities/date.html"><i>date</i></a> formats.</p>
|
|
|
|
<p>The following is an example for Japan that supports the current plus last three Emperors and reverts to Western style numbering
|
|
for years prior to the Meiji era. The example also allows for the custom of using a special name for the first year of an era
|
|
instead of using 1. (The examples substitute romaji where kanji should be used.)</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>era_d_fmt "%EY%mgatsu%dnichi (%a)"
|
|
<br>
|
|
era "+:2:1990/01/01:+*:Heisei:%EC%Eynen";\
|
|
"+:1:1989/01/08:1989/12/31:Heisei:%ECgannen";\
|
|
"+:2:1927/01/01:1989/01/07:Shouwa:%EC%Eynen";\
|
|
"+:1:1926/12/25:1926/12/31:Shouwa:%ECgannen";\
|
|
"+:2:1913/01/01:1926/12/24:Taishou:%EC%Eynen";\
|
|
"+:1:1912/07/30:1912/12/31:Taishou:%ECgannen";\
|
|
"+:2:1869/01/01:1912/07/29:Meiji:%EC%Eynen";\
|
|
"+:1:1868/09/08:1868/12/31:Meiji:%ECgannen";\
|
|
"-:1868:1868/09/07:-*::%Ey"
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>Assuming that the current date is September 21, 1991, a request to <a href="../utilities/date.html"><i>date</i></a> or <a href=
|
|
"../functions/strftime.html"><i>strftime</i>()</a> would yield the following results:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>%Ec - Heisei3nen9gatsu21nichi (Sat) 14:39:26
|
|
%EC - Heisei
|
|
%Ex - Heisei3nen9gatsu21nichi (Sat)
|
|
%Ey - 3
|
|
%EY - Heisei3nen
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>Example era definitions for the Republic of China:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>era "+:2:1913/01/01:+*:ChungHwaMingGuo:%EC%EyNen";\
|
|
"+:1:1912/1/1:1912/12/31:ChungHwaMingGuo:%ECYuenNen";\
|
|
"+:1:1911/12/31:-*:MingChien:%EC%EyNen"
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>Example definitions for the Christian Era:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>era "+:1:0001/01/01:+*:AD:%EC %Ey";\
|
|
"+:1:-0001/12/31:-*:BC:%Ey %EC"
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<h5><a name="tag_01_07_03_06"></a>LC_MESSAGES</h5>
|
|
|
|
<p>The <b>yesstr</b> and <b>nostr</b> locale keywords and the YESSTR and NOSTR <i>langinfo</i> items were formerly used to match
|
|
user affirmative and negative responses. In IEEE Std 1003.1-2001, the <b>yesexpr</b>, <b>noexpr</b>, YESEXPR, and NOEXPR
|
|
extended regular expressions have replaced them. Applications should use the general locale-based messaging facilities to issue
|
|
prompting messages which include sample desired responses.</p>
|
|
|
|
<h4><a name="tag_01_07_04"></a>Locale Definition Grammar</h4>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_01_07_04_01"></a>Locale Lexical Conventions</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h5><a name="tag_01_07_04_02"></a>Locale Grammar</h5>
|
|
|
|
<p>There is no additional rationale provided for this section.</p>
|
|
|
|
<h4><a name="tag_01_07_05"></a>Locale Definition Example</h4>
|
|
|
|
<p>The following is an example of a locale definition file that could be used as input to the <a href=
|
|
"../utilities/localedef.html"><i>localedef</i></a> utility. It assumes that the utility is executed with the <b>-f</b> option,
|
|
naming a charmap file with (at least) the following content:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
<tt>CHARMAP
|
|
<space> \x20
|
|
<dollar> \x24
|
|
<A> \101
|
|
<a> \141
|
|
<A-acute> \346
|
|
<a-acute> \365
|
|
<A-grave> \300
|
|
<a-grave> \366
|
|
<b> \142
|
|
<C> \103
|
|
<c> \143
|
|
<c-cedilla> \347
|
|
<d> \x64
|
|
<H> \110
|
|
<h> \150
|
|
<eszet> \xb7
|
|
<s> \x73
|
|
<z> \x7a
|
|
END CHARMAP
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>It should not be taken as complete or to represent any actual locale, but only to illustrate the syntax.</p>
|
|
|
|
<pre>
|
|
<tt>#
|
|
LC_CTYPE
|
|
lower <a>;<b>;<c>;<c-cedilla>;<d>;...;<z>
|
|
upper A;B;C;Ç;...;Z
|
|
space \x20;\x09;\x0a;\x0b;\x0c;\x0d
|
|
blank \040;\011
|
|
toupper (<a>,<A>);(b,B);(c,C);(ç,Ç);(d,D);(z,Z)
|
|
END LC_CTYPE
|
|
#
|
|
LC_COLLATE
|
|
#
|
|
# The following example of collation is based on
|
|
# Canadian standard Z243.4.1-1998, "Canadian Alphanumeric
|
|
# Ordering Standard for Character Sets of CSA Z234.4 Standard".
|
|
# (Other parts of this example locale definition file do not
|
|
# purport to relate to Canada, or to any other real culture.)
|
|
# The proposed standard defines a 4-weight collation, such that
|
|
# in the first pass, characters are compared without regard to
|
|
# case or accents; in the second pass, backwards-compare without
|
|
# regard to case; in the third pass, forwards-compare without
|
|
# regard to diacriticals. In the 3 first passes, non-alphabetic
|
|
# characters are ignored; in the fourth pass, only special
|
|
# characters are considered, such that "The string that has a
|
|
# special character in the lowest position comes first. If two
|
|
# strings have a special character in the same position, the
|
|
# collation value of the special character determines ordering.
|
|
#
|
|
# Only a subset of the character set is used here; mostly to
|
|
# illustrate the set-up.
|
|
#
|
|
collating-symbol <NULL>
|
|
collating-symbol <LOW_VALUE>
|
|
collating-symbol <LOWER-CASE>
|
|
collating-symbol <SUBSCRIPT-LOWER>
|
|
collating-symbol <SUPERSCRIPT-LOWER>
|
|
collating-symbol <UPPER-CASE>
|
|
collating-symbol <NO-ACCENT>
|
|
collating-symbol <PECULIAR>
|
|
collating-symbol <LIGATURE>
|
|
collating-symbol <ACUTE>
|
|
collating-symbol <GRAVE>
|
|
# Further collating-symbols follow.
|
|
#
|
|
# Properly, the standard does not include any multi-character
|
|
# collating elements; the one below is added for completeness.
|
|
#
|
|
collating_element <ch> from "<c><h>"
|
|
collating_element <CH> from "<C><H>"
|
|
collating_element <Ch> from "<C><h>"
|
|
#
|
|
order_start forward;backward;forward;forward,position
|
|
#
|
|
# Collating symbols are specified first in the sequence to allocate
|
|
# basic collation values to them, lower than that of any character.
|
|
<NULL>
|
|
<LOW_VALUE>
|
|
<LOWER-CASE>
|
|
<SUBSCRIPT-LOWER>
|
|
<SUPERSCRIPT-LOWER>
|
|
<UPPER-CASE>
|
|
<NO-ACCENT>
|
|
<PECULIAR>
|
|
<LIGATURE>
|
|
<ACUTE>
|
|
<GRAVE>
|
|
<RING-ABOVE>
|
|
<DIAERESIS>
|
|
<TILDE>
|
|
# Further collating symbols are given a basic collating value here.
|
|
#
|
|
# Here follow special characters.
|
|
<space> IGNORE;IGNORE;IGNORE;<space>
|
|
# Other special characters follow here.
|
|
#
|
|
# Here follow the regular characters.
|
|
<a> <a>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
|
|
<A> <a>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
|
|
<a-acute> <a>;<ACUTE>;<LOWER-CASE>;IGNORE
|
|
<A-acute> <a>;<ACUTE>;<UPPER-CASE>;IGNORE
|
|
<a-grave> <a>;<GRAVE>;<LOWER-CASE>;IGNORE
|
|
<A-grave> <a>;<GRAVE>;<UPPER-CASE>;IGNORE
|
|
<ae> "<a><e>";"<LIGATURE><LIGATURE>";\
|
|
"<LOWER-CASE><LOWER-CASE>";IGNORE
|
|
<AE> "<a><e>";"<LIGATURE><LIGATURE>";\
|
|
"<UPPER-CASE><UPPER-CASE>";IGNORE
|
|
<b> <b>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
|
|
<B> <b>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
|
|
<c> <c>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
|
|
<C> <c>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
|
|
<ch> <ch>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
|
|
<Ch> <ch>;<NO-ACCENT>;<PECULIAR>;IGNORE
|
|
<CH> <ch>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
|
|
#
|
|
# As an example, the strings "Bach" and "bach" could be encoded (for
|
|
# compare purposes) as:
|
|
# "Bach" <b>;<a>;<ch>;<LOW_VALUE>;<NO_ACCENT>;<NO_ACCENT>;\
|
|
# <NO_ACCENT>;<LOW_VALUE>;<UPPER-CASE>;<LOWER-CASE>;\
|
|
# <LOWER-CASE>;<NULL>
|
|
# "bach" <b>;<a>;<ch>;<LOW_VALUE>;<NO_ACCENT>;<NO_ACCENT>;\
|
|
# <NO_ACCENT>;<LOW_VALUE>;<LOWER-CASE>;<LOWER-CASE>;\
|
|
# <LOWER-CASE>;<NULL>
|
|
#
|
|
# The two strings are equal in pass 1 and 2, but differ in pass 3.
|
|
#
|
|
# Further characters follow.
|
|
#
|
|
UNDEFINED IGNORE;IGNORE;IGNORE;IGNORE
|
|
#
|
|
order_end
|
|
#
|
|
END LC_COLLATE
|
|
#
|
|
LC_MONETARY
|
|
int_curr_symbol "USD "
|
|
currency_symbol "$"
|
|
mon_decimal_point "."
|
|
mon_grouping 3;0
|
|
positive_sign ""
|
|
negative_sign "-"
|
|
p_cs_precedes 1
|
|
n_sign_posn 0
|
|
END LC_MONETARY
|
|
#
|
|
LC_NUMERIC
|
|
copy "US_en.ASCII"
|
|
END LC_NUMERIC
|
|
#
|
|
LC_TIME
|
|
abday "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
|
|
#
|
|
day "Sunday";"Monday";"Tuesday";"Wednesday";\
|
|
"Thursday";"Friday";"Saturday"
|
|
#
|
|
abmon "Jan";"Feb";"Mar";"Apr";"May";"Jun";\
|
|
"Jul";"Aug";"Sep";"Oct";"Nov";"Dec"
|
|
#
|
|
mon "January";"February";"March";"April";\
|
|
"May";"June";"July";"August";"September";\
|
|
"October";"November";"December"
|
|
#
|
|
d_t_fmt "%a %b %d %T %Z %Y\n"
|
|
END LC_TIME
|
|
#
|
|
LC_MESSAGES
|
|
yesexpr "^([yY][[:alpha:]]*)|(OK)"
|
|
#
|
|
noexpr "^[nN][[:alpha:]]*"
|
|
END LC_MESSAGES
|
|
</tt>
|
|
</pre>
|
|
|
|
|
|
<hr size="2" noshade>
|
|
<center><font size="2"><!--footer start-->
|
|
UNIX ® is a registered Trademark of The Open Group.<br>
|
|
POSIX ® is a registered Trademark of The IEEE.<br>
|
|
[ <a href="../mindex.html">Main Index</a> | <a href="../basedefs/contents.html">XBD</a> | <a href=
|
|
"../utilities/contents.html">XCU</a> | <a href="../functions/contents.html">XSH</a> | <a href="../xrat/contents.html">XRAT</a>
|
|
]</font></center>
|
|
|
|
<!--footer end-->
|
|
<hr size="2" noshade>
|
|
</body>
|
|
</html>
|
|
|