
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link type="text/css" rel="stylesheet" href="style.css"><!-- Generated by The Open Group's rhtm tool v1.2.1 -->
<!-- Copyright (c) 2001 The Open Group, All Rights Reserved -->
<title>Rationale</title>
</head>
<body>

<basefont size="3">

<center><font size="2">The Open Group Base Specifications Issue 6<br>
IEEE Std 1003.1-2001<br>
Copyright &copy; 2001 The IEEE and The Open Group</font></center>

<hr size="2" noshade>
<h3><a name="tag_01_07"></a>Locale</h3>

<h4><a name="tag_01_07_01"></a>General</h4>

<p>The description of locales is based on work performed in the UniForum Technical Committee, Subcommittee on Internationalization.
Wherever appropriate, keywords are taken from the ISO&nbsp;C standard or the X/Open Portability Guide.</p>

<p>The value used to specify a locale with environment variables is the name specified as the <i>name</i> operand to the <a href=
"../utilities/localedef.html"><i>localedef</i></a> utility when the locale was created. This provides a verifiable method to create
and invoke a locale.</p>

<p>The &quot;object&quot; definitions need not be portable, as long as &quot;source&quot; definitions are. Strictly speaking, source definitions
are portable only between implementations using the same character set(s). Such source definitions, if they use symbolic names
only, easily can be ported between systems using different codesets, as long as the characters in the portable character set (see
the Base Definitions volume of IEEE&nbsp;Std&nbsp;1003.1-2001, <a href="../basedefs/xbd_chap06.html#tag_06_01">Section 6.1,
Portable Character Set</a>) have common values between the codesets; this is frequently the case in historical implementations. Of
course, this requires that the symbolic names used for characters outside the portable character set be identical between character
sets. The definition of symbolic names for characters is outside the scope of IEEE&nbsp;Std&nbsp;1003.1-2001, but is certainly
within the scope of other standards organizations.</p>

<p>Applications can select the desired locale by invoking the <a href="../functions/setlocale.html"><i>setlocale</i>()</a> function
(or equivalent) with the appropriate value. If the function is invoked with an empty string, the value of the corresponding
environment variable is used. If the environment variable is not set or is set to the empty string, the implementation sets the
appropriate environment as defined in the Base Definitions volume of IEEE&nbsp;Std&nbsp;1003.1-2001, <a href=
"../basedefs/xbd_chap08.html">Chapter 8, Environment Variables</a>.</p>

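<p>The selection rule above can be sketched in C as follows. This is a minimal illustration, not a required usage pattern: the
helper names <i>select_locale_from_environment</i>() and <i>select_posix_locale</i>() are hypothetical, and falling back to the
POSIX locale when the environment names an unsupported locale is an assumption made for the example.</p>

```c
#include <locale.h>

/* Select the locale named by the environment variables. If the
 * implementation cannot provide that locale, fall back to the POSIX
 * locale (an assumption of this sketch). Returns the name of the
 * locale now in effect for LC_ALL, or NULL if even that failed. */
const char *select_locale_from_environment(void)
{
    /* An empty string asks setlocale() to consult the environment. */
    const char *name = setlocale(LC_ALL, "");
    if (name == NULL) {
        name = setlocale(LC_ALL, "POSIX");
    }
    return name;
}

/* Explicitly select the POSIX locale; returns nonzero on success. */
int select_posix_locale(void)
{
    return setlocale(LC_ALL, "POSIX") != NULL;
}
```

<p>The string returned by <i>setlocale</i>() is implementation-defined, so the sketch only checks it against a null pointer.</p>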
<h4><a name="tag_01_07_02"></a>POSIX Locale</h4>

<p>The POSIX locale is equal to the C locale. To avoid being classified as a C-language function, the name has been changed to the
POSIX locale; the environment variable value can be either <tt>"POSIX"</tt> or, for historical reasons, <tt>"C"</tt>.</p>

<p>The POSIX definitions mirror the historical UNIX system behavior.</p>

<p>The use of symbolic names for characters in the tables does not imply that the POSIX locale must be described using symbolic
character names, but merely that it may be advantageous to do so.</p>

<h4><a name="tag_01_07_03"></a>Locale Definition</h4>

<p>The decision to separate the file format from the <a href="../utilities/localedef.html"><i>localedef</i></a> utility description
was only partially editorial. Implementations may provide other interfaces than <a href=
"../utilities/localedef.html"><i>localedef</i></a>. Requirements on &quot;the utility&quot;, mostly concerning error messages, are
described in this way because they are meant to affect the other interfaces implementations may provide as well as <a href=
"../utilities/localedef.html"><i>localedef</i></a>.</p>

<p>The text about POSIX2_LOCALEDEF does not mean that internationalization is optional; only that the functionality of the <a href=
"../utilities/localedef.html"><i>localedef</i></a> utility is. Regular expressions, for instance, must still be able to recognize
character class expressions such as <tt>"[[:alpha:]]"</tt>. A possible analogy is with an applications development environment;
while all conforming implementations must be capable of executing applications, not all need to have the development environment
installed. The assumption is that the capability to modify the behavior of utilities (and applications) via locale settings must be
supported. If the <a href="../utilities/localedef.html"><i>localedef</i></a> utility is not present, then the only choice is to
select an existing (presumably implementation-documented) locale. An implementation could, for example, choose to support only the
POSIX locale, which would in effect limit the number of changes from historical implementations quite drastically. The <a href=
"../utilities/localedef.html"><i>localedef</i></a> utility is still required, but would always terminate with an exit code
indicating that no locale could be created. Supported locales must be documented using the syntax defined in this chapter. (This
ensures that users can accurately determine what capabilities are provided. If the implementation decides to provide additional
capabilities to the ones in this chapter, that is already provided for.)</p>

<p>If the option is present (that is, locales can be created), then the <a href="../utilities/localedef.html"><i>localedef</i></a>
utility must be capable of creating locales based on the syntax and rules defined in this chapter. This does not mean that the
implementation cannot also provide alternate means for creating locales.</p>

<p>The octal, decimal, and hexadecimal notations are the same employed by the charmap facility (see the Base Definitions volume of
IEEE&nbsp;Std&nbsp;1003.1-2001, <a href="../basedefs/xbd_chap06.html#tag_06_04">Section 6.4, Character Set Description File</a>).
To avoid confusion between an octal constant and a back-reference, the octal, hexadecimal, and decimal constants must contain at
least two digits. As single-digit constants are relatively rare, this should not impose any significant hardship. Provision is made
for more digits to account for systems in which the byte size is larger than 8 bits. For example, a Unicode (see the
ISO/IEC&nbsp;10646-1:2000 standard) system that has defined 16-bit bytes may require six octal, four hexadecimal, and five decimal
digits. As with the charmap file, multi-byte characters are described in the locale definition file using &quot;big-endian&quot; notation
for reasons of portability. There is no requirement that the internal representation in the computer memory be in this same
order.</p>
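<p>The digit counts quoted above for 16-bit bytes follow from the largest value such a byte can hold. As a small worked check
(the function name <i>digits_needed</i>() is hypothetical and not part of any standard interface):</p>

```c
/* Number of digits needed to write the largest value of a byte that is
 * `bits` wide, in the given base (8, 10, or 16). */
int digits_needed(unsigned bits, unsigned base)
{
    unsigned long long max = (bits >= 64) ? ~0ULL : ((1ULL << bits) - 1);
    int digits = 1;

    /* Strip one digit per iteration until a single digit remains. */
    while (max >= base) {
        max /= base;
        digits++;
    }
    return digits;
}
```

<p>For 16 bits the largest value is 65535, which is 177777 in octal (six digits) and FFFF in hexadecimal (four digits); for
8 bits the same function gives three octal, three decimal, and two hexadecimal digits.</p>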

<p>One of the guidelines used for the development of this volume of IEEE&nbsp;Std&nbsp;1003.1-2001 is that characters outside the
invariant part of the ISO/IEC&nbsp;646:1991 standard should not be used in portable specifications. The backslash character is not
in the invariant part; the number sign is, but with multiple representations: as a number sign, and as a pound sign. As far as
general usage of these symbols, they are covered by the &quot;grandfather clause&quot;, but for newly defined interfaces, the WG15 POSIX
working group has requested that POSIX provide alternate representations. Consequently, while the default escape character remains
the backslash and the default comment character is the number sign, implementations are required to recognize alternative
representations, identified in the applicable source file via the <b>&lt;escape_char&gt;</b> and <b>&lt;comment_char&gt;</b>
keywords.</p>

<h5><a name="tag_01_07_03_01"></a>LC_CTYPE</h5>

<p>The <i>LC_CTYPE</i> category is primarily used to define the encoding-independent aspects of a character set, such as character
classification. In addition, certain encoding-dependent characteristics are also defined for an application via the <i>LC_CTYPE</i>
category. IEEE&nbsp;Std&nbsp;1003.1-2001 does not mandate that the encoding used in the locale is the same as the one used by the
application because an implementation may decide that it is advantageous to define locales in a system-wide encoding rather than
having multiple, logically identical locales in different encodings, and to convert from the application encoding to the
system-wide encoding on usage. Other implementations could require encoding-dependent locales.</p>

<p>In either case, the <i>LC_CTYPE</i> attributes that are directly dependent on the encoding, such as <b>&lt;mb_cur_max&gt;</b>
and the display width of characters, are not user-specifiable in a locale source and are consequently not defined as keywords.</p>

<p>Implementations may define additional keywords or extend the <i>LC_CTYPE</i> mechanism to allow application-defined
keywords.</p>

<p>The text &quot;The ellipsis specification shall only be valid within a single encoded character set&quot; is present because it is
possible to have a locale supported by multiple character encodings, as explained in the rationale for the Base Definitions volume
of IEEE&nbsp;Std&nbsp;1003.1-2001, <a href="../basedefs/xbd_chap06.html#tag_06_01">Section 6.1, Portable Character Set</a>. An
example given there is of a possible Japanese-based locale supported by a mixture of the character sets JIS&nbsp;X&nbsp;0201 Roman,
JIS&nbsp;X&nbsp;0208, and JIS&nbsp;X&nbsp;0201 Katakana. Attempting to express a range of characters across these sets is not
logical and the implementation is free to reject such attempts.</p>

<p>As the <i>LC_CTYPE</i> character classes are based on the ISO&nbsp;C standard character class definition, the category does not
support multi-character elements. For instance, the German character &lt;sharp-s&gt; is traditionally classified as a lowercase
letter. There is no corresponding uppercase letter; in proper capitalization of German text, the &lt;sharp-s&gt; will be replaced
by <tt>"SS"</tt>; that is, by two characters. This kind of conversion is outside the scope of the <b>toupper</b> and
<b>tolower</b> keywords.</p>

<p>Where IEEE&nbsp;Std&nbsp;1003.1-2001 specifies that only certain characters can be specified, as for the keywords <b>digit</b>
and <b>xdigit</b>, the specified characters must be from the portable character set, as shown. As an example, only the Arabic
digits 0 through 9 are acceptable as digits.</p>

<p>The character classes <b>digit</b>, <b>xdigit</b>, <b>lower</b>, <b>upper</b>, and <b>space</b> have a set of automatically
included characters. These only need to be specified if the character values (that is, encoding) differ from the implementation
default values. It is not possible to define a locale without these automatically included characters unless some implementation
extension is used to prevent their inclusion. Such a definition would not be a proper superset of the C locale, and thus, it might
not be possible for the standard utilities to be implemented as programs conforming to the ISO&nbsp;C standard.</p>

<p>The definition of character class <b>digit</b> requires that only ten characters, the ones defining digits, can be specified;
alternate digits (for example, Hindi or Kanji) cannot be specified here. However, the encoding may vary if an implementation
supports more than one encoding.</p>

<p>The definition of character class <b>xdigit</b> requires that the characters included in character class <b>digit</b> are
included here also and allows for different symbols for the hexadecimal digits 10 through 15.</p>

<p>The inclusion of the <b>charclass</b> keyword satisfies the following requirement from the ISO&nbsp;POSIX-2:1993 standard, Annex
H.1:</p>

<dl compact>
<dt>(3)</dt>

<dd><i>The</i> LC_CTYPE (2.5.2.1) locale definition should be enhanced to allow user-specified additional character classes,
similar in concept to the ISO&nbsp;C standard Multibyte Support Extension (MSE) <a href=
"../functions/iswctype.html"><i>iswctype</i>()</a> function.</dd>
</dl>

<p>This keyword was previously included in The Open Group specifications and is now mandated in the Shell and Utilities volume of
IEEE&nbsp;Std&nbsp;1003.1-2001.</p>
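<p>From an application's point of view, a class introduced with <b>charclass</b> would be queried by name in the same way as the
standard classes. The following sketch assumes the wide-character classification functions from the ISO&nbsp;C MSE; the wrapper
name <i>in_class</i>() is hypothetical.</p>

```c
#include <wchar.h>
#include <wctype.h>

/* Return 1 if wc belongs to the named character class in the current
 * LC_CTYPE locale, 0 if it does not or if the class is unknown. */
int in_class(wint_t wc, const char *class_name)
{
    wctype_t desc = wctype(class_name);

    if (desc == (wctype_t)0) {
        return 0;   /* class not defined in this locale */
    }
    return iswctype(wc, desc) != 0;
}
```

<p>In the POSIX locale only the standard class names such as <tt>"alpha"</tt> and <tt>"digit"</tt> are guaranteed to be
recognized; any other name depends on the locale definition in use.</p>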

<p>The symbolic constant {CHARCLASS_NAME_MAX} was also adopted from The Open Group specifications. Applications portability is
enhanced by the use of symbolic constants.</p>

<h5><a name="tag_01_07_03_02"></a>LC_COLLATE</h5>

<p>The rules governing collation depend to some extent on the use. At least five different levels of increasingly complex collation
rules can be distinguished:</p>

<ol>
<li>
<p><i>Byte/machine code order</i>: This is the historical collation order in the UNIX system and many proprietary operating
systems. Collation is here performed character by character, without any regard to context. The primary virtue is that it usually
is quite fast and also completely deterministic; it works well when the native machine collation sequence matches the user
expectations.</p>
</li>

<li>
<p><i>Character order</i>: On this level, collation is also performed character by character, without regard to context. The order
between characters is, however, determined not by the code values but by the user's expectations of the &quot;correct&quot; order
between characters. In addition, such a (simple) collation order can specify that certain characters collate equally (for example,
uppercase and lowercase letters).</p>
</li>

<li>
<p><i>String ordering</i>: On this level, entire strings are compared based on relatively straightforward rules. Several &quot;passes&quot;
may be required to determine the order between two strings. Characters may be ignored in some passes, but not in others; the
strings may be compared in different directions; and simple string substitutions may be performed before strings are compared. This
level is best described as &quot;dictionary&quot; ordering; it is based on the spelling, not the pronunciation, or meaning, of the
words.</p>
</li>

<li>
<p><i>Text search ordering</i>: This is a further refinement of the previous level, best described as &quot;telephone book ordering&quot;;
some common homonyms (words spelled differently but with the same pronunciation) are collated together; numbers are collated as if
they were spelled out, and so on.</p>
</li>

<li>
<p><i>Semantic-level ordering</i>: Words and strings are collated based on their meaning; entire words (such as &quot;the&quot;) are
eliminated; the ordering is not deterministic. This usually requires special software and is highly dependent on the intended
use.</p>
</li>
</ol>
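<p>The difference between the first two levels can be illustrated with a small sketch. It handles only the portable character set
and treats uppercase and lowercase letters as equal; both function names are hypothetical, and a real level 2 order would be
driven by the locale rather than by <i>tolower</i>().</p>

```c
#include <ctype.h>
#include <string.h>

/* Level 1: byte order, which is what strcmp() provides. */
int compare_bytes(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Level 2 (sketch): still character by character and context-free,
 * but uppercase and lowercase letters collate equally. */
int compare_folded(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);

        if (ca != cb) {
            return ca < cb ? -1 : 1;
        }
        if (ca == '\0') {
            return 0;
        }
    }
}
```

<p>With an ASCII-based codeset, <tt>"Zebra"</tt> sorts before <tt>"apple"</tt> at level 1 because uppercase letters have lower
code values, while the level 2 sketch places <tt>"apple"</tt> first.</p>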

<p>While the historical collation order formally is at level 1, for the English language it corresponds roughly to elements at
level 2. The user expects to see the output from the <a href="../utilities/ls.html"><i>ls</i></a> utility sorted very much as it
would be in a dictionary. While telephone book ordering would be an optimal goal for standard collation, this was ruled out as the
order would be language-dependent. Furthermore, a requirement was that the order must be determined solely from the text string and
the collation rules; no external information (for example, &quot;pronunciation dictionaries&quot;) could be required.</p>

<p>As a result, the goal for the collation support is at level 3. This also matches the requirements for the Canadian collation
order, as well as other, known collation requirements for alphabetic scripts. It specifically rules out collation based on
pronunciation rules or based on semantic analysis of the text.</p>

<p>The syntax for the <i>LC_COLLATE</i> category source meets the requirements for level 3 and has been verified to produce the
correct result with examples based on French, Canadian, and Danish collation order. Because it supports multi-character collating
elements, it is also capable of supporting collation in codesets where a character is expressed using non-spacing characters
followed by the base character (such as the ISO/IEC&nbsp;6937:1994 standard).</p>

<p>The directives that can be specified in an operand to the <b>order_start</b> keyword are based on the requirements specified in
several proposed standards and in customary use. The following is a rephrasing of rules defined for &quot;lexical ordering in English
and French&quot; by the Canadian Standards Association (the text in square brackets is rephrased):</p>

<ul>
<li>
<p>Once special characters [punctuation] have been removed from original strings, the ordering is determined by scanning forwards
(left to right) [disregarding case and diacriticals].</p>
</li>

<li>
<p>In case of equivalence, special characters are once again removed from original strings and the ordering is determined by
scanning backwards (starting from the rightmost character of the string and back), character by character [disregarding case but
considering diacriticals].</p>
</li>

<li>
<p>In case of repeated equivalence, special characters are removed again from original strings and the ordering is determined by
scanning forwards, character by character [considering both case and diacriticals].</p>
</li>

<li>
<p>If there is still an ordering equivalence after the first three rules have been applied, then only special characters and the
position they occupy in the string are considered to determine ordering. The string that has a special character in the lowest
position comes first. If two strings have a special character in the same position, the character [with the lowest collation value]
comes first. In case of equality, the other special characters are considered until there is a difference or until all special
characters have been exhausted.</p>
</li>
</ul>

<p>It is estimated that this part of IEEE&nbsp;Std&nbsp;1003.1-2001 covers the requirements for all European languages, and no
particular problems are anticipated with Slavic or Middle East character sets.</p>

<p>The Far East (particularly Japanese/Chinese) collations are often based on contextual information and pronunciation rules (the
same ideogram can have different meanings and different pronunciations). Such collation, in general, falls outside the desired goal
of IEEE&nbsp;Std&nbsp;1003.1-2001. There are, however, several other collation rules (stroke/radical or &quot;most common
pronunciation&quot;) that can be supported with the mechanism described here.</p>

<p>The character order is defined by the order in which characters and elements are specified between the <b>order_start</b> and
<b>order_end</b> keywords. Weights assigned to the characters and elements define the collation sequence; in the absence of
weights, the character order is also the collation sequence.</p>

<p>The <b>position</b> keyword provides the capability to consider, in a compare, the relative position of characters not subject
to <b>IGNORE</b>. As an example, consider the two strings <tt>"o-ring"</tt> and <tt>"or-ing"</tt>. Assuming the hyphen is subject
to <b>IGNORE</b> on the first pass, the two strings compare equal, and the position of the hyphen is immaterial. On the second pass,
all characters except the hyphen are subject to <b>IGNORE</b>, and in the normal case the two strings would again compare equal. By
taking position into account, the first collates before the second.</p>
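<p>The two-pass comparison described for <tt>"o-ring"</tt> and <tt>"or-ing"</tt> can be sketched as follows. This is a simplified
model, not the collation algorithm itself: it treats every non-alphanumeric character as a special character, uses code values as
weights, and all three function names are hypothetical.</p>

```c
#include <ctype.h>
#include <stddef.h>

/* Pass 1: compare while ignoring every special (non-alphanumeric)
 * character, as if those characters were subject to IGNORE. */
int compare_ignoring_specials(const char *a, const char *b)
{
    for (;;) {
        while (*a && !isalnum((unsigned char)*a)) a++;
        while (*b && !isalnum((unsigned char)*b)) b++;
        if (*a != *b) {
            return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
        }
        if (*a == '\0') {
            return 0;
        }
        a++;
        b++;
    }
}

/* Pass 2 with position: only special characters count, and the string
 * with a special character in the lowest position comes first.
 * Assumes pass 1 already compared equal. */
int compare_specials_by_position(const char *a, const char *b)
{
    size_t i;

    for (i = 0; a[i] && b[i]; i++) {
        int sa = !isalnum((unsigned char)a[i]);
        int sb = !isalnum((unsigned char)b[i]);

        if (sa != sb) {
            return sa ? -1 : 1;
        }
        if (sa && a[i] != b[i]) {
            return (unsigned char)a[i] < (unsigned char)b[i] ? -1 : 1;
        }
    }
    /* Anything left over can only be trailing special characters. */
    if (a[i]) return -1;
    if (b[i]) return 1;
    return 0;
}

int compare_with_position(const char *a, const char *b)
{
    int r = compare_ignoring_specials(a, b);

    return r != 0 ? r : compare_specials_by_position(a, b);
}
```

<p>Both strings reduce to <tt>"oring"</tt> on the first pass; on the second, the hyphen sits at index 1 in <tt>"o-ring"</tt> and
at index 2 in <tt>"or-ing"</tt>, so the first string collates before the second.</p>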

<h5><a name="tag_01_07_03_03"></a>LC_MONETARY</h5>

<p>The currency symbol does not appear in <i>LC_MONETARY</i> because it is not defined in the C locale of the ISO&nbsp;C
standard.</p>

<p>The ISO&nbsp;C standard limits the size of decimal points and thousands delimiters to single-byte values. In locales based on
multi-byte coded character sets, this cannot be enforced; IEEE&nbsp;Std&nbsp;1003.1-2001 does not prohibit such characters, but
makes the behavior unspecified (in the text &quot;In contexts where other standards ...&quot;).</p>

<p>The grouping specification is based on, but not identical to, the ISO&nbsp;C standard. The -1 indicates that no further grouping
is performed; the equivalent of {CHAR_MAX} in the ISO&nbsp;C standard.</p>

<p>The text &quot;the value is not available in the locale&quot; is taken from the ISO&nbsp;C standard and is used instead of the
&quot;unspecified&quot; text in early proposals. There is no implication that omitting these keywords or assigning them values of
<tt>""</tt> or -1 produces unspecified results; such omissions or assignments eliminate the effects described for the keyword or
produce zero-length strings, as appropriate.</p>
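<p>An application sees these &quot;not available&quot; values through <i>localeconv</i>(): string members are empty and
<b>char</b> members hold {CHAR_MAX}. The following sketch checks a few representative members; the function name
<i>monetary_info_unavailable</i>() is hypothetical.</p>

```c
#include <limits.h>
#include <locale.h>

/* Return nonzero if the current locale reports no monetary formatting
 * information for the members examined here. */
int monetary_info_unavailable(void)
{
    const struct lconv *lc = localeconv();

    return lc->currency_symbol[0] == '\0'
        && lc->int_curr_symbol[0] == '\0'
        && lc->p_cs_precedes == CHAR_MAX
        && lc->p_sep_by_space == CHAR_MAX
        && lc->p_sign_posn == CHAR_MAX;
}
```

<p>In the C (POSIX) locale this check is expected to succeed, since the ISO&nbsp;C standard defines all of these members as
unavailable there.</p>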

<p>The locale definition is an extension of the ISO&nbsp;C standard <a href="../functions/localeconv.html"><i>localeconv</i>()</a>
specification. In particular, rules on how <b>currency_symbol</b> is treated are extended to also cover <b>int_curr_symbol</b>, and
<b>p_sep_by_space</b> and <b>n_sep_by_space</b> have been augmented with the value 2, which places a &lt;space&gt; between the sign
and the symbol (if they are adjacent; otherwise, it should be treated as a 0). The following table shows the result of various
combinations:</p>

<center>
<table border="1" cellpadding="3" align="center">
<tr valign="top">
<th align="left">
<p class="tent">&nbsp;</p>
</th>
<th align="left">
<p class="tent">&nbsp;</p>
</th>
<th colspan="3" align="center">
<p class="tent"><b>p_sep_by_space</b></p>
</th>
</tr>

<tr valign="top">
<th align="left">
<p class="tent">&nbsp;</p>
</th>
<th align="left">
<p class="tent">&nbsp;</p>
</th>
<th align="center">
<p class="tent"><b>2</b></p>
</th>
<th align="center">
<p class="tent"><b>1</b></p>
</th>
<th align="center">
<p class="tent"><b>0</b></p>
</th>
</tr>

<tr valign="top">
<td align="left">
<p class="tent"><b>p_cs_precedes</b> = 1</p>
</td>
<td align="left">
<p class="tent"><b>p_sign_posn</b> = 0</p>
</td>
<td align="left">
<p class="tent">($1.25)</p>
</td>
<td align="left">
<p class="tent">($ 1.25)</p>
</td>
<td align="left">
<p class="tent">($1.25)</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">&nbsp;</p>
</td>
<td align="left">
<p class="tent"><b>p_sign_posn</b> = 1</p>
</td>
<td align="left">
<p class="tent">+ $1.25</p>
</td>
<td align="left">
<p class="tent">+$ 1.25</p>
</td>
<td align="left">
<p class="tent">+$1.25</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">&nbsp;</p>
</td>
<td align="left">
<p class="tent"><b>p_sign_posn</b> = 2</p>
</td>
<td align="left">
<p class="tent">$1.25 +</p>
</td>
<td align="left">
<p class="tent">$ 1.25+</p>
</td>
<td align="left">
<p class="tent">$1.25+</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">&nbsp;</p>
</td>
<td align="left">
<p class="tent"><b>p_sign_posn</b> = 3</p>
</td>
<td align="left">
<p class="tent">+ $1.25</p>
</td>
<td align="left">
<p class="tent">+$ 1.25</p>
</td>
<td align="left">
<p class="tent">+$1.25</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">&nbsp;</p>
</td>
<td align="left">
<p class="tent"><b>p_sign_posn</b> = 4</p>
</td>
<td align="left">
<p class="tent">$ +1.25</p>
</td>
<td align="left">
<p class="tent">$+ 1.25</p>
</td>
<td align="left">
<p class="tent">$+1.25</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent"><b>p_cs_precedes</b> = 0</p>
</td>
<td align="left">
<p class="tent"><b>p_sign_posn</b> = 0</p>
</td>
<td align="left">
<p class="tent">(1.25 $)</p>
</td>
<td align="left">
<p class="tent">(1.25 $)</p>
</td>
<td align="left">
<p class="tent">(1.25$)</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">&nbsp;</p>
</td>
<td align="left">
<p class="tent"><b>p_sign_posn</b> = 1</p>
</td>
<td align="left">
<p class="tent">+1.25 $</p>
</td>
<td align="left">
<p class="tent">+1.25 $</p>
</td>
<td align="left">
<p class="tent">+1.25$</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">&nbsp;</p>
</td>
<td align="left">
<p class="tent"><b>p_sign_posn</b> = 2</p>
</td>
<td align="left">
<p class="tent">1.25$ +</p>
</td>
<td align="left">
<p class="tent">1.25 $+</p>
</td>
<td align="left">
<p class="tent">1.25$+</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">&nbsp;</p>
</td>
<td align="left">
<p class="tent"><b>p_sign_posn</b> = 3</p>
</td>
<td align="left">
<p class="tent">1.25+ $</p>
</td>
<td align="left">
<p class="tent">1.25 +$</p>
</td>
<td align="left">
<p class="tent">1.25+$</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">&nbsp;</p>
</td>
<td align="left">
<p class="tent"><b>p_sign_posn</b> = 4</p>
</td>
<td align="left">
<p class="tent">1.25$ +</p>
</td>
<td align="left">
<p class="tent">1.25 $+</p>
</td>
<td align="left">
<p class="tent">1.25$+</p>
</td>
</tr>
</table>
</center>

<p>The following is an example of the interpretation of the <b>mon_grouping</b> keyword. Assuming that the value to be formatted is
123456789 and the <b>mon_thousands_sep</b> is <tt>"'"</tt>, then the following table shows the result. The third column shows the
equivalent string in the ISO&nbsp;C standard that would be used by the <a href=
"../functions/localeconv.html"><i>localeconv</i>()</a> function to accommodate this grouping.</p>

<center>
<table border="1" cellpadding="3" align="center">
<tr valign="top">
<th align="center">
<p class="tent"><b>mon_grouping</b></p>
</th>
<th align="center">
<p class="tent"><b>Formatted Value</b></p>
</th>
<th align="center">
<p class="tent"><b>ISO C String</b></p>
</th>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">3;-1</p>
</td>
<td align="left">
<p class="tent">123456'789</p>
</td>
<td align="left">
<p class="tent">"\3\177"</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">3</p>
</td>
<td align="left">
<p class="tent">123'456'789</p>
</td>
<td align="left">
<p class="tent">"\3"</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">3;2;-1</p>
</td>
<td align="left">
<p class="tent">1234'56'789</p>
</td>
<td align="left">
<p class="tent">"\3\2\177"</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">3;2</p>
</td>
<td align="left">
<p class="tent">12'34'56'789</p>
</td>
<td align="left">
<p class="tent">"\3\2"</p>
</td>
</tr>

<tr valign="top">
<td align="left">
<p class="tent">-1</p>
</td>
<td align="left">
<p class="tent">123456789</p>
</td>
<td align="left">
<p class="tent">"\177"</p>
</td>
</tr>
</table>
</center>

<p>In these examples, the octal value of {CHAR_MAX} is 177.</p>
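<p>The interpretation of the ISO&nbsp;C grouping string in the table can be sketched as below. The function name
<i>group_digits</i>() and the fixed 128-byte work buffer are assumptions of this example; it handles only an unsigned digit
string and is not how any particular implementation formats values.</p>

```c
#include <limits.h>
#include <string.h>

/* Insert `sep` into the digit string `digits` according to an ISO C
 * grouping string, storing the result in `out` (capacity `outsz`).
 * Each byte of `grouping` is a group size, counted from the right;
 * CHAR_MAX stops grouping, and the last size repeats if the string
 * ends without CHAR_MAX. Returns 0 on success, -1 if a buffer is too
 * small. */
int group_digits(const char *digits, const char *grouping, char sep,
                 char *out, size_t outsz)
{
    char tmp[128];                  /* result is built right to left */
    size_t pos = sizeof tmp;
    size_t n = strlen(digits);
    const char *g = grouping;
    int size = (*g != '\0') ? *g : CHAR_MAX;   /* "" means no grouping */
    int in_group = 0;

    while (n > 0) {
        if (size != CHAR_MAX && in_group == size) {
            if (pos == 0) return -1;
            tmp[--pos] = sep;
            in_group = 0;
            if (g[1] != '\0') {     /* otherwise the last size repeats */
                g++;
                size = *g;
            }
        }
        if (pos == 0) return -1;
        tmp[--pos] = digits[--n];
        in_group++;
    }
    if (sizeof tmp - pos + 1 > outsz) return -1;
    memcpy(out, tmp + pos, sizeof tmp - pos);
    out[sizeof tmp - pos] = '\0';
    return 0;
}
```

<p>Called with the digits <tt>123456789</tt>, the separator <tt>'</tt>, and each string from the third column, the sketch
reproduces the second column of the table.</p>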

<h5><a name="tag_01_07_03_04"></a>LC_NUMERIC</h5>

<p>See the rationale for <i>LC_MONETARY</i> for a description of the behavior of grouping.</p>

<h5><a name="tag_01_07_03_05"></a>LC_TIME</h5>

<p>Although certain of the conversion specifications in the POSIX locale (such as the name of the month) are shown with initial
capital letters, this need not be the case in other locales. Programs using these conversion specifications may need to adjust the
capitalization if the output is going to be used at the beginning of a sentence.</p>

<p>The <i>LC_TIME</i> descriptions of <b>abday</b>, <b>day</b>, <b>mon</b>, and <b>abmon</b> imply a Gregorian style calendar
(7-day weeks, 12-month years, leap years, and so on). Formatting time strings for other types of calendars is outside the scope of
IEEE&nbsp;Std&nbsp;1003.1-2001.</p>

<p>While the ISO&nbsp;8601:2000 standard numbers the weekdays starting with Monday, historical practice is to use Sunday as the
first day. Rather than change the order and introduce potential confusion, the days must be specified beginning with Sunday;
previous references to &quot;first day&quot; have been removed. Note also that the Shell and Utilities volume of
IEEE&nbsp;Std&nbsp;1003.1-2001 <a href="../utilities/date.html"><i>date</i></a> utility supports numbering compliant with the
ISO&nbsp;8601:2000 standard.</p>

<p>As specified under <a href="../utilities/date.html"><i>date</i></a> in the Shell and Utilities volume of
IEEE&nbsp;Std&nbsp;1003.1-2001 and <a href="../functions/strftime.html"><i>strftime</i>()</a> in the System Interfaces volume of
IEEE&nbsp;Std&nbsp;1003.1-2001, the conversion specifications corresponding to the optional keywords consist of a modifier followed
by a traditional conversion specification (for instance, <tt>%Ex</tt>). If the optional keywords are not supported by the
implementation or are unspecified for the current locale, these modified conversion specifications are treated as the traditional
conversion specifications. For example, assume the following keywords:</p>

<blockquote>
<pre>
<tt>alt_digits   "0th";"1st";"2nd";"3rd";"4th";"5th";\
             "6th";"7th";"8th";"9th";"10th"
<br>
d_fmt        "The %Od day of %B in %Y"
</tt>
</pre>
</blockquote>

<p>On July 4th 1776, the <tt>%x</tt> conversion specification would result in <tt>"The 4th day of July in 1776"</tt>, while on
July 14th 1789 it would result in <tt>"The 14 day of July in 1789"</tt>. It can be noted that the above example is for
illustrative purposes only; the <tt>%O</tt> modifier is primarily intended to provide for Kanji or Hindi digits in <a href=
"../utilities/date.html"><i>date</i></a> formats.</p>

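<p>The fallback behavior in this example, where 14 has no alternative symbol and is therefore printed with ordinary digits, can be
sketched as follows. The table and the function name <i>format_alt_digit</i>() are hypothetical stand-ins for the locale's
<b>alt_digits</b> data.</p>

```c
#include <stdio.h>

/* Hypothetical table mirroring the alt_digits keyword shown above. */
static const char *const alt_digits[] = {
    "0th", "1st", "2nd", "3rd", "4th", "5th",
    "6th", "7th", "8th", "9th", "10th"
};

/* Format `value` the way %Od is described: use the alternative symbol
 * if one is defined for this value, otherwise ordinary decimal digits.
 * Returns the length of the formatted text, as snprintf() does. */
int format_alt_digit(int value, char *out, size_t outsz)
{
    size_t count = sizeof alt_digits / sizeof alt_digits[0];

    if (value >= 0 && (size_t)value < count) {
        return snprintf(out, outsz, "%s", alt_digits[value]);
    }
    return snprintf(out, outsz, "%d", value);
}
```

<p>With this table, the value 4 is rendered as <tt>"4th"</tt> and the value 14 as <tt>"14"</tt>, matching the two dates above.</p>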
<p>The following is an example for Japan that supports the current plus last three Emperors and reverts to Western style numbering
for years prior to the Meiji era. The example also allows for the custom of using a special name for the first year of an era
instead of using 1. (The examples substitute romaji where kanji should be used.)</p>

<blockquote>
<pre>
<tt>era_d_fmt "%EY%mgatsu%dnichi (%a)"
<br>
era    "+:2:1990/01/01:+*:Heisei:%EC%Eynen";\
       "+:1:1989/01/08:1989/12/31:Heisei:%ECgannen";\
       "+:2:1927/01/01:1989/01/07:Shouwa:%EC%Eynen";\
       "+:1:1926/12/25:1926/12/31:Shouwa:%ECgannen";\
       "+:2:1913/01/01:1926/12/24:Taishou:%EC%Eynen";\
       "+:1:1912/07/30:1912/12/31:Taishou:%ECgannen";\
       "+:2:1869/01/01:1912/07/29:Meiji:%EC%Eynen";\
       "+:1:1868/09/08:1868/12/31:Meiji:%ECgannen";\
       "-:1868:1868/09/07:-*::%Ey"
</tt>
</pre>
</blockquote>

<p>Assuming that the current date is September 21, 1991, a request to <a href="../utilities/date.html"><i>date</i></a> or <a href=
"../functions/strftime.html"><i>strftime</i>()</a> would yield the following results:</p>

<blockquote>
<pre>
<tt>%Ec - Heisei3nen9gatsu21nichi (Sat) 14:39:26
%EC - Heisei
%Ex - Heisei3nen9gatsu21nichi (Sat)
%Ey - 3
%EY - Heisei3nen
</tt>
</pre>
</blockquote>
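<p>How the <tt>%EC</tt> and <tt>%Ey</tt> values above are derived from the <b>era</b> entries can be sketched as a table lookup.
The structure, the table, and the function names below are hypothetical; only the entries with direction <tt>+</tt> are modeled,
and the end year 9999 stands in for the open-ended <tt>+*</tt>.</p>

```c
#include <stddef.h>

struct era {
    int offset;         /* year number assigned to the start date's year */
    int start[3];       /* inclusive start {year, month, day} */
    int end[3];         /* inclusive end {year, month, day} */
    const char *name;   /* value produced for %EC */
};

const struct era japanese_eras[] = {
    { 2, {1990,  1,  1}, {9999, 12, 31}, "Heisei"  },
    { 1, {1989,  1,  8}, {1989, 12, 31}, "Heisei"  },
    { 2, {1927,  1,  1}, {1989,  1,  7}, "Shouwa"  },
    { 1, {1926, 12, 25}, {1926, 12, 31}, "Shouwa"  },
    { 2, {1913,  1,  1}, {1926, 12, 24}, "Taishou" },
    { 1, {1912,  7, 30}, {1912, 12, 31}, "Taishou" },
    { 2, {1869,  1,  1}, {1912,  7, 29}, "Meiji"   },
    { 1, {1868,  9,  8}, {1868, 12, 31}, "Meiji"   },
};

/* Compare two {year, month, day} triples. */
int date_cmp(const int a[3], const int b[3])
{
    int i;

    for (i = 0; i < 3; i++) {
        if (a[i] != b[i]) {
            return a[i] < b[i] ? -1 : 1;
        }
    }
    return 0;
}

/* Find the era containing the date and store its year number (%Ey) in
 * *era_year. Returns NULL for dates before the first modeled era. */
const struct era *find_era(int y, int m, int d, int *era_year)
{
    int date[3];
    size_t i;

    date[0] = y;
    date[1] = m;
    date[2] = d;
    for (i = 0; i < sizeof japanese_eras / sizeof japanese_eras[0]; i++) {
        const struct era *e = &japanese_eras[i];

        if (date_cmp(date, e->start) >= 0 && date_cmp(date, e->end) <= 0) {
            *era_year = e->offset + (y - e->start[0]);
            return e;
        }
    }
    return NULL;
}
```

<p>For September 21, 1991 the first entry matches, giving the name Heisei and the year 2 + (1991 - 1990) = 3, in agreement with
the <tt>%EC</tt> and <tt>%Ey</tt> results listed above.</p>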

<p>Example era definitions for the Republic of China:</p>

<blockquote>
<pre>
<tt>era    "+:2:1913/01/01:+*:ChungHwaMingGuo:%EC%EyNen";\
       "+:1:1912/1/1:1912/12/31:ChungHwaMingGuo:%ECYuenNen";\
       "+:1:1911/12/31:-*:MingChien:%EC%EyNen"
</tt>
</pre>
</blockquote>

<p>Example definitions for the Christian Era:</p>

<blockquote>
<pre>
<tt>era    "+:1:0001/01/01:+*:AD:%EC %Ey";\
       "+:1:-0001/12/31:-*:BC:%Ey %EC"
</tt>
</pre>
</blockquote>

<h5><a name="tag_01_07_03_06"></a>LC_MESSAGES</h5>

<p>The <b>yesstr</b> and <b>nostr</b> locale keywords and the YESSTR and NOSTR <i>langinfo</i> items were formerly used to match
user affirmative and negative responses. In IEEE&nbsp;Std&nbsp;1003.1-2001, the <b>yesexpr</b>, <b>noexpr</b>, YESEXPR, and NOEXPR
extended regular expressions have replaced them. Applications should use the general locale-based messaging facilities to issue
prompting messages which include sample desired responses.</p>
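<p>Matching a response against the locale's affirmative expression might look like the following sketch, which assumes the
<i>nl_langinfo</i>(), <i>regcomp</i>(), and <i>regexec</i>() interfaces; the function name <i>is_affirmative</i>() is
hypothetical.</p>

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if `response` matches the current locale's affirmative
 * expression, 0 if it does not, and -1 if the expression cannot be
 * compiled. */
int is_affirmative(const char *response)
{
    regex_t re;
    int rc;

    /* YESEXPR is an extended regular expression. */
    if (regcomp(&re, nl_langinfo(YESEXPR), REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    rc = regexec(&re, response, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

<p>In the POSIX locale the affirmative expression is <tt>"^[yY]"</tt>, so a response such as <tt>"yes"</tt> matches and
<tt>"no"</tt> does not; a matching <b>noexpr</b> check would use NOEXPR in the same way.</p>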

<h4><a name="tag_01_07_04"></a>Locale Definition Grammar</h4>

<p>There is no additional rationale provided for this section.</p>

<h5><a name="tag_01_07_04_01"></a>Locale Lexical Conventions</h5>

<p>There is no additional rationale provided for this section.</p>

<h5><a name="tag_01_07_04_02"></a>Locale Grammar</h5>

<p>There is no additional rationale provided for this section.</p>

<h4><a name="tag_01_07_05"></a>Locale Definition Example</h4>

<p>The following is an example of a locale definition file that could be used as input to the <a href=
"../utilities/localedef.html"><i>localedef</i></a> utility. It assumes that the utility is executed with the <b>-f</b> option,
naming a charmap file with (at least) the following content:</p>

<blockquote>
<pre>
<tt>CHARMAP
&lt;space&gt;      \x20
&lt;dollar&gt;     \x24
&lt;A&gt;          \101
&lt;a&gt;          \141
&lt;A-acute&gt;    \346
&lt;a-acute&gt;    \365
&lt;A-grave&gt;    \300
&lt;a-grave&gt;    \366
&lt;b&gt;          \142
&lt;C&gt;          \103
&lt;c&gt;          \143
&lt;c-cedilla&gt;  \347
&lt;d&gt;          \x64
&lt;H&gt;          \110
&lt;h&gt;          \150
&lt;eszet&gt;      \xb7
&lt;s&gt;          \x73
&lt;z&gt;          \x7a
END CHARMAP
</tt>
</pre>
</blockquote>

<p>The locale definition below should not be taken as complete or as representing any actual locale; it only illustrates the
syntax.</p>

<pre>
<tt>#
LC_CTYPE
lower   &lt;a&gt;;&lt;b&gt;;&lt;c&gt;;&lt;c-cedilla&gt;;&lt;d&gt;;...;&lt;z&gt;
upper   A;B;C;&Ccedil;;...;Z
space   \x20;\x09;\x0a;\x0b;\x0c;\x0d
blank   \040;\011
toupper (&lt;a&gt;,&lt;A&gt;);(b,B);(c,C);(&ccedil;,&Ccedil;);(d,D);(z,Z)
END LC_CTYPE
#
LC_COLLATE
#
# The following example of collation is based on
# Canadian standard Z243.4.1-1998, "Canadian Alphanumeric
# Ordering Standard for Character Sets of CSA Z243.4 Standard".
# (Other parts of this example locale definition file do not
# purport to relate to Canada, or to any other real culture.)
# The proposed standard defines a 4-weight collation, such that
# in the first pass, characters are compared without regard to
# case or accents; in the second pass, they are compared
# backwards without regard to case; in the third pass, they are
# compared forwards without regard to diacriticals. In the first
# three passes, non-alphabetic characters are ignored; in the
# fourth pass, only special characters are considered, such that
# "The string that has a special character in the lowest position
# comes first. If two strings have a special character in the
# same position, the collation value of the special character
# determines ordering."
#
# Only a subset of the character set is used here; mostly to
# illustrate the set-up.
#
collating-symbol &lt;NULL&gt;
collating-symbol &lt;LOW_VALUE&gt;
collating-symbol &lt;LOWER-CASE&gt;
collating-symbol &lt;SUBSCRIPT-LOWER&gt;
collating-symbol &lt;SUPERSCRIPT-LOWER&gt;
collating-symbol &lt;UPPER-CASE&gt;
collating-symbol &lt;NO-ACCENT&gt;
collating-symbol &lt;PECULIAR&gt;
collating-symbol &lt;LIGATURE&gt;
collating-symbol &lt;ACUTE&gt;
collating-symbol &lt;GRAVE&gt;
# Further collating-symbols follow.
#
# Properly, the standard does not include any multi-character
# collating elements; the one below is added for completeness.
#
collating-element &lt;ch&gt; from "&lt;c&gt;&lt;h&gt;"
collating-element &lt;CH&gt; from "&lt;C&gt;&lt;H&gt;"
collating-element &lt;Ch&gt; from "&lt;C&gt;&lt;h&gt;"
#
order_start forward;backward;forward;forward,position
#
# Collating symbols are specified first in the sequence to allocate
# basic collation values to them, lower than that of any character.
&lt;NULL&gt;
&lt;LOW_VALUE&gt;
&lt;LOWER-CASE&gt;
&lt;SUBSCRIPT-LOWER&gt;
&lt;SUPERSCRIPT-LOWER&gt;
&lt;UPPER-CASE&gt;
&lt;NO-ACCENT&gt;
&lt;PECULIAR&gt;
&lt;LIGATURE&gt;
&lt;ACUTE&gt;
&lt;GRAVE&gt;
&lt;RING-ABOVE&gt;
&lt;DIAERESIS&gt;
&lt;TILDE&gt;
# Further collating symbols are given a basic collating value here.
#
# Here follow special characters.
&lt;space&gt;        IGNORE;IGNORE;IGNORE;&lt;space&gt;
# Other special characters follow here.
#
# Here follow the regular characters.
&lt;a&gt;        &lt;a&gt;;&lt;NO-ACCENT&gt;;&lt;LOWER-CASE&gt;;IGNORE
&lt;A&gt;        &lt;a&gt;;&lt;NO-ACCENT&gt;;&lt;UPPER-CASE&gt;;IGNORE
&lt;a-acute&gt;  &lt;a&gt;;&lt;ACUTE&gt;;&lt;LOWER-CASE&gt;;IGNORE
&lt;A-acute&gt;  &lt;a&gt;;&lt;ACUTE&gt;;&lt;UPPER-CASE&gt;;IGNORE
&lt;a-grave&gt;  &lt;a&gt;;&lt;GRAVE&gt;;&lt;LOWER-CASE&gt;;IGNORE
&lt;A-grave&gt;  &lt;a&gt;;&lt;GRAVE&gt;;&lt;UPPER-CASE&gt;;IGNORE
&lt;ae&gt;      "&lt;a&gt;&lt;e&gt;";"&lt;LIGATURE&gt;&lt;LIGATURE&gt;";\
          "&lt;LOWER-CASE&gt;&lt;LOWER-CASE&gt;";IGNORE
&lt;AE&gt;      "&lt;a&gt;&lt;e&gt;";"&lt;LIGATURE&gt;&lt;LIGATURE&gt;";\
          "&lt;UPPER-CASE&gt;&lt;UPPER-CASE&gt;";IGNORE
&lt;b&gt;        &lt;b&gt;;&lt;NO-ACCENT&gt;;&lt;LOWER-CASE&gt;;IGNORE
&lt;B&gt;        &lt;b&gt;;&lt;NO-ACCENT&gt;;&lt;UPPER-CASE&gt;;IGNORE
&lt;c&gt;        &lt;c&gt;;&lt;NO-ACCENT&gt;;&lt;LOWER-CASE&gt;;IGNORE
&lt;C&gt;        &lt;c&gt;;&lt;NO-ACCENT&gt;;&lt;UPPER-CASE&gt;;IGNORE
&lt;ch&gt;       &lt;ch&gt;;&lt;NO-ACCENT&gt;;&lt;LOWER-CASE&gt;;IGNORE
&lt;Ch&gt;       &lt;ch&gt;;&lt;NO-ACCENT&gt;;&lt;PECULIAR&gt;;IGNORE
&lt;CH&gt;       &lt;ch&gt;;&lt;NO-ACCENT&gt;;&lt;UPPER-CASE&gt;;IGNORE
#
# As an example, the strings "Bach" and "bach" could be encoded (for
# compare purposes) as:
# "Bach"  &lt;b&gt;;&lt;a&gt;;&lt;ch&gt;;&lt;LOW_VALUE&gt;;&lt;NO-ACCENT&gt;;&lt;NO-ACCENT&gt;;\
#         &lt;NO-ACCENT&gt;;&lt;LOW_VALUE&gt;;&lt;UPPER-CASE&gt;;&lt;LOWER-CASE&gt;;\
#         &lt;LOWER-CASE&gt;;&lt;NULL&gt;
# "bach"  &lt;b&gt;;&lt;a&gt;;&lt;ch&gt;;&lt;LOW_VALUE&gt;;&lt;NO-ACCENT&gt;;&lt;NO-ACCENT&gt;;\
#         &lt;NO-ACCENT&gt;;&lt;LOW_VALUE&gt;;&lt;LOWER-CASE&gt;;&lt;LOWER-CASE&gt;;\
#         &lt;LOWER-CASE&gt;;&lt;NULL&gt;
#
# The two strings are equal in passes 1 and 2, but differ in pass 3.
#
# Further characters follow.
#
UNDEFINED    IGNORE;IGNORE;IGNORE;IGNORE
#
order_end
#
END LC_COLLATE
#
LC_MONETARY
int_curr_symbol    "USD "
currency_symbol    "$"
mon_decimal_point  "."
mon_grouping       3;0
positive_sign      ""
negative_sign      "-"
p_cs_precedes      1
n_sign_posn        0
END LC_MONETARY
#
LC_NUMERIC
copy "US_en.ASCII"
END LC_NUMERIC
#
LC_TIME
abday   "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
#
day     "Sunday";"Monday";"Tuesday";"Wednesday";\
        "Thursday";"Friday";"Saturday"
#
abmon   "Jan";"Feb";"Mar";"Apr";"May";"Jun";\
         "Jul";"Aug";"Sep";"Oct";"Nov";"Dec"
#
mon     "January";"February";"March";"April";\
        "May";"June";"July";"August";"September";\
        "October";"November";"December"
#
d_t_fmt "%a %b %d %T %Z %Y\n"
END LC_TIME
#
LC_MESSAGES
yesexpr "^(([yY][[:alpha:]]*)|(OK))"
#
noexpr  "^[nN][[:alpha:]]*"
END LC_MESSAGES
</tt>
</pre>
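
<p>The pass-by-pass comparison that the comments describe for "Bach" and "bach" can be sketched in C as follows. This only
illustrates the idea and is not how <i>strcoll</i>() is implemented: the <tt>weight</tt> enumeration, the <tt>sort_key</tt>
structure, and the function name are hypothetical, only the relative order of the weights matters, and the fourth (position) pass
is omitted because every letter in the example has IGNORE there.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Relative order follows the example: collating symbols sort before
 * characters, and <LOWER-CASE> before <UPPER-CASE> before <NO-ACCENT>. */
enum weight { W_LOWER_CASE, W_UPPER_CASE, W_NO_ACCENT, W_A, W_B, W_CH };

#define NPASS 3  /* passes 1-3; pass 4 is IGNORE for these letters */
#define NELEM 3  /* collating elements per string: b, a, ch */

struct sort_key { enum weight w[NPASS][NELEM]; };

const struct sort_key key_Bach = {{
    { W_B, W_A, W_CH },
    { W_NO_ACCENT, W_NO_ACCENT, W_NO_ACCENT },
    { W_UPPER_CASE, W_LOWER_CASE, W_LOWER_CASE },
}};
const struct sort_key key_bach = {{
    { W_B, W_A, W_CH },
    { W_NO_ACCENT, W_NO_ACCENT, W_NO_ACCENT },
    { W_LOWER_CASE, W_LOWER_CASE, W_LOWER_CASE },
}};

/* Compare two keys pass by pass, honoring forward;backward;forward.
 * Returns <0, 0, or >0 and stores the deciding pass (1-based, 0 if equal). */
int compare_keys(const struct sort_key *x, const struct sort_key *y, int *pass)
{
    static const int backward[NPASS] = { 0, 1, 0 };

    assert(x != NULL && y != NULL && pass != NULL);
    for (int p = 0; p < NPASS; p++) {
        for (size_t i = 0; i < NELEM; i++) {
            /* A backward pass scans from the last element to the first. */
            size_t k = backward[p] ? NELEM - 1 - i : i;
            if (x->w[p][k] != y->w[p][k]) {
                *pass = p + 1;
                return x->w[p][k] < y->w[p][k] ? -1 : 1;
            }
        }
    }
    *pass = 0;
    return 0;
}
```

<p>With these keys, <tt>compare_keys(&amp;key_bach, &amp;key_Bach, &amp;pass)</tt> returns a negative value and sets
<tt>pass</tt> to 3, which agrees with the comment that the two strings are equal in the first two passes and differ in the
third.</p>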


<hr size="2" noshade>
<center><font size="2"><!--footer start-->
UNIX &reg; is a registered Trademark of The Open Group.<br>
POSIX &reg; is a registered Trademark of The IEEE.</font></center>

<!--footer end-->
<hr size="2" noshade>
</body>
</html>