<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<html>
|
|
<head>
|
|
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<link type="text/css" rel="stylesheet" href="style.css"><!-- Generated by The Open Group's rhtm tool v1.2.1 -->
|
|
<!-- Copyright (c) 2001 The Open Group, All Rights Reserved -->
|
|
<title>awk</title>
|
|
</head>
|
|
<body bgcolor="white">
|
|
<script type="text/javascript" language="JavaScript" src="../jscript/codes.js">
|
|
</script>
|
|
|
|
<basefont size="3"> <a name="awk"></a> <a name="tag_04_06"></a><!-- awk -->
|
|
<!--header start-->
|
|
<center><font size="2">The Open Group Base Specifications Issue 6<br>
|
|
IEEE Std 1003.1-2001<br>
|
|
Copyright © 2001 The IEEE and The Open Group, All Rights reserved.</font></center>
|
|
|
|
<!--header end-->
|
|
<hr size="2" noshade>
|
|
<h4><a name="tag_04_06_01"></a>NAME</h4>
|
|
|
|
<blockquote>awk - pattern scanning and processing language</blockquote>
|
|
|
|
<h4><a name="tag_04_06_02"></a>SYNOPSIS</h4>
|
|
|
|
<blockquote class="synopsis">
|
|
<p><code><tt>awk</tt> <b>[</b><tt>-F</tt> <i>ERE</i><b>][</b><tt>-v</tt> <i>assignment</i><b>]</b> <tt>...</tt> <i>program</i>
|
|
<b>[</b><i>argument</i> <tt>...</tt><b>]</b><tt><br>
|
|
<br>
|
|
awk</tt> <b>[</b><tt>-F</tt> <i>ERE</i><b>]</b> <tt>-f</tt> <i>progfile</i> <tt>...</tt> <b>[</b><tt>-v</tt>
|
|
<i>assignment</i><b>]</b> <tt>...</tt><b>[</b><i>argument</i> <tt>...</tt><b>]</b><tt><br>
|
|
</tt></code></p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_03"></a>DESCRIPTION</h4>
|
|
|
|
<blockquote>
|
|
<p>The <i>awk</i> utility shall execute programs written in the <i>awk</i> programming language, which is specialized for textual
|
|
data manipulation. An <i>awk</i> program is a sequence of patterns and corresponding actions. When input is read that matches a
|
|
pattern, the action associated with that pattern is carried out.</p>
|
|
|
|
<p>Input shall be interpreted as a sequence of records. By default, a record is a line, less its terminating <newline>, but
|
|
this can be changed by using the <b>RS</b> built-in variable. Each record of input shall be matched in turn against each pattern in
|
|
the program. For each pattern matched, the associated action shall be executed.</p>
|
|
|
|
<p>The <i>awk</i> utility shall interpret each input record as a sequence of fields where, by default, a field is a string of non-
|
|
<blank>s. This default white-space field delimiter can be changed by using the <b>FS</b> built-in variable or <b>-F</b>
|
|
<i>ERE</i>. The <i>awk</i> utility shall denote the first field in a record $1, the second $2, and so on. The symbol $0 shall refer
|
|
to the entire record; setting any other field causes the re-evaluation of $0. Assigning to $0 shall reset the values of all other
|
|
fields and the <b>NF</b> built-in variable.</p>
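<p>As an informal illustration (not part of the normative text), the following sketch shows default field splitting and the interaction between $0, the fields, and <b>NF</b> described above. The sample input and the shell variable name <tt>out</tt> are assumptions made for the example.</p>

```shell
# Informal sketch: default field splitting, and how $0, fields, and NF interact.
out=$(printf 'alpha beta gamma\n' | awk '{
    print $1            # first field of the record
    print NF            # number of fields in the record
    $2 = "BETA"         # setting a field causes $0 to be re-evaluated
    print $0
    $0 = "x y"          # assigning to $0 resets the fields and NF
    print NF, $2
}')
printf '%s\n' "$out"
```

<p>With the input shown, the four output lines should be <tt>alpha</tt>, <tt>3</tt>, <tt>alpha BETA gamma</tt>, and <tt>2 y</tt>.</p>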
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_04"></a>OPTIONS</h4>
|
|
|
|
<blockquote>
|
|
<p>The <i>awk</i> utility shall conform to the Base Definitions volume of IEEE Std 1003.1-2001, <a href=
|
|
"../basedefs/xbd_chap12.html#tag_12_02">Section 12.2, Utility Syntax Guidelines</a>.</p>
|
|
|
|
<p>The following options shall be supported:</p>
|
|
|
|
<dl compact>
|
|
<dt><b>-F </b> <i>ERE</i></dt>
|
|
|
|
<dd>Define the input field separator to be the extended regular expression <i>ERE</i>, before any input is read; see <a href=
|
|
"#tag_04_06_13_04">Regular Expressions</a> .</dd>
|
|
|
|
<dt><b>-f </b> <i>progfile</i></dt>
|
|
|
|
<dd>Specify the pathname of the file <i>progfile</i> containing an <i>awk</i> program. If multiple instances of this option are
|
|
specified, the concatenation of the files specified as <i>progfile</i> in the order specified shall be the <i>awk</i> program. The
|
|
<i>awk</i> program can alternatively be specified in the command line as a single argument.</dd>
|
|
|
|
<dt><b>-v </b> <i>assignment</i></dt>
|
|
|
|
<dd>
|
|
The application shall ensure that the <i>assignment</i> argument is in the same form as an <i>assignment</i> operand. The specified
|
|
variable assignment shall occur prior to executing the <i>awk</i> program, including the actions associated with <b>BEGIN</b>
|
|
patterns (if any). Multiple occurrences of this option can be specified.</dd>
|
|
</dl>
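<p>A minimal sketch of the <b>-F</b> and <b>-v</b> options follows; it is illustrative only. The colon-separated sample records and the variable name <tt>limit</tt> are hypothetical, and the <b>-f</b> option is not shown because it requires a separate program file.</p>

```shell
# Informal sketch: -F sets the field separator, -v assigns before BEGIN runs.
out=$(printf 'root:x:0\nguest:x:405\n' | awk -F ':' -v limit=100 '
    BEGIN      { print "limit is " limit }   # the -v assignment is already visible here
    $3 > limit { print $1 }                  # fields were split on the -F separator
')
printf '%s\n' "$out"
```

<p>Because the <b>-v</b> assignment occurs before the <b>BEGIN</b> actions, the first output line is <tt>limit is 100</tt>; only the second record has a third field greater than the limit, so the second output line is <tt>guest</tt>.</p>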
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_05"></a>OPERANDS</h4>
|
|
|
|
<blockquote>
|
|
<p>The following operands shall be supported:</p>
|
|
|
|
<dl compact>
|
|
<dt><i>program</i></dt>
|
|
|
|
<dd>If no <b>-f</b> option is specified, the first operand to <i>awk</i> shall be the text of the <i>awk</i> program. The
|
|
application shall supply the <i>program</i> operand as a single argument to <i>awk</i>. If the text does not end in a
|
|
<newline>, <i>awk</i> shall interpret the text as if it did.</dd>
|
|
|
|
<dt><i>argument</i></dt>
|
|
|
|
<dd>Either of the following two types of <i>argument</i> can be intermixed:
|
|
|
|
<dl compact>
|
|
<dt><i>file</i></dt>
|
|
|
|
<dd>A pathname of a file that contains the input to be read, which is matched against the set of patterns in the program. If no
|
|
<i>file</i> operands are specified, or if a <i>file</i> operand is <tt>'-'</tt> , the standard input shall be used.</dd>
|
|
|
|
<dt><i>assignment</i></dt>
|
|
|
|
<dd>An operand that begins with an underscore or alphabetic character from the portable character set (see the table in the Base
|
|
Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap06.html#tag_06_01">Section 6.1, Portable
|
|
Character Set</a>), followed by a sequence of underscores, digits, and alphabetics from the portable character set, followed by the
|
|
<tt>'='</tt> character, shall specify a variable assignment rather than a pathname. The characters before the <tt>'='</tt>
|
|
represent the name of an <i>awk</i> variable; if that name is an <i>awk</i> reserved word (see <a href=
|
|
"#tag_04_06_13_16">Grammar</a> ) the behavior is undefined. The characters following the equal sign shall be interpreted as if they
|
|
appeared in the <i>awk</i> program preceded and followed by a double-quote ( <tt>'"'</tt> ) character, as a <b>STRING</b> token (see
|
|
<a href="#tag_04_06_13_16">Grammar</a> ), except that if the last character is an unescaped backslash, it shall be interpreted as a
|
|
literal backslash rather than as the first character of the sequence <tt>"\""</tt> . The variable shall be assigned the value of
|
|
that <b>STRING</b> token and, if appropriate, shall be considered a <i>numeric string</i> (see <a href=
"#tag_04_06_13_02">Expressions in awk</a> ), in which case the variable shall also be assigned its numeric value. Each such variable assignment
|
|
shall occur just prior to the processing of the following <i>file</i>, if any. Thus, an assignment before the first <i>file</i>
|
|
argument shall be executed after the <b>BEGIN</b> actions (if any), while an assignment after the last <i>file</i> argument shall
|
|
occur before the <b>END</b> actions (if any). If there are no <i>file</i> arguments, assignments shall be executed before
|
|
processing the standard input.</dd>
|
|
</dl>
|
|
</dd>
|
|
</dl>
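<p>The timing of <i>assignment</i> operands relative to <b>BEGIN</b>, record processing, and <b>END</b> can be sketched as follows. This is an informal example: the variable name <tt>x</tt> and the single input record are assumptions, and the <tt>'-'</tt> operand is used so that the standard input serves as the only <i>file</i>.</p>

```shell
# Informal sketch: assignment operands take effect just before the following file.
out=$(printf 'one line\n' | awk '
    BEGIN { print "BEGIN: x=[" x "]" }     # runs before any assignment operand
          { print "record: x=[" x "]" }    # x=1 was assigned just before reading "-"
    END   { print "END: x=[" x "]" }       # x=2 was assigned after the last file
' x=1 - x=2)
printf '%s\n' "$out"
```

<p>The expected output is <tt>BEGIN: x=[]</tt>, <tt>record: x=[1]</tt>, and <tt>END: x=[2]</tt>, one per line. Use <b>-v</b> instead when the value has to be available in <b>BEGIN</b> actions.</p>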
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_06"></a>STDIN</h4>
|
|
|
|
<blockquote>
|
|
<p>The standard input shall be used only if no <i>file</i> operands are specified, or if a <i>file</i> operand is <tt>'-'</tt> ;
|
|
see the INPUT FILES section. If the <i>awk</i> program contains no actions and no patterns, but is otherwise a valid <i>awk</i>
|
|
program, standard input and any <i>file</i> operands shall not be read and <i>awk</i> shall exit with a return status of zero.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_07"></a>INPUT FILES</h4>
|
|
|
|
<blockquote>
|
|
<p>Input files to the <i>awk</i> program from any of the following sources shall be text files:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Any <i>file</i> operands or their equivalents, achieved by modifying the <i>awk</i> variables <b>ARGV</b> and <b>ARGC</b></p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Standard input in the absence of any <i>file</i> operands</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Arguments to the <b>getline</b> function</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>For these files, whether or not the variable <b>RS</b> is set to a value other than a <newline>, implementations shall
support records terminated with the specified separator up to {LINE_MAX} bytes and may support longer records.</p>
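<p>As a small, non-normative sketch of a record separator other than <newline>, the following reads semicolon-terminated records. The sample data is an assumption made for the example.</p>

```shell
# Informal sketch: records terminated by ";" instead of a newline.
out=$(printf 'a b;c d;e f;' | awk 'BEGIN { RS = ";" } { print NR ": " $1 }')
printf '%s\n' "$out"
```

<p>Each semicolon ends one record, so three records are read and the first field of each is printed with its record number.</p>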
|
|
|
|
<p>If <b>-f</b> <i>progfile</i> is specified, the application shall ensure that the files named by each of the <i>progfile</i>
|
|
option-arguments are text files and their concatenation, in the same order as they appear in the arguments, is an <i>awk</i>
|
|
program.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_08"></a>ENVIRONMENT VARIABLES</h4>
|
|
|
|
<blockquote>
|
|
<p>The following environment variables shall affect the execution of <i>awk</i>:</p>
|
|
|
|
<dl compact>
|
|
<dt><i>LANG</i></dt>
|
|
|
|
<dd>Provide a default value for the internationalization variables that are unset or null. (See the Base Definitions volume of
|
|
IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap08.html#tag_08_02">Section 8.2, Internationalization Variables</a> for
|
|
the precedence of internationalization variables used to determine the values of locale categories.)</dd>
|
|
|
|
<dt><i>LC_ALL</i></dt>
|
|
|
|
<dd>If set to a non-empty string value, override the values of all the other internationalization variables.</dd>
|
|
|
|
<dt><i>LC_COLLATE</i></dt>
|
|
|
|
<dd>
|
|
Determine the locale for the behavior of ranges, equivalence classes, and multi-character collating elements within regular
|
|
expressions and in comparisons of string values.</dd>
|
|
|
|
<dt><i>LC_CTYPE</i></dt>
|
|
|
|
<dd>Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as
|
|
opposed to multi-byte characters in arguments and input files), the behavior of character classes within regular expressions, the
|
|
identification of characters as letters, and the mapping of uppercase and lowercase characters for the <b>toupper</b> and
|
|
<b>tolower</b> functions.</dd>
|
|
|
|
<dt><i>LC_MESSAGES</i></dt>
|
|
|
|
<dd>Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard
|
|
error.</dd>
|
|
|
|
<dt><i>LC_NUMERIC</i></dt>
|
|
|
|
<dd>
|
|
Determine the radix character used when interpreting numeric input, performing conversions between numeric and string values, and
|
|
formatting numeric output. Regardless of locale, the period character (the decimal-point character of the POSIX locale) is the
|
|
decimal-point character recognized in processing <i>awk</i> programs (including assignments in command line arguments).</dd>
|
|
|
|
<dt><i>NLSPATH</i></dt>
|
|
|
|
<dd><sup>[<a href="javascript:open_code('XSI')">XSI</a>]</sup> <img src="../images/opt-start.gif" alt="[Option Start]" border="0">
|
|
Determine the location of message catalogs for the processing of <i>LC_MESSAGES</i>. <img src="../images/opt-end.gif" alt=
|
|
"[Option End]" border="0"></dd>
|
|
|
|
<dt><i>PATH</i></dt>
|
|
|
|
<dd>Determine the search path when looking for commands executed by <i>system</i>(<i>expr</i>), or input and output pipes; see the
|
|
Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap08.html">Chapter 8, Environment
|
|
Variables</a>.</dd>
|
|
</dl>
|
|
|
|
<p>In addition, all environment variables shall be visible via the <i>awk</i> variable <b>ENVIRON</b>.</p>
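<p>A minimal sketch of reading the environment through <b>ENVIRON</b> follows. The environment variable name <tt>GREETING</tt> is hypothetical and is set only for the duration of the command.</p>

```shell
# Informal sketch: environment variables are visible through the ENVIRON array.
out=$(GREETING='hello world' awk 'BEGIN { print ENVIRON["GREETING"] }')
printf '%s\n' "$out"
```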
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_09"></a>ASYNCHRONOUS EVENTS</h4>
|
|
|
|
<blockquote>
|
|
<p>Default.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_10"></a>STDOUT</h4>
|
|
|
|
<blockquote>
|
|
<p>The nature of the output files depends on the <i>awk</i> program.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_11"></a>STDERR</h4>
|
|
|
|
<blockquote>
|
|
<p>The standard error shall be used only for diagnostic messages.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_12"></a>OUTPUT FILES</h4>
|
|
|
|
<blockquote>
|
|
<p>The nature of the output files depends on the <i>awk</i> program.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_13"></a>EXTENDED DESCRIPTION</h4>
|
|
|
|
<blockquote>
|
|
<h5><a name="tag_04_06_13_01"></a>Overall Program Structure</h5>
|
|
|
|
<p>An <i>awk</i> program is composed of pairs of the form:</p>
|
|
|
|
<pre>
<i>pattern</i> <tt>{</tt> <i>action</i> <tt>}
</tt>
</pre>
|
|
|
|
<p>Either the pattern or the action (including the enclosing brace characters) can be omitted.</p>
|
|
|
|
<p>A missing pattern shall match any record of input, and a missing action shall be equivalent to:</p>
|
|
|
|
<pre>
<tt>{ print }
</tt>
</pre>
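<p>The two omissions can be sketched informally as follows. The three-line sample input and the shell variable names are assumptions made for the example.</p>

```shell
# Informal sketch: a rule may omit either its action or its pattern.
# Pattern only: the missing action is equivalent to { print }.
only_pattern=$(printf 'a\nbb\nccc\n' | awk 'length($0) > 1')
# Action only: the missing pattern matches every input record.
only_action=$(printf 'a\nbb\nccc\n' | awk '{ print NR }')
printf '%s\n' "$only_pattern" "$only_action"
```

<p>The first command prints the records longer than one character (<tt>bb</tt> and <tt>ccc</tt>); the second prints the record number of every record.</p>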
|
|
|
|
<p>Execution of the <i>awk</i> program shall start by first executing the actions associated with all <b>BEGIN</b> patterns in the
|
|
order they occur in the program. Then each <i>file</i> operand (or standard input if no files were specified) shall be processed in
|
|
turn by reading data from the file until a record separator is seen ( <newline> by default). Before the first reference to a
|
|
field in the record is evaluated, the record shall be split into fields, according to the rules in <a href=
|
|
"#tag_04_06_13_04">Regular Expressions</a> , using the value of <b>FS</b> that was current at the time the record was read. Each
|
|
pattern in the program then shall be evaluated in the order of occurrence, and the action associated with each pattern that matches
|
|
the current record executed. The action for a matching pattern shall be executed before evaluating subsequent patterns. Finally,
|
|
the actions associated with all <b>END</b> patterns shall be executed in the order they occur in the program.</p>
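<p>The execution order described above can be sketched as follows (an informal example with assumed input). Note that the action of the first rule modifies the record before the pattern of the second rule is evaluated.</p>

```shell
# Informal sketch: BEGIN actions, then per-record rules in program order, then END.
out=$(printf 'a\nz\n' | awk '
    BEGIN { print "begin" }
    /a/   { $0 = "b" }                  # executed before the next pattern is evaluated
    /b/   { print "matched b: " $0 }    # sees the record as modified above
    END   { print "end after " NR " records" }
')
printf '%s\n' "$out"
```

<p>The record <tt>a</tt> is rewritten to <tt>b</tt> by the first rule and therefore matches the second; the record <tt>z</tt> matches neither rule.</p>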
|
|
|
|
<h5><a name="tag_04_06_13_02"></a>Expressions in awk</h5>
|
|
|
|
<p>Expressions describe computations used in <i>patterns</i> and <i>actions</i>. In the following table, valid expression
|
|
operations are given in groups from highest precedence first to lowest precedence last, with equal-precedence operators grouped
|
|
between horizontal lines. In expression evaluation, where the grammar is formally ambiguous, higher precedence operators shall be
|
|
evaluated before lower precedence operators. In this table <i>expr</i>, <i>expr1</i>, <i>expr2</i>, and <i>expr3</i> represent any
|
|
expression, while lvalue represents any entity that can be assigned to (that is, on the left side of an assignment operator). The
|
|
precise syntax of expressions is given in <a href="#tag_04_06_13_16">Grammar</a> .</p>
|
|
|
|
<center><a name="tagtcjh_10"></a><b>Table: Expressions in Decreasing Precedence in <i>awk</i></b></center>
|
|
|
|
<center>
|
|
<table border="1" cellpadding="3" align="center">
|
|
<tr valign="top">
|
|
<th align="center">
|
|
<p class="tent"><b>Syntax</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>Name</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>Type of Result</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>Associativity</b></p>
|
|
</th>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">( <i>expr</i> )</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Grouping</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Type of <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">N/A</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">$<i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Field reference</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">String</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">N/A</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">++ lvalue</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Pre-increment</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">N/A</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">-- lvalue</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Pre-decrement</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">N/A</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">lvalue ++</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Post-increment</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">N/A</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">lvalue --</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Post-decrement</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">N/A</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> ^ <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Exponentiation</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Right</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">! <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Logical not</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">N/A</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">+ <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Unary plus</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">N/A</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">- <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Unary minus</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">N/A</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> * <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Multiplication</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Left</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> / <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Division</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Left</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> % <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Modulus</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Left</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> + <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Addition</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Left</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> - <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Subtraction</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Left</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">String concatenation</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">String</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Left</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> < <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Less than</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">None</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> <= <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Less than or equal to</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">None</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> != <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Not equal to</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">None</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> == <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Equal to</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">None</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> > <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Greater than</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">None</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> >= <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Greater than or equal to</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">None</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> ~ <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">ERE match</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">None</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> !~ <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">ERE non-match</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">None</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> in <i>array</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Array membership</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Left</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
<td align="left">
<p class="tent">( <i>index</i> ) in <i>array</i></p>
</td>
<td align="left">
<p class="tent">Multi-dimension array membership</p>
</td>
<td align="left">
<p class="tent">Numeric</p>
</td>
<td align="left">
<p class="tent">Left</p>
</td>
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> && <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Logical AND</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Left</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><i>expr</i> || <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Logical OR</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Left</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
<td align="left">
<p class="tent"><i>expr1</i> ? <i>expr2</i> : <i>expr3</i></p>
</td>
<td align="left">
<p class="tent">Conditional expression</p>
</td>
<td align="left">
<p class="tent">Type of selected <i>expr2</i> or <i>expr3</i></p>
</td>
<td align="left">
<p class="tent">Right</p>
</td>
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">lvalue ^= <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Exponentiation assignment</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Right</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">lvalue %= <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Modulus assignment</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Right</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">lvalue *= <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Multiplication assignment</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Right</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">lvalue /= <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Division assignment</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Right</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">lvalue += <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Addition assignment</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Right</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">lvalue -= <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Subtraction assignment</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Numeric</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Right</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">lvalue = <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Assignment</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Type of <i>expr</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Right</p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
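<p>A few rows of the table can be checked with a short, informal sketch. The expressions are chosen for illustration only and are not part of the normative text.</p>

```shell
# Informal sketch: precedence and associativity from the table above.
out=$(awk 'BEGIN {
    print 2 ^ 3 ^ 2        # exponentiation is right associative: 2 ^ (3 ^ 2)
    print 2 + 3 * 4        # multiplication binds tighter than addition
    print 1 " " 2 + 3      # addition binds tighter than string concatenation
    print 10 - 4 - 3       # subtraction is left associative: (10 - 4) - 3
}')
printf '%s\n' "$out"
```

<p>The expected output lines are <tt>512</tt>, <tt>14</tt>, <tt>1 5</tt>, and <tt>3</tt>. Parenthesizing with the grouping operator is the simplest way to avoid relying on these rules.</p>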
|
|
|
|
<p>Each expression shall have either a string value, a numeric value, or both. Except as stated for specific contexts, the value of
|
|
an expression shall be implicitly converted to the type needed for the context in which it is used. A string value shall be
|
|
converted to a numeric value by the equivalent of the following calls to functions defined by the ISO C standard:</p>
|
|
|
|
<pre>
<tt>setlocale(LC_NUMERIC, "");
</tt><i>numeric_value</i> <tt>= atof(</tt><i>string_value</i><tt>);
</tt>
</pre>
|
|
|
|
<p>A numeric value that is exactly equal to the value of an integer (see <a href="xcu_chap01.html#tag_01_07_02"><i>Concepts Derived
|
|
from the ISO C Standard</i></a> ) shall be converted to a string by the equivalent of a call to the <b>sprintf</b> function (see <a
|
|
href="#tag_04_06_13_13">String Functions</a> ) with the string <tt>"%d"</tt> as the <i>fmt</i> argument and the numeric value being
|
|
converted as the first and only <i>expr</i> argument. Any other numeric value shall be converted to a string by the equivalent of a
|
|
call to the <b>sprintf</b> function with the value of the variable <b>CONVFMT</b> as the <i>fmt</i> argument and the numeric value
|
|
being converted as the first and only <i>expr</i> argument. The result of the conversion is unspecified if the value of
|
|
<b>CONVFMT</b> is not a floating-point format specification. This volume of IEEE Std 1003.1-2001 specifies no explicit
|
|
conversions between numbers and strings. An application can force an expression to be treated as a number by adding zero to it, or
|
|
can force it to be treated as a string by concatenating the null string ( <tt>""</tt> ) to it.</p>
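<p>The implicit conversions above can be sketched informally as follows. The variable names and the <b>CONVFMT</b> value <tt>"%.2f"</tt> are assumptions made for the example, and the results assume a locale whose radix character is a period.</p>

```shell
# Informal sketch: number-to-string and string-to-number conversions.
out=$(awk 'BEGIN {
    CONVFMT = "%.2f"
    pi = 3.14159
    n  = 12
    print (pi "")          # non-integer value: converted using CONVFMT
    print (n "")           # integer value: converted as if with "%d"
    print ("3.5abc" + 0)   # string to number: leading numeric prefix, as with atof()
    print ("abc" + 0)      # a string with no numeric prefix converts to 0
}')
printf '%s\n' "$out"
```

<p>Concatenating the null string forces the string conversion, and adding zero forces the numeric conversion, as the paragraph above describes.</p>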
|
|
|
|
<p>A string value shall be considered a <i>numeric string</i> if it comes from one of the following:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>Field variables</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Input from the <i>getline</i>() function</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p><b>FILENAME</b></p>
|
|
</li>
|
|
|
|
<li>
|
|
<p><b>ARGV</b> array elements</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p><b>ENVIRON</b> array elements</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Array elements created by the <i>split</i>() function</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>A command line variable assignment</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Variable assignment from another numeric string variable</p>
|
|
</li>
|
|
</ol>
|
|
|
|
<p>and after all the following conversions have been applied, the resulting string would lexically be recognized as a <b>NUMBER</b>
|
|
token as described by the lexical conventions in <a href="#tag_04_06_13_16">Grammar</a> :</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>All leading and trailing <blank>s are discarded.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>If the first non- <blank> is <tt>'+'</tt> or <tt>'-'</tt> , it is discarded.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Each occurrence of the decimal point character from the current locale is changed to a period.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>If a <tt>'-'</tt> character is discarded in the preceding steps, the numeric value of the <i>numeric string</i> shall be the
|
|
negation of the numeric value of the recognized <b>NUMBER</b> token. Otherwise, the numeric value of the <i>numeric string</i>
|
|
shall be the numeric value of the recognized <b>NUMBER</b> token. Whether or not a string is a <i>numeric string</i> shall be
|
|
relevant only in contexts where that term is used in this section.</p>
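<p>One context in which the term matters is comparison, whose rules are given elsewhere in this specification. The following informal sketch assumes those rules: a field that looks like a number is compared numerically, whereas a string constant is not a <i>numeric string</i> and is compared as a string. The sample input is an assumption made for the example.</p>

```shell
# Informal sketch: fields can be numeric strings; string constants are not.
out=$(printf '10.0 9\n' | awk '{
    print ($1 == 10)        # field is a numeric string: numeric comparison
    print ($2 < $1)         # 9 < 10.0 when both are compared numerically
    print ("10.0" == 10)    # string constant: compared as strings, "10.0" vs "10"
}')
printf '%s\n' "$out"
```

<p>The expected output lines are <tt>1</tt>, <tt>1</tt>, and <tt>0</tt>.</p>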
|
|
|
|
<p>When an expression is used in a Boolean context, if it has a numeric value, a value of zero shall be treated as false and any
|
|
other value shall be treated as true. Otherwise, a string value of the null string shall be treated as false and any other value
|
|
shall be treated as true. A Boolean context shall be one of the following:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>The first subexpression of a conditional expression</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>An expression operated on by logical NOT, logical AND, or logical OR</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The second expression of a <b>for</b> statement</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The expression of an <b>if</b> statement</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The expression of the <b>while</b> clause in either a <b>while</b> or <b>do</b>... <b>while</b> statement</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>An expression used as a pattern (as in Overall Program Structure)</p>
|
|
</li>
|
|
</ul>
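<p>The truth rules can be sketched informally using the first subexpression of a conditional expression as the Boolean context. The values tested are chosen for illustration only.</p>

```shell
# Informal sketch: truth values in a Boolean context.
out=$(awk 'BEGIN {
    print (0   ? "true" : "false")    # numeric zero is false
    print (0.5 ? "true" : "false")    # any other numeric value is true
    print (""  ? "true" : "false")    # the null string is false
    print ("0" ? "true" : "false")    # a non-null string constant is true
}')
printf '%s\n' "$out"
```

<p>The last line shows that the string constant <tt>"0"</tt> has only a string value, so it is tested as a non-null string rather than as the number zero.</p>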
|
|
|
|
<p>All arithmetic shall follow the semantics of floating-point arithmetic as specified by the ISO C standard (see <a href=
|
|
"xcu_chap01.html#tag_01_07_02"><i>Concepts Derived from the ISO C Standard</i></a> ).</p>
|
|
|
|
<p>The value of the expression:</p>
|
|
|
|
<pre>
<i>expr1</i> <tt>^</tt> <i>expr2</i>
</pre>
|
|
|
|
<p>shall be equivalent to the value returned by the ISO C standard function call:</p>
|
|
|
|
<pre>
<tt>pow(</tt><i>expr1</i><tt>,</tt> <i>expr2</i><tt>)
</tt>
</pre>
|
|
|
|
<p>The expression:</p>
|
|
|
|
<pre>
|
|
<tt>lvalue ^=</tt> <i>expr</i>
|
|
</pre>
|
|
|
|
<p>shall be equivalent to the ISO C standard expression:</p>
|
|
|
|
<pre>
|
|
<tt>lvalue = pow(lvalue,</tt> <i>expr</i><tt>)
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>except that lvalue shall be evaluated only once. The value of the expression:</p>
|
|
|
|
<pre>
|
|
<i>expr1</i> <tt>%</tt> <i>expr2</i>
|
|
</pre>
|
|
|
|
<p>shall be equivalent to the value returned by the ISO C standard function call:</p>
|
|
|
|
<pre>
|
|
<tt>fmod(</tt><i>expr1</i><tt>,</tt> <i>expr2</i><tt>)
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>The expression:</p>
|
|
|
|
<pre>
|
|
<tt>lvalue %=</tt> <i>expr</i>
|
|
</pre>
|
|
|
|
<p>shall be equivalent to the ISO C standard expression:</p>
|
|
|
|
<pre>
|
|
<tt>lvalue = fmod(lvalue,</tt> <i>expr</i><tt>)
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>except that lvalue shall be evaluated only once.</p>
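<p>As a rough illustration of the <tt>pow()</tt> and <tt>fmod()</tt> equivalences described above, the following sketch evaluates a few expressions. The function name <tt>arith_demo</tt> and the variable names are assumptions made for this example only.</p>

```shell
# Hypothetical helper: exercises ^, %, ^= and %= as described above.
arith_demo() {
  awk 'BEGIN {
    p = 2 ^ 10        # pow(2, 10)
    r = -7 % 3        # fmod(-7, 3): the result takes the sign of the dividend
    f = 7.5 % 2       # fmod(7.5, 2): operands need not be integers
    x = 3;  x ^= 2    # x = pow(x, 2)
    y = 17; y %= 5    # y = fmod(y, 5)
    print p, r, f, x, y
  }'
}
arith_demo
```

<p>Because <tt>%</tt> follows <tt>fmod()</tt>, a negative left operand yields a negative (or zero) remainder.</p>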
|
|
|
|
<p>Variables and fields shall be set by the assignment statement:</p>
|
|
|
|
<pre>
|
|
<tt>lvalue =</tt> <i>expression</i>
|
|
</pre>
|
|
|
|
<p>and the type of <i>expression</i> shall determine the resulting variable type. The assignment includes the arithmetic
|
|
assignments ( <tt>"+="</tt> , <tt>"-="</tt> , <tt>"*="</tt> , <tt>"/="</tt> , <tt>"%="</tt> , <tt>"^="</tt> , <tt>"++"</tt> ,
|
|
<tt>"--"</tt> ) all of which shall produce a numeric result. The left-hand side of an assignment and the target of increment and
|
|
decrement operators can be one of a variable, an array with index, or a field selector.</p>
|
|
|
|
<p>The <i>awk</i> language supplies arrays that are used for storing numbers or strings. Arrays need not be declared. They shall
|
|
initially be empty, and their sizes shall change dynamically. The subscripts, or element identifiers, are strings, providing a type
|
|
of associative array capability. An array name followed by a subscript within square brackets can be used as an lvalue and thus as
|
|
an expression, as described in the grammar; see <a href="#tag_04_06_13_16">Grammar</a> . Unsubscripted array names can be used in
|
|
only the following contexts:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>A parameter in a function definition or function call</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The <b>NAME</b> token following any use of the keyword <b>in</b> as specified in the grammar (see <a href=
|
|
"#tag_04_06_13_16">Grammar</a> ); if the name used in this context is not an array name, the behavior is undefined</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>A valid array <i>index</i> shall consist of one or more comma-separated expressions, similar to the way in which
|
|
multi-dimensional arrays are indexed in some programming languages. Because <i>awk</i> arrays are really one-dimensional, such a
|
|
comma-separated list shall be converted to a single string by concatenating the string values of the separate expressions, each
|
|
separated from the other by the value of the <b>SUBSEP</b> variable. Thus, the following two index operations shall be
|
|
equivalent:</p>
|
|
|
|
<pre>
|
|
<i>var</i><b>[</b><i>expr1</i><tt>,</tt> <i>expr2</i><tt>, ...</tt> <i>exprn</i><b>]
|
|
<br>
|
|
</b><i>var</i><b>[</b><i>expr1</i> <tt>SUBSEP</tt> <i>expr2</i> <tt>SUBSEP ... SUBSEP</tt> <i>exprn</i><b>]</b>
|
|
</pre>
|
|
|
|
<p>The application shall ensure that a multi-dimensioned <i>index</i> used with the <b>in</b> operator is parenthesized. The
|
|
<b>in</b> operator, which tests for the existence of a particular array element, shall not cause that element to exist. Any other
|
|
reference to a nonexistent array element shall automatically create it.</p>
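<p>The subscript rules above might be exercised as follows. This is a minimal sketch; <tt>subscript_demo</tt> and the array and variable names are hypothetical.</p>

```shell
# Hypothetical helper: multi-dimensional subscripts and the in operator.
subscript_demo() {
  awk 'BEGIN {
    a["x", "y"] = 1
    key = "x" SUBSEP "y"
    by_key  = (key in a)           # same element, addressed by the joined string
    by_list = (("x", "y") in a)    # a multi-dimensional index must be parenthesized
    print by_key, by_list

    probe = ("z" in a)             # testing with in does not create a["z"]
    n = 0; for (k in a) n++
    print probe, n

    v = a["z"]                     # any other reference creates the element
    n = 0; for (k in a) n++
    print n
  }'
}
subscript_demo
```

<p>The element count goes from 1 to 2 only after the plain reference, not after the <b>in</b> test.</p>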
|
|
|
|
<p>Comparisons (with the <tt>'&lt;'</tt> , <tt>"&lt;="</tt> , <tt>"!="</tt> , <tt>"=="</tt> , <tt>'&gt;'</tt> , and
<tt>"&gt;="</tt> operators) shall be made numerically if both operands are numeric, if one is numeric and the other has a string
value that is a numeric string, or if one is numeric and the other has the uninitialized value. Otherwise, operands shall be
converted to strings as required and a string comparison shall be made using the locale-specific collation sequence. The value of
the comparison expression shall be 1 if the relation is true, or 0 if the relation is false.</p>
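<p>The difference between numeric and string comparison can be sketched as below. The function name <tt>compare_demo</tt> is hypothetical, and the example sets <tt>LC_ALL=C</tt> as an assumption so that the collation sequence is predictable.</p>

```shell
# Hypothetical helper: the same digits compare differently as numbers and as strings.
compare_demo() {
  echo "10 9" | LC_ALL=C awk '{
    fields  = ($1 < $2)      # input fields that look numeric: 10 < 9 is false
    strings = ("10" < "9")   # string constants: "1" collates before "9"
    mixed   = ($1 < "9")     # a string constant forces a string comparison
    print fields, strings, mixed
  }'
}
compare_demo
```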
|
|
|
|
<h5><a name="tag_04_06_13_03"></a>Variables and Special Variables</h5>
|
|
|
|
<p>Variables can be used in an <i>awk</i> program by referencing them. With the exception of function parameters (see <a href=
|
|
"#tag_04_06_13_15">User-Defined Functions</a> ), they are not explicitly declared. Function parameter names shall be local to the
|
|
function; all other variable names shall be global. The same name shall not be used as both a function parameter name and as the
|
|
name of a function or a special <i>awk</i> variable. The same name shall not be used both as a variable name with global scope and
|
|
as the name of a function. The same name shall not be used within the same scope both as a scalar variable and as an array.
|
|
Uninitialized variables, including scalar variables, array elements, and field variables, shall have an uninitialized value. An
|
|
uninitialized value shall have both a numeric value of zero and a string value of the empty string. Evaluation of variables with an
|
|
uninitialized value, to either string or numeric, shall be determined by the context in which they are used.</p>
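<p>A short sketch of the uninitialized value, which behaves as zero or as the empty string depending on context. The names <tt>uninit_demo</tt> and <tt>x</tt> are assumptions for illustration.</p>

```shell
# Hypothetical helper: x is never assigned, so it has the uninitialized value.
uninit_demo() {
  awk 'BEGIN {
    is_zero  = (x == 0)    # numeric comparison against 0
    is_empty = (x == "")   # string comparison against the empty string
    sum = x + 1            # numeric context: treated as 0
    str = "[" x "]"        # string context: treated as ""
    print is_zero, is_empty, sum, str
  }'
}
uninit_demo
```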
|
|
|
|
<p>Field variables shall be designated by a <tt>'$'</tt> followed by a number or numerical expression. The effect of the field
|
|
number <i>expression</i> evaluating to anything other than a non-negative integer is unspecified; uninitialized variables or string
|
|
values need not be converted to numeric values in this context. New field variables can be created by assigning a value to them.
|
|
References to nonexistent fields (that is, fields after $<b>NF</b>) shall evaluate to the uninitialized value. Such references
|
|
shall not create new fields. However, assigning to a nonexistent field (for example, $(<b>NF</b>+2)=5) shall increase the value of
|
|
<b>NF</b>; create any intervening fields with the uninitialized value; and cause the value of $0 to be recomputed, with the fields
|
|
being separated by the value of <b>OFS</b>. Each field variable shall have a string value or an uninitialized value when created.
|
|
Field variables shall have the uninitialized value when created from $0 using <b>FS</b> and the variable does not contain any
|
|
characters. If appropriate, the field variable shall be considered a numeric string (see <a href="#tag_04_06_13_02">Expressions in
|
|
awk</a> ).</p>
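<p>The field rules above can be illustrated with a one-record input. This is a sketch only; <tt>field_demo</tt> and <tt>probe</tt> are hypothetical names.</p>

```shell
# Hypothetical helper: reading versus assigning a field beyond $NF.
field_demo() {
  echo "a b c" | awk 'BEGIN { OFS = "-" } {
    probe = $(NF + 3)    # reading a nonexistent field does not create it
    print NF
    $(NF + 2) = "e"      # assigning creates $4 (empty) and $5, and rebuilds $0 with OFS
    print NF
    print
  }'
}
field_demo
```

<p>After the assignment, <b>NF</b> is 5 and the recomputed record contains an empty fourth field between two <b>OFS</b> separators.</p>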
|
|
|
|
<p>Implementations shall support the following other special variables that are set by <i>awk</i>:</p>
|
|
|
|
<dl compact>
|
|
<dt><b>ARGC</b></dt>
|
|
|
|
<dd>The number of elements in the <b>ARGV</b> array.</dd>
|
|
|
|
<dt><b>ARGV</b></dt>
|
|
|
|
<dd>An array of command line arguments, excluding options and the <i>program</i> argument, numbered from zero to <b>ARGC</b>-1.
|
|
|
|
<p>The arguments in <b>ARGV</b> can be modified or added to; <b>ARGC</b> can be altered. As each input file ends, <i>awk</i> shall
|
|
treat the next non-null element of <b>ARGV</b>, up to the current value of <b>ARGC</b>-1, inclusive, as the name of the next input
|
|
file. Thus, setting an element of <b>ARGV</b> to null means that it shall not be treated as an input file. The name <tt>'-'</tt>
|
|
indicates the standard input. If an argument matches the format of an <i>assignment</i> operand, this argument shall be treated as
|
|
an <i>assignment</i> rather than a <i>file</i> argument.</p>
|
|
</dd>
|
|
|
|
<dt><b>CONVFMT</b></dt>
|
|
|
|
<dd>The <b>printf</b> format for converting numbers to strings (except for output statements, where <b>OFMT</b> is used);
|
|
<tt>"%.6g"</tt> by default.</dd>
|
|
|
|
<dt><b>ENVIRON</b></dt>
|
|
|
|
<dd>An array representing the value of the environment, as described in the <i>exec</i> functions defined in the System Interfaces
|
|
volume of IEEE Std 1003.1-2001. The indices of the array shall be strings consisting of the names of the environment
|
|
variables, and the value of each array element shall be a string consisting of the value of that variable. If appropriate, the
|
|
environment variable shall be considered a <i>numeric string</i> (see <a href="#tag_04_06_13_02">Expressions in awk</a> ); the
|
|
array element shall also have its numeric value.
|
|
|
|
<p>In all cases where the behavior of <i>awk</i> is affected by environment variables (including the environment of any commands
|
|
that <i>awk</i> executes via the <b>system</b> function or via pipeline redirections with the <b>print</b> statement, the
|
|
<b>printf</b> statement, or the <b>getline</b> function), the environment used shall be the environment at the time <i>awk</i>
|
|
began executing; it is implementation-defined whether any modification of <b>ENVIRON</b> affects this environment.</p>
|
|
</dd>
|
|
|
|
<dt><b>FILENAME</b></dt>
|
|
|
|
<dd>A pathname of the current input file. Inside a <b>BEGIN</b> action the value is undefined. Inside an <b>END</b> action the
|
|
value shall be the name of the last input file processed.</dd>
|
|
|
|
<dt><b>FNR</b></dt>
|
|
|
|
<dd>The ordinal number of the current record in the current file. Inside a <b>BEGIN</b> action the value shall be zero. Inside an
|
|
<b>END</b> action the value shall be the number of the last record processed in the last file processed.</dd>
|
|
|
|
<dt><b>FS</b></dt>
|
|
|
|
<dd>Input field separator regular expression; a &lt;space&gt; by default.</dd>
|
|
|
|
<dt><b>NF</b></dt>
|
|
|
|
<dd>The number of fields in the current record. Inside a <b>BEGIN</b> action, the use of <b>NF</b> is undefined unless a
|
|
<b>getline</b> function without a <i>var</i> argument is executed previously. Inside an <b>END</b> action, <b>NF</b> shall retain
|
|
the value it had for the last record read, unless a subsequent, redirected, <b>getline</b> function without a <i>var</i> argument
|
|
is performed prior to entering the <b>END</b> action.</dd>
|
|
|
|
<dt><b>NR</b></dt>
|
|
|
|
<dd>The ordinal number of the current record from the start of input. Inside a <b>BEGIN</b> action the value shall be zero. Inside
|
|
an <b>END</b> action the value shall be the number of the last record processed.</dd>
|
|
|
|
<dt><b>OFMT</b></dt>
|
|
|
|
<dd>The <b>printf</b> format for converting numbers to strings in output statements (see <a href="#tag_04_06_13_10">Output
|
|
Statements</a> ); <tt>"%.6g"</tt> by default. The result of the conversion is unspecified if the value of <b>OFMT</b> is not a
|
|
floating-point format specification.</dd>
|
|
|
|
<dt><b>OFS</b></dt>
|
|
|
|
<dd>The <b>print</b> statement output field separator; a &lt;space&gt; by default.</dd>
|
|
|
|
<dt><b>ORS</b></dt>
|
|
|
|
<dd>The <b>print</b> statement output record separator; a &lt;newline&gt; by default.</dd>
|
|
|
|
<dt><b>RLENGTH</b></dt>
|
|
|
|
<dd>The length of the string matched by the <b>match</b> function.</dd>
|
|
|
|
<dt><b>RS</b></dt>
|
|
|
|
<dd>The first character of the string value of <b>RS</b> shall be the input record separator; a &lt;newline&gt; by default. If
<b>RS</b> contains more than one character, the results are unspecified. If <b>RS</b> is null, then records are separated by
sequences consisting of a &lt;newline&gt; plus one or more blank lines; leading or trailing blank lines shall not result in empty
records at the beginning or end of the input; and a &lt;newline&gt; shall always be a field separator, no matter what the value of
<b>FS</b> is.</dd>
|
|
|
|
<dt><b>RSTART</b></dt>
|
|
|
|
<dd>The starting position of the string matched by the <b>match</b> function, numbering from 1. This shall always be equivalent to
|
|
the return value of the <b>match</b> function.</dd>
|
|
|
|
<dt><b>SUBSEP</b></dt>
|
|
|
|
<dd>The subscript separator string for multi-dimensional arrays; the default value is implementation-defined.</dd>
|
|
</dl>
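<p>A few of the special variables listed above can be observed with the following sketch. The three shell function names are hypothetical, and the sample inputs are invented for illustration.</p>

```shell
# Hypothetical helper: CONVFMT governs conversions, OFMT governs output statements.
convfmt_demo() {
  awk 'BEGIN {
    CONVFMT = "%.2f"; OFMT = "%.3f"
    x = 3.14159
    s = x ""     # number-to-string conversion uses CONVFMT
    print s
    print x      # the print statement uses OFMT
  }'
}

# Hypothetical helper: a null RS splits records at blank lines,
# and a newline always separates fields in that mode.
paragraph_demo() {
  printf 'a b\nc\n\n\nd e\n' | awk 'BEGIN { RS = "" } { print NR ": " NF }'
}

# Hypothetical helper: match sets RSTART and RLENGTH.
match_demo() {
  awk 'BEGIN { pos = match("foobar", /o+/); print pos, RSTART, RLENGTH }'
}

convfmt_demo
paragraph_demo
match_demo
```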
|
|
|
|
<h5><a name="tag_04_06_13_04"></a>Regular Expressions</h5>
|
|
|
|
<p>The <i>awk</i> utility shall make use of the extended regular expression notation (see the Base Definitions volume of
|
|
IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap09.html#tag_09_04">Section 9.4, Extended Regular Expressions</a>)
|
|
except that it shall allow the use of C-language conventions for escaping special characters within the EREs, as specified in the
|
|
table in the Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap05.html">Chapter 5, File
|
|
Format Notation</a> ( <tt>'\\'</tt> , <tt>'\a'</tt> , <tt>'\b'</tt> , <tt>'\f'</tt> , <tt>'\n'</tt> , <tt>'\r'</tt> , <tt>'\t'</tt>
|
|
, <tt>'\v'</tt> ) and the following table; these escape sequences shall be recognized both inside and outside bracket expressions.
|
|
Note that records need not be separated by &lt;newline&gt;s and string constants can contain &lt;newline&gt;s, so even the
|
|
<tt>"\n"</tt> sequence is valid in <i>awk</i> EREs. Using a slash character within an ERE requires the escaping shown in the
|
|
following table.<br>
|
|
</p>
|
|
|
|
<center><b>Table: Escape Sequences in <i>awk</i></b></center>
|
|
|
|
<center>
|
|
<table border="1" cellpadding="3" align="center">
|
|
<tr valign="top">
<th align="center">
<p class="tent"><b>Escape Sequence</b></p>
</th>
<th align="center">
<p class="tent"><b>Description</b></p>
</th>
<th align="center">
<p class="tent"><b>Meaning</b></p>
</th>
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">\"</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Backslash quotation-mark</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Quotation-mark character</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">\/</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Backslash slash</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Slash character</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">\ddd</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">A backslash character followed by the longest sequence of one, two, or three octal-digit characters (01234567). If
|
|
all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined.</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">The character whose encoding is represented by the one, two, or three-digit octal integer. Multi-byte characters
|
|
require multiple, concatenated escape sequences of this type, including the leading <tt>'\'</tt> for each byte.</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">\c</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">A backslash character followed by any character not described in this table or in the table in the Base Definitions
|
|
volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap05.html">Chapter 5, File Format Notation</a> ( <tt>'\\'</tt>
|
|
, <tt>'\a'</tt> , <tt>'\b'</tt> , <tt>'\f'</tt> , <tt>'\n'</tt> , <tt>'\r'</tt> , <tt>'\t'</tt> , <tt>'\v'</tt> ).</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Undefined</p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
|
|
|
|
<p>A regular expression can be matched against a specific field or string by using one of the two regular expression matching
operators, <tt>'~'</tt> and <tt>"!~"</tt> . These operators shall interpret their right-hand operand as a regular
expression and their left-hand operand as a string. If the regular expression matches the string, the <tt>'~'</tt> expression
shall evaluate to a value of 1, and the <tt>"!~"</tt> expression shall evaluate to a value of 0. (The regular expression
matching operation is as defined by the term matched in the Base Definitions volume of IEEE Std 1003.1-2001, <a href=
"../basedefs/xbd_chap09.html#tag_09_01">Section 9.1, Regular Expression Definitions</a>, where a match occurs on any part of the
string unless the regular expression is limited with the circumflex or dollar sign special characters.) If the regular expression
does not match the string, the <tt>'~'</tt> expression shall evaluate to a value of 0, and the <tt>"!~"</tt> expression
shall evaluate to a value of 1. If the right-hand operand is any expression other than the lexical token <b>ERE</b>, the string
value of the expression shall be interpreted as an extended regular expression, including the escape conventions described above.
Note that these same escape conventions shall also be applied in determining the value of a string literal (the lexical token
<b>STRING</b>), and thus shall be applied a second time when a string literal is used in this context.</p>
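<p>The double application of escape conventions to string literals can be sketched as follows. The function name <tt>ere_demo</tt> and the variable names are assumptions for this example.</p>

```shell
# Hypothetical helper: matching a literal period with an ERE token and with a string.
ere_demo() {
  awk 'BEGIN {
    dot_token   = ("a.b" ~ /a\.b/)     # ERE token: \. matches a literal period
    miss_token  = ("axb" ~ /a\.b/)
    dot_string  = ("a.b" ~ "a\\.b")    # string literal: \\ becomes \ first, then \. is the ERE
    miss_string = ("axb" ~ "a\\.b")
    negated     = ("axb" !~ /a\.b/)    # the negated operator yields the opposite value
    print dot_token, miss_token, dot_string, miss_string, negated
  }'
}
ere_demo
```

<p>In the string form, one level of backslash is consumed when the string literal is evaluated, so two backslashes are written to pass a single one to the regular expression.</p>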
|
|
|
|
<p>When an <b>ERE</b> token appears as an expression in any context other than as the right-hand operand of the <tt>'~'</tt> or
<tt>"!~"</tt> operator or as one of the built-in function arguments described below, the value of the resulting expression
shall be the equivalent of:</p>
|
|
|
|
<pre>
|
|
<tt>$0 ~ /</tt><i>ere</i><tt>/
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>The <i>ere</i> argument to the <b>gsub</b>, <b>match</b>, <b>sub</b> functions, and the <i>fs</i> argument to the <b>split</b>
|
|
function (see <a href="#tag_04_06_13_13">String Functions</a> ) shall be interpreted as extended regular expressions. These can be
|
|
either <b>ERE</b> tokens or arbitrary expressions, and shall be interpreted in the same manner as the right-hand side of the
|
|
<tt>'~'</tt> or <tt>"!~"</tt> operator.</p>
|
|
|
|
<p>An extended regular expression can be used to separate fields by using the <b>-F</b> <i>ERE</i> option or by assigning a string
|
|
containing the expression to the built-in variable <b>FS</b>. The default value of the <b>FS</b> variable shall be a single
|
|
&lt;space&gt;. The following describes <b>FS</b> behavior:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>If <b>FS</b> is a null string, the behavior is unspecified.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>If <b>FS</b> is a single character:</p>
|
|
|
|
<ol type="a">
|
|
<li>
|
|
<p>If <b>FS</b> is &lt;space&gt;, skip leading and trailing &lt;blank&gt;s; fields shall be delimited by sets of one or more
&lt;blank&gt;s.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Otherwise, if <b>FS</b> is any other character <i>c</i>, fields shall be delimited by each single occurrence of <i>c</i>.</p>
|
|
</li>
|
|
</ol>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Otherwise, the string value of <b>FS</b> shall be considered to be an extended regular expression. Each occurrence of a sequence
|
|
matching the extended regular expression shall delimit fields.</p>
|
|
</li>
|
|
</ol>
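<p>The three defined <b>FS</b> cases above might be demonstrated like this. It is a sketch under the assumption of simple ASCII input; <tt>fs_demo</tt> is a hypothetical function name.</p>

```shell
# Hypothetical helper: one input line per FS case.
fs_demo() {
  echo '  a   b  ' | awk '{ print NF }'                  # default FS: runs of blanks, edges skipped
  echo 'a,,b'      | awk -F, '{ print NF }'              # single character: every comma delimits
  echo 'a12b3c'    | awk -F'[0-9]+' '{ print NF, $2 }'   # ERE: each run of digits delimits
}
fs_demo
```

<p>With a single-character separator the empty field between the two commas is counted, whereas the default separator never produces empty fields.</p>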
|
|
|
|
<p>Except for the <tt>'~'</tt> and <tt>"!~"</tt> operators, and in the <b>gsub</b>, <b>match</b>, <b>split</b>, and
<b>sub</b> built-in functions, ERE matching shall be based on input records; that is, record separator characters (the first
character of the value of the variable <b>RS</b>, &lt;newline&gt; by default) cannot be embedded in the expression, and no
expression shall match the record separator character. If the record separator is not &lt;newline&gt;, &lt;newline&gt;s embedded in
the expression can be matched. For the <tt>'~'</tt> and <tt>"!~"</tt> operators, and in those four built-in functions,
ERE matching shall be based on text strings; that is, any character (including &lt;newline&gt; and the record separator) can be
embedded in the pattern, and an appropriate pattern shall match any character. However, in all <i>awk</i> ERE matching, the use of
one or more NUL characters in the pattern, input record, or text string produces undefined results.</p>
|
|
|
|
<h5><a name="tag_04_06_13_05"></a>Patterns</h5>
|
|
|
|
<p>A <i>pattern</i> is any valid <i>expression</i>, a range specified by two expressions separated by a comma, or one of the two
|
|
special patterns <b>BEGIN</b> or <b>END</b>.</p>
|
|
|
|
<h5><a name="tag_04_06_13_06"></a>Special Patterns</h5>
|
|
|
|
<p>The <i>awk</i> utility shall recognize two special patterns, <b>BEGIN</b> and <b>END</b>. Each <b>BEGIN</b> pattern shall be
|
|
matched once and its associated action executed before the first record of input is read (except possibly by use of the
<b>getline</b> function in a prior <b>BEGIN</b> action; see <a href="#tag_04_06_13_14">Input/Output and General Functions</a> ) and
|
|
before command line assignment is done. Each <b>END</b> pattern shall be matched once and its associated action executed after the
|
|
last record of input has been read. These two patterns shall have associated actions.</p>
|
|
|
|
<p><b>BEGIN</b> and <b>END</b> shall not combine with other patterns. Multiple <b>BEGIN</b> and <b>END</b> patterns shall be
|
|
allowed. The actions associated with the <b>BEGIN</b> patterns shall be executed in the order specified in the program, as are the
|
|
<b>END</b> actions. An <b>END</b> pattern can precede a <b>BEGIN</b> pattern in a program.</p>
|
|
|
|
<p>If an <i>awk</i> program consists of only actions with the pattern <b>BEGIN</b>, and the <b>BEGIN</b> action contains no
|
|
<b>getline</b> function, <i>awk</i> shall exit without reading its input when the last statement in the last <b>BEGIN</b> action is
|
|
executed. If an <i>awk</i> program consists of only actions with the pattern <b>END</b> or only actions with the patterns
|
|
<b>BEGIN</b> and <b>END</b>, the input shall be read before the statements in the <b>END</b> actions are executed.</p>
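<p>The ordering rules for <b>BEGIN</b> and <b>END</b> can be seen in a program that deliberately interleaves them. This is an illustrative sketch; <tt>begin_end_demo</tt> is a hypothetical name.</p>

```shell
# Hypothetical helper: BEGIN actions run in program order before input,
# END actions run in program order after input, regardless of interleaving.
begin_end_demo() {
  printf '1\n2\n' | awk '
    END   { print "end 1" }
    BEGIN { print "begin 1" }
          { n++ }
    BEGIN { print "begin 2" }
    END   { print "records: " n }
  '
}
begin_end_demo
```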
|
|
|
|
<h5><a name="tag_04_06_13_07"></a>Expression Patterns</h5>
|
|
|
|
<p>An expression pattern shall be evaluated as if it were an expression in a Boolean context. If the result is true, the pattern
|
|
shall be considered to match, and the associated action (if any) shall be executed. If the result is false, the action shall not be
|
|
executed.</p>
|
|
|
|
<h5><a name="tag_04_06_13_08"></a>Pattern Ranges</h5>
|
|
|
|
<p>A pattern range consists of two expressions separated by a comma; in this case, the action shall be performed for all records
|
|
between a match of the first expression and the following match of the second expression, inclusive. At this point, the pattern
|
|
range can be repeated starting at input records subsequent to the end of the matched range.</p>
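<p>A pattern range that matches twice in the same input can be sketched as follows. The marker words <tt>on</tt> and <tt>off</tt> and the function name <tt>range_demo</tt> are invented for this example.</p>

```shell
# Hypothetical helper: prints each inclusive block from an "on" line to the next "off" line.
range_demo() {
  printf '%s\n' x on y off z on w off | awk '/on/,/off/'
}
range_demo
```

<p>The lines <tt>x</tt> and <tt>z</tt> fall outside both ranges and are not printed; the range restarts at the second <tt>on</tt>.</p>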
|
|
|
|
<h5><a name="tag_04_06_13_09"></a>Actions</h5>
|
|
|
|
<p>An action is a sequence of statements as shown in the grammar in <a href="#tag_04_06_13_16">Grammar</a> . Any single statement
|
|
can be replaced by a statement list enclosed in braces. The application shall ensure that statements in a statement list are
|
|
separated by &lt;newline&gt;s or semicolons. Statements in a statement list shall be executed sequentially in the order that they
|
|
appear.</p>
|
|
|
|
<p>The <i>expression</i> acting as the conditional in an <b>if</b> statement shall be evaluated and if it is non-zero or non-null,
|
|
the following statement shall be executed; otherwise, if <b>else</b> is present, the statement following the <b>else</b> shall be
|
|
executed.</p>
|
|
|
|
<p>The <b>if</b>, <b>while</b>, <b>do</b>... <b>while</b>, <b>for</b>, <b>break</b>, and <b>continue</b> statements are based on
|
|
the ISO C standard (see <a href="xcu_chap01.html#tag_01_07_02"><i>Concepts Derived from the ISO C Standard</i></a> ), except
|
|
that the Boolean expressions shall be treated as described in <a href="#tag_04_06_13_02">Expressions in awk</a> , and except in the
|
|
case of:</p>
|
|
|
|
<pre>
|
|
<tt>for (</tt><i>variable</i> <tt>in</tt> <i>array</i><tt>)
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>which shall iterate, assigning each <i>index</i> of <i>array</i> to <i>variable</i> in an unspecified order. The results of
|
|
adding new elements to <i>array</i> within such a <b>for</b> loop are undefined. If a <b>break</b> or <b>continue</b> statement
|
|
occurs outside of a loop, the behavior is undefined.</p>
|
|
|
|
<p>The <b>delete</b> statement shall remove an individual array element. Thus, the following code deletes an entire array:</p>
|
|
|
|
<pre>
|
|
<tt>for (index in array)
|
|
delete array[index]
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>The <b>next</b> statement shall cause all further processing of the current input record to be abandoned. The behavior is
|
|
undefined if a <b>next</b> statement appears or is invoked in a <b>BEGIN</b> or <b>END</b> action.</p>
|
|
|
|
<p>The <b>exit</b> statement shall invoke all <b>END</b> actions in the order in which they occur in the program source and then
|
|
terminate the program without reading further input. An <b>exit</b> statement inside an <b>END</b> action shall terminate the
|
|
program without further execution of <b>END</b> actions. If an expression is specified in an <b>exit</b> statement, its numeric
|
|
value shall be the exit status of <i>awk</i>, unless subsequent errors are encountered or a subsequent <b>exit</b> statement with
|
|
an expression is executed.</p>
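<p>The interaction between <b>exit</b> and <b>END</b> actions can be sketched as below. The function name <tt>exit_demo</tt> is hypothetical, and standard input is redirected from <tt>/dev/null</tt> as a precaution.</p>

```shell
# Hypothetical helper: exit in BEGIN skips input but still runs the END action,
# and the expression given to exit becomes the exit status.
exit_demo() {
  awk 'BEGIN { exit 3 } END { print "END still runs" }' </dev/null
}
exit_demo || echo "exit status: $?"
```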
|
|
|
|
<h5><a name="tag_04_06_13_10"></a>Output Statements</h5>
|
|
|
|
<p>Both <b>print</b> and <b>printf</b> statements shall write to standard output by default. The output shall be written to the
|
|
location specified by <i>output_redirection</i> if one is supplied, as follows:</p>
|
|
|
|
<pre>
|
|
<tt>&gt;</tt> <i>expression</i>
<tt>&gt;&gt;</tt> <i>expression</i>
<tt>|</tt> <i>expression</i>
|
|
</pre>
|
|
|
|
<p>In all cases, the <i>expression</i> shall be evaluated to produce a string that is used as a pathname into which to write (for
<tt>'&gt;'</tt> or <tt>"&gt;&gt;"</tt> ) or as a command to be executed (for <tt>'|'</tt> ). Using the first two forms, if the file
of that name is not currently open, it shall be opened, creating it if necessary and, when the first form is used, truncating it. The
output then shall be appended to the file. As long as the file remains open, subsequent calls in which <i>expression</i> evaluates
to the same string value shall simply append output to the file. The file remains open until the <b>close</b> function (see <a
href="#tag_04_06_13_14">Input/Output and General Functions</a> ) is called with an expression that evaluates to the same string
value.</p>
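<p>The file redirection rules above can be sketched with a temporary file. This example assumes the <tt>mktemp</tt> utility is available; <tt>redirect_demo</tt> and the variable <tt>out</tt> are hypothetical names.</p>

```shell
# Hypothetical helper: truncation happens only when the file is first opened with >.
redirect_demo() {
  out=$(mktemp) || return 1
  awk -v out="$out" 'BEGIN {
    print "first"  > out     # opens the file, truncating it
    print "second" > out     # file is already open: output is appended
    close(out)
    print "third" >> out     # reopened in append mode
    close(out)
  }'
  cat "$out"
  rm -f "$out"
}
redirect_demo
```

<p>All three lines end up in the file because the second <tt>&gt;</tt> reuses the stream that is already open.</p>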
|
|
|
|
<p>The third form shall write output onto a stream piped to the input of a command. The stream shall be created if no stream is
|
|
currently open with the value of <i>expression</i> as its command name. The stream created shall be equivalent to one created by a
|
|
call to the <a href="../functions/popen.html"><i>popen</i>()</a> function defined in the System Interfaces volume of
|
|
IEEE Std 1003.1-2001 with the value of <i>expression</i> as the <i>command</i> argument and a value of <i>w</i> as the
|
|
<i>mode</i> argument. As long as the stream remains open, subsequent calls in which <i>expression</i> evaluates to the same string
|
|
value shall write output to the existing stream. The stream shall remain open until the <b>close</b> function (see <a href=
|
|
"#tag_04_06_13_14">Input/Output and General Functions</a> ) is called with an expression that evaluates to the same string value.
|
|
At that time, the stream shall be closed as if by a call to the <a href="../functions/pclose.html"><i>pclose</i>()</a> function
|
|
defined in the System Interfaces volume of IEEE Std 1003.1-2001.</p>
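<p>Writing to a command through a pipe might look like the following sketch, which assumes the <tt>sort</tt> utility is on the search path. The function name <tt>pipe_demo</tt> and the variable <tt>cmd</tt> are hypothetical.</p>

```shell
# Hypothetical helper: both print statements share one stream because
# the command string is the same; close() ends that stream.
pipe_demo() {
  awk 'BEGIN {
    cmd = "sort"
    print "pear"  | cmd
    print "apple" | cmd
    close(cmd)       # closes the stream as if by pclose(), so sort finishes here
    print "done"
  }'
}
pipe_demo
```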
|
|
|
|
<p>As described in detail by the grammar in <a href="#tag_04_06_13_16">Grammar</a> , these output statements shall take a
|
|
comma-separated list of <i>expression</i>s referred to in the grammar by the non-terminal symbols <b>expr_list</b>,
|
|
<b>print_expr_list</b>, or <b>print_expr_list_opt</b>. This list is referred to here as the <i>expression list</i>, and each member
|
|
is referred to as an <i>expression argument</i>.</p>
|
|
|
|
<p>The <b>print</b> statement shall write the value of each expression argument onto the indicated output stream separated by the
|
|
current output field separator (see variable <b>OFS</b> above), and terminated by the output record separator (see variable
|
|
<b>ORS</b> above). All expression arguments shall be taken as strings, being converted if necessary; this conversion shall be as
|
|
described in <a href="#tag_04_06_13_02">Expressions in awk</a> , with the exception that the <b>printf</b> format in <b>OFMT</b>
|
|
shall be used instead of the value in <b>CONVFMT</b>. An empty expression list shall stand for the whole input record ($0).</p>
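<p>The roles of <b>OFS</b>, <b>ORS</b>, and <b>OFMT</b> in the <b>print</b> statement can be sketched as follows; <tt>print_demo</tt> is a hypothetical function name.</p>

```shell
# Hypothetical helper: print joins arguments with OFS, ends the record with ORS,
# and converts non-integer numbers with OFMT.
print_demo() {
  awk 'BEGIN {
    OFS = "-"; ORS = "!\n"; OFMT = "%.2f"
    print "a", "b"
    print 3.14159
    print 42         # integer values are output as integers
  }'
}
print_demo
```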
|
|
|
|
<p>The <b>printf</b> statement shall produce output based on a notation similar to the File Format Notation used to describe file
|
|
formats in this volume of IEEE Std 1003.1-2001 (see the Base Definitions volume of IEEE Std 1003.1-2001, <a
|
|
href="../basedefs/xbd_chap05.html">Chapter 5, File Format Notation</a>). Output shall be produced as specified with the first
|
|
<i>expression</i> argument as the string <i>format</i> and subsequent <i>expression</i> arguments as the strings <i>arg1</i> to
|
|
<i>argn</i>, inclusive, with the following exceptions:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>The <i>format</i> shall be an actual character string rather than a graphical representation. Therefore, it cannot contain empty
|
|
character positions. The &lt;space&gt; in the <i>format</i> string, in any context other than a <i>flag</i> of a conversion
|
|
specification, shall be treated as an ordinary character that is copied to the output.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>If the character set contains a <tt>'<img src="../images/delta.gif" border="0">'</tt> character and that character appears in
|
|
the <i>format</i> string, it shall be treated as an ordinary character that is copied to the output.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The <i>escape sequences</i> beginning with a backslash character shall be treated as sequences of ordinary characters that are
|
|
copied to the output. Note that these same sequences shall be interpreted lexically by <i>awk</i> when they appear in literal
|
|
strings, but they shall not be treated specially by the <b>printf</b> statement.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>A <i>field width</i> or <i>precision</i> can be specified as the <tt>'*'</tt> character instead of a digit string. In this case
|
|
the next argument from the expression list shall be fetched and its numeric value taken as the field width or precision.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The implementation shall not precede or follow output from the <tt>d</tt> or <tt>u</tt> conversion specifier characters with
|
|
&lt;blank&gt;s not specified by the <i>format</i> string.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The implementation shall not precede output from the <tt>o</tt> conversion specifier character with leading zeros not specified
|
|
by the <i>format</i> string.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>For the <tt>c</tt> conversion specifier character: if the argument has a numeric value, the character whose encoding is that
|
|
value shall be output. If the value is zero or is not the encoding of any character in the character set, the behavior is
|
|
undefined. If the argument does not have a numeric value, the first character of the string value shall be output; if the string
|
|
does not contain any characters, the behavior is undefined.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>For each conversion specification that consumes an argument, the next expression argument shall be evaluated. With the exception
|
|
of the <tt>c</tt> conversion specifier character, the value shall be converted (according to the rules specified in <a href=
|
|
"#tag_04_06_13_02">Expressions in awk</a> ) to the appropriate type for the conversion specification.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>If there are insufficient expression arguments to satisfy all the conversion specifications in the <i>format</i> string, the
|
|
behavior is undefined.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>If any character sequence in the <i>format</i> string begins with a <tt>'%'</tt> character, but does not form a valid conversion
|
|
specification, the behavior is unspecified.</p>
|
|
</li>
|
|
</ol>
|
|
|
|
<p>Both <b>print</b> and <b>printf</b> can output at least {LINE_MAX} bytes.</p>
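<p>The following is a minimal sketch of the <tt>'*'</tt> and <tt>c</tt> rules listed above. It assumes an <i>awk</i> on the search path and a character set in which 65 encodes <tt>'A'</tt> (as in ASCII); the shell variable names are illustrative only.</p>

```shell
# Capture printf output so the exact characters can be inspected.
star_width=$(awk 'BEGIN { printf "[%*d]", 5, 42 }')        # width taken from the argument 5
star_prec=$(awk 'BEGIN { printf "[%.*f]", 2, 3.14159 }')   # precision taken from the argument 2
char_num=$(awk 'BEGIN { printf "%c", 65 }')                # numeric argument: character with that encoding
char_str=$(awk 'BEGIN { printf "%c", "hello" }')           # string argument: first character only
plain_d=$(awk 'BEGIN { printf "[%d]", 7 }')                # no blanks added around the digits
echo "$star_width $star_prec $char_num $char_str $plain_d"
```

<p>Each value is captured separately so that padding added by a field width is visible between the brackets.</p>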
|
|
|
|
<h5><a name="tag_04_06_13_11"></a>Functions</h5>
|
|
|
|
<p>The <i>awk</i> language has a variety of built-in functions: arithmetic, string, input/output, and general.</p>
|
|
|
|
<h5><a name="tag_04_06_13_12"></a>Arithmetic Functions</h5>
|
|
|
|
<p>The arithmetic functions, except for <b>int</b>, shall be based on the ISO C standard (see <a href=
|
|
"xcu_chap01.html#tag_01_07_02"><i>Concepts Derived from the ISO C Standard</i></a> ). The behavior is undefined in cases where the
|
|
ISO C standard specifies that an error be returned or that the behavior is undefined. Although the grammar (see <a href=
|
|
"#tag_04_06_13_16">Grammar</a> ) permits built-in functions to appear with no arguments or parentheses, unless the argument or
|
|
parentheses are indicated as optional in the following list (by displaying them within the <tt>"[]"</tt> brackets), such use is
|
|
undefined.</p>
|
|
|
|
<dl compact>
|
|
<dt><b>atan2</b>(<i>y</i>,<i>x</i>)</dt>
|
|
|
|
<dd>Return arctangent of <i>y</i>/<i>x</i> in radians in the range [-<img src="../images/pi.gif" border="0">,<img src=
|
|
"../images/pi.gif" border="0">].</dd>
|
|
|
|
<dt><b>cos</b>(<i>x</i>)</dt>
|
|
|
|
<dd>Return cosine of <i>x</i>, where <i>x</i> is in radians.</dd>
|
|
|
|
<dt><b>sin</b>(<i>x</i>)</dt>
|
|
|
|
<dd>Return sine of <i>x</i>, where <i>x</i> is in radians.</dd>
|
|
|
|
<dt><b>exp</b>(<i>x</i>)</dt>
|
|
|
|
<dd>Return the exponential function of <i>x</i>.</dd>
|
|
|
|
<dt><b>log</b>(<i>x</i>)</dt>
|
|
|
|
<dd>Return the natural logarithm of <i>x</i>.</dd>
|
|
|
|
<dt><b>sqrt</b>(<i>x</i>)</dt>
|
|
|
|
<dd>Return the square root of <i>x</i>.</dd>
|
|
|
|
<dt><b>int</b>(<i>x</i>)</dt>
|
|
|
|
<dd>Return the argument truncated to an integer. Truncation shall be toward 0 when <i>x</i>>0.</dd>
|
|
|
|
<dt><b>rand</b>()</dt>
|
|
|
|
<dd>Return a random number <i>n</i>, such that 0<=<i>n</i><1.</dd>
|
|
|
|
<dt><b>srand</b>(<b>[</b><i>expr</i><b>]</b>)</dt>
|
|
|
|
<dd>Set the seed value for <i>rand</i> to <i>expr</i> or use the time of day if <i>expr</i> is omitted. The previous seed value
|
|
shall be returned.</dd>
|
|
</dl>
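<p>As an illustration of the arithmetic functions, the sketch below exercises several of them from a <b>BEGIN</b> action. The seed values 42 and 7 are arbitrary choices for the example; the value returned by <b>rand</b> is only checked against its documented range.</p>

```shell
result=$(awk 'BEGIN {
    pi = atan2(0, -1)            # arctangent of 0/-1 is pi
    printf "%.4f %.4f %.0f\n", pi, exp(log(2)), sqrt(16)
    print int(3.9)               # truncated toward 0 for positive x
    srand(42)                    # set a known seed
    prev = srand(7)              # returns the seed that was in effect
    r = rand()
    print prev, (r >= 0 && r < 1)
}')
echo "$result"
```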
|
|
|
|
<h5><a name="tag_04_06_13_13"></a>String Functions</h5>
|
|
|
|
<p>The string functions in the following list shall be supported. Although the grammar (see <a href="#tag_04_06_13_16">Grammar</a>
|
|
) permits built-in functions to appear with no arguments or parentheses, unless the argument or parentheses are indicated as
|
|
optional in the following list (by displaying them within the <tt>"[]"</tt> brackets), such use is undefined.</p>
|
|
|
|
<dl compact>
|
|
<dt><b>gsub</b>(<i>ere</i>, <i>repl</i><b>[</b>, <i>in</i><b>]</b>)</dt>
|
|
|
|
<dd>
|
|
Behave like <b>sub</b> (see below), except that it shall replace all occurrences of the regular expression (like the <a href=
|
|
"../utilities/ed.html"><i>ed</i></a> utility global substitute) in $0 or in the <i>in</i> argument, when specified.</dd>
|
|
|
|
<dt><b>index</b>(<i>s</i>, <i>t</i>)</dt>
|
|
|
|
<dd>Return the position, in characters, numbering from 1, in string <i>s</i> where string <i>t</i> first occurs, or zero if it does
|
|
not occur at all.</dd>
|
|
|
|
<dt><b>length[</b>(<b>[</b><i>s</i><b>]</b>)<b>]</b></dt>
|
|
|
|
<dd>Return the length, in characters, of its argument taken as a string, or of the whole record, $0, if there is no argument.</dd>
|
|
|
|
<dt><b>match</b>(<i>s</i>, <i>ere</i>)</dt>
|
|
|
|
<dd>Return the position, in characters, numbering from 1, in string <i>s</i> where the extended regular expression <i>ere</i>
|
|
occurs, or zero if it does not occur at all. RSTART shall be set to the starting position (which is the same as the returned
|
|
value), zero if no match is found; RLENGTH shall be set to the length of the matched string, -1 if no match is found.</dd>
|
|
|
|
<dt><b>split</b>(<i>s</i>, <i>a</i><b>[</b>, <i>fs </i> <b>]</b>)</dt>
|
|
|
|
<dd>
|
|
Split the string <i>s</i> into array elements <i>a</i>[1], <i>a</i>[2], ..., <i>a</i>[<i>n</i>], and return <i>n</i>. All elements
|
|
of the array shall be deleted before the split is performed. The separation shall be done with the ERE <i>fs</i> or with the field
|
|
separator <b>FS</b> if <i>fs</i> is not given. Each array element shall have a string value when created and, if appropriate, the
|
|
array element shall be considered a numeric string (see <a href="#tag_04_06_13_02">Expressions in awk</a> ). The effect of a null
|
|
string as the value of <i>fs</i> is unspecified.</dd>
|
|
|
|
<dt><b>sprintf</b>(<i>fmt</i>, <i>expr</i>, <i>expr</i>, ...)</dt>
|
|
|
|
<dd>
|
|
Format the expressions according to the <b>printf</b> format given by <i>fmt</i> and return the resulting string.</dd>
|
|
|
|
<dt><b>sub</b>(<i>ere</i>, <i>repl</i><b>[</b>, <i>in</i><b>]</b>)</dt>
|
|
|
|
<dd>
|
|
Substitute the string <i>repl</i> in place of the first instance of the extended regular expression <i>ere</i> in string <i>in</i>
|
|
and return the number of substitutions. An ampersand ( <tt>'&'</tt> ) appearing in the string <i>repl</i> shall be replaced by
|
|
the string from <i>in</i> that matches the ERE. An ampersand preceded with a backslash ( <tt>'\'</tt> ) shall be interpreted as the
|
|
literal ampersand character. An occurrence of two consecutive backslashes shall be interpreted as just a single literal backslash
|
|
character. Any other occurrence of a backslash (for example, preceding any other character) shall be treated as a literal backslash
|
|
character. Note that if <i>repl</i> is a string literal (the lexical token <b>STRING</b>; see <a href=
|
|
"#tag_04_06_13_16">Grammar</a> ), the handling of the ampersand character occurs after any lexical processing, including any
|
|
lexical backslash escape sequence processing. If <i>in</i> is specified and it is not an lvalue (see <a href=
|
|
"#tag_04_06_13_02">Expressions in awk</a> ), the behavior is undefined. If <i>in</i> is omitted, <i>awk</i> shall use the current
|
|
record ($0) in its place.</dd>
|
|
|
|
<dt><b>substr</b>(<i>s</i>, <i>m</i><b>[</b>, <i>n </i> <b>]</b>)</dt>
|
|
|
|
<dd>
|
|
Return the at most <i>n</i>-character substring of <i>s</i> that begins at position <i>m</i>, numbering from 1. If <i>n</i> is
|
|
omitted, or if <i>n</i> specifies more characters than are left in the string, the length of the substring shall be limited by the
|
|
length of the string <i>s</i>.</dd>
|
|
|
|
<dt><b>tolower</b>(<i>s</i>)</dt>
|
|
|
|
<dd>Return a string based on the string <i>s</i>. Each character in <i>s</i> that is an uppercase letter specified to have a
|
|
<b>tolower</b> mapping by the <i>LC_CTYPE</i> category of the current locale shall be replaced in the returned string by the
|
|
lowercase letter specified by the mapping. Other characters in <i>s</i> shall be unchanged in the returned string.</dd>
|
|
|
|
<dt><b>toupper</b>(<i>s</i>)</dt>
|
|
|
|
<dd>Return a string based on the string <i>s</i>. Each character in <i>s</i> that is a lowercase letter specified to have a
|
|
<b>toupper</b> mapping by the <i>LC_CTYPE</i> category of the current locale shall be replaced in the returned string by the uppercase
|
|
letter specified by the mapping. Other characters in <i>s</i> shall be unchanged in the returned string.</dd>
|
|
</dl>
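<p>The sketch below shows a few of the string functions, including the ampersand handling described for <b>sub</b>. The sample strings are arbitrary. Note that the <i>awk</i> string literal <tt>"\\&"</tt> becomes the two characters backslash and ampersand after lexical processing, which <b>sub</b> then treats as a literal ampersand.</p>

```shell
result=$(awk 'BEGIN {
    s = "hello world"
    n = gsub(/o/, "[&]", s)          # & stands for the matched text
    print n, s
    t = "hello world"
    sub(/world/, "\\&", t)           # backslash-ampersand yields a literal ampersand
    print t
    print index("banana", "nan"), length("banana")
    print substr("hello", 2, 3), substr("hello", 4)
    print toupper("abc") tolower("XYZ")
}')
echo "$result"
```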
|
|
|
|
<p>All of the preceding functions that take <i>ere</i> as a parameter expect a pattern or a string valued expression that is a
|
|
regular expression as defined in <a href="#tag_04_06_13_04">Regular Expressions</a> .</p>
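<p>To illustrate the point about string-valued expressions, the following sketch passes the same ERE both as a string variable and as an ERE constant. The variable names (<tt>digits</tt>, <tt>parts</tt>, <tt>fields</tt>) are made up for the example.</p>

```shell
result=$(awk 'BEGIN {
    digits = "[0-9]+"                       # ERE held in a string variable
    pos = match("abc123def", digits)
    print pos, RSTART, RLENGTH              # match found at position 4, length 3
    print match("abcdef", digits), RSTART, RLENGTH   # no match: 0, 0 and -1
    n = split("a1b22c", parts, digits)      # separator given as an ERE string
    print n, parts[1], parts[2], parts[3]
    n = split("x:y:z", fields, /:/)         # separator given as an ERE constant
    print n, fields[3]
}')
echo "$result"
```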
|
|
|
|
<h5><a name="tag_04_06_13_14"></a>Input/Output and General Functions</h5>
|
|
|
|
<p>The input/output and general functions are:</p>
|
|
|
|
<dl compact>
|
|
<dt><b>close</b>(<i>expression</i>)</dt>
|
|
|
|
<dd>
|
|
Close the file or pipe opened by a <b>print</b> or <b>printf</b> statement or a call to <b>getline</b> with the same string-valued
|
|
<i>expression</i>. The limit on the number of open <i>expression</i> arguments is implementation-defined. If the close was
|
|
successful, the function shall return zero; otherwise, it shall return non-zero.</dd>
|
|
|
|
<dt><i>expression | </i> <b>getline [</b><i>var</i><b>]</b></dt>
|
|
|
|
<dd>
|
|
Read a record of input from a stream piped from the output of a command. The stream shall be created if no stream is currently open
|
|
with the value of <i>expression</i> as its command name. The stream created shall be equivalent to one created by a call to the <a
|
|
href="../functions/popen.html"><i>popen</i>()</a> function with the value of <i>expression</i> as the <i>command</i> argument and a
|
|
value of <i>r</i> as the <i>mode</i> argument. As long as the stream remains open, subsequent calls in which <i>expression</i>
|
|
evaluates to the same string value shall read subsequent records from the stream. The stream shall remain open until the
|
|
<b>close</b> function is called with an expression that evaluates to the same string value. At that time, the stream shall be
|
|
closed as if by a call to the <a href="../functions/pclose.html"><i>pclose</i>()</a> function. If <i>var</i> is omitted, $0 and
|
|
<b>NF</b> shall be set; otherwise, <i>var</i> shall be set and, if appropriate, it shall be considered a numeric string (see <a
|
|
href="#tag_04_06_13_02">Expressions in awk</a> ).
|
|
|
|
<p>The <b>getline</b> operator can form ambiguous constructs when there are unparenthesized operators (including concatenate) to
|
|
the left of the <tt>'|'</tt> (to the beginning of the expression containing <b>getline</b>). In the context of the <tt>'$'</tt>
|
|
operator, <tt>'|'</tt> shall behave as if it had a lower precedence than <tt>'$'</tt> . The result of evaluating other operators is
|
|
unspecified, and conforming applications shall parenthesize properly all such usages.</p>
|
|
</dd>
|
|
|
|
<dt><b>getline</b></dt>
|
|
|
|
<dd>Set $0 to the next input record from the current input file. This form of <b>getline</b> shall set the <b>NF</b>, <b>NR</b>,
|
|
and <b>FNR</b> variables.</dd>
|
|
|
|
<dt><b>getline </b> <i>var</i></dt>
|
|
|
|
<dd>Set variable <i>var</i> to the next input record from the current input file and, if appropriate, <i>var</i> shall be
|
|
considered a numeric string (see <a href="#tag_04_06_13_02">Expressions in awk</a> ). This form of <b>getline</b> shall set the
|
|
<b>FNR</b> and <b>NR</b> variables.</dd>
|
|
|
|
<dt><b>getline [</b><i>var</i><b>] </b> < <i>expression</i></dt>
|
|
|
|
<dd>
|
|
Read the next record of input from a named file. The <i>expression</i> shall be evaluated to produce a string that is used as a
|
|
pathname. If the file of that name is not currently open, it shall be opened. As long as the stream remains open, subsequent calls
|
|
in which <i>expression</i> evaluates to the same string value shall read subsequent records from the file. The file shall remain
|
|
open until the <b>close</b> function is called with an expression that evaluates to the same string value. If <i>var</i> is
|
|
omitted, $0 and <b>NF</b> shall be set; otherwise, <i>var</i> shall be set and, if appropriate, it shall be considered a numeric
|
|
string (see <a href="#tag_04_06_13_02">Expressions in awk</a> ).
|
|
|
|
<p>The <b>getline</b> operator can form ambiguous constructs when there are unparenthesized binary operators (including
|
|
concatenate) to the right of the <tt>'<'</tt> (up to the end of the expression containing the <b>getline</b>). The result of
|
|
evaluating such a construct is unspecified, and conforming applications shall parenthesize properly all such usages.</p>
|
|
</dd>
|
|
|
|
<dt><b>system</b>(<i>expression</i>)</dt>
|
|
|
|
<dd>
|
|
Execute the command given by <i>expression</i> in a manner equivalent to the <a href="../functions/system.html"><i>system</i>()</a>
|
|
function defined in the System Interfaces volume of IEEE Std 1003.1-2001 and return the exit status of the command.</dd>
|
|
</dl>
|
|
|
|
<p>All forms of <b>getline</b> shall return 1 for successful input, zero for end-of-file, and -1 for an error.</p>
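<p>A small sketch of the three return values follows. It assumes that the <i>echo</i> utility is available to the command interpreter and that the pathname stored in <tt>missing</tt>, which is hypothetical, does not exist. Each <b>getline</b> expression is parenthesized, as recommended above.</p>

```shell
result=$(awk 'BEGIN {
    cmd = "echo only"
    first = (cmd | getline line)      # 1: a record was read
    second = (cmd | getline line)     # 0: end-of-file on the stream
    close(cmd)
    missing = "/nonexistent/awk-getline-demo"   # hypothetical path, assumed absent
    third = (getline line < missing)  # -1: the file cannot be opened
    print first, second, third
}')
echo "$result"
```

<p>Because an error yields -1, a read loop is usually written with an explicit <tt>&gt; 0</tt> comparison rather than testing the result for truth.</p>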
|
|
|
|
<p>Where strings are used as the name of a file or pipeline, the application shall ensure that the strings are textually identical.
|
|
The terminology "same string value" implies that "equivalent strings", even those that differ only by <space>s, represent
|
|
different files.</p>
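<p>The sketch below illustrates how the string value identifies a stream. The command strings are invented for the example; <tt>other</tt> differs from <tt>cmd</tt> only by one extra space, so closing it does not affect the stream opened under <tt>cmd</tt>.</p>

```shell
result=$(awk 'BEGIN {
    cmd = "echo alpha; echo beta"
    other = "echo alpha;  echo beta"     # differs only by an extra space
    cmd | getline a                      # opens the stream and reads the first record
    not_closed = (close(other) != 0)     # no stream is open under this string
    cmd | getline b                      # same open stream: second record
    close(cmd)                           # same string value, so the stream is closed
    cmd | getline c                      # reopened: the command runs again
    close(cmd)
    print a, b, c, not_closed
}')
echo "$result"
```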
|
|
|
|
<h5><a name="tag_04_06_13_15"></a>User-Defined Functions</h5>
|
|
|
|
<p>The <i>awk</i> language also provides user-defined functions. Such functions can be defined as:</p>
|
|
|
|
<pre>
|
|
<tt>function</tt> <i>name</i><tt>(</tt><b>[</b><i>parameter</i><tt>, ...</tt><b>]</b><tt>) {</tt> <i>statements</i> <tt>}
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>A function can be referred to anywhere in an <i>awk</i> program; in particular, its use can precede its definition. The scope of
|
|
a function is global.</p>
|
|
|
|
<p>Function parameters, if present, can be either scalars or arrays; the behavior is undefined if an array name is passed as a
|
|
parameter that the function uses as a scalar, or if a scalar expression is passed as a parameter that the function uses as an
|
|
array. Function parameters shall be passed by value if scalar and by reference if array name.</p>
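<p>As a minimal sketch of the two passing conventions, the hypothetical function <tt>bump</tt> below modifies both a scalar parameter and an array parameter; only the array change is visible to the caller.</p>

```shell
result=$(awk '
function bump(count, tally) {
    count++            # scalar: only the local copy changes
    tally["n"]++       # array: the caller sees this change
}
BEGIN {
    total = 5
    seen["n"] = 5
    bump(total, seen)
    print total, seen["n"]
}')
echo "$result"
```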
|
|
|
|
<p>The number of parameters in the function definition need not match the number of parameters in the function call. Excess formal
|
|
parameters can be used as local variables. If fewer arguments are supplied in a function call than are in the function definition,
|
|
the extra parameters that are used in the function body as scalars shall evaluate to the uninitialized value until they are
|
|
otherwise initialized, and the extra parameters that are used in the function body as arrays shall be treated as uninitialized
|
|
arrays where each element evaluates to the uninitialized value until otherwise initialized.</p>
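<p>The following sketch shows excess formal parameters used as local variables. The function name <tt>sum_to</tt> and the extra spacing before the local names are conventions chosen for this example, not requirements of the language.</p>

```shell
result=$(awk '
# i and acc are extra formal parameters used as local variables;
# callers pass only n.
function sum_to(n,    i, acc) {
    for (i = 1; i <= n; i++)
        acc += i       # acc starts out uninitialized, which acts as 0 here
    return acc
}
BEGIN {
    i = "outer"
    # The global i is untouched and the global acc is still uninitialized.
    print sum_to(4), i, (acc == "")
}')
echo "$result"
```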
|
|
|
|
<p>When invoking a function, no white space can be placed between the function name and the opening parenthesis. Function calls can
|
|
be nested and recursive calls can be made upon functions. Upon return from any nested or recursive function call, the values of all
|
|
of the calling function's parameters shall be unchanged, except for array parameters passed by reference. The <b>return</b>
|
|
statement can be used to return a value. If a <b>return</b> statement appears outside of a function definition, the behavior is
|
|
undefined.</p>
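<p>Recursion and the preservation of scalar parameters across nested calls can be sketched as follows. Both function names are hypothetical; each call is written with no white space between the name and the opening parenthesis.</p>

```shell
result=$(awk '
function fact(n) {
    if (n <= 1)
        return 1
    return n * fact(n - 1)   # n keeps the value belonging to this invocation
}
function depth(levels,    k) {
    k = levels               # k is local to each invocation
    if (levels > 0)
        depth(levels - 1)
    return k                 # unchanged by the nested calls
}
BEGIN { print fact(5), depth(3) }')
echo "$result"
```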
|
|
|
|
<p>In the function definition, <newline>s shall be optional before the opening brace and after the closing brace. Function
|
|
definitions can appear anywhere in the program where a <i>pattern-action</i> pair is allowed.</p>
|
|
|
|
<h5><a name="tag_04_06_13_16"></a>Grammar</h5>
|
|
|
|
<p>The grammar in this section and the lexical conventions in the following section shall together describe the syntax for
|
|
<i>awk</i> programs. The general conventions for this style of grammar are described in <a href=
|
|
"xcu_chap01.html#tag_01_10"><i>Grammar Conventions</i></a> . A valid program can be represented as the non-terminal symbol
|
|
<i>program</i> in the grammar. This formal syntax shall take precedence over the preceding text syntax description.</p>
|
|
|
|
<pre>
|
|
<tt>%token NAME NUMBER STRING ERE
|
|
%token FUNC_NAME /* Name followed by '(' without white space. */
|
|
<br>
|
|
/* Keywords */
|
|
%token Begin End
|
|
/* 'BEGIN' 'END' */
|
|
<br>
|
|
%token Break Continue Delete Do Else
|
|
/* 'break' 'continue' 'delete' 'do' 'else' */
|
|
<br>
|
|
%token Exit For Function If In
|
|
/* 'exit' 'for' 'function' 'if' 'in' */
|
|
<br>
|
|
%token Next Print Printf Return While
|
|
/* 'next' 'print' 'printf' 'return' 'while' */
|
|
<br>
|
|
/* Reserved function names */
|
|
%token BUILTIN_FUNC_NAME
|
|
/* One token for the following:
|
|
* atan2 cos sin exp log sqrt int rand srand
|
|
* gsub index length match split sprintf sub
|
|
* substr tolower toupper close system
|
|
*/
|
|
%token GETLINE
|
|
/* Syntactically different from other built-ins. */
|
|
<br>
|
|
/* Two-character tokens. */
|
|
%token ADD_ASSIGN SUB_ASSIGN MUL_ASSIGN DIV_ASSIGN MOD_ASSIGN POW_ASSIGN
|
|
/* '+=' '-=' '*=' '/=' '%=' '^=' */
|
|
<br>
|
|
%token OR AND NO_MATCH EQ LE GE NE INCR DECR APPEND
|
|
/* '||' '&&' '!~' '==' '<=' '>=' '!=' '++' '--' '>>' */
|
|
<br>
|
|
/* One-character tokens. */
|
|
%token '{' '}' '(' ')' '[' ']' ',' ';' NEWLINE
|
|
%token '+' '-' '*' '%' '^' '!' '>' '<' '|' '?' ':' '~' '$' '='
|
|
<br>
|
|
%start program
|
|
%%
|
|
<br>
|
|
program : item_list
|
|
| actionless_item_list
|
|
;
|
|
<br>
|
|
item_list : newline_opt
|
|
| actionless_item_list item terminator
|
|
| item_list item terminator
|
|
| item_list action terminator
|
|
;
|
|
<br>
|
|
actionless_item_list : item_list pattern terminator
|
|
| actionless_item_list pattern terminator
|
|
;
|
|
<br>
|
|
item : pattern action
|
|
| Function NAME '(' param_list_opt ')'
|
|
newline_opt action
|
|
| Function FUNC_NAME '(' param_list_opt ')'
|
|
newline_opt action
|
|
;
|
|
<br>
|
|
param_list_opt : /* empty */
|
|
| param_list
|
|
;
|
|
<br>
|
|
param_list : NAME
|
|
| param_list ',' NAME
|
|
;
|
|
<br>
|
|
pattern : Begin
|
|
| End
|
|
| expr
|
|
| expr ',' newline_opt expr
|
|
;
|
|
<br>
|
|
action : '{' newline_opt '}'
|
|
| '{' newline_opt terminated_statement_list '}'
|
|
| '{' newline_opt unterminated_statement_list '}'
|
|
;
|
|
<br>
|
|
terminator : terminator ';'
|
|
| terminator NEWLINE
|
|
| ';'
|
|
| NEWLINE
|
|
;
|
|
<br>
|
|
terminated_statement_list : terminated_statement
|
|
| terminated_statement_list terminated_statement
|
|
;
|
|
<br>
|
|
unterminated_statement_list : unterminated_statement
|
|
| terminated_statement_list unterminated_statement
|
|
;
|
|
<br>
|
|
terminated_statement : action newline_opt
|
|
| If '(' expr ')' newline_opt terminated_statement
|
|
| If '(' expr ')' newline_opt terminated_statement
|
|
Else newline_opt terminated_statement
|
|
| While '(' expr ')' newline_opt terminated_statement
|
|
| For '(' simple_statement_opt ';'
|
|
expr_opt ';' simple_statement_opt ')' newline_opt
|
|
terminated_statement
|
|
| For '(' NAME In NAME ')' newline_opt
|
|
terminated_statement
|
|
| ';' newline_opt
|
|
| terminatable_statement NEWLINE newline_opt
|
|
| terminatable_statement ';' newline_opt
|
|
;
|
|
<br>
|
|
unterminated_statement : terminatable_statement
|
|
| If '(' expr ')' newline_opt unterminated_statement
|
|
| If '(' expr ')' newline_opt terminated_statement
|
|
Else newline_opt unterminated_statement
|
|
| While '(' expr ')' newline_opt unterminated_statement
|
|
| For '(' simple_statement_opt ';'
|
|
expr_opt ';' simple_statement_opt ')' newline_opt
|
|
unterminated_statement
|
|
| For '(' NAME In NAME ')' newline_opt
|
|
unterminated_statement
|
|
;
|
|
<br>
|
|
terminatable_statement : simple_statement
|
|
| Break
|
|
| Continue
|
|
| Next
|
|
| Exit expr_opt
|
|
| Return expr_opt
|
|
| Do newline_opt terminated_statement While '(' expr ')'
|
|
;
|
|
<br>
|
|
simple_statement_opt : /* empty */
|
|
| simple_statement
|
|
;
|
|
<br>
|
|
simple_statement : Delete NAME '[' expr_list ']'
|
|
| expr
|
|
| print_statement
|
|
;
|
|
<br>
|
|
print_statement : simple_print_statement
|
|
| simple_print_statement output_redirection
|
|
;
|
|
<br>
|
|
simple_print_statement : Print print_expr_list_opt
|
|
| Print '(' multiple_expr_list ')'
|
|
| Printf print_expr_list
|
|
| Printf '(' multiple_expr_list ')'
|
|
;
|
|
<br>
|
|
output_redirection : '>' expr
|
|
| APPEND expr
|
|
| '|' expr
|
|
;
|
|
<br>
|
|
expr_list_opt : /* empty */
|
|
| expr_list
|
|
;
|
|
<br>
|
|
expr_list : expr
|
|
| multiple_expr_list
|
|
;
|
|
<br>
|
|
multiple_expr_list : expr ',' newline_opt expr
|
|
| multiple_expr_list ',' newline_opt expr
|
|
;
|
|
<br>
|
|
expr_opt : /* empty */
|
|
| expr
|
|
;
|
|
<br>
|
|
expr : unary_expr
|
|
| non_unary_expr
|
|
;
|
|
<br>
|
|
unary_expr : '+' expr
|
|
| '-' expr
|
|
| unary_expr '^' expr
|
|
| unary_expr '*' expr
|
|
| unary_expr '/' expr
|
|
| unary_expr '%' expr
|
|
| unary_expr '+' expr
|
|
| unary_expr '-' expr
|
|
| unary_expr non_unary_expr
|
|
| unary_expr '<' expr
|
|
| unary_expr LE expr
|
|
| unary_expr NE expr
|
|
| unary_expr EQ expr
|
|
| unary_expr '>' expr
|
|
| unary_expr GE expr
|
|
| unary_expr '~' expr
|
|
| unary_expr NO_MATCH expr
|
|
| unary_expr In NAME
|
|
| unary_expr AND newline_opt expr
|
|
| unary_expr OR newline_opt expr
|
|
| unary_expr '?' expr ':' expr
|
|
| unary_input_function
|
|
;
|
|
<br>
|
|
non_unary_expr : '(' expr ')'
|
|
| '!' expr
|
|
| non_unary_expr '^' expr
|
|
| non_unary_expr '*' expr
|
|
| non_unary_expr '/' expr
|
|
| non_unary_expr '%' expr
|
|
| non_unary_expr '+' expr
|
|
| non_unary_expr '-' expr
|
|
| non_unary_expr non_unary_expr
|
|
| non_unary_expr '<' expr
|
|
| non_unary_expr LE expr
|
|
| non_unary_expr NE expr
|
|
| non_unary_expr EQ expr
|
|
| non_unary_expr '>' expr
|
|
| non_unary_expr GE expr
|
|
| non_unary_expr '~' expr
|
|
| non_unary_expr NO_MATCH expr
|
|
| non_unary_expr In NAME
|
|
| '(' multiple_expr_list ')' In NAME
|
|
| non_unary_expr AND newline_opt expr
|
|
| non_unary_expr OR newline_opt expr
|
|
| non_unary_expr '?' expr ':' expr
|
|
| NUMBER
|
|
| STRING
|
|
| lvalue
|
|
| ERE
|
|
| lvalue INCR
|
|
| lvalue DECR
|
|
| INCR lvalue
|
|
| DECR lvalue
|
|
| lvalue POW_ASSIGN expr
|
|
| lvalue MOD_ASSIGN expr
|
|
| lvalue MUL_ASSIGN expr
|
|
| lvalue DIV_ASSIGN expr
|
|
| lvalue ADD_ASSIGN expr
|
|
| lvalue SUB_ASSIGN expr
|
|
| lvalue '=' expr
|
|
| FUNC_NAME '(' expr_list_opt ')'
|
|
/* no white space allowed before '(' */
|
|
| BUILTIN_FUNC_NAME '(' expr_list_opt ')'
|
|
| BUILTIN_FUNC_NAME
|
|
| non_unary_input_function
|
|
;
|
|
<br>
|
|
print_expr_list_opt : /* empty */
|
|
| print_expr_list
|
|
;
|
|
<br>
|
|
print_expr_list : print_expr
|
|
| print_expr_list ',' newline_opt print_expr
|
|
;
|
|
<br>
|
|
print_expr : unary_print_expr
|
|
| non_unary_print_expr
|
|
;
|
|
<br>
|
|
unary_print_expr : '+' print_expr
|
|
| '-' print_expr
|
|
| unary_print_expr '^' print_expr
|
|
| unary_print_expr '*' print_expr
|
|
| unary_print_expr '/' print_expr
|
|
| unary_print_expr '%' print_expr
|
|
| unary_print_expr '+' print_expr
|
|
| unary_print_expr '-' print_expr
|
|
| unary_print_expr non_unary_print_expr
|
|
| unary_print_expr '~' print_expr
|
|
| unary_print_expr NO_MATCH print_expr
|
|
| unary_print_expr In NAME
|
|
| unary_print_expr AND newline_opt print_expr
|
|
| unary_print_expr OR newline_opt print_expr
|
|
| unary_print_expr '?' print_expr ':' print_expr
|
|
;
|
|
<br>
|
|
non_unary_print_expr : '(' expr ')'
|
|
| '!' print_expr
|
|
| non_unary_print_expr '^' print_expr
|
|
| non_unary_print_expr '*' print_expr
|
|
| non_unary_print_expr '/' print_expr
|
|
| non_unary_print_expr '%' print_expr
|
|
| non_unary_print_expr '+' print_expr
|
|
| non_unary_print_expr '-' print_expr
|
|
| non_unary_print_expr non_unary_print_expr
|
|
| non_unary_print_expr '~' print_expr
|
|
| non_unary_print_expr NO_MATCH print_expr
|
|
| non_unary_print_expr In NAME
|
|
| '(' multiple_expr_list ')' In NAME
|
|
| non_unary_print_expr AND newline_opt print_expr
|
|
| non_unary_print_expr OR newline_opt print_expr
|
|
| non_unary_print_expr '?' print_expr ':' print_expr
|
|
| NUMBER
|
|
| STRING
|
|
| lvalue
|
|
| ERE
|
|
| lvalue INCR
|
|
| lvalue DECR
|
|
| INCR lvalue
|
|
| DECR lvalue
|
|
| lvalue POW_ASSIGN print_expr
|
|
| lvalue MOD_ASSIGN print_expr
|
|
| lvalue MUL_ASSIGN print_expr
|
|
| lvalue DIV_ASSIGN print_expr
|
|
| lvalue ADD_ASSIGN print_expr
|
|
| lvalue SUB_ASSIGN print_expr
|
|
| lvalue '=' print_expr
|
|
| FUNC_NAME '(' expr_list_opt ')'
|
|
/* no white space allowed before '(' */
|
|
| BUILTIN_FUNC_NAME '(' expr_list_opt ')'
|
|
| BUILTIN_FUNC_NAME
|
|
;
|
|
<br>
|
|
lvalue : NAME
|
|
| NAME '[' expr_list ']'
|
|
| '$' expr
|
|
;
|
|
<br>
|
|
non_unary_input_function : simple_get
|
|
| simple_get '<' expr
|
|
| non_unary_expr '|' simple_get
|
|
;
|
|
<br>
|
|
unary_input_function : unary_expr '|' simple_get
|
|
;
|
|
<br>
|
|
simple_get : GETLINE
|
|
| GETLINE lvalue
|
|
;
|
|
<br>
|
|
newline_opt : /* empty */
|
|
| newline_opt NEWLINE
|
|
;
|
|
</tt>
|
|
</pre>
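<p>One practical consequence of the grammar is that <i>print_expr</i> has no productions for the relational operators, so an unparenthesized <tt>'&gt;'</tt> in a <b>print</b> statement is an output redirection. The sketch below contrasts the two forms. It assumes the <i>mktemp</i> utility is available; the scratch directory keeps the file created by the redirection out of the working directory.</p>

```shell
workdir=$(mktemp -d)
# Parenthesized: the comparison is evaluated and its result (1) is printed.
compared=$(awk 'BEGIN { print (2 > 1) }')
# Unparenthesized: ">" is a redirection, so "2" is written to a file named "1".
( cd "$workdir" && awk 'BEGIN { print 2 > 1 }' )
redirected=$(cat "$workdir/1")
rm -rf "$workdir"
echo "$compared $redirected"
```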
|
|
|
|
<p>This grammar has several ambiguities that shall be resolved as follows:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Operator precedence and associativity shall be as described in <a href="#tagtcjh_10">Expressions in Decreasing Precedence in <i>awk</i></a> .</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>In case of ambiguity, an <b>else</b> shall be associated with the most immediately preceding <b>if</b> that would satisfy the
|
|
grammar.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>In some contexts, a slash ( <tt>'/'</tt> ) that is used to surround an ERE could also be the division operator. This shall be
|
|
resolved in such a way that wherever the division operator could appear, a slash is assumed to be the division operator. (There is
|
|
no unary division operator.)</p>
|
|
</li>
|
|
</ul>
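<p>Two of these resolutions can be sketched briefly. The hypothetical function <tt>classify</tt> shows the <b>else</b> binding to the nearest <b>if</b>, and the final statement shows slashes in operator position being read as division.</p>

```shell
result=$(awk '
function classify(x,    r) {
    r = "unset"
    # The else pairs with the nearest if, that is, the test (x > 10).
    if (x > 0) if (x > 10) r = "big"; else r = "small"
    return r
}
BEGIN {
    print classify(50), classify(5), classify(-1)
    a = 12; b = 3; c = 2
    print a / b / c      # both slashes are division here, not ERE delimiters
}')
echo "$result"
```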
|
|
|
|
<p>One convention that might not be obvious from the formal grammar is where <newline>s are acceptable. There are several
|
|
obvious placements such as terminating a statement, and a backslash can be used to escape <newline>s between any lexical
|
|
tokens. In addition, <newline>s without backslashes can follow a comma, an open brace, logical AND operator (
|
|
<tt>"&&"</tt> ), logical OR operator ( <tt>"||"</tt> ), the <b>do</b> keyword, the <b>else</b> keyword, and the closing
|
|
parenthesis of an <b>if</b>, <b>for</b>, or <b>while</b> statement. For example:</p>
|
|
|
|
<pre>
|
|
<tt>{ print $1,
|
|
$2 }
|
|
</tt>
|
|
</pre>
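<p>A slightly longer sketch of the same convention follows, covering a backslash-escaped &lt;newline&gt; and unescaped &lt;newline&gt;s after <tt>"&amp;&amp;"</tt>, a closing parenthesis, a comma, <b>else</b>, and <b>do</b>. The variable names are arbitrary.</p>

```shell
result=$(awk 'BEGIN {
    a = 1; b = 0
    total = 1 + \
            2                 # the escaped newline above is ignored
    if (a &&
        !b)                   # newline after && and after the closing parenthesis
        print "sum",
              total           # newline after a comma
    else
        print "unreachable"
    do
        total--
    while (total > 0)
    print total
}')
echo "$result"
```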
|
|
|
|
<h5><a name="tag_04_06_13_17"></a>Lexical Conventions</h5>
|
|
|
|
<p>The lexical conventions for <i>awk</i> programs, with respect to the preceding grammar, shall be as follows:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>Except as noted, <i>awk</i> shall recognize the longest possible token or delimiter beginning at a given point.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>A comment shall consist of any characters beginning with the number sign character and terminated by, but excluding the next
|
|
occurrence of, a <newline>. Comments shall have no effect, except to delimit lexical tokens.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The <newline> shall be recognized as the token <b>NEWLINE</b>.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>A backslash character immediately followed by a <newline> shall have no effect.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The token <b>STRING</b> shall represent a string constant. A string constant shall begin with the character <tt>'"'</tt>. Within
|
|
a string constant, a backslash character shall be considered to begin an escape sequence as specified in the table in the Base
|
|
Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap05.html">Chapter 5, File Format Notation</a> (
|
|
<tt>'\\'</tt> , <tt>'\a'</tt> , <tt>'\b'</tt> , <tt>'\f'</tt> , <tt>'\n'</tt> , <tt>'\r'</tt> , <tt>'\t'</tt> , <tt>'\v'</tt> ). In
|
|
addition, the escape sequences in the table Escape Sequences in <i>awk</i> shall be recognized. A <newline> shall not
|
|
occur within a string constant. A string constant shall be terminated by the first unescaped occurrence of the character
|
|
<tt>'"'</tt> after the one that begins the string constant. The value of the string shall be the sequence of all unescaped
|
|
characters and values of escape sequences between, but not including, the two delimiting <tt>'"'</tt> characters.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The token <b>ERE</b> represents an extended regular expression constant. An ERE constant shall begin with the slash character.
|
|
Within an ERE constant, a backslash character shall be considered to begin an escape sequence as specified in the table in the Base
|
|
Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap05.html">Chapter 5, File Format Notation</a>. In
|
|
addition, the escape sequences in the table Escape Sequences in <i>awk</i> shall be recognized. The application shall
|
|
ensure that a <newline> does not occur within an ERE constant. An ERE constant shall be terminated by the first unescaped
|
|
occurrence of the slash character after the one that begins the ERE constant. The extended regular expression represented by the
|
|
ERE constant shall be the sequence of all unescaped characters and values of escape sequences between, but not including, the two
|
|
delimiting slash characters.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>A <blank> shall have no effect, except to delimit lexical tokens or within <b>STRING</b> or <b>ERE</b> tokens.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The token <b>NUMBER</b> shall represent a numeric constant. Its form and numeric value shall be equivalent to either of the
|
|
tokens <b>floating-constant</b> or <b>integer-constant</b> as specified by the ISO C standard, with the following
|
|
exceptions:</p>
|
|
|
|
<ol type="a">
|
|
<li>
|
|
<p>An integer constant cannot begin with 0x or include the hexadecimal digits <tt>'a'</tt> , <tt>'b'</tt> , <tt>'c'</tt> ,
|
|
<tt>'d'</tt> , <tt>'e'</tt> , <tt>'f'</tt> , <tt>'A'</tt> , <tt>'B'</tt> , <tt>'C'</tt> , <tt>'D'</tt> , <tt>'E'</tt> , or
|
|
<tt>'F'</tt> .</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The value of an integer constant beginning with 0 shall be taken in decimal rather than octal.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>An integer constant cannot include a suffix ( <tt>'u'</tt> , <tt>'U'</tt> , <tt>'l'</tt> , or <tt>'L'</tt> ).</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>A floating constant cannot include a suffix ( <tt>'f'</tt> , <tt>'F'</tt> , <tt>'l'</tt> , or <tt>'L'</tt> ).</p>
|
|
</li>
|
|
</ol>
|
|
|
|
<p>If the value is too large or too small to be representable (see <a href="xcu_chap01.html#tag_01_07_02"><i>Concepts Derived from
|
|
the ISO C Standard</i></a> ), the behavior is undefined.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>A sequence of underscores, digits, and alphabetics from the portable character set (see the Base Definitions volume of
|
|
IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap06.html#tag_06_01">Section 6.1, Portable Character Set</a>), beginning
|
|
with an underscore or alphabetic, shall be considered a word.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The following words are keywords that shall be recognized as individual tokens; the name of the token is the same as the
|
|
keyword:</p>
|
|
|
|
<blockquote>
|
|
<table cellpadding="3">
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
BEGIN<br>
|
|
break<br>
|
|
continue<br>
|
|
</b></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
delete<br>
|
|
do<br>
|
|
else<br>
|
|
</b></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
END<br>
|
|
exit<br>
|
|
for<br>
|
|
</b></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
function<br>
|
|
getline<br>
|
|
if<br>
|
|
</b></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
in<br>
|
|
next<br>
|
|
print<br>
|
|
</b></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
printf<br>
|
|
return<br>
|
|
while<br>
|
|
</b></p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</blockquote>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The following words are names of built-in functions and shall be recognized as the token <b>BUILTIN_FUNC_NAME</b>:</p>
|
|
|
|
<blockquote>
|
|
<table cellpadding="3">
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
atan2<br>
|
|
close<br>
|
|
cos<br>
|
|
exp<br>
|
|
</b></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
gsub<br>
|
|
index<br>
|
|
int<br>
|
|
length<br>
|
|
</b></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
log<br>
|
|
match<br>
|
|
rand<br>
|
|
sin<br>
|
|
</b></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
split<br>
|
|
sprintf<br>
|
|
sqrt<br>
|
|
srand<br>
|
|
</b></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
sub<br>
|
|
substr<br>
|
|
system<br>
|
|
tolower<br>
|
|
</b></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b><br>
|
|
toupper<br>
|
|
</b></p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</blockquote>
|
|
|
|
<p>The above-listed keywords and names of built-in functions are considered reserved words.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The token <b>NAME</b> shall consist of a word that is not a keyword or a name of a built-in function and is not followed
|
|
immediately (without any delimiters) by the <tt>'('</tt> character.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The token <b>FUNC_NAME</b> shall consist of a word that is not a keyword or a name of a built-in function, followed immediately
|
|
(without any delimiters) by the <tt>'('</tt> character. The <tt>'('</tt> character shall not be included as part of the token.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The following two-character sequences shall be recognized as the named tokens:</p>
|
|
|
|
<center>
|
|
<table border="1" cellpadding="3" align="center">
|
|
<tr valign="top">
|
|
<th align="center">
|
|
<p class="tent"><b>Token Name</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>Sequence</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>Token Name</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>Sequence</b></p>
|
|
</th>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b>ADD_ASSIGN</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">+=</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>NO_MATCH</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">!~</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b>SUB_ASSIGN</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">-=</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>EQ</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">==</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b>MUL_ASSIGN</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">*=</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>LE</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">&lt;=</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b>DIV_ASSIGN</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">/=</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>GE</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">>=</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b>MOD_ASSIGN</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">%=</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>NE</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">!=</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b>POW_ASSIGN</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">^=</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>INCR</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">++</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b>OR</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">||</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>DECR</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">--</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent"><b>AND</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">&&</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"><b>APPEND</b></p>
|
|
</td>
|
|
<td align="center">
|
|
<p class="tent">>></p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The following single characters shall be recognized as tokens whose names are the character:</p>
|
|
|
|
<pre>
|
|
<tt>&lt;newline&gt; { } ( ) [ ] , ; + - * % ^ ! &gt; &lt; | ? : ~ $ =
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
</ol>
|
|
|
|
<p>There is a lexical ambiguity between the token <b>ERE</b> and the tokens <tt>'/'</tt> and <b>DIV_ASSIGN</b>. When an input
|
|
sequence begins with a slash character in any syntactic context where the token <tt>'/'</tt> or <b>DIV_ASSIGN</b> could appear as
|
|
the next token in a valid program, the longer of those two tokens that can be recognized shall be recognized. In any other
|
|
syntactic context where the token <b>ERE</b> could appear as the next token in a valid program, the token <b>ERE</b> shall be
|
|
recognized.</p>
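<p>As an illustrative sketch of the rule above (the sample input and the shell variable names are assumptions made for this example only), a slash that follows an operand is taken as division or <b>DIV_ASSIGN</b>, while a slash where a pattern is expected opens an <b>ERE</b> token:</p>

```shell
# After the operand "x", "/=" is lexed as DIV_ASSIGN.
div_assign=$(echo '8 2' | awk '{ x = $1; x /= $2; print x }')

# At the start of a pattern, "/" opens an ERE; "\/" matches a literal slash.
ere_match=$(printf 'a/b\nab\n' | awk '/\// { print }')

echo "$div_assign"   # 4
echo "$ere_match"    # a/b
```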
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_14"></a>EXIT STATUS</h4>
|
|
|
|
<blockquote>
|
|
<p>The following exit values shall be returned:</p>
|
|
|
|
<dl compact>
|
|
<dt> 0</dt>
|
|
|
|
<dd>All input files were processed successfully.</dd>
|
|
|
|
<dt>>0</dt>
|
|
|
|
<dd>An error occurred.</dd>
|
|
</dl>
|
|
|
|
<p>The exit status can be altered within the program by using an <b>exit</b> expression.</p>
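<p>A minimal sketch of altering the exit status from within a program; the value 3 and the shell variable name are arbitrary choices for illustration:</p>

```shell
# Capture the status without tripping "set -e" style error handling.
status=0
awk 'BEGIN { exit 3 }' || status=$?
echo "$status"   # 3
```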
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_15"></a>CONSEQUENCES OF ERRORS</h4>
|
|
|
|
<blockquote>
|
|
<p>If any <i>file</i> operand is specified and the named file cannot be accessed, <i>awk</i> shall write a diagnostic message to
|
|
standard error and terminate without any further action.</p>
|
|
|
|
<p>If the program specified by either the <i>program</i> operand or a <i>progfile</i> operand is not a valid <i>awk</i> program (as
|
|
specified in the EXTENDED DESCRIPTION section), the behavior is undefined.</p>
|
|
</blockquote>
|
|
|
|
<hr>
|
|
<div class="box"><em>The following sections are informative.</em></div>
|
|
|
|
<h4><a name="tag_04_06_16"></a>APPLICATION USAGE</h4>
|
|
|
|
<blockquote>
|
|
<p>The <b>index</b>, <b>length</b>, <b>match</b>, and <b>substr</b> functions should not be confused with similar functions in the
|
|
ISO C standard; the <i>awk</i> versions deal with characters, while the ISO C standard deals with bytes.</p>
|
|
|
|
<p>Because the concatenation operation is represented by adjacent expressions rather than an explicit operator, it is often
|
|
necessary to use parentheses to enforce the proper evaluation precedence.</p>
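<p>The following sketch shows why parentheses are often needed around concatenated operands. Because binary minus binds more tightly than concatenation, the first program subtracts 24 from the string <tt>" "</tt> (which converts to zero) before concatenating. The shell variable names are illustrative only:</p>

```shell
# Parsed as -12 concatenated with (" " - 24), so the space is lost.
unparenthesized=$(awk 'BEGIN { print -12 " " -24 }')

# Parentheses make -24 a separate operand of the concatenation.
parenthesized=$(awk 'BEGIN { print -12 " " (-24) }')

echo "$unparenthesized"   # -12-24
echo "$parenthesized"     # -12 -24
```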
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_17"></a>EXAMPLES</h4>
|
|
|
|
<blockquote>
|
|
<p>The <i>awk</i> program specified in the command line is most easily specified within single-quotes (for example,
|
|
'<i>program</i>') for applications using <a href="../utilities/sh.html"><i>sh</i></a>, because <i>awk</i> programs commonly contain
|
|
characters that are special to the shell, including double-quotes. In the cases where an <i>awk</i> program contains single-quote
|
|
characters, it is usually easiest to specify most of the program as strings within single-quotes concatenated by the shell with
|
|
quoted single-quote characters. For example:</p>
|
|
|
|
<pre>
|
|
<tt>awk '/'\''/ { print "quote:", $0 }'
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>prints all lines from the standard input containing a single-quote character, prefixed with <i>quote</i>:.</p>
|
|
|
|
<p>The following are examples of simple <i>awk</i> programs:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>Write to the standard output all input lines for which field 3 is greater than 5:</p>
|
|
|
|
<pre>
|
|
<tt>$3 > 5
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write every tenth line:</p>
|
|
|
|
<pre>
|
|
<tt>(NR % 10) == 0
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write any line with a substring matching the regular expression:</p>
|
|
|
|
<pre>
|
|
<tt>/(G|D)(2[0-9][[:alpha:]]*)/
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write any line with a substring containing a <tt>'G'</tt> or <tt>'D'</tt>, followed by a sequence of digits and alphabetic characters.
|
|
This example uses character classes <b>digit</b> and <b>alpha</b> to match language-independent digit and alphabetic characters
|
|
respectively:</p>
|
|
|
|
<pre>
|
|
<tt>/(G|D)([[:digit:][:alpha:]]*)/
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write any line in which the second field matches the regular expression and the fourth field does not:</p>
|
|
|
|
<pre>
|
|
<tt>$2 ~ /xyz/ && $4 !~ /xyz/
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write any line in which the second field contains a backslash:</p>
|
|
|
|
<pre>
|
|
<tt>$2 ~ /\\/
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write any line in which the second field contains a backslash. Note that backslash escapes are interpreted twice; once in
|
|
lexical processing of the string and once in processing the regular expression:</p>
|
|
|
|
<pre>
|
|
<tt>$2 ~ "\\\\"
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write the second to the last and the last field in each line. Separate the fields by a colon:</p>
|
|
|
|
<pre>
|
|
<tt>{OFS=":";print $(NF-1), $NF}
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write the line number and number of fields in each line. The three strings representing the line number, the colon, and the
|
|
number of fields are concatenated and that string is written to standard output:</p>
|
|
|
|
<pre>
|
|
<tt>{print NR ":" NF}
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write lines longer than 72 characters:</p>
|
|
|
|
<pre>
|
|
<tt>length($0) > 72
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write the first two fields in opposite order separated by <b>OFS</b>:</p>
|
|
|
|
<pre>
|
|
<tt>{ print $2, $1 }
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Same, with input fields separated by a comma or &lt;space&gt;s and &lt;tab&gt;s, or both:</p>
|
|
|
|
<pre>
|
|
<tt>BEGIN { FS = ",[ \t]*|[ \t]+" }
|
|
{ print $2, $1 }
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Add up the first column, print sum, and average:</p>
|
|
|
|
<pre>
|
|
<tt> {s += $1 }
|
|
END {print "sum is ", s, " average is", s/NR}
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write fields in reverse order, one per line (many lines out for each line in):</p>
|
|
|
|
<pre>
|
|
<tt>{ for (i = NF; i > 0; --i) print $i }
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write all lines between occurrences of the strings <b>start</b> and <b>stop</b>:</p>
|
|
|
|
<pre>
|
|
<tt>/start/, /stop/
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write all lines whose first field is different from the previous one:</p>
|
|
|
|
<pre>
|
|
<tt>$1 != prev { print; prev = $1 }
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Simulate <a href="../utilities/echo.html"><i>echo</i></a>:</p>
|
|
|
|
<pre>
|
|
<tt>BEGIN {
|
|
for (i = 1; i < ARGC; ++i)
|
|
printf("%s%s", ARGV[i], i==ARGC-1?"\n":" ")
|
|
}
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Write the path prefixes contained in the <i>PATH</i> environment variable, one per line:</p>
|
|
|
|
<pre>
|
|
<tt>BEGIN {
|
|
n = split (ENVIRON["PATH"], path, ":")
|
|
for (i = 1; i <= n; ++i)
|
|
print path[i]
|
|
}
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
|
|
<li>
|
|
<p>If there is a file named <b>input</b> containing page headers of the form:</p>
|
|
|
|
<blockquote>
|
|
<pre>
|
|
Page #
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<p>and a file named <b>program</b> that contains:</p>
|
|
|
|
<pre>
|
|
<tt>/Page/ { $2 = n++; }
|
|
{ print }
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>then the command line:</p>
|
|
|
|
<pre>
|
|
<tt>awk -f program n=5 input
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>prints the file <b>input</b>, filling in page numbers starting at 5.</p>
|
|
</li>
|
|
</ol>
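<p>To try programs like those listed above from a shell, input can be supplied on standard input. The sketch below runs slightly adapted forms of the "last two fields" and "sum and average" programs; the sample data and the shell variable names are assumptions made for this illustration:</p>

```shell
sample='10 a b
20 c d
30 e f'

# Second-to-last and last field of each line, separated by a colon.
last_two=$(printf '%s\n' "$sample" | awk '{ OFS = ":"; print $(NF-1), $NF }')

# Sum and average of the first column.
totals=$(printf '%s\n' "$sample" |
    awk '{ s += $1 } END { print "sum is", s, "average is", s/NR }')

echo "$last_two"
echo "$totals"   # sum is 60 average is 20
```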
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_06_18"></a>RATIONALE</h4>
|
|
|
|
<blockquote>
|
|
<p>This description is based on the new <i>awk</i>, "nawk", (see the referenced <i>The AWK Programming Language</i>), which
|
|
introduced a number of new features to the historical <i>awk</i>:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>New keywords: <b>delete</b>, <b>do</b>, <b>function</b>, <b>return</b></p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>New built-in functions: <b>atan2</b>, <b>close</b>, <b>cos</b>, <b>gsub</b>, <b>match</b>, <b>rand</b>, <b>sin</b>,
|
|
<b>srand</b>, <b>sub</b>, <b>system</b></p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>New predefined variables: <b>FNR</b>, <b>ARGC</b>, <b>ARGV</b>, <b>RSTART</b>, <b>RLENGTH</b>, <b>SUBSEP</b></p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>New expression operators: <b>?</b>, <b>:</b>, <b>,</b>, <b>^</b></p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The <b>FS</b> variable and the third argument to <b>split</b>, now treated as extended regular expressions.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The operator precedence, changed to more closely match the C language. Two examples of code that operate differently are:</p>
|
|
|
|
<pre>
|
|
<tt>while ( n /= 10 > 1) ...
|
|
if (!"wk" ~ /bwk/) ...
|
|
</tt>
|
|
</pre>
|
|
</li>
|
|
</ol>
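<p>As a sketch of the precedence change in the last item, the first program below relies on the current grouping, in which the comparison is evaluated before the assignment, and the second adds parentheses to state the other grouping explicitly. The starting value 100 and the variable names are illustrative assumptions:</p>

```shell
# Grouped as n /= (10 > 1), that is n /= 1, so n is unchanged.
grouped_by_default=$(awk 'BEGIN { n = 100; n /= 10 > 1; print n }')

# Explicit grouping: divide first, then compare the result with 1.
grouped_explicitly=$(awk 'BEGIN { n = 100; r = ((n /= 10) > 1); print n, r }')

echo "$grouped_by_default"   # 100
echo "$grouped_explicitly"   # 10 1
```

<p>Parenthesizing such expressions avoids depending on either precedence.</p>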
|
|
|
|
<p>Several features have been added based on newer implementations of <i>awk</i>:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Multiple instances of <b>-f</b> <i>progfile</i> are permitted.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The new option <b>-v</b> <i>assignment.</i></p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The new predefined variable <b>ENVIRON</b>.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>New built-in functions <b>toupper</b> and <b>tolower</b>.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>More formatting capabilities are added to <b>printf</b> to match the ISO C standard.</p>
|
|
</li>
|
|
</ul>
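<p>A brief sketch of three of the additions listed above: the <b>-v</b> option, the <b>toupper</b> function, and the <b>ENVIRON</b> array. The variable names <tt>name</tt> and <tt>DEMO_VAR</tt> are hypothetical and chosen only for this example:</p>

```shell
# -v assigns the variable before the BEGIN action runs.
greeting=$(awk -v name=world 'BEGIN { print "hello, " toupper(name) }')

# ENVIRON exposes the environment that awk was started with.
from_env=$(DEMO_VAR=abc awk 'BEGIN { print ENVIRON["DEMO_VAR"] }')

echo "$greeting"   # hello, WORLD
echo "$from_env"   # abc
```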
|
|
|
|
<p>The overall <i>awk</i> syntax has always been based on the C language, with a few features from the shell command language and
|
|
other sources. Because of this, it is not completely compatible with any other language, which has caused confusion for some users.
|
|
It is not the intent of the standard developers to address such issues. A few relatively minor changes toward making the language
|
|
more compatible with the ISO C standard were made; most of these changes are based on similar changes in recent
|
|
implementations, as described above. There remain several C-language conventions that are not in <i>awk</i>. One of the notable
|
|
ones is the comma operator, which is commonly used to specify multiple expressions in the C language <b>for</b> statement. Also,
|
|
there are various places where <i>awk</i> is more restrictive than the C language regarding the type of expression that can be used
|
|
in a given context. These limitations are due to the different features that the <i>awk</i> language does provide.</p>
|
|
|
|
<p>Regular expressions in <i>awk</i> have been extended somewhat from historical implementations to make them a pure superset of
|
|
extended regular expressions, as defined by IEEE Std 1003.1-2001 (see the Base Definitions volume of
|
|
IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap09.html#tag_09_04">Section 9.4, Extended Regular Expressions</a>). The
|
|
main extensions are internationalization features and interval expressions. Historical implementations of <i>awk</i> have long
|
|
supported backslash escape sequences as an extension to extended regular expressions, and this extension has been retained despite
|
|
inconsistency with other utilities. The number of escape sequences recognized in both extended regular expressions and strings has
|
|
varied (generally increasing with time) among implementations. The set specified by IEEE Std 1003.1-2001 includes most
|
|
sequences known to be supported by popular implementations and by the ISO C standard. One sequence that is not supported is
|
|
hexadecimal value escapes beginning with <tt>'\x'</tt> . This would allow values expressed in more than 9 bits to be used within
|
|
<i>awk</i> as in the ISO C standard. However, because this syntax has a non-deterministic length, it does not permit the
|
|
subsequent character to be a hexadecimal digit. This limitation can be dealt with in the C language by the use of lexical string
|
|
concatenation. In the <i>awk</i> language, concatenation could also be a solution for strings, but not for extended regular
|
|
expressions (either lexical ERE tokens or strings used dynamically as regular expressions). Because of this limitation, the feature
|
|
has not been added to IEEE Std 1003.1-2001.</p>
|
|
|
|
<p>When a string variable is used in a context where an extended regular expression normally appears (where the lexical token ERE
|
|
is used in the grammar) the string does not contain the literal slashes.</p>
|
|
|
|
<p>Some versions of <i>awk</i> allow the form:</p>
|
|
|
|
<pre>
|
|
<tt>func name(args, ... ) { statements }
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>This has been deprecated by the authors of the language, who asked that it not be specified.</p>
|
|
|
|
<p>Historical implementations of <i>awk</i> produce an error if a <b>next</b> statement is executed in a <b>BEGIN</b> action, and
|
|
cause <i>awk</i> to terminate if a <b>next</b> statement is executed in an <b>END</b> action. This behavior has not been
|
|
documented, and it was not believed that it was necessary to standardize it.</p>
|
|
|
|
<p>The specification of conversions between string and numeric values is much more detailed than in the documentation of historical
|
|
implementations or in the referenced <i>The AWK Programming Language</i>. Although most of the behavior is designed to be
|
|
intuitive, the details are necessary to ensure compatible behavior from different implementations. This is especially important in
|
|
relational expressions since the types of the operands determine whether a string or numeric comparison is performed. From the
|
|
perspective of an application writer, it is usually sufficient to expect intuitive behavior and to force conversions (by adding
|
|
zero or concatenating a null string) when the type of an expression does not obviously match what is needed. The intent has been to
|
|
specify historical practice in almost all cases. The one exception is that, in historical implementations, variables and constants
|
|
maintain both string and numeric values after their original value is converted by any use. This means that referencing a variable
|
|
or constant can have unexpected side effects. For example, with historical implementations the following program:</p>
|
|
|
|
<pre>
|
|
<tt>{
|
|
a = "+2"
|
|
b = 2
|
|
if (NR % 2)
|
|
c = a + b
|
|
if (a == b)
|
|
print "numeric comparison"
|
|
else
|
|
print "string comparison"
|
|
}
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>would perform a numeric comparison (and output numeric comparison) for each odd-numbered line, but perform a string comparison
|
|
(and output string comparison) for each even-numbered line. IEEE Std 1003.1-2001 ensures that comparisons will be numeric
|
|
if necessary. With historical implementations, the following program:</p>
|
|
|
|
<pre>
|
|
<tt>BEGIN {
|
|
OFMT = "%e"
|
|
print 3.14
|
|
OFMT = "%f"
|
|
print 3.14
|
|
}
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>would output <tt>"3.140000e+00"</tt> twice, because in the second <b>print</b> statement the constant <tt>"3.14"</tt> would have
|
|
a string value from the previous conversion. IEEE Std 1003.1-2001 requires that the output of the second <b>print</b>
|
|
statement be <tt>"3.140000"</tt> . The behavior of historical implementations was seen as too unintuitive and unpredictable.</p>
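<p>The advice above about forcing conversions, by adding zero or concatenating a null string, can be sketched as follows. The values and variable names are assumptions for illustration:</p>

```shell
forced=$(awk 'BEGIN {
    a = "10"; b = "9"            # both assigned from string constants
    print (a < b)                # string comparison: "10" sorts before "9"
    print ((a + 0) < (b + 0))    # adding zero forces a numeric comparison
    n = 10
    print ((n "") < "9")         # concatenating "" forces a string comparison
}')
echo "$forced"   # prints 1, 0, 1 on separate lines
```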
|
|
|
|
<p>It was pointed out that with the rules contained in early drafts, the following script would print nothing:</p>
|
|
|
|
<pre>
|
|
<tt>BEGIN {
|
|
y[1.5] = 1
|
|
OFMT = "%e"
|
|
print y[1.5]
|
|
}
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>Therefore, a new variable, <b>CONVFMT</b>, was introduced. The <b>OFMT</b> variable is now restricted to affecting output
|
|
conversions of numbers to strings and <b>CONVFMT</b> is used for internal conversions, such as comparisons or array indexing. The
|
|
default value is the same as that for <b>OFMT</b>, so unless a program changes <b>CONVFMT</b> (which no historical program would
|
|
do), it will receive the historical behavior associated with internal string conversions.</p>
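<p>The division of labor between <b>OFMT</b> and <b>CONVFMT</b> can be sketched as below. The format strings and values are arbitrary illustrative choices, and <tt>LC_ALL=C</tt> is set only so that the decimal-point character in the output is predictable:</p>

```shell
formats=$(LC_ALL=C awk 'BEGIN {
    OFMT = "%.2f"; CONVFMT = "%.4f"
    x = 3.14159265
    print x            # output conversion: uses OFMT
    s = x ""           # internal conversion to string: uses CONVFMT
    print s
    y[1.5] = 1
    OFMT = "%e"
    print y[1.5]       # the index is converted with CONVFMT, so it still matches
}')
echo "$formats"   # prints 3.14, 3.1416, 1 on separate lines
```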
|
|
|
|
<p>The POSIX <i>awk</i> lexical and syntactic conventions are specified more formally than in other sources. Again the intent has
|
|
been to specify historical practice. One convention that may not be obvious from the formal grammar as in other verbal descriptions
|
|
is where &lt;newline&gt;s are acceptable. There are several obvious placements such as terminating a statement, and a backslash can
|
|
be used to escape &lt;newline&gt;s between any lexical tokens. In addition, &lt;newline&gt;s without backslashes can follow a
|
|
comma, an open brace, a logical AND operator ( <tt>"&&"</tt> ), a logical OR operator ( <tt>"||"</tt> ), the <b>do</b>
|
|
keyword, the <b>else</b> keyword, and the closing parenthesis of an <b>if</b>, <b>for</b>, or <b>while</b> statement. For
|
|
example:</p>
|
|
|
|
<pre>
|
|
<tt>{ print $1,
|
|
$2 }
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>The requirement that <i>awk</i> add a trailing &lt;newline&gt; to the program argument text is to simplify the grammar, making
|
|
it match a text file in form. There is no way for an application or test suite to determine whether a literal &lt;newline&gt; is
|
|
added or whether <i>awk</i> simply acts as if it did.</p>
|
|
|
|
<p>IEEE Std 1003.1-2001 requires several changes from historical implementations in order to support
|
|
internationalization. Probably the most subtle of these is the use of the decimal-point character, defined by the <i>LC_NUMERIC</i>
|
|
category of the locale, in representations of floating-point numbers. This locale-specific character is used in recognizing numeric
|
|
input, in converting between strings and numeric values, and in formatting output. However, regardless of locale, the period
|
|
character (the decimal-point character of the POSIX locale) is the decimal-point character recognized in processing <i>awk</i>
|
|
programs (including assignments in command line arguments). This is essentially the same convention as the one used in the
|
|
ISO C standard. The difference is that the C language includes the <a href=
|
|
"../functions/setlocale.html"><i>setlocale</i>()</a> function, which permits an application to modify its locale. Because of this
|
|
capability, a C application begins executing with its locale set to the C locale, and only executes in the environment-specified
|
|
locale after an explicit call to <a href="../functions/setlocale.html"><i>setlocale</i>()</a>. However, adding such an elaborate
|
|
new feature to the <i>awk</i> language was seen as inappropriate for IEEE Std 1003.1-2001. It is possible to execute an
|
|
<i>awk</i> program explicitly in any desired locale by setting the environment in the shell.</p>
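<p>For instance, a single invocation can be run in the POSIX locale by setting the environment on the command line. This is a minimal sketch; any other locale name would be an assumption about what is installed on the system:</p>

```shell
# LC_ALL applies to this one awk process only; the shell is unaffected.
c_locale=$(LC_ALL=C awk 'BEGIN { printf "%.2f\n", 3 / 2 }')
echo "$c_locale"   # 1.50
```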
|
|
|
|
<p>The undefined behavior resulting from NULs in extended regular expressions allows future extensions for the GNU <i>gawk</i>
|
|
program to process binary data.</p>
|
|
|
|
<p>The behavior in the case of invalid <i>awk</i> programs (including lexical, syntactic, and semantic errors) is undefined because
|
|
it was considered overly limiting on implementations to specify. In most cases such errors can be expected to produce a diagnostic
|
|
and a non-zero exit status. However, some implementations may choose to extend the language in ways that make use of certain
|
|
invalid constructs. Other invalid constructs might be deemed worthy of a warning, but otherwise cause some reasonable behavior.
|
|
Still other constructs may be very difficult to detect in some implementations. Also, different implementations might detect a
|
|
given error during an initial parsing of the program (before reading any input files) while others might detect it when executing
|
|
the program after reading some input. Implementors should be aware that diagnosing errors as early as possible and producing useful
|
|
diagnostics can ease debugging of applications, and thus make an implementation more usable.</p>
|
|
|
|
<p>The unspecified behavior from using multi-character <b>RS</b> values is to allow possible future extensions based on extended
|
|
regular expressions used for record separators. Historical implementations take the first character of the string and ignore the
|
|
others.</p>
|
|
|
|
<p>Unspecified behavior when <b>split</b>(<i>string</i>, <i>array</i>, &lt;null&gt;) is used
|
|
is to allow a proposed future extension that would split up a string into an array of individual characters.</p>
|
|
|
|
<p>In the context of the <b>getline</b> function, equally good arguments for different precedences of the <b>|</b> and <b><</b>
|
|
operators can be made. Historical practice has been that:</p>
|
|
|
|
<pre>
|
|
<tt>getline < "a" "b"
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>is parsed as:</p>
|
|
|
|
<pre>
|
|
<tt>( getline < "a" ) "b"
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>although many would argue that the intent was that the file <b>ab</b> should be read. However:</p>
|
|
|
|
<pre>
|
|
<tt>getline < "x" + 1
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>parses as:</p>
|
|
|
|
<pre>
|
|
<tt>getline < ( "x" + 1 )
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>Similar problems occur with the <b>|</b> version of <b>getline</b>, particularly in combination with <b>$</b>. For example:</p>
|
|
|
|
<pre>
|
|
<tt>$"echo hi" | getline
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>(This situation is particularly problematic when used in a <b>print</b> statement, where the <b>|getline</b> part might be a
|
|
redirection of the <b>print</b>.)</p>
|
|
|
|
<p>Since in most cases such constructs are not (or at least should not) be used (because they have a natural ambiguity for which
|
|
there is no conventional parsing), the meaning of these constructs has been made explicitly unspecified. (The effect is that a
|
|
conforming application that runs into the problem must parenthesize to resolve the ambiguity.) There appeared to be few if any
|
|
actual uses of such constructs.</p>
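<p>A sketch of the parenthesized style that avoids the unspecified cases: the command string is built by concatenation, so it is grouped explicitly before being piped to <b>getline</b>. The command and the variable names are illustrative assumptions:</p>

```shell
piped=$(awk 'BEGIN {
    cmd = "echo"
    (cmd " hi") | getline line   # parentheses group the concatenation first
    close(cmd " hi")             # close using the same command string
    print line
}')
echo "$piped"   # hi
```

<p>The same approach applies to the <b>&lt;</b> form, for example <tt>getline line &lt; (dir "/" name)</tt>, where <tt>dir</tt> and <tt>name</tt> are hypothetical variables.</p>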
|
|
|
|
<p>Grammars can be written that would cause an error under these circumstances. Where backwards-compatibility is not a large
|
|
consideration, implementors may wish to use such grammars.</p>
|
|
|
|
<p>Some historical implementations have allowed some built-in functions to be called without an argument list, the result being a
|
|
default argument list chosen in some "reasonable" way. Use of <b>length</b> as a synonym for <b>length($0)</b> is the only one of
|
|
these forms that is thought to be widely known or widely used; this particular form is documented in various places (for example,
|
|
most historical <i>awk</i> reference pages, although not in the referenced <i>The AWK Programming Language</i>) as legitimate
|
|
practice. With this exception, default argument lists have always been undocumented and vaguely defined, and it is not at all clear
|
|
how (or if) they should be generalized to user-defined functions. They add no useful functionality and preclude possible future
|
|
extensions that might need to name functions without calling them. Not standardizing them seems the simplest course. The standard
|
|
developers considered that <b>length</b> merited special treatment, however, since it has been documented in the past and sees
|
|
possibly substantial use in historical programs. Accordingly, this usage has been made legitimate, but Issue 5 removed the
|
|
obsolescent marking for XSI-conforming implementations and many otherwise conforming applications depend on this feature.</p>
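<p>As a short sketch, the bare form and the explicit form give the same result for the current record; the input string is arbitrary:</p>

```shell
bare=$(echo hello | awk '{ print length }')
explicit=$(echo hello | awk '{ print length($0) }')
echo "$bare $explicit"   # 5 5
```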
|
|
|
|
<p>In <b>sub</b> and <b>gsub</b>, if <i>repl</i> is a string literal (the lexical token <b>STRING</b>), then two consecutive
|
|
backslash characters should be used in the string to ensure a single backslash will precede the ampersand when the resultant string
|
|
is passed to the function. (For example, to specify one literal ampersand in the replacement string, use <b>gsub</b>( <b>ERE</b>,
|
|
<tt>"\\&"</tt> ).)</p>
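<p>The difference between a literal ampersand and the matched text can be sketched as follows. The input and the shell variable names are assumptions for illustration; the <i>awk</i> programs are single-quoted so that the shell passes the backslashes through unchanged:</p>

```shell
# "\\&" reaches gsub as \& and yields a literal ampersand.
literal_amp=$(echo 'a b' | awk '{ gsub(/ /, "\\&"); print }')

# An unescaped & stands for the text matched by the ERE.
matched_text=$(echo 'a b' | awk '{ gsub(/ /, "[&]"); print }')

echo "$literal_amp"    # a&b
echo "$matched_text"   # a[ ]b
```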
|
|
|
|
<p>Historically the only special character in the <i>repl</i> argument of <b>sub</b> and <b>gsub</b> string functions was the
|
|
ampersand ( <tt>'&'</tt> ) character and preceding it with the backslash character was used to turn off its special
|
|
meaning.</p>
|
|
|
|
<p>The description in the ISO POSIX-2:1993 standard introduced behavior such that the backslash character was another special
|
|
character and it was unspecified whether there were any other special characters. This description introduced several portability
|
|
problems, some of which are described below, and so it has been replaced with the more historical description. Some of the problems
|
|
include:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Historically, to create the replacement string, a script could use <b>gsub</b>( <b>ERE</b>, <tt>"\\&"</tt> ), but with the
|
|
ISO POSIX-2:1993 standard wording, it was necessary to use <b>gsub</b>( <b>ERE</b>, <tt>"\\\\&"</tt> ). Backslash
|
|
characters are doubled here because all string literals are subject to lexical analysis, which would reduce each pair of backslash
|
|
characters to a single backslash before being passed to <b>gsub</b>.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Since it was unspecified what the special characters were, for portable scripts to guarantee that characters are printed
|
|
literally, each character had to be preceded with a backslash. (For example, a portable script had to use <b>gsub</b>( <b>ERE</b>,
|
|
<tt>"\\h\\i"</tt> ) to produce a replacement string of <tt>"hi"</tt> .)</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>The description for comparisons in the ISO POSIX-2:1993 standard did not properly describe historical practice because of
|
|
the way numeric strings are compared as numbers. The rules in that standard cause the following code:</p>
|
|
|
|
<pre>
|
|
<tt>if (0 == "000")
|
|
print "strange, but true"
|
|
else
|
|
print "not true"
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>to do a numeric comparison, causing the <b>if</b> to succeed. It should be intuitively obvious that this is incorrect behavior,
|
|
and indeed, no historical implementation of <i>awk</i> actually behaves this way.</p>
|
|
|
|
<p>To fix this problem, the definition of <i>numeric string</i> was enhanced to include only those values obtained from specific
|
|
circumstances (mostly external sources) where it is not possible to determine unambiguously whether the value is intended to be a
|
|
string or a numeric.</p>
|
|
|
|
<p>Variables that are assigned a numeric string shall also be treated as a numeric string. (That is, the notion of a numeric
|
|
string can be propagated across assignments.) In comparisons, all variables having the uninitialized value are to be treated as a
|
|
numeric operand evaluating to the numeric value zero.</p>
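<p>The resulting behavior can be sketched as below: a field read from input may be a numeric string and is compared numerically, a string constant is compared as a string, and an uninitialized variable compares equal to both zero and the null string. The sample input and the variable names are illustrative assumptions:</p>

```shell
# $1 comes from input, so "000" is a numeric string equal to 0.
field_cmp=$(echo '000' | awk '{ print ($1 == 0) }')

# "000" is a string constant, so the comparison is "0" against "000".
const_cmp=$(awk 'BEGIN { print (0 == "000") }')

# x is uninitialized: it matches both 0 and the null string.
unset_cmp=$(awk 'BEGIN { a = (x == 0); b = (x == ""); print a, b }')

echo "$field_cmp"   # 1
echo "$const_cmp"   # 0
echo "$unset_cmp"   # 1 1
```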
|
|
|
|
<p>Uninitialized variables include all types of variables including scalars, array elements, and fields. The definition of an
|
|
uninitialized value in <a href="#tag_04_06_13_03">Variables and Special Variables</a> is necessary to describe the value placed on
|
|
uninitialized variables and on fields that are valid (for example, <b><</b> <b>$NF</b>) but have no characters in them and to
|
|
describe how these variables are to be used in comparisons. A valid field, such as <b>$1</b>, that has no characters in it can be
|
|
obtained from an input line of <tt>"\t\t"</tt> when <b>FS=</b> <tt>'\t'</tt> . Historically, the comparison ( <b>$1<</b>10) was
|
|
done numerically after evaluating <b>$1</b> to the value zero.</p>
|
|
|
|
<p>The phrase "... also shall have the numeric value of the numeric string" was removed from several sections of the
|
|
ISO POSIX-2:1993 standard because it specifies an unnecessary implementation detail. It is not necessary for
|
|
IEEE Std 1003.1-2001 to specify that these objects be assigned two different values. It is only necessary to specify that
|
|
these objects may evaluate to two different values depending on context.</p>
|
|
|
|
<p>The description of numeric string processing is based on the behavior of the <a href="../functions/atof.html"><i>atof</i>()</a>
function in the ISO C standard. While it is not a requirement for an implementation to use this function, many historical
implementations of <i>awk</i> do. In the ISO C standard, floating-point constants use a period as a decimal point character
for the language itself, independent of the current locale, but the <a href="../functions/atof.html"><i>atof</i>()</a> function and
the associated <a href="../functions/strtod.html"><i>strtod</i>()</a> function use the decimal point character of the current
locale when converting strings to numeric values. Similarly, in <i>awk</i>, floating-point constants in an <i>awk</i> script use a
period independent of the locale, but input strings use the decimal point character of the locale.</p>
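<p>A hedged sketch of this distinction, run in the POSIX locale so that the result does not depend on which locales are installed:
the constant <tt>3.5</tt> in the script always uses a period, while the input <tt>3,5</tt> is converted according to the locale,
which in the POSIX locale does not accept a comma as the decimal point. The variable names <tt>period_input</tt> and
<tt>comma_input</tt> are illustrative. Behavior in a locale whose decimal point is a comma is not shown here, and implementations
are known to differ in how far they honor <b>LC_NUMERIC</b> for input.</p>

```shell
# Input "3.5" is a numeric string in the POSIX locale and equals the constant 3.5.
period_input=$(echo 3.5 | LC_ALL=C awk '{ print ($1 == 3.5) }')

# Input "3,5" is not a numeric string in the POSIX locale: the equality test is a
# string comparison that fails, and numeric conversion stops at the comma.
comma_input=$(echo 3,5 | LC_ALL=C awk '{ print ($1 == 3.5), ($1 + 0) }')

echo "period_input=$period_input comma_input=$comma_input"
```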
</blockquote>

<h4><a name="tag_04_06_19"></a>FUTURE DIRECTIONS</h4>

<blockquote>
<p>None.</p>
</blockquote>

<h4><a name="tag_04_06_20"></a>SEE ALSO</h4>

<blockquote>
<p><a href="xcu_chap01.html#tag_01_10"><i>Grammar Conventions</i></a>, <a href="grep.html"><i>grep</i></a>, <a href=
"lex.html"><i>lex</i></a>, <a href="sed.html"><i>sed</i></a>, the System Interfaces volume of IEEE Std 1003.1-2001, <a
href="../functions/atof.html"><i>atof</i>()</a>, <i>exec</i>, <a href="../functions/popen.html"><i>popen</i>()</a>, <a href=
"../functions/setlocale.html"><i>setlocale</i>()</a>, <a href="../functions/strtod.html"><i>strtod</i>()</a></p>
</blockquote>

<h4><a name="tag_04_06_21"></a>CHANGE HISTORY</h4>

<blockquote>
<p>First released in Issue 2.</p>
</blockquote>

<h4><a name="tag_04_06_22"></a>Issue 5</h4>

<blockquote>
<p>The FUTURE DIRECTIONS section is added.</p>
</blockquote>

<h4><a name="tag_04_06_23"></a>Issue 6</h4>

<blockquote>
<p>The <i>awk</i> utility is aligned with the IEEE P1003.2b draft standard.</p>

<p>The normative text is reworded to avoid use of the term "must" for application requirements.</p>

<p>IEEE PASC Interpretation 1003.2 #211 is applied, adding the sentence "An occurrence of two consecutive backslashes shall be
interpreted as just a single literal backslash character." into the description of the <b>sub</b> string function.</p>
</blockquote>

<div class="box"><em>End of informative text.</em></div>

<hr>
<hr size="2" noshade>
<center><font size="2"><!--footer start-->
UNIX ® is a registered Trademark of The Open Group.<br>
POSIX ® is a registered Trademark of The IEEE.<br>
[ <a href="../mindex.html">Main Index</a> | <a href="../basedefs/contents.html">XBD</a> | <a href=
"../utilities/contents.html">XCU</a> | <a href="../functions/contents.html">XSH</a> | <a href="../xrat/contents.html">XRAT</a>
]</font></center>

<!--footer end-->
<hr size="2" noshade>
</body>
</html>