<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<html>
|
|
<head>
|
|
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<link type="text/css" rel="stylesheet" href="style.css"><!-- Generated by The Open Group's rhtm tool v1.2.1 -->
|
|
<!-- Copyright (c) 2001 The Open Group, All Rights Reserved -->
|
|
<title>lex</title>
|
|
</head>
|
|
<body bgcolor="white">
|
|
<script type="text/javascript" language="JavaScript" src="../jscript/codes.js">
|
|
</script>
|
|
|
|
<basefont size="3"> <a name="lex"></a> <a name="tag_04_73"></a><!-- lex -->
|
|
<!--header start-->
|
|
<center><font size="2">The Open Group Base Specifications Issue 6<br>
|
|
IEEE Std 1003.1-2001<br>
|
|
Copyright © 2001 The IEEE and The Open Group, All Rights reserved.</font></center>
|
|
|
|
<!--header end-->
|
|
<hr size="2" noshade>
|
|
<h4><a name="tag_04_73_01"></a>NAME</h4>
|
|
|
|
<blockquote>lex - generate programs for lexical tasks (<b>DEVELOPMENT</b>)</blockquote>
|
|
|
|
<h4><a name="tag_04_73_02"></a>SYNOPSIS</h4>
|
|
|
|
<blockquote class="synopsis">
|
|
<div class="box"><code><tt><sup>[<a href="javascript:open_code('CD')">CD</a>]</sup> <img src="../images/opt-start.gif" alt=
|
|
"[Option Start]" border="0"> lex</tt> <b>[</b><tt>-t</tt><b>][</b><tt>-n|-v</tt><b>][</b><i>file</i> <tt>...</tt><b>]</b><tt><img
|
|
src="../images/opt-end.gif" alt="[Option End]" border="0"></tt></code></div>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_03"></a>DESCRIPTION</h4>
|
|
|
|
<blockquote>
|
|
<p>The <i>lex</i> utility shall generate C programs to be used in lexical processing of character input, and that can be used as an
interface to <a href="../utilities/yacc.html"><i>yacc</i></a>. The C programs shall be generated from <i>lex</i> source code and
conform to the ISO C standard. Unless the <b>-t</b> option is specified, the <i>lex</i> utility shall write the program it generates to the file
<b>lex.yy.c</b>; the state of this file is unspecified if <i>lex</i> exits with a non-zero exit status. See the EXTENDED
DESCRIPTION section for a complete description of the <i>lex</i> input language.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_04"></a>OPTIONS</h4>
|
|
|
|
<blockquote>
|
|
<p>The <i>lex</i> utility shall conform to the Base Definitions volume of IEEE Std 1003.1-2001, <a href=
|
|
"../basedefs/xbd_chap12.html#tag_12_02">Section 12.2, Utility Syntax Guidelines</a>.</p>
|
|
|
|
<p>The following options shall be supported:</p>
|
|
|
|
<dl compact>
|
|
<dt><b>-n</b></dt>
|
|
|
|
<dd>Suppress the summary of statistics usually written with the <b>-v</b> option. If no table sizes are specified in the <i>lex</i>
|
|
source code and the <b>-v</b> option is not specified, then <b>-n</b> is implied.</dd>
|
|
|
|
<dt><b>-t</b></dt>
|
|
|
|
<dd>Write the resulting program to standard output instead of <b>lex.yy.c</b>.</dd>
|
|
|
|
<dt><b>-v</b></dt>
|
|
|
|
<dd>Write a summary of <i>lex</i> statistics to the standard output. (See the discussion of <i>lex</i> table sizes in <a href=
"#tag_04_73_13_01">Definitions in lex</a>.) If the <b>-t</b> option is specified and <b>-n</b> is not specified, this report shall
be written to standard error. If table sizes are specified in the <i>lex</i> source code, and if the <b>-n</b> option is not
specified, the <b>-v</b> option may be enabled.</dd>
|
|
</dl>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_05"></a>OPERANDS</h4>
|
|
|
|
<blockquote>
|
|
<p>The following operand shall be supported:</p>
|
|
|
|
<dl compact>
|
|
<dt><i>file</i></dt>
|
|
|
|
<dd>A pathname of an input file. If more than one such <i>file</i> is specified, all files shall be concatenated to produce a
single <i>lex</i> program. If no <i>file</i> operands are specified, or if a <i>file</i> operand is <tt>'-'</tt>, the standard
input shall be used.</dd>
|
|
</dl>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_06"></a>STDIN</h4>
|
|
|
|
<blockquote>
|
|
<p>The standard input shall be used if no <i>file</i> operands are specified, or if a <i>file</i> operand is <tt>'-'</tt>. See
INPUT FILES.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_07"></a>INPUT FILES</h4>
|
|
|
|
<blockquote>
|
|
<p>The input files shall be text files containing <i>lex</i> source code, as described in the EXTENDED DESCRIPTION section.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_08"></a>ENVIRONMENT VARIABLES</h4>
|
|
|
|
<blockquote>
|
|
<p>The following environment variables shall affect the execution of <i>lex</i>:</p>
|
|
|
|
<dl compact>
|
|
<dt><i>LANG</i></dt>
|
|
|
|
<dd>Provide a default value for the internationalization variables that are unset or null. (See the Base Definitions volume of
|
|
IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap08.html#tag_08_02">Section 8.2, Internationalization Variables</a> for
|
|
the precedence of internationalization variables used to determine the values of locale categories.)</dd>
|
|
|
|
<dt><i>LC_ALL</i></dt>
|
|
|
|
<dd>If set to a non-empty string value, override the values of all the other internationalization variables.</dd>
|
|
|
|
<dt><i>LC_COLLATE</i></dt>
|
|
|
|
<dd><br>
|
|
Determine the locale for the behavior of ranges, equivalence classes, and multi-character collating elements within regular
|
|
expressions. If this variable is not set to the POSIX locale, the results are unspecified.</dd>
|
|
|
|
<dt><i>LC_CTYPE</i></dt>
|
|
|
|
<dd>Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as
|
|
opposed to multi-byte characters in arguments and input files), and the behavior of character classes within regular expressions.
|
|
If this variable is not set to the POSIX locale, the results are unspecified.</dd>
|
|
|
|
<dt><i>LC_MESSAGES</i></dt>
|
|
|
|
<dd>Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard
|
|
error.</dd>
|
|
|
|
<dt><i>NLSPATH</i></dt>
|
|
|
|
<dd><sup>[<a href="javascript:open_code('XSI')">XSI</a>]</sup> <img src="../images/opt-start.gif" alt="[Option Start]" border="0">
Determine the location of message catalogs for the processing of <i>LC_MESSAGES</i>. <img src="../images/opt-end.gif" alt=
"[Option End]" border="0"></dd>
|
|
</dl>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_09"></a>ASYNCHRONOUS EVENTS</h4>
|
|
|
|
<blockquote>
|
|
<p>Default.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_10"></a>STDOUT</h4>
|
|
|
|
<blockquote>
|
|
<p>If the <b>-t</b> option is specified, the text file of C source code output of <i>lex</i> shall be written to standard
|
|
output.</p>
|
|
|
|
<p>If the <b>-t</b> option is not specified:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>Implementation-defined informational, error, and warning messages concerning the contents of <i>lex</i> source code input shall
|
|
be written to either the standard output or standard error.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>If the <b>-v</b> option is specified and the <b>-n</b> option is not specified, <i>lex</i> statistics shall also be written to
|
|
either the standard output or standard error, in an implementation-defined format. These statistics may also be generated if table
|
|
sizes are specified with a <tt>'%'</tt> operator in the <i>Definitions</i> section, as long as the <b>-n</b> option is not
|
|
specified.</p>
|
|
</li>
|
|
</ul>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_11"></a>STDERR</h4>
|
|
|
|
<blockquote>
|
|
<p>If the <b>-t</b> option is specified, implementation-defined informational, error, and warning messages concerning the contents
|
|
of <i>lex</i> source code input shall be written to the standard error.</p>
|
|
|
|
<p>If the <b>-t</b> option is not specified:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>Implementation-defined informational, error, and warning messages concerning the contents of <i>lex</i> source code input shall
|
|
be written to either the standard output or standard error.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>If the <b>-v</b> option is specified and the <b>-n</b> option is not specified, <i>lex</i> statistics shall also be written to
|
|
either the standard output or standard error, in an implementation-defined format. These statistics may also be generated if table
|
|
sizes are specified with a <tt>'%'</tt> operator in the <i>Definitions</i> section, as long as the <b>-n</b> option is not
|
|
specified.</p>
|
|
</li>
|
|
</ol>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_12"></a>OUTPUT FILES</h4>
|
|
|
|
<blockquote>
|
|
<p>A text file containing C source code shall be written to <b>lex.yy.c</b>, or to the standard output if the <b>-t</b> option is
|
|
present.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_73_13"></a>EXTENDED DESCRIPTION</h4>
|
|
|
|
<blockquote>
|
|
<p>Each input file shall contain <i>lex</i> source code, which is a table of regular expressions with corresponding actions in the
|
|
form of C program fragments.</p>
|
|
|
|
<p>When <b>lex.yy.c</b> is compiled and linked with the <i>lex</i> library (using the <b>-l l</b> operand with <a href=
|
|
"../utilities/c99.html"><i>c99</i></a>), the resulting program shall read character input from the standard input and shall
|
|
partition it into strings that match the given expressions.</p>
|
|
|
|
<p>When an expression is matched, these actions shall occur:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>The input string that was matched shall be left in <i>yytext</i> as a null-terminated string; <i>yytext</i> shall either be an
external character array or a pointer to a character string. As explained in <a href="#tag_04_73_13_01">Definitions in lex</a>,
the type can be explicitly selected using the <b>%array</b> or <b>%pointer</b> declarations, but the default is
implementation-defined.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The external <b>int</b> <i>yyleng</i> shall be set to the length of the matching string.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The expression's corresponding program fragment, or action, shall be executed.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<p>During pattern matching, <i>lex</i> shall search the set of patterns for the single longest possible match. Among rules that
|
|
match the same number of characters, the rule given first shall be chosen.</p>
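<p>The selection rule above can be sketched in C. This is a minimal illustration, not code that <i>lex</i> generates: it assumes
every pattern is a literal string, and the function name <tt>choose_rule</tt> is hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of lex: pick the rule to run for the text
 * at `input`, where every pattern is a literal string.
 * Returns the index of the winning rule, or -1 if no pattern matches;
 * on success *match_len receives the number of characters matched.
 */
int choose_rule(const char *const patterns[], size_t count,
                const char *input, size_t *match_len)
{
    assert(patterns != NULL && input != NULL && match_len != NULL);

    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(patterns[i]);

        /* Skip empty patterns and patterns that do not match here. */
        if (len == 0 || strncmp(input, patterns[i], len) != 0)
            continue;

        /* Strictly longer wins; equal length keeps the earlier rule. */
        if (len > best_len) {
            best = (int)i;
            best_len = len;
        }
    }
    if (best >= 0)
        *match_len = best_len;
    return best;
}
```

<p>With the rules <tt>"i"</tt>, <tt>"if"</tt>, <tt>"iff"</tt>, and a second <tt>"if"</tt>, the input <tt>"iffy"</tt> selects the
third rule (three characters), while the input <tt>"if(x)"</tt> selects the first of the two <tt>"if"</tt> rules.</p>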
|
|
|
|
<p>The general format of <i>lex</i> source shall be:</p>
|
|
|
|
<blockquote>
|
|
<pre>
<i>Definitions</i>
<b>%%</b>
<i>Rules</i>
<b>%%</b>
<i>User Subroutines</i>
</pre>
|
|
</blockquote>
|
|
|
|
<p>The first <tt>"%%"</tt> is required to mark the beginning of the rules (regular expressions and actions); the second
|
|
<tt>"%%"</tt> is required only if user subroutines follow.</p>
|
|
|
|
<p>Any line in the <i>Definitions</i> section beginning with a &lt;blank&gt; shall be assumed to be a C program fragment and shall
be copied to the external definition area of the <b>lex.yy.c</b> file. Similarly, anything in the <i>Definitions</i> section
included between delimiter lines containing only <tt>"%{"</tt> and <tt>"%}"</tt> shall also be copied unchanged to the external
definition area of the <b>lex.yy.c</b> file.</p>
|
|
|
|
<p>Any such input (beginning with a &lt;blank&gt; or within <tt>"%{"</tt> and <tt>"%}"</tt> delimiter lines) appearing at the
beginning of the <i>Rules</i> section before any rules are specified shall be written to <b>lex.yy.c</b> after the declarations of
variables for the <i>yylex</i>() function and before the first line of code in <i>yylex</i>(). Thus, user variables local to
<i>yylex</i>() can be declared here, as well as application code to execute upon entry to <i>yylex</i>().</p>
|
|
|
|
<p>The action taken by <i>lex</i> when encountering any input beginning with a &lt;blank&gt; or within <tt>"%{"</tt> and
<tt>"%}"</tt> delimiter lines appearing in the <i>Rules</i> section but coming after one or more rules is undefined. The presence
of such input may result in an erroneous definition of the <i>yylex</i>() function.</p>
|
|
|
|
<h5><a name="tag_04_73_13_01"></a>Definitions in lex</h5>
|
|
|
|
<p><i>Definitions</i> appear before the first <tt>"%%"</tt> delimiter. Any line in this section not contained between <tt>"%{"</tt>
and <tt>"%}"</tt> lines and not beginning with a &lt;blank&gt; shall be assumed to define a <i>lex</i> substitution string. The
format of these lines shall be:</p>
|
|
|
|
<pre>
<i>name substitute</i>
</pre>
|
|
|
|
<p>If a <i>name</i> does not meet the requirements for identifiers in the ISO C standard, the result is undefined. The string
<i>substitute</i> shall replace the string {<i>name</i>} when it is used in a rule. The <i>name</i> string shall be recognized in
this context only when the braces are provided and when it does not appear within a bracket expression or within double-quotes.</p>
|
|
|
|
<p>In the <i>Definitions</i> section, any line beginning with a <tt>'%'</tt> (percent sign) character and followed by an
alphanumeric word beginning with either <tt>'s'</tt> or <tt>'S'</tt> shall define a set of start conditions. Any line beginning
with a <tt>'%'</tt> followed by a word beginning with either <tt>'x'</tt> or <tt>'X'</tt> shall define a set of exclusive start
conditions. When the generated scanner is in a <tt>%s</tt> state, patterns with no state specified shall also be active; in a
<tt>%x</tt> state, such patterns shall not be active. The rest of the line, after the first word, shall be considered to be one or
more &lt;blank&gt;-separated names of start conditions. Start condition names shall be constructed in the same way as definition
names. Start conditions can be used to restrict the matching of regular expressions to one or more states as described in <a href=
"#tag_04_73_13_04">Regular Expressions in lex</a>.</p>
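<p>The difference between <tt>%s</tt> and <tt>%x</tt> start conditions can be modeled with a small predicate. The sketch below is
illustrative only: the type <tt>enum sc_kind</tt> and the function <tt>rule_is_active</tt> are hypothetical names, and start
conditions are represented simply by their names as strings.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* How the current start condition was declared (hypothetical model). */
enum sc_kind {
    SC_INITIAL,    /* the initial state */
    SC_INCLUSIVE,  /* declared with %s */
    SC_EXCLUSIVE   /* declared with %x */
};

/*
 * Decide whether a rule is active.  `rule_states` lists the names in the
 * rule's start-condition prefix; pass count 0 for a rule with no prefix.
 */
bool rule_is_active(const char *current_name, enum sc_kind current_kind,
                    const char *const rule_states[], size_t count)
{
    assert(current_name != NULL);

    if (count == 0) {
        /* Unprefixed rules stay active everywhere except in %x states. */
        return current_kind != SC_EXCLUSIVE;
    }
    for (size_t i = 0; i < count; i++) {
        if (strcmp(rule_states[i], current_name) == 0)
            return true;
    }
    return false;
}
```

<p>Under this model, a rule with no prefix is active in the initial state and in any <tt>%s</tt> state, but inside a <tt>%x</tt>
state only rules that name that state are considered.</p>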
|
|
|
|
<p>Implementations shall accept either of the following two mutually-exclusive declarations in the <i>Definitions</i> section:</p>
|
|
|
|
<dl compact>
|
|
<dt><b>%array</b></dt>
|
|
|
|
<dd>Declare the type of <i>yytext</i> to be a null-terminated character array.</dd>
|
|
|
|
<dt><b>%pointer</b></dt>
|
|
|
|
<dd>Declare the type of <i>yytext</i> to be a pointer to a null-terminated character string.</dd>
|
|
</dl>
|
|
|
|
<p>The default type of <i>yytext</i> is implementation-defined. If an application refers to <i>yytext</i> outside of the scanner
|
|
source file (that is, via an <b>extern</b>), the application shall include the appropriate <b>%array</b> or <b>%pointer</b>
|
|
declaration in the scanner source file.</p>
|
|
|
|
<p>Implementations shall accept declarations in the <i>Definitions</i> section for setting certain internal table sizes. The
|
|
declarations are shown in the following table.</p>
|
|
|
|
<center><b>Table: Table Size Declarations in <i>lex</i></b></center>
|
|
|
|
<center>
|
|
<table border="1" cellpadding="3" align="center">
|
|
<tr valign="top">
|
|
<th align="center">
|
|
<p class="tent"><b>Declaration</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>Description</b></p>
|
|
</th>
|
|
<th align="center">
|
|
<p class="tent"><b>Minimum Value</b></p>
|
|
</th>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">%<b>p</b> <i>n</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Number of positions</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">2500</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">%<b>n</b> <i>n</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Number of states</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">500</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">%<b>a</b> <i>n</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Number of transitions</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">2000</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">%<b>e</b> <i>n</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Number of parse tree nodes</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1000</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">%<b>k</b> <i>n</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Number of packed character classes</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">1000</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">%<b>o</b> <i>n</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Size of the output array</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">3000</p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
|
|
|
|
<p>In the table, <i>n</i> represents a positive decimal integer, preceded by one or more &lt;blank&gt;s. The exact meaning of these
table size numbers is implementation-defined. The implementation shall document how these numbers affect the <i>lex</i> utility and
how they are related to any output that may be generated by the implementation should limitations be encountered during the
execution of <i>lex</i>. It shall be possible to determine from this output which of the table size values needs to be modified to
permit <i>lex</i> to successfully generate tables for the input language. The values in the column Minimum Value represent the
lowest values conforming implementations shall provide.</p>
|
|
|
|
<h5><a name="tag_04_73_13_02"></a>Rules in lex</h5>
|
|
|
|
<p>The rules in <i>lex</i> source files are a table in which the left column contains regular expressions and the right column
|
|
contains actions (C program fragments) to be executed when the expressions are recognized.</p>
|
|
|
|
<pre>
<i>ERE action
ERE action</i><tt>...
</tt>
</pre>
|
|
|
|
<p>The extended regular expression (ERE) portion of a row shall be separated from <i>action</i> by one or more &lt;blank&gt;s. A
regular expression containing &lt;blank&gt;s shall be recognized under one of the following conditions:</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<p>The entire expression appears within double-quotes.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>The &lt;blank&gt;s appear within double-quotes or square brackets.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>Each &lt;blank&gt; is preceded by a backslash character.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<h5><a name="tag_04_73_13_03"></a>User Subroutines in lex</h5>
|
|
|
|
<p>Anything in the user subroutines section shall be copied to <b>lex.yy.c</b> following <i>yylex</i>().</p>
|
|
|
|
<h5><a name="tag_04_73_13_04"></a>Regular Expressions in lex</h5>
|
|
|
|
<p>The <i>lex</i> utility shall support the set of extended regular expressions (see the Base Definitions volume of
|
|
IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap09.html#tag_09_04">Section 9.4, Extended Regular Expressions</a>),
|
|
with the following additions and exceptions to the syntax:</p>
|
|
|
|
<dl compact>
|
|
<dt><tt>"..."</tt></dt>
|
|
|
|
<dd>Any string enclosed in double-quotes shall represent the characters within the double-quotes as themselves, except that
backslash escapes (which appear in the following table) shall be recognized. Any backslash-escape sequence shall be terminated by
the closing quote. For example, <tt>"\01"</tt> <tt>"1"</tt> represents a single string: the octal value 1 followed by the character
<tt>'1'</tt>.</dd>
|
|
|
|
<dt>&lt;<i>state</i>&gt;<i>r</i>, &lt;<i>state1,state2,</i>...&gt;<i>r</i></dt>
|
|
|
|
<dd><br>
The regular expression <i>r</i> shall be matched only when the program is in one of the start conditions indicated by <i>state</i>,
<i>state1</i>, and so on; see <a href="#tag_04_73_13_05">Actions in lex</a>. (As an exception to the typographical conventions of
the rest of this volume of IEEE Std 1003.1-2001, in this case &lt;<i>state</i>&gt; does not represent a metavariable, but
the literal angle-bracket characters surrounding a symbol.) The start condition shall be recognized as such only at the beginning
of a regular expression.</dd>
|
|
|
|
<dt><i>r</i>/<i>x</i></dt>
|
|
|
|
<dd>The regular expression <i>r</i> shall be matched only if it is followed by an occurrence of regular expression <i>x</i>
(<i>x</i> is the instance of trailing context, further defined below). The token returned in <i>yytext</i> shall only match
<i>r</i>. If the trailing portion of <i>r</i> matches the beginning of <i>x</i>, the result is unspecified. The <i>r</i> expression
cannot include further trailing context or the <tt>'$'</tt> (match-end-of-line) operator; <i>x</i> cannot include the <tt>'^'</tt>
(match-beginning-of-line) operator, nor trailing context, nor the <tt>'$'</tt> operator. That is, only one occurrence of trailing
context is allowed in a <i>lex</i> regular expression, and the <tt>'^'</tt> operator can be used only at the beginning of such an
expression.</dd>
|
|
|
|
<dt>{<i>name</i>}</dt>
|
|
|
|
<dd>When <i>name</i> is one of the substitution symbols from the <i>Definitions</i> section, the string, including the enclosing
braces, shall be replaced by the <i>substitute</i> value. The <i>substitute</i> value shall be treated in the extended regular
expression as if it were enclosed in parentheses. No substitution shall occur if {<i>name</i>} occurs within a bracket expression
or within double-quotes.</dd>
|
|
</dl>
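<p>The effect of trailing context, <i>r</i>/<i>x</i>, on the returned token can be sketched as follows. This is a simplified model
that assumes <i>r</i> and <i>x</i> are literal, non-empty strings; the function name <tt>match_trailing_context</tt> is
hypothetical and is not something <i>lex</i> provides.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of r/x for literal, non-empty r and x.
 * Returns the length of the token (r only) when `input` starts with r
 * immediately followed by x; returns 0 when the rule does not match.
 */
size_t match_trailing_context(const char *input, const char *r, const char *x)
{
    assert(input != NULL && r != NULL && x != NULL);

    size_t r_len = strlen(r);
    size_t x_len = strlen(x);

    if (strncmp(input, r, r_len) != 0)
        return 0;
    if (strncmp(input + r_len, x, x_len) != 0)
        return 0;

    /* x is examined but not consumed: the token length covers r only. */
    return r_len;
}
```

<p>For the input <tt>"foo bar"</tt> with <i>r</i> = <tt>"foo"</tt> and <i>x</i> = <tt>" "</tt>, the function reports a token of
three characters; the space is left in the input for the next match.</p>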
|
|
|
|
<p>Within an ERE, a backslash character shall be considered to begin an escape sequence as specified in the table in the Base
Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap05.html">Chapter 5, File Format Notation</a>
(<tt>'\\'</tt>, <tt>'\a'</tt>, <tt>'\b'</tt>, <tt>'\f'</tt>, <tt>'\n'</tt>, <tt>'\r'</tt>, <tt>'\t'</tt>, <tt>'\v'</tt>). In
addition, the escape sequences in the following table shall be recognized.</p>
|
|
|
|
<p>A literal &lt;newline&gt; cannot occur within an ERE; the escape sequence <tt>'\n'</tt> can be used to represent a
&lt;newline&gt;. A &lt;newline&gt; shall not be matched by a period operator.<br>
</p>
|
|
|
|
<center><b>Table: Escape Sequences in <i>lex</i></b></center>
|
|
|
|
<center>
|
|
<table border="1" cellpadding="3" align="center">
|
|
<tr valign="top">
<th align="center">
<p class="tent"><b>Escape Sequence</b></p>
</th>
<th align="center">
<p class="tent"><b>Description</b></p>
</th>
<th align="center">
<p class="tent"><b>Meaning</b></p>
</th>
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">\<i>digits</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">A backslash character followed by the longest sequence of one, two, or three octal-digit characters (01234567). If
|
|
all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined.</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">The character whose encoding is represented by the one, two, or three-digit octal integer. If the size of a byte on
|
|
the system is greater than nine bits, the valid escape sequence used to represent a byte is implementation-defined. Multi-byte
|
|
characters require multiple, concatenated escape sequences of this type, including the leading <tt>'\'</tt> for each byte.</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">\x<i>digits</i></p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">A backslash character followed by the longest sequence of hexadecimal-digit characters (0123456789abcdefABCDEF). If
all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined.</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">The character whose encoding is represented by the hexadecimal integer.</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">\c</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">A backslash character followed by any character not described in this table or in the table in the Base Definitions
volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap05.html">Chapter 5, File Format Notation</a> (<tt>'\\'</tt>,
<tt>'\a'</tt>, <tt>'\b'</tt>, <tt>'\f'</tt>, <tt>'\n'</tt>, <tt>'\r'</tt>, <tt>'\t'</tt>, <tt>'\v'</tt>).</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">The character <tt>'c'</tt> , unchanged.</p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
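<p>The "longest sequence" wording in the table can be illustrated with a small decoder. This is a sketch under stated assumptions,
not the algorithm used by any particular <i>lex</i>: the function name <tt>decode_numeric_escape</tt> is hypothetical, values are
returned as <b>unsigned long</b>, and the all-zero case (which the table leaves undefined) is simply decoded as 0.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Value of one hexadecimal digit; the caller guarantees isxdigit(c). */
static unsigned hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return (unsigned)(c - '0');
    return (unsigned)(tolower(c) - 'a') + 10u;
}

/*
 * Decode an escape starting at s[0] == '\\'.
 * Returns the number of characters consumed (including the backslash) and
 * stores the decoded value in *value; returns 0 if s does not start with
 * an octal or \x escape, or if a hexadecimal value does not fit.
 */
size_t decode_numeric_escape(const char *s, unsigned long *value)
{
    assert(s != NULL && value != NULL);

    if (s[0] != '\\')
        return 0;

    size_t i = 1;
    unsigned long v = 0;

    if (s[1] >= '0' && s[1] <= '7') {
        /* \digits: the longest run of one, two, or three octal digits. */
        while (i <= 3 && s[i] >= '0' && s[i] <= '7') {
            v = v * 8 + (unsigned long)(s[i] - '0');
            i++;
        }
    } else if (s[1] == 'x' && isxdigit((unsigned char)s[2])) {
        /* \xdigits: the longest run of hexadecimal digits, no length cap. */
        i = 2;
        while (isxdigit((unsigned char)s[i])) {
            if (v > (ULONG_MAX >> 4))
                return 0;   /* value would not fit in unsigned long */
            v = (v << 4) | hex_value((unsigned char)s[i]);
            i++;
        }
    } else {
        return 0;
    }

    *value = v;
    return i;
}
```

<p>Because an octal escape stops after three digits, the text <tt>\0111</tt> decodes as the value 9 followed by the character
<tt>'1'</tt>. A <tt>\x</tt> escape has no such limit, so it absorbs every hexadecimal digit that follows it.</p>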
|
|
|
|
<basefont size="2">
|
|
|
|
<dl>
|
|
<dt><b>Note:</b></dt>
|
|
|
|
<dd>If a <tt>'\x'</tt> sequence needs to be immediately followed by a hexadecimal digit character, a sequence such as
<tt>"\x1"</tt> <tt>"1"</tt> can be used, which represents a character containing the value 1, followed by the character
<tt>'1'</tt>.</dd>
|
|
</dl>
|
|
|
|
<basefont size="3">
|
|
|
|
<p>The order of precedence given to extended regular expressions for <i>lex</i> differs from that specified in the Base Definitions
|
|
volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap09.html#tag_09_04">Section 9.4, Extended Regular
|
|
Expressions</a>. The order of precedence for <i>lex</i> shall be as shown in the following table, from high to low. <basefont size=
|
|
"2"></p>
|
|
|
|
<dl>
|
|
<dt><b>Note:</b></dt>
|
|
|
|
<dd>The escaped characters entry is not meant to imply that these are operators, but they are included in the table to show their
|
|
relationships to the true operators. The start condition, trailing context, and anchoring notations have been omitted from the
|
|
table because of the placement restrictions described in this section; they can only appear at the beginning or ending of an
|
|
ERE.</dd>
|
|
</dl>
|
|
|
|
<basefont size="3"><br>
|
|
<center><b>Table: ERE Precedence in <i>lex</i></b></center>
|
|
|
|
<center>
|
|
<table border="1" cellpadding="3" align="center">
|
|
<tr valign="top">
|
|
<th align="center">
<p class="tent"><b>Extended Regular Expression Construct</b></p>
</th>
<th align="center">
<p class="tent"><b>Notation (Highest Precedence First)</b></p>
</th>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">collation-related bracket symbols</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">[= =] [: :] [. .]</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">escaped characters</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">\&lt;<i>special character</i>&gt;</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">bracket expression</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">[ ]</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">quoting</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">"..."</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">grouping</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">( )</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">definition</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">{<i>name</i>}</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">single-character RE duplication</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">* + ?</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">concatenation</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent"> </p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">interval expression</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">{m,n}</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">alternation</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">|</p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
|
|
|
|
<p>The ERE anchoring operators <tt>'^'</tt> and <tt>'$'</tt> do not appear in the table. With <i>lex</i> regular expressions, these
operators are restricted in their use: the <tt>'^'</tt> operator can only be used at the beginning of an entire regular expression,
and the <tt>'$'</tt> operator only at the end. The operators apply to the entire regular expression. Thus, for example, the pattern
<tt>"(^abc)|(def$)"</tt> is undefined; it can instead be written as two separate rules, one with the regular expression
<tt>"^abc"</tt> and one with <tt>"def$"</tt>, which share a common action via the special <tt>'|'</tt> action (see below). If the
pattern were written <tt>"^abc|def$"</tt>, it would match either <tt>"abc"</tt> or <tt>"def"</tt> on a line by itself.</p>
|
|
|
|
<p>Unlike the general ERE rules, embedded anchoring is not allowed by most historical <i>lex</i> implementations. An example of
embedded anchoring would be for patterns such as <tt>"(^| )foo( |$)"</tt> to match <tt>"foo"</tt> when it exists as a
complete word. This functionality can be obtained using existing <i>lex</i> features:</p>
|
|
|
|
<pre>
<tt>^foo/[ \n]      |
" foo"/[ \n]    /* Found foo as a separate word. */
</tt>
</pre>
|
|
|
|
<p>Note also that <tt>'$'</tt> is a form of trailing context (it is equivalent to <tt>"/\n"</tt>) and as such cannot be used with
regular expressions containing another instance of the operator (see the preceding discussion of trailing context).</p>
|
|
|
|
<p>The additional regular expressions trailing-context operator <tt>'/'</tt> can be used as an ordinary character if presented
within double-quotes, <tt>"/"</tt>; preceded by a backslash, <tt>"\/"</tt>; or within a bracket expression, <tt>"[/]"</tt>. The
start-condition <tt>'&lt;'</tt> and <tt>'&gt;'</tt> operators shall be special only in a start condition at the beginning of a
regular expression; elsewhere in the regular expression they shall be treated as ordinary characters.</p>
|
|
|
|
<h5><a name="tag_04_73_13_05"></a>Actions in lex</h5>
|
|
|
|
<p>The action to be taken when an ERE is matched can be a C program fragment or the special actions described below; the program
|
|
fragment can contain one or more C statements, and can also include special actions. The empty C statement <tt>';'</tt> shall be a
|
|
valid action; any string in the <b>lex.yy.c</b> input that matches the pattern portion of such a rule is effectively ignored or
|
|
skipped. However, the absence of an action shall not be valid, and the action <i>lex</i> takes in such a condition is
|
|
undefined.</p>
|
|
|
|
<p>The specification for an action, including C statements and special actions, can extend across several lines if enclosed in
|
|
braces:</p>
|
|
|
|
<pre>
<i>ERE</i> <tt>&lt;</tt><i>one or more blanks</i><tt>&gt; {</tt> <i>program statement
                           program statement</i> <tt>}
</tt>
</pre>
|
|
|
|
<p>The default action when a string in the input to a <b>lex.yy.c</b> program is not matched by any expression shall be to copy the
|
|
string to the output. Because the default behavior of a program generated by <i>lex</i> is to read the input and copy it to the
|
|
output, a minimal <i>lex</i> source program that has just <tt>"%%"</tt> shall generate a C program that simply copies the input to
|
|
the output unchanged.</p>
|
|
|
|
<p>Four special actions shall be available:</p>
|
|
|
|
<pre>
<tt>|   ECHO;   REJECT;   BEGIN
</tt>
</pre>
|
|
|
|
<dl compact>
|
|
<dt><tt>|</tt></dt>
|
|
|
|
<dd>The action <tt>'|'</tt> means that the action for the next rule is the action for this rule. Unlike the other three actions,
|
|
<tt>'|'</tt> cannot be enclosed in braces or be semicolon-terminated; the application shall ensure that it is specified alone, with
|
|
no other actions.</dd>
|
|
|
|
<dt><b>ECHO;</b></dt>
|
|
|
|
<dd>Write the contents of the string <i>yytext</i> on the output.</dd>
<dt><b>REJECT;</b></dt>
<dd>Usually only a single expression is matched by a given string in the input. <b>REJECT</b> means "continue to the next
expression that matches the current input", and shall cause whatever rule was the second choice after the current rule to be
executed for the same input. Thus, multiple rules can be matched and executed for one input string or overlapping input strings.
For example, given the regular expressions <tt>"xyz"</tt> and <tt>"xy"</tt> and the input <tt>"xyz"</tt> , usually only the regular
expression <tt>"xyz"</tt> would match. The next attempted match would start after <b>z.</b> If the last action in the
<tt>"xyz"</tt> rule is <b>REJECT</b>, both this rule and the <tt>"xy"</tt> rule would be executed. The <b>REJECT</b> action may be
implemented in such a fashion that flow of control does not continue after it, as if it were equivalent to a <b>goto</b> to another
part of <i>yylex</i>(). The use of <b>REJECT</b> may result in somewhat larger and slower scanners.</dd>
<dt><b>BEGIN</b></dt>
<dd>The action:
<pre>
<tt>BEGIN</tt> <i>newstate</i><tt>;
</tt>
</pre>
<p>switches the state (start condition) to <i>newstate</i>. If the string <i>newstate</i> has not been declared previously as a
start condition in the <i>Definitions</i> section, the results are unspecified. The initial state is indicated by the digit
<tt>'0'</tt> or the token <b>INITIAL</b>.</p>
</dd>
</dl>
<p>The functions or macros described below are accessible to user code included in the <i>lex</i> input. It is unspecified whether
they appear in the C code output of <i>lex</i>, or are accessible only through the <b>-l l</b> operand to <a href=
"../utilities/c99.html"><i>c99</i></a> (the <i>lex</i> library).</p>
<dl compact>
<dt><b>int </b> <i>yylex</i>(<b>void</b>)</dt>
<dd><br>
Performs lexical analysis on the input; this is the primary function generated by the <i>lex</i> utility. The function shall return
zero when the end of input is reached; otherwise, it shall return non-zero values (tokens) determined by the actions that are
selected.</dd>
<dt><b>int </b> <i>yymore</i>(<b>void</b>)</dt>
<dd><br>
When called, indicates that when the next input string is recognized, it is to be appended to the current value of <i>yytext</i>
rather than replacing it; the value in <i>yyleng</i> shall be adjusted accordingly.</dd>
<dt><b>int </b> <i>yyless</i>(<b>int </b> <i>n</i>)</dt>
<dd><br>
Retains <i>n</i> initial characters in <i>yytext</i>, NUL-terminated, and treats the remaining characters as if they had not been
read; the value in <i>yyleng</i> shall be adjusted accordingly.</dd>
<dt><b>int </b> <i>input</i>(<b>void</b>)</dt>
<dd><br>
Returns the next character from the input, or zero on end-of-file. It shall obtain input from the stream pointer <i>yyin</i>,
although possibly via an intermediate buffer. Thus, once scanning has begun, the effect of altering the value of <i>yyin</i> is
undefined. The character read shall be removed from the input stream of the scanner without any processing by the scanner.</dd>
<dt><b>int </b> <i>unput</i>(<b>int </b> <i>c</i>)</dt>
<dd><br>
Returns the character <tt>'c'</tt> to the input; <i>yytext</i> and <i>yyleng</i> are undefined until the next expression is
matched. The result of using <i>unput</i>() for more characters than have been input is unspecified.</dd>
</dl>
<p>The following functions shall appear only in the <i>lex</i> library accessible through the <b>-l l</b> operand; they can
therefore be redefined by a conforming application:</p>
<dl compact>
<dt><b>int </b> <i>yywrap</i>(<b>void</b>)</dt>
<dd><br>
Called by <i>yylex</i>() at end-of-file; the default <i>yywrap</i>() shall always return 1. If the application requires
<i>yylex</i>() to continue processing with another source of input, then the application can include a function <i>yywrap</i>(),
which associates another file with the external variable <b>FILE *</b> <i>yyin</i> and shall return a value of zero.</dd>
<dt><b>int </b> <i>main</i>(<b>int </b> <i>argc</i>, <b>char *</b><i>argv</i>[])</dt>
<dd><br>
Calls <i>yylex</i>() to perform lexical analysis, then exits. The user code can contain <i>main</i>() to perform
application-specific operations, calling <i>yylex</i>() as applicable.</dd>
</dl>
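<p>A replacement <i>yywrap</i>() along the lines described above might look like the following sketch. It is written so that it compiles on its own: the definition of <i>yyin</i> here is a stand-in (a real program would use the one supplied by <b>lex.yy.c</b>), and the <tt>next_file</tt> list is a hypothetical name introduced for illustration. Closing the previous stream and skipping files that cannot be opened are design choices of this sketch, not requirements.</p>

```c
#include <stdio.h>

/* Stand-in so the sketch compiles alone; with a real scanner, drop this
 * definition and rely on the yyin provided by lex.yy.c. */
FILE *yyin;

/* Hypothetical NULL-terminated list of input paths still to be scanned. */
const char **next_file;

/* Called by yylex() at end-of-file.
 * Returns 0 after pointing yyin at another file (scanning continues),
 * or 1 when no further input is available (scanning stops). */
int yywrap(void)
{
    while (next_file != NULL && *next_file != NULL) {
        FILE *fp = fopen(*next_file++, "r");
        if (fp != NULL) {
            if (yyin != NULL && yyin != stdin)
                fclose(yyin);      /* finished with the previous file */
            yyin = fp;
            return 0;
        }
        /* This sketch silently skips files that cannot be opened. */
    }
    return 1;
}
```

<p>An application-supplied <i>main</i>() would typically point <tt>next_file</tt> at the remaining arguments before the first call to <i>yylex</i>().</p>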
<p>Except for <i>input</i>(), <i>unput</i>(), and <i>main</i>(), all external and static names generated by <i>lex</i> shall begin
with the prefix <b>yy</b> or <b>YY</b>.</p>
</blockquote>
<h4><a name="tag_04_73_14"></a>EXIT STATUS</h4>
<blockquote>
<p>The following exit values shall be returned:</p>
<dl compact>
<dt> 0</dt>
<dd>Successful completion.</dd>
<dt>&gt;0</dt>
<dd>An error occurred.</dd>
</dl>
</blockquote>
<h4><a name="tag_04_73_15"></a>CONSEQUENCES OF ERRORS</h4>
<blockquote>
<p>Default.</p>
</blockquote>
<hr>
<div class="box"><em>The following sections are informative.</em></div>
<h4><a name="tag_04_73_16"></a>APPLICATION USAGE</h4>
<blockquote>
<p>Conforming applications are warned that in the <i>Rules</i> section, an ERE without an action is not acceptable, but need not be
detected as erroneous by <i>lex</i>. This may result in compilation or runtime errors.</p>
<p>The purpose of <i>input</i>() is to take characters off the input stream and discard them as far as the lexical analysis is
concerned. A common use is to discard the body of a comment once the beginning of a comment is recognized.</p>
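<p>The comment-discarding idiom can be sketched in plain C as follows. The names <tt>skip_block_comment</tt>, <tt>demo_input</tt>, and <tt>demo_src</tt> are assumptions made for illustration; <tt>demo_input</tt> only imitates the documented contract of <i>input</i>() (next character, or zero at end-of-file) so that the sketch can be compiled without <i>lex</i>. Inside a real action the characters would come from <i>input</i>() itself; where <i>input</i>() is provided as a macro, a small wrapper function would be needed to pass it as a pointer.</p>

```c
#include <stddef.h>

/* Hypothetical stand-in for the scanner's input, used only so this sketch
 * runs without lex. */
const char *demo_src = "";

/* Imitates input(): returns the next character, or zero at end-of-file. */
int demo_input(void)
{
    return *demo_src ? (unsigned char)*demo_src++ : 0;
}

/* Discards characters up to and including the closing star-slash pair.
 * Intended to be called after a rule has matched the comment opener.
 * Returns 1 if the comment was closed, 0 if end-of-file came first. */
int skip_block_comment(int (*next_char)(void))
{
    int c;
    int prev = 0;

    while ((c = next_char()) != 0) {
        if (prev == '*' && c == '/')
            return 1;
        prev = c;
    }
    return 0;   /* unterminated comment */
}
```

<p>The zero return value lets the caller report an unterminated comment rather than silently reaching end-of-file.</p>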
<p>The <i>lex</i> utility is not fully internationalized in its treatment of regular expressions in the <i>lex</i> source code or
generated lexical analyzer. It would seem desirable to have the lexical analyzer interpret the regular expressions given in the
<i>lex</i> source according to the environment specified when the lexical analyzer is executed, but this is not possible with the
current <i>lex</i> technology. Furthermore, the very nature of the lexical analyzers produced by <i>lex</i> must be closely tied to
the lexical requirements of the input language being described, which is frequently locale-specific anyway. (For example, writing
an analyzer that is used for French text is not automatically useful for processing other languages.)</p>
</blockquote>
<h4><a name="tag_04_73_17"></a>EXAMPLES</h4>
<blockquote>
<p>The following is an example of a <i>lex</i> program that implements a rudimentary scanner for a Pascal-like syntax:</p>
<pre>
<tt>%{
/* Need this for the calls to atoi() and atof() below. */
#include &lt;stdlib.h&gt;
/* Need this for printf(), fopen(), perror(), and stdin below. */
#include &lt;stdio.h&gt;
%}
<br>
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
<br>
%%
<br>
{DIGIT}+ {
    printf("An integer: %s (%d)\n", yytext,
        atoi(yytext));
    }
<br>
{DIGIT}+"."{DIGIT}*        {
    printf("A float: %s (%g)\n", yytext,
        atof(yytext));
    }
<br>
if|then|begin|end|procedure|function        {
    printf("A keyword: %s\n", yytext);
    }
<br>
{ID}    printf("An identifier: %s\n", yytext);
<br>
"+"|"-"|"*"|"/"        printf("An operator: %s\n", yytext);
<br>
"{"[^}\n]*"}"    ;    /* Eat up one-line comments. */
<br>
[ \t\n]+        ;    /* Eat up white space. */
<br>
.  printf("Unrecognized character: %s\n", yytext);
<br>
%%
<br>
int main(int argc, char *argv[])
{
    ++argv, --argc;  /* Skip over program name. */
    if (argc &gt; 0) {
        yyin = fopen(argv[0], "r");
        if (yyin == NULL) {  /* Do not scan from a null stream. */
            perror(argv[0]);
            return 1;
        }
    } else
        yyin = stdin;
<br>
    yylex();
    return 0;
}
</tt>
</pre>
</blockquote>
<h4><a name="tag_04_73_18"></a>RATIONALE</h4>
<blockquote>
<p>Even though the <b>-c</b> option and references to the C language are retained in this description, <i>lex</i> may be
generalized to other languages, as was done at one time for EFL, the Extended FORTRAN Language. Since the <i>lex</i> input
specification is essentially language-independent, versions of this utility could be written to produce Ada, Modula-2, or Pascal
code, and there are known historical implementations that do so.</p>
<p>The current description of <i>lex</i> bypasses the issue of dealing with internationalized EREs in the <i>lex</i> source code or
generated lexical analyzer. If it follows the model used by <a href="../utilities/awk.html"><i>awk</i></a> (the source code is
assumed to be presented in the POSIX locale, but input and output are in the locale specified by the environment variables), then
the tables in the lexical analyzer produced by <i>lex</i> would interpret EREs specified in the <i>lex</i> source in terms of the
environment variables specified when <i>lex</i> was executed. The desired effect would be to have the lexical analyzer interpret
the EREs given in the <i>lex</i> source according to the environment specified when the lexical analyzer is executed, but this is
not possible with the current <i>lex</i> technology.</p>
<p>The description of octal and hexadecimal-digit escape sequences agrees with the ISO C standard use of escape sequences. See
the RATIONALE for <a href="ed.html"><i>ed</i></a> for a discussion of bytes larger than 9 bits being represented by octal values.
Hexadecimal values can represent larger bytes and multi-byte characters directly, using as many digits as required.</p>
<p>There is no detailed output format specification. The observed behavior of <i>lex</i> under four different historical
implementations was that none of these implementations consistently reported the line numbers for error and warning messages.
Furthermore, there was a desire that <i>lex</i> be allowed to output additional diagnostic messages. Leaving message formats
unspecified avoids these formatting questions and problems with internationalization.</p>
<p>Although the <tt>%x</tt> specifier for <i>exclusive</i> start conditions is not historical practice, it is believed to be a
minor change to historical implementations and greatly enhances the usability of <i>lex</i> programs since it permits an
application to obtain the expected functionality with fewer statements.</p>
<p>The <b>%array</b> and <b>%pointer</b> declarations were added as a compromise between historical systems. The System V-based
<i>lex</i> copies the matched text to a <i>yytext</i> array. The <i>flex</i> program, supported in BSD and GNU systems, uses a
pointer. In the latter case, significant performance improvements are available for some scanners. Most historical programs should
require no change in porting from one system to another because the string being referenced is null-terminated in both cases. (The
method used by <i>flex</i> is to null-terminate the token in place by saving the character that immediately follows the token
and restoring it before continuing on to the next scan.) Multi-file programs with external references to
<i>yytext</i> outside the scanner source file should continue to operate on their historical systems, but would require one of the
new declarations to be considered strictly portable.</p>
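<p>The in-place termination technique mentioned above can be illustrated with a short C sketch. This is not code from <i>flex</i>; the <tt>token_view</tt> structure and the <tt>token_begin</tt> and <tt>token_end</tt> names are hypothetical, and the caller is assumed to guarantee that the buffer holds at least one byte beyond the token.</p>

```c
#include <stddef.h>

/* Hypothetical scanner state; the name and layout are assumptions. */
struct token_view {
    char  *text;   /* start of the current token inside the buffer */
    size_t len;    /* token length in bytes */
    char   saved;  /* character overwritten by the terminating NUL */
};

/* Exposes buf[start .. start+len) as a NUL-terminated string without
 * copying. buf must be writable and at least start+len+1 bytes long. */
void token_begin(struct token_view *tv, char *buf, size_t start, size_t len)
{
    tv->text  = buf + start;
    tv->len   = len;
    tv->saved = tv->text[len];   /* remember the character after the token */
    tv->text[len] = '\0';        /* terminate the token in place */
}

/* Puts the saved character back so that scanning can continue. */
void token_end(struct token_view *tv)
{
    tv->text[tv->len] = tv->saved;
}
```

<p>Because no copy is made, the token string is valid only between the two calls; code that needs the text later has to copy it, which is one reason external references to <i>yytext</i> depend on knowing whether it is an array or a pointer.</p>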
<p>The description of EREs avoids unnecessary duplication of ERE details because their meanings within a <i>lex</i> ERE are the
same as those of EREs described elsewhere in this volume of IEEE Std 1003.1-2001.</p>
<p>The reason for the undefined condition associated with text beginning with a &lt;blank&gt; or within <tt>"%{"</tt> and
<tt>"%}"</tt> delimiter lines appearing in the <i>Rules</i> section is historical practice. Both the BSD and System V <i>lex</i>
copy the indented (or enclosed) input in the <i>Rules</i> section (except at the beginning) to unreachable areas of the
<i>yylex</i>() function (the code is written directly after a C <b>break</b>
statement). In some cases, the System V <i>lex</i> generates an error message or a syntax error, depending on the form of indented
input.</p>
<p>The intention in breaking the list of functions into those that may appear in <b>lex.yy.c</b> <i>versus</i> those that only
appear in <b>libl.a</b> is that only those functions in <b>libl.a</b> can be reliably redefined by a conforming application.</p>
<p>The descriptions of standard output and standard error are somewhat complicated because historical <i>lex</i> implementations
chose to issue diagnostic messages to standard output (unless <b>-t</b> was given). IEEE Std 1003.1-2001 allows this
behavior, but leaves an opening for the more expected behavior of using standard error for diagnostics. Also, the System V behavior
of writing the statistics when any table sizes are given is allowed, while BSD-derived systems can avoid it. The programmer can
always precisely obtain the desired results by using either the <b>-t</b> or <b>-n</b> options.</p>
<p>The OPERANDS section does not mention the use of <b>-</b> as a synonym for standard input; not all historical implementations
support such usage for any of the <i>file</i> operands.</p>
<p>A description of the <i>translation table</i> was deleted from early proposals because of its relatively low usage in historical
applications.</p>
<p>The change to the definition of the <i>input</i>() function that allows buffering of input presents the opportunity for major
performance gains in some applications.</p>
<p>The following examples clarify the differences between <i>lex</i> regular expressions and regular expressions appearing
elsewhere in this volume of IEEE Std 1003.1-2001. For regular expressions of the form <tt>"r/x"</tt> , the string
matching <i>r</i> is always returned; confusion may arise when the beginning of <i>x</i> matches the trailing portion of <i>r</i>.
For example, given the regular expression <tt>"a*b/cc"</tt> and the input <tt>"aaabcc"</tt> , <i>yytext</i> would contain the
string <tt>"aaab"</tt> on this match. But given the regular expression <tt>"x*/xy"</tt> and the input <tt>"xxxy"</tt> , the token
<b>xxx</b>, not <b>xx</b>, is returned by some implementations because <b>xxx</b> matches <tt>"x*"</tt> .</p>
<p>In the rule <tt>"ab*/bc"</tt> , the <tt>"b*"</tt> at the end of <i>r</i> extends <i>r</i>'s match into the beginning of the
trailing context, so the result is unspecified. If this rule were <tt>"ab/bc"</tt> , however, the rule matches the text
<tt>"ab"</tt> when it is followed by the text <tt>"bc"</tt> . In this latter case, the matching of <i>r</i> cannot extend into the
beginning of <i>x</i>, so the result is specified.</p>
</blockquote>
<h4><a name="tag_04_73_19"></a>FUTURE DIRECTIONS</h4>
<blockquote>
<p>None.</p>
</blockquote>
<h4><a name="tag_04_73_20"></a>SEE ALSO</h4>
<blockquote>
<p><a href="c99.html"><i>c99</i></a> , <a href="ed.html"><i>ed</i></a> , <a href="yacc.html"><i>yacc</i></a></p>
</blockquote>
<h4><a name="tag_04_73_21"></a>CHANGE HISTORY</h4>
<blockquote>
<p>First released in Issue 2.</p>
</blockquote>
<h4><a name="tag_04_73_22"></a>Issue 6</h4>
<blockquote>
<p>This utility is marked as part of the C-Language Development Utilities option.</p>
<p>The obsolescent <b>-c</b> option is withdrawn in this issue.</p>
<p>The normative text is reworded to avoid use of the term "must" for application requirements.</p>
</blockquote>
<div class="box"><em>End of informative text.</em></div>
<hr>
<hr size="2" noshade>
<center><font size="2"><!--footer start-->
UNIX ® is a registered Trademark of The Open Group.<br>
POSIX ® is a registered Trademark of The IEEE.<br>
[ <a href="../mindex.html">Main Index</a> | <a href="../basedefs/contents.html">XBD</a> | <a href=
"../utilities/contents.html">XCU</a> | <a href="../functions/contents.html">XSH</a> | <a href="../xrat/contents.html">XRAT</a>
]</font></center>
<!--footer end-->
<hr size="2" noshade>
</body>
</html>