<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
<html>
|
|
<head>
|
|
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<link type="text/css" rel="stylesheet" href="style.css"><!-- Generated by The Open Group's rhtm tool v1.2.1 -->
|
|
<!-- Copyright (c) 2001 The Open Group, All Rights Reserved -->
|
|
<title>yacc</title>
|
|
</head>
|
|
<body bgcolor="white">
|
|
<script type="text/javascript" language="JavaScript" src="../jscript/codes.js">
|
|
</script>
|
|
|
|
<basefont size="3"> <a name="yacc"></a> <a name="tag_04_174"></a><!-- yacc -->
|
|
<!--header start-->
|
|
<center><font size="2">The Open Group Base Specifications Issue 6<br>
|
|
IEEE Std 1003.1-2001<br>
|
|
Copyright © 2001 The IEEE and The Open Group, All Rights reserved.</font></center>
|
|
|
|
<!--header end-->
|
|
<hr size="2" noshade>
|
|
<h4><a name="tag_04_174_01"></a>NAME</h4>
|
|
|
|
<blockquote>yacc - yet another compiler compiler (<b>DEVELOPMENT</b>)</blockquote>
|
|
|
|
<h4><a name="tag_04_174_02"></a>SYNOPSIS</h4>
|
|
|
|
<blockquote class="synopsis">
|
|
<div class="box"><code><tt><sup>[<a href="javascript:open_code('CD')">CD</a>]</sup> <img src="../images/opt-start.gif" alt=
|
|
"[Option Start]" border="0"> yacc</tt> <b>[</b><tt>-dltv</tt><b>][</b><tt>-b</tt> <i>file_prefix</i><b>][</b><tt>-p</tt>
|
|
<i>sym_prefix</i><b>]</b> <i>grammar</i><tt><img src="../images/opt-end.gif" alt="[Option End]" border="0"></tt></code></div>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_03"></a>DESCRIPTION</h4>
|
|
|
|
<blockquote>
|
|
<p>The <i>yacc</i> utility shall read a description of a context-free grammar in <i>grammar</i> and write C source code, conforming
|
|
to the ISO C standard, to a code file, and optionally header information into a header file, in the current directory. The C
|
|
code shall define a function and related routines and macros for an automaton that executes a parsing algorithm meeting the
|
|
requirements in <a href="#tag_04_174_13_13">Algorithms</a> .</p>
|
|
|
|
<p>The form and meaning of the grammar are described in the EXTENDED DESCRIPTION section.</p>
|
|
|
|
<p>The C source code and header file shall be produced in a form suitable as input for the C compiler (see <a href=
|
|
"c99.html"><i>c99</i></a> ).</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_04"></a>OPTIONS</h4>
|
|
|
|
<blockquote>
|
|
<p>The <i>yacc</i> utility shall conform to the Base Definitions volume of IEEE Std 1003.1-2001, <a href=
|
|
"../basedefs/xbd_chap12.html#tag_12_02">Section 12.2, Utility Syntax Guidelines</a>.</p>
|
|
|
|
<p>The following options shall be supported:</p>
|
|
|
|
<dl compact>
|
|
<dt><b>-b </b> <i>file_prefix</i></dt>
|
|
|
|
<dd>Use <i>file_prefix</i> instead of <b>y</b> as the prefix for all output filenames. The code file <b>y.tab.c</b>, the header
|
|
file <b>y.tab.h</b> (created when <b>-d</b> is specified), and the description file <b>y.output</b> (created when <b>-v</b> is
|
|
specified), shall be changed to <i>file_prefix</i> <b>.tab.c</b>, <i>file_prefix</i> <b>.tab.h</b>, and <i>file_prefix</i>
|
|
<b>.output</b>, respectively.</dd>
|
|
|
|
<dt><b>-d</b></dt>
|
|
|
|
<dd>Write the header file; by default only the code file is written. The <b>#define</b> statements associate the token codes
|
|
assigned by <i>yacc</i> with the user-declared token names. This allows source files other than <b>y.tab.c</b> to access the token
|
|
codes.</dd>
|
|
|
|
<dt><b>-l</b></dt>
|
|
|
|
<dd>Produce a code file that does not contain any <b>#line</b> constructs. If this option is not present, it is unspecified whether
|
|
the code file or header file contains <b>#line</b> directives. This should only be used after the grammar and the associated
|
|
actions are fully debugged.</dd>
|
|
|
|
<dt><b>-p </b> <i>sym_prefix</i></dt>
|
|
|
|
<dd><br>
|
|
Use <i>sym_prefix</i> instead of <b>yy</b> as the prefix for all external names produced by <i>yacc</i>. The names affected shall
|
|
include the functions <i>yyparse</i>(), <i>yylex</i>(), and <i>yyerror</i>(), and the variables <i>yylval</i>, <i>yychar</i>, and
|
|
<i>yydebug</i>. (In the remainder of this section, the six symbols cited are referenced using their default names only as a
|
|
notational convenience.) Local names may also be affected by the <b>-p</b> option; however, the <b>-p</b> option shall not affect
|
|
<b>#define</b> symbols generated by <i>yacc</i>.</dd>
|
|
|
|
<dt><b>-t</b></dt>
|
|
|
|
<dd>Modify conditional compilation directives to permit compilation of debugging code in the code file. Runtime debugging
|
|
statements shall always be contained in the code file, but by default conditional compilation directives prevent their
|
|
compilation.</dd>
|
|
|
|
<dt><b>-v</b></dt>
|
|
|
|
<dd>Write a file containing a description of the parser and a report of conflicts generated by ambiguities in the grammar.</dd>
|
|
</dl>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_05"></a>OPERANDS</h4>
|
|
|
|
<blockquote>
|
|
<p>The following operand is required:</p>
|
|
|
|
<dl compact>
|
|
<dt><i>grammar</i></dt>
|
|
|
|
<dd>A pathname of a file containing instructions, hereafter called <i>grammar</i>, for which a parser is to be created. The format
|
|
for the grammar is described in the EXTENDED DESCRIPTION section.</dd>
|
|
</dl>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_06"></a>STDIN</h4>
|
|
|
|
<blockquote>
|
|
<p>Not used.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_07"></a>INPUT FILES</h4>
|
|
|
|
<blockquote>
|
|
<p>The file <i>grammar</i> shall be a text file formatted as specified in the EXTENDED DESCRIPTION section.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_08"></a>ENVIRONMENT VARIABLES</h4>
|
|
|
|
<blockquote>
|
|
<p>The following environment variables shall affect the execution of <i>yacc</i>:</p>
|
|
|
|
<dl compact>
|
|
<dt><i>LANG</i></dt>
|
|
|
|
<dd>Provide a default value for the internationalization variables that are unset or null. (See the Base Definitions volume of
|
|
IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap08.html#tag_08_02">Section 8.2, Internationalization Variables</a> for
|
|
the precedence of internationalization variables used to determine the values of locale categories.)</dd>
|
|
|
|
<dt><i>LC_ALL</i></dt>
|
|
|
|
<dd>If set to a non-empty string value, override the values of all the other internationalization variables.</dd>
|
|
|
|
<dt><i>LC_CTYPE</i></dt>
|
|
|
|
<dd>Determine the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as
|
|
opposed to multi-byte characters in arguments and input files).</dd>
|
|
|
|
<dt><i>LC_MESSAGES</i></dt>
|
|
|
|
<dd>Determine the locale that should be used to affect the format and contents of diagnostic messages written to standard
|
|
error.</dd>
|
|
|
|
<dt><i>NLSPATH</i></dt>
|
|
|
|
<dd><sup>[<a href="javascript:open_code('XSI')">XSI</a>]</sup> <img src="../images/opt-start.gif" alt="[Option Start]" border="0">
Determine the location of message catalogs for the processing of <i>LC_MESSAGES</i>. <img src="../images/opt-end.gif" alt="[Option End]" border="0"></dd>
|
|
</dl>
|
|
|
|
<p>The <i>LANG</i> and <i>LC_*</i> variables affect the execution of the <i>yacc</i> utility as stated. The <i>main</i>() function
|
|
defined in <a href="#tag_04_174_13_11">Yacc Library</a> shall call:</p>
|
|
|
|
<pre>
<tt>setlocale(LC_ALL, "")
</tt>
</pre>
|
|
|
|
<p>and thus the program generated by <i>yacc</i> shall also be affected by the contents of these variables at runtime.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_09"></a>ASYNCHRONOUS EVENTS</h4>
|
|
|
|
<blockquote>
|
|
<p>Default.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_10"></a>STDOUT</h4>
|
|
|
|
<blockquote>
|
|
<p>Not used.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_11"></a>STDERR</h4>
|
|
|
|
<blockquote>
|
|
<p>If shift/reduce or reduce/reduce conflicts are detected in <i>grammar</i>, <i>yacc</i> shall write a report of those conflicts
|
|
to the standard error in an unspecified format.</p>
|
|
|
|
<p>Standard error shall also be used for diagnostic messages.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_12"></a>OUTPUT FILES</h4>
|
|
|
|
<blockquote>
|
|
<p>The code file, the header file, and the description file shall be text files. All are described in the following sections.</p>
|
|
|
|
<h5><a name="tag_04_174_12_01"></a>Code File</h5>
|
|
|
|
<p>This file shall contain the C source code for the <i>yyparse</i>() function. It shall contain code for the various semantic
|
|
actions with macro substitution performed on them as described in the EXTENDED DESCRIPTION section. It also shall contain a copy of
|
|
the <b>#define</b> statements in the header file. If a <b>%union</b> declaration is used, the declaration for YYSTYPE shall also be
|
|
included in this file.</p>
|
|
|
|
<h5><a name="tag_04_174_12_02"></a>Header File</h5>
|
|
|
|
<p>The header file shall contain <b>#define</b> statements that associate the token numbers with the token names. This allows
|
|
source files other than the code file to access the token codes. If a <b>%union</b> declaration is used, the declaration for
|
|
YYSTYPE and an <i>extern YYSTYPE yylval</i> declaration shall also be included in this file.</p>
|
|
|
|
<h5><a name="tag_04_174_12_03"></a>Description File</h5>
|
|
|
|
<p>The description file shall be a text file containing a description of the state machine corresponding to the parser, using an
|
|
unspecified format. Limits for internal tables (see <a href="#tag_04_174_13_14">Limits</a> ) shall also be reported, in an
|
|
implementation-defined manner. (Some implementations may use dynamic allocation techniques and have no specific limit values to
|
|
report.)</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_13"></a>EXTENDED DESCRIPTION</h4>
|
|
|
|
<blockquote>
|
|
<p>The <i>yacc</i> command accepts a language that is used to define a grammar for a target language to be parsed by the tables and
|
|
code generated by <i>yacc</i>. The language accepted by <i>yacc</i> as a grammar for the target language is described below using
|
|
the <i>yacc</i> input language itself.</p>
|
|
|
|
<p>The input <i>grammar</i> includes rules describing the input structure of the target language and code to be invoked when these
|
|
rules are recognized to provide the associated semantic action. The code to be executed shall appear as bodies of text that are
|
|
intended to be C-language code. The C-language inclusions are presumed to form a correct function when processed by <i>yacc</i>
|
|
into its output files. The code included in this way shall be executed during the recognition of the target language.</p>
|
|
|
|
<p>Given a grammar, the <i>yacc</i> utility generates the files described in the OUTPUT FILES section. The code file can be
|
|
compiled and linked using <a href="../utilities/c99.html"><i>c99</i></a>. If the declaration and programs sections of the grammar
|
|
file did not include definitions of <i>main</i>(), <i>yylex</i>(), and <i>yyerror</i>(), the compiled output requires linking with
|
|
externally supplied versions of those functions. Default versions of <i>main</i>() and <i>yyerror</i>() are supplied in the
|
|
<i>yacc</i> library and can be linked in by using the <b>-l y</b> operand to <a href="../utilities/c99.html"><i>c99</i></a>.
|
|
The <i>yacc</i> library interfaces need not support interfaces with other than the default <b>yy</b> symbol prefix. The application
|
|
provides the lexical analyzer function, <i>yylex</i>(); the <a href="../utilities/lex.html"><i>lex</i></a> utility is specifically
|
|
designed to generate such a routine.</p>
|
|
|
|
<h5><a name="tag_04_174_13_01"></a>Input Language</h5>
|
|
|
|
<p>The application shall ensure that every specification file consists of three sections in order: <i>declarations</i>, <i>grammar
|
|
rules</i>, and <i>programs</i>, separated by double percent signs ( <tt>"%%"</tt> ). The declarations and programs sections can be
|
|
empty. If the latter is empty, the preceding <tt>"%%"</tt> mark separating it from the rules section can be omitted.</p>
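<p>As an illustration only, a minimal specification containing all three sections might look like the following sketch. The token
name <tt>WORD</tt> and the non-terminal <tt>list</tt> are hypothetical and are not defined by this volume of IEEE Std 1003.1-2001;
a real application would also have to supply <i>yylex</i>() and <i>yyerror</i>().</p>

<pre>
<tt>%token WORD             /* declarations section */
%%
list : /* empty */      /* grammar rules section */
     | list WORD
     ;
%%
/* programs section: C code such as yylex() would go here */
</tt>
</pre>

<p>Because the programs section in this sketch holds only a comment, the second <tt>"%%"</tt> could be omitted as well.</p>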
|
|
|
|
<p>The input is free form text following the structure of the grammar defined below.</p>
|
|
|
|
<h5><a name="tag_04_174_13_02"></a>Lexical Structure of the Grammar</h5>
|
|
|
|
<p>The &lt;blank&gt;s, &lt;newline&gt;s, and &lt;form-feed&gt;s shall be ignored, except that the application shall ensure that
they do not appear in names or multi-character reserved symbols. Comments shall be enclosed in <tt>"/* ... */"</tt> , and
can appear wherever a name is valid.</p>
|
|
|
|
<p>Names are of arbitrary length, made up of letters, periods ( <tt>'.'</tt> ), underscores ( <tt>'_'</tt> ), and non-initial
|
|
digits. Uppercase and lowercase letters are distinct. Conforming applications shall not use names beginning in <b>yy</b> or
|
|
<b>YY</b> since the <i>yacc</i> parser uses such names. Many of the names appear in the final output of <i>yacc</i>, and thus they
|
|
should be chosen to conform with any additional rules created by the C compiler to be used. In particular they appear in
|
|
<b>#define</b> statements.</p>
|
|
|
|
<p>A literal shall consist of a single character enclosed in single-quotes ( <tt>'\''</tt> ). All of the escape sequences supported
for character constants by the ISO C standard shall be supported by <i>yacc</i>.</p>
|
|
|
|
<p>The relationship with the lexical analyzer is discussed in detail below.</p>
|
|
|
|
<p>The application shall ensure that the NUL character is not used in grammar rules or literals.</p>
|
|
|
|
<h5><a name="tag_04_174_13_03"></a>Declarations Section</h5>
|
|
|
|
<p>The declarations section is used to define the symbols used to define the target language and their relationship with each
|
|
other. In particular, much of the additional information required to resolve ambiguities in the context-free grammar for the target
|
|
language is provided here.</p>
|
|
|
|
<p>Usually <i>yacc</i> assigns the relationship between the symbolic names it generates and their underlying numeric value. The
|
|
declarations section makes it possible to control the assignment of these values.</p>
|
|
|
|
<p>It is also possible to keep semantic information associated with the tokens currently on the parse stack in a user-defined
|
|
C-language <b>union</b>, if the members of the union are associated with the various names in the grammar. The declarations section
|
|
provides for this as well.</p>
|
|
|
|
<p>The first group of declarators below all take a list of names as arguments. That list can optionally be preceded by the name of
a C union member (called a <i>tag</i> below) appearing within <tt>'&lt;'</tt> and <tt>'&gt;'</tt> . (As an exception to the
typographical conventions of the rest of this volume of IEEE Std 1003.1-2001, in this case &lt;<i>tag</i>&gt; does not
represent a metavariable, but the literal angle bracket characters surrounding a symbol.) The use of <i>tag</i> specifies that the
tokens named on this line shall be of the same C type as the union member referenced by <i>tag</i>. This is discussed in more
detail below.</p>
|
|
|
|
<p>For lists used to define tokens, the first appearance of a given token can be followed by a positive integer (as a string of
|
|
decimal digits). If this is done, the underlying value assigned to it for lexical purposes shall be taken to be that number.</p>
|
|
|
|
<p>The following declares <i>name</i> to be a token:</p>
|
|
|
|
<pre>
<tt>%token</tt> <b>[</b><tt>&lt;</tt><i>tag</i><tt>&gt;</tt><b>]</b> <i>name</i> <b>[</b><i>number</i><b>][</b><i>name</i> <b>[</b><i>number</i><b>]]</b><tt>...
</tt>
</pre>
|
|
|
|
<p>If <i>tag</i> is present, the C type for all tokens on this line shall be declared to be the type referenced by <i>tag</i>. If a
|
|
positive integer, <i>number</i>, follows a <i>name</i>, that value shall be assigned to the token.</p>
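<p>For example, the following sketch declares three tokens. The names <tt>IDENT</tt>, <tt>STRING</tt>, and <tt>INTEGER</tt>, the
union member <tt>ival</tt>, and the number 300 are hypothetical; the fragment assumes that a <b>%union</b> with an <tt>ival</tt>
member is declared elsewhere in the declarations section.</p>

<pre>
<tt>%token IDENT STRING            /* token numbers are assigned by yacc */
%token &lt;ival&gt; INTEGER 300      /* value has the C type of union member ival;
                                  the token number is 300 */
</tt>
</pre>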
|
|
|
|
<p>The following declares <i>name</i> to be a token, and assigns precedence to it:</p>
|
|
|
|
<pre>
<tt>%left</tt> <b>[</b><tt>&lt;</tt><i>tag</i><tt>&gt;</tt><b>]</b> <i>name</i> <b>[</b><i>number</i><b>][</b><i>name</i> <b>[</b><i>number</i><b>]]</b><tt>...
%right</tt> <b>[</b><tt>&lt;</tt><i>tag</i><tt>&gt;</tt><b>]</b> <i>name</i> <b>[</b><i>number</i><b>][</b><i>name</i> <b>[</b><i>number</i><b>]]</b><tt>...
</tt>
</pre>
|
|
|
|
<p>One or more lines, each beginning with one of these symbols, can appear in this section. All tokens on the same line have the
|
|
same precedence level and associativity; the lines are in order of increasing precedence or binding strength. <b>%left</b> denotes
|
|
that the operators on that line are left associative, and <b>%right</b> similarly denotes right associative operators. If
|
|
<i>tag</i> is present, it shall declare a C type for <i>name</i>s as described for <b>%token</b>.</p>
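<p>As a sketch of how the ordering of these lines works, the following hypothetical declarations for an arithmetic grammar give
<tt>'*'</tt> and <tt>'/'</tt> a higher precedence than <tt>'+'</tt> and <tt>'-'</tt>, and give <tt>'^'</tt> the highest precedence
of all. The choice of operators is an assumption made for illustration.</p>

<pre>
<tt>%left  '+' '-'      /* lowest precedence, left associative */
%left  '*' '/'      /* higher precedence, left associative */
%right '^'          /* highest precedence, right associative */
</tt>
</pre>

<p>With these declarations, an ambiguous rule set for expressions would group <tt>a - b - c</tt> as <tt>(a - b) - c</tt> and
<tt>a ^ b ^ c</tt> as <tt>a ^ (b ^ c)</tt>.</p>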
|
|
|
|
<p>The following declares <i>name</i> to be a token, and indicates that this cannot be used associatively:</p>
|
|
|
|
<pre>
<tt>%nonassoc</tt> <b>[</b><tt>&lt;</tt><i>tag</i><tt>&gt;</tt><b>]</b> <i>name</i> <b>[</b><i>number</i><b>][</b><i>name</i> <b>[</b><i>number</i><b>]]</b><tt>...
</tt>
</pre>
|
|
|
|
<p>If the parser encounters associative use of this token it reports an error. If <i>tag</i> is present, it shall declare a C type
|
|
for <i>name</i>s as described for <b>%token</b>.</p>
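<p>A minimal sketch, assuming a hypothetical grammar in which comparison operators must not be chained:</p>

<pre>
<tt>%nonassoc '&lt;' '&gt;'   /* comparisons cannot be used associatively */
%left     '+' '-'   /* binds more tightly than the comparisons */
</tt>
</pre>

<p>Given a rule such as <tt>expr : expr '&lt;' expr</tt>, an input of the form <tt>a &lt; b &lt; c</tt> would then be reported as
an error, while <tt>a + b &lt; c</tt> would still be accepted.</p>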
|
|
|
|
<p>The following declares the <i>name</i>s to be non-terminals whose values have the type of the union member <i>tag</i>; the
<i>tag</i> field is therefore required at the beginning of the list:</p>
|
|
|
|
<pre>
<tt>%type &lt;</tt><i>tag</i><tt>&gt;</tt> <i>name</i><tt>...
</tt>
</pre>
|
|
|
|
<p>Because it deals with non-terminals only, assigning a token number or using a literal is also prohibited. If this construct is
|
|
present, <i>yacc</i> shall perform type checking; if this construct is not present, the parse stack shall hold only the <b>int</b>
|
|
type.</p>
|
|
|
|
<p>Every name used in <i>grammar</i> not defined by a <b>%token</b>, <b>%left</b>, <b>%right</b>, or <b>%nonassoc</b> declaration
|
|
is assumed to represent a non-terminal symbol. The <i>yacc</i> utility shall report an error for any non-terminal symbol that does
|
|
not appear on the left side of at least one grammar rule.</p>
|
|
|
|
<p>Once the type, precedence, or token number of a name is specified, it shall not be changed. If the first declaration of a token
|
|
does not assign a token number, <i>yacc</i> shall assign a token number. Once this assignment is made, the token number shall not
|
|
be changed by explicit assignment.</p>
|
|
|
|
<p>The following declarators do not follow the previous pattern.</p>
|
|
|
|
<p>The following declares the non-terminal <i>name</i> to be the <i>start symbol</i>, which represents the largest, most general
|
|
structure described by the grammar rules:</p>
|
|
|
|
<pre>
<tt>%start</tt> <i>name</i>
</pre>
|
|
|
|
<p>By default, it is the left-hand side of the first grammar rule; this default can be overridden with this declaration.</p>
|
|
|
|
<p>The following declares the <i>yacc</i> value stack to be a union of the various types of values desired:</p>
|
|
|
|
<pre>
<tt>%union {</tt> <i>body of union</i> <tt>(</tt><i>in C</i><tt>) }
</tt>
</pre>
|
|
|
|
<p>By default, the values returned by actions (see below) and the lexical analyzer shall be of type <b>int</b>. The <i>yacc</i>
|
|
utility keeps track of types, and it shall insert corresponding union member names in order to perform strict type checking of the
|
|
resulting parser.</p>
|
|
|
|
<p>Alternatively, given that at least one &lt;<i>tag</i>&gt; construct is used, the union can be declared in a header file (which
shall be included in the declarations section by using a <b>#include</b> construct within <b>%{</b> and <b>%}</b>), and a
<b>typedef</b> used to define the symbol YYSTYPE to represent this union. The effect of <b>%union</b> is to provide the declaration
of YYSTYPE directly from the <i>yacc</i> input.</p>
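<p>The following sketch shows one way a <b>%union</b> declaration and the &lt;<i>tag</i>&gt; construct might be combined. The
member names <tt>ival</tt> and <tt>sval</tt>, the tokens <tt>INTEGER</tt> and <tt>STRING</tt>, and the non-terminal <tt>expr</tt>
are hypothetical.</p>

<pre>
<tt>%union {
    int   ival;     /* value carried by integers and expressions */
    char *sval;     /* value carried by string tokens */
}

%token &lt;ival&gt; INTEGER
%token &lt;sval&gt; STRING
%type  &lt;ival&gt; expr
</tt>
</pre>

<p>In the actions of such a grammar, a reference to the value of <tt>expr</tt> or <tt>INTEGER</tt> would select the <tt>ival</tt>
member, and a reference to the value of <tt>STRING</tt> would select <tt>sval</tt>.</p>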
|
|
|
|
<p>C-language declarations and definitions can appear in the declarations section, enclosed by the following marks:</p>
|
|
|
|
<pre>
<tt>%{ ... %}
</tt>
</pre>
|
|
|
|
<p>These statements shall be copied into the code file, and have global scope within it so that they can be used in the rules and
|
|
program sections.</p>
|
|
|
|
<p>The application shall ensure that the declarations section is terminated by the token <b>%%</b>.</p>
|
|
|
|
<h5><a name="tag_04_174_13_04"></a>Grammar Rules in yacc</h5>
|
|
|
|
<p>The rules section defines the context-free grammar to be accepted by the function that <i>yacc</i> generates, and associates with
those rules C-language actions and additional precedence information. The grammar is described below, and a formal definition
follows.</p>

<p>The rules section consists of one or more grammar rules. A grammar rule has the form:</p>
|
|
|
|
<pre>
<tt>A : BODY ;
</tt>
</pre>
|
|
|
|
<p>The symbol <b>A</b> represents a non-terminal name, and <b>BODY</b> represents a sequence of zero or more <i>name</i>s,
|
|
<i>literal</i>s, and <i>semantic action</i>s that can then be followed by optional <i>precedence rule</i>s. Only the names and
|
|
literals participate in the formation of the grammar; the semantic actions and precedence rules are used in other ways. The colon
|
|
and the semicolon are <i>yacc</i> punctuation. If there are several successive grammar rules with the same left-hand side, the
|
|
vertical bar <tt>'|'</tt> can be used to avoid rewriting the left-hand side; in this case the semicolon appears only after the last
|
|
rule. The BODY part can be empty (or empty of names and literals) to indicate that the non-terminal symbol matches the empty
|
|
string.</p>
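<p>The vertical bar notation and the empty BODY can be illustrated with the following sketch. The names <tt>value</tt>,
<tt>opt_value</tt>, <tt>NUMBER</tt>, and <tt>STRING</tt> are hypothetical. The first two groups are alternative spellings of the
same three rules; a real grammar would contain only one of them.</p>

<pre>
<tt>/* Three separate rules with the same left-hand side ... */
value : NUMBER ;
value : STRING ;
value : '(' value ')' ;

/* ... or the same three rules written with the vertical bar. */
value : NUMBER
      | STRING
      | '(' value ')'
      ;

/* An empty BODY: opt_value also matches the empty string. */
opt_value : /* empty */
          | value
          ;
</tt>
</pre>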
|
|
|
|
<p>The <i>yacc</i> utility assigns a unique number to each rule. Rules using the vertical bar notation are distinct rules. The
|
|
number assigned to the rule appears in the description file.</p>
|
|
|
|
<p>The elements comprising a BODY are:</p>
|
|
|
|
<dl compact>
|
|
<dt><i>name</i>, <i>literal</i></dt>
|
|
|
|
<dd>These form the rules of the grammar: <i>name</i> is either a <i>token</i> or a <i>non-terminal</i>; <i>literal</i> stands for
|
|
itself (less the lexically required quotation marks).</dd>
|
|
|
|
<dt><i>semantic action</i></dt>
|
|
|
|
<dd><br>
|
|
With each grammar rule, the user can associate actions to be performed each time the rule is recognized in the input process. (Note
that the word "action" can also refer to the actions of the parser: shift, reduce, and so on.)
|
|
|
|
<p>These actions can return values and can obtain the values returned by previous actions. These values are kept in objects of type
YYSTYPE (see <b>%union</b>). The result value of the action shall be kept on the parse stack with the left-hand side of the rule,
to be accessed by other reductions as part of their right-hand side. By using the &lt;<i>tag</i>&gt; information provided in the
declarations section, the code generated by <i>yacc</i> can be strictly type checked and contain arbitrary information. In
addition, the lexical analyzer can provide the same kinds of values for tokens, if desired.</p>
|
|
|
|
<p>An action is one or more C statements enclosed in curly braces <tt>'{'</tt> and <tt>'}'</tt> . As such, it can do input or
output, call subprograms, and alter external variables.</p>
|
|
|
|
<p>Certain pseudo-variables can be used in the action. These are macros for access to data structures known internally to
|
|
<i>yacc</i>.</p>
|
|
|
|
<dl compact>
|
|
<dt>$$</dt>
|
|
|
|
<dd>The value of the action can be set by assigning it to $$. If type checking is enabled and the type of the value to be assigned
|
|
cannot be determined, a diagnostic message may be generated.</dd>
|
|
|
|
<dt>$<i>number</i></dt>
|
|
|
|
<dd>This refers to the value returned by the component specified by the token <i>number</i> in the right side of a rule, reading
|
|
from left to right; <i>number</i> can be zero or negative. If <i>number</i> is zero or negative, it refers to the data associated
|
|
with the name on the parser's stack preceding the leftmost symbol of the current rule. (That is, <tt>"$0"</tt> refers to the name
|
|
immediately preceding the leftmost name in the current rule to be found on the parser's stack and <tt>"$-1"</tt> refers to the
|
|
symbol to <i>its</i> left.) If <i>number</i> refers to an element past the current point in the rule, or beyond the bottom of the
|
|
stack, the result is undefined. If type checking is enabled and the type of the value to be assigned cannot be determined, a
|
|
diagnostic message may be generated.</dd>
|
|
|
|
<dt>$&lt;<i>tag</i>&gt;<i>number</i></dt>
|
|
|
|
<dd><br>
|
|
These correspond exactly to the corresponding symbols without the <i>tag</i> inclusion, but allow for strict type checking (and
|
|
preclude unwanted type conversions). The effect is that the macro is expanded to use <i>tag</i> to select an element from the
|
|
YYSTYPE union (using <i>dataname.tag</i>). This is particularly useful if <i>number</i> is not positive.</dd>
|
|
|
|
<dt>$&lt;<i>tag</i>&gt;$</dt>
|
|
|
|
<dd>This imposes on the reference the type of the union member referenced by <i>tag</i>. This construction is applicable when a
|
|
reference to a left context value occurs in the grammar, and provides <i>yacc</i> with a means for selecting a type.</dd>
|
|
</dl>
|
|
|
|
<p>Actions can occur anywhere in a rule (not just at the end); an action can access values returned by actions to its left, and in
|
|
turn the value it returns can be accessed by actions to its right. An action appearing in the middle of a rule shall be equivalent
|
|
to replacing the action with a new non-terminal symbol and adding an empty rule with that non-terminal symbol on the left-hand
|
|
side. The semantic action associated with the new rule shall be equivalent to the original action. The use of actions within rules
|
|
might introduce conflicts that would not otherwise exist.</p>
|
|
|
|
<p>By default, the value of a rule shall be the value of the first element in it. If the first element does not have a type
|
|
(particularly in the case of a literal) and type checking is turned on by <b>%type</b>, an error message shall result.</p>
|
|
</dd>
|
|
|
|
<dt><i>precedence</i></dt>
|
|
|
|
<dd>The keyword <b>%prec</b> can be used to change the precedence level associated with a particular grammar rule. Examples of this
|
|
are in cases where a unary and binary operator have the same symbolic representation, but need to be given different precedences,
|
|
or where the handling of an ambiguous if-else construction is necessary. The reserved symbol <b>%prec</b> can appear immediately
|
|
after the body of the grammar rule and can be followed by a token name or a literal. It shall cause the precedence of the grammar
|
|
rule to become that of the following token name or literal. The action for the rule as a whole can follow <b>%prec</b>.</dd>
|
|
</dl>
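<p>The elements described above can be seen together in the following sketch of an integer expression grammar. It is an
illustration under stated assumptions, not a required form: the token <tt>NUMBER</tt> is assumed to be returned by an
application-supplied <i>yylex</i>() with its value stored in <i>yylval</i>, and <tt>UMINUS</tt> is a hypothetical token name that is
never returned by the lexical analyzer and exists only to carry a precedence level.</p>

<pre>
<tt>%token NUMBER
%left  '+' '-'
%left  '*'
%left  UMINUS           /* highest precedence; used only with %prec */
%%
expr : expr '+' expr            { $$ = $1 + $3; }
     | expr '-' expr            { $$ = $1 - $3; }
     | expr '*' expr            { $$ = $1 * $3; }
     | '-' expr %prec UMINUS    { $$ = -$2; }
     | '(' expr ')'             { $$ = $2; }
     | NUMBER                   /* no action: the value is that of NUMBER */
     ;
</tt>
</pre>

<p>Here <tt>$1</tt>, <tt>$2</tt>, and <tt>$3</tt> refer to the values of the components of each rule counted from the left, and
<tt>$$</tt> receives the value of the rule. The <b>%prec</b> keyword gives the unary minus rule the precedence of
<tt>UMINUS</tt> rather than that of the binary <tt>'-'</tt>, so that <tt>-a * b</tt> groups as <tt>(-a) * b</tt>.</p>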
|
|
|
|
<p>If a program section follows, the application shall ensure that the grammar rules are terminated by <b>%%</b>.</p>
|
|
|
|
<h5><a name="tag_04_174_13_05"></a>Programs Section</h5>
|
|
|
|
<p>The <i>programs</i> section can include the definition of the lexical analyzer <i>yylex</i>(), and any other functions; for
|
|
example, those used in the actions specified in the grammar rules. It is unspecified whether the programs section precedes or
|
|
follows the semantic actions in the output file; therefore, if the application contains any macro definitions and declarations
|
|
intended to apply to the code in the semantic actions, it shall place them within <tt>"%{ ... %}"</tt> in the
|
|
declarations section.</p>
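<p>A complete, deliberately small specification is sketched below to show where such declarations and functions might be placed.
The token <tt>DIGIT</tt> and the non-terminal <tt>digits</tt> are hypothetical, and the definitions of <i>main</i>() and
<i>yyerror</i>() are stand-ins that an application could replace with the versions from the <i>yacc</i> library. The prototypes
are placed within <tt>"%{ ... %}"</tt> so that they are visible regardless of where the programs section ends up in the code
file.</p>

<pre>
<tt>%{
#include &lt;ctype.h&gt;
#include &lt;locale.h&gt;
#include &lt;stdio.h&gt;

int yylex(void);
int yyparse(void);
int yyerror(const char *s);
%}

%token DIGIT

%%

digits : DIGIT
       | digits DIGIT
       ;

%%

/* Return DIGIT for each decimal digit, skip white space, and
   return 0 at end of input. Any other character is returned as
   itself and will normally lead to a syntax error. */
int yylex(void)
{
    int c;

    while ((c = getchar()) != EOF) {
        if (isdigit(c))
            return DIGIT;
        if (!isspace(c))
            return c;
    }
    return 0;
}

/* Report an error message on standard error. */
int yyerror(const char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
}

int main(void)
{
    setlocale(LC_ALL, "");
    return yyparse();
}
</tt>
</pre>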
|
|
|
|
<h5><a name="tag_04_174_13_06"></a>Input Grammar</h5>
|
|
|
|
<p>The following input to <i>yacc</i> yields a parser for the input to <i>yacc</i>. This formal syntax takes precedence over the
|
|
preceding text syntax description.</p>
|
|
|
|
<p>The lexical structure is defined less precisely; <a href="#tag_04_174_13_02">Lexical Structure of the Grammar</a> defines most
|
|
terms. The correspondence between the previous terms and the tokens below is as follows.</p>
|
|
|
|
<dl compact>
|
|
<dt><b>IDENTIFIER</b></dt>
|
|
|
|
<dd>This corresponds to the concept of <i>name</i>, given previously. It also includes literals as defined previously.</dd>
|
|
|
|
<dt><b>C_IDENTIFIER</b></dt>
|
|
|
|
<dd>This is a name, and additionally it is known to be followed by a colon. A literal cannot yield this token.</dd>
|
|
|
|
<dt><b>NUMBER</b></dt>
|
|
|
|
<dd>A string of digits (a non-negative decimal integer).</dd>
|
|
|
|
<dt><b>TYPE</b>, <b>LEFT</b>, <b>MARK</b>, <b>LCURL</b>, <b>RCURL</b></dt>
|
|
|
|
<dd><br>
|
|
These correspond directly to <b>%type</b>, <b>%left</b>, <b>%%</b>, <b>%{</b>, and <b>%}</b>.</dd>
|
|
|
|
<dt><b>{ ... }</b></dt>
|
|
|
|
<dd>This indicates C-language source code, with the possible inclusion of <tt>'$'</tt> macros as discussed previously.</dd>
|
|
</dl>
|
|
|
|
<pre>
<tt>/* Grammar for the input to yacc. */
/* Basic entries. */
/* The following are recognized by the lexical analyzer. */
<br>
%token    IDENTIFIER      /* Includes identifiers and literals */
%token    C_IDENTIFIER    /* identifier (but not literal)
                             followed by a :. */
%token    NUMBER          /* [0-9][0-9]* */
<br>
/* Reserved words : %type=&gt;TYPE %left=&gt;LEFT, and so on */
<br>
%token    LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION
<br>
%token    MARK            /* The %% mark. */
%token    LCURL           /* The %{ mark. */
%token    RCURL           /* The %} mark. */
<br>
/* 8-bit character literals stand for themselves; */
/* tokens have to be defined for multi-byte characters. */
<br>
%start    spec
<br>
%%
<br>
spec  : defs MARK rules tail
      ;
tail  : MARK
      {
        /* In this action, set up the rest of the file. */
      }
      | /* Empty; the second MARK is optional. */
      ;
defs  : /* Empty. */
      | defs def
      ;
def   : START IDENTIFIER
      | UNION
      {
        /* Copy union definition to output. */
      }
      | LCURL
      {
        /* Copy C code to output file. */
      }
        RCURL
      | rword tag nlist
      ;
rword : TOKEN
      | LEFT
      | RIGHT
      | NONASSOC
      | TYPE
      ;
tag   : /* Empty: union tag ID optional. */
      | '&lt;' IDENTIFIER '&gt;'
      ;
nlist : nmno
      | nlist nmno
      ;
nmno  : IDENTIFIER         /* Note: literal invalid with %type. */
      | IDENTIFIER NUMBER  /* Note: invalid with %type. */
      ;
<br>
/* Rule section */
<br>
rules : C_IDENTIFIER rbody prec
      | rules rule
      ;
rule  : C_IDENTIFIER rbody prec
      | '|' rbody prec
      ;
rbody : /* empty */
      | rbody IDENTIFIER
      | rbody act
      ;
act   : '{'
      {
        /* Copy action, translate $$, and so on. */
      }
        '}'
      ;
prec  : /* Empty */
      | PREC IDENTIFIER
      | PREC IDENTIFIER act
      | prec ';'
      ;
</tt>
</pre>
|
|
|
|
<h5><a name="tag_04_174_13_07"></a>Conflicts</h5>
|
|
|
|
<p>The parser produced for an input grammar may contain states in which conflicts occur. The conflicts occur because the grammar is
|
|
not LALR(1). An ambiguous grammar always contains at least one LALR(1) conflict. The <i>yacc</i> utility shall resolve all
|
|
conflicts, using either default rules or user-specified precedence rules.</p>
|
|
|
|
<p>Conflicts are either shift/reduce conflicts or reduce/reduce conflicts. A shift/reduce conflict is where, for a given state and
|
|
lookahead symbol, both a shift action and a reduce action are possible. A reduce/reduce conflict is where, for a given state and
|
|
lookahead symbol, reductions by two different rules are possible.</p>
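<p>As a sketch of how a shift/reduce conflict can arise, consider the following hypothetical grammar, in which no precedence or
associativity has been declared for <tt>'-'</tt>:</p>

<pre>
<tt>%token NUMBER
%%
expr : expr '-' expr
     | NUMBER
     ;
</tt>
</pre>

<p>After the parser has recognized <tt>expr '-' expr</tt> and the lookahead symbol is another <tt>'-'</tt>, it could either reduce
by the first rule, which groups <tt>1 - 2 - 3</tt> as <tt>(1 - 2) - 3</tt>, or shift the <tt>'-'</tt>, which groups it as
<tt>1 - (2 - 3)</tt>. Both actions are possible in the same state for the same lookahead symbol, so <i>yacc</i> would report one
shift/reduce conflict for this grammar. Adding a declaration such as <tt>%left '-'</tt> to the declarations section would be one
way to select the first grouping.</p>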
|
|
|
|
<p>The rules below describe how to specify what actions to take when a conflict occurs. Not all shift/reduce conflicts can be
successfully resolved this way, because a conflict may be due to something other than ambiguity. Incautious use of these
facilities can therefore cause the language accepted by the parser to differ greatly from the one intended. The description file
shall contain sufficient information to understand the cause of the conflict. Where ambiguity is the cause, either the default or
the explicit rules should be adequate to produce a working parser.</p>
|
|
|
|
<p>The declared precedences and associativities (see <a href="#tag_04_174_13_03">Declarations Section</a> ) are used to resolve
|
|
parsing conflicts as follows:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<p>A precedence and associativity is associated with each grammar rule; it is the precedence and associativity of the last token or
|
|
literal in the body of the rule. If the <b>%prec</b> keyword is used, it overrides this default. Some grammar rules might not have
|
|
both precedence and associativity.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>If there is a shift/reduce conflict, and both the grammar rule and the input symbol have precedence and associativity associated
|
|
with them, then the conflict is resolved in favor of the action (shift or reduce) associated with the higher precedence. If the
|
|
precedences are the same, then the associativity is used; left associative implies reduce, right associative implies shift, and
|
|
non-associative implies an error in the string being parsed.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>When there is a shift/reduce conflict that cannot be resolved by rule 2, the shift is done. Conflicts resolved this way are
|
|
counted in the diagnostic output described in <a href="#tag_04_174_13_08">Error Handling</a> .</p>
|
|
</li>
|
|
|
|
<li>
|
|
<p>When there is a reduce/reduce conflict, a reduction is done by the grammar rule that occurs earlier in the input sequence.
|
|
Conflicts resolved this way are counted in the diagnostic output described in <a href="#tag_04_174_13_08">Error Handling</a> .</p>
|
|
</li>
|
|
</ol>
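<p>Rules 2 and 3 above can be sketched in C as follows. This is only an illustration of the decision procedure, not the code of
any <i>yacc</i> implementation; the types and the function <i>resolve_shift_reduce</i>() are hypothetical names, and a precedence
level of zero is assumed to mean that no precedence was declared.</p>

```c
#include <assert.h>

/* Hypothetical names for illustration; not part of any yacc interface. */
enum assoc { ASSOC_NONE, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

struct prec {
    int level;         /* assumption: 0 means no precedence was declared */
    enum assoc assoc;  /* ASSOC_NONE when level is 0 */
};

/*
 * Resolve a shift/reduce conflict between a grammar rule (the candidate
 * reduction) and the lookahead token (the candidate shift).
 * *counted is set to 1 when the conflict falls through to the default
 * of rule 3 and would be reported, or to 0 when rule 2 settled it.
 */
enum action resolve_shift_reduce(struct prec rule, struct prec token,
                                 int *counted)
{
    assert(counted != 0);
    *counted = 0;

    /* Rule 3: without precedence on both sides, the shift is done. */
    if (rule.level == 0 || token.level == 0) {
        *counted = 1;
        return ACT_SHIFT;
    }

    /* Rule 2: the action with the higher precedence wins. */
    if (rule.level > token.level)
        return ACT_REDUCE;
    if (token.level > rule.level)
        return ACT_SHIFT;

    /* Equal precedence: the associativity decides. */
    switch (token.assoc) {
    case ASSOC_LEFT:
        return ACT_REDUCE;
    case ASSOC_RIGHT:
        return ACT_SHIFT;
    case ASSOC_NONASSOC:
        return ACT_ERROR;
    default:
        *counted = 1;
        return ACT_SHIFT;
    }
}
```

<p>With a left-associative <tt>'+'</tt> at level 1 and a left-associative <tt>'*'</tt> at level 2, a rule ending in <tt>'+'</tt>
with lookahead <tt>'*'</tt> yields a shift, while the same rule with lookahead <tt>'+'</tt> yields a reduce.</p>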
|
|
|
|
<p>Conflicts resolved by precedence or associativity shall not be counted in the shift/reduce and reduce/reduce conflicts reported
|
|
by <i>yacc</i> on either standard error or in the description file.</p>
|
|
|
|
<h5><a name="tag_04_174_13_08"></a>Error Handling</h5>
|
|
|
|
<p>The token <b>error</b> shall be reserved for error handling. The name <b>error</b> can be used in grammar rules. It indicates
|
|
places where the parser can recover from a syntax error. The default value of <b>error</b> shall be 256. Its value can be changed
|
|
using a <b>%token</b> declaration. The lexical analyzer should not return the value of <b>error</b>.</p>
|
|
|
|
<p>The parser shall detect a syntax error when it is in a state where the action associated with the lookahead symbol is
|
|
<b>error</b>. A semantic action can cause the parser to initiate error handling by executing the macro YYERROR. When YYERROR is
|
|
executed, the semantic action passes control back to the parser. YYERROR cannot be used outside of semantic actions.</p>
|
|
|
|
<p>When the parser detects a syntax error, it normally calls <i>yyerror</i>() with the character string
|
|
<tt>"syntax error"</tt> as its argument. The call shall not be made if the parser is still recovering from a previous error
|
|
when the error is detected. The parser is considered to be recovering from a previous error until the parser has shifted over at
|
|
least three normal input symbols since the last error was detected or a semantic action has executed the macro <i>yyerrok</i>. The
|
|
parser shall not call <i>yyerror</i>() when YYERROR is executed.</p>
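<p>The suppression of repeated error reports can be pictured as a small counter. The following C sketch is a model of the
behavior described above, under the assumption that the parser tracks how many normal input symbols it still has to shift; the
structure and function names are hypothetical and do not appear in generated parsers.</p>

```c
#include <assert.h>

/* Hypothetical model of the recovery state kept by a generated parser. */
struct recovery {
    int pending_shifts;  /* normal symbols still to shift before recovered */
    int reports;         /* times yyerror() would have been called */
};

/* Called when the parser detects a syntax error. */
void on_syntax_error(struct recovery *r)
{
    assert(r != 0);
    if (r->pending_shifts == 0)
        r->reports++;          /* not recovering, so the error is reported */
    r->pending_shifts = 3;     /* recovery lasts for three shifted symbols */
}

/* Called for each normal input symbol shifted (not for the error token). */
void on_shift(struct recovery *r)
{
    assert(r != 0);
    if (r->pending_shifts > 0)
        r->pending_shifts--;
}

/* Models the effect of the yyerrok macro in a semantic action. */
void on_yyerrok(struct recovery *r)
{
    assert(r != 0);
    r->pending_shifts = 0;
}

/* Models the value of YYRECOVERING: 1 while recovering, otherwise 0. */
int is_recovering(const struct recovery *r)
{
    assert(r != 0);
    return r->pending_shifts > 0;
}
```

<p>In this model a second error that arrives after only one shifted symbol is not reported, whereas an error that arrives after
three shifted symbols, or after <i>on_yyerrok</i>(), is reported again.</p>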
|
|
|
|
<p>The macro function YYRECOVERING shall return 1 if a syntax error has been detected and the parser has not yet fully recovered
|
|
from it. Otherwise, zero shall be returned.</p>
|
|
|
|
<p>When a syntax error is detected by the parser, the parser shall check if a previous syntax error has been detected. If a
|
|
previous error was detected, and if no normal input symbols have been shifted since the preceding error was detected, the parser
|
|
checks if the lookahead symbol is an endmarker (see <a href="#tag_04_174_13_09">Interface to the Lexical Analyzer</a> ). If it is,
|
|
the parser shall return with a non-zero value. Otherwise, the lookahead symbol shall be discarded and normal parsing shall
|
|
resume.</p>
|
|
|
|
<p>When YYERROR is executed, or when the parser detects a syntax error and either no previous error has been detected or at least
one normal input symbol has been shifted since the previous error was detected, the parser shall pop back one state at a time
until the parse stack is empty or the current state allows a shift over <b>error</b>. If the parser empties the parse stack, it
shall return with a non-zero value. Otherwise, it shall shift over <b>error</b> and then resume normal parsing. If the parser had
read a lookahead symbol before the error was detected, that symbol shall still be the lookahead symbol when parsing is resumed.</p>
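<p>The state-popping step can be sketched as a short loop. The C fragment below assumes the parse stack is an array of state
numbers and that a lookup table says which states allow a shift over <b>error</b>; both representations, and the function name
<i>pop_to_error_state</i>(), are assumptions made for illustration, since the layout of a real parser's tables is unspecified.</p>

```c
#include <assert.h>

/*
 * Pop states until one allows a shift over the error token.
 *
 * states[0 .. depth-1] is the parse stack, with the top at depth-1.
 * shifts_error[s] is non-zero if state s allows a shift over error.
 *
 * Returns the new stack depth. A result of 0 means the stack was
 * emptied, in which case the parser would return a non-zero value.
 */
int pop_to_error_state(const int *states, int depth, const int *shifts_error)
{
    assert(states != 0 && shifts_error != 0 && depth >= 0);
    while (depth > 0 && !shifts_error[states[depth - 1]])
        depth--;
    return depth;
}
```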
|
|
|
|
<p>The macro <i>yyerrok</i> in a semantic action shall cause the parser to act as if it has fully recovered from any previous
|
|
errors. The macro <i>yyclearin</i> shall cause the parser to discard the current lookahead token. If the current lookahead token
|
|
has not yet been read, <i>yyclearin</i> shall have no effect.</p>
|
|
|
|
<p>The macro YYACCEPT shall cause the parser to return with the value zero. The macro YYABORT shall cause the parser to return with
|
|
a non-zero value.</p>
|
|
|
|
<h5><a name="tag_04_174_13_09"></a>Interface to the Lexical Analyzer</h5>
|
|
|
|
<p>The <i>yylex</i>() function is an integer-valued function that returns a <i>token number</i> representing the kind of token
|
|
read. If there is a value associated with the token returned by <i>yylex</i>() (see the discussion of <i>tag</i> above), it shall
|
|
be assigned to the external variable <i>yylval</i>.</p>
|
|
|
|
<p>If the parser and <i>yylex</i>() do not agree on these token numbers, reliable communication between them cannot occur. For
|
|
(single-byte character) literals, the token is simply the numeric value of the character in the current character set. The numbers
|
|
for other tokens can either be chosen by <i>yacc</i>, or chosen by the user. In either case, the <b>#define</b> construct of C is
|
|
used to allow <i>yylex</i>() to return these numbers symbolically. The <b>#define</b> statements are put into the code file, and
|
|
the header file if that file is requested. The set of characters permitted by <i>yacc</i> in an identifier is larger than that
|
|
permitted by C. Token names found to contain such characters shall not be included in the <b>#define</b> declarations.</p>
|
|
|
|
<p>If the token numbers are chosen by <i>yacc</i>, the tokens other than literals shall be assigned numbers greater than 256,
|
|
although no order is implied. A token can be explicitly assigned a number by following its first appearance in the declarations
|
|
section with a number. Names and literals not defined this way retain their default definition. All token numbers assigned by
|
|
<i>yacc</i> shall be unique and distinct from the token numbers used for literals and user-assigned tokens. If duplicate token
|
|
numbers cause conflicts in parser generation, <i>yacc</i> shall report an error; otherwise, it is unspecified whether the token
|
|
assignment is accepted or an error is reported.</p>
|
|
|
|
<p>The end of the input is marked by a special token called the <i>endmarker</i>, which has a token number that is zero or
|
|
negative. (These values are invalid for any other token.) All lexical analyzers shall return zero or negative as a token number
|
|
upon reaching the end of their input. If the tokens up to, but excluding, the endmarker form a structure that matches the start
|
|
symbol, the parser shall accept the input. If the endmarker is seen in any other context, it shall be considered an error.</p>
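<p>A minimal hand-written lexical analyzer following these conventions might look like the sketch below. It assumes a single
named token, <b>NUMBER</b>, with the hypothetical number 257 (in practice the value comes from the <b>#define</b> in the header
file), an <b>int</b>-valued <i>yylval</i>, and input taken from a string through the helper <i>set_input</i>(), which is not part
of any standard interface.</p>

```c
#include <assert.h>
#include <ctype.h>

#define NUMBER 257  /* hypothetical; normally supplied by the header file */

int yylval;                     /* value of the most recently returned token */
static const char *input = "";  /* text that remains to be scanned */

/* Illustrative helper: select the string that yylex() will scan. */
void set_input(const char *text)
{
    assert(text != 0);
    input = text;
}

int yylex(void)
{
    /* Skip blanks between tokens. */
    while (*input == ' ' || *input == '\t')
        input++;

    /* End of input: return the endmarker (zero). */
    if (*input == '\0')
        return 0;

    /* A run of digits is a NUMBER; its value goes into yylval. */
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }

    /* Any other single-byte character is a literal token whose
       number is the numeric value of the character. */
    return (unsigned char)*input++;
}
```

<p>Returning the character value for literals keeps the lexical analyzer in agreement with the parser without any extra
declarations, and returning zero repeatedly at end of input is harmless if the parser asks again.</p>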
|
|
|
|
<h5><a name="tag_04_174_13_10"></a>Completing the Program</h5>
|
|
|
|
<p>In addition to <i>yyparse</i>() and <i>yylex</i>(), the functions <i>yyerror</i>() and <i>main</i>() are required to make a
|
|
complete program. The application can supply <i>main</i>() and <i>yyerror</i>(), or those routines can be obtained from the
|
|
<i>yacc</i> library.</p>
|
|
|
|
<h5><a name="tag_04_174_13_11"></a>Yacc Library</h5>
|
|
|
|
<p>The following functions shall appear only in the <i>yacc</i> library accessible through the <b>-l y</b> operand to <a href=
|
|
"../utilities/c99.html"><i>c99</i></a>; they can therefore be redefined by a conforming application:</p>
|
|
|
|
<dl compact>
|
|
<dt><b>int </b> <i>main</i>(<b>void</b>)</dt>
|
|
|
|
<dd><br>
|
|
This function shall call <i>yyparse</i>() and exit with an unspecified value. Other actions within this function are
|
|
unspecified.</dd>
|
|
|
|
<dt><b>int </b> <i>yyerror</i>(<b>const char</b> *<i>s</i>)</dt>
|
|
|
|
<dd><br>
|
|
This function shall write the NUL-terminated argument to standard error, followed by a &lt;newline&gt;.</dd>
|
|
</dl>
|
|
|
|
<p>The order of the <b>-l y</b> and <b>-l l</b> operands given to <a href="../utilities/c99.html"><i>c99</i></a> is
|
|
significant; the application shall either provide its own <i>main</i>() function or ensure that <b>-l y</b> precedes
|
|
<b>-l l</b>.</p>
|
|
|
|
<h5><a name="tag_04_174_13_12"></a>Debugging the Parser</h5>
|
|
|
|
<p>The parser generated by <i>yacc</i> shall have diagnostic facilities in it that can be optionally enabled at either compile time
|
|
or at runtime (if enabled at compile time). The compilation of the runtime debugging code is under the control of YYDEBUG, a
|
|
preprocessor symbol. If YYDEBUG has a non-zero value, the debugging code shall be included. If its value is zero, the code shall
|
|
not be included.</p>
|
|
|
|
<p>In parsers where the debugging code has been included, the external <b>int</b> <i>yydebug</i> can be used to turn debugging on
|
|
(with a non-zero value) and off (zero value) at runtime. The initial value of <i>yydebug</i> shall be zero.</p>
|
|
|
|
<p>When <b>-t</b> is specified, the code file shall be built such that, if YYDEBUG is not already defined at compilation time
|
|
(using the <a href="../utilities/c99.html"><i>c99</i></a> <b>-D</b> YYDEBUG option, for example), YYDEBUG shall be set explicitly
|
|
to 1. When <b>-t</b> is not specified, the code file shall be built such that, if YYDEBUG is not already defined, it shall be set
|
|
explicitly to zero.</p>
|
|
|
|
<p>The format of the debugging output is unspecified but includes at least enough information to determine the shift and reduce
|
|
actions, and the input symbols. It also provides information about error recovery.</p>
|
|
|
|
<h5><a name="tag_04_174_13_13"></a>Algorithms</h5>
|
|
|
|
<p>The parser constructed by <i>yacc</i> implements an LALR(1) parsing algorithm as documented in the literature. It is unspecified
|
|
whether the parser is table-driven or direct-coded.</p>
|
|
|
|
<p>A parser generated by <i>yacc</i> shall never request an input symbol from <i>yylex</i>() while in a state where the only
|
|
actions other than the error action are reductions by a single rule.</p>
|
|
|
|
<p>The literature of parsing theory defines these concepts.</p>
|
|
|
|
<h5><a name="tag_04_174_13_14"></a>Limits</h5>
|
|
|
|
<p>The <i>yacc</i> utility may have several internal tables. The minimum maximums for these tables are shown in the following
|
|
table. The exact meaning of these values is implementation-defined. The implementation shall define the relationship between these
|
|
values and between them and any error messages that the implementation may generate should it run out of space for any internal
|
|
structure. An implementation may combine groups of these resources into a single pool as long as the total available to the user
|
|
does not fall below the sum of the sizes specified by this section.<br>
|
|
</p>
|
|
|
|
<center><b>Table: Internal Limits in <i>yacc</i></b></center>
|
|
|
|
<center>
|
|
<table border="1" cellpadding="3" align="center">
|
|
<tr valign="top">
<th align="center">
<p class="tent"><b>Limit</b></p>
</th>
<th align="center">
<p class="tent"><b>Minimum Maximum</b></p>
</th>
<th align="center">
<p class="tent"><b>Description</b></p>
</th>
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">{NTERMS}</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">126</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Number of tokens.</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">{NNONTERM}</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">200</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Number of non-terminals.</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">{NPROD}</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">300</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Number of rules.</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">{NSTATES}</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">600</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Number of states.</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">{MEMSIZE}</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">5200</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Length of rules. The total length, in names (tokens and non-terminals), of all the rules of the grammar. The
|
|
left-hand side is counted for each rule, even if it is not explicitly repeated, as specified in <a href="#tag_04_174_13_04">Grammar
|
|
Rules in yacc</a> .</p>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr valign="top">
|
|
<td align="left">
|
|
<p class="tent">{ACTSIZE}</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">4000</p>
|
|
</td>
|
|
<td align="left">
|
|
<p class="tent">Number of actions. "Actions" here (and in the description file) refer to parser actions (shift, reduce, and so
|
|
on) not to semantic actions defined in <a href="#tag_04_174_13_04">Grammar Rules in yacc</a> .</p>
|
|
</td>
|
|
</tr>
|
|
</table>
|
|
</center>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_14"></a>EXIT STATUS</h4>
|
|
|
|
<blockquote>
|
|
<p>The following exit values shall be returned:</p>
|
|
|
|
<dl compact>
|
|
<dt> 0</dt>
|
|
|
|
<dd>Successful completion.</dd>
|
|
|
|
<dt>&gt;0</dt>
|
|
|
|
<dd>An error occurred.</dd>
|
|
</dl>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_15"></a>CONSEQUENCES OF ERRORS</h4>
|
|
|
|
<blockquote>
|
|
<p>If any errors are encountered, the run is aborted and <i>yacc</i> exits with a non-zero status. Partial code files and header
|
|
files may be produced. The summary information in the description file shall always be produced if the <b>-v</b> flag is
|
|
present.</p>
|
|
</blockquote>
|
|
|
|
<hr>
|
|
<div class="box"><em>The following sections are informative.</em></div>
|
|
|
|
<h4><a name="tag_04_174_16"></a>APPLICATION USAGE</h4>
|
|
|
|
<blockquote>
|
|
<p>Historical implementations experience name conflicts on the names <b>yacc.tmp</b>, <b>yacc.acts</b>, <b>yacc.debug</b>,
|
|
<b>y.tab.c</b>, <b>y.tab.h</b>, and <b>y.output</b> if more than one copy of <i>yacc</i> is running in a single directory at one
|
|
time. The <b>-b</b> option was added to overcome this problem. The related problem of allowing multiple <i>yacc</i> parsers to be
|
|
placed in the same file was addressed by adding a <b>-p</b> option to override the previously hard-coded <b>yy</b> variable
|
|
prefix.</p>
|
|
|
|
<p>The description of the <b>-p</b> option specifies the minimal set of function and variable names that cause conflict when
|
|
multiple parsers are linked together. YYSTYPE does not need to be changed. Instead, the programmer can use <b>-b</b> to give the
|
|
header files for different parsers different names, and then the file with the <i>yylex</i>() for a given parser can include the
|
|
header for that parser. Names such as <i>yyclearerr</i> do not need to be changed because they are used only in the actions; they
|
|
do not have linkage. It is possible that an implementation has other names, either internal ones for implementing things such as
|
|
<i>yyclearerr</i>, or providing non-standard features that it wants to change with <b>-p</b>.</p>
|
|
|
|
<p>Unary operators that are the same token as a binary operator in general need their precedence adjusted. This is handled by the
|
|
<b>%prec</b> advisory symbol associated with the particular grammar rule defining that unary operator. (See <a href=
|
|
"#tag_04_174_13_04">Grammar Rules in yacc</a> .) Applications are not required to use this operator for unary operators, but the
|
|
grammars that do not require it are rare.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_17"></a>EXAMPLES</h4>
|
|
|
|
<blockquote>
|
|
<p>Access to the <i>yacc</i> library is obtained with library search operands to <a href="../utilities/c99.html"><i>c99</i></a>. To
|
|
use the <i>yacc</i> library <i>main</i>():</p>
|
|
|
|
<pre>
|
|
<tt>c99 y.tab.c -l y
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>Both the <a href="../utilities/lex.html"><i>lex</i></a> library and the <i>yacc</i> library contain <i>main</i>(). To access the
|
|
<i>yacc</i> <i>main</i>():</p>
|
|
|
|
<pre>
|
|
<tt>c99 y.tab.c lex.yy.c -l y -l l
|
|
</tt>
|
|
</pre>
|
|
|
|
<p>This ensures that the <i>yacc</i> library is searched first, so that its <i>main</i>() is used.</p>
|
|
|
|
<p>The historical <i>yacc</i> libraries have contained two simple functions that are normally coded by the application programmer.
|
|
These functions are similar to the following code:</p>
|
|
|
|
<pre>
|
|
<tt>#include &lt;locale.h&gt;
int main(void)
{
extern int yyparse(void);
|
|
<br>
|
|
setlocale(LC_ALL, "");
|
|
<br>
|
|
/* If the following parser is one created by lex, the
|
|
application must be careful to ensure that LC_CTYPE
|
|
and LC_COLLATE are set to the POSIX locale. */
|
|
(void) yyparse();
|
|
return (0);
|
|
}
|
|
<br>
|
|
#include &lt;stdio.h&gt;
|
|
<br>
|
|
int yyerror(const char *msg)
|
|
{
|
|
(void) fprintf(stderr, "%s\n", msg);
|
|
return (0);
|
|
}
|
|
</tt>
|
|
</pre>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_18"></a>RATIONALE</h4>
|
|
|
|
<blockquote>
|
|
<p>The referenced documents may be helpful in constructing the parser generator. The referenced DeRemer and Pennello article (along with
|
|
the works it references) describes a technique to generate parsers that conform to this volume of IEEE Std 1003.1-2001.
|
|
Work in this area continues to be done, so implementors should consult current literature before doing any new implementations. The
|
|
original Knuth article is the theoretical basis for this kind of parser, but the tables it generates are impractically large for
|
|
reasonable grammars and should not be used. The "equivalent to" wording is intentional to assure that the best tables that are
|
|
LALR(1) can be generated.</p>
|
|
|
|
<p>There has been confusion between the class of grammars, the algorithms needed to generate parsers, and the algorithms needed to
|
|
parse the languages. They are all reasonably orthogonal. In particular, a parser generator that accepts the full range of LR(1)
|
|
grammars need not generate a table any more complex than one that accepts SLR(1) (a relatively weak class of LR grammars) for a
|
|
grammar that happens to be SLR(1). Such an implementation need not recognize the case, either; table compression can yield the
|
|
SLR(1) table (or one even smaller than that) without recognizing that the grammar is SLR(1). The speed of an LR(1) parser for any
|
|
class is dependent more upon the table representation and compression (or the code generation if a direct parser is generated) than
|
|
upon the class of grammar that the table generator handles.</p>
|
|
|
|
<p>The speed of the parser generator is somewhat dependent upon the class of grammar it handles. However, the algorithms for
constructing LR parsers in the original Knuth article were judged by its author to be impractically slow at that time. Although
full LR is more complex than LALR(1), as computer speeds and algorithms improve, the difference (in terms of acceptable wall-clock
execution time) is becoming less significant.</p>
|
|
|
|
<p>Potential authors are cautioned that the DeRemer and Pennello article cited above identifies a bug (an over-simplification of
the computation of LALR(1) lookahead sets) in some of the LALR(1) algorithm statements that were published before it. They should
take the time to seek out that paper, as well as current relevant work, particularly Aho's.</p>
|
|
|
|
<p>The <b>-b</b> option was added to provide a portable method for permitting <i>yacc</i> to work on multiple separate parsers in
|
|
the same directory. If a directory contains more than one <i>yacc</i> grammar, and both grammars are constructed at the same time
|
|
(by, for example, a parallel <a href="../utilities/make.html"><i>make</i></a> program), conflict results. While the solution is not
|
|
historical practice, it corrects a known deficiency in historical implementations. Corresponding changes were made to all sections
|
|
that referenced the filenames <b>y.tab.c</b> (now "the code file"), <b>y.tab.h</b> (now "the header file"), and <b>y.output</b>
|
|
(now "the description file").</p>
|
|
|
|
<p>The grammar for <i>yacc</i> input is based on System V documentation. The textual description there shows that the
<tt>';'</tt> is required at the end of the rule. The grammar and the implementation do not require this. (The use of
<b>C_IDENTIFIER</b> causes a reduce to occur in the right place.)</p>
|
|
|
|
<p>Also, in that implementation, the constructs such as <b>%token</b> can be terminated by a semicolon, but this is not permitted
|
|
by the grammar. The keywords such as <b>%token</b> can also appear in uppercase, which is again not discussed. In most places where
|
|
<tt>'%'</tt> is used, <tt>'\'</tt> can be substituted, and there are alternate spellings for some of the symbols (for example,
|
|
<b>%LEFT</b> can be <tt>"%&lt;"</tt> or even <tt>"\&lt;"</tt>).</p>
|
|
|
|
<p>Historically, &lt;<i>tag</i>&gt; can contain any characters except <tt>'&gt;'</tt>, including white space, in the
|
|
implementation. However, since the <i>tag</i> must reference an ISO C standard union member, in practice conforming
|
|
implementations need to support only the set of characters for ISO C standard identifiers in this context.</p>
|
|
|
|
<p>Some historical implementations are known to accept actions that are terminated by a period. Historical implementations often
|
|
allow <tt>'$'</tt> in names. A conforming implementation does not need to support either of these behaviors.</p>
|
|
|
|
<p>Deciding when to use <b>%prec</b> illustrates the difficulty in specifying the behavior of <i>yacc</i>. There may be situations
in which the <i>grammar</i> is not, strictly speaking, in error, and yet <i>yacc</i> cannot interpret it unambiguously.
Ambiguities in the grammar can in many instances be resolved by providing additional information, such as using <b>%type</b> or
<b>%union</b> declarations. It is often easier, and usually yields a smaller parser, to take this alternative when it is
appropriate.</p>
|
|
|
|
<p>A program produced without the runtime debugging code is usually smaller and slightly faster in historical
implementations.</p>
|
|
|
|
<p>Statistics messages from several historical implementations include the following types of information:</p>
|
|
|
|
<pre>
|
|
<i>n</i><tt>/512 terminals,</tt> <i>n</i><tt>/300 non-terminals
|
|
</tt><i>n</i><tt>/600 grammar rules,</tt> <i>n</i><tt>/1500 states
|
|
</tt><i>n</i> <tt>shift/reduce,</tt> <i>n</i> <tt>reduce/reduce conflicts reported
|
|
</tt><i>n</i><tt>/350 working sets used
|
|
Memory: states, etc.</tt> <i>n</i><tt>/15000, parser</tt> <i>n</i><tt>/15000
|
|
</tt><i>n</i><tt>/600 distinct lookahead sets
|
|
</tt><i>n</i> <tt>extra closures
|
|
</tt><i>n</i> <tt>shift entries,</tt> <i>n</i> <tt>exceptions
|
|
</tt><i>n</i> <tt>goto entries
|
|
</tt><i>n</i> <tt>entries saved by goto default
|
|
Optimizer space used: input</tt> <i>n</i><tt>/15000, output</tt> <i>n</i><tt>/15000
|
|
</tt><i>n</i> <tt>table entries,</tt> <i>n</i> <tt>zero
|
|
Maximum spread:</tt> <i>n</i><tt>, Maximum offset:</tt> <i>n</i>
|
|
</pre>
|
|
|
|
<p>The report of internal tables in the description file is left implementation-defined because all aspects of these limits are
|
|
also implementation-defined. Some implementations may use dynamic allocation techniques and have no specific limit values to
|
|
report.</p>
|
|
|
|
<p>The format of the <b>y.output</b> file is not given because specification of the format was not seen to enhance applications
|
|
portability. The listing is primarily intended to help human users understand and debug the parser; use of <b>y.output</b> by a
|
|
conforming application script would be unusual. Furthermore, implementations have not produced consistent output and no popular
|
|
format was apparent. The format selected by the implementation should be human-readable, in addition to the requirement that it be
|
|
a text file.</p>
|
|
|
|
<p>Standard error reports are not specifically described because they are seldom of use to conforming applications and there was no
|
|
reason to restrict implementations.</p>
|
|
|
|
<p>Some implementations recognize <tt>"={"</tt> as equivalent to <tt>'{'</tt> because it appears in historical documentation. This
|
|
construction was recognized and documented as obsolete as long ago as 1978, in the referenced <i>Yacc: Yet Another
|
|
Compiler-Compiler</i>. This volume of IEEE Std 1003.1-2001 chose to leave it as obsolete and omit it.</p>
|
|
|
|
<p>Multi-byte characters should be recognized by the lexical analyzer and returned as tokens. They should not be returned as
|
|
multi-byte character literals. The token <b>error</b> that is used for error recovery is normally assigned the value 256 in the
|
|
historical implementation. Thus, the token value 256, which is used in many multi-byte character sets, is not available for use as
|
|
the value of a user-defined token.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_19"></a>FUTURE DIRECTIONS</h4>
|
|
|
|
<blockquote>
|
|
<p>None.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_20"></a>SEE ALSO</h4>
|
|
|
|
<blockquote>
|
|
<p><a href="c99.html"><i>c99</i></a> , <a href="lex.html"><i>lex</i></a></p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_21"></a>CHANGE HISTORY</h4>
|
|
|
|
<blockquote>
|
|
<p>First released in Issue 2.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_22"></a>Issue 5</h4>
|
|
|
|
<blockquote>
|
|
<p>The FUTURE DIRECTIONS section is added.</p>
|
|
</blockquote>
|
|
|
|
<h4><a name="tag_04_174_23"></a>Issue 6</h4>
|
|
|
|
<blockquote>
|
|
<p>This utility is marked as part of the C-Language Development Utilities option.</p>
|
|
|
|
<p>Minor changes have been added to align with the IEEE P1003.2b draft standard.</p>
|
|
|
|
<p>The normative text is reworded to avoid use of the term "must" for application requirements.</p>
|
|
|
|
<p>IEEE PASC Interpretation 1003.2 #177 is applied, changing the comment on <b>RCURL</b> from the <b>}%</b> token to the
|
|
<b>%}</b>.</p>
|
|
</blockquote>
|
|
|
|
<div class="box"><em>End of informative text.</em></div>
|
|
|
|
<hr>
|
|
<hr size="2" noshade>
|
|
<center><font size="2"><!--footer start-->
|
|
UNIX ® is a registered Trademark of The Open Group.<br>
|
|
POSIX ® is a registered Trademark of The IEEE.<br>
|
|
[ <a href="../mindex.html">Main Index</a> | <a href="../basedefs/contents.html">XBD</a> | <a href=
|
|
"../utilities/contents.html">XCU</a> | <a href="../functions/contents.html">XSH</a> | <a href="../xrat/contents.html">XRAT</a>
|
|
]</font></center>
|
|
|
|
<!--footer end-->
|
|
<hr size="2" noshade>
|
|
</body>
|
|
</html>
|
|
|