NAME
yacc - Generates an LALR(1) parsing program from input
consisting of a context-free grammar specification
SYNOPSIS
yacc [-vltds] [-b prefix] [-N number] [-p symbol_prefix]
[-P pathname] grammar
STANDARDS
Interfaces documented on this reference page conform to
industry standards as follows:
yacc: XPG4, XPG4-UNIX
Refer to the standards(5) reference page for more information
about industry standards and associated tags.
OPTIONS
-b prefix
    Uses prefix instead of y as the prefix for all output
    filenames (prefix.tab.c, prefix.tab.h, and prefix.output).
-d
    Produces the <y.tab.h> file, which contains the #define
    statements that associate the yacc-assigned token codes
    with your token names. This allows source files other than
    y.tab.c to access the token codes by including this header
    file.
-l
    Includes no #line constructs in y.tab.c. Use this
    only after the grammar and associated actions are fully
    debugged.
-N number
    [Tru64 UNIX] Provides yacc with extra storage
    for building its LALR tables, which may be necessary when
    compiling very large grammars. The number should be larger
    than 40,000 when you use this option.
-p symbol_prefix
    Allows multiple yacc parsers to be linked together. Use
    symbol_prefix instead of yy to prefix global symbols.
-P pathname
    [Tru64 UNIX] Specifies an alternative parser (instead of
    /usr/ccs/lib/yaccpar). The pathname specifies the filename
    of the skeleton to be used in place of yaccpar.
-s
    [Tru64 UNIX] Breaks the yyparse() function into several
    smaller functions. Because its size is somewhat
    proportional to that of the grammar, it is possible for
    yyparse() to become too large to compile, optimize, or
    execute efficiently.
-t
    Compiles run-time debugging code. By default,
    this code is not included when y.tab.c is compiled. If
    YYDEBUG has a nonzero value, the C compiler (cc) includes
    the debugging code, whether or not the -t option was used.
    Without this code compiled in, yyparse() runs more
    quickly.
-v
    Produces the y.output file, which contains a
    readable description of the parsing tables and a report on
    conflicts generated by grammar ambiguities.
OPERANDS
grammar
    The pathname of a file containing input instructions. The
    format of this file is described in the DESCRIPTION section.
DESCRIPTION
The yacc command converts a context-free grammar specification
into a set of tables for a simple automaton that
executes an LALR(1) parsing algorithm. The yacc grammar can
be ambiguous; specified precedence rules are used to break
ambiguities.
You must compile the y.tab.c output file with a C language
compiler to produce the yyparse() function. This function
must be loaded with a yylex lexical analyzer function, as
well as two routines that you must provide, main() and an
error-handling routine, yyerror(). The lex command is useful
for creating lexical analyzers usable by yacc.
The yacc program reads its skeleton parser from the file
/usr/ccs/lib/yaccpar. Use the environment variable YACCPAR
to specify another location for the yacc program to read
from. If you use this environment variable, the -P option
is ignored, if specified.
The general format of the yacc input file is as follows:
[definitions] %% rules [%% [user subroutines]]
where:
definitions
    Is the section where you define the variables to be
    used later in the grammar, such as in the rules section.
    It is also where files are included (#include) and
    processing conditions are defined. This section is
    optional.
rules
    Is the section that contains grammar rules for the parser.
    A yacc input file must have a rules section.
user subroutines
    Is the section that contains user-supplied subroutines
    that can be used by the actions in the rules section.
    This section is optional.
Comments, in C syntax, can appear anywhere in the user
subroutines section or the definitions section. In the
rules section, comments can appear wherever a symbol is
allowed. Blank lines or lines consisting of white space
can be inserted anywhere in the file, and are ignored. The
NUL character must not be used in grammar rules or literals.
Definitions Section of Input File
The definitions section of a yacc input file contains
entries that perform the following functions:
    Includes standard I/O header file.
    Defines global variables.
    Defines the list rule as the place to start processing.
    Defines the tokens used by the parser.
    Defines the operators and their precedence.
Each line in the definitions section can be one of the
following:
%{
%}
    When placed on lines by themselves, these enclose C code
    to be passed into the global definitions of the output
    file. Such lines commonly include preprocessor directives
    and declarations of external variables and functions.
%token [<type>] token [number] ...
    Lists tokens or terminal symbols to be used in the rest of
    the input file. This line is needed for tokens that do not
    appear in other % definitions. If type is present, the C
    type for all tokens on this line is declared to be the
    type referenced by type. If a positive integer number
    follows a token, that value is assigned to the token.
%left token ...
    Indicates that each token is an operator, that all tokens
    in this definition have equal precedence, and that a
    succession of the operators listed in this definition is
    evaluated left to right.
%right token ...
    Indicates that each token is an operator, that all tokens
    in this definition have equal precedence, and that a
    succession of the operators listed in this definition is
    evaluated right to left.
%nonassoc token ...
    Indicates that each token is an operator, and that the
    operators listed in this definition cannot appear in
    succession; that is, the tokens cannot be used
    associatively.
%start symbol
    Indicates the highest-level production rule to be reduced;
    in other words, the rule where the parser can consider its
    work done and can terminate processing. If this definition
    is not included, the parser uses the first production
    rule. The symbol must be nonterminal (not a token).
%type <type> symbol ...
    Defines each symbol as data type type, to resolve
    ambiguities. If this construct is present, yacc performs
    type checking; otherwise, it assumes all symbols to be of
    type integer.
%union union-def
    Defines the yylval global variable as a union, where
    union-def is a standard C definition in the format:
    { type member ; [type member ; ...] }
    At least one member should be an int. Any valid C
    data type can be defined, including structures.
    When you run yacc with the -d option, the definition
    of yylval is placed in the <y.tab.h> file and
    can be referred to in a lex input file.
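As a rough illustration of what a %union definition amounts to
on the C side, the following sketch hand-writes a YYSTYPE union
and a helper that fills in yylval the way a lexical analyzer
might. The member names (ival, sval), the token codes NUMBER
and NAME, and the helper set_token_value() are all hypothetical;
the real definitions are whatever yacc writes into y.tab.h for
your grammar.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical token codes; yacc normally assigns these in y.tab.h. */
#define NUMBER 257
#define NAME   258

/* Hand-written stand-in for the union that a definition such as
 *     %union { int ival; const char *sval; }
 * would be expected to produce. */
typedef union {
    int         ival;   /* value of a NUMBER token */
    const char *sval;   /* text of a NAME token */
} YYSTYPE;

YYSTYPE yylval;         /* normally defined by the generated parser */

/* Hypothetical helper: classify text, store its value in the matching
 * member of yylval, and return the token code. */
int set_token_value(const char *text)
{
    assert(text != NULL && text[0] != '\0');
    if (text[0] >= '0' && text[0] <= '9') {
        yylval.ival = atoi(text);
        return NUMBER;
    }
    yylval.sval = text;
    return NAME;
}
```

Only one member of the union is meaningful at a time, so the
parser actions must read the same member that the lexical
analyzer wrote for that kind of token.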
Every token (terminal symbol) must be listed in one of
the preceding % definitions. Multiple tokens can be separated
by white space or commas. All the tokens in %left,
%right, and %nonassoc definitions are assigned a precedence,
with tokens in later definitions having precedence
over those in earlier definitions.
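The way these precedences are conventionally applied to a
shift/reduce conflict can be sketched in C as follows. This is
a simplified model, not code taken from yacc: the enum names,
struct prec, and resolve_conflict() are hypothetical. It assumes
the usual convention that a rule takes the precedence of its
last token (or of its %prec token) and that a conflict with no
precedence information defaults to a shift.

```c
#include <assert.h>

enum assoc  { ASSOC_NONE, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

struct prec {
    int        level;   /* 0 = none; later definitions get larger levels */
    enum assoc assoc;
};

/* Decide between shifting the lookahead token and reducing the rule. */
enum action resolve_conflict(struct prec rule, struct prec lookahead)
{
    assert(rule.level >= 0 && lookahead.level >= 0);
    if (rule.level == 0 || lookahead.level == 0)
        return ACT_SHIFT;               /* no precedence: default action */
    if (lookahead.level > rule.level)
        return ACT_SHIFT;               /* token binds tighter */
    if (lookahead.level < rule.level)
        return ACT_REDUCE;              /* rule binds tighter */
    switch (lookahead.assoc) {          /* equal precedence */
    case ASSOC_LEFT:  return ACT_REDUCE;   /* evaluate left to right */
    case ASSOC_RIGHT: return ACT_SHIFT;    /* evaluate right to left */
    default:          return ACT_ERROR;    /* %nonassoc: not allowed */
    }
}
```

For example, if '+' is declared in an earlier %left definition
than '*', then after reading 1+2 with '*' as the next token the
model shifts, and after reading 1*2 with '+' as the next token
it reduces.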
In addition to symbols, a token can be a literal character
enclosed in single quotes. (Multibyte characters are recognized
by the lexical analyzer and returned as tokens.)
The following special characters can be used, just as in C
programs:
\a      Alert
\n      Newline
\t      Tab
\v      Vertical tab
\r      Carriage Return
\b      Backspace
\f      Form Feed
\\      Backslash
\'      Single Quote
\?      Question mark
\ooo    One or more octal digits specifying the integer value
        of the character
Rules Section of Input File
The rules section of a yacc input file defines the rules
that parse the input stream. It consists of a series of
production rules that the parser tries to reduce. The format
of each production rule is:
symbol : symbol-sequence [action] [| symbol-sequence
[action] ...] ;
A symbol-sequence consists of zero or more symbols separated
by white space. The first symbol must be the first
character of the line, but newlines and other white space
can appear anywhere else in the rule. All terminal symbols
must be declared in %token definitions.
Each symbol-sequence represents an alternative way of
reducing the rule. A symbol can appear recursively in its
own rule. Always use left-recursion (where the recursive
symbol appears before the terminating case in symbol-sequence).
The following sequence indicates that the current sequence
of symbols is to be preferred over others, at the level of
precedence assigned to token in the definitions section of
the input file:
%prec token
The specially defined token error matches any unrecognized
sequence of input. This token causes the parser to invoke
the yyerror function. By default, the parser tries to synchronize
with the input and continue processing it by
reading and discarding all input up to the symbol following
error. (You can override this behavior through the
yyerrok action.) If no error token appears in the yacc
input file, the parser exits with an error message upon
encountering unrecognized input.
The parser always executes action after encountering the
symbol that precedes it. Thus, an action can appear in the
middle of a symbol-sequence, after each symbol-sequence,
or after multiple instances of symbol-sequence. In the
last case, action is executed when the parser matches any
of the sequences.
The action consists of standard C code within braces and
can also use the following values, variables, and keywords:
yylval
    If the token returned by the yylex function is
    associated with a significant value, yylex should place
    the value in this global variable. By default, yylval is
    of type long. The definitions section can include a %union
    definition to associate yylval with other data types,
    including structures. If you run yacc with the -d option,
    the full yylval definition is passed into the <y.tab.h>
    file for access by lex.
yyerrok
    Causes the parser to start parsing tokens
    immediately after an erroneous sequence, instead of
    performing the default action of reading and discarding
    tokens up to a synchronization token. The yyerrok action
    should appear immediately after the error token.
$[<type>]n
    Refers to the value of symbol n in the production, where n
    is an index counting from the beginning of the production
    rule; the first symbol after the colon is $1. The type
    variable is the name of one of the union members listed in
    the %union directive in the definitions section. The
    <type> syntax (non-standard) allows the value to be cast
    to a specific data type. You will rarely need to use the
    type syntax.
$[<type>]$
    Refers to the value returned by the matched
    symbol-sequence and used for the matched symbol when
    reducing other rules. The action for a symbol-sequence
    generally assigns a value to $$. The type variable is the
    name of one of the union members listed in the %union
    directive in the definitions section. The <type> syntax
    (non-standard) allows the value to be cast to a specific
    data type. You will rarely need to use the type syntax.
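One way to picture $n and $$ is as slots on a value stack that
the parser keeps alongside its parse states. The following C
sketch is a simplified model under that assumption, not the code
from the yacc skeleton; value_stack, push_value(), and
reduce_add() are hypothetical names. It models a reduction by a
rule of the form expr : expr '+' expr with the action
{ $$ = $1 + $3; }.

```c
#include <assert.h>

#define STACK_MAX 64

long value_stack[STACK_MAX];   /* one value per symbol seen so far */
int  value_top = 0;            /* number of values on the stack */

/* Called when a symbol is shifted: its value goes on the stack. */
void push_value(long v)
{
    assert(value_top < STACK_MAX);
    value_stack[value_top++] = v;
}

/* Model of reducing  expr : expr '+' expr  { $$ = $1 + $3; } */
long reduce_add(void)
{
    long *rhs;
    long  result;

    assert(value_top >= 3);
    rhs = &value_stack[value_top - 3];  /* rhs[0] is $1, rhs[2] is $3 */
    result = rhs[0] + rhs[2];           /* $$ = $1 + $3 */
    value_top -= 3;                     /* pop the right-hand side */
    push_value(result);                 /* $$ is the value of the new expr */
    return result;
}
```

In this model the value of the '+' token itself ($2) is on the
stack but unused, which is why an action can skip any symbol
whose value it does not need.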
User Subroutines Section of Input File
The user subroutines section of the yacc input file contains
user-supplied functions. Because these functions are
included in this file, you do not need to use the yacc
library when processing this file. If you supply a lexical
analyzer (yylex) to the parser, it must be contained in
the user subroutines section.
The following functions, which are contained in the user
subroutines section, are invoked within the yyparse function
generated by yacc:
yylex()
    The lexical analyzer called by yyparse to recognize each
    token of input. Usually this function is created by lex.
    yylex reads input, recognizes expressions within the
    input, and returns a token number representing the kind
    of token read. The function returns an int value. A
    return value of 0 (zero) means the end of input.

    If the parser and yylex do not agree on these token
    numbers, reliable communication between them cannot
    occur. For one-character literals, the token is
    simply the numeric value of the character in the
    current character set. The numbers for other tokens
    can be chosen by either yacc or the user. In either
    case, the #define construct of C is used to allow
    yylex() to return these numbers symbolically. The
    #define statements are put into the code file, and
    into the header file if that file is requested. The
    set of characters permitted by yacc in an identifier
    is larger than that permitted by C. Token
    names found to contain such characters will not be
    included in the #define declarations.

    If the token numbers are chosen by yacc, those
    tokens other than literals are assigned numbers
    greater than 256, although no order is implied. A
    token can be explicitly assigned a number by following
    its first appearance in the definitions section
    with a number. Names and literals not defined
    in this way retain their default definition. All
    assigned token numbers are unique and distinct from
    the token numbers used for literals. If duplicate
    token numbers cause conflicts in parser generation,
    yacc reports an error; otherwise, it is unspecified
    whether the token assignment is accepted or an
    error is reported.

    The end of the input is marked by a special token
    called the endmarker that has a token number that
    is zero or negative. All lexical analyzers return
    zero or negative as a token number upon reaching
    the end of their input. If the tokens up to, but
    not including, the endmarker form a structure that
    matches the start symbol, the parser accepts the
    input. If the endmarker is seen in any other context,
    it is considered an error.
yyerror(string)
    The function that the parser calls upon encountering an
    input error. The default function, defined in liby.a,
    simply prints string to the standard error. The user can
    redefine the function. The function's type is void.
yywrap()
    The wrap-up routine that returns a value of 1 when
    the end of input occurs.
The liby.a library contains default main() and yyerror()
functions. (main() is the required main program that calls
yyparse() to start the program.) These routines look like
the following, respectively:
main()
{
        setlocale(LC_ALL, "");
        (void) yyparse();
        return(0);
}

int yyerror(s)
char *s;
{
        fprintf(stderr, "%s\n", s);
        return (0);
}
The LANG and LC_* variables affect the execution of the
yacc command as described in the ENVIRONMENT VARIABLES
section. The main() function defined by yacc issues the
following call:
setlocale(LC_ALL, "")
As a result, the program generated by yacc will also be
affected by the contents of these variables at run time.
The lex program can be compiled as a C program with -std0,
-std, or -std1 mode. It can also be compiled as a C++ program.
If YY_NOPROTO is defined on the compilation command
line, function prototypes are not generated.
The following exit values are returned:
0     Successful completion.
>0    An error occurred.
EXAMPLES
This section describes the example programs for the lex
and yacc commands, which together create a simple desk
calculator program that performs addition, subtraction,
multiplication, and division operations. The calculator
program also allows you to assign values to variables
(each designated by a single lowercase ASCII letter), and
then use the variables in calculations. The files that
contain the program are as follows:
calc.l
    The lex specification file that defines the lexical
    analysis rules.
calc.y
    The yacc grammar file that defines the parsing rules and
    calls the yylex() function created by lex to provide
    input.
The remaining text assumes that the current directory is
the directory that contains the lex and yacc example program
files.
Compiling the Example Program
Perform the following steps to create the example program
using lex and yacc:
1.  Process the yacc grammar file using the -d option. The
    -d option tells yacc to create a file that defines the
    tokens it uses in addition to creating the C language
    source code file.
        yacc -d calc.y
    The following files are created:
        y.tab.c    The C language source file that yacc
                   created for the parser.
        y.tab.h    A header file containing #define
                   statements for the tokens used by the
                   parser.
2.  Process the lex specification file:
        lex calc.l
    The following file is created:
        lex.yy.c   The C language source file that lex
                   created for the lexical analyzer.
3.  Compile and link the two C language source files:
        cc -o calc y.tab.c lex.yy.c
    The following files are created:
        y.tab.o    The object file for y.tab.c.
        lex.yy.o   The object file for lex.yy.c.
        calc       The executable program file.
    (The *.o files are created temporarily and then
    removed.)
You can then run the program directly by entering:
calc
Then, enter numbers and operators in calculator fashion.
After you press <Return>, the program displays the result
of the operation. If you assign a value to a variable as
follows, the cursor moves to the next line:
m=4 <Return>
_
You can then use the variable in calculations and it will
have the value assigned to it:
m+5 <Return>
9
The Parser Source Code
The file calc.y has entries in all three of the sections
of a yacc grammar file--declarations, rules, and user subroutines.
It contains the following source code:
%{
#include <stdio.h>
int regs[26];
int base;
%}
%start list
%token DIGIT LETTER
%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS      /* supplies precedence for unary minus */
%%                /* beginning of rules section */
list : /*empty */
| list stat '\n'
| list error '\n'
{ yyerrok; }
;
stat : expr
{ printf("%d\n",$1); }
| LETTER '=' expr
{ regs[$1] = $3; }
;
expr : '(' expr ')'
{ $$ = $2; }
| expr '*' expr
{ $$ = $1 * $3; }
| expr '/' expr
{ $$ = $1 / $3; }
| expr '%' expr
{ $$ = $1 % $3; }
| expr '+' expr
{ $$ = $1 + $3; }
| expr '-' expr
{ $$ = $1 - $3; }
| expr '&' expr
{ $$ = $1 & $3; }
| expr '|' expr
{ $$ = $1 | $3; }
| '-' expr %prec UMINUS
{ $$ = -$2; }
| LETTER
{ $$ = regs[$1]; }
| number
;
number : DIGIT
{ $$ = $1; base = ($1==0) ? 8:10;
}
| number DIGIT
{ $$ = base * $1 + $2; }
;
%%                /* beginning of user subroutines section */
main()
{
        return(yyparse());
}
yyerror(s)
char *s;
{
        fprintf(stderr,"%s\n",s);
}
yywrap()
{
        return(1);
}
The Lexical Analyzer Source Code
The file calc.l contains the lexical analyzer source code.
It contains the rules used to generate the tokens from the
input stream. It also contains include statements for
standard input and output, as well as for the <y.tab.h>
file. The yacc program generates the <y.tab.h> file from
the yacc grammar file information, if you use the -d
option with the yacc command. The file <y.tab.h> contains
definitions for the tokens that the parser program uses.
Contents of calc.l:
%{
#include <stdio.h>
#include "y.tab.h"
int c;
#if !defined(YYSTYPE)
#define YYSTYPE long
#endif
extern YYSTYPE yylval;
%}
%%
" "        ;
[a-z]      {
             c = yytext[0];
             yylval = c - 'a';
             return(LETTER);
           }
[0-9]      {
             c = yytext[0];
             yylval = c - '0';
             return(DIGIT);
           }
[^a-z 0-9] {
             c = yytext[0];
             return(c);
           }
ENVIRONMENT VARIABLES
The following environment variables affect the execution
of yacc:
LANG
    Provides a default value for the internationalization
    variables that are unset or null. If LANG is unset
    or null, the corresponding value from the default locale
    is used. If any of the internationalization variables
    contain an invalid setting, the utility behaves as if none
    of the variables had been defined.
LC_ALL
    If set to a non-empty string value, overrides the values
    of all the other internationalization variables.
LC_CTYPE
    Determines the locale for the interpretation of sequences
    of bytes of text data as characters (for example,
    single-byte as opposed to multi-byte characters in
    arguments and input files).
LC_MESSAGES
    Determines the locale for the format and contents of
    diagnostic messages written to standard error.
NLSPATH
    Determines the location of message catalogs for the
    processing of LC_MESSAGES.
FILES
y.output
    A readable description of parsing tables and a report on
    conflicts generated by grammar ambiguities
y.tab.c
    Output file
y.tab.h
    Definitions for token names
/usr/ccs/lib/yaccpar
    Default skeleton parser for C programs
liby.a
    The yacc library
The yacc command also uses three temporary files.
SEE ALSO
Commands: lex(1)
Standards: standards(5)
Programming Support Tools