yacc - Tru64


yacc(1)

NAME

       yacc - Generates an LR(1) parsing program from input consisting
       of a context-free grammar specification

SYNOPSIS

       yacc [-vltds] [-b prefix]  [-N number]  [-p symbol_prefix]
       [-P pathname] grammar

STANDARDS

       Interfaces  documented  on  this reference page conform to
       industry standards as follows:

       yacc:  XPG4, XPG4-UNIX

       Refer to the standards(5) reference page for more information
       about industry standards and associated tags.

OPTIONS

       -b prefix
              Uses prefix instead of y as the prefix for all output
              filenames (prefix.tab.c, prefix.tab.h, and prefix.output).

       -d     Produces the <y.tab.h> file, which contains the #define
              statements that associate the yacc-assigned token codes
              with your token names. This allows source files other
              than y.tab.c to access the token codes by including this
              header file.

       -l     Includes no #line constructs in y.tab.c. Use this only
              after the grammar and associated actions are fully
              debugged.

       -N number
              [Tru64 UNIX]  Provides yacc with extra storage for
              building its LALR tables, which may be necessary when
              compiling very large grammars. The number should be
              larger than 40,000 when you use this option.

       -p symbol_prefix
              Allows multiple yacc parsers to be linked together. Uses
              symbol_prefix instead of yy to prefix global symbols.

       -P pathname
              [Tru64 UNIX]  Specifies an alternative parser (instead of
              /usr/ccs/lib/yaccpar). The pathname specifies the
              filename of the skeleton to be used in place of yaccpar.

       -s     [Tru64 UNIX]  Breaks the yyparse() function into several
              smaller functions. Because its size is somewhat
              proportional to that of the grammar, it is possible for
              yyparse() to become too large to compile, optimize, or
              execute efficiently.

       -t     Compiles run-time debugging code. By default, this code
              is not included when y.tab.c is compiled. If YYDEBUG has
              a nonzero value, the C compiler (cc) includes the
              debugging code, whether or not the -t option was used.
              Without compiling this code, yyparse() will run more
              quickly.

       -v     Produces the y.output file, which contains a readable
              description of the parsing tables and a report on
              conflicts generated by grammar ambiguities.

OPERANDS

       grammar
              The pathname of a file containing input instructions. The
              format of this file is described in the DESCRIPTION
              section.

DESCRIPTION

       The yacc command converts a context-free grammar specification
       into a set of tables for a simple automaton that executes an
       LR(1) parsing algorithm. The yacc grammar can be ambiguous;
       specified precedence rules are used to break ambiguities.

       You must compile the y.tab.c output file with a C language
       compiler to produce the yyparse() function. This function must
       be loaded with a yylex() lexical analyzer function, as well as
       two routines that you must provide: main() and an error-handling
       routine, yyerror(). The lex command is useful for creating
       lexical analyzers usable by yacc.

       The yacc program reads its skeleton parser from  the  file
       /usr/ccs/lib/yaccpar. Use the environment variable YACCPAR
       to specify another location for the yacc program  to  read
       from.  If you use this environment variable, the -P option
       is ignored, if specified.

       The general format of the yacc input file is as follows:

       [definitions] %% rules [%% [user subroutines]]

       where:

       definitions
              Is the section where you define the variables to be used
              later in the grammar, such as in the rules section. It is
              also where files are included (#include) and processing
              conditions are defined. This section is optional.

       rules  Is the section that contains grammar rules for the
              parser. A yacc input file must have a rules section.

       user subroutines
              Is the section that contains user-supplied subroutines
              that can be used by the actions in the rules section.
              This section is optional.

       Comments, in C syntax, can appear anywhere in the user
       subroutines section or the definitions section. In the rules
       section, comments can appear wherever a symbol is allowed. Blank
       lines or lines consisting of white space can be inserted
       anywhere in the file, and are ignored. The null character must
       not be used in grammar rules or literals.


   Definitions Section of Input File
       The definitions section of a yacc input file contains entries
       that perform the following functions:

              Includes the standard I/O header file.
              Defines global variables.
              Defines the list rule as the place to start processing.
              Defines the tokens used by the parser.
              Defines the operators and their precedence.

       Each line in the definitions section can be:

       %{
       %}     When placed on lines by themselves, these enclose C code
              to be passed into the global definitions of the output
              file. Such lines commonly include preprocessor directives
              and declarations of external variables and functions.

       %token [<type>] token [number] [token [number]] ...
              Lists tokens or terminal symbols to be used in the rest
              of the input file. This line is needed for tokens that do
              not appear in other % definitions. If type is present,
              the C type for all tokens on this line is declared to be
              the type referenced by type. If a positive integer number
              follows a token, that value is assigned to the token.

       %left token [token ...]
              Indicates that each token is an operator, that all tokens
              in this definition have equal precedence, and that a
              succession of the operators listed in this definition is
              evaluated left to right.

       %right token [token ...]
              Indicates that each token is an operator, that all tokens
              in this definition have equal precedence, and that a
              succession of the operators listed in this definition is
              evaluated right to left.

       %nonassoc token [token ...]
              Indicates that each token is an operator, and that the
              operators listed in this definition cannot appear in
              succession. Indicates that the token cannot be used
              associatively.

       %start symbol
              Indicates the highest-level production rule to be
              reduced; in other words, the rule where the parser can
              consider its work done and can terminate processing. If
              this definition is not included, the parser uses the
              first production rule. The symbol must be nonterminal
              (not a token).

       %type <type> symbol [symbol ...]
              Defines each symbol as data type type, to resolve
              ambiguities. If this construct is present, yacc performs
              type checking; otherwise, it assumes all symbols to be of
              type integer.

       %union union-def
              Defines the yylval global variable as a union, where
              union-def is a standard C definition in the format:

              { type member ; [type member ; ...] }

              At least one member should be an int. Any valid C data
              type can be defined, including structures. When you run
              yacc with the -d option, the definition of yylval is
              placed in the <y.tab.h> file and can be referred to in a
              lex input file.

       Every token (terminal symbol) must be listed in one of the
       preceding % definitions. Multiple tokens can be separated by
       white space or commas. All the tokens in %left, %right, and
       %nonassoc definitions are assigned a precedence, with tokens in
       later definitions having precedence over those in earlier
       definitions.
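
       The effect of these declarations can be illustrated outside of
       yacc. The following is a minimal sketch, not how yacc works
       internally (yacc encodes precedence in its parsing tables). It
       uses a different technique, precedence climbing, to mimic
       "%left '+' '-'" followed by "%left '*' '/'". The names prec_of,
       parse_primary, parse_expr, cursor, and eval are hypothetical,
       and operands are assumed to be unsigned decimal integers with no
       white space.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Levels mirror "%left '+' '-'" then "%left '*' '/'": the later
   declaration gets the higher level. 0 means "not an operator". */
static int prec_of(int op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static const char *cursor;      /* hypothetical: next unread character */

/* Read one unsigned decimal operand. */
static long parse_primary(void)
{
    long v = 0;
    while (isdigit((unsigned char)*cursor))
        v = v * 10 + (*cursor++ - '0');
    return v;
}

/* Parse operators whose level is at least min_prec. */
static long parse_expr(int min_prec)
{
    long lhs = parse_primary();
    for (;;) {
        int op = *cursor;
        int p = prec_of(op);
        if (p == 0 || p < min_prec)
            return lhs;
        cursor++;
        /* p + 1 makes the operator left-associative, as %left does;
           passing p instead would give %right behavior. */
        long rhs = parse_expr(p + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs = rhs ? lhs / rhs : 0; break; /* sketch: x/0 yields 0 */
        }
    }
}

long eval(const char *text)
{
    assert(text != NULL);
    cursor = text;
    return parse_expr(1);
}
```

       With these levels, eval("2-3-4") groups as (2-3)-4 because the
       operators are left-associative, and eval("2+3*4") multiplies
       first because '*' was declared later.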

       In addition to symbols, a token can be a literal character
       enclosed in single quotes. (Multibyte characters are recognized
       by the lexical analyzer and returned as tokens.) The following
       special characters can be used, just as in C programs:

       \a     Alert
       \n     Newline
       \t     Tab
       \v     Vertical tab
       \r     Carriage Return
       \b     Backspace
       \f     Form Feed
       \\     Backslash
       \'     Single Quote
       \?     Question mark
       \ooo   One or more octal digits specifying the integer value of
              the character

   Rules Section of Input File
       The rules section of a yacc input file defines the rules that
       parse the input stream. It consists of a series of production
       rules that the parser tries to reduce. The format of each
       production rule is:

       symbol   :  symbol-sequence  [action]  [|  symbol-sequence
       [action] ...] ;

       A symbol-sequence consists of zero or more symbols separated by
       white space. The first symbol must be the first character of the
       line, but newlines and other white space can appear anywhere
       else in the rule. All terminal symbols must be declared in
       %token definitions.

       Each symbol-sequence represents an alternative way of reducing
       the rule. A symbol can appear recursively in its own rule.
       Always use left-recursion (where the recursive symbol appears
       before the terminating case in symbol-sequence).


       The following sequence indicates that the current sequence
       of symbols is to be preferred over others, at the level of
       precedence assigned to token in the definitions section of
       the input file:

       %prec token

       The specially defined token error matches any unrecognized
       sequence of input. This token causes the parser to invoke the
       yyerror function. By default, the parser tries to synchronize
       with the input and continue processing it by reading and
       discarding all input up to the symbol following error. (You can
       override this behavior through the yyerrok action.) If no error
       token appears in the yacc input file, the parser exits with an
       error message upon encountering unrecognized input.

       The  parser  always executes action after encountering the
       symbol that precedes it. Thus, an action can appear in the
       middle  of  a symbol-sequence, after each symbol-sequence,
       or after multiple instances  of  symbol-sequence.  In  the
       last  case, action is executed when the parser matches any
       of the sequences.

       The action consists of standard C code within braces and can
       also take the following values, variables, and keywords:

       yylval If the token returned by the yylex function is associated
              with a significant value, yylex should place the value in
              this global variable. By default, yylval is of type long.
              The definitions section can include a %union definition
              to associate it with other data types, including
              structures. If you run yacc with the -d option, the full
              yylval definition is passed into the <y.tab.h> file for
              access by lex.

       yyerrok
              Causes the parser to start parsing tokens immediately
              after an erroneous sequence, instead of performing the
              default action of reading and discarding tokens up to a
              synchronization token. The yyerrok action should appear
              immediately after the error token.

       $[<type>]n
              Refers to symbol n, a token index in the production,
              counting from the beginning of the production rule, where
              the first symbol after the colon is $1. The type variable
              is the name of one of the union lines listed in the
              %union directive in the declaration section. The <type>
              syntax (non-standard) allows the value to be cast to a
              specific data type. Note that you will rarely need to use
              the type syntax.

       $[<type>]$
              Refers to the value returned by the matched
              symbol-sequence and used for the matched symbol when
              reducing other rules. The symbol-sequence generally
              assigns a value to $$. The type variable is the name of
              one of the union lines listed in the %union directive in
              the declaration section. The <type> syntax (non-standard)
              allows the value to be cast to a specific data type. Note
              that you will rarely need to use the type syntax.
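
       One way to picture $n and $$ is as slots on a stack of semantic
       values. The following is a minimal sketch under that assumption;
       it is not the code that yacc generates. The names vstack, vtop,
       push_value, value_count, and reduce_add are hypothetical. It
       models the reduction of "expr : expr '+' expr { $$ = $1 + $3; }".

```c
#include <assert.h>

#define STACK_MAX 64

static long vstack[STACK_MAX];  /* hypothetical semantic value stack */
static int  vtop = 0;           /* number of values currently stacked */

/* Shift: push the semantic value of one symbol. */
void push_value(long v)
{
    assert(vtop < STACK_MAX);
    vstack[vtop++] = v;
}

int value_count(void)
{
    return vtop;
}

/* Reduce by  expr : expr '+' expr  { $$ = $1 + $3; }
   The three right-hand-side values are the top three slots. */
long reduce_add(void)
{
    assert(vtop >= 3);
    long *rhs = &vstack[vtop - 3];  /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    long result = rhs[0] + rhs[2];  /* the value assigned to $$ */
    vtop -= 3;                      /* pop the right-hand side */
    vstack[vtop++] = result;        /* push $$ for the left-hand symbol */
    return result;
}
```

       Pushing 2, a placeholder value for '+', and 5, then calling
       reduce_add(), leaves a single value on the stack: the sum that
       an enclosing rule would see as one of its own $n values.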

   User Subroutines Section of Input File
       The user subroutines section of the yacc input file contains
       user-supplied functions. Because these functions are included in
       this file, you do not need to use the yacc library when
       processing this file. If you supply a lexical analyzer (yylex)
       to the parser, it must be contained in the user subroutines
       section.

       The following functions, which are contained in the user
       subroutines section, are invoked within the yyparse function
       generated by yacc:

       yylex()
              The lexical analyzer called by yyparse to recognize each
              token of input. Usually this function is created by lex.
              yylex reads input, recognizes expressions within the
              input, and returns a token number representing the kind
              of token read. The function returns an int value. A
              return value of 0 (zero) means the end of input.

              If the parser and yylex do not agree on these token
              numbers, reliable communication between them cannot
              occur. For one-character literals, the token is simply
              the numeric value of the character in the current
              character set. The numbers for other tokens can be chosen
              by either yacc or the user. In either case, the #define
              construct of C is used to allow yylex() to return these
              numbers symbolically. The #define statements are put into
              the code file, and into the header file if that file is
              requested. The set of characters permitted by yacc in an
              identifier is larger than that permitted by C. Token
              names found to contain such characters will not be
              included in the #define declarations.

              If the token numbers are chosen by yacc, those tokens
              other than literals are assigned numbers greater than
              256, although no order is implied. A token can be
              explicitly assigned a number by following its first
              appearance in the declaration section with a number.
              Names and literals not defined in this way retain their
              default definition. All assigned token numbers are unique
              and distinct from the token numbers used for literals. If
              duplicate token numbers cause conflicts in parser
              generation, yacc reports an error; otherwise, it is
              unspecified whether the token assignment is accepted or
              an error is reported.

              The end of the input is marked by a special token called
              the endmarker that has a token number that is zero or
              negative. All lexical analyzers return zero or negative
              as a token number upon reaching the end of their input.
              If the tokens up to, but not including, the endmarker
              form a structure that matches the start symbol, the
              parser accepts the input. If the endmarker is seen in any
              other context, it is considered an error.

       yyerror(string)
              The function that the parser calls upon encountering an
              input error. The default function, defined in liby.a,
              simply prints string to the standard error. The user can
              redefine the function. The function's type is void.

       yywrap()
              The wrap-up routine that returns a value of 1 when the
              end of input occurs.
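
       The yylex() contract described above can be sketched with a
       hand-written analyzer. This is only an illustration under
       stated assumptions: the token codes DIGIT and LETTER are given
       hypothetical values here (a real build takes them from
       <y.tab.h>), yylval is assumed to have the default type long, and
       the helper yy_set_input() and the variable yy_input are
       hypothetical names that let the function scan a string instead
       of standard input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes, greater than 256 as yacc would assign. */
enum { DIGIT = 257, LETTER = 258 };

long yylval;                        /* semantic value of the last token */
static const char *yy_input = "";   /* hypothetical: text left to scan */

/* Hypothetical helper: choose the string that yylex() will scan. */
void yy_set_input(const char *text)
{
    assert(text != NULL);
    yy_input = text;
}

int yylex(void)
{
    unsigned char c;

    while (*yy_input == ' ')        /* skip blanks */
        yy_input++;

    c = (unsigned char)*yy_input;
    if (c == '\0')
        return 0;                   /* endmarker: end of input */
    yy_input++;

    if (islower(c)) {               /* named token with a value */
        yylval = c - 'a';
        return LETTER;
    }
    if (isdigit(c)) {
        yylval = c - '0';
        return DIGIT;
    }
    return c;                       /* literal: its character value */
}
```

       Named tokens come back as their symbolic codes with yylval set,
       literals such as '=' come back as their own character value, and
       every call after the input is exhausted returns 0.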

       The liby.a library contains default main()  and  yyerror()
       functions. (main() is the required main program that calls
       yyparse() to start the program.) These routines look  like
       the following, respectively:

       main() {
            setlocale(LC_ALL, "");
            (void) yyparse();
            return(0); }

       int yyerror(s)
            char *s; {
            fprintf(stderr,"%s\n",s);
            return (0); }

NOTES

       The  LANG  and  LC_* variables affect the execution of the
       yacc command as stated. The  main()  function  defined  by
       yacc issues the following call:

       setlocale(LC_ALL, "")

       As  a  result,  the program generated by yacc will also be
       affected by the contents of these variables at run time.

       The lex program can be compiled as a C program with -std0,
       -std, or -std1 mode. It can also be compiled as a C++ program.
       If YY_NOPROTO is defined on the compilation command line,
       function prototypes are not generated.

EXIT STATUS

       The following exit values are returned:

       0      Successful completion.
       >0     An error occurred.

EXAMPLES

       This section describes the example programs for the lex and
       yacc commands, which together create a simple desk calculator
       program that performs addition, subtraction, multiplication, and
       division operations. The calculator program also allows you to
       assign values to variables (each designated by a single
       lowercase ASCII letter), and then use the variables in
       calculations. The files that contain the program are as follows:

       calc.l The lex specification file that defines the lexical
              analysis rules.

       calc.y The yacc grammar file that defines the parsing rules and
              calls the yylex() function created by lex to provide
              input.

       The remaining text expects that the current directory is the
       directory that contains the lex and yacc example program files.

   Compiling the Example Program
       Perform the following steps to create the example program using
       lex and yacc:

       1.  Process the yacc grammar file using the -d option. The -d
           option tells yacc to create a file that defines the tokens
           it uses in addition to creating the C language source code
           file.

                  yacc -d calc.y

           The following files are created:

           y.tab.c   The C language source file that yacc created for
                     the parser.
           y.tab.h   A header file containing #define statements for
                     the tokens used by the parser.

       2.  Process the lex specification file:

                  lex calc.l

           The following file is created:

           lex.yy.c  The C language source file that lex created for
                     the lexical analyzer.

       3.  Compile and link the two C language source files:

                  cc -o calc y.tab.c lex.yy.c

           The following files are created:

           y.tab.o   The object file for y.tab.c.
           lex.yy.o  The object file for lex.yy.c.
           calc      The executable program file.

           (The *.o files are created temporarily and then removed.)

       You can then run the program directly by entering:

       calc

       Then, enter numbers and operators in calculator fashion. After
       you press <Return>, the program displays the result of the
       operation. If you assign a value to a variable as follows, the
       cursor moves to the next line:

       m=4 <Return>
       _

       You can then use the variable in calculations and it will have
       the value assigned to it:

       m+5 <Return>
       9


   The Parser Source Code
       The file calc.y has entries in all three of the sections of a
       yacc grammar file: declarations, rules, and user subroutines.
       It contains the following source code:

       %{
       #include <stdio.h>

       int regs[26];
       int base;

       %}

       %start list

       %token DIGIT LETTER

       %left '|'
       %left '&'
       %left '+' '-'
       %left '*' '/' '%'
       %left UMINUS      /* supplies precedence for unary minus */

       %%     /* beginning of rules section */

       list   :      /* empty */
              |      list stat '\n'
              |      list error '\n'
                     {        yyerrok;        }
              ;

       stat   :      expr
                     {        printf("%d\n",$1);        }
              |      LETTER '=' expr
                     {        regs[$1] = $3;        }
              ;

       expr   :      '(' expr ')'
                     {        $$ = $2;        }
              |      expr '*' expr
                     {        $$ = $1 * $3;        }
              |      expr '/' expr
                     {        $$ = $1 / $3;        }
              |      expr '%' expr
                     {        $$ = $1 % $3;        }
              |      expr '+' expr
                     {        $$ = $1 + $3;        }
              |      expr '-' expr
                     {        $$ = $1 - $3;        }
              |      expr '&' expr
                     {        $$ = $1 & $3;        }
              |      expr '|' expr
                     {        $$ = $1 | $3;        }
              |      '-' expr %prec UMINUS
                     {        $$ = -$2;        }
              |      LETTER
                     {        $$ = regs[$1];        }
              |      number
              ;

       number :      DIGIT
                     {        $$ = $1; base = ($1==0) ? 8 : 10;        }
              |      number DIGIT
                     {        $$ = base * $1 + $2;        }
              ;

       %%      /* beginning of user subroutines section */

       main()
       {
               return(yyparse());
       }

       yyerror(s)
       char *s;
       {
               fprintf(stderr,"%s\n",s);
       }

       yywrap()
       {
               return(1);
       }
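
       The number rule in calc.y builds a value one digit at a time
       and treats a leading 0 as the start of an octal number. The same
       arithmetic can be sketched as a plain C function. The name
       fold_digits is hypothetical, and the digit values are assumed to
       arrive in an array rather than as DIGIT tokens.

```c
#include <assert.h>
#include <stddef.h>

/* Fold digit values the way the "number" rule does: the first digit
   selects the base (0 selects octal, anything else decimal), and
   each later digit applies  $$ = base * $1 + $2. */
long fold_digits(const int *digits, size_t count)
{
    assert(digits != NULL && count > 0);

    long value = digits[0];
    int base = (digits[0] == 0) ? 8 : 10;

    for (size_t i = 1; i < count; i++)
        value = base * value + digits[i];
    return value;
}
```

       Like the grammar, this sketch does not reject the digits 8 and 9
       in an octal number; it folds them in with base 8.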


   The Lexical Analyzer Source Code
       The file calc.l contains the lexical analyzer source code. It
       contains the rules used to generate the tokens from the input
       stream. It also contains include statements for standard input
       and output, as well as for the <y.tab.h> file. The yacc program
       generates the <y.tab.h> file from the yacc grammar file
       information, if you use the -d option with the yacc command. The
       file <y.tab.h> contains definitions for the tokens that the
       parser program uses.

       Contents of calc.l:

       %{

       #include <stdio.h>
       #include "y.tab.h"
       int c;
       #if !defined (YYSTYPE)
       #define YYSTYPE long
       #endif
       extern YYSTYPE yylval;
       %}
       %%
       " "     ;
       [a-z]   {
                      c = yytext[0];
                      yylval = c - 'a';
                      return(LETTER);
               }
       [0-9]   {
                      c = yytext[0];
                      yylval = c - '0';
                      return(DIGIT);
               }
       [^a-z 0-9]      {
                      c = yytext[0];
                      return(c);
               }

ENVIRONMENT VARIABLES

       The following environment variables affect the execution of
       yacc:

       LANG   Provides a default value for the internationalization
              variables that are unset or null. If LANG is unset or
              null, the corresponding value from the default locale is
              used. If any of the internationalization variables
              contain an invalid setting, the utility behaves as if
              none of the variables had been defined.

       LC_ALL If set to a non-empty string value, overrides the values
              of all the other internationalization variables.

       LC_CTYPE
              Determines the locale for the interpretation of sequences
              of bytes of text data as characters (for example,
              single-byte as opposed to multibyte characters in
              arguments and input files).

       LC_MESSAGES
              Determines the locale for the format and contents of
              diagnostic messages written to standard error.

       NLSPATH
              Determines the location of message catalogs for the
              processing of LC_MESSAGES.

FILES

       y.output
              A readable description of parsing tables and a report on
              conflicts generated by grammar ambiguities

       y.tab.c
              Output file

       y.tab.h
              Definitions for token names

       /usr/ccs/lib/yaccpar
              Default skeleton parser for C programs

       liby.a The yacc library

       yacc also uses three temporary files.
