lex(1) lex(1)
lex - generate programs for simple lexical tasks
lex [-ctvn -V -Q[y|n]] [file]
NOTE: When the environment variable _XPG is set to a value greater than 0
(zero), lex execs the POSIX-compliant /usr/bin/flex.
The lex command generates programs to be used in simple lexical analysis
of text. The input files (standard input default) contain strings and
expressions to be searched for and C text to be executed when these
strings are found. lex processes supplementary code set characters in
program comments and strings, and single-byte supplementary code set
characters in tokens, according to the locale specified in the LC_CTYPE
environment variable [see LANG on environ(5)].
lex generates a file named lex.yy.c. When lex.yy.c is compiled and
linked with the lex library (/usr/lib/libl.a), it copies the input to the
output except when a string specified in the file is found. When a
specified string is found, then the corresponding program text is
executed. The actual string matched is left in yytext, an external
character array. Matching is done in order of the patterns in the file.
The patterns may contain square brackets to indicate character classes,
as in [abx-z] to indicate a, b, x, y, and z; and the operators *, +, and
? mean, respectively, any non-negative number of, any positive number
of, and either zero or one occurrence of, the previous character or
character class. Thus, [a-zA-Z]+ matches a string of letters. The
character . is the class of all characters except new-line. Parentheses
for grouping and vertical bar for alternation are also supported. The
notation r{d,e} in a rule indicates between d and e instances of regular
expression r. It has higher precedence than |, but lower than *, ?, +,
and concatenation. The character ^ at the beginning of an expression
permits a successful match only immediately after a new-line, and the
character $ at the end of an expression requires a trailing new-line.
The character / in an expression indicates trailing context; only the
part of the expression up to the slash is returned in yytext, but the
remainder of the expression must follow in the input stream. An operator
character may be used as an ordinary symbol if it is within " symbols or
preceded by \.
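The operators described above can be sketched in a short rules section. The rules and messages below are illustrative assumptions, not part of the original page. Because the documented precedence of {d,e} is unusual, writing parentheses around the repeated expression keeps the intent explicit.

```lex
%%
[a-zA-Z]+       printf("word: %s\n", yytext);
[0-9]{2,4}      printf("two to four digits: %s\n", yytext);
^#.*            printf("line starting with #: %s\n", yytext);
[0-9]+/px       printf("digits followed by px: %s\n", yytext);
"+"|\*          printf("operator: %s\n", yytext);
(ab|cd)?x       printf("optional group, then x: %s\n", yytext);
[ \t]+$         ;   /* discard blanks at the end of a line */
```

In the trailing-context rule, yytext holds only the digits; the px that follows stays in the input and is scanned again by later matches.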
Three macros are expected: input() to read a character; unput(c) to
replace a character read; and output(c) to place an output character.
By default they are defined in terms of the standard streams (input reads
from stdin and output writes to stdout), but you can override them. The
generated function is named yylex, and the lex library contains a main
that calls it.
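As a minimal sketch of how a program can supply its own main rather than rely on the one in the lex library, the following specification counts input lines. The counter nlines and the choice to define yywrap locally are assumptions made for illustration.

```lex
%{
#include <stdio.h>
int nlines = 0;         /* hypothetical counter, not part of lex */
%}
%%
\n      nlines++;
.       ;               /* discard every other character */
%%
/* Returning 1 tells the scanner that no further input follows. */
int yywrap(void)
{
        return 1;
}

int main(void)
{
        yylex();        /* returns when end of file is reached */
        printf("%d\n", nlines);
        return 0;
}
```

The resulting lex.yy.c can be compiled and linked with the lex library as described above; the definitions given here take the place of the library versions.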
The function yymore accumulates additional characters into the same
yytext. The function yyless(n) pushes back yyleng - n characters into the
input stream. (yyleng is an external long int variable giving the length
in bytes of yytext.) The function yywrap is called whenever the scanner
reaches end of file and indicates whether normal wrapup should continue.
The action REJECT on the right side of the rule causes the match to be
rejected and the next suitable match executed. The action ECHO on the
right side of the rule is equivalent to printf("%s", yytext).
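The following sketch exercises yymore, yyless, REJECT, and ECHO together. The patterns and the counter nkeyword are hypothetical, chosen only to make each effect visible.

```lex
%{
int nkeyword = 0;       /* hypothetical counter */
%}
%%
while       { nkeyword++; REJECT; }  /* count it, then try the next match */
over-       { ECHO; yymore(); }      /* next match is appended to yytext */
flow        ECHO;                    /* after "over-", yytext is "over-flow" */
abcdef      { ECHO; yyless(3); }     /* keep "abc", push "def" back */
[a-z]+      ECHO;                    /* also rescans the pushed-back "def" */
```

With these rules, the input over-flow is written as over-over-flow, and abcdef is written as abcdefdef. The word while is counted by the first rule and then, because of REJECT, echoed by the last one.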
Any line beginning with a blank is assumed to contain only C text and is
copied; if it precedes %%, it is copied into the external definition area
of the lex.yy.c file. All rules should follow a %%, as in yacc. Lines
preceding %% that begin with a non-blank character define the string on
the left to be the remainder of the line; it can be called out later by
surrounding it with {}. In this section, C code (and preprocessor
statements) can also be included between %{ and %}. Note that curly
brackets do not imply parentheses; only string substitution is done.
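Because only string substitution is done, a definition that contains alternation usually needs explicit parentheses where it is used. The names D, E, and SIGN below are assumptions for illustration.

```lex
D       [0-9]
E       [Ee][-+]?{D}+
SIGN    "+"|"-"
%%
{D}+"."{D}*({E})?       printf("float: %s\n", yytext);
({SIGN})?{D}+           printf("integer: %s\n", yytext);
```

Written without the parentheses, {SIGN}?{D}+ would be read under textual substitution as "+"|"-"?{D}+, that is, either a lone plus sign or a number with an optional minus sign.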
The external names generated by lex all begin with the prefix yy or YY.
The flags must appear before any files.
-c Indicates C actions and is the default.
-t Causes the lex.yy.c program to be written instead to standard
output.
-v Provides a two-line summary of statistics.
-n Will not print out the -v summary.
-V Print out version information on standard error.
-Q[y|n] Print out version information to output file lex.yy.c by using
-Qy. The -Qn option does not print out version information and
is the default.
Multiple files are treated as a single file. If no files are specified,
standard input is used.
Certain default table sizes are too small for some users. The table
sizes for the resulting finite state machine can be set in the
definitions section:
     %p n    number of positions is n (default 20000)
     %n n    number of states is n (4000)
     %e n    number of parse tree nodes is n (8000)
     %a n    number of transitions is n (16000)
     %k n    number of packed character classes is n (20000)
     %o n    size of output array is n (24000)
The use of one or more of the above automatically implies the -v option,
unless the -n option is used.
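A definitions section that enlarges some of these tables might look like the following sketch. The values are arbitrary assumptions, not recommendations.

```lex
%p 40000
%n 8000
%a 32000
%%
[a-z]+  ECHO;
```

Since table-size declarations imply the -v summary, adding -n on the command line keeps the output quiet.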
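D       [0-9]
%{
/* Skip the rest of a C comment; called once the comment opener is matched. */
void
skipcommnts(void)
{
        int c;

        for(;;)
        {
                /* Advance to the next '*'; give up at end of input. */
                while((c=input())!='*')
                        if(c<=0)
                                return;
                if((c=input())=='/' || c<=0)
                        return;
                /* Push back the character just read so that a run of
                   '*' characters can still end the comment. */
                unput(c);
        }
}
%}
%%
if      printf("IF statement\n");
[a-z]+  printf("tag, value %s\n",yytext);
0{D}+   printf("octal number %s\n",yytext);
{D}+    printf("decimal number %s\n",yytext);
"++"    printf("unary op\n");
"+"     printf("binary op\n");
"\n"    ;/*no action */
"/*"    skipcommnts();
%%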
yacc(1)