AWK(1)

NAME

     awk - pattern-directed scanning and processing language

SYNOPSIS

     awk [-safe] [-V] [-d[n]] [-F fs] [-v var=value] [prog | -f progfile]
         file ...
     nawk ...

DESCRIPTION

     awk scans each input file for lines that match any of a set of patterns
     specified literally in prog or in one or more files specified as -f
     progfile.  With each pattern there can be an associated action that will
     be performed when a line of a file matches the pattern.  Each line is
     matched against the pattern portion of every pattern-action statement;
     the associated action is performed for each matched pattern.  The file
     name `-' means the standard input.  Any file of the form var=value is
     treated as an assignment, not a filename, and is executed at the time it
     would have been opened if it were a filename.
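
     A minimal sketch of the var=value operand rule, assuming a POSIX sh(1)
     and an awk(1) on PATH; the function name show_assignments and the
     temporary files are hypothetical.  Because each assignment runs only
     when awk reaches it in the argument list, the two files see different
     values of x.

```shell
# Hypothetical helper: prefix each record with the current value of x.
# x=1 takes effect before "$1" is opened; x=2 takes effect after "$1" has
# been read and before "$2" is opened.
show_assignments() {
    awk '{ print x ": " $0 }' x=1 "$1" x=2 "$2"
}

# Example run with two throwaway files.
dir=$(mktemp -d)
printf 'alpha\n' > "$dir/file1"
printf 'beta\n' > "$dir/file2"
show_assignments "$dir/file1" "$dir/file2"           # "1: alpha", "2: beta"

# The operand - names the standard input.
printf 'gamma\n' | awk '{ print x ": " $0 }' x=3 -   # "3: gamma"
rm -r "$dir"
```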

     The options are as follows:

     -d[n]    Debug mode.  Set debug level to n, or 1 if n is not specified.
              A value greater than 1 causes awk to dump core on fatal errors.

     -F fs    Define the input field separator to be the regular expression
              fs.

     -f filename
              Read program code from the specified file filename instead of
              from the command line.

     -safe    Disable file output (print >, print >>), process creation
              (cmd | getline, print |, system) and access to the environment
              (ENVIRON; see the section on variables below).  This is a first
              (and not very reliable) approximation to a ``safe'' version of
              awk.

     -V       Print the version number of awk to standard output and exit.

     -v var=value
              Assign value to variable var before prog is executed; any
              number of -v options may be present.
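
     A short sketch combining -F and -v, assuming a POSIX sh(1) and an
     awk(1) on PATH; the function name print_field3 is hypothetical.  Both
     options take effect before any input is read.

```shell
# Hypothetical helper: print the third colon-separated field of each line,
# prefixed by a label supplied on the command line.
print_field3() {
    awk -F ':' -v label="$1" '{ print label, $3 }'
}

printf 'root:x:0\ndaemon:x:1\n' | print_field3 uid   # "uid 0", "uid 1"
```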

     The input is normally made up of input lines (records) separated by
     newlines, or by the value of RS.  If RS is null, then any number of blank
     lines are used as the record separator, and newlines are used as field
     separators (in addition to the value of FS).  This is convenient when
     working with multi-line records.
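
     A minimal sketch of paragraph mode, assuming a POSIX sh(1) and an
     awk(1) on PATH; the function name paragraph_summary is hypothetical.
     Two consecutive blank lines still end just one record.

```shell
# Hypothetical helper: with RS set to the null string, blank lines end a
# record and the newlines inside a record separate fields.
paragraph_summary() {
    awk 'BEGIN { RS = "" } { print NR ": " $2 " (" NF " fields)" }'
}

printf 'name alice\nage 30\n\n\nname bob\nage 41\n' | paragraph_summary
# prints:
# 1: alice (4 fields)
# 2: bob (4 fields)
```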

     An input line is normally made up of fields separated by whitespace, or
     by the regular expression FS.  The fields are denoted $1, $2, ..., while
     $0 refers to the entire line.  If FS is null, the input line is split
     into one field per character.

     Normally, any number of blanks separate fields.  In order to set the
     field separator to a single blank, use the -F option with a value of
     `[ ]'.  If a field separator of `t' is specified, awk treats it as if
     `\t' had been specified and uses <TAB> as the field separator.  In order
     to use a literal `t' as the field separator, use the -F option with a
     value of `[t]'.
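
     A sketch contrasting the default separator with a single-blank
     separator, assuming a POSIX sh(1) and an awk(1) on PATH; the function
     names are hypothetical.  The tab example sets FS in a BEGIN block
     rather than relying on the `-F t' special case described above.

```shell
# Default FS: runs of blanks separate fields; leading blanks are ignored.
count_default() { awk '{ print NF }'; }

# FS is the bracket expression [ ]: every single blank separates fields,
# so consecutive blanks produce empty fields.
count_single_blank() { awk -F '[ ]' '{ print NF }'; }

# Tab-separated input; $NF is the last field of the record.
last_tab_field() { awk 'BEGIN { FS = "\t" } { print $NF }'; }

printf '  a  b\n' | count_default          # 2
printf '  a  b\n' | count_single_blank     # 5
printf 'x\ty z\tw\n' | last_tab_field      # w
```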

     A pattern-action statement has the form

           pattern { action }

     A missing { action } means print the line; a missing pattern always
     matches.  Pattern-action statements are separated by newlines or
     semicolons.

     Newlines are permitted after a terminating statement or following a
     comma (`,'), an open brace (`{'), a logical AND (`&&'), a logical OR
     (`||'), after the `do' or `else' keywords, or after the closing
     parenthesis of an `if', `for', or `while' statement.  Additionally, a
     backslash (`\') can be used to escape a newline between tokens.

     An action is a sequence of statements.  A statement can be one of the
     following:

           if (expression) statement [else statement]
           while (expression) statement
           for (expression; expression; expression) statement
           for (var in array) statement
           do statement while (expression)
           break
           continue
           { [statement ...] }
           expression # commonly var = expression
           print [expression-list][>expression]
           printf format [, expression-list] [>expression]
           return [expression]
           next # skip remaining patterns on this input line
           nextfile # skip rest of this file, open next, start at top
           delete array[expression] # delete an array element
           delete array # delete all elements of array
           exit [expression] # exit immediately; status is expression
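
     A sketch exercising several of the statement forms above (a pattern
     with next, a C-style for loop, if/else), assuming a POSIX sh(1) and an
     awk(1) on PATH; the function name reverse_fields is hypothetical.

```shell
# Hypothetical helper: print the fields of each line in reverse order,
# skipping lines that start with #.
reverse_fields() {
    awk '
        /^#/ { next }   # skip remaining patterns on comment lines
        {
            out = ""
            for (i = NF; i >= 1; i--) {
                if (out == "")
                    out = $i
                else
                    out = out " " $i
            }
            print out
        }'
}

printf '# header\none two three\n' | reverse_fields   # "three two one"
```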

     Statements are terminated by semicolons, newlines or right braces.  An
     empty expression-list stands for $0.  String constants are quoted "",
     with the usual C escapes recognized within (see printf(1) for a complete
     list of these).  Expressions take on string or numeric values as
     appropriate, and are built using the operators + - * / % ^
     (exponentiation), and concatenation (indicated by whitespace).  The
     operators ! ++ -- += -= *= /= %= ^= > >= < <= == != ?: are also
     available in expressions.  Variables may be scalars, array elements
     (denoted x[i]) or fields.  Variables are initialized to the null string.
     Array subscripts may be any string, not necessarily numeric; this allows
     for a form of associative memory.  Multiple subscripts such as [i,j,k]
     are permitted; the constituents are concatenated, separated by the value
     of SUBSEP (see the section on variables below).
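
     A sketch of associative arrays and multiple subscripts, assuming a
     POSIX sh(1), an awk(1) and sort(1) on PATH; the function names are
     hypothetical.  The output of `for (w in count)' is piped through
     sort(1) because the iteration order is not specified.

```shell
# Hypothetical helper: count word frequencies using string subscripts.
word_counts() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w, count[w] }' | sort
}

# Hypothetical helper: a subscript [1, 2] is stored as "1" SUBSEP "2",
# so splitting the key on SUBSEP recovers the constituents.
subsep_demo() {
    awk 'BEGIN {
        grid[1, 2] = "x"
        for (k in grid) {
            split(k, part, SUBSEP)
            print part[1], part[2]
        }
    }'
}

printf 'the cat\nthe hat\n' | word_counts   # "cat 1", "hat 1", "the 2"
subsep_demo                                  # "1 2"
```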

     The print statement prints its arguments on the standard output (or on
     a file if >file or >>file is present or on a pipe if | cmd is present),
     separated by the current output field separator, and terminated by the
     output record separator.  file and cmd may be literal names or
     parenthesized expressions; identical string values in different
     statements denote the same open file.  The printf statement formats its
     expression list according to the format (see printf(3)).
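
     A sketch of OFS, output pipes and printf, assuming a POSIX sh(1), an
     awk(1) and sort(1) on PATH; the function name format_and_sort is
     hypothetical.  Every `print | "sort"' uses the same command string and
     therefore the same pipe; close() waits for sort(1) to finish, so its
     output appears before the summary line.

```shell
# Hypothetical helper: swap two fields, join them with a comma via OFS,
# sort them through a pipe, then print a formatted summary.
format_and_sort() {
    awk 'BEGIN { OFS = "," }
         {
             total += $1
             print $2, $1 | "sort"
         }
         END {
             close("sort")
             if (NR > 0)
                 printf "%d records, avg %.2f\n", NR, total / NR
         }'
}

printf '3 b\n1 a\n' | format_and_sort
# prints:
# a,1
# b,3
# 2 records, avg 2.00
```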

     Patterns are arbitrary Boolean combinations (with ! || &&) of regular
     expressions and relational expressions.  Regular expressions are as in
     egrep(1).  Isolated regular expressions in a pattern apply to the entire
     line.  Regular expressions may also occur in relational expressions,
     using the operators ~ and !~.  /re/ is a constant regular expression;
     any string (constant or variable) may be used as a regular expression,
     except in the position of an isolated regular expression in a pattern.
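
     A sketch of a regular expression held in a variable, assuming a POSIX
     sh(1) and an awk(1) on PATH; the function name grep_field is
     hypothetical.  Because the expression is not a constant, it is applied
     with ~ rather than as an isolated pattern.  Note that -v also processes
     backslash escapes in the value.

```shell
# Hypothetical helper: print lines whose first field matches the regular
# expression given as $1 and whose second field does not start with #.
grep_field() {
    awk -v re="$1" '$1 ~ re && $2 !~ /^#/'
}

printf 'foo 1\nbar 2\nfood #3\n' | grep_field '^fo'   # "foo 1"
```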

     A pattern may consist of two patterns separated by a comma; in this
     case, the action is performed for all lines from an occurrence of the
     first pattern through an occurrence of the second.

     A relational expression is one of the following:

           expression matchop regular-expression
           expression relop expression
           expression in array-name
           (expr, expr, ...) in array-name

     where a relop is any of the six relational operators in C, and a
     matchop is either ~ (matches) or !~ (does not match).  A conditional is
     an arithmetic expression, a relational expression, or a Boolean
     combination of these.
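
     A sketch of the `(expr, expr, ...) in array-name' form, assuming a
     POSIX sh(1) and an awk(1) on PATH; the function name has_pair is
     hypothetical.  Testing membership with in does not create the element,
     whereas referencing seen[r, c] directly would.

```shell
# Hypothetical helper: record every (first field, second field) pair, then
# report whether the pair given on the command line was seen.
has_pair() {
    awk -v r="$1" -v c="$2" '
        { seen[$1, $2] = 1 }
        END { if ((r, c) in seen) print "yes"; else print "no" }'
}

printf '1 2\n3 4\n' | has_pair 3 4   # "yes"
printf '1 2\n3 4\n' | has_pair 2 1   # "no"
```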

     The special patterns BEGIN and END may be used to capture control before
     the first input line is read and after the last.  BEGIN and END do not
     combine with other patterns.

     Variable names with special meanings:

     ARGC       Argument count, assignable.
     ARGV       Argument array, assignable; non-null members are taken as
                filenames.
     CONVFMT    Conversion format when converting numbers (default "%.6g").
     ENVIRON    Array of environment variables; subscripts are names.
     FILENAME   The name of the current input file.
     FNR        Ordinal number of the current record in the current file.
     FS         Regular expression used to separate fields; also settable by
                option -F fs.
     NF         Number of fields in the current record.  $NF can be used to
                obtain the value of the last field in the current record.
     NR         Ordinal number of the current record.
     OFMT       Output format for numbers (default "%.6g").
     OFS        Output field separator (default blank).
     ORS        Output record separator (default newline).
     RLENGTH    The length of the string matched by the match() function.
     RS         Input record separator (default newline).
     RSTART     The starting position of the string matched by the match()
                function.
     SUBSEP     Separates multiple subscripts (default "\034").
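
     A sketch of how NR, FNR and FILENAME differ across several input files,
     assuming a POSIX sh(1) and an awk(1) on PATH; the function name
     join_keys is hypothetical.  FNR restarts at 1 for each file while NR
     keeps counting, so NR == FNR holds only while the first file is read
     (provided that file is not empty).

```shell
# Hypothetical helper: print records of the second file whose first field
# also appears as a first field in the first file, tagged with the file
# name and the record number within that file.
join_keys() {
    awk 'NR == FNR { keys[$1] = 1; next }
         ($1 in keys) { print FILENAME ":" FNR ": " $0 }' "$1" "$2"
}

# Usage: join_keys keys.txt data.txt
```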

FUNCTIONS

     The awk language has a variety of built-in functions: arithmetic, string,
     input/output and general.

   Arithmetic Functions

     atan2(y, x)  Return the arctangent of y/x in radians.

     cos(x)       Return the cosine of x, where x is in radians.

     exp(x)       Return the exponential of x.

     int(x)       Return x truncated to an integer value.

     log(x)       Return the natural logarithm of x.

     rand()       Return a random number, n, such that 0<=n<1.

     sin(x)       Return the sine of x, where x is in radians.

     sqrt(x)      Return the square root of x.

     srand(expr)  Sets seed for rand() to expr and returns the previous seed.
                  If expr is omitted, the time of day is used instead.
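
     A sketch evaluating several arithmetic functions in a BEGIN block (so
     no input is read), assuming a POSIX sh(1) and an awk(1) on PATH; the
     function name arith_demo is hypothetical.  atan2(0, -1) yields pi, and
     int() truncates toward zero rather than rounding.

```shell
# Hypothetical helper: print a few arithmetic results.
arith_demo() {
    awk 'BEGIN {
        printf "%.4f\n", atan2(0, -1)
        print int(3.9), int(-3.9)
        print sqrt(16), exp(0), log(1)
        srand(1)                       # fixed seed
        r = rand()
        ok = (r >= 0 && r < 1)         # rand() stays in [0, 1)
        print ok
    }'
}

arith_demo
# prints:
# 3.1416
# 3 -3
# 4 1 0
# 1
```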

   String Functions

     gsub(r, t, s)    The same as sub() except that all occurrences of the
                      regular expression are replaced.  gsub() returns the
                      number of replacements.

     index(s, t)      The position in s where the string t occurs, or 0 if
                      it does not.

     length(s)        The length of s taken as a string, or of $0 if no
                      argument is given.

     match(s, r)      The position in s where the regular expression r
                      occurs, or 0 if it does not.  The variable RSTART is set
                      to the starting position of the matched string (which is
                      the same as the returned value) or zero if no match is
                      found.  The variable RLENGTH is set to the length of the
                      matched string, or -1 if no match is found.

     split(s, a, fs)  Splits the string s into array elements a[1], a[2],
                      ..., a[n] and returns n.  The separation is done with
                      the regular expression fs or with the field separator FS
                      if fs is not given.  An empty string as field separator
                      splits the string into one array element per character.
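
     A sketch of split() with a literal separator and with a regular
     expression separator, assuming a POSIX sh(1) and an awk(1) on PATH; the
     function name split_demo is hypothetical.

```shell
# Hypothetical helper: split() fills the array from index 1 and returns
# the number of elements.
split_demo() {
    awk 'BEGIN {
        n = split("2024-06-29", d, "-")
        print n, d[1], d[3]
        n = split("a1b22c", part, "[0-9]+")   # fs treated as a regular expression
        print n, part[2], part[3]
    }'
}

split_demo
# prints:
# 3 2024 29
# 3 b c
```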

     sprintf(fmt, expr, ...)
                      The string resulting from formatting expr, ...
                      according to the printf(3) format fmt.

     sub(r, t, s)     Substitutes t for the first occurrence of the regular
                      expression r in the string s.  If s is not given, $0 is
                      used.  An ampersand (`&') in t is replaced by the part
                      of s that matched r.  A literal ampersand can be
                      specified by preceding it with two backslashes (`\\').
                      A literal backslash can be specified by preceding it
                      with another backslash (`\\').  sub() returns the number
                      of replacements.
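
     A sketch of `&' in the replacement string, assuming a POSIX sh(1) and
     an awk(1) on PATH; the function name subst_demo is hypothetical.  In
     the string constant "\\&" the two backslashes become one, which tells
     sub() to insert a literal ampersand.

```shell
# Hypothetical helper: gsub() with & (the matched text) and sub() with a
# literal ampersand.
subst_demo() {
    awk 'BEGIN {
        s = "foo bar foo"
        n = gsub(/foo/, "[&]", s)   # & stands for the matched text
        print n, s
        t = "a+b"
        sub(/[+]/, "\\&", t)        # inserts a literal &
        print t
    }'
}

subst_demo
# prints:
# 2 [foo] bar [foo]
# a&b
```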

     substr(s, m, n)  Return at most the n-character substring of s that
                      begins at position m counted from 1.  If n is omitted,
                      or if n specifies more characters than are left in the
                      string, the length of the substring is limited by the
                      length of s.

     tolower(str)     Returns a copy of str with all upper-case characters
                      translated to their corresponding lowercase
                      equivalents.

     toupper(str)     Returns a copy of str with all lower-case characters
                      translated to their corresponding uppercase
                      equivalents.
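
     A sketch of length(), index(), substr(), match() and the case
     functions, assuming a POSIX sh(1) and an awk(1) on PATH; the function
     name string_demo is hypothetical.  Positions count from 1.

```shell
# Hypothetical helper: apply several string functions to one string.
string_demo() {
    awk 'BEGIN {
        s = "Hello, world"
        print length(s), index(s, "world"), substr(s, 8, 3)
        print substr(s, 8, 100)       # n is clipped to the end of s
        if (match(s, /o, w/))
            print RSTART, RLENGTH
        print toupper(s), tolower("MiXeD")
    }'
}

string_demo
# prints:
# 12 8 wor
# world
# 5 4
# HELLO, WORLD mixed
```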


   Input/Output and General Functions

     close(expr)            Closes the file or pipe expr.  expr should match
                            the string that was used to open the file or
                            pipe.

     cmd | getline [var]    Read a record of input from a stream piped from
                            the output of cmd.  If var is omitted, the
                            variables $0 and NF are set.  Otherwise var is
                            set.  If the stream is not open, it is opened.
                            As long as the stream remains open, subsequent
                            calls will read subsequent records from the
                            stream.  The stream remains open until explicitly
                            closed with a call to close().
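
     A sketch of `cmd | getline' with and without a variable, assuming a
     POSIX sh(1) and an awk(1) on PATH; the function name pipe_demo is
     hypothetical.  The loop tests for a result greater than 0 so that both
     end of file (0) and an error (-1) stop it.

```shell
# Hypothetical helper: read every line a command prints, then read one
# record into $0.
pipe_demo() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        while ((cmd | getline line) > 0)
            print "read:", line
        close(cmd)

        "echo a b c" | getline      # no var: $0 and NF are set
        print NF, $2
        close("echo a b c")
    }'
}

pipe_demo
# prints:
# read: one
# read: two
# 3 b
```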

     fflush(expr)           Flushes any buffered output for the file or pipe
                            expr.  expr should match the string that was used
                            to open the file or pipe.

     getline                Sets $0 to the next input record from the current
                            input file.  This form of getline sets the
                            variables NF, NR, and FNR.  getline returns 1 for
                            a successful input, 0 for end of file, and -1 for
                            an error.

     getline var            Sets the variable var to the next input record
                            from the current input file.  This form of
                            getline sets the variables NR and FNR.  getline
                            returns 1 for a successful input, 0 for end of
                            file, and -1 for an error.

     getline [var] < file   Reads the next record from file.  If var is
                            omitted, the variables $0 and NF are set.
                            Otherwise var is set.  If file is not open, it is
                            opened.  As long as the stream remains open,
                            subsequent calls will read subsequent records
                            from file.  file remains open until explicitly
                            closed with a call to close().
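
     A sketch of `getline var < file' used from a BEGIN block, assuming a
     POSIX sh(1) and an awk(1) on PATH; the function name first_line is
     hypothetical.  A return value of -1 (for example, a missing file) is
     handled by the same test as end of file.

```shell
# Hypothetical helper: print the first line of the named file, or a
# message if nothing could be read from it.
first_line() {
    awk -v f="$1" 'BEGIN {
        if ((getline line < f) > 0)
            print line
        else
            print "cannot read " f
        close(f)
    }'
}

# Usage: first_line /etc/hosts
```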

     system(cmd)           Executes cmd and returns its exit status.
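
     A sketch of system(), assuming a POSIX sh(1) and an awk(1) on PATH; the
     function name run_status is hypothetical.  It only distinguishes zero
     from non-zero, since the exact value reported for a failing command may
     differ between awk implementations.

```shell
# Hypothetical helper: run a command through system() and report whether
# it exited successfully.
run_status() {
    awk -v cmd="$1" 'BEGIN {
        status = system(cmd)
        result = (status == 0) ? "ok" : "failed"
        print result
    }'
}

run_status true    # "ok"
run_status false   # "failed"
```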

     Functions may be defined (at the position of a pattern-action statement)
     thusly:

           function foo(a, b, c) { ...; return x }

     Parameters are passed by value if scalar, and by reference if array
     name; functions may be called recursively.  Parameters are local to the
     function; all other variables are global.  Thus local variables may be
     created by providing excess parameters in the function definition.
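
     A sketch of user-defined functions, assuming a POSIX sh(1) and an
     awk(1) on PATH; the function names func_demo, fact and fill are
     hypothetical.  It shows recursion, an array passed by reference, and an
     excess parameter i used as a local variable.

```shell
# Hypothetical helper wrapping an awk program with two functions.
func_demo() {
    awk '
        # Recursive; the scalar n is passed by value.
        function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }

        # arr is passed by reference, so the caller sees the new elements.
        # The extra parameter i is local to the function.
        function fill(arr, count,    i) {
            for (i = 1; i <= count; i++)
                arr[i] = i * i
        }

        BEGIN {
            print fact(5)
            split("", sq)          # make sq an empty array before passing it
            fill(sq, 3)
            print sq[1], sq[2], sq[3]
            msg = (i == "") ? "global i untouched" : "global i changed"
            print msg
        }'
}

func_demo
# prints:
# 120
# 1 4 9
# global i untouched
```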

EXAMPLES

     Print lines longer than 72 characters:

           length($0) > 72

     Print first two fields in opposite order:

           { print $2, $1 }

     Same, with input fields separated by comma and/or blanks and tabs:

           BEGIN { FS = ",[ \t]*|[ \t]+" }
                 { print $2, $1 }

     Add up first column, print sum and average:

           { s += $1 }
           END { print "sum is", s, " average is", s/NR }

     Print all lines between start/stop pairs:

           /start/, /stop/

     Simulate echo(1):

           BEGIN { # Simulate echo(1)
                   for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
                   printf "\n"
                   exit }

     Print an error message to standard error:

           { print "error!" > "/dev/stderr" }

SEE ALSO

     egrep(1), lex(1), printf(1), sed(1), printf(3)

     "Awk -- A Pattern Scanning and Processing Language",
     /usr/share/doc/usd/16.awk/.

     A. V. Aho, B. W. Kernighan, and P. J. Weinberger, The AWK Programming
     Language, Addison-Wesley, 1988, ISBN 0-201-07981-X.

HISTORY

     An awk utility appeared in Version 7 AT&T UNIX.

BUGS

     There are no explicit conversions between numbers and strings.  To force
     an expression to be treated as a number add 0 to it; to force it to be
     treated as a string concatenate "" to it.
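
     A sketch of the add-0 and concatenate-"" idioms, assuming a POSIX
     sh(1) and an awk(1) on PATH; the function name compare_demo is
     hypothetical.  String constants compare as strings, so "10" sorts
     before "9" until 0 is added to each side; concatenating "" converts a
     number using CONVFMT.

```shell
# Hypothetical helper: compare the same values as strings and as numbers,
# then force a number into a string.
compare_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"
        as_strings = (a < b)          # string comparison
        as_numbers = (a + 0 < b + 0)  # numeric comparison
        print as_strings, as_numbers
        x = 0.1 + 0.2
        y = x ""                      # converted with CONVFMT ("%.6g")
        same = (y == "0.3")
        print y, same
    }'
}

compare_demo
# prints:
# 1 0
# 0.3 1
```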

     The scope rules for variables in functions are a botch; the syntax is
     worse.

OpenBSD 3.6                     June 29, 1996