Command:   awk - pattern matching language
Syntax:    awk [-f] rules [file] ...
Flags:     -f   Read the rules from the file rules (- for standard input)
Examples:  awk rules input          # Process input according to rules
           awk rules -  >out        # Input from terminal, output to out

     AWK is a programming  language  devised  by  Aho,  Weinberger,  and
Kernighan  at Bell Labs (hence the name).  Awk programs search files for
specific patterns and perform 'actions' for every  occurrence  of  these
patterns.   The  patterns can be 'regular expressions' as used in the ed
editor.  The actions are expressed using a subset of the C language.

     The patterns and actions are usually placed in a 'rules' file whose
name  must  be  the  first argument in the command line, preceded by the
flag -f.  Otherwise, the first argument on the command line is taken  to
be  a  string  containing  the rules themselves. All other arguments are
taken to be the names of text  files  on  which  the  rules  are  to  be
applied,  with  -  being  the  standard  input.   To take rules from the
standard input, use -f -.

     The command:

        awk  -f  rules  prog.d*u

would read the pattern and action rules from the file rules  and  apply
them to all the remaining arguments.
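
     The argument conventions above can be summarized in a short Python
sketch.  It is only an illustration of the rules as stated in this page,
not the program's own argument parser; the function name
parse_command_line and the returned labels are hypothetical.

```python
def parse_command_line(args):
    """Classify awk's arguments (everything after the command name).

    Returns (rules_from, inputs): where the rules come from, and the
    list of input sources, with '-' shown as 'stdin'.
    """
    if args[0] == "-f":
        # -f names a rules file; '-f -' takes the rules from standard input.
        rules_from = "stdin" if args[1] == "-" else "file " + args[1]
        files = args[2:]
    else:
        # Without -f the first argument is the rules text itself.
        rules_from = "string"
        files = args[1:]
    inputs = ["stdin" if name == "-" else name for name in files]
    return rules_from, inputs


with_file = parse_command_line(["-f", "rules", "input"])
with_string = parse_command_line(["rules-text", "-"])
print(with_file)
print(with_string)
```

Here "rules-text" is a placeholder for a rules string given directly on
the command line.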

     The general format of a rules file is:

   <pattern> { <action> }    <pattern> { <action> }    ...

There may be any number of these <pattern> { <action> } sequences in the
rules  file.   Awk reads a line of input from the current input file and
applies every <pattern> { <action> } in sequence to the line.

     If the <pattern> corresponding to any { <action> } is missing,  the
action  is  applied to every line of input.  The default { <action> } is
to print the matched input line.
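
     The processing loop just described can be sketched in Python.  This
is a minimal illustration of the documented behavior, not the actual
implementation; the name run_rules and the use of Python callables to
stand for patterns and actions are assumptions made for the example.

```python
def run_rules(rules, lines):
    """Apply every (pattern, action) pair, in order, to each input line.

    pattern: a callable returning True/False for a line, or None for a
             missing pattern (the action then applies to every line).
    action:  a callable returning output text for a line, or None for a
             missing action (the matched line is printed unchanged).
    """
    output = []
    for line in lines:
        for pattern, action in rules:
            if pattern is None or pattern(line):
                output.append(line if action is None else action(line))
    return output


rules = [
    (lambda s: "error" in s, None),   # pattern only: print the matched line
    (None, lambda s: str(len(s))),    # action only: runs on every line
]
result = run_rules(rules, ["ok", "disk error"])
print(result)
```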

Patterns

     The <pattern>s may consist of  any  valid  C  expression.   If  the
<pattern>  consists of two expressions separated by a comma, it is taken
to be a range and the <action> is performed on all lines of  input  that
match the range.  <pattern>s may contain 'regular expressions' delimited
by @ symbols.  Regular expressions can be thought of as  a  generalized
'wildcard'  string  matching  mechanism,  similar  to  that used by many
operating systems  to  specify  file  names.   Regular  expressions  may
contain any of the following characters:


   x     An ordinary character
   \     The backslash quotes any character
   ^     A circumflex  at  the  beginning  of  an  expression  matches  the
         beginning of a line.
   $     A dollar-sign at the end of an expression matches the end of  a
         line.
   .     A period matches any single character except newline.
   *     An expression followed by an  asterisk  matches  zero  or  more
         occurrences of that expression: 'fo*' matches 'f', 'fo', 'foo',
         'fooo', etc.
   +     An expression followed by a  plus  sign  matches  one  or  more
         occurrences  of  that  expression:  'fo+'  matches 'fo', 'foo',
         'fooo', etc.
   []    A  string  enclosed  in  square  brackets  matches  any  single
         character  in  that  string,  but  no  others.   If  the  first
         character in the string is a circumflex, the expression matches
         any  character except newline and the characters in the string.
         For example, '[xyz]' matches 'x', 'y', or 'z',  while  '[^xyz]'
         matches 'a' or 'b' but not 'x'.  A range of characters  may  be
         specified by two characters separated by '-'.
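
     The metacharacters in the table can be tried out with Python's re
module, used here purely as a stand-in.  Its syntax agrees with the
table for these characters, but the @ delimiters are specific to this
awk, and any detail the table does not spell out is an assumption.  The
helper name matches is hypothetical.

```python
import re


def matches(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


star_empty = matches(r"fo*", "f")          # zero 'o' characters is enough
plus_empty = matches(r"fo+", "f")          # '+' needs at least one 'o'
anchored = matches(r"^PRINT", "120 PRINT") # PRINT is not at line start
at_end = matches(r"Zip$", "Name Zip")      # Zip is at the end of the line
any_char = matches(r"a.c", "abc")          # '.' stands for the 'b'
no_newline = matches(r"a.c", "a\nc")       # '.' does not match newline
in_range = matches(r"[a-c]x", "bx")        # 'b' lies in the range a-c
negated = matches(r"[^xyz]", "xyz")        # every character is excluded
quoted = matches(r"1\.5", "125")           # '\.' matches only a real period

print(star_empty, plus_empty, anchored, at_end)
print(any_char, no_newline, in_range, negated, quoted)
```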

Actions

     Actions are expressed as a subset of the C language.  All variables
are global and default to ints if not formally declared.  Only chars and
ints, and pointers to and arrays of char and int,  are  allowed.   Awk
allows only decimal integer constants to be  used:  no  hex  (0xnn)  or
octal (0nn).  String and character constants may  contain  all  of  the
special C escapes (\n, \r, etc.).

     Awk supports the 'if', 'else', 'while' and 'break' flow of  control
constructs, which behave exactly as in C.

     Also supported are the following unary and binary operators, listed
in order from highest to lowest precedence:

   Operator          Type           Associativity
   () []             unary          left to right
   ! ~ ++ -- - * &   unary          right to left
   * / %             binary         left to right
   + -               binary         left to right
   << >>             binary         left to right
   < <= > >=         binary         left to right
   == !=             binary         left to right
   &                 binary         left to right
   ^                 binary         left to right
   |                 binary         left to right
   &&                binary         left to right
   ||                binary         left to right
   =                 binary         right to left


Comments are introduced by a '#' symbol and are terminated by the  first
newline  character.   The  standard '/*' and '*/' comment delimiters are
not supported and will result in a syntax error.


Fields


     When awk reads a line from the current input file,  the  record  is
automatically  separated  into  'fields.'  A field is simply a string of
consecutive characters delimited by either the beginning or end of line,
or  a  'field separator' character.  Initially, the field separators are
the space and tab characters.  The special unary operator '$' is used to
reference  one  of  the  fields in the current input record (line).  The
fields are numbered sequentially starting at  1.   The  expression  '$0'
references the entire input line.

     Similarly, the 'record separator' is used to determine the  end  of
an  input 'line,' initially the newline character.  The field and record
separators may be changed programmatically by one of the actions and will
remain in effect until changed again.

     Multiple (up to 10) field separators are allowed  at  a  time,  but
only one record separator.
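
     Record and field splitting as described above can be sketched in
Python.  This is an illustration only: the function names are
hypothetical, and treating a run of separator characters as a single
break is inferred from the PRINT example later in this section, where
several spaces separate two adjacent fields.

```python
def split_records(text, rs="\n"):
    """Split input text into records on the single record separator."""
    records = text.split(rs)
    if records and records[-1] == "":
        records.pop()          # no empty record after a trailing separator
    return records


def split_fields(record, fs=" \t"):
    """Split a record into fields; any character in fs is a separator."""
    fields, current = [], ""
    for ch in record:
        if ch in fs:
            if current:        # close the field in progress, if any
                fields.append(current)
            current = ""
        else:
            current += ch
    if current:
        fields.append(current)
    return fields


record = split_records('120 PRINT "Name   address   Zip"\n')[0]
fields = split_fields(record)
print(len(fields))             # corresponds to NF
print(fields[3])               # corresponds to $4 (fields number from 1)
print(split_fields("a:b,c", fs=":,"))
```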

     Fields behave exactly like strings and can  be  used  in  the  same
context  as a character array.  These 'arrays' can be considered to have
been declared as:


     char ($n)[ 128 ];


In other words, they are 128 bytes long.  Notice  that  the  parentheses
are  necessary  because  the  operators [] and $ associate from right to
left; without them, the statement would have parsed as:


     char $(1[ 128 ]);


which is obviously ridiculous.

     If the contents of one of these field arrays are altered, the  '$0'
field will reflect this change.  For example, this expression:


     *$4 = 'A';


will change the first character of the fourth  field  to  an  uppercase
letter 'A'.  Then, when the following input line:


     120 PRINT "Name         address        Zip"


is processed, it would be printed as:


     120 PRINT "Name         Address        Zip"


Fields may also be modified with the strcpy() function (see below).  For
example, the expression:

     strcpy( $4, "Addr." );

applied to the same line above would yield:

     120 PRINT "Name         Addr.        Zip"
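
     The effect of altering a field on the whole record can be modelled
in Python.  This sketch only reproduces the two results shown above; it
does not model the 128-byte field arrays, and the function name
replace_field is hypothetical.

```python
import re


def replace_field(record, n, new_text):
    """Return record with field n (numbered from 1) replaced by new_text.

    Fields are runs of characters other than space and tab; the
    separators around the field are left untouched.
    """
    spans = [m.span() for m in re.finditer(r"[^ \t]+", record)]
    start, end = spans[n - 1]
    return record[:start] + new_text + record[end:]


line = '120 PRINT "Name         address        Zip"'
capitalized = replace_field(line, 4, "Address")  # like *$4 = 'A';
shortened = replace_field(line, 4, "Addr.")      # like strcpy( $4, "Addr." );
print(capitalized)
print(shortened)
```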


Predefined Variables

     The following variables are pre-defined:

   FS             Field separator (see Fields above).
   RS             Record separator (see Fields above).
   NF             Number of fields in current input record (line).
   NR             Number of records processed thus far.
   FILENAME       Name of current input file.
   BEGIN          A special <pattern>  that  matches  the  beginning  of
                  input text.
   END            A special <pattern> that  matches  the  end  of  input
                  text.

Awk also provides some useful built-in functions for string manipulation
and printing:

   print(arg)     Simple printing of strings only, terminated by '\n'.
   printf(arg...) Exactly the printf() function from C.
   getline()      Reads the next record and returns 0 on end of file.
   nextfile()     Closes the current input file  and  begins  processing
                  the next file.
   strlen(s)      Returns the length of its string argument.
   strcpy(s,t)    Copies the string 't' to the string 's'.
   strcmp(s,t)    Compares 's' to 't' and returns 0 if they match.
   toupper(c)     Returns its character argument converted to uppercase.
   tolower(c)     Returns its character argument converted to lowercase.
   match(s,@re@)  Compares the string 's' to the regular expression 're'
                  and  returns  the  number  of  matches  found (zero if
                  none).
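
     The return values of strcmp() and match() can be illustrated with
Python stand-ins.  These are sketches under stated assumptions, not the
built-ins themselves: the sign returned by strcmp for unequal strings
follows the C convention, and match is assumed to count non-overlapping
matches, neither of which this page specifies.

```python
import re


def strcmp(s, t):
    """Return 0 when s and t are equal.

    The negative/positive result for unequal strings is an assumption
    borrowed from C.
    """
    return (s > t) - (s < t)


def match(s, regex):
    """Count the matches of regex in s (zero if none).

    Counting non-overlapping matches is an assumption.
    """
    return len(re.findall(regex, s))


same = strcmp("Zip", "Zip")
count_plus = match("foo fooo f", r"fo+")   # 'foo' and 'fooo'
count_none = match("bar", r"fo+")
print(same, count_plus, count_none)
```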

Authors

     Awk was written by Saeko Hirabayashi and Kouichi Hirabayashi.