296 lines
9.3 KiB
Groff
296 lines
9.3 KiB
Groff
|
||
|
||
|
||
|
||
|
||
Command: awk - pattern matching language
|
||
Syntax: awk rules [file] ...
|
||
Flags: (none)
|
||
Examples: awk rules input # Process input according to rules
|
||
awk rules - >out # Input from terminal, output to out
|
||
|
||
AWK is a programming language devised by Aho, Weinberger, and
|
||
Kernighan at Bell Labs (hence the name). Awk programs search files for
|
||
specific patterns and performs 'actions' for every occurrence of these
|
||
patterns. The patterns can be 'regular expressions' as used in the ed
|
||
editor. The actions are expressed using a subset of the C language.
|
||
|
||
The patterns and actions are usually placed in a 'rules' file whose
|
||
name must be the first argument in the command line, preceded by the
|
||
flag -f. Otherwise, the first argument on the command line is taken to
|
||
be a string containing the rules themselves. All other arguments are
|
||
taken to be the names of text files on which the rules are to be
|
||
applied, with - being the standard input. To take rules from the
|
||
standard input, use -f -.
|
||
|
||
The command:
|
||
|
||
awk rules prog.d*u
|
||
|
||
would read the patterns and actions rules from the file rules and apply
|
||
them to all the arguments.
|
||
|
||
The general format of a rules file is:
|
||
|
||
<pattern> { <action> } <pattern> { <action> } ...
|
||
|
||
There may be any number of these <pattern> { <action> } sequences in the
|
||
rules file. Awk reads a line of input from the current input file and
|
||
applies every <pattern> { <action> } in sequence to the line.
|
||
|
||
If the <pattern> corresponding to any { <action> } is missing, the
|
||
action is applied to every line of input. The default { <action> } is
|
||
to print the matched input line.
|
||
|
||
Patterns
|
||
|
||
The <pattern>s may consist of any valid C expression. If the
|
||
<pattern> consists of two expressions separated by a comma, it is taken
|
||
to be a range and the <action> is performed on all lines of input that
|
||
match the range. <pattern>s may contain 'regular expressions' delimited
|
||
by an @ symbol. Regular expressions can be thought of as a generalized
|
||
'wildcard' string matching mechanism, similar to that used by many
|
||
operating systems to specify file names. Regular expressions may
|
||
contain any of the following characters:
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
x An ordinary character
|
||
\ The backslash quotes any character
|
||
^ A circumflex at the beginning of an expr matches the beginning
|
||
of a line.
|
||
$ A dollar-sign at the end of an expression matches the end of a
|
||
line.
|
||
. A period matches any single character except newline.
|
||
* An expression followed by an asterisk matches zero or more
|
||
occurrences of that expression: 'fo*' matches 'f', 'fo', 'foo',
|
||
'fooo', etc.
|
||
+ An expression followed by a plus sign matches one or more
|
||
occurrences of that expression: 'fo+' matches 'fo', 'foo',
|
||
'fooo', etc.
|
||
[] A string enclosed in square brackets matches any single
|
||
character in that string, but no others. If the first
|
||
character in the string is a circumflex, the expression matches
|
||
any character except newline and the characters in the string.
|
||
For example, '[xyz]' matches 'xx' and 'zyx', while '[^xyz]'
|
||
matches 'abc' but not 'axb'. A range of characters may be
|
||
specified by two characters separated by '-'.
|
||
|
||
Actions
|
||
|
||
Actions are expressed as a subset of the C language. All variables
|
||
are global and default to int's if not formally declared. Only char's
|
||
and int's and pointers and arrays of char and int are allowed. Awk
|
||
allows only decimal integer constants to be used----no hex (0xnn) or
|
||
octal (0nn). String and character constants may contain all of the
|
||
special C escapes (\n, \r, etc.).
|
||
|
||
Awk supports the 'if', 'else', 'while' and 'break' flow of control
|
||
constructs, which behave exactly as in C.
|
||
|
||
Also supported are the following unary and binary operators, listed
|
||
in order from highest to lowest precedence:
|
||
|
||
Operator Type Associativity
|
||
() [] unary left to right
|
||
! ~ ++ -- - * & unary right to left
|
||
* / % binary left to right
|
||
+ - binary left to right
|
||
<< >> binary left to right
|
||
< <= > >= binary left to right
|
||
== != binary left to right
|
||
& binary left to right
|
||
^ binary left to right
|
||
| binary left to right
|
||
&& binary left to right
|
||
|| binary left to right
|
||
= binary right to left
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
Comments are introduced by a '#' symbol and are terminated by the first
|
||
newline character. The standard '/*' and '*/' comment delimiters are
|
||
not supported and will result in a syntax error.
|
||
|
||
|
||
Fields
|
||
|
||
|
||
When awk reads a line from the current input file, the record is
|
||
automatically separated into 'fields.' A field is simply a string of
|
||
consecutive characters delimited by either the beginning or end of line,
|
||
or a 'field separator' character. Initially, the field separators are
|
||
the space and tab character. The special unary operator '$' is used to
|
||
reference one of the fields in the current input record (line). The
|
||
fields are numbered sequentially starting at 1. The expression '$0'
|
||
references the entire input line.
|
||
|
||
Similarly, the 'record separator' is used to determine the end of
|
||
an input 'line,' initially the newline character. The field and record
|
||
separators may be changed programatically by one of the actions and will
|
||
remain in effect until changed again.
|
||
|
||
Multiple (up to 10) field separators are allowed at a time, but
|
||
only one record separator.
|
||
|
||
Fields behave exactly like strings; and can be used in the same
|
||
context as a character array. These 'arrays' can be considered to have
|
||
been declared as:
|
||
|
||
|
||
char ($n)[ 128 ];
|
||
|
||
|
||
In other words, they are 128 bytes long. Notice that the parentheses
|
||
are necessary because the operators [] and $ associate from right to
|
||
left; without them, the statement would have parsed as:
|
||
|
||
|
||
char $(1[ 128 ]);
|
||
|
||
|
||
which is obviously ridiculous.
|
||
|
||
If the contents of one of these field arrays is altered, the '$0'
|
||
field will reflect this change. For example, this expression:
|
||
|
||
|
||
*$4 = 'A';
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
will change the first character of the fourth field to an upper- case
|
||
letter 'A'. Then, when the following input line:
|
||
|
||
|
||
120 PRINT "Name address Zip"
|
||
|
||
|
||
is processed, it would be printed as:
|
||
|
||
|
||
120 PRINT "Name Address Zip"
|
||
|
||
|
||
Fields may also be modified with the strcpy() function (see below). For
|
||
example, the expression:
|
||
|
||
strcpy( $4, "Addr." );
|
||
|
||
applied to the same line above would yield:
|
||
|
||
120 PRINT "Name Addr. Zip"
|
||
|
||
|
||
Predefined Variables
|
||
|
||
The following variables are pre-defined:
|
||
|
||
FS Field separator (see below).
|
||
RS Record separator (see below also).
|
||
NF Number of fields in current input record (line).
|
||
NR Number of records processed thus far.
|
||
FILENAME Name of current input file.
|
||
BEGIN A special <pattern> that matches the beginning of
|
||
input text.
|
||
END A special <pattern> that matches the end of input
|
||
text.
|
||
|
||
Awk also provides some useful built-in functions for string manipulation
|
||
and printing:
|
||
|
||
print(arg) Simple printing of strings only, terminated by '\n'.
|
||
printf(arg...) Exactly the printf() function from C.
|
||
getline() Reads the next record and returns 0 on end of file.
|
||
nextfile() Closes the current input file and begins processing
|
||
the next file
|
||
strlen(s) Returns the length of its string argument.
|
||
strcpy(s,t) Copies the string 't' to the string 's'.
|
||
strcmp(s,t) Compares the 's' to 't' and returns 0 if they match.
|
||
toupper(c) Returns its character argument converted to upper-
|
||
case.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
tolower(c) Returns its character argument converted to lower-
|
||
case.
|
||
match(s,@re@) Compares the string 's' to the regular expression 're'
|
||
and returns the number of matches found (zero if
|
||
none).
|
||
|
||
Authors
|
||
|
||
Awk was written by Saeko Hirabauashi and Kouichi Hirabayashi.
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|