add directory Minix
This commit is contained in:
263
Minix/2.0.0/wwwman/man9/awk.9.html
Normal file
263
Minix/2.0.0/wwwman/man9/awk.9.html
Normal file
@@ -0,0 +1,263 @@
|
||||
<HTML>
|
||||
<HEAD>
|
||||
<TITLE>awk(9)</TITLE>
|
||||
</HEAD>
|
||||
<BODY>
|
||||
<H1>awk(9)</H1>
|
||||
<HR>
|
||||
<PRE>
|
||||
<STRONG>Command:</STRONG> <STRONG>awk</STRONG> <STRONG>-</STRONG> <STRONG>pattern</STRONG> <STRONG>matching</STRONG> <STRONG>language</STRONG>
|
||||
<STRONG>Syntax:</STRONG> <STRONG>awk</STRONG> <EM>rules</EM> [<EM>file</EM>] ...
|
||||
<STRONG>Flags:</STRONG> (none)
|
||||
<STRONG>Examples:</STRONG> awk rules input # Process <EM>input</EM> according to <EM>rules</EM>
|
||||
awk rules - >out # Input from terminal, output to <EM>out</EM>
|
||||
|
||||
AWK is a programming language devised by Aho, Weinberger, and
|
||||
Kernighan at Bell Labs (hence the name). <EM>Awk</EM> programs search files for
|
||||
specific patterns and performs 'actions' for every occurrence of these
|
||||
patterns. The patterns can be 'regular expressions' as used in the <EM>ed</EM>
|
||||
editor. The actions are expressed using a subset of the C language.
|
||||
|
||||
The patterns and actions are usually placed in a 'rules' file whose
|
||||
name must be the first argument in the command line, preceded by the
|
||||
flag <STRONG>-f</STRONG>. Otherwise, the first argument on the command line is taken to
|
||||
be a string containing the rules themselves. All other arguments are
|
||||
taken to be the names of text files on which the rules are to be
|
||||
applied, with <STRONG>-</STRONG> being the standard input. To take rules from the
|
||||
standard input, use <STRONG>-f</STRONG> <STRONG>-</STRONG>.
|
||||
|
||||
The command:
|
||||
|
||||
<STRONG>awk</STRONG> <STRONG>rules</STRONG> <STRONG>prog.d*u</STRONG>
|
||||
|
||||
would read the patterns and actions rules from the file <EM>rules</EM> and apply
|
||||
them to all the arguments.
|
||||
|
||||
The general format of a rules file is:
|
||||
|
||||
<pattern> { <action> } <pattern> { <action> } ...
|
||||
|
||||
There may be any number of these <pattern> { <action> } sequences in the
|
||||
rules file. <EM>Awk</EM> reads a line of input from the current input file and
|
||||
applies every <pattern> { <action> } in sequence to the line.
|
||||
|
||||
If the <pattern> corresponding to any { <action> } is missing, the
|
||||
action is applied to every line of input. The default { <action> } is
|
||||
to print the matched input line.
|
||||
|
||||
<STRONG>Patterns</STRONG>
|
||||
|
||||
The <pattern>s may consist of any valid C expression. If the
|
||||
<pattern> consists of two expressions separated by a comma, it is taken
|
||||
to be a range and the <action> is performed on all lines of input that
|
||||
match the range. <pattern>s may contain 'regular expressions' delimited
|
||||
by an @ symbol. Regular expressions can be thought of as a generalized
|
||||
'wildcard' string matching mechanism, similar to that used by many
|
||||
operating systems to specify file names. Regular expressions may
|
||||
contain any of the following characters:
|
||||
|
||||
x An ordinary character
|
||||
\ The backslash quotes any character
|
||||
^ A circumflex at the beginning of an expr matches the beginning
|
||||
of a line.
|
||||
$ A dollar-sign at the end of an expression matches the end of a
|
||||
line.
|
||||
. A period matches any single character except newline.
|
||||
* An expression followed by an asterisk matches zero or more
|
||||
occurrences of that expression: 'fo*' matches 'f', 'fo', 'foo',
|
||||
'fooo', etc.
|
||||
+ An expression followed by a plus sign matches one or more
|
||||
occurrences of that expression: 'fo+' matches 'fo', 'foo',
|
||||
'fooo', etc.
|
||||
[] A string enclosed in square brackets matches any single
|
||||
character in that string, but no others. If the first
|
||||
character in the string is a circumflex, the expression matches
|
||||
any character except newline and the characters in the string.
|
||||
For example, '[xyz]' matches 'xx' and 'zyx', while '[^xyz]'
|
||||
matches 'abc' but not 'axb'. A range of characters may be
|
||||
specified by two characters separated by '-'.
|
||||
|
||||
<STRONG>Actions</STRONG>
|
||||
|
||||
Actions are expressed as a subset of the C language. All variables
|
||||
are global and default to int's if not formally declared. Only char's
|
||||
and int's and pointers and arrays of char and int are allowed. <EM>Awk</EM>
|
||||
allows only decimal integer constants to be used----no hex (0xnn) or
|
||||
octal (0nn). String and character constants may contain all of the
|
||||
special C escapes (\n, \r, etc.).
|
||||
|
||||
<EM>Awk</EM> supports the 'if', 'else', 'while' and 'break' flow of control
|
||||
constructs, which behave exactly as in C.
|
||||
|
||||
Also supported are the following unary and binary operators, listed
|
||||
in order from highest to lowest precedence:
|
||||
|
||||
<STRONG>Operator</STRONG> <STRONG>Type</STRONG> <STRONG>Associativity</STRONG>
|
||||
() [] unary left to right
|
||||
! ~ ++ -- - * & unary right to left
|
||||
* / % binary left to right
|
||||
+ - binary left to right
|
||||
<< >> binary left to right
|
||||
< <= > >= binary left to right
|
||||
== != binary left to right
|
||||
& binary left to right
|
||||
^ binary left to right
|
||||
| binary left to right
|
||||
&& binary left to right
|
||||
|| binary left to right
|
||||
= binary right to left
|
||||
|
||||
Comments are introduced by a '#' symbol and are terminated by the first
|
||||
newline character. The standard '/*' and '*/' comment delimiters are
|
||||
not supported and will result in a syntax error.
|
||||
|
||||
|
||||
<STRONG>Fields</STRONG>
|
||||
|
||||
|
||||
When <EM>awk</EM> reads a line from the current input file, the record is
|
||||
automatically separated into 'fields.' A field is simply a string of
|
||||
consecutive characters delimited by either the beginning or end of line,
|
||||
or a 'field separator' character. Initially, the field separators are
|
||||
the space and tab character. The special unary operator '$' is used to
|
||||
reference one of the fields in the current input record (line). The
|
||||
fields are numbered sequentially starting at 1. The expression '$0'
|
||||
references the entire input line.
|
||||
|
||||
Similarly, the 'record separator' is used to determine the end of
|
||||
an input 'line,' initially the newline character. The field and record
|
||||
separators may be changed programatically by one of the actions and will
|
||||
remain in effect until changed again.
|
||||
|
||||
Multiple (up to 10) field separators are allowed at a time, but
|
||||
only one record separator.
|
||||
|
||||
Fields behave exactly like strings; and can be used in the same
|
||||
context as a character array. These 'arrays' can be considered to have
|
||||
been declared as:
|
||||
|
||||
|
||||
char ($n)[ 128 ];
|
||||
|
||||
|
||||
In other words, they are 128 bytes long. Notice that the parentheses
|
||||
are necessary because the operators [] and $ associate from right to
|
||||
left; without them, the statement would have parsed as:
|
||||
|
||||
|
||||
char $(1[ 128 ]);
|
||||
|
||||
|
||||
which is obviously ridiculous.
|
||||
|
||||
If the contents of one of these field arrays is altered, the '$0'
|
||||
field will reflect this change. For example, this expression:
|
||||
|
||||
|
||||
*$4 = 'A';
|
||||
|
||||
|
||||
will change the first character of the fourth field to an upper- case
|
||||
letter 'A'. Then, when the following input line:
|
||||
|
||||
|
||||
120 PRINT "Name address Zip"
|
||||
|
||||
|
||||
is processed, it would be printed as:
|
||||
|
||||
|
||||
120 PRINT "Name Address Zip"
|
||||
|
||||
|
||||
Fields may also be modified with the strcpy() function (see below). For
|
||||
example, the expression:
|
||||
|
||||
strcpy( $4, "Addr." );
|
||||
|
||||
applied to the same line above would yield:
|
||||
|
||||
120 PRINT "Name Addr. Zip"
|
||||
|
||||
|
||||
<STRONG>Predefined</STRONG> <STRONG>Variables</STRONG>
|
||||
|
||||
The following variables are pre-defined:
|
||||
|
||||
FS Field separator (see below).
|
||||
RS Record separator (see below also).
|
||||
NF Number of fields in current input record (line).
|
||||
NR Number of records processed thus far.
|
||||
FILENAME Name of current input file.
|
||||
BEGIN A special <pattern> that matches the beginning of
|
||||
input text.
|
||||
END A special <pattern> that matches the end of input
|
||||
text.
|
||||
|
||||
<EM>Awk</EM> also provides some useful built-in functions for string manipulation
|
||||
and printing:
|
||||
|
||||
print(arg) Simple printing of strings only, terminated by '\n'.
|
||||
printf(arg...) Exactly the printf() function from C.
|
||||
getline() Reads the next record and returns 0 on end of file.
|
||||
nextfile() Closes the current input file and begins processing
|
||||
the next file
|
||||
strlen(s) Returns the length of its string argument.
|
||||
strcpy(s,t) Copies the string 't' to the string 's'.
|
||||
strcmp(s,t) Compares the 's' to 't' and returns 0 if they match.
|
||||
toupper(c) Returns its character argument converted to upper-
|
||||
case.
|
||||
|
||||
tolower(c) Returns its character argument converted to lower-
|
||||
case.
|
||||
match(s,@re@) Compares the string 's' to the regular expression 're'
|
||||
and returns the number of matches found (zero if
|
||||
none).
|
||||
|
||||
<STRONG>Authors</STRONG>
|
||||
|
||||
<EM>Awk</EM> was written by Saeko Hirabauashi and Kouichi Hirabayashi.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
</PRE>
|
||||
</BODY>
|
||||
</HTML>
|
||||
Reference in New Issue
Block a user