<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.54
from ../texi/sed.texi on 28 October 1999 -->
<TITLE>sed, a stream editor - sed Programs</TITLE>
<link href="sed_4.html" rel=Next>
<link href="sed_2.html" rel=Previous>
<link href="sed_toc.html" rel=ToC>
</HEAD>
<BODY>
<p>Go to the <A HREF="sed_1.html">first</A>, <A HREF="sed_2.html">previous</A>, <A HREF="sed_4.html">next</A>, <A HREF="sed_9.html">last</A> section, <A HREF="sed_toc.html">table of contents</A>.
<P><HR><P>
<H1><A NAME="SEC3" HREF="sed_toc.html#TOC3">SED Programs</A></H1>
<P>
<A NAME="IDX19"></A>
<A NAME="IDX20"></A>
A SED program consists of one or more SED commands,
passed in by one or more of the
<CODE>-e</CODE>, <CODE>-f</CODE>, <CODE>--expression</CODE>, and <CODE>--file</CODE>
options, or by the first non-option argument if none of these
options is used.
This document will refer to "the" SED script;
this should be understood to mean the in-order concatenation
of all of the <VAR>script</VAR>s and <VAR>script-file</VAR>s passed in.
</P>
<P>
Each SED command consists of an optional address or
address range, followed by a one-character command name
and any additional command-specific code.
</P>
<H2><A NAME="SEC4" HREF="sed_toc.html#TOC4">Selecting lines with SED</A></H2>
<P>
<A NAME="IDX21"></A>
<A NAME="IDX22"></A>
<A NAME="IDX23"></A>
</P>
<P>
Addresses in a SED script can be in any of the following forms:
<DL COMPACT>
<DT><SAMP>`<VAR>number</VAR>'</SAMP>
<DD>
<A NAME="IDX24"></A>
<A NAME="IDX25"></A>
Specifying a line number will match only that line in the input.
(Note that SED counts lines continuously across all input files.)
<DT><SAMP>`<VAR>first</VAR>~<VAR>step</VAR>'</SAMP>
<DD>
<A NAME="IDX26"></A>
This GNU extension matches every <VAR>step</VAR>th line
starting with line <VAR>first</VAR>.
More precisely, a line is selected when there exists
a non-negative integer <VAR>n</VAR> such that the current line number equals
<VAR>first</VAR> + (<VAR>n</VAR> * <VAR>step</VAR>).
Thus, to select the odd-numbered lines,
one would use <CODE>1~2</CODE>;
to pick every third line starting with the second, <CODE>2~3</CODE> would be used;
to pick every fifth line starting with the tenth, use <CODE>10~5</CODE>;
and <CODE>50~0</CODE> is just an obscure way of saying <CODE>50</CODE>.
<DT><SAMP>`$'</SAMP>
<DD>
<A NAME="IDX27"></A>
<A NAME="IDX28"></A>
<A NAME="IDX29"></A>
This address matches the last line of the last file of input.
<DT><SAMP>`/<VAR>regexp</VAR>/'</SAMP>
<DD>
<A NAME="IDX30"></A>
<A NAME="IDX31"></A>
This will select any line which matches the regular expression <VAR>regexp</VAR>.
If <VAR>regexp</VAR> itself includes any <CODE>/</CODE> characters,
each must be escaped by a backslash (<CODE>\</CODE>).
<DT><SAMP>`\%<VAR>regexp</VAR>%'</SAMP>
<DD>
(The <CODE>%</CODE> may be replaced by any other single character.)
<A NAME="IDX32"></A>
This also matches the regular expression <VAR>regexp</VAR>,
but allows one to use a different delimiter than <CODE>/</CODE>.
This is particularly useful if the <VAR>regexp</VAR> itself contains
a lot of <CODE>/</CODE>s, since it avoids the tedious escaping of every <CODE>/</CODE>.
If <VAR>regexp</VAR> itself includes any delimiter characters,
each must be escaped by a backslash (<CODE>\</CODE>).
<DT><SAMP>`/<VAR>regexp</VAR>/I'</SAMP>
<DD>
<DT><SAMP>`\%<VAR>regexp</VAR>%I'</SAMP>
<DD>
<A NAME="IDX33"></A>
The <CODE>I</CODE> modifier to regular-expression matching is a GNU
extension which causes the <VAR>regexp</VAR> to be matched in
a case-insensitive manner.
</DL>
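<P>
The address forms listed above can be tried out with a short sketch such as
the following. It assumes GNU SED (for the <CODE>~</CODE> and <CODE>I</CODE>
extensions) and a POSIX shell; the sample input and the variable names are
hypothetical and chosen only for illustration.
</P>

```shell
# Hypothetical five-line sample input used by every command below.
input='alpha
Beta
gamma
delta
epsilon'

# number: select only line 2.
by_number=$(printf '%s\n' "$input" | sed -n '2p')

# first~step (GNU extension): select the odd-numbered lines.
by_step=$(printf '%s\n' "$input" | sed -n '1~2p')

# $: select the last line of input.
by_last=$(printf '%s\n' "$input" | sed -n '$p')

# /regexp/: select lines containing "ta".
by_regexp=$(printf '%s\n' "$input" | sed -n '/ta/p')

# \%regexp%I (GNU extension): custom delimiter, case-insensitive match on "b".
by_regexp_i=$(printf '%s\n' "$input" | sed -n '\%b%Ip')

printf '%s\n' "$by_number" "$by_step" "$by_last" "$by_regexp" "$by_regexp_i"
```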
<P>
If no addresses are given, then all lines are matched;
if one address is given, then only lines matching that
address are matched.
</P>
<P>
<A NAME="IDX34"></A>
<A NAME="IDX35"></A>
An address range is specified by giving two addresses
separated by a comma (<CODE>,</CODE>).
An address range matches lines starting from where the first
address matches, and continues until the second address matches
(inclusively).
If the second address is a <VAR>regexp</VAR>, then checking for the
ending match will start with the line <EM>following</EM> the
line which matched the first address.
If the second address is a <VAR>number</VAR> less than (or equal to)
the line matching the first address,
then only the one line is matched.
</P>
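<P>
The three range rules described above can be illustrated with a minimal
sketch, assuming GNU SED and a POSIX shell. The numeric sample input and
the variable names are hypothetical.
</P>

```shell
# Hypothetical sample input: the numbers 1 through 6, one per line.
nums=$(printf '%s\n' 1 2 3 4 5 6)

# Numeric range: lines 2 through 4, inclusive.
range_numeric=$(printf '%s\n' "$nums" | sed -n '2,4p')

# Regexp end address: the search for /[0-9]/ starts on the line after the
# one that matched the first address, so lines 3 and 4 are both selected
# even though line 3 itself would match the end regexp.
range_regexp=$(printf '%s\n' "$nums" | sed -n '3,/[0-9]/p')

# Numeric end address not greater than the start line: only line 4 matches.
range_single=$(printf '%s\n' "$nums" | sed -n '4,2p')

printf '%s\n' "$range_numeric" "$range_regexp" "$range_single"
```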
<P>
<A NAME="IDX36"></A>
<A NAME="IDX37"></A>
Appending the <CODE>!</CODE> character to the end of an address
specification will negate the sense of the match.
That is, if the <CODE>!</CODE> character follows an address range,
then only lines which do <EM>not</EM> match the address range
will be selected.
This also works for singleton addresses,
and, perhaps perversely, for the null address.
</P>
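<P>
As a brief sketch of negation (assuming a POSIX shell; the sample input and
variable names are hypothetical), the <CODE>!</CODE> character can turn a
"delete these lines" command into a "keep only these lines" command:
</P>

```shell
# Hypothetical sample input: the numbers 1 through 5, one per line.
nums=$(printf '%s\n' 1 2 3 4 5)

# Negated range: delete every line that is NOT in lines 2 through 4.
keep_middle=$(printf '%s\n' "$nums" | sed '2,4!d')

# Negated single address: delete every line that is NOT the last line.
only_last=$(printf '%s\n' "$nums" | sed '$!d')

printf '%s\n' "$keep_middle" "$only_last"
```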
<H2><A NAME="SEC5" HREF="sed_toc.html#TOC5">Overview of regular expression syntax</A></H2>
<P>
[[I may add a brief overview of regular expressions at a later date;
for now see other documentation on regular
expressions, such as the AWK info page.]]
</P>
<H2><A NAME="SEC6" HREF="sed_toc.html#TOC6">Where SED buffers data</A></H2>
<P>
<A NAME="IDX38"></A>
<A NAME="IDX39"></A>
<A NAME="IDX40"></A>
<A NAME="IDX41"></A>
SED maintains two data buffers: the active <EM>pattern</EM> space,
and the auxiliary <EM>hold</EM> space.
In "normal" operation, SED reads in one line from the
input stream and places it in the pattern space.
This pattern space is where text manipulations occur.
The hold space is initially empty, but there are commands
for moving data between the pattern and hold spaces.
</P>
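<P>
One way to observe the two buffers is with the <CODE>x</CODE> (exchange)
command, which is described later in this chapter. The following is only a
sketch, assuming a POSIX shell; the input and the variable name are
hypothetical. Because the hold space starts out empty, exchanging on every
line delays the output by one line.
</P>

```shell
# On line 1 the pattern space "a" is swapped with the empty hold space,
# so an empty line is printed. On line 2 the pattern space "b" is swapped
# with the held "a", so "a" is printed. The final "b" stays in the hold
# space and is never printed.
shifted=$(printf '%s\n' a b | sed 'x')

printf '%s\n' "$shifted"
```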
<H2><A NAME="SEC7" HREF="sed_toc.html#TOC7">Often used commands</A></H2>
<P>
If you use SED at all, you will quite likely want to know
these commands.
</P>
<DL COMPACT>
<DT><SAMP>`#'</SAMP>
<DD>
[No addresses allowed.]
<A NAME="IDX42"></A>
<A NAME="IDX43"></A>
The <CODE>#</CODE> "command" begins a comment;
the comment continues until the next newline.
<A NAME="IDX44"></A>
If you are concerned about portability, be aware that
some implementations of SED (which are not POSIX.2
conformant) may only support a single one-line comment,
and then only when the very first character of the script is a <CODE>#</CODE>.
<A NAME="IDX45"></A>
<A NAME="IDX46"></A>
Warning: if the first two characters of the SED script
are <CODE>#n</CODE>, then the <CODE>-n</CODE> (no-autoprint) option is forced.
If you want to put a comment in the first line of your script
and that comment begins with the letter `n'
and you do not want this behavior,
then be sure to either use a capital `N',
or place at least one space before the `n'.
<DT><SAMP>`s/<VAR>regexp</VAR>/<VAR>replacement</VAR>/<VAR>flags</VAR>'</SAMP>
<DD>
(The <CODE>/</CODE> characters may be uniformly replaced by
any other single character within any given <CODE>s</CODE> command.)
<A NAME="IDX47"></A>
<A NAME="IDX48"></A>
<A NAME="IDX49"></A>
The <CODE>/</CODE> character (or whatever other character is used in its stead)
can appear in the <VAR>regexp</VAR> or <VAR>replacement</VAR>
only if it is preceded by a <CODE>\</CODE> character.
A newline may also be matched in the <VAR>regexp</VAR> using the
two-character sequence <CODE>\n</CODE>.
The <CODE>s</CODE> command attempts to match the pattern
space against the supplied <VAR>regexp</VAR>.
If the match is successful, then that portion of the pattern
space which was matched is replaced with <VAR>replacement</VAR>.
<A NAME="IDX50"></A>
<A NAME="IDX51"></A>
The <VAR>replacement</VAR> can contain <CODE>\<VAR>n</VAR></CODE> (<VAR>n</VAR> being
a number from 1 to 9, inclusive) references, which refer to
the portion of the match which is contained between the <VAR>n</VAR>th
<CODE>\(</CODE> and its matching <CODE>\)</CODE>.
Also, the <VAR>replacement</VAR> can contain unescaped <CODE>&#38;</CODE>
characters which will reference the whole matched portion
of the pattern space.
To include a literal <CODE>\</CODE>, <CODE>&#38;</CODE>, or newline in the final
replacement, be sure to precede the desired <CODE>\</CODE>, <CODE>&#38;</CODE>,
or newline in the <VAR>replacement</VAR> with a <CODE>\</CODE>.
<A NAME="IDX52"></A>
<A NAME="IDX53"></A>
<A NAME="IDX54"></A>
The <CODE>s</CODE> command can be followed with zero or more of the
following <VAR>flags</VAR>:
<DL COMPACT>
<DT><SAMP>`g'</SAMP>
<DD>
<A NAME="IDX55"></A>
<A NAME="IDX56"></A>
Apply the replacement to <EM>all</EM> matches to the <VAR>regexp</VAR>,
not just the first.
<DT><SAMP>`p'</SAMP>
<DD>
<A NAME="IDX57"></A>
If the substitution was made, then print the new pattern space.
<DT><SAMP>`<VAR>number</VAR>'</SAMP>
<DD>
<A NAME="IDX58"></A>
Only replace the <VAR>number</VAR>th match of the <VAR>regexp</VAR>.
<DT><SAMP>`w <VAR>file-name</VAR>'</SAMP>
<DD>
<A NAME="IDX59"></A>
If the substitution was made, then write out the result to the named file.
<DT><SAMP>`I'</SAMP>
<DD>
(This is a GNU extension.)
<A NAME="IDX60"></A>
<A NAME="IDX61"></A>
Match <VAR>regexp</VAR> in a case-insensitive manner.
</DL>
<DT><SAMP>`q'</SAMP>
<DD>
[At most one address allowed.]
<A NAME="IDX62"></A>
<A NAME="IDX63"></A>
Exit SED without processing any more commands or input.
Note that the current pattern space is printed
if auto-print is not disabled.
<DT><SAMP>`d'</SAMP>
<DD>
<A NAME="IDX64"></A>
<A NAME="IDX65"></A>
Delete the pattern space;
immediately start next cycle.
<DT><SAMP>`p'</SAMP>
<DD>
<A NAME="IDX66"></A>
<A NAME="IDX67"></A>
Print out the pattern space (to the standard output).
This command is usually only used in conjunction with the <CODE>-n</CODE>
command-line option.
<A NAME="IDX68"></A>
Note: some implementations of SED, such as this one, will
double-print lines when auto-print is not disabled and the <CODE>p</CODE>
command is given.
Other implementations will only print the line once.
Both ways conform with the POSIX.2 standard, and so neither
way can be considered to be in error.
<A NAME="IDX69"></A>
Portable SED scripts should thus avoid relying on either behavior;
either use the <CODE>-n</CODE> option and explicitly print what you want,
or avoid use of the <CODE>p</CODE> command (and also the <CODE>p</CODE> flag to the
<CODE>s</CODE> command).
<DT><SAMP>`n'</SAMP>
<DD>
<A NAME="IDX70"></A>
<A NAME="IDX71"></A>
<A NAME="IDX72"></A>
If auto-print is not disabled, print the pattern space,
then, regardless, replace the pattern space with the next line of input.
If there is no more input then SED exits without processing
any more commands.
<DT><SAMP>`{ <VAR>commands</VAR> }'</SAMP>
<DD>
<A NAME="IDX73"></A>
<A NAME="IDX74"></A>
<A NAME="IDX75"></A>
A group of commands may be enclosed between
<CODE>{</CODE> and <CODE>}</CODE> characters.
(The <CODE>}</CODE> must appear in a zero-address command context.)
This is particularly useful when you want a group of commands
to be triggered by a single address (or address-range) match.
</DL>
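<P>
The commands in this section can be exercised with a few one-liners.
This is a sketch rather than a reference: it assumes GNU SED and a POSIX
shell, and the sample inputs and variable names are hypothetical.
</P>

```shell
# s with back-references: swap two space-separated words.
swapped=$(printf 'hello world\n' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

# s with the g flag replaces every match; a numeric flag replaces only
# the match with that ordinal.
all_matches=$(printf 'a-a-a\n' | sed 's/a/X/g')
second_match=$(printf 'a-a-a\n' | sed 's/a/X/2')

# An alternative delimiter avoids escaping the slashes in a path.
new_path=$(printf '/usr/bin\n' | sed 's,/usr,/opt,')

# q with one address: quit after line 2, which is still auto-printed.
first_two=$(printf '%s\n' 1 2 3 4 | sed '2q')

# d: delete lines that start with "#".
no_comments=$(printf '%s\n' '#note' 'code' | sed '/^#/d')

# -n plus the p flag: print only lines where a substitution was made.
changed_only=$(printf '%s\n' foo bar | sed -n 's/foo/baz/p')

# { } with n: for each line matching /key/, fetch the next line and print it.
after_key=$(printf '%s\n' key value other | sed -n '/key/{n;p}')

printf '%s\n' "$swapped" "$all_matches" "$second_match" "$new_path" \
  "$first_two" "$no_comments" "$changed_only" "$after_key"
```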
<H2><A NAME="SEC8" HREF="sed_toc.html#TOC8">Less frequently used commands</A></H2>
<P>
Though perhaps less frequently used than those in the previous
section, some very small yet useful SED scripts can be built with
these commands.
</P>
<DL COMPACT>
<DT><SAMP>`y/<VAR>source-chars</VAR>/<VAR>dest-chars</VAR>/'</SAMP>
<DD>
(The <CODE>/</CODE> characters may be uniformly replaced by
any other single character within any given <CODE>y</CODE> command.)
<A NAME="IDX76"></A>
<A NAME="IDX77"></A>
Transliterate any characters in the pattern space which match
any of the <VAR>source-chars</VAR> with the corresponding character
in <VAR>dest-chars</VAR>.
Instances of the <CODE>/</CODE> (or whatever other character is used in its stead),
<CODE>\</CODE>, or newlines can appear in the <VAR>source-chars</VAR> or <VAR>dest-chars</VAR>
lists, provided that each instance is escaped by a <CODE>\</CODE>.
The <VAR>source-chars</VAR> and <VAR>dest-chars</VAR> lists <EM>must</EM>
contain the same number of characters (after de-escaping).
<DT><SAMP>`a\'</SAMP>
<DD>
<DT><SAMP>`<VAR>text</VAR>'</SAMP>
<DD>
[At most one address allowed.]
<A NAME="IDX78"></A>
<A NAME="IDX79"></A>
<A NAME="IDX80"></A>
Queue the lines of text which follow this command
(each but the last ending with a <CODE>\</CODE>,
which will be removed from the output)
to be output at the end of the current cycle,
or when the next input line is read.
<DT><SAMP>`i\'</SAMP>
<DD>
<DT><SAMP>`<VAR>text</VAR>'</SAMP>
<DD>
[At most one address allowed.]
<A NAME="IDX81"></A>
<A NAME="IDX82"></A>
<A NAME="IDX83"></A>
Immediately output the lines of text which follow this command
(each but the last ending with a <CODE>\</CODE>,
which will be removed from the output).
<DT><SAMP>`c\'</SAMP>
<DD>
<DT><SAMP>`<VAR>text</VAR>'</SAMP>
<DD>
<A NAME="IDX84"></A>
<A NAME="IDX85"></A>
<A NAME="IDX86"></A>
Delete the lines matching the address or address-range,
and output the lines of text which follow this command
(each but the last ending with a <CODE>\</CODE>,
which will be removed from the output)
in place of the last line
(or in place of each line, if no addresses were specified).
A new cycle is started after this command is done,
since the pattern space will have been deleted.
<DT><SAMP>`='</SAMP>
<DD>
[At most one address allowed.]
<A NAME="IDX87"></A>
<A NAME="IDX88"></A>
<A NAME="IDX89"></A>
Print out the current input line number (with a trailing newline).
<DT><SAMP>`l'</SAMP>
<DD>
<A NAME="IDX90"></A>
<A NAME="IDX91"></A>
<A NAME="IDX92"></A>
Print the pattern space in an unambiguous form:
non-printable characters (and the <CODE>\</CODE> character)
are printed in C-style escaped form;
long lines are split, with a trailing <CODE>\</CODE> character
to indicate the split; the end of each line is marked
with a <CODE>$</CODE>.
<DT><SAMP>`r <VAR>filename</VAR>'</SAMP>
<DD>
[At most one address allowed.]
<A NAME="IDX93"></A>
<A NAME="IDX94"></A>
<A NAME="IDX95"></A>
Queue the contents of <VAR>filename</VAR> to be read and
inserted into the output stream at the end of the current cycle,
or when the next input line is read.
Note that if <VAR>filename</VAR> cannot be read, it is treated as
if it were an empty file, without any error indication.
<DT><SAMP>`w <VAR>filename</VAR>'</SAMP>
<DD>
<A NAME="IDX96"></A>
<A NAME="IDX97"></A>
Write the pattern space to <VAR>filename</VAR>.
The <VAR>filename</VAR> will be created (or truncated) before the
first input line is read; all <CODE>w</CODE> commands (including
instances of the <CODE>w</CODE> flag on successful <CODE>s</CODE> commands)
that refer to the same <VAR>filename</VAR> write through
the same FILE stream.
<DT><SAMP>`D'</SAMP>
<DD>
<A NAME="IDX98"></A>
<A NAME="IDX99"></A>
Delete text in the pattern space up to the first newline.
If any text is left, restart cycle with the resultant
pattern space (without reading a new line of input),
otherwise start a normal new cycle.
<DT><SAMP>`N'</SAMP>
<DD>
<A NAME="IDX100"></A>
<A NAME="IDX101"></A>
<A NAME="IDX102"></A>
Add a newline to the pattern space,
then append the next line of input to the pattern space.
If there is no more input then SED exits without processing
any more commands.
<DT><SAMP>`P'</SAMP>
<DD>
<A NAME="IDX103"></A>
<A NAME="IDX104"></A>
Print out the portion of the pattern space up to the first newline.
<DT><SAMP>`h'</SAMP>
<DD>
<A NAME="IDX105"></A>
<A NAME="IDX106"></A>
<A NAME="IDX107"></A>
<A NAME="IDX108"></A>
Replace the contents of the hold space with the contents of the pattern space.
<DT><SAMP>`H'</SAMP>
<DD>
<A NAME="IDX109"></A>
<A NAME="IDX110"></A>
<A NAME="IDX111"></A>
Append a newline to the contents of the hold space,
and then append the contents of the pattern space to that of the hold space.
<DT><SAMP>`g'</SAMP>
<DD>
<A NAME="IDX112"></A>
<A NAME="IDX113"></A>
<A NAME="IDX114"></A>
<A NAME="IDX115"></A>
Replace the contents of the pattern space with the contents of the hold space.
<DT><SAMP>`G'</SAMP>
<DD>
<A NAME="IDX116"></A>
<A NAME="IDX117"></A>
<A NAME="IDX118"></A>
Append a newline to the contents of the pattern space,
and then append the contents of the hold space to that of the pattern space.
<DT><SAMP>`x'</SAMP>
<DD>
<A NAME="IDX119"></A>
<A NAME="IDX120"></A>
<A NAME="IDX121"></A>
Exchange the contents of the hold and pattern spaces.
</DL>
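<P>
A few of the commands above are shown in the following sketch. It assumes
GNU SED and a POSIX shell, and all sample inputs and variable names are
hypothetical. The last example combines <CODE>G</CODE> and <CODE>h</CODE>
to accumulate lines in the hold space in reverse order.
</P>

```shell
# y: transliterate a, b, c to their uppercase forms.
upper=$(printf 'cabbage\n' | sed 'y/abc/ABC/')

# =: print the input line number before each line.
numbered=$(printf '%s\n' x y | sed '=')

# i\ followed by text on the next script line: output the text before line 2.
inserted=$(printf '%s\n' one two | sed '2i\
--- inserted ---')

# N: append the next line to the pattern space, then replace the embedded
# newline with a comma, joining the input in pairs.
joined=$(printf '%s\n' a b c d | sed 'N;s/\n/,/')

# G and h: on every line but the first, append the hold space to the
# pattern space; then save the result. Print only at the last line.
reversed=$(printf '%s\n' a b c | sed -n '1!G;h;$p')

printf '%s\n' "$upper" "$numbered" "$inserted" "$joined" "$reversed"
```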
<H2><A NAME="SEC9" HREF="sed_toc.html#TOC9">Commands for die-hard SED programmers</A></H2>
<P>
In most cases, use of these commands indicates that you are
probably better off programming in something like Perl.
But occasionally one is committed to sticking with SED,
and these commands enable one to write quite convoluted
scripts.
</P>
<P>
<A NAME="IDX122"></A>
<DL COMPACT>
<DT><SAMP>`: <VAR>label</VAR>'</SAMP>
<DD>
[No addresses allowed.]
<A NAME="IDX123"></A>
<A NAME="IDX124"></A>
Specify the location of <VAR>label</VAR> for the <CODE>b</CODE> and <CODE>t</CODE> commands.
In all other respects, a no-op.
<DT><SAMP>`b <VAR>label</VAR>'</SAMP>
<DD>
<A NAME="IDX125"></A>
<A NAME="IDX126"></A>
<A NAME="IDX127"></A>
Unconditionally branch to <VAR>label</VAR>.
The <VAR>label</VAR> may be omitted, in which case the branch goes to the end of the script
(so the current cycle ends and the next one is started).
<DT><SAMP>`t <VAR>label</VAR>'</SAMP>
<DD>
<A NAME="IDX128"></A>
<A NAME="IDX129"></A>
<A NAME="IDX130"></A>
Branch to <VAR>label</VAR> only if there has been a successful <CODE>s</CODE>ubstitution
since the last input line was read or a <CODE>t</CODE> branch was taken.
The <VAR>label</VAR> may be omitted, in which case the branch goes to the end of the script
(so the current cycle ends and the next one is started).
</DL>
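<P>
Labels and branches make loops possible. The two loops below are a minimal
sketch, assuming a POSIX shell; the inputs, the label name <CODE>a</CODE>,
and the variable names are hypothetical. Each command is passed in its own
<CODE>-e</CODE> option so that the label name ends at the end of its
script fragment.
</P>

```shell
# b: gather the whole input into the pattern space. N appends the next
# line; "$!ba" branches back to label a on every line except the last.
# The embedded newlines are then replaced with spaces.
one_line=$(printf '%s\n' a b c | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')

# t: repeat a substitution until it no longer succeeds. Each pass strips
# one innermost pair of parentheses.
unwrapped=$(printf '((x))\n' | sed -e ':a' -e 's/(\([^()]*\))/\1/' -e 'ta')

printf '%s\n' "$one_line" "$unwrapped"
```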
<P><HR><P>
</BODY>
</HTML>