1025 lines
26 KiB
Groff
1025 lines
26 KiB
Groff
.\"
|
|
.\" $Id: ispell.4,v 1.16 91/09/12 00:01:40 geoff Exp $
|
|
.\"
|
|
.\" Copyright 1988, 1989, by Geoff Kuenning, Manhattan Beach, CA
|
|
.\" Permission for non-profit use is hereby granted.
|
|
.\" All other rights reserved.
|
|
.\" See "version.h" for a more complete copyright notice.
|
|
.\"
|
|
.\" $Log: ispell.4,v $
|
|
.\" Revision 1.16 91/09/12 00:01:40 geoff
|
|
.\" Update the documentation of "texchars" to reflect the new requirements.
|
|
.\"
|
|
.\" Revision 1.15 91/05/27 21:47:52 geoff
|
|
.\" Add descriptions of the new statement ordering restrictions and of the
|
|
.\" "defstringtype" and "altstringtype" statements.
|
|
.\"
|
|
.\" Revision 1.14 90/04/26 22:44:03 geoff
|
|
.\" Add a description of the altstringchar statement.
|
|
.\"
|
|
.\" Revision 1.13 89/06/25 15:29:13 geoff
|
|
.\" Major changes to document new statements and options, primarily the
|
|
.\" introduction of string and boundary characters and the folding of
|
|
.\" case-sensitive and non-case-sensitive characters into one statement.
|
|
.\" Most of the text of these changes was taken from the english.aff file.
|
|
.\" Also mention that affix conversion can now be handled by the
|
|
.\" munchlist script.
|
|
.\"
|
|
.\" Revision 1.12 89/04/28 01:10:09 geoff
|
|
.\" Change Header to Id; nobody cares about my pathnames.
|
|
.\"
|
|
.\" Revision 1.11 89/02/22 23:13:45 geoff
|
|
.\" Correct the discussion of case in the root and flags so that it
|
|
.\" mentions the various options with which ispell can be built. Also mention
|
|
.\" that the input dictionary should be sorted for proper operation of
|
|
.\" buildhash (the text previously, incorrectly, stated that sorting was
|
|
.\" not needed).
|
|
.\"
|
|
.\" Revision 1.10 89/02/18 00:52:02 geoff
|
|
.\" Fix a minor English error, and the examples for "nroffchars" and "texchars".
|
|
.\" (Ken Stevens)
|
|
.\"
|
|
.\" Revision 1.9 88/12/26 02:27:27 geoff
|
|
.\" Update the copyright notice.
|
|
.\"
|
|
.\" Revision 1.8 88/11/27 00:17:16 geoff
|
|
.\" Fix a minor error in the discussion of how the case of an affix
|
|
.\" is matched to the case of the root.
|
|
.\"
|
|
.\" Revision 1.7 88/06/25 17:47:51 geoff
|
|
.\" Add descriptions of the new option-selection statements: nroffchars,
|
|
.\" texchars, allaffixes, and compoundwords.
|
|
.\"
|
|
.\" Revision 1.6 88/03/12 02:44:10 geoff
|
|
.\" Make the title line mention that ispell is local.
|
|
.\"
|
|
.\" Revision 1.5 87/09/26 13:40:48 geoff
|
|
.\" Document how to convert affixes to a new language table correctly.
|
|
.\"
|
|
.\" Revision 1.4 87/09/14 22:38:34 geoff
|
|
.\" Add copyright comments
|
|
.\"
|
|
.\" Revision 1.3 87/09/09 00:17:19 geoff
|
|
.\" Add a description of how the character-set statements define
|
|
.\" collating order. Also fix a typo in an example.
|
|
.\"
|
|
.\" Revision 1.2 87/08/28 21:20:25 geoff
|
|
.\" Add documentation of the affix file format. Remove English-specific stuff,
|
|
.\" which is now in english.4.
|
|
.\"
|
|
.\" Revision 1.1 87/05/30 16:50:28 geoff
|
|
.\" Initial revision
|
|
.\"
|
|
.\"
|
|
.TH ISPELL 4 local
|
|
.SH NAME
|
|
ispell \- format of ispell dictionaries and affix files
|
|
.SH DESCRIPTION
|
|
.PP
|
|
.IR Ispell (1)
|
|
requires two files to define the language that it is spell-checking.
|
|
The first file is a dictionary containing words for the language,
|
|
and the second is an "affix" file that defines the meaning of special
|
|
flags in the dictionary.
|
|
The two files are combined by
|
|
.I buildhash
|
|
(see
|
|
.IR ispell "(1))"
|
|
and written to a hash file which is not described here.
|
|
.PP
|
|
A raw
|
|
.I ispell
|
|
dictionary (either the main dictionary or your own personal
|
|
dictionary) contains a list of words, one per line.
|
|
Each word may optionally be followed by a slash ("/") and one or more
|
|
flags, which modify the root word as explained below.
|
|
Depending on the options with which
|
|
.I ispell
|
|
was built, case may or
|
|
may not be significant in either the root word or the flags, independently.
|
|
Specifically, if the compile-time option CAPITALIZATION is defined, case
|
|
is significant in the root word;
|
|
if not, case is ignored in the root word.
|
|
If the compile-time option MASKBITS is set to a value of 32, case is ignored
|
|
in the flags;
|
|
otherwise case is significant in the flags.
|
|
Contact your system administrator or
|
|
.I ispell
|
|
maintainer for more information (or just experiment).
|
|
The dictionary should be sorted with the
|
|
.B \-f
|
|
flag of
|
|
.IR sort (1)
|
|
before the hash file is built;
|
|
this is done automatically by
|
|
.IR munchlist (1),
|
|
which is the normal way of building dictionaries.
|
|
.PP
|
|
The case of the root word controls the case of words accepted by
|
|
.IR ispell ,
|
|
as follows:
|
|
.IP (1)
|
|
If the root word appears only in lower case (e.g.,
|
|
.IR bob "),"
|
|
it will be accepted in lower case, capitalized, or all capitals.
|
|
.IP (2)
|
|
If the root word appears capitalized (e.g.,
|
|
.IR Robert "),"
|
|
it will not
|
|
be accepted in
|
|
all-lower case, but will be accepted capitalized or all in capitals.
|
|
.IP (3)
|
|
If the root word appears all in capitals (e.g.,
|
|
.IR UNIX "),"
|
|
it will only be accepted all in capitals.
|
|
.IP (4)
|
|
If the root word appears with a "funny" capitalization (e.g.,
|
|
.IR ITCorp "),"
|
|
a word will be accepted only if it follows that capitalization, or if
|
|
it appears all in capitals.
|
|
.IP (5)
|
|
More than one capitalization of a root word may appear in the dictionary.
|
|
Flags from different capitalizations are combined by OR-ing them together.
|
|
.PP
|
|
Redundant capitalizations (e.g.,
|
|
.I bob
|
|
and
|
|
.IR Bob ")"
|
|
will be combined
|
|
by
|
|
.I buildhash
|
|
and by
|
|
.I ispell
|
|
(for personal dictionaries),
|
|
and can be removed from a raw dictionary by
|
|
.IR munchlist .
|
|
.PP
|
|
For example, the dictionary:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
bob
|
|
Robert
|
|
UNIX
|
|
ITcorp
|
|
ITCorp
|
|
.fi
|
|
.RE
|
|
.PP
|
|
will accept
|
|
.IR bob ,
|
|
.IR Bob ,
|
|
.IR BOB ,
|
|
.IR Robert ,
|
|
.IR ROBERT ,
|
|
.IR UNIX ,
|
|
.IR ITcorp ,
|
|
.IR ITCorp ,
|
|
and
|
|
.IR ITCORP ,
|
|
and will reject all others.
|
|
Some of the unacceptable forms are
|
|
.IR bOb ,
|
|
.IR robert ,
|
|
.IR Unix ,
|
|
and
|
|
.IR ItCorp .
|
|
.PP
|
|
As mentioned above, root words in any dictionary may be extended by flags.
|
|
Each flag is a single alphabetic character, which represents a prefix or
|
|
suffix
|
|
that may be added to the root to form a new word.
|
|
For example, in an English dictionary
|
|
the
|
|
.B D flag can be added to
|
|
.I bathe
|
|
to make
|
|
.IR bathed .
|
|
Since flags are represented as a single bit in the hashed dictionary, this
|
|
results in significant space savings.
|
|
The
|
|
.I munchlist
|
|
script will reduce an existing raw dictionary by adding flags when possible.
|
|
.PP
|
|
When a word is extended with an affix, the affix will be accepted only
|
|
if it appears in the same case
|
|
as the initial (prefix) or final (suffix) letter of the word.
|
|
Thus, for example, the entry
|
|
.I UNIX/M
|
|
in the main dictionary
|
|
.RB "(" M
|
|
means
|
|
add an apostrophe and an "s" to make a possessive) would accept
|
|
.I "UNIX'S"
|
|
but would reject
|
|
.IR "UNIX's" .
|
|
If
|
|
.I "UNIX's"
|
|
is legal, it must appear as a separate dictionary entry,
|
|
and it will not be combined by
|
|
.IR munchlist .
|
|
(In general, you don't need to worry about these things;
|
|
.I munchlist
|
|
guarantees that its output dictionary will accept the same set of
|
|
words as its input, so all you have to do is add words to the dictionary
|
|
and occasionally run munchlist to reduce its size).
|
|
.PP
|
|
As mentioned, the affix definition file describes the affixes associated
|
|
with particular flags.
|
|
It also describes the character set used by the language.
|
|
.PP
|
|
Although the affix-definition
|
|
grammar is designed for a line-oriented layout, it is actually
|
|
a free-format yacc grammar and can be laid out weirdly if you want.
|
|
Comments are started by a pound (sharp) sign (#),
|
|
and continue to the end of the line.
|
|
Backslashes are supported in the usual fashion (\fB\e\fInnn\fR, plus
|
|
specials
|
|
.BR \en ,
|
|
.BR \er ,
|
|
.BR \et ,
|
|
.BR \ev ,
|
|
.BR \ef ,
|
|
.BR \eb ,
|
|
and the new hex format \fB\ex\fInn\fR).
|
|
Any character
|
|
with special meaning to the parser can be changed to an uninterpreted
|
|
token by backslashing it;
|
|
for example, you can declare a flag named
|
|
'asterisk' or 'colon' with
|
|
.I "flag \e*:"
|
|
or
|
|
.IR "flag \e::" .
|
|
.PP
|
|
The grammar will be presented in a top-down fashion, with discussion
|
|
of each element.
|
|
An affix-definition file must contain exactly one table:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fItable\fR : [\fIheaders\fR] [\fIprefixes\fR] [\fIsuffixes\fR]
|
|
.fi
|
|
.RE
|
|
.PP
|
|
At least one of
|
|
.I prefixes
|
|
and
|
|
.I suffixes
|
|
is required.
|
|
They can appear in either order.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIheaders\fR : [ \fIoptions\fR ] \fIchar-sets\fR
|
|
.fi
|
|
.RE
|
|
.PP
|
|
The headers describe options global to this dictionary and language.
|
|
These include the character sets to be used and the formatter, and
|
|
the defaults for certain
|
|
.I ispell
|
|
flags.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIoptions\fR : { \fIfmtr-stmt\fR | \fIbool-stmt\fR | \fIflag-stmt\fR }
|
|
.fi
|
|
.RE
|
|
.PP
|
|
The options statements define the defaults for certain ispell flags
|
|
and for the character sets used by the formatters.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIfmtr-stmt\fR : { \fInroff-stmt\fR | \fItex-stmt\fR }
|
|
.fi
|
|
.RE
|
|
.PP
|
|
A
|
|
.I fmtr-stmt
|
|
describes characters that have special meaning to a formatter.
|
|
Normally, this statement is not necessary, but some languages may have
|
|
preempted the usual defaults for use as language-specific characters.
|
|
In this case, these statements may be used to redefine the special characters
|
|
expected by the formatter.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fInroff-stmt\fR : { \fBnroffchars\fR | \fBtroffchars\fR } \fIstring\fR
|
|
.fi
|
|
.RE
|
|
.PP
|
|
The
|
|
.B nroffchars
|
|
statement allows redefinition of certain
|
|
.I nroff
|
|
control characters.
|
|
The string given must be exactly five characters long, and must list
|
|
substitutions for the left and right parentheses ("()") , the period ("."),
|
|
the backslash ("\e"), and the asterisk ("*").
|
|
(The right parenthesis is not currently used, but is included for
|
|
completeness.)
|
|
For example, the statement:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fBnroffchars\fR {}.\e\e*
|
|
.fi
|
|
.RE
|
|
would replace the left and right parentheses with left and right curly
|
|
braces for purposes of parsing
|
|
.IR nroff / troff
|
|
strings, with no effect on the others (admittedly a contrived example).
|
|
Note that the backslash is escaped with a backslash.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fItex-stmt\fR : { \fBTeXchars\fR | \fBtexchars\fR } \fIstring\fR
|
|
.fi
|
|
.RE
|
|
.PP
|
|
The
|
|
.B TeXchars
|
|
statement allows redefinition of certain TeX/LaTeX control characters.
|
|
The string given must be exactly thirteen characters long, and must list
|
|
substitutions for the left and right parentheses ("()") , the left
|
|
and right square brackets ("[]"), the left and right curly braces ("{}"),
|
|
the left and right angle brackets ("<>"),
|
|
the backslash ("\e"), the dollar sign ("$"), the asterisk ("*"),
|
|
the period or dot ("."), and the percent sign ("%").
|
|
For example, the statement:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fBtexchars\fR ()\e[\|]<\e><\e>\e\e$*.%
|
|
.fi
|
|
.RE
|
|
.PP
|
|
would replace the functions of the left and right curly braces with the
|
|
left and right angle brackets for purposes of parsing TeX/LaTeX constructs,
|
|
while retaining their functions for the
|
|
.I tib
|
|
bibliographic preprocessor.
|
|
Note that the backslash, the left square bracket, and the right angle bracket
|
|
must be escaped with a backslash.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIbool-stmt\fR : { \fIcmpnd-stmt\fR | \fIaff-stmt\fR }
|
|
.sp
|
|
\fIcmpnd-stmt\fR : \fBcompoundwords\fR \fIon-or-off\fR
|
|
.sp
|
|
\fIaff-stmt\fR : \fBallaffixes\fR\fIon-or-off\fR
|
|
.sp
|
|
\fIon-or-off\fR : { \fBon\fR | \fBoff\fR }
|
|
.fi
|
|
.RE
|
|
.PP
|
|
A
|
|
.I bool-stmt
|
|
controls certain ispell defaults that are best made language-specific.
|
|
The
|
|
.B allaffixes
|
|
statement controls the default for the
|
|
.B \-P
|
|
and
|
|
.B \-m
|
|
options to
|
|
.I ispell.
|
|
If
|
|
.B allaffixes
|
|
is turned
|
|
.B off
|
|
(the default),
|
|
.I ispell
|
|
will default to the behavior of the
|
|
.I \-P
|
|
flag:
|
|
root/affix suggestions will only be made if there are no "near misses".
|
|
If
|
|
.B allaffixes
|
|
is turned
|
|
.BR on ,
|
|
.I ispell
|
|
will default to the behavior of the
|
|
.I \-m
|
|
flag:
|
|
root/affix suggestions will always be made.
|
|
The
|
|
.B compoundwords
|
|
statement controls the default for the
|
|
.B \-B
|
|
and
|
|
.B \-C
|
|
options to
|
|
.I ispell.
|
|
If
|
|
.B compoundwords
|
|
is turned
|
|
.B off
|
|
(the default),
|
|
.I ispell
|
|
will default to the behavior of the
|
|
.I \-B
|
|
flag:
|
|
run-together words will be reported as errors.
|
|
If
|
|
.B compoundwords
|
|
is turned
|
|
.BR on ,
|
|
.I ispell
|
|
will default to the behavior of the
|
|
.I \-C
|
|
flag:
|
|
run-together words will be considered as compounds if both are in
|
|
the dictionary.
|
|
This is useful for languages such as German and Norwegian, which
|
|
form large numbers of compound words.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIflag-stmt\fR : \fBflagmarker\fR \fIcharacter\fR
|
|
.fi
|
|
.RE
|
|
.PP
|
|
The
|
|
.B flagmarker
|
|
statement describes the character which is used to separate affix
|
|
flags from the root word in a raw dictionary file.
|
|
This must be a
|
|
character which is not found in any word (including in string characters;
|
|
see below).
|
|
The default is "/" because this character is not normally
|
|
used to represent special characters in any language.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIchar-sets\fR : \fInorm-sets\fR [ \fIalt-sets\fR ]
|
|
.fi
|
|
.RE
|
|
.PP
|
|
The character-set section describes the characters that can be part of
|
|
a word, and defines their collating order.
|
|
There must always be a definition of "normal" character sets; in
|
|
addition, there may be one or more partial definitions of "alternate"
|
|
sets which are used with various text formatters.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fInorm-sets\fR : [ \fIdeftype\fR ] charset-group
|
|
.fi
|
|
.RE
|
|
.PP
|
|
A "normal" character set may optionally begin with a
|
|
definition of the file suffixes that make use of this set.
|
|
Following this are one or more character-set declarations.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIdeftype\fR : \fBdefstringtype\fR \fIname\fR \fIsuffix\fR*
|
|
.fi
|
|
.RE
|
|
.PP
|
|
The
|
|
.B defstringtype
|
|
declaration gives a list of file suffixes which should make use of the
|
|
default string characters defined as part of the base character set;
|
|
it is only necessary if string characters are being defined.
|
|
The
|
|
.I name
|
|
is a string giving the name associated with these suffixes;
|
|
usually it is a formatter name.
|
|
If the formatter is a member of the troff family, "nroff" should be
|
|
used for the name associated with the most popular macro package;
|
|
members of the TeX family should use "tex".
|
|
Other names may be chosen freely, but they should be kept simple,
|
|
as they are used in
|
|
.I ispell 's
|
|
.B \-T
|
|
switch to specify a formatter type.
|
|
The suffixes are a whitespace-separated list of strings which, if
|
|
present at the end of a filename, indicate that the associated set of
|
|
string characters should be used by default for this file. For
|
|
example, the suffix list for the troff family typically includes
|
|
suffixes such as ".ms", ".me", ".mm", etc.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIcharset-group\fR : { \fIchar-stmt\fR | \fIstring-stmt\fR | \fIdup-stmt\fR}*
|
|
.fi
|
|
.RE
|
|
.PP
|
|
A
|
|
.I char-stmt
|
|
describes single characters;
|
|
a
|
|
.I string-stmt
|
|
describes characters that must appear together as a string, and which
|
|
usually represent a single character in the target language.
|
|
Either may
|
|
also describe conversion between upper and lower case.
|
|
A
|
|
.I dup-stmt
|
|
is used to describe alternate forms of string characters, so that a
|
|
single dictionary may be used with several formatting
|
|
programs that use different conventions for representing non-ASCII
|
|
characters.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIchar-stmt\fR : \fBwordchars\fR \fIcharacter-range\fR
|
|
| \fBwordchars\fR \fIlowercase-range\fR \fIuppercase-range\fR
|
|
| \fBboundarychars\fR \fIcharacter-range\fR
|
|
| \fBboundarychars\fR \fIlowercase-range\fR \fIuppercase-range\fR
|
|
\fIstring-stmt\fR : \fBstringchar\fR \fIstring\fR
|
|
| \fBstringchar\fR \fIlowercase-string\fR \fIuppercase-string\fR
|
|
.fi
|
|
.RE
|
|
.PP
|
|
Characters described with the
|
|
.B boundarychars
|
|
statement are considered
|
|
part of a word only if they are embedded between characters declared with the
|
|
.B wordchars
|
|
or
|
|
.B stringchar
|
|
statements.
|
|
.PP
|
|
If two ranges or strings are given in a
|
|
.I char-stmt
|
|
or
|
|
.IR string-stmt , the first describes
|
|
characters that are interpreted as lowercase and the second describes
|
|
uppercase.
|
|
In the case of a
|
|
.B stringchar
|
|
statement, the two strings must be of the same length.
|
|
Also, in a
|
|
.B stringchar
|
|
statement, the actual strings may contain
|
|
both uppercase and characters themselves without difficulty;
|
|
for instance, the statement
|
|
.PP
|
|
.RS
|
|
.nf
|
|
stringchar "\e\e*(sS" "\e\e*(Ss"
|
|
.fi
|
|
.RE
|
|
.PP
|
|
is legal and will not interfere with (or be interfered with by) other
|
|
declarations of of "s" and "S" as lower and upper case, respectively.
|
|
.PP
|
|
A final note on string characters:
|
|
some languages collate certain special characters as if they were strings.
|
|
For example, the German "a-umlaut"
|
|
is traditionally sorted as if it were "ae".
|
|
Ispell is not capable of this;
|
|
each character must be treated as an individual entity.
|
|
So in certain cases,
|
|
ispell will sort a list of words into a different order than the standard
|
|
"dictionary" order for the target language.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIalt-sets\fR : \fIalttype\fR \fIalt-stmt\fR*
|
|
.fi
|
|
.RE
|
|
.PP
|
|
Because different formatters use different notations to represent
|
|
non-ASCII characters,
|
|
.I ispell
|
|
must be aware of the representations used by these formatters.
|
|
These are declared as alternate sets of string characters.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIalttype\fR : \fBaltstringtype\fR \fIname\fR \fIsuffix\fR*
|
|
.fi
|
|
.RE
|
|
The
|
|
.B altstringtype
|
|
statement introduces each set by declaring the associated formatter
|
|
name and filename suffix list.
|
|
This name and list are interpreted exactly as in the
|
|
.B defstringtype
|
|
statement above.
|
|
Following this header are one or more \fIalt-stmt\fRs which declare
|
|
the alternate string characters used by this formatter.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIalt-stmt\fR : \fBaltstringchar\fR \fIalt-string\fR \fIstd-string\fR
|
|
.fi
|
|
.RE
|
|
.Pp
|
|
The
|
|
.I altstringchar
|
|
statement describes alternate representations for string
|
|
characters.
|
|
For example, the \-mm macro package of
|
|
.I troff
|
|
represents the German "a-umlaut" as
|
|
.IR a\e*: , while
|
|
.I TeX
|
|
uses the sequence \fI\e"a\fR.
|
|
If the
|
|
.I troff
|
|
versions are declared as the standard versions using
|
|
.BR stringchar ,
|
|
the
|
|
.I TeX
|
|
versions may be declared as alternates by using the statement
|
|
.PP
|
|
.RS
|
|
.nf
|
|
altstringchar \e\e\e"a a\e\e*\:
|
|
.fi
|
|
.RE
|
|
.PP
|
|
When the
|
|
.B altstringchar
|
|
statement is used to specify alternate forms,
|
|
all forms for a particular formatter must be declared together as a group.
|
|
Also, each formatter or macro package
|
|
must provide a complete set of characters, both
|
|
upper- and lower-case, and the character sequences used for each formatter
|
|
must be completely distinct.
|
|
Character sequences which describe upper- and lower-case versions of
|
|
the same printable character must also be the same length.
|
|
It may be necessary to define some new macros for a given formatter to
|
|
satisfy these restrictions.
|
|
(The current version of
|
|
.I buildhash
|
|
does not enforce these restrictions, but failure to obey them may
|
|
result in errors being introduced into files that are processed with
|
|
.IR ispell .)
|
|
.PP
|
|
An important minor point is that
|
|
.I ispell
|
|
assumes that all characters declared as
|
|
.B wordchars
|
|
or
|
|
.B boundarychars
|
|
will occupy exactly
|
|
one position on the terminal screen.
|
|
.PP
|
|
A single character-set statement can declare either a single character
|
|
or a contiguous range of characters.
|
|
A range is given as in egrep and the shell:
|
|
[a-z] means lowercase alphabetics;
|
|
[^a-z] means all but lowercase, etc.
|
|
All character-set statements are combined (unioned) to produce
|
|
the final list of characters that may be part of a word.
|
|
The collating order of the characters is defined by the order of their
|
|
declaration;
|
|
if a range is used, the characters are considered to have been declared
|
|
in ASCII order.
|
|
Characters that have case are collated next to each other, with the
|
|
uppercase character first.
|
|
.PP
|
|
The
|
|
character-declaration statements have a rather strange behavior caused by its
|
|
need to match each lowercase character with its uppercase equivalent.
|
|
In any given
|
|
.B wordchars
|
|
or
|
|
.B boundarychars
|
|
statement, the characters in each range are
|
|
first sorted into ASCII collating sequence, then matched one-for-one
|
|
with the other range.
|
|
(The two ranges must have the same number of characters).
|
|
Thus, for example, the two statements:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fBwordchars\fP [aeiou] [AEIOU]
|
|
\fBwordchars\fP [aeiou] [UOIEA]
|
|
.fi
|
|
.RE
|
|
.PP
|
|
would produce exactly the same effect.
|
|
To get the vowels to match
|
|
up "wrong", you would have to use separate statements:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fBwordchars\fP a U
|
|
\fBwordchars\fP e O
|
|
\fBwordchars\fP i I
|
|
\fBwordchars\fP o E
|
|
\fBwordchars\fP u A
|
|
.fi
|
|
.RE
|
|
.PP
|
|
which would cause uppercase 'e' to be 'O', and lowercase 'O' to be 'e'.
|
|
This should normally be a problem only with languages which have been
|
|
forced to use a strange ASCII collating sequence.
|
|
If your uppercase and lowercase letters both collate in the same order,
|
|
you shouldn't have to worry about this "feature".
|
|
.PP
|
|
The prefixes and suffixes sections have exactly the same syntax, except
|
|
for the introductory keyword.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIprefixes\fR : \fBprefixes\fI flagdef\fR*
|
|
\fIsuffixes\fR : \fBsuffixes\fI flagdef\fR*
|
|
\fIflagdef\fR : \fBflag\fR [\fB*\fR] \fIchar\fB : \fIrepl\fR*
|
|
.fi
|
|
.RE
|
|
.PP
|
|
A prefix or suffix table consists of an introductory keyword and a list
|
|
of flag definitions.
|
|
Flags can be defined more than once, in which case
|
|
the definitions are combined.
|
|
Each flag controls one or more
|
|
.IR repl s
|
|
(replacements)
|
|
which are conditionally applied to the beginnings or endings of various
|
|
words.
|
|
.PP
|
|
Flags are named by a single character
|
|
.IR char .
|
|
Depending on a configuration option,
|
|
this character can be either any uppercase letter (the default
|
|
configuration) or any 7-bit ASCII character.
|
|
Most languages should be
|
|
able to get along with just 26 flags.
|
|
.PP
|
|
If an asterisk (\fB*\fP) is placed before the flag character,
|
|
it means that this
|
|
flag participates in
|
|
.I cross-product
|
|
formation.
|
|
This only matters if the
|
|
file contains both prefix and suffix tables.
|
|
If so, all prefixes and
|
|
suffixes marked with an asterisk will be applied in all cross-combinations
|
|
to the root word.
|
|
For example, consider the root
|
|
.I fix
|
|
with prefixes
|
|
.I pre
|
|
and
|
|
.IR in ,
|
|
and suffixes
|
|
.I es
|
|
and
|
|
.IR ed .
|
|
If all flags controlling these prefixes and suffixes are marked with an
|
|
asterisk, then the single root
|
|
.I fix
|
|
would also generate
|
|
.IR prefix ,
|
|
.IR prefixes ,
|
|
.IR prefixed ,
|
|
.IR infix ,
|
|
.IR infixes ,
|
|
.IR infixed ,
|
|
.IR fix ,
|
|
.IR fixes ,
|
|
and
|
|
.IR fixed .
|
|
Cross-product formation can produce a large number of words quickly, some
|
|
of which may be illegal, so watch out.
|
|
If cross-products produce illegal
|
|
words,
|
|
.I munchlist
|
|
will not produce those flag combinations, and the flag will not be useful.
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIrepl\fR : \fIcondition\fR* \fB>\fR [ \fB- \fIstrip-string \fB,\fR ] \fIappend-string\fR
|
|
.fi
|
|
.RE
|
|
.PP
|
|
A
|
|
.I repl
|
|
is a conditional rule for modifying a root word.
|
|
Up to 8
|
|
.I conditions
|
|
may be specified.
|
|
If the
|
|
.I conditions
|
|
are satisfied, the
|
|
rules on the right-hand side of the
|
|
.I repl
|
|
are applied, as follows:
|
|
.IP (1)
|
|
If a strip-string is given, it is first stripped from
|
|
the beginning or ending (as appropriate) of the root word.
|
|
.IP (2)
|
|
Then the append-string is added at that point.
|
|
.PP
|
|
For example, the
|
|
.I condition
|
|
.B .
|
|
means "any word", and the
|
|
.I condition
|
|
.B Y
|
|
means "any word ending in Y".
|
|
The following (suffix) replacements:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
. > MENT
|
|
Y > -Y,IES
|
|
.fi
|
|
.RE
|
|
.PP
|
|
would change
|
|
.I induce
|
|
to
|
|
.I inducement
|
|
and
|
|
.I fly
|
|
to
|
|
.IR flies .
|
|
(If they were controlled by the same flag, they would also change
|
|
.I fly
|
|
to
|
|
.IR flyment ,
|
|
which might not be what was wanted.
|
|
.I Munchlist
|
|
can be used to protect against this sort of problem;
|
|
see the command sequence given below.)
|
|
.PP
|
|
No matter how much you might wish it, the strings on the right must be
|
|
strings of specific characters, not ranges.
|
|
The reasons are rooted deeply in the way
|
|
.I ispell
|
|
works, and it would be difficult or impossible to provide
|
|
for more flexibility.
|
|
For example, you might wish to write:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
[EY] > -[EY],IES
|
|
.fi
|
|
.RE
|
|
.PP
|
|
This will not work.
|
|
Instead, you must use two separate rules:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
E > -E,IES
|
|
Y > -Y,IES
|
|
.fi
|
|
.RE
|
|
.PP
|
|
The application of
|
|
.IR repl s
|
|
can be restricted to certain words with
|
|
.IR conditions :
|
|
.PP
|
|
.RS
|
|
.nf
|
|
\fIcondition\fR : { \fB.\fR | \fIcharacter\fR | \fIrange\fR }
|
|
.fi
|
|
.RE
|
|
.PP
|
|
A
|
|
.I condition
|
|
is a restriction on the characters that adjoin, and/or are
|
|
replaced by, the right-hand side of the
|
|
.IR repl .
|
|
Up to 8
|
|
.I conditions
|
|
may be given, which should be enough context for anyone.
|
|
The right-hand side will be applied only if the
|
|
.I conditions
|
|
in the
|
|
.I repl
|
|
are satisfied.
|
|
The
|
|
.I conditions
|
|
also implicitly define a length;
|
|
roots shorter than the number of
|
|
.I conditions
|
|
will not pass the test.
|
|
(As a special case, a
|
|
.I condition
|
|
of a single dot "." defines a length of zero,
|
|
so that the rule applies to all words indiscriminately).
|
|
This length is independent of the separate test that insists that
|
|
all flags produce an output word length of at least four.
|
|
.PP
|
|
.I
|
|
Conditions
|
|
that are single characters should be separated by white space.
|
|
For example, to specify words ending in "ED", write:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
E D > -ED,ING # As in covered > covering
|
|
.fi
|
|
.RE
|
|
.PP
|
|
If you write:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
ED > -ED,ING
|
|
.fi
|
|
.RE
|
|
.PP
|
|
the effect will be the same as:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
[ED] > -ED,ING
|
|
.fi
|
|
.RE
|
|
.PP
|
|
As a final minor, but important point, it is sometimes useful to rebuild
|
|
a dictionary file using an incompatible suffix file.
|
|
For example,
|
|
suppose you expanded the "R" flag to generate "er" and "ers" (thus
|
|
making the Z flag somewhat obsolete).
|
|
To build a new dictionary
|
|
.I newdict
|
|
that, using
|
|
.IR newaffixes ,
|
|
will accept exactly the same list of
|
|
words as the old list
|
|
.I olddict
|
|
did using
|
|
.IR oldaffixes ,
|
|
the
|
|
.B \-c
|
|
switch of
|
|
.I munchlist
|
|
is useful, as in the following example:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
$ munchlist -c oldaffixes -l newaffixes olddict > newdict
|
|
.fi
|
|
.RE
|
|
.PP
|
|
If you use this procedure, your new dictionary will always accept the
|
|
same list the original did, even if you badly screwed up the affix
|
|
file.
|
|
This is because
|
|
.I munchlist
|
|
compares the words generated by a flag with the original word list, and
|
|
refuses to use any flags that generate illegal words.
|
|
(But don't forget that the
|
|
.I munchlist
|
|
step takes a long time and eats up temporary file space).
|
|
.SH EXAMPLES
|
|
.PP
|
|
As an example of conditional suffixes, here is the specification of the
|
|
.B S
|
|
flag from the English affix file:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
flag *S:
|
|
[^AEIOU]Y > -Y,IES # As in imply > implies
|
|
[AEIOU]Y > S # As in convey > conveys
|
|
[SXZH] > ES # As in fix > fixes
|
|
[^SXZHY] > S # As in bat > bats
|
|
.fi
|
|
.RE
|
|
.PP
|
|
The first line applies to words ending in Y, but not in vowel-Y.
|
|
The second takes care of the vowel-Y words.
|
|
The third then handles those words that end in a sibilant
|
|
or near-sibilant, and the last picks up everything else.
|
|
.PP
|
|
Note that the
|
|
.I conditions
|
|
are written very carefully so that they apply
|
|
to disjoint sets of words.
|
|
In particular, note that the fourth line
|
|
excludes words ending in Y as well as the obvious SXZH.
|
|
Otherwise, it would convert "imply" into "implys".
|
|
.PP
|
|
Although the English affix file does not do so, you can also have a flag
|
|
generate more than one variation on a root word.
|
|
For example, we could extend the English "R" flag as follows:
|
|
.PP
|
|
.RS
|
|
.nf
|
|
flag *R:
|
|
E > R # As in skate > skater
|
|
E > RS # As in skate > skaters
|
|
[^AEIOU]Y > -Y,IER # As in multiply > multiplier
|
|
[^AEIOU]Y > -Y,IERS # As in multiply > multipliers
|
|
[AEIOU]Y > ER # As in convey > conveyer
|
|
[AEIOU]Y > ERS # As in convey > conveyers
|
|
[^EY] > ER # As in build > builder
|
|
[^EY] > ERS # As in build > builders
|
|
.fi
|
|
.RE
|
|
.PP
|
|
This flag would generate both "skater" and "skaters" from "skate".
|
|
This capability can be very useful in languages that make use of noun, verb,
|
|
and adjective endings.
|
|
For instance, one could define a single flag
|
|
that generated all of the German "weak" verb endings.
|
|
.SH "SEE ALSO"
|
|
ispell(1)
|