155 lines
3.2 KiB
Groff
155 lines
3.2 KiB
Groff
.PU
|
|
.TH STATIST 1 local
|
|
.SH NAME
|
|
statist \- calculate Huffman distribution for
|
|
.IR freeze "(1)"
|
|
.SH SYNOPSIS
|
|
.ll +8
|
|
.B statist
|
|
[
|
|
.B \-g...
|
|
]
|
|
.ll -8
|
|
.br
|
|
.SH DESCRIPTION
|
|
The default table is tuned for both C texts and executable files (as in
|
|
LHARC). If you will freeze any other files (natural language texts,
|
|
databases, images, fonts, etc.) you can calculate the matching
|
|
positions distribution using the
|
|
.B "`statist'"
|
|
program, which calculates and displays the mentioned
|
|
distribution for the given file. It is useful for large (100K or more)
|
|
files.
|
|
|
|
Though the built-in position table is polyvalent, the tuning can increase
|
|
the compression rate up to one additional percent. (Observed mainly on
|
|
text files.)
|
|
.SH USAGE
|
|
.br
|
|
.B statist [\-g...] < sample_file
|
|
.in +8
|
|
or
|
|
.in -8
|
|
.B gensample | statist [\-g...]
|
|
.br
|
|
where
|
|
.B "`gensample'"
|
|
is a program generating some sample stream of
|
|
bytes similar to files to be frozen.
|
|
.PP
|
|
The
|
|
.B \-g
|
|
switch has the same meaning as for
|
|
.IR freeze "(1)"
|
|
and may be repeated.
|
|
.PP
|
|
You can also see the intermediate values
|
|
and watch their changes by pressing INTR key when you wish.
|
|
.PP
|
|
Note: If you use
|
|
.B "gensample | statist"
|
|
, remember that INTR influence BOTH
|
|
processes !!
|
|
.br
|
|
The results have the following format:
|
|
.br
|
|
.I "n1 n2 n3 n4 n5 n6 n7 n8"
|
|
(uncertainty =
|
|
.I x)
|
|
.br
|
|
Average match length:
|
|
.I xx.yy
|
|
.br
|
|
Percentile 99.9:
|
|
.I p999
|
|
.br
|
|
Percentile 99.5:
|
|
.I p995
|
|
.br
|
|
Percentile 99.0:
|
|
.I p990
|
|
.br
|
|
Percentile 97.0:
|
|
.I p970
|
|
.br
|
|
Percentile 95.0:
|
|
.I p950
|
|
.br
|
|
Percentile 90.0:
|
|
.I p900
|
|
.br
|
|
Percentile 80.0:
|
|
.I p800
|
|
.br
|
|
Percentile 70.0:
|
|
.I p700
|
|
.br
|
|
Percentile 50.0:
|
|
.I p500
|
|
.br
|
|
Sigma:
|
|
.I xx.yy
|
|
.br
|
|
.PP
|
|
Here
|
|
.I n1 \- n8
|
|
are values of the calculated position table elements,
|
|
uncertainty is a number which denotes validity of given results
|
|
(non-zero values of uncertainty indicate that the
|
|
results may be unusable). Other values (average match length,
|
|
percentiles and sigma) are FYI only.
|
|
.PP
|
|
You may create the
|
|
.IR /etc/default/freeze
|
|
file (if you don't like
|
|
.IR /etc/default/
|
|
directory, choose another - in MS-DOS it is FREEZE.CNF in
|
|
the directory of FREEZE.EXE), which has the following format:
|
|
.in +8
|
|
.I name
|
|
=
|
|
.I "n1 n2 n3 n4 n5 n6 n7 n8"
|
|
.in -8
|
|
.I (name
|
|
must start in column 1). For example:
|
|
.ll +8
|
|
.br
|
|
---------- cut here -----------
|
|
.br
|
|
# This is freeze's defaults file
|
|
.br
|
|
russian=0 0 1 2 6 20 31 2 # The sample was mailx.lp (Russian)
|
|
.br
|
|
english=0 0 1 2 7 16 36 0 # The sample was gcc.lp (English)
|
|
.br
|
|
# End of file
|
|
.br
|
|
---------- cut here -----------
|
|
.ll -8
|
|
.PP
|
|
If you find values, which are better THAN DEFAULT both for text (C
|
|
programs) and binary (executable) files, please send them to me.
|
|
|
|
Important note: statist.c is NOT a part of freeze package, it is an
|
|
aditional feature.
|
|
|
|
.SH "SEE ALSO"
|
|
freeze(1), melt(1), fcat(1)
|
|
.SH "DIAGNOSTICS"
|
|
Huffman tree has more than 8 levels, reducing...
|
|
.in +8
|
|
Self-explanatory, but sometimes reducing falls into infinite loop.
|
|
.in -8
|
|
.IR xxx K
|
|
.in +8
|
|
Progress indicator is written after each 4K of a file processed.
|
|
.in -8
|
|
.SH "BUGS"
|
|
Sometimes use of the results with uncertainty = 1 (on a file)
|
|
gives compression rate worse than default but use of the results
|
|
with uncertainty = 13 (on other file) works quite good.
|
|
.PP
|
|
Found bugs descriptions, incompatibilities, etc. please send to
|
|
leo@s514.ipmce.su.
|
|
|