Files
oldlinux-files/Linux-0.98/Yggdrasil-0.98.3/usr/man/man1/statist.1
2024-02-19 00:21:16 -05:00

155 lines
3.2 KiB
Groff

.PU
.TH STATIST 1 local
.SH NAME
statist \- calculate Huffman distribution for
.IR freeze "(1)"
.SH SYNOPSIS
.ll +8
.B statist
[
.B \-g...
]
.ll -8
.br
.SH DESCRIPTION
The default table is tuned for both C texts and executable files (as in
LHARC). If you will freeze any other files (natural language texts,
databases, images, fonts, etc.) you can calculate the matching
positions distribution using the
.B "`statist'"
program, which calculates and displays the mentioned
distribution for the given file. It is useful for large (100K or more)
files.
Though the built-in position table is polyvalent, the tuning can increase
the compression rate up to one additional percent. (Observed mainly on
text files.)
.SH USAGE
.br
.B statist [\-g...] < sample_file
.in +8
or
.in -8
.B gensample | statist [\-g...]
.br
where
.B "`gensample'"
is a program generating some sample stream of
bytes similar to files to be frozen.
.PP
The
.B \-g
switch has the same meaning as for
.IR freeze "(1)"
and may be repeated.
.PP
You can also see the intermediate values
and watch their changes by pressing INTR key when you wish.
.PP
Note: If you use
.B "gensample | statist"
, remember that INTR influence BOTH
processes !!
.br
The results have the following format:
.br
.I "n1 n2 n3 n4 n5 n6 n7 n8"
(uncertainty =
.I x)
.br
Average match length:
.I xx.yy
.br
Percentile 99.9:
.I p999
.br
Percentile 99.5:
.I p995
.br
Percentile 99.0:
.I p990
.br
Percentile 97.0:
.I p970
.br
Percentile 95.0:
.I p950
.br
Percentile 90.0:
.I p900
.br
Percentile 80.0:
.I p800
.br
Percentile 70.0:
.I p700
.br
Percentile 50.0:
.I p500
.br
Sigma:
.I xx.yy
.br
.PP
Here
.I n1 \- n8
are values of the calculated position table elements,
uncertainty is a number which denotes validity of given results
(non-zero values of uncertainty indicate that the
results may be unusable). Other values (average match length,
percentiles and sigma) are FYI only.
.PP
You may create the
.IR /etc/default/freeze
file (if you don't like
.IR /etc/default/
directory, choose another - in MS-DOS it is FREEZE.CNF in
the directory of FREEZE.EXE), which has the following format:
.in +8
.I name
=
.I "n1 n2 n3 n4 n5 n6 n7 n8"
.in -8
.I (name
must start in column 1). For example:
.ll +8
.br
---------- cut here -----------
.br
# This is freeze's defaults file
.br
russian=0 0 1 2 6 20 31 2 # The sample was mailx.lp (Russian)
.br
english=0 0 1 2 7 16 36 0 # The sample was gcc.lp (English)
.br
# End of file
.br
---------- cut here -----------
.ll -8
.PP
If you find values, which are better THAN DEFAULT both for text (C
programs) and binary (executable) files, please send them to me.
Important note: statist.c is NOT a part of freeze package, it is an
aditional feature.
.SH "SEE ALSO"
freeze(1), melt(1), fcat(1)
.SH "DIAGNOSTICS"
Huffman tree has more than 8 levels, reducing...
.in +8
Self-explanatory, but sometimes reducing falls into infinite loop.
.in -8
.IR xxx K
.in +8
Progress indicator is written after each 4K of a file processed.
.in -8
.SH "BUGS"
Sometimes use of the results with uncertainty = 1 (on a file)
gives compression rate worse than default but use of the results
with uncertainty = 13 (on other file) works quite good.
.PP
Found bugs descriptions, incompatibilities, etc. please send to
leo@s514.ipmce.su.