libdisorder
provides a simple C library for calculating classic Shannon entropy
(more to come in future releases).
git@github.com:locasto/libdisorder.git
git@code.dyne.org:libdisorder.git

Thanks particularly to jaromil at dyne.org for volunteering to host the repository.
ropy (source in `tool/ropy.c') is a tool that scans a file named on the command line and reports its entropy values, in bits. Usage:
ropy [-v] [filename]

Here is a sample run on the ropy binary, followed by a run on the source file and then a run on data from /dev/urandom:
[locasto@xorenduex tool]$ ./ropy ropy
tokens: 225 entropy: 2.552223 maxent: 7.813781 ratio: 0.326631
[locasto@xorenduex tool]$ ./ropy ropy.c
tokens: 80 entropy: 4.968827 maxent: 6.321928 ratio: 0.785967
[locasto@xorenduex tool]$ cat /dev/urandom > random
^C
[locasto@xorenduex tool]$ ./ropy random
tokens: 256 entropy: 7.999995 maxent: 8.000000 ratio: 0.999999
[locasto@xorenduex tool]$
test.c (in the test/ directory)
This program produces output of the following form (one trial per line):
...
[ 336] entropy: 7.989476 maxent: 8.000000 ratio: 0.998684
[ 337] entropy: 7.987074 maxent: 8.000000 ratio: 0.998384
[ 338] entropy: 7.988840 maxent: 8.000000 ratio: 0.998605
[ 339] entropy: 7.987904 maxent: 8.000000 ratio: 0.998488
[ 340] entropy: 7.990039 maxent: 8.000000 ratio: 0.998755
[ 341] entropy: 7.988061 maxent: 8.000000 ratio: 0.998508
...
The image above is a gnuplot of about 5000 trials of reading /dev/urandom
into an 8K buffer. Entropy remains fairly high, very close to 8. This image is not cropped (i.e., all observed values are plotted). The data was collected using the test.c program in the libdisorder tarball.
shannon_H:

float shannon_H(char* buf, long long length);
It takes a byte buffer and a length argument and returns a float indicating
the number of bits of entropy in the byte buffer. The return value should be
a real number between zero and eight.
The library also returns the maximum entropy for the data collection as
well as the ratio of the entropy to the max entropy.
NB: this code assumes that a token is defined by a sequence of 8 bits; thus, the entropy of a collection of these tokens has a maximum of 8. I'm contemplating methods to use other bit collections as the basis of a 'token'; with a different definition, we would see other maximum values for entropy.