libdisorder: A Simple C Library for Entropy Measurement

First posted: 4 March 2010
Last updated: 27 August 2010

Introduction

Disorder and chaos are interesting phenomena. Measuring the amount of entropy, information, or disorder in a data stream or data collection has many applications. libdisorder is a simple C library that calculates classic Shannon entropy over a byte buffer (more measures to come in future releases).

News

17 August 2010: FreeBSD and OpenBSD ports now available (thanks to Kevin Lo)
20 July 2010: libdisorder code is also hosted at code.dyne.org
16 July 2010: libdisorder code is now hosted at github.com
16 July 2010: this page moved from gmu.edu to freshdefense.net
8 March 2010: version 0.0.2 available

Source Code Access

You can retrieve the source from github.com or code.dyne.org using git over SSH. I plan to push changes to both of these repositories.

     git@github.com:locasto/libdisorder.git
     git@code.dyne.org:libdisorder.git

Thanks particularly to jaromil at dyne.org for volunteering to host the repository.

Tarballs

NB: The latest and recommended version is 0.0.2. Older versions remain available on this page, but they are known to contain bugs. Version 0.0.2 fixes a bug in counting the number of tokens in a sample. It also introduces a command line tool, ropy, that scans a file (named on the command line) and reports its entropy in bits.
  1. Download libdisorder-0.0.2.tar.gz (source tarball) [MD5]
  2. Download libdisorder-0.0.1.tar.gz (source tarball) [MD5] (known bug)

Ports

  1. FreeBSD: http://www.freebsd.org/cgi/cvsweb.cgi/ports/devel/libdisorder/
  2. OpenBSD: http://www.openbsd.org/cgi-bin/cvsweb/ports/devel/libdisorder/

Documentation

  1. As of version 0.0.2, the library comes with a command line tool (see `tool/ropy.c') that reports entropy values of a file named on the command line. Usage:
           ropy [-v] [filename]
          
    Here is a sample run on the ropy binary followed by a run on the source file and then a run on data from /dev/urandom:
    [locasto@xorenduex tool]$ ./ropy ropy
    tokens: 225 entropy: 2.552223 maxent: 7.813781 ratio: 0.326631
    [locasto@xorenduex tool]$ ./ropy ropy.c
    tokens: 80 entropy: 4.968827 maxent: 6.321928 ratio: 0.785967
    [locasto@xorenduex tool]$ cat /dev/urandom > random
    ^C
    [locasto@xorenduex tool]$ ./ropy random 
    tokens: 256 entropy: 7.999995 maxent: 8.000000 ratio: 0.999999
    [locasto@xorenduex tool]$ 
    
  2. Tutorial / Basic Use Case (see test.c in the test/ directory). This program produces output of the following form (one trial per line):
    ...
    [     336] entropy: 7.989476 maxent: 8.000000 ratio: 0.998684
    [     337] entropy: 7.987074 maxent: 8.000000 ratio: 0.998384
    [     338] entropy: 7.988840 maxent: 8.000000 ratio: 0.998605
    [     339] entropy: 7.987904 maxent: 8.000000 ratio: 0.998488
    [     340] entropy: 7.990039 maxent: 8.000000 ratio: 0.998755
    [     341] entropy: 7.988061 maxent: 8.000000 ratio: 0.998508
    ...
    
  3. A nice picture of the library in action (see image below)

[Figure: entropy of /dev/urandom for ~5K trials with 8K buffer]

The image above is a gnuplot of about 5000 trials of reading /dev/urandom into an 8K buffer. Entropy stays very close to the maximum of 8 in every trial. This image is not cropped (i.e., all observed values are plotted). The data was collected using the test.c program in the libdisorder tarball.
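
The fields in the sample output above (tokens, entropy, maxent, ratio) appear to be related in a simple way. The relations below are inferred from the numbers printed on this page, not taken from the library's documentation, so treat them as a working assumption. Here n is the reported token count (assumed to be the number of distinct byte values observed) and p_i is the relative frequency of the i-th such value:

```latex
\begin{align*}
H &= -\sum_{i=1}^{n} p_i \log_2 p_i, &
H_{\max} &= \log_2 n, &
\text{ratio} &= \frac{H}{H_{\max}}.
\end{align*}
% Check against the run on the ropy binary (tokens: 225):
%   \log_2 225 \approx 7.813781            (reported maxent)
%   2.552223 / 7.813781 \approx 0.326631   (reported ratio)
```

The same check works for the ropy.c run (tokens: 80, where log2 of 80 is about 6.321928) and for the random data (tokens: 256, where the maximum is exactly 8).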

Manual Documentation

If anyone wants to write a manual or info page, please feel free.
The major function that this library exposes is shannon_H:

float
shannon_H(char* buf,
          long long length);

It takes a byte buffer and its length and returns a float giving the entropy of the buffer in bits per byte. The return value should be a real number between zero and eight. The library also provides the maximum entropy for the data collection as well as the ratio of the entropy to that maximum. NB: this code assumes that a token is a sequence of 8 bits (one byte); thus, the entropy of a collection of these tokens is at most 8. I'm contemplating ways to use other bit groupings as the basis of a 'token'; a different definition would give a different maximum entropy.

Related Work

  1. A Mathematical Theory of Communication by Claude E. Shannon [PDF]
  2. Forum: Searching For Entropy Tool (OpenRCE.org)
  3. Scanning Data for Entropy Anomalies (dkbza.org)
  4. Calculating Entropy for Data Mining (onlamp.com; this is the source for the first version of libdisorder's implementation)
  5. Finding Entropy in Binary Files (deadhacker.com)
  6. XMagic
  7. Cryptool
  8. NetEntropy
  9. Analysing the byte entropy of a FAT formatted disk

Contact

Email me with questions, comments, patches, or requests. You can reach me by using my first name at this domain or my last name at ucalgary.ca.