A C library for efficient handling of 2D binary matrices.
by Kevin Pulo, [email protected]
https://github.com/devkev/binmat
Copyright (c) 2013
Licensed under the GNU GPL v3 or higher, as per the COPYING file.
Some applications need to store and manipulate a 2D matrix of binary (ie. boolean) values. For example, an unweighted graph can be stored as an adjacency matrix - the (i,j)-th element is a boolean indicating whether or not the i-th node is connected to the j-th. Given such a data structure, k-connectivity (for each node, the other nodes that can be reached in at most k edges or "hops") can easily be computed by taking the k-th power of the adjacency matrix.
However, for large matrices, storing each element as a byte (or worse, a 32 or 64 bit integer) is incredibly inefficient and wasteful, both in time and space. For example, a 10k square matrix requires 400Mb of RAM if stored as 32 bit int values.
Binmat is a library that bit-packs these matrices, so that the above 10k square matrix requires just 12.5Mb (the minimum space possible to store such a dense matrix).
More than that, binmat takes advantage of extremely fast bit-operations when multiplying matrices. The usual series of multiplications and additions required to compute each element are replaced by bitwise AND and OR operations. Furthermore, on 64-bit hardware each bitwise operation can replace up to 64 multiplications or additions, reducing operations that can take hundreds of clock cycles down to just a single cycle. This gives rise to some very considerable performance increases, especially when taking the power of a matrix. Binary exponentiation is implemented to further improve the performance of higher matrix powers.
binmat currently uses a straightforward GNU make Makefile, and so a simple
make
(or gmake
) should work on all modern Unixy operating systems.
There are no dependencies beyond a C compiler and associated toolchain.
After building, simply copy libbinmat.so
and/or libbinmat.a
to a location
that the linker can find (eg. /usr/local/lib
), and libbinmat.h
to a location
that the C preprocessor/compiler can find (eg. /usr/local/include
).
Alternatively, you can specify -I/path/to/libbinmat
when compiling and
-L/path/to/libbinmat
when linking.
To use binmat, simply do
#include "libbinmat.h"
in your code, and then use -lbinmat
when linking.
If the HAVE_UNSIGNED_LONG_LONG_INT
preprocessor symbol is defined, then binmat
will use unsigned long long
for its underlying chunk storage, otherwise
unsigned long
will be used. Alternatively, the BINMAT_DATA_TYPE
macro can
be defined, and that will be used instead, eg. -DBINMAT_DATA_TYPE='unsigned char'
. This type must be unsigned.
-
binmat_data_t
The underlying bit storage, aka the "chunk" (unsigned long long
orunsigned long
). -
binmat_index_t
The type of indices into arrays ofbinmat_data_t
(unsigned int
). -
binmat_bool_t
The native C type used to indicate true/false (unsigned char
).
-
const binmat_data_t one
The constant1U
as abinmat_data_t
type. -
const binmat_data_t zero
The constant0U
as abinmat_data_t
type.
-
binmat_chunkbytes
The number of bytes in a chunk. -
binmat_chunkbits
The number of bits in a chunk. -
binmat_numchunks(n)
The number of chunks required to store n bits. -
binmat_numbytes(n)
The number of bytes required to store n bits. -
binmat_numbits(n)
The number of bits required to store n bits (will be greater than n when n is not a multiple ofbinmat_chunkbits
). -
binmat_numbytes_matrix(n)
The number of bytes required to store an n by n matrix. -
binmat_finalchunkbits(n)
The number of bits used in the final chunk (for n bits total). If n is a multiple ofbinmat_chunkbits
, then this will bebinmat_chunkbits
(not 0). -
binmat_finalchunkmask(n)
A mask of the bits used in the final chunk. -
binmat_binary_bufn_t(x,n)
Declare achar
array called x of sufficient size to render n chunks. -
binmat_binary_buf_t(x)
Declare achar
array called x of sufficient size to render a single chunk.
-
binmat_data_t *binmat_alloc(binmat_index_t n)
Allocate an n by n matrix (initialised to all zeros) and return a reference to it. -
void binmat_free(binmat_data_t *m)
Free an allocated matrix m. -
binmat_bool_t binmat_getbit(const binmat_data_t *m, binmat_index_t n, binmat_index_t row, binmat_index_t col)
Returns the (row,col) bit of the n by n matrix m. -
void binmat_setbit(binmat_data_t *m, binmat_index_t n, binmat_index_t row, binmat_index_t col, binmat_bool_t value)
Sets the (row,col) bit of the n by n matrix m to be value. -
char *binmat_format_chunk(binmat_data_t x, char *buf)
Helper function that renders the chunk x into buf (and returns a reference to it). The buffer must be of sufficient length (see thebinmat_binary_buf_t()
macro). -
void binmat_print_matrix_slow(FILE *f, const binmat_data_t *m, binmat_index_t n)
Uses a slow, obvious algorithm (ie. loop over the bits and callbinmat_getbit()
) to print the n by n matrix m to the stdio FILE f. -
void binmat_print_matrix_fast(FILE *f, const binmat_data_t *m, binmat_index_t n)
Use a faster algorithm to print the n by n matrix m to the stdio FILE f. Currently broken -
void binmat_print_matrix_hex(FILE *f, const binmat_data_t *m, binmat_index_t n)
Print the n by n matrix m to the stdio FILE f in a more compact hex notation. Currently broken -
void binmat_transpose(binmat_data_t *output, const binmat_data_t *input, binmat_index_t n)
Transpose the n by n matrix input and store the output in output (which must already have been allocated). output can be the same as input, in which case a temporary matrix will be allocated and used. -
int binmat_are_identical(const binmat_data_t *a, const binmat_data_t *b, binmat_index_t n)
Compare the two n by n matrices a and b, and return true if they are identical and false otherwise. -
void binmat_copy(binmat_data_t *a, const binmat_data_t *b, binmat_index_t n)
Copy the n by n matrix b into a (which must already have been allocated). -
void binmat_multiply(binmat_data_t *output, const binmat_data_t *a, const binmat_data_t *b, binmat_index_t n)
Uses a fast algorithm to multiply together the n by n matrices a and b. Note that the b matrix must have been transposed before being passed to this function. -
void binmat_multiply_slow(binmat_data_t *output, const binmat_data_t *a, const binmat_data_t *b, binmat_index_t n)
Uses the simple, naive matrix multiplication algorithm (ie. usingbinmat_getbit()
andbinmat_setbit()
) to multiply the n by n matrices a and b together, storing the result in output. Mostly useful to check the result ofbinmat_multiply()
. -
void binmat_power(binmat_data_t *output, const binmat_data_t *a, const binmat_data_t *trans, binmat_index_t n, unsigned int pow)
Takes the pow-th power of the n by n matrix a (with trans its transpose), storing the result in output. -
void binmat_power_slow(binmat_data_t *output, const binmat_data_t *a, binmat_index_t n, unsigned int pow)
Uses the slow, naive algorithm (ie. repeated application ofbinmat_multiply_slow()
) to take the pow-th power of the n by n matrix a, storing the result in output.
The binmat-test
program is used to test binmat's operations. In addition to
confirming that fast binmat multiplication and powering matches that of the
naive implementations, it also implements naive multiplication and power using
straightforward "traditional" int-based matrices, and compares the results
against them. This also means that it can be used for benchmarking, to test the
performance improvements possible with binmat (although the limiting factor
quickly becomes the time and space needed for the traditional matrices).
It takes the following command line options. n indicates a positive integer, and d indicates a (double) floating point value.
-
-n
,--size
n
Specify the size of the (square) matrix (default: 151). -
-p
,--power
n
Specify the power to which the matrix should be raised (default: 3). -
-w
,--warmups
n
Specify the number of times to power the matrix as a warmup (ie. untimed) (default: 3). -
-r
,--reps
n
Specify the number of times to power the matrix (timed) (default: 10). -
-t
,--trad
Include traditional matrix operations and comparisons (default). -
-T
,--no-trad
Do not include traditional matrix operations and comparisons. -
-tw
,--trad-warmups
n
Specify the number of times to power the traditional matrix as a warmup (default: 1). -
-tr
,--trad-reps
n
Specify the number of times to power the traditional matrix (default: 1). -
-d
,--density
d
Specify the density of ones in the random matrix (between 0.0 and 1.0) (default: 0.01). -
-s
,--seed
n
Specify the PRNG seed value (default:time(NULL)
). -
-v
,--verbose
,--print
Include verbose output (ie. print the matrices). -
-V
,--no-verbose
,--silent
Do not print the actual matrix contents (default). -
-l
,--limit
d
The maximum amount of memory (in Mb) that will be allocated (default: 1024).
binmat is copyright (c) 2013 Kevin Pulo, [email protected].
Licensed under the GNU GPL v3 or higher, as per the COPYING file.
However, if you're interested in using binmat in a way that is not compatible with the GPLv3+ (whether Free/Open Source or not), please feel free to contact me at the above email address to discuss the possibility of dual licensing or alternative arrangements.
Comments, feature suggestions, bug reports, patches, etc are most welcome. Feel free to submit issues, pull requests, etc on github.