A modest collection of utility classes that make some bit and byte-wise operations in Java a little bit more readable.
This tiny library can be used in projects where bit and byte manipulation is
performed to increase the readability of the code. In particular, it was
designed to supplement (and sometimes replace) Hadoop's Bytes
class, which is
very useful, but part of a much larger library.
This tiny library has no dependencies, which makes it useful in (for example) Hadoop's MapReduce tasks.
This library is available on Maven Central. The newest versions use the Java 11 API:
<dependency>
<groupId>org.lable.oss.bitsandbytes</groupId>
<artifactId>bitsandbytes</artifactId>
<version>4.0</version>
</dependency>
For Java 8, please use the latest 3.*
version.
In unit tests and debugging output the Binary
class can aid in making the result of bit-wise operations more
readable in your tests.
For example:
// Instead of:
assertThat(someBitManipulatingMethodUnderTest(0),
is(new byte[] {-1, -86, 15, 0})));
// Or in the bit-notation with the required casting to byte for values over 127:
assertThat(someBitManipulatingMethodUnderTest(0),
is(new byte[] {(byte) 0b11111111, (byte) 0b10101010, 0b00001111, 0b00000000})));
// Write:
assertThat(someBitManipulatingMethodUnderTest(0),
is(Binary.decode("10000000 10101010 00001111 00000000")));
To encode a byte[]
into a string of ones and zeroes:
byte[] input = "a1".getBytes();
// Using the defaults.
// 0110000100110001
String out = Binary.encode(input);
// Or specify formatting.
// 01100001 10000100
String spaceSeparated = Binary.encode(input, false, true);
// [01100001][10000100]
String delimited = Binary.encode(input, true, false);
Turn hexadecimal string representations into byte arrays, and vice versa.
byte[] java = Hex.decode("0xCAFEBABE");
byte[] macAddress = Hex.decode("00:e1:8c:fc:05:5f");
In addition to Binary
and Hex
, the
ByteVisualization
enumerator class can be used to
format byte values and byte arrays in a variety of colourful string representations:
byte[] input = Hex.decode("CAFE BABE 0001 0203 9933 FF");
// ⣊⣾⢺⢾⠀⠁⠂⠃⢙⠳⣿
String braille = ByteVisualization.BRAILLE.visualize(input);
// ▄▐█▟▜▐▜▟ ▘ ▝ ▀▚▚▀▀██
String squares = ByteVisualization.SQUARES.visualize(input);
The ByteMangler
class contains a number of methods that aim to make working
with byte arrays a little more pleasant:
// Concatenate any number of byte arrays in a single call.
byte[] cat = ByteMangler.add(part, anotherPart, andAnotherPart, yetAnotherPart);
// Shrink a byte[], discarding the rest. Output here is the same as "1234".getBytes().
byte[] fourBytes = ByteMangler.shrink("12345".getBytes(), 4);
// Remove a number of bytes from the start of a byte[]. Output here is the same as "345".getBytes().
byte[] threeBytes = ByteMangler.chomp("12345".getBytes(), 2);
// Flip all bits in a byte[].
byte[] normal = Binary.decode("11001111 00000001");
// Equal to Binary.decode("00110000 11111110").
byte[] flipped = ByteMangler.flip(normal);
// Reverse the order of bits in a byte[].
// Equal to Binary.decode("10000000 11110011").
byte[] reversed = ByteMangler.reverse(normal);
Analogous to String
's split
method, ByteMangler
provides a split
method as well:
// Split a byte[] on 0x00 bytes:
byte[] input = Binary.decode("11110011 00000000 00000001 10000001");
// The output list will contain two byte[] equal to:
// * Binary.decode("11110011")
// * Binary.decode("00000001 10000001")
List<byte[]> parts = ByteMangler.split(input, new byte[]{0x00});
And a replace
method:
// Replace all occurrences of 0xFF with 0x00 0x00:
byte[] input = Binary.decode("11110011 11111111 00000000 00000001 11111111 10000001");
// 11110011 00000000 00000000 00000000 00000001 00000000 00000000 10000001.
byte[] output = ByteMangler.replace(input, Binary.decode("11111111"), new byte[]{0, 0});
The ByteComparison
class provides a couple of methods useful for comparing byte arrays:
byte[] input = new byte[]{0, 1, 2, 3, 4, 5};
boolean res1 = ByteComparison.startsWith(input, new byte[]{0, 1, 2}); // True.
boolean res2 = ByteComparison.endsWith(input, new byte[]{3, 4, 5}); // Also true.
boolean res3 = ByteComparison.contains(input, new byte[]{2, 3}); // True again.
HBase's
FuzzyRowFilter
expects a bitmask in the form of a byte array containing 0x00
or 0x01
for
each byte as one of its inputs. BitMask
has a method that can compose this
byte array mask by specifying the consecutive number of ones and zeroes you
need:
// 0011101111:
byte[] mask = BitMask.bitMask(2, 3, 1, 4);
// To start the mask with ones, pass 0 as the first argument.
// 11111111:
byte[] anotherMask = BitMask.bitMask(0, 8);
When you end up with byte[]
that contain printable UTF-8 encoded text as well as
some unprintable control characters (e.g., null
byte separators), logging these
for debugging can be cumbersome. The BytePrinter
class can help in these cases,
by creating a string where valid UTF-8 sequences are left as-is, and unprintable
characters are replaced by printable escape sequences:
byte[] unprintable = ByteMangler.add("abc".getBytes(), new byte[]{0x00}, "def".getBytes());
// "abc\x00def"
String printable = BytePrinter.utf8Escaped(unprintable);
Convert Java primitives to and from byte arrays.
// Strings are encoded as UTF-8; 0xE2 0x82 0xAC:
byte[] result = ByteConversion.fromString("€");
String euro = ByteConversion.toString(result);
Primitives such as ints are converted to their conventional Java byte representation. This usually means two's complement.
// 0x7F 0xFF 0xFF 0xFF:
byte[] result = ByteConversion.fromInt(Integer.MAX_VALUE);
In cases where you want the byte array of an int or long to be naturally sortable (which is often what you want when they are used as part of a HBase row-key), two's complement causes negative numbers to be sorted after positive numbers.
To prevent this, use ByteMangler
to flip the first bit when you read and
write the byte representation of an int
or long
:
// 0x7F 0xFF 0xFF 0xFF:
byte[] resultMinusOne = ByteMangler.flipTheFirstBit(ByteConversion.fromInt(-1));
int minusOne = ByteConversion.toInt(ByteMangler.flipTheFirstBit(resultMinusOne));
// 0x80 0x00 0x00 0x00:
byte[] resultZero = ByteMangler.flipTheFirstBit(ByteConversion.fromInt(0));
int zero = ByteConversion.toInt(ByteMangler.flipTheFirstBit(resultZero));
// 0x80 0x00 0x00 0x01:
byte[] resultOne = ByteMangler.flipTheFirstBit(ByteConversion.fromInt(1));
int one = ByteConversion.toInt(ByteMangler.flipTheFirstBit(resultOne));
Now -1
sorts right before 0
.
Or, specify the LEXICOGRAPHIC_SORT
NumberRepresentation parameter for the same effect:
// 0x7F 0xFF 0xFF 0xFF:
byte[] resultMinusOne = ByteConversion.fromInt(-1, NumberRepresentation.LEXICOGRAPHIC_SORT);
int minusOne = ByteConversion.toInt(resultMinusOne, NumberRepresentation.LEXICOGRAPHIC_SORT);
The 3.*
series of this library adds support for the Instant
,
OffsetDateTime
, and ZonedDateTime
classes introduced in Java 8 as well.