A motley collection of utility classes that perform a variety of potentially useful operations on bits, bytes, and strings representing them.

LableOrg/java-bitsandbytes

Bits and Bytes

A modest collection of utility classes that make some bit and byte-wise operations in Java a little bit more readable.

Purpose

This tiny library is meant for projects that manipulate bits and bytes and want that code to stay readable. In particular, it was designed to supplement (and sometimes replace) Hadoop's Bytes class, which is very useful, but part of a much larger library.

The library has no dependencies, which makes it useful in (for example) Hadoop's MapReduce tasks.

Installation

This library is available on Maven Central. The newest versions use the Java 11 API:

<dependency>
    <groupId>org.lable.oss.bitsandbytes</groupId>
    <artifactId>bitsandbytes</artifactId>
    <version>4.0</version>
</dependency>

For Java 8, please use the latest 3.* version.

Example usage

Binary

The Binary class can make the results of bit-wise operations more readable in unit tests and debugging output.

For example:

// Instead of:
assertThat(someBitManipulatingMethodUnderTest(0),
           is(new byte[] {-1, -86, 15, 0}));

// Or in the bit-notation with the required casting to byte for values over 127:
assertThat(someBitManipulatingMethodUnderTest(0),
           is(new byte[] {(byte) 0b11111111, (byte) 0b10101010, 0b00001111, 0b00000000}));

// Write:
assertThat(someBitManipulatingMethodUnderTest(0),
           is(Binary.decode("11111111 10101010 00001111 00000000")));

To encode a byte[] into a string of ones and zeroes:

byte[] input = "a1".getBytes();

// Using the defaults.
//   0110000100110001
String out = Binary.encode(input);

// Or specify formatting.
//   01100001 00110001
String spaceSeparated = Binary.encode(input, false, true);

//   [01100001][00110001]
String delimited = Binary.encode(input, true, false);
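
The bit-string notation itself is easy to reproduce with the JDK alone, which can help when checking expected values by hand. The following is a minimal sketch, not the library's implementation; the class and method names (BitStringSketch, toBits, fromBits) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BitStringSketch {
    // Render each byte as eight '0'/'1' characters, optionally separated by spaces.
    static String toBits(byte[] bytes, boolean spaceSeparated) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (spaceSeparated && i > 0) sb.append(' ');
            // Mask with 0xFF to avoid sign extension, set bit 8 to force
            // leading zeroes, then drop that extra leading digit.
            sb.append(Integer.toBinaryString((bytes[i] & 0xFF) | 0x100).substring(1));
        }
        return sb.toString();
    }

    // Parse a string of ones and zeroes; whitespace between bytes is ignored.
    static byte[] fromBits(String bits) {
        String clean = bits.replaceAll("\\s+", "");
        if (clean.length() % 8 != 0) {
            throw new IllegalArgumentException("Expected a multiple of 8 bits.");
        }
        byte[] out = new byte[clean.length() / 8];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(clean.substring(i * 8, i * 8 + 8), 2);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] input = "a1".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toBits(input, true));
        System.out.println(Arrays.toString(fromBits("11111111 10101010")));
    }
}
```

The masking step matters: without `& 0xFF`, a negative byte such as -1 would be widened to a 32-bit int and printed as 32 ones.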

Hexadecimal

Turn hexadecimal string representations into byte arrays, and vice versa.

byte[] java = Hex.decode("0xCAFEBABE");

byte[] macAddress = Hex.decode("00:e1:8c:fc:05:5f");
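
For readers who want to see what such lenient parsing involves, here is a JDK-only sketch that accepts an optional 0x prefix and ignores colons and whitespace. It is an illustration under those assumptions, not the library's Hex class; HexSketch, parseHex, and toHex are hypothetical names, and the real class may accept other separators or reject inputs this sketch allows.

```java
import java.util.Arrays;

final class HexSketch {
    static byte[] parseHex(String text) {
        String s = text.trim();
        if (s.startsWith("0x") || s.startsWith("0X")) s = s.substring(2);
        s = s.replaceAll("[\\s:]", ""); // drop whitespace and colon separators
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits.");
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit near index " + 2 * i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // Lower-case hex without separators.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseHex("00:e1:8c:fc:05:5f")));
        System.out.println(toHex(parseHex("0xCAFEBABE")));
    }
}
```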

Various byte visualization methods

In addition to Binary and Hex, the ByteVisualization enum can be used to format byte values and byte arrays in a variety of colourful string representations:

byte[] input = Hex.decode("CAFE BABE 0001 0203 9933 FF");

// ⣊⣾⢺⢾⠀⠁⠂⠃⢙⠳⣿
String braille = ByteVisualization.BRAILLE.visualize(input);

// ▄▐█▟▜▐▜▟   ▘ ▝ ▀▚▚▀▀██
String squares = ByteVisualization.SQUARES.visualize(input);
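
The Braille output above looks consistent with mapping each byte value b to the Unicode code point U+2800 + b, where the eight dots of a Braille pattern correspond to the eight bits of the offset. That mapping is an observation about the sample output, not something this README confirms, so treat the sketch below as a guess at the idea rather than the library's code. BrailleSketch and braille are hypothetical names.

```java
final class BrailleSketch {
    static String braille(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            // U+2800 is the empty Braille pattern; the 256 patterns are
            // laid out consecutively up to U+28FF.
            sb.append((char) (0x2800 + (b & 0xFF)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(braille(new byte[] {(byte) 0xCA, (byte) 0xFE, 0x00, 0x01}));
    }
}
```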

ByteMangler

The ByteMangler class contains a number of methods that aim to make working with byte arrays a little more pleasant:

// Concatenate any number of byte arrays in a single call.
byte[] cat = ByteMangler.add(part, anotherPart, andAnotherPart, yetAnotherPart);

// Shrink a byte[], discarding the rest. Output here is the same as "1234".getBytes().
byte[] fourBytes = ByteMangler.shrink("12345".getBytes(), 4);

// Remove a number of bytes from the start of a byte[]. Output here is the same as "345".getBytes().
byte[] threeBytes = ByteMangler.chomp("12345".getBytes(), 2);

// Flip all bits in a byte[].
byte[] normal = Binary.decode("11001111 00000001");
// Equal to Binary.decode("00110000 11111110").
byte[] flipped = ByteMangler.flip(normal);

// Reverse the order of bits in a byte[].
// Equal to Binary.decode("10000000 11110011").
byte[] reversed = ByteMangler.reverse(normal);
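
To make the difference between flipping and reversing concrete, here is a JDK-only sketch of both operations. It is not taken from ByteMangler; BitOpsSketch and its methods are hypothetical stand-ins that reproduce the two results described in the comments above.

```java
import java.util.Arrays;

final class BitOpsSketch {
    // Invert every bit; the order of the bytes is unchanged.
    static byte[] flip(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) ~in[i];
        }
        return out;
    }

    // Reverse the bit order of the whole array: the last bit becomes the first.
    static byte[] reverse(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            // Integer.reverse mirrors all 32 bits, so the mirrored low byte
            // ends up in the top byte; shift it back down.
            out[in.length - 1 - i] = (byte) (Integer.reverse(in[i]) >>> 24);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] normal = {(byte) 0b11001111, 0b00000001};
        System.out.println(Arrays.toString(flip(normal)));
        System.out.println(Arrays.toString(reverse(normal)));
    }
}
```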

Analogous to String's split method, ByteMangler provides a split method as well:

// Split a byte[] on 0x00 bytes:
byte[] input = Binary.decode("11110011 00000000 00000001 10000001");
// The output list will contain two byte[] equal to:
//   * Binary.decode("11110011")
//   * Binary.decode("00000001 10000001")
List<byte[]> parts = ByteMangler.split(input, new byte[]{0x00});
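
A delimiter-based split like this can be sketched with the JDK's range-based Arrays.equals (Java 9 and later). This is an illustration only; SplitSketch is a hypothetical name, and details such as whether empty parts are kept are assumptions of the sketch, not documented behaviour of ByteMangler.split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitSketch {
    static List<byte[]> split(byte[] input, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("Delimiter must not be empty.");
        }
        List<byte[]> parts = new ArrayList<>();
        int start = 0; // start of the part currently being collected
        int i = 0;
        while (i <= input.length - delimiter.length) {
            if (Arrays.equals(input, i, i + delimiter.length,
                              delimiter, 0, delimiter.length)) {
                parts.add(Arrays.copyOfRange(input, start, i));
                i += delimiter.length;
                start = i;
            } else {
                i++;
            }
        }
        // Whatever follows the last delimiter (possibly empty) is the final part.
        parts.add(Arrays.copyOfRange(input, start, input.length));
        return parts;
    }

    public static void main(String[] args) {
        byte[] input = {(byte) 0xF3, 0x00, 0x01, (byte) 0x81};
        for (byte[] part : split(input, new byte[] {0x00})) {
            System.out.println(Arrays.toString(part));
        }
    }
}
```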

And a replace method:

// Replace all occurrences of 0xFF with 0x00 0x00:
byte[] input = Binary.decode("11110011 11111111 00000000 00000001 11111111 10000001");

// 11110011 00000000 00000000 00000000 00000001 00000000 00000000 10000001.
byte[] output = ByteMangler.replace(input, Binary.decode("11111111"), new byte[]{0, 0});
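
Because the replacement may be longer or shorter than the target, the result cannot be computed in place. A minimal JDK-only sketch that streams into a ByteArrayOutputStream is shown below; ReplaceSketch is a hypothetical name and this is not the library's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class ReplaceSketch {
    static byte[] replace(byte[] input, byte[] target, byte[] replacement) {
        if (target.length == 0) {
            return input.clone(); // nothing sensible to search for
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        int i = 0;
        while (i < input.length) {
            boolean match = i + target.length <= input.length
                    && Arrays.equals(input, i, i + target.length,
                                     target, 0, target.length);
            if (match) {
                out.write(replacement, 0, replacement.length);
                i += target.length; // skip past the matched bytes
            } else {
                out.write(input[i]);
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = {(byte) 0xF3, (byte) 0xFF, 0x00, 0x01, (byte) 0xFF, (byte) 0x81};
        System.out.println(Arrays.toString(
                replace(input, new byte[] {(byte) 0xFF}, new byte[] {0, 0})));
    }
}
```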

ByteComparison

The ByteComparison class provides a couple of methods useful for comparing byte arrays:

byte[] input = new byte[]{0, 1, 2, 3, 4, 5};

boolean res1 = ByteComparison.startsWith(input, new byte[]{0, 1, 2}); // True.
boolean res2 = ByteComparison.endsWith(input, new byte[]{3, 4, 5}); // Also true.
boolean res3 = ByteComparison.contains(input, new byte[]{2, 3}); // True again.
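
For comparison, the same three checks can be written against the JDK directly using the range-based Arrays.equals (Java 9 and later). The sketch below only illustrates the semantics; CompareSketch and its methods are hypothetical and say nothing about how ByteComparison is implemented.

```java
import java.util.Arrays;

final class CompareSketch {
    static boolean startsWith(byte[] a, byte[] prefix) {
        return prefix.length <= a.length
                && Arrays.equals(a, 0, prefix.length, prefix, 0, prefix.length);
    }

    static boolean endsWith(byte[] a, byte[] suffix) {
        return suffix.length <= a.length
                && Arrays.equals(a, a.length - suffix.length, a.length,
                                 suffix, 0, suffix.length);
    }

    // Naive scan; returns the first index at which needle occurs, or -1.
    static int indexOf(byte[] a, byte[] needle) {
        for (int i = 0; i + needle.length <= a.length; i++) {
            if (Arrays.equals(a, i, i + needle.length, needle, 0, needle.length)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] input = {0, 1, 2, 3, 4, 5};
        System.out.println(startsWith(input, new byte[] {0, 1, 2}));
        System.out.println(endsWith(input, new byte[] {3, 4, 5}));
        System.out.println(indexOf(input, new byte[] {2, 3}));
    }
}
```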

BitMask

HBase's FuzzyRowFilter expects a bitmask in the form of a byte array containing 0x00 or 0x01 for each byte as one of its inputs. BitMask has a method that can compose this byte array mask by specifying the consecutive number of ones and zeroes you need:

// 0011101111:
byte[] mask = BitMask.bitMask(2, 3, 1, 4);
 
// To start the mask with ones, pass 0 as the first argument.
// 11111111:
byte[] anotherMask = BitMask.bitMask(0, 8);
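
The alternating-run convention (zeroes first, then ones, then zeroes again, and so on) can be sketched as follows. Each element of the result is a whole byte holding 0x00 or 0x01, matching the description above. MaskSketch is a hypothetical name and the code is an illustration, not the library's BitMask.

```java
import java.util.Arrays;

final class MaskSketch {
    // Run lengths alternate between zeroes and ones, starting with zeroes.
    static byte[] mask(int... runLengths) {
        int total = 0;
        for (int n : runLengths) {
            if (n < 0) throw new IllegalArgumentException("Negative run length.");
            total += n;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (int run = 0; run < runLengths.length; run++) {
            byte value = (byte) (run % 2); // even-indexed runs are 0x00, odd are 0x01
            for (int k = 0; k < runLengths[run]; k++) {
                out[pos++] = value;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(mask(2, 3, 1, 4)));
        System.out.println(Arrays.toString(mask(0, 8)));
    }
}
```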

BytePrinter

When you end up with a byte[] that contains printable UTF-8 encoded text as well as some unprintable control characters (e.g., null byte separators), logging it for debugging can be cumbersome. The BytePrinter class can help in these cases by creating a string where valid UTF-8 sequences are left as-is and unprintable characters are replaced by printable escape sequences:

byte[] unprintable = ByteMangler.add("abc".getBytes(), new byte[]{0x00}, "def".getBytes());
// "abc\x00def"
String printable = BytePrinter.utf8Escaped(unprintable);
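
A much simplified version of this idea is sketched below. It decodes the bytes as UTF-8 and escapes only ASCII control characters; unlike what the description above promises for BytePrinter, it does not escape invalid UTF-8 sequences (the JDK decoder silently substitutes U+FFFD for those). EscapeSketch and escapeControls are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class EscapeSketch {
    static String escapeControls(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 || c == 0x7F) {
                // ASCII control character: emit \xNN with two upper-case hex digits.
                sb.append("\\x");
                sb.append(Character.toUpperCase(Character.forDigit(c >> 4, 16)));
                sb.append(Character.toUpperCase(Character.forDigit(c & 0xF, 16)));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] unprintable = {'a', 'b', 'c', 0x00, 'd', 'e', 'f'};
        System.out.println(escapeControls(unprintable));
    }
}
```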

ByteConversion

Convert Java primitives to and from byte arrays.

// Strings are encoded as UTF-8; 0xE2 0x82 0xAC:
byte[] result = ByteConversion.fromString("€");
String euro = ByteConversion.toString(result);

Primitives such as ints are converted to their conventional Java byte representation. This usually means two's complement.

// 0x7F 0xFF 0xFF 0xFF:
byte[] result = ByteConversion.fromInt(Integer.MAX_VALUE);
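
The byte layout in the comment above (most significant byte first) is the same one java.nio.ByteBuffer produces with its default big-endian byte order, so the representation can be checked with the JDK alone. IntBytesSketch below is a hypothetical helper for that purpose, not part of this library.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class IntBytesSketch {
    // ByteBuffer defaults to big-endian: the most significant byte comes first.
    static byte[] fromInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    static int toInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fromInt(Integer.MAX_VALUE)));
        System.out.println(Arrays.toString(fromInt(-1)));
        System.out.println(toInt(fromInt(-42)));
    }
}
```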

Sometimes the byte array of an int or long needs to sort naturally, which is often the case when it is used as part of an HBase row key. Plain two's complement gets in the way here: compared byte by byte as unsigned values, negative numbers sort after positive numbers, because their first bit is set.

To prevent this, use ByteMangler to flip the first bit when you read and write the byte representation of an int or long:

// 0x7F 0xFF 0xFF 0xFF:
byte[] resultMinusOne = ByteMangler.flipTheFirstBit(ByteConversion.fromInt(-1));
int minusOne = ByteConversion.toInt(ByteMangler.flipTheFirstBit(resultMinusOne));
// 0x80 0x00 0x00 0x00:
byte[] resultZero = ByteMangler.flipTheFirstBit(ByteConversion.fromInt(0));
int zero = ByteConversion.toInt(ByteMangler.flipTheFirstBit(resultZero));
// 0x80 0x00 0x00 0x01:
byte[] resultOne = ByteMangler.flipTheFirstBit(ByteConversion.fromInt(1));
int one = ByteConversion.toInt(ByteMangler.flipTheFirstBit(resultOne));

Now -1 sorts right before 0.
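
The effect on sort order can be demonstrated with the JDK alone. The sketch below flips the sign bit by XOR-ing with Integer.MIN_VALUE and sorts the resulting keys with Arrays.compareUnsigned (Java 9 and later), which compares byte by byte as unsigned values. SortableIntSketch is a hypothetical name; it mirrors the idea, not the library's code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class SortableIntSketch {
    static byte[] encode(int value) {
        // XOR with Integer.MIN_VALUE (0x80000000) flips only the sign bit.
        return ByteBuffer.allocate(Integer.BYTES)
                .putInt(value ^ Integer.MIN_VALUE)
                .array();
    }

    static int decode(byte[] bytes) {
        // Flipping the same bit again restores the original value.
        return ByteBuffer.wrap(bytes).getInt() ^ Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        int[] values = {1, -1, 0, Integer.MIN_VALUE, Integer.MAX_VALUE};
        byte[][] keys = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            keys[i] = encode(values[i]);
        }
        // Unsigned lexicographic order on the keys matches numeric order.
        Arrays.sort(keys, (a, b) -> Arrays.compareUnsigned(a, b));
        for (byte[] key : keys) {
            System.out.println(decode(key));
        }
    }
}
```

Without the flip, the key for -1 (0xFF 0xFF 0xFF 0xFF) would compare greater than every non-negative key under the same unsigned comparison.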

Or, specify the LEXICOGRAPHIC_SORT NumberRepresentation parameter for the same effect:

// 0x7F 0xFF 0xFF 0xFF:
byte[] resultMinusOne = ByteConversion.fromInt(-1, NumberRepresentation.LEXICOGRAPHIC_SORT);
int minusOne = ByteConversion.toInt(resultMinusOne, NumberRepresentation.LEXICOGRAPHIC_SORT);

The 3.* series of this library adds support for the Instant, OffsetDateTime, and ZonedDateTime classes introduced in Java 8 as well.
