This project provides a command-line utility for compressing and decompressing text files using Huffman encoding, an efficient and lossless data compression technique developed by David A. Huffman in 1952. Huffman encoding functions by assigning binary codes to characters according to their frequency in the input text. Characters that appear more frequently are assigned shorter binary codes; less common characters are assigned longer codes. This results in a compact binary representation of the input text that can be decompressed to restore the original text.
- Compression: Converts input text into a compact binary representation based on character frequency. This is a linear process, relying on two passes through the input data, one to build the Huffman tree and another to encode the data.
- Decompression: Restores the original text from the compressed file. This is likewise a linear process that requires only one pass through the compressed data, using the Huffman tree stored in the compressed file to decode the binary representation.
- Testing Suite: Includes automated tests to verify functionality. The test suite compresses and decompresses text files, comparing the original and decompressed files to ensure data integrity and determine the compression ratio.
main.c
: Entry point for the program. Provides CLI interface for compression and decompression operations.encode.c
: Contains logic for encoding data using Huffman trees and writing compressed files.decode.c
: Contains logic for decoding compressed data back into its original form.common.c
: Shared utility functions, including Huffman tree creation and memory management.types.h
: Header file defining shared data structures and function prototypes.test.sh
: Bash script to automate testing of compression and decompression.
- C Standard Library: Used for file handling, memory allocation, and string manipulation.
To build the project, use the provided Makefile
:
make
This will generate an executable named huffman
.
The program supports two modes: compress and decompress.
./huffman <compress | decompress> <input_file> <output_file>
./huffman compress input.txt compressed
./huffman decompress compressed output.txt
Run the test suite using the test.sh
script
./test.sh
This will:
- Compress and decompress all
.txt
files in the tests directory. - Verify that decompressed files match the original input files.
- Output compression ratios for successful tests.
- Compressed files are not human-readable.
- The compression ratio varies depending on the content of the input files.