CLP's core is the low-level component that performs compression, decompression, and search.
- We have built and tested CLP on Ubuntu 18.04 (bionic) and Ubuntu 20.04 (focal).
- If you have trouble building for another OS, file an issue and we may be able to help.
- A compiler that supports C++14
- To build, we require some source dependencies, packages from package managers, and libraries built from source.
We use both git submodules and third-party source packages. To download all, you can run this script:
tools/scripts/deps-download/download-all.sh
This will download:
A handful of packages and libraries are required to build CLP. There are two options to use them:
- Install them on your machine and build CLP natively
- Build CLP within a prebuilt Docker container that contains the libraries. However, this won't work if you need additional libraries that aren't already in the container.
Packages
If you're using apt-get, you can use the following command to install all:
sudo apt-get install -y ca-certificates checkinstall cmake build-essential \
libboost-filesystem-dev libboost-iostreams-dev libboost-program-options-dev \
libssl-dev pkg-config rsync wget zlib1g-dev
This will install:
- ca-certificates
- checkinstall
- cmake
- build-essential
- libboost-filesystem-dev
- libboost-iostreams-dev
- libboost-program-options-dev
- libssl-dev
- pkg-config
- rsync
- wget
- zlib1g-dev
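After installing the packages, a quick sanity check can confirm that the command-line tools are actually on your `PATH`. The sketch below is illustrative only: `missing_tools` is a hypothetical helper, not part of CLP, and it can only check packages that provide an executable (on Ubuntu, `build-essential` provides `make` and `g++`). The `-dev` library packages are not covered by this check.

```shell
#!/bin/sh
# Hypothetical helper (not part of CLP): print each tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        # `command -v` succeeds only if the shell can resolve the name.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example usage (prints nothing if every tool is available):
#   missing_tools cmake make g++ pkg-config rsync wget checkinstall
```

An empty result means every listed tool was found; any name printed still needs to be installed.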
Libraries
The latest versions of some packages are not offered by apt repositories, so we've included some scripts to download, compile, and install them:
./tools/scripts/lib_install/fmtlib.sh 8.0.1
./tools/scripts/lib_install/libarchive.sh 3.5.1
./tools/scripts/lib_install/lz4.sh 1.8.2
./tools/scripts/lib_install/mariadb-connector-c.sh 3.2.3
./tools/scripts/lib_install/spdlog.sh 1.9.2
./tools/scripts/lib_install/zstandard.sh 1.4.9
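If you would rather run the six install scripts in one go and stop at the first failure, something like the following sketch may help. `install_libs` and the `LIB_INSTALL_DIR` variable are assumptions introduced here for illustration; the script names and versions are the ones listed above.

```shell
#!/bin/sh
# Assumed variable (not defined by CLP): where the install scripts live.
LIB_INSTALL_DIR="${LIB_INSTALL_DIR:-./tools/scripts/lib_install}"

# Hypothetical helper: run each install script in order, stop on the first failure.
install_libs() {
    for entry in "fmtlib 8.0.1" "libarchive 3.5.1" "lz4 1.8.2" \
                 "mariadb-connector-c 3.2.3" "spdlog 1.9.2" "zstandard 1.4.9"; do
        name=${entry% *}      # text before the space
        version=${entry#* }   # text after the space
        echo "Installing $name $version"
        "$LIB_INSTALL_DIR/$name.sh" "$version" || {
            echo "Failed to install $name $version" >&2
            return 1
        }
    done
}

# Example usage, from the directory that contains tools/:
#   install_libs
```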
You can use these commands to start a container in which you can build and run CLP:
# Make sure to change /path/to/clp/components/core and /path/to/my/logs below
docker run --rm -it \
--name 'clp-build-env' \
-u$(id -u):$(id -g) \
-v$(readlink -f /path/to/clp/components/core):/mnt/clp \
-v$(readlink -f /path/to/my/logs):/mnt/logs \
ghcr.io/y-scope/clp/clp-core-dependencies-x86-ubuntu-focal:main \
/bin/bash
cd /mnt/clp
Make sure to change /path/to/clp/components/core and /path/to/my/logs to the relevant paths on your machine.
- Configure the cmake project:
  mkdir build
  cd build
  cmake ../
- Build:
  make
CLP contains two executables: clp and clg
- clp is used for compressing and extracting logs
- clg is used for performing wildcard searches on the compressed logs
To compress some logs:
./clp c archives-dir /home/my/logs
- archives-dir is where compressed logs should be output
  - clp will create a number of files and directories within, so it's best if this directory is empty
  - You can use the same directory repeatedly and clp will add to the compressed logs within.
- /home/my/logs is any log file or directory containing log files
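When compressing from a script, it can be convenient to validate the input path first. The following is a minimal sketch, not part of CLP: `compress_logs` and the `CLP_BIN` variable are hypothetical names, and creating the archives directory up front with `mkdir -p` is a precaution assumed here rather than something the text above requires.

```shell
#!/bin/sh
# Assumed variable: path to the clp executable built earlier.
CLP_BIN="${CLP_BIN:-./clp}"

# Hypothetical wrapper: compress_logs <archives-dir> <log file or directory>
compress_logs() {
    archives_dir=$1
    input_path=$2
    if [ ! -e "$input_path" ]; then
        echo "Input path not found: $input_path" >&2
        return 1
    fi
    # Precaution (an assumption): make sure the output directory exists.
    mkdir -p "$archives_dir" || return 1
    # Same invocation as shown above: clp c archives-dir /home/my/logs
    "$CLP_BIN" c "$archives_dir" "$input_path"
}

# Example usage:
#   compress_logs archives-dir /home/my/logs
```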
To decompress those logs:
./clp x archives-dir decompressed
- archives-dir is where the compressed logs were previously stored
- decompressed is a directory where they will be decompressed to
You can also decompress a specific file:
./clp x archives-dir decompressed /my/file/path.log
- /my/file/path.log is the uncompressed file's path (the one that was passed to clp for compression)
More usage instructions can be found by running:
./clp --help
To search the compressed logs:
./clg archives-dir " a *wildcard* search phrase "
- archives-dir is where the compressed logs were previously stored
- The search phrase can contain the * wildcard which matches 0 or more characters, or the ? wildcard which matches any single character.
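To get a feel for how the two wildcards behave, you can experiment with shell patterns, which give `*` and `?` the same meanings. This is only an illustration: `wildcard_match` is a hypothetical helper that uses the shell's own `case` matching, not clg's matcher. It compares the pattern against the whole string, and shell patterns additionally treat `[...]` and backslashes specially.

```shell
#!/bin/sh
# Hypothetical helper: wildcard_match <pattern> <text>
# Succeeds if the whole text matches the shell pattern.
wildcard_match() {
    case $2 in
        $1) return 0 ;;   # unquoted, so * and ? act as wildcards
        *)  return 1 ;;
    esac
}

# Example usage:
#   wildcard_match 'err*' 'err'     # succeeds: * matches zero characters
#   wildcard_match 'err?r' 'error'  # succeeds: ? matches exactly one character
#   wildcard_match 'err?' 'err'     # fails: ? needs one character to match
```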
Similar to clp, clg can search a single file:
./clg archives-dir " a *wildcard* search phrase " /my/file/path.log
- /my/file/path.log is the uncompressed file's path (the one that was passed to clp for compression)
More usage instructions can be found by running:
./clg --help
If you'd like to convert the dictionaries of an individual archive into a human-readable form, you can use make-dictionaries-readable.
./make-dictionaries-readable archive-path <output dir>
- archive-path is a path to a specific archive (inside archives-dir)
See the make-dictionaries-readable README for details on the output format.
By default, clp uses an embedded SQLite database, so each directory containing archives can only be accessed by a single clp instance.
To enable parallel compression to the same archives directory, clp/clg can be configured to use a MySQL-type database (MariaDB) as follows:
- Install and configure MariaDB using the instructions for your platform
- Create a user that has privileges to create databases, create tables, insert records, and delete records.
- Copy and change config/metadata-db.yml, setting the type to mysql and uncommenting the MySQL parameters.
- Install the MariaDB and PyYAML Python packages:
  pip3 install mariadb PyYAML
  - This is necessary to run the database initialization script. If you prefer, you can run the SQL statements in tools/scripts/db/init-db.py directly.
- Run tools/scripts/db/init-db.py with the updated config file. This will initialize the database CLP requires.
- Run clp or clg as before, with the addition of the --db-config-file option pointing at the updated config file.
- To compress in parallel, simply run another instance of clp concurrently.
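As a rough sketch of the last step, the helper below starts one clp instance per input path in the background and waits for all of them. `compress_parallel` and `CLP_BIN` are hypothetical names, and placing `--db-config-file` directly after the `c` command is an assumption; check `./clp --help` for the exact option placement.

```shell
#!/bin/sh
# Assumed variable: path to the clp executable built earlier.
CLP_BIN="${CLP_BIN:-./clp}"

# Hypothetical helper:
#   compress_parallel <db-config-file> <archives-dir> <input>...
# Starts one clp instance per input, all writing to the same archives directory.
compress_parallel() {
    config=$1
    archives_dir=$2
    shift 2
    pids=""
    for input in "$@"; do
        "$CLP_BIN" c --db-config-file "$config" "$archives_dir" "$input" &
        pids="$pids $!"
    done
    status=0
    # Wait for every instance; report failure if any of them failed.
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example usage:
#   compress_parallel config/metadata-db.yml archives-dir /home/my/logs-a /home/my/logs-b
```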
Note that currently, decompression (clp x) and search (clg) can only be run with a single instance. We are in the process of open-sourcing parallelizable versions of these as well.