v1.1.0

@simonwo released this 25 Sep 11:14
· 1464 commits to main since this release
970e1a0

v1.1.0 Release Notes

📢 Introducing Bacalhau v1.1.0 - Unleash the Power!

We are thrilled to announce the release of Bacalhau v1.1.0, a significant milestone in our quest for unparalleled computing capabilities. Packed with exciting new features like Full Fleet Targeting, Configurable Compute Timeouts, persistent storage, integration with private data swarms and API TLS support, this release is sure to take your computational experience to new heights! 🚀

But that's not all! We invite you to explore the experimental features of this release, such as Long-Running Jobs, as we continue to push the boundaries of computational possibilities.

So, what are you waiting for? Upgrade to Bacalhau v1.1.0 and unlock a world of infinite possibilities in distributed computing! 🌟

curl https://get.bacalhau.org/install.sh | bash

New features

Full Fleet Targeting

Jobs can now target all nodes in a network simultaneously, making jobs that need to query or modify an entire fleet more efficient and more parallel.

Full fleet jobs are perfect for fleet management, allowing an operator to quickly understand the state of all of their nodes at once with a single command.

Full fleet jobs will only succeed if all known nodes in a network can be reached and can execute the job successfully. Jobs can still be targeted at a subset of the fleet by using labels or resource requirements.

Pass the --target=all parameter to any Bacalhau job command or set Deal.TargetAll: true in an existing Bacalhau job spec.
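As a rough sketch of the flag form, the snippet below prints (rather than runs) a full-fleet Docker job. The `ubuntu` image and the `hostname` command are placeholders chosen for illustration, and `run` is a hypothetical dry-run helper, not part of Bacalhau.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
# Replace the printf with "$@" to run against a real network.
run() { printf '%s\n' "$*"; }

# Ask every known node to report its hostname (image and command are placeholders).
run bacalhau docker run --target=all ubuntu hostname
```

Because a full fleet job only succeeds when every known node runs it successfully, a cheap command like this is a reasonable first check before targeting the fleet with heavier work.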

New node CLI and APIs

A new CLI command and new APIs have been introduced, allowing users to easily list the nodes in a network and see what compute resources are available.

Use the new command bacalhau node list to get a tabular output of all known nodes:

[Image: Bacalhau node list output table]

You can then use bacalhau node describe to get in-depth output about a specific node.
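A minimal sketch of the two commands used together is shown below. It assumes that `bacalhau node describe` takes a node ID copied from the list output; `<node-id>` is a placeholder and `run` is a hypothetical dry-run helper that only prints the commands.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

NODE_ID="<node-id>"   # placeholder: copy a real ID from the list output

run bacalhau node list                  # tabular overview of all known nodes
run bacalhau node describe "$NODE_ID"   # in-depth output for one node
```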

Configurable Timeouts

Jobs can now last for days or weeks, enabling the execution of big computations that require longer processing times.

By default, compute nodes no longer enforce an execution timeout, and jobs default to the longest allowed timeout. Job submitters can still request a timeout using the --timeout flag or the Timeout field in their job spec.

Node operators can still choose to limit the maximum timeout allowed by passing the --max-timeout flag to the serve command or by specifying the new Node.Compute.Capacity.JobTimeouts.MaxJobExecutionTimeout property in their config file.
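The two sides of this setting can be sketched as follows. The timeout values are deliberately left as placeholders because these notes do not state the accepted units or format; check `--help` for each command before filling them in. The `ubuntu` image, the `sleep` command and the `run` dry-run helper are assumptions for illustration.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

JOB_TIMEOUT="<timeout>"       # placeholder: limit requested by the job submitter
MAX_TIMEOUT="<max-timeout>"   # placeholder: ceiling enforced by the node operator

# Submitter side: request a timeout for a single job.
run bacalhau docker run --timeout="$JOB_TIMEOUT" ubuntu sleep 60

# Operator side: limit the timeout that any job may request on this node.
run bacalhau serve --max-timeout="$MAX_TIMEOUT"
```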

Richer Node Configuration

We're excited to unveil enhanced configuration options in Bacalhau v1.1.0! With a heightened focus on flexibility, we've expanded the ways you can configure Bacalhau, whether it be via a configuration file, command-line flags, or environment variables.

The new release introduces a persistent configuration file that provides more flexibility and control over node configurations. Read the documentation for how to get started with configuration files.

Key Changes from v1.0.3 to v1.1.0:

  • The enriched config.yaml now has a trove of default configuration values, an improvement from the empty version in v1.0.3.
  • Event and Libp2p tracing are no longer activated by default. Enable them by specifying paths via EventTracerPath and Libp2PTracerPath in config.yaml.
  • The node’s private key is no longer called private_key.1235 and is named libp2p_private_key by default. Configure its path with Libp2PKeyPath in config.yaml.
  • user_id.pem is unchanged. Set its location using KeyPath in config.yaml.
  • The execution state directory has been renamed from execution-state-<NODE_ID> to <NODE_ID>-compute. In addition to jobStats.json, it now contains executions.db, a BoltDB database, when persistent storage mode is used. Define its path using ExecutionStore.Path in config.yaml.
  • New directories include <NODE_ID>-requester (stores the state for the requester node using BoltDB), executor_storages (hosts data for Bacalhau storage types), and plugins (houses executor plugin binaries). Configure their paths respectively via JobStore.Path, ComputeStoragePath, and ExecutorPluginPath in config.yaml.

⚠️Note: there are optional migration steps for existing Bacalhau users who want to keep their previous configuration. See the end of this note for how to migrate.

Support for TLS on public APIs

TLS certificates for serving client-facing APIs are now supported, ensuring secure and encrypted communication between Bacalhau clients and requester nodes.

To use a TLS certificate to encrypt communication, you can:

  • Configure automatic certificates from Let’s Encrypt by passing --autocert=<your-hostname> and ensuring the Bacalhau binary can respond to challenges by running sudo setcap CAP_NET_BIND_SERVICE+ep $(which bacalhau).
  • Pass a certificate to --tlscert and the corresponding private key to --tlskey.

By default, if none of the above options are used, the server will continue to serve its API endpoints over HTTP.
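The two TLS options could be invoked roughly as below. The hostname, binary path and certificate paths are hypothetical values for illustration, and `run` is a dry-run helper that only prints each command so it can be reviewed first.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

API_HOST="bacalhau.example.com"          # hypothetical public hostname
BACALHAU_BIN="/usr/local/bin/bacalhau"   # assumed path; use $(which bacalhau) on a real host
TLS_CERT="/path/to/cert.pem"             # hypothetical certificate path
TLS_KEY="/path/to/key.pem"               # hypothetical private key path

# Option 1: automatic certificate from Let's Encrypt.
# The capability lets the binary bind to privileged ports to answer challenges.
run sudo setcap CAP_NET_BIND_SERVICE+ep "$BACALHAU_BIN"
run bacalhau serve --autocert="$API_HOST"

# Option 2: bring your own certificate and key.
run bacalhau serve --tlscert "$TLS_CERT" --tlskey "$TLS_KEY"
```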

Persistent Storage of Jobs and Executions

Compute and requester nodes now support persistent storage, ensuring data integrity and allowing for long-term job and execution audit records. This feature is now switched on by default and records are persisted to the Bacalhau repository.

See the documentation for how to configure persistence.

Improved Error Messages

Clearer error messages are now displayed when no node is available to run a job, making troubleshooting easier and more efficient.

Instead of receiving ‘not enough nodes to run the job’, users will now get more specific help messages, such as ‘Docker image does not exist or repo is inaccessible’ or ‘job timeout exceeds the maximum allowed’.

Fine-Grained Control Over Image Entrypoint and Parameters

Users now have finer control over the entrypoint and parameters passed to a Docker image. Previously, Bacalhau would ignore the default entrypoint to the image and replace it with the first argument after bacalhau docker run <image>. Now, the default entrypoint in the image is used and all of the positional arguments are passed as the command to that entrypoint.

The entrypoint can still be explicitly overridden by using the --entrypoint flag or by setting the Entrypoint field in a Docker job spec.
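To illustrate the difference, the sketch below prints two invocations. The `ubuntu` image, the `echo hello` arguments and the `/bin/echo` entrypoint value are assumptions chosen for the example, and `run` is a hypothetical dry-run helper.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

# Default behaviour in v1.1.0: the image's own entrypoint is kept and
# "echo hello" is passed to it as the command.
run bacalhau docker run ubuntu echo hello

# Explicit override: replace the entrypoint, then pass "hello" as its argument.
run bacalhau docker run --entrypoint /bin/echo ubuntu hello
```

Jobs written for earlier releases that relied on the first positional argument replacing the entrypoint may need the explicit form.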

GPU Support Inside Docker Containers

Bacalhau now has the capability to automatically utilize GPUs when the Bacalhau node is running inside a Docker container. Ensure that the Bacalhau node is started with a GPU capability by passing --gpus=all to docker run, and Bacalhau nodes will automatically detect GPUs running on the host machine.

Submit a job to a node running inside Docker using bacalhau docker run --gpu=1 to run the job in a new GPU-enabled container on the host.
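A minimal sketch of both steps follows. The image names are placeholders, the arguments your node image needs in order to start a node are not shown, and `nvidia-smi` is only an example command that assumes the job image ships it. The `run` helper is a hypothetical dry-run wrapper.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

NODE_IMAGE="<bacalhau-node-image>"   # placeholder: image that runs your Bacalhau node
JOB_IMAGE="<gpu-job-image>"          # placeholder: image for the GPU workload

# Step 1: start the node container with access to the host's GPUs.
run docker run --gpus=all "$NODE_IMAGE"

# Step 2: submit a job that requests one GPU.
run bacalhau docker run --gpu=1 "$JOB_IMAGE" nvidia-smi
```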

Support for Private IPFS Clusters

Integration with private IPFS clusters has been added, providing enhanced security and control over data storage and retrieval.

To connect to a private swarm, pass the path to a swarm key to --ipfs-swarm-key, set the BACALHAU_IPFS_SWARM_KEY environment variable or configure the Node.IPFS.SwarmKeyPath configuration property.

When connecting to a private swarm, Bacalhau will no longer bootstrap using or connect to public peers and will rely on the swarm for all data retrieval.

These steps are also necessary for clients that use bacalhau get to download from a private IPFS swarm.

Note that these steps are not necessary if using the --ipfs-connect flag, which already can connect to IPFS nodes running a private swarm.
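The flag and environment variable forms might look like this in practice. The swarm key path and job ID are placeholders, and `run` is a hypothetical dry-run helper that prints commands instead of executing them.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
run() { printf '%s\n' "$*"; }

SWARM_KEY="/path/to/swarm.key"   # hypothetical location of the private swarm key

# Node side: join the private swarm when serving.
run bacalhau serve --ipfs-swarm-key "$SWARM_KEY"

# Client side: export the key path once, then download results as usual.
export BACALHAU_IPFS_SWARM_KEY="$SWARM_KEY"
run bacalhau get "<job-id>"
```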

New Experimental Features

All of these features are experimental, meaning that their APIs are liable to change in an upcoming release. You are encouraged to try out these features and provide feedback or bug reports on Bacalhau Slack.

Long-Running Jobs

Bacalhau jobs can now run indefinitely and will automatically restart when nodes come back online, allowing for continuous and uninterrupted processing.

Long-running jobs allow compute workloads to process data that arrives continuously, and are perfect for tasks such as pre-filtering logs, processing real-time analytics, or working with edge sensors.

With the introduction of long-running jobs, ML inference tasks can now operate in a "warm-boot" environment. This means that the necessary resources and dependencies are already loaded, significantly reducing the time taken to run an inference job.

With this experimental feature, you can now unleash the power of Bacalhau to handle dynamic and ever-changing data streams, ensuring continuous and uninterrupted processing of your computational workloads.

Deprecated Features

Estuary

The Estuary publisher is no longer supported in this release. Compute nodes will now reject jobs that require the Estuary publisher.

Verification

The Verifiers feature is no longer supported in this release. Compute nodes will silently ignore verification requirements on jobs.

⚠️ Migration steps

Users who wish to continue using their previous Bacalhau private key or their previous Bacalhau Client ID as their identity will need to either:

  • Rename private_key.1235 to libp2p_private_key, or
  • Modify config.yaml so that the value of Libp2PKeyPath points to the path of the previous key.
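The rename option can be sketched as a small helper. It assumes the Bacalhau repository lives at ~/.bacalhau; pass a different directory if yours is elsewhere. `migrate_key` is a hypothetical function name, and as a precaution it never overwrites a libp2p_private_key that already exists.

```shell
# Rename the old key inside a Bacalhau repo directory, if it is present.
# Refuses to overwrite an existing new-style key; move that aside by hand first.
migrate_key() {
  repo="$1"
  old="$repo/private_key.1235"
  new="$repo/libp2p_private_key"
  if [ -f "$old" ] && [ ! -e "$new" ]; then
    mv "$old" "$new"
    echo "renamed $old -> $new"
  else
    echo "nothing to rename in $repo"
  fi
}

# Assumed default repo location.
migrate_key "$HOME/.bacalhau"
```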

Up Next

These upcoming features aim to provide users with increased flexibility and convenience in their computational workflows while maintaining a focus on privacy and security.

User-definable executor plugins

In the next release, users will have the opportunity to experiment with pluggable executors, which will allow them to run jobs without the need to worry about Docker images. The first executor that we will make available will be for Python, and it will be able to execute Python scripts by using the command bacalhau run python script.py. This feature aims to provide a more seamless and convenient experience for running jobs.

Cluster bootstrapping and private data

Additionally, on the roadmap for future releases, we are planning to introduce easier bootstrapping of Bacalhau clusters. This will simplify the process of setting up and configuring Bacalhau clusters, making it more accessible for users. Furthermore, we are also working on adding support for private data and jobs, ensuring enhanced security and control over sensitive information.