FastFreeze can now create its own container
nviennot committed Mar 15, 2021
1 parent 15a27e1 commit 270d06b
Showing 33 changed files with 1,362 additions and 355 deletions.
61 changes: 60 additions & 1 deletion Cargo.lock


3 changes: 2 additions & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "fastfreeze"
version = "1.0.0"
version = "1.3.0"
authors = ["Nicolas Viennot <[email protected]>"]
edition = "2018"
description = "Turn-key solution for checkpoint/restore"
@@ -22,6 +22,7 @@ rand = "0.7"
url = "2.1"
chrono = "0.4"
hostname = "0.3"
caps = "0.5"

[profile.release]
lto = true
4 changes: 3 additions & 1 deletion Makefile
@@ -63,7 +63,8 @@ DIST_LIBS := \
DIST_MISC := scripts/fastfreeze \

# We assume an installation location. This is only used when the user
# makes one of the binary a d
# makes one of the binary secure (like criu, or set_ns_last_pid) via
# setuid/setcap. See more info on the comment near extract-libs
INSTALL_LOCATION=/opt/fastfreeze

# We avoid packaging libc libraries because they work in tandem with the system
@@ -75,6 +76,7 @@ PACKAGE_SKIP_LIBS := \
libdl.so.* \
libpthread.so.* \
libc.so.* \
libm.so.* \
ld-linux-*.so.* \

define add_dist_file
46 changes: 37 additions & 9 deletions README.md
@@ -111,9 +111,12 @@ FastFreeze includes the following high-level features:

FastFreeze does not use privileged operations. This creates the following drawbacks:

* FastFreeze must run within a Linux container (e.g., Kubernetes, Docker). This
guarantees that there are no PID conflicts. The container image must remain
unchanged when migrating an application to a different container.
* FastFreeze must run within a Linux container. FastFreeze has the ability to
create its own via `--container name`, or use an existing one (e.g.,
Kubernetes, Docker). This guarantees that there are no PID conflicts.
The file system must remain unchanged when migrating an application to a
different container. This is typically achieved by using the same container
image.

* The network connections are dropped upon restore. We rely on the application
to be tolerant to network failures and reconnect to needed services.
@@ -154,7 +157,7 @@ FastFreeze supports most Linux applications, with some restrictions:
script with `sudo` is a problem.

* On some systems, apparmor can prevent the execution of certain applications
such as `man` because we relocate the system ld.so at `/var/fastfreeze/run`
such as `man` because we relocate the system ld.so at `/var/tmp/fastfreeze/run`
which may not be in the white-listed path of executable mmap files. This is
not an issue in practice.

@@ -168,12 +171,37 @@ FastFreeze supports most Linux applications, with some restrictions:
* Checkpoint images are not managed by FastFreeze. Pruning old images is not in
the scope of FastFreeze.

## Usage
## Usage for running on a regular machine

### Installation

FastFreeze is distributed in a self-contained 4MB package that needs to be
extracted in `/opt/fastfreeze`.
FastFreeze is distributed in a self-contained 5MB package that can be extracted
anywhere.

```bash
# might be needed to download and extract the archive
sudo apt-get install -y curl xz-utils

# Select the installation location. You may pick something like your home
# directory or /opt
cd ~

# This creates a fastfreeze directory in the current directory
curl -SL https://github.com/twosigma/fastfreeze/releases/download/v1.3.0-rc6/fastfreeze-v1.3.0-rc6.tar.xz | tar xJf -

# Optionally, you can make a fastfreeze symlink in ~/bin or /usr/local/bin for easy access.
ln -s $(pwd)/fastfreeze/fastfreeze ~/bin

# Confirm fastfreeze is working
fastfreeze/fastfreeze run sleep 60
```
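
The symlink step above assumes that `~/bin` already exists as a directory. If it does not, `ln -s` creates a symlink named `~/bin` that points at the binary itself. A more defensive variant might look like the sketch below. `link_into_bin` is a hypothetical helper, not something FastFreeze ships, and the demo runs in a scratch directory with a stand-in file instead of the real binary.

```bash
# Hypothetical helper (not part of FastFreeze): expose a binary through a
# symlink in a bin directory, creating the directory first so that `ln`
# cannot create a symlink named after the directory itself.
link_into_bin() {
    target=$1    # absolute path of the binary to expose
    bin_dir=$2   # directory that should contain the symlink
    mkdir -p "$bin_dir"
    # -f replaces a stale link left over from a previous installation
    ln -sf "$target" "$bin_dir/$(basename "$target")"
}

# Demonstration in a scratch directory, using an empty stand-in file.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/fastfreeze"
touch "$demo_root/fastfreeze/fastfreeze"
link_into_bin "$demo_root/fastfreeze/fastfreeze" "$demo_root/bin"

# Prints the path the new symlink points to
readlink "$demo_root/bin/fastfreeze"
```

For a real installation the call would be along the lines of `link_into_bin "$(pwd)/fastfreeze/fastfreeze" "$HOME/bin"`. The chosen directory still has to be on `PATH` for a bare `fastfreeze` command to resolve.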

## Usage for running in Docker / Kubernetes

### Installation

FastFreeze is distributed in a self-contained 5MB package that prefers to be
extracted in `/opt/fastfreeze` (see more details below for why).

The following shows an example of the installation of FastFreeze in a Debian
Docker image.
@@ -185,14 +213,14 @@ RUN apt-get update
RUN apt-get install -y curl xz-utils

RUN set -ex; \
curl -SL https://github.com/twosigma/fastfreeze/releases/download/v1.1.1/fastfreeze-1.1.1.tar.xz | \
curl -SL https://github.com/twosigma/fastfreeze/releases/download/v1.3.0-rc6/fastfreeze-v1.3.0-rc6.tar.xz | \
tar xJf - -C /opt; \
ln -s /opt/fastfreeze/fastfreeze /usr/local/bin; \
fastfreeze install
```

The `install` command overrides the system loader `/lib64/ld-linux-x86-64.so.2`,
and creates `/var/fastfreeze` where files such as logs are kept. Note that
and creates `/var/tmp/fastfreeze` where files such as logs are kept. Note that
replacing the system loader is useful even when not doing CPUID virtualization.
It facilitates the injection of the time virtualization library into all processes.

6 changes: 3 additions & 3 deletions deps/Makefile
@@ -1,6 +1,6 @@
# The change of these two variables must be reflected in ../src/consts.rs
INTERPOSED_LD_PATH=/var/fastfreeze/run/ld-linux-x86-64.so.2
LD_INJECT_ENV_PATH=/var/fastfreeze/ld-inject.env
INTERPOSED_LD_PATH=/var/tmp/fastfreeze/run/ld-linux-x86-64.so.2
LD_INJECT_ENV_PATH=/var/tmp/fastfreeze/ld-inject.env

RUSTUP_DEP_FILE=.deps-rustup-$(shell hostname)
$(RUSTUP_DEP_FILE):
@@ -27,7 +27,7 @@ BUILDS := \

.PHONY: $(BUILDS) clean

all: $(BUILDS)
all: $(BUILDS) $(DEPS_FILE) $(RUSTUP_DEP_FILE)

build_criu: | $(DEPS_FILE)
$(MAKE) -C criu criu
2 changes: 1 addition & 1 deletion deps/criu
5 changes: 3 additions & 2 deletions scripts/Dockerfile.build
@@ -20,6 +20,7 @@ RUN set -ex; \

# Build dependencies (CRIU, rust toolchain, libvirtcpuid, etc.)
COPY deps deps
# We clean first because we might have a copy of the host compiled binaries
RUN make -C deps clean && make -C deps -j4
ENV CARGO=/root/.cargo/bin/cargo

@@ -31,13 +32,13 @@ RUN set -ex; \
mkdir src; \
echo "" > src/lib.rs; \
echo "fn main() {}" > src/main.rs; \
$CARGO test; \
$CARGO test --release; \
$CARGO build --release;

# Build FastFreeze
COPY src src
RUN touch src/lib.rs src/main.rs
RUN $CARGO test
RUN $CARGO test --release
RUN $CARGO build --release

# Package FastFreeze.
10 changes: 7 additions & 3 deletions scripts/fastfreeze
@@ -3,14 +3,18 @@ set -e

FF_DIR=$(dirname -- "$(readlink -f -- "$0")")

# Pass the original PATH and LD_LIBRARY_PATH down to the application
export FF_APP_PATH=$PATH
export FF_APP_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
# Pass the original PATH and LD_LIBRARY_PATH down to the application, unless
# already set
export FF_APP_PATH=${FF_APP_PATH=$PATH}
export FF_APP_LD_LIBRARY_PATH=${FF_APP_LD_LIBRARY_PATH=$LD_LIBRARY_PATH}

# Override the PATH and LD_LIBRARY_PATH that fastfreeze should use
export LD_LIBRARY_PATH=$FF_DIR/lib:$LD_LIBRARY_PATH
export PATH=$FF_DIR/bin:$PATH

# tmux sessions should not escape fastfreeze containers
unset TMUX

# You may set the following environment variables
# FF_APP_VIRT_CPUID_MASK The CPUID mask to use. See libvirtcpuid documentation for more details
# FF_APP_INJECT_<VAR_NAME> Additional environment variables to inject to the application and its children.
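
The `${VAR=default}` expansions introduced in this hunk assign the default only when the variable is unset, which is the behavior the "unless already set" comment describes. A minimal sketch of that behavior follows. It uses made-up `DEMO_*` variables rather than the real `FF_APP_*` ones.

```bash
# ${VAR=default} assigns the default only when VAR is unset.

unset DEMO_PATH
first=${DEMO_PATH=/usr/bin}     # DEMO_PATH was unset: it is assigned /usr/bin

DEMO_PATH=/custom/bin
second=${DEMO_PATH=/usr/bin}    # already set: the existing value is kept

DEMO_EMPTY=
third=${DEMO_EMPTY=/usr/bin}    # set but empty: stays empty with this form

# prints: /usr/bin|/custom/bin|
printf '%s|%s|%s\n' "$first" "$second" "$third"
```

A variable that is set but empty counts as set here. The `${VAR:=default}` form would replace empty values as well.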
26 changes: 14 additions & 12 deletions src/cli/checkpoint.rs
@@ -15,18 +15,20 @@
use anyhow::{Result, Context};
use std::{
collections::HashSet,
path::{Path, PathBuf},
path::PathBuf,
time::{SystemTime, Duration},
};
use nix::{
poll::{PollFd, PollFlags},
sys::signal,
unistd::Pid,
};
use structopt::StructOpt;
use serde::Serialize;
use crate::{
consts::*,
store,
container,
image::{ImageManifest, CpuBudget, shard, check_passphrase_file_exists},
process::{Command, ProcessExt, ProcessGroup, Stdio},
metrics::{with_metrics, emit_metrics},
@@ -54,7 +56,7 @@ ENVS:
))]
pub struct Checkpoint {
/// Image URL, defaults to the value used during the run command
#[structopt(long)]
#[structopt(short, long)]
image_url: Option<String>,

/// Dir/file to include in the image in addition to the ones specified during the run command.
@@ -88,16 +90,17 @@ pub struct Checkpoint {
/// Verbosity. Can be repeated
#[structopt(short, long, parse(from_occurrences))]
pub verbose: u8,
}

fn is_app_running() -> bool {
Path::new("/proc").join(APP_ROOT_PID.to_string()).exists()
/// Checkpoint the specified application. See the run command help about
/// --app-name for more details.
#[structopt()]
app_name: Option<String>,
}

pub fn do_checkpoint(opts: Checkpoint) -> Result<Stats> {
let Checkpoint {
image_url, num_shards, cpu_budget, passphrase_file,
preserved_paths, leave_running, verbose: _,
preserved_paths, leave_running, app_name: _, verbose: _,
} = opts;

// We override TMPDIR with a safe location. The uploader (or metrics CLI)
@@ -130,8 +133,6 @@ pub fn do_checkpoint(opts: Checkpoint) -> Result<Stats> {
check_passphrase_file_exists(passphrase_file)?;
}

ensure!(is_app_running(), "Application is not running");

// The manifest contains the name of the shards, which are generated at random.
// We combine it with the store to generate the shard upload commands.
// A shard upload command is of the form:
@@ -254,7 +255,7 @@ pub fn do_checkpoint(opts: Checkpoint) -> Result<Stats> {

if leave_running {
trace!("Resuming application");
kill_process_tree(APP_ROOT_PID, signal::SIGCONT)
kill_process_tree(Pid::from_raw(APP_ROOT_PID), signal::SIGCONT)
.context("Failed to resume application")?;
} else {
// We kill the app later, once metrics are emitted.
@@ -267,14 +268,15 @@ pub fn do_checkpoint(opts: Checkpoint) -> Result<Stats> {
img_manifest.persist_to_store(&*store)
.with_context(|| format!("Failed to upload image manifest at {}", image_url))?;

info!("Checkpoint to {} complete. Took {:.1}s",
image_url, START_TIME.elapsed().as_secs_f64());
info!("Checkpoint completed in {:.1}s", START_TIME.elapsed().as_secs_f64());

Ok(stats)
}

impl super::CLI for Checkpoint {
fn run(self) -> Result<()> {
container::maybe_nsenter_app(self.app_name.as_ref())?;

// Holding the lock while invoking the metrics CLI is preferable to avoid
// disturbing another instance trying to do PID control.
with_checkpoint_restore_lock(|| {
@@ -287,7 +289,7 @@ impl super::CLI for Checkpoint {
// risk terminating the container, preventing metrics from being emitted.
if !leave_running {
debug!("Killing application");
kill_process_tree(APP_ROOT_PID, signal::SIGKILL)
kill_process_tree(Pid::from_raw(APP_ROOT_PID), signal::SIGKILL)
.context("Failed to kill application")?;
}

4 changes: 2 additions & 2 deletions src/cli/extract.rs
@@ -110,7 +110,7 @@ impl super::CLI for Extract {
let store = store::from_url(&image_url)?;
store.prepare(false)?;

info!("Fetching image manifest for {}", image_url);
debug!("Fetching image manifest for {}", image_url);

match ImageManifest::fetch_from_store(&*store, allow_bad_image_version)? {
ManifestFetchResult::Some(img_manifest) => {
@@ -125,7 +125,7 @@ impl super::CLI for Extract {
fetched, desired);
}
ManifestFetchResult::NotFound => {
bail!("Image manifest not found, running app normally");
bail!("Image manifest not found");
}
}
