Releases: vmware-tanzu/crash-diagnostics
v0.3.0
This version introduces a new direction for the project. Instead of continuing with a Dockerfile-like configuration language, the project has been completely redesigned to adopt Starlark, a Python-like language, as the language for writing Crashd scripts. This version also introduces many other important features:
- Use Starlark to create simple or complex scripts to automate interaction with Kubernetes clusters
- Use Python-like constructs and functions to create scripts to query or capture infrastructure information
- Support for a provider model that allows interaction with a growing list of infrastructure providers, including local KinD clusters, plain Kubernetes clusters, and Cluster-API based clusters
- Ability to automatically enumerate nodes, execute commands on those nodes, and capture the results
- Easily query and capture API objects and other cluster information from the Kubernetes API server
- Adoption of the shortened name crashd to refer to the project and the built binary
- Update of the go directive in the go.mod file to version 1.15
The Crashd Script
This release introduces Starlark as the language for writing scripts that interact with a Kubernetes cluster. For instance, the following script shows how to use kube_nodes_provider, which uses Kubernetes Node objects to enumerate and discover the compute resources that are part of the cluster. Then, assuming each node can be reached securely over SSH, the script executes a simple command on each node and retrieves and prints the result:
# setup and configuration
ssh=ssh_config(
    username=os.username,
    private_key_path=args.ssh_pk_path,
    port=args.ssh_port,
    max_retries=50,
)
hosts=resources(
    provider=kube_nodes_provider(
        kube_config=kube_config(path=args.kubecfg),
        ssh_config=ssh,
    ),
)
# command to run on each host
uptimes = run(cmd="uptime", resources=hosts)
# run() returns one result per resource; print each of them
for uptime in uptimes:
    print(uptime.result)
Explore more example scripts here.
Script Elements Available
Release 0.3.0 comes with many built-in functions and other types to help you create functional and useful scripts. Each built-in function falls into one of the following categories:
Configuration functions
crashd_config()
kube_config()
ssh_config()
Provider functions
capa_provider()
capv_provider()
host_list_provider()
kube_nodes_provider()
resources()
Command functions
archive()
capture()
capture_local()
copy_from()
run()
run_local()
Kubernetes functions
kube_capture()
kube_get()
See the complete list of script elements here
Known Issues
N/A
Changelog
6a8c896 Adds docs for --args-file and use-ssh-agent
689a4fe Adds .crashd directory when running the program
ec0d453 Adds support for passphrase protected ssh key
ecafae3 Adds args-file flag to run command
bf00df2 updates the args flag example
da64801 Refactor e2e test framework
985a657 Changes ssh command string for proxy args
c342713 Updates extensions for the example scripts
7384add Updates the CLI name to crashd
2501def Includes a new provider for CAPA managed objects
cf42fe3 Replaces named flag with positional argument
6118ba0 Reference documentation update
9ec5022 Fixes the function name for kube functions
dc5b5cc Adds the set_as_default directive
1013294 Adds meaningful constructor name to starlark structs
7bfa1a0 New provider for CAPV resource enumeration
a2d6299 Multiple Example Starlark Scripts
4afbef5 Implementation of the archive function
427eece Adds kube_nodes_provider starlark built-in
f108963 Implementation of the capture_local() Starlark function
a46c358 Implementation of the Starlark run_local() function
d17fad8 Implementation of starlark copy_from() function
0f489be Adds kube_get starlark built-in
6257c24 Implementation of the capture() starlark function.
12be3fc Implementation of kube_capture starlark function
ac2a419 Adds kube_config built-in
1dda5e6 Implementation of run starlark function
daf30cb Implemenation of host_list_provider function.
90209bf Starlark - Base implementation to support configuration
Known Issues
- Running crashd without the --args-file flag and without a default args file fails [#176]
v0.3.0-beta
This version includes a few incremental changes to address the following issues:
- (#128) Add support for passphrase-protected SSH keys
- (#163) Improve passing multiple arguments to the run command
Changes
- crashd_config directive
A new Boolean parameter, use_ssh_agent, was added to the crashd_config() directive. When this parameter is set, crashd starts a new instance of ssh-agent, and any SSH keys used in the script are added to that agent instance. All subsequent SSH/SCP operations then use this ssh-agent for remote connections.
- run command
A new flag, --args-file, was added to the crashd run command. The flag takes a path to a file containing newline-separated key=value pairs, which are passed to the diagnostics script at runtime.
- Go directive
The go directive in the go.mod file was updated to version 1.15.
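The args file format described above (newline-separated key=value pairs) can be illustrated with a small Python sketch. The following rules are assumptions made for illustration and may differ from crashd's own Go implementation:

- the function name parse_args_file
- skipping blank lines and lines starting with #
- trimming whitespace around keys and values
- splitting on the first = only

```python
def parse_args_file(text):
    """Parse newline-separated key=value pairs into a dict (hypothetical sketch)."""
    args = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are skipped.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected key=value, got {line!r}")
        # Split on the first '=' only, so values may themselves contain '='.
        args[key.strip()] = value.strip()
    return args


example = """
# connection settings
ssh_pk_path=/home/user/.ssh/id_rsa
ssh_port=22
kubecfg=/home/user/.kube/config
"""
print(parse_args_file(example)["ssh_port"])  # prints 22
```

With such a file, a script would read the values as args.ssh_pk_path, args.ssh_port and args.kubecfg, as the v0.3.0 example script does.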
Changelog
ec0d453 Adds support for passphrase protected ssh key
ecafae3 Adds args-file flag to run command
v0.3.0-alpha
This version introduces a new direction for the project. Instead of continuing with a Dockerfile-like configuration language, the project has been completely redesigned to adopt Starlark, a Python-like language, as the language for writing Crashd scripts. This version also introduces many other important features:
- Use Starlark to create simple or complex scripts to automate interaction with Kubernetes clusters
- Use Python-like constructs and functions to create scripts to query or capture infrastructure information
- Support for a provider model that allows interaction with a growing list of infrastructure providers, including local KinD clusters, plain Kubernetes clusters, and Cluster-API based clusters
- Ability to automatically enumerate nodes, execute commands on those nodes, and capture the results
- Easily query and capture API objects and other cluster information from the Kubernetes API server
- Adoption of the shortened name crashd to refer to the project and the built binary
The Crashd Script
This release introduces Starlark as the language for writing scripts that interact with a Kubernetes cluster. For instance, the following script shows how to use kube_nodes_provider, which uses Kubernetes Node objects to enumerate and discover the compute resources that are part of the cluster. Then, assuming each node can be reached securely over SSH, the script executes a simple command on each node and retrieves and prints the result:
# setup and configuration
ssh=ssh_config(
    username=os.username,
    private_key_path=args.ssh_pk_path,
    port=args.ssh_port,
    max_retries=50,
)
hosts=resources(
    provider=kube_nodes_provider(
        kube_config=kube_config(path=args.kubecfg),
        ssh_config=ssh,
    ),
)
# command to run on each host
uptimes = run(cmd="uptime", resources=hosts)
# run() returns one result per resource; print each of them
for uptime in uptimes:
    print(uptime.result)
Explore more example scripts here.
Script Elements Available
Release 0.3.0 introduces many new language elements and functions.
Configuration functions
crashd_config()
kube_config()
ssh_config()
Provider functions
capa_provider()
capv_provider()
host_list_provider()
kube_nodes_provider()
resources()
Command functions
archive()
capture()
capture_local()
copy_from()
run()
run_local()
Kubernetes functions
kube_capture()
kube_get()
See the complete list of script elements here
Known Issues
- Support for passphrase-protected SSH keys may not work when executing commands on compute nodes
Changelog
bf00df2 updates the args flag example
da64801 Refactor e2e test framework
985a657 Changes ssh command string for proxy args
c342713 Updates extensions for the example scripts
7384add Updates the CLI name to crashd
2501def Includes a new provider for CAPA managed objects
cf42fe3 Replaces named flag with positional argument
6118ba0 Reference documentation update
9ec5022 Fixes the function name for kube functions
dc5b5cc Adds the set_as_default directive
1013294 Adds meaningful constructor name to starlark structs
7bfa1a0 New provider for CAPV resource enumeration
a2d6299 Multiple Example Starlark Scripts
4afbef5 Implementation of the archive function
427eece Adds kube_nodes_provider starlark built-in
f108963 Implementation of the capture_local() Starlark function
a46c358 Implementation of the Starlark run_local() function
d17fad8 Implementation of starlark copy_from() function
0f489be Adds kube_get starlark built-in
6257c24 Implementation of the capture() starlark function.
12be3fc Implementation of kube_capture starlark function
ac2a419 Adds kube_config built-in
1dda5e6 Implementation of run starlark function
daf30cb Implemenation of host_list_provider function.
90209bf Starlark - Base implementation to support configuration
v0.2.3-alpha.0
v0.2.2
This is a bug fix release. As outlined in #48, the previous version of Crash did not reliably retrieve all objects (especially logs) from the server. This fix attempts to bring KUBEGET on par with kubectl cluster-info --dump.
KUBEGET objects
KUBEGET does the following when retrieving objects:
- All retrieved objects are saved in files under the kubeget directory
- Objects are saved as JSON, one file per resource kind (for example, configmaps.json)
- Namespaced objects are saved in a sub-directory named after their namespace
- Non-namespaced objects are saved in the root kubeget directory
The following shows an example directory layout of KUBEGET objects:
crashd/stage
└── kubeget
    ├── apiservices.json
    ├── clusterinformations.json
    ├── clusterrolebindings.json
    ├── clusterroles.json
    ├── default
    │   ├── configmaps.json
    │   ├── controllerrevisions.json
    │   ├── cronjobs.json
    │   ├── daemonsets.json
    │   ├── deployments.json
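The layout rules above can be expressed as a short Python sketch that computes where the objects of one kind would be written. The function name kubeget_object_path is hypothetical. The <kind>.json naming is inferred from the example layout, not taken from crashd's source.

```python
from pathlib import PurePosixPath


def kubeget_object_path(stage_dir, kind, namespace=None):
    """Return the JSON file path for objects of `kind` (hypothetical sketch).

    Namespaced kinds go into a sub-directory named after the namespace;
    cluster-scoped (non-namespaced) kinds go directly into kubeget/.
    """
    root = PurePosixPath(stage_dir) / "kubeget"
    if namespace:
        root = root / namespace
    return str(root / f"{kind}.json")


print(kubeget_object_path("crashd/stage", "clusterroles"))
# prints crashd/stage/kubeget/clusterroles.json
print(kubeget_object_path("crashd/stage", "configmaps", namespace="default"))
# prints crashd/stage/kubeget/default/configmaps.json
```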
KUBEGET logs
When fetching container logs, KUBEGET stores them in the following directory structure:
<namespace>/<pod-name>/<container-name>/<container-name>.log
Each container log is saved individually in its own file, as shown in the following example directory layout:
├── kube-system
│   ├── calico-kube-controllers-ff95847f5-tjccn
│   │   └── calico-kube-controllers
│   │       └── calico-kube-controllers.log
│   ├── calico-node-87b7l
│   │   ├── calico-node
│   │   │   └── calico-node.log
│   │   ├── install-cni
│   │   │   └── install-cni.log
│   │   └── upgrade-ipam
│   │       └── upgrade-ipam.log
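Following the example layout above, a log path can be derived from the namespace, pod, and container names. This Python sketch assumes the file is named <container-name>.log, as in the example tree. The function name kubeget_log_path is hypothetical.

```python
from pathlib import PurePosixPath


def kubeget_log_path(namespace, pod, container):
    """Return <namespace>/<pod>/<container>/<container>.log (hypothetical sketch)."""
    return str(PurePosixPath(namespace, pod, container, f"{container}.log"))


print(kubeget_log_path("kube-system", "calico-node-87b7l", "install-cni"))
# prints kube-system/calico-node-87b7l/install-cni/install-cni.log
```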
Changelog
c558caa Merge pull request #52 from vladimirvivien/kubeget-logs-fix-take-2
fe86bb0 GitHub Actions CI update with [email protected]
a832e06 Documentation update for KUBEGET
9dcb1c8 Update KUBEGET to better organize object search results
c479bac Refactor k8s client code for object search
v0.2.1
This release of Crash-Diagnostics includes several fundamental changes that were first introduced in a series of alpha releases, as outlined below:
Command results redirected to stdout
The RUN and CAPTURE commands can direct their output to the console using the echo parameter:
RUN cmd:"/bin/journalctl -l -u kube-apiserver" echo:"true"
- See v0.2.1-alpha.0 https://github.com/vmware-tanzu/crash-diagnostics/releases/tag/v0.2.1-alpha.0
Unified Executor Backend
- A new unified default executor backend that uses only SSH/SCP to execute remote commands.
- This removes the local executor backend and treats every machine as a remote connection, even when running against the local machine.
- See v0.2.1-alpha.1 https://github.com/vmware-tanzu/crash-diagnostics/releases/tag/v0.2.1-alpha.1
New Package for End-to-End Tests
- A new Go package, testing, to help with true end-to-end testing.
- The new package is used to launch an OpenSSH server Docker image during tests, so that all commands that need SSH/SCP can be tested.
- The package can also automate the creation of K8s clusters using kind to test commands and other directives that rely on a Kubernetes cluster.
Enhancement to FROM
The FROM directive has been enhanced with several features, including the ability to discover remote machines (from which to source diagnostics information) through an available K8s API server.
- FROM supports a nodes: param to trigger K8s machine discovery through a K8s API server. For instance, FROM nodes:"all" sources from all machines represented by a Node object in the API server.
- The nodes: param can also list specific Node names, for example FROM nodes:"node.name.1 node.name.2"
- FROM also supports a labels: param to filter the sourced nodes (e.g. FROM nodes:"all" labels:"foo=bar").
- FROM now supports a port: param to specify the default port to use when a host does not specify one (e.g. FROM hosts:"10.10.20.100 10.10.20.200" port:"2222").
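To make the labels: and port: semantics more concrete, here is a Python sketch built on the following assumptions:

- labels:"foo=bar" is treated as an equality match on node labels, and comma-separated pairs must all match.
- A host entry may carry its own port in host:port form. This applies only to hostnames and IPv4 addresses.
- The node records and all function names are hypothetical.

The sketch models only the filtering rules, not how crashd talks to the API server.

```python
def parse_labels(selector):
    """Parse an equality selector such as 'foo=bar,tier=web' (assumed syntax)."""
    labels = {}
    for item in selector.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"invalid label selector item: {item!r}")
        labels[key.strip()] = value.strip()
    return labels


def filter_nodes(nodes, selector):
    """Keep nodes whose labels contain every key=value pair in the selector."""
    wanted = parse_labels(selector)
    return [n for n in nodes
            if all(n["labels"].get(k) == v for k, v in wanted.items())]


def with_default_port(hosts, default_port):
    """Append default_port to hosts that lack one (assumes host:port form)."""
    return [h if ":" in h else f"{h}:{default_port}" for h in hosts.split()]


nodes = [
    {"name": "node.name.1", "labels": {"foo": "bar"}},
    {"name": "node.name.2", "labels": {"foo": "baz"}},
]
print([n["name"] for n in filter_nodes(nodes, "foo=bar")])  # prints ['node.name.1']
print(with_default_port("10.10.20.100 10.10.20.200:22", "2222"))
# prints ['10.10.20.100:2222', '10.10.20.200:22']
```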
Changelog
33aa529 Merge pull request #46 from vladimirvivien/from-node-enhancements
a5b71b3 Doc update for FROM directive changes
ce9a8f5 Test updates for all end-to-end tests
f983bde Executor support for parameterized connection retries
8a11b46 Automate create/destroy kind clusters for tests
958cf84 FROM command/test refactor for new params
v0.2.1-alpha.1
This release introduces a non-functional change that updates Crash Diag to use a single executor backend based on SSH/SCP. Prior to this change, the code supported two executor backends: one for local execution and one for remote execution. This release uses only the remote execution model, via SSH/SCP, for both local and remote machines.
Using a single executor backend that relies on SSH/SCP means that testing requires standing up an SSH/SCP server. The following changes were made to support this:
- Refactoring of all executor code to run only on the SSH backend
- Update to test code to start/stop a full SSH server in Docker containers
- Enhancement to the SSH connection code to retry after a failed connection
- Update to CI/CD code for end-to-end testing of diagnostics scripts using SSH/SCP
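The connection-retry enhancement listed above can be sketched in Python as a bounded retry loop. A few assumptions apply:

- The parameter names max_retries and delay are chosen for illustration; compare the later max_retries argument of ssh_config().
- The fixed back-off is a simplification.
- The connect callable is a hypothetical stand-in for whatever actually opens the SSH connection.

```python
import time


def connect_with_retry(connect, max_retries=5, delay=0.5):
    """Call `connect()` until it succeeds or max_retries attempts have failed.

    `connect` is any zero-argument callable that raises OSError on failure
    (a stand-in for opening an SSH connection). The last error is re-raised.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_retries:
                raise  # give up and surface the last connection error
            time.sleep(delay)  # fixed pause between attempts


# Example: a fake connection that fails twice before succeeding.
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionRefusedError("ssh: connection refused")
    return "connected"


print(connect_with_retry(flaky_connect, max_retries=5, delay=0))  # prints connected
```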
Changelog
9348f8a Merge pull request #43 from vladimirvivien/single-exec-backend
c416e21 Documentation update for ssh/scp backend
8cd3a84 Update to GitHub Actions for end-to-end SSH/SCP tests
b1ec56e Add test cert/key, update GitActions for testing
2421cab Refactor tests to support ssh/scp exec backend only
31cedc5 Remove local exec backend, refactor, ssh-server for testing
8d105f9 Refactor to switch to scp/ssh for command exec
b7c0854 Command and script changes
v0.2.1-alpha.0
This release adds the ability to direct CAPTURE and RUN command output to standard output using the echo parameter, as shown below:
RUN cmd:"/bin/journalctl -l -u kube-apiserver" echo:"true"
See the docs for details.
Changelog
a95de45 Merge pull request #42 from vladimirvivien/exec-echo
91df766 Document update for RUN and CAPTURE commands
d2433cc RUN and CAPTURE and tests to output to sdout
6775f28 Command update to support echo param
v0.2.0
Release v0.2.0
KUBEGET
This release introduces a new directive, KUBEGET, to retrieve API objects and pod logs from the API server, as shown in the following example:
KUBEGET objects groups:"core" kinds:"pods" namespaces:"kube-system default" containers:"kindnet-cni etcd"
Read more about KUBEGET in the README.
COPY File Pattern
In this release, the COPY command supports file patterns (globbing) when specifying one or more files to copy from a cluster node, as shown below:
COPY /var/logs/kube*.log
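In a glob pattern such as kube*.log, * matches any sequence of characters within a file name. The Python sketch below uses the standard fnmatch module to show which names that pattern selects. It assumes crashd's globbing follows the usual shell rules, and the file listing is made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing on a cluster node.
files = [
    "kube-apiserver.log",
    "kube-scheduler.log",
    "kubelet.log",
    "kube-proxy.log.1",
    "etcd.log",
]

# '*' matches any run of characters, so kube*.log selects names that start
# with "kube" and end with ".log" (case-sensitive match).
matches = [name for name in files if fnmatchcase(name, "kube*.log")]
print(matches)
# prints ['kube-apiserver.log', 'kube-scheduler.log', 'kubelet.log']
```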
GitHub Actions
Other changes in this release include switching the build system from Travis CI to GitHub Actions.
Changelog
5e455c9 Merge pull request #39 from vladimirvivien/ghactions-fix
53b2eb5 Fixes for GitHub Action workflows
1fe7326 Remove travis.yaml file
855a6cb Changelog update
71cbe4a Documentation update for file globbing
v0.2.0-alpha.0
This release introduces a new directive, KUBEGET, to retrieve API objects and pod logs from the API server. When an API connection is properly configured using KUBECONFIG, KUBEGET can retrieve any accessible API objects or logs for running pods, as shown in the following example:
KUBEGET objects groups:"core" kinds:"pods" namespaces:"kube-system default" containers:"kindnet-cni etcd"
The previous command retrieves all pods in the kube-system or default namespace that have containers named kindnet-cni or etcd.
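The selection rule described above keeps a pod if its namespace is in the list and at least one of its containers is in the list. It can be sketched in Python over plain dictionaries. The pod records and the function name select_pods are hypothetical. The sketch models only the matching rule, not how KUBEGET queries the API server.

```python
def select_pods(pods, namespaces, containers):
    """Return names of pods in one of `namespaces` that run any of `containers`.

    `namespaces` and `containers` are space-separated strings, mirroring the
    KUBEGET parameters namespaces:"..." and containers:"...".
    """
    wanted_ns = set(namespaces.split())
    wanted_containers = set(containers.split())
    return [
        pod["name"]
        for pod in pods
        if pod["namespace"] in wanted_ns
        and wanted_containers.intersection(pod["containers"])
    ]


pods = [
    {"name": "etcd-kind-control-plane", "namespace": "kube-system", "containers": ["etcd"]},
    {"name": "kindnet-abcde", "namespace": "kube-system", "containers": ["kindnet-cni"]},
    {"name": "coredns-xyz", "namespace": "kube-system", "containers": ["coredns"]},
    {"name": "web", "namespace": "apps", "containers": ["etcd"]},
]
print(select_pods(pods, "kube-system default", "kindnet-cni etcd"))
# prints ['etcd-kind-control-plane', 'kindnet-abcde']
```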
See the README for details.