Merge pull request #92 from stakater/update-main
Update main
Showing 8 changed files with 120 additions and 10 deletions.
@@ -0,0 +1,77 @@
# Metrics and Logs Documentation

This document describes the Prometheus metrics exposed by the `multi_tenant_operator` controllers and explains how to interpret the logs and statuses they produce. Each metric covers one aspect of the controllers' operational performance, the log interpretation guide explains their behavior and workflow, and the custom resource status descriptions give a snapshot of current operational state. Together, these support monitoring and troubleshooting the performance and health of the controllers.

## Metrics List

**`multi_tenant_operator_resources_deployed_total`**

- **Description**: Tracks the total number of resources deployed by the operator.
- **Type**: Gauge
- **Labels**: `kind`, `name`, `namespace`
- **Usage**: Helps to understand the overall workload managed by the operator.

**`multi_tenant_operator_resources_deployed`**

- **Description**: Monitors resources currently deployed by the operator.
- **Type**: Gauge
- **Labels**: `kind`, `name`, `namespace`, `type`
- **Usage**: Useful for tracking the current state and type of resources managed by the operator.

**`multi_tenant_operator_reconcile_error`**

- **Description**: Indicates resources in an error state, broken down by resource kind, name, and namespace.
- **Type**: Gauge
- **Labels**: `kind`, `name`, `namespace`, `state`, `errors`
- **Usage**: Essential for identifying and analyzing errors in resource management.

**`multi_tenant_operator_reconcile_count`**

- **Description**: Counts the number of reconciliations performed for a template group instance, categorized by name.
- **Type**: Gauge
- **Labels**: `kind`, `name`
- **Usage**: Provides insight into the frequency of reconciliation processes.

**`multi_tenant_operator_reconcile_seconds`**

- **Description**: Represents the cumulative duration, in seconds, taken to reconcile a template group instance, categorized by instance name.
- **Type**: Gauge
- **Labels**: `kind`, `name`
- **Usage**: Critical for assessing the time efficiency of the reconciliation process.

**`multi_tenant_operator_reconcile_seconds_total`**

- **Description**: Tracks the total duration, in seconds, for all reconciliation processes of a template group instance, categorized by instance name.
- **Type**: Gauge
- **Labels**: `kind`, `name`
- **Usage**: Useful for understanding the overall time spent on reconciliation processes.

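
As a rough illustration of how the metrics listed above might be consumed, the following sketch parses Prometheus text exposition output with the standard library and groups resources whose `multi_tenant_operator_reconcile_error` gauge is non-zero. The sample scrape text, the label values, and the reading of "non-zero means in error" are assumptions made for this example, not details confirmed by the operator.

```python
import re
from collections import defaultdict

# Matches one sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 12.5
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def resources_in_error(text):
    """Group (namespace, name) pairs by kind where the error gauge is > 0.

    Assumption: a non-zero gauge value marks a resource in an error state.
    """
    by_kind = defaultdict(list)
    for name, labels, value in parse_samples(text):
        if name == "multi_tenant_operator_reconcile_error" and value > 0:
            by_kind[labels.get("kind", "")].append(
                (labels.get("namespace", ""), labels.get("name", ""))
            )
    return dict(by_kind)


# Hypothetical scrape output; all label values are made up for illustration.
SAMPLE = """\
# TYPE multi_tenant_operator_reconcile_error gauge
multi_tenant_operator_reconcile_error{kind="TemplateGroupInstance",name="tgi-a",namespace="",state="failed",errors="timeout"} 1
multi_tenant_operator_reconcile_error{kind="TemplateGroupInstance",name="tgi-b",namespace="",state="ok",errors=""} 0
multi_tenant_operator_reconcile_count{kind="TemplateGroupInstance",name="tgi-a"} 7
"""

if __name__ == "__main__":
    print(resources_in_error(SAMPLE))
```

In practice the same grouping is usually done with a query against Prometheus itself; the parser here only makes the label structure of the metrics concrete.
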
## Custom Resource Status

This section describes the status of the custom resources managed by the controllers. Use the `kubectl describe` command to fetch the status of these resources.

### Template Group Instance

Status from the `templategroupinstances.tenantoperator.stakater.com` custom resource:

- **Current Operational State**: Provides a snapshot of the resource's current condition.
- **Conditions**: Offers a detailed view of the resource's status, which includes:
  - `InstallSucceeded`: Indicates the success of the instance's installation.
  - `Ready`: Shows the readiness of the instance, with details on the last reconciliation process, its duration, and relevant messages.
  - `Running`: Reports on active processes like ongoing resource reconciliation.
- **Deployed Namespaces**: Enumerates the namespaces where the instance has been deployed, along with their statuses and associated template manifests.
- **Manifest Hashes**: Includes the `Template Manifests Hash` and `Resource Mapping Hash`, which provide versioning and change tracking for template manifests and resource mappings.

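
The conditions described above can also be checked programmatically. The sketch below assumes the resource is available as a parsed JSON object and that its conditions live under `status.conditions` with `type` and `status` keys, which is the common Kubernetes convention; the actual schema of this custom resource may differ, and the example object is hypothetical.

```python
CONDITION_TYPES = ("InstallSucceeded", "Ready", "Running")


def condition_status(resource, condition_type):
    """Return the status string of one condition, or None if it is absent."""
    conditions = resource.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == condition_type:
            return condition.get("status")
    return None


def summarize(resource):
    """Map each condition type named in this document to its status."""
    return {t: condition_status(resource, t) for t in CONDITION_TYPES}


# Hypothetical object; field layout is an assumption based on the
# usual Kubernetes condition shape, not on the operator's CRD.
EXAMPLE = {
    "kind": "TemplateGroupInstance",
    "metadata": {"name": "example-tgi"},
    "status": {
        "conditions": [
            {"type": "InstallSucceeded", "status": "True"},
            {"type": "Ready", "status": "True", "message": "reconciled"},
            {"type": "Running", "status": "False"},
        ]
    },
}

if __name__ == "__main__":
    print(summarize(EXAMPLE))
```

Returning `None` for a missing condition keeps "not reported yet" distinct from an explicit `"False"`.
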
## Log Interpretation Guide

### Template Group Instance Controller

Logs from the `tenant-operator-templategroupinstance-controller`:

- **Reconciliation Process**: Logs starting with `Reconciling!` mark the beginning of a reconciliation process for a TemplateGroupInstance. Subsequent actions like `Creating/Updating TemplateGroupInstance` and `Retrieving list of namespaces Matching to TGI` outline the reconciliation steps.
- **Namespace and Resource Management**: Logs such as `Namespaces test-namespace-1 is new or failed...` and `Creating/Updating resource...` detail the management of Kubernetes resources in specific namespaces.
- **Worker Activities**: Logs labeled `[Worker X]` show tasks being processed in parallel, including steps like `Validating parameters`, `Gathering objects from manifest`, and `Apply manifests`.
- **Reconciliation Completion**: Entries like `End Reconciling` and `Defering XXth Reconciling, with duration XXXms` indicate the end of a reconciliation process and its duration, aiding in performance analysis.
- **Watcher Events**: Logs from `Watcher` such as `Delete call received for object...` and `Following resource is recreated...` are key for tracking changes to Kubernetes objects.

These logs help in tracking the system's behavior, diagnosing issues, and understanding the resource management workflow.
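
For performance analysis, the completion entries can be mined for durations. The sketch below assumes that `XX` and `XXX` in `Defering XXth Reconciling, with duration XXXms` are rendered as plain numbers (with any ordinal suffix); the exact log format is not shown in this document, so the pattern and the sample lines are illustrative only.

```python
import re

# Assumed rendering of: Defering XXth Reconciling, with duration XXXms
# The spelling "Defering" follows the log message quoted in this guide.
DURATION_RE = re.compile(
    r"Defering (\d+)(?:st|nd|rd|th) Reconciling, with duration (\d+(?:\.\d+)?)ms"
)


def reconcile_durations(lines):
    """Return (reconcile_number, duration_ms) for each completion entry."""
    results = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match:
            results.append((int(match.group(1)), float(match.group(2))))
    return results


# Hypothetical log excerpt built from the message fragments quoted above.
LOG = [
    "Reconciling!",
    "[Worker 1] Validating parameters",
    "End Reconciling",
    "Defering 3rd Reconciling, with duration 412ms",
    "Reconciling!",
    "End Reconciling",
    "Defering 4th Reconciling, with duration 95ms",
]

durations = reconcile_durations(LOG)
slowest = max(durations, key=lambda item: item[1])

if __name__ == "__main__":
    print(durations)
    print(slowest)
```
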