Releases: kubecost/cost-analyzer-helm-chart
V2.2.6-rc.1
Fixes
- Fixed issues that caused “white screens” with Kubecost free/trial installations
V2.3.0-rc.7
- Modify the orchestrator/debug endpoint to report progress without accessing the write database, preventing a race condition/panic.
V2.3.0-rc.6
- Disable debug and orchestrator endpoints for testing race condition while copying database.
V2.3.0-rc.5
- Fix panic from ingestion
- Add option DB_COPY_FULL for handling db promotion
V2.3.0-rc.4
- Fix panic from ingestion
- Fix memory leak in ingestion
V2.3.0-rc.3
V2.3.0-rc.3 Release Notes
Overview:
Version 2.3.0-rc.3 is a public release candidate for v2.3.0, which will be a 'production' release focused on bug fixes and stability.
The upgrade will create a new aggregator database, which is used to quickly serve Kubecost metrics. This can take hours in large environments (large is considered $10k+ per day in Kubernetes costs). For these environments, it is possible to run a second "parallel" Kubecost primary environment. Please reach out to us via Slack for assistance with this process.
Major:
- Kubernetes Efficiency View - Easily break down the efficiency of your clusters and workloads over time.
- A link to this view has been added to the Cluster Efficiency card at the top of the Overview page.
- Enterprise Integration (Postgres) - Add the ability for enterprise customers to export Kubecost data to Postgres for use with BI tools.
- Custom SMTP Server integration - Add the ability to integrate with custom SMTP servers for alerts, budgets, etc., instead of using Kubecost’s default SMTP solution.
- Anomaly Detection Enhancements
- Change how anomalies are detected and make the output more actionable. Anomalies are now detected on a rolling lookback window.
- Add a user defined threshold and minimum cost filter for detecting anomalies.
- Add anomaly detection for allocations as well as cloud costs.
- When navigating to the allocations or cloud costs page by clicking an anomaly, the anomalous entry will now be highlighted, and the lookback window marked.
- All Business Tier users granted access to full Enterprise features during transition period.
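As an illustration of the anomaly detection enhancements listed above, here is a minimal Python sketch of a rolling-lookback detector with a user-defined threshold and a minimum cost filter. It is not Kubecost's implementation: the function name `detect_anomalies`, its parameters and defaults, and the rule of comparing each day against the mean of the preceding window are all assumptions made for illustration.

```python
from statistics import mean


def detect_anomalies(daily_costs, lookback=7, threshold=0.5, min_cost=10.0):
    """Return indices of days whose cost deviates from the rolling baseline.

    All names and defaults are hypothetical, not Kubecost settings.
    - lookback: number of preceding days used as the baseline window
    - threshold: relative deviation (0.5 means 50%) that counts as anomalous
    - min_cost: days cheaper than this are ignored to reduce noise
    """
    anomalies = []
    for i in range(lookback, len(daily_costs)):
        cost = daily_costs[i]
        if cost < min_cost:
            continue  # minimum cost filter
        baseline = mean(daily_costs[i - lookback:i])  # rolling lookback window
        if baseline == 0:
            continue  # no baseline to compare against
        if abs(cost - baseline) / baseline > threshold:
            anomalies.append(i)
    return anomalies


costs = [100, 102, 98, 101, 99, 100, 103, 180, 101, 5]
print(detect_anomalies(costs))  # prints [7]
```

In this sample, only the 180 spike is flagged. The following day falls back within the threshold, and the final value is dropped by the minimum cost filter.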
Minor:
- Cross-provider CUR access - IRSA access for cloud cost integrations in Kubecost with multiple providers.
- Add the ability to end a free trial with the /expireProductTrial endpoint, allowing users to see what Free tier features are available even if they have begun a free trial.
- Added a button to the Settings page which can be used to call this endpoint.
- Free trials will now automatically begin when the limit of 250 monitored cores is exceeded, rather than upon install. Trials can still be started manually via the settings page.
- Additional diagnostic information is available in bug reports, and is used to more accurately indicate the state of data ingestion on startup. The Helm Chart version is now also visible from the settings page.
- Updated the workload field label for Budgets.
- Add support for specifying label values in Assets filters.
- Reduced frequency of calls to a diagnostics endpoint.
Ingestion Fixes
During this release cycle, a large focus has been placed on fixing data ingestion issues that we have seen in live environments. Below is a list of the focus areas:
- Orders of magnitude performance increase in ingestion and derivation of allocations, assets, cloud cost, network and containerstats data.
- Fix an issue with the promotion of “write data” to “read data” where a race condition would sometimes cause a breakage.
- Fix the error ‘internal list scan offset is out of range’ during updates to the database.
- Added many new diagnostics data points for assistance in troubleshooting and to help ensure a healthy flow of data during the initial phases of ingesting especially large datasets.
- Add the ability to automatically grow the refresh interval on initial load of large datasets. This fixes an issue where on large datasets the ingestion process would be halted so the next ingestion process could begin. This feature will grow the refresh interval in the beginning to allow ingestion to get to a complete state before promoting and moving to the next ingestion cycle.
- Add the ability to get a first cut of the data faster when a full reingest is being processed, enabling the nearest time-series data to be viewed while the historical data is still being ingested.
- Fix the order of ingestion to make sure daily data, and the latest data is ingested first.
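The automatic refresh-interval growth described in the list above can be pictured with a short Python sketch. This is an illustration under stated assumptions, not Kubecost's code: the function `next_refresh_interval`, the doubling factor, and the six-hour cap are hypothetical. The idea is that when an ingestion cycle takes at least as long as the current interval, the interval grows so that the cycle can complete and be promoted before the next one starts.

```python
def next_refresh_interval(current_s, last_ingest_s, factor=2, max_s=6 * 3600):
    """Pick the interval (in seconds) to wait before the next ingestion cycle.

    Hypothetical policy: grow the interval while ingestion does not fit
    inside it, and hold it steady once ingestion completes in time.
    """
    if last_ingest_s >= current_s:
        # Ingestion overran the interval: grow it, up to a cap.
        return min(current_s * factor, max_s)
    # Ingestion finished within the interval: keep the current cadence.
    return current_s


interval = 300
for ingest_time in [1200, 1200, 1200, 200]:
    interval = next_refresh_interval(interval, ingest_time)
    print(interval)  # prints 600, 1200, 2400, 2400
```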
Fixes:
- Fix Summary Allocation windowing inconsistencies between different accumulation options.
- Fix the HTTP 500 error in Cluster Sizing when some nodes don’t have a valid asset type.
- Fix an issue with the savings API for clusters that contain Fargate nodes (nodes without a node type).
- Fix the HTTP 500 error in Assets Topline API when aggregating by label.
- Fix an error filtering with the “contains”/“contains prefix”/“contains suffix” operators on custom labels.
- Fix an issue with multiple reports with the same profile being created on pod restart when using v1 filters in the config map.
- Fix an issue with Business tier licenses not being appropriately recognized post v2.0.
- Fix an issue with /debug/orchestrator and /providerOptimization endpoints not being accessible when core count exceeds free tier and no valid license or trial.
- Fix collections to use filters from teams/RBAC configuration.
- Fix an issue with duplicate budgets being created when creating a new budget and a budget with the same criteria already exists.
- Fix an issue causing negative idle when multiple clusters share the same nodes.
- Fix an issue where the cloud cost and external cost processes could be initialized in cost-model even when a separate container was running for cloud cost and external costs, causing messy logs and unneeded processing in the cost-model container.
- Fix an issue with SMTP connection causing a panic.
- Fix issues with Idle calculation with service/label aggregation.
- Fix an issue with SAML configuration when the query filter is empty and the SAML filter is not empty.
- Fix an issue with request sizing missing valid parameters in query validation.
- Fix an issue when aggregating by predefined label aliases (deployment, daemonset, etc).
- Fix idle sharing of CPU, GPU, and RAM for Allocations API.
- Fix aggregate by label when separating idle.
- Fix an issue with drastic differences between assets visual representation in 2.x versus 1.108. This was due to seemingly duplicative data for certain time periods. Added cluster id to ingestion to aid in the de-duplication of this time-series data.
- Fix an error in Assets View API when the end of the window is empty causing a Boundary Error.
- Fix many noisy logs to be logged at the appropriate level or removed for ease of understanding state and troubleshooting.
- Fix an issue when saving a scheduled report where the next run was sometimes not appropriately set.
- Fix an error when enabling .Values.saml.enabled=true and .Values.readonly=true.
- Fix an error where network insights would not be visible even when configuration was set to enable.
- Fix an issue with Address Network cost reconciliation for Azure provider with an edge case for virtual machine scale sets.
- Fix an issue where unallocated__idle was being returned in /savings/requestSizingv2.
- Fix cluster sizing recommendation failures when nil objects detected.
- Fix Assets API to accurately align topline and table data.
- Fix Allocations to appropriately display idle for unallocated workloads.
- Fix large inflated node prices before reconciliation occurs.
- Updates Cloud Cost ingestion for GCP to fall back to resource.global_name when resource.name is null for determining ProviderID. This is particularly relevant for Cloud SQL, Cloud Storage, and Cloud Logging, which very often have null resource.name values, resulting in unallocated ProviderID values.
- Fix an issue where some time series charts used the end of a time period for their x-axis instead of the start.
- Fix an issue where the UI would attempt to show hourly data for External Costs on small, recent time windows. Hourly data is not collected for External Costs at this time.
- Fix an issue where idle costs were not represented in the Namespaces table of the cluster inspect page.
- Fix an issue where exporting the values.yaml entries for reports would sometimes format filters incorrectly.
- Fix an issue on the cluster-inspect page in which the cost for a namespace did not account for the shareTenancyCost configuration.
- Fix sorting by cluster name on the Clusters page.
- Fix UI issues with the Automatic Request Rightsizing Action, where no loading indicator was shown while the action was registered, and failures were not indicated to the user.
- Fix an issue in which filters were not properly set when drilling into an anomaly when using multi-aggregation.
- Fix an issue where Business Tier users were being blocked from usage when exceeding 250 cores monitored. Business Tier is now effectively Enterprise.
- Fix an issue where table header sort icon (up/down arrow) sometimes appeared to the left of the text instead of to the right.
- Fix an issue where the “Add Cloud Provider” button would be absent from the Settings page unless at least one provider was already set up.
- Fix an issue with deleting Asset reports.
- Fix an issue where the diagnostics page would get stuck in the Loading state when calls to the GitHub API failed.
- Fix an issue where the table would reset to the first page when opening the Assets detail dialog.
- Fix several issues with Cloud Provider creation.
- Fix an issue where filters like Contains and Starts With, which are only useful when they do not exactly match the queried item, did not allow for free-form text: the user was forced to click on an autocomplete option.
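The GCP Cloud Cost fix in the list above (falling back to resource.global_name when resource.name is null) can be sketched in Python as follows. This is only an illustration of the fallback rule: the function name `provider_id_from_gcp_row` and the nested-dict row layout are assumptions, and treating an empty string like null is a choice made here, not something stated in the release notes.

```python
def provider_id_from_gcp_row(row):
    """Choose a ProviderID from a GCP billing row given as a nested dict.

    Assumes a hypothetical layout such as
    {"resource": {"name": ..., "global_name": ...}}.
    """
    resource = row.get("resource") or {}
    # Prefer resource.name; fall back to resource.global_name when the name
    # is null (or empty, which this sketch treats the same way).
    return resource.get("name") or resource.get("global_name") or ""


row = {"resource": {"name": None, "global_name": "example-global-name"}}
print(provider_id_from_gcp_row(row))  # prints example-global-name
```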
Helm Changes:
- #3456 Fix option for non-federated primary, allowing the primary cluster to serve the user interface without shipping the local cluster metrics to s3.
- #3444 Add scrape configuration for aggregator telemetry metrics.
- Move product key proxies to aggregator container.
- #3440 Update cluster controller to resolve security issues.
- #3437 Update kubecost-modeling to add many new anomaly detection features, bug fixes for anomaly detection, and forecasting, and update...
v2.2.5
Fixed
- Fix an issue where expired trials returned 403 status codes with a JSON-formatted response, causing a white screen in the UI
- Fix an issue with very large price discrepancy on Rancher nodes
- Fix an issue in allocations when aggregating by aliased labels
V2.1.2
Merge pull request #3406 from kubecost/bump-2.1.2 Bump in-code version of v2.1 branch for 2.1.2
V2.2.4
Fixed
- Fix an issue with prefixed ETL buckets (i.e., bucket-name/folder-name) that prevented ingestion from working correctly.
- Fix an issue with Azure Cloud Cost ingestion where line items with "Virtual Machines Licenses" received an incorrect categorization as "other" while representing the cost of compute resources.
- Fix an issue where the Assets graph would not load for windows where part of the window was missing asset data.
- Fix an issue with collections not loading for certain SAML configurations.
V2.2.3
Fixed
- Fix issues with creating Cloud provider integrations via the UI.
- Fix an issue with continuous right-sizing calls not passing seconds in timestamp to API.
- Fix an issue with Abandoned Workloads where selecting the date range would override the threshold.
- Fix an issue where the cluster inspect page would incorrectly show used/requested cpu cores as millicores.
- Fix issues where the overview network tables did not match the data shown on click-through.
- Fix an issue with AWS workload identities.
- Fix network insights API returning an empty response in a single cluster install.
- Fix an issue creating scheduled reports via the helm config map resulting in next run time not being set.
- Fix an error when aggregating by label and annotation when idle costs are not hidden.
- Fix an error when querying assets with empty asset sets giving boundary errors.
- Fix idle sharing, which was broken for CPU, GPU, and RAM on the Allocation API.
- Fix a bug where idle GPUs were not properly accounted for.
- Fix an error on asset ingestion allowing spikes in PV costs.
Helm Changes
- #3324 Disable helm-rollout-restart if cicd=true