- v0.23.0
- v0.22.0
- v0.21.2
- v0.21.1
- v0.21.0
- v0.20.2
- v0.20.1
- v0.20.0
- v0.19.0
- v0.18.0
- v0.17.0
- v0.16.0
- v0.15.0
- v0.14.0
- v0.13.0
- v0.12.0
- v0.11.0
- v0.10.0
- v0.9.0
- v0.8.0
- v0.7.0
- v0.6.0
- v0.5.0
- Update the Velero plugin for AWS to v1.3.1
- Updated ingress-nginx helm chart to v4.1.3 and ingress-nginx controller image to v1.2.1
- Breaking changes:
    - deprecated `http2_recv_timeout` in favor of `client_header_timeout` (client-header-timeout)
    - deprecated `http2_max_field_size` (http2-max-field-size) and `http2_max_header_size` (http2-max-header-size) in favor of `large_client_header_buffers` (large-client-header-buffers)
    - deprecated `http2_idle_timeout` and `http2_max_requests` (http2-max-requests) in favor of `keepalive_timeout` (upstream-keepalive-timeout?) and `keepalive_requests` (upstream-keepalive-requests?) respectively
    - added an option to jail/chroot the nginx process inside the controller container
    - implemented an object deep inspector that walks through the whole spec, checking for possible attempts to escape configs
- Updated the prometheus-alerts chart alerts and rules
- Bump falco-exporter chart to v0.8.0.
- Users are no longer forced to use the proxy for connecting to alertmanager but can use port-forward as well.
- The OpenSearch security config will now be managed completely by securityadmin
- Patched Falco rules and added the rules `Change thread namespace` and `System procs network activity`.
- Set the user-alertmanager default receiver to `null`
- Increased limits for thanos receiveDistributor
- `prometheus-blackbox-exporter`'s internal thanos ServiceMonitor changed name to avoid name collisions.
- dex `topologySpreadConstraints` matchLabel was changed from `app: dex` to `app.kubernetes.io/name: dex` to increase stability of replica placements.
- Fixed issue where user admin groups weren't added to the user alertmanager rolebinding
- Fixed links in welcome dashboard
- Add option to encrypt off-site buckets replicated with rclone sync
- Added metrics for field mappings and an alert that will throw an error if the fields get close to the max limit.
- Add support for automatic reloading of the security config for OpenSearch
- Warning: When this runs the security plugin settings will be reset. All users, roles, and role mappings created via the API will be removed, so create a backup or be prepared to recreate the resources.
- The securityadmin can be disabled to protect manually created resources, but that will prevent the OpenSearch cluster from initializing the security plugin when the cluster is forming.
- Add missing roles for alerting in OpenSearch
- Made the clean script more verbose about which cluster will be cleaned.
- Added possibility to use either encrypted or unencrypted kubeconfigs. The scripts will automatically detect if the file is encrypted or not.
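The automatic detection mentioned above could be sketched roughly like this (a minimal illustration, not the project's actual script; it relies on the fact that sops-encrypted YAML files carry a top-level `sops:` metadata key):

```shell
#!/bin/sh
# Illustrative sketch only: decide whether a kubeconfig is sops-encrypted
# by looking for the top-level "sops:" metadata key that sops adds.
is_encrypted() {
  grep -q '^sops:' "$1"
}

f=$(mktemp)
printf 'sops:\n    kms: []\n' > "$f"
if is_encrypted "$f"; then
  echo "encrypted"   # prints "encrypted" for this sample file
else
  echo "plain"
fi
rm -f "$f"
```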
- Removed `wcReader` mentions from all config files
- Set S3 region in OpenSearch config
- Bump kubectl version to v1.22.6
- Patched Falco rules for `write_etc_common`, `Launch Package Management Process in Container`, `falco_privileged_images` & `falco_sensitive_mount_containers`. Will be removed if the upstream Falco chart accepts these.
- Improved error handling for applying manifests in the wc deploy script
- `kube-prometheus-stack-alertmanager` is configured to have 2 replicas to increase stability and make it highly available.
- Add pattern `security-auditlog-*` to default retention for Curator
- Fixed issue where users couldn't do `POST` or `DELETE` requests to alertmanager via the service proxy
- Fixed deploy script with the correct path to the `extra-user-view` manifest.
- Fixed issue where `keys` in config that had `'.'` in their name were being moved from `sc/wc` to `common` configs.
- Fixed broken index per namespace feature for logging. The version of the `elasticsearch_dynamic` plugin in Fluentd no longer supports OpenSearch; the OpenSearch output plugin is now used for the feature thanks to the usage of placeholders.
- Fixed conflicting type `ts` in opensearch, where multiple services logged `ts` as different types.
- Fixed conflicting type `@timestamp`, which should always be `date` in opensearch.
- Fluentd no longer tails its own container log. This fixes the issue where Fluentd failed to push to OpenSearch and started filling up its logs with `\`, recursively logging its own errors to OpenSearch, which kept failing, each failure adding more `\`.
- Split the grafana-ops configmap list into separate configmaps, as the size of the resulting resource in some instances caused errors in helm
- PrometheusNotConnectedToAlertmanagers alert will be sent to `null` if Alertmanager is disabled in wc
- Removed undefined macro preventing falco rules from being compiled
- Add missing default config option for prometheus replicas
- Added support for Elastx
- Added support for UpCloud
- Made thanos storegateway persistence size configurable
- New 'Welcoming' OpenSearch dashboard / home page.
- New 'Welcoming' Grafana dashboard / home page.
- Add allowlisting for kubeapi-metrics (wc) and thanos-receiver (sc) endpoints
- Add support for running prometheus in HA mode
- Add option for deduplication/vertical compaction with thanos-compactor
- Removed disabled releases from helmfile
- Fixed broken index per namespace feature for logging. The version of the `elasticsearch_dynamic` plugin in Fluentd no longer supports OpenSearch; the OpenSearch output plugin is now used for the feature thanks to the usage of placeholders.
- Fixed conflicting type `ts` in opensearch, where multiple services logged `ts` as different types.
- Fixed conflicting type `@timestamp`, which should always be `date` in opensearch.
- Fluentd no longer tails its own container log. This fixes the issue where Fluentd failed to push to OpenSearch and started filling up its logs with `\`, recursively logging its own errors to OpenSearch, which kept failing, each failure adding more `\`.
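For illustration, the placeholder-based approach can look like the following output sketch (match pattern, index prefix, and buffer path are assumptions, not taken from this changelog; placeholders such as `${$.kubernetes.namespace_name}` require a matching chunk key in the buffer section):

```
<match kubernetes.**>
  @type opensearch
  # One index per namespace via a placeholder instead of elasticsearch_dynamic.
  index_name kubernetes-${$.kubernetes.namespace_name}
  <buffer tag, $.kubernetes.namespace_name>
    @type file
    path /var/log/fluentd-buffer
  </buffer>
</match>
```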
- Improved error handling for applying manifests in wc deploy script
- Fixed deploy script with the correct path to the `extra-user-view` manifest.
- Fixed issue where users couldn't do `POST` or `DELETE` requests to alertmanager via the service proxy.
- Added the repo "quay.io/jetstack/cert-manager-acmesolver" to the allowrepo safeguard by default.
- Backup operator namespaces can, for example, be added as velero parameters to be able to back them up. 'alertmanager' is added as a default in the workload cluster.
- Set 'continue_if_exception' in curator so it does not fail when a snapshot is in progress while it is trying to remove some indices.
- Vulnerability scanner reports ttl is now set to 720 hours, i.e., 30 days.
- Reports will now be deleted every 30 days by the operator and newer reports are generated.
- Older reports that are not created with ttl parameter set, should be deleted manually.
- Users are now allowed to get ClusterIssuers.
- Changed the container names of the vulnerability exporter to more meaningful ones.
- Added persistence to alertmanager.
- Made the CISO grafana dashboards visible to the end-users
- `indices.query.bool.max_clause_count` is now configurable.
- Patched Falco rules for `k8s_containers`, `postgres_running_wal_e` & `user_known_contact_k8s_api_server_activities`. Will be removed if the upstream Falco chart accepts these.
- Curator can now delete all but system indices.
- Added pre-defined alerting roles for opensearch to the available user permissions.
- PrometheusBlackboxExporter targets with customized probes added for internal service health-checking.
- The dex chart has been upgraded from version 0.6.3 to 0.8.2. Dex has been changed to have two replicas to increase the stability of OpenSearch's authentication. A dex ServiceMonitor has also been enabled.
- Self service: User admins are now allowed to add new users to the clusterrole user-view. A Clusterrole and Clusterrolebinding have been added accordingly.
- Enabled falcosidekick alertmanager if the user alertmanager is also enabled
- Fluentd was upgraded from 2.4.0 to 5.0.15 and fluentd-elasticsearch from 10.2.1 to 13.3.0. Elastisys also made their own fluentd-elasticsearch container image using fluent-plugin-opensearch 1.0.4 to work with opensearch.
- Changed the grafana image tag from 8.2.7 to 8.4.7 in both user-grafana and kube-prometheus-stack, as the latter has fewer vulnerabilities.
- Changed the harbor chartmuseum image tag from 2.2.1 to 2.4.2 as the latter has fewer vulnerabilities.
- Exposed fluentd-elasticsearch buffer settings in the wc-config.yaml
- Increased fluentd alerts' `for` duration to 30m, which should decrease the number of false-positive alerts.
- Use the `master` tag for the grafana-label-enforcer as the previously used SHA no longer exists.
- The opensearch SLM job now uses `/_cat/snapshots` to make it work better when there is a large number of snapshots available.
- predictlinear alerts
- Calico-accountant is now being scheduled on master nodes.
- It is now possible to set tolerations and affinity for the vulnerability-exporter
- SC log retention no longer fails silently after removing one day of logs.
- The possibility to add custom Falco rules for each environment
- New Grafana dashboard that shows how many timeseries there are in Prometheus.
- Added the alternative port for kubelogin (18000) to be an allowed redirect url by default.
- Increased fluentd alerts' `for` duration to 30m, which should decrease the number of false-positive alerts.
- Fixed issue where users couldn't do `POST` or `DELETE` requests to alertmanager via the service proxy
- Set 'continue_if_exception' in curator so it does not fail when a snapshot is in progress while it is trying to remove some indices.
- Added persistence to alertmanager.
- Made the CISO grafana dashboards visible to the end-users
- Ingress-nginx has been upgraded from 0.49.3 to 1.1.1.
- In ingress-nginx >= 1.0.0, an ingressClass object is required. By default, an ingressClass called `nginx` will be available in the cluster. Ingress-nginx will still handle ingresses that do not specify an `ingressClassName`, however users are strongly encouraged to update their Ingress objects and specify `spec.ingressClassName: nginx`.
- The entire changelog can be found here.
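As a minimal illustration (hypothetical names throughout), an Ingress object updated for ingress-nginx >= 1.0.0 names its class explicitly:

```yaml
# Hypothetical example: an Ingress specifying the default class installed
# by ingress-nginx >= 1.0.0.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app            # assumed name
  namespace: my-namespace # assumed namespace
spec:
  ingressClassName: nginx # the class installed by default
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```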
- Added a new config `global.containerRuntime` (default set to `containerd`).
    - Supported runtimes are `containerd` and `docker`
- The option to enable and configure kured to keep nodes up to date with security patches. Kured is disabled by default.
- Storageclass installation is no longer part of the bootstrap step. If you have the nfs-provisioner or the local-pv-provisioner installed, they will be left untouched when upgrading. You are responsible for managing them and/or removing them, and any unused storageClasses.
- InfluxDB is deprecated and Thanos is now enabled by default.
- With the removal of InfluxDB, the backups and buckets can eventually be removed.
- Running without object storage is no longer supported since it is required for Thanos.
- The dev flavor is now updated to use s3 by default
- Upgraded nginx-ingress helm chart to `v4.0.17`, which upgrades nginx-ingress to `v1.1.1`. When upgrading, an ingressClass object called `nginx` will be installed, and this class has been set as the default class in Kubernetes. Ingress-nginx has been configured to still handle existing ingress objects that do not specify any `ingressClassName`. Read more on the ingressClassName changes here.
- Upgraded starboard-operator helm chart to `v0.9.1`, upgrading starboard-operator to `v0.14.1`
- Exposed sc-log-retention's resource requests.
- Persist Dex state in Kubernetes.
- Upgrade gatekeeper helm chart to `v3.7.0`, which also upgrades gatekeeper to `v3.7.0`.
- Updated opensearch helm chart to version `1.7.1`, which upgrades opensearch to `v1.2.4`.
- Renamed release `blackbox` to `prometheus-blackbox-exporter`.
- Added new panel to backup dashboard to reflect partial, failed and successful velero backups
- Alertmanager group-by parameters were removed and replaced by the special value `...`. See https://github.com/prometheus/alertmanager/blob/ec83f71/docs/configuration.md#route for more information.
- Exposed opensearch-slm-job max request seconds for curl.
- Made opensearch-slm-job more verbose when using curl.
- Update kubeapi-metrics ingress api version to `networking.k8s.io/v1`.
- Fluentd can now properly handle and write orphaned documents to Opensearch when using the index per namespace feature. The orphaned documents will be written to `.orphaned-...` indices, which a user does not have access to read from.
- Add `ingressClassName` in ingresses where that configuration option is available.
- Upgrade velero helm chart to `v2.27.3`, which also upgrades velero to `v1.7.1`.
- Upgrade prometheus-elasticsearch-exporter helm chart to v4.11.0 and prometheus-elasticsearch-exporter itself to v1.3.0
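For reference, a route using the special `...` value looks like this (a minimal sketch of an Alertmanager config, not this project's actual one; the receiver name is an assumption):

```yaml
# Minimal illustrative Alertmanager route: the special value '...' makes
# Alertmanager group by all labels, so every unique label set becomes its
# own alert group.
route:
  receiver: default-receiver # assumed receiver name
  group_by: ['...']
```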
- Exposed options for starboard-operator to control the number of jobs it generates and to allow for it to be disabled.
- Added the new OPA policy - disallowed the latest image tag.
- Moved `user.alertmanager.group_by` to `prometheus.alertmanagerSpec.groupBy` in `sc-config.yaml`
- Moved `user.grafana.userGroups` to `user.grafana.oidc.userGroups` in `sc-config.yaml`
- kubeconfig.bash has been edited to work with the new 'secret' structure.
- Increased memory limit for thanos receiveDistributor and PVC size for thanos receiver
- Increased cpu requests and limits for kube-state-metrics
- Thanos is now enabled by default.
- Disabled default kube-prometheus-stack rules and copied them over to prometheus alerts
- Modified rules to allow for different labels for alert and record rules, and to pass the cluster label through aggregations
- Unused rules have been dropped
- Grouped thanos charts
- Configured thanos-ruler so it is enabled by default, runs as an HA pair without persistence, and dynamically reloads its rules on changes
- Changed dashboards previously defaulting to wc-reader to default to "Thanos All"
- Changed service cluster prometheus to use an external label instead of service monitor relabeling and write relabeling
- Increased resources for thanos receiveDistributor, compactor and storegateway components
- Exposed harbor components replicas in config
- Opensearch was unable to parse `"source":{}` when gatekeeper starts up. The log including `"source":{}` from gatekeeper is excluded for now.
- Fixed some grafana dashboards so they can retrieve the cluster label properly
- Fixed opensearch naming on falco and gatekeeper dashboard
- Fixed the missing tag on the grafana-label-enforcer.
- Fixed the gatekeeper templates by adding `legacySchema: true` and correcting the apiVersion.
- Added Prometheus alerts for the 'backup status' and 'daily checks' dashboards. Also added 's3BucketPercentLimit' and 's3BucketSizeQuotaGB' parameters to set the limits that the included s3 rules will alert on.
- RBAC for the admin user so that they can now list pods cluster-wide and run `kubectl top`.
- Added containerd support for fluentd.
- Added the option to disable predict linear alerts
- Fluentd alerts for sc #812
- Fluentd grafana dashboard #812
- `kured` - Kubernetes Reboot Daemon. Added helm chart version `2.11.2`, which defaults to `v1.9.1` of the application.
- Added dummy thanos-ruler instance to make prometheus-operator collect rules to be evaluated by thanos
- Added alerts when no metrics are received from sc and wc cluster.
- Removed disabled helm releases from the application helmfile
- The no longer needed rolebinding and clusterrole `metrics` have been removed.
- Storageclass installation from bootstrap step.
- Removed helm charts for nfs-provisioner and local-pv-provisioner.
- Removed influxDB and dependent helm charts
- Check out the upgrade guide for a complete set of instructions needed to upgrade.
- This release introduces a new feature called "index per namespace". Enabling it makes fluentd log to indices in elasticsearch based on the namespace from which the container log originates.
- CK8S_FLAVOR is now mandatory on init
- This release migrates from Open Distro for Elasticsearch to OpenSearch.
- Updated Blackbox chart to v5.3.1, and blackbox app to v0.19.0
- HTTP probe: no_follow_redirects has been renamed to follow_redirects
- Added option to enable thanos as a metric storage solution. Thanos will in the future replace influxDB; we strongly encourage enabling thanos so that when influxdb is removed, metrics will already be present in thanos. Removing InfluxDB is not supported in this release.
- Updated kubectl version from v1.19.8 to v1.20.7 #725
- Updated falco helm chart to version 1.16.0, this upgrades falco to version 0.30.0
- cert-manager 1.4.0 upgraded to 1.6.1
- Updated Open Distro for Elasticsearch to 1.13.3 to mitigate CVE-2021-44228 & CVE-2021-45046
- kube-prometheus-stack to v19.2.2 #685
- upgrade prometheus-operator to v0.50.0
- sync dashboards, rules and subcharts
- add ability to specify existing secret with additional alertmanager configs
- add support for prometheus TLS and basic auth on HTTP endpoints
- allows passing hashed credentials to the helm chart
- add option to override the allowUiUpdates for grafana dashboards
- prometheus to v2.28.1 full changelog
- grafana to v8.2.7 full changelog
- security fixes: CVE-2021-43798, CVE-2021-41174, stylesheet injection vulnerability, short URL vulnerability, CVE-2021-36222, CVE-2021-39226
- accessControl: Document new permissions restricting data source access. #39091
- admin: Prevent user from deleting user's current/active organization. #38056
- oauth: Make generic teams URL and JMES path configurable. #37233,
- kube-state-metrics to v2.2.0 full changelog
- node exporter to v1.2.2 full changelog
- updated metrics-server helm chart to version 0.5.2, this upgrades metrics-server image to 3.7.0 #702
- Updated Dex chart to v0.6.3, and Dex itself to v2.30.0
- Updated Blackbox chart to v5.3.1, and blackbox app to v0.19.0
- HTTP probe: no_follow_redirects has been renamed to follow_redirects
- The falco grafana dashboard now shows the misbehaving pod and instance for traceability
- Reworked configuration handling to use a common config in addition to the service and workload configs. This is handled in the same way as the sc and wc configs, meaning it is split between a default and an override config. Running `init` will update this configuration structure, update and regenerate any missing configs, as well as merge common options from the sc and wc overrides into the common override.
- Updated fluentd config to adhere better to the upstream configuration
- Fluentd now logs reasons for 400 errors from elasticsearch
- Enabled the default rules from kube-prometheus-stack and deleted them from the `prometheus-alerts` chart #681
- Enabled extra api server metrics #681
- Increased resource requests and limits for Starboard-operator in the common config #681
- Updated the common config as "prometheusBlackboxExporter" will now be required in both the sc and wc clusters
- Moved the elasticsearch alerts from the prometheus-elasticsearch-exporter chart to the prometheus-alerts chart #685
- Changed the User Alertmanager namespace (alertmanager) from a user namespace to an operator namespace
- Moved the User Alertmanager RBAC to the `user-alertmanager` chart
- Made CK8S_FLAVOR mandatory on init
- Exposed harbor's backup retention period as config.
- Migrated from OpenDistro for Elasticsearch to OpenSearch.
- This will be a breaking change as some APIs, specifically related to plugins and security, have been renamed in OpenSearch. The impact will be minimal as the function of the APIs will stay mostly the same, and the configuration will basically work as is, although renamed. The user experience will change slightly as this will replace Kibana with OpenSearch Dashboards, however the functionality remains the same.
- OpenSearch is compatible with existing tools supporting ODFE using a compatibility setting, however this will only last for version 1.x. Newer versions of official Elasticsearch tools and libraries already contain checks against unofficial Elasticsearch and will therefore not work with either ODFE or OpenSearch. Older versions exist that will still work, and the OpenSearch project is working on providing its own set of tools and libraries.
- This will cause downtime for Elasticsearch and Kibana during the migration, and OpenSearch and OpenSearch Dashboards will replace them. Data will be kept by the migration steps, except security settings; any manually created users or roles must be handled manually.
- Resource requests and limits for falco-exporter, kubeStateMetrics and prometheusNodeExporter #739
- increased resource requests and limits for falco-exporter, kubeStateMetrics and prometheusNodeExporter #739
- increased the influxDB pvc size #739
- Exposed velero's backup time to live (TTL) for both sc and wc.
- Disabled the internal database for InfluxDB
- OPA policies are now enforced (deny) for the prod flavor.
- Added option to disable influxDB
- Moved prometheus-blackbox-exporter helm chart to the upstream charts folder
- Improved Grafana dashboards by keeping more metrics from the kubeApiServer #681
- Fixed rendering of new prometheus alert rule to allow it to be admitted by the operator
- Fixed rendering of s3-exporter to be idempotent
- Fixed bug where init'ing a config path a second time without the `CK8S_FLAVOR` variable set would fail.
- Fixed relabeling for rook-ceph and cert servicemonitor.
- Fluentd will now properly detect changes in container logs.
- The `init` script will now properly generate secrets for new configuration options.
- Fixed an issue preventing OpenSearch from running without snapshots enabled
- Fixed a permission issue preventing the OpenSearch init container from running sysctl
- Added fluentd metrics
- Enabled automatic compaction (cleanup) of pos_files for fluentd
- Added and enabled by default an option for Grafana Viewers to temporarily edit dashboards and panels without saving.
- New Prometheus rules have been added to warn before memory and disk (PVC and host disk) capacity overloads
- Added the possibility to whitelist IP addresses to the loadbalancer service
- Added pwgen and htpasswd as requirements
- Added the blackbox installation in the wc cluster based on ADR, to monitor the uptime of internal services in wc as well.
- Added option to enable index per namespace feature in fluentd and elasticsearch
- Added optional off-site backup replication between two providers or regions using rclone sync
- Added option to enable thanos as a metric storage solution
- Added node exporter full dashboard
- Removed disabled helm charts. All have been disabled for at least one release, which means no migration steps are needed as long as the updates have been done one version at a time.
    - nfs-client-provisioner
    - gatekeeper-operator
    - common-psp-rbac
    - workload-cluster-psp-rbac
- Removed the "prometheusBlackboxExporter" from the sc config and updated the common config as it will now be required in both the sc and wc clusters
- Removed curator alerts
- Removed `blackbox` helm chart
- Check out the upgrade guide for a complete set of instructions needed to upgrade.
- ingress-nginx chart was upgraded from 2.10.0 to 3.39.0 and ingress-nginx-controller was upgraded from v0.28.0 to v0.49.3. During the upgrade the services may be unavailable for a short period of time.
With this version:
- set allow-snippet-annotations: "false" to mitigate CVE-2021-25742
- only ValidatingWebhookConfiguration AdmissionReviewVersions v1 is supported
- the nginx-ingress-controller repository was deprecated
- the access-log-path setting is deprecated
- server-tokens, ssl-session-tickets, use-gzip, upstream-keepalive-requests, upstream-keepalive-connections have new defaults
- TLSv1.3 is enabled by default
- Add linux node selector as default
- Update versions of components for base image, including nginx-http-auth-digest, ngx_http_substitutions_filter_module, nginx-opentracing, opentracing-cpp, ModSecurity-nginx, yaml-cpp, msgpack-c, lua-nginx-module, stream-lua-nginx-module, lua-upstream-nginx-module, luajit2, dd-opentracing-cpp, ngx_http_geoip2_module, nginx_ajp_module, lua-resty-string, lua-resty-balancer, lua-resty-core, lua-cjson, lua-resty-cookie, lua-resty-lrucache, lua-resty-dns, lua-resty-http, lua-resty-memcached, lua-resty-ipmatcher
- Updated influxdb chart 4.8.12 to 4.8.15
- Updated starboard-operator chart from v0.5.1 (app v0.10.1) to v0.7.0 (app v0.12.0), this introduces a PSP RBAC as a subchart since the Trivy scanners were unable to run.
- ingress-nginx increased the value for client-body-buffer-size from 16K to 256k
- Lowered default falco resource requests
- The timeout of the prometheus-elasticsearch-exporter is set to be 5s lower than that of the service monitor
- fluentd replaced the time_key value from time to requestReceivedTimestamp for the kube-audit log pattern #571
- group_by in alertmanager changed to be configurable
- Reworked harbor restore script into a k8s job and updated the documentation.
- Increased slm timeout from 30 to 45 min.
- charts/grafana-ops #587:
    - create one ConfigMap for each dashboard
    - add different values for "labelKey" so we can separate the user and ops dashboards in Grafana
    - the chart template automatically loads the dashboards enabled in the values.yaml file
- grafana-user.yaml.gotmpl:
    - grafana-user.yaml.gotmpl loads only the ConfigMaps that have the value "1" from "labelKey" #587
    - added prometheus-sc as a datasource to user-grafana
- enabled podSecurityPolicy in falco, fluentd, cert-manager, prometheus-elasticsearch-exporter helm charts
- ingress-nginx chart was upgraded from 2.10.0 to 3.39.0 #640; ingress-nginx-controller was upgraded from v0.28.0 to v0.49.3 and nginx was upgraded to 1.19.
Breaking Changes: * Kubernetes v1.16 or higher is required. Only ValidatingWebhookConfiguration AdmissionReviewVersions v1 is supported. * Following the Ingress extensions/v1beta1 deprecation, please use networking.k8s.io/v1beta1 or networking.k8s.io/v1 (Kubernetes v1.19 or higher) for new Ingress definitions * The repository https://quay.io/repository/kubernetes-ingress-controller/nginx-ingress-controller is deprecated and read-only
Deprecations: * Setting access-log-path is deprecated and will be removed in 0.35.0. Please use http-access-log-path and stream-access-log-path
New defaults: * server-tokens is disabled * ssl-session-tickets is disabled * use-gzip is disabled * upstream-keepalive-requests is now 10000 * upstream-keepalive-connections is now 320 * allow-snippet-annotations is set to "false"
New Features: * TLSv1.3 is enabled by default * OCSP stapling * New PathType and IngressClass fields * New setting to configure different access logs for http and stream sections: http-access-log-path and stream-access-log-path options in configMap * New configmap option enable-real-ip to enable realip_module * Add linux node selector as default * Add hostname value to override pod's hostname * Update versions of components for base image * Change enable-snippet to allow-snippet-annotation * For the full list of New Features check the Full Changelog
Full Changelog: https://github.com/kubernetes/ingress-nginx/blob/main/Changelog.md
- enable hostNetwork and set the dnsPolicy to ClusterFirstWithHostNet only if hostPort is enabled #535
Note: The upgrade will fail while disabling the hostNetwork when a LoadBalancer type service is used; this is due to removing some privileges from the PSP. See the migration steps for more details.
- Prometheus alert and servicemonitor were separated
- Default user alertmanager namespace changed from monitoring to alertmanager.
- Reworked configuration handling to keep a read-only default with specifics for the environment and a separate editable override config for main configuration.
- Integrated secrets generation script into `ck8s init`, which will by default generate passwords and hashes when creating a new `secrets.yaml`, and can be forced to generate new ones with the flag `--generate-new-secrets`.
- Increased metrics-server resource limits.
- Increased cert-manager's resource limits.
- Increased harbor resource requests and limits.
- Fixed influxdb-du-monitor to only select influxdb and not backup pods
- Added dex/dex as a need for opendistro-es to make kibana available out of the box at cluster initiation if dex is enabled
- Fixed disabling retention cronjob for influxdb by allowing to create required resources
- Fixed harbor backup job run as non-root
- Fixed the "Uptime and status", "ElasticSearch" and "Kubernetes cluster status" grafana dashboards
- Fixed warning from velero that the default backup location "default" was missing.
- Fixed dex TLS handshake failure
- Added the ability to configure elasticsearch ingress body size from sc config.
- Added RBAC to allow users to view PVs.
- Added group support for user RBAC.
- Added option `elasticsearch.snapshot.retentionActiveDeadlineSeconds` to control the deadline for the SLM job.
- Added configuration properties for falco-exporter.
- Added a calico-felix-metrics helm chart to enable calico targets discovery and scraping, and a calico felix grafana dashboard to visualize the metrics
- Added JumpCloud as an IDP using dex.
- Setting chunk limit size and queue limit size for fluentd from the sc-config file
- Added options to configure the liveness and readiness probe settings for the fluentd forwarder.
- Resource requests for apps #551
NOTE: This will cause disruptions/downtime in the cluster as many of the pods will restart to apply the new resource limits/requests. Check your cluster's available resources before applying the new requests. The pods will remain in a pending state if not enough resources are available.
- Increased Velero request limits.
- Velero restic backup is now the default
- Velero backs up everything in user namespaces; opt out by using the label `compliantkubernetes.io/nobackup: velero`
- Added configuration for the Velero daily backup schedule in the config files
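As an illustration (pod name, namespace, and image are assumptions), the opt-out label goes in the resource's metadata:

```yaml
# Hypothetical example: a Pod excluded from Velero's backups via the
# opt-out label described above.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-worker      # assumed name
  namespace: my-namespace   # assumed user namespace
  labels:
    compliantkubernetes.io/nobackup: velero
spec:
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sleep", "infinity"]
```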
- Added a cert-manager networkpolicy, the possibility to configure a custom public repository for the http01 challenge image, and the possibility to add an OPA exception for the cert-manager-acmesolver image #593
NOTE: Possible breaking change if OPA policies are enabled
- Added prometheus probes permission for users
- Added the ability to set and choose subdomain of your service endpoints.
- Added backup function for configurations and secrets during `ck8s init`.
- Issuers are on by default for wc.
- Removed unnecessary PSPs and RBAC files for wc and sc.
- Check out the upgrade guide for a complete set of instructions needed to upgrade.
- Changed from the deprecated nfs provisioner to the new one. Migration is automatic (no manual intervention)
- The sc-logs-retention cronjob now runs without error even if no backups were found for automatic removal
- Harbor Swift authentication configuration options have moved from `citycloud` to `harbor.persistence.swift`.
- The dry-run and apply commands now have the option to check against the state of the cluster by using the flags "--sync" and "--kubectl".
- The dex chart is upgraded from stable/dex to dex/dex (v0.3.3). Dex is upgraded to v2.18.1
- cert-manager upgrade from 1.1.0 to 1.4.0.
- Increased slm cpu request slightly
- The `clusterDns` config variable now matches Kubespray defaults. Using the wrong value causes node-local-dns to not be used.
- Blackbox-exporter now ignores checking the harbor endpoint if harbor is disabled.
- Kube-prometheus-stack is now being upgraded from 12.8.0 to 16.6.1 to fix dashboard errors. Grafana 8.0.1 and Prometheus 2.27.1.
- "serviceMonitor/" have been added to all prometheus targets in our tests to make them work
- The openid url port have been changed from 32000 to 5556 to match the current setup.
- sc-log-retention fixed to delete all logs within a 5 second loop.
- Fixed issue where curator would fail if postgres retention was enabled
- Option to set cluster admin groups
- Configuration option `dex.additionalStaticClients` in `secrets.yaml` can now be used to define additional static clients for Dex.
- ck8s providers command
- ck8s flavors command
- Added script to make it easier to generate secrets
- The configuration option `global.cloudProvider` is no longer needed.
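A sketch of what such a `dex.additionalStaticClients` entry could look like in `secrets.yaml` (the field names follow Dex's staticClients convention; the client id, name, secret, and redirect URI are all assumptions, so check the exact schema against the project's docs):

```yaml
# Illustrative only -- not necessarily this project's exact schema.
dex:
  additionalStaticClients:
    - id: my-app                  # assumed client id
      name: My App                # assumed display name
      secret: <generated-secret>  # keep in secrets.yaml, never in plain config
      redirectURIs:
        - https://my-app.example.com/oauth2/callback
```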
- Support for multiple connectors for dex and better support for OIDC groups.
- Check out the upgrade guide for a complete set of instructions needed to upgrade.
- The project now requires `helm-diff >= 3.1.2`. Remove the old one (via `rm -rf ~/.local/share/helm/plugins/helm-diff/`) before reinstalling dependencies.
- A new helm chart `starboard-operator`, which creates `vulnerabilityreports` with information about image vulnerabilities.
- Dashboard in Grafana showcasing image vulnerabilities.
- Added option to enable dex integration for ops grafana
- Added resource request/limits for ops grafana
- Added support for admin group for harbor
- Rook monitoring (ServiceMonitor and PrometheusRules) and dashboards.
- Changed the way connectors are provided to dex
- Default retention values for `other*` and `authlog*` have been changed to better fit needs.
- CK8S version validation accepts the version number if exactly at the release tag, otherwise the commit hash of the current commit. "any" can still be used to disable validation.
- The node-local-dns chart has been updated to match the upstream manifest. `force_tcp` has been removed to improve performance and the container image has been updated from 1.15.10 to 1.17.0.
- Fixed issue where you couldn't configure dex google connector to support groups
- Fixed issue where groups wouldn't be fetched for kubelogin
- Fixed issue where grafana would get stuck on upgrade
- Rook monitor for the alertmanagers is no longer hard-coded to true.
- Only install rbac for user alertmanager if it's enabled.
- Convert all values to integers for elasticsearch slm cronjob
- The script for generating a user kubeconfig is now `bin/ck8s kubeconfig user` (previously `bin/ck8s user-kubeconfig`).
- Harbor has been updated to v2.2.1.
- Use update strategy `Recreate` instead of `RollingUpdate` for Harbor components.
- When using harbor together with rook there is a potential bug that appears if the database pod is killed and restarted on a new node. This is fixed by upgrading the Harbor helm chart to version 1.6.1.
- The command `team-add` for adding new PGP fingerprints no longer crashes when validating some environment variables.
- Authlog is now indexed by Elasticsearch.
- Added a ClusterRoleBinding for using an OIDC-based cluster admin kubeconfig and a script for generating such a kubeconfig (see `bin/ck8s kubeconfig admin`).
- S3-exporter for collecting metrics about S3 buckets.
- Dashboard with common things to check daily, e.g. object storage usage, Elasticsearch snapshots and InfluxDB database sizes.
- Removed the functionality to automatically restore InfluxDB and Grafana when running `bin/ck8s apply`. The config values controlling this (`restore.*`) no longer have any effect and can be safely removed.
- Script to restore Harbor from backup
- Elasticsearch slm now deletes excess snapshots also when none of them are older than the maximum age
- The Service Cluster Prometheus now alerts based on Falco metrics. These alerts are sent to Alertmanager as usual so they now have the same flow as all other alerts. This is in addition to the "Falco specific alerting" through Falco sidekick.
- Elasticsearch slm now deletes indices in bulk
- Default to object storage disabled for the dev flavor.
- Removed namespace `gatekeeper` from bootstrap. The namespace can be safely removed from clusters running ck8s v0.13.0 or later.
- Elasticsearch SLM retention value conversion bug
- Fluentd logs stop being shipped to S3
- Increased default active deadline for the slm job from 5 to 10 minutes
- Updated the release documentation
- ClusterIssuers are used instead of Issuers. Administrators should be careful regarding the use of ClusterIssuers in workload clusters, since users will be able to use them and may cause rate limits.
- Check out the upgrade guide for a complete set of instructions needed to upgrade.
- NetworkPolicy dashboard in Grafana
- Added a new helm chart `calico-accountant`
- Clean-up scripts that can remove compliantkubernetes-apps from a cluster
- ClusterIssuers are used instead of Issuers
- Persistent volumes for prometheus are now optional (disabled by default)
- Updated velero chart and its CRDs to 2.15.0 (velero 1.5.0)
- Updated fluentd forwarder config to always include `s3_region`
- Updated gatekeeper to v3.3.0 and it now uses the official chart.
- Tweaked config default value for disabled option
- Removed label `certmanager.k8s.io/disable-validation` from the cert-manager namespace.
- Removed leftover default tolerations config for `ingress-nginx`.
- Removed unused config option `objectStorage.s3.regionAddress`.
- Fixed service cluster log retention using the wrong service account.
- Fixed upgrade of user Grafana.
- Bumped `helm` to `v3.5.2`.
- Bumped `kubectl` to `v1.19.8`.
- Bumped `helmfile` to `v0.138.4`.
- Fluentd prometheus metrics.
- Possibility to disable metrics server
- With the update of the opendistro helm chart you can now decide whether or not you want dedicated deployments for data and client/ingest nodes. By setting `elasticsearch.dataNode.dedicatedPods: false` and `elasticsearch.clientNode.dedicatedPods: false`, the master node statefulset will assume all roles.
- Ck8sdash has been deprecated and will be removed when upgrading. Some resources, like its namespace, will have to be manually removed.
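As a sketch, the options mentioned above might be set like this in the service cluster config (only the two `dedicatedPods` keys come from these notes; the surrounding layout is an assumption):

```yaml
# Hypothetical sc-config.yaml excerpt -- only the dedicatedPods keys
# are taken from the release notes above.
elasticsearch:
  dataNode:
    dedicatedPods: false    # master statefulset also assumes the data role
  clientNode:
    dedicatedPods: false    # master statefulset also assumes the client/ingest role
```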
- Check out the upgrade guide for a complete set of instructions needed to upgrade.
- Several new dashboards for velero, nginx, gatekeeper, uptime of services, and kubernetes status.
- Metric scraping for nginx, gatekeeper, and velero.
- Check for Harbor endpoint in the blackbox exporter.
- The falco dashboard has been updated with a new graph, multicluster support, and a link to kibana.
- Changed path that fluentd looks for kubernetes audit logs to include default path for kubespray.
- Opendistro helm chart updated to 1.12.0.
- Options to disable dedicated deployments for elasticsearch data and client/ingest nodes.
- By default, no storageclass is specified for elasticsearch, meaning it will use the cluster default.
- Updated elasticsearch config in dev-flavor. Now the deployment consists of a single master/data/client/ingest node.
- Fixed issue with adding annotation to bootstrap namespace chart
- Ck8sdash.
- Removed unused config `global.environmentName` and added `global.clusterName`; there is a script to migrate.
- To update the password for `user-alertmanager` you will have to re-install the chart.
- With the replacement of the helm chart `stable/elasticsearch-exporter` with `prometheus-community/prometheus-elasticsearch-exporter`, some steps must be executed manually to upgrade.
- Configuration regarding backups (in general) and Harbor storage has been changed and requires running init again. If `harbor.persistence.type` equals `s3` or `gcs` in your config you must update it to `objectStorage`.
- With the removal of `scripts/post-infra-common.sh` you will now, if enabled, have to manually set the address to the nfs server in `nfsProvisioner.server`.
- The cert-manager CustomResourceDefinitions have been upgraded to `v1`, see the API reference docs. It is advisable to update your resources to `v1` in the near future to maintain functionality.
- The cert-manager letsencrypt issuers have been updated to the `v1` API and the old `letsencrypt` releases must be removed before upgrading.
- To get some of the new default values for resource requests on Harbor pods you will first need to remove the resource requests that you have in your Harbor config and then run `ck8s init` to get the new values.
- Check out the upgrade guide for a complete set of instructions needed to upgrade.
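The Harbor storage change above could, as a sketch, look like this (the exact file layout is an assumption):

```yaml
# Before (hypothetical config excerpt):
harbor:
  persistence:
    type: s3              # or gcs
# After running init again:
harbor:
  persistence:
    type: objectStorage   # now follows the global object storage configuration
```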
- Dex configuration to accept groups from Okta as an OIDC provider
- Added record `cluster.name`, matching the cluster setting `global.clusterName`, to all logs sent to Elasticsearch.
- Role mapping from OIDC groups to roles in user grafana
- Configuration options regarding resources/tolerations for prometheus-elasticsearch-exporter
- Options to disable different types of backups.
- Harbor image storage can now be set to `filesystem` in order to use persistent volumes instead of object storage.
- Object storage is now optional. There is a new option to set object storage type to `none`. If you disable object storage, then you must also disable any feature that uses object storage (mostly all backups).
- Two new InfluxDB users to be used by prometheus for writing metrics to InfluxDB.
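For illustration, disabling object storage entirely might look like the following sketch; any key names beyond those mentioned above are assumptions:

```yaml
# Hypothetical config excerpt; remember to also disable any feature
# that depends on object storage (mostly all backups).
objectStorage:
  type: none
harbor:
  persistence:
    type: filesystem    # persistent volumes instead of object storage
```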
- Multicluster support for some dashboards in Grafana.
- More config options for falco sidekick (tolerations, resources, affinity, and nodeSelector)
- Option to configure serviceMonitor for elasticsearch exporter
- Option to add more redirect URIs for the `kubelogin` client in dex.
- Option to disable the creation of user namespaces (RBAC will still be created)
- The possibility to configure resources, affinity, tolerations, and nodeSelector for all Harbor pods.
- Updated user grafana chart to 6.1.11 and app version to 7.3.3
- The `stable/elasticsearch-exporter` helm chart has been replaced by `prometheus-community/prometheus-elasticsearch-exporter`
- OIDC group claims added to Harbor
- The options `s3` and `gcs` for `harbor.persistence.type` have been replaced with `objectStorage` and will then match the type set in the global object storage configuration.
- Bump kubectl to v1.18.13
- InfluxDB is now exposed via ingress.
- Prometheus in workload cluster now pushes metrics directly to InfluxDB.
- The prometheus release `wc-scraper` has been renamed to `wc-reader`. Now wc-reader only reads from the workload_cluster database in InfluxDB.
- InfluxDB helm chart upgraded to `4.8.11`.
- `kube-prometheus-stack` updated to version `12.8.0`.
- Bump prometheus to `2.23.0`.
- Added example config for Kibana group mapping from an OIDC provider
- Replaced the `kiwigrid/fluentd-elasticsearch` helm chart with `kokuwa/fluentd-elasticsearch`.
- Replaced the `stable/fluentd` helm chart with `bitnami/fluentd`.
- StorageClasses are now enabled/disabled in the `{wc,sc}-config.yaml` files.
- Mount path and IP/hostname are now configurable in `nfs-client-provisioner`.
- Upgraded `cert-manager` to `1.1.0`.
- Moved the `bootstrap/letsencrypt` helm chart to the apps step and renamed it to `issuers`. The issuers are now installed after cert-manager. You can now select which namespaces to install the letsencrypt issuers in.
- Helm upgraded to `v3.5.0`.
- InfluxDB upgraded to `v4.8.12`.
- Resource requests/limits have been updated for all Harbor pods.
- Wrong password being used for user-alertmanager.
- Retention setting for wc scraper always overriding the user config and being set to 10 days.
- Blackbox exporter checks kibana correctly
- Removed duplicate enforcement config for OPA from wc-config
- Invalid apiKey field being used in opsgenie config.
- The following helm releases have been deprecated and will be uninstalled when upgrading: `wc-scraper`, `prometheus-auth`, `wc-scraper-alerts`, and `fluentd-aggregator`.
- Helm chart `basic-auth-secret` has been removed.
- Unused config option `dnsPrefix`.
- Removed `scripts/post-infra-common.sh` file.
- The image scanner Clair in Harbor; image scanning is now done by the scanner Trivy.
Note: This upgrade will cause disruptions in some services, including the ingress controller! See the complete migration guide for all details.
You may get warnings about missing values for some fluentd options in the Workload cluster. This can be disregarded.
- Helm has been upgraded to v3.4.1. Please upgrade the local binary.
- The Helm repository `stable` has changed URL and has to be changed manually: `helm repo add "stable" "https://charts.helm.sh/stable" --force-update`
- The blackbox chart has a changed dependency URL and has to be updated manually: `cd helmfile/charts/blackbox && helm dependency update`
- Configuration changes requires running init again to get new default values.
- Run the following migration script to update the object storage configuration:
migration/v0.7.x-v0.8.x/migrate-object-storage.sh
- Some configuration options must be manually updated. See the complete migration guide for all details
- A few applications require additional steps. See the complete migration guide for all details
- Configurable persistence size in Harbor
- `any` can be used as configuration version to disable the version check
- Configuration options regarding pod placement and resources for cert-manager
- Possibility to configure pod placement and resources for velero
- Add `./bin/ck8s ops helm` to allow investigating issues between `helmfile` and `kubectl`.
- Allow nginx config options to be set in the ingress controller.
- Allow user-alertmanager to be deployed in a custom namespace and not only in `monitoring`.
- Support for GCS
- Backup retention for InfluxDB.
- Add Okta as option for OIDC provider
- The `stable/nginx-ingress` helm chart has been replaced by `ingress-nginx/ingress-nginx`
- Configuration for nginx has changed from `nginxIngress` to `ingressNginx`
- Harbor chart has been upgraded to version 1.5.1
- Helm has been upgraded to v3.4.1
- Grafana has been updated to a new chart repo and bumped to version 5.8.16
- Bump `kubectl` to 1.17.11
- `useRegionEndpoint` moved to fluentd conf.
- Dex application upgraded to v2.26.0
- Dex chart updated to v2.15.2
- The issuer for the user-alertmanager ingress is now taken from `global.issuer`.
- The `stable/prometheus-operator` helm chart has been replaced by `prometheus-community/kube-prometheus-stack`
- InfluxDB helm chart upgraded to `4.8.9`
- Rework of the InfluxDB configuration.
- The sized based retention for InfluxDB has been lowered in the dev flavor.
- Bump opendistro helm chart to `1.10.4`.
- The configuration for the opendistro helm chart has been reworked. Check the release notes for more information on replaced and removed options. One can now, for example, configure:
    - Role and subject key for OIDC
    - Tolerations, affinity, nodeSelector, and resources for most components
    - Additional opendistro security roles, ISM policies, and index templates
- OIDC is now enabled by default for elasticsearch and kibana when using the prod flavor
- The user fluentd configuration uses its own dedicated values for tolerations, affinity, and nodeSelector.
- The wc fluentd tolerations and nodeSelector configuration options are now only specified in the configuration file.
- Helmfile install error on `user-alertmanager` when `user.alertmanager.enabled: true`.
- The wrong job name being used for the alertmanager rules in wc when `user.alertmanager.enabled: true`.
- Commented lines in `secrets.yaml`, showing which `objectStorage` values need to be set, now appear when running `ck8s init`.
- Broken OIDC configuration for the ops Grafana instance has been removed.
- Unused alertmanager retention configuration from workload cluster
- Configuration for the certificate issuers has been changed and requires running the migration script.
- Remove `alerts.opsGenieHeartbeat.enable` and `alerts.opsGenieHeartbeat.enabled` from your config file `sc-config.yaml`.
- Run `ck8s init` again to update your config files with new options (after checking out v0.7.0).
- Update your `yq` binary to version `3.4.1`.
- Support for providing certificate issuer manifests to override default issuers.
- Configurable extra role mappings in Elasticsearch
- Added falco exporter to workload cluster
- Falco dashboard added to Grafana
- Config option to disable redirection when pushing to Harbor image storage.
- Configuration value `global.certType` has been replaced with `global.issuer` and `global.verifyTls`.
- Certificate issuer configuration has been changed from `letsencrypt` to `issuers.letsencrypt` and extended to support more issuers.
- Explicitly disabled multitenancy in Kibana.
- Cloud provider dependencies are removed from the templates. Instead, keys are added to the sc|wc-config.yaml by the init script, so there is no more "hidden" config. This requires a re-run of `ck8s init` or manually adding the missing keys.
- The version of `yq` has been updated to `3.4.1`.
- Kibana OIDC logout not redirecting correctly.
- Getting stuck at selecting tenant when logging in to Kibana.
- Typo in elasticsearch slm config for the schedule.
- Pushing images to Harbor on Safespring
- Typo in Alertmanager config regarding connection to Opsgenie heartbeat
- The old config format of bash scripts will no longer be supported. All will need to use the yaml config instead. The scripts in `migration/v0.5.x-0.6.x` can be used to migrate current config files.
- The new Opendistro for Elasticsearch version requires running the steps in the migration document.
- The `ENABLE_PSP` config option has been removed and it needs to be removed from `config.sh` before upgrading. See more extensive migration instructions here.
- `CK8S_ADDITIONAL_VALUES` is now deprecated and no longer supported. Everything needed can now be set as values in config files.
- All bash and env config files have been replaced with yaml config.
- Before upgrading, add the `CK8S_FLAVOR` variable to your `config.sh`. It can be set to either `dev` or `prod` and will impact the validation. For example, the `prod` flavor will require a value for `OPSGENIE_HEARTBEAT_NAME`. If you want to keep the current behavior (no new requirements for validation) set the value to `dev`.
- Add `logRetention.days` to your `sc-config.yaml` to specify the retention period in days for service cluster logs.
- To update apps to use the 'user' rather than 'customer' text you will need to destroy the customer-rbac chart first: `./ck8s ops helmfile wc -l app=customer-rbac destroy`. Also note that the existing config for clusters must be changed manually to migrate from customers to users.
- Add `letsencrypt.prod.email` and `letsencrypt.staging.email` to your `sc-config.yaml` and `wc-config.yaml` to specify email addresses to be used for letsencrypt production certificate issuers and staging ("fake") certificate issuers, respectively. Additionally, old issuers must be deleted before `ck8s bootstrap` is run; they can be deleted by running the migration script `remove-old-issuers.bash`.
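As a sketch, the new `sc-config.yaml` keys described above might be filled in like this (the concrete values are placeholders, not defaults from this project):

```yaml
# Hypothetical sc-config.yaml excerpt; values are placeholders.
logRetention:
  days: 30                    # retention period for service cluster logs
letsencrypt:
  prod:
    email: ops@example.com    # contact for production certificate issuer
  staging:
    email: ops@example.com    # contact for staging ("fake") issuer
```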
- Yq is upgraded to v3.3.2
- Customer kubeconfig is no longer created automatically, and has to be created using the `ck8s user-kubeconfig` command.
- Storageclasses are installed as part of `ck8s bootstrap` instead of together with other applications.
- `ck8s apply` now runs the steps `ck8s bootstrap` and `ck8s apps`.
- Namespaces and Issuers are installed as part of `ck8s bootstrap` instead of together with other applications.
- `blackbox-exporter` uses ingress for health checking the workload cluster kube api.
- Renamed the flavors: `default` -> `dev`, `ha` -> `prod`.
- Group alerts by `severity` label.
- Pipeline is now prevented from being run if only .md files have been modified.
- When pulling code from `ck8s-cluster`, the pipeline targets the branch/tag `[email protected]` instead of `cluster`.
- Upgraded `helmfile` to version v0.129.3.
- Removed `hook-failed` from the opendistro helm hook deletion policy.
- Upgraded ck8s-dash to v0.3.2
- By default, ES snapshots are now taken every 2 hours
- Continue on error in pipeline install apps steps.
- `jq` upgraded to `1.6`.
- References to customer changed to user
- Helm is upgraded to v3.3.4
- Opendistro for Elasticsearch is updated to v1.10.1
- Falco chart updated to v1.5.2
- Falco image tag updated to v0.26.1
- Added `ck8s validate (sc|wc)` to the cli. This command can be run to validate your config.
- Helm secrets as a requirement.
- InfluxDB metric retention size limit for each cluster is now configurable.
- InfluxDB now uses a persistent volume during the backup process.
- Added `ck8s bootstrap (sc|wc)` to the CLI to bootstrap clusters before installing applications.
- Added `ck8s apps (sc|wc)` to the CLI to install applications.
- CRDs are installed in the bootstrap stage.
- Kibana SSO with oidc and dex.
- Namespaces and Issuers are installed in bootstrap
- S3 region to influxdb backup credentials.
- Kube apiserver /healthz endpoint is exposed through nginx with basic auth.
- Added alerts for endpoints monitored by blackbox.
- The flavors now include separate defaults for `config.sh`.
- Set opsgenie priority based on alert severity.
- New ServiceMonitor scrapes data from cert-manager.
- Cronjob for service cluster log backup retention.
- InfluxDB volume size is now configurable
- In the pipeline, the helm release statuses are listed and k8s objects are printed in the apps install steps.
- Alertmanager now generates alerts when certificates are about to expire (< 20 days) or if they are invalid.
- Letsencrypt email addresses are now configurable.
- Fixed syntax in the InfluxDB config
- Elasticsearch eating up node diskspace most likely due to a bug in the performance_analyzer plugin.
- InfluxDB database retention variables are now used.
- InfluxDB backups are automatically removed after 7 days.
- `CK8S_ADDITIONAL_VALUES` is now deprecated and no longer supported. Everything needed can now be set as values in config files.
- `set-storage-class.sh` is removed. The storage class can now be set as a value directly in the config instead.
- Elasticsearch credentials from ck8sdash.
- Broken elasticsearch api key creation from ck8sdash.
- The `ENABLE_PSP` config value is removed. "Disabling" has to be done by creating a permissive policy instead.
First release of the application installer for Compliant Kubernetes.
The application installer will both install and configure applications forming the Compliant Kubernetes on top of existing Kubernetes clusters.