Helm Chart Production Guide #4703
👋 🤖 🤔 Hello, @hamza-m-masood! Did you make your changes in all the right places? These files were changed only in docs/. You might want to duplicate these changes in versioned_docs/version-8.6/.
You may have done this intentionally, but we wanted to point it out in case you didn't. You can read more about the versioning within our docs in our documentation guidelines. |
7f6254c
to
aad795a
Compare
@conceptualshark, I wanted to add a link to the OpenSearch IRSA guide, but I am having a problem with the docs. I opened an issue here: #4754 |
|
||
## Camunda Core Configuration | ||
|
||
At this point you are able to connect to your platform through HTTPS, correctly authenticate users using AWS Simple Active Directory, and have connected to external databases such as Amazon OpenSearch and Amazon PostgreSQL. The logical next step is to focus on the Camunda application-specific configurations suitable for a production environment. |
I'd suggest rephrasing it in a hypothetical sense, since afaik the user shouldn't be running `helm upgrade` every time, right?
... you would be able ...
Leaving my review so far.
I've only gotten halfway and will continue reading through the rest tomorrow.
Second part and overall impressions:
I'm not entirely sure about the focus on AWS.
You don't seem to leverage much that's specifically AWS-related, apart from possibly the Identity Provider topic. This could just be kept general, or as Jesse suggested, you could simply reference the page you already have.
Beyond that, the content could easily be generalized and doesn't necessarily need AWS-specific details. AWS services could be an example, especially with the guides we have.
The reason I bring this up is that it raises a question for customers: what is the value of this compared to the AWS-specific guides we already provide?
The AWS guide provides a clear, step-by-step approach that leads to a working solution. In contrast, this page, at least for me, seems to aim for a similar outcome but doesn't fully achieve it. The aim should actually be to generalise it, which runs counter to producing a working result, and instead guide the customer to a production-ready configuration. If it was easy to say X is required for production, it would be a default of the Helm chart, right?
What I mean is that this page might benefit from focusing more on Helm Chart production readiness configuration and maybe Camunda itself and would likely not result in a working solution as it's just snippets. It could dive deeper into the reasoning behind the recommendations, rather than assuming the audience will accept them without question. Explain why specific suggestions are made, what they achieve, and how they relate to the context of the Helm Chart. Essentially, ask: Why is this recommended? What purpose does it serve? Providing this level of detail would enhance the clarity and usefulness of the guide.
For example, for the Databases, is there anything else we recommend as production, or is there a default already in place, e.g., some environment variables that optimize the connection?
For example, you show how to configure index retention, but if I'm thinking production, then why is it recommended? What does it mean? How do I possibly have to adjust for my own use case, and why?
Similar thoughts about Scaling and performance.
The AWS guide could in the end mention this page as a sort of further education on getting production ready: take a working setup and apply your recommendations on top.
Ultimately, that's just my pov and feel free to adjust things the way you see as you're owning the topic.
The TL;DR is:
- Ask yourself why on each recommendation and put those into words.
- Don't assume customers will take it at face value.
- What's the big difference to the AWS guides and unique selling point?
- Keep it general (removing AWS focus if possible)
I'm on FTO next week, so worst case feel free to dismiss my review. As said, it's just my pov; do with the comments as you see fit.
I think the page has potential and a good trajectory!
maxUnavailable: 1 | ||
``` | ||
|
||
- Version Management: Stay on a stable Camunda and Kubernetes version. Follow Camunda’s release notes for security patches or critical updates. A list of our supported versions Camunda Helm Charts can be found on the [version matrix](https://helm.camunda.io/camunda-platform/version-matrix/) |
A list of our supported versions Camunda Helm Charts
Tricky, the matrix you link to lists every possible Camunda Helm chart and Camunda version there is and afaik the latest currently supported one is 8.3.
My suggestion would be to rephrase away from the supported, possibly just removing it but there may be better alternatives.
@theburi and @conceptualshark https://docs.camunda.io/docs/reference/supported-environments/ shouldn't this mention that our support cycle is 18 months from the time of the first minor release?
It still mentions on the page even Zeebe 1.3.x, 8.2.x which I think are outside of the 18 months, so from my pov not considered supported environments anymore.
Would it make sense to add the Helm chart version to the matrix?
So you have Design | Automate | Improve | Deploy
with Deploy being then the related major version of the Helm chart (e.g. 10.x.y)
maxUnavailable: 1 | ||
``` | ||
|
||
- Version Management: Stay on a stable Camunda and Kubernetes version. Follow Camunda’s release notes for security patches or critical updates. A list of our supported versions Camunda Helm Charts can be found on the [version matrix](https://helm.camunda.io/camunda-platform/version-matrix/) |
For the release notes, we could link, for example, to https://docs.camunda.io/docs/reference/release-notes/
``` | ||
|
||
- Version Management: Stay on a stable Camunda and Kubernetes version. Follow Camunda’s release notes for security patches or critical updates. A list of our supported versions Camunda Helm Charts can be found on the [version matrix](https://helm.camunda.io/camunda-platform/version-matrix/) | ||
- Secrets should be created prior to installing the Helm Chart so they can be referenced as existing secrets when installing the Helm Chart. In this scenario we are going to auto-generate the secrets. The following can be added to your `production-values.yaml`: |
Considering the docs will be part of 8.7, I thought it's now mandatory to create secrets beforehand?
We solved it like this atm in the AWS guide.
Reading this part here, I'm wondering what the production way is: creating them beforehand, or having them generated as part of the Helm chart?
If auto-generating is considered production-ready, why isn't it the default in the Helm chart anymore?
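For reference, a minimal sketch of the "create secrets beforehand" approach being discussed. The Secret name, key, and value below are hypothetical placeholders, and the values key used to reference an existing secret may differ per component, so it would need to be checked against the chart's `values.yaml`:

```yaml
# Hypothetical Secret created before `helm install`; all names and keys are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: camunda-credentials            # assumed name, not taken from the chart
  namespace: orchestration
type: Opaque
stringData:
  example-component-password: "<replace-with-generated-value>"   # assumed key name
```

Because such a Secret lives outside the Helm release, a later `helm upgrade` does not regenerate it, which is the concern raised further down about auto-generated secrets.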
- Make sure to not store any state or important, long term business data in the local file system of the container. A pod is transient, if the pod is restarted then the data will get wiped. It is better to create a volume and volume mount instead. Here is some example configuration for the core component to create persistent storage: | ||
|
||
```yaml | ||
core: | ||
extraVolumes: | ||
extraVolumes: | ||
- name: persistent-state | ||
emptyDir: {} | ||
extraVolumeMounts: | ||
- name: persistent-state | ||
mountPath: /mount | ||
``` |
Just for my own understanding: in what cases would I need this?
My assumption is that the Helm chart already creates a PVC for me automatically, where RocksDB lives.
If this is intended to be kept, maybe first explain that the Helm chart already creates a PVC for the Zeebe data (RocksDB); there's probably a link to somewhere else that explains that further.
Then go into detail on the additional use cases where a customer would need additional PVCs.
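To illustrate the concern: `emptyDir` in the snippet above is tied to the pod's lifetime, so its contents are lost once the pod is removed from the node. A minimal sketch of a PVC-backed variant, assuming a separately created claim with the hypothetical name `core-extra-data` and reusing only the `extraVolumes` and `extraVolumeMounts` keys from the snippet:

```yaml
core:
  extraVolumes:
    - name: persistent-state
      persistentVolumeClaim:
        claimName: core-extra-data   # hypothetical PVC, created separately
  extraVolumeMounts:
    - name: persistent-state
      mountPath: /mount
```

With more than one replica, a single shared claim would also need an access mode that allows multiple pods to mount it, so this is a starting point rather than a recommendation.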
<!-- This seems very specific to the application. I might remove this: --> | ||
<!-- - Mount Secrets as volumes, not environment variables --> | ||
|
||
- It is recommended to set a memory and resource quota for your namespace. Please refer to the [Kubernetes documenation](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/quota-memory-cpu-namespace/) to do so. |
Reading this, I'm asking myself why.
Why is it recommended, why should I do this, and what's the advantage of this over the pod resources that were discussed a bit earlier?
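One possible answer to the "why": pod resources bound a single container, while a namespace quota caps the sum across all pods in the namespace, so one misconfigured workload cannot starve its neighbours. A minimal sketch using the Kubernetes `ResourceQuota` API, with a hypothetical quota name and purely illustrative numbers:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: orchestration-quota   # hypothetical name
  namespace: orchestration
spec:
  hard:
    requests.cpu: "8"         # illustrative values only; size these to your workload
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
```

Note that once a compute quota like this is active, pods that do not declare the corresponding requests or limits may be rejected, so it interacts directly with the pod resource settings.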
- If you have a use case for enabling [Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) then it is recommended to do so. | ||
<!--Maybe link this to customer: https://github.com/ahmetb/kubernetes-network-policy-recipes--> | ||
- It is possible to have a pod security standard that is suitable to the security constraints you might have. This is possible through modifying the Pod Security Admission. Please refer to the [Kubernetes documentation](https://kubernetes.io/docs/concepts/security/pod-security-admission/) in order to do so. | ||
- By default, The Camunda Helm Chart is configured by default to use a read-only root file system for the pod. It is advisable to retain this default setting, and no modifications are required in your `production-values.yaml`. |
- By default, The Camunda Helm Chart is configured by default to use a read-only root file system for the pod. It is advisable to retain this default setting, and no modifications are required in your `production-values.yaml`. | |
- By default, the Camunda Helm Chart is configured to use a read-only root file system for the pod. It is advisable to retain this default setting, and no modifications are required in your `production-values.yaml`. |
- It is possible to modify either the `containerSecurityContext` or the `podSecurityContext`. For example, here is a configuration for the core component that can be added to your `production-values.yaml`: | ||
|
||
```yaml | ||
podSecurityContext: | ||
runAsNonRoot: true | ||
fsGroup: 1001 | ||
seccompProfile: | ||
type: RuntimeDefault | ||
|
||
containerSecurityContext: | ||
allowPrivilegeEscalation: false | ||
privileged: false | ||
readOnlyRootFilesystem: true | ||
runAsNonRoot: true | ||
runAsUser: 1001 | ||
seccompProfile: | ||
type: RuntimeDefault | ||
``` |
what is this related to?
Is this a subpoint of disabling privileged containers?
If not, why should I do this and what is it doing?
``` | ||
|
||
- It is recommended to pull images exclusively from a private registry, such as [Amazon ECR](https://aws.amazon.com/ecr/), rather than directly from Docker Hub. Doing so enhances control over the images, avoids rate limits, and improves performance and reliability. Additionally, you can configure your cluster to pull images only from trusted registries. Tools like [Open Policy Agent](https://blog.openpolicyagent.org/securing-the-kubernetes-api-with-open-policy-agent-ce93af0552c3#3c6e) can be used to enforce this restriction. | ||
- Open Policy Agent can also be used to [whitelist for ingress hostnames](https://www.openpolicyagent.org/docs/latest/kubernetes-tutorial/#4-define-a-policy-and-load-it-into-opa-via-kubernetes). |
Maybe make a single Open Policy Agent point that combines this and the previous trusted-sources one as sub-points?
- ... recommended to use private registry ...
- ... Open Policy Agent ...
  - trusted registry topic
  - whitelist ingress topic
For the whitelist ingress topic, again: why should it be done, and what's the advantage that makes us recommend it?
|
||
### Upgrade and Maintenance | ||
|
||
Make sure secrets are not auto-generated on upgrade. |
I can't link my comment from above, but that's exactly what I meant earlier, where the auto-generation of secrets was introduced. If the next steps already say that you should be careful with auto-generated secrets, then it's not a production recommendation.
|
||
Here are some points to keep in mind when considering reliability: | ||
|
||
- Check node affinity and tolerations. Please refer to the [Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) to modify the node affinity and tolerations. |
Again, the why, and maybe an example of what one should taint (node, zone, or something) and what we recommend as criteria.
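As a possible shape for such an example: a sketch of a pod-spec fragment that tolerates a dedicated-node taint and prefers spreading replicas across zones. The taint `dedicated=camunda:NoSchedule` and the pod label are hypothetical, and where the chart exposes these fields in `values.yaml` would need to be confirmed:

```yaml
# Pod-spec fragment; the taint key/value and the label are hypothetical.
tolerations:
  - key: dedicated
    operator: Equal
    value: camunda
    effect: NoSchedule
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: topology.kubernetes.io/zone   # prefer one replica per zone
          labelSelector:
            matchLabels:
              app.kubernetes.io/component: core      # assumed label
```

The toleration only allows scheduling onto tainted nodes; pinning pods to them would additionally need a node selector or node affinity.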
|
||
### Connect External Databases | ||
|
||
The next stage of the production setup is configuring databases. To make it easy for testing, the Camunda Helm Chart provides external, dependency Helm Charts for Databases such as Bitnami Elasticsearch Helm Chart and Bitnami PostgresQL Helm Chart. Within a production setting, these dependency charts should be disabled and production databases should be used instead. For example, instead of the Bitnami Elasticsearch dependency chart, we will use Amazon OpenSearch, and instead of the Bitnami PostgreSQL dependency chart, we will use Amazon Aurora PostgreSQL. |
The next stage of the production setup is configuring databases. To make it easy for testing, the Camunda Helm Chart provides external, dependency Helm Charts for Databases such as Bitnami Elasticsearch Helm Chart and Bitnami PostgresQL Helm Chart. Within a production setting, these dependency charts should be disabled and production databases should be used instead. For example, instead of the Bitnami Elasticsearch dependency chart, we will use Amazon OpenSearch, and instead of the Bitnami PostgreSQL dependency chart, we will use Amazon Aurora PostgreSQL. | |
The next stage of the production setup is configuring databases. To make it easy for testing, the Camunda Helm Chart provides external dependency Helm Charts for Databases such as the Bitnami Elasticsearch Helm Chart and the Bitnami PostgreSQL Helm Chart. Within a production setting, these dependency charts should be disabled, and production databases should be used instead. For example, instead of the Bitnami Elasticsearch dependency chart, we will use Amazon OpenSearch, and instead of the Bitnami PostgreSQL dependency chart, we will use Amazon Aurora PostgreSQL. |
Minor grammar corrections are needed: use the definite article for better flow and consistency within the paragraph.
enabled: false | ||
``` | ||
|
||
You should only enable the auto-mounting of a service account token when the application explicitly needs access to the Kubernetes API server or you have created a service account with the exact permissions required for the application and bound it to the pod. |
You should only enable the auto-mounting of a service account token when the application explicitly needs access to the Kubernetes API server or you have created a service account with the exact permissions required for the application and bound it to the pod. | |
You should only enable the auto-mounting of a service account token when the application explicitly needs access to the Kubernetes API server, or you have created a service account with the exact permissions required for the application and bound it to the pod. |
Missing comma
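For context, at the Kubernetes level this setting corresponds to `automountServiceAccountToken`, which can be set on a ServiceAccount or on a pod spec (the pod-level value takes precedence). A minimal sketch with a hypothetical ServiceAccount name:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: camunda-core   # hypothetical name
  namespace: orchestration
automountServiceAccountToken: false   # pods using this account get no API token mounted by default
```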
Here is the full `production-values.yaml` considering all the above topics. | ||
|
||
```yaml | ||
# make sure to the values.yaml in a multinamespace setting and configure console likewise |
This comment is unclear to me.
I think there's a missing verb after "make sure to". Also, multi-namespace should be hyphenated.
With regard to the document as a whole: definitely nice work; I could only really offer some grammar corrections. 👍
Before proceeding with the setup, ensure the following requirements are met: | ||
|
||
- **Kubernetes Cluster**: A functioning Kubernetes cluster with kubectl access. We are going to use an AWS EKS cluster. Have a look at the following guides: | ||
- [Deploy an EKS cluster with Terraform (advanced)](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-terraform/) |
[docsInstance.hrefDocsToDocs] Improper link format: 'Deploy an EKS cluster with Terraform (advanced)'. Please specify the file extension.
|
||
- **Kubernetes Cluster**: A functioning Kubernetes cluster with kubectl access. We are going to use an AWS EKS cluster. Have a look at the following guides: | ||
- [Deploy an EKS cluster with Terraform (advanced)](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-terraform/) | ||
- [Install Camunda 8 on an EKS cluster](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-helm/) |
[docsInstance.hrefDocsToDocs] Improper link format: 'Install Camunda 8 on an EKS cluster'. Please specify the file extension.
- **DNS Configuration**: Access to configure DNS for your domain to point to the Kubernetes cluster ingress. | ||
- **TLS Certificates**: Obtain valid X.509 certificates for your domain from a trusted Certificate Authority. | ||
- **External Dependencies**: Provision the following external dependencies: | ||
- **Amazon Aurora PostgreSQL**: For persistent data storage required for the Web Modeler component. Have a look at the [Set up the Aurora PostgreSQL module](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-terraform/#set-up-the-aurora-postgresql-module) guide. |
[docsInstance.hrefDocsToDocs] Improper link format: 'Set up the Aurora PostgreSQL module'. Please specify the file extension.
- **TLS Certificates**: Obtain valid X.509 certificates for your domain from a trusted Certificate Authority. | ||
- **External Dependencies**: Provision the following external dependencies: | ||
- **Amazon Aurora PostgreSQL**: For persistent data storage required for the Web Modeler component. Have a look at the [Set up the Aurora PostgreSQL module](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-terraform/#set-up-the-aurora-postgresql-module) guide. | ||
- **Amazon OpenSearch**: is used as a datastore for Camunda Orchestration Core components. Have a look at our guide for setting an [OpenSearch domain](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-eksctl/#4-opensearch-domain) |
[docsInstance.hrefDocsToDocs] Improper link format: 'OpenSearch domain'. Please specify the file extension.
Co-authored-by: Lars Lange <[email protected]>
kubectl create namespace orchestration | ||
``` | ||
|
||
Within the `management` namespace (Web Modeler and Console), we will install Identity, Console, and all the Web Modeler components. Within the `orchestration` namespace, we will install the Camunda Orchestration Core component, along with Connectors and Optimize importer. We do not have to worry about installing each component separately since that will be handled by the Helm Chart automatically. For more information on the Orchestration Cluster vs Web Modeler and Console, please review this [guide](/docs/self-managed/reference-architecture/#orchestration-cluster-vs-web-modeler-and-console) |
🧹 Preview environment for this PR has been torn down. |
redirectUrl: "https://modeler.camunda.example.com" | ||
``` | ||
|
||
If you would like some more guidance relating to authentication, refer to the [Connect to an OpenID Connect provider](/docs/self-managed/setup/guides/connect-to-an-oidc-provider/#configuration) guide |
[docsInstance.hrefDocsToDocs] Improper link format: 'Connect to an OpenID Connect provider'. Please specify the file extension.
Description
closes: https://github.com/camunda/distribution/issues/344
When should this change go live?
hold