Helm Chart Production Guide #4703

Open
wants to merge 60 commits into
base: main

Conversation

hamza-m-masood
Contributor

Description

closes: https://github.com/camunda/distribution/issues/344

When should this change go live?

  • This is a bug fix, security concern, or something that needs urgent release support.
  • This is already available but undocumented and should be released within a week.
  • This is on a specific schedule and the assignee will coordinate a release with the DevEx team. (apply hold label or convert to draft PR)
  • This is part of a scheduled alpha or minor. (apply alpha or minor label)
  • There is no urgency with this change and can be released at any time.

PR Checklist

  • My changes are for an already released minor and are in /versioned_docs directory.
  • My changes are for the next minor and are in /docs directory (aka /next/).

@hamza-m-masood added the `hold` (This issue is parked, do not merge.) and `target:8.7` (Issues included in the 8.7 release) labels on Dec 3, 2024
@hamza-m-masood self-assigned this on Dec 3, 2024
Contributor

github-actions bot commented Dec 3, 2024

👋 🤖 🤔 Hello, @hamza-m-masood! Did you make your changes in all the right places?

These files were changed only in docs/. You might want to duplicate these changes in versioned_docs/version-8.6/.

  • docs/self-managed/operational-guides/assets/smarch.jpg
  • docs/self-managed/operational-guides/assets/smarch.pdf
  • docs/self-managed/operational-guides/production-guide.md

You may have done this intentionally, but we wanted to point it out in case you didn't. You can read more about the versioning within our docs in our documentation guidelines.

@hamza-m-masood
Contributor Author

@conceptualshark, I wanted to add a link to the OpenSearch IRSA guide, but I am having a problem with the docs. I opened an issue here: #4754

@hamza-m-masood added the `deploy` (Stand up a temporary docs site with this PR) label on Dec 12, 2024
@github-actions github-actions bot temporarily deployed to camunda-docs December 12, 2024 12:00 Destroyed
@github-actions github-actions bot temporarily deployed to camunda-docs December 12, 2024 15:47 Destroyed
@github-actions github-actions bot temporarily deployed to camunda-docs December 12, 2024 20:02 Destroyed
@github-actions github-actions bot temporarily deployed to camunda-docs December 16, 2024 08:34 Destroyed
@github-actions github-actions bot temporarily deployed to camunda-docs December 16, 2024 09:06 Destroyed
@github-actions github-actions bot temporarily deployed to camunda-docs December 16, 2024 09:27 Destroyed

## Camunda Core Configuration

At this point you are able to connect to your platform through HTTPS, correctly authenticate users using AWS Simple Active Directory, and have connected to external databases such as Amazon OpenSearch and Amazon PostgreSQL. The logical next step is to focus on the Camunda application-specific configurations suitable for a production environment.

Member

I'd suggest rephrasing it in a hypothetical sense, since afaik the user shouldn't be running `helm upgrade` every time, right?

... you would be able ...

Member

@Langleu left a comment

Leaving my review so far.
I've only gotten halfway and will continue reading through the rest tomorrow.

Member

@Langleu left a comment

Second part and overall impressions:

I'm not entirely sure about the focus on AWS.

You don't seem to leverage much that's specifically AWS-related, apart from possibly the Identity Provider topic. This could just be kept general, or as Jesse suggested, you could simply reference the page you already have.

Beyond that, the content could easily be generalized and doesn't necessarily need AWS-specific details. AWS services could be an example, especially with the guides we have.

The reason I bring this up is that it raises a question for customers: what is the value of this compared to the AWS-specific guides we already provide?

The AWS guide provides a clear, step-by-step approach that leads to a working solution. In contrast, this page, at least for me, seems to aim for a similar outcome but doesn't fully achieve it. Its aim should instead be to generalize, which works against producing a complete working result: guiding the customer toward a production-ready configuration. If it were easy to say that X is required for production, it would already be a default of the Helm chart, wouldn't it?

What I mean is that this page might benefit from focusing more on production-readiness configuration of the Helm Chart, and maybe of Camunda itself, accepting that it will likely not result in a working solution since it's just snippets. It could dive deeper into the reasoning behind the recommendations, rather than assuming the audience will accept them without question. Explain why specific suggestions are made, what they achieve, and how they relate to the context of the Helm Chart. Essentially, ask: Why is this recommended? What purpose does it serve? Providing this level of detail would enhance the clarity and usefulness of the guide.

For example, for the Databases, is there anything else we recommend as production, or is there a default already in place, e.g., some environment variables that optimize the connection?

For example, you show how to configure index retention, but if I'm thinking production, then why is it recommended? What does it mean? How do I possibly have to adjust for my own use case, and why?

Similar thoughts about Scaling and performance.

The AWS guide could in the end mention this page as a sort of further education on getting production ready: taking a working setup and applying your recommendations on top.

Ultimately, that's just my pov, so feel free to adjust things the way you see fit, as you own the topic.

The TL;DR is:

  • Ask yourself why on each recommendation and put those into words.
  • Don't assume customers will take it at face value.
  • What's the big difference to the AWS guides and unique selling point?
  • Keep it general (removing AWS focus if possible)

I'm on FTO next week, so in the worst case feel free to dismiss my review. As said, this is just my pov; do with the comments as you see fit.

I think the page has potential and a good trajectory!

maxUnavailable: 1
```

- Version Management: Stay on a stable Camunda and Kubernetes version. Follow Camunda’s release notes for security patches or critical updates. A list of our supported versions Camunda Helm Charts can be found on the [version matrix](https://helm.camunda.io/camunda-platform/version-matrix/)

Member

A list of our supported versions Camunda Helm Charts

Tricky, the matrix you link to lists every possible Camunda Helm chart and Camunda version there is and afaik the latest currently supported one is 8.3.
My suggestion would be to rephrase to avoid the word "supported", possibly just removing it, but there may be better alternatives.

@theburi and @conceptualshark https://docs.camunda.io/docs/reference/supported-environments/ shouldn't this mention that our support cycle is 18 months from the time of the first minor release?
It still mentions on the page even Zeebe 1.3.x, 8.2.x which I think are outside of the 18 months, so from my pov not considered supported environments anymore.

Would it make sense to add the Helm chart version to the matrix ?
So you have Design | Automate | Improve | Deploy with Deploy being then the related major version of the Helm chart (e.g. 10.x.y)

maxUnavailable: 1
```

- Version Management: Stay on a stable Camunda and Kubernetes version. Follow Camunda’s release notes for security patches or critical updates. A list of our supported versions Camunda Helm Charts can be found on the [version matrix](https://helm.camunda.io/camunda-platform/version-matrix/)

Member

for the release notes we could link for example to https://docs.camunda.io/docs/reference/release-notes/

```

- Version Management: Stay on a stable Camunda and Kubernetes version. Follow Camunda’s release notes for security patches or critical updates. A list of our supported versions Camunda Helm Charts can be found on the [version matrix](https://helm.camunda.io/camunda-platform/version-matrix/)
- Secrets should be created prior to installing the Helm Chart so they can be referenced as existing secrets when installing the Helm Chart. In this scenario we are going to auto-generate the secrets. The following can be added to your `production-values.yaml`:

Member

considering the docs will be part of 8.7, I thought it's mandatory to create secrets prior now?

We currently solve it like this in the AWS guide.

Reading this part here, I'm wondering what is the production way? Creating those prior or having them as part of the Helm Chart?
If it's considered production to autogenerate it, why isn't it the default anymore in the helm chart?
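
To make the "create secrets prior" option concrete, a pre-created secret could look like the minimal sketch below. The secret name and key are hypothetical placeholders and are not taken from the PR or the chart; how the chart references an existing secret depends on the chart version and is deliberately not shown.

```yaml
# Hypothetical example: a Secret created before `helm install`.
# The name and key below are placeholders, not values the chart requires.
apiVersion: v1
kind: Secret
metadata:
  name: camunda-credentials          # hypothetical name
  namespace: orchestration           # namespace created elsewhere in the guide
type: Opaque
stringData:
  example-password: "<replace-me>"   # hypothetical key; supply a real value
```

Because such a secret exists independently of the Helm release, a later `helm upgrade` does not regenerate it, which lines up with the "Make sure secrets are not auto-generated on upgrade" note in the Upgrade and Maintenance section.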

Comment on lines +326 to +337
- Make sure to not store any state or important, long term business data in the local file system of the container. A pod is transient, if the pod is restarted then the data will get wiped. It is better to create a volume and volume mount instead. Here is some example configuration for the core component to create persistent storage:

```yaml
core:
  extraVolumes:
    - name: persistent-state
      emptyDir: {}
  extraVolumeMounts:
    - name: persistent-state
      mountPath: /mount
```

Member

Just for my own understanding, in what cases would I need this?

My assumption is that the Helm chart already creates a PVC for me, which is where RocksDB lives.

If it's intended to be kept, maybe first explain that the Helm chart already creates a PVC for Zeebe data (RocksDB); there's probably a link to somewhere else that explains that further.
And then go into detail on what those additional use cases are where a customer would need additional PVCs.
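
One point worth illustrating here: an `emptyDir` volume is deleted when its pod is removed from the node, so the snippet under review would not by itself keep data across rescheduling. A minimal sketch of a claim-backed variant follows, assuming the `core.extraVolumes` and `core.extraVolumeMounts` keys from the snippet accept standard Kubernetes volume definitions. The claim name `extra-data` is hypothetical, and the PersistentVolumeClaim would need to exist before the pod starts.

```yaml
core:
  extraVolumes:
    - name: persistent-state
      persistentVolumeClaim:
        claimName: extra-data        # hypothetical, pre-created PVC
  extraVolumeMounts:
    - name: persistent-state
      mountPath: /mount              # same mount path as the snippet under review
```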

<!-- This seems very specific to the application. I might remove this: -->
<!-- - Mount Secrets as volumes, not environment variables -->

- It is recommended to set a memory and resource quota for your namespace. Please refer to the [Kubernetes documentation](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/quota-memory-cpu-namespace/) to do so.

Member

Reading this, I'm asking myself why.
Why is it recommended, why should I do this and what's the advantage of this over the pod resources that were talked about a bit before.
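
As a partial answer to the "why": pod-level requests and limits bound a single container, while a `ResourceQuota` caps the aggregate for a whole namespace, so one workload cannot consume capacity that other namespaces rely on. A minimal sketch, assuming the `orchestration` namespace created elsewhere in the guide; the quota name and all numbers are illustrative placeholders, not sizing recommendations.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: orchestration-quota    # hypothetical name
  namespace: orchestration
spec:
  hard:
    requests.cpu: "8"          # placeholder values; size to your own workload
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
```

Note that once a quota covers CPU or memory, pods in that namespace that do not declare the corresponding requests or limits may be rejected, so the quota and the pod resource settings need to be planned together.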

- If you have a use case for enabling [Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) then it is recommended to do so.
<!--Maybe link this to customer: https://github.com/ahmetb/kubernetes-network-policy-recipes-->
- It is possible to have a pod security standard that is suitable to the security constraints you might have. This is possible through modifying the Pod Security Admission. Please refer to the [Kubernetes documentation](https://kubernetes.io/docs/concepts/security/pod-security-admission/) in order to do so.
- By default, The Camunda Helm Chart is configured by default to use a read-only root file system for the pod. It is advisable to retain this default setting, and no modifications are required in your `production-values.yaml`.

Member

Suggested change
- By default, The Camunda Helm Chart is configured by default to use a read-only root file system for the pod. It is advisable to retain this default setting, and no modifications are required in your `production-values.yaml`.
- By default, the Camunda Helm Chart is configured to use a read-only root file system for the pod. It is advisable to retain this default setting, and no modifications are required in your `production-values.yaml`.

Comment on lines +378 to +395
- It is possible to modify either the `containerSecurityContext` or the `podSecurityContext`. For example, here is a configuration for the core component that can be added to your `production-values.yaml`:

```yaml
podSecurityContext:
  runAsNonRoot: true
  fsGroup: 1001
  seccompProfile:
    type: RuntimeDefault

containerSecurityContext:
  allowPrivilegeEscalation: false
  privileged: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1001
  seccompProfile:
    type: RuntimeDefault
```

Member

what is this related to?

Is this a subpoint of disabling privileged containers?

If not, why should I do this and what is it doing?

```

- It is recommended to pull images exclusively from a private registry, such as [Amazon ECR](https://aws.amazon.com/ecr/), rather than directly from Docker Hub. Doing so enhances control over the images, avoids rate limits, and improves performance and reliability. Additionally, you can configure your cluster to pull images only from trusted registries. Tools like [Open Policy Agent](https://blog.openpolicyagent.org/securing-the-kubernetes-api-with-open-policy-agent-ce93af0552c3#3c6e) can be used to enforce this restriction.
- Open Policy Agent can also be used to [whitelist for ingress hostnames](https://www.openpolicyagent.org/docs/latest/kubernetes-tutorial/#4-define-a-policy-and-load-it-into-opa-via-kubernetes).

Member

maybe do a single point of Open Policy Agent combining this and the previous trusted sources one as sub points?

  • .... recommended to use private registry ...
  • ... Open Policy Agent ...
    • trusted registry topic
    • whitelist ingress topic

For the whitelist ingress topic, again why should it be done, what's the advantage as to why we recommend it.


### Upgrade and Maintenance

Make sure secrets are not auto-generated on upgrade.

Member

I can't link my comment from above, but that's exactly what I meant earlier, where the auto-generation of the secrets was introduced. If we already say in the next steps that you should be careful with auto-generated secrets, then auto-generating them is not a production recommendation.


Here are some points to keep in mind when considering reliability:

- Check node affinity and tolerations. Please refer to the [Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) to modify the node affinity and tolerations.

Member

Again, explain the why, and maybe give an example of how one should taint (by node, zone, or something similar) and what we recommend as criteria.
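
For illustration only, the kind of example asked for here could look like the sketch below. It uses plain Kubernetes pod-spec fields rather than chart-specific keys (where they go in the Helm values is not confirmed by this PR), and the taint key, taint value, and pod label are hypothetical. The toleration lets pods run on nodes reserved with a matching taint, and the anti-affinity rule asks the scheduler to spread replicas across zones so that a single zone outage does not take down every replica.

```yaml
# Pod-spec fragment with standard Kubernetes fields (not chart-specific keys).
# Assumes nodes were tainted beforehand, for example with:
#   kubectl taint nodes <node-name> dedicated=camunda:NoSchedule
tolerations:
  - key: "dedicated"                 # hypothetical taint key
    operator: "Equal"
    value: "camunda"                 # hypothetical taint value
    effect: "NoSchedule"
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: topology.kubernetes.io/zone
          labelSelector:
            matchLabels:
              app.kubernetes.io/component: core   # hypothetical pod label
```

The rule is written as "preferred" rather than "required" so that pods can still be scheduled when fewer zones than replicas are available.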


### Connect External Databases

The next stage of the production setup is configuring databases. To make it easy for testing, the Camunda Helm Chart provides external, dependency Helm Charts for Databases such as Bitnami Elasticsearch Helm Chart and Bitnami PostgresQL Helm Chart. Within a production setting, these dependency charts should be disabled and production databases should be used instead. For example, instead of the Bitnami Elasticsearch dependency chart, we will use Amazon OpenSearch, and instead of the Bitnami PostgreSQL dependency chart, we will use Amazon Aurora PostgreSQL.

Member

@bkenez, Jan 10, 2025

Suggested change
The next stage of the production setup is configuring databases. To make it easy for testing, the Camunda Helm Chart provides external, dependency Helm Charts for Databases such as Bitnami Elasticsearch Helm Chart and Bitnami PostgresQL Helm Chart. Within a production setting, these dependency charts should be disabled and production databases should be used instead. For example, instead of the Bitnami Elasticsearch dependency chart, we will use Amazon OpenSearch, and instead of the Bitnami PostgreSQL dependency chart, we will use Amazon Aurora PostgreSQL.
The next stage of the production setup is configuring databases. To make it easy for testing, the Camunda Helm Chart provides external dependency Helm Charts for Databases such as the Bitnami Elasticsearch Helm Chart and the Bitnami PostgreSQL Helm Chart. Within a production setting, these dependency charts should be disabled, and production databases should be used instead. For example, instead of the Bitnami Elasticsearch dependency chart, we will use Amazon OpenSearch, and instead of the Bitnami PostgreSQL dependency chart, we will use Amazon Aurora PostgreSQL.

Minor grammar corrections are needed: use the definite article for better flow and consistency within the paragraph.

enabled: false
```

You should only enable the auto-mounting of a service account token when the application explicitly needs access to the Kubernetes API server or you have created a service account with the exact permissions required for the application and bound it to the pod.

Member

Suggested change
You should only enable the auto-mounting of a service account token when the application explicitly needs access to the Kubernetes API server or you have created a service account with the exact permissions required for the application and bound it to the pod.
You should only enable the auto-mounting of a service account token when the application explicitly needs access to the Kubernetes API server, or you have created a service account with the exact permissions required for the application and bound it to the pod.

Missing comma

Here is the full `production-values.yaml` considering all the above topics.

```yaml
# make sure to the values.yaml in a multinamespace setting and configure console likewise

Member

This comment is unclear to me.
I think there's a missing verb after "make sure to". Also, multi-namespace should be hyphenated.

Member

bkenez commented Jan 10, 2025

With regards to the document as a whole:
I do think that this would be very useful overall. However, my takeaway is that a production guide should focus less on a specific example installation and more on when, and mainly why, certain options should or should not be used. As Lars said above, looking at it from a customer perspective, I would want to see the reasoning behind each recommendation so I could deepen my understanding of the deployment as a whole. And if I chose not to follow a recommendation, I would also like to be aware of the consequences of doing so.

That said, definitely nice work; I could only really offer some grammar corrections. 👍

@github-actions github-actions bot temporarily deployed to camunda-docs January 20, 2025 01:16 Destroyed
Before proceeding with the setup, ensure the following requirements are met:

- **Kubernetes Cluster**: A functioning Kubernetes cluster with kubectl access. We are going to use an AWS EKS cluster. Have a look at the following guides:
- [Deploy an EKS cluster with Terraform (advanced)](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-terraform/)

Contributor

⚠️ [vale] reported by reviewdog 🐶
[docsInstance.hrefDocsToDocs] Improper link format: 'Deploy an EKS cluster with Terraform (advanced)'. Please specify the file extension.


- **Kubernetes Cluster**: A functioning Kubernetes cluster with kubectl access. We are going to use an AWS EKS cluster. Have a look at the following guides:
- [Deploy an EKS cluster with Terraform (advanced)](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-terraform/)
- [Install Camunda 8 on an EKS cluster](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-helm/)

Contributor

⚠️ [vale] reported by reviewdog 🐶
[docsInstance.hrefDocsToDocs] Improper link format: 'Install Camunda 8 on an EKS cluster'. Please specify the file extension.

- **DNS Configuration**: Access to configure DNS for your domain to point to the Kubernetes cluster ingress.
- **TLS Certificates**: Obtain valid X.509 certificates for your domain from a trusted Certificate Authority.
- **External Dependencies**: Provision the following external dependencies:
- **Amazon Aurora PostgreSQL**: For persistent data storage required for the Web Modeler component. Have a look at the [Set up the Aurora PostgreSQL module](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-terraform/#set-up-the-aurora-postgresql-module) guide.

Contributor

⚠️ [vale] reported by reviewdog 🐶
[docsInstance.hrefDocsToDocs] Improper link format: 'Set up the Aurora PostgreSQL module'. Please specify the file extension.

- **TLS Certificates**: Obtain valid X.509 certificates for your domain from a trusted Certificate Authority.
- **External Dependencies**: Provision the following external dependencies:
- **Amazon Aurora PostgreSQL**: For persistent data storage required for the Web Modeler component. Have a look at the [Set up the Aurora PostgreSQL module](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-terraform/#set-up-the-aurora-postgresql-module) guide.
- **Amazon OpenSearch**: is used as a datastore for Camunda Orchestration Core components. Have a look at our guide for setting an [OpenSearch domain](/docs/self-managed/setup/deploy/amazon/amazon-eks/eks-eksctl/#4-opensearch-domain)

Contributor

⚠️ [vale] reported by reviewdog 🐶
[docsInstance.hrefDocsToDocs] Improper link format: 'OpenSearch domain'. Please specify the file extension.

@github-actions github-actions bot temporarily deployed to camunda-docs January 20, 2025 02:08 Destroyed
@github-actions github-actions bot temporarily deployed to camunda-docs January 20, 2025 06:40 Destroyed
@github-actions github-actions bot temporarily deployed to camunda-docs January 20, 2025 07:06 Destroyed
@github-actions github-actions bot temporarily deployed to camunda-docs January 20, 2025 07:27 Destroyed
@github-actions github-actions bot temporarily deployed to camunda-docs January 20, 2025 12:49 Destroyed
@hamza-m-masood removed the `deploy` (Stand up a temporary docs site with this PR) label on Jan 20, 2025
kubectl create namespace orchestration
```

Within the `management` namespace (Web Modeler and Console), we will install Identity, Console, and all the Web Modeler components. Within the `orchestration` namespace, we will install the Camunda Orchestration Core component, along with Connectors and Optimize importer. We do not have to worry about installing each component separately since that will be handled by the Helm Chart automatically. For more information on the Orchestration Cluster vs Web Modeler and Console, please review this [guide](/docs/self-managed/reference-architecture/#orchestration-cluster-vs-web-modeler-and-console)

Contributor

⚠️ [vale] reported by reviewdog 🐶
[docsInstance.hrefDocsToDocs] Improper link format: 'guide'. Please specify the file extension.

Contributor

🧹 Preview environment for this PR has been torn down.

redirectUrl: "https://modeler.camunda.example.com"
```

If you would like some more guidance relating to authentication, refer to the [Connect to an OpenID Connect provider](/docs/self-managed/setup/guides/connect-to-an-oidc-provider/#configuration) guide

Contributor

⚠️ [vale] reported by reviewdog 🐶
[docsInstance.hrefDocsToDocs] Improper link format: 'Connect to an OpenID Connect provider'. Please specify the file extension.

Labels
hold This issue is parked, do not merge. target:8.7 Issues included in the 8.7 release
7 participants