# Simplify onboarding new participants, including for more major cloud providers #874
Following discussions with @BenjaminPelletier, @BradNicolle and @marcadamsge, this issue captures the plan to update the DSS deployment approach to support other cloud providers while keeping it manageable for InterUSS over time.

### Background

The deployment of the DSS is currently mostly documented in a README. Kubernetes (K8s) deployment instructions only cover GKE. Tanka is used to generate and configure Kubernetes resources. In addition, the DSS codebase is being refactored to require only one container instead of the current two. Most of the complexity lies in getting a Kubernetes cluster running for CockroachDB, ready to be pooled, and in the pooling steps themselves. We have undertaken the process of extracting self-contained modules to separate repositories. Finally, we are starting the work to support other cloud providers.

### Default DSS infrastructure

The DSS is composed of two different services (we treat the http-gateway and core-service applications as one, since refactoring is under way): the DSS API and the CockroachDB database.
The requirements for the InterUSS standard deployment of the DSS in terms of infrastructure are:
## Objectives and change plan overview

### 1. Infrastructure as code

Conceptually, the deployment will be broken down into three main categories:

- **Infrastructure**: responsible for the cloud resources required to run the DSS services. It includes the Kubernetes cluster creation, cluster nodes, load balancer and associated fixed IPs, etc. This stage is cloud provider specific. The objective is to support Amazon Web Services (EKS), Azure (AKS) and Google (GKE). [C.1]
- **Services**: the ambition is to be cloud provider agnostic for the services part. It is responsible for managing Kubernetes resources. We distinguish core services, the minimal set of services required by the DSS, from supporting services, which may be of interest to users wishing to operate the DSS out of the box. Currently, services are deployed using Tanka, which provides a templating mechanism for K8s manifests. The second main change proposal is to replace Tanka with Helm [C.2]. In addition to templating, Helm offers packaging and publishing of charts, so more advanced users can reuse them for their own deployments. Helm is especially well suited for GitOps deployments: charts are versioned and can be used to automate the upgrade lifecycle, can be published to cloud providers' container registries, and support hooks and testing to allow sequences of operations during upgrades and validation steps.
- **Operations**: diagnostic and utility operations, such as certificate management, may be simplified using the deployment manager CLI tool / pod.

To keep the learning curve and maintenance burden low, new users should be able to deploy the DSS with knowledge of Terraform only. Advanced users running their own infrastructure should be able to deploy the DSS using the Helm chart directly.

### 2. New repository structure

This is the opportunity to reorganize the repository structure incrementally to split build and deployment. [C.3]
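As a sketch of what "deploy with knowledge of Terraform only" could look like for a new participant, the fragment below calls the GCP infrastructure module referenced in this issue. The module path and variable names (`crdb_node_count`, `google_zone`, `kubernetes_namespace`, `image`) are taken from the related PR notes; the exact interface is illustrative and may differ.

```hcl
# Illustrative only: a new participant's root configuration for the
# GCP infrastructure stage of a DSS instance. Variable names follow
# the terraform-google-dss work; the exact interface may differ.
module "terraform_google_dss" {
  source = "github.com/interuss/dss//deploy/infrastructure/modules/terraform-google-dss"

  google_zone          = "europe-west6-a" # zone hosting the cluster (example value)
  kubernetes_namespace = "default"
  crdb_node_count      = 3        # default CockroachDB node count per the PR notes
  image                = "latest" # "latest" selects the default image/schema version
}
```

A user would then run the usual `terraform init` / `terraform plan` / `terraform apply` cycle against this configuration.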
All assets are currently located in the build folder, and users are expected to work by default in an ignored folder.
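One possible incremental target layout for the split (illustrative only; the `deploy/infrastructure/modules/terraform-google-dss` path is taken from the related PR, the rest is inferred from the three categories above):

```
dss/
├── build/                              # container image build assets
└── deploy/
    ├── infrastructure/                 # cloud provider specific (Terraform)
    │   └── modules/
    │       └── terraform-google-dss/   # GCP; EKS/AKS modules would sit alongside
    ├── services/                       # cloud provider agnostic (Helm charts)
    └── operations/                     # deployment manager CLI / utilities
```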
### 3. Extract deployment example to a new repository

Terraform modules, Helm charts and the deployment manager CLI can be packaged and published. [C.4] Once those components can be installed from a publicly available registry, an example repository could be created to support users working outside the main dss repository for their own deployment. [C.5]

### 4. Use a secret manager to store the generated certificates

Currently, certificates are generated in the repository in an ignored folder. They should instead be stored in a secret manager. [C.7] The following services are available:

### 5. Automatically test the deployment

Once the infrastructure and the services can be deployed using infrastructure as code, the pooling procedure of a DSS Region deployment with multi-cloud DSS instances can be added to the CI/CD. [C.6] The pooling procedure will be orchestrated by the deployment manager. This will help committers and contributors gain confidence in contributions and changes to the deployment procedure, and catch unnoticed changes by cloud providers.

### Changes summary

Priority 1

- C.1: Introduce Terraform to manage the infrastructure stage for each cloud provider.

Priority 2

- C.7: Use a secret manager to store the certificates.

Priority 3

- C.6: Test in the CI/CD the deployment of a DSS Region with multi-cloud DSS instances, and test the pooling procedure.
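As a sketch of C.7 on GCP, the fragment below stores a generated certificate in Google Secret Manager via Terraform. The two resource types are real Terraform `google` provider resources; the secret name and certificate path are hypothetical, and how the certificates are actually wired into the cluster remains to be designed.

```hcl
# Sketch only: store a generated CockroachDB node certificate in
# Google Secret Manager instead of an ignored folder in the repository.
resource "google_secret_manager_secret" "crdb_node_cert" {
  secret_id = "crdb-node-cert" # hypothetical secret name
  replication {
    auto {}
  }
}

# Upload the certificate file as a new version of the secret.
resource "google_secret_manager_secret_version" "crdb_node_cert" {
  secret      = google_secret_manager_secret.crdb_node_cert.id
  secret_data = file("${path.module}/certs/node.crt") # path is illustrative
}
```

Equivalent services exist on the other target providers (e.g. AWS Secrets Manager, Azure Key Vault), which keeps this approach compatible with the multi-cloud objective.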
* [terraform] #874: terraform module for gcp
* Add desired db versions as a variable
* Format and simplify commons
* Fix variable name consistency
* Default crdb_node_count to 3
* Add required crdb_node_count to example definitions
* Add /build/workspace to the repository
* Add a note about GCP login
* Reorganize the files to use composition instead of encapsulation
* Move README temporarily to terraform-google-dss
* Refactor variables and use module composition for terraform-google-dss
* Add utility to manage variables of tf modules
* Update variables and example.tfvars
* Format
* Fix examples
* Remove redundant crdb_internal_addresses and adapt make-certs to handle joining cluster
* Fix link in readme
* Fix link in readme
* Update documentation
* Fix link in readme
* Update deploy/infrastructure/modules/terraform-google-dss/README.md
  Co-authored-by: Benjamin Pelletier <[email protected]>
* Update deploy/infrastructure/modules/terraform-google-dss/README.md
  Co-authored-by: Benjamin Pelletier <[email protected]>
* Update deploy/infrastructure/modules/terraform-google-dss/README.md
  Co-authored-by: Benjamin Pelletier <[email protected]>
* Apply suggestions from code review regarding default values
  Co-authored-by: Benjamin Pelletier <[email protected]>
* Add missing cd as suggested in PR
* Update deploy/infrastructure/modules/terraform-google-dss/terraform.example.tfvars
  Co-authored-by: Benjamin Pelletier <[email protected]>
* Address PR comments
  - Update build/deploy/db_schemas/README.md
  - Change kubernetes_storage_class.tf to google_kubernetes_storage_class.tf
  - Add "latest" value to specify default db schema version
  - Move variables descriptions to TFVARS.md instead of the example file
  - Improve google_zone documentation and add list of options
  - Fix us-demo.pem path
  - Include dummy auth option in the authentication variable documentation
* TF format
* Add latest value for image variable
* Add latest value for image and revert default for kubernetes_namespace
* Fail bash script in case of error
* Improve instructions
* Expose cluster context as output
* Fix key path
* Propose test key by default in example file
* Split step 3 in instructions as suggested in PR
* Clarify cluster context folder.
  Co-authored-by: Benjamin Pelletier <[email protected]>
The instructions for bringing up a DSS instance are currently quite actionable (complete, clear), but they are very long and require a fair amount of engineering expertise. We have a tool under development called deployment_manager which should simplify this process substantially, and therefore make deployment of a DSS instance easier.
Deployment instructions: https://github.com/interuss/dss/tree/master/build
Deployment tool: https://github.com/interuss/dss/tree/master/monitoring/deployment_manager