Technical Overview Guide
Matthew Pugh edited this page Oct 23, 2024 · 2 revisions
The data platform lets tools such as LIIA Tools run in a cloud environment using Dagster. Dagster is a widely used, open-source, cloud-native data-pipeline orchestrator.
This repo contains the infrastructure definition that runs this application. It is broken into two parts:
- The LA / Hub Instance - This is where users upload data, which is then cleaned, pseudonymized, and made ready to be combined with other data sets.
- The Organisation Instance - This is the account where data is either sent on or made available for the organisation to download. Any combining of pseudonymized data sets is done here.
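The split between the two instances can be illustrated with a minimal sketch of the clean, pseudonymize, and combine steps. This is not the platform's actual code: the keyed-hash approach (HMAC-SHA256), the shared `SECRET_KEY`, the `child_id` field, and all function names are assumptions made for illustration. The point it shows is that if identifiers are pseudonymized consistently, data sets can still be joined on the pseudonym without exposing the original identifier.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would load
# this from a secrets store and never hard-code it.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256, hex) of an identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def clean_and_pseudonymize(rows, id_field="child_id"):
    """LA / Hub side (assumed): strip whitespace, then replace the id with a pseudonym."""
    cleaned = []
    for row in rows:
        out = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        out[id_field] = pseudonymize(out[id_field])
        cleaned.append(out)
    return cleaned


def combine(left, right, id_field="child_id"):
    """Organisation side (assumed): inner join of two pseudonymized data sets."""
    right_by_id = {row[id_field]: row for row in right}
    return [
        {**row, **right_by_id[row[id_field]]}
        for row in left
        if row[id_field] in right_by_id
    ]


# Two hypothetical uploads that refer to the same person.
placements = clean_and_pseudonymize([{"child_id": " A123 ", "placement": "foster"}])
assessments = clean_and_pseudonymize(
    [{"child_id": "A123", "outcome": "complete"}, {"child_id": "B456", "outcome": "open"}]
)
combined = combine(placements, assessments)
```

Here `combined` holds a single row carrying both `placement` and `outcome`, keyed by a 64-character pseudonym rather than `A123`. Cleaning before hashing matters in this sketch: the stray whitespace in `" A123 "` would otherwise produce a different pseudonym and the join would miss.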
This is a brief overview of the LA / Hub Instance:
architecture-beta
group aws(cloud)[LA AWS Account]
group dagster(server)[Dagster] in aws
service frontend(internet)[Frontend] in aws
service s3_store(disk)[Data Store Bucket] in aws
service s3_workspace(disk)[Workspace Bucket] in aws
service s3_shared_space(disk)[Shared Bucket] in aws
service dagster_database(database)[Dagster Database] in aws
service code_server(server)[Code Server] in dagster
service daemon(server)[Daemon] in dagster
service dagit(server)[Dagit] in dagster
service cognito(internet)[SSO Configuration] in aws
service azure(internet)[Azure Application]
azure:B -- T:cognito
cognito:B -- T:frontend
frontend:R -- L:s3_store
s3_store:B -- T:code_server
code_server:R -- L:s3_workspace
code_server:B -- T:s3_shared_space
dagit:R -- L:daemon
daemon:R -- L:code_server
dagster_database:T -- B:daemon
dagster_database:T -- B:dagit
This is a brief overview of the Organisation Instance. The Shared Bucket in this instance corresponds to the Shared Bucket in the LA / Hub Instance and represents the link between the two instances.
architecture-beta
group aws_org(cloud)[Organisation AWS Account]
group dagster_org(server)[Organisation Dagster] in aws_org
service hub_code_server(server)[Hub Code Server] in dagster_org
service hub_daemon(server)[Daemon] in dagster_org
service hub_dagit(server)[Dagit] in dagster_org
service hub_dagster_database(database)[Hub Dagster Database] in aws_org
service frontend_hub(internet)[Hub Frontend] in aws_org
service s3_hub_workspace(disk)[Hub Workspace Bucket] in aws_org
service s3_egress(disk)[Egress Bucket] in aws_org
service s3_shared_input(disk)[Shared Bucket]
s3_shared_input:T -- B:hub_code_server
frontend_hub:R -- L:s3_egress
hub_dagit:R -- L:hub_daemon
hub_daemon:R -- L:hub_code_server
hub_code_server:R -- L:s3_hub_workspace
hub_code_server:B -- T:s3_egress
hub_dagster_database:B -- T:hub_daemon
hub_dagster_database:B -- T:hub_dagit
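To make the link between the two diagrams concrete, the sketch below transcribes their connections into a small graph and finds a route from the LA / Hub frontend to the Organisation frontend. The node names are the service identifiers from the diagrams; the `LINK` edge joining the two Shared Bucket nodes, the helper functions, and the reading of each connection as undirected are assumptions for illustration, since the diagrams do not state a data-flow direction.

```python
from collections import deque

# Connections transcribed from the LA / Hub Instance diagram.
LA_EDGES = [
    ("azure", "cognito"),
    ("cognito", "frontend"),
    ("frontend", "s3_store"),
    ("s3_store", "code_server"),
    ("code_server", "s3_workspace"),
    ("code_server", "s3_shared_space"),
    ("dagit", "daemon"),
    ("daemon", "code_server"),
    ("dagster_database", "daemon"),
    ("dagster_database", "dagit"),
]

# Connections transcribed from the Organisation Instance diagram.
ORG_EDGES = [
    ("s3_shared_input", "hub_code_server"),
    ("frontend_hub", "s3_egress"),
    ("hub_dagit", "hub_daemon"),
    ("hub_daemon", "hub_code_server"),
    ("hub_code_server", "s3_hub_workspace"),
    ("hub_code_server", "s3_egress"),
    ("hub_dagster_database", "hub_daemon"),
    ("hub_dagster_database", "hub_dagit"),
]

# Assumed edge: the Shared Bucket is the link between the two instances.
LINK = [("s3_shared_space", "s3_shared_input")]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


linked = shortest_path(build_graph(LA_EDGES + ORG_EDGES + LINK), "frontend", "frontend_hub")
unlinked = shortest_path(build_graph(LA_EDGES + ORG_EDGES), "frontend", "frontend_hub")
print(" -> ".join(linked))
# frontend -> s3_store -> code_server -> s3_shared_space -> s3_shared_input -> hub_code_server -> s3_egress -> frontend_hub
```

Without the `LINK` edge, `unlinked` is `None`: in this model the Shared Bucket is the only connection between the two accounts, which matches the description above of it as the link between the instances.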