Technical Overview Guide

Matthew Pugh edited this page Oct 23, 2024 · 2 revisions

The data platform lets tools such as LIIA Tools run in a cloud environment using Dagster, a widely used, open-source, cloud-native data-pipeline orchestrator.

The infrastructure code in this repo describes the cloud resources that run this application. It is split into two parts:

  • The LA / Hub Instance - This is where users upload data, which is then cleaned, pseudonymized, and made ready to be combined with other data sets.
  • The Organisation Instance - This is the account where data is either sent on or made available for the organisation to download. Any combining of pseudonymized data sets is done here.
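
The division of work between the two instances can be sketched in plain Python. This is only an illustration of the order of steps described above, not the LIIA Tools or Dagster implementation: the function names, the `child_id` field, the keyed-hash approach to pseudonymization, and the `SECRET_KEY` value are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret; a real deployment would load this from a secrets store.
SECRET_KEY = b"example-key-not-for-production"


def clean(record: dict) -> dict:
    """Strip surrounding whitespace from string values and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key] = value
    return cleaned


def pseudonymize(record: dict, id_field: str = "child_id") -> dict:
    """Replace the identifier with a keyed hash (assumed technique).

    The same input always maps to the same pseudonym, so records can
    still be linked after the real identifier has been removed.
    """
    out = dict(record)
    digest = hmac.new(SECRET_KEY, str(record[id_field]).encode(), hashlib.sha256)
    out[id_field] = digest.hexdigest()[:16]
    return out


def la_instance_pipeline(records: list[dict]) -> list[dict]:
    """LA / Hub Instance: uploaded records are cleaned, then pseudonymized."""
    return [pseudonymize(clean(r)) for r in records]


def organisation_combine(*datasets: list[dict]) -> list[dict]:
    """Organisation Instance: combine data sets that are already pseudonymized."""
    combined = []
    for dataset in datasets:
        combined.extend(dataset)
    return combined


la_a = la_instance_pipeline([{"child_id": " 123 ", "name": ""}])
la_b = la_instance_pipeline([{"child_id": "123", "placement": "foster"}])
combined = organisation_combine(la_a, la_b)
```

In this sketch only pseudonymized records ever reach `organisation_combine`, which mirrors the rule that combining happens in the Organisation Instance rather than in the LA / Hub Instance.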

The LA / Hub Instance

This is a brief overview of the LA / Hub Instance:

architecture-beta
    group aws(cloud)[LA AWS Account]
    group dagster(server)[Dagster] in aws

    service frontend(internet)[Frontend] in aws
    service s3_store(disk)[Data Store Bucket] in aws
    service s3_workspace(disk)[Workspace Bucket] in aws
    service s3_shared_space(disk)[Shared Bucket] in aws
    service dagster_database(database)[Dagster Database] in aws
    service code_server(server)[Code Server] in dagster
    service daemon(server)[Daemon] in dagster
    service dagit(server)[Dagit] in dagster

    service cognito(internet)[SSO Configuration] in aws
    service azure(internet)[Azure Application]

    azure:B -- T:cognito
    cognito:B -- T:frontend
    frontend:R -- L:s3_store
    s3_store:B -- T:code_server
    code_server:R -- L:s3_workspace
    code_server:B -- T:s3_shared_space
    dagit:R -- L:daemon
    daemon:R -- L:code_server
    dagster_database:T -- B:daemon
    dagster_database:T -- B:dagit
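
To make the diagram easier to reason about, the same connections can be written down as a small graph and queried. The sketch below copies the edges from the diagram above and treats them as undirected, as the diagram does; the helper names `build_graph` and `shortest_path` are invented for this example.

```python
from collections import deque

# Connections copied from the LA / Hub Instance diagram.
LA_EDGES = [
    ("azure", "cognito"),
    ("cognito", "frontend"),
    ("frontend", "s3_store"),
    ("s3_store", "code_server"),
    ("code_server", "s3_workspace"),
    ("code_server", "s3_shared_space"),
    ("dagit", "daemon"),
    ("daemon", "code_server"),
    ("dagster_database", "daemon"),
    ("dagster_database", "dagit"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(LA_EDGES)
upload_route = shortest_path(graph, "frontend", "s3_shared_space")
```

Here `upload_route` runs from the Frontend through the Data Store Bucket and the Code Server to the Shared Bucket, which matches the upload, clean, and share flow described for this instance.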



The Organisation Instance

This is a brief overview of the Organisation Instance. The Shared Bucket in this instance corresponds to the Shared Bucket in the LA / Hub Instance and is the link between the two systems.

architecture-beta
    group aws_org(cloud)[Organisation AWS Account]
    group dagster_org(server)[Organisation Dagster] in aws_org

    service hub_code_server(server)[Hub Code Server] in dagster_org
    service hub_daemon(server)[Daemon] in dagster_org
    service hub_dagit(server)[Dagit] in dagster_org
    service hub_dagster_database(database)[Hub Dagster Database] in aws_org

    service frontend_hub(internet)[Hub Frontend] in aws_org
    service s3_hub_workspace(disk)[Hub Workspace Bucket] in aws_org
    service s3_egress(disk)[Egress Bucket] in aws_org

    service s3_shared_input(disk)[Shared Bucket]

    s3_shared_input:T -- B:hub_code_server
    frontend_hub:R -- L:s3_egress
    hub_dagit:R -- L:hub_daemon
    hub_daemon:R -- L:hub_code_server
    hub_code_server:R -- L:s3_hub_workspace
    hub_code_server:B -- T:s3_egress
    hub_dagster_database:B -- T:hub_daemon
    hub_dagster_database:B -- T:hub_dagit
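
The role of the Shared Bucket as the only link between the two instances can be checked with a short reachability sketch. The Organisation edges are copied from the diagram above, and the LA / Hub edges are the subset that leads to its Shared Bucket. Treating the two Shared Bucket nodes as connected (`SHARED_LINK`) is an assumption based on the description, and the `reachable` helper is invented for this example.

```python
# Connections copied from the Organisation Instance diagram.
ORG_EDGES = [
    ("s3_shared_input", "hub_code_server"),
    ("frontend_hub", "s3_egress"),
    ("hub_dagit", "hub_daemon"),
    ("hub_daemon", "hub_code_server"),
    ("hub_code_server", "s3_hub_workspace"),
    ("hub_code_server", "s3_egress"),
    ("hub_dagster_database", "hub_daemon"),
    ("hub_dagster_database", "hub_dagit"),
]

# The part of the LA / Hub Instance that leads to its Shared Bucket.
LA_EDGES = [
    ("frontend", "s3_store"),
    ("s3_store", "code_server"),
    ("code_server", "s3_shared_space"),
]

# Assumption: both "Shared Bucket" nodes refer to the same shared storage.
SHARED_LINK = ("s3_shared_space", "s3_shared_input")


def reachable(edges, start):
    """Return every node reachable from `start` over undirected edges."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


without_link = reachable(LA_EDGES + ORG_EDGES, "frontend")
with_link = reachable(LA_EDGES + ORG_EDGES + [SHARED_LINK], "frontend")
```

Without the link, nothing uploaded through the LA Frontend can reach the Egress Bucket. With it, the Egress Bucket and the Hub Frontend both become reachable.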