Introduction
Welcome to the sf-fons-platform wiki! These pages detail what this project is and how it connects to other repos.
The general idea of the platform is as follows:
flowchart LR
codeserver@{ shape: processes, label: "Code Server" }
org-codeserver@{ shape: processes, label: "Org Code Server" }
frontend@{ shape: lean-l, label: "Frontend (Upload Data)" }
hub-frontend@{ shape: lean-l, label: "Hub Frontend (download Data)" }
subgraph hub
frontend-->codeserver
codeserver-->daemon
daemon-->dagit
end
subgraph org
codeserver-->org-codeserver
org-codeserver-->org-daemon
org-daemon-->org-dagit
org-codeserver-->hub-frontend
end
The Frontend is a simple Django application that lets a person log in using SSO and upload files, depending on their group membership. Further documentation can be found in the repo for this code base.
The Hub Frontend is a simplified version of the Frontend that only allows the organisation to download the resulting files and their error logs (not the original data). It relies on SSO but does not currently restrict access any further, for example by group membership. Refer to the application for more details.
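The access rules described for the two frontends can be sketched in plain Python. This is only an illustration of the rules as stated above, not the applications' actual code: the `User` class, the `can_upload` and `can_download` helpers, the group name and the artefact labels are all hypothetical, and `authenticated` simply stands in for a successful SSO login.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """Hypothetical user record; `authenticated` stands in for an SSO login."""
    name: str
    authenticated: bool
    groups: frozenset = field(default_factory=frozenset)


def can_upload(user: User, required_group: str) -> bool:
    # Frontend: SSO login plus membership of the group tied to the upload.
    return user.authenticated and required_group in user.groups


def can_download(user: User, artefact: str) -> bool:
    # Hub Frontend: SSO login only, and only results or error logs,
    # never the original uploaded data.
    return user.authenticated and artefact in {"output", "error_log"}
```

For example, a signed-in user in a hypothetical `la-uploaders` group could upload for that group and download an `error_log`, but a request for `original_data` through the Hub Frontend would be refused.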
The Dagster Daemon and Dagit are used to coordinate execution of the data pipeline. We maintain very little code here: it is merely a dockerised version of the Dagster libraries, frozen for use in the application so that we can control the rollout of updates. Further information can be found in its own README file.
A code server is merely a set of libraries with instructions for how the daemon should behave. It contains things like schedules, pipelines and sensors that the Daemon uses to determine what should be run and when. There can be many code servers per installation, but we currently have only one for our own purposes, although it has two definitions: the org and the la. A code server has a defined set of libraries and a defined Python version. If you require something with a conflicting Python version, for instance, you'd need to define a new code server.
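The rule about when a new code server is needed can be sketched as a small compatibility check. This is a minimal illustration in plain Python and does not use Dagster's API: the `CodeServer` class, the `fits` helper and every version number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeServer:
    """Hypothetical description of one code server's environment."""
    name: str
    python_version: str   # e.g. "3.11" (made-up value)
    libraries: dict       # library name -> pinned version
    definitions: tuple    # e.g. ("org", "la")


def fits(server: CodeServer, python_version: str, libraries: dict) -> bool:
    """Return True if new requirements can share this server's environment."""
    if python_version != server.python_version:
        return False
    # A library conflicts only if the server already pins a different version;
    # libraries the server does not pin yet are treated as compatible.
    return all(
        server.libraries.get(lib, version) == version
        for lib, version in libraries.items()
    )
```

If `fits` returns `False` for a new pipeline's requirements, that pipeline would need its own code server rather than another definition on the existing one.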
Once the daemon determines that something needs to be done, copies of the code server are spun up as needed to run the processes in parallel; they are then shut down again and the service returns to a resting state.
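As a loose analogy for this spin-up and shut-down cycle, the sketch below uses a thread pool from the Python standard library. It is not how Dagster launches runs; `run_step`, `launch_run` and the step names are hypothetical and serve only to show workers being created on demand, running in parallel, and being released afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def run_step(step: str) -> str:
    # Placeholder for one unit of pipeline work.
    return f"{step}: done"


def launch_run(steps: list[str], max_workers: int = 4) -> list[str]:
    # Worker threads are started as tasks arrive and exist only inside the
    # `with` block; leaving it shuts the pool down, which mirrors the
    # service returning to a resting state.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input steps.
        return list(pool.map(run_step, steps))
```

Calling `launch_run(["ingest", "validate"])` runs both placeholder steps concurrently and returns their results in input order.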
Please refer to the code server's documentation for more info.