Update move_current_la to only move dataset specific data #127

Open
patrick-troy opened this issue Nov 5, 2024 · 0 comments

@patrick-troy

Currently, move_current_la_sensor triggers once for each dataset (cin/ssda903), but every run moves the files for both datasets, which is inefficient. It might be worth adjusting this so that each run moves (and removes) only the triggering dataset's data.
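A minimal sketch of a dataset-scoped move, using only `pathlib` and `shutil` on a local filesystem. The function name `move_current_la_for_dataset` and the per-dataset directory layout (`<root>/<dataset>/`) are hypothetical; the real platform may store files differently (for example in cloud storage), so treat this as an illustration of the idea rather than the project's implementation.

```python
import shutil
from pathlib import Path


def move_current_la_for_dataset(dataset: str, cleaned_root: Path, current_root: Path) -> list[Path]:
    """Replace only `dataset`'s files in workspace current.

    Hypothetical layout: each dataset lives in its own subdirectory,
    e.g. <cleaned_root>/cin/ and <current_root>/cin/. Other datasets'
    directories in workspace current are left untouched.
    Returns the files written to workspace current.
    """
    src = cleaned_root / dataset
    dst = current_root / dataset

    # Remove only this dataset's previous "current" files.
    if dst.exists():
        shutil.rmtree(dst)

    # Copy in this dataset's freshly cleaned files, if there are any.
    written: list[Path] = []
    if src.exists():
        shutil.copytree(src, dst)  # also creates current_root if missing
        written = sorted(p for p in dst.rglob("*") if p.is_file())
    return written
```

Called as `move_current_la_for_dataset("ssda903", cleaned, current)`, this replaces the ssda903 files while any cin files already in workspace current are kept as they are.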

Example of how it's inefficient:
Assume this is a fresh platform
cin clean success -> move_current_la -> no workspace current files to delete -> add new cin files to workspace current
ssda903 clean success -> move_current_la -> delete cin workspace current files -> add old cin and new ssda903 files to workspace current
ssda903 clean success -> move_current_la -> delete cin and ssda903 workspace current files -> add old cin and new ssda903 files to workspace current

This could be adjusted to:
cin clean success -> move_current_la -> no workspace current files to delete -> add new cin files to workspace current
ssda903 clean success -> move_current_la -> no ssda903 workspace current files to delete -> add new ssda903 files to workspace current (retain old cin files)
ssda903 clean success -> move_current_la -> delete only the ssda903 workspace current files -> add new ssda903 files to workspace current (retain old cin files)
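The two flows above can be compared with a toy model that only counts file operations (deletes plus copies). The function names, the dict-of-sets "storage", and the assumption that each clean run produces exactly one file per dataset are all hypothetical; nothing here touches Dagster or real storage.

```python
from collections.abc import Callable

Store = dict[str, set[str]]  # dataset name -> file names


def move_all(dataset: str, cleaned: Store, current: Store) -> int:
    """Current behaviour: wipe workspace current and re-copy every dataset.

    `dataset` is ignored, which is exactly the inefficiency.
    """
    ops = sum(len(files) for files in current.values())  # delete everything
    current.clear()
    for ds, files in cleaned.items():
        current[ds] = set(files)
        ops += len(files)  # copy every dataset, changed or not
    return ops


def move_one(dataset: str, cleaned: Store, current: Store) -> int:
    """Proposed behaviour: replace only the triggering dataset."""
    ops = len(current.pop(dataset, set()))  # delete this dataset's files only
    current[dataset] = set(cleaned.get(dataset, set()))
    return ops + len(current[dataset])  # plus copies for this dataset


def simulate(move: Callable[[str, Store, Store], int], runs: list[str]) -> tuple[int, Store]:
    """Run the clean -> move sequence and return (total ops, final current)."""
    cleaned: Store = {}
    current: Store = {}
    total = 0
    for i, dataset in enumerate(runs):
        cleaned[dataset] = {f"{dataset}_run{i}.csv"}  # clean step refreshes one dataset
        total += move(dataset, cleaned, current)
    return total, current


runs = ["cin", "ssda903", "ssda903"]  # the three steps from the example
print(simulate(move_all, runs)[0])  # -> 8
print(simulate(move_one, runs)[0])  # -> 4
```

Both strategies end with the same contents in workspace current; the scoped version simply avoids re-deleting and re-copying data that has not changed.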

We do still want to keep one run per dataset, because the concatenate_sensor follows on from this. The concatenate_sensor works per dataset (so it runs for both cin and ssda903), and Dagster will only trigger one run for each run_id. So if we change move_current_la_sensor to create a single run_id for both datasets, the concatenate_sensor will only trigger for one dataset when we need it to trigger for both.
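The constraint can be illustrated without Dagster by modelling the downstream sensor as firing at most once per distinct run key. In Dagster itself this kind of de-duplication is commonly expressed through the `run_key` argument of `RunRequest`; the helper names and the `"<upstream id>:<dataset>"` key format below are hypothetical.

```python
def per_dataset_run_keys(upstream_run_id: str, datasets: list[str]) -> list[tuple[str, str]]:
    """Build one (run_key, dataset) pair per dataset (hypothetical key format)."""
    return [(f"{upstream_run_id}:{dataset}", dataset) for dataset in datasets]


def concatenate_fires_for(move_runs: list[tuple[str, str]]) -> list[str]:
    """Toy model: at most one downstream run per distinct run key."""
    seen: set[str] = set()
    fired: list[str] = []
    for run_key, dataset in move_runs:
        if run_key in seen:
            continue  # key already used: no new downstream run
        seen.add(run_key)
        fired.append(dataset)
    return fired


# One key per dataset: the downstream step runs for both.
print(concatenate_fires_for(per_dataset_run_keys("abc123", ["cin", "ssda903"])))
# -> ['cin', 'ssda903']

# A single shared key: the downstream step runs for only one dataset.
print(concatenate_fires_for([("abc123", "cin"), ("abc123", "ssda903")]))
# -> ['cin']
```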

patrick-troy self-assigned this Nov 5, 2024