Currently the move_current_la_sensor triggers once for each dataset (cin/ssda903); however, it moves all files both times, which is inefficient. It might be worth adjusting it to move (and remove) only one dataset's worth of data each time.
Example of how it's inefficient:
Assume this is a fresh platform
cin clean success -> move_current_la -> no workspace current files to delete -> add new cin files to workspace current
ssda903 clean success -> move_current_la -> delete cin workspace current files -> add old cin and new ssda903 files to workspace current
ssda903 clean success (later run) -> move_current_la -> delete cin and ssda903 workspace current files -> add old cin and new ssda903 files to workspace current
This could be adjusted to:
cin clean success -> move_current_la -> no workspace current files to delete -> add new cin files to workspace current
ssda903 clean success -> move_current_la -> no ssda903 workspace current files to delete -> add new ssda903 files to workspace current (retain old cin files)
ssda903 clean success (later run) -> move_current_la -> delete just ssda903 workspace current files -> add new ssda903 files to workspace current (retain old cin files)
We do still want to maintain a separate run for each dataset, because the concatenate_sensor leads on from this. The concatenate_sensor works per dataset (so it runs for both cin and ssda903), and Dagster will only trigger one run for each run_id. So if we converted move_current_la_sensor to create a single run_id covering both datasets, the concatenate_sensor would only trigger for one dataset when we need it to trigger for both.
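One way to keep the de-duplication keys distinct per dataset is simply to include the dataset name in them. The helper below is hypothetical (not Dagster API or the platform's code); it just illustrates why per-dataset keys keep both downstream triggers alive.

```python
# Illustrative sketch: build one de-duplication key per dataset, so a cin
# run and an ssda903 run are never collapsed into a single run.
# `make_run_key` is a hypothetical helper invented for this example.


def make_run_key(dataset: str, clean_run_id: str) -> str:
    # Embedding the dataset name keeps keys distinct across datasets,
    # so Dagster's "one run per key" de-duplication still produces a
    # run for cin AND a run for ssda903 from the same upstream run.
    return f"move_current_la-{dataset}-{clean_run_id}"


# Two datasets triggered off the same upstream clean run -> two keys:
keys = {make_run_key(d, "abc123") for d in ("cin", "ssda903")}
```

If instead a single shared key covered both datasets, the set above would collapse to one entry and only one downstream concatenate run would fire.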