This repository has been archived by the owner on Nov 1, 2023. It is now read-only.
Very often, when we do simple operations on a variable, its metadata disappears. We need to:
- Ensure the metadata is inherited properly (when possible). For example, if `tb["c"] = tb["a"] + tb["b"]`, the new variable `c` should have the union of the sources and licenses of `a` and `b`.
- Keep a log of all processing done to a variable, e.g. "variable loaded from table ...", "variable c created as the sum of variables a and b", etc.
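To make these two requirements concrete, here is a minimal sketch in plain Python. It does not use the real table or variable classes from the codebase: `Variable`, `union_preserving_order`, and the field names `sources`, `licenses`, and `processing_log` are hypothetical, and only addition is covered. It shows the result of `a + b` inheriting the union of both operands' sources and licenses and appending a log entry that describes the operation.

```python
from dataclasses import dataclass, field
from typing import List


def union_preserving_order(a: List[str], b: List[str]) -> List[str]:
    # dict.fromkeys keeps the first occurrence of each item, in insertion order.
    return list(dict.fromkeys(a + b))


@dataclass
class Variable:
    """Hypothetical stand-in for a table column that carries its own metadata."""

    name: str
    values: List[float]
    sources: List[str] = field(default_factory=list)
    licenses: List[str] = field(default_factory=list)
    processing_log: List[str] = field(default_factory=list)

    def __add__(self, other: "Variable") -> "Variable":
        if len(self.values) != len(other.values):
            raise ValueError("variables must have the same length")
        return Variable(
            # Placeholder name; the caller is expected to rename the result.
            name=f"{self.name}+{other.name}",
            values=[x + y for x, y in zip(self.values, other.values)],
            # Metadata of the result is the union of the operands' metadata.
            sources=union_preserving_order(self.sources, other.sources),
            licenses=union_preserving_order(self.licenses, other.licenses),
            # Keep the history of both operands, then record this operation.
            processing_log=self.processing_log
            + other.processing_log
            + [f"variable created as the sum of variables {self.name} and {other.name}"],
        )


# Usage (all names and metadata values are made up for illustration).
a = Variable("a", [1.0, 2.0], sources=["Source X"], licenses=["CC BY 4.0"],
             processing_log=["variable a loaded from table example_table"])
b = Variable("b", [3.0, 4.0], sources=["Source Y"], licenses=["CC BY 4.0"],
             processing_log=["variable b loaded from table example_table"])
c = a + b
c.name = "c"
```

One design choice worth discussing: concatenating the operands' logs keeps the full history, but entries are repeated when both variables share earlier steps; deduplicating them or storing the log as a tree of parent variables are possible alternatives.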
I started implementing this logic in this branch (and created a PR). But there is still more work to do to make the changes robust and to add further logic and features.
I also created an etl branch to test these changes on a simple dataset. We may decide to delete this etl branch in the future if things change significantly.
Once these features are implemented, we will need to ensure that all active ETL steps still work without any modification (and check that they don't take much longer to run). To migrate to a workflow where we handle metadata properly and keep a processing log, we could start by adding a default processing log to each variable in the ETL, with 3 entries: "variable loaded from table ...", "data processing", "variable saved to table ...". Then, whenever a step is updated, its code could be refactored to build the processing log properly.
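The migration idea above could look roughly like the following sketch. It assumes a hypothetical metadata layout (a dict mapping each variable name to its metadata dict) and a hypothetical helper `add_default_processing_log`; the table names in the usage are made up. Variables that already have a processing log are left alone, so steps that have been refactored keep their detailed log.

```python
from typing import Any, Dict


def add_default_processing_log(
    metadata: Dict[str, Dict[str, Any]], source_table: str, target_table: str
) -> Dict[str, Dict[str, Any]]:
    """Give every variable without a processing log the 3-entry default log."""
    for meta in metadata.values():
        # Variables whose step already builds a proper log are left untouched.
        if not meta.get("processing_log"):
            meta["processing_log"] = [
                f"variable loaded from table {source_table}",
                "data processing",
                f"variable saved to table {target_table}",
            ]
    return metadata


# Usage with made-up variable and table names.
metadata = {"c": {}, "d": {"processing_log": ["custom entry"]}}
add_default_processing_log(metadata, "input_table", "output_table")
```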