Automate Geoglam & NO2 dataset ingestion #155
Comments
Putting the discovery-items config within s3://<EVENT_BUCKET>/collections/ in the following format: https://github.com/US-GHG-Center/ghgc-data/blob/add/lpdaac-dataset-scheduled-config/ingestion-data/discovery-items/scheduled/emit-ch4plume-v1-items.json will trigger the discovery and subsequent ingestion of the collection items based on the schedule attribute.
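For concreteness, here is a minimal sketch of what such a scheduled config might look like for the geoglam collection, loosely modeled on the linked emit-ch4plume-v1-items.json; the bucket, prefix, filename pattern, and cron value below are illustrative assumptions, not the actual production config:

```json
{
  "collection": "geoglam",
  "bucket": "veda-data-store-staging",
  "prefix": "geoglam/",
  "filename_regex": "^CropMonitor_(.*)\\.tif$",
  "discovery": "s3",
  "schedule": "0 0 * * 0"
}
```

A file like this placed under s3://<EVENT_BUCKET>/collections/ is what registers the schedule for the collection.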
mcp-prod will need a new release of Airflow to include automated ingestion.
Update: We have decided to run these weekly instead of bi-weekly.
I added the scheduled collection configs from veda-data #177 to mcp-test and mcp-production.
It looks like the uah-staging DAG has run for geoglam and discovered no files (expected). The DAGs for the NO2 collections are visible but have not run (I need to revisit the configs to see if this is also expected). In mcp-prod the DAGs are present but have not yet run. Question: do we expect the scheduled ingest setup for SM2A to be the same as it was for MWAA?
UPDATE: Let's keep this open until I get a chance to add the config to the SM2A /collections bucket, because we will be deprecating MWAA.
Configs are now in SM2A. We may later update the scheduled job regex to be more restrictive to address a recurring filename pattern change for the geoglam collection (#213).
Description
The NO2 (#89) and Geoglam (#167, #173) datasets require monthly ingestion as new assets are created. This is currently a manual process but should be automated.
veda-data-airflow has a feature that allows scheduled ingestion by creating dataset-specific DAGs. The file must still be transferred to the collection s3 bucket. A JSON file must be uploaded to the airflow event bucket. Here is an example JSON:
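A minimal sketch, assuming the file follows the same discovery-items format referenced in the comments above; the field names are modeled on those configs, and the bucket, prefix, filename pattern, and monthly cron value shown for no2-monthly are illustrative assumptions rather than the actual config:

```json
{
  "collection": "no2-monthly",
  "bucket": "veda-data-store-staging",
  "prefix": "no2-monthly/",
  "filename_regex": "^(.*)\\.tif$",
  "discovery": "s3",
  "schedule": "0 0 1 * *"
}
```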
Acceptance Criteria
- Transfer no2-monthly and no2-monthly-diff from the s3://covid-eo-data bucket to s3://veda-data-store-staging and s3://veda-data-store using the MWAA transfer DAG
- Scheduled ingest config for the NO2 collections (no2-monthly, no2-monthly-diff) in the mwaa event bucket for staging (UAH) and production (MCP)
- Scheduled ingest config for the geoglam collection (geoglam) in the mwaa event bucket for staging (UAH) and production (MCP)