
Scope and Plan approach to continuous deletions of MCP MAXAR delivery bucket files after ingestion #367

Closed
krisstanton opened this issue Jun 26, 2024 · 2 comments
krisstanton commented Jun 26, 2024

This is a ticket to capture some of our discussions and longer-term plans for approaching continuous file deletions from the source MAXAR delivery bucket on MCP, post-ingest and post-validation.

The first meeting is set for PI Planning week, but this ticket will likely make it into Sprint 1.

Scope and Plan approach to Continuous delivery MCP file Deletes after ingestion
Topics to discuss (Draft)

  • Log sync (sending manifests from CBA PROD (NGAP) over to MCP?)
  • A DAG or other process to process the manifests or confirm granules
    • Scope of this process, i.e. are we processing the entire huge manifest every time the process runs? Is there a more efficient way to do this?
  • CMR queries to verify publication, including validating the path to the CBA PROD bucket in the CMR records (see the sketch after this list)
    • (There is some existing code that does some of this already -- see the old NGAP Deletes task)
  • Other topics?
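As a rough illustration of the CMR verification bullet above, here is a minimal sketch of checking that a granule is published and that one of its links points at the CBA PROD bucket, using the public CMR granule search endpoint. The collection concept ID and bucket name are hypothetical placeholders, and this is not the existing NGAP Deletes code.

```python
# Minimal sketch: verify a granule is published in CMR and that one of its
# links references the CBA PROD bucket. IDs and names are hypothetical.
import requests

CMR_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"
COLLECTION_CONCEPT_ID = "C0000000000-EXAMPLE"  # hypothetical
EXPECTED_BUCKET = "cba-prod-example-bucket"    # hypothetical

def granule_published_in_cmr(granule_ur: str) -> bool:
    resp = requests.get(
        CMR_URL,
        params={
            "collection_concept_id": COLLECTION_CONCEPT_ID,
            "granule_ur": granule_ur,
            "page_size": 1,
        },
        timeout=30,
    )
    resp.raise_for_status()
    entries = resp.json().get("feed", {}).get("entry", [])
    if not entries:
        return False  # not published (yet)
    # Validate that at least one link contains the CBA PROD bucket path.
    links = entries[0].get("links", [])
    return any(EXPECTED_BUCKET in link.get("href", "") for link in links)
```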

Note: Reference to the starter ticket on the Cumulus side of this work (take a look at this ticket during the meeting): #328

  • A possible starting point for this discussion: what are we doing with the manifests that come in from CBA PROD (NGAP)? Are we processing all of the manifest data each time this process runs? (A sketch of an incremental pass follows this list.)
  • Don't forget, there is a slight delay after ingestion before the items show up in ORCA.
  • It is notably easier if we have a list of expected granules generated from 'somewhere' (maybe parsing CloudWatch data before going to the manifests, or direct S3 queries?)
    • Maybe we run a slow process that attempts to sync the entire manifest once per week?
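To make the "avoid reprocessing the entire manifest" question concrete, here is a sketch of an incremental pass, assuming the manifests land in S3 under lexically ordered (e.g. date-prefixed) keys and that we persist the last processed key between runs. The bucket and prefix names are hypothetical.

```python
# Sketch: list only manifest objects newer than a stored high-water mark,
# so each run handles the delta rather than the whole manifest history.
import boto3

s3 = boto3.client("s3")
MANIFEST_BUCKET = "mcp-maxar-manifests-example"  # hypothetical
MANIFEST_PREFIX = "manifests/"                   # hypothetical

def new_manifest_keys(last_processed_key: str) -> list[str]:
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(
        Bucket=MANIFEST_BUCKET,
        Prefix=MANIFEST_PREFIX,
        StartAfter=last_processed_key,  # skip everything already handled
    ):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

The checkpoint itself (the last processed key) could live in a small DynamoDB item or S3 object; the weekly full-manifest sync mentioned above would then act as a safety net for anything the incremental pass misses.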
@krisstanton krisstanton self-assigned this Jun 26, 2024
@krisstanton krisstanton changed the title Scope and Plan approach to Continuous delivery MCP file Deletes after ingestion Scope and Plan approach to continuous deletions of MCP MAXAR delivery bucket files after ingestion Jun 26, 2024

krisstanton commented Jul 1, 2024

We had a meeting to discuss how to approach the continuous deletes.
At the end of the meeting, the current approach (from the Cumulus perspective) is a DAG running on MCP that checks for these 4 verifications (sketched below):

  • Files for a granule exist in CBA PROD
  • Files for a granule exist in CBA PROD (ORCA)
  • The granule is published in Earthdata CMR
  • External metrics data is verified
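A minimal sketch of how the file-existence gates might be checked and how all four verifications could combine, assuming the expected file keys are already known. The bucket names are hypothetical, and the CMR and metrics results are passed in as booleans since those checks live elsewhere.

```python
# Sketch of the four verification gates. Bucket names are hypothetical.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
CBA_PROD_BUCKET = "cba-prod-example-bucket"   # hypothetical
ORCA_BUCKET = "cba-prod-orca-example-bucket"  # hypothetical

def files_exist(bucket: str, keys: list[str]) -> bool:
    """True only if every expected key is present in the bucket."""
    for key in keys:
        try:
            s3.head_object(Bucket=bucket, Key=key)
        except ClientError:
            return False  # a missing or inaccessible key fails the gate
    return True

def ready_to_delete(file_keys: list[str], published_in_cmr: bool,
                    metrics_verified: bool) -> bool:
    # All four gates must pass before any deletion happens.
    return (
        files_exist(CBA_PROD_BUCKET, file_keys)  # 1. files in CBA PROD
        and files_exist(ORCA_BUCKET, file_keys)  # 2. files in ORCA
        and published_in_cmr                     # 3. published in Earthdata CMR
        and metrics_verified                     # 4. external metrics verified
    )
```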

Once the 4 verifications pass, the DAG will take 2 actions (see the sketch after this list):

  • Delete the files for the granule from the MCP MAXAR delivery bucket
  • Remove only the corresponding granule entry from the DynamoDB table where we normally insert file lists and checksum info
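A sketch of those two actions, assuming the DAG already knows the granule's file keys and that the DynamoDB table is keyed on a granule ID; the bucket and table names are hypothetical.

```python
# Sketch of the two post-verification actions. All names are hypothetical.
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
DELIVERY_BUCKET = "mcp-maxar-delivery-example"                    # hypothetical
tracking_table = dynamodb.Table("granule-file-tracking-example")  # hypothetical

def delete_granule(granule_id: str, file_keys: list[str]) -> None:
    # Action 1: delete the granule's files from the MCP MAXAR delivery
    # bucket (delete_objects accepts at most 1000 keys per request).
    s3.delete_objects(
        Bucket=DELIVERY_BUCKET,
        Delete={"Objects": [{"Key": key} for key in file_keys]},
    )
    # Action 2: remove only this granule's entry from the tracking table.
    tracking_table.delete_item(Key={"granule_id": granule_id})
```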

There will also be some involvement in assisting with the logic in the DAG that performs these verification checks.

I'm in the process of updating some of the tickets to reflect this work.
