Use distributed storage instead of container storage: S3 #892

Closed
spwoodcock opened this issue Oct 10, 2023 · 1 comment

Comments

@spwoodcock
Member

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

  • Distributed storage via S3.
  • I propose adding Garage, a lightweight, self-hostable S3-compatible object store, to our software stack.
    • For development and self-hosted installs of FMTM we would use Garage.
    • For production deploys on cloud infrastructure, we would use managed S3 buckets (AWS).
    • This needs to be configurable.
  • Then whenever we process a file, it should be uploaded to S3 for later retrieval (by the frontend, another container, etc).
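One way the "configurable" requirement above could be met is to drive the storage endpoint from environment variables, so the same code talks to a self-hosted store in development and to AWS S3 in production. The sketch below is only an illustration: the `S3Settings` class, the variable names (`S3_ENDPOINT`, `S3_ACCESS_KEY`, `S3_SECRET_KEY`), the default endpoint, and the example hostnames and port are all assumptions, not FMTM's actual configuration.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class S3Settings:
    """Hypothetical settings object; field names are illustrative only."""

    endpoint: str  # full URL of a self-hosted store or of AWS S3
    access_key: str
    secret_key: str

    @property
    def secure(self) -> bool:
        # Use TLS only when the endpoint is given as https://
        return urlparse(self.endpoint).scheme == "https"

    @property
    def host(self) -> str:
        # host[:port] part of the URL, which many S3 clients expect
        return urlparse(self.endpoint).netloc


def load_s3_settings(env=None) -> S3Settings:
    """Build settings from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    return S3Settings(
        endpoint=env.get("S3_ENDPOINT", "http://s3:9000"),
        access_key=env.get("S3_ACCESS_KEY", ""),
        secret_key=env.get("S3_SECRET_KEY", ""),
    )


# Example values only: a local container in development, AWS in production.
dev = load_s3_settings({"S3_ENDPOINT": "http://garage:3900"})
prod = load_s3_settings({"S3_ENDPOINT": "https://s3.amazonaws.com"})
print(dev.host, dev.secure)
print(prod.host, prod.secure)
```

Keeping the endpoint as a single URL means switching between a local container and a managed bucket is a deployment-time change rather than a code change.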

Additional context

Doing this would make it much easier to deploy with something like Kubernetes.

@spwoodcock spwoodcock added enhancement New feature or request devops Related to deployment or configuration labels Oct 10, 2023
@spwoodcock spwoodcock self-assigned this Oct 10, 2023
@spwoodcock
Member Author

#908 adds MinIO to the compose stack, plus some wrapper functions for accessing buckets.

Two buckets have been added by default:

  • Basemaps: to store pmtiles, mbtiles, sqlite basemaps.
  • Overlays: to store overlay vector data (outline flatgeobufs).

To use S3:

from app.config import settings
from app.s3 import (
    add_file_to_bucket,
    add_obj_to_bucket,
    get_file_from_bucket,
    get_obj_from_bucket,
)

# Available buckets
basemap_bucket = settings.S3_BUCKET_NAME_BASEMAPS
overlay_bucket = settings.S3_BUCKET_NAME_OVERLAYS

# In the calls below, bucket_name is one of the buckets above,
# file_path is a path on the local filesystem, and s3_path is the
# path of the object inside the bucket.

# Upload from filesystem
add_file_to_bucket(bucket_name, file_path, s3_path)

# Upload from python obj (BytesIO)
add_obj_to_bucket(bucket_name, file_obj, s3_path)

# Download to file
get_file_from_bucket(bucket_name, s3_path, file_path)

# Download to python obj (returns BytesIO object)
variable = get_obj_from_bucket(bucket_name, s3_path)
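
For readers curious how wrappers like these might work, here is a rough sketch of the two in-memory helpers. It is not the code from #908: the names `upload_obj` and `download_obj` are hypothetical, and the client is passed in explicitly so the example runs without a server. It assumes a client exposing `put_object(bucket_name, object_name, data, length)` and `get_object(bucket_name, object_name)` in the style of the MinIO Python SDK; `FakeClient` is a stand-in written for this example only.

```python
from io import BytesIO


def upload_obj(client, bucket_name: str, file_obj: BytesIO, s3_path: str) -> None:
    """Upload an in-memory object to a bucket (hypothetical helper)."""
    s3_path = s3_path.lstrip("/")  # object keys are not rooted at "/"
    file_obj.seek(0)  # make sure the whole buffer is sent
    length = file_obj.getbuffer().nbytes
    client.put_object(bucket_name, s3_path, file_obj, length)


def download_obj(client, bucket_name: str, s3_path: str) -> BytesIO:
    """Download an object into memory and return it as BytesIO."""
    s3_path = s3_path.lstrip("/")
    response = client.get_object(bucket_name, s3_path)
    try:
        return BytesIO(response.read())
    finally:
        # Hand the HTTP connection back once the body has been read
        response.close()
        response.release_conn()


class FakeResponse:
    """Stand-in for the HTTP response object a real client would return."""

    def __init__(self, data: bytes):
        self._data = data

    def read(self) -> bytes:
        return self._data

    def close(self) -> None:
        pass

    def release_conn(self) -> None:
        pass


class FakeClient:
    """In-memory stand-in for an S3 client, for demonstration only."""

    def __init__(self):
        self.store = {}

    def put_object(self, bucket_name, object_name, data, length):
        self.store[(bucket_name, object_name)] = data.read(length)

    def get_object(self, bucket_name, object_name):
        return FakeResponse(self.store[(bucket_name, object_name)])


client = FakeClient()
upload_obj(client, "basemaps", BytesIO(b"tile data"), "/project/1/basemap.pmtiles")
result = download_obj(client, "basemaps", "project/1/basemap.pmtiles")
print(result.read())
```

Passing the client as an argument also makes the helpers easy to unit test, which is the main reason for that design choice here.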

I will test this further and refactor a bit so we use the S3 buckets to store files.
@nrjadkry going forward we should:

  • Use in-memory BytesIO objects whenever we would use /tmp files.
  • For files that need to persist (e.g. basemaps), use S3 files as described above.
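
As a small illustration of the first point, a file that would otherwise be written under /tmp can be built entirely in memory. The sketch below creates a zip archive in a BytesIO buffer using only the standard library; the `build_archive` helper and the file names are made up for the example.

```python
import json
import zipfile
from io import BytesIO


def build_archive(files: dict) -> BytesIO:
    """Zip a mapping of {filename: bytes} in memory, with no temp files."""
    buffer = BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


# Hypothetical content: an empty GeoJSON outline and a short note
outline = json.dumps({"type": "FeatureCollection", "features": []}).encode()
archive_obj = build_archive({"outline.geojson": outline, "README.txt": b"demo"})

# The buffer behaves like an open binary file
with zipfile.ZipFile(archive_obj) as archive:
    print(archive.namelist())
```

A buffer like `archive_obj` is the kind of object that `add_obj_to_bucket` is described as accepting, so nothing has to touch the container filesystem when the file needs to persist.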

We can discuss this on our next FMTM call 😄

@github-project-automation github-project-automation bot moved this from In Progress to Deployed in Field Mapping Tasking Manager (FMTM) Oct 19, 2023
@spwoodcock spwoodcock moved this from Deployed to In Review in Field Mapping Tasking Manager (FMTM) Oct 19, 2023
@spwoodcock spwoodcock moved this from In Review to QA Ready in Field Mapping Tasking Manager (FMTM) Oct 19, 2023
@susmina94 susmina94 moved this from QA Ready to Deployed in Field Mapping Tasking Manager (FMTM) Nov 27, 2023