Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add configuration for deploying a cloudfront distribution for the veda-backend #229

Merged
merged 10 commits into from
Oct 12, 2023

Conversation

smohiudd
Copy link
Contributor

@smohiudd smohiudd commented Sep 25, 2023

Include Cloudfront Distribution construct in veda-back which should help with the STAC Browser construct feature.

One way to address the API Gateway url not resolving was to use the restapi_id in the url pattern: https://{restapi_id}.execute-api.{region}.amazonaws.com/ to access both the stac and raster apis.

This branch has been test deployed as the path-prefix-veda-backend stack using the delta-backend.xyz domain.

@smohiudd smohiudd marked this pull request as ready for review September 25, 2023 16:09
@smohiudd smohiudd changed the title Include Cloudfront Distribution Construct feat: Add configuration for deploying a cloudfront distribution for the veda-backend Sep 25, 2023
@smohiudd smohiudd changed the title feat: Add configuration for deploying a cloudfront distribution for the veda-backend feat: add configuration for deploying a cloudfront distribution for the veda-backend Sep 25, 2023
Copy link

@emileten emileten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, do have an important broader question regarding our usage of CloudFront though

bucket_name=veda_route_settings.stac_browser_bucket,
)

no_cache_policy = cf.CachePolicy(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the CachingDisabled managed policy instead ? Looks pretty similar ?

Longer note :

So we're turning off any caching Cloud Front is doing with that policy. We've actually been doing that on MAAP as well (albeit manually on the console, so I don't have any code to link), after noticing that STAC metadata wasn't getting updated frequently enough on the STAC browser.

However, I'm still puzzled that I am doing this because, isn't the point of CloudFront to cache the content in the closest edge point to the user and decrease latency thanks to that ? If we turn that off entirely, why are we using CloudFront ?

In the first place I remember that we started to use CloudFront because that's what the AWS docs tell us to do for serving static content stored in S3 over HTTPs. However, is the CloudFront layer really needed for this, can't we just create a route53 record that directly points to the bucket ?

Note : those are mainly questions, I am not suggesting we stop using CloudFront

else None
)

self.distribution = cf.Distribution(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right that with this set of behaviors, the mapping is going to be, assuming the domain is dev-stac.delta-backend.com :

  • https://dev-stac.delta-backend.com -> browser
  • https://dev-stac.delta-backend.com/api/stac STAC API
  • https://dev-stac.delta-backend.com/api/raster raster API

veda_stack,
"routes",
raster_api_id=raster_api.raster_api.api_id,
stac_api_id=stac_api.stac_api.api_id,
Copy link

@emileten emileten Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to address the API Gateway url not resolving was to use the restapi_id in the url pattern: https://{restapi_id}.execute-api.{region}.amazonaws.com/ to access both the stac and raster apis.

😱 very nice trick. Spent so much time finding a workaround for this.

Copy link
Collaborator

@ividito ividito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to spend a bit more time looking at this today (particularly looking at how the ingest API is working with the CF distribution in this stack). The failing action will pass if the environment secret is updated with the ingest_url and stac_browser_bucket values.

@smohiudd
Copy link
Contributor Author

@ividito Also the routes construct isn't configured as as optional (similar to how the domain construct works). So the ingest_url and stac_browser_bucket configs need to be optional.

Copy link
Collaborator

@anayeaye anayeaye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a couple comments on the config

description="Certificate’s ARN",
)

using_mcp_acct: Optional[bool] = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using_mcp_acct is not used? Also, not sure about the intent but possibly we could be inferring this info from whether or not VEDA_PERMISSIONS_BOUNDARY_POLICY_NAME is configured?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was an old config when I copied over Slesa's original cloudfront code. It's not used anywhere, can be removed.

@@ -0,0 +1,52 @@
"""Settings for Cloudfront distribution - any environment variables starting with
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a line to the readme advanced config about this? (sorry if this is a repeat comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@slesaad
Copy link
Member

slesaad commented Oct 2, 2023

One way to address the API Gateway url not resolving was to use the restapi_id in the url pattern: https://{restapi_id}.execute-api.{region}.amazonaws.com/ to access both the stac and raster apis.

I'm kinda surprised that this is the change that made it work. We have the same setup in ghgc, but it still doesn't resolve the urls 🤔

image
http://staging.ghg.center/api/stac :(

Comment on lines 39 to 45
no_cache_policy = cf.CachePolicy(
self,
"no-cache-policy",
default_ttl=Duration.seconds(0),
min_ttl=Duration.seconds(0),
max_ttl=Duration.seconds(0),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's already no cache policy defined in cdk,

cf.CachePolicy.CACHING_DISABLED

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@smohiudd
Copy link
Contributor Author

smohiudd commented Oct 2, 2023

@slesaad I did some experimentation last week and realized that for the cloudfront stac api to work you need the trailing slash... http://staging.ghg.center/api/stac/

Without the trailing slash you get 307 temporary redirect the goes to api gateway url

@slesaad
Copy link
Member

slesaad commented Oct 3, 2023

@smohiudd yeah, but the URLs in the catalogs are still the same api gateway urls

@smohiudd
Copy link
Contributor Author

smohiudd commented Oct 5, 2023

@slesaad the issue with that is cloudfront is not able to forward the host header to api gateway. I tried another way which seems to work using a load balancer in this PR: #232

pydocstyle fixes

format

isort changes

optional config

format

config changes

format

config changes

construct change
@smohiudd smohiudd force-pushed the feature/cloudfront-stac-browser branch from ead8e14 to dccaf49 Compare October 5, 2023 19:01
@smohiudd
Copy link
Contributor Author

smohiudd commented Oct 5, 2023

Thanks @vincentsarago! I got the host header working in APIgateway using the parameter mapping. Doesn't look like we need to go down the ALB route.

description="Boolean if Cloudfront Distribution should be deployed",
)

# STAC S#3 browser bucket name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. this is the name of an existing bucket?
  2. why is the default an empty string instead of None (same for ingest url below)
  3. should we add addition info in description to explain that an existing s3 bucket is required if using cloudfront

also: typo in the comment S#3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes, for now it's the name of an existing bucket. But that may change depending on how the stac browser construct @emileten is working on is set up
  2. Yes you're right they should None
  3. Agreed

class Config:
"""model config"""

env_file = ".env"
env_prefix = "VEDA_STAC_"
env_prefix = "VEDA_"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a breaking change because it causes existing implementations to update the secrets stored in the secrets manager? I think we should address changing <SCOPE>_VARNAME across all constructs if we want to break the pattern (especially because it could be useful to share a variable that should be the same in the scopes of different constructs).

Is the change needed now, though, or can we keep scoping our environment variables?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like staging has no VEDA_STAC_ vars in the secrets store so this comment can be downgraded to a nit

if veda_stac_settings.domain_hosted_zone_name
else None,
)
),
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is awesome + I wonder if the prefix can be managed here also (in the future)?

Question: I need to try this out but to me it seems that if domain_hosted_zone_name is None, then "host" will be overwritten with None? Or does it just end up being a no-op?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to make it like a no-op if hosted zone wasn't present. I can double check if that's what it's doing

@vincentsarago
Copy link
Contributor

@smohiudd Glad it's working 🙏 , sorry for not seeing this issue earlier 😭

)

# Certificate must be in zone us-east-1
domain_cert = (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the certificate used in cloudfront must be in the us-east-1 region. Right now the arn is being passed manually. Should we be creating a new certificate in us-east-1 instead.

Comment on lines 79 to 90
construct_id,
handler=lambda_function,
parameter_mapping=(
aws_apigatewayv2_alpha.ParameterMapping().overwrite_header(
"host",
aws_apigatewayv2_alpha.MappingValue(
veda_stac_settings.domain_hosted_zone_name
)
if veda_stac_settings.domain_hosted_zone_name
else None,
)
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smohiudd I realized we'd want to do this to the raster API too, because otherwise when we use /mosaic/register, it gives back the API gateway URL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @slesaad have you tested that on GHGC yet?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! - https://github.com/US-GHG-Center/ghgc-backend/pull/46/files
deployed to dev.ghg.center/api/raster

handler=veda_raster_function,
parameter_mapping=(
aws_apigatewayv2_alpha.ParameterMapping().overwrite_header(
"host",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least temporarily, I think we need to make this assignment conditional based on whether or not cloudfront is used. It appears that the staging APIs will also be overwritten. Same for the stac api

[~] AWS::ApiGatewayV2::Integration raster-api/veda-backend-uah-staging-raster-api/DefaultRoute/raster-api rasterapivedabackenduahstagingrasterapiDefaultRouterasterapi20F21B19 
 └─ [+] RequestParameters
     └─ {"overwrite:header.host":"delta-backend.com"}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a conditional for the parameter mapping

@smohiudd smohiudd merged commit 8965651 into develop Oct 12, 2023
@vincentsarago
Copy link
Contributor

@smohiudd @anayeaye developmentseed/titiler#716 might be of interest. Basically when using an API Gateway prefix, it seems that you'll need to set a root_path in mangum but also in the fastAPI application as well https://github.com/developmentseed/titiler/blob/main/src/titiler/application/titiler/application/main.py#L65

anayeaye added a commit that referenced this pull request Nov 9, 2023
# What
- When a custom host is configured, overwrite the host in the api
integration (bring back the parameter mapping method introduced in #229)
- Include origin_path in cloudfront ingest api behavior. Note: this path
will not work until the root_path is changed to `api/ingest` in the
ingest api
- Domain construct now only generates custom subdomains for endpoints
when configured (existing stacks must provide new configuration
variable). _Note: indentation changes that wrap all subdomain constructs
make the change appear larger than they really are. The change removes
an unused shared subdomain construct and only generates api-specific
subdomains when create subdomains configuration is true._

## Environment variable changes
### New
`VEDA_CUSTOM_DOMAIN` optional configuration to provide a complete url
for host overrides in lambda APIs (i.e. `dev.somedomain.com`)
### Removed
`VEDA_DOMAIN_SHARED_SUBDOMAIN` is no longer used (logic is managed with
custom subdomains config below)
### New+Breaking
`VEDA_DOMAIN_CREATE_CUSTOM_SUBDOMAINS` boolean defaults false. Existing
deployments with custom subdomains for each raster and stac api
endpoints need to set this variable to TRUE.
anayeaye added a commit that referenced this pull request Nov 13, 2023
…cloudfront (#244)

# What
- adds optional configuration to deploy all endpoints with a custom root
path (i.e. `/api/stac` and `/api/raster`) #229, #241
- adds support for using a cloudfront as a reverse proxy #229, #241,
#245
- titiler-pgstac upgraded from 0.2.3 to 0.8.0 #239
- titiler upgrade for custom colormap configuration #243

# How tested
A temporary test stack was deployed to confirm that raster-api works as
expected and that the two-subdomain staging-stac and staging-raster
pattern is still supported. The [pre-deploy diff action for this
pr](https://github.com/NASA-IMPACT/veda-backend/actions/runs/6791564314/job/18463354484#step:11:919)
confirms that the raster API lambda will be upgraded and no domain name
changes will be caused.
@anayeaye anayeaye deleted the feature/cloudfront-stac-browser branch March 12, 2024 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants