As Federalist grows, we have started to observe some bumps in the road with the garden build platform:
Disk space limits
We have a limited amount of disk space available during builds: currently 4GB, hopefully increasing to 6GB soon. Of this, our build container takes up about 2GB. The limit is imposed by cloud.gov and will most likely not increase again after the impending bump. To continue serving customers with larger sites, we need to reduce our container's footprint.
Intermittent mystery failures
The frequency of these failures has varied over time, and while most of them are resolved by rebuilding, we still don't have an understanding of why they occur. There are no logs and the process itself does not stop.
Reduce build times
Builds are inefficient because we start from scratch every build, including installing the specified runtime version (if not using the default) and all the project dependencies.
I think we can address these issues over the next 6 months by doing the following:
Use smaller base Docker images
We currently have a single Docker container that includes the Ruby, Node, and Python runtimes, based on the default Debian images. We should explore using "slim" or "alpine" base images to remove unnecessary bloat.
(Done in "experimental" image) Package our build code as a static executable
Currently our build containers must include the environment and dependencies for running our own Python code in addition to our customers'. Packaging and installing our build code as a single executable would mean the Python runtime and its dependencies are not required. This may be possible now, but is complicated by our use of PyInvoke, which needs to be available at runtime, defeating the purpose of the static executable.
(Done) Replace PyInvoke with Python's "subprocess" module
Python's "subprocess" module (3.5+) provides the necessary primitives for running commands while managing the environment, logging, and errors, without the extra layer of abstraction introduced by PyInvoke. We do not need the ability to run individual pieces of the build process from the command line, and requiring each command to run in a separate Python process imposes undesirable constraints on the application's architecture. I'm hopeful that removing PyInvoke will give us greater visibility into the intermittent mystery errors.
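As a rough sketch of what this looks like, here is a minimal subprocess-based step runner. The helper name and signature are illustrative, not the actual build-code API:

```python
import logging
import subprocess


def run_step(command, env=None, cwd=None):
    """Run a single build step, capturing output and raising on failure.

    Hypothetical helper: illustrates the subprocess pattern, not the
    real federalist build code. env=None inherits the current
    environment; pass a dict to control it explicitly.
    """
    logging.info("Running: %s", " ".join(command))
    result = subprocess.run(
        command,
        env=env,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text-mode pipes; works on Python 3.5+
    )
    if result.stdout:
        logging.info("%s", result.stdout)
    if result.returncode != 0:
        logging.error("%s", result.stderr)
        raise RuntimeError(
            "Build step failed with exit code %d" % result.returncode
        )
    return result.stdout
```

Because each step raises on a nonzero exit code, a failed command stops the build with its stderr logged, which is exactly the visibility we lack today.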
I am planning on starting with Step 3 and iterating from there.
Reduce publish times
Problem
Originally, to determine the minimum set of files to add, remove, or update, publishes compared the hashes of built files with those obtained from the S3 SDK's "list objects" call. However, when we added the ability for users to configure custom Cache-Control headers, this check became insufficient: the custom headers are considered metadata and are not part of the file hash. To push only the changes, we would have had to fetch each object individually to compare for differences, so instead we now push all of the files, resulting in longer build times.
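The original hash comparison works because, for single-part (non-multipart) uploads, S3 reports the object's MD5 hex digest as its ETag. A self-contained sketch of that diffing logic (function names are hypothetical; the remote listing here stands in for the SDK's list-objects response, and multipart uploads would need separate handling since their ETags are not plain MD5s):

```python
import hashlib
import os


def local_etag(path):
    """MD5 hex digest: what S3 reports as the ETag for a
    single-part upload."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)
    return md5.hexdigest()


def diff_against_remote(build_dir, remote_listing):
    """Compare built files against a {key: etag} map (as derived from
    the S3 SDK's list-objects call) and report what to push/delete."""
    local = {}
    for root, _, files in os.walk(build_dir):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, build_dir)
            local[key] = local_etag(path)
    to_push = [k for k, h in local.items() if remote_listing.get(k) != h]
    to_delete = [k for k in remote_listing if k not in local]
    return to_push, to_delete
```

The problem described above is visible here: a file whose only change is its Cache-Control metadata has an unchanged ETag, so this diff would wrongly skip it.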
Solution
apply custom Cache-Control headers in federalist-proxy
do not apply Cache-Control headers to S3 objects
revert to diffing using the hashes, or another more efficient method of diffing/publishing (e.g. aws s3 sync, Transfer, etc.)
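If the headers move into federalist-proxy, the change could look something like this nginx-style fragment. This is purely illustrative: the directive placement, bucket host variable, and max-age value are assumptions, not the actual proxy configuration:

```nginx
# Illustrative only: serve objects from S3 without per-object
# Cache-Control metadata, and let the proxy attach the header.
location / {
    proxy_pass https://$s3_bucket_host;
    add_header Cache-Control "public, max-age=60";
}
```

With the header applied at the proxy, S3 objects carry no custom metadata, so the ETag comparison is once again sufficient to decide what to publish.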
Stop creating/publishing "Redirect Objects"
Once the recent changes to the proxy (access S3 over https/no longer use website config) are deployed, the use case for "redirect objects" will be handled in the proxy and they can be safely removed from the publish process.
References
Originally captured here.
Custom 404 and index.html pages
Allow SPAs and other sites hosted on the platform to have branch-specific 404 and index pages. To be implemented via the proxy and federalist.json
References
Originally captured here.
Garden build logging
Implement a log drain for garden builds instead of writing logs to the db, e.g. fluentd, logstash, etc.
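One lightweight way to sketch this is Python's stdlib syslog handler, used here as a stand-in for a real drain such as fluentd or logstash. The logger name, host, and port are all placeholders:

```python
import logging
import logging.handlers


def make_drain_logger(host="localhost", port=514):
    """Attach a syslog-style UDP handler so build output goes to a
    log drain rather than the database.

    Hypothetical sketch: host/port are placeholders for wherever the
    real drain (fluentd, logstash, etc.) listens.
    """
    logger = logging.getLogger("garden-build")
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Build steps would then log through this logger instead of accumulating output for a database write at the end.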
References
Originally captured here.
CORS
A few customers have CORS configured on their buckets, and we should allow this for all customers. However, this becomes more complicated with the move to private buckets. We need to investigate how folks are using this, why they need to go straight to the bucket (or is this required because we just proxy?), and the impact.
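For reference, a per-bucket S3 CORS policy takes roughly this shape (the origin, methods, and max-age below are placeholders; what each customer actually needs to allow is exactly the open question):

```json
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://example.gov"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
```

If requests end up going through the proxy instead of straight to the bucket, the equivalent headers would need to be emitted there rather than via bucket configuration.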
References
Originally captured here.