feat(gitFileSystem): safer api #1046
the changes are fine, albeit repetitive. similar work should probably be abstracted out into a function to avoid slight behavioural drift over time that creates bugs. this is a maintainability issue but it's not too bad for now.
otoh, we have a possibility of a silent failure, where we commit to EFS but fail the push to github. this could potentially be very confusing; we could either fail fast here or attempt to handle the error properly. no pref, as long as the error itself is handled.
@@ -230,7 +230,7 @@ ${syncedRepos.map((repo) => `<li>${repo}</li>`)}

   doesRepoNeedClone(repoName: string): ResultAsync<true, false> {
     return this.gitFileSystemService
-      .isGitInitialized(repoName)
+      .isGitInitialized(repoName, true)
i'm assuming that this change was made because the method should only be used as part of the repair form?
this is also out of scope but omitting the argument (`branchName` or `isStaging`) and exporting this function might lead to people calling this for `staging` and wondering why it doesn't work.
we can either inline this directly or make it private (cos the `false` case can always be chained off an `orElse`) but ok with not doing this now because it's not a functional change and more of a good to have.
    this.STAGING_LITE_BRANCH
  )
}
this.pushToGithub(sessionData, shouldUpdateStagingLite)
this can fail silently - we should probably just chain off this. alternatively, what we can do is just chain off the results sequentially and throw an error at the end (the behaviour now), which makes things clearer to read.
also eliminates the issue of `staging-lite` updating but not `staging`, cos if the initial call to `staging` fails, `staging-lite` won't even run
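To make the "chain sequentially, throw at the end" idea concrete, here is a minimal sketch. It is not the code in this PR: `Result`, `ok`, `err`, `Step`, `runInOrder` and `pushBoth` are hypothetical names, and the tiny `Result` type is only a dependency-free stand-in for the `neverthrow` types the service actually uses.

```typescript
// Minimal stand-in for a neverthrow-style Result (assumption, not the real library).
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// A step is any async operation that reports failure as a value instead of throwing.
type Step = () => Promise<Result<void, Error>>;

// Run the steps in order and stop at the first failure,
// so a later step never runs after an earlier one has failed.
async function runInOrder(steps: Step[]): Promise<Result<void, Error>> {
  for (const step of steps) {
    const result = await step();
    if (!result.ok) return result;
  }
  return ok(undefined);
}

// Hypothetical usage: push staging first, then staging-lite,
// and surface the first failure instead of dropping it.
async function pushBoth(pushStaging: Step, pushStagingLite: Step): Promise<void> {
  const result = await runInOrder([pushStaging, pushStagingLite]);
  if (!result.ok) throw result.error;
}
```

With this shape, a failed push to `staging` short-circuits the chain, so `staging-lite` cannot end up ahead of `staging`. The cost is that the caller now waits for both pushes.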
Hey, this was an intentional trade-off we made as a team: we don't await these calls because they are expensive. But I do share your concerns regarding silent failures; this is the cause of the divergence that we are seeing. Don't intend to tackle it here!
For info only + outside of scope of pr: I tried adding an alarm for this, but it was TOO noisy. there is prob a way to make it so that we only raise alarms for these silent push failures, but haven't had time to investigate
we have a few different ways to tackle this tbh, just spitballing ideas - not an ask to implement.
- separate out the push to github and let the router handle this. the router can always send the response early then fail later (no cost to user) if the push to github fails.
- because `neverthrow` is safe, we could just write a wrapper for `retryingErrors<T>`
- just send an email on failure
we probably want to decide what categories of errors we care about - transient errors vs persistent errors; in terms of our usage, transient is probably outages <10 mins in length (we don't have that many pushes tbvh) and persistent >=10.
if transient, retry w/ exponential backoff will work well; if persistent, we probably want a notif + manual fix. to be honest, the divergence should arise only if we edit on github directly whilst there's a push failure so should be minimal, but the problem is we've got a high barrier of entry to edit on efs (hence why we edit on github directly).
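A rough sketch of the `retryingErrors<T>` wrapper idea, under stated assumptions: the `Result` type is a dependency-free stand-in for `neverthrow`, and `RetryOptions`, `isTransient`, and the injectable `sleep` are hypothetical names introduced for illustration. How to classify an error as transient is left to the caller.

```typescript
// Minimal stand-in for a neverthrow-style Result (assumption, not the real library).
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the second attempt; doubles each retry
  isTransient: (error: Error) => boolean; // hypothetical classifier supplied by the caller
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Retry an operation that returns a Result, backing off exponentially.
// Persistent (non-transient) errors are returned immediately so the caller
// can notify someone instead of retrying.
async function retryingErrors<T>(
  operation: () => Promise<Result<T, Error>>,
  options: RetryOptions
): Promise<Result<T, Error>> {
  const sleep = options.sleep ?? defaultSleep;
  let result = await operation();
  for (let attempt = 1; attempt < options.maxAttempts; attempt++) {
    if (result.ok || !options.isTransient(result.error)) return result;
    await sleep(options.baseDelayMs * 2 ** (attempt - 1)); // 1x, 2x, 4x, ...
    result = await operation();
  }
  return result; // either a late success or the last failure
}
```

Because the wrapper returns a `Result` rather than throwing, the caller still decides what a final failure means: send a notification, fail the request, or log and move on.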
    this.STAGING_LITE_BRANCH
  )
}
this.pushToGithub(sessionData, shouldUpdateStagingLite)
same issue as above; there's also a repeated pattern - we could probably just separate out the steps as functions and chain them together.
this file has a lot of the same pattern: do some change -> check the result of that change -> update staging lite -> update github -> throw error if any surface.
we could probably separate those out and make this whole file clearer. that being said, we don't have to do it now.
however, there is an issue where updates are allowed to silently fail at the push step. because our call returns a `result`, it won't throw by design and we need to check if it's recoverable or we should just fail fast (throw error).
I am quite curious how you would have structured this instead (but not today kek, release first), maybe i book time w u after i come back to learn from you
I do think this is outside of the scope of this pr, since this is parity to what currently exists in production! thoughts?
ye sure
if there's no easy win here then sure, we can just go with this
lgtm if no quick way of feeding back failures to us.
do remove extra console log before merge.
Problem
Closes [insert issue #]
Solution