fix(rollback handler): convert to ts for safety #1044
Conversation
Current dependencies on/for this PR: This stack of pull requests is managed by Graphite.
src/middleware/routeHandler.ts (Outdated)
```
    )
  )

  await backOff(() =>
```
how many times does this retry?
max default 10
1st attempt: 200ms
2nd: 400ms
3rd: 800ms
4th: 1600ms
5th: 3200ms
6th: 6400ms

and so on right? is the above backoff sequence correct?
was just thinking, in a worst-case scenario the request will be open for quite long? will this time out on the client?
would need to dive into source code here to get the exact algo (https://github.com/coveooss/exponential-backoff), but based on their readme I would assume that would be the case
wait ah this backoff occurs when the call has an error bah, so the FE would have already gotten an error code that we try to fix via exponential backoff
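For reference, a minimal sketch of how `backOff` from the exponential-backoff package can be configured; the option values and the `pushToRemote` unit of work below are illustrative, not necessarily what this handler uses:

```ts
import { backOff } from "exponential-backoff"

// Hypothetical unit of work standing in for the git push done by the handler.
const pushToRemote = async (): Promise<void> => {
  // ...
}

// Illustrative options; the library defaults are numOfAttempts: 10,
// startingDelay: 100 (ms) and timeMultiple: 2.
const backoffOptions = {
  numOfAttempts: 5, // give up after 5 failed attempts
  startingDelay: 200, // ~200ms before the first retry
  timeMultiple: 2, // then ~400ms, ~800ms, ~1600ms between retries
  retry: (err: unknown, attemptNumber: number): boolean => {
    console.error(`push attempt ${attemptNumber} failed`, err)
    return true // keep retrying until numOfAttempts is exhausted
  },
}

const pushWithRetry = (): Promise<void> =>
  backOff(() => pushToRemote(), backoffOptions)
```

Bounding `numOfAttempts` (and optionally `maxDelay`) is what keeps the worst-case request duration predictable.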
hmm this is a handler middleware right? so it will complete through the execution of all handlers before the request completes, right?
follow-up suggestion: option 1 + alarm if it fails 5 times, as failing 5 times signifies some deeper issue not resolvable with a retry
sure option 1 implemented
hmm, sure about the alarm? worried about numerous false positives tho
What I can do is add logging for this; can review later if this occurs too frequently with false positives. will add a todo regardless in Jira
okay
@kishore03109 pls test once on staging/local
```
@@ -539,6 +539,29 @@ export default class GitFileSystemService {
   )
   .orElse(() =>
     // Retry push once
     ResultAsync.fromPromise(
```
@harishv7 ps should have been clearer, specifically here, why do we have a retry here ah? + why is it on the second retry we pass in different args?
Oh this is just cos in case the first attempt fails, we have a backup to retry once more.
regarding the options, I think there may be a bug - we should be passing in the same options both for first & 2nd attempts
do you know why it (i think consistently) fails on file rename?
making all retries use the same options. I am not too sure why we need retries anyway (kept hitting it during testing), so am keeping the verbose code as-is for functionality first
> do you know why it (i think consistently) fails on file rename?
hmm, consistent failure on 1st try is not an expected behaviour
```ts
) => {
  const result = await gitFileSystemService.hasGitFileLock(repoName, true)
  if (result.isErr()) {
    next(result.error)
```
that bugfix :monkas:
this is copy-pasted code :sadge:
@kishore03109 can add a follow up minor todo - add alarm if 5 retries all fail
focused mostly on ensuring no regression; can't be fully sure but kinda sure.
we should aim to fix low-hanging fruits here + checking the scope to see that the changes made on API are safe
```ts
  repoName: string,
  next: (arg0: any) => void
) => {
  const result = await gitFileSystemService.hasGitFileLock(repoName, true)
```
i assume we add the `true` here because we wanna show the staging site right?
separately, i realised that earlier on in 1 of your PRs, we removed a default argument. the majority of call sites should then require the `isStaging` prop (probably set to `true` now); can i check if this has been done for all the call sites? (i think yes la but i lazy check)
also it's actually really difficult to tell when it's `isStaging` vs not, which opens us up to errors, but that's out of scope.
//todo to self, check call sites for hasGitFileLock
done
```ts
    return false
  }
  return true
```
style - we can just `return !!isGitLocked` but noted that this was taken as-is from existing code.
```ts
}

// Used when there are no write API calls to the repo on GitHub
export const attachReadRouteHandlerWrapper = (routeHandler: any) => async (
```
we don't necessarily have to do it here (this just maintains parity) but express does export a `RouteHandler` type, and retaining the implicit `any` is harmful to our codebase as it encourages (or at least does not prevent) ppl from then using `any` as-is or having it bleed through our code.
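As a rough sketch of the suggestion, the wrapper could lean on the `RequestHandler` type from Express's type definitions instead of `any`; the wrapper body below is illustrative, not the project's actual implementation:

```ts
import type { NextFunction, Request, RequestHandler, Response } from "express"

// Sketch: type both the wrapped handler and the returned middleware so `any`
// does not leak into call sites.
export const attachReadRouteHandlerWrapper = (
  routeHandler: RequestHandler
): RequestHandler => async (
  req: Request,
  res: Response,
  next: NextFunction
): Promise<void> => {
  try {
    await routeHandler(req, res, next)
  } catch (err) {
    next(err)
  }
}
```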
src/middleware/routeHandler.ts (Outdated)
```ts
    false
  )

  const isGitAvailable = await handleGitFileLock(siteName, next)
```
this is taken as-is but there's a difference in behaviour between this method and the earlier one. notably, earlier on, we only check `isGitAvailable` if `IS_GGS_ENABLED` - in here, we always check regardless of the flag. within the `handleGitFileLock` method itself, i think it just returns `false` if `result.isErr()`, so this should be safe, but we might want to simplify this process if `isGitAvailable` is computed using `IS_GGS_ENABLED` and `handleGitFileLock`.
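A rough sketch of the simplification being suggested, with hypothetical stand-ins for the real flag and helper (whether `false` is the right value when GGS is disabled depends on the callers):

```ts
// Hypothetical stand-ins so the sketch is self-contained.
declare const IS_GGS_ENABLED: boolean
declare function handleGitFileLock(
  siteName: string,
  next: (err: unknown) => void
): Promise<boolean>

// Only consult the git file lock when GGS is enabled, mirroring the earlier
// method's behaviour.
const checkGitAvailability = (
  siteName: string,
  next: (err: unknown) => void
): Promise<boolean> =>
  IS_GGS_ENABLED ? handleGitFileLock(siteName, next) : Promise.resolve(false)
```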
src/middleware/routeHandler.ts (Outdated)
```ts
  }

  let originalStagingCommitSha: any
  let originalStagingLiteCommitSha: any
```
i think we should set the type here (and above) - this is quite low effort as the method used to determine the sha is typed.
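For example, assuming the helper that reads the latest commit returns a string SHA, the declarations could be typed roughly like this (exact type per the actual helper):

```ts
// Sketch: concrete types instead of `any`; `undefined` covers the window
// before the SHAs have been fetched.
let originalStagingCommitSha: string | undefined
let originalStagingLiteCommitSha: string | undefined
```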
```ts
        )
        await backOff(
          () =>
            revertCommit(
              originalStagingLiteCommitSha,
              siteName,
              accessToken,
              STAGING_LITE_BRANCH
            ),
          backoffOptions
        )
      }
    } catch (retryErr) {
      await unlock(siteName)
```
tbvh this whole chunk is very jank and confusing but this PR doesn't focus on code clarity but migration from js -> ts so ok with it. we should, however, look into probably refactoring this into a sensible form.
```
@@ -136,7 +135,7 @@ class SettingsRouter {
     router.post(
       "/repo-password",
       this.authorizationMiddleware.verifyIsEmailUser,
-      attachWriteRouteHandlerWrapper(this.updateRepoPassword)
+      attachReadRouteHandlerWrapper(this.updateRepoPassword)
```
shouldn't updating a repo's password be a write op?
```
@@ -542,10 +542,33 @@ export default class GitFileSystemService {
       isForce
         ? this.git
             .cwd({ path: `${efsVolPath}/${repoName}`, root: false })
-            .push(["--force"])
+            .push([...gitOptions, "--force"])
```
why the change here?
pushes are supposed to have the required options
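A minimal sketch of the point with simple-git; the path, remote and branch values below are placeholders, not the ones used in this repo:

```ts
import simpleGit from "simple-git"

// Sketch: a force push should carry the same push arguments as a normal push,
// with "--force" appended, so both code paths target the same remote/branch.
const forcePush = async (
  efsVolPath: string,
  repoName: string
): Promise<void> => {
  const gitOptions = ["origin", "staging-lite"] // placeholder remote + branch
  await simpleGit()
    .cwd({ path: `${efsVolPath}/${repoName}`, root: false })
    .push([...gitOptions, "--force"])
}
```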
```ts
        }
      )
    )
    .orElse(() =>
```
in order for this to work, should we check that the previous failure was over the network (ie, github doesn't see the commit)? otherwise, we might have a failure from a non-github issue -> push -> dup commits or fail, isn't it?
after discussion, resolving this as an extreme edge case
src/utils/neverthrow.ts (Outdated)
```ts
 * expect a .catch() method on the returned promise. This should not be used in most
 * control flows as it removes the benefits that neverthrow provides.
 */
const convertNeverThrowToPromise = <T, E>(x: ResultAsync<T, E>): Promise<T> =>
```
does `x._unsafeUnwrap` work here?
just tried, the below snippet works!

```ts
const res = await x
return res._unsafeUnwrap()
```
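Put together, the helper could look roughly like this (a sketch based on the snippet above; the merged implementation may differ):

```ts
import { ResultAsync } from "neverthrow"

// Awaiting a ResultAsync yields a Result, whose _unsafeUnwrap() returns the
// Ok value or throws on Err, which is what lets callers use a plain .catch().
const convertNeverThrowToPromise = async <T, E>(
  x: ResultAsync<T, E>
): Promise<T> => {
  const res = await x
  return res._unsafeUnwrap()
}
```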
done 097f1f4
Now that there are commits to 2 different branches, there needs to be a rollback handler to cater for the case where a network failure affects the commit to only 1 branch. Otherwise, we might have a scary situation whereby staging-lite updates but staging doesn't, and this leads to a wrong production site being output. This effectively means that writes should go through the `writeRouteHandler`, since any write can lead to a deviation between staging and staging-lite.

Note that we don't need to do anything with regards to checking if a site is whitelisted for quickie, since all sites have the staging-lite branch infra already set up; the whitelisting only dictates whether or not the staging-lite files get updated.

Previously, we were not using the rollback properly; this PR converts the file to TS to allow for easier type checking.

There were also some weird errors that occurred that could be avoided with the retry. However, staging-lite needs to be pushed with a `.push(gitOptions)`, which the retry mechanism does not have. As a bandage, retry twice with the options; check in with @harishv7 on why there exists a retry in `GitFileSystemService` in the first place.

Manual tests for rollback handler

[Testing for quickie-whitelisted site that is GGS]
- Short-circuit the `create` function of `GitFileSystemService` by adding an early `return errAsync(new ConflictError("this is a test failure"))`, e.g.

[Testing for non-quickie-whitelisted site that is GGS]

[Testing for GH sites for rollback handler] (all GH sites are NOT whitelisted for quickie)
- `GithubService`

[Testing normal CRUD operations]

Tests

In the staging environment, `kishore-test` is quickie, `zhongjun-test-amplify` is NON-quickie, and `kishore-test-dev-gh` is on GitHub (non-quickie).
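For illustration, a minimal sketch of the short circuit used in the manual tests above; `ConflictError` and the return type are hypothetical stand-ins for the project's own types:

```ts
import { errAsync, ResultAsync } from "neverthrow"

// Hypothetical stand-in for the project's own error class.
class ConflictError extends Error {}

// Forcing the first write to fail like this exercises the rollback handler.
const create = (): ResultAsync<never, ConflictError> =>
  errAsync(new ConflictError("this is a test failure"))
```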