Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rollback handler): convert to ts for safety #1044

Merged

Conversation

kishore03109
Copy link
Contributor

@kishore03109 kishore03109 commented Dec 3, 2023

Now that there exists commits to 2 different branches, there needs to be a rollback handler to cater for the case of network failures that exist for a commit to 1 branch. else, we might have a scary situation whereby staging-lite updates, but staging doesnt, and this leads to a wrongly output production site. this effectively means that there should be any use of the writeroutehandler since any write can lead to deviation of staging and staging lite

Note that we dont need to do any thing with regards checking if a site is whitelisted for quickie since all sites have the staging lite branch infra already set up. the whitelisting only dictates whether or not the staging lite file gets updated or not.

Previously, we were not using the rollback properly, this pr converts the file to ts to allow for easier type checking.
there were also some weird errors that occured that could be avoided with the retry. however, staging lite needs to be pushed with a .push(gitOptions), which the retry mechanism does not have. as a bandage, retry twice with the options, check in with @harishv7 on why there exists a retry in GitFileSystemService in the first place.

Manual tests for rollback handler

[Testing for quickie whitelisted site that is GGs]

  • use a email login site, ensure that that it is whitelisted for build time reduction in growthbook
  • create a file, ensure that no error are obsevered
  • go into the create function of GitFileSystemService and short circuit the function by adding a early // return errAsync(new ConflictError("this is a test failure")) eg.
create(
    repoName: string,
    userId: string,
    content: string,
    directoryName: string,
    fileName: string,
    encoding: "utf-8" | "base64" = "utf-8",
    branchName: string
  ): ResultAsync<
    GitCommitResult,
    ConflictError | GitFileSystemError | NotFoundError
  > {
    // short circuit for testing
    return errAsync(new ConflictError("this is a test failure"))
... // rest of code
}
  • re-run the create operation, notice that all rollbacks occur for both staging + staging-lite.

[Testing for non-quickie whitelisted site that is GGs]

  • do the same as above but not quickie should not be whitelisted

[Testing for GH sites for rollback handler] (all gh sites are NOT whitelisted for quickie)

  • throw error in the create function for GithubService
  • notice that rollback works as intended -> it goes back to the last known state sucessfully.

[Testing normal crud operations]

Tests

In staging environment,
kishore-test is quickie + zhongjun-test-amplify is NON-quickie + kishore-test-dev-gh on github (non qucikie)

  • rename of folders with folder + pages inside
  • changes made to any page should be persisted

@kishore03109 kishore03109 requested a review from a team December 3, 2023 22:52
@kishore03109 kishore03109 marked this pull request as ready for review December 3, 2023 22:53
@kishore03109 kishore03109 removed the request for review from a team December 3, 2023 23:05
@kishore03109 kishore03109 marked this pull request as draft December 3, 2023 23:05
@kishore03109 kishore03109 changed the base branch from 12-04-chore_quickie_delete_quickie_for_gh to 12-04-test_gitCommitService_add_test_cases December 4, 2023 04:13
@kishore03109 kishore03109 force-pushed the 12-04-fix_rollback_handler_convert_to_ts_for_safety branch from 518dace to a257d11 Compare December 4, 2023 04:13
@kishore03109 kishore03109 force-pushed the 12-04-test_gitCommitService_add_test_cases branch from 3d72484 to 1bce332 Compare December 4, 2023 04:58
@kishore03109 kishore03109 force-pushed the 12-04-fix_rollback_handler_convert_to_ts_for_safety branch from a257d11 to 480621f Compare December 4, 2023 04:58
@kishore03109 kishore03109 marked this pull request as ready for review December 4, 2023 17:29
@kishore03109 kishore03109 requested a review from a team December 4, 2023 17:29
@kishore03109 kishore03109 force-pushed the 12-04-test_gitCommitService_add_test_cases branch from 1bce332 to 6f5c0d1 Compare December 6, 2023 01:20
@kishore03109 kishore03109 force-pushed the 12-04-fix_rollback_handler_convert_to_ts_for_safety branch from b7cd529 to f601f8d Compare December 6, 2023 01:21
@kishore03109 kishore03109 force-pushed the 12-04-test_gitCommitService_add_test_cases branch from 6f5c0d1 to 0b1d0b9 Compare December 6, 2023 02:07
@kishore03109 kishore03109 force-pushed the 12-04-fix_rollback_handler_convert_to_ts_for_safety branch from f601f8d to 5ea51b9 Compare December 6, 2023 02:07
@kishore03109 kishore03109 force-pushed the 12-04-test_gitCommitService_add_test_cases branch from 0b1d0b9 to 79ec119 Compare December 6, 2023 02:08
@kishore03109 kishore03109 force-pushed the 12-04-fix_rollback_handler_convert_to_ts_for_safety branch from 5ea51b9 to 42c9594 Compare December 6, 2023 02:08
src/middleware/routeHandler.ts Show resolved Hide resolved
src/middleware/routeHandler.ts Outdated Show resolved Hide resolved
)
)

await backOff(() =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how many times does this retry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max default 10

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1st attempt
200ms
2nd
400ms
3rd
800ms
4th
1600ms
5th
3200ms
7th
6400ms

and so on right? is the above backoff sequence correct?

was just thinking in a worst case scenario the request will be open for quite long? will this timeout on client?

Copy link
Contributor Author

@kishore03109 kishore03109 Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would need to dive into source code here to get the exact algo (https://github.com/coveooss/exponential-backoff), but based on their readme I would assume that would be the case

wait ah this backoff occurs when the call has an error bah, so the FE would have already gotten an error code that we try to fix via exponential backoff

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this is a handler middleware right? so it will completed through the execution of all handlers before request completes right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow up suggestion, option 1 + alarm if it fails for 5 times as failing 5 times signifies some deeper issue not resolvable with retry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure option 1 implemented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm sure about alarm? worried about numberous false positives tho
What I can do is add logging for this, can review later if this occurs too freq with false positives. will add todo regardless in jira

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kishore03109 pls test once on staging/local

src/middleware/routeHandler.ts Outdated Show resolved Hide resolved
src/middleware/routeHandler.ts Outdated Show resolved Hide resolved
src/middleware/routeHandler.ts Show resolved Hide resolved
src/routes/v2/authenticatedSites/contactUs.js Show resolved Hide resolved
src/middleware/routeHandler.ts Show resolved Hide resolved
@@ -539,6 +539,29 @@ export default class GitFileSystemService {
)
.orElse(() =>
// Retry push once
ResultAsync.fromPromise(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harishv7 ps should have been clearer, specificly here, why do we have a retry here ah? + why is it on the second retry we pass in diferent args?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is just cos in case the first attempt fails, we have a backup to retry once more.

regarding the options, I think there may be a bug - we should be passing in the same options both for first & 2nd attempts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know why it (i think consistently) fails on file rename?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making all retries with same options. I am not too sure why we need retries anyways (kept hitting it during testing), so am keeping verbose code as is first for functionality first

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know why it (i think consistently) fails on file rename?

hmm, consistent failure on 1st try is not an expected behaviour

@kishore03109 kishore03109 requested a review from harishv7 December 6, 2023 05:48
@kishore03109 kishore03109 force-pushed the 12-04-fix_rollback_handler_convert_to_ts_for_safety branch from 42c9594 to d27ba8a Compare December 6, 2023 07:14
@kishore03109 kishore03109 force-pushed the 12-04-test_gitCommitService_add_test_cases branch from 33bff68 to 46c7528 Compare December 6, 2023 07:36
@kishore03109 kishore03109 force-pushed the 12-04-fix_rollback_handler_convert_to_ts_for_safety branch from d27ba8a to 6ff9598 Compare December 6, 2023 07:37
@kishore03109 kishore03109 force-pushed the 12-04-test_gitCommitService_add_test_cases branch from 46c7528 to 3e85ab2 Compare December 6, 2023 07:53
@kishore03109 kishore03109 force-pushed the 12-04-fix_rollback_handler_convert_to_ts_for_safety branch 2 times, most recently from a8abd09 to 3df3e80 Compare December 6, 2023 08:12
) => {
const result = await gitFileSystemService.hasGitFileLock(repoName, true)
if (result.isErr()) {
next(result.error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that bugfix :monkas:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is copy-pasted code :sadge:

Copy link
Contributor

@harishv7 harishv7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kishore03109 can add a follow up minor todo - add alarm if 5 retries all fail

Copy link
Contributor

@seaerchin seaerchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

focused mostly on ensuring no regression; can't be fully sure but kinda sure.

we should aim to fix low-hanging fruits here + checking the scope to see that the changes made on API are safe

repoName: string,
next: (arg0: any) => void
) => {
const result = await gitFileSystemService.hasGitFileLock(repoName, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i assume we add the true here because we wanna show the staging site right?

separately, i realised that earlier on in 1 of your PRs, we removed a default argument. the majority of callsites should then require the isStaging prop (probably set to true now); can i check if this has been done for all the call sites? (i think yes la but i lazy check)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also it's actually really difficult to tell when it's isStaging vs not, which opens us up to errors but that's out of scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//todo to self, check call sites for hasGitFileLock

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +46 to +55
return false
}
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style - we can just return !!isGitLocked but noted that this was taken as-is from existing code.

}

// Used when there are no write API calls to the repo on GitHub
export const attachReadRouteHandlerWrapper = (routeHandler: any) => async (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't necessarily have to do it here (this just maintains parity) but express does export a RouteHandler type and retaining the implicit any is harmful to our codebase as it encourages (or at least does not prevent) ppl from then using any as-is or having it bleed through our code.

false
)

const isGitAvailable = await handleGitFileLock(siteName, next)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is taken as-is but there's a difference in behaviour between this method and the earlier one.

notably, earlier on, we only check isGitAvailable is IS_GGS_ENABLED - in here, we always check regardless of the flag.

within the handleGitFileLock method itself, i think it just returns false if result.isErr() so this should be safe but we might want to simplify this process if isGitAvailable is computed using IS_GGS_ENABLED and handleGitFileLock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

let originalStagingCommitSha: any
let originalStagingLiteCommitSha: any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should set the type here (and above) - this is quite low effort as the method used to determine the sha is typed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +236 to +270
)
await backOff(
() =>
revertCommit(
originalStagingLiteCommitSha,
siteName,
accessToken,
STAGING_LITE_BRANCH
),
backoffOptions
)
}
} catch (retryErr) {
await unlock(siteName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbvh this whole chunk is very jank and confusing but this PR doesn't focus on code clarity but migration from js -> ts so ok with it. we should, however, look into probably refactoring this into a sensible form.

@@ -136,7 +135,7 @@ class SettingsRouter {
router.post(
"/repo-password",
this.authorizationMiddleware.verifyIsEmailUser,
attachWriteRouteHandlerWrapper(this.updateRepoPassword)
attachReadRouteHandlerWrapper(this.updateRepoPassword)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't updating a repo's password be a write op?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -542,10 +542,33 @@ export default class GitFileSystemService {
isForce
? this.git
.cwd({ path: `${efsVolPath}/${repoName}`, root: false })
.push(["--force"])
.push([...gitOptions, "--force"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the change here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushes are supposed to have the required options

}
)
)
.orElse(() =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in order for this to work, should we check that the previous failure was over the network (ie, github doesn't see the commit)? otherwise, we might have a failure from a non-github issue -> push -> dup commits or fail, isn't it?

Copy link
Contributor Author

@kishore03109 kishore03109 Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after disc resolving this as an extreme edge case

* expect a .catch() method on the returned promise. This should not be used in most
* control flows as it removes the benefits that neverthrow provides.
*/
const convertNeverThrowToPromise = <T, E>(x: ResultAsync<T, E>): Promise<T> =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does x._unsafeUnwrap work here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just tried, the below snippet works!

  const res = await x
  return res._unsafeUnwrap()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 097f1f4

Copy link
Contributor Author

kishore03109 commented Dec 6, 2023

Merge activity

  • Dec 6, 5:51 AM: @kishore03109 started a stack merge that includes this pull request via Graphite.
  • Dec 6, 6:01 AM: Graphite rebased this pull request as part of a merge.

@kishore03109 kishore03109 force-pushed the 12-04-test_gitCommitService_add_test_cases branch from d2ab13c to b995436 Compare December 6, 2023 10:58
@kishore03109 kishore03109 changed the base branch from 12-04-test_gitCommitService_add_test_cases to develop December 6, 2023 11:00
@kishore03109 kishore03109 force-pushed the 12-04-fix_rollback_handler_convert_to_ts_for_safety branch from 79cdf6a to b96e0bc Compare December 6, 2023 11:01
@kishore03109 kishore03109 merged commit 00c532e into develop Dec 6, 2023
8 checks passed
@mergify mergify bot deleted the 12-04-fix_rollback_handler_convert_to_ts_for_safety branch December 6, 2023 11:03
This was referenced Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants