Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gitFileSystem): safer api #1046

Merged
merged 5 commits into from
Dec 6, 2023

Conversation

kishore03109
Copy link
Contributor

@kishore03109 kishore03109 commented Dec 3, 2023

Problem

  1. the presence of default arguments lead to sneaky bugs introduced due to caller unknowingly calling the wrong branch.
  2. there is a scary instance whereby the lack of update to staging-lite could occur, but the update to staging does not, leading to issues downstream.

Closes [insert issue #]

Solution

  1. to keep things simple, we make all the args compulsory.
  2. make staging update first before updating staging lite.

@kishore03109 kishore03109 requested a review from a team December 3, 2023 22:54
@kishore03109 kishore03109 marked this pull request as ready for review December 3, 2023 22:57
@kishore03109 kishore03109 force-pushed the 12-04-test_githubService_add_tests branch from c8c6807 to 06a8feb Compare December 4, 2023 04:13
@kishore03109 kishore03109 force-pushed the 12-04-feat_gitFileSystem_safer_api branch from 5757b98 to 2a826b8 Compare December 4, 2023 04:13
@kishore03109 kishore03109 force-pushed the 12-04-test_githubService_add_tests branch from 06a8feb to 18d81c8 Compare December 4, 2023 04:57
@kishore03109 kishore03109 force-pushed the 12-04-feat_gitFileSystem_safer_api branch from 2a826b8 to 3050748 Compare December 4, 2023 04:57
@kishore03109 kishore03109 force-pushed the 12-04-test_githubService_add_tests branch 2 times, most recently from 9c10830 to 437d36c Compare December 6, 2023 01:19
@kishore03109 kishore03109 force-pushed the 12-04-feat_gitFileSystem_safer_api branch 3 times, most recently from ecc5966 to c2f88d4 Compare December 6, 2023 02:08
Copy link
Contributor

@seaerchin seaerchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes are fine, albeit repetitive. similar work should probably be abstracted out as a function to avoid slight behavioural drift over time that creates bugs. this is a maintainability issue but it's not too bad for now.

otoh, we have a possibility of a silent failure, where we commit to EFS but fail the push to github. this could potentially be very confusing and we could fail fast here or attempt to handle the error properly. no pref, as long as the error itself is handled.

@@ -230,7 +230,7 @@ ${syncedRepos.map((repo) => `<li>${repo}</li>`)}

doesRepoNeedClone(repoName: string): ResultAsync<true, false> {
return this.gitFileSystemService
.isGitInitialized(repoName)
.isGitInitialized(repoName, true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm assuming that this change was made because the method should only be used as part of the repair form?

this is also out of scope but omitting the argument (branchName or isStaging) and exporting this function might lead to people calling this for staging and wondering why it doesn't work.

we can either inline this directly or make it private (cos the false case can always be chained off an orElse) but ok with not doing this now because it's not a functional change and more of a good to have.

src/services/db/GitFileCommitService.ts Outdated Show resolved Hide resolved
this.STAGING_LITE_BRANCH
)
}
this.pushToGithub(sessionData, shouldUpdateStagingLite)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can fail silently - we should probably just chain off this. alternatively, what we can do is just chain off the results sequentially and throw an error at the end (the behaviour now), which makes things clearer to read.

also eliminates the issue of staging-lite updating but not staging cos if the initial call to staging fails, staging-lite won't even run

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey this was an intentional trade off we made as a team not to await for calls for they are expensive. But I do air your concerns regarding silent failures, this is the cause of the divergence that we are seeing. Dont intend to tackle it here!

For info only + outside of scope of pr: I tried adding an alarm for this, but it was TOO noisy, there is prob a way to make this such that we only throw alarms for this silent pushes, but havnt had time to invetigate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a few different ways to tackle this tbh, just spitballing ideas - not an ask to implement.

  1. separate out the push to github and let the router handle this. the router can always send response early then fail later (no cost to user) if the push to github fails.
  2. because neverthrow is safe, we could just write a wrapper for retrying Errors<T>
  3. just send an email on failure

we probably want to decide what categories of errors we care about - transient errors vs persistent errors; in terms of our usage, transient is probably outages <10 mins in length (we don't have that much pushes tbvh) and persistent >=10.

if transient, retry w/ exponential backoff will work well; if persistent, we probably want a notif + manual fix. to be honest, the divergence should arise only if we edit on github directly whilst there's a push failure so should be minimal but the problem is we got high barrier of entry to edit on efs (hence why we edit on github directly).

this.STAGING_LITE_BRANCH
)
}
this.pushToGithub(sessionData, shouldUpdateStagingLite)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same issue as above; there's also a repeated pattern - we could probably just separate out the steps as functions and chain them together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file has alot of the same pattern of do some change -> check the result of that change -> update staging lite -> update github -> throw error if any surface.

we could probably separate those out and make this whole file clearer. that being said, we don't have to do it now.

however, there is an issue where updates are allowed to silently fail at the push step. because our call returns a result, it won't throw by design and we need to check if it's recoverable or we should just fail fast (throw error).

src/services/db/GitFileSystemService.ts Outdated Show resolved Hide resolved
@kishore03109
Copy link
Contributor Author

@seaerchin

the changes are fine, albeit repetitive. similar work should probably be abstracted out as a function to avoid slight behavioural drift over time that creates bugs. this is a maintainability issue but it's not too bad for now.

I am quite curious how you would have structured this instead (but not today kek release first), maybe i book time w u after i come back to learn from you

otoh, we have a possibility of a silent failure, where we commit to EFS but fail the push to github. this could potentially be very confusing and we could fail fast here or attempt to handle the error properly. no pref, as long as the error itself is handled.

I do think this is outside of the scope of this pr, since this is parity to what currently exists in production! thoughts?

@seaerchin
Copy link
Contributor

seaerchin commented Dec 6, 2023

I am quite curious how you would have structured this instead (but not today kek release first), maybe i book time w u after i come back to learn from you

ye sure

I do think this is outside of the scope of this pr, since this is parity to what currently exists in production! thoughts?

if there's no easy win here then sure, we can just go with this

Copy link
Contributor

@seaerchin seaerchin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm if no quick way of feeding back failures to us.

do remove extra console log before merge.

Copy link
Contributor Author

kishore03109 commented Dec 6, 2023

Merge activity

@kishore03109 kishore03109 force-pushed the 12-04-test_githubService_add_tests branch from 56c6bde to 19e412d Compare December 6, 2023 10:54
@kishore03109 kishore03109 changed the base branch from 12-04-test_githubService_add_tests to develop December 6, 2023 10:55
@kishore03109 kishore03109 force-pushed the 12-04-feat_gitFileSystem_safer_api branch from 27d9fd6 to 7d9bced Compare December 6, 2023 10:56
@kishore03109 kishore03109 merged commit 22f239c into develop Dec 6, 2023
8 checks passed
@mergify mergify bot deleted the 12-04-feat_gitFileSystem_safer_api branch December 6, 2023 10:57
This was referenced Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants