fix(cli): org:search:dump issues when dump is fetched across multiple queries #1474
Conversation
Pull Request Report
PR Title ✅ Title follows the conventional commit spec.
I moved the bulk of the changes to the platform-client project: coveo/platform-client#834. So basically all I'm doing in this PR now is setting `maximumAge: 0`.
LGTM, after the release of the platform-client and its update in our project ofc
Sorry @fbeaudoincoveo, this PR does not satisfy all conditions of mergeability:
Modify your pull request to satisfy them and check the box below to try again:
You can also reach out to a maintainer if you think your contribution should be merged regardless of the reported issues.
Proposed changes
Context:

The `org:search:dump` command executes queries under the hood. When we execute this command, it's possible that we hit the Search API maximum response size. The likelihood of hitting that limit is increased in the HIPAA environment, as the HIPAA maximum response size is 4x smaller than in PROD.

When the dump is fetched across multiple queries, we ensure that results are fetched from the same index by passing the `indexToken` of the initial query to each subsequent query. This is, by definition, incompatible with using the index cache, as stated in the `indexToken` parameter documentation.

Another issue: in order to know which result to start from in each query after the first, we request results ordered by ascending `rowid` value, and we request only results whose `rowid` is greater than the `rowid` of the last result returned in the previous query. However, the `rowid` value is returned as a very large integer (greater than the maximum safe integer JavaScript can handle without losing precision). The expression `rowid>ROW_ID_OF_LAST_RESULT` therefore typically behaves like `rowid>=ROW_ID_OF_LAST_RESULT`, because JavaScript rounds the `rowid` during the `JSON.parse` conversion of the response body.

Fix:
Setting `maximumAge: 0` on every query performed for fetching source dumps trivially addresses the index cache issue.

Adding a response handler to the platform client that uses the new `context` parameter of the `JSON.parse` reviver (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse#parameters) allows us to avoid losing precision when converting `rowid` values. However, this parameter is only available from Node 21, so we have to polyfill it for now. EDIT: After discussion with the DX team, this change was made directly in the platform-client project. See fix(success response handler): parse unsafe integers as strings platform-client#834.

Breaking changes
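The rounding problem and the reviver-based remedy can be illustrated with a minimal sketch. This is not the platform-client code: `parsePreservingRowids` is a hypothetical helper, and the reviver's `context` argument (with its `source` property) is only passed on Node 21+ runtimes, so the sketch feature-detects it and falls back to the (possibly rounded) number elsewhere.

```javascript
// A rowid larger than Number.MAX_SAFE_INTEGER (2^53 - 1) loses precision:
const body = '{"rowid": 9007199254740993}';
console.log(JSON.parse(body).rowid); // 9007199254740992 (rounded)

// Sketch of the remedy: keep the exact digits as a string when the number
// cannot be represented safely. `context.source` holds the raw JSON text of
// the value, and is only passed to the reviver on Node >= 21.
function parsePreservingRowids(json) {
  return JSON.parse(json, function (key, value, context) {
    if (
      typeof value === 'number' &&
      Number.isInteger(value) &&
      !Number.isSafeInteger(value) &&
      context !== undefined // reviver context requires Node >= 21
    ) {
      return context.source; // e.g. '9007199254740993', exact
    }
    return value;
  });
}
```

Safe integers are returned unchanged, so downstream code that compares or sorts ordinary `rowid` values is unaffected.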
None
Testing

Ran the `org:search:dump` command locally before / after the platform-client update against a problematic HIPAA test organization. The duplicate / inconsistent dump result issue is no longer present after the fix.
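For illustration, the multi-query dump pattern described in the Context section, with both parts of the fix applied, can be sketched as follows. The `search` function and the response/result shapes are hypothetical stand-ins, not the actual cli or platform-client API:

```javascript
// Sketch of a multi-query source dump fetch (hypothetical API).
async function fetchAllResults(search, baseQuery) {
  const all = [];
  let indexToken; // pins every query after the first to the initial index
  let lastRowid;  // rowid of the last result of the previous page

  for (;;) {
    const response = await search({
      // Each page after the first asks for strictly greater rowids. This is
      // only correct if rowid survived JSON parsing without being rounded.
      aq: lastRowid === undefined ? baseQuery : `${baseQuery} @rowid>${lastRowid}`,
      sortCriteria: '@rowid ascending',
      indexToken,    // undefined on the first query
      maximumAge: 0, // bypass the index cache, which indexToken is incompatible with
    });

    if (response.results.length === 0) {
      return all;
    }
    all.push(...response.results);
    indexToken ??= response.indexToken;
    lastRowid = response.results[response.results.length - 1].raw.rowid;
  }
}
```

With rounded `rowid` values, the `@rowid>…` filter effectively becomes `@rowid>=…`, which is what produced the duplicate / inconsistent results before the fix.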