Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cli): org:search:dump issues when dump is fetched across multiple queries #1474

Merged
merged 9 commits into from
Jun 21, 2024

Conversation

fbeaudoincoveo
Copy link
Contributor

@fbeaudoincoveo fbeaudoincoveo commented Jun 19, 2024

Proposed changes

Context:

The org:search:dump command executes queries under the hood. When we execute this command, it's possible that we hit the Search API maximum response size. The likelihood of hitting that limit is increased in the HIPAA environment, as the HIPAA maximum response size is 4x smaller than in PROD.

When the dump is executed across multiple queries, we ensure that results are fetched from the same index by passing the indexToken of the initial query to each subsequent queries. This is, by definition, incompatible with using the index cache, as stated in the indexToken parameter documentation:

The Base64 encoded identifier of the index mirror to forward the request to. See also the index parameter.
If you do not specify an indexToken (or index) value, any index mirror could be used.
Note: Passing an indexToken (or index) value has no effect when the results of a specific request can be returned from cache (see the maximumAge parameter).

Another issue is that in order to know what result to start from in each subsequent query after the first, we request results to be ordered by ascending rowid value, and we request only results whose rowid is greater than the rowid of the last result returned in the previous query. However, the rowid value is returned as a very large integer (greater than the maximum safe integer size JavaScript can handle without losing precision). Therefore, the expression rowid>ROW_ID_OF_LAST_RESULT typically behaves like rowid>=ROW_ID_OF_LAST_RESULT because JavaScript rounds the rowid during the JSON.parse conversion of the response body.

Fix:

Breaking changes

None

Testing

  • Unit Tests: Yes
  • Functionnal Tests: No, unit tests are sufficient to ensure that maximumAge is set to 0
  • Manual Tests: Tested the org:search:dump command locally before / after platform-client update against problematic HIPAA test organization. The duplicate / inconsistent dump result issue is no longer present after the fix.

@fbeaudoincoveo fbeaudoincoveo requested a review from a team as a code owner June 19, 2024 14:20
@fbeaudoincoveo fbeaudoincoveo requested review from olamothe, y-lakhdar and louis-bompart and removed request for a team June 19, 2024 14:20
Copy link
Contributor

github-actions bot commented Jun 19, 2024

Pull Request Report

PR Title

✅ Title follows the conventional commit spec.

@fbeaudoincoveo fbeaudoincoveo changed the title Fix org:search:dump issues when dump is fetched across multiple queries fix(cli): org:search:dump issues when dump is fetched across multiple queries Jun 19, 2024
@fbeaudoincoveo
Copy link
Contributor Author

I moved the bulk of the changes to the platform-client project: coveo/platform-client#834

So basically all I'm doing in this PR now is setting maximumAge: 0 on dump queries.

Copy link
Collaborator

@louis-bompart louis-bompart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, after the release of the platform-client and its update in our project ofc

Copy link
Contributor

Sorry @fbeaudoincoveo, this PR does not satisifies all conditions of mergeability:

Modify your pull request to satisfies them and check the box below to try again:

  • Merge! :shipit:

You can also reach out to a maintainer if you think your contribution should be merged regardless of the reported issues.

@fbeaudoincoveo fbeaudoincoveo merged commit 2ef81a1 into master Jun 21, 2024
57 checks passed
@fbeaudoincoveo fbeaudoincoveo deleted the KIT-3321 branch June 21, 2024 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants