Send search results through the results cache in the CLP package #223

Merged
merged 31 commits into from
Jan 18, 2024

Conversation

wraymo
Contributor

@wraymo wraymo commented Jan 10, 2024

References

#212

Description

This PR adds support for sending query results to the results cache for clp.

Validation performed

  • Built the package using Task on ubuntu-focal.
  • Used the package to compress the hive-24hrs dataset.
  • Used the package to run some queries. The query results were stored in MongoDB and printed to the console.

@wraymo wraymo marked this pull request as draft January 10, 2024 20:54
@wraymo wraymo force-pushed the send_to_results_cache branch from 2ec65b5 to 7b93016 Compare January 11, 2024 17:24
@wraymo wraymo marked this pull request as ready for review January 11, 2024 20:52
@wraymo wraymo requested a review from kirkrodrigues January 11, 2024 20:53
@wraymo wraymo changed the title WIP: Add supports for sending query results to results cache Add supports for sending query results to results cache Jan 11, 2024
@kirkrodrigues
Member

Can you investigate the CentOS build failure?

@wraymo
Contributor Author

wraymo commented Jan 11, 2024

> Can you investigate the CentOS build failure?

There are two issues with CentOS:

  • dpkg is not a valid command on CentOS; we should probably use rpm instead.
  • This PR didn't rebuild the execution image, and our last build of that image was not successful (see the raw logs in the screenshot). It failed because checkinstall was not installed on CentOS and mongoc.sh uses set -u: install_command_prefix_args was left empty, which triggered the unbound variable error. mongocxx depends on mongoc, so it failed as well.
Screenshot 2024-01-11 at 18 18 56
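
For readers unfamiliar with this failure mode, here is a minimal sketch of how an empty array can trip set -u. It assumes install_command_prefix_args is a bash array and that the shell is older than bash 4.4, where expanding an empty array under set -u is reported as an unbound variable. The count_args helper and the placeholder arguments are hypothetical and are not taken from mongoc.sh.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical stand-in for the real install step: prints how many
# arguments it received.
count_args() {
    echo "$#"
}

# Assumption: the array stays empty when no prefix command is needed
# or available.
install_command_prefix_args=()

# On bash older than 4.4, a plain "${install_command_prefix_args[@]}"
# would abort here with "unbound variable" because the array is empty
# and set -u is active. The "+" form below expands to nothing when the
# array has no elements, so it is safe on old and new bash alike.
prefix_arg_count=$(count_args ${install_command_prefix_args[@]+"${install_command_prefix_args[@]}"})

# Placeholder arguments, for illustration only.
install_command_prefix_args+=("some-wrapper" "--some-flag")
populated_arg_count=$(count_args ${install_command_prefix_args[@]+"${install_command_prefix_args[@]}"})

echo "empty: ${prefix_arg_count}, populated: ${populated_arg_count}"
```

The guarded expansion keeps set -u enabled for the rest of the script, which may be preferable to dropping the option entirely. Whether this is the right fix for mongoc.sh depends on how that script builds and uses the array.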

@kirkrodrigues
Copy link
Member

Thanks. Should be addressed in #229.

@wraymo wraymo force-pushed the send_to_results_cache branch from b465d8e to 006168e Compare January 12, 2024 15:33
@wraymo wraymo changed the title Add supports for sending query results to results cache Add supports for sending query results to results cache for clp Jan 12, 2024
@wraymo
Contributor Author

wraymo commented Jan 16, 2024

> Thanks. Should be addressed in #229.

This PR is ready for review

components/core/src/clp/clo/MongoDBClient.hpp (outdated, resolved)
components/core/src/clp/clo/MongoDBClient.hpp (outdated, resolved)
components/core/src/clp/clo/MongoDBClient.hpp (outdated, resolved)
class MongoDBClient {
public:
    // Constructors
    MongoDBClient(
Member

Add docstring.

components/core/src/clp/clo/MongoDBClient.hpp (outdated, resolved)
components/clp-py-utils/clp_py_utils/clp_config.py (outdated, resolved)
components/core/src/clp/clo/clo.cpp (outdated, resolved)
components/core/src/clp/clo/clo.cpp (outdated, resolved)
components/core/src/clp/clo/clo.cpp (outdated, resolved)
components/core/src/clp/clo/clo.cpp (outdated, resolved)
components/core/src/clp/clo/CommandLineArguments.cpp (outdated, resolved)
components/core/src/clp/clo/ResultsCacheClient.hpp (outdated, resolved)
components/core/src/clp/clo/ResultsCacheClient.hpp (outdated, resolved)
components/core/src/clp/clo/ResultsCacheClient.cpp (outdated, resolved)
Member

@kirkrodrigues kirkrodrigues left a comment

For the commit message, how about:

Add support for sending query results to the results cache for clp. (#223)?

@kirkrodrigues
Member

Actually, that's probably confusing since the clp binary doesn't do search. How about:

Send search results through the results cache in the CLP package. (#223)?

@wraymo
Contributor Author

wraymo commented Jan 18, 2024

> Actually, that's probably confusing since the clp binary doesn't do search. How about:
>
> Send search results through the results cache in the CLP package. (#223)?

Looks good to me.

@wraymo wraymo merged commit 120692d into y-scope:main Jan 18, 2024
5 checks passed
@wraymo wraymo changed the title Add supports for sending query results to results cache for clp Send search results through the results cache in the CLP package Jan 18, 2024