-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(core-clp): Add LZMA Compressor
implementation and LZMA dependency.
#614
Conversation
WalkthroughThis pull request introduces comprehensive support for LZMA compression in the CLP project. The changes span multiple files across the project, including configuration files, CMake scripts, source code, and installation scripts. The implementation adds LZMA as a new compression option, with modifications to the build system, compression interfaces, constants, and test suites. The changes also update library installation scripts for various operating systems to ensure proper setup and dependency management. Changes
Possibly Related PRs
Suggested Reviewers
Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
I guess two comments:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 14
🧹 Outside diff range and nitpick comments (18)
components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh (1)
23-23
: Consider pinning the package version for reproducibility.To ensure consistent builds across different environments and times, consider pinning the package version.
- liblzma-dev \ + liblzma-dev=5.2.5-2ubuntu1 \components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh (2)
Line range hint
1-20
: Consider adding version compatibility commentsThe script structure is good, but it would be helpful to document any version interdependencies between the libraries.
Add a comment block at the top of the file:
#!/usr/bin/env bash +# Library versions and compatibility notes: +# - liblzma 5.4.6: Compatible with libarchive 3.5.1 + # Exit on any error set -e
14-14
: Consider error handling for LZMA installationWhile the script has basic error handling with
set -e
, consider adding specific error handling for the LZMA installation.-"$lib_install_scripts_dir"/liblzma.sh 5.4.6 +if ! "$lib_install_scripts_dir"/liblzma.sh 5.4.6; then + echo "Error: LZMA installation failed. Please check system requirements." + exit 1 +ficomponents/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh (1)
17-17
: Consider installation order implicationsThe LZMA library is installed after libarchive. If libarchive is built with LZMA support, it might need to be rebuilt after LZMA installation.
Consider either:
- Moving the LZMA installation before libarchive
- Adding a rebuild step for libarchive after LZMA installation if necessary
components/core/tools/scripts/lib_install/liblzma.sh (1)
1-66
: Consider adding logging, dry-run mode, and uninstall capabilityTo improve the script's maintainability and user experience, consider these architectural improvements:
- Add logging functionality with different verbosity levels
- Implement a dry-run mode for testing
- Add uninstall capability
- Consider adding a progress indicator for long-running operations
These improvements would make the script more robust and user-friendly for both development and production use.
Would you like me to provide an implementation for any of these features?
components/core/tests/test-StreamingCompression.cpp (2)
60-64
: Consider enhancing test coverage for LZMA-specific scenarios.While the basic compression/decompression test is good, consider adding LZMA-specific test cases to validate:
- Different compression levels
- Dictionary size variations
- Error handling for corrupted LZMA streams
Would you like me to provide example test cases for these scenarios?
Line range hint
1-113
: Consider refactoring to use parameterized tests.The test structure is duplicated across compression types. Consider using Catch2's test case templates to reduce duplication and make it easier to add new compression methods.
Here's a suggested approach:
TEMPLATE_TEST_CASE("StreamingCompression", "[StreamingCompression]", std::pair<clp::streaming_compression::zstd::Compressor, clp::streaming_compression::zstd::Decompressor>, std::pair<clp::streaming_compression::passthrough::Compressor, clp::streaming_compression::passthrough::Decompressor>, std::pair<clp::streaming_compression::lzma::Compressor, clp::streaming_compression::lzma::Decompressor> ) { auto compressor = std::make_unique<typename TestType::first_type>(); auto decompressor = std::make_unique<typename TestType::second_type>(); // ... rest of the test ... }components/core/CMakeLists.txt (4)
21-23
: Standardize the status message formatThe status message format differs from other compressors. Consider using a consistent format:
- message(STATUS "Using Lempel–Ziv–Markov chain Algorithm compression") + message(STATUS "Using LZMA compression")
241-241
: Use STATUS for LZMA include directory messageThe include directory message should use the STATUS level for consistency with other similar messages:
-message("LZMA Include Dir: ${LIBLZMA_INCLUDE_DIRS}") +message(STATUS "LZMA Include Dir: ${LIBLZMA_INCLUDE_DIRS}")
231-232
: Address the static/shared library support TODOThe TODO comment indicates missing functionality for handling static vs. shared libraries.
Would you like me to help create a CMake module to handle static/shared library selection for LZMA? This would align with the project's existing pattern for other libraries.
573-573
: Consider grouping compression-related libraries togetherFor better organization, consider grouping LZMA with other compression-related libraries:
- ${LIBLZMA_LIBRARIES} - ZStd::ZStd + ZStd::ZStd + ${LIBLZMA_LIBRARIES}components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1)
120-120
: Use type aliasLzmaAction
for consistency.In the
compress
method, consider using the type aliasLzmaAction
instead oflzma_action
to maintain consistency with the defined type aliases and improve readability.Apply this diff for consistency:
-auto compress(lzma_action action) -> void; +auto compress(LzmaAction action) -> void;components/core/src/clp/streaming_compression/lzma/Decompressor.hpp (3)
9-9
: Remove unnecessary inclusion of<zlib.h>
.The
<zlib.h>
header included at line 9 may be unnecessary since this decompressor uses LZMA compression. Eliminating unused headers can reduce dependencies and improve compile times.Apply this diff to remove the unnecessary include:
-#include <zlib.h>
61-62
: Complete or remove the incomplete Doxygen comment.There is an incomplete Doxygen comment starting at line 61 without any description. To maintain clear documentation, please complete the comment or remove it if it's not needed.
Apply this diff to complete the comment:
/** + * Reads an exact number of bytes from the decompressor into the buffer. + * @param buf The buffer to read data into. + * @param num_bytes_to_read The exact number of bytes to read. + * @param num_bytes_read The actual number of bytes read. */ void exact_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read);
89-90
: Correct the reference to the appropriate method in the comment.The comment at line 89 incorrectly references
streaming_compression::zstd::Decompressor::try_seek_from_begin
, which is not relevant to the LZMA decompressor. It should refer to the appropriate LZMA method or be updated to accurately reflect the functionality.Apply this diff to correct the comment:
- * @return Same as streaming_compression::zstd::Decompressor::try_seek_from_begin + * @return Same as Decompressor::try_seek_from_begincomponents/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
86-86
: UseLZMA_STREAM_INIT
for proper initialization ofm_compression_stream
Instead of using
memset
to initializem_compression_stream
, it's recommended to use theLZMA_STREAM_INIT
macro provided by the LZMA library. This ensures all fields are correctly initialized, including any internal pointers, and is the preferred method according to the LZMA documentation.Apply this diff to initialize
m_compression_stream
correctly:- memset(m_compression_stream.get(), 0, sizeof(LzmaStream)); + *m_compression_stream = LZMA_STREAM_INIT;components/core/src/clp/streaming_compression/lzma/Decompressor.cpp (2)
113-113
: Correct error messages to reference LZMA functionsThe error messages incorrectly mention
inflate()
, which is associated with zlib. Since we're using LZMA, please update the messages to referencelzma_code()
to accurately reflect the context.Apply this diff to fix the error messages:
- SPDLOG_ERROR("streaming_compression::lzma::Decompressor inflate() ran out of memory"); + SPDLOG_ERROR("streaming_compression::lzma::Decompressor lzma_code() ran out of memory"); ... - SPDLOG_ERROR("inflate() returned an unexpected value - {}.", int(return_value)); + SPDLOG_ERROR("lzma_code() returned an unexpected value - {}.", int(return_value));Also applies to: 117-117
130-130
: Update comment to reflect LZMA instead of ZStdThe comment references 'ZStd', but this is an LZMA decompressor. Please correct the comment to avoid confusion.
Apply this diff to fix the comment:
- // ZStd has no way for us to seek back to the desired position, so just reset the stream + // LZMA has no way for us to seek back to the desired position, so just reset the stream
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (14)
components/core/.clang-format
(1 hunks)components/core/CMakeLists.txt
(4 hunks)components/core/src/clp/streaming_compression/Constants.hpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Compressor.cpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Compressor.hpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Constants.hpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Decompressor.cpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Decompressor.hpp
(1 hunks)components/core/tests/test-StreamingCompression.cpp
(2 hunks)components/core/tools/scripts/lib_install/liblzma.sh
(1 hunks)components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh
(1 hunks)components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh
(1 hunks)components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh
(1 hunks)components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh
(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- components/core/.clang-format
- components/core/src/clp/streaming_compression/lzma/Constants.hpp
🧰 Additional context used
🪛 cppcheck (2.10-2)
components/core/src/clp/streaming_compression/lzma/Decompressor.cpp
[error] 41-41: Rethrowing current exception with 'throw;', it seems there is no current exception to rethrow. If there is no current exception this calls std
(rethrowNoCurrentException)
[error] 45-45: Rethrowing current exception with 'throw;', it seems there is no current exception to rethrow. If there is no current exception this calls std
(rethrowNoCurrentException)
🔇 Additional comments (11)
components/core/src/clp/streaming_compression/Constants.hpp (2)
10-10
: Well-structured enum value addition!
The chosen value (0x20) follows good practices:
- Maintains consistent power-of-2 spacing with existing values
- Leaves room for future compression types between values
- Follows the established naming convention
10-10
: Verify enum value handling across the codebase
Let's ensure the new LZMA enum value is properly handled in all relevant code paths.
✅ Verification successful
Let me gather more information about the CompressorType enum usage and switch statements.
Based on the comprehensive analysis of the codebase, I can now provide a final response:
LZMA enum value is properly integrated and handled
The new LZMA enum value (0x20) is correctly integrated into the codebase with proper handling:
- The CompressorType enum is consistently defined across namespaces (clp::streaming_compression)
- LZMA implementation follows the same pattern as existing compressors (ZSTD, Passthrough)
- No switch statements directly use CompressorType enum, as the type is handled through class inheritance
- Proper initialization of LZMA compressor/decompressor classes using CompressorType::LZMA
- Test cases include LZMA compression alongside other compression types
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find potential switch statements and enum handling code that might need updates
# Search for switch statements handling CompressorType
echo "Checking for switch statements handling CompressorType..."
rg -l "switch.*CompressorType"
# Search for serialization/deserialization code
echo "Checking for serialization/deserialization code..."
rg -l "serialize.*CompressorType|deserialize.*CompressorType"
# Look for direct CompressorType comparisons
echo "Checking for CompressorType comparisons..."
rg -l "CompressorType::(ZSTD|Passthrough)"
Length of output: 989
Script:
#!/bin/bash
# Search for CompressorType enum usage with context
echo "Checking CompressorType enum usage with context..."
rg -A 5 "CompressorType"
# Search for switch statements in compression-related files
echo "Checking switch statements in compression files..."
fd -e cpp -e hpp | grep -i "compress" | xargs rg "switch.*\{" -A 5
# Look for factory or creation methods that might use CompressorType
echo "Checking factory methods..."
rg "create.*Compressor|make.*Compressor" -A 5
Length of output: 26592
components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh (1)
23-23
: LGTM! Verify package availability in Ubuntu Jammy.
The addition of liblzma-dev is correct and properly placed in alphabetical order.
✅ Verification successful
Package addition is properly integrated and verified
The liblzma-dev package addition is well-integrated into the existing LZMA compression workflow:
- Package exists in Ubuntu Jammy repositories
- Consistent with Ubuntu Focal configuration
- Properly supports the LZMA compression option in CMakeLists.txt
- Aligns with existing LZMA build infrastructure (liblzma.sh)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify liblzma-dev package availability and its dependencies in Ubuntu Jammy
# Expected: Package should be available and not have conflicting dependencies
# Check if the package exists in Ubuntu Jammy repositories
curl -s "https://packages.ubuntu.com/jammy/liblzma-dev" | grep -q "Page not found" && echo "Package not found" || echo "Package exists"
# Get package details including version and dependencies
curl -s "https://packages.ubuntu.com/jammy/liblzma-dev" | grep -A 5 "Package:"
Length of output: 300
Script:
#!/bin/bash
# Let's verify the package dependencies and version availability more thoroughly
apt-cache policy liblzma-dev 2>/dev/null || echo "Package info not available in sandbox"
# Check if the package is already used elsewhere in the codebase
rg -l "liblzma-dev" --type sh
# Look for any LZMA related build dependencies or configurations
rg -l "lzma" --type cmake --type make --type sh
Length of output: 787
Script:
#!/bin/bash
# Let's check the CMakeLists.txt to understand LZMA integration
cat components/core/CMakeLists.txt | grep -A 5 -B 5 "lzma"
# Check the content of liblzma.sh to understand existing LZMA setup
cat components/core/tools/scripts/lib_install/liblzma.sh
# Check if the package is also used in Focal
cat components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh | grep "liblzma-dev"
Length of output: 3455
components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh (1)
14-14
: Verify LZMA installation script and dependencies
The addition of LZMA library installation looks good, but let's verify a few important aspects:
✅ Verification successful
Let me gather more information about the liblzma.sh script content and its error handling:
LZMA installation script verified and properly configured
The installation script for LZMA 5.4.6 has been thoroughly verified and shows:
- Proper error handling with
set -e
andset -u
- Secure download from the official tukaani-project repository
- Appropriate privilege management
- CMake configuration with position-independent code
- Parallel build support
- Clean installation process
The script follows the project's established patterns and integrates well with the existing build system. The CMake configuration properly requires and validates LZMA presence.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify LZMA installation script and its dependencies
# Check if liblzma.sh exists and is executable
echo "Checking liblzma.sh existence and permissions..."
fd --type f "liblzma.sh" --exec ls -l {}
# Look for version constraints or dependencies in build files
echo -e "\nChecking for version constraints..."
rg -g "CMakeLists.txt" --multiline "find_package\(.*LZMA.*\)" -A 5
Length of output: 872
Script:
#!/bin/bash
# Check the content of liblzma.sh for error handling and version validation
echo "Checking liblzma.sh content..."
cat ./components/core/tools/scripts/lib_install/liblzma.sh
# Check if there are any version-specific dependencies or constraints in the codebase
echo -e "\nChecking for LZMA version references..."
rg "LZMA.*5\.4" --type cpp --type cmake
Length of output: 1777
components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh (1)
17-17
: Verify LZMA version compatibility and security
Please ensure that LZMA version 5.4.6 meets the project's requirements and has no known vulnerabilities.
✅ Verification successful
Let me gather more information about LZMA usage and dependencies in the project.
LZMA 5.4.6 is compatible with the project requirements
Based on the codebase analysis:
- LZMA is used as an optional general-purpose compressor in the project
- The project uses CMake's
find_package(LibLZMA REQUIRED)
without version constraints - No known vulnerabilities were found for LZMA 5.4.6
- The installation script location appears to be incorrect as
liblzma.sh
was not found at the expected path
The version 5.4.6 is suitable for use, but please ensure the installation script is properly placed at components/core/tools/scripts/lib_install/ubuntu-focal/liblzma.sh
before proceeding.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for security advisories and compatibility issues with LZMA 5.4.6
# Check for CVEs in the National Vulnerability Database
curl -s "https://services.nvd.nist.gov/rest/json/cves/2.0?keywordSearch=liblzma%205.4.6" | jq '.vulnerabilities[].cve.descriptions[].value'
# Check if any other components in the project specify LZMA version requirements
rg -i "lzma.*version|version.*lzma" --type cmake --type=txt
Length of output: 302
Script:
#!/bin/bash
# Check for LZMA dependencies and usage
rg -i "lzma" --type cmake -A 3 -B 3
# Check the liblzma installation script
cat components/core/tools/scripts/lib_install/ubuntu-focal/liblzma.sh
# Check if there are any specific LZMA version requirements in package configs
fd -e pc -e cmake -x cat {} \; | rg -i "lzma"
Length of output: 4152
components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh (2)
23-23
: LGTM! The LZMA development package is correctly added.
The addition of liblzma-dev
is properly positioned in alphabetical order and aligns with the script's existing package installation pattern.
23-23
: Consider documenting Ubuntu version compatibility.
Since this script is specifically for Ubuntu Focal (20.04), we should verify and document LZMA package compatibility for other Ubuntu versions that might be supported by the project.
components/core/tests/test-StreamingCompression.cpp (1)
18-19
: LGTM! Headers are properly organized.
The LZMA compression headers follow the established pattern and are correctly placed with other compression-related includes.
components/core/src/clp/streaming_compression/lzma/Decompressor.hpp (2)
69-69
: Verify the correctness of the method reference in the comment.
The comment at line 69 mentions ReaderInterface::try_read_exact_length
, but it's unclear if this is the correct method reference. Please ensure that this comment accurately describes the return behaviour and references the appropriate method.
146-146
:
Ensure proper initialisation and cleanup of m_decompression_stream
.
The member variable m_decompression_stream
is a raw pointer initialised to nullptr
. To prevent potential memory leaks, ensure that it is properly allocated during initialisation and that all resources are released in the destructor.
Would you like assistance in reviewing the resource management for m_decompression_stream
?
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
20-75
: Initialization of the LZMA encoder is correctly implemented
The init_lzma_encoder
function properly initializes the LZMA encoder with the specified compression level and dictionary size. Error handling for initialization failures is thorough and follows best practices.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Decompressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Decompressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Decompressor.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
components/core/CMakeLists.txt (3)
234-235
: Fix inconsistent variable name in status messageThe status message uses
Lzma
instead of the package nameLibLZMA
:- message(STATUS "Found Lzma ${LIBLZMA_VERSION_STRING}") + message(STATUS "Found LibLZMA ${LIBLZMA_VERSION_STRING}")
241-241
: Remove redundant debug messageThe include directory message is redundant since CMake already logs this information at a more appropriate debug level.
-message("Lzma Include Dir: ${LIBLZMA_INCLUDE_DIRS}")
231-232
: Add support for static/shared LZMA library selectionThe TODO comment indicates missing support for selecting between static and shared LZMA libraries. This should be implemented to maintain consistency with other library configurations in the project.
Would you like me to help create a FindLZMA.cmake module that supports static/shared library selection?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
components/core/CMakeLists.txt
(4 hunks)components/core/src/clp/streaming_compression/lzma/Compressor.cpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Compressor.hpp
(1 hunks)components/core/tools/scripts/lib_install/liblzma.sh
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- components/core/src/clp/streaming_compression/lzma/Compressor.hpp
- components/core/tools/scripts/lib_install/liblzma.sh
🔇 Additional comments (3)
components/core/CMakeLists.txt (1)
481-485
: Missing test coverage for LZMA compression implementation
The LZMA source files have been added to the test suite in CMakeLists.txt, but there are no corresponding test cases in the test-StreamingCompression.cpp file. This needs to be addressed to ensure the reliability of the LZMA compression functionality.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)
173-183
:
Prevent potential infinite loops by handling LZMA_BUF_ERROR
appropriately
In the compress
function, when lzma_code
returns LZMA_BUF_ERROR
, the current implementation continues the loop without taking corrective action. This could lead to an infinite loop, as LZMA_BUF_ERROR
indicates that no progress can be made due to insufficient input or output buffer space.
Consider handling LZMA_BUF_ERROR
explicitly to prevent potential infinite loops.
Apply this diff to handle LZMA_BUF_ERROR
:
switch (rc) {
case LZMA_OK:
break;
- case LZMA_BUF_ERROR:
- break;
+ case LZMA_BUF_ERROR:
+ // No progress can be made; handle or report the error
+ SPDLOG_ERROR("Compression stalled: LZMA_BUF_ERROR encountered.");
+ throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
case LZMA_STREAM_END:
hit_stream_end = true;
break;
207-213
: 🛠️ Refactor suggestion
Handle potential exceptions from m_compressed_stream_file_writer->write
Currently, the code assumes that the write
operation always succeeds. To enhance robustness, consider handling any exceptions or errors that may occur during the write operation to ensure that failures are properly reported and managed.
Apply this diff to handle possible write errors:
auto Compressor::pipe_data() -> void {
+ try {
m_compressed_stream_file_writer->write(
size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
cCompressedStreamBlockBufferSize - m_compression_stream.avail_out
);
+ } catch (const std::exception& e) {
+ SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+ throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+ }
m_compression_stream.next_out = m_compressed_stream_block_buffer.data();
m_compression_stream.avail_out = cCompressedStreamBlockBufferSize;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
components/core/tests/test-StreamingCompression.cpp (2)
59-61
: Add documentation for intentionally omitted decompressorWhile the LZMA test section follows the established pattern, it's worth documenting why the decompressor is intentionally not initialized here, especially since it differs from other test sections.
Consider adding a comment like:
SECTION("LZMA compression") { + // Note: Decompressor intentionally not initialized as it's handled separately compressor = std::make_unique<clp::streaming_compression::lzma::Compressor>(); }
82-85
: Consider adding test reporting for skipped decompressionWhile the null check correctly handles missing decompressors, it silently skips the decompression test. Consider using Catch2's
INFO
orWARN
to report when decompression is skipped.if (nullptr == decompressor) { + WARN("Skipping decompression test - decompressor not initialized"); boost::filesystem::remove(compressed_file_path); return; }
components/core/CMakeLists.txt (2)
241-241
: Remove redundant debug messageThe include directory message is redundant since CMake already logs this information when the package is found.
-message("Lzma Include Dir: ${LIBLZMA_INCLUDE_DIRS}")
231-232
: Add support for static/shared LZMA library selectionThe TODO comment indicates missing support for selecting between static and shared LZMA libraries. This should be implemented to maintain consistency with other compression libraries in the project.
Would you like me to help create a FindLibLZMA.cmake module that supports static/shared library selection?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
components/core/CMakeLists.txt
(4 hunks)components/core/tests/test-StreamingCompression.cpp
(3 hunks)
🔇 Additional comments (4)
components/core/tests/test-StreamingCompression.cpp (2)
18-18
: LGTM! Header inclusion follows project conventions
The LZMA compressor header is properly included alongside other compression-related headers.
59-61
: Verify LZMA-specific test scenarios
While the test covers basic compression functionality, consider whether LZMA-specific scenarios need testing:
- Different compression levels
- Dictionary size variations
- Error handling for invalid LZMA streams
✅ Verification successful
Let me gather more information about the LZMA implementation and test coverage.
Test coverage appears adequate for LZMA compression
The implementation already includes essential LZMA-specific configurations and error handling:
- Compression level configuration via
init_lzma_encoder
- Dictionary size customization
- Error handling for LZMA stream operations
- CRC64 checksum verification
The basic test case is sufficient as it exercises the core compression path, which internally handles these configurations. The implementation's error handling is covered through the LZMA stream workflow.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any existing LZMA-specific test cases
rg -l "LZMA" --type cpp --glob "test-*.cpp"
# Check for LZMA configuration options in source
ast-grep --pattern 'class $_ {
$$$
// LZMA specific configuration
$$$
}'
Length of output: 176
Script:
#!/bin/bash
# Check the LZMA compressor implementation and available configuration options
rg -A 10 "class.*[Ll]zma.*Compressor" --type cpp
# Look for LZMA-specific test cases in more detail
rg -B 2 -A 10 "SECTION.*LZMA" components/core/tests/test-StreamingCompression.cpp
# Search for any LZMA configuration or parameter settings
rg "lzma.*\(" --type cpp
Length of output: 2117
components/core/CMakeLists.txt (2)
235-235
: Fix incorrect variable name in status message
The status message uses Lzma
instead of the package name LibLZMA
.
481-483
: LGTM! Source files and library linking look good
The LZMA source files are properly organized in the streaming_compression directory, and the library is correctly linked in the target_link_libraries command.
Also applies to: 571-571
components/core/src/clp/streaming_compression/lzma/Compressor.hpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.hpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.hpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.hpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (3)
31-35
: Consider simplifying the comment block.The comment about XZ Embedded could be more concise while maintaining its informative value.
- // Initialize the encoder using a preset. Set the integrity to check - // to CRC64, which is the default in the xz command line tool. If - // the .xz file needs to be decompressed with XZ Embedded, use - // LZMA_CHECK_CRC32 instead. + // Initialize encoder with CRC64 integrity check (use LZMA_CHECK_CRC32 for XZ Embedded compatibility)
171-174
: Document LZMA_BUF_ERROR handling strategy.The code handles LZMA_BUF_ERROR but should document why it's treated the same as LZMA_OK.
case LZMA_OK: case LZMA_BUF_ERROR: + // LZMA_BUF_ERROR indicates temporary buffer condition, safe to continue break;
1-207
: Consider adding performance optimizations for large datasets.The current implementation could benefit from:
- Buffer size tuning based on input characteristics
- Parallel compression for large datasets
- Memory-mapped I/O for large files
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Compressor.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/core/src/clp/streaming_compression/lzma/Compressor.hpp
🔇 Additional comments (5)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (5)
1-17
: LGTM! Header organization and includes are well-structured.
The includes are logically organized into standard libraries, external dependencies, and project-specific headers.
70-93
: LGTM! Well-structured initialization with proper validation.
The method properly validates the initialization state and compression level before setting up the compression stream and buffers.
95-108
: LGTM! Proper resource cleanup and finalization.
The method ensures proper cleanup of resources and follows RAII principles.
110-131
:
Add error handling for the compression operation.
The write operation should handle potential compression failures.
run_lzma(LZMA_RUN);
+ if (!m_compression_stream_is_flushed) {
+ throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+ }
199-206
:
Add exception handling for file write operations.
The write operation could fail and should be wrapped in a try-catch block.
auto Compressor::pipe_data() -> void {
+ try {
m_compressed_stream_file_writer->write(
clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
cCompressedStreamBlockBufferSize - m_compression_stream.avail_out
);
+ } catch (const std::exception& e) {
+ SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+ throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+ }
m_compression_stream.next_out = m_compressed_stream_block_buffer.data();
m_compression_stream.avail_out = cCompressedStreamBlockBufferSize;
}
components/core/src/clp/streaming_compression/lzma/Compressor.hpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
Sorry for missing this question. Yes I think you need to look into CentOS and MacOS. It looks like you have already added the dependency for centos. Actually I would expect the build of MacOS to fail because LZMA library is not installed, and the updated Unittest should require the lzma dependency. Maybe it means the MacOS already comes with LZMA dependencies? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🔭 Outside diff range comments (3)
components/core/tools/scripts/lib_install/centos-stream-9/install-packages-from-source.sh (1)
Line range hint
13-19
: Add missing LZMA dependency installationThe script is missing the LZMA library installation which is required for the new LZMA compression functionality. This should be consistent with the Ubuntu installation scripts.
Add the following line after the existing package installations:
"${lib_install_scripts_dir}/fmtlib.sh" 8.0.1 "${lib_install_scripts_dir}/spdlog.sh" 1.9.2 "${lib_install_scripts_dir}/mongocxx.sh" 3.10.2 "${lib_install_scripts_dir}/msgpack.sh" 7.0.0 +"${lib_install_scripts_dir}/liblzma.sh" 5.4.6
components/core/src/clp/StringReader.cpp (1)
Line range hint
29-35
: Fix type consistency in the read loopThe loop counter should use
size_t
to match the parameter type:- for (int i = 0; i < num_bytes_to_read; i++) { + for (size_t i = 0; i < num_bytes_to_read; i++) { buf[i] = input_string[i + pos]; }components/clp-package-utils/clp_package_utils/scripts/stop_clp.py (1)
Line range hint
37-52
: Add error handling for Docker commands.The
stop_running_container
function should handle potential Docker command failures gracefully.def stop_running_container(container_name: str, already_exited_containers: List[str], force: bool): if is_container_running(container_name): logger.info(f"Stopping {container_name}...") - cmd = ["docker", "stop", container_name] - subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True) + try: + cmd = ["docker", "stop", container_name] + subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True) + except subprocess.CalledProcessError as e: + logger.error(f"Failed to stop container {container_name}: {e}") + return
🧹 Nitpick comments (59)
components/core/src/clp/ffi/ir_stream/Deserializer.hpp (4)
118-119
: Consider conditional instantiation of schema trees.
The addition of two separate shared pointers for auto-generated and user-generated schema trees appears correct. However, if there are usage scenarios where one of these structures might be optional, you might consider instantiating it only upon actual usage to reduce resource overhead.
187-193
: Validate parameter usage in function call.
The function passes multiple parameters to deserialize the KV-pair log event (including both schema trees). If any of these parameters are consistently immutable or read-only, passing them by const reference may optimize performance and clarify their usage.
215-215
: Clarify error response for existing node.
Returning a protocol error for an already existing node is plausible; however, it might be beneficial to provide a more specific error code or a more descriptive message to make troubleshooting easier.
225-225
: Consider handling insertion outcomes.
Currently, the insertion result is ignored. If insertions can fail or become unexpectedly modified, it would be prudent to handle, assert, or log any failures for better diagnostic information.components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (1)
90-96
: Consider documenting the purpose of empty brace parameters.While the code is functionally correct, adding comments to explain what the empty braces
{}
represent would improve readability and make the test's intent clearer.auto test_log_event_result{KeyValuePairLogEvent::create( std::make_shared<SchemaTree>(), std::make_shared<SchemaTree>(), - {}, - {}, + {}, // Empty auto-generated key-value pairs + {}, // Empty user-generated key-value pairs cTestUtcOffset )};components/clp-package-utils/clp_package_utils/scripts/search.py (1)
Line range hint
83-117
: Consider enhancing container operation error handlingThe container operations section could benefit from more robust error handling. Consider:
- Adding error handling for container creation failures
- Providing more detailed feedback when container operations fail
- Implementing cleanup in case of partial failures
Example improvement:
cmd = container_start_cmd + search_cmd - subprocess.run(cmd, check=True) + try: + subprocess.run(cmd, check=True, capture_output=True, text=True) + except subprocess.CalledProcessError as e: + print(f"Container operation failed: {e.stderr}", file=sys.stderr) + return -1 + finally: + # Ensure cleanup happens even if container operation fails + if generated_config_path_on_host.exists(): + generated_config_path_on_host.unlink()components/core/src/clp_s/TimestampDictionaryWriter.hpp (2)
52-54
: Switch to std::string_view for ingest_entry is beneficial.
This helps reduce copies and improve performance. Ensure that upstream code does not pass ephemeral references (like temporary string data) that outlive the function.
91-96
: Refactored write_timestamp_entries with stream usage.
Removing the compressor parameter in favour of a stream simplifies the logic. Confirm that compression is correctly handled elsewhere if needed.Also applies to: 102-102
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py (2)
18-18
: Logger is declared at the global scope.
Consider fine-grained logging or prefixing messages to differentiate context if logs become large.
47-55
: Robust exception handling for config loading.
Catching broad exceptions can occasionally mask specific errors. If feasible, consider narrower exception handling.components/core/src/clp/ffi/KeyValuePairLogEvent.hpp (1)
65-70
: Consider extra pointer checks for defensive programming.Even though the
create
method already validates non-null pointers, re-checking in getters (such as get_user_gen_keys_schema_tree) can provide additional safety if these methods are ever called outside the normal lifecycle guarantees.components/core/src/clp_s/ArchiveWriter.hpp (1)
181-203
: Check concurrency if archives are written in parallel.Although no concurrency usage is shown here, if future changes introduce parallel writing to single-file archives, ensure proper synchronisation. No immediate issues are found in the current code.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (3)
90-111
: Watch for potential infinite loopsWithin encode_lzma(), the loop continues while m_lzma_stream.avail_in() > 0. The LZMA_BUF_ERROR case is handled by throwing an exception, which is appropriate to avoid deadlocks or infinite loops. You might consider logging more context (e.g., offset, identifications) to facilitate debugging.
113-152
: Ensure flush action validityflush_lzma() checks LzmaStream::is_flush_action(flush_action) to confirm a valid flush action and throws if invalid. This is a robust measure. To improve clarity, consider capturing the flush status in a local variable or enumerated type, so the code more clearly indicates what each flush action entails.
154-170
: Reset the output buffer upon flush completionflush_stream_output_block_buffer() writes the compressed data and reattaches the output buffer. This is correct. Consider adding debug logs indicating how many bytes were written whenever you flush; it can be useful for performance monitoring and troubleshooting.
components/job-orchestration/job_orchestration/executor/compress/fs_compression_task.py (3)
7-7
: Add docstring for type usageImporting typing hints is beneficial. Consider adding or updating docstrings and type annotations for functions that interact with typed parameters or return typed dictionaries.
16-19
: Clearly separate database config itemsThe newly imported S3Config, StorageEngine, StorageType, and WorkerConfig are introduced. It may be helpful to group them logically or add brief inline comments to clarify each import’s purpose. This helps future maintainers quickly grasp how these classes relate to compression tasks.
22-23
: Revisit cross-module dependencies with cautionReading the config with read_yaml_config_file is a powerful approach, but it can clutter the boundary between modules. Consider verifying that config reading is done in a dedicated layer to keep separation of concerns.
components/core/src/clp_s/ArchiveWriter.cpp (2)
59-76
: Carefully handle offset accumulationWithin the single-file-archive branch, each file’s offset is tracked in a vector. This is a straightforward approach. If future expansions require parallel or partial writes, consider a more structured metadata approach (e.g. a dedicated offset manager).
Line range hint
282-398
: Sort large schema tables logicallyThe logic sorts schemas in descending order of their uncompressed size. This seems optimal for chunking. If streaming or concurrency become concerns, splitting large tables across multiple streams might be needed.
components/clp-py-utils/clp_py_utils/clp_config.py (4)
3-3
: Favour consistent type hintsUsing from typing import Literal, Optional, Union fosters clarity. Confirm all function signatures leverage these type hints consistently, benefiting lint checks and maintainability.
312-339
: Add coverage for missing region or bucket credentialsS3Config mandates region_code and bucket. Should any default region be used if region_code is not specified? The current approach of raising a ValueError is safe, but confirm that is the desired user experience.
411-420
: Set directory logic for FS vs. S3The set_directory method updates either the folder path or the staging directory, depending on the storage type. This is a succinct approach. If additional storage types appear (e.g., Azure, GCP), this method may need extension or a strategy pattern.
544-556
: Storage engine vs. storage type correlationvalidate_archive_output_config() verifies that S3 is only used with CLP_S. This is crucial for consistent data handling. Consider adding a more descriptive error that instructs users how to fix the mismatch.
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp (2)
247-247
: Consider vector sizing constructs.
Using a dynamic size for 'key_duplication_checked_node_id_bitmap' is acceptable, but be mindful of potential performance costs if this is invoked frequently.
333-364
: Validate correctness of marking parent nodes in the bitmap.
The logic in 'get_schema_subtree_bitmap' to iteratively mark parent nodes is sound. However, watch for potential off-by-one or boundary errors when node IDs approach the maximum schema size.components/core/tests/test-ffi_KeyValuePairLogEvent.cpp (2)
221-237
: Ensure raw pointer usage is minimized.
The new function is well-structured, but check that the usage of raw pointers in other test areas remains safe, given previous feedback about adopting references where possible.
Line range hint
356-426
: Extend coverage of mismatched types.
The loops that insert invalid node-ID-value pairs are well designed. Consider adding corner cases (e.g., extremely large or empty array structures) to ensure comprehensive coverage.components/core/src/clp_s/CommandLineArguments.cpp (3)
9-9
: Include usage rationale.
The addition of "type_utils.hpp" is not immediately obvious in this file. Include documentation or comments if it’s used for specialized type traits to clarify its necessity.
197-199
: Separate single-file-archive logic.
The new 'single-file-archive' switch is helpful. Consider grouping this near other arguments that affect output style, such as 'print-archive-stats', for improved discoverability.
208-211
: Validate file-type compatibility.
The check preventing 'kv-ir' usage along with '--structurize-arrays' is sensible. Evaluate whether additional constraints or warnings are needed for future compression modes.components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
Line range hint
631-643
: Mount for compression output.
Appending 'mounts.archives_output_dir' for compression workers ensures direct access to the final archive location. Validate path correctness on systems with different directory structures.components/core/src/clp/ffi/ir_stream/IrErrorCode.hpp (2)
9-12
: Consider enhancing the documentation.While the current documentation provides a basic overview, it would be beneficial to add more detailed information about:
- The context in which these error codes are used
- The relationship with the error handling framework
- Example usage scenarios
13-17
: Add documentation for individual enum values.Each enum value would benefit from documentation explaining:
- The specific conditions that trigger this error
- How to handle or recover from each error state
Consider adding documentation like this:
enum class IrErrorCodeEnum : uint8_t { + /// Indicates a failure in the decoding method implementation DecodingMethodFailure, + /// Signals that the end of the stream has been reached EndOfStream, + /// Indicates the stream data is incomplete or truncated IncompleteStream, };components/core/src/clp/clo/constants.hpp (1)
Line range hint
15-20
: Consider documenting the SearchOutput constants.The new
OrigFileId
constant and existing constants would benefit from documentation explaining their purpose and usage context.Consider adding documentation like this:
namespace SearchOutput { + /// Unique identifier for the original file constexpr char OrigFileId[]{"orig_file_id"}; + /// Path to the original file in the filesystem constexpr char OrigFilePath[]{"orig_file_path"}; // ... (document other constants similarly) }components/core/src/clp/ffi/ir_stream/IrErrorCode.cpp (1)
13-26
: Consider enhancing the default error messageWhile the error handling is well-implemented, the default message could be more informative by including the actual enum value.
Consider this enhancement:
- return "Unknown error code enum."; + return "Unknown error code enum: " + std::to_string(static_cast<int>(error_enum));components/core/src/clp/BoundedReader.cpp (1)
21-42
: Consider optimizing the read operationWhile the implementation is correct, the double condition check for EOF could be consolidated:
- if (m_curr_pos == m_bound) { - num_bytes_read = 0; - return ErrorCode_EndOfFile; - } - if ((m_curr_pos + num_bytes_to_read) > m_bound) { + if (m_curr_pos == m_bound) { + num_bytes_read = 0; + return ErrorCode_EndOfFile; + } num_bytes_to_read = m_bound - m_curr_pos; }components/core/src/clp_s/SingleFileArchiveDefs.hpp (3)
17-26
: Consider structure alignment for performanceThe
ArchiveHeader
structure might benefit from explicit alignment:+#pragma pack(push, 1) struct ArchiveHeader { uint8_t magic_number[4]; uint32_t version; uint64_t uncompressed_size; uint64_t compressed_size; uint64_t reserved_padding[4]; uint32_t metadata_section_size; uint16_t compression_type; uint16_t padding; }; +#pragma pack(pop)
38-43
: Address the TODO comment for future fieldsThe TODO comment indicates missing fields in
ArchiveInfoPacket
. Let's track this for future implementation.Would you like me to create a GitHub issue to track the addition of future fields to
ArchiveInfoPacket
?
45-50
: Document the purpose of abbreviated field namesThe
ArchiveFileInfo
structure uses abbreviated field names (n
,o
). Consider adding more detailed documentation.struct ArchiveFileInfo { + // File name within the archive std::string n; // name + // Offset position in the archive where the file content begins uint64_t o; // offset MSGPACK_DEFINE_MAP(n, o); };components/clp-py-utils/clp_py_utils/s3_utils.py (2)
39-50
: Consider using context manager for S3 operationsThe current implementation could benefit from better resource management and more specific error handling.
- with open(src_file, "rb") as file_data: - try: - my_s3_client.put_object( - Bucket=s3_config.bucket, Body=file_data, Key=s3_config.key_prefix + dest_file_name - ) - except ClientError as e: - error_code = e.response["Error"]["Code"] - error_message = e.response["Error"]["Message"] - return Err(f"ClientError: {error_code} - {error_message}") - except Exception as e: - return Err(f"An unexpected error occurred: {e}") + try: + with open(src_file, "rb") as file_data: + my_s3_client.put_object( + Bucket=s3_config.bucket, + Body=file_data, + Key=s3_config.key_prefix + dest_file_name + ) + except FileNotFoundError: + return Err(f"Failed to open file: {src_file}") + except PermissionError: + return Err(f"Permission denied accessing file: {src_file}") + except ClientError as e: + error_code = e.response["Error"]["Code"] + error_message = e.response["Error"]["Message"] + return Err(f"S3 ClientError: {error_code} - {error_message}") + except Exception as e: + return Err(f"Unexpected error during S3 upload: {str(e)}")
31-37
: Extract S3 client creation to a separate functionConsider moving the S3 client initialization to a dedicated function for better reusability and testing.
+def create_s3_client(s3_config: S3Config, total_max_attempts: int) -> boto3.client: + """ + Create an S3 client with the given configuration. + + :param s3_config: S3 configuration containing credentials and region + :param total_max_attempts: Maximum number of retry attempts + :return: Configured S3 client + """ + config = Config(retries=dict(total_max_attempts=total_max_attempts, mode="adaptive")) + return boto3.client( + "s3", + region_name=s3_config.region_code, + aws_access_key_id=s3_config.access_key_id, + aws_secret_access_key=s3_config.secret_access_key, + config=config, + )components/core/tests/test_log_files/test_no_floats_sorted.jsonl (1)
1-4
: Add newline at end of fileThe file should end with a newline character to follow POSIX standards and ensure proper text file handling.
Consider improving test file readability
While the test cases are comprehensive, consider formatting the JSON objects with proper indentation for better readability and maintenance. This can be achieved using tools like
jq
.LGTM: Comprehensive test coverage
The test cases effectively cover:
- Various primitive types
- Edge cases for integer bounds
- Nested objects and arrays
- Empty and non-empty containers
components/core/src/clp_s/JsonParser.hpp (1)
33-33
: Add documentation for new member variablesThe newly added member variables lack documentation explaining their purpose and valid values.
+ /** + * Type of input file to be parsed. Defaults to JSON. + */ CommandLineArguments::FileType input_file_type{CommandLineArguments::FileType::Json}; std::string timestamp_key; std::string archives_dir; size_t target_encoded_size{}; size_t max_document_size{}; size_t min_table_size{}; int compression_level{}; bool print_archive_stats{}; bool structurize_arrays{}; bool record_log_order{true}; + /** + * When true, creates a single archive file instead of splitting into multiple files. + */ bool single_file_archive{false};Also applies to: 43-43
components/core/src/clp_s/TimestampEntry.hpp (2)
Line range hint
47-54
: Consider adding string_view validationThe constructor accepting string_view should validate the input to ensure it's not empty or contains invalid characters.
- TimestampEntry(std::string_view key_name) + explicit TimestampEntry(std::string_view key_name) : m_encoding(UnkownTimestampEncoding), m_epoch_start_double(cDoubleEpochTimeMax), m_epoch_end_double(cDoubleEpochTimeMin), m_epoch_start(cEpochTimeMax), m_epoch_end(cEpochTimeMin), - m_key_name(key_name) {} + m_key_name(key_name) { + if (key_name.empty()) { + throw OperationFailed(ErrorCode_InvalidArgument, __FILE__, __LINE__); + } + }
70-73
: Update method documentationThe documentation for write_to_stream is outdated and refers to the wrong parameter name.
- * Write the timestamp entry to a buffered stream. - * @param compressor + * Write the timestamp entry to a string stream. + * @param stream The output string stream to write to */ void write_to_stream(std::stringstream& stream) const;components/core/tests/test-BoundedReader.cpp (1)
31-32
: Consider following the coding guideline for condition checksPer coding guidelines, prefer
false == <expression>
over!<expression>
. Consider refactoring condition checks to match this style:- REQUIRE(clp::ErrorCode_Success == rc); + REQUIRE(false == (clp::ErrorCode_Success != rc));Also applies to: 43-45, 62-63, 73-75, 84-86, 95-97
components/core/tests/test-error_handling.cpp (2)
19-44
: Consider adding documentation for error categories.The error code enums and categories are well-structured, but adding documentation would improve maintainability by explaining:
- The purpose of
AlwaysSuccessErrorCodeEnum
vsBinaryErrorCodeEnum
- The significance of the error conditions in
cFailureConditions
andcNoneFailureConditions
95-156
: Consider adding more test cases for edge cases.While the current tests are comprehensive for the happy path, consider adding tests for:
- Error code construction with invalid enum values
- Thread safety of the error category singleton
- Boundary conditions for error code comparisons
Example test cases to add:
TEST_CASE("test_error_code_edge_cases", "[error_handling][ErrorCode]") { // Test invalid enum value BinaryErrorCode const invalid{static_cast<BinaryErrorCodeEnum>(255)}; REQUIRE(invalid.message() == cUnrecognizedErrorCode); // Test thread safety of category singleton auto verify_category = []() { return &BinaryErrorCode::get_category() == &BinaryErrorCode::get_category(); }; std::vector<std::thread> threads; for (int i = 0; i < 10; ++i) { threads.emplace_back([&verify_category]() { REQUIRE(verify_category()); }); } for (auto& thread : threads) { thread.join(); } }components/core/src/clp/streaming_compression/lzma/Compressor.hpp (2)
182-182
: Consider making the buffer size configurable.The hard-coded buffer size of 4KiB might not be optimal for all use cases. Consider making it configurable through the constructor or a static configuration.
193-193
: Add documentation for error codes.The
encode_lzma
method's documentation should specify which error codes can be thrown in theOperationFailed
exception.components/clp-package-utils/clp_package_utils/scripts/stop_clp.py (1)
34-34
: Consider adding log format configuration.The current logging setup could benefit from a consistent format across the application. Consider adding a formatter to match other scripts in the codebase.
components/clp-package-utils/clp_package_utils/scripts/decompress.py (2)
84-87
: Consider enhancing error message with supported storage typesThe error message could be more helpful by listing the supported storage types.
- logger.error(f"File extraction is not supported for archive storage type: {storage_type}.") + logger.error(f"File extraction is not supported for archive storage type: {storage_type}. Only {StorageType.FS} is supported.")
164-169
: Maintain consistency with file extraction error messageThe stream extraction error message should follow the same format as the file extraction error message for consistency.
- logger.error( - f"Stream extraction is not supported for archive storage type: {storage_type}." - ) + logger.error(f"Stream extraction is not supported for archive storage type: {storage_type}. Only {StorageType.FS} is supported.")components/core/src/clp/ffi/SchemaTree.hpp (1)
254-256
: Consider adding noexcept specifierSince the vector's operator== is noexcept, this operator could also be marked as noexcept.
- [[nodiscard]] auto operator==(SchemaTree const& rhs) const -> bool { + [[nodiscard]] auto operator==(SchemaTree const& rhs) const noexcept -> bool { return m_tree_nodes == rhs.m_tree_nodes; }components/core/tests/test-NetworkReader.cpp (1)
207-226
: Excellent improvement to test reliability.The addition of retry logic for handling transient HTTP errors is a great improvement. This prevents the test from failing due to temporary server issues while still maintaining its validation purpose.
Consider extracting the retry logic into a helper function for reuse in other network-dependent tests.
components/core/src/clp_s/TimestampEntry.cpp (1)
60-74
: Consider adding error handling for stream operations.While the stream-based writing implementation is good, it could benefit from error handling for potential stream failures.
Consider adding error checking:
void TimestampEntry::write_to_stream(std::stringstream& stream) const { + if (!stream.good()) { + throw OperationFailed(ErrorCodeFailure, __FILENAME__, __LINE__); + } write_numeric_value<uint64_t>(stream, m_key_name.size()); stream.write(m_key_name.data(), m_key_name.size()); + if (stream.fail()) { + throw OperationFailed(ErrorCodeFailure, __FILENAME__, __LINE__); + } // ... rest of the method }components/core/src/clp_s/clp-s.cpp (1)
117-127
: Implementation placeholder for KeyValueIr compressionThe code correctly handles the unsupported file type case. The comment indicates that implementation is planned for a future PR.
Would you like assistance in implementing the KeyValueIr compression functionality or creating a GitHub issue to track this task?
components/core/CMakeLists.txt (1)
230-244
: Add proper CMake module for LZMAThe TODO comment indicates the need for a proper CMake module. This would improve the library detection and configuration process.
Consider creating a
FindLZMA.cmake
module in./cmake/Modules/
that:
- Properly handles both static and dynamic linking
- Provides standardized variables for version, libraries, and includes
- Includes proper version checking
- Handles platform-specific library names and locations
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (82)
.github/workflows/clp-pr-title-checks.yaml
(1 hunks)components/clp-package-utils/clp_package_utils/__init__.py
(1 hunks)components/clp-package-utils/clp_package_utils/general.py
(5 hunks)components/clp-package-utils/clp_package_utils/scripts/compress.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/decompress.py
(4 hunks)components/clp-package-utils/clp_package_utils/scripts/del_archives.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/compress.py
(0 hunks)components/clp-package-utils/clp_package_utils/scripts/native/decompress.py
(2 hunks)components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py
(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/search.py
(0 hunks)components/clp-package-utils/clp_package_utils/scripts/search.py
(2 hunks)components/clp-package-utils/clp_package_utils/scripts/start_clp.py
(10 hunks)components/clp-package-utils/clp_package_utils/scripts/stop_clp.py
(1 hunks)components/clp-py-utils/clp_py_utils/clp_config.py
(11 hunks)components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py
(1 hunks)components/clp-py-utils/clp_py_utils/s3_utils.py
(1 hunks)components/clp-py-utils/pyproject.toml
(2 hunks)components/core/CMakeLists.txt
(10 hunks)components/core/src/clp/BoundedReader.cpp
(1 hunks)components/core/src/clp/BoundedReader.hpp
(1 hunks)components/core/src/clp/StringReader.cpp
(1 hunks)components/core/src/clp/clo/CommandLineArguments.cpp
(1 hunks)components/core/src/clp/clo/CommandLineArguments.hpp
(0 hunks)components/core/src/clp/clo/OutputHandler.cpp
(1 hunks)components/core/src/clp/clo/clo.cpp
(4 hunks)components/core/src/clp/clo/constants.hpp
(1 hunks)components/core/src/clp/clp/CommandLineArguments.cpp
(0 hunks)components/core/src/clp/clp/CommandLineArguments.hpp
(0 hunks)components/core/src/clp/clp/FileDecompressor.hpp
(2 hunks)components/core/src/clp/clp/decompression.cpp
(3 hunks)components/core/src/clp/clp/utils.cpp
(1 hunks)components/core/src/clp/clp/utils.hpp
(1 hunks)components/core/src/clp/error_handling/ErrorCode.hpp
(1 hunks)components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
(9 hunks)components/core/src/clp/ffi/KeyValuePairLogEvent.hpp
(3 hunks)components/core/src/clp/ffi/SchemaTree.hpp
(2 hunks)components/core/src/clp/ffi/ir_stream/Deserializer.hpp
(4 hunks)components/core/src/clp/ffi/ir_stream/IrErrorCode.cpp
(1 hunks)components/core/src/clp/ffi/ir_stream/IrErrorCode.hpp
(1 hunks)components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
(2 hunks)components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp
(2 hunks)components/core/src/clp/ir/EncodedTextAst.cpp
(1 hunks)components/core/src/clp/ir/LogEvent.hpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Compressor.cpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Compressor.hpp
(1 hunks)components/core/src/clp_s/ArchiveWriter.cpp
(7 hunks)components/core/src/clp_s/ArchiveWriter.hpp
(6 hunks)components/core/src/clp_s/CommandLineArguments.cpp
(5 hunks)components/core/src/clp_s/CommandLineArguments.hpp
(4 hunks)components/core/src/clp_s/JsonConstructor.cpp
(2 hunks)components/core/src/clp_s/JsonParser.cpp
(1 hunks)components/core/src/clp_s/JsonParser.hpp
(3 hunks)components/core/src/clp_s/ParsedMessage.hpp
(3 hunks)components/core/src/clp_s/SingleFileArchiveDefs.hpp
(1 hunks)components/core/src/clp_s/TimestampDictionaryWriter.cpp
(5 hunks)components/core/src/clp_s/TimestampDictionaryWriter.hpp
(5 hunks)components/core/src/clp_s/TimestampEntry.cpp
(2 hunks)components/core/src/clp_s/TimestampEntry.hpp
(3 hunks)components/core/src/clp_s/TimestampPattern.cpp
(10 hunks)components/core/src/clp_s/TimestampPattern.hpp
(3 hunks)components/core/src/clp_s/Utils.hpp
(2 hunks)components/core/src/clp_s/archive_constants.hpp
(2 hunks)components/core/src/clp_s/clp-s.cpp
(2 hunks)components/core/src/clp_s/search/StringLiteral.hpp
(2 hunks)components/core/src/reducer/reducer_server.cpp
(2 hunks)components/core/tests/test-BoundedReader.cpp
(1 hunks)components/core/tests/test-NetworkReader.cpp
(1 hunks)components/core/tests/test-StreamingCompression.cpp
(4 hunks)components/core/tests/test-clp_s-end_to_end.cpp
(1 hunks)components/core/tests/test-error_handling.cpp
(1 hunks)components/core/tests/test-ffi_IrUnitHandlerInterface.cpp
(2 hunks)components/core/tests/test-ffi_KeyValuePairLogEvent.cpp
(7 hunks)components/core/tests/test-ir_encoding_methods.cpp
(1 hunks)components/core/tests/test_log_files/test_no_floats_sorted.jsonl
(1 hunks)components/core/tools/scripts/lib_install/centos-stream-9/install-packages-from-source.sh
(1 hunks)components/core/tools/scripts/lib_install/centos-stream-9/install-prebuilt-packages.sh
(1 hunks)components/core/tools/scripts/lib_install/install-boost.sh
(1 hunks)components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh
(1 hunks)components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh
(1 hunks)components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh
(1 hunks)components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh
(1 hunks)components/job-orchestration/job_orchestration/executor/compress/fs_compression_task.py
(12 hunks)
⛔ Files not processed due to max files limit (16)
- components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py
- components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
- components/job-orchestration/job_orchestration/executor/query/utils.py
- components/job-orchestration/job_orchestration/executor/utils.py
- components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
- components/log-viewer-webui/client/src/api/query.js
- components/log-viewer-webui/server/src/DbManager.js
- components/log-viewer-webui/yscope-log-viewer
- components/package-template/src/etc/clp-config.yml
- components/package-template/src/sbin/admin-tools/del-archives.sh
- deps-tasks.yml
- docs/requirements.txt
- docs/src/dev-guide/components-core/centos-stream-9-deps-install.md
- docs/src/dev-guide/components-core/ubuntu-focal-deps-install.md
- docs/src/dev-guide/components-core/ubuntu-jammy-deps-install.md
- docs/src/user-guide/reference-json-search-syntax.md
💤 Files with no reviewable changes (5)
- components/core/src/clp/clp/CommandLineArguments.hpp
- components/core/src/clp/clp/CommandLineArguments.cpp
- components/clp-package-utils/clp_package_utils/scripts/native/compress.py
- components/core/src/clp/clo/CommandLineArguments.hpp
- components/clp-package-utils/clp_package_utils/scripts/native/search.py
✅ Files skipped from review due to trivial changes (5)
- components/clp-package-utils/clp_package_utils/init.py
- components/core/src/clp/ir/EncodedTextAst.cpp
- components/core/src/clp/ir/LogEvent.hpp
- components/core/src/clp/clp/utils.hpp
- components/core/src/clp/clp/utils.cpp
🚧 Files skipped from review as they are similar to previous changes (5)
- components/core/tools/scripts/lib_install/centos-stream-9/install-prebuilt-packages.sh
- components/core/tools/scripts/lib_install/ubuntu-focal/install-prebuilt-packages.sh
- components/core/tools/scripts/lib_install/ubuntu-jammy/install-packages-from-source.sh
- components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh
- components/core/tools/scripts/lib_install/ubuntu-focal/install-packages-from-source.sh
🧰 Additional context used
📓 Path-based instructions (48)
components/core/src/clp_s/search/StringLiteral.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/StringReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/IrErrorCode.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/JsonConstructor.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/IrErrorCode.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/Utils.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/JsonParser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/clp-s.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/clp/FileDecompressor.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/archive_constants.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/ParsedMessage.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/tests/test-NetworkReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/clo/OutputHandler.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/CommandLineArguments.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/tests/test-BoundedReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/JsonParser.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/tests/test-ir_encoding_methods.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/BoundedReader.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/CommandLineArguments.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/TimestampEntry.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/TimestampEntry.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/clo/constants.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/SchemaTree.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/tests/test-ffi_KeyValuePairLogEvent.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/TimestampPattern.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/BoundedReader.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/tests/test-error_handling.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/clp/decompression.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/clo/clo.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/SingleFileArchiveDefs.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/TimestampDictionaryWriter.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/reducer/reducer_server.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/clo/CommandLineArguments.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/tests/test-clp_s-end_to_end.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/tests/test-StreamingCompression.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/TimestampPattern.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/TimestampDictionaryWriter.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/error_handling/ErrorCode.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/ArchiveWriter.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp_s/ArchiveWriter.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/KeyValuePairLogEvent.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (16)
components/core/src/clp/ffi/ir_stream/IrErrorCode.hpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210
Timestamp: 2024-11-10T16:46:53.300Z
Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.
components/core/tests/test-NetworkReader.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#593
File: components/core/tests/test-NetworkReader.cpp:216-219
Timestamp: 2024-11-15T03:15:45.919Z
Learning: In the `network_reader_with_valid_http_header_kv_pairs` test case in `components/core/tests/test-NetworkReader.cpp`, additional error handling for JSON parsing failures is not necessary, as the current error message is considered sufficient.
components/core/src/clp/BoundedReader.cpp (1)
Learnt from: gibber9809
PR: y-scope/clp#624
File: components/core/src/clp/BoundedReader.hpp:25-27
Timestamp: 2024-12-10T16:56:33.545Z
Learning: In the CLP project, within constructors like `BoundedReader` in `components/core/src/clp/BoundedReader.hpp`, it's acceptable to call methods like `get_pos()` that may throw exceptions without handling potential errors via `try_get_pos()`, since the constructor itself handles errors by throwing.
components/core/tests/test-ffi_KeyValuePairLogEvent.cpp (3)
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:228-236
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, helper functions like `assert_kv_pair_log_event_creation_failure` return booleans to the caller, and the caller asserts using `REQUIRE` to provide clearer failure information.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:85-103
Timestamp: 2024-11-10T16:46:53.300Z
Learning: The function `assert_kv_pair_log_event_creation_failure` is correctly placed within the anonymous namespace in `test-ffi_KeyValuePairLogEvent.cpp`.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
components/core/src/clp/BoundedReader.hpp (1)
Learnt from: gibber9809
PR: y-scope/clp#624
File: components/core/src/clp/BoundedReader.hpp:25-27
Timestamp: 2024-12-10T16:56:33.545Z
Learning: In the CLP project, within constructors like `BoundedReader` in `components/core/src/clp/BoundedReader.hpp`, it's acceptable to call methods like `get_pos()` that may throw exceptions without handling potential errors via `try_get_pos()`, since the constructor itself handles errors by throwing.
components/core/tests/test-error_handling.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/tests/test-error_handling.cpp:44-91
Timestamp: 2024-11-26T19:16:19.933Z
Learning: In `components/core/tests/test-error_handling.cpp` (C++), when specializing template methods of `ErrorCategory` via type aliases (e.g., `AlwaysSuccessErrorCategory`), it's acceptable to define these specializations in the current namespace, even though the primary template is in the `clp::error_handling` namespace, due to the use of type aliases and template instantiation.
components/core/CMakeLists.txt (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210
Timestamp: 2024-11-10T16:46:53.300Z
Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.cpp:169-170
Timestamp: 2024-12-19T01:39:30.094Z
Learning: In the LzmaStreamOperations context (Compressor.cpp), m_p is guaranteed never to be null.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:119-120
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `KeyValuePairLogEvent::create` requires `std::shared_ptr<SchemaTree>`, so `m_auto_generated_schema_tree` and `m_user_generated_schema_tree` need to be `std::shared_ptr`.
components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.hpp:65-75
Timestamp: 2024-12-19T01:40:13.272Z
Learning: "try_get_pos" method in `Compressor.hpp` does not throw exceptions, as clarified by Bill-hbrhbr.
components/core/tests/test-clp_s-end_to_end.cpp (1)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
components/core/tests/test-StreamingCompression.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
components/core/src/clp/error_handling/ErrorCode.hpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#486
File: components/core/src/clp/error_handling/ErrorCode.hpp:131-134
Timestamp: 2024-11-26T19:12:09.102Z
Learning: In the error handling system (`components/core/src/clp/error_handling/ErrorCode.hpp`), the `ErrorCode` class is intentionally designed to wrap the `ErrorCodeEnum`, not to use the enum directly. This wrapper provides a general interface for adding customized error enums.
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp (3)
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.
Learnt from: LinZhihao-723
PR: y-scope/clp#554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:109-111
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `KeyValuePairLogEvent.cpp`, for the class `JsonSerializationIterator`, it's acceptable to use raw pointers for member variables like `m_schema_tree_node`, and there's no need to replace them with references or smart pointers in this use case.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.hpp:62-68
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `KeyValuePairLogEvent.hpp`, the methods `get_auto_generated_schema_tree()` and `get_user_generated_schema_tree()` do not need to check for null shared pointers because the factory function ensures they are not null.
components/core/src/clp/ffi/KeyValuePairLogEvent.hpp (3)
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.hpp:62-68
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `KeyValuePairLogEvent.hpp`, the methods `get_auto_generated_schema_tree()` and `get_user_generated_schema_tree()` do not need to check for null shared pointers because the factory function ensures they are not null.
Learnt from: LinZhihao-723
PR: y-scope/clp#554
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:109-111
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `KeyValuePairLogEvent.cpp`, for the class `JsonSerializationIterator`, it's acceptable to use raw pointers for member variables like `m_schema_tree_node`, and there's no need to replace them with references or smart pointers in this use case.
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/src/clp/ffi/KeyValuePairLogEvent.cpp:474-480
Timestamp: 2024-12-09T15:25:14.043Z
Learning: In `components/core/src/clp/ffi/KeyValuePairLogEvent.cpp`, node IDs are validated before accessing `child_schema_tree_node` in the function `serialize_node_id_value_pairs_to_json`, ensuring `get_node` does not throw exceptions due to invalid node IDs.
🪛 cppcheck (2.10-2)
components/core/src/clp_s/ArchiveWriter.cpp
[error] 191-191: Exception thrown in function declared not to throw exceptions.
(throwInNoexceptFunction)
🔇 Additional comments (116)
components/core/tools/scripts/lib_install/centos-stream-9/install-packages-from-source.sh (1)
13-13
: LGTM on Boost version update!
The update to Boost 1.87.0 aligns with the versions in the Ubuntu installation scripts.
components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (1)
134-134
: LGTM! The assertion correctly validates user-generated node ID value pairs.
The change properly reflects the separation between auto-generated and user-generated key-value pairs, maintaining consistency with the updated data model.
components/core/src/clp/clp/decompression.cpp (3)
10-10
: LGTM: Include path update is correct
The relative path update for the IR constants header aligns with the project's file structure.
285-285
: LGTM: Parameter name simplification
The simplified parameter name removes redundancy while maintaining clarity. The [[maybe_unused]] attribute is correctly applied.
313-313
: Verify the output directory path change impact
The change from get_ir_temp_output_dir()
to get_output_dir()
needs verification to ensure it doesn't affect other operations expecting the IR temp directory.
✅ Verification successful
Output directory change is safe to proceed
The verification shows that get_ir_temp_output_dir()
is not used anywhere in the codebase, while get_output_dir()
is consistently used across all output directory handling in both CLP and GLT components. This confirms the change aligns with the codebase's unified approach to output directory management.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for other usages of get_ir_temp_output_dir() to ensure they're updated consistently
# Search for any remaining uses of get_ir_temp_output_dir
rg "get_ir_temp_output_dir" --type cpp
# Search for related IR output directory handling
ast-grep --pattern 'command_line_args.get_output_dir()'
Length of output: 2771
components/clp-package-utils/clp_package_utils/scripts/search.py (3)
10-10
: LGTM: Clean import addition
The StorageType import is properly placed and necessary for the new storage type validation functionality.
78-82
: Review logging implementation
While the storage type validation is good, there might be an issue with the logging implementation since the logging setup was removed. Consider these alternatives:
- Re-implement proper logging setup
- Use print statements for error messages
- Raise a specific exception instead of logging and returning -1
Let's verify the logging setup in the codebase:
✅ Verification successful
Let me analyze the logging usage in the codebase based on the search results.
The search results show extensive use of logging throughout the codebase with proper logger configuration. Specifically:
- The logging pattern
logger.error/info/debug
is consistently used across multiple components - The logging setup appears to be properly configured as evidenced by:
- Logging level configuration in
extract_stream_task.py
:set_logging_level(logger, clp_logging_level)
- Proper logging handlers in
reducer.py
:logger.removeHandler(logging_file_handler)
- Consistent error handling patterns with logging across the codebase
- Logging level configuration in
Let me generate the final response:
Logging implementation is properly configured
The logging setup is correctly implemented in the codebase. The error logging at line 80 follows the established pattern used throughout the project and will work as expected.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if logging is configured elsewhere or if other files use similar error handling
rg -g '*.py' 'logger\.(error|warning|info|debug)' -A 3 -B 3
Length of output: 68634
83-83
: LGTM: Verify JobType enum usage
The explicit string conversion of JobType.SEARCH is a good practice. However, let's verify that this pattern is consistent across the codebase.
✅ Verification successful
Let's search for both JobType usage and container name generation to understand the patterns better. The previous ast-grep pattern might have failed due to syntax, so we're using ripgrep to get a broader view of the usage patterns.
Pattern is consistent across container name generation
The explicit string conversion of JobType enums is consistently applied across all container name generation calls in the codebase:
str(JobType.SEARCH)
in search.pystr(JobType.FILE_EXTRACTION)
in decompress.pystr(JobType.IR_EXTRACTION)
in decompress.pystr(JobType.COMPRESSION)
in compress.py
The pattern is uniform and type-safe across all usages, confirming that the change aligns with the codebase's practices.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check JobType usage patterns across the codebase
ast-grep --pattern 'generate_container_name($expr)'
Length of output: 53
Script:
#!/bin/bash
# Search for JobType usage and container name generation patterns
rg "JobType\." -A 2
rg "generate_container_name" -A 2
Length of output: 11598
components/core/src/clp_s/TimestampDictionaryWriter.hpp (5)
4-8
: Ensure necessary includes are minimally present.
These added headers (cstdint, map, sstream, string_view, etc.) look appropriate for their usage within this file.
28-28
: Empty default constructor is acceptable.
A default constructor with no initializations can be beneficial if the data members have default initializers. If additional initializations are needed, consider adding them in future.
31-34
: Method signature for write is straightforward.
This new function is a clear approach for stream-based writing. Consider ensuring you handle any stream errors (e.g., fail states) upstream or downstream.
64-66
: Added overload for double timestamp ingestion is well-defined.
Ensure usage scenarios clearly differentiate between double and int64_t timestamps.
84-88
: clear() method for resetting internal state is helpful.
No issues noted. This is a clean approach for reuse.
components/core/src/clp/BoundedReader.hpp (8)
1-2
: Header guards and includes appear correct.
They are properly set up and there are no cyclical dependencies.
Also applies to: 4-5, 7-8
10-20
: Class documentation is clear and descriptive.
It effectively communicates the purpose and constraints of this reader.
23-33
: Constructor checks for null and enforces bound.
Your usage of “if (nullptr == m_reader)” is consistent with the coding style, and throwing on invalid input is valid. This helps maintain a strong contract.
35-45
: No concerns with doc comments for try_get_pos.
Implementation defers to m_reader, which is correct for a wrapper pattern.
53-53
: Method signature for try_seek_from_begin is consistent.
Be mindful to check whether the underlying reader supports rewinding if seeking a position lower than the current.
64-65
: Method signature for try_read is consistent with the interface.
Limiting read by bound is handled carefully downstream. No additional remarks.
73-80
: try_read_to_delimiter is intentionally unsupported.
Returning ErrorCode_Unsupported is appropriate, preventing leakage beyond the bound.
82-86
: Private member variables well declared and initialised.
No concurrency or performance issues identified.
components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py (5)
1-11
: Imports are properly organised.
No potential circular dependencies or extraneous imports noted.
12-16
: Use of clp_config and SQL_Adapter is consistent with the project structure.
Ensuring these modules are available in the environment is crucial.
21-46
: Argument parsing is direct and user-friendly.
Providing a default config path is a helpful convenience for new users.
56-67
: Verifies existence of archives directory prior to deletion.
Early return on failure is a clean approach.
138-139
: Correct invocation of main
via `if name == "main":
Standard practice to ensure code runs only as a script.
components/core/src/clp_s/TimestampDictionaryWriter.cpp (7)
3-6
: Includes align with header declarations.
No duplication or missing headers.
12-17
: write_timestamp_entries: direct write to stream is valid.
Eliminating compression may require reintroducing it at a different layer if needed.
[Architecture_advice]
21-32
: New write method merges ranges before writing.
This is consistent with the removal of file-based compression. Confirm that the function is idempotent if called multiple times.
47-49
: ingest_entry now uses std::string_view for key and timestamp.
This is good for performance. Ensure that key references remain valid as needed.
93-95
: Overload for double timestamp ingestion.
The logic is straightforward. No concurrency concerns identified.
108-108
: Overload for int64_t timestamp ingestion.
Behaviour closely parallels the double-based ingestion. No issues detected.
156-162
: clear method: reset all members.
This is a neat approach for reusability. No pitfalls identified.
components/core/src/clp/error_handling/ErrorCode.hpp (3)
15-19
: Ensure concept constraints are fully enforced.
The new concept "ErrorCodeEnumType" correctly ensures that only integer-based enumerations are used. This approach helps prevent misusage with incompatible types and is well-structured.
41-43
: Confirm robust enum casting.
When converting error numbers back into enums, confirm that the range of valid error codes is not exceeded. Though this seldom occurs with well-defined enumerations, consider adding a verification step if the enum grows large.
128-129
: Leverage the default error condition.
Using the default error condition simplifies equivalence checks. This is a straightforward and lightweight approach to error comparison. No issues or race conditions identified.
components/core/src/clp/ffi/KeyValuePairLogEvent.hpp (2)
46-49
: Validate schema pointers before usage.
The factory method ensures that schema pointers are non-null before constructing the object. Continue to propagate an error code if a null pointer is encountered to prevent unexpected dereferences.
119-122
: Constructor usage of std::move is correct.
The constructor safely moves parameters into member fields. This avoids unnecessary copying when constructing large containers for auto-generated and user-generated keys.
components/core/tests/test-clp_s-end_to_end.cpp (3)
30-44
: Good cleanup strategy.
The TestOutputCleaner class responsibly removes all intermediate files and directories. This practice ensures that test results remain deterministic and free from leftover data.
62-87
: Appropriate usage of test assertions and filesystem checks.
The logic properly ensures that the directory is created before compression and that it is not empty afterwards. Using "false == std::filesystem::is_empty(...)" matches coding guidelines.
118-143
: Safe usage of std::system in a controlled test environment.
Though shelling out can be risky, the test environment is strictly controlled, and the checks for "jq" and "diff" are valid. This approach does not appear to introduce vulnerabilities, given the context and the cleanup protections.
components/core/src/clp_s/ArchiveWriter.hpp (2)
26-26
: single_file_archive flag is a welcome addition.
It accommodates a more streamlined archiving process, enabling single-file versus multi-file approaches. This design choice is clear and well-documented.
125-130
: Efficient usage of std::string_view.
Switching to std::string_view for timestamp ingestion helps reduce copies, improving performance in high-throughput scenarios. This aligns with modern best practices.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)
17-33
: Validate readiness state before opening
The code checks if m_compressed_stream_file_writer is already set, and throws an exception if so. This is good defensive programming to ensure that the compressor is not opened multiple times without being closed first.
35-49
: Process unhandled input at close
The logic processes unhandled input data by performing a flush before finalizing the LZMA stream. This ensures pending data is not lost. Ensure that performing flush_lzma(LZMA_FULL_FLUSH) before LZMA_FINISH is universally the intended behaviour.
components/job-orchestration/job_orchestration/executor/compress/fs_compression_task.py (2)
Line range hint 141-172
: Review logic branching for S3 vs. FS pathways
The function run_clp() now checks worker_config to decide if S3 writes should be enabled. The condition that "S3 not supported by CLP engine except for CLP_S" is enforced with an error message. This ensures clarity and prevents misconfiguration.
239-283
: Handle partial archive uploads gracefully
While the code attempts immediate S3 uploads once an archive is complete, it terminates the process upon encountering an error. Evaluate whether partial completion logic or retries are required for large archives. In certain network conditions, immediate termination might hamper recovery strategies.
components/core/src/clp_s/ArchiveWriter.cpp (3)
18-25
: Leverage std::filesystem for portability
Switching to std::filesystem is correct for better portability. Ensure consistent usage of error codes and standard library calls.
104-118
: Ensure robust checks for empty timestamps
In write_timestamp_dict(), we compress the dictionary content. Confirm that an empty or uninitialised dictionary is valid in all usage scenarios. Potentially add an early return or a log statement if the dictionary is empty.
138-175
: ZstdCompressor usage for metadata
Usage of ZstdCompressor for writing metadata is consistent. Confirm that the metadata content remains feasible for Zstd and does not need an alternate algorithm for partial reading. Since metadata is likely smaller than logs, this approach is likely efficient.
components/clp-py-utils/clp_py_utils/clp_config.py (3)
51-54
: Validate new storage types carefully
The new StorageType enumeration (FS, S3) broadens config options. Ensure that code gracefully handles unrecognised values, even if that means raising a clear error early.
81-82
: Confirm secure password handling
Database credentials are stored in optional fields. Consider verifying that these credentials are masked or avoided in logs to maintain secrecy.
638-640
: WorkerConfig fosters reusability
The WorkerConfig class inherits from BaseModel, enabling reuse for different tasks. Confirm that no extraneous fields appear in WorkerConfig for scenarios where streaming is not used.
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp (2)
272-277
: Ensure comprehensive sibling and ancestor checks.
The code properly checks for key uniqueness among siblings and iterates upward to catch duplication among ancestors. Review any potential performance impacts of nested loops where large schema trees might exist.
Also applies to: 278-303
Line range hint 429-506
: Robustness of DFS-based serialization.
The DFS approach for partial JSON assembly is clear. If numbers of nested or child nodes grow large, confirm that this stack-based approach remains performant and doesn’t risk deep recursion limits.
components/core/tests/test-ffi_KeyValuePairLogEvent.cpp (2)
85-103
: 中央ize repeated error-checking logic.
The helper function 'assert_kv_pair_log_event_creation_failure' centralizes repeating logic and clarifies expected errors. This is a good approach for test consistency.
497-517
: Confirm consistent handling of sibling key conflicts.
Tests for sibling key conflicts appear thorough. If multiple sibling conflicts can occur simultaneously, consider adding a second pair of duplicate siblings for deeper coverage.
components/core/src/clp_s/CommandLineArguments.cpp (1)
151-153
: Defaulting file_type to JSON.
Setting 'file_type' to JSON by default is reasonable. Ensure that calls from other modules expecting KV-IR are updated accordingly, to avoid accidental mismatches.
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (3)
32-32
: Check usage of 'StorageType'.
The newly imported 'StorageType' is used below to handle mounting logic for queries. This is consistent and clearly extends existing logic for filesystem-based storage.
46-46
: Good usage of 'generate_worker_config'.
Centralizing worker configuration creation promotes maintainability. Confirm that all relevant worker settings are included, especially for newly added compression features.
657-659
: Conditional mount for FS-based storage.
Adding 'mounts.archives_output_dir' only when 'StorageType.FS' is used is a sensible approach. Confirm that existing monitor logic triggers an error if the path is missing or offline.
components/clp-py-utils/pyproject.toml (1)
13-13
: Verify compatibility and security of new dependencies.
Please ensure:
- The specific version of boto3 (^1.35.81) is required, as it's quite precise
- The result library (^0.17.0) has no known security vulnerabilities
Let's check for any security advisories:
Also applies to: 23-23
✅ Verification successful
Dependencies are up-to-date and secure
The verification shows:
- boto3: Latest version is 1.35.85 (very close to specified 1.35.81)
- result: Using latest version 0.17.0
- No security vulnerabilities found for either package
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for security advisories and latest versions
# Check PyPI for latest versions and known vulnerabilities
for package in "boto3" "result"; do
echo "Checking $package..."
curl -s "https://pypi.org/pypi/$package/json" | jq -r '.info | "Latest version: " + .version'
# Check for security advisories
gh api graphql -f query="{
securityVulnerabilities(first: 5, ecosystem: PIP, package: \"$package\") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}"
done
Length of output: 1121
components/core/src/clp/clo/constants.hpp (1)
8-8
: Verify the impact of renamed constants.
The constants have been renamed:
- Added
StreamId
(likely replacing removedFileSplitId
) - Changed
IsLastIrChunk
toIsLastChunk
Let's verify all usages have been updated:
Also applies to: 11-11
✅ Verification successful
Constants have been properly renamed and updated throughout the codebase
The search results confirm that:
- No references to old constants (
FileSplitId
orIsLastIrChunk
) exist in the codebase - The new constants
StreamId
andIsLastChunk
are consistently used in both declaration and usage:- Declared in
components/core/src/clp/clo/constants.hpp
- Used appropriately in
components/core/src/clp/clo/clo.cpp
- Similar constants exist in a separate namespace in
archive_constants.hpp
with different prefixes (cStreamId
,cIsLastChunk
)
- Declared in
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining references to old constant names
echo "Checking for old constant references..."
rg -l "FileSplitId|IsLastIrChunk" --type cpp --type hpp
echo "Verifying new constant usage..."
rg -A 2 "StreamId|IsLastChunk" --type cpp --type hpp
Length of output: 330
Script:
#!/bin/bash
# Description: Check for any remaining references to old and new constant names using correct file extensions
echo "Checking for old constant references..."
rg -l "FileSplitId|IsLastIrChunk" --type-add 'cpp:*.{cpp,hpp}' --type cpp
echo "Verifying new constant usage..."
rg -A 2 "StreamId|IsLastChunk" --type-add 'cpp:*.{cpp,hpp}' --type cpp
Length of output: 2373
.github/workflows/clp-pr-title-checks.yaml (2)
5-9
: Well-implemented security enhancements!
The changes follow security best practices by:
- Documenting security implications of
pull_request_target
- Following the principle of least privilege with empty default permissions
Also applies to: 13-13
Line range hint 23-24
: LGTM - Explicit permission settings
Properly restricts permissions to only what's needed for the semantic PR check.
components/core/src/clp/ffi/ir_stream/IrErrorCode.cpp (1)
8-11
: LGTM - Clear category identification
The implementation provides a clear, namespaced identifier for the error category.
components/core/tools/scripts/lib_install/install-boost.sh (1)
37-37
: Document dependency addition and verify compatibility
While adding the 'url' library is straightforward, please:
- Add a comment explaining why this library is required
- Verify that the 'url' library is available in all supported Boost versions
Run this script to check 'url' library availability across versions:
✅ Verification successful
Let me try a different approach to find the Boost version requirements.
Boost URL library compatibility verified
From the CMakeLists.txt, I can see that the project requires Boost 1.81 or higher, which includes the URL library as a stable component. The URL library was officially added to Boost in version 1.81.0 (December 2022), so there are no compatibility concerns.
Suggested documentation to add above line 37:
# URL library required for handling URIs in network communications (added in Boost 1.81.0)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if 'url' library exists in supported Boost versions
# Get the minimum required Boost version from CMakeLists.txt
boost_version=$(fd -t f "CMakeLists.txt" | xargs grep -l "find_package.*Boost" | xargs grep "find_package.*Boost" | grep -oP "(?<=\s)\d+\.\d+\.\d+(?=\s|$)" | head -1)
echo "Checking Boost version $boost_version for 'url' library..."
curl -s "https://www.boost.org/doc/libs/$boost_version/libs/url/doc/html/index.html" -o /dev/null -w "%{http_code}\n"
Length of output: 384
Script:
#!/bin/bash
# Search for Boost version requirements in multiple potential locations
echo "=== Searching in CMakeLists.txt files ==="
rg -i "boost" --type cmake -A 3 -B 3
echo -e "\n=== Searching in build configuration files ==="
fd -e sh "build|install" | xargs rg -i "boost.*version|version.*boost" -A 2 -B 2
echo -e "\n=== Checking install-boost.sh for version info ==="
cat components/core/tools/scripts/lib_install/install-boost.sh
Length of output: 8063
components/core/src/clp/BoundedReader.cpp (1)
8-19
: LGTM: Robust seek implementation with proper bounds checking
The implementation correctly handles boundary conditions and maintains consistent state even during errors.
components/core/src/clp/StringReader.cpp (1)
44-47
: LGTM: Proper bounds checking added
The implementation correctly handles seeking beyond the string bounds.
components/core/src/clp_s/archive_constants.hpp (2)
7-9
: LGTM: Temporary file postfix constant added
The addition of cTmpPostfix
follows the established naming convention and is appropriately placed in the constants namespace.
32-32
: Verify the impact of terminology changes
The renaming from file-based to stream-based terminology (cOrigFileId
→ cStreamId
and cIsLastIrChunk
→ cIsLastChunk
) suggests an architectural shift. Ensure all dependent code has been updated accordingly.
Also applies to: 35-35
components/core/src/clp_s/search/StringLiteral.hpp (1)
72-74
: Verify the implementation of StringUtils::has_unescaped_wildcards
The refactoring improves readability by moving the wildcard detection logic to a utility function. However, we should verify its implementation.
✅ Verification successful
Implementation of StringUtils::has_unescaped_wildcards is correct and well-designed
The implementation in Utils.cpp correctly checks for unescaped wildcards ('*' or '?') while properly handling escape sequences. The function:
- Iterates through the string character by character
- Returns true if it finds an unescaped '*' or '?'
- Properly skips escaped characters by incrementing the index when a backslash is encountered
- Is being used consistently across the codebase (found in Output.cpp and StringLiteral.hpp)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for the implementation of has_unescaped_wildcards
ast-grep --pattern 'bool has_unescaped_wildcards($_) {
$$$
}'
Length of output: 65
Script:
#!/bin/bash
# Let's try searching for the implementation with ripgrep
rg -A 10 "has_unescaped_wildcards" --type cpp
# Also search for the header declaration
rg "has_unescaped_wildcards" --type hpp
Length of output: 3255
components/core/src/clp_s/ParsedMessage.hpp (3)
7-7
: LGTM: Appropriate header inclusion for string_view support
The addition of string_view header supports the new method overloads and follows modern C++ practices.
39-41
: LGTM: Well-implemented string_view overload
The implementation efficiently handles string_view parameters by converting to string only when storing in the map, which optimizes string handling performance.
64-66
: LGTM: Consistent implementation of unordered value handling
The implementation maintains consistency with the regular add_value method and properly handles string_view conversion.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1)
60-63
: LGTM: Well-documented schema tree separation
The separation of auto-generated and user-generated schema trees is well-implemented and thoroughly documented. The parameter names are descriptive and the documentation clearly explains their purposes.
Verify schema tree parameter usage
Ensure that all callers of this method have been updated to provide both schema tree parameters.
Also applies to: 77-78
✅ Verification successful
Let me gather more context to see the full method call in the Deserializer.hpp file.
Schema tree parameter usage verified successfully
The caller in Deserializer.hpp correctly provides both schema tree parameters (m_auto_gen_keys_schema_tree
and m_user_gen_keys_schema_tree
) when invoking the deserialize_ir_unit_kv_pair_log_event
method. This is the only caller of this method in the codebase, and it has been properly updated to match the new method signature.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for calls to deserialize_ir_unit_kv_pair_log_event
rg -A 3 "deserialize_ir_unit_kv_pair_log_event\(" --type cpp
Length of output: 1393
Script:
#!/bin/bash
# Get more context around the method call in Deserializer.hpp
rg -B 2 -A 5 "deserialize_ir_unit_kv_pair_log_event\(" components/core/src/clp/ffi/ir_stream/Deserializer.hpp
Length of output: 423
components/clp-package-utils/clp_package_utils/scripts/compress.py (1)
61-61
: LGTM: Type conversion ensures compatibility
The explicit conversion of JobType.COMPRESSION to string aligns with the updated signature of generate_container_name function.
components/core/tests/test-BoundedReader.cpp (1)
12-99
: LGTM: Comprehensive test coverage
The test suite thoroughly covers essential scenarios including boundary conditions, error cases, and normal operations.
components/core/src/clp_s/TimestampPattern.hpp (1)
9-10
: LGTM: Performance optimization using string_view
The change from std::string
to std::string_view
for method parameters is a good optimization as it:
- Avoids unnecessary string copies for read-only operations
- Maintains backward compatibility with both string literals and std::string
- Reduces memory overhead
Also applies to: 88-91, 126-130
components/core/tests/test-StreamingCompression.cpp (4)
Line range hint 7-47
: LGTM! Well-structured includes and constants.
The includes are properly ordered, and constants are appropriately scoped in an anonymous namespace. The use of constexpr
for buffer and chunk sizes is a good practice for compile-time optimization.
Line range hint 49-105
: LGTM! Well-designed helper functions.
The helper functions are well-structured:
compress
: Efficiently handles compression in chunksdecompress_and_compare
: Thoroughly validates the decompressed data against the original
Note: The repeated compression of the same data is intentional for testing purposes.
108-119
: LGTM! Effective test data generation.
The test setup creates a predictable pattern using the alphabet, which is ideal for compression testing as it provides both repetition and variability.
134-141
:
Add decompression test for LZMA.
The LZMA section only tests compression, unlike the ZStd and Passthrough sections which test both compression and decompression. This creates an inconsistency in test coverage.
Add decompression testing similar to other sections:
SECTION("LZMA compression") {
compressor = std::make_unique<clp::streaming_compression::lzma::Compressor>(
clp::streaming_compression::lzma::cDefaultCompressionLevel,
clp::streaming_compression::lzma::cDefaultDictionarySize,
LZMA_CHECK_CRC64
);
compress(std::move(compressor), uncompressed_buffer.data());
+ decompressor = std::make_unique<clp::streaming_compression::lzma::Decompressor>();
+ decompress_and_compare(std::move(decompressor), uncompressed_buffer, decompressed_buffer);
}
Likely invalid or redundant comment.
components/core/src/clp/clp/FileDecompressor.hpp (1)
9-19
: LGTM! Minor path and parameter name updates.
The changes only involve:
- Updating include paths to use relative paths
- Renaming parameter from
is_last_ir_chunk
tois_last_chunk
for better clarity
Also applies to: 42-42
components/core/src/clp/clo/OutputHandler.cpp (1)
103-103
: LGTM! Consistent namespace usage.
The update to use cResultsCacheKeys::SearchOutput::
namespace for BSON document keys improves code organization and maintainability.
components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py (1)
55-55
: Verify the necessity of increasing the VARCHAR length.
The change increases the maximum length of status messages from 255 to 512 characters. While this change is backwards compatible, it could impact storage requirements if there are many rows.
✅ Verification successful
Increasing VARCHAR length to 512 is justified
The codebase analysis reveals that status_msg
is primarily used to store error messages that are concatenated from multiple sources:
- Error messages from S3 client errors (including AWS error messages)
- Multiple task errors concatenated with newlines (
error_message += f"task {task_result.task_id}: {task_result.error_message}\n"
) - Detailed error information including log file paths (
error_msgs.append(f"See logs {stderr_log_path}")
)
These concatenated messages, especially when including multiple task failures and log paths, could easily exceed 255 characters, making the increase to 512 a necessary change to prevent message truncation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if there are any status messages that would benefit from the increased length.
# Search for status message assignments in the codebase
rg -A 2 'status_msg.*='
# Search for any existing truncation handling
rg 'truncate.*status_msg|status_msg.*truncate'
Length of output: 765
Script:
#!/bin/bash
# Search for status message content or error messages that might be stored in status_msg
rg -A 3 'status_msg.*=.*' --type py
# Search for SQL operations involving status_msg to understand its usage
rg -A 2 'status_msg.*FROM|INSERT.*status_msg|UPDATE.*status_msg' --type py
# Look for any error message construction that might be stored in status_msg
rg -A 2 'error_message.*=|error_msg.*=' --type py
Length of output: 6970
components/core/src/clp_s/JsonConstructor.cpp (1)
125-125
: Verify MongoDB queries for renamed fields.
The field names in the BSON document have been changed:
cOrigFileId
→cStreamId
cIsLastIrChunk
→cIsLastChunk
These changes could impact existing MongoDB queries and data.
Also applies to: 137-137
components/core/tests/test-error_handling.cpp (1)
46-93
: LGTM! Well-implemented error category methods.
The implementation is clean and efficient:
- Proper use of
std::any_of
for condition matching - Comprehensive switch statements with default cases
- Consistent error message handling
components/core/src/clp/streaming_compression/lzma/Compressor.hpp (1)
89-180
: Well-designed LZMA stream wrapper.
The LzmaStream
class effectively encapsulates LZMA functionality with proper:
- Resource management
- Error handling
- Move semantics
- Input/output stream management
components/clp-package-utils/clp_package_utils/scripts/decompress.py (1)
8-8
: LGTM: Clean import addition
The addition of StorageType from clp_py_utils.clp_config is well-placed and necessary for the new storage type validation.
components/core/src/clp/ffi/SchemaTree.hpp (1)
131-131
: LGTM: Appropriate use of defaulted operator==
The defaulted equality operator is appropriate here as it will perform member-wise comparison of all Node members.
components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (1)
Line range hint 1-1
: Unable to review the file.
The file's code changes are not visible in the provided content, making it impossible to perform a meaningful review.
components/core/tests/test-NetworkReader.cpp (2)
199-199
: Good type choice improvement.
Changed from int
to size_t
for better type consistency with container sizes.
227-228
: Good error handling with optional.
Using std::optional
for content provides clear error handling and state management.
components/core/src/clp_s/TimestampEntry.cpp (1)
4-6
: LGTM: Clean header organization.
The addition of the stream header and proper ordering with local headers is correct.
components/core/src/reducer/reducer_server.cpp (2)
208-209
: LGTM: Consistent timer API usage with improved readability
The change maintains consistency with the modern boost::asio timer API while improving code readability through better line formatting.
124-124
: LGTM: Timer API update follows best practices
The change from expires_from_now
to expires_after
aligns with modern boost::asio timer API recommendations.
✅ Verification successful
Timer API update is consistently implemented
The codebase shows consistent usage of the modern expires_after
API throughout, with no instances of the deprecated expires_from_now
. The timer implementation is properly using boost::asio::steady_timer with the recommended API.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistent usage of expires_after across the codebase
rg -l "expires_from_now"
Length of output: 24
Script:
#!/bin/bash
# Search for both expires_from_now and expires_after to understand the timer API usage
rg "expires_(from_now|after)" -A 2
Length of output: 670
Script:
#!/bin/bash
# Look for timer-related includes to verify boost::asio usage
rg "boost.*timer" --type cpp
Length of output: 245
Script:
#!/bin/bash
# Search for timer-related class and method definitions
ast-grep --pattern 'class $_ {
$$$
operator()($$$) {
$$$
expires_$_($$$);
$$$
}
}'
Length of output: 102
components/core/src/clp_s/clp-s.cpp (1)
91-99
: LGTM: Configuration options properly initialized
The new configuration options are correctly initialized from command line arguments, enhancing the flexibility of the JSON parser.
components/clp-package-utils/clp_package_utils/general.py (3)
244-254
: LGTM: Improved encapsulation through proper method usage
The change properly utilizes setter and getter methods instead of direct property access, improving encapsulation and maintainability.
273-282
: LGTM: Well-structured worker configuration generation
The new function properly encapsulates worker configuration generation with appropriate deep copying to prevent shared state issues.
499-499
: LGTM: Enhanced archive output validation
The change to use validate_archive_output_config suggests more comprehensive validation, which aligns with the S3 integration objectives.
✅ Verification successful
Method change verified and aligns with S3 integration requirements
The implementation confirms that validate_archive_output_config
performs additional S3-specific validation, ensuring that S3 storage type is only used with the appropriate storage engine (CLP_S). This change is necessary and correctly implemented across the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistent validation method usage
rg "validate_archive_output"
Length of output: 359
Script:
#!/bin/bash
# Get implementation of validate_archive_output_config
ast-grep --pattern 'def validate_archive_output_config($$$)'
Length of output: 62
Script:
#!/bin/bash
# Get implementation details using ripgrep with context
rg -A 10 "def validate_archive_output_config" components/clp-py-utils/clp_py_utils/clp_config.py
Length of output: 577
components/core/CMakeLists.txt (1)
537-539
:
Missing LZMA decompressor implementation
The LZMA compression implementation is incomplete. Other compression methods (zstd, passthrough) have both compressor and decompressor implementations.
Required files:
src/clp/streaming_compression/lzma/Decompressor.cpp
src/clp/streaming_compression/lzma/Decompressor.hpp
Would you like me to help create these files?
⛔ Skipped due to learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.cpp:169-170
Timestamp: 2024-12-19T01:39:30.094Z
Learning: In the LzmaStreamOperations context (Compressor.cpp), m_p is guaranteed never to be null.
components/core/src/clp_s/JsonParser.cpp (1)
40-40
: LGTM: Proper initialization of single file archive option
The initialization of m_archive_options.single_file_archive
is correctly placed in the constructor's initialization list and properly copies the value from the constructor parameter.
components/core/src/clp/clo/CommandLineArguments.cpp (1)
11-11
: LGTM: Added necessary include for reducer types
The addition of the reducer types header is appropriate for the functionality.
components/core/src/clp_s/TimestampPattern.cpp (4)
7-8
: LGTM: Proper include and using declarations for string_view
The addition of string_view-related includes and using directive is appropriate for the string handling improvements.
Also applies to: 17-17
77-77
: LGTM: Efficient string parameter passing using string_view
Changed string parameters to string_view for better performance by avoiding unnecessary string copies.
Also applies to: 95-95, 131-131, 160-160, 312-312, 348-348
1073-1073
: LGTM: Consistent string conversion usage
Changed std::to_string to to_string for consistency with the using directive at the top of the file.
Also applies to: 1080-1080
833-846
: Verify the floating-point timestamp parsing logic
The changes to floating-point timestamp parsing look correct but should be verified with test cases, particularly around the nanosecond precision handling.
✅ Verification successful
Floating-point timestamp parsing is well-tested
The test file shows comprehensive testing of timestamp parsing, including:
- Millisecond precision with both dot (.) and comma (,) separators
- Tests for microsecond (%Remove redundant LICENSE files #6) and nanosecond (%Fix workflows to run on main rather than the dev branch #9) precision timestamps
- Validation of timestamp components before and after the decimal point
- Various timestamp formats and patterns
The changes to the floating-point timestamp parsing are supported by existing test cases that verify both the parsing logic and precision handling.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for test cases that verify floating-point timestamp parsing
rg -A 5 "TEST.*floating.*point.*timestamp" || rg -A 5 "TEST.*nanosecond.*precision"
Length of output: 84
Script:
#!/bin/bash
# Search for test files related to TimestampPattern
fd "Test.*\.cpp" | grep -i "timestamp"
# Then search for test cases involving timestamp parsing
rg -A 10 "TEST.*TimestampPattern"
# Look for any test cases with decimal points or dots in timestamps
rg -A 5 "TEST.*[0-9]+\.[0-9]+" components/core/test
Length of output: 1037
Script:
#!/bin/bash
# Let's examine the test cases in test-TimestampPattern.cpp
cat components/core/tests/test-TimestampPattern.cpp
# Also search for any test cases with nanosecond timestamps
rg -A 5 "\..*[0-9]{9}" components/core/tests/test-TimestampPattern.cpp
Length of output: 23149
components/core/tests/test-ir_encoding_methods.cpp (2)
1249-1249
: Method name change reflects improved specificity.
The change from get_node_id_value_pairs()
to get_user_gen_node_id_value_pairs()
better represents the actual data being retrieved, making the test more precise.
1254-1256
: Enhanced test coverage with explicit data type validation.
The new code properly separates and validates auto-generated and user-generated data:
- Destructures the serialization result into auto-generated and user-generated components
- Verifies that auto-generated data is empty
- Validates that user-generated data matches the expected JSON
components/core/src/clp_s/CommandLineArguments.hpp (4)
39-42
: LGTM! Well-structured enum definition.
The FileType
enum is well-defined with clear values and uses space-efficient uint8_t
as the underlying type.
110-110
: LGTM! Clear and concise getter method.
The method follows the getter naming convention and has a straightforward implementation.
124-124
: LGTM! Well-implemented getter with modern C++ practices.
The method uses [[nodiscard]]
appropriately and follows the codebase's trailing return type convention.
188-188
: LGTM! Well-structured member variables with appropriate defaults.
The member variables follow naming conventions and have sensible default values.
Also applies to: 194-194
components/core/src/clp/clo/clo.cpp (1)
174-174
: LGTM! Consistent and well-documented changes.
The changes improve clarity through better naming and align with the updated functionality:
- Parameter name change from
is_last_ir_chunk
tois_last_chunk
improves readability - Key usage updated from
OrigFileId
toStreamId
reflects the new schema - Output directory method call updated to use the new functionality
Also applies to: 198-198, 210-211, 223-223
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
554-555
: LGTM! Changes align with shared_ptr requirements.
The modifications correctly implement the requirement for std::shared_ptr<SchemaTree>
in KeyValuePairLogEvent::create
, as noted in the learnings. The separation of auto-generated and user-generated schema trees improves code organization and type safety.
Also applies to: 583-585
def _delete_archives( | ||
archives_dir: Path, | ||
database_config: Database, | ||
begin_ts: int, | ||
end_ts: int, | ||
) -> int: | ||
""" | ||
Deletes all archives where `begin_ts <= archive.begin_timestamp` and | ||
`archive.end_timestamp <= end_ts` from both the metadata database and disk. | ||
:param archives_dir: | ||
:param database_config: | ||
:param begin_ts: | ||
:param end_ts: | ||
:return: 0 on success, -1 otherwise. | ||
""" | ||
|
||
archive_ids: List[str] | ||
logger.info("Starting to delete archives from the database.") | ||
try: | ||
sql_adapter = SQL_Adapter(database_config) | ||
clp_db_connection_params = database_config.get_clp_connection_params_and_type(True) | ||
table_prefix = clp_db_connection_params["table_prefix"] | ||
with closing(sql_adapter.create_connection(True)) as db_conn, closing( | ||
db_conn.cursor(dictionary=True) | ||
) as db_cursor: | ||
db_cursor.execute( | ||
f""" | ||
DELETE FROM `{table_prefix}archives` | ||
WHERE begin_timestamp >= %s AND end_timestamp <= %s | ||
RETURNING id | ||
""", | ||
(begin_ts, end_ts), | ||
) | ||
results = db_cursor.fetchall() | ||
|
||
if 0 == len(results): | ||
logger.info("No archives (exclusively) within the specified time range.") | ||
return 0 | ||
|
||
archive_ids = [result["id"] for result in results] | ||
db_cursor.execute( | ||
f""" | ||
DELETE FROM `{table_prefix}files` | ||
WHERE archive_id in ({', '.join(['%s'] * len(archive_ids))}) | ||
""", | ||
archive_ids, | ||
) | ||
db_conn.commit() | ||
except Exception: | ||
logger.exception("Failed to delete archives from the database. Aborting deletion.") | ||
return -1 | ||
|
||
logger.info(f"Finished deleting archives from the database.") | ||
|
||
for archive_id in archive_ids: | ||
archive_path = archives_dir / archive_id | ||
if not archive_path.is_dir(): | ||
logger.warning(f"Archive {archive_id} is not a directory. Skipping deletion.") | ||
continue | ||
|
||
logger.info(f"Deleting archive {archive_id} from disk.") | ||
shutil.rmtree(archive_path) | ||
|
||
logger.info(f"Finished deleting archives from disk.") | ||
|
||
return 0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comprehensive _delete_archives
function.
Involves database operations and file system updates. Ensure that partial successes (e.g., database updated but partial disk deletion) are either handled or documented.
Consider a transaction or rollback approach, or at least confirm partial cleanup if an error occurs during disk deletion.
void ArchiveWriter::write_single_file_archive(std::vector<ArchiveFileInfo> const& files) { | ||
std::string single_file_archive_path = (std::filesystem::path(m_archives_dir) / m_id).string(); | ||
FileWriter archive_writer; | ||
archive_writer.open(single_file_archive_path, FileWriter::OpenMode::CreateForWriting); | ||
|
||
write_archive_metadata(archive_writer, files); | ||
size_t metadata_section_size = archive_writer.get_pos() - sizeof(ArchiveHeader); | ||
write_archive_files(archive_writer, files); | ||
m_compressed_size = archive_writer.get_pos(); | ||
write_archive_header(archive_writer, metadata_section_size); | ||
|
||
archive_writer.close(); | ||
std::error_code ec; | ||
if (false == std::filesystem::remove(m_archive_path, ec)) { | ||
throw OperationFailed(ErrorCodeFileExists, __FILENAME__, __LINE__); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporary directory cleanup
In write_single_file_archive(), the code removes the temporary directory (m_archive_path). Confirm that it contains only ephemeral data or data that is fully transferred. For large archives, a partial error scenario could result in data loss if not carefully handled.
if src_file.stat().st_size > 5 * 1024 * 1024 * 1024: | ||
return Err(f"{src_file} is larger than the limit (5GiB) for a single PutObject operation.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Extract file size limit to a constant
The 5 GiB limit should be defined as a named constant at the module level for better maintainability.
+# Maximum file size for S3 PutObject operation (5 GiB)
+MAX_S3_PUT_OBJECT_SIZE = 5 * 1024 * 1024 * 1024
+
def s3_put(
s3_config: S3Config, src_file: Path, dest_file_name: str, total_max_attempts: int = 3
) -> Result[bool, str]:
# ...
- if src_file.stat().st_size > 5 * 1024 * 1024 * 1024:
+ if src_file.stat().st_size > MAX_S3_PUT_OBJECT_SIZE:
Committable suggestion skipped: line range outside the PR's diff.
# Remove generated files | ||
generated_config_path_on_host.unlink() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for file cleanup
The cleanup operation should handle potential failures when removing generated files.
- generated_config_path_on_host.unlink()
+ try:
+ if generated_config_path_on_host.exists():
+ generated_config_path_on_host.unlink()
+ except Exception as e:
+ logger.warning(f"Failed to clean up generated config file: {e}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# Remove generated files | |
generated_config_path_on_host.unlink() | |
# Remove generated files | |
try: | |
if generated_config_path_on_host.exists(): | |
generated_config_path_on_host.unlink() | |
except Exception as e: | |
logger.warning(f"Failed to clean up generated config file: {e}") |
except: | ||
logger.exception("Failed to load config.") | ||
return -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add specific error handling for different failure scenarios
The bare except clause catches all exceptions without distinguishing between different error types. This could mask specific issues that require different handling.
- except:
+ except (FileNotFoundError, PermissionError) as e:
+ logger.exception(f"Failed to access config file: {e}")
+ return -1
+ except ValueError as e:
+ logger.exception(f"Invalid config file format: {e}")
+ return -1
+ except Exception as e:
+ logger.exception(f"Unexpected error loading config: {e}")
+ return -1
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
except: | |
logger.exception("Failed to load config.") | |
return -1 | |
except (FileNotFoundError, PermissionError) as e: | |
logger.exception(f"Failed to access config file: {e}") | |
return -1 | |
except ValueError as e: | |
logger.exception(f"Invalid config file format: {e}") | |
return -1 | |
except Exception as e: | |
logger.exception(f"Unexpected error loading config: {e}") | |
return -1 |
cmd = container_start_cmd + del_archive_cmd | ||
subprocess.run(cmd, check=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for subprocess execution
The subprocess execution lacks proper error handling and logging for potential failures.
- subprocess.run(cmd, check=True)
+ try:
+ result = subprocess.run(cmd, check=True, capture_output=True, text=True)
+ except subprocess.CalledProcessError as e:
+ logger.error(f"Archive deletion failed: {e.stderr}")
+ return -1
+ except Exception as e:
+ logger.error(f"Failed to execute archive deletion: {e}")
+ return -1
Committable suggestion skipped: line range outside the PR's diff.
components/core/src/clp_s/Utils.hpp
Outdated
/** | ||
* Writes a numeric value to a stringstream. | ||
* @param stream | ||
* @param value | ||
* @tparam ValueType | ||
*/ | ||
template <typename ValueType> | ||
void write_numeric_value(std::stringstream& stream, ValueType value) { | ||
stream.write(reinterpret_cast<char*>(&value), sizeof(value)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Potential undefined behavior in write_numeric_value
The current implementation has several issues:
- Using reinterpret_cast on potentially unaligned data can cause undefined behavior
- No endianness handling for cross-platform compatibility
- No error handling for stream write failures
Consider this safer implementation:
template <typename ValueType>
-void write_numeric_value(std::stringstream& stream, ValueType value) {
- stream.write(reinterpret_cast<char*>(&value), sizeof(value));
+void write_numeric_value(std::stringstream& stream, ValueType value) {
+ // Use bit_cast for safe type punning
+ std::array<char, sizeof(ValueType)> buffer;
+ std::memcpy(buffer.data(), &value, sizeof(ValueType));
+
+ // Handle endianness if needed
+ if constexpr (sizeof(ValueType) > 1) {
+ // Convert to network byte order for consistent cross-platform behavior
+ // Implementation depends on your endianness requirements
+ }
+
+ if (!stream.write(buffer.data(), sizeof(ValueType))) {
+ throw std::runtime_error("Failed to write numeric value to stream");
+ }
+}
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All comments are straightforward suggested changes.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.hpp
Outdated
Show resolved
Hide resolved
Co-authored-by: Bingran Hu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (3)
41-43
: Improve warning message clarity.The warning message is verbose and can be more concise while maintaining its meaning.
- SPDLOG_WARN("Trying to close LZMA compressor with unprocessed input data. Processing and " - "flushing remaining data."); + SPDLOG_WARN("Processing and flushing remaining data before closing LZMA compressor.");
91-93
: Improve error message specificity for LZMA_BUF_ERROR.The error message could be more specific about the actual condition causing the error.
- SPDLOG_ERROR("LZMA compressor input stream is corrupt. No encoding " - "progress can be made."); + SPDLOG_ERROR("LZMA compression failed: Unable to make progress with available input and output buffers.");
179-200
: Refactor error message handling in LzmaStream constructor.Consider using a map or constexpr function to handle error messages more elegantly.
- char const* msg{nullptr}; - switch (rc) { - case LZMA_MEM_ERROR: - msg = "Memory allocation failed"; - break; - case LZMA_OPTIONS_ERROR: - msg = "Specified preset is not supported"; - break; - case LZMA_UNSUPPORTED_CHECK: - msg = "Specified integrity check is not supported"; - break; - case LZMA_PROG_ERROR: - msg = "Input arguments are not sane"; - break; - default: - msg = "Unknown error"; - break; - } + static constexpr auto get_error_message(lzma_ret code) -> char const* { + switch (code) { + case LZMA_MEM_ERROR: + return "Memory allocation failed"; + case LZMA_OPTIONS_ERROR: + return "Specified preset is not supported"; + case LZMA_UNSUPPORTED_CHECK: + return "Specified integrity check is not supported"; + case LZMA_PROG_ERROR: + return "Input arguments are not sane"; + default: + return "Unknown error"; + } + } + auto const msg = get_error_message(rc);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Compressor.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/core/src/clp/streaming_compression/lzma/Compressor.hpp
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.cpp:169-170
Timestamp: 2024-12-19T01:39:30.094Z
Learning: In the LzmaStreamOperations context (Compressor.cpp), m_p is guaranteed never to be null.
🔇 Additional comments (1)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
148-151
: 🛠️ Refactor suggestion
Add error handling for write operations.
The write operation to the file writer should be wrapped in a try-catch block to handle potential I/O errors.
+ try {
m_compressed_stream_file_writer->write(
clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
cCompressedStreamBlockBufferSize - m_lzma_stream.avail_out()
);
+ } catch (const std::exception& e) {
+ SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+ throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+ }
Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of the code's behavior looks weird to me:
- In
write()
, we always calldetach_lzma_input
before exiting, which will setm_lzma_stream.avail_in()
to be 0. - In close(), we call
flush_lzma(LZMA_FULL_FLUSH)
if(m_lzma_stream.avail_in() > 0)
. Because of (1), this should never happen. - In flush(), we always call flush_lzma(LZMA_SYNC_FLUSH); Similarly, because of (1),
m_lzma_stream.avail_in() = 0
It looks to me that there's some contradiction.
What I read from 2 is that flush_lzma(LZMA_SYNC_FLUSH) has something to do with avail_in. but in 3, we are not checking the avail_in at all.
I didn't read LZMA functions carefully, but my guess is that:
flush_lzma(LZMA_SYNC_FLUSH)
has something to do with the data internally buffered by lzma_string. so we should call it in flush()
flush_lzma(LZMA_FINISH)
may or may not handle the internally buffered data.
- If it does not handle the internally buffered data, shouldn't
close()
always callflush_lzma(LZMA_SYNC_FLUSH)
before callingflush_lzma(LZMA_FINISH)
? - If it handle the internally buffered data,
close()
won't need to callflush_lzma(LZMA_SYNC_FLUSH)
.
If what my guess is wrong and there are some subtlety in the lzma flush API, then we should document it in flush_lzma()'s docstring,
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/streaming_compression/lzma/Compressor.hpp
Outdated
Show resolved
Hide resolved
For (2) I'll make it a class operation error instead of spdlog error |
I see, yeah I think the change is fair |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (3)
17-33
: Consider adding documentation for the open method.The method handles initialization well, but adding documentation about its purpose and possible exceptions would improve maintainability.
35-47
: Consider handling remaining data in close().The method throws an error if there's unprocessed input data. Consider processing the remaining data before closing to ensure no data is lost.
if (m_lzma_stream.avail_in() > 0) { - throw OperationFailed(ErrorCode_Corrupt, __FILENAME__, __LINE__); + SPDLOG_WARN("Processing remaining data before closing"); + encode_lzma(); }
160-202
: Consider separating error handling into a dedicated method.The error handling logic in the constructor is quite lengthy. Consider extracting it into a separate private method for better maintainability.
+ private: + static auto get_error_message(lzma_ret rc) -> const char* { + switch (rc) { + case LZMA_MEM_ERROR: + return "Memory allocation failed"; + case LZMA_OPTIONS_ERROR: + return "Specified preset is not supported"; + case LZMA_UNSUPPORTED_CHECK: + return "Specified integrity check is not supported"; + case LZMA_PROG_ERROR: + return "Input arguments are not sane"; + default: + return "Unknown error"; + } + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
(1 hunks)components/core/src/clp/streaming_compression/lzma/Compressor.hpp
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/core/src/clp/streaming_compression/lzma/Compressor.hpp
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.cpp:169-170
Timestamp: 2024-12-19T01:39:30.094Z
Learning: In the LzmaStreamOperations context (Compressor.cpp), m_p is guaranteed never to be null.
🔇 Additional comments (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)
1-16
: LGTM! Well-organized includes and namespace declaration.
The includes are properly organized and all necessary dependencies are present.
142-158
: 🛠️ Refactor suggestion
Handle potential write failures.
The write operation to m_compressed_stream_file_writer should be wrapped in a try-catch block to handle potential I/O errors.
+ try {
m_compressed_stream_file_writer->write(
clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()),
cCompressedStreamBlockBufferSize - m_lzma_stream.avail_out()
);
+ } catch (const std::exception& e) {
+ SPDLOG_ERROR("Failed to write compressed data: {}", e.what());
+ throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+ }
Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
components/core/tests/test-StreamingCompression.cpp (1)
63-72
: Consider enhancing error handling in the compress functionWhile the implementation is correct, consider adding error handling for the file operations and compression steps.
auto compress(std::unique_ptr<Compressor> compressor, char const* src) -> void { FileWriter file_writer; - file_writer.open(string(cCompressedFilePath), FileWriter::OpenMode::CREATE_FOR_WRITING); + auto const result = file_writer.open(string(cCompressedFilePath), FileWriter::OpenMode::CREATE_FOR_WRITING); + REQUIRE(ErrorCode_Success == result); compressor->open(file_writer); for (auto const chunk_size : cCompressionChunkSizes) { - compressor->write(src, chunk_size); + REQUIRE(ErrorCode_Success == compressor->write(src, chunk_size)); } compressor->close(); file_writer.close(); }components/core/src/clp/streaming_compression/lzma/Compressor.cpp (3)
35-47
: Consider error handling refinement in close().The error code
ErrorCode_Corrupt
on line 41 might not accurately represent the condition of having unprocessed input data. Consider usingErrorCode_Failure
or a more descriptive error code.- throw OperationFailed(ErrorCode_Corrupt, __FILENAME__, __LINE__); + throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
79-100
: Enhance LZMA_BUF_ERROR handling in encode_lzma.The error message could be more specific about the nature of the error, especially since we know both buffers are available in this context.
case LZMA_BUF_ERROR: SPDLOG_ERROR("LZMA compressor input stream is corrupt. No encoding " - "progress can be made."); + "progress can be made with available input and output buffers."); throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
191-194
: Consider more specific error handling for LZMA_PROG_ERROR.The error message could provide more specific information about which input arguments are invalid.
case LZMA_PROG_ERROR: - msg = "Input arguments are not sane"; + msg = "Invalid input: compression level, dictionary size, or check parameter is out of range"; break;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
(1 hunks)components/core/tests/test-StreamingCompression.cpp
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
components/core/tests/test-StreamingCompression.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (2)
components/core/tests/test-StreamingCompression.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.cpp:169-170
Timestamp: 2024-12-19T01:39:30.094Z
Learning: In the LzmaStreamOperations context (Compressor.cpp), m_p is guaranteed never to be null.
🔇 Additional comments (5)
components/core/tests/test-StreamingCompression.cpp (3)
7-8
: LGTM! Good use of string_view for optimization
The addition of string_view and related headers is appropriate, and the organization of includes is clean and logical.
Also applies to: 12-12, 32-33
35-61
: Well-structured test constants and helper functions
The interleaved no-op (0-sized) chunks in cCompressionChunkSizes
effectively test the compressor state integrity between operations. The use of string_view
for constants and clear function signatures enhance code efficiency.
140-143
: Document LZMA section's compression-only scope
The LZMA section differs from other compression tests as it only tests compression without decompression. This should be documented for clarity.
- SECTION("LZMA compression") {
+ SECTION("LZMA compression (compression-only test)") {
+ // Note: This section intentionally tests only compression functionality
+ // without decompression, as per the current implementation scope.
compressor = std::make_unique<clp::streaming_compression::lzma::Compressor>();
compress(std::move(compressor), uncompressed_buffer.data());
}
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)
1-16
: LGTM! Well-organized includes and namespace declaration.
The includes are logically grouped (standard library, external dependencies, and project headers) and all necessary dependencies are present.
142-158
: LGTM! Robust buffer management implementation.
The method properly handles buffer checks, writes, and memory management with appropriate error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
components/core/tests/test-StreamingCompression.cpp (1)
34-54
: Consider adding documentation for the helper functionsThe constants and function declarations are well-structured, but adding documentation would improve maintainability.
Add function documentation like this:
+/** + * Compresses the source data using the provided compressor with varying chunk sizes + * @param compressor The compression algorithm implementation + * @param src The source data to compress + */ auto compress(std::unique_ptr<Compressor> compressor, char const* src) -> void; +/** + * Decompresses the data and verifies it matches the original content + * @param decompressor The decompression algorithm implementation + * @param uncompressed_buffer The original uncompressed data for comparison + * @param decompressed_buffer Buffer to store decompressed data + */ auto decompress_and_compare( std::unique_ptr<Decompressor> decompressor, Array<char> const& uncompressed_buffer, Array<char>& decompressed_buffer ) -> void;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/core/tests/test-StreamingCompression.cpp
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/tests/test-StreamingCompression.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
components/core/tests/test-StreamingCompression.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
🔇 Additional comments (4)
components/core/tests/test-StreamingCompression.cpp (4)
7-8
: LGTM! Header includes and using declarations are well-organized
The additions follow the existing pattern and improve code readability.
Also applies to: 20-20, 31-32
Line range hint 56-104
: LGTM! Helper functions are well-implemented
The functions demonstrate good practices:
- RAII for resource management
- Proper error handling
- Comprehensive validation of decompressed data
133-136
: Improve documentation for LZMA compression test section
The LZMA section differs from other compression tests as it only tests compression. This should be clearly documented.
Apply this change:
- SECTION("LZMA compression") {
+ SECTION("LZMA compression (compression-only test)") {
+ // Note: Decompression testing is intentionally omitted as this section
+ // focuses on verifying the compression functionality
compressor = std::make_unique<clp::streaming_compression::lzma::Compressor>();
compress(std::move(compressor), uncompressed_buffer.data());
}
138-138
: LGTM! Proper cleanup of test artifacts
The cleanup code ensures no test files are left behind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
components/core/tests/test-StreamingCompression.cpp (1)
LZMA decompressor implementation is missing
Based on the codebase analysis, while LZMA compression is implemented (
Compressor.hpp
,Compressor.cpp
,Constants.hpp
), there is no corresponding decompressor implementation. This is unlike the ZStd and Passthrough implementations which have both compressor and decompressor components. The test file should not include decompressor header as it doesn't exist yet.
components/core/src/clp/streaming_compression/lzma/
needs a decompressor implementation- Remove LZMA-related test cases from
test-StreamingCompression.cpp
until decompression support is added🔗 Analysis chain
Line range hint
20-24
: Consider adding LZMA decompressor header for consistencyWhile the LZMA compressor header is included, the corresponding decompressor header is missing, unlike the ZStd and Passthrough implementations which include both. This might indicate an incomplete implementation or missing functionality.
Let's verify if the LZMA decompressor exists:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for LZMA decompressor header fd -e hpp -e cpp "Decompressor" | rg "lzma"Length of output: 41
Script:
#!/bin/bash # Let's first check if LZMA decompressor exists in the streaming_compression directory rg -l "lzma.*Decompressor" --type cpp --type hpp # Also check the directory structure to understand the implementation pattern fd . "src/clp/streaming_compression"Length of output: 232
Script:
#!/bin/bash # Let's check the directory structure first to understand the correct paths fd streaming_compression # Then search for any LZMA-related files rg -l "lzma" --type cpp --type c --type hLength of output: 411
🧹 Nitpick comments (4)
components/core/tests/test-StreamingCompression.cpp (2)
58-58
: Consider avoiding string conversion overheadThe
string(cCompressedFilePath)
conversion could be avoided by updatingFileWriter::open
to acceptstring_view
.
133-136
: Improve test section documentationWhile the compression-only testing for LZMA is intentional (as discussed in PR comments), the test section could be more self-documenting.
Consider adding a comment explaining why decompression is not tested:
SECTION("LZMA compression") { + // Note: This section only tests compression as LZMA decompression + // is handled separately. See PR #614 for details. compressor = std::make_unique<clp::streaming_compression::lzma::Compressor>(); compress(std::move(compressor), uncompressed_buffer.data()); }components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)
146-149
: Add error handling for write operations.The write operation to the file writer should be wrapped in a try-catch block to handle potential I/O errors.
+ try { m_compressed_stream_file_writer->write( clp::size_checked_pointer_cast<char>(m_compressed_stream_block_buffer.data()), cCompressedStreamBlockBufferSize - m_lzma_stream.avail_out() ); + } catch (const std::exception& e) { + SPDLOG_ERROR("Failed to write compressed data: {}", e.what()); + throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__); + }
200-201
: Enhance error logging with additional context.Include the compression level, dictionary size, and check type in the error message to aid in debugging.
- SPDLOG_ERROR("Error initializing the encoder: {} (error code {})", msg, static_cast<int>(rc)); + SPDLOG_ERROR("Error initializing the encoder: {} (error code {}). Compression level: {}, Dictionary size: {}, Check: {}", + msg, static_cast<int>(rc), compression_level, dict_size, static_cast<int>(check));
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/streaming_compression/lzma/Compressor.cpp
(1 hunks)components/core/tests/test-StreamingCompression.cpp
(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
components/core/tests/test-StreamingCompression.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (2)
components/core/tests/test-StreamingCompression.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/tests/test-StreamingCompression.cpp:45-54
Timestamp: 2024-12-04T15:50:37.827Z
Learning: In `components/core/tests/test-StreamingCompression.cpp`, within the `compress` function, compressing the same data repeatedly by passing the same `src` pointer without advancing is intentional to test the compressor with the same data multiple times.
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (1)
Learnt from: Bill-hbrhbr
PR: y-scope/clp#614
File: components/core/src/clp/streaming_compression/lzma/Compressor.cpp:169-170
Timestamp: 2024-12-19T01:39:30.094Z
Learning: In the LzmaStreamOperations context (Compressor.cpp), m_p is guaranteed never to be null.
🔇 Additional comments (3)
components/core/tests/test-StreamingCompression.cpp (1)
34-54
: LGTM! Well-structured constants and function declarations
The code demonstrates good practices:
- Constants properly scoped in anonymous namespace
- Use of constexpr for compile-time evaluation
- Modern C++ auto return type syntax
- Clear function signatures with appropriate parameter types
components/core/src/clp/streaming_compression/lzma/Compressor.cpp (2)
40-42
: 🛠️ Refactor suggestion
Consider processing remaining data before throwing an error.
Instead of throwing an error when there's unprocessed input data during close, consider processing the remaining data to ensure no data is lost.
if (m_lzma_stream.avail_in() > 0) {
- throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+ encode_lzma(); // Process remaining data
}
Likely invalid or redundant comment.
126-130
: 🛠️ Refactor suggestion
Enhance error handling for different flush actions.
The current error handling for LZMA_BUF_ERROR doesn't distinguish between different flush actions. For LZMA_FULL_FLUSH and LZMA_FULL_BARRIER, a buffer error might be expected when the output buffer is full.
case LZMA_BUF_ERROR:
- // NOTE: this can happen if we are using LZMA_FULL_FLUSH or LZMA_FULL_BARRIER. These
- // two actions keeps encoding input data alongside flushing buffered encoded data.
- SPDLOG_ERROR("LZMA compressor input stream is corrupt. No encoding "
- "progress can be made.");
- throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+ if (flush_action == LZMA_FULL_FLUSH || flush_action == LZMA_FULL_BARRIER) {
+ if (m_lzma_stream.avail_in() > 0 && m_lzma_stream.avail_out() > 0) {
+ SPDLOG_ERROR("LZMA flush failed: Unable to make progress with available buffers");
+ throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
+ }
+ break;
+ }
+ SPDLOG_ERROR("Unexpected LZMA_BUF_ERROR during flush operation");
+ throw OperationFailed(ErrorCode_Failure, __FILENAME__, __LINE__);
Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now. Have you guys agreed on a PR title?
Yes. The PR title is changed per David and Kirk's suggestion. |
Description
test-StreamingCompression
for conciseness. #599Validation performed
Updated
test-StreamingCompression.cpp
to include LZMA compressors. Tested that compressing workflow finishes without errors.Summary by CodeRabbit
New Features
Bug Fixes
Chores