
feat(core-clp): Add BoundedReader to prevent out-of-bound reads in segmented input streams. #624

Merged
merged 15 commits into from
Dec 13, 2024


gibber9809
Contributor

@gibber9809 gibber9809 commented Dec 5, 2024

Description

This PR adds a BoundedReader class that can help avoid backwards seeks when reading input streams segmented into several logical chunks. The BoundedReader is a ReaderInterface that prevents reading or seeking beyond a certain "bound" byte offset.

This is used in a follow-up PR to ensure that readers for different parts of a single-file archive being streamed over the network do not accidentally read past the end of their section (this happens frequently with readers, such as ZstdDecompressor, that buffer input beyond what the user has requested).

For example, consider an input stream divided into the following logical chunks:
| zstd stream 1 | header bytes | zstd stream 2 |

If a ZstdDecompressor reading zstd stream 1 directly wraps that input stream, it will almost certainly end up consuming the header bytes and some parts of zstd stream 2 while populating its internal buffer. In fact, this sort of speculative buffering is required if we want ZstdDecompressor to be performant. As a result, reading the header bytes and decompressing zstd stream 2 requires first seeking backwards in the original input stream.

BoundedReader addresses this problem by wrapping the input stream so that the ZstdDecompressor cannot consume any bytes beyond the end of the logical section it belongs to, meaning that the following header and stream sections can always be read without backwards seeks.
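To make the bounding concrete, here is a minimal, self-contained C++ sketch. It is not the CLP implementation: ErrorCode, ReaderInterface, and MemoryReader are simplified, hypothetical stand-ins, and the explicit start_pos constructor argument is an assumption. The key step is that try_read clamps each request to the bytes remaining before the bound, so a caller that speculatively asks for a large buffer cannot pull in the header bytes that follow its section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-ins for CLP's ErrorCode and ReaderInterface (assumed shapes, not the
// real declarations).
enum ErrorCode { ErrorCode_Success, ErrorCode_EndOfFile };

class ReaderInterface {
public:
    virtual ~ReaderInterface() = default;
    virtual auto try_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read)
            -> ErrorCode = 0;
};

// Hypothetical in-memory input stream standing in for the segmented network stream.
class MemoryReader : public ReaderInterface {
public:
    explicit MemoryReader(std::string data) : m_data{std::move(data)} {}

    auto try_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read)
            -> ErrorCode override {
        num_bytes_read = std::min(num_bytes_to_read, m_data.size() - m_pos);
        std::memcpy(buf, m_data.data() + m_pos, num_bytes_read);
        m_pos += num_bytes_read;
        return (0 == num_bytes_read && num_bytes_to_read > 0) ? ErrorCode_EndOfFile
                                                              : ErrorCode_Success;
    }

    [[nodiscard]] auto get_pos() const -> size_t { return m_pos; }

private:
    std::string m_data;
    size_t m_pos{0};
};

// Sketch of the bounding idea: never let the wrapped reader advance past `bound`.
// Assumes the wrapped reader is currently positioned at `start_pos`.
class BoundedReader : public ReaderInterface {
public:
    BoundedReader(ReaderInterface* reader, size_t start_pos, size_t bound)
            : m_reader{reader},
              m_curr_pos{start_pos},
              m_bound{bound} {
        if (nullptr == m_reader || m_curr_pos > m_bound) {
            throw std::invalid_argument("BoundedReader: null reader or start past bound");
        }
    }

    auto try_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read)
            -> ErrorCode override {
        num_bytes_read = 0;
        size_t const remaining{m_bound - m_curr_pos};
        if (0 == remaining && num_bytes_to_read > 0) {
            // At the bound: report EOF without touching the underlying stream.
            return ErrorCode_EndOfFile;
        }
        // Clamp the request so bytes past the bound are never consumed.
        auto const rc{m_reader->try_read(buf, std::min(num_bytes_to_read, remaining), num_bytes_read)};
        m_curr_pos += num_bytes_read;
        return rc;
    }

private:
    ReaderInterface* m_reader;
    size_t m_curr_pos;
    size_t m_bound;
};
```

For example, wrapping an input laid out as `AAAA | HH | BBBB` with a bound of 4 and requesting 16 bytes returns only the four stream-1 bytes and leaves the underlying stream positioned at the header, so the next section can be read without a backwards seek.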

This BoundedReader approach has some advantages over approaches that allow backwards seeking by buffering the input stream:

  1. It uses less memory and requires less data copying
  2. The implementation is simple and easy to verify
  3. The BoundedReader approach helps prevent a whole class of bugs involving faulty readers reading past the end of their logical section, and can help catch issues with corrupt archives

Validation performed

  • Added tests for seeking and reading edge cases

Summary by CodeRabbit

  • New Features

    • Introduced the BoundedReader class to manage reading limits within input streams.
    • Added unit tests for BoundedReader functionalities, ensuring robust error handling and boundary checks.
  • Bug Fixes

    • Enhanced error handling in the StringReader class to prevent out-of-bounds seeking.

@gibber9809 gibber9809 requested a review from haiqi96 December 5, 2024 19:39
Contributor

coderabbitai bot commented Dec 5, 2024

Walkthrough

The pull request introduces changes to the CLP project by adding a new class, BoundedReader, along with its corresponding header and test files. The BoundedReader class implements methods for reading data with boundary checks and error handling. Additionally, modifications are made to the StringReader class to enhance error handling in the try_seek_from_begin method. The CMakeLists.txt file is updated to include the new source and test files, while existing configurations and functionalities remain unchanged.

Changes

| File Path | Change Summary |
|---|---|
| components/core/CMakeLists.txt | Added new source file BoundedReader.cpp, header BoundedReader.hpp, and test file test-BoundedReader.cpp. |
| components/core/src/clp/StringReader.cpp | Updated try_seek_from_begin method to include a condition for out-of-bounds position handling. |
| components/core/src/clp/BoundedReader.cpp | Introduced BoundedReader class with methods try_seek_from_begin and try_read, implementing boundary checks. |
| components/core/src/clp/BoundedReader.hpp | Added BoundedReader class definition, constructor, and method overrides for ReaderInterface. |
| components/core/tests/test-BoundedReader.cpp | Created unit tests for BoundedReader using Catch2 framework, covering various functionalities. |

Possibly related PRs

Suggested reviewers

  • kirkrodrigues

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4dcb7ff and d261ca2.

📒 Files selected for processing (1)
  • components/core/tests/test-BoundedReader.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/tests/test-BoundedReader.cpp


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
components/core/src/clp/CheckpointReader.cpp (1)

10-12: Clarify condition by using equality check

Since m_cur_pos is set to be at most m_checkpoint in line 5, the condition m_cur_pos >= m_checkpoint will only be true when m_cur_pos == m_checkpoint. For clarity, consider changing the condition to:

if (m_cur_pos == m_checkpoint) {
    return ErrorCode_EndOfFile;
}
components/core/tests/test-CheckpointReader.cpp (1)

9-94: Add tests for null m_reader scenarios

Currently, there are no tests covering the case where CheckpointReader is constructed with a nullptr for m_reader. Adding such tests would enhance coverage and ensure the class handles this scenario gracefully.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ec0821d and 34ad765.

📒 Files selected for processing (5)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp/CheckpointReader.cpp (1 hunks)
  • components/core/src/clp/CheckpointReader.hpp (1 hunks)
  • components/core/src/clp/StringReader.cpp (1 hunks)
  • components/core/tests/test-CheckpointReader.cpp (1 hunks)
🔇 Additional comments (4)
components/core/src/clp/CheckpointReader.hpp (1)

16-74: Overall class implementation is correct

The CheckpointReader class correctly implements the necessary methods from ReaderInterface and enforces the checkpoint limit as intended.

components/core/src/clp/StringReader.cpp (1)

44-47: Enhanced error handling for seeking beyond the end

The added condition in try_seek_from_begin properly handles attempts to seek beyond the end of the input string. It sets the position to the end of the string and returns ErrorCode_EndOfFile, preventing undefined behaviour.

components/core/tests/test-CheckpointReader.cpp (1)

9-94: Unit tests comprehensively cover CheckpointReader functionality

The test cases thoroughly validate the behaviour of CheckpointReader, including reading and seeking operations relative to the checkpoint and the end of the underlying stream.

components/core/CMakeLists.txt (1)

356-357: New files correctly added to the build configuration

The source files CheckpointReader.cpp, CheckpointReader.hpp, and the test file test-CheckpointReader.cpp have been appropriately included in the CMakeLists.txt, ensuring they are part of the build and test processes.

Also applies to: 554-554

components/core/src/clp/CheckpointReader.hpp (outdated, resolved)
components/core/src/clp/CheckpointReader.cpp (outdated, resolved)
Contributor

@haiqi96 haiqi96 left a comment

Talked with Devin offline, and there is a concrete example that explains the use case of this class. Devin, please add it to the PR when you get a chance.

In general, the PR makes sense to me. While there might be a more elegant design to achieve the same goal, I feel we can go with the design in this PR given that:
1. the class design is simple and straightforward;
2. we have a rather tight deadline, including other upcoming changes.

Left a few comments. Also, we can rename the class to BoundedReader, unless @kirkrodrigues has other naming suggestions.

Note that this class has a different purpose from BufferedFileReader. BufferedFileReader is designed for users who know a specific position they will need to seek back to, whereas this class is designed for users who know a specific position they don't want to seek beyond.

components/core/src/clp/CheckpointReader.hpp (outdated, resolved)
@@ -41,6 +41,10 @@ ErrorCode StringReader::try_read(char* buf, size_t num_bytes_to_read, size_t& nu
}

ErrorCode StringReader::try_seek_from_begin(size_t pos) {
if (pos > input_string.size()) {
Contributor

Is this only for supporting a new test case?

What you intend to do makes sense to me, but it's a bit annoying that the standard seek interface allows seeking beyond the end of the file, so I feel we need some justification when we decide to change the behavior.

@kirkrodrigues any comment?

Contributor Author

I would classify this as a bug I'm fixing in the StringReader class so that I can use it for tests. From what I've seen, every other reader we have returns EOF if you seek past the end.

Actually, given the way the rest of this class is written, if you first seek past the end of the input buffer, it will happily let you read data beyond the end of the buffer. I.e., it is very explicitly a bug.

Contributor

Hmm, the FileReader internally calls fseeko, which I believe would allow seeking beyond the end of the file.

BufferedFileReader and BufferReader would return ErrorCode_Truncated and won't update the pos at all if the pos is greater than the maximum length.

From a consistency point of view, maybe we should let it return ErrorCode_Truncated. But I also wonder whether the rest of your code depends on the current behavior.

Member

@kirkrodrigues kirkrodrigues Dec 7, 2024

  • The implementations of ReaderInterface have inconsistent behaviour.
    • Some (e.g., FileReader) rely on the lower-level implementation to return an error if the seek fails.
    • Some (e.g., BufferReader) return an error if the seek is past the end, but they don't modify m_pos.
    • Some (e.g., NetworkReader) return an error if the seek is past the end, and they do modify m_pos.
    • Each of the above returns different error codes.

I think the practical implementation is for try_seek_from_begin to:

  • try to seek until pos
  • if that's past the end of the medium (file/buffer/etc.), m_pos should be updated to just past the last byte.
  • the method should return ErrorCode_EndOfFile.
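A minimal sketch of that proposed contract on a hypothetical in-memory reader (InMemoryReader and its members are illustrative, not an existing CLP class; the ErrorCode values are simplified stand-ins). Seeking exactly to the end (pos equal to the size) is not past the end, so it succeeds, which lines up with the `pos > input_string.size()` check in the StringReader change.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Simplified stand-in for CLP's ErrorCode (assumed values).
enum ErrorCode { ErrorCode_Success, ErrorCode_EndOfFile };

// Hypothetical in-memory reader illustrating the proposed try_seek_from_begin contract.
class InMemoryReader {
public:
    explicit InMemoryReader(std::string data) : m_data{std::move(data)} {}

    // Seek as far toward `pos` as the medium allows. If `pos` is past the end, leave the
    // position just past the last byte and report EOF instead of refusing to move.
    [[nodiscard]] auto try_seek_from_begin(size_t pos) -> ErrorCode {
        if (pos > m_data.size()) {
            m_pos = m_data.size();
            return ErrorCode_EndOfFile;
        }
        m_pos = pos;
        return ErrorCode_Success;
    }

    [[nodiscard]] auto get_pos() const -> size_t { return m_pos; }

private:
    std::string m_data;
    size_t m_pos{0};
};
```

Under this contract a caller can always learn where the reader actually stopped by calling get_pos() after an EOF, regardless of which concrete reader it is using.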

The reason to update m_pos even though we don't get to pos is that for some implementations, like FileReader, we can't easily check where the last byte is until we seek, and if we seek up to the end of the file, we may not be able to seek backwards to the original m_pos.

If y'all agree, we should open a GH issue to refactor the existing implementations. For this implementation, we implement the proposal above (basically what Devin has implemented, although on error we're still updating m_cur_pos, even though the error may not be EOF?).

Contributor

Sure, the standard behavior sounds good to me.

Contributor Author

That makes sense to me.

Also went and updated BoundedReader to only update m_curr_pos on error if that error is EOF.
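The resulting rule for a bounded seek (clamp the target to the bound, update the position on success or EOF, and leave it untouched on any other error) can be sketched as follows. This is an illustration under assumptions, not the merged code: SeekableReader, FakeMedium, and BoundedSeeker are hypothetical, and querying the underlying reader's position after an EOF (rather than assuming it reached the clamped target) is a choice made here so the sketch stays correct when the medium is shorter than the bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Simplified stand-in for CLP's ErrorCode (assumed values).
enum ErrorCode { ErrorCode_Success, ErrorCode_EndOfFile, ErrorCode_Failure };

// Minimal seekable interface (assumed shape, not CLP's ReaderInterface).
class SeekableReader {
public:
    virtual ~SeekableReader() = default;
    virtual auto try_seek_from_begin(size_t pos) -> ErrorCode = 0;
    [[nodiscard]] virtual auto get_pos() const -> size_t = 0;
};

// Hypothetical medium of `size` bytes that follows the proposed seek contract and can be
// told to fail, to exercise the non-EOF error path.
class FakeMedium : public SeekableReader {
public:
    explicit FakeMedium(size_t size) : m_size{size} {}

    auto try_seek_from_begin(size_t pos) -> ErrorCode override {
        if (m_fail) {
            return ErrorCode_Failure;
        }
        m_pos = std::min(pos, m_size);
        return pos > m_size ? ErrorCode_EndOfFile : ErrorCode_Success;
    }

    [[nodiscard]] auto get_pos() const -> size_t override { return m_pos; }

    void set_fail(bool fail) { m_fail = fail; }

private:
    size_t m_size;
    size_t m_pos{0};
    bool m_fail{false};
};

// Sketch of a bounded seek that only updates its position on success or EOF.
class BoundedSeeker {
public:
    BoundedSeeker(SeekableReader* reader, size_t start_pos, size_t bound)
            : m_reader{reader},
              m_curr_pos{start_pos},
              m_bound{bound} {}

    auto try_seek_from_begin(size_t pos) -> ErrorCode {
        size_t const next_pos{std::min(pos, m_bound)};
        auto const rc{m_reader->try_seek_from_begin(next_pos)};
        if (ErrorCode_EndOfFile == rc) {
            // The medium ended before next_pos; record where it actually stopped.
            m_curr_pos = m_reader->get_pos();
            return rc;
        }
        if (ErrorCode_Success != rc) {
            return rc;  // Any other error: leave m_curr_pos untouched.
        }
        m_curr_pos = next_pos;
        // Stopping at the bound when asked to go past it is reported as EOF.
        return pos > m_bound ? ErrorCode_EndOfFile : ErrorCode_Success;
    }

    [[nodiscard]] auto get_pos() const -> size_t { return m_curr_pos; }

private:
    SeekableReader* m_reader;
    size_t m_curr_pos;
    size_t m_bound;
};
```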

components/core/tests/test-CheckpointReader.cpp (outdated, resolved)
components/core/tests/test-CheckpointReader.cpp (outdated, resolved)
components/core/src/clp/CheckpointReader.cpp (outdated, resolved)
components/core/src/clp/CheckpointReader.cpp (outdated, resolved)
components/core/src/clp/CheckpointReader.cpp (outdated, resolved)
@gibber9809 gibber9809 requested a review from haiqi96 December 6, 2024 19:03
@kirkrodrigues
Member

I'm fine with BoundedReader.

@gibber9809 gibber9809 changed the title feat(clp): Add CheckpointReader class to help avoid backwards seeks when reading input streams segmented into logical chunks. feat(clp): Add BoundedReader class to help avoid backwards seeks when reading input streams segmented into logical chunks. Dec 7, 2024
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
components/core/src/clp/BoundedReader.hpp (2)

22-28: Document Exceptions Thrown in Constructor

In the constructor, exceptions are thrown if m_reader is nullptr or if m_curr_pos exceeds m_bound. It's advisable to document these exceptions clearly in the class interface to inform users of potential exceptions during object instantiation.


38-38: Modify try_get_pos to Reflect Bounded Position

Currently, try_get_pos delegates directly to m_reader->try_get_pos(pos). Consider modifying it to return m_curr_pos instead, ensuring that users receive the position within the bounded context, adhering to the boundary constraints enforced by BoundedReader.

Apply this diff to adjust the method:

-auto try_get_pos(size_t& pos) -> ErrorCode override { return m_reader->try_get_pos(pos); }
+auto try_get_pos(size_t& pos) -> ErrorCode override {
+    pos = m_curr_pos;
+    return ErrorCode_Success;
+}
components/core/tests/test-BoundedReader.cpp (2)

22-33: Clarify Test Section Name for Accuracy

The test section named "BoundedReader does not allow reads beyond end of underlying stream." may be misleading. The test actually verifies that reads are limited to the available data without causing errors when attempting to read beyond the stream's end. Consider renaming the section for clarity.

Apply this change:

-SECTION("BoundedReader does not allow reads beyond end of underlying stream.") {
+SECTION("BoundedReader limits reads to available data when reading beyond stream end.") {

76-86: Remove Unused Variables in Test Section

In the test section "BoundedReader does not allow seeks beyond checkpoint.", the variables buf and num_bytes_read are declared but not used. Removing these unused variables will clean up the test code.

Apply this diff:

-    char buf[cTestStringLen];
-    size_t num_bytes_read{};
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2ab88fd and 8977dc4.

📒 Files selected for processing (4)
  • components/core/CMakeLists.txt (2 hunks)
  • components/core/src/clp/BoundedReader.cpp (1 hunks)
  • components/core/src/clp/BoundedReader.hpp (1 hunks)
  • components/core/tests/test-BoundedReader.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (3)
components/core/tests/test-BoundedReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/BoundedReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/BoundedReader.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (2)
components/core/src/clp/BoundedReader.cpp (2)

4-15: Verify Error Handling in try_seek_from_begin Method

The method try_seek_from_begin adjusts next_pos based on m_bound and handles errors from the underlying reader. However, when ErrorCode_EndOfFile is returned from m_reader->try_seek_from_begin, m_curr_pos is updated to next_pos. Please verify that this behaviour correctly reflects the end-of-file condition and does not lead to inconsistencies in m_curr_pos.


17-38: Ensure Consistent Handling of Partial Reads in try_read

The try_read method correctly limits num_bytes_to_read to prevent reading beyond m_bound. After the read operation, it handles end-of-file scenarios, especially when ErrorCode_EndOfFile is returned with num_bytes_read equal to zero. Please confirm that this logic aligns with the expected behaviour of the underlying reader, particularly in cases of partial reads.

@haiqi96
Contributor

haiqi96 commented Dec 7, 2024

Left two more nit comments; otherwise, the code looks good to me.

@gibber9809 gibber9809 requested a review from haiqi96 December 8, 2024 16:05
haiqi96
haiqi96 previously approved these changes Dec 9, 2024
Contributor

@haiqi96 haiqi96 left a comment

Looks good to me. @kirkrodrigues do you want to do another round of review?

@haiqi96
Contributor

haiqi96 commented Dec 9, 2024

@gibber9809 Can you please also open an issue to track the proposed change for commonizing seek interface?

@gibber9809
Contributor Author

@gibber9809 Can you please also open an issue to track the proposed change for commonizing seek interface?

Added issue #628.

Member

@LinZhihao-723 LinZhihao-723 left a comment

Reviewed the header and implementation files; didn't go through the unit tests yet.

components/core/src/clp/BoundedReader.hpp (resolved)
auto
try_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read) -> ErrorCode override;

auto try_read_to_delimiter(char delim, bool keep_delimiter, bool append, std::string& str)
Member

Shall we add a doc string to explain why we override the default implementation?

Contributor Author

The reason is really just that BoundedReader can't delegate to a potentially more efficient implementation in the underlying reader (since it won't respect the bounds), and most code really shouldn't use this interface anyway since it's a performance trap.

Can add a docstring saying as much.

Member

Sure, let's add a doc string

auto
try_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read) -> ErrorCode override;

auto try_read_to_delimiter(char delim, bool keep_delimiter, bool append, std::string& str)
Member

Since parameters are not used, shall we add [[maybe_unused]] to silence clang-tidy warnings?
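Putting the docstring and `[[maybe_unused]]` requests together, a documented stub override might look like the sketch below. It assumes, based on the unused-parameter comment, that the override simply reports the operation as unsupported; the ErrorCode_Unsupported value, the trimmed-down ReaderInterface, and the BoundedReaderSketch name are stand-ins rather than CLP's actual declarations.

```cpp
#include <cassert>
#include <string>

// Assumed stand-ins; CLP's real ErrorCode and ReaderInterface have more members.
enum ErrorCode { ErrorCode_Success, ErrorCode_Unsupported };

class ReaderInterface {
public:
    virtual ~ReaderInterface() = default;
    [[nodiscard]] virtual auto
    try_read_to_delimiter(char delim, bool keep_delimiter, bool append, std::string& str)
            -> ErrorCode = 0;
};

class BoundedReaderSketch : public ReaderInterface {
public:
    /**
     * Unsupported: delegating to the underlying reader's (possibly more efficient)
     * implementation would not respect the bound, and callers should generally avoid
     * this interface for performance reasons anyway.
     * @return ErrorCode_Unsupported
     */
    [[nodiscard]] auto try_read_to_delimiter(
            [[maybe_unused]] char delim,
            [[maybe_unused]] bool keep_delimiter,
            [[maybe_unused]] bool append,
            [[maybe_unused]] std::string& str
    ) -> ErrorCode override {
        return ErrorCode_Unsupported;
    }
};
```

Marking the parameters `[[maybe_unused]]` keeps the names in the signature for documentation while silencing unused-parameter warnings, and `[[nodiscard]]` makes the compiler flag callers that drop the returned ErrorCode.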

* @return ErrorCode_Success on success
* @return ErrorCode_errno on failure
*/
auto try_get_pos(size_t& pos) -> ErrorCode override { return m_reader->try_get_pos(pos); }
Member

We should add [[nodiscard]] for any non-void returns

components/core/src/clp/BoundedReader.hpp (resolved)
components/core/src/clp/BoundedReader.hpp (outdated, resolved)
components/core/src/clp/BoundedReader.hpp (outdated, resolved)
components/core/src/clp/BoundedReader.hpp (outdated, resolved)
components/core/src/clp/BoundedReader.cpp (resolved)
components/core/src/clp/BoundedReader.cpp (resolved)
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
components/core/src/clp/BoundedReader.hpp (3)

12-13: Correct grammatical error in the class description.

In the comment explaining the purpose of BoundedReader, there's a missing preposition.

Please modify the comment as follows to improve readability:

- * want to prevent a reader for an earlier segment consuming any bytes from a later segment.
+ * want to prevent a reader for an earlier segment from consuming any bytes from a later segment.

58-59: Consistent formatting for function declaration.

For better readability, place the return type and function name on the same line.

Modify the declaration as follows:

-auto
-try_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read) -> ErrorCode override;
+auto try_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read) -> ErrorCode override;

67-69: Consider using smart pointers for member variables to manage ownership and lifetime.

Using a raw pointer for m_reader does not enforce lifetime management, potentially leading to dangling references if the underlying reader is destroyed before BoundedReader.

Consider using std::shared_ptr or std::unique_ptr to manage the ownership:

-    ReaderInterface* m_reader{nullptr};
+    std::shared_ptr<ReaderInterface> m_reader;

This change would ensure that the underlying reader remains valid for the lifetime of BoundedReader.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6c3a537 and 6321c0c.

📒 Files selected for processing (1)
  • components/core/src/clp/BoundedReader.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/BoundedReader.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

🔇 Additional comments (5)
components/core/src/clp/BoundedReader.hpp (5)

4-4: Include missing headers for size_t and ErrorCode.

The file is missing includes for cstddef (for size_t) and the header where ErrorCode is defined. This could lead to compilation issues.


26-27: Verify the boundary condition in the constructor.

Currently, the check uses m_curr_pos > m_bound. Should the condition be m_curr_pos >= m_bound to prevent the position from being equal to the bound?

Please confirm whether the equality case should be considered an error, ensuring that the reader does not start at the exact boundary position.


38-38: Add [[nodiscard]] attribute to non-void return functions.

To prevent unintended ignoring of return codes, consider adding [[nodiscard]] to the try_get_pos method.


47-47: Add [[nodiscard]] attribute to try_seek_from_begin.

Since the method returns an ErrorCode, adding [[nodiscard]] encourages checking the return value.


61-64: Add [[maybe_unused]] to unused parameters to avoid warnings.

The parameters in try_read_to_delimiter are unused, which may trigger compiler warnings.

Consider updating the method signature:

-auto try_read_to_delimiter(char delim, bool keep_delimiter, bool append, std::string& str)
+auto try_read_to_delimiter([[maybe_unused]] char delim, [[maybe_unused]] bool keep_delimiter, [[maybe_unused]] bool append, [[maybe_unused]] std::string& str)

Alternatively, you can omit parameter names if they are unused:

-auto try_read_to_delimiter(char delim, bool keep_delimiter, bool append, std::string& str)
+auto try_read_to_delimiter(char, bool, bool, std::string&) -> ErrorCode override {

components/core/src/clp/BoundedReader.hpp (resolved)
Member

@LinZhihao-723 LinZhihao-723 left a comment

The implementation looks good to me. Sorry that there are still a few suggestions for fixing clang-tidy warnings in the test file. Your IDE/clang-tidy run might also complain about REQUIRE macro, but we can ignore that for now since we're planning to fix this by upgrading catch2 to a higher version.

#include "../src/clp/StringReader.hpp"

TEST_CASE("Test Bounded Reader", "[BoundedReader]") {
constexpr char cTestString[]{"0123456789"};
Member

We can use constexpr std::string_view for const strings

Contributor Author

@gibber9809 gibber9809 Dec 13, 2024

I'm using constexpr char[] because StringReader takes std::string const& and I don't want to manually initialize an std::string every time I open a StringReader.

Member

Contributor Author

Yes, I know, I've read the coding guidelines, and I understand that both ways of doing it are functionally the same. What I'm saying is that I don't like the explicit std::string{} initialization that you have to do when passing string_view in this specific case since it wastes horizontal space and reads worse.

I'll make the change to avoid wasting time, but I think it makes the code less readable.
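The readability trade-off comes from std::string's constructor from std::string_view being explicit, while its `const char*` constructor is not. A small sketch, with a hypothetical length_of() standing in for a function that, like StringReader::open, takes `std::string const&`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for an API that takes std::string const&.
inline auto length_of(std::string const& input) -> size_t { return input.size(); }

constexpr char cTestCharArray[]{"0123456789"};
constexpr std::string_view cTestStringView{"0123456789"};

// A char array decays to const char*, which converts to std::string implicitly.
inline auto from_char_array() -> size_t { return length_of(cTestCharArray); }

// std::string's string_view constructor is explicit, so the conversion must be spelled
// out; `length_of(cTestStringView)` would not compile.
inline auto from_string_view() -> size_t { return length_of(std::string{cTestStringView}); }
```

Both calls construct a temporary std::string, so the difference is purely in how much has to be written at each call site.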

components/core/tests/test-BoundedReader.cpp (resolved)
clp::StringReader string_reader;
string_reader.open(cTestString);
clp::BoundedReader bounded_reader{&string_reader, cTestStringLen + 1};
char buf[cTestStringLen + 1];
Member

How about using std::array to:

  • Silence clang-tidy warnings
  • Enforce initialization on the allocated array memory
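A sketch of that suggestion, using a hypothetical copy_into() helper in place of a reader's try_read: the std::array is value-initialized with `{}`, and `data()`/`size()` are passed explicitly instead of relying on array-to-pointer decay.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

constexpr std::string_view cTestString{"0123456789"};

// Hypothetical reader-like helper: copies up to buf_size bytes from src into buf.
inline auto copy_into(char* buf, size_t buf_size, std::string_view src) -> size_t {
    size_t const n{buf_size < src.size() ? buf_size : src.size()};
    std::memcpy(buf, src.data(), n);
    return n;
}

inline auto fill_buffer() -> std::array<char, cTestString.size() + 1> {
    // Value-initialized: every element starts as '\0', unlike a bare `char buf[N];`.
    std::array<char, cTestString.size() + 1> buf{};
    // Pass the pointer and size explicitly rather than letting a C array decay.
    copy_into(buf.data(), buf.size(), cTestString);
    return buf;
}
```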

@gibber9809
Contributor Author

The implementation looks good to me. Sorry that there are still a few suggestions for fixing clang-tidy warnings in the test file. Your IDE/clang-tidy run might also complain about REQUIRE macro, but we can ignore that for now since we're planning to fix this by upgrading catch2 to a higher version.

Interesting. Yeah, I didn't get any clang-tidy warnings (besides some incorrect ones) running from the command line, likely because it was getting confused by the Catch2 macro expansions. I'm guessing CLion quietly does a lot of extra configuration to deal with this sort of thing.

Member

@LinZhihao-723 LinZhihao-723 left a comment

For the PR title, how about:

feat(core-clp): Add `BoundedReader` to prevent out-of-bound reads in segmented input streams.

@gibber9809 gibber9809 changed the title feat(clp): Add BoundedReader class to help avoid backwards seeks when reading input streams segmented into logical chunks. feat(core-clp): Add BoundedReader to prevent out-of-bound reads in segmented input streams. Dec 13, 2024
@gibber9809 gibber9809 merged commit ddba9b9 into y-scope:main Dec 13, 2024
21 checks passed
davidlion pushed a commit to Bill-hbrhbr/clp that referenced this pull request Dec 20, 2024
…segmented input streams. (y-scope#624)

Co-authored-by: haiqi96 <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>

4 participants