Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface. #523

Open
wants to merge 72 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 63 commits
Commits
Show all changes
72 commits
Select commit Hold shift + click to select a range
edbcc6d
first batch of change
haiqi96 Jul 26, 2024
21b4471
Remove dictionary utils
haiqi96 Jul 26, 2024
3a3c9ba
fix exception issues and make sure unit tests and clg can build
haiqi96 Jul 26, 2024
2c40e29
Linter
haiqi96 Jul 26, 2024
d503436
Fixes
haiqi96 Jul 26, 2024
1ff5ea8
Rearrange code
haiqi96 Jul 26, 2024
8a7ddc7
fix clo build
haiqi96 Jul 26, 2024
70325a9
fix
haiqi96 Jul 26, 2024
99d45a6
Fix make-dictionaries-readable
haiqi96 Jul 26, 2024
b086953
fix constructors
haiqi96 Jul 26, 2024
746834f
refactor
haiqi96 Jul 26, 2024
d2a6836
Update docstrings and comments. Also updated destructor
haiqi96 Jul 29, 2024
7d347c1
Initial change
haiqi96 Jul 28, 2024
63f0cfd
Linter fix
haiqi96 Aug 7, 2024
4a0a323
Merge branch 'main' into BufferedReaderRefactor
haiqi96 Aug 8, 2024
7192022
Factor out duplicated function
haiqi96 Aug 8, 2024
f6b6757
Add sysfile Reader
haiqi96 Aug 9, 2024
712d10b
Update BufferedFileReader to use move instead of reference
haiqi96 Aug 9, 2024
80532ea
FileDescriptor update
haiqi96 Aug 9, 2024
d30030a
Merge branch 'main' into FileReaderRefactor
haiqi96 Aug 12, 2024
4dabc96
Revert unintended changes
haiqi96 Aug 12, 2024
a8d0747
fix
haiqi96 Aug 12, 2024
ff0b9d8
Missing fix
haiqi96 Aug 12, 2024
b3f2795
Fixes
haiqi96 Aug 12, 2024
4d0d80b
Revert unintended changes
haiqi96 Aug 13, 2024
70c8342
Fix one weird issue
haiqi96 Aug 13, 2024
83ee77e
small fix
haiqi96 Aug 13, 2024
9601fa8
Merge branch 'FileReaderRefactor' into BufferedReaderRefactor
haiqi96 Aug 13, 2024
59db88b
Improvement
haiqi96 Aug 13, 2024
450632a
Merge branch 'main' into BufferedReaderRefactor
haiqi96 Aug 14, 2024
c6c8c9b
Add initial class and Unittest
haiqi96 Aug 14, 2024
d5db2a6
Small fix
haiqi96 Aug 14, 2024
39c9bb3
Add missing unit test file
haiqi96 Aug 15, 2024
8909eab
Fix clang tidy issues
haiqi96 Aug 15, 2024
c5cb603
suppress pointer warning for now
haiqi96 Aug 15, 2024
cfcd0f7
Apply suggestions from code review
haiqi96 Aug 16, 2024
948b315
fixes
haiqi96 Aug 16, 2024
ee41996
Use span to replace pointer arithmetic.
haiqi96 Aug 20, 2024
1b53cf7
Merge branch 'main' into SysFileReader
haiqi96 Aug 20, 2024
d9b4de4
Use CLP array to avoid buffer warning
haiqi96 Aug 20, 2024
707817a
Linter
haiqi96 Aug 20, 2024
2a6248c
Apply suggestions from code review
haiqi96 Aug 21, 2024
aff7a2e
Fixes
haiqi96 Aug 21, 2024
f807eea
Rename class and slight modify the comments
haiqi96 Aug 21, 2024
435e12a
small touch
haiqi96 Aug 21, 2024
1cd11b5
Merge branch 'SysFileReader' into BufferedReaderRefactor
haiqi96 Aug 21, 2024
0c99da6
missing changes in merge
haiqi96 Aug 21, 2024
4acf9b5
Merge branch 'BufferedReaderRefactor' of https://github.com/haiqi96/c…
haiqi96 Aug 22, 2024
d3e54c4
Merge branch 'main' into BufferedReaderRefactor
haiqi96 Aug 22, 2024
fd4eb36
Fixes
haiqi96 Aug 22, 2024
5c301d7
A few more fixes
haiqi96 Aug 22, 2024
ce7376b
A small change
haiqi96 Aug 22, 2024
1bbf2a4
Apply suggestions from code review
haiqi96 Sep 9, 2024
1260043
small fixes
haiqi96 Sep 9, 2024
aaa8c64
allow BufferReader to support null initialization
haiqi96 Sep 9, 2024
c147b41
A few more polishing
haiqi96 Sep 9, 2024
02b0d10
A few more polishing again
haiqi96 Sep 9, 2024
810a77b
Apply suggestions from code review
haiqi96 Sep 11, 2024
654eb35
update unit-test
haiqi96 Sep 16, 2024
c176c67
Merge branch 'BufferedReaderRefactor' of https://github.com/haiqi96/c…
haiqi96 Sep 16, 2024
c2b6333
linter
haiqi96 Sep 16, 2024
e4bbd30
Code review suggestions
haiqi96 Oct 8, 2024
7fc448a
Apply suggestions from code review
haiqi96 Oct 8, 2024
4e5d3ff
update code based on comments
haiqi96 Oct 8, 2024
676dc3c
fix
haiqi96 Oct 8, 2024
dfb1211
Update unit test
haiqi96 Oct 8, 2024
38076be
linter
haiqi96 Oct 8, 2024
a4eb75f
Remove unnecessary function
haiqi96 Oct 8, 2024
17e8cdb
Missing docstrings
haiqi96 Oct 8, 2024
8cc9e89
Other docstrings fixes
haiqi96 Oct 8, 2024
74f57bc
Merge branch 'main' into BufferedReaderRefactor
kirkrodrigues Oct 24, 2024
3d843ec
variable name refactor
haiqi96 Oct 24, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 11 additions & 6 deletions components/core/src/clp/BufferReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,18 +4,23 @@
#include <cstring>

namespace clp {
BufferReader::BufferReader(char const* data, size_t data_size, size_t pos) {
if (nullptr == data) {
BufferReader::BufferReader(char const* data, size_t data_size, size_t pos)
: m_internal_buf{data},
m_internal_buf_size{data_size},
m_internal_buf_pos{pos} {
if (nullptr == data && (data_size != 0 || pos != 0)) {
haiqi96 marked this conversation as resolved.
Show resolved Hide resolved
haiqi96 marked this conversation as resolved.
Show resolved Hide resolved
throw OperationFailed(ErrorCode_BadParam, __FILENAME__, __LINE__);
}
else if (pos > data_size) {
throw OperationFailed(ErrorCode_BadParam, __FILENAME__, __LINE__);
}
throw OperationFailed(ErrorCode_BadParam, __FILENAME__, __LINE__);
}
m_internal_buf = data;
m_internal_buf_size = data_size;
m_internal_buf_pos = pos;
}

auto BufferReader::peek_buffer(char const*& buf, size_t& peek_size) const -> void {
peek_size = get_remaining_data_size();
buf = m_internal_buf + m_internal_buf_pos;
buf = 0 == peek_size ? nullptr : m_internal_buf + m_internal_buf_pos;
}

auto BufferReader::try_read_to_delimiter(
Expand Down
2 changes: 2 additions & 0 deletions components/core/src/clp/BufferReader.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@ class BufferReader : public ReaderInterface {
};

// Constructors
BufferReader() : BufferReader(nullptr, 0, 0) {}

BufferReader(char const* data, size_t data_size) : BufferReader(data, data_size, 0) {}

BufferReader(char const* data, size_t data_size, size_t pos);
Expand Down
216 changes: 70 additions & 146 deletions components/core/src/clp/BufferedFileReader.cpp
Original file line number Diff line number Diff line change
@@ -1,14 +1,21 @@
#include "BufferedFileReader.hpp"

#include <fcntl.h>

#include <algorithm>
#include <cerrno>

#include <boost/filesystem.hpp>

#include <cstddef>
#include <memory>
#include <span>
#include <string>
#include <utility>

#include "BufferReader.hpp"
#include "ErrorCode.hpp"
#include "math_utils.hpp"
#include "ReaderInterface.hpp"

using std::span;
using std::string;
using std::unique_ptr;

namespace clp {
namespace {
Expand All @@ -22,133 +29,59 @@ namespace {
* @return ErrorCode_EndOfFile on EOF
* @return ErrorCode_Success on success
*/
auto read_into_buffer(int fd, char* buf, size_t num_bytes_to_read, size_t& num_bytes_read)
-> ErrorCode;

auto read_into_buffer(int fd, char* buf, size_t num_bytes_to_read, size_t& num_bytes_read)
-> ErrorCode {
auto read_into_buffer(
ReaderInterface* reader,
char* buf,
size_t num_bytes_to_read,
size_t& num_bytes_read
) -> ErrorCode;
Copy link
Contributor

@coderabbitai coderabbitai bot Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Remove unnecessary forward declaration of read_into_buffer to simplify code

Since the implementation of read_into_buffer follows immediately and there are no calls to it before its definition, the forward declaration within the anonymous namespace is unnecessary.

You can safely remove the forward declaration at lines 32–37.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kirkrodrigues I believe our coding standard is to keep the forward declaration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


auto read_into_buffer(
ReaderInterface* reader,
char* buf,
size_t num_bytes_to_read,
size_t& num_bytes_read
) -> ErrorCode {
num_bytes_read = 0;
while (true) {
auto const bytes_read = ::read(fd, buf, num_bytes_to_read);
if (0 == bytes_read) {
break;
}
if (bytes_read < 0) {
return ErrorCode_errno;
}

buf += bytes_read;
num_bytes_read += bytes_read;
num_bytes_to_read -= bytes_read;
if (num_bytes_read == num_bytes_to_read) {
return ErrorCode_Success;
}
}
auto const error_code = reader->try_read(buf, num_bytes_to_read, num_bytes_read);
if (0 == num_bytes_read) {
return ErrorCode_EndOfFile;
}
return ErrorCode_Success;
return error_code;
haiqi96 marked this conversation as resolved.
Show resolved Hide resolved
}
} // namespace

BufferedFileReader::BufferedFileReader(size_t base_buffer_size) {
BufferedFileReader::BufferedFileReader(
std::unique_ptr<ReaderInterface> reader_interface,
size_t base_buffer_size
)
: m_file_reader(std::move(reader_interface)) {
if (base_buffer_size % cMinBufferSize != 0) {
throw OperationFailed(ErrorCode_BadParam, __FILENAME__, __LINE__);
}
m_base_buffer_size = base_buffer_size;
m_buffer.resize(m_base_buffer_size);
}

BufferedFileReader::~BufferedFileReader() {
close();
}

auto BufferedFileReader::try_open(string const& path) -> ErrorCode {
// Cleanup in case caller forgot to call close before calling this function
close();

m_fd = ::open(path.c_str(), O_RDONLY);
if (-1 == m_fd) {
if (ENOENT == errno) {
return ErrorCode_FileNotFound;
}
return ErrorCode_errno;
}
m_path = path;
m_file_pos = 0;
m_file_pos = m_file_reader->get_pos();
m_buffer_begin_pos = 0;
m_buffer_reader.emplace(m_buffer.data(), 0);
m_highest_read_pos = 0;
return ErrorCode_Success;
}

void BufferedFileReader::open(string const& path) {
auto const error_code = try_open(path);
if (ErrorCode_Success != error_code) {
if (ErrorCode_FileNotFound == error_code) {
throw OperationFailed(
error_code,
__FILENAME__,
__LINE__,
"File not found: " + boost::filesystem::weakly_canonical(path).string()
);
}
throw OperationFailed(error_code, __FILENAME__, __LINE__);
}
}

auto BufferedFileReader::close() -> void {
if (-1 == m_fd) {
return;
}

if (m_checkpoint_pos.has_value()) {
m_buffer.resize(m_base_buffer_size);
m_checkpoint_pos.reset();
}

// NOTE: We don't check errors for close since, in the read case, it seems the only reason it
// could fail is if it was interrupted by a signal
::close(m_fd);
m_fd = -1;
m_buffer_reader = BufferReader{m_buffer.data(), 0};
m_highest_read_pos = m_file_pos;
}

auto BufferedFileReader::try_refill_buffer_if_empty() -> ErrorCode {
if (-1 == m_fd) {
return ErrorCode_NotInit;
}
if (m_buffer_reader->get_buffer_size() > 0) {
if (m_buffer_reader.get_buffer_size() > 0) {
return ErrorCode_Success;
}
return refill_reader_buffer(m_base_buffer_size);
}

void BufferedFileReader::refill_buffer_if_empty() {
auto error_code = try_refill_buffer_if_empty();
if (ErrorCode_Success != error_code) {
throw OperationFailed(error_code, __FILENAME__, __LINE__);
}
}

auto BufferedFileReader::try_peek_buffered_data(char const*& buf, size_t& peek_size) const
-> ErrorCode {
if (-1 == m_fd) {
return ErrorCode_NotInit;
}
m_buffer_reader->peek_buffer(buf, peek_size);
return ErrorCode_Success;
}

void BufferedFileReader::peek_buffered_data(char const*& buf, size_t& peek_size) const {
auto error_code = try_peek_buffered_data(buf, peek_size);
if (ErrorCode_Success != error_code) {
throw OperationFailed(error_code, __FILENAME__, __LINE__);
}
m_buffer_reader.peek_buffer(buf, peek_size);
}

auto BufferedFileReader::set_checkpoint() -> size_t {
if (m_checkpoint_pos.has_value() && m_checkpoint_pos < m_file_pos
&& m_buffer_reader->get_buffer_size() != m_base_buffer_size)
&& m_buffer_reader.get_buffer_size() != m_base_buffer_size)
{
drop_content_before_current_pos();
}
Expand All @@ -171,17 +104,11 @@ auto BufferedFileReader::clear_checkpoint() -> void {
}

auto BufferedFileReader::try_get_pos(size_t& pos) -> ErrorCode {
if (-1 == m_fd) {
return ErrorCode_NotInit;
}
pos = m_file_pos;
return ErrorCode_Success;
}

auto BufferedFileReader::try_seek_from_begin(size_t pos) -> ErrorCode {
if (-1 == m_fd) {
return ErrorCode_NotInit;
}
if (pos == m_file_pos) {
return ErrorCode_Success;
}
Expand All @@ -191,16 +118,17 @@ auto BufferedFileReader::try_seek_from_begin(size_t pos) -> ErrorCode {
return ErrorCode_Unsupported;
}

auto error_code = m_buffer_reader->try_seek_from_begin(get_buffer_relative_pos(pos));
auto error_code = m_buffer_reader.try_seek_from_begin(get_buffer_relative_pos(pos));
if (ErrorCode_Truncated == error_code) {
if (false == m_checkpoint_pos.has_value()) {
// If checkpoint is not set, simply move the file_pos and invalidate
// the buffer reader
auto offset = lseek(m_fd, static_cast<off_t>(pos), SEEK_SET);
if (-1 == offset) {
return ErrorCode_errno;
haiqi96 marked this conversation as resolved.
Show resolved Hide resolved
if (auto const error_code = m_file_reader->try_seek_from_begin(pos);
error_code != ErrorCode_Success)
{
return error_code;
}
m_buffer_reader.emplace(m_buffer.data(), 0);
m_buffer_reader = BufferReader{m_buffer.data(), 0};
Copy link
Contributor

@coderabbitai coderabbitai bot Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider simplifying error handling logic

The nested conditions and error handling could be made more readable by extracting the checkpoint-related logic into a separate method.

Consider refactoring like this:

-    if (ErrorCode_Truncated == error_code) {
-        if (false == m_checkpoint_pos.has_value()) {
-            // If checkpoint is not set, simply move the file_pos and invalidate
-            // the buffer reader
-            if (auto const error_code = m_file_reader->try_seek_from_begin(pos);
-                error_code != ErrorCode_Success)
-            {
-                return error_code;
-            }
-            m_buffer_reader = BufferReader{m_buffer.data(), 0};
-            m_buffer_begin_pos = pos;
+    if (ErrorCode_Truncated == error_code && !m_checkpoint_pos.has_value()) {
+        auto const seek_error = m_file_reader->try_seek_from_begin(pos);
+        if (seek_error != ErrorCode_Success) {
+            return seek_error;
+        }
+        m_buffer_reader = BufferReader{m_buffer.data(), 0};
+        m_buffer_begin_pos = pos;
+        return ErrorCode_Success;

Committable suggestion was skipped due to low confidence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this applies to the original code snippet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haiqi96, understood. I appreciate your feedback.

(つ﹏⊂)


✏️ Learnings added
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/BufferedFileReader.cpp:96-106
Timestamp: 2024-10-24T14:45:26.095Z
Learning: In `components/core/src/clp/BufferedFileReader.cpp`, refactoring the nested error handling conditions may not apply due to the specific logic in the original code.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

m_buffer_begin_pos = pos;
} else {
auto const num_bytes_to_refill = pos - get_buffer_end_pos();
Expand All @@ -211,7 +139,7 @@ auto BufferedFileReader::try_seek_from_begin(size_t pos) -> ErrorCode {
if (ErrorCode_Success != error_code) {
return error_code;
}
error_code = m_buffer_reader->try_seek_from_begin(get_buffer_relative_pos(pos));
error_code = m_buffer_reader.try_seek_from_begin(get_buffer_relative_pos(pos));
if (ErrorCode_Success != error_code) {
return error_code;
}
Expand All @@ -225,41 +153,35 @@ auto BufferedFileReader::try_seek_from_begin(size_t pos) -> ErrorCode {

auto BufferedFileReader::try_read(char* buf, size_t num_bytes_to_read, size_t& num_bytes_read)
-> ErrorCode {
if (-1 == m_fd) {
return ErrorCode_NotInit;
}
if (nullptr == buf) {
return ErrorCode_BadParam;
}
if (num_bytes_to_read == 0) {
return ErrorCode_Success;
}

span dst_view{buf, num_bytes_to_read};
num_bytes_read = 0;
while (true) {
while (false == dst_view.empty()) {
size_t bytes_read{0};
auto error_code = m_buffer_reader->try_read(buf, num_bytes_to_read, bytes_read);
auto error_code = m_buffer_reader.try_read(dst_view.data(), dst_view.size(), bytes_read);
if (ErrorCode_Success == error_code) {
buf += bytes_read;
dst_view = dst_view.subspan(bytes_read);
num_bytes_read += bytes_read;
num_bytes_to_read -= bytes_read;
update_file_pos(m_file_pos + bytes_read);
if (0 == num_bytes_to_read) {
} else if (ErrorCode_EndOfFile == error_code) {
error_code = refill_reader_buffer(m_base_buffer_size);
if (ErrorCode_EndOfFile == error_code) {
break;
}
} else if (ErrorCode_EndOfFile != error_code) {
return error_code;
}

error_code = refill_reader_buffer(m_base_buffer_size);
if (ErrorCode_EndOfFile == error_code) {
break;
}
if (ErrorCode_Success != error_code) {
if (ErrorCode_Success != error_code) {
return error_code;
}
haiqi96 marked this conversation as resolved.
Show resolved Hide resolved
} else {
return error_code;
}
}
if (0 == num_bytes_read) {
if (dst_view.size() == num_bytes_to_read) {
return ErrorCode_EndOfFile;
}
return ErrorCode_Success;
Expand All @@ -271,17 +193,14 @@ auto BufferedFileReader::try_read_to_delimiter(
bool append,
string& str
) -> ErrorCode {
if (-1 == m_fd) {
return ErrorCode_NotInit;
}
if (false == append) {
str.clear();
}
bool found_delim{false};
size_t total_num_bytes_read{0};
while (true) {
size_t num_bytes_read{0};
if (auto ret_code = m_buffer_reader->try_read_to_delimiter(
if (auto ret_code = m_buffer_reader.try_read_to_delimiter(
delim,
keep_delimiter,
str,
Expand Down Expand Up @@ -314,7 +233,7 @@ auto BufferedFileReader::try_read_to_delimiter(

auto BufferedFileReader::refill_reader_buffer(size_t num_bytes_to_refill) -> ErrorCode {
auto const buffer_end_pos = get_buffer_end_pos();
auto const data_size = m_buffer_reader->get_buffer_size();
auto const data_size = m_buffer_reader.get_buffer_size();
auto const available_buffer_space = m_buffer.size() - data_size;

size_t num_bytes_to_read{0};
Expand Down Expand Up @@ -342,27 +261,32 @@ auto BufferedFileReader::refill_reader_buffer(size_t num_bytes_to_refill) -> Err
}

size_t num_bytes_read{0};
auto error_code
= read_into_buffer(m_fd, &m_buffer[next_buffer_pos], num_bytes_to_read, num_bytes_read);
auto error_code = read_into_buffer(
m_file_reader.get(),
&m_buffer[next_buffer_pos],
num_bytes_to_read,
num_bytes_read
);
if (error_code != ErrorCode_Success && ErrorCode_EndOfFile != error_code) {
return error_code;
}
// NOTE: We still want to set the buffer reader if no bytes were read on EOF
m_buffer_reader.emplace(m_buffer.data(), next_buffer_pos + num_bytes_read, next_buffer_pos);
m_buffer_reader
= BufferReader{m_buffer.data(), next_buffer_pos + num_bytes_read, next_buffer_pos};
m_buffer_begin_pos = next_buffer_begin_pos;
return error_code;
}

auto BufferedFileReader::drop_content_before_current_pos() -> void {
auto buffer_reader_pos = m_buffer_reader->get_pos();
auto const new_data_size = m_buffer_reader->get_buffer_size() - buffer_reader_pos;
auto buffer_reader_pos = m_buffer_reader.get_pos();
auto const new_data_size = m_buffer_reader.get_buffer_size() - buffer_reader_pos;
auto const new_buffer_size = int_round_up_to_multiple(new_data_size, m_base_buffer_size);

haiqi96 marked this conversation as resolved.
Show resolved Hide resolved
m_buffer.erase(m_buffer.begin(), m_buffer.begin() + static_cast<long>(buffer_reader_pos));
m_buffer.resize(new_buffer_size);
m_buffer_begin_pos += buffer_reader_pos;

m_buffer_reader.emplace(m_buffer.data(), new_data_size);
m_buffer_reader = BufferReader{m_buffer.data(), new_data_size};
}

auto BufferedFileReader::update_file_pos(size_t pos) -> void {
Expand Down
Loading
Loading