-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add ClpKeyValuePairStreamHandler
to support logging dictionary type log events into CLP key-value pair IR format.
#46
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe changes in this pull request involve updates to the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Logger
participant Handler
User->>Logger: Log event with key-value pairs
Logger->>Handler: Process log event
Handler->>Handler: Serialize log event to CLP format
Handler-->>User: Log event confirmation
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (9)
src/clp_logging/auto_generated_kv_pairs_utils.py (3)
19-44
: Consider making the buffer immutableThe buffer is initialized with a mutable dictionary that could be modified externally. Consider using a frozen or immutable data structure to prevent accidental modifications.
45-76
: Add thread safety warning in docstringThe
generate
method modifies shared state (self._buf) which could lead to race conditions in multi-threaded environments. Consider adding a warning in the docstring about thread safety, or implement thread-safe mechanisms.Example addition to docstring:
""" Generated auto-generated key-value pairs by populating the underlying buffer with the given log event metadata. + Note: This class is not thread-safe. Each thread should maintain its own + instance of AutoGeneratedKeyValuePairsBuffer. + :param timestamp: The Unix epoch timestamp in millisecond of the log event.
1-94
: Consider adding schema validationThe key-value pair structure is critical for the CLP IR format. Consider adding schema validation to ensure the generated structures always conform to the expected format, especially for the user-generated key-value pairs that will interact with this module.
src/clp_logging/handlers.py (3)
858-866
: Avoid storing unsupported formattersSince formatters are not supported by
ClpKeyValuePairStreamHandler
, assigning the formatter toself._formatter
may cause confusion. Consider removing this assignment to prevent misleading users.Apply this diff to remove the unnecessary assignment:
def setFormatter(self, fmt: Optional[logging.Formatter]) -> None: if fmt is None: return warnings.warn( f"Formatter is currently not supported in the current {self.__class__.__name__}", category=RuntimeWarning, ) - self._formatter = fmt
929-942
: Userecord.created
for consistent timestampsInstead of using
time.time()
, utilizerecord.created
to obtain the timestamp when theLogRecord
was created. Multiplyingrecord.created
by 1000 will provide the timestamp in milliseconds, ensuring consistency and accuracy in log event timing.Apply this diff to use
record.created
:def _write(self, record: logging.LogRecord) -> None: if self._is_closed(): raise IOError("The handler has been closed.") if not isinstance(record.msg, dict): raise TypeError("The log msg must be a valid Python dictionary.") - curr_ts: int = floor(time.time() * 1000) + curr_ts: int = floor(record.created * 1000) if self._loglevel_timeout is not None: self._loglevel_timeout.update(record.levelno, curr_ts, self._log_level_timeout_callback) self._serialize_kv_pair_log_event( self._auto_generated_kv_pairs_buf.generate(curr_ts, self._tz, record), record.msg )
964-967
: Maintain consistent timestamping in callbacksFor consistency with other methods, consider using
time.time()
directly in_log_level_timeout_callback
, or refactor the method to accept aLogRecord
if applicable.tests/test_handlers.py (3)
774-797
: Consider using@dataclass
to simplifyExpectedLogEvent
.The
ExpectedLogEvent
class can be simplified by using the@dataclass
decorator from thedataclasses
module. This reduces boilerplate code and enhances readability.You can refactor the class as follows:
+from dataclasses import dataclass -class ExpectedLogEvent: - - def __init__( - self, - utc_epoch_ms: int, - tz: Optional[str], - level_no: int, - level_name: str, - path: Path, - line: Optional[int], - user_generated_kv_pairs: Dict[str, Any], - ) -> None: - self.utc_epoch_ms: int = utc_epoch_ms - self.tz: Optional[str] = tz - self.level_no: int = level_no - self.level_name: str = level_name - self.path: Path = path - self.line: Optional[int] = line - self.user_generated_kv_pairs: Dict[str, Any] = user_generated_kv_pairs +@dataclass +class ExpectedLogEvent: + utc_epoch_ms: int + tz: Optional[str] + level_no: int + level_name: str + path: Path + line: Optional[int] + user_generated_kv_pairs: Dict[str, Any]
803-805
: Convert TODO comment into a GitHub issue for tracking refactoring.The TODO comment suggests refactoring
TestCLPBase
to support both raw-text and key-value pair logging. Consider creating a GitHub issue to formalize and track this enhancement.Would you like me to open a new GitHub issue to track this refactoring effort?
927-930
: Remove unnecessarycopy.deepcopy()
for immutable data structures.Since
primitive_dict
andprimitive_array
contain immutable primitive types, usingcopy.deepcopy()
is unnecessary. You can use a shallow copy withcopy()
or simply assign the variables directly to improve performance.Apply this diff to simplify the code:
- dict_with_array: Dict[str, Any] = copy.deepcopy(primitive_dict) + dict_with_array: Dict[str, Any] = primitive_dict.copy() dict_with_array["array"] = primitive_array - array_with_dict: List[Any] = copy.deepcopy(primitive_array) + array_with_dict: List[Any] = primitive_array.copy() array_with_dict.append(primitive_dict)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (5)
pyproject.toml
(1 hunks)src/clp_logging/auto_generated_kv_pairs_utils.py
(1 hunks)src/clp_logging/handlers.py
(4 hunks)tests/__init__.py
(2 hunks)tests/test_handlers.py
(4 hunks)
🔇 Additional comments (6)
pyproject.toml (1)
17-17
: Verify the necessity of pinning to a beta version
The dependency clp-ffi-py
has been pinned to a specific beta version (0.1.0b1). While this ensures reproducible builds, using beta versions in production dependencies could introduce stability risks.
Let's verify the changes and requirements:
Consider:
- Is this beta version specifically required for the new key-value pair functionality?
- Are there any known issues or breaking changes in this version?
- Would a stable version constraint (e.g.,
>=0.1.0
) be more appropriate?
✅ Verification successful
Let me gather more information about the dependency requirements and changes.
Based on the gathered information, I can now provide the final response:
Pinning to beta version 0.1.0b1 is necessary
The codebase heavily integrates with clp-ffi-py
for key-value pair logging functionality, as evidenced by the extensive usage in handlers and tests. The README explicitly states that the library relies on clp-ffi-py
for IR analytics interface and advanced features. The beta version requirement appears intentional as it introduces necessary features for the key-value pair functionality implemented throughout the codebase.
- The library uses
KeyValuePairLogEvent
and IR-related features fromclp-ffi-py
in multiple files - The README specifically references
clp-ffi-py
for "newest and improved CLP IR analytics interface" - The implementation shows tight coupling with specific IR format handling capabilities
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check the changelog and related issues for breaking changes and verify if this version is required for the key-value pair functionality.
# Check for related issues or PRs mentioning the version requirement
gh search issues "clp-ffi-py 0.1.0b1" --repo y-scope/clp-loglib-py
# Check the changelog in clp-ffi-py repository
gh api repos/y-scope/clp-ffi-py/contents/CHANGELOG.md 2>/dev/null | jq -r '.content' | base64 -d || echo "Changelog not found"
# Look for any references to key-value pair functionality in the codebase
rg -i "key.*value|kv.*pair" --type py
Length of output: 8118
Script:
#!/bin/bash
# Check imports and usage of clp-ffi-py in the codebase
rg -l "from clp_ffi_py" --type py
# Check the specific version requirement in git history
git log -p pyproject.toml | grep -A 5 -B 5 "clp-ffi-py"
# Look for any documentation about dependencies
fd README.md | xargs cat
Length of output: 10770
tests/__init__.py (2)
4-8
: LGTM! Clean import organization.
The grouped import of test base classes maintains a clear and organized structure.
42-44
: LGTM! Verify test discovery for the new handler.
The addition follows the established pattern for test collection and properly integrates with the existing test loading mechanism.
Let's verify that the test discovery is working as expected:
✅ Verification successful
Test discovery is working correctly for the new handler
The test discovery is properly implemented with multiple test classes inheriting from TestClpKeyValuePairLoggingBase
:
TestClpKeyValuePairHandlerBase
- Base test class for key-value pair handlersTestClpKeyValuePairHandlerLogLevelTimeoutBase
- Base test class for log level timeout functionality- Concrete test implementations:
TestClpKeyValuePairStreamingHandlerRaw
TestClpKeyValuePairStreamingHandlerZstd
TestClpKeyValuePairStreamingHandlerLogLevelTimeoutRaw
TestClpKeyValuePairStreamingHandlerLogLevelTimeoutZstd
The test discovery mechanism in __init__.py
will correctly collect all these test classes through __subclasses__()
.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the test discovery for ClpKeyValuePairLoggingBase
# Check for test class implementations
echo "Checking for concrete test implementations:"
ast-grep --pattern 'class $_($$$TestClpKeyValuePairLoggingBase$$$):'
# Verify test file structure
echo -e "\nChecking test file organization:"
fd -e py . tests/
Length of output: 309
Script:
#!/bin/bash
# Let's check the content of test files to verify the test class hierarchy
# Check the base test class definition
echo "Checking TestClpKeyValuePairLoggingBase definition:"
rg "class TestClpKeyValuePairLoggingBase" tests/ -A 5
# Look for any test classes that inherit from TestClpKeyValuePairLoggingBase
echo -e "\nChecking for test classes inheriting from TestClpKeyValuePairLoggingBase:"
rg "class.*\(.*TestClpKeyValuePairLoggingBase.*\)" tests/
# Check the test handler implementations
echo -e "\nChecking test handler implementations:"
cat tests/test_handlers.py
Length of output: 45299
src/clp_logging/auto_generated_kv_pairs_utils.py (2)
4-16
: LGTM! Well-structured constant definitions.
The constants are logically grouped, clearly named, and properly typed. They align well with the PR objectives for handling auto-generated key-value pairs.
78-94
: Verify consistent usage of timestamp format
The implementation looks good. Let's verify that the timestamp format is consistent across the codebase.
✅ Verification successful
Timestamp format is consistently implemented across the codebase
The verification shows that the timestamp implementation in create_loglib_generated_log_event_as_auto_generated_kv_pairs
is consistent with the rest of the codebase:
- All timestamps are consistently handled as Unix epoch milliseconds (integer type)
- The function follows the same timestamp structure used throughout:
- Uses
ZONED_TIMESTAMP_KEY
with nestedZONED_TIMESTAMP_UTC_EPOCH_MS_KEY
andZONED_TIMESTAMP_TZ_KEY
- Accepts timezone as an optional string parameter
- Uses
- The implementation aligns with the protocol's timestamp handling (
ID_TIMESTAMP
,TIMESTAMP_DELTA_*
constants) and reader/handler components
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for timestamp-related code to ensure consistent format usage
rg -A 2 "timestamp.*:.*int" --type python
Length of output: 74
Script:
#!/bin/bash
# Search for timestamp-related code using ripgrep without file type filter
rg -A 2 "timestamp.*:.*int"
Length of output: 3170
Script:
#!/bin/bash
# Search for timestamp usage in function parameters and type hints
ast-grep --pattern 'def $FUNC($$$, timestamp: int, $$$)'
Length of output: 58
Script:
#!/bin/bash
# Search for timestamp-related constants and type definitions
rg "TIMESTAMP|timestamp" -A 2
Length of output: 29362
src/clp_logging/handlers.py (1)
227-231
: Update CLPLogLevelTimeout
to support Serializer
The inclusion of Serializer
in the ostream
attribute and the set_ostream
method enhances the flexibility of CLPLogLevelTimeout
, allowing it to work with a broader range of output streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (4)
src/clp_logging/handlers.py (3)
858-865
: Consider removing unused formatter assignmentIn the
setFormatter
method, after issuing a warning that formatters are not supported, you assign the formatter toself._formatter
. Since the formatter isn't utilized elsewhere in the class, consider removing this assignment to prevent potential confusion.
862-864
: Adjust warning message for clarityIn the warning message within
setFormatter
, the phrasing can be improved for clarity. Consider rephrasing it to:"Formatter is currently not supported in
ClpKeyValuePairStreamHandler
"
916-919
: Optimize compressor instantiationThe
ZstdCompressor
instancecctx
is instantiated even when compression is disabled. To enhance performance, consider creating the compressor only whenself._enable_compression
isTrue
.Apply this change:
def _init_new_serializer(self, stream: IO[bytes]) -> None: """ Initializes a new serializer that writes to the given stream. :param stream: The stream that the underlying serializer will write to. """ if self._enable_compression: cctx: ZstdCompressor = ZstdCompressor() self._serializer = Serializer(cctx.stream_writer(stream)) else: self._serializer = Serializer(stream)tests/test_handlers.py (1)
1028-1031
: Adhere to PEP 8 line length guidelines for readabilityLines 1028-1031 exceed the recommended maximum line length of 79 characters. Wrapping these lines according to PEP 8 guidelines would improve code readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
src/clp_logging/auto_generated_kv_pairs_utils.py
(1 hunks)src/clp_logging/handlers.py
(4 hunks)tests/test_handlers.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/clp_logging/auto_generated_kv_pairs_utils.py
🧰 Additional context used
📓 Learnings (1)
src/clp_logging/handlers.py (1)
Learnt from: LinZhihao-723
PR: y-scope/clp-loglib-py#46
File: src/clp_logging/handlers.py:883-894
Timestamp: 2024-11-30T03:17:38.238Z
Learning: `clp_ffi_py.ir.Serializer` needs to allow closing the serializer without closing the underlying output stream.
Sorry, haven't yet finished a review. Will get back to you today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The wrapping of docstrings is inconsistent across this PR. Can we stick to 100 characters and open an issue to refactor the existing docstrings?
- Do we need indents for multiline docstrings?
@@ -14,7 +14,7 @@ readme = "README.md" | |||
requires-python = ">=3.7" | |||
dependencies = [ | |||
"backports.zoneinfo >= 0.2.1; python_version < '3.9'", | |||
"clp-ffi-py >= 0.0.11", | |||
"clp-ffi-py == 0.1.0b1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to release clp-ffi-py and update the version here before merging this PR?
ZONED_TIMESTAMP_KEY: str = "zoned_timestamp" | ||
ZONED_TIMESTAMP_UTC_EPOCH_MS_KEY: str = "utc_epoch_ms" | ||
ZONED_TIMESTAMP_TZ_KEY: str = "timezone" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the ZONED_
? Especially considering the user may pass in None
for the timezone?
from typing import Any, Dict, Optional | ||
|
||
ZONED_TIMESTAMP_KEY: str = "zoned_timestamp" | ||
ZONED_TIMESTAMP_UTC_EPOCH_MS_KEY: str = "utc_epoch_ms" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Aren't all epoch timestamps UTC?
- How about
unix_timestamp_ms
?- Technically,
ms
is ambiguous since it could mean Ms or ms due to capitalization, so I would rather usemillisecs
for the least ambiguity; but I guess that would be a larger change across our codebase.
- Technically,
A reusable buffer for creating auto-generated key-value pairs for log | ||
events. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reusable buffer for creating auto-generated key-value pairs for log | |
events. | |
A reusable buffer for auto-generated key-value pairs. |
This buffer maintains a predefined dictionary structure for common metadata | ||
fields, allowing efficient reuse without creating new dictionaries for each | ||
log event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This buffer maintains a predefined dictionary structure for common metadata | |
fields, allowing efficient reuse without creating new dictionaries for each | |
log event. | |
This buffer maintains a predefined dictionary for common metadata fields, | |
to enable efficient reuse without creating new dictionaries for each log | |
event. |
|
||
# override | ||
def setUp(self) -> None: | ||
post_fix: str = ".clp.zst" if self._enable_compression else ".clp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about file_extension
?
auto_generated_kv_pairs[LEVEL_KEY][LEVEL_NAME_KEY], expected.level_name | ||
) | ||
|
||
# Check the path by converting the path string to `Path` object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Check the path by converting the path string to `Path` object |
Feels redundant, but you can leave it if you want.
|
||
self._close_and_compare() | ||
|
||
def test_empty(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also test creating an entirely empty file? I.e., one without a log event.
|
||
@unittest.skipIf( | ||
"macOS" == os.getenv("RUNNER_OS"), | ||
"Github macos runner tends to fail LLT tests with timing issues.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Github macos runner tends to fail LLT tests with timing issues.", | |
"The GitHub macOS runner tends to fail LLT tests with timing issues.", |
|
||
@unittest.skipIf( | ||
"macOS" == os.getenv("RUNNER_OS"), | ||
"Github macos runner tends to fail LLT tests with timing issues.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Github macos runner tends to fail LLT tests with timing issues.", | |
"The GitHub macOS runner tends to fail LLT tests with timing issues.", |
Description
This PR adds
ClpKeyValuePairStreamHandler
to support serializing key-value pair log events using the latest CLP IR format. It implements everything thatCLPStreamHandler
supports, except that the log event given by users is a Python dictionary but not a text string:logging.StreamHandler
CLPLoglevelTimeout
.Since auto-generated and user-generated kv pairs are not implemented in CLP core yet (as tracked in this feature request), the current logging library will structure the user-given Python dictionary and auto-generated log event metadata in the following manner:
AUTO_GENERATED_KV_PAIRS
includes:This PR also adds new unit test cases to test
ClpKevValuePairStreamHandler
's functionalities:NOTE: due to the limitation of the current unit test files, the current unit test cases duplicate some existing code, as explained in #47.
Validation performed
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Bug Fixes
Tests