feat(stats): Chart migration 7 #1162
Conversation
Walkthrough
This pull request introduces comprehensive enhancements to the stats configuration and tracking system, focusing on adding new metrics for contracts and verified contracts. The changes span multiple files across the project, introducing two primary new counters: `new_contracts_24h` and `new_verified_contracts_24h`.
Possibly related PRs
Actionable comments posted: 2
🧹 Nitpick comments (11)
stats/stats/src/data_source/kinds/remote_db/query/one.rs (2)
58-63: Add a doc comment for clarity
A short explanatory doc comment above `StatementFromUpdateTime` would help future maintainers or implementers understand its purpose.

```diff
+/// Trait that provides an SQL statement based on `update_time`
+/// and completed migrations. Intended for single-value queries
+/// whose timespan is derived from the current update time.
 pub trait StatementFromUpdateTime {
     fn get_statement(
         update_time: DateTime<Utc>,
         completed_migrations: &BlockscoutMigrations,
     ) -> Statement;
 }
```
65-71: Consider unifying with `PullOne` to reduce duplication
`PullOneValue` and `PullOne` share a similar pattern and could likely be combined or parameterized further to reduce code duplication. This is not critical but worth considering for improved maintainability.
stats/config/utils/find_free_timeslot.py (2)
11-13: Promote enumerations over string constants
Defining `DURATION_MENU_MANUAL` and `DURATION_MENU_CONFIG` as string constants works, but consider using an `Enum` to improve readability and type-safety across the codebase.

```diff
+from enum import Enum
+
+class DurationSource(Enum):
+    MANUAL = "manual"
+    CONFIG = "config"
+
+# Replace references to DURATION_MENU_MANUAL / DURATION_MENU_CONFIG with:
-DURATION_MENU_MANUAL = "manual"
-DURATION_MENU_CONFIG = "config"
+DURATION_MENU_MANUAL = DurationSource.MANUAL.value
+DURATION_MENU_CONFIG = DurationSource.CONFIG.value
```
219-221: Fallback to default duration is a good practice
Falling back to `self.default_duration` if the manual input is invalid prevents unexpected crashes. It might help to also notify users that the default duration was applied, so they realize their custom input was ignored.
stats/stats/src/charts/counters/total_verified_contracts.rs (1)
21-22: Consider providing a doc comment for this struct
Documenting the purpose of `TotalVerifiedContractsStatement` will help maintainers quickly understand its role in forming SQL statements.
stats/stats/src/charts/counters/new_verified_contracts_24h.rs (2)
22-36: Consolidate filter logic into a single function if reusability is needed.
Overall, the usage of `interval_24h_filter` is correct for limiting rows to the last 24 hours. If there are multiple charts requiring similar logic, consider abstracting the filtration to keep the code DRY.
60-62: Documentation clarifies scope.
Marking that internal transactions are not included adds clarity to the usage context.
stats/config/utils/durations/durations_producer.py (3)
9-12: Consider clarifying regex patterns in docstring
The regex usage in `convert_camel_to_snake` is clear enough, but adding a short comment clarifying the capturing groups and expectations could be helpful for future maintainers.
15-19: Potential integration with custom period names
Currently, `normalize_period` handles a fixed map for certain periods. If there's a future need for custom period names or additional time units, consider centralizing the mapping or retrieving it from an external config.
21-56: Robust parsing for singleton and complex groups
The `parse_rust_groups` function does a good job extracting both singleton and complex groups. As a potential enhancement, you could guard against potential edge cases with more thorough error handling, for example if the file is empty or the regex fails to parse.
stats/stats/src/tests/mock_blockscout.rs (1)
144-145: Clarify the commented-out date
A short note on why `"2022-11-17T00:00:00"` is intentionally skipped can preclude confusion for future maintainers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
- stats/config/charts.json (2 hunks)
- stats/config/layout.json (1 hunks)
- stats/config/update_groups.json (1 hunks)
- stats/config/utils/durations/.gitignore (1 hunks)
- stats/config/utils/durations/README.md (1 hunks)
- stats/config/utils/durations/data.csv.example (1 hunks)
- stats/config/utils/durations/durations.json (1 hunks)
- stats/config/utils/durations/durations_producer.py (1 hunks)
- stats/config/utils/find_free_timeslot.py (3 hunks)
- stats/config/utils/requirements.txt (1 hunks)
- stats/stats-server/src/runtime_setup.rs (1 hunks)
- stats/stats-server/tests/it/counters.rs (1 hunks)
- stats/stats/src/charts/counters/mod.rs (2 hunks)
- stats/stats/src/charts/counters/new_contracts_24h.rs (1 hunks)
- stats/stats/src/charts/counters/new_verified_contracts_24h.rs (1 hunks)
- stats/stats/src/charts/counters/total_contracts.rs (2 hunks)
- stats/stats/src/charts/counters/total_verified_contracts.rs (2 hunks)
- stats/stats/src/charts/db_interaction/mod.rs (1 hunks)
- stats/stats/src/charts/db_interaction/utils.rs (1 hunks)
- stats/stats/src/data_source/kinds/remote_db/mod.rs (1 hunks)
- stats/stats/src/data_source/kinds/remote_db/query/mod.rs (1 hunks)
- stats/stats/src/data_source/kinds/remote_db/query/one.rs (2 hunks)
- stats/stats/src/tests/mock_blockscout.rs (3 hunks)
- stats/stats/src/update_group.rs (2 hunks)
- stats/stats/src/update_groups.rs (1 hunks)
✅ Files skipped from review due to trivial changes (5)
- stats/stats/src/charts/db_interaction/mod.rs
- stats/config/utils/durations/.gitignore
- stats/config/utils/durations/README.md
- stats/config/utils/durations/data.csv.example
- stats/config/utils/durations/durations.json
🔇 Additional comments (57)
stats/stats/src/update_group.rs (4)
536-536: Confirm consistent naming usage between `ContractsGrowth` and `NewContracts`.
You have introduced both `ContractsGrowth` and `NewContracts`. If there's a dependency relationship between them, ensure that the names reflect their roles consistently (e.g., “Contracts Growth” might internally use “New Contracts” values). Otherwise, this import might lead to confusion or mismatch if developers expect a consistent naming scheme.
543-543: Clarify whether `NewContracts` should be part of this group.
The group definition includes only `ContractsGrowth`, but the tests suggest that `NewContracts` might be required as well. If this omission is intentional, to trigger an error for demonstration or testing, it's fine. Otherwise, consider adding `NewContracts` to this group if that chart belongs here logically.
549-549: Re-check dependency coverage.
You have only the mutex for `ContractsGrowth` here. Given the test below, there may be a functional dependency on `NewContracts`. Confirm whether you need to add `NewContracts`’s mutex to cover all actual dependencies.
568-568: Validate that it correctly flags the missing mutex.
This test ensures the error reports `NewContracts` as missing. This is good for confirming that required charts are present in the group’s mutexes. If `ContractsGrowth` no longer depends on `NewContracts`, consider removing this from the missing list to avoid confusion.
stats/stats/src/data_source/kinds/remote_db/query/one.rs (3)
4-4: Imports necessary for new functionality
Bringing `TryGetable` into scope is appropriate and essential for the upcoming usage in `PullOneValue`. Nothing to fix here.
9-9: New imports for data source types and timespan
Imports for `BlockscoutMigrations`, `UpdateContext`, `WrappedValue`, `Timespan` and `TimespanValue` are consistent with the new trait and struct usage. Looks good.
Also applies to: 12-12
72-94: Verify dropping the time portion is appropriate
When constructing the `timespan`, note that `date_naive()` strips away the time zone and hour/minute/second information. Ensure this aligns with the intended behavior for your 24-hour window or daily metrics.
✅ Verification successful
Based on the search results, I can provide the final response as the usage of `date_naive()` is consistent with the intended behavior across the codebase. Here's my analysis:
Dropping time portion is correct and consistent with timespan implementations
The usage of `date_naive()` to strip time information when constructing timespans is the correct approach because:
- All timespan implementations (Day, Month, Week, Year) consistently use only the date portion when creating timespans via `from_date()`.
- The test cases across the codebase confirm this behavior, particularly in handling daily/monthly/yearly boundaries.
- The implementation aligns with the `Timespan` trait's documentation, which indicates that timespans are date-based rather than time-based: i.e., for some date d, `Timespan::from_date(d).into_date() != d`.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search usage of `from_date(` across the codebase
# to confirm if ignoring the time portion is consistent elsewhere.
rg -A 5 "from_date\\("
Length of output: 41411
stats/config/utils/find_free_timeslot.py (5)
28-32: BooleanVar toggling is clear and intuitive
Good approach using `tk.BooleanVar` to toggle between config-based or manual durations. This cleanly signals the source of truth for task durations.
33-41: Validate loaded JSON content
While this block elegantly handles missing or invalid files, consider adding checks to validate the content structure of the durations JSON. This ensures that any malformed file is caught early before being used downstream.
68-74: Straightforward flow for loading JSON
The logic for loading and updating visualization & schedule list is consistent. Be sure that any missing fields in the loaded JSON are gracefully handled to prevent runtime errors.
228-232: Config-based vs manual durations logic looks consistent
Well-handled logic for determining which duration source to use. Make sure to confirm that each schedule name in `schedules` has a corresponding duration entry in `task_durations` if the config-based option is chosen.
322-330: Robust approach to dynamic tags parsing
Safely searching for `"tasks_"` prevents out-of-bounds errors. Switching to `"/"` as the separator is a concise fix. This logic is more reliable for future expansions.
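As a rough illustration of that pattern, here is a hypothetical Python sketch; the `"tasks_"` prefix and `"/"` separator come from the comment above, while the tag layout and helper name are assumptions rather than the actual `find_free_timeslot.py` code:

```python
def parse_task_tag(tags):
    """Return the task name encoded in canvas item tags, or None if absent."""
    # Search by prefix instead of indexing a fixed position, so items
    # without a task tag cannot cause out-of-bounds errors.
    for tag in tags:
        if tag.startswith("tasks_"):
            # Drop the prefix, then split on "/" to separate the task name
            # from any occurrence suffix.
            return tag[len("tasks_"):].split("/", 1)[0]
    return None

# Example: parse_task_tag(["current", "tasks_new_accounts_group/3"]) == "new_accounts_group"
```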
stats/stats/src/charts/counters/total_verified_contracts.rs (6)
2-8: Imports look good
All newly introduced imports are cohesive and align well with the existing architecture.
13-13: Smart contracts entity usage
Using `smart_contracts` from `blockscout_db::entity` is consistent with the data flow for verified contracts.
16-19: Additional sea-orm imports
The `sea_orm` imports (e.g., `sea_query` and `QueryFilter`) are relevant and properly scoped.
23-34: Verify index on InsertedAt for performance
The query filters on `InsertedAt`; ensure there is an appropriate database index to optimize counting for large tables. If not, performance may degrade over time.
36-37: Remote database source type looks good
Defining `TotalVerifiedContractsRemote` succinctly delegates query logic to the new statement.
59-59: Check necessity of MapToString
The final alias leverages `MapToString<TotalVerifiedContractsRemote>`. Confirm that string conversion is needed for consumer layers.
stats/stats/src/data_source/kinds/remote_db/query/mod.rs (1)
7-7: Addition of PullOneValue and StatementFromUpdateTime exports
These newly exported items will help unify remote query logic. The approach appears consistent with the existing structure.
stats/stats/src/charts/db_interaction/utils.rs (2)
1-3: Imports are appropriate
The chrono and sea_orm imports lay the groundwork for the 24h interval filtering functionality.
4-16: Confirm edge cases for the 24h interval filter
Consider how intervals beyond 24 hours or negative time differences are handled (e.g., future timestamps). If not needed, this is fine. Otherwise, clarify the behavior.
stats/stats/src/charts/counters/mod.rs (2)
5-6: New modules for 24h counters
Introducing the `new_contracts_24h` and `new_verified_contracts_24h` modules is logically consistent with the new metrics requirement.
28-29: Public exports for 24h counters
Exporting `NewContracts24h` and `NewVerifiedContracts24h` ensures they are accessible for chart generation.
stats/stats/src/charts/counters/total_contracts.rs (1)
Line range hint 17-31: Selective query usage and performance note
- Adding `.select_only()` helps optimize the query to return only required columns.
- The performance comment is valid; ensure `InsertedAt` is indexed to keep the counting query efficient as data grows.
stats/stats/src/data_source/kinds/remote_db/mod.rs (1)
37-38: Exports of new query types look good.
These new exports (`PullOneValue` and `StatementFromUpdateTime`) expand the module’s capabilities without altering the existing `RemoteDatabaseSource` logic.
stats/stats/src/charts/counters/new_verified_contracts_24h.rs (9)
1-12: Imports are consistent and concise.
They logically group the traits and types needed for building the chart source and referencing DB entities.
14-19: Entity imports for smart_contracts
It's correct that you're using the SeaORM entity to build queries for the `smart_contracts` table.
20-21: Clear statement representation.
`NewVerifiedContracts24hStatement` is properly scoped and named to reflect its purpose of counting new verified contracts over 24 hours.
38-39: Remote source type alias is helpful.
Defining `NewVerifiedContracts24hRemote` clarifies the data flow from the remote DB source for single-value retrieval.
41-41: Properties struct is a minimal chart descriptor.
Good approach — a single source of truth for naming and metadata.
43-47: Naming convention consistent with other charts.
`"newVerifiedContracts24h"` aligns with naming patterns used across the codebase.
49-58: Chart properties align with the standard approach.
Using `ChartType::Counter` and `MissingDatePolicy::FillPrevious` is consistent with typical “counter” charts.
63-64: Local DB chart type alias
`NewVerifiedContracts24h` references the `DirectPointLocalDbChartSource`, consistent with other charts.
66-78: Integration test is well-structured.
Although ignored for now, it provides a direct verification approach once a database is available.
stats/stats/src/charts/counters/new_contracts_24h.rs (11)
1-12: Imports are coherent.
They match the structure for integrating chart code with SeaORM utilities.
14-19: Entity-based logic fosters maintainability.
Referencing the `transactions` entity ensures good modularity when building statements.
20-21: Struct name matches its function.
`NewContracts24hStatement` is intuitive and descriptive.
22-36: Query filter usage is consistent with the contract creation approach.
Filtering on `Status.eq(1)` and time-based logic is presumably correct for this domain.
Do confirm that `Status=1` is indeed the correct numeric code for contract-creation success. If needed, we can run a verification script searching for references to status codes. Would you like that?
38-39: Single-value remote source type alias aligns with the preceding pattern.
This is consistent with the newly introduced `PullOneValue` usage.
42-42: Properties struct declared.
No issues; it's aligned with the naming pattern.
44-47: Aligned ID for the chart.
`"newContracts24h"` maintains consistency.
49-59: Resolution, chart type, and missing date policy are standard.
No concerns found.
61-63: Helpful clarifications.
The documentation states how internal transactions are excluded for performance.
63-64: Local DB chart source type alias
Consistent approach, no issues.
66-81: Test design matches the pattern from other counters.
Including an ignored integration test is helpful for future real-world validation.
stats/stats-server/tests/it/counters.rs (1)
52-53: Updates to expected counter IDs.
Adding `"newContracts24h"` and `"newVerifiedContracts24h"` ensures integration test coverage for new metrics.
stats/stats/src/update_groups.rs (1)
227-240: Group Composition Looks Good
Great job bundling these charts into a single group (`VerifiedContractsPageGroup`) for practical scheduling. This ensures closely-related metrics update together without overloading the database.
stats/config/utils/durations/durations_producer.py (2)
58-151: Ensure accurate unit conversion when reading durations
Within `process_durations`, you assume the CSV stores time in seconds and convert by dividing by 60. If there's any mismatch in how data is produced, it could lead to inaccurate partitioning. It may be worth adding a small assertion or documentation note about the CSV format.
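For illustration, a minimal sketch of such a guard; the one-minute floor mirrors the behavior described later in this review, while the helper name and the rounding direction are assumptions rather than the actual `process_durations` code:

```python
import math

def seconds_to_minutes(raw_seconds: float) -> int:
    """Convert a fetch duration from seconds to whole minutes (at least 1)."""
    # The CSV is assumed to store durations in seconds; assert that the value
    # is sane so a unit mismatch fails loudly instead of skewing the output.
    assert raw_seconds >= 0, "durations must be non-negative seconds"
    return max(1, math.ceil(raw_seconds / 60))

# Example: seconds_to_minutes(6120) == 102, seconds_to_minutes(5) == 1
```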
153-203: CLI design is user-friendly
Using Typer for command-line arguments and structured error handling provides a great developer experience. The code is readable, well-organized, and easy to extend.
stats/stats-server/src/runtime_setup.rs (1)
299-299: New group integrated seamlessly
Adding `Arc::new(VerifiedContractsPageGroup)` extends the runtime setup without introducing complexity. This aligns well with the existing flow in `all_update_groups()`.
stats/stats/src/tests/mock_blockscout.rs (2)
471-480: Good approach to tracking contract indexing time
Adding `created_contract_code_indexed_at` for contract creation transactions is a solid step toward better metric accuracy. Be sure to validate that the 10-minute offset is correct for all contract-based indexing scenarios.
509-509: Field addition properly references the new logic
The `created_contract_code_indexed_at` field in `transactions::ActiveModel` correctly references the newly introduced logic. It cleanly integrates into the existing model and ensures indexing time can be tracked for each transaction.
stats/config/utils/requirements.txt (1)
3-3: Verify pandas usage and version compatibility
The addition of pandas aligns with the chart migration goals. However, let's verify its usage in the codebase.
✅ Verification successful
Pandas dependency is correctly used and version is appropriate
The pandas library is used in `durations_producer.py` for reading CSV files using `pd.read_csv()`. The specified version requirement `pandas>=2.2.3` is appropriate as:
- It's being used for basic CSV reading functionality which is stable across versions
- The version is recent and includes security fixes
- The usage is minimal and doesn't rely on newer pandas features
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for pandas imports and usage
rg -l 'import pandas|from pandas' stats/config/utils/
Length of output: 106
Script:
#!/bin/bash
# Check pandas usage in durations_producer.py
rg -A 5 'import pandas|from pandas' stats/config/utils/durations/durations_producer.py
Length of output: 213
Script:
#!/bin/bash
# Check how pandas is used in the file
ast-grep --pattern 'pd.$_' stats/config/utils/durations/durations_producer.py
Length of output: 173
stats/config/layout.json (1)
7-8: Verify counter definitions in charts.json
The placement of new counters is logical. Let's verify their definitions in charts.json.
✅ Verification successful
Counter definitions are properly configured in charts.json
Both counters are well-defined in charts.json with appropriate titles and descriptions:
- `new_contracts_24h`: "Contracts indexed within last 24h"
- `new_verified_contracts_24h`: "Number of verified contracts within last 24h"
The definitions are consistent with their intended purpose and follow the same pattern as other similar counters in the configuration.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for counter definitions in charts.json
rg -A 10 '"new_contracts_24h"|"new_verified_contracts_24h"' stats/config/charts.json
Length of output: 958
stats/config/update_groups.json (2)
36-36: Verify impact of schedule change for new_accounts_group
The schedule for new_accounts_group has been moved one hour earlier. Let's verify there are no timing dependencies affected by this change.
✅ Verification successful
Let me gather more information about the dependencies and timing relationships.
Schedule change for new_accounts_group is safe to proceed
Based on the analysis of the codebase:
- The `new_accounts_group` is responsible for updating three related metrics: new_accounts, accounts_growth, and total_accounts
- The duration setting in durations.json shows this group takes 102 seconds to complete
- Moving the schedule from 5:00 to 4:00 doesn't conflict with any dependent jobs as:
- No other groups directly depend on this group's execution
- The 102-second duration is well within the one-hour time slot
- The metrics are daily aggregates, so the exact hour doesn't impact data accuracy
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for references to new_accounts_group in the codebase
rg "new_accounts_group" stats/
Length of output: 564
Script:
#!/bin/bash
# Check the migration file to understand the context of these groups
cat stats/config/migration/m20240618_080000_update_groups_and_separate_layout/split.py
# Check durations file to understand timing constraints
cat stats/config/utils/durations/durations.json
Length of output: 11522
43-43: Verify verified_contracts_page_group implementation
The new group runs every 30 minutes. Let's verify its implementation and dependencies.
✅ Verification successful
Verified contracts page group implementation is properly configured
The implementation appears to be correct and well-integrated:
- The group is defined as `VerifiedContractsPageGroup` in `update_groups.rs` and includes relevant charts for tracking verified contracts
- The timing "0 15,45 * * * * *" (every 30 minutes) is appropriate for the charts it updates:
- Total contracts and verified contracts counts
- 24h new contracts and verified contracts metrics
- The implementation is supported by dedicated chart modules in the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for group definition and usage
ast-grep --pattern 'struct verified_contracts_page_group {
$$$
}' stats/stats/src/update_groups.rs
# Check for any timing dependencies with other groups
rg "verified_contracts_page_group" stats/
Length of output: 237
Script:
#!/bin/bash
# Let's check for the implementation details and dependencies
rg -A 10 "verified_contracts_page_group" stats/stats/src/update_groups.rs
# Check if there are any other files that might be using this group
fd . stats/ -t f -e rs -x rg -l "verified_contracts_page_group" {}
Length of output: 142
Script:
#!/bin/bash
# Let's try a broader search to understand the context
rg -l "struct.*Page" stats/stats/src/update_groups.rs
# Also check for any page-related implementations
rg -A 5 "Page" stats/stats/src/update_groups.rs
# Look for any verified contracts related code
rg -l "verified.*contract" stats/
Length of output: 1009
stats/config/charts.json (1)
Line range hint 72-87: Verify complete integration of new counters.
The new counters appear to be part of a larger chart migration effort. Let's verify their complete integration across the system.
✅ Verification successful
Integration of new counters is properly implemented across the system
The verification shows complete integration of the new counters:
- Both counters are properly ordered in `layout.json`
- Counter implementations exist in dedicated Rust modules:
  - stats/stats/src/charts/counters/new_contracts_24h.rs
  - stats/stats/src/charts/counters/new_verified_contracts_24h.rs
- Test coverage is present in `stats/stats-server/tests/it/counters.rs`
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify integration of new counters across the system
# Check if counters are properly ordered in layout.json
echo "Checking layout.json for counter ordering..."
rg -A 10 '"counters_order"' stats/config/layout.json
# Check if counter implementations exist in Rust
echo "Checking for counter implementations..."
rg -l "new_contracts_24h|new_verified_contracts_24h" stats/stats/src/charts/counters/
# Check if counters are covered in tests
echo "Checking test coverage..."
rg -A 5 "newContracts24h|newVerifiedContracts24h" stats/stats-server/tests/
Length of output: 1422
Actionable comments posted: 1
🧹 Nitpick comments (15)
stats/config/utils/free_timeslots/README.md (4)
3-6: Installation instructions could be more comprehensive.
While the basic installation steps are provided, consider enhancing them for better user experience.
Add these improvements:

````diff
 ## Preparations
 1. Install tkinter (e.g. `apt-get install python3-tk` or `brew install python-tk`) for `find_free_timeslot.py` GUI
-2. Install other dependencies from `requirements.txt`: `pip install -r requirements.txt`
+2. Install other dependencies from `requirements.txt`:
+   ```bash
+   pip install -r requirements.txt
+   ```
+3. Verify installation:
+   ```bash
+   python -c "import tkinter; import pandas; import croniter; print('All dependencies installed successfully!')"
+   ```
````
8-17: Improve clarity in tool description and usage section.
The description needs minor grammatical improvements and could benefit from more detailed usage examples.
Apply these refinements:

````diff
 ## `find_free_timeslot.py`
-It's a tool to roughly visualize the busyness of update schedule to find a timeslot for some new update group.
+It's a tool to roughly visualize the busyness of the update schedule to find a timeslot for some new update group.
 ### Usage
-Just run `python find_free_timeslot.py` and use GUI to find less crowded timeslots.
+Just run `python find_free_timeslot.py` and use the GUI to find less crowded timeslots.
 You can regenerate durations config for more accurate representation.
-See below for details
+See below for details.
+
+Example usage:
+```bash
+python find_free_timeslot.py
+```
````
🧰 Tools
🪛 LanguageTool
[uncategorized] ~10-~10: Possible missing article found.
Context: ...ol to roughly visualize the busyness of update schedule to find a timeslot for some ne...(AI_HYDRA_LEO_MISSING_AN)
[uncategorized] ~14-~14: You might be missing the article “the” here.
Context: ...python find_free_timeslot.py
and use GUI to find less crowded timeslots. You ca...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
24-31: Enhance the data extraction steps with more context.
The Grafana data extraction steps could be clearer with additional context and formatting.
Consider restructuring the steps like this:

```diff
-1. Get data fetch time statistics (e.g. from grafana) (example: `data.csv.example`). In our case, you can:
+1. Get data fetch time statistics from Grafana (see `data.csv.example` for the expected format):
+
+   Navigate to Grafana:
    - Open "Microservices > Stats" dashboard
    - Find "Average data fetch time"
-   - [Three dots] > [Inspect] > [Data]
-   - [Data options] > [Show data frame] > [Series joined by time]
-   - [Formatted data] = off
-   - [Download CSV]
+
+   Export the data:
+   - Click [Three dots] > [Inspect] > [Data]
+   - Under [Data options], select [Show data frame] > [Series joined by time]
+   - Set [Formatted data] = off
+   - Click [Download CSV]
```
31-32: Add more details about script parameters.
The reference to `--help` could be more informative.
Add example command and common parameters:

````diff
-2. Run the script (preferably from this folder, to correctly use default parameters) (see `--help` for details)
-3. Enjoy newly generated `durations.json` after running `find_free_timeslot` script.
+2. Run the script from this folder to use default parameters:
+   ```bash
+   python generate_durations.py data.csv
+   # Or see available options:
+   python generate_durations.py --help
+   ```
+
+3. The script will generate `durations.json`, which will be automatically used by the `find_free_timeslot` script
+   for more accurate visualization.
````
stats/config/utils/free_timeslots/durations/durations.json (1)
36-39: Consider performance impact of frequent updates
The following metrics are updated every time unit:

```json
"new_accounts_group": 102,
"new_contracts_group": 1,
"new_txns_group": 1,
"new_verified_contracts_group": 1,
```

While `new_accounts_group` has a longer interval (102 units), the other metrics update very frequently. This could impact performance if these queries are resource-intensive.
Consider:
- Evaluating if such frequent updates are necessary for these metrics
- Implementing caching mechanisms if not already in place
- Using incremental updates where possible instead of full recalculations
stats/config/utils/free_timeslots/find_free_timeslot.py (8)
10-16: Consider renaming enum for clarity.
`DurationMenu` is a fine approach, but naming it `DurationSource` might be more descriptive.

```diff
-class DurationMenu(Enum):
+class DurationSource(Enum):
     MANUAL = "manual"
     CONFIG = "config"
```
Line range hint 23-60: Encapsulate configuration loading and initialization.
The constructor does multiple actions (config load, default path checks, GUI setup). For clarity and maintainability, consider extracting config loading into a separate function or class responsible for reading from JSON.
60-148: Consider splitting GUI setup for readability.
`setup_gui` is quite large, creating multiple frames, plus the calendar, canvas, and schedule list. Splitting it into smaller helper functions (e.g., `setup_top_controls()`, `setup_calendar()`, etc.) can improve readability and maintainability.
181-181: Well-structured color generation.
The approach using HSV ensures a smooth color transition. If you want a different visual effect, you could consider alternative hues or color gradients.
Also applies to: 187-187, 189-189
190-215: Optional logging for empty schedules or parse errors.
The code gracefully returns an empty list when `croniter` fails, but consider logging or user feedback to surface scheduling errors.
218-246: Time complexity is acceptable, but watch for large durations.
`get_task_overlaps` loops through up to 1440 minutes per day. This is acceptable for moderate usage. If tasks can have extremely large durations or lists, consider early-limiting or rethinking the approach.
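For reference, counting per-minute concurrency over a 1440-minute day can be done with a sketch like the one below; it is a generic illustration of the described approach, not the actual `get_task_overlaps` implementation, and the task data shape is an assumption:

```python
def count_minute_concurrency(tasks):
    """tasks: iterable of (start_minute, duration_minutes) within one day.

    Returns a list of 1440 counters with how many tasks run in each minute.
    """
    minutes = [0] * 1440
    for start, duration in tasks:
        # Clamp to the end of the day so oversized durations cannot overflow.
        for minute in range(start, min(start + duration, 1440)):
            minutes[minute] += 1
    return minutes

# Example: two tasks overlapping between 10:00 and 10:04
busy = count_minute_concurrency([(600, 10), (595, 10)])
assert max(busy) == 2
```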
248-293: Refactor drawing logic for maintainability.
`update_visualization` does the timeline drawing, color-coding, and concurrency calculations in one place. Consider isolating the drawing code in a helper method to keep logic more concise and testable.
310-341: Hover event logic is clear and robust.
Using canvas tags to identify tasks is a smart solution. The code gracefully handles no matching tags. For large numbers of tasks, consider shaping the message or grouping.
stats/config/utils/free_timeslots/generate_durations.py (2)
1-7: Describe assumptions about CSV formatting for consistency.
Currently, the script assumes the CSV contains at least one row for durations and a "Time" column that should be skipped while reading. In scenarios where the CSV is empty or lacks expected columns, the script might fail. Consider adding checks or warnings for file validation to ensure robust behavior and prevent runtime errors.
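A minimal sketch of the suggested validation, assuming the pandas-based CSV reading mentioned elsewhere in this review; the exact column layout beyond the "Time" column is an assumption:

```python
import sys

import pandas as pd

def load_durations_csv(path: str) -> pd.DataFrame:
    """Read the exported CSV and fail early on an empty or malformed file."""
    df = pd.read_csv(path)
    if df.empty:
        sys.exit(f"Error: {path} contains no duration rows")
    if "Time" not in df.columns:
        sys.exit(f"Error: {path} is missing the expected 'Time' column")
    # The "Time" column is only used for alignment, so drop it; the remaining
    # columns are treated as per-chart durations.
    return df.drop(columns=["Time"])
```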
153-200: Provide user-friendly error messages with possible fixes.
When an exception occurs, the user sees `Error: <message>`. Consider adding a hint or recommended steps, such as checking CSV format or Rust file content, to guide resolution of common issues.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- stats/config/utils/README.md (0 hunks)
- stats/config/utils/free_timeslots/README.md (1 hunks)
- stats/config/utils/free_timeslots/durations/.gitignore (1 hunks)
- stats/config/utils/free_timeslots/durations/data.csv.example (1 hunks)
- stats/config/utils/free_timeslots/durations/durations.json (1 hunks)
- stats/config/utils/free_timeslots/find_free_timeslot.py (3 hunks)
- stats/config/utils/free_timeslots/generate_durations.py (1 hunks)
- stats/config/utils/free_timeslots/requirements.txt (1 hunks)
- stats/stats/src/update_groups.rs (1 hunks)
💤 Files with no reviewable changes (1)
- stats/config/utils/README.md
✅ Files skipped from review due to trivial changes (3)
- stats/config/utils/free_timeslots/durations/.gitignore
- stats/config/utils/free_timeslots/durations/data.csv.example
- stats/config/utils/free_timeslots/requirements.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- stats/stats/src/update_groups.rs
🧰 Additional context used
🪛 LanguageTool
stats/config/utils/free_timeslots/README.md
[uncategorized] ~10-~10: Possible missing article found.
Context: ...ol to roughly visualize the busyness of update schedule to find a timeslot for some ne...
(AI_HYDRA_LEO_MISSING_AN)
[uncategorized] ~14-~14: You might be missing the article “the” here.
Context: ... python find_free_timeslot.py
and use GUI to find less crowded timeslots. You ca...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
🔇 Additional comments (11)
stats/config/utils/free_timeslots/README.md (1)
1-32: Overall documentation structure is good!
The README provides clear instructions and covers the essential aspects of the tool. The suggested improvements above will enhance readability and user experience, but the current version is already functional and informative.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~10-~10: Possible missing article found.
Context: ...ol to roughly visualize the busyness of update schedule to find a timeslot for some ne...
(AI_HYDRA_LEO_MISSING_AN)
[uncategorized] ~14-~14: You might be missing the article “the” here.
Context: ... python find_free_timeslot.py
and use GUI to find less crowded timeslots. You ca...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
stats/config/utils/free_timeslots/durations/durations.json (2)
12-23: Verify the necessity of multiple recurrence periods
The configuration includes recurring account metrics for multiple periods (60, 90, 120 days) and frequencies (daily, weekly, monthly, yearly). This creates 12 different metrics that need to be maintained.
Let's check if all these metrics are actively used:
✅ Verification successful
Let me try another search strategy to find any code that might be using these metrics.
Multiple recurrence periods are intentionally configured but currently disabled
Based on the codebase analysis:
- All three recurrence periods (60, 90, 120 days) are properly implemented with corresponding Rust types and test cases in `active_recurring_accounts.rs`
- The metrics are configured in the layout (`stats/config/layout.json`) and have scheduled updates in `update_groups.json`
- However, all three charts are explicitly disabled in `charts.json` with `"enabled": false`
This suggests that these metrics were intentionally implemented but are currently disabled, likely for future use or A/B testing. The multiple periods allow for different analysis windows of recurring account activity, which could be valuable for different business insights.
Locations that confirm this:
- stats/config/charts.json: All three charts are defined but disabled
- stats/config/layout.json: Charts are included in the "Accounts" section layout
- stats/stats/src/charts/lines/active_recurring_accounts.rs: Full implementation with tests
- stats/config/update_groups.json: Scheduled updates configured for all metrics
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check usage of recurring account metrics in the codebase
# Expected: Find references to these metrics in charts or UI components
# Search for metric usage in code
rg "recurring_accounts.*recurrence(60|90|120)_days" --type rust --type typescript --type javascript
# Search for metric names in chart configurations
fd -e json -e yaml -e toml | xargs rg "recurring_accounts.*recurrence(60|90|120)_days"
Length of output: 1741
Script:
#!/bin/bash
# Search for any files that might be processing these metrics
rg -l "recurring_accounts" --type-list
rg "recurring_accounts" -A 5 -B 5
# Look for any test files or specs that might be using these metrics
fd -e test.js -e spec.js -e test.ts -e spec.ts | xargs rg "recurring_accounts"
# Check if there are any SQL files that might be generating these metrics
fd -e sql | xargs rg "recurring_accounts"
Length of output: 24435
1-42: Document the time unit and explain update frequency variations
The configuration defines update frequencies for various blockchain metrics, but there are a few concerns:
- The time unit for these durations is not documented (seconds/minutes/hours?)
- There are unexplained variations in update frequencies:
  - Most metrics: 1 time unit
  - new_accounts_group: 102 time units
  - txns_success_rate_group: 2 time units
  - completed_txns_group: 19 time units
Let's verify if these frequencies match the original configuration:
✅ Verification successful
Time units are in minutes, update frequencies are intentionally varied
Based on the Python script that generates the durations.json file, I can clarify:
- The time units in durations.json are in minutes (converted from seconds in the source data)
- The varying update frequencies are intentionally calculated based on the actual duration of operations for each metric:
- The script reads raw durations from a CSV file
- Converts them from seconds to minutes
- Ensures a minimum of 1 minute for each group
- Groups with multiple charts sum up their individual durations
The seemingly inconsistent values like 102, 19, and 2 minutes are not arbitrary but derived from the actual processing times of their respective operations.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check the original configuration file mentioned in PR description
# Expected: Find similar update frequencies in the original config
curl -s "https://raw.githubusercontent.com/blockscout/blockscout/367dd7e48dc93eb556670659783c70cbb720fc51/apps/explorer/config/config.exs" | grep -A 10 "stats_period"
Length of output: 164
Script:
#!/bin/bash
# Let's search for any documentation or references about these time units in the codebase
rg -i "time.*unit|duration.*unit|update.*frequenc|stats.*period" --type md --type txt
# Also search for any code that uses these durations
ast-grep --pattern 'durations.json'
# Search for any configuration or documentation files related to stats
fd -e json -e md -e txt -e yaml -e yml stats
Length of output: 806
Script:
#!/bin/bash
# Let's check the stats service implementation for more details about the time units
rg -A 5 -B 5 "duration.*group|stats.*period" stats/
# Also check for any scheduler or timer related code
ast-grep --pattern 'time.Duration'
# Look for any configuration loading code
ast-grep --pattern 'durations.*json' -l | xargs cat
Length of output: 66125
Script:
#!/bin/bash
# Let's check the Python script that generates these durations
cat stats/config/utils/free_timeslots/generate_durations.py
# Also check for any config files that might document the time units
fd -e yaml -e toml -e conf -e ini -e env stats/config/
Length of output: 7041
stats/config/utils/free_timeslots/find_free_timeslot.py (4)
158-160: Validate the cron field count.
The conversion from 7-field to 5-field cron looks fine, but if the string has fewer than 7 fields, it falls through. Consider logging a warning or returning early for nonconforming cron strings.
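One possible way to make that conversion defensive is sketched below; the seconds-first, year-last field layout is inferred from the 7-field schedules shown elsewhere in this PR (e.g. "0 15,45 * * * * *"), and the function name is hypothetical:

```python
def to_five_field_cron(expr: str) -> str:
    """Convert a 7-field cron (sec min hour dom mon dow year) to 5 fields."""
    fields = expr.split()
    if len(fields) != 7:
        # Raising (or logging and returning early) surfaces malformed
        # schedules instead of letting them silently fall through.
        raise ValueError(f"expected 7 cron fields, got {len(fields)}: {expr!r}")
    # Drop the leading seconds field and the trailing year field.
    return " ".join(fields[1:6])

# Example: to_five_field_cron("0 15,45 * * * * *") == "15,45 * * * *"
```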
163-176: Robust handling of JSON loading.
The `load_json` function uses a try-except block to prevent crashes and informs the user of issues. This is a best practice for user-facing file loads.
299-301
: Efficient update of schedule list.
Rebuilding the TreeView ensures a clean refresh. Implementation here looks solid and aligns well with the rest of the code.
304-307
: Simple, clear date selection handling.
Switching both visualization and schedule listing upon date selection is a straightforward approach.
stats/config/utils/free_timeslots/generate_durations.py (4)
9-13: Ensure underscore handling for edge cases in `convert_camel_to_snake`.
The regex-based approach should be effective for typical camelCase patterns. However, consider testing/handling scenarios with multiple uppercase letters in a row (e.g., "HTTPServer") or leading capitals (e.g., "AppSettings").
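The common two-pass regex below handles both cases mentioned above; it is a generic sketch and not necessarily identical to the `convert_camel_to_snake` implementation in this PR:

```python
import re

def camel_to_snake(name: str) -> str:
    """Convert CamelCase/mixedCase identifiers to snake_case."""
    # Pass 1: split an uppercase run from a following capitalized word
    # ("HTTPServer" -> "HTTP_Server").
    interim = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Pass 2: split a lowercase letter or digit from a following uppercase
    # letter ("myHTTP" -> "my_HTTP").
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", interim).lower()

assert camel_to_snake("HTTPServer") == "http_server"
assert camel_to_snake("AppSettings") == "app_settings"
```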
15-19: Confirm period normalization for a wider set of intervals.
Currently, only four periods are mapped (WEEK, MONTH, YEAR, DAY). If the code might ingest any new or unexpected period strings (e.g., "HOUR" or "QUARTER"), it will return the unchanged string. Make sure that's the intended fallback logic.
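A small sketch of the centralized mapping idea; the concrete output spellings and the warning behavior are assumptions, not the actual `normalize_period` code:

```python
import warnings

# A single mapping keeps new units (e.g. "HOUR", "QUARTER") a one-line change.
PERIOD_MAP = {
    "DAY": "day",
    "WEEK": "week",
    "MONTH": "month",
    "YEAR": "year",
}

def normalize_period(period: str) -> str:
    """Map a period name to its config spelling, warning on unknown values."""
    normalized = PERIOD_MAP.get(period.upper())
    if normalized is None:
        # Falling back to the original string keeps behavior unchanged,
        # but the warning makes the fallback visible rather than silent.
        warnings.warn(f"unrecognized period {period!r}; leaving it unchanged")
        return period
    return normalized
```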
21-55: Handle invalid or commented-out lines within group definitions.
The regex-based approach may accidentally capture lines that are commented out or have special characters. Consider a more robust parsing approach if the Rust file's structure evolves (e.g., adding macros or new group definitions).
202-203: Maintain consistent entry point usage.
Using `if __name__ == "__main__":` is standard for Python CLI tools. Ensure internal or external documentation references `python generate_durations.py ...` usage if needed, so that end-users know how to invoke it properly.
schedule was taken from here.
I encourage you to review by commits (took feedback from the previous time).
Summary by CodeRabbit
New Features
- New counters: `new_contracts_24h` and `new_verified_contracts_24h`.
- New update group `VerifiedContractsPageGroup` for better update management.
Bug Fixes
Documentation
find_free_timeslot.py
tool.Chores