-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-design time series management #349
Conversation
- Store time series metadata in a SQLite database instead of per-component dictionaries. This allows system-wide SQL queries instead of looping across component dictionaries. - Consolidate management of time series in TimeSeriesManager instead of individual time series storage implementations. - Support addition of user-defined features to time series arrays. - Add support for different time series resolutions.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #349 +/- ##
==========================================
+ Coverage 74.37% 75.61% +1.24%
==========================================
Files 64 68 +4
Lines 4952 4876 -76
==========================================
+ Hits 3683 3687 +4
+ Misses 1269 1189 -80
Flags with carried forward coverage won't be shown. Click here to find out more.
|
6ad6e31
to
1ba9637
Compare
time_series_type = time_series_type, | ||
) | ||
|
||
# TODO: do we need this? The old way of calculating this required a single resolution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want this feature, it will have to be implemented in a different way.
fe417a8
to
c33e8ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see anything that looks problematic. Let's merge this as soon as as possible to make the PSI integration and open other PR's if further improvements are needed.
d99e02d
to
80013e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all but src/time_series_manager.jl
and src/time_series_metadata_store.jl
, I'll do those in a follow-up review. Marked some minor requests and questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part 2: I have now reviewed src/time_series_manager.jl
in its entirety and src/time_series_metadata_store.jl
up to line 146. I'll pick it up again tomorrow.
src/time_series_metadata_store.jl
Outdated
"owner_category TEXT NOT NULL", | ||
"features TEXT NOT NULL", | ||
# The metadata is included as a convenience for serialization/de-serialization, | ||
# specifically for types: time_series_type and scaling_factor_multplier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# specifically for types: time_series_type and scaling_factor_multplier. | |
# specifically for types: time_series_type and scaling_factor_multiplier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time_series_type
is already a column though, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its module is not. As it stands, we do not have columns for time_series_type's module, scaling_factor_multiplier, and the type and module for scaling_factor_multiplier. We could add them, guarantee that we will always have all columns for all metadata fields, and then remove this. Also, we would have to handle deserialization in a slightly more complicated way. Certainly possible.
# The metadata is included as a convenience for serialization/de-serialization, | ||
# specifically for types: time_series_type and scaling_factor_mulitplier. | ||
# There is a lot duplication of data. | ||
"metadata JSON NOT NULL", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the length of this field for a time series with two features. 459 bytes. Here is likely a worst-case system: 100,000 components each with 10 time series arrays each with 2 features.
100_000 * 10 * 459 / (1024*1024)
437.73651123046875
437 MB wasted is likely not a big deal, but I'll prototype the alternative implementation just to see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now covered all the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm comfortable with the state of this now. Let's merge!
New features:
Refactor/re-design:
Features removed: