Re-design time series management #349

Merged
merged 20 commits into from
Apr 26, 2024

daniel-thom
Contributor

@daniel-thom daniel-thom commented Apr 18, 2024

New features:

  • Allows addition of user-defined features to time series arrays.
  • Adds support for different time series resolutions.

Refactor/re-design:

  • Store time series metadata in a SQLite database instead of per-component dictionaries. This allows system-wide SQL queries instead of looping across component dictionaries.
  • Consolidate management of time series in TimeSeriesManager instead of individual time series storage implementations.
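
The difference between looping over per-component dictionaries and issuing one system-wide query can be sketched as follows. This is a minimal illustration, not the schema from this PR: it assumes the SQLite.jl package is installed, the table name `time_series_metadata` and the columns `owner_name` and `name` are hypothetical, and only `owner_category`, `time_series_type`, and `features` are column names that appear in this conversation.

```julia
using SQLite  # third-party package (assumed installed); DBInterface is reached through it

# In-memory database with a simplified, hypothetical metadata table.
db = SQLite.DB()
SQLite.DBInterface.execute(db, """
    CREATE TABLE time_series_metadata (
        owner_name TEXT NOT NULL,
        owner_category TEXT NOT NULL,
        time_series_type TEXT NOT NULL,
        name TEXT NOT NULL,
        features TEXT NOT NULL
    )
""")

# Illustrative rows: two components, three time series arrays in total.
rows = [
    ("gen1", "Component", "SingleTimeSeries", "max_active_power", "{}"),
    ("gen2", "Component", "SingleTimeSeries", "max_active_power", "{}"),
    ("gen2", "Component", "Deterministic", "max_active_power", "{}"),
]
for row in rows
    SQLite.DBInterface.execute(
        db,
        "INSERT INTO time_series_metadata VALUES (?, ?, ?, ?, ?)",
        row,
    )
end

# One query answers a system-wide question that would otherwise require
# visiting every component's dictionary.
counts = Dict{String, Int}()
query = """
    SELECT time_series_type, COUNT(*) AS n
    FROM time_series_metadata
    GROUP BY time_series_type
"""
for r in SQLite.DBInterface.execute(db, query)
    counts[r.time_series_type] = r.n
end
```

After this runs, `counts` maps each time series type to the number of arrays of that type across all owners.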

Features removed:

  • get_time_series and get_time_series_multiple no longer support abstract types. This could be restored, but I think it’s better this way. list_time_series* methods support abstract types.
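
One way to picture the split between the `get_*` and `list_*` behavior is sketched below. The types and functions are hypothetical stand-ins, not the InfrastructureSystems API, and the stated motivation (a single-result lookup is only unambiguous for a concrete type) is an assumption rather than something stated in this PR.

```julia
# Hypothetical stand-in types; these are not the package's time series types.
abstract type AbstractSeries end

struct PointSeries <: AbstractSeries
    name::String
end

struct ForecastSeries <: AbstractSeries
    name::String
end

store = AbstractSeries[
    PointSeries("load"),
    ForecastSeries("load"),
    PointSeries("wind"),
]

# "get"-style lookup: must return exactly one item, so it insists on a concrete type.
function get_one(store, ::Type{T}, name::AbstractString) where {T <: AbstractSeries}
    isconcretetype(T) ||
        throw(ArgumentError("get_one requires a concrete type, got $T"))
    matches = [s for s in store if s isa T && s.name == name]
    length(matches) == 1 ||
        throw(ArgumentError("expected exactly one match, found $(length(matches))"))
    return first(matches)
end

# "list"-style lookup: any number of results is fine, so abstract types are accepted.
list_all(store, ::Type{T}) where {T <: AbstractSeries} = [s for s in store if s isa T]
```

With these definitions, `get_one(store, PointSeries, "load")` returns a single item, `get_one(store, AbstractSeries, "load")` throws an `ArgumentError`, and `list_all(store, AbstractSeries)` returns every stored item.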

- Store time series metadata in a SQLite database instead of
per-component dictionaries. This allows system-wide SQL queries
instead of looping across component dictionaries.
- Consolidate management of time series in TimeSeriesManager instead of
individual time series storage implementations.
- Support addition of user-defined features to time series arrays.
- Add support for different time series resolutions.
@daniel-thom daniel-thom changed the title feat(time-series): Re-design time series management Re-design time series management Apr 18, 2024

codecov bot commented Apr 18, 2024

Codecov Report

Attention: Patch coverage is 92.61905%, with 62 lines in your changes missing coverage. Please review.

Project coverage is 75.61%. Comparing base (335a5a5) to head (5c89df2).
Report is 3 commits behind head on main.


@@            Coverage Diff             @@
##             main     #349      +/-   ##
==========================================
+ Coverage   74.37%   75.61%   +1.24%     
==========================================
  Files          64       68       +4     
  Lines        4952     4876      -76     
==========================================
+ Hits         3683     3687       +4     
+ Misses       1269     1189      -80     
Flag Coverage Δ
unittests 75.61% <92.61%> (+1.24%) ⬆️

Files Coverage Δ
src/InfrastructureSystems.jl 80.00% <ø> (ø)
src/abstract_time_series.jl 85.71% <100.00%> (+10.71%) ⬆️
src/component.jl 91.66% <100.00%> (-3.60%) ⬇️
src/containers.jl 100.00% <100.00%> (ø)
src/deterministic_metadata.jl 95.45% <100.00%> (+86.36%) ⬆️
src/probabilistic.jl 81.66% <100.00%> (ø)
src/scenarios.jl 85.18% <100.00%> (ø)
src/serialization.jl 70.33% <ø> (+9.70%) ⬆️
src/single_time_series.jl 68.00% <100.00%> (ø)
src/supplemental_attribute.jl 94.44% <100.00%> (+13.59%) ⬆️
... and 22 more

... and 3 files with indirect coverage changes

src/system_data.jl (outdated, resolved)
test/test_time_series.jl (outdated, resolved)
test/test_time_series.jl (outdated, resolved)
@daniel-thom daniel-thom force-pushed the dt/time-series-sqlite branch from 6ad6e31 to 1ba9637 Compare April 19, 2024 17:05
src/hdf5_time_series_storage.jl (resolved)
src/system_data.jl (resolved)
time_series_type = time_series_type,
)

# TODO: do we need this? The old way of calculating this required a single resolution.
Contributor Author

If we want this feature, it will have to be implemented in a different way.

@daniel-thom daniel-thom force-pushed the dt/time-series-sqlite branch from fe417a8 to c33e8ba Compare April 19, 2024 23:28
Member

@jd-lara jd-lara left a comment

I didn't see anything that looks problematic. Let's merge this as soon as possible so we can do the PSI integration, and open other PRs if further improvements are needed.

@daniel-thom daniel-thom force-pushed the dt/time-series-sqlite branch from d99e02d to 80013e9 Compare April 23, 2024 19:56
Contributor

@GabrielKS GabrielKS left a comment

Reviewed all but src/time_series_manager.jl and src/time_series_metadata_store.jl; I'll do those in a follow-up review. Marked some minor requests and questions.

src/InfrastructureSystems.jl (resolved)
src/InfrastructureSystems.jl (resolved)
src/component.jl (outdated, resolved)
src/descriptors/structs.json (outdated, resolved)
src/hdf5_time_series_storage.jl (resolved)
src/utils/print.jl (resolved)
src/utils/sqlite.jl (resolved)
src/utils/test.jl (resolved)
test/test_time_series.jl (resolved)
test/test_time_series.jl (outdated, resolved)
Contributor

@GabrielKS GabrielKS left a comment

Part 2: I have now reviewed src/time_series_manager.jl in its entirety and src/time_series_metadata_store.jl up to line 146. I'll pick it up again tomorrow.

src/time_series_manager.jl (resolved)
src/time_series_manager.jl (resolved)
src/time_series_manager.jl (resolved)
src/time_series_manager.jl (resolved)
src/time_series_manager.jl (resolved)
"owner_category TEXT NOT NULL",
"features TEXT NOT NULL",
# The metadata is included as a convenience for serialization/de-serialization,
# specifically for types: time_series_type and scaling_factor_multplier.
Contributor

Suggested change
- # specifically for types: time_series_type and scaling_factor_multplier.
+ # specifically for types: time_series_type and scaling_factor_multiplier.

Contributor

time_series_type is already a column though, right?

Contributor Author

Its module is not. As it stands, we do not have columns for time_series_type's module, scaling_factor_multiplier, and the type and module for scaling_factor_multiplier. We could add them, guarantee that we will always have all columns for all metadata fields, and then remove this. Also, we would have to handle deserialization in a slightly more complicated way. Certainly possible.
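
Why a type's module matters for deserialization can be sketched in a few lines. This is an illustration under stated assumptions, not the serialization code from this PR: `describe_type` and `resolve_type` are hypothetical helpers, `Dates.Hour` stands in for a stored type, and the lookup assumes the module is already loaded and visible from the calling module.

```julia
using Dates

# Record both the type name and the module that defines it.
describe_type(T::Type) = Dict(
    "type" => string(nameof(T)),
    "module" => string(parentmodule(T)),
)

# Resolve the recorded pair back into a type. The type name alone is not
# enough: the module tells us where to look it up.
function resolve_type(d::Dict, from::Module)
    mod = getproperty(from, Symbol(d["module"]))
    return getproperty(mod, Symbol(d["type"]))
end

d = describe_type(Dates.Hour)
T = resolve_type(d, @__MODULE__)
```

Here `d` holds the strings `"Hour"` and `"Dates"`, and `T` is the `Dates.Hour` type again. Storing these fields in the JSON blob avoids adding a dedicated column for each of them, at the cost of the duplication discussed in this thread.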

src/time_series_metadata_store.jl (outdated, resolved)
src/time_series_metadata_store.jl (resolved)
src/time_series_metadata_store.jl (resolved)
src/time_series_metadata_store.jl (resolved)
# The metadata is included as a convenience for serialization/de-serialization,
# specifically for types: time_series_type and scaling_factor_multiplier.
# There is a lot of duplication of data.
"metadata JSON NOT NULL",
Contributor Author

I tested the length of this field for a time series with two features: 459 bytes. Here is a likely worst-case system: 100,000 components, each with 10 time series arrays, each with 2 features.

100_000 * 10 * 459 / (1024*1024)
437.73651123046875

437 MB wasted is likely not a big deal, but I'll prototype the alternative implementation just to see.
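
The same estimate can be parameterized to try other system sizes. This is a small sketch; the function name `metadata_overhead_mib` is hypothetical, and the 459-byte figure is the single measurement quoted above, assumed here to apply to every row.

```julia
# Estimated size of the JSON metadata column in MiB, assuming one row per
# time series array and the same number of bytes in every row.
function metadata_overhead_mib(; components::Int, arrays_per_component::Int, bytes_per_row::Int)
    total_bytes = components * arrays_per_component * bytes_per_row
    return total_bytes / (1024 * 1024)
end

worst_case = metadata_overhead_mib(
    components = 100_000,
    arrays_per_component = 10,
    bytes_per_row = 459,
)
```

`worst_case` reproduces the figure above, which is in units of 1024 * 1024 bytes; the same total is 459,000,000 bytes.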

Contributor

@GabrielKS GabrielKS left a comment

I have now covered all the code.

test/test_time_series.jl (resolved)
src/time_series_metadata_store.jl (resolved)
src/time_series_metadata_store.jl (resolved)
src/time_series_metadata_store.jl (resolved)
src/time_series_metadata_store.jl (resolved)
src/time_series_metadata_store.jl (resolved)
src/time_series_metadata_store.jl (resolved)
src/time_series_metadata_store.jl (resolved)
src/time_series_manager.jl (resolved)
src/time_series_metadata_store.jl (resolved)
src/time_series_metadata_store.jl (outdated, resolved)
Contributor

@GabrielKS GabrielKS left a comment

I'm comfortable with the state of this now. Let's merge!

@daniel-thom daniel-thom merged commit e87f7b8 into main Apr 26, 2024
6 of 9 checks passed
@daniel-thom daniel-thom deleted the dt/time-series-sqlite branch April 26, 2024 19:25
3 participants