Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize optimization.Table DB storage #143

Open
wants to merge 5 commits into
base: enh/remove-superfluous-ellipses
Choose a base branch
from

Conversation

glatterf42
Copy link
Member

This PR is a follow-up to #122: moving to the next optimization item, it redesigns the data model to normalize it and hopefully also make it more performant.

For Tables, this is a bit more tricky than for Indexsets. Tables have multiple columns, each of which needs to be linked to an Indexset but can have a name other than the Indexset's name. Each Table also needs to store data in table form, where different Tables will have a different number of columns.
Previously, we solved this by dumping the data into a JSON field, but that did not achieve the desired performance, nor did it follow DB model normalization rules.

In preparation of this PR, I first thought that our ideal model would look like this:

  • Table and Indexset should get a many-to-many relationship. Sqlalchemy's association proxy extension is an elegant solution for this, as it also offers to save optional fields (like distinct names) along the relationship.
  • Table.data only consists of values that are stored inside Indexset.data (that the Table is linked to), so ideally, we save just references to these values in a new table TableData and have a one-to-many relationship between Table and that. A relationship between that and IndexSetData seems like the ideal object for this.

Unfortunately, only the first point worked out. Sqlalchemy can't handle a relationship as part of a UniqueConstraint, preventing the INSERT workflow that this whole refactoring is aimed at achieving. Instead, the TableData is now storing the actual str form of the values, increasing storage requirements (compared to the ideal solution). This might be a source of future improvement.
To tackle the problem of varying numbers of columns, TableData is now hardcoded to store up to 15 values per row, most of which will always be NULL. This requires some clean up before returning the actual DB object like dropping NULL columns and renaming columns (to the name from either their indexset or specific column_name), which is likely the messiest part of this PR. Please let me know how that could be improved.

Locally, the tests run just as long as they do on main, but this is not proper benchmarking, which would be the actual test for this PR: it should handle a million values faster than main (and hopefully old ixmp). I previously tested this for Parameters (which utilize upserting rather than just inserting), though, so I might have to draft another PR very similar to this one for them and run the manual benchmark again. A proper benchmark setup for optimization items (to run automatically) would be great (depending on how long we need to reach ixmp's performance).

@glatterf42 glatterf42 added the enhancement New feature or request label Dec 17, 2024
@glatterf42 glatterf42 self-assigned this Dec 17, 2024
Copy link

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 97.27273% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.1%. Comparing base (64d7787) to head (afd1630).

Files with missing lines Patch % Lines
ixmp4/data/db/optimization/table/repository.py 85.7% 3 Missing ⚠️
Additional details and impacted files
@@                       Coverage Diff                       @@
##           enh/remove-superfluous-ellipses    #143   +/-   ##
===============================================================
  Coverage                             88.1%   88.1%           
===============================================================
  Files                                  231     231           
  Lines                                 8125    8187   +62     
===============================================================
+ Hits                                  7160    7219   +59     
- Misses                                 965     968    +3     
Files with missing lines Coverage Δ
ixmp4/core/optimization/table.py 92.4% <100.0%> (-0.2%) ⬇️
ixmp4/data/abstract/optimization/table.py 96.9% <100.0%> (+<0.1%) ⬆️
ixmp4/data/api/optimization/table.py 94.0% <100.0%> (ø)
ixmp4/data/db/base.py 92.3% <100.0%> (+<0.1%) ⬆️
ixmp4/data/db/optimization/__init__.py 100.0% <100.0%> (ø)
ixmp4/data/db/optimization/base.py 100.0% <100.0%> (ø)
ixmp4/data/db/optimization/indexset/__init__.py 100.0% <100.0%> (ø)
ixmp4/data/db/optimization/indexset/repository.py 98.1% <100.0%> (ø)
ixmp4/data/db/optimization/table/model.py 100.0% <100.0%> (ø)
ixmp4/data/db/optimization/utils.py 100.0% <100.0%> (ø)
... and 4 more

@glatterf42 glatterf42 requested a review from meksor December 20, 2024 07:40
@glatterf42 glatterf42 force-pushed the enh/remove-superfluous-ellipses branch from f996fb5 to 64d7787 Compare December 20, 2024 13:24
@glatterf42 glatterf42 force-pushed the enh/normalize-table-DB-storage branch from 7a65d41 to afd1630 Compare December 20, 2024 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant