
Parameterize treatment of item dimensions' types (#218) #219

Closed
wants to merge 5 commits

Conversation

zikolach (Contributor)

  • bring back special handling for the year dim
  • extend it to any dim containing a year reference
  • simplify df creation by using from_dict again

Note: it turns out from_dict has the same or better performance compared with the alternative approach of passing an index to an empty df on creation.
It also allows an int dtype, which an empty df doesn't support because of NaNs.
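The dtype point can be shown with a small, self-contained pandas sketch (the column names are hypothetical illustrations, not the actual ixmp code): from_dict over complete column data preserves int64, while a frame pre-created with an index starts out full of NaN and so cannot hold an integer dtype.

```python
import pandas as pd

# from_dict over complete column data: the integer dtype survives.
df1 = pd.DataFrame.from_dict({"year": [2010, 2020], "value": [1, 2]})

# Empty frame created up front: cells start as NaN, so after a partial
# fill the "year" column can no longer be held as int64.
df2 = pd.DataFrame(index=range(2), columns=["year"])
df2.loc[0, "year"] = 2010  # row 1 is still NaN

print(df1["year"].dtype)  # int64
print(df2["year"].dtype)  # not int64
```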

@zikolach zikolach requested review from khaeru and OFR-IIASA November 22, 2019 15:54
@zikolach zikolach force-pushed the feature/year-int-conversion branch from 09fee42 to 2dfcbd2 on November 22, 2019 15:56
- add pytest-benchmark dependency
- add 2 tests
- change the jdbc module (_temp_dbprops method) to allow an in-memory hsqldb

This is not the final implementation; it is intended just as a trial to see what can be done with pytest-benchmark.
Still to figure out:
- how to make measurements stable between runs
- where to save test results for comparison with previous runs
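The stability question can be illustrated without pytest-benchmark, using only the standard library (the build_frame function is a hypothetical stand-in for the benchmarked operation): repeating the measurement and comparing min and mean is one common way to judge run-to-run noise, and these are the same statistics pytest-benchmark derives automatically.

```python
import statistics
import timeit

def build_frame():
    # Hypothetical stand-in for the operation being benchmarked.
    return {k: k * k for k in range(1000)}

# timeit.repeat runs the measurement several times; the spread between
# min and mean gives a rough sense of how stable results are between runs.
times = timeit.repeat(build_frame, number=10, repeat=5)
print(min(times), statistics.mean(times))
```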
@zikolach zikolach self-assigned this Nov 23, 2019
@zikolach (Contributor, Author)

@khaeru one of the tests is failing because I changed a backend method to allow using a URL for hsqldb; I needed that to use an in-memory DB in the test. What was your idea behind limiting hsqldb to file-based DBs only?

@khaeru (Member) left a comment

@zikolach thanks for making a start at this. Adding benchmarks will partly address #215.

To your question:

I changed a backend method to allow using a URL for hsqldb; I needed that to use an in-memory DB in the test. What was your idea behind limiting hsqldb to file-based DBs only?

Using in-memory storage for HyperSQL was never (a) advertised, (b) documented, nor (c) tested in ixmp. If it worked previously, that was purely by accident. I didn't “limit” it intentionally; I preserved all the behaviour that we had advertised, documented, and tested.

That said, I think it would be a nice additional feature to support in-memory databases. (This could actually be used to simplify the test infrastructure a lot.) But if it is to be actually supported, that requires all of (a), (b), and (c).
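For reference, HyperSQL already distinguishes the two storage modes via the JDBC URL prefix, which is what supporting this feature would hinge on; a sketch (the paths and database names below are hypothetical):

```python
# HyperSQL selects the storage engine from the JDBC URL prefix
# (paths and database names here are hypothetical):
FILE_URL = "jdbc:hsqldb:file:/path/to/db/example"  # persisted to files
MEM_URL = "jdbc:hsqldb:mem:test_db"                # in-memory, discarded on close

print(MEM_URL)
```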

Can you please…

  • Remove type conversion for years. Per “Handle 'year' integer types in message_ix.Scenario” (message_ix#268), this does not belong in ixmp.
    • Please change the title of the PR accordingly.
    • As you do this, I will make a new PR to add the code in message_ix, where it belongs. I will also add tests, which are missing from this PR.
  • Move tests to the correct locations:
    • tests/backend/test_jdbc.py already exists; add tests here instead of creating a new file.
    • Place the methods save_par and read_par inside their corresponding test functions. Then you can even move lines like scen = ixmp.Scenario(… out of save_par into the surrounding function test_read_par_10000, since that line is not what is being benchmarked.
    • Make init_platform a pytest fixture; see the example of ixmp.testing.test_mp.
  • Add comments inline indicating what the desired performance is for the benchmarked code. Otherwise, we cannot use these to answer the question, “Is the code still as fast as is required?”
    • If possible, also leave a comment here with the benchmark values prior to this PR, to substantiate the claim of “same/better performance in comparison with the alternative approach”. (I personally believe this is true! But I would like to set a good example for how to discuss performance improvements in the future!)
  • Consider how an in-memory hsqldb will be handled in the config.json file and via the command line. See ixmp._config.Config.add_platform and ixmp.cli.platform.
    • Add tests for the new feature. See test_config_platform in tests/test_config.py and test_platform in tests/test_cli.py.
    • Document the new feature. See for instance doc/source/api-backend.rst.
  • Address code style per Stickler.

After these, please ask for a re-review.
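On the fixture suggestion above: a pytest fixture is essentially a generator, with code before the yield as setup and code after it as teardown. A minimal sketch of the idea, with a hypothetical dict standing in for the platform object (the real ixmp fixture would build a Platform and test database instead); decorating the generator with @pytest.fixture would let tests receive the yielded value by naming the fixture as a parameter:

```python
def init_platform():
    # Setup: build whatever state the tests need (hypothetical stand-in
    # for creating an ixmp Platform and test database).
    platform = {"db": "in-memory"}
    yield platform           # the value each test receives
    # Teardown: runs after the test finishes.
    platform.clear()

# pytest drives the generator automatically; by hand it looks like this:
gen = init_platform()
mp = next(gen)               # setup phase
assert mp["db"] == "in-memory"
next(gen, None)              # teardown phase
print(mp)                    # {} after teardown
```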

@zikolach (Contributor, Author)

@khaeru thank you very much for reviewing and commenting on the PR. Sorry for not clarifying it in the first place: I probably should not have requested a "normal" review, but rather asked for suggestions, since I consider these changes a draft.

Remove type conversion for years...

I commented on that in the linked ticket iiasa/message_ix#268 (comment). I am OK with both variants mentioned there. The change was made here because the corresponding ticket #218 exists in this repo. I will remove it from the code for the time being, but let's discuss it quickly next week.

Place the methods save_par and read_par inside their corresponding test functions. Then you can even move lines like scen = ixmp.Scenario(… out of save_par into the surrounding function test_read_par_10000, since that line is not what is being benchmarked.

Under normal circumstances your suggestion would make total sense. But in the current setup the benchmark executes the method multiple times (calculating min/max/mean, etc.), so it needs a new version of the scenario every time (or a new database, or some other approach). Creating a new scenario each time adds a constant cost and should not have a significant effect at large data volumes. Can you suggest how to implement this better with pytest-benchmark?
I'll put the test in a subfolder, but I would keep it in a separate file, since it is a non-functional, performance test.
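One relevant pytest-benchmark mechanism here is its pedantic mode, where a setup callable runs before every round and its result is passed to the target, so a fresh Scenario could be created each round without its cost being measured. The mechanics can be sketched with a minimal stand-in driver (the scenario dict and read_par function below are hypothetical, not the actual ixmp code):

```python
import time

def pedantic_like(target, setup, rounds):
    # Minimal stand-in for benchmark.pedantic(target, setup=..., rounds=...):
    # setup() runs before each round and is excluded from the timing.
    timings = []
    for _ in range(rounds):
        args, kwargs = setup()           # e.g. a brand-new Scenario
        start = time.perf_counter()
        target(*args, **kwargs)          # only this part is measured
        timings.append(time.perf_counter() - start)
    return timings

def fresh_scenario():
    # Hypothetical per-round setup; pytest-benchmark expects (args, kwargs).
    return ({"par": list(range(1000))},), {}

def read_par(scen):
    return sum(scen["par"])

timings = pedantic_like(read_par, fresh_scenario, rounds=3)
print(len(timings))  # 3
```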

Consider the how in-memory hsqldb...

My intention was not to expose this "feature" for external usage, but rather to make it possible to use in tests. Should we document features which are used solely for testing?
My comment regarding "limiting" referred to the fact that the new implementation is restrictive in this particular aspect (not saying whether that is good or bad). Sorry for any possible misinterpretation.
If we are going to "advertise" this feature for use by package users, let's make a separate issue/PR adding documentation/tests/etc.

Originally I didn't want to introduce a lot of changes, but since @OFR-IIASA experienced performance issues when reading parameter elements, I got the impression that ticket #218 is critical to fix rather quickly.

@khaeru (Member) commented Nov 24, 2019

it needs a new version of the scenario every time (or a new database

Okay, that's a good point. I would then suggest adding a comment like # Create a new, empty Scenario for the test above that line.

I'll put test to subfolder, but I would put it to separate file as non-functional, performance test.

Sure, maybe something like tests/backend/test_jdbc_performance.py. Then other backends could have a similar split.

Should we document features which solely used for testing?

Yes, absolutely. Our test suite is already large and complex and I'm not sure anyone fully understands it. To give people in the future any hope of maintaining it (or allow other people, e.g. colleagues of @gidden, to contribute easily) we need to make it easy to read. I think that should include not making use of 'hidden' features. So if the feature is documented well enough to read the test suite, it is not much additional work to make it a public feature. (Our new RAs are a resource here: if they can't understand the feature used in testing, that means we can do better.)

I understand that the message_data performance issue @OFR-IIASA encountered was urgent to fix; however, in our haste to respond, #213 was merged before we had time to think about it carefully. That in turn spawned #218, this PR, iiasa/message_ix#268, and iiasa/message_ix#269.

Since the performance of #213 was ruled “acceptable”, I think the additional performance improvement from this PR (BTW, see above where I said “leave a comment here with the benchmark values prior to this PR”) does not need to be similarly rushed.

@zikolach (Contributor, Author) commented Nov 25, 2019

Add comments inline indicating...

Can we discuss this?
Originally I read your comment not as "leave a comment here", but rather as a request for comments in the test script.

@zikolach (Contributor, Author)

Sample output of jdbc backend performance test.

-------------------------------------------------------------------------------------- benchmark: 2 tests -------------------------------------------------------------------------------------
Name (time in ms)             Min                 Max                Mean             StdDev              Median                 IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_save_par_1000        74.7618 (1.0)      102.1507 (1.0)       87.6907 (1.0)       8.6063 (1.0)       89.6419 (1.0)       12.8233 (1.0)           5;0  11.4037 (1.0)          13           1
test_read_par_100000     690.2758 (9.23)     875.4846 (8.57)     786.7553 (8.97)     82.7296 (9.61)     756.9678 (8.44)     144.6853 (11.28)         3;0   1.2710 (0.11)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

It indicates roughly:

  • saving a 1000-element parameter takes ~87 milliseconds
  • reading a 100000-element parameter takes ~786 milliseconds

The results are highly environment-dependent and should not be compared with results taken from another environment.
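As a sanity check on reading the table: the OPS column is simply the reciprocal of the mean time, converted to seconds.

```python
# OPS = 1 / mean, with the mean converted from milliseconds to seconds.
mean_save = 87.6907 / 1000    # test_save_par_1000
mean_read = 786.7553 / 1000   # test_read_par_100000

print(round(1 / mean_save, 4))  # 11.4037, matching the OPS column
print(round(1 / mean_read, 4))  # 1.271
```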

@zikolach zikolach changed the title Convert year dimensions to integers (#218) Parameterize treatment of item dimensions' types (#218) Nov 25, 2019
@zikolach (Contributor, Author)

As discussed with @khaeru, we close this PR for the time being as:

@zikolach zikolach closed this Nov 25, 2019
@zikolach zikolach deleted the feature/year-int-conversion branch February 7, 2020 10:08