Merge of the transfer entropy sources and fixes, modification to master #5

mafycek · 2021-12-02T20:02:54Z

No description provided.

…ek_refactoring_mutual_information_part

# Conflicts: # .github/workflows/ci.yml # examples/1-basic_manipulation_and_interpolation.py # examples/3-nans_in_data.py # examples/4-wavelet_analysis.py # examples/5-surrogates_and_VARmodel.py # examples/6-mutual_inf_and_surrogates.py # examples/7-SSA_analysis.py # examples/8-empirical_model.py # pyclits/data_loaders.py # pyclits/empirical_model.py # pyclits/geofield.py # pyclits/mutual_inf.py # pyclits/surrogates.py # pyclits/var_model.py # pyclits/var_model_acc.pyx # pyclits/wavelet_analysis.py

jajcayn · 2021-12-03T10:44:43Z

Hey @mafycek,
Thanks for the PR! So this is HUGE! When Milan told me a while ago, you want to add some functionality regarding (transfer) entropy, and I was expecting a few files with some functions:)

First and foremost, and maybe up to a debate, pyclits was meant to be a Python CLImate TimeSeries package, hence the name... To be honest, I do not like finance data plugins, Roessler, etc... pyclits's primary purpose was to simplify the work with climate data, some methods to decompose and seek (quasi-)oscillatory phenomena, and perform analyses using the Shannon entropy framework. Having other methods utilizing, e.g., Renyi entropy, etc., is more than welcome, but having a lot more code for working with non-climate data feels over the top. The question is whether to not keep pyclits really about the climate and maybe create another package specifically for entropies and measures related to that...
To be honest, I am leaning towards creating a package for entropies that would include examples and data plugins from testing systems like Roessler or Lorenz, climate data, finance data, and theoretically neuro data, and then maybe use it in pyclits :)

Anyway, in the meantime, I started rewriting the main functionality of pyclits, which is handling the geospatial data, wavelets to bring it to more quality code. That being said, I rewrote almost everything, added tests, coverage checks, and renamed some files. Before I would dive it the code itself, I would like to ask to you restructure the changes:

I see you merged my rewrite/master branch here, but I think that is not how it should be done. It'll make git history really ugly. Either we first merge your branch to master, then rebase mine and merge mine, or the other way around. I do not really care about the order. I think we can first merge yours and then my rewrite. If you need some features from my branch (e.g., setup of CI), you can cherry-pick some commits. If you agree, please reset your branch to the commit before merging my rewrite here; if needed, cherry-pick some commits, and then push.

Some other comments regarding the code:

I am using black formatter to not think about the formatting, have minimal changes, and so that each file is formatted the same... you can use it in your code editor or as a pre-commit feature. I am using it as black -l 80, which keeps line length to 80, which is a reasonable python default. Could you please use black on all pyclits code?
I am also using isort with default settings to keep imports sorted
all tests go to folder tests (not test); please move
all test filenames should be named as test_<name of the python file>.py
for testing, I suggest using the unittest package so that each test is a class with unittest.TestCase as a parent, individual tests are methods of that class (the method name has to start with test_), and each test file is ended with a code block:

if __name__ == "__main__":
    unittest.main()

you can see the general structure, e.g., here> https://github.com/jajcayn/pyclits/blob/rewrite/master/tests/test_mutual_inf.py
for running the tests, I am using pytest: when you are in the pyclits root, you can just run pytest in your command line

I see some files in your test directory are named example... I suggest making a clear division between tests and examples... first of all, examples should showcase the code, tests should test it, e.g., compute something, and you assert that the computation is correct... Please do think about whether something is an example (and put it into the example folder), or a test (to a tests folder and use the unittest package)
I see you are using some new python packages; please add them to requirements.txt and think about whether you really need to use that package... I do not want pyclits to become huge in size, and it should be a small-ish library
after adding your files, the pyclits folder is now huge and unreadable.. we need to find a structure and divide the whole package into sub-packages... it's not OK having one folder with 20+ python files.
- e.g. sub-package data_generators would contain random_samples, roessler_system, sample_generator, finance_data_plugin, etc
- sub-package visualization or plotting would contain all plotting functions
- sub-package entropy would contain all things related to computation of entropies, MI, CMI, TE, etc
I like to have a clear division between "codebase" (providing functions and classes) and "scripts" (utilizing the codebase to do something specific)... I see many of your files defining functions and then the name == main block, which computes something (e.g., roessler_system.py)... In the case of Roessler, I would, e.g., keep just the function and put roessler_example.py to the example folder
a general rule of programming - a function should do one thing... E.g., your roessler_oscillator function from roessler_system.py unpacks parameters, integrates the system, and saves to a file... What if I do not want to save it? Why not make one general saving function (e.g., to a file utils.py) which accepts NumPy arrays of arbitrary dimensionality and size and saves it?
I can see a lot of plotting functions... I am not a fan of having a specific plotting function within packages... When plotting is specific, then sure. But, e.g., visualization of Roessler is just a line plot... I do believe when some is using our package and integrating the Roessler system, they can plot the result, so I do not see a need for these simple plotting functions
two specific functions: roessler_3d_plot and roessler_3d_multocolor_plot are practically the same, and only one line is changed... make it one function with an argument, e.g., multicolor=False
do not use print; apart from simple debugging, it is really frowned upon... python has an excellent logging library (import logging), and you can use 4 levels of logging (logging.info - instead of print, logging.warning, logging.error and logging.debug)
all filenames should be lowercase, functions should use snake_case naming, classes should use CamelCase naming

Sorry for a huge comment, I just want to make sure we are on the same page regarding this package

mafycek · 2022-01-17T09:36:18Z

Hi @jajcayn ,
since I started to work on the grant with Milan and Petr. The Renyi transfer entropy(RTE) method is now implemented with number use cases. The PT is big because it contains these changes but I rewrited virtual environment to use poetry(btw. it is the reason why tests fail - I forgot to update the test scripts) and I also merged your new changes to the master.

Surely, it does not make sense to include use cases of the RTE if the package is intended to be only for climate data. Examples can be extracted into a new project intended for examples. It would need finalization and stabilization of the actual package and its deployment to the Python package repository.

Regarding, order of merging to the master. It should be done like you mentioned. So I'll reject the PR. However, I do not agree that the main issue is the history. You can squash set of commits into one during PR.

Then, I'll follow your comments. Regarding black formatter, I use PyCharm from JetBrains that has number of plugins and it automatically holds format of the sources.

Best regards,
Hynek
P.S. We can make a phone/skype/googlemeet call to sync our activities.

Nikola Jajcay and others added 30 commits September 3, 2019 17:06

started rewriting base class DataField to python3 and xarray backend

1fcb970

added assert for months variable

410dc5b

finished basic datafield

0e9e39d

add save method

8195f3b

fix in save

751855d

simple init for now

4405e51

started with tests for geofield: load-save test

e8e6f67

continue datafield tests

5ca45ef

new tests and now datafield seems to work properly

247db94

fix in deseasonalise

1de1121

Implementation of transfer entropy and entropy according to Paly,2010

48d9ebe

Conditional transfer entropy

b09c1ae

Merge remote-tracking branch 'remotes/origin/rewrite/master' into hyn…

60ccc9e

…ek_refactoring_mutual_information_part

Renyi entropy for k-th neighbor

5154c6a

Renyi entropy for k-th neighbor

bd73041

Testing of Renyi entropy

966ce0d

Fix of Renyi entropy of normal distribution

061fd15

Implementation of knn renyi entropy

90f76ae

Fix of knn estimator for Shannon entropy

4aeb222

Fixes of error and improvement of simulation of gaussian noise

3678ce5

Calculation of Renyi entropy

f8ef57c

Tests of the implementation

8b921b6

Poetry in the project

f8e359d

Roessler oscilator modification

b90cdb2

Sample generator from an array

2e2b821

Renyi transfer entropy

5504913

Improving error handling

e7db244

Fix of transfer entropy

3a02caa

Refactoring of mutual information

8e95f48

Fix of transfer entropy

8fc2063

jajcayn and others added 23 commits November 2, 2021 20:44

macos problems with p36 and cartopy

688791e

finish tests for mutual inf

cec10b9

add geoplot tests

88412dd

started surrogate tests

04f77f2

continue surrogate tests

4d71553

try python3.10?

7ba7907

try python3.10?

37de7ce

CCTX example

2d5f577

finish surrogate tests

61ed627

finish geofield tests

fac82b4

rewrite SSA and add tests

26ac4bf

edit functions and add tests

8bd2dab

relax function tests

2919d50

ignore empirical model in codecov for now

e87491d

add num workers argument to geofield

5f4d9a1

comment in empirical_model

16632ab

Renyi entropy

8c34c1e

Renyi entropy restructuring into special file

eec9464

Fixing dependencies and arbitrary precision parameter

cba137c

Fix of the titles and labels of graphs for financial datasets

d41eab0

Fix of output figures for transfer entropy

ab2b2c0

Update packages

ceb0976

Merge branch 'master' into TE_merge

f98bf91

mafycek added the enhancement label Dec 2, 2021

mafycek added this to the Transfer entropy calculation and analysis milestone Dec 2, 2021

mafycek requested a review from jajcayn December 2, 2021 20:02

mafycek self-assigned this Dec 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge of the transfer entropy sources and fixes, modification to master #5

Merge of the transfer entropy sources and fixes, modification to master #5

mafycek commented Dec 2, 2021

jajcayn commented Dec 3, 2021

mafycek commented Jan 17, 2022

Merge of the transfer entropy sources and fixes, modification to master #5

Are you sure you want to change the base?

Merge of the transfer entropy sources and fixes, modification to master #5

Conversation

mafycek commented Dec 2, 2021

jajcayn commented Dec 3, 2021

mafycek commented Jan 17, 2022