Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge of the transfer entropy sources and fixes, modification to master #5

Open
wants to merge 246 commits into
base: master
Choose a base branch
from

Conversation

mafycek
Copy link
Collaborator

@mafycek mafycek commented Dec 2, 2021

No description provided.

@mafycek mafycek requested a review from jajcayn December 2, 2021 20:02
@mafycek mafycek self-assigned this Dec 2, 2021
# Conflicts:
#	.github/workflows/ci.yml
#	examples/1-basic_manipulation_and_interpolation.py
#	examples/3-nans_in_data.py
#	examples/4-wavelet_analysis.py
#	examples/5-surrogates_and_VARmodel.py
#	examples/6-mutual_inf_and_surrogates.py
#	examples/7-SSA_analysis.py
#	examples/8-empirical_model.py
#	pyclits/data_loaders.py
#	pyclits/empirical_model.py
#	pyclits/geofield.py
#	pyclits/mutual_inf.py
#	pyclits/surrogates.py
#	pyclits/var_model.py
#	pyclits/var_model_acc.pyx
#	pyclits/wavelet_analysis.py
@jajcayn
Copy link
Owner

jajcayn commented Dec 3, 2021

Hey @mafycek,
Thanks for the PR! So this is HUGE! When Milan told me a while ago, you want to add some functionality regarding (transfer) entropy, and I was expecting a few files with some functions:)

First and foremost, and maybe up to a debate, pyclits was meant to be a Python CLImate TimeSeries package, hence the name... To be honest, I do not like finance data plugins, Roessler, etc... pyclits's primary purpose was to simplify the work with climate data, some methods to decompose and seek (quasi-)oscillatory phenomena, and perform analyses using the Shannon entropy framework. Having other methods utilizing, e.g., Renyi entropy, etc., is more than welcome, but having a lot more code for working with non-climate data feels over the top. The question is whether to not keep pyclits really about the climate and maybe create another package specifically for entropies and measures related to that...
To be honest, I am leaning towards creating a package for entropies that would include examples and data plugins from testing systems like Roessler or Lorenz, climate data, finance data, and theoretically neuro data, and then maybe use it in pyclits :)

Anyway, in the meantime, I started rewriting the main functionality of pyclits, which is handling the geospatial data, wavelets to bring it to more quality code. That being said, I rewrote almost everything, added tests, coverage checks, and renamed some files. Before I would dive it the code itself, I would like to ask to you restructure the changes:

I see you merged my rewrite/master branch here, but I think that is not how it should be done. It'll make git history really ugly. Either we first merge your branch to master, then rebase mine and merge mine, or the other way around. I do not really care about the order. I think we can first merge yours and then my rewrite. If you need some features from my branch (e.g., setup of CI), you can cherry-pick some commits. If you agree, please reset your branch to the commit before merging my rewrite here; if needed, cherry-pick some commits, and then push.

Some other comments regarding the code:

  • I am using black formatter to not think about the formatting, have minimal changes, and so that each file is formatted the same... you can use it in your code editor or as a pre-commit feature. I am using it as black -l 80, which keeps line length to 80, which is a reasonable python default. Could you please use black on all pyclits code?
  • I am also using isort with default settings to keep imports sorted
  • all tests go to folder tests (not test); please move
  • all test filenames should be named as test_<name of the python file>.py
  • for testing, I suggest using the unittest package so that each test is a class with unittest.TestCase as a parent, individual tests are methods of that class (the method name has to start with test_), and each test file is ended with a code block:
if __name__ == "__main__":
    unittest.main()

you can see the general structure, e.g., here> https://github.com/jajcayn/pyclits/blob/rewrite/master/tests/test_mutual_inf.py
for running the tests, I am using pytest: when you are in the pyclits root, you can just run pytest in your command line

  • I see some files in your test directory are named example... I suggest making a clear division between tests and examples... first of all, examples should showcase the code, tests should test it, e.g., compute something, and you assert that the computation is correct... Please do think about whether something is an example (and put it into the example folder), or a test (to a tests folder and use the unittest package)
  • I see you are using some new python packages; please add them to requirements.txt and think about whether you really need to use that package... I do not want pyclits to become huge in size, and it should be a small-ish library
  • after adding your files, the pyclits folder is now huge and unreadable.. we need to find a structure and divide the whole package into sub-packages... it's not OK having one folder with 20+ python files.
    • e.g. sub-package data_generators would contain random_samples, roessler_system, sample_generator, finance_data_plugin, etc
    • sub-package visualization or plotting would contain all plotting functions
    • sub-package entropy would contain all things related to computation of entropies, MI, CMI, TE, etc
  • I like to have a clear division between "codebase" (providing functions and classes) and "scripts" (utilizing the codebase to do something specific)... I see many of your files defining functions and then the name == main block, which computes something (e.g., roessler_system.py)... In the case of Roessler, I would, e.g., keep just the function and put roessler_example.py to the example folder
  • a general rule of programming - a function should do one thing... E.g., your roessler_oscillator function from roessler_system.py unpacks parameters, integrates the system, and saves to a file... What if I do not want to save it? Why not make one general saving function (e.g., to a file utils.py) which accepts NumPy arrays of arbitrary dimensionality and size and saves it?
  • I can see a lot of plotting functions... I am not a fan of having a specific plotting function within packages... When plotting is specific, then sure. But, e.g., visualization of Roessler is just a line plot... I do believe when some is using our package and integrating the Roessler system, they can plot the result, so I do not see a need for these simple plotting functions
  • two specific functions: roessler_3d_plot and roessler_3d_multocolor_plot are practically the same, and only one line is changed... make it one function with an argument, e.g., multicolor=False
  • do not use print; apart from simple debugging, it is really frowned upon... python has an excellent logging library (import logging), and you can use 4 levels of logging (logging.info - instead of print, logging.warning, logging.error and logging.debug)
  • all filenames should be lowercase, functions should use snake_case naming, classes should use CamelCase naming

Sorry for a huge comment, I just want to make sure we are on the same page regarding this package

@mafycek
Copy link
Collaborator Author

mafycek commented Jan 17, 2022

Hi @jajcayn ,
since I started to work on the grant with Milan and Petr. The Renyi transfer entropy(RTE) method is now implemented with number use cases. The PT is big because it contains these changes but I rewrited virtual environment to use poetry(btw. it is the reason why tests fail - I forgot to update the test scripts) and I also merged your new changes to the master.

Surely, it does not make sense to include use cases of the RTE if the package is intended to be only for climate data. Examples can be extracted into a new project intended for examples. It would need finalization and stabilization of the actual package and its deployment to the Python package repository.

Regarding, order of merging to the master. It should be done like you mentioned. So I'll reject the PR. However, I do not agree that the main issue is the history. You can squash set of commits into one during PR.

Then, I'll follow your comments. Regarding black formatter, I use PyCharm from JetBrains that has number of plugins and it automatically holds format of the sources.

Best regards,
Hynek
P.S. We can make a phone/skype/googlemeet call to sync our activities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants