Feature: Automate Tracking of PEcAn Package Dependencies Across the Project #3286

Closed
Sweetdevil144 opened this issue Apr 12, 2024 · 5 comments

Comments

@Sweetdevil144
Contributor

Description

Is your feature request related to a problem? Please describe.
Identifying which PEcAn sub-packages are used across the project is currently manual, error-prone, and inefficient. This matters especially given the goals of the GSoC project "Optimize PEcAn for freestanding use of single packages."

Proposed Solution

Describe the solution you'd like
I've developed a script that automatically scans R scripts in the PEcAn project, identifies usage of PEcAn sub-packages, and outputs a CSV file listing dependencies. This solution simplifies tracking dependencies, aiding in optimization and modularization efforts.
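For illustration only, the scan-and-report idea can be sketched as follows. This is not the linked find_package_utilizations.R script (which is written in R); it is a minimal Python sketch, and the function names `find_pecan_calls` and `to_csv`, the CSV column names, and the sample input are all assumptions made up for the example. It only detects calls written with `::` or `:::`.

```python
import csv
import io
import re

# Matches namespaced calls such as PEcAn.logger::logger.info.
# The package part allows letters, digits, and dots after "PEcAn".
CALL_RE = re.compile(r"\b(PEcAn[A-Za-z0-9.]*):{2,3}([A-Za-z.][A-Za-z0-9._]*)")


def find_pecan_calls(source: str) -> set[tuple[str, str]]:
    """Return the (package, function) pairs called via :: or ::: in R source text."""
    calls = set()
    for line in source.splitlines():
        # Crude comment stripping: does not handle '#' inside string literals.
        code = line.split("#", 1)[0]
        calls.update(CALL_RE.findall(code))
    return calls


def to_csv(calls_by_file: dict[str, set[tuple[str, str]]]) -> str:
    """Render one CSV row per (file, package, function), sorted for stable output."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["file", "package", "function"])  # hypothetical column names
    for path in sorted(calls_by_file):
        for pkg, fn in sorted(calls_by_file[path]):
            writer.writerow([path, pkg, fn])
    return buf.getvalue()


# Hypothetical input standing in for the contents of one R file.
example = '''
settings <- PEcAn.settings::read.settings("pecan.xml")
PEcAn.logger::logger.info("loaded")  # PEcAn.fake::ignored
'''
report = to_csv({"example.R": find_pecan_calls(example)})
print(report)
```

A real run would read each .R file from disk and pass its text to `find_pecan_calls`; the in-memory string here just keeps the sketch self-contained.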

Alternatives Considered

Describe alternatives you've considered

  1. Manual Tracking: Inefficient and unsustainable with project scale.
  2. Static Code Analysis Tools: Less customized, not directly tailored to PEcAn's needs.

Additional Context

@mdietze
Member

mdietze commented Apr 13, 2024

@mdietze mdietze closed this as completed Apr 13, 2024
@Sweetdevil144
Contributor Author

Sweetdevil144 commented Apr 13, 2024

What you're suggesting already exists:

Thanks. I was aware of the scripts/generate_dependencies.R file, which is responsible for generating this. What I proposed was adding a script that lists all PEcAn packages and the respective functions used internally by other PEcAn packages. I now realise that this would just be a subset of what the original generate_dependencies.R script covers. Thanks for the correction. Below are links to my .R script and the generated .csv file for review:

https://github.com/Sweetdevil144/module-dependencies/blob/main/pecan_dependencies.csv

https://github.com/Sweetdevil144/module-dependencies/blob/main/find_package_utilizations.R

@Sweetdevil144
Contributor Author

Sweetdevil144 commented Apr 13, 2024

Another point I wanted to add is that my custom pecan_dependencies.csv also records which functions are used from each imported package, which makes it easier to plan how to optimize the packages. The .csv may still need a lot of cleanup, for example removing common imports such as PEcAn.logger, which is used for logging. Another candidate for removal may be PEcAn.db.

@infotroph
Member

Being able to see which functions are called from which package does sound like a useful feature, though I have to say I’m much more often looking for all the functions a package calls from one particular dependency than for all functions from all its dependencies. If this can support that use case while providing an improvement in ergonomics over my default grep pkgname -R dirname, it could become a tool I reach for regularly.

A few other limitations I see in the current implementation:

  • It can only find calls to PEcAn packages, not other dependencies. I need to look for both often enough to prefer an approach that works for any package.
  • A corner case that would be easy to fix: You match on PEcAn followed by a literal dot, but
    some packages like PEcAnAssimSequential do not have a dot in their name.
  • Not sure why it ignores the tests/ directory (which often does contain unique dependencies) but not, say, inst/ (which I sometimes do and sometimes don’t care about — in many packages we use it to store outdated versions of scripts to be updated later).
  • It only finds functions called via ::, which is by far the most common form, but there are legitimate cases where we import functions into the package and call them without namespaces instead.

Overall I doubt I’d use it in its current form, but if it helps you, don’t let me stop you from using it! If you want to spend more time on it as a learning tool, I recommend thinking through how it could find all the functions from one arbitrary package.
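As a rough sketch of that "one arbitrary package" lookup, assuming nothing about the existing R script: the Python below takes any package name (with or without a dot), searches every file it is given, including ones under tests/ and inst/, and also reads `importFrom(pkg, fn)` directives from NAMESPACE so that functions imported and called without a namespace prefix are at least listed. The names `functions_used_from` and `read_package_sources` and the demo input are hypothetical. Whole-package `import(pkg)` directives are not resolved here.

```python
import re
from pathlib import Path


def functions_used_from(pkg: str, sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each function of `pkg` to the files that reference it.

    `sources` maps a file path to its text. Two reference styles are covered:
    explicit pkg::fn / pkg:::fn calls, and importFrom(pkg, fn, ...) directives
    as found in a NAMESPACE file.
    """
    name = re.escape(pkg)  # dots in names like PEcAn.logger must match literally
    call_re = re.compile(rf"(?<![A-Za-z0-9._]){name}:{{2,3}}([A-Za-z.][A-Za-z0-9._]*)")
    import_re = re.compile(rf"importFrom\(\s*{name}\s*,([^)]*)\)")
    used: dict[str, set[str]] = {}
    for path, text in sources.items():
        found = set(call_re.findall(text))
        for args in import_re.findall(text):
            # importFrom may list several names, optionally quoted.
            found.update(a.strip().strip("\"'") for a in args.split(",") if a.strip())
        for fn in found:
            used.setdefault(fn, set()).add(path)
    return used


def read_package_sources(pkg_dir: str) -> dict[str, str]:
    """Read every .R file plus NAMESPACE under pkg_dir, with no directories skipped."""
    root = Path(pkg_dir)
    paths = [
        p for p in root.rglob("*")
        if p.is_file() and (p.suffix in {".R", ".r"} or p.name == "NAMESPACE")
    ]
    return {
        str(p.relative_to(root)): p.read_text(encoding="utf-8", errors="replace")
        for p in paths
    }


# Hypothetical in-memory package standing in for a directory on disk.
demo = {
    "R/clean.R": "out <- dplyr::filter(df, x > 0)\nout <- dplyr::mutate(out, y = x)\n",
    "tests/testthat/test-clean.R": "expect_equal(nrow(dplyr::filter(df, x > 0)), 2)\n",
    "NAMESPACE": "importFrom(dplyr,select)\nimportFrom(rlang,.data)\n",
}
usage = functions_used_from("dplyr", demo)
for fn in sorted(usage):
    print(fn, sorted(usage[fn]))
```

On a real checkout the call would look like `functions_used_from("dplyr", read_package_sources("path/to/package"))`, where the path is a placeholder. Unlike a plain grep, the result is grouped by function, which is the only ergonomic gain this sketch attempts.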

@infotroph
Member

infotroph commented Apr 25, 2024

A higher-level comment: Knowing what functions we use from where is a great strategy for debugging and for planning refactoring, but I’m less sure it’s necessary to automate it the way this issue proposes. The times I’d use this script would be manual invocations while asking a focused question like “Ugh, [dependency] is causing installation problems, which functionality in [package] do we import it for? What would break if I remove it?” That’s usually easier to answer by searching for [dependency] on the fly than by looking it up in a big list of all the called functions.


3 participants