Precise versioning with local branches #118
base: next_release
One upshot of this commit is that installing in developer mode with |
Hmm, I definitely need to think a little about this. A few points:
I haven't really thought through how all this relates to what you're doing in this branch, but I at least wanted to share my initial reactions.
Thanks for the quick feedback.

Re: Points 1-3
For people who stick to official releases, the only impact will be an unambiguous record of which version of Switch was used to produce their results, and a clear indication of whether they accidentally wandered into a branch that diverged from an official release. In other projects, I've found accurately tracking (and recording) local versions to be invaluable for fast troubleshooting and for understanding, in retrospect, how results change as the code evolves. While local versioning can help when releasing results for a study (as with the pegged git checkout strategy you describe), I've primarily used it to maintain good internal records. As far as I can remember, every study I've done or collaborated on has required some code customizations, only a subset of which ever made it into the master branch. This was true with both Switch v1 and v2. I expect the only exception to the need for custom branches will be when every edit needed for a particular study is accepted into the master branch and tested for backwards compatibility (easier to guarantee when working solo or unilaterally, harder on a shared codebase). Even when people primarily wanted to adjust the inputs of an established study (like the Rhodium Group's extension of a Hawaiian study), they still needed custom exports and other tweaks. As the codebase matures, the need to push its boundaries may shrink, but I don't expect it to disappear entirely.

Re: Point 4
For people who use official releases of Switch, this will have no impact on the version they see. While Switch 2.0 makes it possible to do any customization by writing new modules outside of the switch_model package (including copying and editing core modules), I generally recommend learning git and committing to a branch because:
Re: Point 5
Re: Point 6
Re: Point 7
I forgot to respond to the data upgrade issue. I see support for data upgrades & backwards compatibility as strictly limited to official sequential releases.
… exact version of code can be known: release[+gitsha]+localmod. If git is available, the release will be based on the last tag in the git history that starts with "2". [+gitsha] will be omitted if this is exactly a release. +localmod will be dropped if there are no uncommitted modifications to the code. If git is unavailable or the attempt to get more precise information fails for whatever reason, the base version recorded in version.py will be used. Also, save the installed Switch version in the output directory for improved record keeping and reproducibility, especially during active software development.
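A minimal Python sketch of the scheme in this commit message follows. It is an illustration under assumptions, not the code in this PR: it assumes git describe --tags --long --dirty --match "2*" is an acceptable way to find the last release tag, and the names describe_to_version, precise_version and BASE_VERSION are hypothetical. One deliberate deviation: PEP 440 allows only a single "+" in a version, so the sketch joins local segments with "." (2.0.4+abc1234.localmod) instead of release+gitsha+localmod.

```python
import subprocess
import warnings

# Hypothetical stand-in for the hard-coded version in switch_model/version.py.
BASE_VERSION = "2.0.4"


def describe_to_version(describe: str) -> str:
    """Convert `git describe --tags --long --dirty` output to a version string.

    Example inputs: "2.0.4-0-gabc1234" (exactly on the tag),
    "2.0.4-3-gabc1234" (3 commits past the tag), "2.0.4-3-gabc1234-dirty".
    """
    dirty = describe.endswith("-dirty")
    if dirty:
        describe = describe[: -len("-dirty")]
    # rsplit keeps any hyphens inside the tag name itself intact.
    tag, distance, sha = describe.rsplit("-", 2)
    local = []
    if int(distance) > 0:
        # git prefixes the abbreviated hash with "g"; drop it.
        local.append(sha[1:] if sha.startswith("g") else sha)
    if dirty:
        local.append("localmod")
    # PEP 440 allows one "+", with further local segments separated by ".".
    return (tag + "+" + ".".join(local)) if local else tag


def precise_version(repo_dir: str) -> str:
    """Ask git for the precise version; fall back to BASE_VERSION on any failure."""
    cmd = ["git", "describe", "--tags", "--long", "--dirty", "--match", "2*"]
    try:
        result = subprocess.run(
            cmd, cwd=repo_dir, capture_output=True, text=True, check=True
        )
        return describe_to_version(result.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError) as exc:
        # git missing, not a repository, no matching tag, or unexpected output.
        warnings.warn(f"Could not determine precise version ({exc}); using {BASE_VERSION}")
        return BASE_VERSION
```

Using --long keeps the output format uniform, so a commit distance of 0 identifies an exact release without a second git call.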
… git descriptions in some contexts.
…ged version. Also use the git-standard "dirty" suffix instead of "localmod" for installations from code that hasn't been committed.
Finally getting back to this pull request; I'd forgotten we had even had this much discussion of it. I'll go back over your comments above, but after looking at the code, I'm inclined to simplify this a lot:
This simpler version would support
@josiahjohnston, I think we have two fairly different workflows for using Switch, so I'm looking for something that will work for both. To do that, it would help to know a little more about your workflow. The code in this branch seems to assume that you will run
By the way, my workflow is generally to have one environment that I use for most active models, and I use
In your workflow, the git repository is visible when you run
I used virtual environments instead of docker containers; docker containers were the next step up. I offered to set those up while I was still working on this in a professional capacity (I'm not sure I communicated that intent well), but never got around to it. It's not clear to me whether that would improve usability for the target user base. Dockerfiles are easy enough to set up, and I might be able to pull one together if I stayed up late some night. If you went with dockerfiles, then
Yup, you are right about the impacts of
Yup, data upgrade support wouldn't and shouldn't apply to precise versions that differ only in the git hash suffix. That functionality only applies if you bother bumping the version number.

All that being said, most people I've worked with are sloppy about git repos and traceability. I keep hoping people will up their game, possibly with the aid of data science curricula and "Best Practices in Scientific Computing", but that's probably too optimistic. I regard this functionality as crucial for traceability and reproducibility in scientific computing. This is especially important when planning major long-term societal investments and, with global warming, the fate of our planet, since minor changes to models can produce wildly different results (whether by intention or accident), long-term models are often not numerically stable, and inputs have large uncertainties (for both present-day and long-term forecasts). But if most practitioners never bother with systematic processes, and most published policy papers on energy models decline to release their datasets or code, then I don't know whether this feature matters from a practical perspective. And if your use cases involve releasing code and final runs from a single version of the code, without needing traceability for your intermediate runs because you are that good, then maybe this isn't useful for you either.
I don't know what changes you are proposing or how they would affect the things I used day to day to solve my pain points. I'm not working with this codebase in a professional capacity now and don't have the bandwidth to contribute in any real way, or to dig deeper into how current or prospective energy modelers will use this software. If this PR seems useful to you or other users, then keep it. If not, do whatever seems useful. If I return to this in the future, I'll take a look at the outcome and can always restore the portions I need for my process and workflow.
Thanks, that's good to know. I may postpone this for now because it's getting complicated. For later reference, I think there is a strategy that could meet both of our needs (stamping a copy of Switch with repository status while copying it into a virtual environment, and also retrieving repository status directly from a developer install of Switch):
I'm a little unsure how this fits with distributions, though. PyPI uses wheels, which could potentially be stamped with repository info during the build process. If the repository info is then reported as part of the version number, it may prevent the wheel from being uploaded to PyPI (probably a good thing). If it isn't, then we can freely upload a dev or final version without worrying about whether it has been committed to the repository yet (maybe a good thing, maybe not). On the other hand, the conda-forge package is built from the source distribution on PyPI. I don't think this goes through a 'build' phase before it is uploaded, so I'd need to find some other hook to stamp the source distribution.
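One way to stamp repository info into a wheel during the build, sketched here under assumptions, is to hook setuptools' build_py command so it writes the version file into the build tree rather than the source checkout. The helper names stamp_build_dir and is_uploadable are hypothetical, and the setuptools hook appears only in comments because it would live in setup.py. The upload guard reflects PEP 440's guidance that local version identifiers should not be used when publishing to a public index.

```python
from pathlib import Path

# Relative location taken from the path named in the PR description.
VERSION_FILE = Path("switch_model") / "data" / "installed_version.txt"


def stamp_build_dir(build_lib: str, version: str) -> Path:
    """Write the precise version into the build tree (not the source checkout)."""
    target = Path(build_lib) / VERSION_FILE
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(version + "\n")
    return target


def is_uploadable(version: str) -> bool:
    """PEP 440 says local versions (anything after "+") should not be
    published to a public index, so treat those builds as not uploadable."""
    return "+" not in version


# In setup.py, a build_py subclass could call the helper after the normal copy:
#
#   from setuptools.command.build_py import build_py
#
#   class StampedBuildPy(build_py):
#       def run(self):
#           super().run()
#           # get_precise_version() is a hypothetical git lookup
#           stamp_build_dir(self.build_lib, get_precise_version())
#
#   setup(..., cmdclass={"build_py": StampedBuildPy})
```

Writing into build_lib leaves the working tree untouched, so stamping does not itself mark the checkout as modified.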
Cleanup and minor fixes to get_inputs post process
This enables clear records of local versions of software, which can be invaluable during R&D for customizations. For example, let's say I check out a current copy of the development branch, then add new modules and customize behavior to deal with edge cases and subtle bugs. Each commit I make may result in different solutions for the same dataset, but if every version is labeled as v2.0.4, I lack a clear record of which scenarios I need to re-execute, or how I generated a particular set of results.

PEP 440 explains the concept of local version identifiers for this type of use case. In the development environment of my example, installing a copy of Switch via pip install path/to/checkout will update the version from 2.0.4 to 2.0.4+[git_sha], or, if I have uncommitted changes in the repository, to 2.0.4+[git_sha]+localmod. If the current git checkout is tagged as a release (having a git tag starting with 2 in our case), then the local modifier suffix is dropped.

This implementation should have no impact on "quickstart" instructions that install from PyPI or conda repositories.
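For reference, PEP 440 splits a version into a public part and a local label at a single "+", with label segments separated by "." (and "-" or "_" normalized to "."). The sketch below uses a simplified pattern, not the full PEP 440 grammar, and the function name split_local_version is hypothetical. It also shows that a string with two "+" signs, in the shape of 2.0.4+[git_sha]+localmod, does not parse as a single PEP 440 version.

```python
import re
from typing import Optional, Tuple

# Simplified: public part is dotted digits (e.g. 2.0.4); the optional local
# label is alphanumeric segments separated by ".", "-" or "_".
_VERSION_RE = re.compile(
    r"(?P<public>\d+(?:\.\d+)*)"
    r"(?:\+(?P<local>[a-z0-9]+(?:[-_.][a-z0-9]+)*))?",
    re.IGNORECASE,
)


def split_local_version(version: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (public, normalized_local), or None if the string doesn't match."""
    m = _VERSION_RE.fullmatch(version)
    if m is None:
        return None
    local = m.group("local")
    if local is not None:
        # PEP 440 normal form: lowercase, with "-" and "_" replaced by ".".
        local = re.sub(r"[-_]", ".", local.lower())
    return m.group("public"), local
```

For example, split_local_version("2.0.4+abc1234") gives ("2.0.4", "abc1234"), while split_local_version("2.0.4+abc1234+localmod") gives None.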
This implementation will try to find the precise local version (which relies on git being installed) and write it into switch_model/data/installed_version.txt in the installed package directory. If the attempt to call a git subprocess fails, it will print a warning and use the base version recorded in switch_model/version.py. The version.py module will attempt to load installed_version.txt from the data directory and return that string if available; otherwise, it returns the hard-coded version number. Finally, the version is written to the outputs directory to ensure a clear record for archival purposes. This version number is accessible via a) the pip catalog, b) switch --version, c) switch_model.__version__, and d) outputs/software_version.txt.
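The lookup order described above (stamped file first, hard-coded version as fallback) and the archival copy in the outputs directory could look roughly like the following sketch. It assumes the file names given in this description; the function names read_installed_version and record_version and the BASE_VERSION constant are hypothetical, and in a real version.py the data directory would presumably be the data folder next to the module.

```python
from pathlib import Path

# Hypothetical stand-in for the hard-coded version in switch_model/version.py.
BASE_VERSION = "2.0.4"


def read_installed_version(data_dir: Path) -> str:
    """Return the stamped version if present and non-empty, else BASE_VERSION."""
    try:
        text = (Path(data_dir) / "installed_version.txt").read_text().strip()
    except OSError:
        # Missing or unreadable file, e.g. the stamping step never ran.
        return BASE_VERSION
    return text or BASE_VERSION


def record_version(outputs_dir: Path, version: str) -> Path:
    """Save the version alongside model results for archival purposes."""
    outputs_dir = Path(outputs_dir)
    outputs_dir.mkdir(parents=True, exist_ok=True)
    target = outputs_dir / "software_version.txt"
    target.write_text(version + "\n")
    return target


# In version.py, something like:
#   __version__ = read_installed_version(Path(__file__).parent / "data")
```

Falling back on an empty file as well as a missing one keeps __version__ from ever being a blank string.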
I've used this pattern successfully in other software for scientific computing & medical devices, and it has been a life-saver. The code used here has worked effectively in Mac & Linux environments, and can be compatible with docker packaging. It could use validation in a Windows environment (minimally a basic sniff test), but since it is a nonessential add-on that fails gracefully, I expect it could be integrated even if it doesn't work seamlessly in all development environments.
Additionally, I think we would be better served if pre-release branches update the hard-coded version from 2.0.4 to 2.0.4+next_release, or some similar marker indicating that it isn't a packaged release and hasn't received the same degree of scrutiny.