Ptypy extension philosophy #456

pierrethibault · 2019-03-11T20:57:30Z

pierrethibault
Mar 11, 2019
Maintainer

How to implement a new feature is a question that comes up quite often in discussions with @aaron-parsons lately.

Subclassing ScanModel and Engine seemed the cleanest by design, but there are a few important hurdles. First, many of ptypy's methods are very long, so a subclass is likely to reproduce a large part of the parent's code just to introduce a few lines somewhere in the middle. This code duplication makes maintenance difficult. The solution would be to break up long methods in smaller chunks. I'd be happy to see that, but I wonder if there is a cost in code readability and performance.

The other extreme is to pack existing ScanModels and Engines with more features. This way there is much less code duplication, but we end up with all sorts of switches everywhere, and an explosion of parameters that most users would not need.

This issue is showing up now with Benedikt's (@daurer) pull request, on which I also worked (#198), which is implemented according to the latter scenario. I understand Bjoern's (@bjoernenders) suggestion of creating instead a new ScanModel, and his reticence with the switches deep in DM and ML, but the alterative, as it stands, is to duplicate a lot of code, and possibly to add complexity. For instance, how to deal with combining features, such as resampling and (upcoming) OPR or multislice? Yet another Engine subclass that supports both?

I'd like to hear your thoughts on this.

aaron-parsons · 2019-03-12T12:23:19Z

aaron-parsons
Mar 12, 2019
Maintainer

My thoughts are that there are two separate things. The new ScanModel stuff
is to get rid of anything that would have previously been hacked into engine.prepare.

I think there is a future refactor in ScanModel to make the methods shorter,
but that currently we don't have enough information yet to guide this refactor. This is
why accepting PR's like Benedikts is important. Since the ScanModel has the name Full,
it should either be renamed, or be allowed to contain everything where possible.

Where this isn't possible we should make a new model which explicitly highlights
the differences via inheritance. This might involve refactoring Full along the way.

I also have the same cringe with the switches in the engines. This is also a
future refactor in the making. Alex had to do this too with his Bragg style engines, and so points out that something is amiss.

I am slowly in the process of making the multislice and opr models, and will
push these too.

I would suggest that we accept these PR's so long as they have some form of tests
to guide the future refactor.

From my point of view, I would then accept this PR pending a future core refactor when we understand the limitations.

0 replies

alexbjorling · 2019-03-12T19:39:14Z

alexbjorling
Mar 12, 2019
Collaborator

I mean I have no opinion other than that I've noticed the problem and have wanted to chop up long methods a few times, like here
https://github.com/ptycho/ptypy/pull/113/files#diff-6256976fd8a26060572b2cb350cc0eb7

You could also limit the scope more to avoid things being shoehorned in, even if that doesn't apply to OPR or multislice of course. In hindsight, maybe the 3d Bragg extension was a stretch, I use it regularly but probably few others.

0 replies

bjoernenders · 2019-03-14T07:18:29Z

bjoernenders
Mar 14, 2019
Maintainer

Thanks @pierrethibault , for opening this issue for discussion and @aaron-parsons and @alexbjorling for your opinions. Here are my 2 cents :)

I would like to add that for second variant that you may end up adding parameters which, in turn, changes the parameter tree. This should be something we do when we create a new 0.x release and not in a rolling fashion, especially when it comes to "core" classes/functions. I am more more relaxed for the "user" classes/functions in this regard.

So I guess I'd propose this: If you have to add input parameters to a class to extent its functionality then you need to subclass it if you want it to be merged in a rolling fashion, because it wouldn't affect the rest of the parameter tree and confuse people. It would be great if code duplication can be avoided at that stage by splitting up long methods of the parent class in smaller chunks but that can also wait for a consolidation window before a release where we try to move common parts from the subclass into the parent or parts that turned out to be special into subclasses. Another option would be to insert hooks, i.e. empty method calls into the parent class that are only populated in a subclass.

For PR #198 In this case then, I would argue to keep the PR open until the next release cycle where we will change the parameter tree and also update the docs.

Unfortunately that does not help regarding the "switch hack" in DM and ML, because I can see your point here in that the changes are very small, so another subclass does not make much sense. However, we have a compatibility layer for the engines where we state with which models they work, which one could interpret is that minor alterations to accommodate a model can be ok. In order to make that clear, I would not go introspective through pod.model.xx but instead go from the top and see : if this is model XXX then I can expect to find YYY. If at some point a model goes away, these switches might then be easier to spot as they are declared in an obvious way. If the switch situation becomes untenable, we need to split the engines at some point.

However, if the engine gets more parameters, I would suggest a subclass until the next release just like for the model and use code refactoring or hooks.

Does that make sense?

0 replies

aaron-parsons · 2019-03-14T12:21:38Z

aaron-parsons
Mar 14, 2019
Maintainer

Hey @bjoernenders,
I see where you are coming from with this, essentially trying to use encapsulation to protect the core functionality. The problem is that we don't release regularly enough for this strategy to work. This risks everybody running on separate branches that have mulitiple feature commits in. We see this already with poor benedikts merge. It also will make you very sad when coming to make the next release.

I would suggest we use the following, continuous integration (https://en.wikipedia.org/wiki/Continuous_integration), model. This is used for h5py, numpy, scikit etc...

Release goes on a branch, bug fixes go here and are cherrypicked/merged to master. Release branch is always stable with no new features.
Run with a "hot" master. This means that we only ever PR with features into master, if the tests pass, we allow the merge. Nobody should commit directly to master. This way it gets used and we can spot bugs. If we find bugs, we write a test to reproduce and commit the fixed test. This way master is almost stable, but expected to occassionally flake.
We should aim to close PR's as soon as possible. Sometimes a refactor is necessary as well as the tests passing, but we should point these out inline with the code.
Documentation-wise: I am already trying to get Diamond to pay for readthedocs. My users at Diamond found that the documentation is already out of date. After talking to you last year, and Pierre more recently, I have also asked our legal team for advice about making the majority of the framework public (but I guess this is a discussion for us offline), so we can get readthedocs (and other such widgets) for free (i won't pursue this without your explicit go ahead obviously).
We can do .3.x releases more frequently to update which is the stable release branch.

I just can't see that keeping the PR open until the next release helps our developers/users at all, and instead might make a massive merge mess when we get round to the release.

I also agree with you about the switches in DM and ML, but I think that new functionality should just go straight, via PR, into master.

What do you think?

0 replies

pierrethibault · 2019-03-18T21:56:41Z

pierrethibault
Mar 18, 2019
Maintainer Author

@aaron-parsons and @bjoernenders did you continue the discussion offline? I'd like to know what is the conclusion. I think that PRs should be merged as quickly as possible to avoid heavy backlogs and to make new features available to users as quickly as possible. They could be merged on a "release_candidate" or "dev" branch, but given the popularity and success of continuous integration I would not mind going for a "hot" master.

0 replies

bjoernenders · 2019-03-20T17:32:38Z

bjoernenders
Mar 20, 2019
Maintainer

@pierrethibault We did not continue the discussion yet. I am a bit conflicted about a "hot" master, though I see the benefits of a more rapid merging cycle. However some merges have been held back by me for the reason, that they interfere with users who are used for things to happen a specific way. So whenever someone checks out the repository, I want something to come as download that is API and parameter compatible to the latest release. I know that you can change the default branch on git, but that would stage the PR and commits automatically against that branch. Not a big deal, I could shoot these PRs down in review. So I'm fine have a release branch for stability and a "hot" master as long as we can avoid user confusion.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ptypy extension philosophy #456

{{title}}

Replies: 6 comments

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

Ptypy extension philosophy #456

pierrethibault Mar 11, 2019 Maintainer

Replies: 6 comments

aaron-parsons Mar 12, 2019 Maintainer

alexbjorling Mar 12, 2019 Collaborator

bjoernenders Mar 14, 2019 Maintainer

aaron-parsons Mar 14, 2019 Maintainer

pierrethibault Mar 18, 2019 Maintainer Author

bjoernenders Mar 20, 2019 Maintainer

pierrethibault
Mar 11, 2019
Maintainer

aaron-parsons
Mar 12, 2019
Maintainer

alexbjorling
Mar 12, 2019
Collaborator

bjoernenders
Mar 14, 2019
Maintainer

aaron-parsons
Mar 14, 2019
Maintainer

pierrethibault
Mar 18, 2019
Maintainer Author

bjoernenders
Mar 20, 2019
Maintainer