Ptypy extension philosophy #456
Replies: 6 comments
-
My thoughts are that there are two separate things. The new ScanModel stuff I think there is a future refactor in ScanModel to make the methods shorter, Where this isn't possible we should make a new model which explicitly highlights I also have the same cringe with the switches in the engines. This is also a I am slowly in the process of making the multislice and opr models, and will I would suggest that we accept these PR's so long as they have some form of tests From my point of view, I would then accept this PR pending a future core refactor when we understand the limitations. |
Beta Was this translation helpful? Give feedback.
-
I mean I have no opinion other than that I've noticed the problem and have wanted to chop up long methods a few times, like here You could also limit the scope more to avoid things being shoehorned in, even if that doesn't apply to OPR or multislice of course. In hindsight, maybe the 3d Bragg extension was a stretch, I use it regularly but probably few others. |
Beta Was this translation helpful? Give feedback.
-
Thanks @pierrethibault , for opening this issue for discussion and @aaron-parsons and @alexbjorling for your opinions. Here are my 2 cents :) I would like to add that for second variant that you may end up adding parameters which, in turn, changes the parameter tree. This should be something we do when we create a new 0.x release and not in a rolling fashion, especially when it comes to "core" classes/functions. I am more more relaxed for the "user" classes/functions in this regard. So I guess I'd propose this: If you have to add input parameters to a class to extent its functionality then you need to subclass it if you want it to be merged in a rolling fashion, because it wouldn't affect the rest of the parameter tree and confuse people. It would be great if code duplication can be avoided at that stage by splitting up long methods of the parent class in smaller chunks but that can also wait for a consolidation window before a release where we try to move common parts from the subclass into the parent or parts that turned out to be special into subclasses. Another option would be to insert hooks, i.e. empty method calls into the parent class that are only populated in a subclass. For PR #198 In this case then, I would argue to keep the PR open until the next release cycle where we will change the parameter tree and also update the docs. Unfortunately that does not help regarding the "switch hack" in DM and ML, because I can see your point here in that the changes are very small, so another subclass does not make much sense. However, we have a compatibility layer for the engines where we state with which models they work, which one could interpret is that minor alterations to accommodate a model can be ok. In order to make that clear, I would not go introspective through pod.model.xx but instead go from the top and see : if this is model XXX then I can expect to find YYY. If at some point a model goes away, these switches might then be easier to spot as they are declared in an obvious way. If the switch situation becomes untenable, we need to split the engines at some point. However, if the engine gets more parameters, I would suggest a subclass until the next release just like for the model and use code refactoring or hooks. Does that make sense? |
Beta Was this translation helpful? Give feedback.
-
Hey @bjoernenders, I would suggest we use the following, continuous integration (https://en.wikipedia.org/wiki/Continuous_integration), model. This is used for h5py, numpy, scikit etc...
I just can't see that keeping the PR open until the next release helps our developers/users at all, and instead might make a massive merge mess when we get round to the release. I also agree with you about the switches in DM and ML, but I think that new functionality should just go straight, via PR, into master. What do you think? |
Beta Was this translation helpful? Give feedback.
-
@aaron-parsons and @bjoernenders did you continue the discussion offline? I'd like to know what is the conclusion. I think that PRs should be merged as quickly as possible to avoid heavy backlogs and to make new features available to users as quickly as possible. They could be merged on a "release_candidate" or "dev" branch, but given the popularity and success of continuous integration I would not mind going for a "hot" master. |
Beta Was this translation helpful? Give feedback.
-
@pierrethibault We did not continue the discussion yet. I am a bit conflicted about a "hot" master, though I see the benefits of a more rapid merging cycle. However some merges have been held back by me for the reason, that they interfere with users who are used for things to happen a specific way. So whenever someone checks out the repository, I want something to come as download that is API and parameter compatible to the latest release. I know that you can change the default branch on git, but that would stage the PR and commits automatically against that branch. Not a big deal, I could shoot these PRs down in review. So I'm fine have a release branch for stability and a "hot" master as long as we can avoid user confusion. |
Beta Was this translation helpful? Give feedback.
-
How to implement a new feature is a question that comes up quite often in discussions with @aaron-parsons lately.
Subclassing ScanModel and Engine seemed the cleanest by design, but there are a few important hurdles. First, many of ptypy's methods are very long, so a subclass is likely to reproduce a large part of the parent's code just to introduce a few lines somewhere in the middle. This code duplication makes maintenance difficult. The solution would be to break up long methods in smaller chunks. I'd be happy to see that, but I wonder if there is a cost in code readability and performance.
The other extreme is to pack existing ScanModels and Engines with more features. This way there is much less code duplication, but we end up with all sorts of switches everywhere, and an explosion of parameters that most users would not need.
This issue is showing up now with Benedikt's (@daurer) pull request, on which I also worked (#198), which is implemented according to the latter scenario. I understand Bjoern's (@bjoernenders) suggestion of creating instead a new ScanModel, and his reticence with the switches deep in DM and ML, but the alterative, as it stands, is to duplicate a lot of code, and possibly to add complexity. For instance, how to deal with combining features, such as resampling and (upcoming) OPR or multislice? Yet another Engine subclass that supports both?
I'd like to hear your thoughts on this.
Beta Was this translation helpful? Give feedback.
All reactions