Combine batches of successive roles for same nodes (#1345)
base: master
Conversation
batches << [roles, nodes_in_batch] unless nodes_in_batch.empty?
if pnodes_in_batch == nodes_in_batch && !proles.nil?
  # proles is same as batches[-1][0]
  proles << roles
Layout/IndentationWidth: Use 2 (not 4) spaces for indentation. (https://github.com/bbatsov/ruby-style-guide#spaces-indentation)
058b6e4 to ae3a82f (compare)
In principle I think this sounds like a great idea, so thanks a lot for working on this!
As you can see from my comments, there are some serious code legibility issues here. The vast majority of these issues existed with #apply_role for many years before you joined SUSE ;-) However it's super-important that any changes to it should remove technical debt, not add new technical debt, otherwise we just continue the trend of this method getting worse and worse over the years.
In particular, #apply_role has quite clearly delimited blocks of logic, so it lends itself very well to extraction of a few new methods (one commit at a time!). This would make it far easier to see the inputs, outputs, and actions of each chunk, and thus the overall flow of logic. If you could do this before enhancing it then you'd probably get a +1000 from me :-)
@@ -1071,6 +1071,8 @@ def apply_role(role, inst, in_queue, bootstrap = false)

# element_order is an Array where each item represents a batch of roles and
# the batches must be applied sequentially in this order.
pnodes_in_batch = nil
Please use easily understandable variable names. It is not clear from the names what these two variables are for. I'm guessing p means "previously added to batches" rather than "previously seen in element_order", is that right? IIUC the distinction is subtle but crucial to how you've written this code. If so, call them something like nodes_in_prev_batch and roles_for_prev_batch.
Also, by inserting these assignments here, you are divorcing the comment explaining element_order from its usage, so please don't do that.
@@ -1169,7 +1171,14 @@ def apply_role(role, inst, in_queue, bootstrap = false)
    end
  end # roles.each

  batches << [roles, nodes_in_batch] unless nodes_in_batch.empty?
  if pnodes_in_batch == nodes_in_batch && !proles.nil?
The #apply_role method is already WAY too long, and hopelessly unreadable as a result, so I don't want even 8 new lines to be added. Please can you find a way to make this change which makes the method smaller not larger, by extracting out 1 or more blocks into new methods, one commit at a time?
Also, what if nodes_in_batch is a subset of pnodes_in_batch? Couldn't that also be a situation where the batches could be merged?
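The subset case could be sketched like this (a hypothetical mergeable_into_prev? helper, not code from this PR), assuming node lists are plain arrays of name strings:

```ruby
require "set"

# Hypothetical helper illustrating the subset idea: a batch could be folded
# into the previous one not only when the node sets are identical, but also
# when every node in the current batch already appeared in the previous one.
def mergeable_into_prev?(nodes_in_prev_batch, nodes_in_batch)
  return false if nodes_in_prev_batch.nil?
  Set.new(nodes_in_batch).subset?(Set.new(nodes_in_prev_batch))
end
```

Whether such a subset merge is actually safe still depends on run-list ordering, which is the concern raised further down the thread.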
element_order = [
[b1]
[c1,c2]
]
batch 1: node1=[b1] node2=[b1]
batch 2: node1=[c1]
I think this could break if c1 needs b1 to be entirely functional.
this could break, if c1 needs b1 to be entirely functional
Not necessarily, but that demonstrates my point 2) below - it depends on whether all the recipes for b1 were ordered to happen before the recipes for c1. For example:
element_order = [
[b1]
[c1,c2]
]
batch 1: node1=[b1] node2=[b1]
batch 2: node1=[c1] node2=[c1]
This PR would change that to:
batch 1: node1=[b1,c1] node2=[b1,c1]
If some of c1's recipes come before b1's in the runlist then there's a problem. Normally runlist ordering is determined for all roles within a barclamp by chef_order so this should not be an issue (assuming barclamps are deployed in an order matching chef_order?), but this can be overridden per role by element_run_list_order. The only place we actually use that is with pacemaker remotes, but even there I'm not sure we actually use it - maybe the code only has remnants of a previous usage which we reverted when @vuntz came up with a different approach? I can't remember but @vuntz will know for sure.
If some of c1's recipes come before b1's in the runlist then there's a problem.
Yes, this is correct.
I did look at element_run_list_order in nova, ceilometer, swift and other barclamps. I didn't find the issue you mentioned, as each one of them had explicit priorities.
Yeah, I think I must have done the grep wrong last time because I see them now.
batches << [roles, nodes_in_batch] unless nodes_in_batch.empty?
if pnodes_in_batch == nodes_in_batch && !proles.nil?
  # proles is same as batches[-1][0]
  proles << roles
I'm really struggling to understand this because #apply_role is such horrible spaghetti code and I don't look at it often enough to remember all the details (I guess @vuntz is the only one who does - which is maybe why he fails to see the need for it to be refactored, but that's another discussion I'll continue with him separately ;-) ...
But IIUC it seems like this would result in say, ceilometer-agent being appended to proles so that it runs in the same batch as ceilometer-central, and moving the line which appends to batches into the else clause means that no new batch is added during this iteration of the element_order.each loop. Is that right? If so, I think it deserves more comments explaining what's going on, because it just took me about 20 minutes to understand these few lines.
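If that reading is right, the loop being described might look roughly like the following simplified sketch (using the suggested nodes_in_prev_batch/roles_for_prev_batch names, not the PR's actual variables):

```ruby
# Simplified sketch of the merging loop as understood above. Each element of
# element_order is treated here as [roles, nodes]; when the current batch
# targets exactly the same nodes as the previous one, its roles are appended
# to the previous batch instead of opening a new batch.
element_order = [
  [["ceilometer-central"], ["node1"]],
  [["ceilometer-agent"], ["node1"]],
  [["swift-storage"], ["node2"]]
]

batches = []
roles_for_prev_batch = nil
nodes_in_prev_batch = nil

element_order.each do |roles, nodes_in_batch|
  next if nodes_in_batch.empty?
  if nodes_in_batch == nodes_in_prev_batch && !roles_for_prev_batch.nil?
    # roles_for_prev_batch is the same object as batches[-1][0], so this
    # extends the previous batch in place and no new batch is added
    roles_for_prev_batch.concat(roles)
  else
    roles_for_prev_batch = roles.dup
    batches << [roles_for_prev_batch, nodes_in_batch]
  end
  nodes_in_prev_batch = nodes_in_batch
end
```

With this input the first two entries collapse into a single batch for node1, so two chef-client runs are scheduled instead of three.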
Exactly.
Will add comments.
  # proles is same as batches[-1][0]
  proles << roles
else
  proles = roles
OK, so these two lines are only in the else block because that allows merging of even more than two batches into one?
yes
Two more thoughts:
And another:
Sorry to barge in here but:
Totally agree here after having to deal with apply_role, and extracting is exactly what I was thinking of doing shortly. Plus the test coverage for it is minimal. @sjamgade I understand this falls outside of the scope of the card, but let me know if you plan on doing anything to it.
@aspiers gating passed.
I'm mixed here: this is probably fine at the moment, but this would conflict with a potential change where we restrict what chef is doing on apply (which is something we were slowly heading towards). So if anything, we should have a big comment explaining the shortcut and that this is not compatible with the minimal chef-client approach on apply.
Yeah, so I now think this is probably OK (modulo the nitpick), but I agree with @vuntz that it could conflict with the future direction. However if we ever get to that, we can make sure to properly refactor #apply_role (thanks @Itxaka, great to hear you agree with me on this!) and then it should be relatively easy to adapt the logic accordingly.
@@ -1069,6 +1069,8 @@ def apply_role(role, inst, in_queue, bootstrap = false)
proposal = Proposal.where(barclamp: @bc_name, name: inst).first
save_proposal = false

nodes_previousbatch = nil
Nitpick: please be consistent in your variable naming style. "previous batch" is two words, not one. Since the existing style is to put _ between words, put it between all words ("nodes_previous_batch"). Also, since maximising legibility is an important goal, I prefer names which are as close as possible to natural English language, which is why I suggested nodes_in_prev_batch and roles_for_prev_batch. Admittedly that's more of a personal preference, but still worth considering.
YES that's exactly what I was thinking of!
It would make me sooo happy to see this refactoring done :-)
end
Layout/TrailingWhitespace: Trailing whitespace detected. (https://github.com/bbatsov/ruby-style-guide#no-trailing-whitespace)
# role (which would be an improvement over the requirement to
# explicitly list all nodes).
new_nodes.reject! do |node_name|
  not pre_cached_nodes[node_name].nil? && Rails.logger.debug("skipping deleted node #{node_name}")
Style/Not: Use ! instead of not. (https://github.com/bbatsov/ruby-style-guide#bang-not-not)
Metrics/LineLength: Line is too long. [106/100] (https://github.com/SUSE/style-guides/blob/master/Ruby.md#metricslinelength)
# have some alias that be used to assign all existing nodes to a
# role (which would be an improvement over the requirement to
# explicitly list all nodes).
new_nodes.reject! do |node_name|
Style/InverseMethods: Use select! instead of inverting reject!.
# Don't add deleted nodes to the run order, they clearly won't have
# the old role
old_nodes.reject! do |node_name|
  not pre_cached_nodes[node_name].nil? && Rails.logger.debug("skipping deleted node #{node_name}")
Style/Not: Use ! instead of not. (https://github.com/bbatsov/ruby-style-guide#bang-not-not)
Metrics/LineLength: Line is too long. [106/100] (https://github.com/SUSE/style-guides/blob/master/Ruby.md#metricslinelength)
end
# Don't add deleted nodes to the run order, they clearly won't have
# the old role
old_nodes.reject! do |node_name|
Style/InverseMethods: Use select! instead of inverting reject!.
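A RuboCop-clean version of the quoted lines might look like the following sketch, with a stand-in logger so it runs standalone; keeping the logging out of the boolean expression also avoids the low-precedence pitfall of "not":

```ruby
require "logger"

# Stand-in for Rails.logger so this sketch is self-contained.
LOGGER = Logger.new($stdout)

pre_cached_nodes = { "node1" => { name: "node1" }, "node2" => nil }
old_nodes = ["node1", "node2"]

# Reject nodes whose cache entry is nil (i.e. deleted nodes), logging each
# skip as a side effect rather than inside the boolean expression.
old_nodes.reject! do |node_name|
  deleted = pre_cached_nodes[node_name].nil?
  LOGGER.debug("skipping deleted node #{node_name}") if deleted
  deleted
end
```

After this, old_nodes contains only node1; the same shape works for new_nodes.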
  end
  merged_batches << current_batch
end
return merged_batches
Style/RedundantReturn: Redundant return detected. (https://github.com/bbatsov/ruby-style-guide#no-explicit-return)
@@ -919,6 +919,35 @@ def self.proposal_to_role(proposal, bc_name)
  RoleObject.new role
end

# we can speed-up the application of (n+1)th role if both(n,n+1)
# roles are being applied on the same node.
#
Layout/TrailingWhitespace: Trailing whitespace detected. (https://github.com/bbatsov/ruby-style-guide#no-trailing-whitespace)
@@ -919,6 +919,35 @@ def self.proposal_to_role(proposal, bc_name)
  RoleObject.new role
end

# we can speed-up the application of (n+1)th role if both(n,n+1)
# roles are being applied on the same node.
Layout/TrailingWhitespace: Trailing whitespace detected. (https://github.com/bbatsov/ruby-style-guide#no-trailing-whitespace)
# a batch is [roles, nodes]
def mergebatches(batches)
  merged_batches = []
  unless bathces.empty?
batches typo. If you are pushing code before testing it, please clearly label and mark the PR as WIP, otherwise there is a risk broken code will get merged.
# merging batches together
#
# a batch is [roles, nodes]
def mergebatches(batches)
Yes, much better to have this as a separate method!
But as mentioned before, please stay consistent with the existing style, so this should be merge_batches.
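As a rough illustration, an extracted merge_batches might look like this sketch, under the assumption stated in the comment that a batch is [roles, nodes] (this is not the PR's exact implementation):

```ruby
# Sketch of a standalone merge method: adjacent batches that target exactly
# the same node list are collapsed into one batch whose role lists are
# concatenated in order.
def merge_batches(batches)
  merged_batches = []
  batches.each do |roles, nodes|
    if !merged_batches.empty? && merged_batches[-1][1] == nodes
      # same nodes as the previous batch: extend it instead of adding one
      merged_batches[-1][0].concat(roles)
    else
      merged_batches << [roles.dup, nodes]
    end
  end
  merged_batches
end

merged = merge_batches([[["b1"], ["node1", "node2"]],
                        [["c1"], ["node1", "node2"]],
                        [["c2"], ["node3"]]])
```

Here the b1 and c1 batches share the same nodes, so they collapse into one batch while the c2 batch stays separate.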
#
# eg. In our 2node deployment ceilometer{server,central} are always
# applied on the same node, given that they have different priorities,
# they are to be applied, one after the other.
It would be helpful to explicitly mention element_run_list_order here and explain its effect on the run-list, otherwise the reader will be wondering why/how they end up with different priorities.
@@ -919,6 +919,35 @@ def self.proposal_to_role(proposal, bc_name)
  RoleObject.new role
end

# we can speed-up the application of (n+1)th role if both(n,n+1)
Thanks for this comment block - definitely important to have it since this batch merging is not immediately obvious without explanation!
# is run rather than speeding up execution of any single run, by
# merging batches together
#
# a batch is [roles, nodes]
Thanks - I find comments like this really helpful for understanding the data structures being manipulated.
We can speed up the application of the (n+1)th role if both (n, n+1) roles are being applied on the same node. This speeds up deployment of ceilometer by at least 1m20s (measured: 90 sec) and swift by ~20s. E.g. in our 2-node deployment, ceilometer-{server,central} are always applied on the same node; given that they have different priorities, they are to be applied one after the other. This does not violate any order constraints, as the application procedure of the (n+1)th role is transparent to the nth role.
Please do review the latest changes.
# eg. In our 2node deployment ceilometer{server,central} are always
# applied on the same node, given that they have different priorities,
# they are to be applied, one after the other, these priorities come
# from element_run_list_order
Two nitpicks here:
- The priorities come from chef_order by default but can be overridden per role by element_run_list_order
- ceilometer-{server,central} is just one example, but this code will merge any adjacent batches regardless of what run_list priority the roles are in. Is it possible to make the code automatically check that the run_list priority of batch n is lower than the priority of batch n+1, and only merge if that is true? If that was added then I think this would feel safe enough to me for a +1 vote.
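The automatic check being asked for might be sketched like this; representing run-list priorities as a plain hash from role name to priority is an assumption for illustration, since the real values would come from chef_order or element_run_list_order:

```ruby
# Hypothetical guard: only merge two adjacent batches when every role in the
# earlier batch has a priority no higher than every role in the later batch,
# so the merged run-list preserves the original ordering.
def safe_to_merge?(prev_roles, roles, priorities)
  prev_max = prev_roles.map { |role| priorities.fetch(role) }.max
  next_min = roles.map { |role| priorities.fetch(role) }.min
  prev_max <= next_min
end

priorities = { "b1" => 10, "c1" => 20 }
safe_to_merge?(["b1"], ["c1"], priorities)  # earlier batch has lower priority
```

A merge loop could call such a guard in addition to comparing node lists, and fall back to keeping the batches separate when the priority order would be violated.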
we can speed up the application of the (n+1)th role if both (n, n+1)
roles are being applied on the same node. This speeds up deployment of
ceilometer by at least 1m20s (measured: 90 sec) and swift by ~20s.
e.g. in our 2-node deployment, ceilometer-{server,central} are always
applied on the same node; given that they have different priorities, they are
to be applied one after the other.
This does not violate any order constraints, as the application procedure
of the (n+1)th role is transparent to the nth role
Why is this change necessary?
Speed up the chef-client execution