Default implementation of under/over speed checks #382
@spilkey-cisco, can you please add the test file needed to cover your newly added functions? This is needed in order to pass the coverage requirement.
Are you referring to sonic-mgmt? PR is here: sonic-net/sonic-mgmt#8587
No, I am referring to the pull request coverage verification for this PR, which you can see is failing (Pull Request Coverage). It appears that you have not added any new tests to cover the new code you added.
@spilkey-cisco please provide more context.
@gechiang sure, I'll update the PR ASAP. I didn't see an existing test file for fans, so I was unsure of the requirements. I'll create a file to cover the new fan APIs.
    def is_under_speed(self):
        """
        Calculates if the fan speed is under the tolerated low speed threshold

        Default calculation requires get_speed_tolerance to be implemented, and checks
        if the current fan speed (expressed as a percentage) is lower than <get_speed_tolerance>
        percent below the target fan speed (expressed as a percentage)

        Returns:
            A boolean, True if fan speed is under the low threshold, False if not
        """
        speed = self.get_speed()
        target_speed = self.get_target_speed()
        tolerance = self.get_speed_tolerance()

        for param, value in (('speed', speed), ('target speed', target_speed), ('speed tolerance', tolerance)):
            if not isinstance(value, int):
                raise TypeError(f'Fan {param} is not an integer value: {param}={value}')
            if value < 0 or value > 100:
                raise ValueError(f'Fan {param} is not a valid percentage value: {param}={value}')

        return speed * 100 < target_speed * (100 - tolerance)

    def is_over_speed(self):
        """
        Calculates if the fan speed is over the tolerated high speed threshold

        Default calculation requires get_speed_tolerance to be implemented, and checks
        if the current fan speed (expressed as a percentage) is higher than <get_speed_tolerance>
        percent above the target fan speed (expressed as a percentage)

        Returns:
            A boolean, True if fan speed is over the high threshold, False if not
        """
        speed = self.get_speed()
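For readers who want to try the comparison outside the class, here is a minimal standalone sketch. The under-speed check follows the hunk above. The hunk is cut off before the body of `is_over_speed`, so the over-speed comparison below is only an assumed mirror of its docstring, not the PR's confirmed code. The free functions and the helper `_check_percentages` are hypothetical names used for illustration.

```python
def _check_percentages(speed, target_speed, tolerance):
    # Hypothetical helper: same validation as the loop in is_under_speed above.
    for param, value in (('speed', speed), ('target speed', target_speed),
                         ('speed tolerance', tolerance)):
        if not isinstance(value, int):
            raise TypeError(f'Fan {param} is not an integer value: {param}={value}')
        if value < 0 or value > 100:
            raise ValueError(f'Fan {param} is not a valid percentage value: {param}={value}')


def is_under_speed(speed, target_speed, tolerance):
    # True if speed is more than <tolerance> percent below target_speed.
    # Both sides are scaled by 100 so the comparison stays in integers.
    _check_percentages(speed, target_speed, tolerance)
    return speed * 100 < target_speed * (100 - tolerance)


def is_over_speed(speed, target_speed, tolerance):
    # ASSUMPTION: mirror of the under-speed check, inferred from the docstring.
    _check_percentages(speed, target_speed, tolerance)
    return speed * 100 > target_speed * (100 + tolerance)


# Target 50% with 10% tolerance gives an allowed band of 45% to 55% inclusive.
under_results = [is_under_speed(44, 50, 10), is_under_speed(45, 50, 10)]
over_results = [is_over_speed(56, 50, 10), is_over_speed(55, 50, 10)]
```

Under these assumptions, a reading exactly on the band edge (45 or 55 here) is treated as within tolerance, because both comparisons are strict.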
@spilkey-cisco This is an abstract class. Where are the HLD changes?
My understanding is that https://github.com/sonic-net/sonic-platform-common/blob/master/sonic_fan/fan_base.py is the (obsolete, from what I can tell) fan abstract base class, and the file in this changeset follows the design from https://github.com/sonic-net/SONiC/blob/master/doc/platform_api/new_platform_api.md indicating ABCs are no longer used, 'No longer abstract base classes: All abstract methods simply raise "NotImplementedError"'. The new methods are not abstract since they have a default implementation, but can be overridden with vendor specific implementations if desired. If an HLD is needed, please advise on where it should go.
Please correct me if this understanding is inaccurate or incomplete.
The tolerance calculations that thermalctld performed (now the defaults in the base file) were not sufficient for Cisco's fan tolerance requirements. Because the default tolerance checks had to operate on fan speed expressed as a percentage (1-100), low fan speeds had a large margin of error, and the checks frequently reported that the fan speed was outside the allowable tolerance when it was not. Allowing vendors to supply their own tolerance calculations resolves this issue (for example, by performing the calculations on larger rpm values instead of percentages), and the default implementation here allows existing vendor code to continue to work with no modifications.
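To make the low-speed problem concrete, here is a rough sketch of how a vendor override working on rpm could differ from the percentage-based default. Everything in it is hypothetical: the class names `PercentFan` and `RpmFan`, the `MAX_RPM` value, and the assumption that percentages are obtained by integer truncation are illustration only and are not taken from any vendor's code.

```python
class PercentFan:
    """Hypothetical fan that exposes speed as a truncated integer percentage."""

    MAX_RPM = 20000  # ASSUMPTION: full-scale speed chosen for this illustration

    def __init__(self, rpm, target_rpm, tolerance):
        self.rpm = rpm
        self.target_rpm = target_rpm
        self.tolerance = tolerance  # percent

    def get_speed(self):
        # ASSUMPTION: percentage is truncated to an integer
        return self.rpm * 100 // self.MAX_RPM

    def get_target_speed(self):
        return self.target_rpm * 100 // self.MAX_RPM

    def get_speed_tolerance(self):
        return self.tolerance

    def is_under_speed(self):
        # Same comparison as the default implementation in this PR
        return self.get_speed() * 100 < self.get_target_speed() * (100 - self.get_speed_tolerance())


class RpmFan(PercentFan):
    """Hypothetical vendor override that compares raw rpm values instead."""

    def is_under_speed(self):
        return self.rpm * 100 < self.target_rpm * (100 - self.tolerance)


# 1950 rpm against a 2000 rpm target is 2.5% low, inside a 5% tolerance.
default_fan = PercentFan(rpm=1950, target_rpm=2000, tolerance=5)
vendor_fan = RpmFan(rpm=1950, target_rpm=2000, tolerance=5)
default_flagged = default_fan.is_under_speed()  # 9% vs 10% target: flagged
vendor_flagged = vendor_fan.is_under_speed()    # rpm comparison: not flagged
```

In this sketch the percentage view reduces 1950 rpm to 9% and the target to 10%, so a single percentage step already exceeds the 5% tolerance band and the default check reports under speed, while the rpm comparison does not.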
I guess there is a PR for sonic-platform-daemons to use these new APIs, where can I find it?
It looks ok to me
Added
MSFT ADO: 24645343
@StormLiangMS, @yxieca, MSFT ADO: 24645343 created. Please approve to cherry pick this to 202205, 202211, and 202305.
@gechiang, @spilkey-cisco, this change is a feature change and impacts all platforms. I don't think we should cherry-pick this change to feature branches.
@yxieca, this is not really a feature change. It is rather a bug fix: the original design is insufficient for vendors where the old way of specifying the tolerance can cause quite a bit of inaccuracy, and this PR fixes that issue. We originally found this issue in the 202205 branch and asked the vendor to fix it.
* Default implementation of under/over speed checks * Update comment, remove requirement for float conversion * Fan test coverage
@yxieca, @StormLiangMS, I would appreciate it if this PR could also be approved for 202211 and 202305 so we can mark the ADO as done.
Description
Provide default implementation of fan under and over speed threshold checks, providing backwards compatibility for vendors that only implement get_speed_tolerance
Motivation and Context
Fan under/over speed checks should be vendor customizable, since a tolerance based on the PWM/percentage fan speed can easily give false failures, especially at low fan speeds.
How Has This Been Tested?