Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use vendor customizable fan speed threshold checks #1

Closed
wants to merge 1 commit into from

Conversation

spilkey-cisco
Copy link
Owner

Description

Use vendor customizable fan speed threshold checks in thermalctld instead of hard-coded calculations based on pwm/percentage speed.

Motivation and Context

Fan under/over speed checks should be vendor customizable, since a tolerance based off the pwm/percentage fan speed can easily give false failures, especially for low fan speeds.

How Has This Been Tested?

root@sonic:/home/cisco# echo 10000 > /opt/cisco/etc/fantray0.fan0.rpm
root@sonic:/home/cisco# grep thermalctld /var/log/syslog
<snip>
May 19 05:09:47.763970 sonic WARNING pmon#thermalctld: Fan high speed warning: fantray0.fan0 current speed=91, target speed=20
May 19 05:09:51.129298 sonic INFO pmon#supervisord 2023-05-19 05:09:51,128 INFO success: thermalctld entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
May 19 05:10:01.347935 sonic INFO pmon#supervisord: thermalctld WARNING:cisco.pacific.thermal.thermal_zone:level minor: fantray0.fan0: pwm 20; motor out of tolerance @ rpm 10000; maximum rpm 2950
root@sonic:/home/cisco# echo 2400 > /opt/cisco/etc/fantray0.fan0.rpm
root@sonic:/home/cisco# grep thermalctld /var/log/syslog
<snip>
May 19 05:09:47.763970 sonic WARNING pmon#thermalctld: Fan high speed warning: fantray0.fan0 current speed=91, target speed=20
May 19 05:09:51.129298 sonic INFO pmon#supervisord 2023-05-19 05:09:51,128 INFO success: thermalctld entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
May 19 05:10:01.347935 sonic INFO pmon#supervisord: thermalctld WARNING:cisco.pacific.thermal.thermal_zone:level minor: fantray0.fan0: pwm 20; motor out of tolerance @ rpm 10000; maximum rpm 2950
May 19 05:10:47.023156 sonic NOTICE pmon#thermalctld: Fan high speed warning cleared: fantray0.fan0 speed is back to normal

@spilkey-cisco spilkey-cisco self-assigned this May 23, 2023
Copy link

@amulyan7 amulyan7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

speed_target = try_get(fan.get_target_speed)
is_under_speed = try_get(fan.is_under_speed)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to check if this function is not implemented and fall back to old code than putting an implementation in the base class (which one would not understand unless he knew about this change) esp because the changes span across repos.

Copy link
Owner Author

@spilkey-cisco spilkey-cisco Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a default implementation in the base class allows for the same calculation to be used across other code, such as sonic-mgmt tests. If we fall back to a hard-coded calculation inside thermalctld, then the calculation must also be hard-coded anywhere else fan under/over speed conditions should be checked. Existing sonic-mgmt speed tolerance checks use different (and incorrect) calculation logic, so having a default API implementation solves this issue (https://wwwin-github.cisco.com/spilkey/sonic-test/pull/1)

@spilkey-cisco
Copy link
Owner Author

Closing to open new PR to merge into sonic-net/sonic-platform-daemons

@spilkey-cisco
Copy link
Owner Author

sonic-net#378

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants