Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: optimise log_rotate for modular chassis #15296

Merged

Conversation

auspham
Copy link
Contributor

@auspham auspham commented Oct 31, 2024

Description of PR

Summary:
Fixes # (issue) 29752643

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

Currently log rotate for supervisor takes 1 to 2 minutes with a maximum of 2 minutes on pc/test_lag_2

Since log_rotate is now running on function fixture. With all test case running, this will add up. On recent nightly run, it added up to 2:03:58 hours which slows down the test to 2 hours.

The reason for log rotating is documented in #2161 to save spaces on 7060 devices. This change for T2 device make sure that we only rotate for T2 at module level instead of functions.

This will optimise the time from 2 hours to 2 minutes.

Details of the stats can be seen here for pc/test_lag_2

{
    "analyzer_logrotate_time": {
        "total": "2:03:58.135243",
        "average": "0:01:01.984460",
        "max": "0:02:00.298079",
        "min": "0:00:57.740233",
        "number of runs": 120
    },
    "analyzer_add_marker_time": {
        "total": "0:07:09.586112",
        "average": "0:00:03.579884",
        "max": "0:00:07.686170",
        "min": "0:00:02.248445",
        "number of runs": 120
    },
    "analyze_logs_time": {
        "total": "0:18:48.677592",
        "average": "0:00:11.880817",
        "max": "0:00:17.943104",
        "min": "0:00:06.689129",
        "number of runs": 95
    },
    "total_time": "2:29:56.398947",
    "longest_analyzer_logrotate_time": {
        "line": 8467,
        "time": "0:02:00.298079"
    },
    "longest_analyzer_add_marker_time": {
        "line": 10299,
        "time": "0:00:07.686170"
    },
    "longest_analyze_logs_time": {
        "line": 47906,
        "time": "0:00:17.943104"
    }
}

Break down of analyzer_logrotate_time in details

lc4-1 lc1-1 lc2-1 sup-1
/usr/sbin/logrotate -f /etc/logrotate.conf > /dev/null 2>&1 Start 22:08:46 22:08:46 22:08:46 22:08:47
End 22:09:02 22:09:02 22:09:07 22:10:43
sed -i 's/^#//g' /etc/cron.d/logrotate Start 22:09:02 22:09:02 22:09:07 22:10:43
End 22:09:03 22:09:03 22:09:07 22:10:44
systemctl start logrotate.timer Start 22:09:03 22:09:03 22:09:07 2:10:44
End 22:09:03 22:09:03 22:09:07 22:10:45

Complete everything around 22:10:45

everyone was waiting for sup-1 which goes from 22:08:44 -> 22:10:45 which is around 2 minutes. This is reasonable speed.

  • The rest of the task start around 22:08:44 -> 22:09:03 which is 19 seconds. But we have to wait for supervisor to be done.

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@auspham auspham force-pushed the austinpham/29752643-improve-t2-log-rotate branch from 135597f to f476003 Compare October 31, 2024 05:11
Copy link
Collaborator

@yejianquan yejianquan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yejianquan yejianquan merged commit 4ca9c6d into sonic-net:master Oct 31, 2024
15 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Oct 31, 2024
Description of PR
Summary:
Fixes # (issue) 29752643

Approach
What is the motivation for this PR?
Currently log rotate for supervisor takes 1 to 2 minutes with a maximum of 2 minutes on pc/test_lag_2

Since log_rotate is now running on function fixture. With all test case running, this will add up. On recent nightly run, it added up to 2:03:58 hours which slows down the test to 2 hours.

The reason for log rotating is documented in sonic-net#2161 to save spaces on 7060 devices. This change for T2 device make sure that we only rotate for T2 at module level instead of functions.

This will optimise the time from 2 hours to 2 minutes.

Details of the stats can be seen here for pc/test_lag_2

{
    "analyzer_logrotate_time": {
        "total": "2:03:58.135243",
        "average": "0:01:01.984460",
        "max": "0:02:00.298079",
        "min": "0:00:57.740233",
        "number of runs": 120
    },
    "analyzer_add_marker_time": {
        "total": "0:07:09.586112",
        "average": "0:00:03.579884",
        "max": "0:00:07.686170",
        "min": "0:00:02.248445",
        "number of runs": 120
    },
    "analyze_logs_time": {
        "total": "0:18:48.677592",
        "average": "0:00:11.880817",
        "max": "0:00:17.943104",
        "min": "0:00:06.689129",
        "number of runs": 95
    },
    "total_time": "2:29:56.398947",
    "longest_analyzer_logrotate_time": {
        "line": 8467,
        "time": "0:02:00.298079"
    },
    "longest_analyzer_add_marker_time": {
        "line": 10299,
        "time": "0:00:07.686170"
    },
    "longest_analyze_logs_time": {
        "line": 47906,
        "time": "0:00:17.943104"
    }
}
Break down of analyzer_logrotate_time in details

lc4-1	lc1-1	lc2-1	sup-1
/usr/sbin/logrotate -f /etc/logrotate.conf > /dev/null 2>&1	Start	22:08:46	22:08:46	22:08:46	22:08:47
End	22:09:02	22:09:02	22:09:07	22:10:43
sed -i 's/^#//g' /etc/cron.d/logrotate	Start	22:09:02	22:09:02	22:09:07	22:10:43
End	22:09:03	22:09:03	22:09:07	22:10:44
systemctl start logrotate.timer	Start	22:09:03	22:09:03	22:09:07	2:10:44
End	22:09:03	22:09:03	22:09:07	22:10:45
Complete everything around 22:10:45

everyone was waiting for sup-1 which goes from 22:08:44 -> 22:10:45 which is around 2 minutes. This is reasonable speed.

The rest of the task start around 22:08:44 -> 22:09:03 which is 19 seconds. But we have to wait for supervisor to be done.

co-authorized by: [email protected]
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202405: #15302

mssonicbld pushed a commit that referenced this pull request Oct 31, 2024
Description of PR
Summary:
Fixes # (issue) 29752643

Approach
What is the motivation for this PR?
Currently log rotate for supervisor takes 1 to 2 minutes with a maximum of 2 minutes on pc/test_lag_2

Since log_rotate is now running on function fixture. With all test case running, this will add up. On recent nightly run, it added up to 2:03:58 hours which slows down the test to 2 hours.

The reason for log rotating is documented in #2161 to save spaces on 7060 devices. This change for T2 device make sure that we only rotate for T2 at module level instead of functions.

This will optimise the time from 2 hours to 2 minutes.

Details of the stats can be seen here for pc/test_lag_2

{
    "analyzer_logrotate_time": {
        "total": "2:03:58.135243",
        "average": "0:01:01.984460",
        "max": "0:02:00.298079",
        "min": "0:00:57.740233",
        "number of runs": 120
    },
    "analyzer_add_marker_time": {
        "total": "0:07:09.586112",
        "average": "0:00:03.579884",
        "max": "0:00:07.686170",
        "min": "0:00:02.248445",
        "number of runs": 120
    },
    "analyze_logs_time": {
        "total": "0:18:48.677592",
        "average": "0:00:11.880817",
        "max": "0:00:17.943104",
        "min": "0:00:06.689129",
        "number of runs": 95
    },
    "total_time": "2:29:56.398947",
    "longest_analyzer_logrotate_time": {
        "line": 8467,
        "time": "0:02:00.298079"
    },
    "longest_analyzer_add_marker_time": {
        "line": 10299,
        "time": "0:00:07.686170"
    },
    "longest_analyze_logs_time": {
        "line": 47906,
        "time": "0:00:17.943104"
    }
}
Break down of analyzer_logrotate_time in details

lc4-1	lc1-1	lc2-1	sup-1
/usr/sbin/logrotate -f /etc/logrotate.conf > /dev/null 2>&1	Start	22:08:46	22:08:46	22:08:46	22:08:47
End	22:09:02	22:09:02	22:09:07	22:10:43
sed -i 's/^#//g' /etc/cron.d/logrotate	Start	22:09:02	22:09:02	22:09:07	22:10:43
End	22:09:03	22:09:03	22:09:07	22:10:44
systemctl start logrotate.timer	Start	22:09:03	22:09:03	22:09:07	2:10:44
End	22:09:03	22:09:03	22:09:07	22:10:45
Complete everything around 22:10:45

everyone was waiting for sup-1 which goes from 22:08:44 -> 22:10:45 which is around 2 minutes. This is reasonable speed.

The rest of the task start around 22:08:44 -> 22:09:03 which is 19 seconds. But we have to wait for supervisor to be done.

co-authorized by: [email protected]
sreejithsreekumaran pushed a commit to sreejithsreekumaran/sonic-mgmt that referenced this pull request Nov 15, 2024
Description of PR
Summary:
Fixes # (issue) 29752643

Approach
What is the motivation for this PR?
Currently log rotate for supervisor takes 1 to 2 minutes with a maximum of 2 minutes on pc/test_lag_2

Since log_rotate is now running on function fixture. With all test case running, this will add up. On recent nightly run, it added up to 2:03:58 hours which slows down the test to 2 hours.

The reason for log rotating is documented in sonic-net#2161 to save spaces on 7060 devices. This change for T2 device make sure that we only rotate for T2 at module level instead of functions.

This will optimise the time from 2 hours to 2 minutes.

Details of the stats can be seen here for pc/test_lag_2

{
    "analyzer_logrotate_time": {
        "total": "2:03:58.135243",
        "average": "0:01:01.984460",
        "max": "0:02:00.298079",
        "min": "0:00:57.740233",
        "number of runs": 120
    },
    "analyzer_add_marker_time": {
        "total": "0:07:09.586112",
        "average": "0:00:03.579884",
        "max": "0:00:07.686170",
        "min": "0:00:02.248445",
        "number of runs": 120
    },
    "analyze_logs_time": {
        "total": "0:18:48.677592",
        "average": "0:00:11.880817",
        "max": "0:00:17.943104",
        "min": "0:00:06.689129",
        "number of runs": 95
    },
    "total_time": "2:29:56.398947",
    "longest_analyzer_logrotate_time": {
        "line": 8467,
        "time": "0:02:00.298079"
    },
    "longest_analyzer_add_marker_time": {
        "line": 10299,
        "time": "0:00:07.686170"
    },
    "longest_analyze_logs_time": {
        "line": 47906,
        "time": "0:00:17.943104"
    }
}
Break down of analyzer_logrotate_time in details

lc4-1	lc1-1	lc2-1	sup-1
/usr/sbin/logrotate -f /etc/logrotate.conf > /dev/null 2>&1	Start	22:08:46	22:08:46	22:08:46	22:08:47
End	22:09:02	22:09:02	22:09:07	22:10:43
sed -i 's/^#//g' /etc/cron.d/logrotate	Start	22:09:02	22:09:02	22:09:07	22:10:43
End	22:09:03	22:09:03	22:09:07	22:10:44
systemctl start logrotate.timer	Start	22:09:03	22:09:03	22:09:07	2:10:44
End	22:09:03	22:09:03	22:09:07	22:10:45
Complete everything around 22:10:45

everyone was waiting for sup-1 which goes from 22:08:44 -> 22:10:45 which is around 2 minutes. This is reasonable speed.

The rest of the task start around 22:08:44 -> 22:09:03 which is 19 seconds. But we have to wait for supervisor to be done.

co-authorized by: [email protected]
yutongzhang-microsoft pushed a commit to yutongzhang-microsoft/sonic-mgmt that referenced this pull request Nov 21, 2024
Description of PR
Summary:
Fixes # (issue) 29752643

Approach
What is the motivation for this PR?
Currently log rotate for supervisor takes 1 to 2 minutes with a maximum of 2 minutes on pc/test_lag_2

Since log_rotate is now running on function fixture. With all test case running, this will add up. On recent nightly run, it added up to 2:03:58 hours which slows down the test to 2 hours.

The reason for log rotating is documented in sonic-net#2161 to save spaces on 7060 devices. This change for T2 device make sure that we only rotate for T2 at module level instead of functions.

This will optimise the time from 2 hours to 2 minutes.

Details of the stats can be seen here for pc/test_lag_2

{
    "analyzer_logrotate_time": {
        "total": "2:03:58.135243",
        "average": "0:01:01.984460",
        "max": "0:02:00.298079",
        "min": "0:00:57.740233",
        "number of runs": 120
    },
    "analyzer_add_marker_time": {
        "total": "0:07:09.586112",
        "average": "0:00:03.579884",
        "max": "0:00:07.686170",
        "min": "0:00:02.248445",
        "number of runs": 120
    },
    "analyze_logs_time": {
        "total": "0:18:48.677592",
        "average": "0:00:11.880817",
        "max": "0:00:17.943104",
        "min": "0:00:06.689129",
        "number of runs": 95
    },
    "total_time": "2:29:56.398947",
    "longest_analyzer_logrotate_time": {
        "line": 8467,
        "time": "0:02:00.298079"
    },
    "longest_analyzer_add_marker_time": {
        "line": 10299,
        "time": "0:00:07.686170"
    },
    "longest_analyze_logs_time": {
        "line": 47906,
        "time": "0:00:17.943104"
    }
}
Break down of analyzer_logrotate_time in details

lc4-1	lc1-1	lc2-1	sup-1
/usr/sbin/logrotate -f /etc/logrotate.conf > /dev/null 2>&1	Start	22:08:46	22:08:46	22:08:46	22:08:47
End	22:09:02	22:09:02	22:09:07	22:10:43
sed -i 's/^#//g' /etc/cron.d/logrotate	Start	22:09:02	22:09:02	22:09:07	22:10:43
End	22:09:03	22:09:03	22:09:07	22:10:44
systemctl start logrotate.timer	Start	22:09:03	22:09:03	22:09:07	2:10:44
End	22:09:03	22:09:03	22:09:07	22:10:45
Complete everything around 22:10:45

everyone was waiting for sup-1 which goes from 22:08:44 -> 22:10:45 which is around 2 minutes. This is reasonable speed.

The rest of the task start around 22:08:44 -> 22:09:03 which is 19 seconds. But we have to wait for supervisor to be done.

co-authorized by: [email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants