Modify StepFit metric to be RMSE #72

CliveUnger · 2021-06-22T19:21:14Z

This PR makes a slight modification to how the StepFit algorithm calculates the step function fitness.

I came across this project through these two blogs:

I did not understand the method for calculating the error of the step function fit. The blog calculates:

    val totalSquaredError = before.sumSquaredError() + after.sumSquaredError()
    // how much error we have against a step function
    val stepError = sqrt(totalSquaredError) / (before.size + after.size)

and the code similarly calculates (lse == stepError)

    lse = float32(math.Sqrt(float64(lse))) / float32(len(trace))

The comment in the code mentions that lse should be set to sqrt(lse / len(trace)), which would be the Root Mean Squared Error (RMSE); instead, the code computes sqrt(lse) / len(trace). I am not sure what this quantity represents. One would typically use Mean Squared Error (MSE) or RMSE to measure how well a model fits some data.

Therefore, this PR changes the function to use RMSE as the measure of error of the step function fit.

References:
https://en.wikipedia.org/wiki/Root-mean-square_deviation

google-cla · 2021-06-22T19:21:18Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

ℹ️ Googlers: Go here for more info.

CliveUnger · 2021-06-22T19:24:18Z

@googlebot I signed it!

google-cla · 2021-06-22T19:24:21Z

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

CliveUnger · 2021-06-22T19:33:13Z

@googlebot I fixed it.

skia-codereview-bot · 2021-06-23T18:06:14Z

This PR (HEAD: 734ebcc) has been imported to Gerrit for code review.

Please visit review.skia.org/420563 to see it. Please CC yourself to the Gerrit change.

Note:

Skia uses only Gerrit for reviews and submitting code (doc).
All comments are handled within Gerrit. Any comments on the GitHub PR will be ignored.
The PR author can continue to upload commits to the branch used by the PR in order to address feedback from Gerrit.
Once the code is ready to be merged, a maintainer will submit the change on Gerrit and skia-codereview-bot will close this PR.
Similarly, if a change is abandoned on Gerrit, the corresponding PR will be closed with a note.

jcgregorio · 2021-06-23T18:32:13Z

Copying my comment over from https://skia-review.googlesource.com/c/buildbot/+/420563:

Thanks, as you found in the comments, this is a known issue.

The problem is that 'fixing' it would require changing
all the thresholds of all the alerts in all the installed
instances of Perf. Instead I'd like to keep the current calculation
as is.

A preferred fix would be to add a new stepDetection type that
does the fixed calculation, allowing current users to opt in
to switching to the new calculation.

This reverts commit 734ebcc.

CliveUnger · 2021-06-25T00:54:58Z

I see, that makes sense. I made a change to add a new stepDetection type which uses RMSE in the regression calculation.

Out of curiosity, was the original equation simply a mistake, or does the square root of the sum of squared errors divided by the sample size, sqrt(sse)/n, actually represent some statistical quantity? The closest thing I could find is the standard error, se = stddev / sqrt(n). The code is technically not calculating the standard deviation, but it's close; the difference is in which deviations get squared. The sum of squared errors (SSE) is sse = sum((actual - predicted)**2), while the standard deviation is sd = sqrt(sum((actual - mean)**2) / n). Since StepFit uses two different means to estimate a step function, the two quantities are similar.

In fact, if you substitute SSE for sum((actual - mean)**2) in the standard deviation calculation and divide by the square root of the sample size, sqrt(n), to find the standard error, you get sqrt(sse / n) / sqrt(n). That equation simplifies to sqrt(sse)/n which is what the code does! So it may be correct to say the OriginalStep uses the standard error?
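
Written out, the substitution looks like this. It is only the algebra, under the assumption that SSE may stand in for the sum of squared deviations from the mean; it does not show that the standard error is the right interpretation.

```latex
\begin{align*}
\mathrm{se} &= \frac{s}{\sqrt{n}}, \qquad s = \sqrt{\frac{\mathrm{SSE}}{n}} \\[4pt]
\mathrm{se} &= \frac{\sqrt{\mathrm{SSE}/n}}{\sqrt{n}}
             = \frac{\sqrt{\mathrm{SSE}}}{\sqrt{n}\,\sqrt{n}}
             = \frac{\sqrt{\mathrm{SSE}}}{n}
             = \frac{\mathrm{RMSE}}{\sqrt{n}}
\end{align*}
```

The last equality shows that the existing value is the RMSE scaled down by sqrt(n).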

I hope that makes sense (and that I didn't make any errors). I may have made a logical jump or applied statistical quantities where their assumptions do not hold up. Also, writing math as text is hard!

jcgregorio · 2021-06-30T12:00:33Z

Thanks, let's continue the review over in Gerrit: https://skia-review.googlesource.com/c/buildbot/+/420563

google-cla bot added the cla: no A signed CLA is not on file. label Jun 22, 2021

Modify StepFit metric to be RMSE

734ebcc

CliveUnger force-pushed the main branch from 99d19f3 to 734ebcc Compare June 22, 2021 19:32

google-cla bot added cla: yes A signed CLA is on file. and removed cla: no A signed CLA is not on file. labels Jun 22, 2021

CliveUnger added 2 commits June 24, 2021 18:43

Revert "Modify StepFit metric to be RMSE"

95e8bec

This reverts commit 734ebcc.

Add "RatioStep" detection based on OriginalStep

eb77046
