Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify StepFit metric to be RMSE #72

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

CliveUnger
Copy link

This PR makes a slight modification to how the StepFit algorithm calculates the step function fitness.

I came across this project through these two blogs:

I did not understand the method for calculating the error of the step function fit. The blog calculates

    val totalSquaredError = before.sumSquaredError() + after.sumSquaredError()
    // how much error we have against a step function
    val stepError = sqrt(totalSquaredError) / (before.size + after.size)

and the code similarly calculates (lse == stepError)

lse = float32(math.Sqrt(float64(lse))) / float32(len(trace)) 

The comment in the code mentions that lse should be set to sqrt(lse / len(trace)), which would be the Root Mean Squared Error,(RMSE) instead it computes sqrt(lse) / len(trace). I am not sure what this quantity represents. One would typically use Mean Square Error or RSME to calculate how well a model fits some data.

Therefore, this PR changes the function to use RMSE as the measure of error of the step function fit.

References:
https://en.wikipedia.org/wiki/Root-mean-square_deviation

@google-cla
Copy link

google-cla bot commented Jun 22, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added the cla: no A signed CLA is not on file. label Jun 22, 2021
@CliveUnger
Copy link
Author

@googlebot I signed it!

@google-cla
Copy link

google-cla bot commented Jun 22, 2021

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added cla: yes A signed CLA is on file. and removed cla: no A signed CLA is not on file. labels Jun 22, 2021
@CliveUnger
Copy link
Author

@googlebot I fixed it.

@skia-codereview-bot
Copy link
Collaborator

This PR (HEAD: 734ebcc) has been imported to Gerrit for code review.

Please visit review.skia.org/420563 to see it. Please CC yourself to the Gerrit change.

Note:

  • Skia uses only Gerrit for reviews and submitting code (doc).
  • All comments are handled within Gerrit. Any comments on the GitHub PR will be ignored.
  • The PR author can continue to upload commits to the branch used by the PR in order to address feedback from Gerrit.
  • Once the code is ready to be merged, a maintainer will submit the change on Gerrit and skia-codereview-bot will close this PR.
  • Similarly, if a change is abandoned on Gerrit, the corresponding PR will be closed with a note.

@jcgregorio
Copy link
Contributor

Copying my comment over from https://skia-review.googlesource.com/c/buildbot/+/420563:

Thanks, as you found in the comments, this is a known issue.

The problem is that 'fixing' it would require changing
all the threshholds of all the alerts in all the installed
instances of Perf. Instead I'd like to keep the current calculation
as is.

A preferred fix would be to add a new stepDetection type that
does the fixed calculation, allowing current users to opt it
to switching to the new calculation.

@CliveUnger
Copy link
Author

CliveUnger commented Jun 25, 2021

I see, that makes sense. I made a change to add a new stepDetection type which uses RMSE in the regression calculation.

Out of curiosity, was the original equation simply a mistake, or does the square root of the sum of squared errors divided by the sample size, sqrt(sse)/n, actually represent some statistical quantity? The closest thing I could find is Standard Error which is se = stddev / sqrt(n). The code is technically not calculating the standard deviation, but it's close. The difference being the part that is squared. Sum of squared errors(SSE) is sse = sum((actual - predicted)**2), while standard deviation is sd = sqrt(sum((actual - mean)**2) / n). Since the StepFit uses two different means to estimate a step function, the two quantities are similar.

In fact, if you substitute SSE for sum((actual - mean)**2) in the standard deviation calculation and divide by the square root of the sample size, sqrt(n), to find the standard error, you get sqrt(sse / n) / sqrt(n). That equation simplifies to sqrt(sse)/n which is what the code does! So it may be correct to say the OriginalStep uses the standard error?

I hope that makes sense (and I didn't make any errors). I may have made a logical jump or applied statistical quantities where there assumptions do not hold up. Also writing math as text is hard!

@jcgregorio
Copy link
Contributor

Thanks, let's continue the review over in Gerrit: https://skia-review.googlesource.com/c/buildbot/+/420563

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes A signed CLA is on file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants