Switch to Position-Based Percentile Calculation for Regulatory Compliance | timeAverage #396

Open
wants to merge 11 commits into
base: master


marcelooyaneder

Description:

This pull request updates the percentile calculation so that it complies with regulatory requirements. Previously, percentiles were calculated using interpolation, which can produce intermediate values that lie between observations and do not occur in the dataset. Interpolated values are not permitted in regulatory calculations.

With this update:

  • Percentiles are calculated by position in the sorted dataset.
  • No intermediate values are generated, ensuring the calculations adhere to the regulatory guidelines.

This change ensures that every reported percentile is a value actually observed in the data, so the calculations align with the standards required for normative use.
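To make the difference concrete, here is a minimal Python sketch (not the package's R code). The function names, the sample data, and the rank rule used in `percentile_by_position` (ceiling of `q * n`) are assumptions chosen for illustration only. The rounding rule actually required depends on the regulation being applied.

```python
import math


def percentile_interpolated(values, q):
    """Linear interpolation between the two closest order statistics."""
    xs = sorted(values)
    h = (len(xs) - 1) * q          # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def percentile_by_position(values, q):
    """Return the observed value at 1-based rank k.

    Assumption: k = ceil(q * n); other rounding rules are possible.
    """
    xs = sorted(values)
    k = max(1, math.ceil(q * len(xs)))
    return xs[k - 1]


# Made-up concentrations, for illustration only.
daily_pm10 = [10.0, 20.0, 30.0, 40.0, 100.0]

interp = percentile_interpolated(daily_pm10, 0.9)
by_pos = percentile_by_position(daily_pm10, 0.9)

print(interp, interp in daily_pm10)   # value lies between 40 and 100
print(by_pos, by_pos in daily_pm10)   # value is one of the observations
```

The interpolated result falls between two observations and is not itself a measured value, whereas the position-based result is always an element of the dataset.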

Key Changes:

  1. Switched from interpolation-based percentile calculation to position-based calculation.
  2. Removed any logic that generates intermediate values.

Please review and let me know if any additional adjustments are needed.

I opened this pull request against the master branch because I didn't see a more appropriate branch. (Tested and working in my environments.)

For regulatory percentile calculations, generating intermediate values is not permitted. With this change, the percentile is calculated by position.
change from ceiling to floor
update on how the round of a number is made
@davidcarslaw
Collaborator

Thanks for this suggestion and apologies for the delay in responding (I was on holiday). This is an issue I have not looked at closely but can see how the method used will matter. Do you have a source / link for the preferred method to use, as I'm not familiar with that (at least in the UK)?

All the best
David

@marcelooyaneder
Author

marcelooyaneder commented Sep 13, 2024

Hello David, I hope you are doing well. The method comes from Chilean regulations (based on USEPA), which detail the procedure for calculating percentiles. I am attaching the link (Chilean Regulation). It is in Spanish, so here is a translation:

"To calculate the percentile, all values of the PM10 respirable particulate concentrations will be listed in ascending order: X1 ≤ X2 ≤ X3 ≤ ... ≤ Xk ≤ ... ≤ Xn-1 ≤ Xn. The k-th percentile will be the value of the element of rank 'k,' where 'k' is calculated using the following formula: k = q * n, where 'q' = 0.98, and 'n' corresponds to the total number of data points in the ordered list. The value of 'k' will be rounded to the nearest integer."
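The quoted rule can be sketched in a few lines of Python. This is an illustrative reading of the translated text, not the code in this pull request: the function name `rank_percentile` is hypothetical, and rounding ties (k ending in .5) upward is an assumption, since the quoted text only says "nearest integer". `Fraction` is used so that `q * n` is computed exactly rather than in floating point.

```python
import math
from fractions import Fraction


def rank_percentile(values, q="0.98"):
    """Return the element of rank k = round(q * n) from the sorted values.

    Assumptions: q is given as a decimal string, ties round half up,
    and k is clamped to the valid range 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one value is required")
    k_exact = Fraction(q) * n                  # exact k = q * n
    k = math.floor(k_exact + Fraction(1, 2))   # nearest integer, half up
    k = min(max(k, 1), n)                      # keep the rank inside 1..n
    return xs[k - 1]                           # ranks are 1-based


# A year of made-up daily values 1, 2, ..., 365:
# k = 0.98 * 365 = 357.7, which rounds to rank 358.
print(rank_percentile(range(1, 366)))
```

Because the values in this example equal their own ranks, the printed result is also the selected rank, which makes the rounding step easy to check by hand.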

Given this, I looked for the original source at the EPA and found the following reference (EPA Regulation). Section 5 contains an update to the regulation: the percentile is still calculated by rank, but the position is based on the number of valid records (I could implement this if you would like).

In addition, this form of calculation is quite common: it is also the methodology that air quality numerical simulation software uses to calculate percentiles. Here is a reference (CALPUFF View Percentiles).

Additionally, I found the following text in section 2.5.2.1 of the WHO air quality guidelines (https://iris.who.int/bitstream/handle/10665/345329/9789240034228-eng.pdf):

"In keeping with established practice, as a starting point, short-term AQG levels were considered by the GDG as the 99th percentiles of daily concentrations empirically observed in distributions with a mean equal to the long-term AQG level," where it is explicitly stated that the data must be empirically observed.
