Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rolling trimmed mean #132

Open
ethanbsmith opened this issue Mar 1, 2024 · 7 comments
Open

rolling trimmed mean #132

ethanbsmith opened this issue Mar 1, 2024 · 7 comments

Comments

@ethanbsmith
Copy link
Contributor

ethanbsmith commented Mar 1, 2024

Description

new functionality supporting rolling trimmed means

when trim * n is not an integer, there is special handling needed for the boundaries:
https://stats.stackexchange.com/questions/4252/how-to-calculate-the-truncated-or-trimmed-mean

very hokey rough implementation.

runTrimmedMean <- function(x, n, cumulative = FALSE, trim) {
  if ((trim > 0.5) || (trim < 0)) stop("trim must be between 0,.5")
  x <- try.xts(x, error = as.matrix)
  if (ncol(x) > 1) stop("only supports univariate x")
  if (nrow(x) < n) {
    is.na(x) <- T
    return(x)
  }

  v <- as.vector(x)
  if (cumulative) {
    r <- sapply(seq_len(nrow(x)), FUN = \(n) {mean(v[1:n], na.rm = T, trim = trim)})
  } else {
    #data.table::frollapply slightly faster than zoo::rollapply
    r <- data.table::frollapply(v, n = n, FUN = \(vals) {mean(vals, na.rm = T, trim = trim)})
  }
  return(reclass(r, x))
}
@ethanbsmith
Copy link
Contributor Author

ethanbsmith commented Mar 1, 2024

also, not sure if this should be a standalone function in R, or just add a trim parameter to runMean, like how base R does it

I think that one way to think about this is that there are just variants of a single function, with runMean being an optimization for runTrimmedMean(trim = 0) and runMedian an optimization for runTrimmedMean(trim = .5)

@serkor1
Copy link
Contributor

serkor1 commented Mar 2, 2024

also, not sure if this should be a standalone function in R, or just add a trim parameter to runMean, like how base R does it

If I may join the discussion; I think having it as a trim-parameter is a good idea - then it could applied on all rolling means there is in TTR.

@braverock
Copy link
Collaborator

braverock commented Mar 2, 2024

it seems to me that the place to put this would be in runMean, since it already exists, not as a new function

trim in mean.default is defined as:

  if (trim > 0 && n) {
    if (is.complex(x)) 
      stop("trimmed means are not defined for complex data")
    if (anyNA(x)) 
      return(NA_real_)
    if (trim >= 0.5) 
      return(stats::median(x, na.rm = FALSE))
    lo <- floor(n * trim) + 1
    hi <- n + 1 - lo
    x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
  }

so your prototype functionality would deliver the median of the data for any value of trim>0.05

@ethanbsmith
Copy link
Contributor Author

ethanbsmith commented Mar 27, 2024

agree this should ideally just extend runMean
util i get time to work on a proper C pr, here is a usable R implementation:

runTrimmedMean <- function(x, n, cumulative = FALSE, trim = 0) {
  if (trim <= 0) return(runMean(x, n = n, cumulative = cumulative))
  if (trim >= 0.5) return(runMedian(x, n = n, cumulative = cumulative))
  x <- try.xts(x, error = as.matrix)
  if (NCOL(x) > 1) stop("ncol(x) > 1. runSum only supports univariate 'x'")

  len.x <- nrow(x)
  r <- rep(NA_real_, len.x)
  v <- as.vector(x)
  start <- max(last(which(is.na(v))) + 1, 1)

  if (len.x >= (start + n)) {
    trim.start <- floor(trim * n) + 1
    trimmed.range <- trim.start:(n + 1 - trim.start)
    i <- start + n - 1
    buff <- sort(v[start:i])
    r[i] <- sum(buff[trimmed.range]) / length(trimmed.range)
    while(i < len.x) {
      i <- i+1
      if (cumulative) {
        #expand buffer
        n <- n + 1
        trim.start <- floor(trim * n) + 1
        trimmed.range <- trim.start:(n + 1 - trim.start)
      } else {
        #remove old value from buffer. will always get found by match, so no need for error handling
        buff <- buff[-match(v[i-n], buff)]
      }
      #add new value to buffer. match +  insertion is faster than resorting whole buffer
      after <- match(TRUE, v[i] <= buff, nomatch = length(buff) + 1) - 1
      buff <- append(buff, v[i], after = after) # < 1 will pre-pend
      r[i] <- sum(buff[trimmed.range]) / length(trimmed.range) #na's already handled, so bypass checks in mean
    }
  }
  return(reclass(r, x))
}

@joshuaulrich
Copy link
Owner

Great start! Would it make sense to have unit tests of this function versus something like rollapply that uses mean with a trim value?

@ethanbsmith
Copy link
Contributor Author

yes. thats exactly how i tested this version. ill formalize them, as i need to that anyways

@joshuaulrich
Copy link
Owner

Awesome. You can use 'tinytest' (it's a near drop-in replacement for 'testthat') instead of 'RUnit'. I intend to convert all the TTR tests to 'tinytest' at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants