Incorrect Mean Absolute Calculation in findCorrelation_exact #1372

Open
igorbraga13 opened this issue Oct 24, 2024 · 0 comments
igorbraga13 commented Oct 24, 2024

Since commit f2ad13509ba3d6ab28069840c9631a66a9e7ecc8, findCorrelation_exact computes the wrong mean absolute correlation for mn2 (line 38). It takes the mean of the whole matrix with row j removed, instead of the mean of variable j against the other variables. There are two possible solutions:

1 - Remove the minus sign from j, so that row i is compared with row j. Because the correlation matrix is symmetric, row j and column j hold the same values, so the result is correct. However, with verbose = TRUE the output still prints "Compare row i and column j with corr...", which would no longer match the code.

2 - Remove the minus sign and move j to the column position, so that the code matches the printed message.

# before the commit
mean(x[i, -i]) > mean(x[-j, j]) # compares the means of two vectors
# after the commit
mn1 <- mean(x2[i, ], na.rm = TRUE) # mean of a vector (row i)
mn2 <- mean(x2[-j, ], na.rm = TRUE) # mean of a matrix (every row except j)
# possible solution 1
mn2 <- mean(x2[j, ], na.rm = TRUE) # mean of row j
# possible solution 2
mn2 <- mean(x2[, j], na.rm = TRUE) # mean of column j
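
To make the difference between the three indexing forms concrete, here is a minimal sketch on a small symmetric matrix. The 3 x 3 values and the variable names row_mean, col_mean and all_but_j are made up for illustration and are not taken from caret.

```r
# Hypothetical absolute correlation matrix with the diagonal set to NA,
# mirroring how x2 is described above (illustrative values only).
x2 <- matrix(c(NA,  0.2, 0.9,
               0.2, NA,  0.4,
               0.9, 0.4, NA), nrow = 3, byrow = TRUE)
j <- 3

row_mean  <- mean(x2[j, ],  na.rm = TRUE) # solution 1: row j only
col_mean  <- mean(x2[, j],  na.rm = TRUE) # solution 2: column j only
all_but_j <- mean(x2[-j, ], na.rm = TRUE) # current code: every row except j

print(c(row_mean, col_mean, all_but_j))
#> [1] 0.650 0.650 0.425
```

On a symmetric matrix the row and column forms agree, while the negative index drops row j and averages everything that is left.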

Minimal dataset:

dt <- structure(c(1, -0.117569784133002, 0.871753775886583, 0.817941126271576, 
                       -0.117569784133002, 1, -0.42844010433054, -0.366125932536439, 
                       0.871753775886583, -0.42844010433054, 1, 0.962865431402796, 0.817941126271576, 
                       -0.366125932536439, 0.962865431402796, 1), dim = c(4L, 4L), dimnames = list(
                         c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"
                         ), c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"
                         )))

Minimal, reproducible example:

require(reprex)
#> Loading required package: reprex
dt <- structure(c(1, -0.117569784133002, 0.871753775886583, 0.817941126271576, 
                  -0.117569784133002, 1, -0.42844010433054, -0.366125932536439, 
                  0.871753775886583, -0.42844010433054, 1, 0.962865431402796, 0.817941126271576, 
                  -0.366125932536439, 0.962865431402796, 1), dim = c(4L, 4L), dimnames = list(
                    c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"
                    ), c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"
                    )))

cor_mat <- dt

#Finding correlation manually ----
## Example with row 3 and column 4 ----
diag(cor_mat) <- NA
cor_mat <- abs(cor_mat)
i <- 3 #simulating the i-th iteration when i = 3
j <- 4 #simulating the j-th iteration when j = 4

mn1 <- mean(cor_mat[i,], na.rm = TRUE)
mn2 <- mean(cor_mat[,j], na.rm = TRUE)

cat("Compare row", i, " and column ", j,
    "with corr ", round(cor_mat[i,j], 3), "\n",
    "  Means: ", round(mn1, 3), "vs", round(mn2,3))
#> Compare row 3  and column  4 with corr  0.963 
#>    Means:  0.754 vs 0.716

## Example with row 4 and column 1 ----
cor_mat[3,] <- NA #removing row 3 and column 3 to simulate the function's next iteration after the previous step
cor_mat[,3] <- NA

i <- 4 #simulating the i-th iteration when i = 4
j <- 1 #simulating the j-th iteration when j = 1

mn1 <- mean(cor_mat[i,], na.rm = TRUE)
mn2 <- mean(cor_mat[,j], na.rm = TRUE)

cat("Compare row", i, " and column ", j,
    "with corr ", round(cor_mat[i,j], 3), "\n",
    "  Means: ", round(mn1, 3), "vs", round(mn2,3))
#> Compare row 4  and column  1 with corr  0.818 
#>    Means:  0.592 vs 0.468

#Finding correlation with findCorrelation_exact ----
cor_mat <- dt

vars_cor <- caret::findCorrelation(
  cor_mat,
  cutoff = 0.7,
  names = TRUE,
  exact = TRUE,
  verbose = TRUE)
#> Compare row 3  and column  4 with corr  0.963 
#>   Means:  0.754 vs 0.554 so flagging column 3 
#> Compare row 4  and column  1 with corr  0.818 
#>   Means:  0.592 vs 0.417 so flagging column 4 
#> All correlations <= 0.7

Created on 2024-10-23 with reprex v2.1.1
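
As a cross-check, the second mean in each verbose line (0.554 and 0.417) can be reproduced by hand with the x2[-j, ] form. This is a sketch under two assumptions: it uses the original variable positions rather than the function's internal ordering, and names such as step1_current are illustrative only.

```r
# Values copied from the minimal dataset in this issue.
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
dt <- matrix(c(1, -0.117569784133002, 0.871753775886583, 0.817941126271576,
               -0.117569784133002, 1, -0.42844010433054, -0.366125932536439,
               0.871753775886583, -0.42844010433054, 1, 0.962865431402796,
               0.817941126271576, -0.366125932536439, 0.962865431402796, 1),
             nrow = 4, dimnames = list(vars, vars))

x2 <- abs(dt)
diag(x2) <- NA

# Step 1: row 3 against column 4.
j <- 4
step1_current  <- mean(x2[-j, ], na.rm = TRUE) # every row except j
step1_proposed <- mean(x2[, j],  na.rm = TRUE) # column j only

# Step 2: variable 3 was flagged, so blank its row and column first.
x2[3, ] <- NA
x2[, 3] <- NA
j <- 1
step2_current  <- mean(x2[-j, ], na.rm = TRUE)
step2_proposed <- mean(x2[, j],  na.rm = TRUE)

print(round(c(step1_current, step1_proposed, step2_current, step2_proposed), 3))
#> [1] 0.554 0.716 0.417 0.468
```

The x2[-j, ] values match the verbose output of findCorrelation, and the x2[, j] values match the manual calculation.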

Session Info:

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=Portuguese_Brazil.utf8  LC_CTYPE=Portuguese_Brazil.utf8   
[3] LC_MONETARY=Portuguese_Brazil.utf8 LC_NUMERIC=C                      
[5] LC_TIME=Portuguese_Brazil.utf8    

time zone: America/Sao_Paulo
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reprex_2.1.1

loaded via a namespace (and not attached):
 [1] gtable_0.3.5         xfun_0.47            ggplot2_3.5.1        recipes_1.1.0       
 [5] processx_3.8.4       lattice_0.22-6       callr_3.7.6          ps_1.7.7            
 [9] vctrs_0.6.5          tools_4.4.1          generics_0.1.3       stats4_4.4.1        
[13] parallel_4.4.1       tibble_3.2.1         fansi_1.0.6          pkgconfig_2.0.3     
[17] ModelMetrics_1.2.2.2 Matrix_1.7-0         data.table_1.15.4    lifecycle_1.0.4     
[21] compiler_4.4.1       stringr_1.5.1        munsell_0.5.1        codetools_0.2-20    
[25] htmltools_0.5.8.1    class_7.3-22         yaml_2.3.10          prodlim_2024.06.25  
[29] pillar_1.9.0         MASS_7.3-60.2        gower_1.0.1          iterators_1.0.14    
[33] rpart_4.1.23         foreach_1.5.2        nlme_3.1-164         parallelly_1.38.0   
[37] lava_1.8.0           tidyselect_1.2.1     digest_0.6.37        stringi_1.8.4       
[41] future_1.34.0        dplyr_1.1.4          reshape2_1.4.4       purrr_1.0.2         
[45] listenv_0.9.1        splines_4.4.1        fastmap_1.2.0        grid_4.4.1          
[49] colorspace_2.1-1     cli_3.6.3            magrittr_2.0.3       survival_3.6-4      
[53] utf8_1.2.4           future.apply_1.11.2  clipr_0.8.0          withr_3.0.1         
[57] scales_1.3.0         lubridate_1.9.3      timechange_0.3.0     rmarkdown_2.28      
[61] globals_0.16.3       nnet_7.3-19          timeDate_4032.109    evaluate_0.24.0     
[65] knitr_1.48           hardhat_1.4.0        caret_6.0-94         rlang_1.1.4         
[69] Rcpp_1.0.13          glue_1.7.0           pROC_1.18.5          ipred_0.9-15        
[73] rstudioapi_0.16.0    R6_2.5.1             plyr_1.8.9           fs_1.6.4  