Error applying colors to continuous clinicalFeatures #1050

Open
juferban opened this issue Sep 20, 2024 · 12 comments
@juferban

juferban commented Sep 20, 2024

Describe the issue
Hello,

I am having an issue trying to add continuous clinical features to my oncoplot.

If I add more than one clinical feature with continuous values, the colors appear to be mixed up and no longer match the values they are supposed to represent.

More specifically, I have an oncoplot to which I want to add two clinical features that represent two different ways of measuring response.
If I add only one of the features to the plot, the color gradient is applied correctly. If I add both, most samples show the correct colors, but some seemingly random samples show colors that don't match their values.
In my test, I specified the sample order with the sampleOrder argument of oncoplot. That order corresponds to the first clinical feature, so the gradient should run from lowest to highest (which it does when only that first clinical feature is added). As soon as I add the second clinical feature, some samples get an apparently random color.

The command does not throw any error.

Thanks for a great package!

Command

oncoplot(maf = maf_object, 
          removeNonMutated = FALSE, 
          fill = TRUE, 
          clinicalFeatures = c('Treatment_Group','Response_IRC','Treatment_Duration'),
          sampleOrder = sorted_samples,
          showTitle = TRUE,
          titleFontSize = 1.5,
          legendFontSize = 1,
          annotationFontSize = 1,
          SampleNamefontSize = 0.7,
          fontSize = 0.7,
          showTumorSampleBarcodes = TRUE,
          barcode_mar = 4,
          gene_mar = 6,
          legend_height = 4,
          anno_height = 1.5,
          annoBorderCol = "white",
          annotationColor = annotationColor
        )

Session info

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRblas.so 
LAPACK: /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] circlize_0.4.16    maftools_2.18.0    RColorBrewer_1.1-3 statmod_1.5.0     
 [5] ggrepel_0.9.5      edgeR_4.0.16       limma_3.58.1       reshape2_1.4.4    
 [9] openxlsx_4.2.5.2   lubridate_1.9.3    forcats_1.0.0      stringr_1.5.1     
[13] dplyr_1.1.4        purrr_1.0.2        readr_2.1.5        tidyr_1.3.1       
[17] tibble_3.2.1       ggplot2_3.5.0      tidyverse_2.0.0    data.table_1.15.4 
[21] optparse_1.7.5     monoceRos_1.0.5   

loaded via a namespace (and not attached):
 [1] gtable_0.3.4        shape_1.4.6.1       GlobalOptions_0.1.2
 [4] lattice_0.22-6      tzdb_0.4.0          Cairo_1.6-2        
 [7] vctrs_0.6.5         tools_4.3.2         generics_0.1.3     
[10] getopt_1.20.4       fansi_1.0.6         pkgconfig_2.0.3    
[13] Matrix_1.6-5        uuid_1.2-0          lifecycle_1.0.4    
[16] compiler_4.3.2      munsell_0.5.1       repr_1.1.7         
[19] getPass_0.2-4       htmltools_0.5.8.1   pillar_1.9.0       
[22] crayon_1.5.2        tidyselect_1.2.1    locfit_1.5-9.9     
[25] zip_2.3.1           digest_0.6.35       stringi_1.8.3      
[28] splines_4.3.2       fastmap_1.1.1       colorspace_2.1-0   
[31] cli_3.6.2           magrittr_2.0.3      base64enc_0.1-3    
[34] survival_3.5-8      utf8_1.2.4          IRdisplay_1.1      
[37] withr_3.0.0         scales_1.3.0        IRkernel_1.3.2     
[40] timechange_0.3.0    pbdZMQ_0.3-11       hms_1.1.3          
[43] DNAcopy_1.76.0      evaluate_0.23       rlang_1.1.3        
[46] Rcpp_1.0.13         glue_1.7.0          jsonlite_1.8.8     
[49] R6_2.5.1            plyr_1.8.9         
@biosunsci
Contributor

Hi @juferban, could you post some of the data that triggers the bug so that we can reproduce it more easily?

@juferban
Author

Hi, thanks for your quick response.
I will generate a couple of files and upload them soon so you can use them for testing.

Thanks a lot.

@juferban
Author

juferban commented Sep 26, 2024

Hi @biosunsci

I am attaching example files that reproduce my problem.
The code I used for testing is as follows:

## Load MAF files
maf_object = read.maf(maf = "mutations_filtered.maf", 
                      clinicalData = "sample_annot_for_maf.txt", isTCGA = FALSE)

## Make sure the continuous variables are shown as continuous
maf_object@clinical.data$Response = as.numeric(maf_object@clinical.data$Response)
maf_object@clinical.data$Volume_Change = as.numeric(maf_object@clinical.data$Volume_Change)
maf_object@clinical.data$Treatment_Duration = as.numeric(maf_object@clinical.data$Treatment_Duration)

# Sort the clinical data by multiple variables, as I want to make sure I use my predefined sample sorting
sorted_clinical_data <- maf_object@clinical.data[order(
  maf_object@clinical.data$Gender,
  maf_object@clinical.data$Treatment_Group,
  
  # Handle NAs: NA values are set to 1000 so they appear first
  dplyr::desc(ifelse(is.na(maf_object@clinical.data$Response ), 1000, as.numeric(maf_object@clinical.data$Response ))),
  
  # Handle NAs: NA values are set to 1000 so they appear first
  dplyr::desc(ifelse(is.na(maf_object@clinical.data$Volume_Change ), 1000, as.numeric(maf_object@clinical.data$Volume_Change ))),
  
  # Handle NAs: NA values are set to 1000 so they appear first
  ifelse(is.na(maf_object@clinical.data$Treatment_Duration), 1000, as.numeric(maf_object@clinical.data$Treatment_Duration))
), ]

maf_object@clinical.data <- sorted_clinical_data

# Extract the sorted sample names
sorted_samples <- sorted_clinical_data$Tumor_Sample_Barcode

## Create the oncoplot
oncoplot(maf = maf_object, 
         removeNonMutated = FALSE, 
         fill = TRUE, 
         clinicalFeatures = c('Gender','Treatment_Group','Response','Volume_Change','Treatment_Duration'),
         sampleOrder = sorted_samples,
         showTitle = TRUE,
         titleFontSize = 1.5,
         legendFontSize = 1,
         annotationFontSize = 1,
         SampleNamefontSize = 0.5,
         fontSize = 0.7,
         showTumorSampleBarcodes = TRUE,
         barcode_mar = 3,
         gene_mar = 5,
         legend_height = 4,
         anno_height = 1.5,
         annoBorderCol = "white",
         drawRowBar = TRUE,
         genesToIgnore = 'KRAS',
         numericAnnoCol = TRUE,
         showPct = TRUE,
         rightBarLims = c(0, 100),
         leftBarLims = c(0, 100)
)

If I only use the clinical variables 'Gender', 'Treatment_Group' and 'Response', with Response being the only continuous variable, the coloring is applied correctly. As soon as I add the other two continuous variables, the coloring gets mixed up.

Thanks a lot,

oncoplots_examples.zip

This is my session info:

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRblas.so 
LAPACK: /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] pROC_1.18.5          trackViewer_1.38.2   GenomicRanges_1.54.1
 [4] GenomeInfoDb_1.38.8  IRanges_2.36.0       S4Vectors_0.40.2    
 [7] BiocGenerics_0.48.1  maftools_2.18.0      pheatmap_1.0.12     
[10] survminer_0.4.9      ggpubr_0.6.0         survival_3.5-8      
[13] RColorBrewer_1.1-3   statmod_1.5.0        ggrepel_0.9.5       
[16] edgeR_4.0.16         limma_3.58.1         reshape2_1.4.4      
[19] openxlsx_4.2.5.2     lubridate_1.9.3      forcats_1.0.0       
[22] stringr_1.5.1        dplyr_1.1.4          purrr_1.0.2         
[25] readr_2.1.5          tidyr_1.3.1          tibble_3.2.1        
[28] ggplot2_3.5.0        tidyverse_2.0.0      data.table_1.15.4   
[31] optparse_1.7.5       monoceRos_1.0.5     

loaded via a namespace (and not attached):
  [1] splines_4.3.2               pbdZMQ_0.3-11              
  [3] BiocIO_1.12.0               bitops_1.0-7               
  [5] filelock_1.0.3              graph_1.80.0               
  [7] XML_3.99-0.16.1             rpart_4.1.23               
  [9] lifecycle_1.0.4             rstatix_0.7.2              
 [11] ensembldb_2.26.0            lattice_0.22-6             
 [13] backports_1.4.1             magrittr_2.0.3             
 [15] Hmisc_5.1-2                 rmarkdown_2.26             
 [17] plotrix_3.8-4               yaml_2.3.8                 
 [19] zip_2.3.1                   Gviz_1.46.1                
 [21] DBI_1.2.2                   abind_1.4-5                
 [23] zlibbioc_1.48.2             AnnotationFilter_1.26.0    
 [25] biovizBase_1.50.0           RCurl_1.98-1.14            
 [27] nnet_7.3-19                 VariantAnnotation_1.48.1   
 [29] rappdirs_0.3.3              GenomeInfoDbData_1.2.11    
 [31] KMsurv_0.1-5                grImport_0.9-7             
 [33] codetools_0.2-20            getopt_1.20.4              
 [35] DelayedArray_0.28.0         xml2_1.3.6                 
 [37] DNAcopy_1.76.0              tidyselect_1.2.1           
 [39] matrixStats_1.3.0           BiocFileCache_2.10.2       
 [41] base64enc_0.1-3             GenomicAlignments_1.38.2   
 [43] jsonlite_1.8.8              Formula_1.2-5              
 [45] tools_4.3.2                 progress_1.2.3             
 [47] strawr_0.0.91               Rcpp_1.0.13                
 [49] glue_1.7.0                  gridExtra_2.3              
 [51] SparseArray_1.2.4           xfun_0.43                  
 [53] MatrixGenerics_1.14.0       IRdisplay_1.1              
 [55] withr_3.0.0                 fastmap_1.1.1              
 [57] rhdf5filters_1.14.1         latticeExtra_0.6-30        
 [59] fansi_1.0.6                 digest_0.6.35              
 [61] timechange_0.3.0            R6_2.5.1                   
 [63] colorspace_2.1-0            Cairo_1.6-2                
 [65] jpeg_0.1-10                 dichromat_2.0-0.1          
 [67] biomaRt_2.58.2              RSQLite_2.3.6              
 [69] utf8_1.2.4                  generics_0.1.3             
 [71] rtracklayer_1.62.0          InteractionSet_1.30.0      
 [73] prettyunits_1.2.0           httr_1.4.7                 
 [75] htmlwidgets_1.6.4           S4Arrays_1.2.1             
 [77] pkgconfig_2.0.3             gtable_0.3.4               
 [79] blob_1.2.4                  XVector_0.42.0             
 [81] survMisc_0.5.6              htmltools_0.5.8.1          
 [83] carData_3.0-5               ProtGenerics_1.34.0        
 [85] scales_1.3.0                Biobase_2.62.0             
 [87] png_0.1-8                   knitr_1.46                 
 [89] km.ci_0.5-6                 rstudioapi_0.16.0          
 [91] tzdb_0.4.0                  rjson_0.2.21               
 [93] uuid_1.2-0                  checkmate_2.3.1            
 [95] curl_5.2.1                  rhdf5_2.46.1               
 [97] repr_1.1.7                  cachem_1.0.8               
 [99] zoo_1.8-12                  parallel_4.3.2             
[101] foreign_0.8-86              AnnotationDbi_1.64.1       
[103] restfulr_0.0.15             pillar_1.9.0               
[105] vctrs_0.6.5                 car_3.1-2                  
[107] dbplyr_2.5.0                xtable_1.8-4               
[109] cluster_2.1.6               htmlTable_2.4.2            
[111] Rgraphviz_2.46.0            evaluate_0.23              
[113] GenomicFeatures_1.54.4      cli_3.6.2                  
[115] locfit_1.5-9.9              compiler_4.3.2             
[117] Rsamtools_2.18.0            rlang_1.1.3                
[119] crayon_1.5.2                ggsignif_0.6.4             
[121] interp_1.1-6                getPass_0.2-4              
[123] plyr_1.8.9                  stringi_1.8.3              
[125] deldir_2.0-4                BiocParallel_1.36.0        
[127] munsell_0.5.1               Biostrings_2.70.3          
[129] lazyeval_0.2.2              Matrix_1.6-5               
[131] IRkernel_1.3.2              BSgenome_1.70.2            
[133] hms_1.1.3                   bit64_4.0.5                
[135] Rhdf5lib_1.24.2             KEGGREST_1.42.0            
[137] SummarizedExperiment_1.32.0 broom_1.0.5                
[139] memoise_2.0.1               bit_4.0.5                 

@PoisonAlien
Owner

Hi,

Thank you for the files. I have fixed the issue. You should now also be able to define your own color codes for each continuous variable.

Just pass the name of any sequential color palette from the RColorBrewer package and it should do the trick.

oncoplot(
  maf = maf_object,
  removeNonMutated = FALSE,
  fill = TRUE,
  clinicalFeatures = c('Treatment_Duration', 'Treatment_Group', 'Response', 'Volume_Change', 'Gender'),
  sortByAnnotation = TRUE,
  anno_height = 3,
  annotationColor = list(Gender = c("M" = "black", 'F' = "pink"),
    Treatment_Group = c("Treatment1" = "royalblue", "Treatment2" = "maroon"),
    Treatment_Duration = "Blues", Response = "Reds",Volume_Change = "Purples"),
  annoBorderCol = 'black')

If not provided, it will randomly select from the available palettes.
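For anyone unsure which palette names qualify as sequential, here is a minimal sketch that lists them. It assumes only that the RColorBrewer package is installed; the variable names `seq_palettes` and `blues` are just illustrative.

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette; its
# `category` column is "seq", "div" or "qual".
seq_palettes <- rownames(brewer.pal.info[brewer.pal.info$category == "seq", ])
print(seq_palettes)

# Look at the colors behind one of the names used above
# ("Blues" provides up to 9 colors, ordered light to dark).
blues <- brewer.pal(n = 9, name = "Blues")
print(blues)
```

Any name printed in `seq_palettes` should be usable in the same way as "Blues", "Reds" and "Purples" in the call above.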

Please let me know if this fixes the issue.

@juferban
Author

Thanks a lot for the quick fix. I really appreciate it.

I will give it a try on my analysis and report back if I still have any issues.

Thanks again,

Julio

@Zhongqige

I had a similar issue. I think it happens when sampleOrder is applied: the continuous clinical feature values then no longer match the reordered samples.
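To illustrate the kind of mismatch I mean, here is a toy sketch in base R. It is not the maftools internals, only an assumption about how such a misalignment could arise; the sample names and values are made up.

```r
# Toy data: hypothetical barcodes and values, not taken from the attached files.
clin <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S4"),
  Response = c(10, 40, 20, 30),
  stringsAsFactors = FALSE
)

# A custom plotting order, analogous to what is passed to sampleOrder.
sample_order <- c("S2", "S4", "S3", "S1")

# Positional use: values stay in the table's row order, so they no longer
# line up with the reordered samples.
by_position <- clin$Response

# Name-based use: look each sample up by barcode before assigning colors.
by_name <- clin$Response[match(sample_order, clin$Tumor_Sample_Barcode)]

print(data.frame(sample = sample_order, by_position, by_name))
```

Only the `by_name` column pairs each sample with its own value; the `by_position` column shows the kind of scrambling I seem to be getting.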

@PoisonAlien
Owner

Hi @Zhongqige ,

This is fixed in the recent commit. Could you please try a fresh installation from GitHub and let me know if it works?

BiocManager::install("PoisonAlien/maftools")

@Zhongqige

tcga_test_w_sampleOrder.pdf
tcga_test_wo_sampleOrder.pdf
Hi, thanks for the quick response! However, I just tested using the @biosunsci tcga data and attached the results with and without the parameter sampleOrder = sorted_samples. It seems the same sample still gets a different Response value.

@PoisonAlien
Owner

Hi @Zhongqige ,

I am having trouble reproducing the issue. The function respects the sample order and the corresponding variables.
Could you post the complete set of commands that you used? Please make sure that you have updated the package from GitHub and restarted your R session so that the changes take effect.

@Zhongqige

Zhongqige commented Oct 10, 2024

@PoisonAlien I did install the latest version, 2.21.1, and restarted my R session. Below are my commands (basically @juferban's code):

## Load MAF files
maf_object = read.maf(maf = "./oncoplots_examples/mutations_filtered.maf", 
                      clinicalData = "./oncoplots_examples/sample_annot_for_maf.txt", isTCGA = FALSE)

## Make sure the continuous variables are shown as continuous
maf_object@clinical.data$Response = as.numeric(maf_object@clinical.data$Response)
maf_object@clinical.data$Volume_Change = as.numeric(maf_object@clinical.data$Volume_Change)
maf_object@clinical.data$Treatment_Duration = as.numeric(maf_object@clinical.data$Treatment_Duration)

# Sort the clinical data by multiple variables, as I want to make sure I use my predefined sample sorting
sorted_clinical_data <- maf_object@clinical.data[order(
  maf_object@clinical.data$Gender,
  maf_object@clinical.data$Treatment_Group,
  
  # Handle NAs: NA values are set to 1000 so they appear first
  dplyr::desc(ifelse(is.na(maf_object@clinical.data$Response ), 1000, as.numeric(maf_object@clinical.data$Response ))),
  
  # Handle NAs: NA values are set to 1000 so they appear first
  dplyr::desc(ifelse(is.na(maf_object@clinical.data$Volume_Change ), 1000, as.numeric(maf_object@clinical.data$Volume_Change ))),
  
  # Handle NAs: NA values are set to 1000 so they appear first
  ifelse(is.na(maf_object@clinical.data$Treatment_Duration), 1000, as.numeric(maf_object@clinical.data$Treatment_Duration))
), ]

maf_object@clinical.data <- sorted_clinical_data

# Extract the sorted sample names
sorted_samples <- sorted_clinical_data$Tumor_Sample_Barcode

pdf("./tcga_test_wo_sampleOrder.pdf", 12, 8)
oncoplot(maf = maf_object, 
         removeNonMutated = FALSE, 
         fill = TRUE, 
         clinicalFeatures = c('Gender','Treatment_Group','Response'), #, 'Volume_Change','Treatment_Duration'
         #sampleOrder = sorted_samples,
         annotationColor = list(Gender = c("F" = "deeppink", "M" = "dodgerblue"),
                                Treatment_Group = c("Treatment1" = "salmon", "Treatment2" = "yellowgreen"),
                                Response = "Blues"
           
         ),
         showTitle = TRUE,
         titleFontSize = 1.5,
         legendFontSize = 1,
         annotationFontSize = 1,
         SampleNamefontSize = 0.5,
         fontSize = 0.7,
         showTumorSampleBarcodes = TRUE,
         barcode_mar = 3,
         gene_mar = 5,
         legend_height = 4,
         anno_height = 1.5,
         annoBorderCol = "white",
         drawRowBar = TRUE,
         genesToIgnore = 'KRAS',
         numericAnnoCol = TRUE,
         showPct = TRUE,
         rightBarLims = c(0, 100),
         leftBarLims = c(0, 100)
)
dev.off()

pdf("./tcga_test_w_sampleOrder.pdf", 12, 8)
oncoplot(maf = maf_object, 
         removeNonMutated = FALSE, 
         fill = TRUE, 
         clinicalFeatures = c('Gender','Treatment_Group','Response'), #, 'Volume_Change','Treatment_Duration'
         sampleOrder = sorted_samples,
         annotationColor = list(Gender = c("F" = "deeppink", "M" = "dodgerblue"),
                                Treatment_Group = c("Treatment1" = "salmon", "Treatment2" = "yellowgreen"),
                                Response = "Blues"
                                
         ),
         showTitle = TRUE,
         titleFontSize = 1.5,
         legendFontSize = 1,
         annotationFontSize = 1,
         SampleNamefontSize = 0.5,
         fontSize = 0.7,
         showTumorSampleBarcodes = TRUE,
         barcode_mar = 3,
         gene_mar = 5,
         legend_height = 4,
         anno_height = 1.5,
         annoBorderCol = "white",
         drawRowBar = TRUE,
         genesToIgnore = 'KRAS',
         numericAnnoCol = TRUE,
         showPct = TRUE,
         rightBarLims = c(0, 100),
         leftBarLims = c(0, 100)
)
dev.off()

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.7

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] maftools_2.21.1

loaded via a namespace (and not attached):
 [1] DNAcopy_1.72.3      rstudioapi_0.14     magrittr_2.0.3      splines_4.2.2       tidyselect_1.2.0   
 [6] lattice_0.20-45     R6_2.5.1            rlang_1.0.6         fansi_1.0.4         dplyr_1.1.0        
[11] tools_4.2.2         grid_4.2.2          pkgbuild_1.4.0      data.table_1.14.6   utf8_1.2.3         
[16] cli_3.6.0           withr_2.5.0         remotes_2.5.0       survival_3.4-0      rprojroot_2.0.3    
[21] tibble_3.1.8        lifecycle_1.0.3     crayon_1.5.2        Matrix_1.5-3        processx_3.8.0     
[26] BiocManager_1.30.19 RColorBrewer_1.1-3  callr_3.7.3         vctrs_0.5.2         ps_1.7.2           
[31] curl_5.0.0          glue_1.6.2          compiler_4.2.2      pillar_1.8.1        desc_1.4.2         
[36] generics_0.1.3      prettyunits_1.1.1   pkgconfig_2.0.3   

@juferban
Author

@PoisonAlien

Hi, sorry for the delay with additional testing. I am seeing the same issue reported by @Zhongqige when testing the code after updating with BiocManager::install("PoisonAlien/maftools"). The samples still get their colors assigned in a seemingly random way, even though the sample order is correct.

PoisonAlien added a commit that referenced this issue Nov 7, 2024
@PoisonAlien
Owner

Hello all!

Sorry for the delay. It took a while to figure out the issue. It turns out that just the colors were flipped. I have fixed it. Please reinstall from GitHub to get the changes.
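As a rough illustration of what a flipped gradient looks like, here is a minimal sketch. It is not the actual maftools code, only an assumed way of mapping values to a palette; the values are hypothetical and it needs only RColorBrewer.

```r
library(RColorBrewer)

values <- c(5, 20, 35, 50)  # hypothetical continuous annotation values

# Interpolate a sequential palette (light to dark) into 100 steps.
ramp <- grDevices::colorRampPalette(brewer.pal(9, "Blues"))(100)

# Bin each value into one of the 100 steps across the observed range.
breaks <- seq(min(values), max(values), length.out = 101)
idx <- findInterval(values, breaks, all.inside = TRUE)

cols <- ramp[idx]               # intended: lowest value gets the lightest color
cols_flipped <- rev(ramp)[idx]  # flipped: lowest value gets the darkest color

print(data.frame(values, cols, cols_flipped))
```

With the ramp reversed, every sample still gets a color from the right palette, but low and high values swap ends of the gradient.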
