Replicate Figure2A and Figure3A on manuscript #8
Thank you for your interest. The filtering parameters we used are as follows: sites_initial: 10, runs_initial: 5, sites_quality: 10, and runs_quality: 1000. In addition to the filtering parameters, for Figure2A the imputation parameters we iterated with were knn=20 and iteration=3. We also adjusted the weighting ratio between the imputation information and the sparsity information used in clustering to 0.6, based on the data characteristics (this parameter is typically stable and not included in the README, but users can adjust it to suit their data at line 108 in normalize.py). For Figure3A, our iterative imputation parameters were knn=10 and iteration=3. Furthermore, the parameter settings from the initial mapping step through the subsequent clustering step (including seed numbers) can also influence the final clustering results. We hope this information proves helpful.
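As a rough illustration of what "iterating" KNN imputation with knn and iteration parameters could mean, here is a minimal sketch in plain NumPy. This is not SCASL's actual implementation (that lives in the repository's own code); the function name and loop structure are assumptions for illustration only:

```python
import numpy as np

def iterative_knn_impute(psi, knn=20, iterations=3):
    # Illustrative only: repeatedly estimate missing entries of a
    # PSI-like cells x sites matrix from the k nearest cells.
    mask = np.isnan(psi)                     # entries that were missing
    col_means = np.nanmean(psi, axis=0)
    filled = np.where(mask, col_means, psi)  # start from per-site means
    for _ in range(iterations):
        # pairwise distances between cells on the current filled matrix
        d = np.linalg.norm(filled[:, None, :] - filled[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)            # a cell is not its own neighbor
        nbrs = np.argsort(d, axis=1)[:, :knn]  # k nearest cells per cell
        est = filled[nbrs].mean(axis=1)        # neighbor average, per site
        filled = np.where(mask, est, psi)      # refresh only missing entries
    return filled

# toy example: 30 cells x 5 splicing sites with ~20% missing values
rng = np.random.default_rng(0)
X = rng.random((30, 5))
X[rng.random(X.shape) < 0.2] = np.nan
imputed = iterative_knn_impute(X, knn=20, iterations=3)
```

Because the neighbors are recomputed on the refreshed matrix, each iteration can change the estimates; observed values are never altered.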
Hi, thanks for your reply. I don't quite understand. Also, I was using all fastq files (756 in total) for only one patient for Figure2A, the same set as you used, and all fastq files for Figure3A (1533 in total). I'm not sure whether you reported only the filtered fastqs in Table 1 after running SCASL, because the paper does not seem to explicitly mention that processing step.
Line 108 in normalize.py defines the imputation information used during clustering. The involvement of other cells in clustering can indeed have a significant impact on the clustering results. Initially, we did not perform any specific data processing; it was mostly standard filtering such as quality screening and selection of cell types (information about cell types is given in the original literature of the public data). For Figure2A, we used only cells from patient H010, retaining 422 cells after rigorous cell quality control and splicing-profile filtering. For Figure3A, as our main focus was on tumor cells, we specifically chose epithelial cells for analysis; the 1533 files you mentioned likely contain cells of all the other cell types. Hope this information helps.
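To make the "weighting ratio between imputation and sparsity information" concrete, here is a hypothetical sketch of how such a blend before clustering might look. The real knob is at line 108 of SCASL's normalize.py; the function name, formula, and inputs below are illustrative assumptions, not the project's code:

```python
import numpy as np

def blend_for_clustering(imputed, sparsity, w=0.6):
    # Hypothetical: weight the imputed PSI matrix against the 0/1
    # missingness (sparsity) pattern before clustering. Illustrative only.
    def z(m):
        # z-score each column so the weight w compares like with like
        return (m - m.mean(axis=0)) / (m.std(axis=0) + 1e-8)
    return np.hstack([w * z(imputed), (1 - w) * z(sparsity)])

# toy inputs: an imputed PSI matrix and its missingness pattern
rng = np.random.default_rng(1)
psi = rng.random((50, 8))
miss = (rng.random((50, 8)) < 0.3).astype(float)
features = blend_for_clustering(psi, miss, w=0.6)
```

Raising w toward 1 makes the clustering rely more on the imputed values; lowering it emphasizes which sites were missing in which cells.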
Thanks for your reply. If I understand correctly, you reported 422 cells for Figure2A in Table 1, but there were actually only 405 cells for Figure2A (see the supplementary data). That's why I'm wondering about the discrepancy.
I don't worry too much about Figure3A, since it seems more reproducible. Thanks.
Regarding the second question, we did not perform any additional preprocessing; instead, we filtered all fastq files through SCASL, which yielded the 422 cells used in clustering (as shown in Table 1 and Figure 2A). However, a few cells within this set were not used in the results of the data-source paper and were not labeled as "tumor" or "metastasis". Consequently, in the final supplementary data I removed these cells, leaving only the 405 cells displayed in the source data. I apologize for any confusion this may have caused, and I hope this information proves helpful.
Hi, thanks for your reply. I modified the parameter that balances the relative contributions as you said (0.6), but it did not make much difference compared to the default; the results are still significantly different from Fig2A. Also, you said only 405 cells were used because of unavailable labels, which does not sound right to me, because the original GEO resource lists labels for every cell in the meta file.
Hi,
I was trying to replicate your results using the same set of fastq files as your paper used, but I failed to reproduce the same trend. I was wondering what specific parameters you used in the configure file?
I was using:
sites_initial: 1 runs_initial: 20 sites_quality: 10 runs_quality: 1000
It seems that Figure3A is more likely to be reproduced (it still shows some discrepancy, but the ARI is acceptable), while Figure2A is hugely different using the thresholds above. All processed junction reads were carefully handled (I used both STAR and HISAT2 for benchmarking purposes before using LeafCutter). Thanks.
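For reference, the ARI used to compare a re-run clustering against the paper's labels can be computed with scikit-learn's adjusted_rand_score. The two label vectors below are made-up placeholders, not the actual data:

```python
from sklearn.metrics import adjusted_rand_score

# toy cluster assignments: labels_true standing in for the paper's
# annotation, labels_pred for a re-run of SCASL (both hypothetical)
labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [0, 0, 1, 2, 2, 2]

ari = adjusted_rand_score(labels_true, labels_pred)  # ≈ 0.444
```

ARI is 1 for identical partitions and around 0 for random labelings, so it gives a seed-robust way to quantify how close a reproduction is.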