Data split for all the few-shot experiments? #3

Open
Lewislou opened this issue May 13, 2024 · 5 comments

@Lewislou

Hi, what is your data split for the few-shot experiments? For example, in the one-shot or two-shot setting, how do you split the training/validation set?

@miccaiif
Owner

> Hi, what is your data split for the few-shot experiments? For example, in the one-shot or two-shot setting, how do you split the training/validation set?

Hello! Thanks for your interest. The data split for the few-shot experiments is controlled directly by the random seed in the code, e.g. the seed in exp_CAMELYON.sh. We did not split the dataset manually because that would be inconvenient; for a given number of shots, the experimental settings are identical across all methods. You can fix a seed to define your own split of the dataset and keep that setting when running the various comparison methods. Few-shot WSI classification does indeed need a benchmark dataset, and building one is a future direction of ours.
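
For illustration, a minimal sketch of what a seed-controlled k-shot selection could look like (the function and variable names below are hypothetical, not the actual split logic in this repository):

```python
import random

def kshot_split(slide_ids, labels, k, seed):
    """Select k training WSIs per class deterministically from a seed;
    the remaining slides form the evaluation set.
    (Illustrative sketch only, not the repository's actual code.)"""
    rng = random.Random(seed)
    train_ids = []
    for cls in sorted(set(labels)):
        cls_slides = [s for s, y in zip(slide_ids, labels) if y == cls]
        rng.shuffle(cls_slides)
        train_ids.extend(cls_slides[:k])
    eval_ids = [s for s in slide_ids if s not in set(train_ids)]
    return train_ids, eval_ids

# Example: reproduce a 2-shot split from a fixed seed
# train_ids, eval_ids = kshot_split(all_slide_ids, all_labels, k=2, seed=0)
```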

@Lewislou
Author

Hi,

Thanks for your quick response. I have diligently tried seeds 0, 128, 192, 111111, and 101012, but have found it challenging to replicate the high AUC results reported in your recent paper. Would it be possible for you to share the specific seed, or the training/testing slide names, used for the Camelyon and NSCLC datasets? This information would greatly help us follow your implementation accurately and could significantly extend the impact of your important work.

@Eli-YiLi

The setting is not reasonable, because few-shot performance depends heavily on the split. Repeating the experiment 5 times with different data splits is the more standard protocol, rather than using a fixed split for every run. If one favorable split is used, the results are hard to reproduce.

@miccaiif
Owner

> Hi,
>
> Thanks for your quick response. I have diligently tried seeds 0, 128, 192, 111111, and 101012, but have found it challenging to replicate the high AUC results reported in your recent paper. Would it be possible for you to share the specific seed, or the training/testing slide names, used for the Camelyon and NSCLC datasets? This information would greatly help us follow your implementation accurately and could significantly extend the impact of your important work.

Thank you very much for your response, and I apologize for the delayed reply.

In the challenging few-shot WSI setting, we found that the results are highly sensitive to the random seed configuration. Specifically, different selections of few-shot WSIs can significantly impact the outcomes, particularly when the chosen training samples are not representative or when there is a substantial mismatch between the selected samples and the corresponding language prompts. Unfortunately, there is currently no standardized benchmark dataset available for few-shot WSIs. In our code, we assessed the performance across various seed settings to ensure consistency, applying the same seed across all methods.

During training on the server, we used seeds to preserve the settings and recorded the results manually in a notepad, without making any manual dataset splits. When my schedule permits, I will take the time to identify a dataset split that is better suited to this method.
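
As a side note, one lightweight way to make such seed-based splits shareable is to dump the seed and the selected slide names to a text file (a hypothetical helper, not part of the released code):

```python
def save_split(path, seed, train_ids, eval_ids):
    """Record the seed and the exact slide names so a split can be shared
    without re-running the sampling code. (Hypothetical sketch.)"""
    with open(path, "w") as f:
        f.write(f"seed: {seed}\n")
        f.write("train:\n")
        f.writelines(s + "\n" for s in train_ids)
        f.write("eval:\n")
        f.writelines(s + "\n" for s in eval_ids)
```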

@miccaiif
Owner

> The setting is not reasonable, because few-shot performance depends heavily on the split. Repeating the experiment 5 times with different data splits is the more standard protocol, rather than using a fixed split for every run. If one favorable split is used, the results are hard to reproduce.

Thank you for your comment. Indeed! In our initial submission, we evaluated the results with 5 random seeds, reporting the average and variance across the 5 resulting random splits while keeping the same settings for all methods. However, most reviewers criticized this use of 5 random seeds and suggested using the same seed for all five runs. Therefore, in the final version, we followed this suggestion, running five times with the same seed while maintaining identical settings across all methods.
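
For reference, the aggregation described above (mean and variance of AUC over repeated runs under identical settings) could be computed along these lines; the values passed in are placeholders, not results from the paper:

```python
import statistics

def summarize_runs(aucs):
    """Report mean and population variance of AUC over repeated runs."""
    mean = statistics.mean(aucs)
    var = statistics.pvariance(aucs)  # use statistics.stdev for a std. dev. instead
    return mean, var

# Example with placeholder run results:
# mean_auc, var_auc = summarize_runs([auc_run1, auc_run2, auc_run3, auc_run4, auc_run5])
```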
