Cannot reproduce the scores reported in the paper #5

Open
pxliang opened this issue Oct 21, 2024 · 4 comments

pxliang commented Oct 21, 2024

I am trying to reproduce the results from the paper using the code provided in this repository. However, I am getting significantly lower scores than those reported in the paper for the TCGA-Lung dataset. In the TCGA-Lung 16-shot setting, I only reach an AUC of around 0.68 rather than the 0.82 reported in the paper. Since the pre-processing details are not given, I worked at 10x magnification and used CLIP to extract features. Could you please give more instructions on reproducing the scores reported in the paper?

@miccaiif (Owner)

Thank you very much for your interest in our work!

In our experiments, we used 20x magnification for the TCGA dataset, dividing it into 224x224 patches, and utilized CLIP for feature extraction. In the challenging few-shot WSI setting, we observed that the results are highly sensitive to the random seed configuration. Specifically, different selections of few-shot WSIs can significantly affect the outcomes, especially if the chosen training samples are not representative or if there is a substantial difference between the selected samples and the corresponding language prompts. Unfortunately, the field currently lacks a standardized benchmark dataset for few-shot WSIs. In our code, we explored the performance across various seed settings for training samples, ensuring that all methods used the same seed.
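
For reference, a minimal sketch of that kind of preprocessing, assuming 224x224 patches have already been cut from the 20x WSIs and the OpenAI clip package is used. The backbone, folder layout, and function name here are illustrative assumptions, not the exact script used for the paper:

import os, glob
import numpy as np
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

# Illustrative CLIP patch feature extraction; backbone and folder layout are assumptions.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

def extract_slide_features(patch_dir):
    # patch_dir holds 224x224 patch images cut from one 20x WSI
    feats = []
    with torch.no_grad():
        for path in sorted(glob.glob(os.path.join(patch_dir, "*.png"))):
            image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
            feats.append(model.encode_image(image).squeeze(0).cpu().numpy())
    return np.stack(feats)  # [num_patches, feat_dim]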


pxliang commented Oct 25, 2024

Thank you very much for your reply! I think that in the pre-processing step, the patch features from the CLIP model need to be normalized; I got better results using normalized patch features.
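
For anyone hitting the same issue, a minimal sketch of the normalization I mean (L2-normalizing each patch feature). This is my own preprocessing step, not necessarily what the authors did; the file name just follows this repo's loader:

import numpy as np

# L2-normalize each CLIP patch feature (my own preprocessing step, not confirmed by the authors)
feats = np.load("train_feats.npy")  # [num_patches, feat_dim]
feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
np.save("train_feats_normalized.npy", feats)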

I have another question regarding the CoOp baseline: what pooling strategy did you use for the CoOp baseline?


pxliang commented Oct 27, 2024

And by the way, what magnification did you use for the Camelyon16 dataset? I saw that in the released code, Camelyon16 uses 5x. Is that the setting behind the results reported in the paper?

@invisprints

@pxliang Hi, could you please tell me how the files train_feats.npy, train_corresponding_slide_label.npy, train_corresponding_slide_index.npy, and train_corresponding_slide_name.npy were generated? Many thanks!

self.all_patches = np.load(os.path.join(feat_dir, "train_feats.npy"))
self.patch_corresponding_slide_label = np.load(os.path.join(feat_dir, "train_corresponding_slide_label.npy"))
self.patch_corresponding_slide_index = np.load(os.path.join(feat_dir, "train_corresponding_slide_index.npy"))
self.patch_corresponding_slide_name = np.load(os.path.join(feat_dir, "train_corresponding_slide_name.npy"))
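
My current guess is something like the following: concatenate the per-slide patch features and keep parallel per-patch arrays for the slide label, index, and name. I am not sure this matches your script, so the folder layout and label dictionary below are made up:

import os
import numpy as np

# Guess at how the four arrays might be assembled; the feature dir and label dict are hypothetical.
slide_feat_dir = "slide_features"                # one <slide_name>.npy per slide (hypothetical)
slide_labels = {"slide_001": 0, "slide_002": 1}  # slide-level labels (hypothetical)

all_feats, patch_label, patch_index, patch_name = [], [], [], []
for idx, (name, label) in enumerate(sorted(slide_labels.items())):
    feats = np.load(os.path.join(slide_feat_dir, name + ".npy"))  # [num_patches, feat_dim]
    all_feats.append(feats)
    patch_label.append(np.full(len(feats), label, dtype=np.int64))
    patch_index.append(np.full(len(feats), idx, dtype=np.int64))
    patch_name.append(np.array([name] * len(feats)))

np.save("train_feats.npy", np.concatenate(all_feats))
np.save("train_corresponding_slide_label.npy", np.concatenate(patch_label))
np.save("train_corresponding_slide_index.npy", np.concatenate(patch_index))
np.save("train_corresponding_slide_name.npy", np.concatenate(patch_name))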
