Issue about Metabric dataset #1

Open
heyhola opened this issue Nov 24, 2022 · 3 comments

@heyhola

heyhola commented Nov 24, 2022

Hello Rahul,
Thanks for the great work! I tried to get the METABRIC dataset, but I applied for access a long time ago and still have not been approved. I see that you have this dataset here, but the columns are not labeled. Could you tell me what x1-x9 in metabric.csv stand for?
Best wishes!
Siqi

@sourcesync

@heyhola I had the same question. Look at the paper and especially this section:

METABRIC: The Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) is a clinical dataset which consists of gene expressions used to determine different subgroups of breast cancer. We consider the data for 1,904 patients with each patient having 9 covariates - 4 gene indicators (MKI67, EGFR, PGR, and ERBB2) and 5 clinical features (hormone treatment indicator, radiotherapy indicator, chemotherapy indicator, ER-positive indicator, age at diagnosis). Furthermore, out of the total 1,904 patients, 801 (42.06%) are right-censored, and the rest are deceased (event). We obtained the DAG as depicted in Fig. 3 using a modified DAG-GNN algorithm.

@rahulk207
Owner

Hi folks, and sorry @heyhola for missing your comment. For METABRIC, please assume x1-x9 are in the same order as listed in the paper snippet shared above by @sourcesync. For GBSG, however, this isn't the case: the column order differs from the one given in the paper. I will check and update the repo for that if you need it. Sorry for the confusion.
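
For anyone who wants labeled columns, here is a minimal sketch of that ordering applied to a CSV header. It assumes the header literally uses the names x1 through x9; the descriptive labels (for example `hormone_treatment`) and the helper names `relabel_header` and `read_labeled` are made up for illustration and do not come from the repo. It only covers METABRIC, since the GBSG order is not confirmed here.

```python
import csv

# Order follows the paper snippet quoted above: 4 gene indicators, then 5 clinical features.
# The snake_case labels are illustrative, not names used in the repo.
METABRIC_COLUMNS = [
    "MKI67", "EGFR", "PGR", "ERBB2",
    "hormone_treatment", "radiotherapy", "chemotherapy",
    "er_positive", "age_at_diagnosis",
]

# Assumption: the CSV header uses x1..x9 (1-based), as described in this thread.
COLUMN_MAP = {f"x{i}": name for i, name in enumerate(METABRIC_COLUMNS, start=1)}


def relabel_header(header):
    """Replace x1..x9 with descriptive names; leave any other column untouched."""
    return [COLUMN_MAP.get(col, col) for col in header]


def read_labeled(lines):
    """Read CSV text from an iterable of lines and return rows keyed by the new names."""
    reader = csv.reader(lines)
    header = relabel_header(next(reader))
    return [dict(zip(header, row)) for row in reader]
```

To use it on the file from the repo, something like `with open("metabric.csv", newline="") as f: rows = read_labeled(f)` should work, provided the header matches the assumption above. Columns not named x1-x9 pass through unchanged.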

@sourcesync

Thanks @rahulk207. Just so you know, I'm currently running "main.py" and it looks good so far. I'm trying to reproduce your results on my machine before exploring some ideas around attention and transformer models. It would also be nice to know what the numeric gene expression values mean (I'm not a domain expert in this area). Thanks for your follow-up!
