Reproducibility of Supplementary Table 7 #57

Open
Zengggg opened this issue Nov 15, 2024 · 0 comments

Hi,

I am having trouble reproducing the results in Supplementary Table 7.

I preprocessed the PBMC 10k dataset following https://github.com/Durenlab/LINGER/blob/main/docs/PBMC.md, which left 9543 barcodes, 25485 genes and 143885 peaks. After training the model, I inferred GRNs for all cell types by setting "celltype='all'". For evaluation, I downloaded 20 ChIP-seq datasets from CistromeDB and kept only the targets with score > 0.

I then ran the code below:

import pandas as pd
from sklearn import metrics

def cal_auc_aupr(df, df_ref):
    # Left-merge on the shared columns; pairs absent from the ChIP-seq reference get real_score = 0
    df = pd.merge(df, df_ref, how='left').fillna(0)
    true = df.real_score
    scores = df.score
    fpr, tpr, thresholds = metrics.roc_curve(true, scores, pos_label=1)
    AUC = metrics.auc(fpr, tpr)

    precision, recall, _thresholds2 = metrics.precision_recall_curve(true, scores)
    AUPRC = metrics.auc(recall, precision)

    # The third value is the fraction of positives among the evaluated pairs
    return AUC, AUPRC, df.real_score.sum() / len(df)

# Each row of GRN_info.csv (last column dropped) is unpacked as: Cistrome ID, TF, cell type
df_grn = pd.read_csv("gold_standard_from_cistromeDB/GRN_info.csv", sep='\t').iloc[:, :-1]
rows = []
for ID, TF, cell_type in df_grn.itertuples(index=False):  # avoid shadowing the built-in `type`
    out_file = f"/mnt/second19T/zengyp/project/LINGER/output/cell_type_specific_trans_regulatory_{cell_type}.txt"
    # Transpose, then flatten the matrix into long (TF, TG, score) rows
    df = pd.read_csv(out_file, sep='\t', index_col=0).T
    df = df.stack().reset_index()
    df.columns = ['TF', 'TG', 'score']
    df_TF = df.loc[df.TF == TF]
    # ChIP-seq targets for this TF; every listed pair is a positive
    df_ref = pd.read_csv(f"gold_standard_from_cistromeDB/{ID}_{TF}.csv").iloc[:, :2]
    df_ref['real_score'] = 1
    AUC, AUPRC, TP = cal_auc_aupr(df_TF, df_ref)
    rows.append({'CistromeID': ID, 'TF': TF, 'cell_type': cell_type, 'AUROC': AUC, 'AUPRC': AUPRC, 'TP': TP, 'len(df)': len(df_TF)})
df_eval = pd.DataFrame(rows, columns=['CistromeID', 'TF', 'cell_type', 'AUROC', 'AUPRC', 'TP', 'len(df)'])
print(df_eval)
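
To make the labelling step concrete, here is a minimal toy sketch of how I understand the evaluation: predicted TF-TG pairs are left-merged with the ChIP-seq targets, unmatched pairs are labelled 0, and the metrics are computed from those labels. The gene names, scores and the explicit `on=["TF", "TG"]` merge keys are made up for illustration (my real reference files may use different column names). `average_precision_score` is shown only as an alternative to the trapezoidal `metrics.auc(recall, precision)`; I do not know which one the paper used.

```python
import pandas as pd
from sklearn import metrics

# Toy predictions for one TF (hypothetical values, not real LINGER output).
pred = pd.DataFrame({
    "TF": ["CTCF"] * 5,
    "TG": ["G1", "G2", "G3", "G4", "G5"],
    "score": [0.9, 0.8, 0.4, 0.3, 0.1],
})

# Toy ChIP-seq reference: every listed pair is treated as a positive.
ref = pd.DataFrame({"TF": ["CTCF", "CTCF"], "TG": ["G1", "G4"]})
ref["real_score"] = 1


def label_predictions(pred, ref):
    """Attach 0/1 labels to predictions; pairs missing from ref become 0."""
    merged = pred.merge(ref, on=["TF", "TG"], how="left")
    merged["real_score"] = merged["real_score"].fillna(0).astype(int)
    return merged


labeled = label_predictions(pred, ref)
auroc = metrics.roc_auc_score(labeled["real_score"], labeled["score"])
ap = metrics.average_precision_score(labeled["real_score"], labeled["score"])
print(labeled)
print(f"AUROC={auroc:.3f}  AP={ap:.3f}")
```

Merging on explicit keys guards against an accidental join on some other shared column, which would silently change which pairs count as positives.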

The result is:
image

These values differ from Supplementary Table 7. Did you apply additional filtering to the targets in the ChIP-seq datasets? Also, how is the random predictor defined? Is it the number of true positives in the specific network divided by the number of genes in the scRNA-seq data after preprocessing?

Could you point out what is wrong with my evaluation?
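
For reference, my working assumption about the random predictor (not confirmed by the authors) is the usual one: a random ranking has an expected AUROC of 0.5 and an expected AUPRC equal to the fraction of positives among the evaluated genes. The sketch below uses hypothetical helper names and takes the evaluated gene set as the denominator, which may not match the paper's definition.

```python
import numpy as np


def random_baseline(labels):
    """Expected scores of a random ranking for a 0/1 label vector.

    Assumption: the baseline AUPRC is the positive fraction among the
    evaluated genes; the paper may use a different denominator.
    """
    labels = np.asarray(labels)
    if labels.sum() == 0:
        raise ValueError("no positives: AUPRC baseline is undefined")
    return {"auroc": 0.5, "auprc": float(labels.mean())}


def aupr_ratio(auprc, labels):
    """AUPRC divided by the random-predictor AUPRC (hypothetical helper)."""
    return auprc / random_baseline(labels)["auprc"]


# Toy labels: 2 ChIP-seq targets among 5 evaluated genes.
toy_labels = [1, 0, 0, 1, 0]
baseline = random_baseline(toy_labels)
ratio = aupr_ratio(0.75, toy_labels)
print(baseline, ratio)
```

If the denominator were instead all genes retained after preprocessing, the baseline would shrink and the ratio would grow, so the choice matters when comparing against the table.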
Thank you!
