Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cut offs for Volcano Plots greatly shifted from significance #270

Open
steph-robinson opened this issue Mar 4, 2024 · 0 comments
Open
Assignees

Comments

@steph-robinson
Copy link

steph-robinson commented Mar 4, 2024

Cut-offs_VolcanoPlots

Inserting image up top show the discrepancy I'm seeing. I'm employing the stats from this package by applying select formulas to log2(LFQ) by repeatedly calling the perform_ttest_analysis function in the following function.

def alphastats_ttest(data, s0, fdr): 
    dfs = [df.reset_index() for df in data]
    stat_dfs = []
    t_limits = []
    for df in dfs: 
        grp1colnames = df.columns[df.columns.str.contains('WT')].to_list()
        grp2colnames = df.columns[df.columns.str.contains('Test')].to_list()
        
        stats, tmax = perform_ttest_analysis(
                                            df,  
                                            grp2colnames, 
                                            grp1colnames, 
                                            s0=s0, #Refer to Tusher et al. 2001 for s0 definition. 
                                            n_perm=2,
                                            fdr=fdr, #5% FDR 
                                            id_col="Uniprot",
                                            plot_fdr_line=True,
                                            parallelize=True
        )

        stat_dfs.append(stats)
        t_limits.append(tmax)
        
    return stat_dfs, t_limits

I then calculate df's from the list of returned t_limits, before plotting and getting the volcanoes attached here. Seems that there is a disconnect between how the ttest cut-off is being calculated in the get_MaxS vs the usual perform_ttest_analysis?

Here's the last portion of my code before plotting.

cut_offs = []

for i, df in enumerate(stats):
    n_x, n_y = len(df), len(df)
    s0 = 1.5
    
    cut_off = get_fdr_line(t_limits[i],
                        s0, n_x, n_y, plot=False,
                        fc_s=np.arange(0, 10, 0.05),
                        s_s=np.arange(0.005, 10, 0.05))
    
    cut_off['-logp'] = -np.log(cut_off['pvals']) #transform into -log10 space for plotting
    cut_offs.append(cut_off)
@JuliaS92 JuliaS92 self-assigned this Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants