Hi
I recently downloaded the May 2024 version of PRScs and updated my scripts as per our discussion in issue #76. I wanted to send you my tested code (parse_genet_newColNames.py.txt and PRScs_colnames.py.txt).
These scripts work well with GWAS summary statistics downloaded from the GWAS Catalog, which typically have the following header:
zcat ../GWAS/GCST90399677.h.tsv.gz | head
chromosome base_pair_location effect_allele other_allele beta standard_error effect_allele_frequency p_value rsid is_strand_flip rs_id N_ctrl n_bbk is_diff_AF_gnomAD n_dataset inv_var-het_p direction N_case hm_coordinate_conversion hm_code variant_id
1 16226 A AG -0.1643 0.079081 0.01037 0.037739999999999996 rs755466349 no rs755466349 330526 2 no 2 0.6037 ??--?????? 24773 lo 10 1_16226_AG_A
1 48186 G T -0.20347 0.15655 0.003232 0.1937 rs199900651 no rs199900651 269406 2 no 2 0.5154 ???-??-??? 17186 lo 10 1_48186_T_G
1 55326 C T 0.0046873 0.14151 0.07157000000000001 0.9736 rs3107975 no rs3107975 94636 2 no 2 0.9798 ?+???+???? 1282 lo 10 1_55326_T_C
This is how to use the script above:
python /softwares/PRScs/PRScs_colnames.py --ref_dir=$ldref/ldblk_1kg_eur --bim_prefix=$bim_dir/inputfile --sst_file=$SUM_STATS_FILE --n_gwas=$GWAS_SAMPLE_SIZE --out_dir=$out_dir/output.PRScs --SNP=rs_id --A1=effect_allele --A2=other_allele --BETA=beta --P=p_value
I do have a follow-up request.
Quite often, GWAS summary statistics files provide only the tested allele/minor allele/A1. Could you please modify PRScs so that it can work with such input? It should not be hard to find the SNPs common to the LD reference, BIM, and summary statistics files by matching on rsID (SNP ID) and the A1 allele at a minimum, adjusting the beta if A1 is the major allele instead of the minor one. The alleles can be looked up in the HapMap SNP list provided with your software at $PRScsREFDIR/ldblk_1kg_eur/snpinfo_1kg_hm3
Example:
This summary statistics file from the GWAS Catalog (GCST006479 [https://www.ebi.ac.uk/gwas/studies/GCST006479]) does not have an A2 column, but provides all the other required columns.
SNP ALLELE iscores NBETA-clinical_c_K57 NSE-clinical_c_K57 PV-clinical_c_K57
10:62535_C_A A 0.8509559999999999 -0.04962 0.087016 0.56852
10:66208_T_C C 0.239414 -0.05802 0.051094 0.25614000000000003
10:67991_A_C C 0.299157 -0.022293 0.036826 0.54495
rs11252546 C 0.9844569999999999 -0.00016589 0.0005241000000000001 0.7516
rs12255619 C 0.995585 -0.00072331 0.00088709 0.41486000000000006
rs7909677 G 1.0 -0.00067211 0.00088585 0.44803000000000004
rs10904494 C 0.9968440000000001 -4.3866e-05 0.00051934 0.93269
rs11591988 T 0.978547 -0.00018362 0.00080765 0.82015
rs4508132 C 0.99779 -0.00084295 0.00066752 0.20665999999999998
rs9419461 T 0.9875889999999999 -0.0004753 0.00071174 0.50427
rs10904561 G 0.99069 -0.00021088 0.00053584 0.69391
rs11253478 T 0.978096 -0.00016417 0.00080747 0.8388899999999999
rs4495823 A 0.995431 -0.0008757 0.00066695 0.18919
The SNPs it has in common with the $PRScsREFDIR/ldblk_1kg_eur/snpinfo_1kg_hm3 file are:
10 rs12255619 98481 C A 0.066600
10 rs11252546 104427 C T 0.369800
10 rs7909677 111955 G A 0.067590
10 rs10904494 113934 C A 0.368800
10 rs9419461 124767 T C 0.126200
10 rs11591988 126070 T C 0.103400
10 rs4508132 131636 T C 0.154100
10 rs10904561 135656 G T 0.351900
10 rs7917054 135708 A G 0.469200
It would be very helpful if PRScs worked when the A2 column is not provided in the summary statistics file.
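To make the request concrete, here is a rough sketch of the lookup described above, written as a preprocessing step rather than a change to PRScs. It is only an illustration: the function names `load_ref_alleles` and `fill_a2` are hypothetical, and it assumes the reference file is laid out as CHR, SNP, BP, A1, A2, MAF, as in the excerpt above. SNPs are matched by rsID, and the reference allele that is not A1 is taken as A2. SNPs whose A1 matches neither reference allele are dropped, and strand flips are not handled. The beta is left unchanged, on the assumption (as far as I understand the code) that PRScs aligns effect directions to the reference itself once both alleles are present.

```python
def load_ref_alleles(snpinfo_lines):
    """Map rsID -> (A1, A2), assuming columns CHR SNP BP A1 A2 MAF."""
    ref = {}
    for line in snpinfo_lines:
        fields = line.split()
        # Skip blank or short lines and a header row, if one is present.
        if len(fields) < 5 or fields[0] == "CHR":
            continue
        ref[fields[1]] = (fields[3].upper(), fields[4].upper())
    return ref


def fill_a2(sst_lines, ref, snp_col, a1_col, beta_col, p_col):
    """Yield (SNP, A1, A2, BETA, P) for SNPs whose A1 is a reference allele."""
    lines = iter(sst_lines)
    header = next(lines).split()
    idx = {name: header.index(name) for name in (snp_col, a1_col, beta_col, p_col)}
    for line in lines:
        fields = line.split()
        if len(fields) < len(header):
            continue
        snp = fields[idx[snp_col]]
        a1 = fields[idx[a1_col]].upper()
        if snp not in ref:
            continue  # not in the reference panel
        ref_a1, ref_a2 = ref[snp]
        if a1 == ref_a1:
            a2 = ref_a2
        elif a1 == ref_a2:
            a2 = ref_a1
        else:
            continue  # allele mismatch (or strand flip, not handled here)
        yield snp, a1, a2, fields[idx[beta_col]], fields[idx[p_col]]


# Small demo using rows copied from the excerpts above.
REF_EXCERPT = [
    "10 rs12255619 98481 C A 0.066600",
    "10 rs4508132 131636 T C 0.154100",
]
SST_EXCERPT = [
    "SNP ALLELE iscores NBETA-clinical_c_K57 NSE-clinical_c_K57 PV-clinical_c_K57",
    "10:62535_C_A A 0.8509559999999999 -0.04962 0.087016 0.56852",
    "rs12255619 C 0.995585 -0.00072331 0.00088709 0.41486000000000006",
    "rs4508132 C 0.99779 -0.00084295 0.00066752 0.20665999999999998",
]

ref = load_ref_alleles(REF_EXCERPT)
rows = list(fill_a2(SST_EXCERPT, ref, "SNP", "ALLELE",
                    "NBETA-clinical_c_K57", "PV-clinical_c_K57"))

print("SNP\tA1\tA2\tBETA\tP")
for row in rows:
    print("\t".join(row))
```

Note that rs4508132 is listed as T/C in the reference while the summary statistics report C, so the sketch writes A1=C and A2=T and keeps the beta relative to C.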
Hope my scripts help others too.
Thank you for sharing the scripts—I believe they will be beneficial for many people.
Regarding your request, currently I don’t think I have the bandwidth to modify PRScs to accommodate the new format. GWAS summary statistics come in various formats, some of which are not best practices (such as failing to report A2), making it challenging to accommodate all variations. Therefore, I've decided to leave it to users to preprocess the summary statistics into a specified format.
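For anyone preprocessing their own files in the meantime, a minimal sketch of that step might look like the following. It is not part of PRScs: the function name `to_prscs_format` and the `colmap` dictionary are hypothetical, the input is assumed to be tab-separated, and the source column names are the ones from the command line earlier in this thread. The output uses the SNP, A1, A2, BETA, P header that PRScs expects.

```python
import csv
import io

PRSCS_COLUMNS = ["SNP", "A1", "A2", "BETA", "P"]


def to_prscs_format(src, dst, colmap):
    """Copy the columns named in colmap from src to dst under PRScs headers.

    src and dst are text file objects; colmap maps each PRScs column name
    to the matching column name in the input. Returns the rows written.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(PRSCS_COLUMNS)
    written = 0
    for row in reader:
        values = [row[colmap[col]] for col in PRSCS_COLUMNS]
        # Drop rows with a missing value in any required column.
        if any(v is None or v in ("", "NA") for v in values):
            continue
        values[1] = values[1].upper()  # A1
        values[2] = values[2].upper()  # A2
        writer.writerow(values)
        written += 1
    return written


# Demo on a cut-down version of the GWAS Catalog excerpt from this thread.
colmap = {"SNP": "rs_id", "A1": "effect_allele", "A2": "other_allele",
          "BETA": "beta", "P": "p_value"}
demo = "\n".join("\t".join(fields) for fields in [
    ["rs_id", "effect_allele", "other_allele", "beta", "standard_error", "p_value"],
    ["rs755466349", "A", "AG", "-0.1643", "0.079081", "0.037739999999999996"],
    ["rs199900651", "G", "T", "-0.20347", "0.15655", "0.1937"],
]) + "\n"

out = io.StringIO()
kept = to_prscs_format(io.StringIO(demo), out, colmap)
print(out.getvalue(), end="")
```

For a real file, `src` could be opened with `gzip.open(path, "rt")` and `dst` with `open(path, "w")`.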
However, as I mentioned in issue 76, we are working on an algorithmic extension of PRScs, along with command-line options to select columns, which will accommodate a much larger range of formats. We hope to release those tools soon.