Calculated LAI is too large #171

Open
yaoxkkkkk opened this issue May 21, 2024 · 6 comments

@yaoxkkkkk

yaoxkkkkk commented May 21, 2024

Thank you for your work on these tools. I am using EDTA (v2.2.1) for de novo TE annotation. I used the EDTA output files fasta.mod.EDTA.raw/LTR/.fasta.mod.pass.list and .fasta.mod.EDTA.anno/.fasta.mod.out as input to calculate LAI, but the resulting LAI values seem abnormally high:

Chr	From	To	Intact	Total	raw_LAI	LAI
whole_genome	1	521293298	0.0515	0.0667	77.13	74.94
Chr01_hap1	1	3000000	0.0355	0.0234	100.1	97.91
Chr01_hap1	300001	3300000	0.0423	0.0256	100.1	97.91
Chr01_hap1	600001	3600000	0.0475	0.0277	100.1	97.91
Chr01_hap1	900001	3900000	0.0475	0.0285	100.1	97.91
Chr01_hap1	1200001	4200000	0.0415	0.0262	100.1	97.91

For another haplotype:

Chr	From	To	Intact	Total	raw_LAI	LAI
whole_genome	1	503369137	0.0497	0.0594	83.75	82.73
Chr01_hap2	1	3000000	0.0285	0.0232	100.1	99.08
Chr01_hap2	300001	3300000	0.0263	0.0277	94.79	93.77
Chr01_hap2	600001	3600000	0.0311	0.0299	100.1	99.08
Chr01_hap2	900001	3900000	0.0287	0.0286	100.1	99.08
Chr01_hap2	1200001	4200000	0.0287	0.0277	100.1	99.08
Chr01_hap2	1500001	4500000	0.0303	0.0293	100.1	99.08
Chr01_hap2	1800001	4800000	0.0289	0.0283	100.1	99.08
Chr01_hap2	2100001	5100000	0.0297	0.0289	100.1	99.08
Chr01_hap2	2400001	5400000	0.0197	0.0284	69.32	68.30
Chr01_hap2	2700001	5700000	0.0214	0.0315	67.83	66.81
Chr01_hap2	3000001	6000000	0.0212	0.0304	69.72	68.70

Here is the log of LAI:

######################################
### LTR Assembly Index (LAI) beta3.2 ###
######################################

Developer: Shujun Ou

Please cite:

Ou S., Chen J. and Jiang N. (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. gky730: https://doi.org/10.1093/nar/gky730

Parameters: -t 64 -genome /dssg/home/acct-jiang.lu/jiang.lu/.yaoxk_dir/genome_denovo/SH/04-asm/03-final/hap2/Vitis_amurensis_ShuangHong_hap2.fasta -intact /dssg/home/acct-jiang.lu/jiang.lu/.yaoxk_dir/genome_denovo/SH/05-ant/01-repeat/hap2/repeat_library/TE/Vitis_amurensis_ShuangHong_hap2.fasta.mod.EDTA.raw/LTR/Vitis_amurensis_ShuangHong_hap2.fasta.mod.pass.list -all /dssg/home/acct-jiang.lu/jiang.lu/.yaoxk_dir/genome_denovo/SH/05-ant/01-repeat/hap2/repeat_library/TE/Vitis_amurensis_ShuangHong_hap2.fasta.mod.EDTA.anno/Vitis_amurensis_ShuangHong_hap2.fasta.mod.out


Tue May 21 13:29:55 CST 2024	Dependency checking: Passed!
Tue May 21 13:29:55 CST 2024	Calculation of LAI will be based on the whole genome.
				Please use the -mono parameter if your genome is a recent ployploid, otherwise high identity between LTR homeologues will overcorrect raw LAI scores and result in low LAI.
Tue May 21 13:29:55 CST 2024	Estimate the identity of LTR sequences in the genome: standard mode
Tue May 21 13:35:26 CST 2024	The identity of LTR sequences: 94.3636850089836%
Tue May 21 13:35:26 CST 2024	Calculate LAI:

						Done!

Tue May 21 13:35:35 CST 2024	Result file: Vitis_amurensis_ShuangHong_hap2.fasta.mod.out.LAI

				You may use either raw_LAI or LAI for intraspecific comparison
				but please use ONLY LAI for interspecific comparison

Looking forward to your advice!
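
For reference, the gap between the raw_LAI and LAI columns in the hap2 table can be checked against the identity value printed in the log. The sketch below assumes the correction described in the cited LAI paper, LAI = raw_LAI + 2.8138 × (94 − whole-genome LTR identity). The constants and the function name `corrected_lai` are illustrative assumptions and are not taken from the LAI script itself.

```python
# Minimal sketch of the identity correction, assuming the formula from
# Ou et al. (2018): LAI = raw_LAI + 2.8138 * (94 - LTR identity in %).
# These names are hypothetical; they are not part of the LAI program.
SLOPE = 2.8138
BASELINE_IDENTITY = 94.0


def corrected_lai(raw_lai: float, identity_pct: float) -> float:
    """Apply the assumed identity correction to a raw LAI score."""
    return raw_lai + SLOPE * (BASELINE_IDENTITY - identity_pct)


# Identity reported in the hap2 log above.
identity = 94.3636850089836

# raw_LAI values copied from the hap2 table above.
for raw in (83.75, 100.1, 94.79, 69.32):
    print(raw, round(corrected_lai(raw, identity), 2))
```

Under this assumption the corrected values agree with the LAI column of the hap2 table (82.73, 99.08, 93.77 and 68.30), which suggests the correction step behaves as expected and the unusual scores come from the Intact and Total inputs.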

@yaoxkkkkk
Author

I have changed the input file to .fasta.mod.EDTA.anno/.mod.EDTA.TEanno.out; here is the result file:

Chr	From	To	Intact	Total	raw_LAI	LAI
whole_genome	1	521293298	0.0515	0.0605	85.11	83.61
Chr01_hap1	1	3000000	0.0355	0.0214	100.1	98.60
Chr01_hap1	300001	3300000	0.0423	0.0235	100.1	98.60
Chr01_hap1	600001	3600000	0.0475	0.0256	100.1	98.60
Chr01_hap1	900001	3900000	0.0475	0.0267	100.1	98.60
Chr01_hap1	1200001	4200000	0.0415	0.0247	100.1	98.60
Chr01_hap1	1500001	4500000	0.0421	0.0275	100.1	98.60
Chr01_hap1	1800001	4800000	0.0422	0.0273	100.1	98.60
Chr01_hap1	2100001	5100000	0.0435	0.0293	100.1	98.60
Chr01_hap1	2400001	5400000	0.0296	0.0241	100.1	98.60
Chr01_hap1	2700001	5700000	0.0242	0.0293	82.67	81.17
Chr01_hap1	3000001	6000000	0.0294	0.0312	94.45	92.95
Chr01_hap1	3300001	6300000	0.0210	0.0304	68.95	67.45
Chr01_hap1	3600001	6600000	0.0232	0.0315	73.63	72.13
Chr01_hap1	3900001	6900000	0.0271	0.0328	82.54	81.04
Chr01_hap1	4200001	7200000	0.0275	0.0361	76.26	74.76

@yaoxkkkkk
Author

I realised that EDTA seems to mask only long TEs (>=1 kb). Maybe I should additionally run RepeatMasker to obtain the .out file?

@yaoxkkkkk
Author

yaoxkkkkk commented May 24, 2024

OK, I have tried that, and the results are similar. I don't know what is going wrong. I noticed that the Intact column is the same, but the Total column is slightly different.

Chr	From	To	Intact	Total	raw_LAI	LAI
whole_genome	1	521293298	0.0515	0.0682	75.46	73.50
Chr01_hap1	1	3000000	0.0355	0.0261	100.1	98.14
Chr01_hap1	300001	3300000	0.0423	0.0278	100.1	98.14
Chr01_hap1	600001	3600000	0.0475	0.0294	100.1	98.14
Chr01_hap1	900001	3900000	0.0475	0.0291	100.1	98.14
Chr01_hap1	1200001	4200000	0.0415	0.0271	100.1	98.14
Chr01_hap1	1500001	4500000	0.0421	0.0304	100.1	98.14
Chr01_hap1	1800001	4800000	0.0422	0.0303	100.1	98.14
Chr01_hap1	2100001	5100000	0.0435	0.0331	100.1	98.14
Chr01_hap1	2400001	5400000	0.0296	0.0290	100.1	98.14
Chr01_hap1	2700001	5700000	0.0242	0.0347	69.93	67.97
Chr01_hap1	3000001	6000000	0.0294	0.0364	80.82	78.86
Chr01_hap1	3300001	6300000	0.0210	0.0359	58.34	56.38
Chr01_hap1	3600001	6600000	0.0232	0.0377	61.62	59.66
Chr01_hap1	3900001	6900000	0.0271	0.0401	67.46	65.50
Chr01_hap1	4200001	7200000	0.0275	0.0432	63.75	61.79
Chr01_hap1	4500001	7500000	0.0253	0.0431	58.69	56.73

@oushujun
Owner

Hello,

Sorry for the delayed response. Your total LTR estimation seems off. It is unlikely that the total LTR content of your genome is only 6.82% when 5.15% of the genome is intact LTR. Please double-check.

Thanks!
Shujun
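
One way to act on this is to scan the LAI result table for windows where the Intact fraction exceeds the Total fraction. The sketch below assumes that intact LTR-RTs are a subset of all LTR sequence, so Intact should not be larger than Total, and that raw_LAI is roughly Intact / Total × 100. The rows are copied from the most recent table in this thread; the function names `suspicious_rows` and `intact_share` are hypothetical helpers, not part of LAI or EDTA.

```python
import csv
import io

# A few rows copied from the latest result table in this thread.
LAI_TABLE = """Chr\tFrom\tTo\tIntact\tTotal\traw_LAI\tLAI
whole_genome\t1\t521293298\t0.0515\t0.0682\t75.46\t73.50
Chr01_hap1\t1\t3000000\t0.0355\t0.0261\t100.1\t98.14
Chr01_hap1\t2700001\t5700000\t0.0242\t0.0347\t69.93\t67.97
"""


def suspicious_rows(table_text):
    """Return rows where Intact > Total (assumed to be inconsistent)."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    flagged = []
    for row in reader:
        intact = float(row["Intact"])
        total = float(row["Total"])
        if intact > total:
            flagged.append(
                (row["Chr"], int(row["From"]), int(row["To"]), intact, total)
            )
    return flagged


def intact_share(intact, total):
    """Percentage of total LTR sequence that is intact (assumed raw_LAI basis)."""
    return 100.0 * intact / total


for chrom, start, end, intact, total in suspicious_rows(LAI_TABLE):
    print(f"{chrom}:{start}-{end} Intact={intact} > Total={total}")

# Whole-genome values from the table above.
print(round(intact_share(0.0515, 0.0682), 1))
```

In the posted tables, the windows flagged this way are the same windows where raw_LAI is reported as 100.1, so the file passed to -all may be missing LTR annotations in those regions.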

@yaoxkkkkk
Author

Your total LTR estimation seems off

Should I rerun the whole pipeline?

@oushujun
Owner

oushujun commented Aug 14, 2024 via email
