make_Lastz on Cactus-447-mammalian-genome dataset #69

KabitaBaral1 · 2024-11-04T20:33:12Z

Hi,
I have a question about running LASTZ the way it was done for the TOGA paper. I have the Cactus 447 mammalian genome dataset. I converted it from HAL to FASTA and removed the ancestral sequences, so I now have two FASTA files from that dataset: one with just the human genome sequence and another with the remaining 446 mammalian genomes combined into one FASTA file. Can I run make_lastz_chains on that query FASTA file? Thank you.

MichaelHiller · 2024-11-05T11:13:27Z

Good question. I think there is no point in extracting the genomic FASTA sequences from the Cactus alignment and then aligning them to human again to get chains. If you want to do that, you can also just start with the full genomes of these species.

But I guess the best approach would be to extract pairwise alignments (in chain format) from the Cactus alignment.
This should hopefully be possible, but questions about how to do it should please be directed to Benedict Paten and the Cactus developers.

KabitaBaral1 · 2024-11-06T19:41:02Z

Hi Michael,
Thank you for getting back to me. I have a couple of follow-up questions.
I am trying to run LASTZ & then TOGA to get coordinates of protein-coding regions for all 447 mammalian genomes in the Cactus dataset.
I thought that similar to your TOGA paper, the approach would be to perform LASTZ and then TOGA on the dataset.
Is there a better way to do this? Or an alternative?
"If you want to do that, you can also start with the full genomes of these species." Could you please elaborate on this?
Thank you

MichaelHiller · 2024-11-07T06:38:54Z

Hi,

The coordinates of all orthologs that TOGA found are in the BED or GTF files we provided. If this is what you need, you don't have to run anything.
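For readers who want to pull coordinates out of such a BED file, here is a minimal sketch, assuming the file follows the standard 12-column BED layout (one line per transcript, with block sizes and block starts in columns 11 and 12). The function name `parse_bed12`, the record type, and the example transcript name are hypothetical and chosen only for illustration; nothing here is part of TOGA itself.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class BedRecord(NamedTuple):
    chrom: str
    start: int          # 0-based, inclusive
    end: int            # 0-based, exclusive
    name: str
    strand: str
    blocks: List[Tuple[int, int]]  # absolute (start, end) of each block


def parse_bed12(lines: Iterable[str]) -> Iterator[BedRecord]:
    """Yield one record per BED12 line, with block coordinates made absolute."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the optional header lines BED files may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        strand = fields[5]
        # Columns 11 and 12 are comma-separated lists, often with a trailing comma.
        sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
        rel_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
        # Block starts are stored relative to the record start.
        blocks = [(start + s, start + s + size) for s, size in zip(rel_starts, sizes)]
        yield BedRecord(chrom, start, end, name, strand, blocks)


# Hypothetical example line, not taken from any real annotation file.
example = ["chr1\t1000\t2000\ttranscript_A\t0\t+\t1100\t1900\t0\t2\t100,200,\t0,800,\n"]
records = list(parse_bed12(example))
for rec in records:
    print(rec.name, rec.chrom, rec.strand, rec.blocks)
```

In practice the input would be an open file handle rather than a list, for example `parse_bed12(open(path))`, since the function accepts any iterable of lines.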

If you have new genomes, then the easiest approach is to align them to a reference using our lastz/chain pipeline and then run TOGA.
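If the genomes are only available as one combined multi-species FASTA, as in the original question, they would first have to be separated into one file per species, on the assumption that the pipeline is run once per reference/query pair. Below is a rough sketch of such a split. It assumes, hypothetically, that every header has the form `>species.contig`; the function name `split_fasta_by_species` and that header convention are illustrative and would need adapting to the real file.

```python
import os
from typing import List


def split_fasta_by_species(fasta_path: str, out_dir: str, sep: str = ".") -> List[str]:
    """Write one FASTA per species; return the sorted species names found.

    Assumes headers look like '>species<sep>contig' (a hypothetical convention).
    """
    os.makedirs(out_dir, exist_ok=True)
    seen = set()
    out = None
    try:
        with open(fasta_path) as fh:
            for line in fh:
                if line.startswith(">"):
                    fields = line[1:].split()
                    header = fields[0] if fields else ""
                    # Split on the first separator only, so contig names may contain it.
                    species, _, contig = header.partition(sep)
                    if not species or not contig:
                        raise ValueError(f"header without species prefix: {line.strip()!r}")
                    if out is not None:
                        out.close()
                    # Truncate on first sight of a species, append afterwards,
                    # so only one output handle is open at a time.
                    mode = "a" if species in seen else "w"
                    seen.add(species)
                    out = open(os.path.join(out_dir, f"{species}.fa"), mode)
                    out.write(f">{contig}\n")
                elif out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return sorted(seen)
```

Each resulting `<species>.fa` could then serve as the query genome for one pairwise run against the reference. As noted earlier in the thread, starting from the original full genome assemblies is an equally valid input.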

Hope this helps
