Pull additional metadata as header file #9

Closed
priyanka-surana opened this issue Jun 27, 2023 · 1 comment · Fixed by #21
Labels: feature (Requests for new features), user request (Requests made by users and public)


priyanka-surana commented Jun 27, 2023

Description of feature

Include additional metadata in aligned file headers. Requested by Shane.

I would like/expect some additional information in the @SQ headers, especially if these are to go to the ENA.

  • SP (species)
  • AS (assembly identifier) - the GCA accession
  • AN (alternate name) - the chromosome names that have been assigned, so that we can see, for example, which chromosomes are X/Y/W/Z/MT etc. I would argue that these should be the primary names and the INSDC accessions the alternates, but that would probably have to be discussed.
  • I would also suggest pointing to the NCBI version of the assembly in the UR tag rather than a location on Sanger disk, which is unreadable outside of Sanger and exposes our directory structure to the outside.
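
A minimal sketch of what the request above amounts to, assuming the header is handled as plain SAM text: add or overwrite the SP, AS, AN and UR tags on each @SQ line. The function name `augment_sq_line`, the accession, the species name and the URL are all hypothetical placeholders for illustration, not values or code from the pipeline.

```python
def augment_sq_line(line, extra_tags):
    """Return an @SQ header line with extra_tags added or overwritten.

    Lines that are not @SQ records are returned unchanged (minus the newline).
    """
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@SQ":
        return "\t".join(fields)
    tags = {}
    for field in fields[1:]:
        # Split on the first colon only, since values such as URLs contain colons.
        key, _, value = field.partition(":")
        tags[key] = value
    # Existing tags keep their position; new tags are appended in insertion order.
    tags.update(extra_tags)
    return "\t".join(["@SQ"] + [f"{key}:{value}" for key, value in tags.items()])


# Hypothetical input: an @SQ line whose UR tag points at a local path.
original = "@SQ\tSN:OX000001.1\tLN:1000\tUR:/path/on/local/disk.fa"
updated = augment_sq_line(
    original,
    {
        "SP": "Example species",                     # placeholder species name
        "AS": "GCA_000000000.1",                     # placeholder GCA accession
        "AN": "X",                                   # assigned chromosome name
        "UR": "https://example.org/GCA_000000000.1", # placeholder public URL
    },
)
print(updated)
```

Whether the assigned chromosome name belongs in AN or in SN is the open question raised in the request; the sketch follows the AN option only because it leaves the existing sequence names untouched.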
@priyanka-surana priyanka-surana added enhancement Improvement of the existing features feature Requests for new features labels Jun 27, 2023
@muffato muffato transferred this issue from sanger-tol/readmapping Jul 12, 2023
@muffato muffato linked a pull request Sep 21, 2023 that will close this issue
@muffato muffato removed a link to a pull request Sep 21, 2023
@muffato muffato removed the enhancement Improvement of the existing features label Jun 1, 2024
@muffato muffato changed the title Include additional metadata in aligned file headers Pull additional metadata as header file Jun 1, 2024
@muffato muffato added the user request Requests made by users and public label Jun 1, 2024
tkchafin commented

Posting a link for reference on the expected BAM header fields for ENA submission:

@HD	VN:1.4	GO:none	SO:coordinate
@SQ	SN:2L	LN:49364325	UR:http://www.vectorbase.org/content/anopheles-gambiae-pestchromosomesagamp3fagz	AS:AgamP3	M5:a4da4bafa82830c0a418c5a42138377b	SP:Anopheles gambiae
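
As a rough way to check a header against the example above, the sketch below reports which tags are absent from each @SQ line. The tag list is simply the set of tags that appear in that example, not an authoritative ENA requirement, and the name `missing_sq_tags` is hypothetical.

```python
# Tags present on the @SQ line in the example header; an assumption, not an ENA rule.
EXPECTED_SQ_TAGS = ("SN", "LN", "UR", "AS", "M5", "SP")


def missing_sq_tags(header_text, expected=EXPECTED_SQ_TAGS):
    """Map each sequence name to the expected tags its @SQ line lacks."""
    missing = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ\t"):
            continue
        # Split each field on the first colon only, so URL values stay intact.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        absent = [tag for tag in expected if tag not in tags]
        if absent:
            missing[tags.get("SN", "?")] = absent
    return missing


header = (
    "@HD\tVN:1.4\tGO:none\tSO:coordinate\n"
    "@SQ\tSN:2L\tLN:49364325\t"
    "UR:http://www.vectorbase.org/content/anopheles-gambiae-pestchromosomesagamp3fagz\t"
    "AS:AgamP3\tM5:a4da4bafa82830c0a418c5a42138377b\tSP:Anopheles gambiae\n"
    "@SQ\tSN:3R\tLN:100\n"  # made-up second record with the extra tags missing
)
report = missing_sq_tags(header)
print(report)
```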

This was referenced Jun 26, 2024
@tkchafin tkchafin linked a pull request Jun 27, 2024 that will close this issue
@muffato muffato moved this from Todo to In Progress in Genome After Party Jul 3, 2024
@tkchafin tkchafin closed this as completed Jul 8, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Genome After Party Jul 8, 2024