build 37 LOKI #16

Open
katiecardone26 opened this issue Oct 23, 2024 · 3 comments
Labels
3.0.0 enhancement New feature or request

Comments

@katiecardone26

Is your feature request related to a problem? Please describe.
Biofilter is compatible with both b37 and b38 genomes. However, the b37 version of LOKI on the LPC is from 2014, whereas the b38 version is from 2022. I think there should be a more up-to-date version of b37 LOKI.

Describe the solution you'd like
I think there should be a more up-to-date b37 version of LOKI. This could be achieved by using liftover to convert the b38 version to b37.

@AndreRico
Collaborator

Hi @katiecardone26, would you like to discuss this point at our next topic meeting? Should we implement it in version 2.4.3?

@katiecardone26
Author

I think it would be helpful to implement this in version 2.4.3, because I believe b37 annotations can be queried with the existing databases. I don't think I attend the Biofilter/LOKI project meetings, but feel free to bring up this point at the next one.

@van-truong added the enhancement (New feature or request) and 3.0.0 labels on Nov 12, 2024
@AndreRico
Collaborator

Hi @katiecardone26!

The latest version of the LOKI database is built on GRCh38, and my understanding is that you want to provide positions from a different assembly (e.g., GRCh37). Is this correct?

To address this, I created a test scenario in which the input uses a different assembly from the one the LOKI data file was built on. I observed that Biofilter compares the assembly version of the LOKI file with the one specified in the arguments (--ucsc-build-version, e.g., "19"). If a mismatch is detected, a conversion table (e.g., from GRCh37 to GRCh38) is used to convert the input positions, and all queries are processed using the converted coordinates.

The output will retain the original positions as labels (e.g., GRCh37/hg19), while the results will be aligned with the GRCh38 coordinates.
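
To make the behavior described above concrete, here is a minimal Python sketch of the idea. It is not Biofilter's actual code: the names `Segment`, `TOY_TABLE`, `convert_position`, and `prepare_queries` are hypothetical, the table offsets are made up, and a real conversion table would be derived from UCSC chain data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One aligned block of a hypothetical build-conversion table."""
    chrom: str
    old_start: int  # first position in the input build (inclusive)
    old_end: int    # last position in the input build (inclusive)
    new_chrom: str
    new_start: int  # position in the database build matching old_start


# Toy table with made-up offsets, for illustration only.
TOY_TABLE = [
    Segment("1", 1_000, 1_999, "1", 1_500),
    Segment("1", 3_000, 3_999, "1", 3_400),
]


def convert_position(chrom: str, pos: int,
                     table: List[Segment]) -> Optional[Tuple[str, int]]:
    """Map one position to the database build, or None if it is unmapped."""
    for seg in table:
        if seg.chrom == chrom and seg.old_start <= pos <= seg.old_end:
            return seg.new_chrom, seg.new_start + (pos - seg.old_start)
    return None


def prepare_queries(positions, input_build: int, db_build: int,
                    table: List[Segment]):
    """Return (label, query_chrom, query_pos) tuples.

    The label always keeps the position as the user supplied it, while
    the query coordinates are expressed in the database build.
    """
    prepared = []
    for chrom, pos in positions:
        label = f"chr{chrom}:{pos}"
        if input_build == db_build:
            # Builds match: query with the input coordinates unchanged.
            prepared.append((label, chrom, pos))
            continue
        converted = convert_position(chrom, pos, table)
        if converted is None:
            # Position falls outside every segment: nothing to query.
            prepared.append((label, None, None))
        else:
            prepared.append((label, *converted))
    return prepared


if __name__ == "__main__":
    # Input given on build 19 (GRCh37), database built on 38 (GRCh38).
    for row in prepare_queries([("1", 1_250), ("1", 2_500)], 19, 38, TOY_TABLE):
        print(row)
```

In this sketch, a position that cannot be converted is kept in the output with empty query coordinates rather than dropped silently; whether Biofilter reports unmapped positions this way is an assumption, not something confirmed in this thread.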

I have documented this process in more detail here: https://github.com/RitchieLab/biofilter/tree/development/tests/issues/l16_build_37_loki

Does this approach address your requirements? If not, I’d be happy to discuss alternative solutions.
