Handle the case when a peak overlaps with the promoter of two or more genes #73

Open
yushengak47 opened this issue Jan 21, 2021 · 1 comment

@yushengak47

Hi,

I found that when a peak overlaps the promoters of two or more genes, annotate_cds_by_site with its default settings records only one of them in the 'gene' column of fData(input_cds). As a result, some genes are missing from the gene activity matrix. I tried setting all = T when running annotate_cds_by_site, and this does list multiple gene names in the 'gene' column. However, build_gene_activity_matrix does not seem to handle that properly: the generated matrix has redundant, problematic row names such as "HES2,HES2,HES2,HES2" and "ESPN,ESPN,HES2".

Any ideas for solving this?

Thanks

@hpliner
Collaborator

hpliner commented Feb 8, 2021

Hmm, this is a case that would require some modifications to fix. However, I will say that the gene activity scores for two genes that share the same promoter peak will be identical, so if you have a list of the sets of genes that share a promoter, you can add the appropriate rows yourself.

I will leave this open and hopefully find time to work out a solution in the future.
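
For anyone landing here, the workaround described above (copying a gene's row to the other genes that share its promoter peak) could be sketched roughly as follows. This is a language-neutral illustration in plain Python, not Cicero code: the helper names `unique_genes` and `add_shared_promoter_rows`, the dictionary layout, and the example scores are all assumptions made for illustration. It also assumes the genes in each set really do share the same promoter peak, so that their scores are identical as stated above.

```python
def unique_genes(label):
    """Split a comma-joined row label such as "ESPN,ESPN,HES2" into
    its distinct gene names, keeping first-seen order."""
    seen = []
    for name in label.split(","):
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    return seen


def add_shared_promoter_rows(activity, shared_sets):
    """Copy gene activity rows to genes that share a promoter peak.

    activity: dict mapping gene name -> list of per-cell scores
        (a stand-in for the gene activity matrix; hypothetical layout).
    shared_sets: iterable of gene-name lists, each listing genes whose
        promoters overlap the same peak (supplied by the user).
    """
    # Work on a copy so the input matrix is left untouched.
    result = {gene: list(scores) for gene, scores in activity.items()}
    for genes in shared_sets:
        present = [g for g in genes if g in activity]
        if not present:
            continue  # no member of this set has a row to copy from
        source = present[0]
        for gene in genes:
            if gene not in result:
                # Missing gene: reuse the scores of the gene it shares a peak with.
                result[gene] = list(activity[source])
    return result


# Made-up scores for three cells; only HES2 was recorded for the shared peak.
activity = {"HES2": [0.2, 0.0, 1.5], "GAPDH": [3.1, 2.8, 0.4]}
shared = [unique_genes("ESPN,ESPN,HES2")]
filled = add_shared_promoter_rows(activity, shared)
```

Here `filled` gains an "ESPN" row with the same scores as "HES2". If a gene also has other promoter peaks of its own, its true score may differ from that of its neighbor, so a plain copy like this would only be an approximation in that case.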
