Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with 32mers #5

Open
7PintsOfCherryGarcia opened this issue Oct 21, 2021 · 0 comments
Open

Problem with 32mers #5

7PintsOfCherryGarcia opened this issue Oct 21, 2021 · 0 comments

Comments

@7PintsOfCherryGarcia
Copy link

7PintsOfCherryGarcia commented Oct 21, 2021

Hi @lh3

I noticed a little problem when counting 32 mers.

Given the sequence:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAT

    ./kc-c1 -k 32 seq.fa | head
    1       0
    2       1
    3       0
    4       0
    5       0
    6       0
    7       0
    8       0
    9       0
    10      0

There are two unique kmers in this sequence but kc-c1 is reporting only one.

After inspecting your code I noticed:

mask = (1ULL<<k*2) - 1

Which is used to extract the k*2 bits corresponding with k bases.

Testing how mask is set I got:

//mask.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char *argv[]) {
    int k = atoi(argv[1]);
    uint64_t mask = (1ULL<<k*2) - 1;
    fprintf(stderr, "Mask: %lx\n", mask);
}

gcc -o mask mask.c
./mask 31
Mask: 3fffffffffffffff //0011111111111111111111111111111111 AOK
./mask 32
Mask: 0 // Not AOK, expected  ffffffffffffffff

When counting 32 mers, the 0 mask, hashes all kmers to the same value (0)

I understand we should be able to count up to 32mers. Is the limit 31 in reality and I misunderstood the "up to"?

I also don't get why

mask = (1ULL<<k*2)

sets mask to 1 instead of 0 when k == 32

EDIT To my last point.

It seems that shift operations with values greater than or equal to the size of the type is undefined behavior. Which explains why "<<32*2" would not work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant