Meanings of fields of output paf file. #69

Closed
Dentalium opened this issue Sep 21, 2024 · 2 comments

@Dentalium

Hi, first I'd like to thank you for developing this tool!

I am a little confused about the meaning of the fields in the PAF output produced by the software. According to the PAF specification, column 10 should represent the number of matched bases, while column 11 indicates the total alignment length. However, in my results, the number of matches is far smaller than the total length. Could you please clarify the meaning of these two columns? I am using MashMap 3.1.3, and the command line is: mashmap -q ptg004078l_revcomp.fa -r DWv2.1_chr2B.fa -t 120 -o DW/4078vsDW.mashmap.paf

My result looks like this:

ptg004078l      4455011 0       65000   +       LT934114.1      790338525       18792651        18856944        79      65000   18     id:f:0.983043    kc:f:1.0035
ptg004078l      4455011 65000   90000   +       LT934114.1      790338525       18877319        18899784        47      25000   15     id:f:0.968634    kc:f:0.982086
ptg004078l      4455011 90000   125000  +       LT934114.1      790338525       18918968        18953818        59      35000   18     id:f:0.982738    kc:f:1.00158
ptg004078l      4455011 125000  130000  +       LT934114.1      790338525       18980649        18985649        70      5000    17     id:f:0.981662    kc:f:0.928003
ptg004078l      4455011 130000  630000  +       LT934114.1      790338525       18995211        19496613        114     501402  28     id:f:0.998313    kc:f:0.96713
ptg004078l      4455011 630000  1160000 +       LT934114.1      790338525       19504070        20036872        127     532802  31     id:f:0.999273    kc:f:0.995263
ptg004078l      4455011 1165000 1175000 +       LT934114.1      790338525       20036919        20044611        18      10000   14     id:f:0.961334    kc:f:0.954291
ptg004078l      4455011 1175000 1190000 +       LT934114.1      790338525       20024503        20038160        107     15000   18     id:f:0.983895    kc:f:0.926864
ptg004078l      4455011 1190000 1195000 +       LT934114.1      790338525       447648420       447653420       2       5000    8      id:f:0.831914    kc:f:0.954308
ptg004078l      4455011 1195000 1205000 +       LT934114.1      790338525       20037554        20044809        41      10000   16     id:f:0.974318    kc:f:0.985881

Furthermore, what is the meaning of the two additional tags in columns 13 and 14? Does the ‘id’ tag in column 13 refer to identity of the alignment?

As a beginner, if I have misunderstood something obvious, please excuse my lack of knowledge. Thank you!

@bkille
Contributor

bkille commented Sep 30, 2024

Hi @Dentalium, thanks for asking! Since MashMap is an "approximate" mapping method, it does not actually align any bases. When segments are not merged, the 10th column tracks how many sketched k-mers are shared between the reference and query for a segment mapping. These numbers are not updated during the merging step, though, so for the most part you can disregard them.

The id tag is the estimated identity of a mapping. The kc tag is an estimate of the "k-mer complexity." A number closer to 0.0 would mean that the mapped region has many repeated k-mers (e.g. a highly repetitive region).
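
To make the column meanings above concrete, here is a minimal Python sketch that parses one MashMap PAF row and keeps column 10 separate from the id tag. The names MashMapMapping, parse_mashmap_paf_line, and sketch_kmer_hits are hypothetical and not part of MashMap. The sketch assumes whitespace-separated fields with no spaces inside sequence names, and it leaves column 12 uninterpreted because this thread does not discuss it.

```python
from typing import Dict, NamedTuple, Union

TagValue = Union[int, float, str]


class MashMapMapping(NamedTuple):
    """One row of MashMap PAF output (hypothetical container, not a MashMap API)."""
    query_name: str
    query_len: int
    query_start: int
    query_end: int
    strand: str
    ref_name: str
    ref_len: int
    ref_start: int
    ref_end: int
    sketch_kmer_hits: int  # column 10: shared sketched k-mers, NOT matched bases
    block_len: int         # column 11
    col12: int             # column 12: left uninterpreted here
    tags: Dict[str, TagValue]


def parse_mashmap_paf_line(line: str) -> MashMapMapping:
    # split() accepts tabs or runs of spaces; assumes names contain no whitespace.
    fields = line.split()
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 PAF columns, got {len(fields)}")

    # Optional tags follow the SAM-style NAME:TYPE:VALUE layout, e.g. id:f:0.983043.
    tags: Dict[str, TagValue] = {}
    for raw in fields[12:]:
        name, type_code, value = raw.split(":", 2)
        if type_code == "f":
            tags[name] = float(value)
        elif type_code == "i":
            tags[name] = int(value)
        else:
            tags[name] = value

    return MashMapMapping(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]), fields[4],
        fields[5], int(fields[6]), int(fields[7]), int(fields[8]),
        int(fields[9]), int(fields[10]), int(fields[11]), tags,
    )


# First row from the question, rebuilt with tab separators.
row = "\t".join([
    "ptg004078l", "4455011", "0", "65000", "+",
    "LT934114.1", "790338525", "18792651", "18856944",
    "79", "65000", "18", "id:f:0.983043", "kc:f:1.0035",
])
record = parse_mashmap_paf_line(row)

# Dividing column 10 by column 11 does not give an identity for MashMap output.
naive_ratio = record.sketch_kmer_hits / record.block_len
print(f"col10/col11 = {naive_ratio:.6f}")
print(f"estimated identity (id tag) = {record.tags['id']}")
print(f"k-mer complexity (kc tag) = {record.tags['kc']}")
```

For filtering or ranking mappings, the id tag is the value to use. The col10/col11 ratio is printed only to show why it should not be read as an identity.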

Please let me know if you have any questions, thanks!

@Dentalium
Author

I see. Thank you!
