NUL (\x00, ^@) and other control characters in output #107

Kaddea · 2024-06-27T10:10:18Z

Hi,

I'm using the bam_readcount wrapper "mgibio/bam_readcount_helper-cwl". The output files (snv or indel) contain control characters that the vcf_readcount_annotator cannot process.

Which substitution for the control characters would be suitable for further processing?

Variation (vcf)
20 405939 . TTTC T . weak_evidence AS_FilterStatus=weak_evidence;AS_SB_TABLE=0,0|0,0;DP=1;ECNT=1;GERMQ=23;MBQ=0,32;MFRL=0,204;MMQ=60,60;MPOS=43;POPAF=7.3;TLOD=4.21;CSQ=-|upstream_gene_variant|MODIFIER|RBCK1|ENSG00000125826|Transcript|ENST00000356286.10|protein_coding|||||||||||2357|1||HGNC|HGNC:15864|1||| GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/1:0,1:0.667:1:0,1:0,0:0,1:0,0,1,0

bam_readcount output (indel)
20 405940 N 1 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 -^@^@^@:1:255.00:0.00:0.00:1:0:0.88:0.03:0.00:1:0.42:101.00:0.42
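
In the example line above, the control characters sit in the eleventh field (the `-^@^@^@` entry). A minimal sketch for locating such fields in a read-count file, assuming the file is tab-separated and that anything in the ASCII control range other than tab, newline and carriage return counts as a problem. The function names are hypothetical and not part of bam-readcount or the helper:

```python
import re

# ASCII control characters except tab (\x09), newline (\x0a) and
# carriage return (\x0d). NUL is \x00, shown as ^@ in many editors.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def find_control_columns(line, sep="\t"):
    """Return the 1-based numbers of the columns that contain a control character."""
    fields = line.rstrip("\r\n").split(sep)
    return [i for i, field in enumerate(fields, start=1)
            if CONTROL_CHARS.search(field)]


def scan_file(path, sep="\t"):
    """Yield (line_number, columns) for every line that contains control characters."""
    # latin-1 maps every byte to a character, so decoding cannot fail on odd bytes.
    with open(path, encoding="latin-1") as handle:
        for line_number, line in enumerate(handle, start=1):
            columns = find_control_columns(line, sep)
            if columns:
                yield line_number, columns
```

This only reports where the characters occur. It does not decide what a valid replacement would be.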

chrisamiller · 2024-06-27T13:48:20Z

Weird. We have processed lots of bams through this type of workflow and I've never seen anything like that. Happy to take a look though. Can you provide a tiny example bam with the steps needed to recreate the problem?

Kaddea · 2024-07-08T12:02:31Z

Thanks for your help!!
I've cropped one of the bam files and the corresponding vcf file (both from RNAseq reads) to reproduce the readcount output files.
The strange characters in the output files appear only from column 11 on, and it seems only at sites with varying deletions (2-5 bases).
The files (bam, vep-annotated vcf and the snv/indel tsv) can be downloaded from
https://kaddea.com/s/J76BAJsg4d5zytN (approx. 45 MB)
Sequence alignment and variant analysis based on Ensembl GRCh38, release 110.

Best,
Mathias

chrisamiller · 2024-07-15T18:20:31Z

Thank you. Can you also provide the exact commands that were used, along with software versions, etc - just trying to reproduce it on our end here.

Kaddea · 2024-07-31T13:13:00Z

read_count_pipeline.txt
Hmmm ... the attached file indicates the steps for alignment, variant calling, annotation and preparation for the read counts. I've omitted the mandatory parameters (like input/output, etc.). Hope it helps ...
btw.: truncating the read-count output files to the first 10 columns helps to proceed with the vcf annotation, but I'm not sure about the validity of the resulting files ...
Mathias
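
The truncation workaround mentioned above can be sketched as follows, assuming tab-separated output (the function name is hypothetical). In the example line from the first comment, the first ten columns are chromosome, position, reference base, depth and the six entries `=`, `A`, `C`, `G`, `T`, `N`, so the eleventh column is the first indel entry. Truncating at ten columns therefore appears to drop the indel counts altogether, which may be why the validity of the result is in doubt:

```python
def truncate_columns(line, keep=10, sep="\t"):
    """Keep only the first `keep` columns of one read-count line."""
    fields = line.rstrip("\r\n").split(sep)
    return sep.join(fields[:keep])


def truncate_file(src_path, dst_path, keep=10, sep="\t"):
    """Write a copy of src_path with every line cut to `keep` columns."""
    # latin-1 round-trips every byte, so stray control bytes do not break decoding.
    with open(src_path, encoding="latin-1") as src, \
            open(dst_path, "w", encoding="latin-1") as dst:
        for line in src:
            dst.write(truncate_columns(line, keep, sep) + "\n")
```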
