Refine Contamination Analysis Workflow for Extra Input Resources #443

Open: 32 commits to be merged into base: main

Commits on Nov 8, 2023

  1. UPDATES TO THE Hifiasm pipeline:

      * update Hifiasm to version 0.19.5
      * update how Hifiasm outputs are compressed (bgz replacing gz)
      * monitor Hifiasm resource usage
    SHuang-Broad committed Nov 8, 2023
    e09dc65
  2. For both CCS/ONT, update PBSV

      * update docker used in PBSV tasks to the version coming with official SMRTLink releases (2.9.0)
      * change how the 2-step PBSV process is done (following the recommended way now)
    SHuang-Broad committed Nov 8, 2023
    d3afc4b
  3. For both CCS/ONT, update Sniffles-2

      * to version 2.0.7
      * using TRF bed
      * conditionally phase sv (requires phased bam)
      * generates its own vcf.gz and tbi
    SHuang-Broad committed Nov 8, 2023
    3a2ac5c
  4. 29aa964
  5. MAJOR REFACTOR: UNIFY CCS/ONT WGS PIPELINE

    Overhaul how small variants are called in the WG pipelines
    
      * default to use DV to call small variants, Clair3 analysis needs to be requested explicitly
      * retire the Pepper toolchain completely from the CCS pipeline, using DV directly
      * for R10.4+ ONT data, also use DV directly
      * older ONT data would still use the PEPPER-DV-Margin pipeline
      * offers GPU version (though based on, it's not worth it yet)
      * update how bam haplotagging is done
    
    Cleanup structural variants calling
      * experiment with SNF2 phasing SV calls (implicitly depends on small variants calling now)
      * tune PBSV calling
        - discover now supports --hifi
        - output vcf.gz and tbi
        - less verbose logging by default
    
    Misc.:
      * optimizations to BAM merging and metrics workflow
      * update the coverage collection step
      * new R script to visualize log from vm_monitoring_script.sh
    SHuang-Broad committed Nov 8, 2023
    0262525

Commits on Dec 1, 2023

  1. MISC UPDATES TO SEVERAL UTILS TASKS

      * organize dockstore.yml file a bit
    
      * make WDL validation shell script more usable
    
      * update pbmm2 and pbindex to versions in SMRTLink
    
      * update GeneralUtils.wdl
        - two bash-like new tasks [CoerceMapToArrayOfPairs, CoerceArrayOfPairsToMap]
        - cleanup task CollapseArrayOfStrings
    
      * update resource allocations to tasks
        - NanoplotFromBam (also changes docker)
        - MosDepthWGS
    SHuang-Broad committed Dec 1, 2023
    7be9309
  2. New docker that's intended to replace lr-basic:

      * incorporates gcloud cli (not just gsutil)
      * integrate libdeflate for more speedups
    SHuang-Broad committed Dec 1, 2023
    1ff0912
  3. 4224ab4
  4. a13a541
  5. significantly boost capabilities of BAMutils.wdl

    incorporate new tasks and optimize them
    
      * [CountMethylCallReads, GatherReadsWithoutMethylCalls]
        from sh_beans
    
      * [GetPileup, BamToRelevantPileup]
        from sh_more_atomic_qc
    
      * [GetReadGroupLines, GetSortOrder, SplitNameSortedUbam]
        from sh_ont_fc
    
      * [SamtoolsFlagStats, ParseFlagStatsJson]
        from sh_trivial_stats
    
      * [FilterBamByLen, InferSampleName]
        from sh_seqkit
    
      * [CountAlignmentRecords, StreamingBamErrored, CountAlignmentRecordsByFlag]
        from sh_maha_aln_metrics
    
      * [ResetSamplename]
        from sh_ingest_singlerg
    
      * [MergeBamsWithSamtools]
        from sh_ont_fc.Utils.wdl
    
      * [BamToFastq]
        from sh_more_bam_qcs
        and optimize it with
        sh_ingest_singlerg.Utils.wdl
    
    delete
      * GetSortOrder as that's now implemented in GatherBamMetadata
      * Drop2304Alignments as that's no longer used
    
    update dockers to the latest
    SHuang-Broad committed Dec 1, 2023
    2905c69
  6. b16c619

Commits on Dec 19, 2023

  1. New workflows

    CHERRY-PICK FROM VARIOUS QC/METRICS BRANCHES:
    
      * collect information about ML/MM tags in a long-read BAM
        (sh_beans)
    
      * a heuristic way to find peaks in a distribution (using dyst)
        (sh_dyst_peaker)
    
      * filter reads by length in a BAM
      * collect some read quality stats from (length-filtered) FASTQ/BAM
        (sh_seq_kit)
    
      * VerifyBamID2 (for contamination estimation)
      * naive sex-concordance check
        (sh_more_atomic_qc)
    
      * check fingerprint of a single BAM file
        (sh_sample_fp)
    
      * collect SAM flag stats
        (sh_trivial_stats)
    SHuang-Broad committed Dec 19, 2023
    c96ea8e

Commits on Dec 27, 2023

  1. Improve various existing code

      * make BeanCounter finalization optional
        (wdl/pipelines/TechAgnostic/Utility/CountTheBeans.wdl)
      * custom struct for sub-workflow config using a JSON
        (wdl/pipelines/TechAgnostic/Utility/LongReadsContaminationEstimation.wdl)
      * make fingerprint checking subworkflow control size filtering
        (wdl/tasks/QC/FPCheckAoU.wdl)
        (wdl/pipelines/TechAgnostic/Utility/VerifyBamFingerprint.wdl)
      * fix an IDE/miniwdl warning that the WDL stdlib function length only applies to Array
        (wdl/tasks/Utility/BAMutils.wdl)
      * various updates to Finalize
        (wdl/tasks/Utility/Finalize.wdl)
    
    New tasks in (wdl/tasks/Utility/GeneralUtils.wdl) to
      * correctly convert Map to TSV
      * concatenate files
    SHuang-Broad committed Dec 27, 2023
    39d77d2
  2. cbfed4e
  3. 8039c88
  4. b16155f
  5. 5bf0b2f

Commits on Dec 28, 2023

  1. DEPRECATION

      * AlignAndCheckFingerprintCCS.wdl
      * CollectPacBioAlignedMetrics.wdl
      * CollectSMRTCellUnalignedMetrics.wdl
    SHuang-Broad committed Dec 28, 2023
    0d61935

Commits on Jan 2, 2024

  1. Fix bug in deduplicating aligned ONT BAM

      (CHERRY-PICK & follow up to PR 406)
    SHuang-Broad committed Jan 2, 2024
    989ead1

Commits on Jan 3, 2024

  1. DEPRECATION:

      * SampleLevelAlignedMetrics.wdl
      * PBCLRWholeGenome.wdl
    SHuang-Broad committed Jan 3, 2024
    b8e67ff
  2. Update utility code:

      * new struct in AlignedBamQCandMetrics.wdl to facilitate calling it as a sub-workflow
      * change parameter names for fingerprint workflows
    SHuang-Broad committed Jan 3, 2024
    e5a79c2

Commits on Jan 12, 2024

  1. a few tweaks to AlignedBamQCandMetrics:

      * make saving of reads without methylation SAM tags optional
      * better parameter naming
    SHuang-Broad committed Jan 12, 2024
    b5dc978

Commits on Jan 18, 2024

  1. 1abcc7a

Commits on Jan 22, 2024

  1. da5bfb0

Commits on Jan 23, 2024

  1. 7975d6d
  2. 5620d74

Commits on Jan 31, 2024

  1. e910bac

Commits on Feb 2, 2024

  1. 04b82f8

Commits on Feb 26, 2024

  1. Safer and more efficient way to do targeted pileup conversion

    (affects contamination estimation)
    SHuang-Broad committed Feb 26, 2024
    a03ff34

Commits on Feb 29, 2024

  1. Refactor LongReadsContaminationEstimation and Contamination workflows for efficiency
    
    - Remove unnecessary BED file input from LongReadsContaminationEstimation workflow as BED paths are now hardcoded in the Docker image.
    - Modify the inputs and commands in Contamination.wdl to align with new Docker setup and work with the .mu, .UD, and .bed files from the docker.
    - Adjust workflow parameters to better reflect current data processing requirements and practices.
    shadizaheri committed Feb 29, 2024
    28a9e68
  2. Update Contamination.wdl

    Removing SVDPrefix from the command line.
    shadizaheri authored Feb 29, 2024
    7db20f6
  3. 784081a