-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCHANGES.txt
158 lines (102 loc) · 6.53 KB
/
CHANGES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
# VERSION 1.0
- extra pipelines: list wiht bpipe-config -vp
- trimmomatic for single reads (current dir and project)
# VERSION 0.8
- index html for pipeline
- force skip samplesheets check (-s)
- new pipelines: XHMM, VARIANTS from ALLELES
- send test
- branch variables:
Once set, a branch variable becomes referenceable directly by all subsequent stages that are part of the branch, including any child branches. The value can be referenced as a property on the branch object, for example 'branch.planet', but can also be referenced without the branch. prefix.
- removed /dev/shm (optional)
- check on global vars (fail if is not defined) with requires
# Version 0.6
- CENTRALIZED DEFAULT PATHS
the module: default_paths_gfu.groovy contains the SOFTWARE PATHS used currently by bpipe in ALL our modules
(location: /lustre1/tools/libexec/bpipeconfig/modules/default_paths_gfu.groovy). You can override the SOFTWARE PATHS definition in your pipeline.groovy file
- pipelines reorganized by type of input (exome, genome and rnaseq)
- Quality Control pipelines are now in subgroup “quality_control"
- exome pipelines:
* EXOME PIPELINES use TARGET_INTERVALS (nextera rapid capture targets as default)
* EXOME VARIANTS CALLING and BAM RECALIBRATION use by default HealtyExomes in /lustre1/workspace/Stupka/HealtyExomes/
* EXOME VARIANTS ANNOTATION: remove by default HealtyExomes from the vcf file
* EXOME ALIGN PROJECT now generate an HsMetrics report in the doc subdirectory (see quality_control)
- genome pipelines:
* DON’T USE TARGET INTERVALS
* VARIANTS CALLING and BAM RECALIBRATION DON’T USE by default HealtyExomes in /lustre1/workspace/Stupka/HealtyExomes/ (but can be changed in the pipeline file)
- variants annotation (exome and genome):
* upgraded dbSNFSP annotation (current: dbNSFP2.4.txt.gz)
- quality_control:
* new pipeline hsmetrics, calculate hs metrics of a set of bam files (for columns definition see: http://picard.sourceforge.net/picard-metric-definitions.shtml#HsMetrics)
* fastqc_project pipeline:
Works like the align_project pipelines, You have a raw_data directory with fastq.gz reads (after demultiplex) and a scratch directory to perform analysis:
/lustre2/raw_data/RUN_NAME/PINAME_PROJECTID_PROJECTNAME
/lustre2/scratch/PINAME/PROJECTID_PROJECTNAME
You launch the pipelines FROM the /lustre2/scratch/ directory pointing to /lustre2/raw_data/ directory:
cd /lustre2/scratch/PINAME/PROJECTID_PROJECTNAME
bpipe-config pipe fastqc_project /lustre2/raw_data/RUN_NAME/PINAME_PROJECTID_PROJECTNAME
* THIS CONFIGURATION (working dir: scratch, data dir: raw_data) avoid pollution of the raw_data dir with bpipe intermediate files (everything is done the scratch dir)
bpipe 0.9.8.6_beta_4.gfu:
- send, check and fail
bpipe-config new commands:
- project
With the project command you can launch the same project pipeline on multiple projects.
Example:
SCENARIO: You have just demultiplexed the RUN140 in /lustre2/raw_data/140211_SN859_0140_AD2A9EACXX and you want to launch the fastqc_project pipeline on all the Projects in the RUN 140
COMMANDS:
> mkdir /lustre2/scratch/RUN140_fastqc
> cd /lustre2/scratch/RUN140_fastqc
> bpipe-config project fastqc_project /lustre2/raw_data/140211_SN859_0140_AD2A9EACXX/Project_*
> ./runner.sh
A fastqc project pipeline will be launched for each Project dir in the RUN140
- smerge
This command with this cool name merge SampleSheet.csv from different projects
# Version 0.5
More pipelines:
- soapsplice_submit_project IOS 009
- exome_variants_calling IOS 005
- genome_variants_calling IOS 015
- variants_annotation IOS 005
* prototype for medip pipeline
The new fastqc project pipeline generate a better fastqc, with an index page to navigate samples:
Example: /lustre1/QC-Illumina/140211_SN859_0140_AD2A9EACXX/Stupka_10_RareDisease_fastqc_report/index.html
# Version 0.4
* PROJECT PIPELINES:
More or less is like the One Ring of Sauron: one bpipe to run them all.
in bpipe-config -p (pipelines list) you will notice a new category: illumina_projects.
There you will find PROJECT pipepelines that parallelize on multiple samples.
If classic pipelines create a bpipe instance for each sample, the project pipelines create a single JVM and bpipe instance.
Usage example in our server:
1. You have some raw_data located in:
/lustre2/raw_data/RUN_NAME/PINAME_PROJECTID_PROJECTNAME
Es: /lustre2/raw_data/131212_SN859_0138_AC2PGHACXX/Project_Kajaste_80_LVTranscription
2. You have to make analysis and generated intermediate file in:
/lustre2/scratch/PINAME/PROJECTID_PROJECTNAME
Es: /lustre2/scratch/Kajaste/80_LV_Transcription
3. Enter "scratch dir" for your project:
cd /lustre2/scratch/Kajaste/80_LV_Transcription
4. Run bpipe-config from THERE pointing to samples in /lustre2/raw_data
bpipe-config pipe bwa_submit_project /lustre2/raw_data/131212_SN859_0138_AC2PGHACXX/Project_Kajaste_80_LVTranscription/Sample_*
bpipe run -r bwa_submit_project.groovy /lustre2/raw_data/131212_SN859_0138_AC2PGHACXX/Project_Kajaste_80_LVTranscription/Sample_*
5. Project pipelines creates dirs in scratch for samples and link fastq files from original directory to scratch dir. By default output merged BAMs are in moved in directory "BAM"
* PIPES: pretend mode
Due to compatibility issues with bpipe, I have changed the "test mode" in "pretend mode". In order to launch a whole pipeline in pretend mode (just create empty file locally, without launching any bioinformatics software) use:
bpipe run -p pretend=true pipeline.groovy inputs.*
* command config: generate a bpipe.config resource file.
1. Now don't require a SampleSheet.csv or a Project name
by default the project name is USERNAME_DIRECTORY.
Check of Project name still active for command pipe.
2. Email Notification:
now if your username is in a configuration list (config/email_list.groovy),
bpipe will send you email notification when the pipeline is complete
* command pipe: generate a pipeline
1. PROJECT PIPELINES:
SampleSheet.csv automatically created in project directory using info (validated) from sample dirs
3. PIPELINE INFO:
Automatically generated in doc/pipeline_name.html
* command jvm: full info on JAVA VIRTUAL MACHINE CONFIGURATION
* command clean: remove .bpipe dirs from current dir or args (don't remove pipeline files)
* Logging:
1. verbose option (less verbose by default)
* SampleSheet Vaidations (validateSample)
1. Better validation for each sample in SampleSheet.csv during parsing