Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simulated dataset for tests with run-example.sh #14

Open
wants to merge 27 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
8a4d7e8
Merge branch 'develop'
andrej-fischer May 29, 2014
7e31116
mod run-example.sh
andrej-fischer May 29, 2014
ee22ecf
Readme: Fix publication link
juliangehring Jun 11, 2014
424ae5f
Merge branch 'julian-gehring-patch-1'
andrej-fischer Jun 12, 2014
b3fea7e
bug-001: seg-fault; edge case with a single SNV in a chromosome
andrej-fischer Jun 20, 2014
e72c4fc
Merge branch 'bug-001'
andrej-fischer Jun 20, 2014
df73d86
added screenshots
andrej-fischer Jul 2, 2014
3daf0cb
mod screenshots
andrej-fischer Jul 2, 2014
cd79a07
mod Readme
andrej-fischer Jul 23, 2014
b3103b9
mod Readme
andrej-fischer Jul 23, 2014
f1157c2
mod Readme
andrej-fischer Jul 23, 2014
3b2c7c3
mod README
andrej-fischer Jul 28, 2014
7779405
mod README
andrej-fischer Jul 28, 2014
d3c3c40
mod README
andrej-fischer Jul 28, 2014
63027fa
mod gitignore
andrej-fischer Aug 5, 2014
b94a76f
bug fixed on initial data set
andrej-fischer Aug 5, 2014
483c4a4
Merge branch 'bugfix-issue-7'
andrej-fischer Aug 6, 2014
fc74175
mod README
ivazquez Jun 19, 2015
2e8bcce
Compiler for compatibility with brew formula
ivazquez Feb 19, 2016
c30f5f1
Remove hard links for GSL library
ivazquez Mar 7, 2016
f1f1e65
Remove hard links for GSL library
ivazquez Mar 7, 2016
e5a43e3
Merge branch 'smchet' of https://github.com/ivazquez/cloneHD into smchet
ivazquez Mar 8, 2016
cac4b88
Simulated dataset for tests with run-example.sh
ivazquez Mar 18, 2016
fb172e9
Merge branch 'smchet' into develop
ivazquez Mar 18, 2016
ea35aa7
Test compiler version for brew installation
ivazquez Mar 18, 2016
1b7edc1
Travis CI for continuous integration
ivazquez Apr 21, 2016
9fd4cd5
gcc-4.8 is now preinstalled on Travis CI (+8 squashed commits)
ivazquez Apr 21, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
run-bulk*
run-test*
*.[oa]
build/*
build-*tar.gz
Expand Down
32 changes: 32 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# Control file for continuous integration testing at http://travis-ci.org/
language: cpp

# Compiler selection
compiler: gcc

# VM selection
os:
- linux
- osx

# Replace gcc on osx (default is a front-end for LLVM) and gsl on osx and linux
before_install:
- if [ "${TRAVIS_OS_NAME}" = "linux" ]; then
sudo apt-get install gsl-bin libgsl0-dev;
fi
- if [ "${TRAVIS_OS_NAME}" = "osx" ]; then
export CC=gcc-4.8; export CXX=g++-4.8;
brew update;
brew install gsl;
fi

# Run Makefile
script:
- mkdir -p build
- cd src
- make

# Only watch the master branch
branches:
only:
- master
42 changes: 34 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@
The current stable release, as well as pre-compiled executable binaries
for Mac OS X and GNU Linux (64bit), can be found [here](https://github.com/andrej-fischer/cloneHD/releases). The cloneHD software is undergoing rapid development. Watch/Star this repo to receive updates.

[![Build Status](https://travis-ci.org/ivazquez/cloneHD.svg)](https://travis-ci.org/ivazquez/cloneHD)

# Run a test with simulated data

After downloading cloneHD from the release site, you can test both filterHD and cloneHD by running
Expand Down Expand Up @@ -69,18 +71,43 @@ prediction (red).

# Tips and tricks

* Note: all input files are assumed to be sorted by genomic coordinate. With Unix, this
can be guaranteed with `sort -k1n,1 -k2n,2 file.txt > sorted-file.txt`.
* The read depth input files for filterHD and cloneHD can be generated from a bam-file with samtools. Given a bed file of non-overlapping 1kb windows, e.g.

human-genome.1kb-grid.bed :

1 9000 10000
1 10000 11000
...

`$ samtools bedcov human-genome.1kb-grid.bed sample.bam > read-depth.sample.txt`

read-depth.sample.txt :

1 9000 10000 12009
1 10000 11000 213557
...

`$ awk '{print $1,$3,int(0.5+$4/1000.0),1}' read-depth.sample.txt > read-depth.sample.cloneHD.txt`

read-depth.sample.cloneHD.txt :

1 10000 12 1
1 11000 214 1
...

For 10 kb windows, you would use `$ awk '{print $1,$3,int(0.5+$4/10000.0),10}` etc. For windows smaller than 1kb, the observations might not be approximately independent.

* All input files are assumed to be sorted by chromosome and genomic coordinate. With Unix, this can be achieved with `sort -k1n,1 -k2n,2 file.txt > sorted-file.txt`.

* Pre-filtering of data can be very important. If filterHD predicts
* Pre-filtering of data is very important. If filterHD predicts
many more jumps than you would expect, it might be necessary to
pre-filter the data, removing variable regions, outliers or very short
segments (use programs the `pre-filter` and `filterHD`).
segments (use programs `pre-filter` and `filterHD`).

* Make sure that the bias field for the tumor CNA data is
meaningful. If a matched normal sample was sequenced with the same
pipeline, its read depth profile, as predicted by filterHD, can be used as a
bias field for the tumor CNA data. Follow the logic of the example
bias field for the tumor CNA data. Follow the logic of the example data
given here.

* If the matched-normal sample was sequenced at lower coverage than the tumor sample,
Expand All @@ -104,8 +131,7 @@ prediction (red).
* If high copy numbers are expected only in a few chromosomes, you can increase performance
by using the `--max-tcn [file]` option to specify per-chromosome upper limits.

* For exome sequencing data, the read depth bias can be enormous. The filterHD estimate
of the bias field might not be very useful, especially in segmenting the tumor data.
* For exome sequencing data, the read depth bias can be enormous. The filterHD estimate of the bias field might not be very useful, especially in segmenting the tumor data.
Use rather, if available, the jumps seen in the BAF data for both CNA and BAF data
(give the BAF jumps file to both `--cna-jumps` and `--baf-jumps`).

Expand All @@ -114,4 +140,4 @@ prediction (red).
The cloneHD and filterHD software is free under the GNU General Public License v3.
If you use this software in your work, please cite the accompanying publication:

Andrej Fischer, Ignacio Vazquez-Garcia, Christopher J.R. Illingworth and Ville Mustonen. High-definition reconstruction of subclonal composition in cancer. Cell Reports (2014), http://dx.doi.org/ 10.1016/j.celrep.2014.04.055
Andrej Fischer, Ignacio Vázquez-García, Christopher J.R. Illingworth and Ville Mustonen. High-definition reconstruction of clonal composition in cancer. Cell Reports **7 (5)**, 1740-1752 (2014). [DOI: 10.1016/j.celrep.2014.04.055](http://dx.doi.org/10.1016/j.celrep.2014.04.055).
4 changes: 4 additions & 0 deletions changelog.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
# changelog for cloneHD/filterHD

## v1.17.9 / to come

* bug fix for sparse data with singletons in a chr (bug-001)

## v1.17.8 / 29.05.2014

* added checks whether files are open for writing
Expand Down
29 changes: 29 additions & 0 deletions docs/README-cloneHD.md
Original file line number Diff line number Diff line change
Expand Up @@ -242,3 +242,32 @@ The grid sizes for the pre-computed emission probabilities if fuzzy data segment
* `--cna-grid [int:300]`
* `--baf-grid [int:100]`
* `--snv-grid [int:100]`

# Program output

cloneHD generates a number of output files automatically. Here, we provide annotated screenshots for the most important of them for the simulated example data set.

## STDOUT

![stdout1](/images/screenshots/cloneHD-stdout-1.png "cloneHD stdout 1")
![stdout2](/images/screenshots/cloneHD-stdout-2.png "cloneHD stdout 2")

## Output files

### The inference summary
![summary](/images/screenshots/cloneHD-summary.png "cloneHD summary")

### The posterior distribution over all copy number/genotype states
![posterior](/images/screenshots/cloneHD-posterior.png "cloneHD posterior")

### The posterior distribution for each subclone separately
![subclone](/images/screenshots/cloneHD-subclone.png "cloneHD subclone")

### The mean total copy number per segment
![meantcn](/images/screenshots/cloneHD-mean.png "cloneHD mean-tcn")

### The available genotype per segment
![avail](/images/screenshots/cloneHD-avail.png "cloneHD avail-cn")



29 changes: 24 additions & 5 deletions docs/README-filterHD.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@

* `--data [file]` Input data.

The file format is the same as below for `--cna`, `--baf` or
`--snv`. Multiple amples are processed independently, one by one.
The file format is the same as in cloneHD for `--cna`, `--baf` or
`--snv` (see [here](./README-cloneHD.md)). Multiple samples are processed independently, one by one.

* `--mode [1/2/3/4]` Emission modes.

Expand Down Expand Up @@ -36,7 +36,7 @@

## Parameter options

The HMM underlying filterHD is determined by these four global
The continuous state space HMM underlying filterHD is determined by the following global
parameters. They can all be fixed, otherwise they are learned from the data.

* `--jump [double]` Fix the jump probability per length unit (bp).
Expand Down Expand Up @@ -76,6 +76,25 @@ local optima and want to start in the neighbourhood of a particular one.
The grid size for the internal representation of continuous distributions. For large ranges in
mode 3/4, it can make sense to increase this resolution.

## filterHD output files
# filterHD output

TBD
filterHD generates a few output files automatically. Here, we provide annotated screenshots for them for the simulated example data set.

## STDOUT

![stdout](/images/screenshots/filterHD-stdout.png "filterHD stdout")

## Output file

![posterior1](/images/screenshots/filterHD-posterior-1.png "filterHD posterior")

The posterior mean value of the hidden emission rate and jump probabilities

![posterior2](/images/screenshots/filterHD-posterior-2.png "filterHD posterior")

The same as above, but here a bias (normal) was used, so the rate is scaled accordingly. Note: in filterHD, the bias field is not scaled to have mean 1!


![posterior3](/images/screenshots/filterHD-posterior-3.png "filterHD posterior")

The same as above, but here the whole posterior distribution was requested with `--dist 1`
Binary file added images/screenshots/cloneHD-avail.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/screenshots/cloneHD-mean.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/screenshots/cloneHD-posterior.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/screenshots/cloneHD-stdout-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/screenshots/cloneHD-stdout-2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/screenshots/cloneHD-subclone.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/screenshots/cloneHD-summary.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/screenshots/filterHD-posterior-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/screenshots/filterHD-posterior-2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/screenshots/filterHD-posterior-3.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added images/screenshots/filterHD-stdout.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
10 changes: 4 additions & 6 deletions run-example.sh
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,6 @@ part=$1
# input data
data="./test/data/"
results="./test/results/"
#data="/Users/af7/Projects/cloneHD/Test/data/"
#results="/Users/af7/Projects/cloneHD/Test/results/"
filterHD="./build/filterHD"
cloneHD="./build/cloneHD"

Expand All @@ -36,29 +34,29 @@ then
# The normal read depth is analysed to estimate the technical read depth modulation. This will be later used to account
# for the bias field in cloneHD. In principal, jumps are not expected (so could set --jump 0). The simulations do not have
# random emissions.
cmd="$filterHD --data $normalCNA --mode 3 --pre ${results}/normal.cna --rnd 0"
cmd="$filterHD --data $normalCNA --mode 3 --pre ${results}/normal.cna"
echo $cmd
$cmd
echo

# The tumor read depth is first analysed without bias to get a benchmark for the LLH value. The result will not be used later.
# In the tumor data, we do expect jumps, but we actually would like to learn the jumps only accounting for the bias field (below).
cmd="$filterHD --data $tumorCNA --mode 3 --pre ${results}/tumor.cna --rnd 0"
cmd="$filterHD --data $tumorCNA --mode 3 --pre ${results}/tumor.cna"
echo $cmd
$cmd
echo

# The tumor read depth is now analysed with the bias field from the matched normal. The diffusion constant is set to zero.
# If left free, it should converge to a very small value. The jump rate could be slightly higher. The LLH should be higher than
# for the run above indicating the presence of the bias field. Now we are interested in the jumps.
cmd="$filterHD --data $tumorCNA --mode 3 --pre ${results}/tumor.cna.bias --bias $bias --sigma 0 --rnd 0 --jumps 1"
cmd="$filterHD --data $tumorCNA --mode 3 --pre ${results}/tumor.cna.bias --bias $bias --sigma 0 --jumps 1"
echo $cmd
$cmd
echo

# The tumor BAF data is analysed, mainly to get the emission parameters (shape, rnd) and jumps. In principle, there could be jumps
# visible in the BAF data, but not in the read depth (copy number neutral LOH within chromosomes). Diffusion should be switched off.
cmd="$filterHD --data $tumorBAF --mode 1 --pre ${results}/tumor.baf --sigma 0 --jumps 1 --reflect 1 --dist 1 --rnd 0"
cmd="$filterHD --data $tumorBAF --mode 1 --pre ${results}/tumor.baf --sigma 0 --jumps 1 --reflect 1 --dist 1"
echo $cmd
$cmd
echo
Expand Down
8 changes: 4 additions & 4 deletions src/Makefile
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
CC = g++-mp-4.7
GSL = /opt/local/lib/libgsl.a /opt/local/lib/libgslcblas.a
CC_FLAGS = -Wall -O3 -static-libgcc -static-libstdc++ -fopenmp -DHAVE_INLINE
CC = g++
GSL = `gsl-config --cflags --libs`
CC_FLAGS = -Wall -O3 -static-libgcc -static-libstdc++ -fopenmp -DHAVE_INLINE
LD_FLAGS = ${GSL} -fopenmp
prefobj = emission.o common-functions.o
filterHDobj = emission.o common-functions.o jump-diffusion.o minimization.o
Expand All @@ -17,7 +17,7 @@ cloneHD: $(cloneHDobj) cloneHD.o
$(CC) $(CC_FLAGS) $^ -o ../build/cloneHD $(LD_FLAGS)
%.o: %.cpp
$(CC) $(CC_FLAGS) -c $< -o $@
$(cloneobj): clone.h# $(cloneobj:.o=.cpp)
$(cloneobj): clone.h $(cloneobj:.o=.cpp)
$(filterHDobj) $(cloneHDobj) $(prefobj): $(.TARGET:.o=.h)
clean:
rm -f ./*.o
15 changes: 12 additions & 3 deletions src/clone.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -143,9 +143,18 @@ void Clone::allocate( Emission * cna, Emission * baf, Emission * snv, const char
Clone::set_normal_copy(chr_fn);
//all chr present
chrs.clear();
if (cnaEmit->is_set) for (int s=0;s<cnaEmit->nSamples;s++) chrs.insert(cnaEmit->chr[s]);
if (bafEmit->is_set) for (int s=0;s<bafEmit->nSamples;s++) chrs.insert(bafEmit->chr[s]);
if (snvEmit->is_set) for (int s=0;s<snvEmit->nSamples;s++) chrs.insert(snvEmit->chr[s]);
if (cnaEmit->is_set){
for (int s=0;s<cnaEmit->nSamples;s++)
chrs.insert(cnaEmit->chr[s]);
}
if (bafEmit->is_set){
for (int s=0;s<bafEmit->nSamples;s++)
chrs.insert(bafEmit->chr[s]);
}
if (snvEmit->is_set){
for (int s=0;s<snvEmit->nSamples;s++)
chrs.insert(snvEmit->chr[s]);
}
if (cnaEmit->is_set){
mass = gsl_vector_alloc(nTimes);
log_mass = gsl_vector_alloc(nTimes);
Expand Down
16 changes: 12 additions & 4 deletions src/emission.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -136,15 +136,15 @@ void Emission::map_idx_to_Event(Emission * Emit, int sample){
locus = loci[sample][idx];
}
for ( Event = 1; Event < Emit->nEvents[Sample]; Event++){
if (idx == nSites[sample]) break;
Idx = Emit->idx_of_event[Sample][Event];
Locus = Emit->loci[Sample][Idx];
while(locus < Locus){
Event_of_idx[sample][idx]= Event - 1;
idx++;
if (idx == nSites[sample]) break;
locus = loci[sample][idx];
}
if (idx == nSites[sample]) break;
}
}
while( idx < nSites[sample]){//right over-hang
Event_of_idx[sample][idx] = Emit->nEvents[Sample] - 1;
Expand Down Expand Up @@ -932,7 +932,13 @@ void Emission::coarse_grain_jumps( int sample, double min_jump, int range){
//get global minimum...
first_it = remaining.begin();
last_it = remaining.end();
first_it++;
if (first_it==last_it){//no jumps whatsoever!
for (int i=0; i<L; i++) pjump[sample][i] = cg_pjump[i];
delete [] cg_pjump;
remaining.clear();
idxOf.clear();
return;
}
it = std::min_element( first_it, last_it, value_comparer);
double gmin = it->second;
double p=0,p0=0,pmin=0,ps=0;
Expand All @@ -950,7 +956,9 @@ void Emission::coarse_grain_jumps( int sample, double min_jump, int range){
while( it != remaining.end() ){
it++;
p = it->second;
if ( p>10.0*pmin || it->first != last_it->first+1 || p<10.0*gmin) done=1;
if ( p>10.0*pmin || it->first != last_it->first+1 || p<10.0*gmin){
done = 1;
}
else if (incl < range){
ps *= 1.0 - p;
incl++;
Expand Down
Loading