Reorganize readme (#18)
giang-nghg authored Apr 9, 2020
1 parent 0b767ce commit 5819787
Showing 1 changed file with 22 additions and 22 deletions.
44 changes: 22 additions & 22 deletions README.md
@@ -28,23 +28,17 @@ will produce this output line:

*NC_000962.3 11 . A C . **.** . GT:DP:COV:GT_CONF:GT_CONF_PERCENTILE:**FT** 0/0:6:6,0:73.54:0.74:**MIN_GCP** 0/0:3:3,0:36.98:0.01:**MIN_DP;MIN_GCP***

## 4. How `batch_size` affects computational resources and performance

* Total memory usage will not exceed `batch_size * size(one line of input VCFs)`.
* The batch size is the number of files the utility opens at once.
* A bigger batch size reduces the total time taken, but requires more memory and more file handles from the OS.

## 5. Usage
## 4. Usage

You can use the utility as either:

* [A Python library](#python-usage)
* [A Python script (no installation)](#cli-usage-no-install)
* [A Python script (installed in your environment)](#cli-usage-install)

### <a name="python-usage">5.1 In Python</a>
### <a name="python-usage">4.1 In Python</a>

#### 5.1.1 If the number of input files is small (can be opened all at once)
#### 4.1.1 If the number of input files is small (can be opened all at once)

```python
from contextlib import ExitStack
@@ -59,7 +53,7 @@ with ExitStack() as stack:
ivcfmerge(files, outfile)
```

#### 5.1.2 If the number of input files is big (cannot be opened all at once)
#### 4.1.2 If the number of input files is big (cannot be opened all at once)

```python
from ivcfmerge import ivcfmerge_batch
@@ -71,7 +65,7 @@ batch_size = 1000 # How many files to open and merge at once
ivcfmerge_batch(filenames, output_path, batch_size)
```

##### 5.1.2.1 You may also need to specify a temporary directory
##### 4.1.2.1 You may also need to specify a temporary directory

The batch processing version stores its intermediate results in a temporary directory, which needs at least as much free space as the input files occupy.

@@ -82,9 +76,9 @@ temp_dir = '...' # for example, a directory on a mounted disk like /mnt/big_dis
ivcfmerge_batch(filenames, output_path, batch_size, temp_dir)
```

### <a name="cli-usage-no-install">5.2 Command line interface (no installation)</a>
### <a name="cli-usage-no-install">4.2 Command line interface (no installation)</a>

#### 5.2.1 If the number of input files is small (can be opened all at once)
#### 4.2.1 If the number of input files is small (can be opened all at once)

```shell script
# Prepare a file of paths to input VCF files
@@ -96,7 +90,7 @@ ivcfmerge_batch(filenames, output_path, batch_size, temp_dir)
> python3 ivcfmerge.py input_paths.txt path/to/output/file
```

#### 5.2.2 If the number of input files is big (cannot be opened all at once)
#### 4.2.2 If the number of input files is big (cannot be opened all at once)

```shell script
# Prepare a file of paths to input VCF files
@@ -108,7 +102,7 @@ ivcfmerge_batch(filenames, output_path, batch_size, temp_dir)
> python3 ivcfmerge_batch.py --batch-size 1000 input_paths.txt path/to/output/file
```

##### 5.2.2.1 You may also need to specify a temporary directory
##### 4.2.2.1 You may also need to specify a temporary directory

The batch processing version stores its intermediate results in a temporary directory, which needs at least as much free space as the input files occupy.

@@ -118,33 +112,39 @@ That has at least as much space as that occupied by the input files to store int
> python3 ivcfmerge_batch.py --batch-size 1000 --temp-dir /path/to/tmp/dir input_paths.txt path/to/output/file
```

### <a name="cli-usage-install">5.3 Command line interface (with installation)</a>
### <a name="cli-usage-install">4.3 Command line interface (with installation)</a>

```shell script
pip3 install .
ivcfmerge -h
ivcfmerge_batch -h
```

All CLI arguments and options are the same as described in [5.2](#cli-usage-no-install): replace `python3 ivcfmerge.py` with `ivcfmerge`, and likewise for the batch version.
All CLI arguments and options are the same as described in [4.2](#cli-usage-no-install): replace `python3 ivcfmerge.py` with `ivcfmerge`, and likewise for the batch version.

## 6. Important parameters
## 5. Important parameters

### 6.1 `batch_size`
### 5.1 `batch_size`

Indicates how many files to open and merge in each batch, for the batch processing version.

The default value for this parameter is 1000.

### 6.2 `temp_dir`
#### 5.1.1 How `batch_size` affects computational resources and performance

* Total memory usage will not exceed `batch_size * size(one line of input VCFs)`.
* The batch size is the number of files the utility opens at once.
* A bigger batch size reduces the total time taken, but requires more memory and more file handles from the OS.
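
The memory bound and the file handle requirement can both be checked before choosing a `batch_size`. The following is a minimal sketch and not part of the utility: the helper names and the `reserved_handles` safety margin are assumptions made for illustration, and the `resource` module is only available on Unix-like systems.

```python
import resource  # standard library, Unix-like systems only


def estimate_peak_memory(batch_size, bytes_per_line):
    """Upper bound on memory use: batch_size * size of one input line.

    `bytes_per_line` is an estimate you supply, for example the length
    of the longest line in one of your input VCFs.
    """
    return batch_size * bytes_per_line


def max_safe_batch_size(reserved_handles=64):
    """Largest batch size that fits under the soft open-file limit.

    `reserved_handles` is a hypothetical margin left for stdin, stdout,
    the output file and anything else the process keeps open.
    Returns None if the OS reports no limit.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return max(1, soft_limit - reserved_handles)


if __name__ == "__main__":
    print("Open-file budget for batch_size:", max_safe_batch_size())
    print("Memory bound in bytes:", estimate_peak_memory(1000, 2048))
```

If the value reported by `max_safe_batch_size()` is below the `batch_size` you intended to use, either lower `batch_size` or raise the limit in your shell (for example with `ulimit -n`) before running the merge.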

### 5.2 `temp_dir`

For the batch processing version, the utility needs to store its intermediate results in a directory with at least as much free space as the input files occupy in total.

By default, the choice is left to the [tempfile](https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryFile) library. On Unix/Linux, this is usually `/tmp`.
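
To see whether the default location is large enough, you can compare its free space with the total size of the inputs. This is a rough sketch using only the standard library; `total_input_size` and `has_enough_temp_space` are hypothetical helper names, not functions provided by the utility.

```python
import os
import shutil
import tempfile


def total_input_size(filenames):
    """Sum of the on-disk sizes of the input files, in bytes."""
    return sum(os.path.getsize(name) for name in filenames)


def has_enough_temp_space(filenames, temp_dir=None):
    """Check that temp_dir has at least as much free space as the inputs occupy.

    When temp_dir is None, fall back to the directory the tempfile
    module would pick by default.
    """
    temp_dir = temp_dir or tempfile.gettempdir()
    free_bytes = shutil.disk_usage(temp_dir).free
    return free_bytes >= total_input_size(filenames)
```

If the check fails for the default directory, pass a `temp_dir` on a larger disk, as described in the usage sections.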

## 7. Development
## 6. Development

### 7.1 Running tests
### 6.1 Running tests

```shell script
pip3 install -r requirements/dev.txt