From 5819787614a263a9f35fd0c247442f092ab174ff Mon Sep 17 00:00:00 2001
From: Giang Nguyen
Date: Thu, 9 Apr 2020 18:16:54 +0100
Subject: [PATCH] Reorganize readme (#18)

---
 README.md | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/README.md b/README.md
index cf325db..486add8 100644
--- a/README.md
+++ b/README.md
@@ -28,13 +28,7 @@ will produce this output line:
 
 *NC_000962.3 11 . A C . **.** . GT:DP:COV:GT_CONF:GT_CONF_PERCENTILE:**FT** 0/0:6:6,0:73.54:0.74:**MIN_GCP** 0/0:3:3,0:36.98:0.01:**MIN_DP;MIN_GCP***
 
-## 4. How `batch_size` affect computation resources and performance
-
-* Total memory usage will not exceed `batch_size * size(one line of input VCFs)`.
-* Batch size equals the number of files the utility will open at once.
-* Bigger batch size will reduce the total time taken, but requires more memory and file handles from the OS.
-
-## 5. Usage
+## 4. Usage
 
 You can use the utility as either:
 
@@ -42,9 +36,9 @@ You can use the utility as either:
 * [A Python script (no installation)](#cli-usage-no-install)
 * [A Python script (installed in your environment)](#cli-usage-install)
 
-### 5.1 In Python
+### 4.1 In Python
 
-#### 5.1.1 If the number of input files is small (can be opened all at once)
+#### 4.1.1 If the number of input files is small (can be opened all at once)
 
 ```python
 from contextlib import ExitStack
@@ -59,7 +53,7 @@ with ExitStack() as stack:
         ivcfmerge(files, outfile)
 ```
 
-#### 5.1.2 If the number of input files is big (cannot be opened all at once)
+#### 4.1.2 If the number of input files is big (cannot be opened all at once)
 
 ```python
 from ivcfmerge import ivcfmerge_batch
@@ -71,7 +65,7 @@ batch_size = 1000 # How many files to open and merge at once
 ivcfmerge_batch(filenames, output_path, batch_size)
 ```
 
-##### 5.1.2.1 You may also need to specify a temporary directory
+##### 4.1.2.1 You may also need to specify a temporary directory
 
 That has at least as much space as that occupied by the input files to store intermediate results, in the batch processing version.
 
@@ -82,9 +76,9 @@ temp_dir = '...' # for example, a directory on a mounted disk like /mnt/big_dis
 ivcfmerge_batch(filenames, output_path, batch_size, temp_dir)
 ```
 
-### 5.2 Command line interface (no installation)
+### 4.2 Command line interface (no installation)
 
-#### 5.2.1 If the number of input files is small (can be opened all at once)
+#### 4.2.1 If the number of input files is small (can be opened all at once)
 
 ```shell script
 # Prepare a file of paths to input VCF files
@@ -96,7 +90,7 @@ ivcfmerge_batch(filenames, output_path, batch_size, temp_dir)
 > python3 ivcfmerge.py input_paths.txt path/to/output/file
 ```
 
-#### 5.2.2 If the number of input files is big (cannot be opened all at once)
+#### 4.2.2 If the number of input files is big (cannot be opened all at once)
 
 ```shell script
 # Prepare a file of paths to input VCF files
@@ -108,7 +102,7 @@ ivcfmerge_batch(filenames, output_path, batch_size, temp_dir)
 > python3 ivcfmerge_batch.py --batch-size 1000 input_paths.txt path/to/output/file
 ```
 
-##### 5.2.2.1 You may also need to specify a temporary directory
+##### 4.2.2.1 You may also need to specify a temporary directory
 
 That has at least as much space as that occupied by the input files to store intermediate results, in the batch processing version.
 
@@ -118,7 +112,7 @@ That has at least as much space as that occupied by the input files to store int
 > python3 ivcfmerge_batch.py --batch-size 1000 --temp-dir /path/to/tmp/dir input_paths.txt path/to/output/file
 ```
 
-### 5.3 Command line interface (with installation)
+### 4.3 Command line interface (with installation)
 
 ```shell script
 pip3 install .
@@ -126,25 +120,31 @@ ivcfmerge -h
 ivcfmerge_batch -h
 ```
 
-All CLI arguments & options are the same as described in [5.2](#cli-usage-no-install), i.e. just replace `python3 ivcfmerge.py` with `ivcfmerge`, similarly for the batch version.
+All CLI arguments & options are the same as described in [4.2](#cli-usage-no-install), i.e. just replace `python3 ivcfmerge.py` with `ivcfmerge`, similarly for the batch version.
 
-## 6. Important parameters
+## 5. Important parameters
 
-### 6.1 `batch_size`
+### 5.1 `batch_size`
 
 Indicates how many files to open and merge each batch, for the batch processing version.
 
 The default value for this parameter is 1000.
 
-### 6.2 `temp_dir`
+#### 5.1.1 How `batch_size` affect computation resources and performance
+
+* Total memory usage will not exceed `batch_size * size(one line of input VCFs)`.
+* Batch size equals the number of files the utility will open at once.
+* Bigger batch size will reduce the total time taken, but requires more memory and file handles from the OS.
+
+### 5.2 `temp_dir`
 
 For the batch processing version, the utility needs to store the intermediate results somewhere with as much space as the total space occupied by the input files.
 
 By default, the choice is left to the [tempfile](https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryFile) library. On Unix/Linux, this is usually `/tmp`.
 
-## 7. Development
+## 6. Development
 
-### 7.1 Running tests
+### 6.1 Running tests
 
 ```shell script
 pip3 install -r requirements/dev.txt