kiyoon committed Oct 21, 2024
1 parent 43cb5ac commit ca6e291
Showing 8 changed files with 190 additions and 152 deletions.
81 changes: 81 additions & 0 deletions README.md
@@ -8,4 +8,85 @@
| [![uv](https://img.shields.io/badge/uv-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](https://github.com/astral-sh/uv) | [![Actions status](https://github.com/deargen/biotest/workflows/Check%20pip%20compile%20sync/badge.svg)](https://github.com/deargen/biotest/actions) |
|[![Built with Material for MkDocs](https://img.shields.io/badge/Material_for_MkDocs-526CFE?style=for-the-badge&logo=MaterialForMkDocs&logoColor=white)](https://squidfunk.github.io/mkdocs-material/)|[![Actions status](https://github.com/deargen/biotest/workflows/Deploy%20MkDocs%20on%20latest%20commit/badge.svg)](https://github.com/deargen/biotest/actions)|

A Python package for testing bioinformatics data. It mainly provides functions to compare plain text/binary files, npy files, pdb/pdbqt files, and directories.

## 🛠️ Installation

```bash
pip install biotest
```

## 🚀 Usage

The API is mainly intended to be used with pytest.

```python
from biotest.compare_files import (
assert_two_files_equal_sha,
assert_two_npys_within_tolerance,
assert_two_pdbqt_files_within_tolerance,
assert_two_pdb_files_within_tolerance,
assert_two_dirs_within_tolerance,
)

def assert_two_files_equal_sha(file1: str | PathLike | IOBase, file2: str | PathLike | IOBase):
    """
    Assert that two files are exactly the same.
    """
    ...


def assert_two_npys_within_tolerance(
    npy1: str | PathLike | np.ndarray, npy2: str | PathLike | np.ndarray, *, tolerance=1e-6
):
    """
    Assert that two npy files are almost the same within a tolerance.
    """
    ...


def assert_two_pdbqt_files_within_tolerance(
    file1: str | PathLike | IOBase, file2: str | PathLike | IOBase, *, tolerance=1e-3
):
    """
    Assert that two pdbqt files are equal under the following conditions.
    - ignore the trailing whitespace.
    - 0.001 default tolerance for Orthogonal coordinates for X,Y,Z in Angstroms.
    """
    ...


def assert_two_pdb_files_within_tolerance(
    file1: str | PathLike | IOBase, file2: str | PathLike | IOBase, *, tolerance=1e-3
):
    """
    Assert that two pdb files are equal under the following conditions.
    - ignore the trailing whitespace.
    - 0.001 default tolerance for Orthogonal coordinates for X,Y,Z in Angstroms.
    """
    ...


def assert_two_dirs_within_tolerance(
    dir1: str | PathLike,
    dir2: str | PathLike,
    *,
    tolerance: float = 1e-3,
    filenames_exclude: Sequence[str] | None = None,
):
    """
    Assert that two directories have the same files with almost the same content within tolerance.
    """
    ...
```
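
As a rough illustration of how an exact-equality assertion like this fits into a pytest-style test, here is a minimal, self-contained sketch. It does not import `biotest`; the names `sha1_of` and `assert_files_equal_sha_sketch` are hypothetical stand-ins written for this example, and the package's real implementation may differ. The only detail taken from the source is that the check compares SHA1 hashes.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Return the SHA1 hex digest of a file's bytes (hypothetical helper)."""
    return hashlib.sha1(path.read_bytes()).hexdigest()


def assert_files_equal_sha_sketch(file1: Path, file2: Path) -> None:
    """Stand-in for an exact-equality check; not the package's real code."""
    assert sha1_of(file1) == sha1_of(file2), (
        f"{file1} and {file2} have different SHA1 hashes."
    )


def test_generated_file_matches_reference() -> None:
    """pytest-style test: compare a generated file with a reference file."""
    with tempfile.TemporaryDirectory() as tmp:
        reference = Path(tmp) / "reference.txt"
        generated = Path(tmp) / "generated.txt"
        reference.write_text("REMARK example\n")
        generated.write_text("REMARK example\n")
        assert_files_equal_sha_sketch(reference, generated)
```

With the real package, the body of such a test would presumably call `assert_two_files_equal_sha(reference, generated)` instead of the stand-in.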

You can also use the CLI to quickly try out the functionality. The commands simply call the functions above, so a failed assertion prints the traceback.

```bash
biotest assert-two-files-equal-sha file1 file2
biotest assert-two-npys-within-tolerance file1.npy file2.npy
biotest assert-two-pdbqt-files-within-tolerance file1.pdbqt file2.pdbqt
biotest assert-two-pdb-files-within-tolerance file1.pdb file2.pdb
biotest assert-two-dirs-within-tolerance dir1 dir2
```
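
Because the commands surface a failed assertion as an uncaught exception, a script or CI step can likely rely on the process exit status. The sketch below demonstrates only plain Python behaviour, using a one-line script in place of the `biotest` command; how the actual CLI formats its traceback is an assumption not confirmed by the source.

```python
import subprocess
import sys

# Run a tiny script whose assertion fails, the way a failing comparison would.
result = subprocess.run(
    [sys.executable, "-c", "assert 1 == 2, 'files differ'"],
    capture_output=True,
    text=True,
    check=False,
)

# An uncaught AssertionError makes the interpreter exit with a non-zero
# status and write the traceback to stderr.
print(result.returncode != 0)
print("AssertionError" in result.stderr)
```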
21 changes: 0 additions & 21 deletions mkdocs.yml
@@ -13,27 +13,6 @@ nav:
- Home:
- Overview: index.md
- Changelog: CHANGELOG.md
- Document Generator:
- MkDocs: mkdocs/mkdocs.md
- GitLab Pages setup: mkdocs/gitlab_pages.md
- Python referencing: mkdocs/mkdocstrings.md
- API Reference page: mkdocs/api_reference.md
- Admonition (Note box): mkdocs/admonitions.md
- Git:
- Commit messages: git/git_commit.md
- Using GitHub: git/github.md
- Version release: git/release_version.md
- Format code on GitHub: git/format_code.md
- Python tools:
- Formatters (black, isort): python_tools/formatters.md
- Linter (ruff): python_tools/linter.md
- LSP (pylance): python_tools/lsp.md
- TODO highlights: python_tools/todo_highlights.md
- Other plugins: python_tools/other_vscode_extensions.md
- Python:
- logging: python/logging.md
- Version from Git Tag: python/versioneer.md
- Configuration: python/configuration.md
# defer to gen-files + literate-nav
- API reference:
- mkdocstrings-python: reference/
4 changes: 0 additions & 4 deletions src/biotest/__init__.py
@@ -1,12 +1,8 @@
import os
from pathlib import Path

from dotenv import load_dotenv

from . import _version

load_dotenv()

__version__ = _version.get_versions()["version"]

default_log_level = os.environ.get("BIOTEST_LOG_LEVEL")
55 changes: 50 additions & 5 deletions src/biotest/cli/main.py
@@ -1,4 +1,6 @@
# ruff: noqa: T201
from pathlib import Path

import typer

app = typer.Typer(
@@ -26,12 +28,55 @@ def common(


@app.command()
def health():
from ..health import main as health_main
from ..utils.log import setup_logging
def assert_two_files_equal_sha(file1: Path, file2: Path):
from ..compare_files import assert_two_files_equal_sha

assert_two_files_equal_sha(file1, file2)


# biotest assert-two-npys-within-tolerance file1.npy file2.npy
# biotest assert-two-pdbqt-files-within-tolerance file1.pdbqt file2.pdbqt
# biotest assert-two-pdb-files-within-tolerance file1.pdb file2.pdb
# biotest assert-two-dirs-within-tolerance dir1 dir2
@app.command()
def assert_two_npys_within_tolerance(file1: Path, file2: Path, tolerance: float = 1e-6):
from ..compare_files import assert_two_npys_within_tolerance

assert_two_npys_within_tolerance(npy1=file1, npy2=file2, tolerance=tolerance)


@app.command()
def assert_two_pdbqt_files_within_tolerance(
file1: Path, file2: Path, tolerance: float = 1e-3
):
from ..compare_files import assert_two_pdbqt_files_within_tolerance

assert_two_pdbqt_files_within_tolerance(
file1=file1, file2=file2, tolerance=tolerance
)


@app.command()
def assert_two_pdb_files_within_tolerance(
file1: Path, file2: Path, tolerance: float = 1e-3
):
from ..compare_files import assert_two_pdb_files_within_tolerance

assert_two_pdb_files_within_tolerance(file1=file1, file2=file2, tolerance=tolerance)


@app.command()
def assert_two_dirs_within_tolerance(
dir1: Path,
dir2: Path,
tolerance: float = 1e-3,
filenames_exclude: list[str] | None = None,
):
from ..compare_files import assert_two_dirs_within_tolerance

setup_logging(output_files=[], file_levels=[])
health_main()
assert_two_dirs_within_tolerance(
dir1=dir1, dir2=dir2, tolerance=tolerance, filenames_exclude=filenames_exclude
)


def main():
87 changes: 41 additions & 46 deletions src/biotest/compare_files.py
@@ -1,4 +1,5 @@
import hashlib
from collections.abc import Sequence
from io import IOBase
from os import PathLike
from pathlib import Path
@@ -36,7 +37,7 @@ def _read_file_or_io(file: str | PathLike | IOBase, *, decode=True):
return f.read()


def compare_two_files_sha(
def assert_two_files_equal_sha(
file1: str | PathLike | IOBase, file2: str | PathLike | IOBase
):
"""
@@ -52,8 +53,11 @@ def compare_two_files_sha(
), f"{file1} and {file2} have different SHA1 hashes."


def compare_two_npys_within_tolerance(
npy1: str | PathLike | np.ndarray, npy2: str | PathLike | np.ndarray, tolerance=1e-6
def assert_two_npys_within_tolerance(
npy1: str | PathLike | np.ndarray,
npy2: str | PathLike | np.ndarray,
*,
tolerance=1e-6,
):
"""
Assert that two npy files are almost the same within a tolerance.
@@ -73,8 +77,8 @@ def compare_two_npys_within_tolerance(
)


def compare_two_pdbqt_files_within_tolerance(
file1: str | PathLike | IOBase, file2: str | PathLike | IOBase, tolerance=1e-3
def assert_two_pdbqt_files_within_tolerance(
file1: str | PathLike | IOBase, file2: str | PathLike | IOBase, *, tolerance=1e-3
):
"""
Assert that two pdbqt files are equal under following conditions.
@@ -122,48 +126,35 @@ def compare_two_pdbqt_files_within_tolerance(
)


def compare_two_pdb_files_within_tolerance(
file1: str | PathLike | IOBase, file2: str | PathLike | IOBase, tolerance=1e-3
def assert_two_pdb_files_within_tolerance(
file1: str | PathLike | IOBase, file2: str | PathLike | IOBase, *, tolerance=1e-3
):
# ATOM 998 N PHE B 9 18.937-159.292 -13.075 1.00 30.49 N
pdb1_lines = _read_file_or_io(file1)
pdb2_lines = _read_file_or_io(file2)
assert len(pdb1_lines) == len(pdb2_lines)

for pdb1_line, pdb2_line in zip(pdb1_lines, pdb2_lines, strict=True):
if pdb1_line.startswith("ATOM") and pdb2_line.startswith("ATOM"):
coord_1 = (
float(pdb1_line[32:38]),
float(pdb1_line[38:47]),
float(pdb1_line[47:56]),
)
coord_2 = (
float(pdb2_line[32:38]),
float(pdb2_line[38:47]),
float(pdb2_line[47:56]),
)

for c1, c2 in zip(coord_1, coord_2, strict=False):
assert np.isclose(
c1, c2, atol=tolerance
), f"{file1} and {file2} have different lines."
f"{pdb1_line.rstrip()} and {pdb2_line.rstrip()} are not equal."
"""
Assert that two pdb files are equal under following conditions.
line1_except_coord = pdb1_line[:32] + pdb1_line[56:]
line2_except_coord = pdb2_line[:32] + pdb2_line[56:]
assert (
line1_except_coord.rstrip() == line2_except_coord.rstrip()
), f"{file1} and {file2} have different lines."
f"{pdb1_line.rstrip()} and {pdb2_line.rstrip()} are not equal."
- ignore the trailing whitespace.
- 0.001 default tolerance for Orthogonal coordinates for X,Y,Z in Angstroms.
else:
assert (
pdb1_line.rstrip() == pdb2_line.rstrip()
), f"{file1} and {file2} have different lines."
f"{pdb1_line.rstrip()} and {pdb2_line.rstrip()} are not equal."
Note:
- Currently, the implementation is identical to assert_two_pdbqt_files_within_tolerance.
- The two may diverge in the future, which is why they are separate functions.
"""
# ATOM 998 N PHE B 9 18.937-159.292 -13.075 1.00 30.49 N
assert_two_pdbqt_files_within_tolerance(file1, file2, tolerance=tolerance)


def compare_two_dirs(dir1: Path, dir2: Path, filenames_exclude=None):
def assert_two_dirs_within_tolerance(
dir1: str | PathLike,
dir2: str | PathLike,
*,
tolerance: float = 1e-3,
filenames_exclude: Sequence[str] | None = None,
):
"""
Assert that two directories have the same files with almost the same content within tolerance.
"""
dir1 = Path(dir1)
dir2 = Path(dir2)
assert dir1.is_dir()
assert dir2.is_dir()

@@ -185,10 +176,14 @@ def compare_two_dirs(dir1: Path, dir2: Path, filenames_exclude=None):
file2 = dir2 / file1.name

if file1.suffix == ".npy":
compare_two_npys_within_tolerance(file1, file2)
assert_two_npys_within_tolerance(file1, file2, tolerance=tolerance)
elif file1.suffix == ".pdbqt":
compare_two_pdbqt_files_within_tolerance(file1, file2)
assert_two_pdbqt_files_within_tolerance(file1, file2, tolerance=tolerance)
elif file1.suffix == ".pdb":
compare_two_pdb_files_within_tolerance(file1, file2)
assert_two_pdb_files_within_tolerance(file1, file2, tolerance=tolerance)
elif file1.is_dir():
assert_two_dirs_within_tolerance(
file1, file2, tolerance=tolerance, filenames_exclude=filenames_exclude
)
else:
compare_two_files_sha(file1, file2)
assert_two_files_equal_sha(file1, file2)
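
The pdb/pdbqt comparison described above (coordinates within a tolerance, everything else equal after stripping trailing whitespace) can be sketched per line as follows. This is a minimal illustration, not the package's code: `make_atom_line` and `assert_pdb_lines_close` are hypothetical names, and the slices assume the standard PDB fixed-width layout (x in columns 31-38, y in 39-46, z in 47-54), which may not match the slices the package itself uses.

```python
import math

# Standard PDB fixed-width columns (1-based): x = 31-38, y = 39-46, z = 47-54.
COORD_SLICES = (slice(30, 38), slice(38, 46), slice(46, 54))


def make_atom_line(serial: int, x: float, y: float, z: float) -> str:
    """Build a fixed-width ATOM record for the demo (hypothetical helper)."""
    head = "ATOM  " + f"{serial:5d}" + "  N   PHE B   9"
    # Pad to column 30 so the coordinates always start at column 31.
    return head.ljust(30) + f"{x:8.3f}{y:8.3f}{z:8.3f}" + "  1.00 30.49           N"


def assert_pdb_lines_close(line1: str, line2: str, *, tolerance: float = 1e-3) -> None:
    """Compare two PDB lines: coordinates within tolerance, the rest exactly."""
    message = f"{line1.rstrip()} and {line2.rstrip()} are not equal."
    if line1.startswith("ATOM") and line2.startswith("ATOM"):
        for coord in COORD_SLICES:
            c1, c2 = float(line1[coord]), float(line2[coord])
            assert math.isclose(c1, c2, rel_tol=0.0, abs_tol=tolerance), message
        # Everything outside the coordinate columns must match exactly,
        # ignoring trailing whitespace.
        rest1 = (line1[:30] + line1[54:]).rstrip()
        rest2 = (line2[:30] + line2[54:]).rstrip()
        assert rest1 == rest2, message
    else:
        assert line1.rstrip() == line2.rstrip(), message


line_a = make_atom_line(998, 18.937, -159.292, -13.075)
line_b = make_atom_line(998, 18.938, -159.292, -13.075) + "   "
# Passes: x differs by about 0.001 and the trailing spaces are ignored.
assert_pdb_lines_close(line_a, line_b, tolerance=0.01)
```

Making `tolerance` keyword-only, as the commit does, keeps call sites explicit about which number is the tolerance.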
27 changes: 18 additions & 9 deletions src/biotest/utils/log.py
@@ -2,6 +2,7 @@
import logging
import os
from datetime import datetime, timezone
from os import PathLike
from pathlib import Path

from rich.console import Console
@@ -10,9 +11,6 @@

from biotest import LOG_DIR, PROJECT_DIR, __version__, default_log_level

# Optionally, use `from accelerate.logging import get_logger`;
# it lets you pass options such as main_process_only=False, in_order=True when logging.
# https://huggingface.co/docs/accelerate/package_reference/logging
logger = logging.getLogger(__name__)

console = Console(
@@ -29,6 +27,7 @@

def setup_logging(
console_level: int | str = default_log_level,
log_dir: str | PathLike | None = None,
output_files: list[str] | None = None,
file_levels: list[int] | None = None,
):
@@ -38,14 +37,24 @@ def setup_logging(
You should call this function at the beginning of your script.
Args:
console_level: Logging level for console. Defaults to INFO or env var BIOTEST_LOG_LEVEL.
output_files: List of output file paths, relative to LOG_DIR. If None, use default.
console_level: Logging level for console. Defaults to INFO or env var BIOTEST_LOG_LEVEL.
log_dir: Directory to save log files. If None, do not save log files.
output_files: List of output file paths, relative to log_dir. If None, use default.
file_levels: List of logging levels for each output file. If None, use default.
"""
if output_files is None:
output_files = ["{date:%Y%m%d-%H%M%S}-{name}-{levelname}-{version}.log"]
if file_levels is None:
file_levels = [logging.INFO]
if log_dir is None:
assert output_files is None, "output_files must be None if log_dir is None"
assert file_levels is None, "file_levels must be None if log_dir is None"

output_files = []
file_levels = []
else:
log_dir = Path(log_dir)

if output_files is None:
output_files = ["{date:%Y%m%d-%H%M%S}-{name}-{levelname}-{version}.log"]
if file_levels is None:
file_levels = [logging.INFO]

assert len(output_files) == len(
file_levels
8 changes: 0 additions & 8 deletions tests/conftest.py

This file was deleted.
