Bento Demo Dataset

Partially synthetic demo dataset for the Bento platform. Requires Python 3.10+.

Based partly on data from:

The 1000 Genomes Project, © EMBL-EBI
The International Human Epigenome Consortium

Requirements:

Optionally create a virtual environment, e.g.:

virtualenv -p python3 ./env
source env/bin/activate

To install dependencies run:

pip install -r requirements.txt

Usage:

To run:

python generate_dataset.py

This will write phenopackets to synthetic_phenopackets.json and experiments to synthetic_experiments.json.
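To check a run, you can load the two output files and print a short summary. The sketch below assumes only that each file holds valid JSON. The helper names `describe` and `summarize_output` are hypothetical, and the schema itself is defined by generate_dataset.py, not by this snippet.

```python
import json
from pathlib import Path


def describe(name: str, data: object) -> str:
    """Return a one-line summary of a parsed JSON value (hypothetical helper)."""
    if isinstance(data, list):
        return f"{name}: list of {len(data)} records"
    if isinstance(data, dict):
        keys = ", ".join(sorted(data)[:5])  # show at most five keys
        return f"{name}: object with keys [{keys}]"
    return f"{name}: {type(data).__name__}"


def summarize_output(path: str) -> str:
    """Load a generated JSON file and summarize its top-level structure."""
    with open(path, encoding="utf-8") as f:
        return describe(Path(path).name, json.load(f))


if __name__ == "__main__":
    for name in ("synthetic_phenopackets.json", "synthetic_experiments.json"):
        if Path(name).exists():
            print(summarize_output(name))
        else:
            print(f"{name}: not found (run generate_dataset.py first)")
```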

Other useful files are available in the dataset_files directory:

config.json: a Katsu config file matching the dataset
dats.json: an example DATS file
extra_properties_typing.json: to configure typed extra properties
mock experiment files in .csv, .jpg, .md, .mp4, .pdf, and .xlsx format
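To see which mock experiment files are present, a small script can group the contents of dataset_files by extension. This is a sketch that assumes it is run from the repository root. The function name `count_mock_files` is hypothetical, and the extension set simply mirrors the formats listed above.

```python
from collections import Counter
from collections.abc import Iterable
from pathlib import Path

# Formats of the mock experiment files listed above.
MOCK_EXTENSIONS = {".csv", ".jpg", ".md", ".mp4", ".pdf", ".xlsx"}


def count_mock_files(filenames: Iterable[str]) -> Counter:
    """Count files per mock-experiment extension (case-insensitive)."""
    counts: Counter = Counter()
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext in MOCK_EXTENSIONS:
            counts[ext] += 1
    return counts


if __name__ == "__main__":
    folder = Path("dataset_files")  # assumes the current directory is the repo root
    if folder.is_dir():
        names = [p.name for p in folder.iterdir() if p.is_file()]
        for ext, n in sorted(count_mock_files(names).items()):
            print(f"{ext}: {n}")
```

JSON files such as config.json and dats.json are deliberately skipped, since they are configuration rather than mock experiment data.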

Optional Configuration:

The dataset is a mix of fixed and randomly generated values; the random values are the same across different runs of generate_dataset.py. To change the output, modify any of the values in config/constants.py.
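Random values that repeat across runs are typically produced by drawing from a generator with a fixed seed. The sketch below illustrates that general technique; it does not claim to be how generate_dataset.py is implemented. `SEED` and `sample_ages` are hypothetical names, and any real seed or constants live in config/constants.py under names that may differ.

```python
import random

# Hypothetical seed; not taken from config/constants.py.
SEED = 42


def sample_ages(n: int, seed: int = SEED) -> list[int]:
    """Draw n pseudo-random ages from a private, seeded generator."""
    rng = random.Random(seed)  # independent of the global random state
    return [rng.randint(18, 90) for _ in range(n)]


if __name__ == "__main__":
    first = sample_ages(5)
    second = sample_ages(5)
    print(first == second)  # same seed, same sequence
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the sequence reproducible even if other code also draws random numbers.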

The dataset is generated from the input file config/individuals.json. You can add or remove individuals to change the output. Individuals that have only "id" and "sex" fields get fully synthetic metadata, while any values in their "biosamples", "experiments" or "diseases" fields are copied over unmodified. This allows, for example, generating appropriate metadata for real data files (which may involve, e.g., a particular disease).
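The copy-or-synthesize rule described above can be sketched as follows. This is an illustration under assumptions, not the project's code: `generate_field` is a hypothetical stand-in for the real metadata generator, and the record shape is simplified.

```python
import copy

# Fields that are copied unmodified when present in config/individuals.json.
COPIED_FIELDS = ("biosamples", "experiments", "diseases")


def generate_field(field: str, individual_id: str) -> list[dict]:
    """Hypothetical stand-in for the real synthetic-metadata generator."""
    return [{"id": f"{individual_id}_{field}_synthetic"}]


def build_record(individual: dict) -> dict:
    """Keep user-supplied fields as given; synthesize the missing ones."""
    record = {"id": individual["id"], "sex": individual["sex"]}
    for field in COPIED_FIELDS:
        if field in individual:
            # Deep copy so the output never aliases the input data.
            record[field] = copy.deepcopy(individual[field])
        else:
            record[field] = generate_field(field, individual["id"])
    return record


if __name__ == "__main__":
    # Only "id" and "sex": every other field is synthesized.
    print(build_record({"id": "ind_001", "sex": "FEMALE"}))
    # A supplied "diseases" list is passed through unchanged.
    print(build_record({"id": "ind_002", "sex": "MALE",
                        "diseases": [{"term": {"label": "example disease"}}]}))
```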

Optional Data Files:

The dataset is meant for use with genomic data from the 1000 Genomes Project, and transcriptomics data from the International Human Epigenome Consortium. See here for more details on data files.
