Phenylketonuria landscape analysis v 1.9.0

Pipeline:

Apply Gaussian model and extract features
Perform QC (rmse - var)
Detect non-responders
Subpopulation detection

Run pipeline

Install packages

numpy
pandas
scipy
matplotlib
openpyxl
tqdm
scikit-learn      # needed for prediction only

To run the pipeline use the script gm.py, which requres experimant files as *.xlsm.

Example 1) run the pipeline for one experimant

python gm.py --input_dir Data/experiments_all/20240305_EXP13.xlsm

Example 2) run the pipeline for all experimants in a directory (e.g. Data/experiments_all)

python gm.py --input_dir Data/experiments_all

Promp terminal parameters:

--input_dir: input directiry. Either a *.xlsm file or a folder containing them.
--param_dir: directory to parameters json file (see below).
--no_plot: no plot will be created if apecified.

A parameter .json file is also needed to specify the parameters. By default the json file should be in the same folder as the gm.py named gm_parameters.json. You can change this by the parameter --param_dir.

Parameters in the json file:

save_image2d_dir: Directory to save 2D landscapes; set it as "" to prevent saving.
save_image3d_dir: Directory to save 3D plots; set it as "" to prevent saving.
save_feature_dir: Directory to save feature csv file; set it as "" to prevent saving.
tag: Name tag for the feature .csv file.
qc_thr_rmse: QC threshold for RMSE; default=[0.2, 0.25]
qc_thr_n_peaks: QC threshold for N Peacks; default=[5, 8]
qc_thr_variation: QC threshold for Variations; default=[0.1, 0.25]
elev: Elevation of the camera in the 3D plots; default=30,
azim: Angle of the camera in the 3D plots (in degrees);default=-120,
nbins: Number of bins used for the interpolation (mesh grid of size nbins x nbins);default=1000,
sm_method: Method used for the interpolation and smoothing ("regular_grid" or "linear_ndi");default='regular_grid',
rescale: Binary value to rescale the data with respect to the WT;default=True,
max_val_scale: Maximum value of the non-rescaled data;default=10000,
info_box: Binary value to add an information box to the plots;default=True,
max_val: Binary value to add the maximum value of the data to the information box;default=True,
peak_coords: Binary value to add the coordinates of the peaks to the information box;default=True,
fifty_coords: Binary value to add the coordinates of the 50% of the maximum value to the information ;default=True,
plot_replicates: Binary value to plot and save data for all the replicates;default=False
plot_extra": Binary value to plot and save the data in the log scale';default=False

Predict for new data

When we have new data samples, the script predict.py can be used to predict the lables of clusters for each sample.

Example)

python predict.py --train_data_dir clustering/data/clustering_result.csv --new_data_dir gm_output/features/extracted_features.csv

Promp terminal parameters:

--train_data_dir: Train data directory; the output of the clustering pipeline provided in R.
--new_data_dir: New data directory; the output of the gm.py for the train data.
--k: Number of neighbours k in KNN.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Phenylketonuria landscape analysis v 1.9.0

Run pipeline

Predict for new data

Files

README.md

Latest commit

History

README.md

File metadata and controls

Phenylketonuria landscape analysis v 1.9.0

Run pipeline

Predict for new data