preprocessing

em812 edited this page May 11, 2021 · 3 revisions

This is a module with functions and classes that can be used for the preprocessing of quantitative phenotyping data in the format returned by the functions in read_data. Preprocessing encompasses filtering, scaling, smoothing and any other cleaning/imputing/quality control procedure that precedes the analysis stage.

Contents

filter_data

Functions to filter the data based on different criteria. The functions in this script do not modify the feature values. They only select or drop data based on quality (e.g. ratio of nan/inf values), feature properties or metadata information.

Functions available to the user:

filter_n_skeletons
drop_ventrally_signed
select_feat_set
filter_nan_inf
cap_feat_values
feat_filter_std
drop_feat_by_keyword
select_feat_by_keyword
drop_samples_by_meta_column
select_samples_by_meta_column
filter_samples_by_meta_col_thresholds
drop_bad_wells
select_bluelight_conditions
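
As an illustration of the "select or drop without modifying values" idea, here is a minimal pandas sketch of filtering samples by their ratio of nan/inf values. The helper name `drop_samples_by_nan_ratio`, its parameters and the toy data are hypothetical; this is not the signature of `filter_nan_inf` or of any other function in this module.

```python
import numpy as np
import pandas as pd


def drop_samples_by_nan_ratio(feat, meta, max_ratio=0.5):
    """Hypothetical helper: keep the rows of `feat` whose fraction of
    nan/inf values is at most `max_ratio`, and keep `meta` aligned."""
    # Mark nan and +/-inf entries as bad; the input dataframe is not modified
    bad = feat.replace([np.inf, -np.inf], np.nan).isna()
    keep = bad.mean(axis=1) <= max_ratio
    # Feature values are returned unchanged, only rows are dropped
    return feat.loc[keep], meta.loc[keep]


# Toy data (made up for this example)
feat = pd.DataFrame({
    "speed": [1.0, np.nan, 3.0],
    "length": [0.5, np.inf, 0.7],
    "curvature": [0.1, np.nan, np.nan],
})
meta = pd.DataFrame({"well": ["A1", "A2", "A3"]})

feat_f, meta_f = drop_samples_by_nan_ratio(feat, meta, max_ratio=0.5)
```

In this toy case the second sample is dropped because all of its features are nan or inf, while the third is kept because only one of its three features is missing.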

preprocess_features

Functions to preprocess the feature matrix. These functions modify the feature values, for example to impute nans or encode categorical features. Functions for scaling could also be included here (the scaling_class script is kept separate because it contains a class rather than functions).

Functions available to the user:

impute_nan_inf
average_by_groups
encode_categorical_variable
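
In contrast to filtering, preprocessing changes the feature values. A minimal sketch of one possible imputation strategy (replacing nan/inf with the column mean) is shown below. The helper name `impute_with_column_mean` and the choice of the mean are assumptions made for illustration; they do not describe how `impute_nan_inf` is implemented or called.

```python
import numpy as np
import pandas as pd


def impute_with_column_mean(feat):
    """Hypothetical helper: replace nan/inf in each column with the mean
    of that column's finite values."""
    # Treat +/-inf as missing so they do not distort the column means
    clean = feat.replace([np.inf, -np.inf], np.nan)
    # DataFrame.mean skips nan by default; fillna matches means by column name
    return clean.fillna(clean.mean())


# Toy data (made up for this example)
feat = pd.DataFrame({
    "speed": [1.0, np.nan, 3.0],
    "length": [2.0, 4.0, np.inf],
})
imputed = impute_with_column_mean(feat)
```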

scaling_class

A class used for scaling features in different ways. It is loosely based on the structure of the sklearn scaling classes, employing fit, transform and fit_transform methods.

Class scalingClass - methods available to the user:

fit
transform
fit_transform
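
The fit / transform / fit_transform pattern can be sketched as follows. `SimpleScaler` is a hypothetical stand-in written for this example and only standardises features; it is not scalingClass, whose constructor arguments and scaling options are not described on this page.

```python
import numpy as np


class SimpleScaler:
    """Hypothetical standardising scaler with an sklearn-like interface."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Learn the per-feature statistics from the data passed to fit
        self.mean_ = np.nanmean(X, axis=0)
        std = np.nanstd(X, axis=0)
        # Constant features get a scale of 1 to avoid division by zero
        self.scale_ = np.where(std == 0, 1.0, std)
        return self

    def transform(self, X):
        # Apply the statistics learned in fit to any data with the same features
        return (np.asarray(X, dtype=float) - self.mean_) / self.scale_

    def fit_transform(self, X):
        return self.fit(X).transform(X)


# Toy data (made up for this example)
X_train = np.array([[1.0, 10.0], [3.0, 10.0]])
X_test = np.array([[2.0, 12.0]])

scaler = SimpleScaler()
Z_train = scaler.fit_transform(X_train)
Z_test = scaler.transform(X_test)
```

Separating fit from transform means the statistics learned on one dataset (for example a training set) can be reused to scale another without refitting.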

bagging_data

Two classes that can be used to smooth experimental data with multiple replicates by creating bootstrap averages (see the description of this method in the Syngenta paper).

Class DataBagging - methods available to the user:

fit
fit_transform

Class DataBaggingByClass - methods available to the user:

fit
fit_transform

Other functions available to the user:

get_drug2moa_mapper
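
The general idea of bootstrap averages mentioned for the bagging classes can be sketched with NumPy as below. This is a generic illustration under stated assumptions: the function name `bootstrap_averages`, its parameters, and the choice to resample all replicates with replacement in each bag are hypothetical, and may differ from what DataBagging and DataBaggingByClass actually do.

```python
import numpy as np


def bootstrap_averages(X, n_bags=10, random_state=None):
    """Hypothetical helper: X has shape (n_replicates, n_features).
    Each bag resamples the replicates with replacement and averages them,
    giving an array of shape (n_bags, n_features)."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    n_replicates = X.shape[0]
    # One row of replicate indices per bag, sampled with replacement
    idx = rng.integers(0, n_replicates, size=(n_bags, n_replicates))
    # X[idx] has shape (n_bags, n_replicates, n_features); average over replicates
    return X[idx].mean(axis=1)


# Toy data (made up for this example): 3 replicates, 2 features
replicates = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
bagged = bootstrap_averages(replicates, n_bags=5, random_state=0)
```

Each row of `bagged` is a smoothed data point: an average over a random resample of the replicates, so it always lies within the range spanned by the original replicates.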