preprocessing
This is a module with functions and classes that can be used for the preprocessing of quantitative phenotyping data in the format returned by the functions in read_data. Preprocessing encompasses filtering, scaling, smoothing and any other cleaning/imputing/quality control procedure that precedes the analysis stage.
Functions to filter the data based on different criteria. The functions in this script do not modify the feature values. They only select or drop data based on quality (e.g. ratio of nan/inf values), feature properties or metadata information.
Functions available to the user:
- filter_n_skeletons
- drop_ventrally_signed
- select_feat_set
- filter_nan_inf
- cap_feat_values
- feat_filter_std
- drop_feat_by_keyword
- select_feat_by_keyword
- drop_samples_by_meta_column
- select_samples_by_meta_column
- filter_samples_by_meta_col_thresholds
- drop_bad_wells
- select_bluelight_conditions
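The filtering functions only keep or drop data and never change the values themselves. The sketch below illustrates that kind of column selection, in the spirit of what filter_nan_inf and feat_filter_std are described as doing. It assumes the features are held in a pandas DataFrame with one column per feature. The helper names drop_features_by_nan_ratio and drop_constant_features, along with their thresholds, are hypothetical and are not the tierpsytools signatures.

```python
import numpy as np
import pandas as pd


def drop_features_by_nan_ratio(feat: pd.DataFrame, max_nan_ratio: float = 0.2) -> pd.DataFrame:
    """Drop feature columns whose fraction of nan/inf values exceeds max_nan_ratio.

    Hypothetical helper for illustration, not the tierpsytools API.
    Feature values are never changed: columns are only kept or dropped.
    """
    # Count +/-inf as bad values, exactly like nan
    bad = feat.replace([np.inf, -np.inf], np.nan).isna()
    bad_ratio = bad.mean(axis=0)  # fraction of bad rows in each column
    return feat.loc[:, bad_ratio <= max_nan_ratio]


def drop_constant_features(feat: pd.DataFrame, min_std: float = 0.0) -> pd.DataFrame:
    """Keep only feature columns whose standard deviation is above min_std."""
    std = feat.std(axis=0, skipna=True)
    return feat.loc[:, std > min_std]


feat = pd.DataFrame({
    "speed": [1.0, 2.0, np.nan, 4.0],         # 1/4 bad values
    "length": [np.inf, np.nan, np.nan, 1.0],  # 3/4 bad values
    "constant": [5.0, 5.0, 5.0, 5.0],         # no variation
})
clean = drop_constant_features(drop_features_by_nan_ratio(feat, max_nan_ratio=0.3))
print(list(clean.columns))  # ['speed']
```

Chaining small, single-purpose filters like this keeps each quality-control criterion explicit and easy to reorder.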
Functions to preprocess the feature matrix. Unlike the filtering functions, these functions modify the feature values, for example to impute nans or encode categorical features. Functions for scaling could also be included here (the scaling_class script is kept separate because it contains a class rather than functions).
Functions available to the user:
- impute_nan_inf
- average_by_groups
- encode_categorical_variable
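The preprocessing functions, by contrast, do change the feature values. Below is a minimal sketch of two such operations: mean imputation of nan/inf values, and averaging of rows that share a group label. This is roughly the kind of work impute_nan_inf and average_by_groups perform. The helper names, and the choice of the column mean as the imputation value, are assumptions for illustration. The real functions may use other strategies and arguments.

```python
import numpy as np
import pandas as pd


def impute_with_column_mean(feat: pd.DataFrame) -> pd.DataFrame:
    """Replace nan and +/-inf in each column with that column's finite mean.

    Hypothetical helper, not the tierpsytools signature.
    """
    finite = feat.replace([np.inf, -np.inf], np.nan)
    # fillna with a Series fills each column with its own value
    return finite.fillna(finite.mean(axis=0))


def average_by_group(feat: pd.DataFrame, groups: pd.Series) -> pd.DataFrame:
    """Average feature rows that share the same group label (e.g. a well ID)."""
    return feat.groupby(groups.values).mean()


feat = pd.DataFrame({"speed": [1.0, np.nan, 3.0, np.inf],
                     "length": [2.0, 4.0, 6.0, 8.0]})
imputed = impute_with_column_mean(feat)   # speed -> [1.0, 2.0, 3.0, 2.0]
groups = pd.Series(["A", "A", "B", "B"])
avg = average_by_group(imputed, groups)   # one row per group label
print(avg)
```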
A class used for scaling features in different ways. It is loosely based on the structure of the sklearn scaling classes, employing fit, transform and fit_transform methods.
Class scalingClass - methods available to the user:
- fit
- transform
- fit_transform
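The fit / transform / fit_transform pattern separates learning the scaling statistics from applying them. That way, the same statistics can be reused on new data. The class below is a hypothetical z-score scaler written only to show this pattern. It is not scalingClass, which may support several scaling methods and take different arguments.

```python
import numpy as np
import pandas as pd


class ZScoreScaler:
    """Minimal scaler following the sklearn-style fit/transform pattern.

    Hypothetical illustration, not the tierpsytools scalingClass.
    """

    def fit(self, X: pd.DataFrame) -> "ZScoreScaler":
        # Learn per-feature mean and sample std (ddof=1), ignoring nan values
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Use 1 instead of a zero (or nan) std so constant features are
        # only centred, never divided by zero
        std = self.std_.where(self.std_ > 0, 1.0)
        return (X - self.mean_) / std

    def fit_transform(self, X: pd.DataFrame) -> pd.DataFrame:
        return self.fit(X).transform(X)


train = pd.DataFrame({"speed": [1.0, 2.0, 3.0], "area": [10.0, 10.0, 10.0]})
scaler = ZScoreScaler()
train_scaled = scaler.fit_transform(train)  # speed -> [-1, 0, 1], area -> [0, 0, 0]
# New data is scaled with the statistics learnt from train
test_scaled = scaler.transform(pd.DataFrame({"speed": [4.0], "area": [12.0]}))
```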
Two classes that can be used to smooth experimental data with multiple replicates by creating bootstrap averages (see the description of this method in the Syngenta paper).
Class DataBagging - methods available to the user:
- fit
- fit_transform
Class DataBaggingByClass - methods available to the user:
- fit
- fit_transform
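Bootstrap averaging smooths noisy replicates. For each group, a subset of replicates is drawn with replacement and averaged, and this is repeated many times to produce a set of "bagged" samples. The function below is a generic sketch of that idea under stated assumptions: every group is bagged independently, and the bag size defaults to the number of replicates in the group. The name bootstrap_average and its parameters are hypothetical. The exact procedure in DataBagging and DataBaggingByClass may differ.

```python
from typing import Optional, Tuple

import numpy as np
import pandas as pd


def bootstrap_average(feat: pd.DataFrame, groups: pd.Series, n_bags: int = 10,
                      bag_size: Optional[int] = None,
                      seed: int = 0) -> Tuple[pd.DataFrame, pd.Series]:
    """Create n_bags bootstrap averages for every group.

    Hypothetical sketch, not the DataBagging implementation.
    """
    rng = np.random.default_rng(seed)
    bagged_rows, bagged_labels = [], []
    # .groups maps each group label to the index labels of its rows
    for label, idx in feat.groupby(groups.values).groups.items():
        rows = feat.loc[idx].to_numpy()
        size = bag_size or len(rows)
        for _ in range(n_bags):
            pick = rng.integers(0, len(rows), size=size)  # sample with replacement
            bagged_rows.append(rows[pick].mean(axis=0))   # average the resample
            bagged_labels.append(label)
    bagged = pd.DataFrame(bagged_rows, columns=feat.columns)
    return bagged, pd.Series(bagged_labels, name="group")


feat = pd.DataFrame({"speed": [1.0, 2.0, 3.0, 10.0, 12.0]})
groups = pd.Series(["ctrl", "ctrl", "ctrl", "drug", "drug"])
bagged, labels = bootstrap_average(feat, groups, n_bags=5)
print(bagged.shape)  # (10, 1): 5 bagged samples per group
```

Every bagged sample is an average of real replicates from a single group, so its values always stay within that group's range.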
Other functions available to the user:
- get_drug2moa_mapper