
preprocessing

em812 edited this page May 11, 2021 · 3 revisions

This module contains functions and classes for preprocessing quantitative phenotyping data in the format returned by the functions in read_data. Preprocessing covers filtering, scaling, smoothing and any other cleaning, imputation or quality-control step that precedes the analysis stage.

Contents

filter_data

Functions to filter the data based on different criteria. The functions in this script do not modify the feature values. They only select or drop data based on quality (e.g. ratio of nan/inf values), feature properties or metadata information.

Functions available to the user:

  • filter_n_skeletons
  • drop_ventrally_signed
  • select_feat_set
  • filter_nan_inf
  • cap_feat_values
  • feat_filter_std
  • drop_feat_by_keyword
  • select_feat_by_keyword
  • drop_samples_by_meta_column
  • select_samples_by_meta_column
  • filter_samples_by_meta_col_thresholds
  • drop_bad_wells
  • select_bluelight_conditions
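The filter_data functions only select or drop rows and columns; they never change feature values. A minimal sketch of two such filters follows. It assumes the features are in a pandas DataFrame (rows = samples, columns = features) with a row-aligned metadata DataFrame. The function names and parameters are hypothetical illustrations, not the signatures of filter_nan_inf or select_samples_by_meta_column.

```python
import numpy as np
import pandas as pd


def filter_nan_inf_sketch(feat: pd.DataFrame, threshold: float, axis: int = 0) -> pd.DataFrame:
    """Hypothetical helper: drop features (axis=0) or samples (axis=1)
    whose fraction of nan/inf values exceeds `threshold`."""
    # Count +inf/-inf as bad values alongside nan
    bad = feat.isin([np.inf, -np.inf]) | feat.isna()
    # axis=0: bad-value fraction per column (feature); axis=1: per row (sample)
    bad_ratio = bad.mean(axis=axis)
    keep = bad_ratio <= threshold
    if axis == 0:
        return feat.loc[:, keep]
    return feat.loc[keep, :]


def select_samples_by_meta_sketch(feat: pd.DataFrame, meta: pd.DataFrame,
                                  column: str, values: list):
    """Hypothetical helper: keep samples whose meta[column] is in `values`.
    feat and meta must share the same row index."""
    mask = meta[column].isin(values)
    return feat[mask], meta[mask]


feat = pd.DataFrame({
    "speed": [1.0, np.nan, 3.0, 4.0],
    "length": [np.inf, np.nan, np.nan, 2.0],
    "area": [1.0, 2.0, 3.0, 4.0],
})
meta = pd.DataFrame({"drug": ["A", "B", "A", "C"]})

# "length" is 75% nan/inf, so it is dropped at a 50% threshold
print(filter_nan_inf_sketch(feat, threshold=0.5).columns.tolist())
```

Keeping feat and meta row-aligned is what lets a metadata mask select the matching feature rows without any merge.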

preprocess_features

Functions to preprocess the feature matrix. Unlike filter_data, these functions modify the feature values, for example to impute nans or encode categorical features. Functions for scaling could also belong here (the scaling_class script is kept separate because it contains a class rather than functions).

Functions available to the user:

  • impute_nan_inf
  • average_by_groups
  • encode_categorical_variable
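As an illustration of functions that change feature values, the sketch below imputes nan/inf with the column mean and averages rows within metadata groups. Mean imputation is an assumed strategy, and the helper names are hypothetical; they do not reproduce the behaviour or signatures of impute_nan_inf or average_by_groups. Features are again assumed to be a pandas DataFrame with a row-aligned metadata DataFrame.

```python
import numpy as np
import pandas as pd


def impute_nan_inf_sketch(feat: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace +/-inf with nan, then fill each nan
    with the mean of its column (feature)."""
    clean = feat.replace([np.inf, -np.inf], np.nan)
    # DataFrame.mean skips nan by default, so each mean uses valid values only
    return clean.fillna(clean.mean())


def average_by_groups_sketch(feat: pd.DataFrame, meta: pd.DataFrame,
                             group_col: str) -> pd.DataFrame:
    """Hypothetical helper: average feature rows sharing a value of meta[group_col]."""
    return feat.groupby(meta[group_col]).mean()


feat = pd.DataFrame({"speed": [1.0, np.inf, 3.0], "length": [2.0, 4.0, np.nan]})
meta = pd.DataFrame({"strain": ["a", "a", "b"]})

imputed = impute_nan_inf_sketch(feat)
print(average_by_groups_sketch(imputed, meta, "strain"))
```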

scaling_class

A class for scaling features in different ways. It is loosely modelled on the sklearn scaling classes and exposes fit, transform and fit_transform methods.

Class scalingClass - methods available to the user:

  • fit
  • transform
  • fit_transform
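The fit/transform/fit_transform pattern can be sketched as follows. This is a hypothetical z-score scaler, not the scalingClass implementation, which may offer other scaling methods and different parameters. It assumes pandas DataFrames of features.

```python
import pandas as pd


class ZScoreScalerSketch:
    """Hypothetical scaler following the sklearn-style fit/transform pattern."""

    def fit(self, feat: pd.DataFrame) -> "ZScoreScalerSketch":
        # Learn per-feature statistics from the data passed to fit
        self.mean_ = feat.mean()
        # Replace zero std (constant features) with 1 to avoid division by zero
        self.std_ = feat.std().replace(0, 1.0)
        return self

    def transform(self, feat: pd.DataFrame) -> pd.DataFrame:
        # Apply the stored statistics; works on new data with the same columns
        return (feat - self.mean_) / self.std_

    def fit_transform(self, feat: pd.DataFrame) -> pd.DataFrame:
        return self.fit(feat).transform(feat)


train = pd.DataFrame({"speed": [1.0, 2.0, 3.0], "area": [5.0, 5.0, 5.0]})
new = pd.DataFrame({"speed": [4.0], "area": [6.0]})

scaler = ZScoreScalerSketch()
print(scaler.fit_transform(train))
print(scaler.transform(new))
```

Separating fit from transform lets statistics learned on one dataset (e.g. a training set) be applied unchanged to another, which avoids leaking information from the data being evaluated.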

bagging_data

Two classes that can be used to smooth experimental data with multiple replicates by creating bootstrap averages (see the description of this method in the Syngenta paper).

Class DataBagging - methods available to the user:

  • fit
  • fit_transform

Class DataBaggingByClass - methods available to the user:

  • fit
  • fit_transform
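The general idea of bootstrap averaging within groups of replicates can be sketched as below: for each group, resample its replicate rows with replacement and average each resample into one smoothed row. This is a generic illustration under stated assumptions, not the DataBagging or DataBaggingByClass implementation; the function name and the n_bags, bag_size and seed parameters are hypothetical, and the resampling details of the method in the paper may differ.

```python
import numpy as np
import pandas as pd


def bootstrap_averages_sketch(feat: pd.DataFrame, groups: pd.Series,
                              n_bags: int = 10, bag_size=None, seed=None):
    """Hypothetical helper: per group, draw n_bags resamples (with replacement)
    of the group's rows and average each resample into one smoothed row."""
    rng = np.random.default_rng(seed)
    rows, labels = [], []
    # .indices maps each group label to the positional indices of its rows
    for label, positions in feat.groupby(groups).indices.items():
        size = len(positions) if bag_size is None else bag_size
        for _ in range(n_bags):
            picked = rng.choice(positions, size=size, replace=True)
            rows.append(feat.iloc[picked].mean())
            labels.append(label)
    bagged = pd.DataFrame(rows).reset_index(drop=True)
    return bagged, pd.Series(labels, name=groups.name)


feat = pd.DataFrame({"speed": [1.0, 2.0, 3.0, 10.0, 12.0]})
groups = pd.Series(["a", "a", "a", "b", "b"], name="drug")

bagged, labels = bootstrap_averages_sketch(feat, groups, n_bags=5, seed=0)
print(bagged.assign(drug=labels))
```

Each bagged row is an average of replicates, so it lies within the range of its group's original values while smoothing replicate-to-replicate noise.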

Other functions available to the user:

  • get_drug2moa_mapper