Skip to content

Latest commit

 

History

History
146 lines (104 loc) · 3.5 KB

README.md

File metadata and controls

146 lines (104 loc) · 3.5 KB

calidhayte

Contact: [email protected]

Tests


Table of Contents

  1. Summary
  2. Main Features
  3. How to Install
  4. Dependencies
  5. Example Usage
  6. Acknowledgements

Summary

calidhayte calibrates one set of measurements against another, using a variety of parametric and non parametric techniques. The datasets are split by k-fold cross validation and stratified so the distribution of 'true' measurements is consistent in all. It can then performs multiple error calculations to validate them, as well as produce several graphs to visualise the calibrations.


Main Features

  • Calibrate one set of measurements (cross-comparing all available secondary variables) against a 'true' set
    • A suite of calibration methods are available, including bayesian regression
  • Perform a suite of error calculations on the resulting calibration
  • Visualise results of calibration
  • Summarise calibrations to highlight best performing techniques

How to install

pip

pip install git+https://github.com/CaderIdris/calidhayte@release_tag

conda

conda install git pip
pip install git+https://github.com/CaderIdris/calidhayte@release_tag 

The release tags can be found in the sidebar


Dependencies

Please see Pipfile.


Example Usage

This module requires two dataframes as a prerequisite.

Independent Measurements

x a b c d e
2022-01-01 0.1 0 7 2.2 3 5
2022-01-02 0.7 1 3 2 8.9 1
2022-01-03 nan nan 1 nan nan 7
_ _ _ _ _ _ _
2022-09-30 0.5 3 1 2.7 4 0

Dependent Measurements

x
2022-01-02 1
2022-01-05 3
_ _
2022-09-29 nan
2022-09-30 37
2022-10-01 3
  • The two dataframes are joined on the index as an inner join, so the indices do not have to match initially
  • nan values can be present
  • More than one column can be present for the dependent measurements but only 'Values' will be used
  • The index can contain date objects, datetime objects or integers. They should be unique. Strings are untested and may cause unexpected behaviours
from calidhayte import Calibrate, Results, Graphs, Summary

# x_df is a dataframe containing multiple columns containing independent measurements.
# The primary measurement is denoted by the 'Values' columns, the other measurement columns can have any name.
# y_df is a dataframe containing the dependent measurement in the 'Values' column.

coeffs = Calibrate(
	x=x_df,
	y=y_df
	target='x'
)

cal.linreg()
cal.theil_sen()
cal.random_forest(n_estimators=500, max_features=1.0)

models = coeffs.return_models()

results = Results(
	x=x_df,
	y=y_df,
	target='x',
	models=models
)

results.r2()
results.median_absolute()
results.max()

results_df = results.return_errors()
results_df.to_csv('results.csv')

graphs = Graphs(
	x=x_df,
	y=y_df,
	target='x',
	models=models,
	x_name='x',
	y_name='y'
)
graphs.ecdf_plot()
graphs.lin_reg_plot()
graphs.save_plots()

Acknowledgements

Many thanks to James Murphy at Mcoding who's excellent tutorial Automated Testing in Python and associated repository helped a lot when structuring this package