calidhayte

Summary

calidhayte calibrates one set of measurements against another, using a variety of parametric and non-parametric techniques. The datasets are split by k-fold cross-validation and stratified so that the distribution of 'true' measurements is consistent across all folds. It can then perform multiple error calculations to validate the calibrations, as well as produce several graphs to visualise them.
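
As an illustration of the stratified split described above, the sketch below assigns measurements to folds so that each fold covers a similar range of 'true' values. It is a minimal stand-alone example and not calidhayte's own implementation; the function name stratified_folds and the sort-then-deal approach are assumptions made for illustration.

```python
def stratified_folds(true_values, k=5):
    """Return a fold number (0 to k-1) for every measurement.

    Measurements are ranked by their 'true' value and dealt into folds
    in turn, so every fold samples the whole range of values.
    """
    # Indices of the measurements, ordered from smallest to largest value
    order = sorted(range(len(true_values)), key=lambda i: true_values[i])
    folds = [0] * len(true_values)
    for rank, index in enumerate(order):
        folds[index] = rank % k
    return folds


true_values = [5, 1, 4, 2, 3, 6]
folds = stratified_folds(true_values, k=2)
print(folds)  # [0, 0, 1, 1, 0, 1]
```

Here fold 0 receives the values 5, 1 and 3 and fold 1 receives 4, 2 and 6, so both folds span low and high values.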

Main Features

Calibrate one set of measurements (cross-comparing all available secondary variables) against a 'true' set
- A suite of calibration methods is available, including Bayesian regression
Perform a suite of error calculations on the resulting calibration
Visualise results of calibration
Summarise calibrations to highlight best performing techniques
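
One of the calibration methods called in the Example Usage section is theil_sen. To show why a robust technique can be worth comparing against ordinary linear regression, here is a minimal pure-Python sketch of the Theil-Sen estimator for a single variable. It is not the package's implementation, and the function name theil_sen_fit is hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen_fit(x, y):
    """Fit y = slope * x + intercept using medians instead of least squares."""
    # Slope between every pair of points with distinct x values
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
        if x2 != x1
    ]
    slope = median(slopes)
    # Intercept is the median offset left over once the slope is removed
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# The last reading is an outlier; the other points lie on y = 2x + 1
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 100]
print(theil_sen_fit(x, y))  # (2.0, 1.0)
```

Because the fit uses medians, the single outlying reading does not move the line away from the other four points.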

How to install

pip

pip install git+https://github.com/CaderIdris/calidhayte@release_tag

conda

conda install git pip
pip install git+https://github.com/CaderIdris/calidhayte@release_tag

The release tags can be found in the sidebar

Dependencies

Please see Pipfile.

Example Usage

This module requires two dataframes as a prerequisite.

Independent Measurements

	x	a	b	c	d	e
2022-01-01	0.1	0	7	2.2	3	5
2022-01-02	0.7	1	3	2	8.9	1
2022-01-03	nan	nan	1	nan	nan	7
_	_	_	_	_	_	_
2022-09-30	0.5	3	1	2.7	4	0

Dependent Measurements

	x
2022-01-02	1
2022-01-05	3
_	_
2022-09-29	nan
2022-09-30	37
2022-10-01	3

The two dataframes are joined on the index as an inner join, so the indices do not have to match initially.
NaN values can be present.
More than one column can be present for the dependent measurements, but only the column named by target ('x' in this example) will be used.
The index can contain date objects, datetime objects or integers, and its values should be unique. String indices are untested and may cause unexpected behaviour.
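
To make the notes above concrete, the sketch below builds the two example dataframes with pandas and shows which rows survive an inner join on the index. calidhayte performs the join itself, so this is only an illustration of the behaviour; the elided rows of the tables are left out, and the suffix names are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd

# Independent measurements: the rows shown in the example table
x_df = pd.DataFrame(
    {
        "x": [0.1, 0.7, np.nan, 0.5],
        "a": [0, 1, np.nan, 3],
        "b": [7, 3, 1, 1],
        "c": [2.2, 2, np.nan, 2.7],
        "d": [3, 8.9, np.nan, 4],
        "e": [5, 1, 7, 0],
    },
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03", "2022-09-30"]),
)

# Dependent ('true') measurements
y_df = pd.DataFrame(
    {"x": [1, 3, np.nan, 37, 3]},
    index=pd.to_datetime(
        ["2022-01-02", "2022-01-05", "2022-09-29", "2022-09-30", "2022-10-01"]
    ),
)

# Index values should be unique in both dataframes
assert x_df.index.is_unique and y_df.index.is_unique

# Inner join on the index: only timestamps present in both dataframes survive.
# Suffixes are needed here because both dataframes have an 'x' column.
joined = x_df.join(y_df, how="inner", lsuffix="_independent", rsuffix="_dependent")
print(joined.index.strftime("%Y-%m-%d").tolist())  # ['2022-01-02', '2022-09-30']
```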

from calidhayte import Calibrate, Results, Graphs, Summary

# x_df is a dataframe with several columns of independent measurements.
# target names the primary measurement column ('x' in the example dataframes above); the other measurement columns can have any name.
# y_df is a dataframe containing the dependent measurement in the column named by target.

coeffs = Calibrate(
	x=x_df,
	y=y_df,
	target='x'
)

coeffs.linreg()
coeffs.theil_sen()
coeffs.random_forest(n_estimators=500, max_features=1.0)

models = coeffs.return_models()

results = Results(
	x=x_df,
	y=y_df,
	target='x',
	models=models
)

results.r2()
results.median_absolute()
results.max()

results_df = results.return_errors()
results_df.to_csv('results.csv')

graphs = Graphs(
	x=x_df,
	y=y_df,
	target='x',
	models=models,
	x_name='x',
	y_name='y'
)
graphs.ecdf_plot()
graphs.lin_reg_plot()
graphs.save_plots()
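
For readers unfamiliar with the error calculations called in the example, the sketch below shows the standard definitions of three such metrics in plain Python. It assumes that r2, median_absolute and max refer to the coefficient of determination, the median absolute error and the maximum absolute error; the function names and data here are illustrative and are not taken from the package.

```python
from statistics import mean, median


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def median_absolute_error(y_true, y_pred):
    """Median of the absolute differences; robust to a few large misses."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def max_error(y_true, y_pred):
    """Largest absolute difference; the worst single prediction."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


y_true = [1.0, 2.0, 3.0, 4.0]   # 'true' measurements
y_pred = [1.5, 2.0, 2.5, 5.0]   # calibrated measurements
print(r2(y_true, y_pred))                     # close to 0.7
print(median_absolute_error(y_true, y_pred))  # 0.5
print(max_error(y_true, y_pred))              # 1.0
```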

Acknowledgements

Many thanks to James Murphy at mCoding, whose excellent tutorial Automated Testing in Python and associated repository helped a lot when structuring this package.
