Releases · joshuaspear/offline_rl_ope
v7.0.0
- Altered ISEstimator and OPEEstimatorBase APIs to depend on EmpiricalMeanDenomBase and WeightDenomBase
- EmpiricalMeanDenomBase and WeightDenomBase separately define functions over the dataset-level value and the individual trajectory weights, respectively. This allows a far greater number of estimators to be implemented flexibly (see the sketch at the end of this release's notes)
- Added api/StandardEstimators for IS and DR to allow for 'plug-and-play' analysis
- Altered the discrete torch propensity model to use a softmax output over the action classes rather than a single-logit output. This requires modelling both classes for binary classification; however, it improves the generalisability of the code
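To make the denominator split concrete, below is a minimal numpy sketch of the idea: one component supplies the denominator for the empirical mean over the dataset, the other normalises the individual trajectory weights. All class names and signatures here are illustrative stand-ins, not the actual EmpiricalMeanDenomBase / WeightDenomBase API.

```python
import numpy as np
from abc import ABC, abstractmethod

# Illustrative only: these classes mirror the *idea* of EmpiricalMeanDenomBase /
# WeightDenomBase but are not the offline_rl_ope API.

class EmpiricalMeanDenom(ABC):
    @abstractmethod
    def __call__(self, weights: np.ndarray) -> float:
        """Scalar denominator applied to the summed, weighted returns."""

class WeightDenom(ABC):
    @abstractmethod
    def __call__(self, weights: np.ndarray) -> np.ndarray:
        """Per-trajectory denominator applied to the raw weights."""

class NTrajectories(EmpiricalMeanDenom):
    # Ordinary IS: divide the sum by the number of trajectories.
    def __call__(self, weights):
        return float(weights.shape[0])

class SumOfWeights(EmpiricalMeanDenom):
    # Weighted (self-normalised) IS: divide the sum by the total weight mass.
    def __call__(self, weights):
        return float(weights.sum())

class PassThrough(WeightDenom):
    # Leave the raw importance weights untouched.
    def __call__(self, weights):
        return np.ones_like(weights)

class ToyISEstimator:
    """Composes the two denominators: estimate = sum_i w_i * G_i / denom."""
    def __init__(self, mean_denom: EmpiricalMeanDenom, weight_denom: WeightDenom):
        self.mean_denom = mean_denom
        self.weight_denom = weight_denom

    def __call__(self, weights: np.ndarray, returns: np.ndarray) -> float:
        w = weights / self.weight_denom(weights)
        return float((w * returns).sum() / self.mean_denom(weights))

w = np.array([0.5, 2.0, 1.5])   # trajectory-level importance weights
g = np.array([1.0, 0.0, 2.0])   # trajectory returns
print(ToyISEstimator(NTrajectories(), PassThrough())(w, g))  # ordinary IS
print(ToyISEstimator(SumOfWeights(), PassThrough())(w, g))   # weighted IS
```

Swapping in different denominator pairs is what lets a single estimator class cover ordinary, weighted and per-decision variants.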
v6.0.0
- Updated the PropensityModels structure for sklearn and added a helper class for compatibility with torch
- Full runtime typechecking with jaxtyping
- Fixed bug with IS methods where the average was being taken twice
- Significantly simplified API, especially integrating Policy classes with propensity models
- Generalised d3rlpy API to allow for wrapping continuous policies with D3RlPyTorchAlgoPredict
- Added explicit stochastic policies for d3rlpy
- Introduced 'policy_func': any function/method that outputs type Union[TorchPolicyReturn, NumpyPolicyReturn]
- Simplified and unified ISCallback in d3rlpy/api using PolicyFactory
- Added 'premade' doubly robust estimators for vanilla DR, weighted DR, per-decision DR and weighted per-decision DR
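For reference, the quantity these estimators target is the standard per-decision doubly robust estimate of Jiang & Li (2016) (http://proceedings.mlr.press/v48/jiang16.pdf). The numpy sketch below is a hedged illustration of the vanilla and weighted variants; the array layout, the column-wise self-normalisation and the function name are assumptions for this example, not the package's interface.

```python
import numpy as np

def per_decision_dr(
    rewards: np.ndarray,   # (n_traj, T) observed rewards
    ratios: np.ndarray,    # (n_traj, T) per-step ratios pi_e(a|s) / pi_b(a|s)
    q_hat: np.ndarray,     # (n_traj, T) Q-model evaluated at the logged (s, a)
    v_hat: np.ndarray,     # (n_traj, T) V-model evaluated at the logged s
    gamma: float = 0.99,
    weighted: bool = False,
) -> float:
    """Hedged sketch of per-decision DR (Jiang & Li, 2016); not the
    offline_rl_ope implementation."""
    n, T = rewards.shape
    rho = np.cumprod(ratios, axis=1)                                   # rho_{1:t}
    rho_prev = np.concatenate([np.ones((n, 1)), rho[:, :-1]], axis=1)  # rho_{1:t-1}
    if weighted:
        # Self-normalise each timestep's weights by their mean across trajectories.
        rho = rho / rho.mean(axis=0, keepdims=True)
        rho_prev = rho_prev / rho_prev.mean(axis=0, keepdims=True)
    discounts = gamma ** np.arange(T)
    correction = rho * (rewards - q_hat) + rho_prev * v_hat
    return float((discounts * correction).sum(axis=1).mean())
```

Vanilla DR corresponds to weighted=False; the weighted per-decision variant only swaps the raw cumulative ratios for their column-normalised counterparts.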
v5.0.0
- Correctly implemented per-decision weighted importance sampling
- Expanded the different types of weights that can be implemented based on:
- http://proceedings.mlr.press/v48/jiang16.pdf: Per-decision weights are defined as the average weight at a given timepoint. This results in a different denominator for different timepoints. This is implemented with the following
WISWeightNorm(avg_denom=True)
- https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1079&context=cs_faculty_pubs: Per-decision weights are defined as the sum of discounted weights across all timesteps. This is implemented with the following
WISWeightNorm(discount=discount_value)
- Combinations of different weights can easily be implemented, for example 'average discounted weights':
WISWeightNorm(discount=discount_value, avg_denom=True)
however, these do not necessarily have backing from the literature (see the sketch following this list).
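The sketch below gives one plausible reading of how these normalisation options combine; the keyword names mirror the notes above, but the function itself is an illustration rather than the WISWeightNorm source.

```python
import numpy as np

def wis_denominator(weights, avg_denom=False, discount=None):
    """Illustrative reconstruction of the WIS normalising denominator
    described above; not the WISWeightNorm implementation."""
    n, T = weights.shape
    if discount is not None:
        # 'discount': denominator built from the sum of discounted weights
        # across trajectories and timesteps.
        discounted = weights * (discount ** np.arange(T))[None, :]
        denom = np.full(T, discounted.sum())
    else:
        # Default: sum of weights at each timepoint, giving a different
        # denominator per timepoint.
        denom = weights.sum(axis=0)
    if avg_denom:
        # 'avg_denom': use the average rather than the sum, i.e. divide by
        # the number of trajectories (Jiang & Li style).
        denom = denom / n
    return denom  # divide column t of `weights` by denom[t] to normalise
```

Passing both discount and avg_denom reproduces the 'average discounted weights' combination mentioned above.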
- The EffectiveSampleSize metric optionally returns NaN if all weights are 0
- Bug fixes:
- Fixed a bug when running on CUDA where tensors were not being pushed to the CPU
- Improved static typing
v4.0.0
- Various bug fixes (see release log in README.md)
- Predefined propensity models including:
- Generic feedforward MLP for continuous and discrete action spaces built in PyTorch
- XGBoost for continuous and discrete action spaces built in sklearn
- Both the PyTorch and sklearn models can handle sparse discrete action spaces, i.e., a propensity model can be exposed to 'new' actions, provided the full action-space definition is supplied when the propensity model is trained
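As a rough illustration of the propensity-model idea (fit a classifier to logged state-action pairs and read behaviour probabilities from predict_proba), here is a hedged sklearn sketch. GradientBoostingClassifier, the propensity helper and the zero-filling of unseen actions are stand-ins chosen for this example, not the package's PropensityModels wrappers; supplying the full action-space definition up front is what allows every action, observed or not, to be scored.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))       # logged states
actions = rng.integers(0, 3, size=500)   # logged discrete actions in {0, 1, 2}
full_action_space = np.arange(4)         # action 3 is never observed in the data

clf = GradientBoostingClassifier().fit(states, actions)

def propensity(clf, states, actions, full_action_space):
    """P(a|s) under the fitted behaviour model, with zero probability for
    actions the classifier never saw at fit time."""
    probs = np.zeros((states.shape[0], full_action_space.size))
    # Map the classifier's class order onto the full action space.
    cols = np.searchsorted(full_action_space, clf.classes_)
    probs[:, cols] = clf.predict_proba(states)
    return probs[np.arange(states.shape[0]), actions]

print(propensity(clf, states[:5], actions[:5], full_action_space))
```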
- Metrics pattern with:
- Effective sample size calculation
- Proportion of valid weights, i.e., the mean proportion of weights between a min and max value across trajectories (both metrics are sketched at the end of these notes)
- Refactored the BehavPolicy class to accept a 'policy_func' that aligns with the other policy classes
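For concreteness, hedged sketches of the two metrics follow: the Kish effective sample size and the proportion of weights inside a [min, max] band, averaged across trajectories. The nan_if_all_zero flag mirrors the behaviour noted under v5.0.0; the function names and the (n_traj, T) weight layout are assumptions, not the library's metric classes.

```python
import numpy as np

def effective_sample_size(weights, nan_if_all_zero=True):
    """Kish effective sample size: ESS = (sum w)^2 / sum(w^2).
    Illustrative, not the library code."""
    w = np.asarray(weights, dtype=float)
    if nan_if_all_zero and not np.any(w):
        return float("nan")
    return float(w.sum() ** 2 / (w ** 2).sum())

def prop_valid_weights(weights, w_min, w_max):
    """Mean proportion, across trajectories, of per-timestep weights that
    fall inside [w_min, w_max]. `weights` has shape (n_traj, T)."""
    valid = (weights >= w_min) & (weights <= w_max)
    return float(valid.mean(axis=1).mean())

weights = np.array([[0.1, 2.5, 1.0],
                    [0.0, 0.4, 3.2]])
print(effective_sample_size(weights.prod(axis=1)))  # ESS of trajectory-level weights
print(prop_valid_weights(weights, 0.05, 2.0))
```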