Releases · joshuaspear/offline_rl_ope
v7.0.0
- Altered ISEstimator and OPEEstimatorBase APIs to depend on EmpiricalMeanDenomBase and WeightDenomBase
- EmpiricalMeanDenomBase and WeightDenomBase separately define functions over the dataset-level value and the individual trajectory weights, respectively. This allows a far greater number of estimators to be implemented flexibly (see the sketch at the end of this release's notes)
- Added api/StandardEstimators for IS and DR to allow for 'plug-and-play' analysis
- Altered the discrete torch propensity model to use a softmax output over the action classes rather than a single-logit output. This requires modelling both classes for binary classification; however, it improves the generalisability of the code
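To make the denominator split concrete, below is a minimal numpy sketch of the idea: one component supplies the denominator for the empirical mean over the dataset, the other normalises the individual trajectory weights. All class names and signatures here are illustrative stand-ins, not the actual EmpiricalMeanDenomBase / WeightDenomBase API.

```python
import numpy as np
from abc import ABC, abstractmethod

# Illustrative only: these classes mirror the *idea* of EmpiricalMeanDenomBase /
# WeightDenomBase but are not the offline_rl_ope API.

class EmpiricalMeanDenom(ABC):
    @abstractmethod
    def __call__(self, weights: np.ndarray) -> float:
        """Scalar denominator applied to the summed, weighted returns."""

class WeightDenom(ABC):
    @abstractmethod
    def __call__(self, weights: np.ndarray) -> np.ndarray:
        """Per-trajectory denominator applied to the raw weights."""

class NTrajectories(EmpiricalMeanDenom):
    # Ordinary IS: divide the sum by the number of trajectories.
    def __call__(self, weights):
        return float(weights.shape[0])

class SumOfWeights(EmpiricalMeanDenom):
    # Weighted (self-normalised) IS: divide the sum by the total weight mass.
    def __call__(self, weights):
        return float(weights.sum())

class PassThrough(WeightDenom):
    # Leave the raw importance weights untouched.
    def __call__(self, weights):
        return np.ones_like(weights)

class ToyISEstimator:
    """Composes the two denominators: estimate = sum_i w_i * G_i / denom."""
    def __init__(self, mean_denom: EmpiricalMeanDenom, weight_denom: WeightDenom):
        self.mean_denom = mean_denom
        self.weight_denom = weight_denom

    def __call__(self, weights: np.ndarray, returns: np.ndarray) -> float:
        w = weights / self.weight_denom(weights)
        return float((w * returns).sum() / self.mean_denom(weights))

w = np.array([0.5, 2.0, 1.5])   # trajectory-level importance weights
g = np.array([1.0, 0.0, 2.0])   # trajectory returns
print(ToyISEstimator(NTrajectories(), PassThrough())(w, g))  # ordinary IS
print(ToyISEstimator(SumOfWeights(), PassThrough())(w, g))   # weighted IS
```

Swapping in different denominator pairs is what lets a single estimator class cover ordinary, weighted and per-decision variants.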
v6.0.0
- Updated the PropensityModels structure for sklearn and added a helper class for compatibility with torch
- Full runtime typechecking with jaxtyping
- Fixed bug with IS methods where the average was being taken twice
- Significantly simplified API, especially integrating Policy classes with propensity models
- Generalised d3rlpy API to allow for wrapping continuous policies with D3RlPyTorchAlgoPredict
- Added explicit stochastic policies for d3rlpy
- Introduced 'policy_func': any function/method that outputs type Union[TorchPolicyReturn, NumpyPolicyReturn]
- Simplified and unified ISCallback in d3rlpy/api using PolicyFactory
- Added 'premade' doubly robust estimators for vanilla DR, weighted DR, per-decision DR and weighted per-decision DR
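For reference, the quantity these estimators target is the standard per-decision doubly robust estimate of Jiang & Li (2016) (http://proceedings.mlr.press/v48/jiang16.pdf). The numpy sketch below is a hedged illustration of the vanilla and weighted variants; the array layout, the column-wise self-normalisation and the function name are assumptions for this example, not the package's interface.

```python
import numpy as np

def per_decision_dr(
    rewards: np.ndarray,   # (n_traj, T) observed rewards
    ratios: np.ndarray,    # (n_traj, T) per-step ratios pi_e(a|s) / pi_b(a|s)
    q_hat: np.ndarray,     # (n_traj, T) Q-model evaluated at the logged (s, a)
    v_hat: np.ndarray,     # (n_traj, T) V-model evaluated at the logged s
    gamma: float = 0.99,
    weighted: bool = False,
) -> float:
    """Hedged sketch of per-decision DR (Jiang & Li, 2016); not the
    offline_rl_ope implementation."""
    n, T = rewards.shape
    rho = np.cumprod(ratios, axis=1)                                   # rho_{1:t}
    rho_prev = np.concatenate([np.ones((n, 1)), rho[:, :-1]], axis=1)  # rho_{1:t-1}
    if weighted:
        # Self-normalise each timestep's weights by their mean across trajectories.
        rho = rho / rho.mean(axis=0, keepdims=True)
        rho_prev = rho_prev / rho_prev.mean(axis=0, keepdims=True)
    discounts = gamma ** np.arange(T)
    correction = rho * (rewards - q_hat) + rho_prev * v_hat
    return float((discounts * correction).sum(axis=1).mean())
```

Vanilla DR corresponds to weighted=False; the weighted per-decision variant only swaps the raw cumulative ratios for their column-normalised counterparts.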
v5.0.0
- Correctly implemented per-decision weighted importance sampling
- Expanded the different types of weights that can be implemented based on:
- http://proceedings.mlr.press/v48/jiang16.pdf: Per-decision weights are defined as the average weight at a given timepoint. This results in a different denominator for different timepoints. This is implemented with the following
WISWeightNorm(avg_denom=True)
- https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1079&context=cs_faculty_pubs: Per-decision weights are defined as the sum of discounted weights across all timesteps. This is implemented with the following
WISWeightNorm(discount=discount_value)
- Combinations of different weights can easily be implemented, for example 'average discounted weights':
WISWeightNorm(discount=discount_value, avg_denom=True)
however, these do not necessarily have backing from the literature (see the sketch following this list).
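The sketch below gives one plausible reading of how these normalisation options combine; the keyword names mirror the notes above, but the function itself is an illustration rather than the WISWeightNorm source.

```python
import numpy as np

def wis_denominator(weights, avg_denom=False, discount=None):
    """Illustrative reconstruction of the WIS normalising denominator
    described above; not the WISWeightNorm implementation."""
    n, T = weights.shape
    if discount is not None:
        # 'discount': denominator built from the sum of discounted weights
        # across trajectories and timesteps.
        discounted = weights * (discount ** np.arange(T))[None, :]
        denom = np.full(T, discounted.sum())
    else:
        # Default: sum of weights at each timepoint, giving a different
        # denominator per timepoint.
        denom = weights.sum(axis=0)
    if avg_denom:
        # 'avg_denom': use the average rather than the sum, i.e. divide by
        # the number of trajectories (Jiang & Li style).
        denom = denom / n
    return denom  # divide column t of `weights` by denom[t] to normalise
```

Passing both discount and avg_denom reproduces the 'average discounted weights' combination mentioned above.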
- The EffectiveSampleSize metric optionally returns NaN if all weights are 0
- Bug fixes:
- Fixed a bug when running on CUDA where tensors were not being pushed to the CPU
- Improved static typing
v4.0.0
- Various bug fixes (see release log in README.md)
- Predefined propensity models including:
- Generic feedforward MLP for continuous and discrete action spaces built in PyTorch
- XGBoost for continuous and discrete action spaces built in sklearn
- Both the PyTorch and sklearn models can handle sparse discrete action spaces, i.e., a propensity model can be exposed to 'new' actions, provided the full action-space definition is supplied when the propensity model is trained
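As a rough illustration of the propensity-model idea (fit a classifier to logged state-action pairs and read behaviour probabilities from predict_proba), here is a hedged sklearn sketch. GradientBoostingClassifier, the propensity helper and the zero-filling of unseen actions are stand-ins chosen for this example, not the package's PropensityModels wrappers; supplying the full action-space definition up front is what allows every action, observed or not, to be scored.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))       # logged states
actions = rng.integers(0, 3, size=500)   # logged discrete actions in {0, 1, 2}
full_action_space = np.arange(4)         # action 3 is never observed in the data

clf = GradientBoostingClassifier().fit(states, actions)

def propensity(clf, states, actions, full_action_space):
    """P(a|s) under the fitted behaviour model, with zero probability for
    actions the classifier never saw at fit time."""
    probs = np.zeros((states.shape[0], full_action_space.size))
    # Map the classifier's class order onto the full action space.
    cols = np.searchsorted(full_action_space, clf.classes_)
    probs[:, cols] = clf.predict_proba(states)
    return probs[np.arange(states.shape[0]), actions]

print(propensity(clf, states[:5], actions[:5], full_action_space))
```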
- Metrics pattern with:
- Effective sample size calculation
- Proportion of valid weights, i.e., the mean proportion of weights between a min and max value across trajectories (both metrics are sketched at the end of these notes)
- Refactored the BehavPolicy class to accept a 'policy_func' that aligns with the other policy classes
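For concreteness, hedged sketches of the two metrics follow: the Kish effective sample size and the proportion of weights inside a [min, max] band, averaged across trajectories. The nan_if_all_zero flag mirrors the behaviour noted under v5.0.0; the function names and the (n_traj, T) weight layout are assumptions, not the library's metric classes.

```python
import numpy as np

def effective_sample_size(weights, nan_if_all_zero=True):
    """Kish effective sample size: ESS = (sum w)^2 / sum(w^2).
    Illustrative, not the library code."""
    w = np.asarray(weights, dtype=float)
    if nan_if_all_zero and not np.any(w):
        return float("nan")
    return float(w.sum() ** 2 / (w ** 2).sum())

def prop_valid_weights(weights, w_min, w_max):
    """Mean proportion, across trajectories, of per-timestep weights that
    fall inside [w_min, w_max]. `weights` has shape (n_traj, T)."""
    valid = (weights >= w_min) & (weights <= w_max)
    return float(valid.mean(axis=1).mean())

weights = np.array([[0.1, 2.5, 1.0],
                    [0.0, 0.4, 3.2]])
print(effective_sample_size(weights.prod(axis=1)))  # ESS of trajectory-level weights
print(prop_valid_weights(weights, 0.05, 2.0))
```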