diff --git a/TODOs_ROADMAP.rst b/TODOs_ROADMAP.rst index d8d3553bb..84f9ff0cb 100644 --- a/TODOs_ROADMAP.rst +++ b/TODOs_ROADMAP.rst @@ -21,7 +21,7 @@ their planned release. ---------------------- - [ ] Not part of public API; to be redesigned and probably dropped. Should use - ``_loader_*`` and ``_storer_*`` from ``cvxportfolio.data``. Target ``1.1.0``. + ``_loader_*`` and ``_storer_*`` from ``cvxportfolio.data``. ``cvxportfolio.forecast`` ------------------------- @@ -29,23 +29,22 @@ their planned release. - cache logic needs improvement, not easily exposable to third-parties now with ``dataclass.__hash__`` - drop decorator - - drop dataclass + - drop dataclass, PR #133 - cache IO logic should be managed by forecaster not by simulator, could be done by ``initialize_estimator``; maybe enough to just define it in the base class of forecasters - improve names of internal methods, clean them (lots of stuff can be re-used at universe change, ...) - generalize the mean estimator: - - use same code for ``past_returns``, ``past_returns**2``, ``past_volumes``, ... - - add rolling window option, should be in ``pd.Timedelta`` - - add exponential moving avg, should be in half-life ``pd.Timedelta`` -- add same extras to the covariance estimator + - [X] use same code for ``past_returns``, ``past_returns**2``, ``past_volumes``, .... Done in #126, target ``1.2.0`` + - [X] add rolling window option, should be in ``pd.Timedelta``. Done in #126, target ``1.2.0`` + - [X] add exponential moving avg, should be in half-life ``pd.Timedelta``. Done in #126, target ``1.2.0`` +- [X] add same extras to the covariance estimator. Done in #126, target ``1.2.0`` - goal: make this module crystal clear; third-party ML models should use it (at least for caching) ``cvxportfolio.estimator`` -------------------------- -- [ ] ``DataEstimator`` needs refactoring, too long and complex methods. Target - ``1.1.1``. +- [ ] ``DataEstimator`` needs refactoring, too long and complex methods. - ``Estimator`` could define base logic for on-disk caching. By itself it wouldn't do anything, actual functionality implemented by forecasters' base class. @@ -65,8 +64,8 @@ their planned release. YF), total returns like other data sources, or neither for non-stocks assets. This would implement all data cleaning process as sequence of small steps in separate methods, with good logging. It would also implement data quality - check in the ``preload`` method to give feedback to the user. PR #125 -- [ ] Factor ``data.py`` in ``data/`` submodule. PR #125 + check in the ``preload`` method to give feedback to the user. PR #127 +- [ ] Factor ``data.py`` in ``data/`` submodule. PR #127 ``cvxportfolio.simulator`` -------------------------- @@ -85,11 +84,12 @@ Partially public; only ``cvx.Gamma()`` (no arguments) and ``optimize_hyperparame (simple usage) are public, all the rest is not. - [ ] Clean up interface w/ ``MarketSimulator``, right now it calls private - methods, maybe enough to make them public. Target ``1.1.1``. + methods, maybe enough to make them public. - [ ] Add risk/fine default ``GammaTrade``, ``GammaRisk`` (which are ``RangeHyperParameter``) modeled after original examples from paper. -- [ ] Add ``Constant`` internal object throughout the library, also in ``DataEstimator`` +- [X] Add ``Constant`` internal object throughout the library, also in ``DataEstimator`` in the case of scalar; it resolves to ``current_value`` if you pass a hyper-parameter. + Replaced with _resolve_hyperpar in #126. - [ ] Distinguish integer and positive hyper-parameters (also enforced by Constant). - [ ] Consider changing the increment/decrement model; hyperparameter object could instead return a ``neighbors`` set at each point. Probably cleaner. @@ -98,7 +98,7 @@ Partially public; only ``cvx.Gamma()`` (no arguments) and ``optimize_hyperparame ------------------------- - [ ] Add `AllIn` policy, which allocates all to a single name (like - ``AllCash``). Target ``1.1.0``. + ``AllCash``). Optimization policies ~~~~~~~~~~~~~~~~~~~~~ @@ -115,7 +115,7 @@ Optimization policies ---------------------------- - [ ] Add missing constraints from the paper. -- [ ] Make ``MarketNeutral`` accept arbitrary benchmark (policy object). +- [X] Make ``MarketNeutral`` accept arbitrary benchmark (policy object). Done in #126. ``cvxportfolio.result`` ----------------------- @@ -144,7 +144,7 @@ Development & testing - [ ] Add extra pylint checkers. - - [ ] Code complexity. Target ``1.1.1``. + - [ ] Code complexity. - [ ] Consider removing downloaded data from ``test_simulator.py``, so only ``test_data.py`` requires internet. diff --git a/cvxportfolio/constraints/__init__.py b/cvxportfolio/constraints/__init__.py new file mode 100644 index 000000000..45cf973eb --- /dev/null +++ b/cvxportfolio/constraints/__init__.py @@ -0,0 +1,45 @@ +# Copyright 2023 Enzo Busseti +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Here we define many realistic constraints that apply to :ref:`portfolio +optimization trading policies `. + +Some of them, like :class:`LongOnly`, are +very simple to use. Some others are more advanced, +for example :class:`FactorNeutral` +takes time-varying factor exposures as parameters. + +For a minimal example we present the classic Markowitz allocation. + +.. code-block:: python + + import cvxportfolio as cvx + + objective = cvx.ReturnsForecast() - gamma_risk * cvx.FullCovariance() + + # the policy takes a list of constraint instances + constraints = [cvx.LongOnly(applies_to_cash=True)] + + policy = cvx.SinglePeriodOptimization(objective, constraints) + print(cvx.MarketSimulator(universe).backtest(policy)) + +With this, we require that the optimal post-trade weights +found by the single-period optimization policy are non-negative. +In our formulation the full portfolio weights vector (which includes +the cash account) sums to one, +see equation :math:`(4.9)` at page 43 of +`the book `_. +""" + +from .base_constraints import * +from .constraints import * diff --git a/cvxportfolio/constraints/base_constraints.py b/cvxportfolio/constraints/base_constraints.py new file mode 100644 index 000000000..44a42c589 --- /dev/null +++ b/cvxportfolio/constraints/base_constraints.py @@ -0,0 +1,135 @@ +# Copyright 2023 Enzo Busseti +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""This module defines base constraint classes.""" + + +from ..estimator import CvxpyExpressionEstimator, DataEstimator + +__all__ = ['Constraint', 'EqualityConstraint', 'InequalityConstraint'] + +class Constraint(CvxpyExpressionEstimator): + """Base cvxpy constraint class.""" + + def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): + """Compile constraint to cvxpy. + + :param w_plus: Post-trade weights. + :type w_plus: cvxpy.Variable + :param z: Trade weights. + :type z: cvxpy.Variable + :param w_plus_minus_w_bm: Post-trade weights minus benchmark + weights. + :type w_plus_minus_w_bm: cvxpy.Variable + :returns: some cvxpy.constraints object, or list of those + :rtype: cvxpy.constraints, list + """ + raise NotImplementedError # pragma: no cover + + +class EqualityConstraint(Constraint): + """Base class for equality constraints. + + This class is not exposed to the user, each equality + constraint inherits from this and overrides the + :func:`InequalityConstraint._compile_constr_to_cvxpy` and + :func:`InequalityConstraint._rhs` methods. + + We factor this code in order to streamline the + design of :class:`SoftConstraint` costs. + """ + + def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): + """Compile constraint to cvxpy. + + :param w_plus: Post-trade weights. + :type w_plus: cvxpy.Variable + :param z: Trade weights. + :type z: cvxpy.Variable + :param w_plus_minus_w_bm: Post-trade weights minus benchmark + weights. + :type w_plus_minus_w_bm: cvxpy.Variable + :returns: Cvxpy constraints object. + :rtype: cvxpy.constraints + """ + return self._compile_constr_to_cvxpy(w_plus, z, w_plus_minus_w_bm) ==\ + self._rhs() + + def _compile_constr_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): + """Cvxpy expression of the left-hand side of the constraint.""" + raise NotImplementedError # pragma: no cover + + def _rhs(self): + """Cvxpy expression of the right-hand side of the constraint.""" + raise NotImplementedError # pragma: no cover + + +class InequalityConstraint(Constraint): + """Base class for inequality constraints. + + This class is not exposed to the user, each inequality + constraint inherits from this and overrides the + :func:`InequalityConstraint._compile_constr_to_cvxpy` and + :func:`InequalityConstraint._rhs` methods. + + We factor this code in order to streamline the + design of :class:`SoftConstraint` costs. + """ + + def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): + """Compile constraint to cvxpy. + + :param w_plus: Post-trade weights. + :type w_plus: cvxpy.Variable + :param z: Trade weights. + :type z: cvxpy.Variable + :param w_plus_minus_w_bm: Post-trade weights minus benchmark + weights. + :type w_plus_minus_w_bm: cvxpy.Variable + :returns: Cvxpy constraints object. + :rtype: cvxpy.constraints + """ + return self._compile_constr_to_cvxpy(w_plus, z, w_plus_minus_w_bm) <=\ + self._rhs() + + def _compile_constr_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): + """Cvxpy expression of the left-hand side of the constraint.""" + raise NotImplementedError # pragma: no cover + + def _rhs(self): + """Cvxpy expression of the right-hand side of the constraint.""" + raise NotImplementedError # pragma: no cover + + +class CostInequalityConstraint(InequalityConstraint): + """Linear inequality constraint applied to a cost term. + + The user does not interact with this class directly, + it is returned by an expression such as ``cost <= value`` + where ``cost`` is a :class:`Cost` instance and ``value`` + is a scalar. + """ + + def __init__(self, cost, value): + self.cost = cost + self.value = DataEstimator(value, compile_parameter=True) + + def _compile_constr_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): + """Compile constraint to cvxpy.""" + return self.cost.compile_to_cvxpy(w_plus, z, w_plus_minus_w_bm) + + def _rhs(self): + return self.value.parameter + + def __repr__(self): + return self.cost.__repr__() + ' <= ' + self.value.__repr__() diff --git a/cvxportfolio/constraints.py b/cvxportfolio/constraints/constraints.py similarity index 83% rename from cvxportfolio/constraints.py rename to cvxportfolio/constraints/constraints.py index dcc307dfb..37428b968 100644 --- a/cvxportfolio/constraints.py +++ b/cvxportfolio/constraints/constraints.py @@ -11,41 +11,17 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -"""Here we define many realistic constraints that apply to :ref:`portfolio -optimization trading policies `. - -Some of them, like :class:`LongOnly`, are -very simple to use. Some others are more advanced, -for example :class:`FactorNeutral` -takes time-varying factor exposures as parameters. - -For a minimal example we present the classic Markowitz allocation. - -.. code-block:: python - - import cvxportfolio as cvx - - objective = cvx.ReturnsForecast() - gamma_risk * cvx.FullCovariance() - - # the policy takes a list of constraint instances - constraints = [cvx.LongOnly(applies_to_cash=True)] - - policy = cvx.SinglePeriodOptimization(objective, constraints) - print(cvx.MarketSimulator(universe).backtest(policy)) - -With this, we require that the optimal post-trade weights -found by the single-period optimization policy are non-negative. -In our formulation the full portfolio weights vector (which includes -the cash account) sums to one, -see equation :math:`(4.9)` at page 43 of -`the book `_. -""" +"""This module defines user-facing constraints.""" import cvxpy as cp import numpy as np -from .estimator import CvxpyExpressionEstimator, DataEstimator, Estimator -from .forecast import HistoricalFactorizedCovariance +from ..estimator import DataEstimator, Estimator +from ..forecast import (HistoricalFactorizedCovariance, + project_on_psd_cone_and_factorize) +from ..policies import MarketBenchmark +from .base_constraints import (Constraint, EqualityConstraint, + InequalityConstraint) __all__ = [ "LongOnly", @@ -71,124 +47,6 @@ "MinCashBalance" ] - -class Constraint(CvxpyExpressionEstimator): - """Base cvxpy constraint class.""" - - def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): - """Compile constraint to cvxpy. - - :param w_plus: Post-trade weights. - :type w_plus: cvxpy.Variable - :param z: Trade weights. - :type z: cvxpy.Variable - :param w_plus_minus_w_bm: Post-trade weights minus benchmark - weights. - :type w_plus_minus_w_bm: cvxpy.Variable - :returns: some cvxpy.constraints object, or list of those - :rtype: cvxpy.constraints, list - """ - raise NotImplementedError # pragma: no cover - - -class EqualityConstraint(Constraint): - """Base class for equality constraints. - - This class is not exposed to the user, each equality - constraint inherits from this and overrides the - :func:`InequalityConstraint._compile_constr_to_cvxpy` and - :func:`InequalityConstraint._rhs` methods. - - We factor this code in order to streamline the - design of :class:`SoftConstraint` costs. - """ - - def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): - """Compile constraint to cvxpy. - - :param w_plus: Post-trade weights. - :type w_plus: cvxpy.Variable - :param z: Trade weights. - :type z: cvxpy.Variable - :param w_plus_minus_w_bm: Post-trade weights minus benchmark - weights. - :type w_plus_minus_w_bm: cvxpy.Variable - :returns: Cvxpy constraints object. - :rtype: cvxpy.constraints - """ - return self._compile_constr_to_cvxpy(w_plus, z, w_plus_minus_w_bm) ==\ - self._rhs() - - def _compile_constr_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): - """Cvxpy expression of the left-hand side of the constraint.""" - raise NotImplementedError # pragma: no cover - - def _rhs(self): - """Cvxpy expression of the right-hand side of the constraint.""" - raise NotImplementedError # pragma: no cover - - -class InequalityConstraint(Constraint): - """Base class for inequality constraints. - - This class is not exposed to the user, each inequality - constraint inherits from this and overrides the - :func:`InequalityConstraint._compile_constr_to_cvxpy` and - :func:`InequalityConstraint._rhs` methods. - - We factor this code in order to streamline the - design of :class:`SoftConstraint` costs. - """ - - def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): - """Compile constraint to cvxpy. - - :param w_plus: Post-trade weights. - :type w_plus: cvxpy.Variable - :param z: Trade weights. - :type z: cvxpy.Variable - :param w_plus_minus_w_bm: Post-trade weights minus benchmark - weights. - :type w_plus_minus_w_bm: cvxpy.Variable - :returns: Cvxpy constraints object. - :rtype: cvxpy.constraints - """ - return self._compile_constr_to_cvxpy(w_plus, z, w_plus_minus_w_bm) <=\ - self._rhs() - - def _compile_constr_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): - """Cvxpy expression of the left-hand side of the constraint.""" - raise NotImplementedError # pragma: no cover - - def _rhs(self): - """Cvxpy expression of the right-hand side of the constraint.""" - raise NotImplementedError # pragma: no cover - - -class CostInequalityConstraint(InequalityConstraint): - """Linear inequality constraint applied to a cost term. - - The user does not interact with this class directly, - it is returned by an expression such as ``cost <= value`` - where ``cost`` is a :class:`Cost` instance and ``value`` - is a scalar. - """ - - def __init__(self, cost, value): - self.cost = cost - self.value = DataEstimator(value, compile_parameter=True) - - def _compile_constr_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): - """Compile constraint to cvxpy.""" - return self.cost.compile_to_cvxpy(w_plus, z, w_plus_minus_w_bm) - - def _rhs(self): - return self.value.parameter - - def __repr__(self): - return self.cost.__repr__() + ' <= ' + self.value.__repr__() - - class NoCash(EqualityConstraint): """Require that the cash balance is zero at each period.""" @@ -209,29 +67,36 @@ class MarketNeutral(EqualityConstraint): .. math:: {(w_t^\text{b})}^T \Sigma_t (w_t + z_t) = 0 - The benchmark portfolio weights are computed here, and are - proportional to the rolling averages of the market volumes over the - recent past. - - .. note:: + The benchmark portfolio weights are given by a Policy object chosen by + the user. + + .. versionadded:: 1.2.0 + + This constraint's interface has been improved: now you can pass + any policy object as benchmark, and give parameters to the forecaster + of :math:`\Sigma_t`. + + :param benchmark: Policy object whose target weights at each point in time + are the benchmark weights we neutralize against. You can pass a class + or an instance. If you pass a class it is instantiated with default + parameters. Default is :class:`cvxportfolio.MarketBenchmark`, which are + weights proportional to the previous year's total traded volumes. + :type benchmark: cvx.Policy class or instance + :param kwargs: Optional arguments passed to the initializer + of :class:`cvxportfolio.forecast.HistoricalFactorizedCovariance`, + like rolling window or exponential smoothing half life, for the + estimation of the covariance matrices :math:`\Sigma_t`. Default (no + other arguments) is to use its default parameters. + :type kwargs: dict + """ - This constraint's interface will improve; you will be able - to pass any Cvxportfolio policy object as benchmark weights. + def __init__(self, benchmark=MarketBenchmark, **kwargs): - :param window: How many past observations of the volumes are used to - estimate the market benchmark. - :type window: int - """ - # TODO: refactor code to import MarketBenchmark, now it causes circular - # imports - def __init__(self, window=250, #benchmark=MarketBenchmark - ): - self.covarianceforecaster = HistoricalFactorizedCovariance() - self.window = window - # if type(benchmark) is type: - # benchmark = benchmark() - # self.benchmark = benchmark - self.market_vector = None + if isinstance(benchmark, type): + benchmark = benchmark() + self.benchmark = benchmark + self.covariance_forecaster = HistoricalFactorizedCovariance(**kwargs) + self._market_vector = None def initialize_estimator( # pylint: disable=arguments-differ self, universe, **kwargs): @@ -242,7 +107,7 @@ def initialize_estimator( # pylint: disable=arguments-differ :param kwargs: Other unused arguments to :meth:`initialize_estimator`. :type kwargs: dict """ - self.market_vector = cp.Parameter(len(universe)-1) + self._market_vector = cp.Parameter(len(universe)-1) def values_in_time( # pylint: disable=arguments-differ self, past_volumes, **kwargs): @@ -253,18 +118,15 @@ def values_in_time( # pylint: disable=arguments-differ :param kwargs: Unused arguments passed to :meth:`values_in_time`. :type kwargs: dict """ - tmp = past_volumes.iloc[-self.window:].mean() - tmp /= sum(tmp) - # tmp = self.benchmark.current_value.iloc[:-1] - tmp2 = self.covarianceforecaster.current_value @ ( - self.covarianceforecaster.current_value.T @ tmp) - # print(tmp2) - self.market_vector.value = np.array(tmp2) + factorized_covariance = self.covariance_forecaster.current_value + bm = self.benchmark.current_value.iloc[:-1] + self._market_vector.value = np.array( + factorized_covariance @ (factorized_covariance.T @ bm)) def _compile_constr_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): """Compile left hand side of the constraint expression.""" - return w_plus[:-1].T @ self.market_vector + return w_plus[:-1].T @ self._market_vector def _rhs(self): """Compile right hand side of the constraint expression.""" diff --git a/cvxportfolio/costs.py b/cvxportfolio/costs.py index 296e44eb7..0bb393430 100644 --- a/cvxportfolio/costs.py +++ b/cvxportfolio/costs.py @@ -11,29 +11,64 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -"""This module implements cost functions used by optimization-based policies. +"""This module defines cost models for both simulation and optimization. -Currently these are two: :class:`StocksTransactionCost` and -:class:`StocksHoldingCost`. +We implement two types of costs, as discussed in the paper: +:class:`TransactionCost`, defined in :paper:`section 2.3 `, and +:class:`HoldingCost`, defined in :paper:`section 2.4 `. We also +provide versions of each that are specialized to the stock market, +:class:`StocksTransactionCost` and :class:`StocksHoldingCost`: these have a +default values that are typical for liquid stocks in the US. -The default parameters are chosen to approximate real costs for the stock -market as well as possible. +The latter two are included by default in +:class:`cvxportfolio.StockMarketSimulator`, the market simulator specialized to +the stock market, which is used throughout the :doc:`examples `. So, +the example back-tests all include these costs, unless otherwise specified. + +The cost objects have the same user interface when you use them as simulator +costs or optimization costs. For example, to include an annualized +borrowing cost of 10% (as opposed to the default 5%) in a back-test, you do + +.. code-block:: python + + simulator = cvx.MarketSimulator( + universe, costs=[cvx.HoldingCost(short_fees=10)]) + backtest_result_with_borrow_fee = simulator.backtest(policy) + +And to include the same borrow cost penalty in an optimization-based policy, +you do + +.. code-block:: python + + policy = cvx.SinglePeriodOptimization( + objective = cvx.ReturnsForecast() + - 0.5 * cvx.FullCovariance() + - cvx.HoldingCost(short_fees=10), + constraints = [cvx.LeverageLimit(3)]) + +As done throughout the library, you can pass data in the form of a Python +scalar (like 10), a Pandas Series indexed by time, by the assets, or a +Pandas DataFrame indexed by time and with the assets as columns; see the +:ref:`manual page on passing data `. """ import copy +import warnings from numbers import Number import cvxpy as cp import numpy as np import pandas as pd -from .constraints import (CostInequalityConstraint, EqualityConstraint, - InequalityConstraint) +from .constraints.base_constraints import (CostInequalityConstraint, + EqualityConstraint, + InequalityConstraint) from .errors import ConvexityError, ConvexSpecificationError -from .estimator import CvxpyExpressionEstimator, DataEstimator -from .hyperparameters import HyperParameter -from .utils import (average_periods_per_year, - periods_per_year_from_datetime_index, resample_returns) +from .estimator import (CvxpyExpressionEstimator, DataEstimator, Estimator, + SimulatorEstimator) +from .forecast import HistoricalMeanVolume, HistoricalStandardDeviation +from .hyperparameters import HyperParameter, _resolve_hyperpar +from .utils import periods_per_year_from_datetime_index, resample_returns __all__ = ["HoldingCost", "TransactionCost", "SoftConstraint", "StocksTransactionCost", "StocksHoldingCost", "TcostModel", @@ -43,8 +78,12 @@ class Cost(CvxpyExpressionEstimator): # pylint: disable=abstract-method """Base class for cost objects (and also risks). - Here there is some logic used to implement the algebraic operations. - See also :class:`CombinedCost`. + You should derive from this class to define an objective term for + optimization-based policies, like a risk model. + + The base class itself defines logic used for the algebraic operations, and + to take inequalities of a cost object (which results in a Cvxportfolio + :doc:`constraint ` object). """ def __mul__(self, other): @@ -129,6 +168,7 @@ def __init__(self, costs, multipliers): self.costs = costs self.multipliers = multipliers # this is changed by WorstCaseRisk before compiling to Cvxpy + # TODO this can be problematic, also CostConstraint needs to act on it self.do_convexity_check = True def __add__(self, other): @@ -201,8 +241,7 @@ def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): """ expression = 0 for multiplier, cost in zip(self.multipliers, self.costs): - add = (multiplier.current_value - if hasattr(multiplier, 'current_value') else multiplier) *\ + add = _resolve_hyperpar(multiplier) *\ cost.compile_to_cvxpy(w_plus, z, w_plus_minus_w_bm) if not add.is_dcp(): raise ConvexSpecificationError(cost * multiplier) @@ -260,6 +299,10 @@ def _copy_keeping_multipliers(self): class SoftConstraint(Cost): """Soft constraint cost. + This can be applied to most :doc:`constraint objects `, + as discussed in :ref:`its section of the constraints documentation + `. + :param constraint: Cvxportfolio constraint instance whose violation we penalize. :type constraint: cvxportfolio.constraints.EqualityConstraint or @@ -317,67 +360,173 @@ def _annual_percent_to_per_period(value, ppy): return resample_returns(returns=value/100, periods=ppy) -class SimulatorCost: - """Cost class that can be used by a MarketSimulator.""" +class SimulatorCost( # pylint: disable=abstract-method + SimulatorEstimator, Cost): + """Cost class that can be used by :class:`cvxportfolio.MarketSimulator`. + + This is the base class of both :class:`HoldingCost` and + :class:`TransactionCost`. + + This class derives from :class:`Cost` and + :class:`cvxportfolio.estimator.SimulatorEstimator`. + It implements the :meth:`simulate` method (which is abstract in + :class:`cvxportfolio.estimator.SimulatorEstimator`). The implementation + uses the CVXPY compiled expression of the (optimization) cost to evaluate + the cost in simulation, so we're sure the algebra is (exactly) the same. + Of course the CVXPY expression operates on weights, so holdings and trades + are divided by the portfolio value, and the result is multiplied by the + portfolio value. If you implement a custom simulator's cost and prefer + to implement the cost expression directly you can derive straight from + :class:`cvxportfolio.estimator.SimulatorEstimator`. Look at the + code of the cost classes we implement if in doubt. Every operation that + is different in simulation and in optimization is wrapped in a + :class:`cvxportfolio.estimator.SimulatorEstimator` which implements + different :meth:`values_in_time` and :meth:`simulate` respectively. + """ - def simulate(self, *args, **kwargs): - """Simulate cost, used by market simulator. + def initialize_estimator( # pylint: disable=arguments-differ + self, universe, **kwargs): + """Initialize cost by compiling its CVXPY expression (if applies). - Look at its invocation in ``MarketSimulator`` for its list of - arguments. + This must be called by derived classes if you want to use + an internal CVXPY expression to evaluate the simulator cost. - Cost classes that are meant to be used in the simulator - should implement this. + :param universe: Current trading universe. + :type universe: pd.Index + :param kwargs: Other unused arguments to :meth:`initialize_estimator`. + :type kwargs: dict + """ + # pylint: disable=attribute-defined-outside-init + if hasattr(self, 'compile_to_cvxpy'): + self._w_plus = cp.Variable(len(universe)) + self._z = cp.Variable(len(universe)) + self._w_plus_minus_w_bm = cp.Variable(len(universe)) + self._cvxpy_expression = self.compile_to_cvxpy( + w_plus = self._w_plus, z = self._z, + w_plus_minus_w_bm = self._w_plus_minus_w_bm) + + def simulate( # pylint: disable=arguments-differ,too-many-arguments + self, t, u, h_plus, past_volumes, + past_returns, current_prices, + current_weights, current_portfolio_value, **kwargs): + """Simulate the cost in the market simulator (not optimization). + + Cost classes that are meant to be used in the simulator can + implement this. The arguments to this are the same as for + :meth:`cvxportfolio.estimator.Estimator.values_in_time` plus the + trades vector and post-trade allocations. - :param args: Positional arguments. - :type args: tuple - :param kwargs: Keyword arguments. + :param t: Current timestamp. + :type t: pandas.Timestamp + :param u: Trade vector in cash units requested by the policy. + If the market simulator implements rounding by number of shares + and/or canceling trades on assets whose volume for the period + is zero, this is after those transformations. + :type u: pandas.Series + :param h_plus: Post-trade holdings vector. + :type h_plus: pandas.Series + :param past_returns: Past market returns (including cash). + :type past_returns: pandas.DataFrame + :param past_volumes: Past market volumes, or None if not available. + :type past_volumes: pandas.DataFrame or None + :param current_prices: Current (open) prices, or None if not available. + :type current_prices: pandas.Series or None + :param current_weights: Current allocation weights (before trading). + :type current_weights: pandas.Series + :param current_portfolio_value: Current total value of the portfolio + in cash units, before costs. + :type current_portfolio_value: float + :param kwargs: Other unused arguments to + :meth:`cvxportfolio.estimator.SimulatorEstimator.simulate`. :type kwargs: dict + + :returns: Simulated realized value of the cost, in cash accounting + units (e.g., US dollars). Typically a positive number: it is + subtracted from the cash account. + :rtype: float """ - raise NotImplementedError # pragma: no cover + self.values_in_time( + t=t, past_volumes=past_volumes, past_returns=past_returns, + current_prices=current_prices, current_weights=current_weights, + current_portfolio_value=current_portfolio_value) -class HoldingCost(Cost, SimulatorCost): - r"""Generic holding cost model, as described in page 11 of the book. + self._w_plus.value = h_plus.values / current_portfolio_value + self._z.value = u.values / current_portfolio_value + return self._cvxpy_expression.value * current_portfolio_value - There are two ways to use this class. Either in the costs attribute - of a :class:`MarketSimulator`, in which case the costs are evaluated - on the post-trade dollar positions :math:`h^+_t`. Or, - as part of the objective function (or as a constraint!) - of a :class:`SinglePeriodOptimization` - or :class:`MultiPeriodOptimization` trading policy, in which case they - are evaluated on the post-trade weights :math:`w_t + z_t`. The mathematical - form is the same (see the discussion at pages 11-12 of the book). - This particular implementation represents the following objective terms - (expressed here in terms of the post-trade dollar positions): +class YearDividedByTradingPeriod(SimulatorEstimator): + """Length of a year divided by this trading period's. - .. math:: + This is used by :class:`HoldingCost` to model its separate behaviors in + optimization and simulation. - s^T_t {(h^+_t)}_- + l^T_t {(h^+_t)}_+ - d^T_t h^+_t + :param periods_per_year: If provided, overrides internal estimation of + number of periods per year in optimization. Has no effect in + simulation. Default None. + :type periods_per_year: int or None + """ - where :math:`s_t` are the (short) borrowing fees, - :math:`l_t` are the fees on long positions, - and :math:`d_t` are dividend rates (their sign is flipped because - the costs are deducted from the cash account at each period). See - below for their precise definition. + def __init__(self, periods_per_year=None): + self.periods_per_year = periods_per_year - Example usage as simulator cost: + def values_in_time( # pylint: disable=arguments-differ + self, past_returns, **kwargs): + """Evaluate in optimization. - .. code-block:: python + :param past_returns: Past market returns, we use its index to estimate + the typical length of a trading period. + :type past_returns: pd.DataFrame + :param kwargs: Other unused arguments to + :meth:`Estimator.values_in_time`. + :type kwargs: dict - borrow_fees = pd.Series([5, 10], index=['AAPL', 'ZM']) - simulator = cvx.MarketSimulator(['AAPL', 'ZM'], - costs=cvx.HoldingCost(short_fees=borrow_fees)) + :returns: Trading periods per year. + :rtype: float + """ + if self.periods_per_year is None: + return periods_per_year_from_datetime_index(past_returns.index) + return self.periods_per_year - Example usage as trading policy cost: + def simulate( # pylint: disable=arguments-differ + self, t, t_next, **kwargs): + """Evaluate in simulation. - .. code-block:: python + We use the real time length of the trading period, which we know. - objective = cvx.ReturnsForecast() - 5 * cvx.FullCovariance() \ - - cvx.HoldingCost(short_fees=10) - constraints = [cvx.LeverageLimit(3)] - policy = cvx.SinglePeriodOptimization(objective, constraints) + :param t: Current timestamp. + :type t: pd.Timestamp + :param t_next: Timestamp of next trading period. + :type t_next: pd.Timestamp + :param kwargs: Other unused arguments to + :meth:`SimulatorEstimator.simulate`. + :type kwargs: dict + + :returns: Trading periods per year. + :rtype: float + """ + return pd.Timedelta('365.24d') / (t_next - t) + + +class HoldingCost(SimulatorCost): + r"""Generic holding cost model. + + This is a generalization of the model described in :paper:`section 2.4 + ` of the paper (which instead corresponds to + :class:`StocksHoldingCost`). + + This represents the following objective term, + expressed in terms of the post-trade dollar positions: + + .. math:: + + s^T_t {(h^+_t)}_- + l^T_t {(h^+_t)}_+ - d^T_t h^+_t + + where :math:`s_t` are the (short) borrowing fees, + :math:`l_t` are the fees on long positions, + and :math:`d_t` are dividend rates (their sign is flipped because + the costs are deducted from the cash account at each period). :param short_fees: Short borrowing fees expressed as annual percentage; you can provide them as a float (constant for all times and all @@ -385,34 +534,28 @@ class HoldingCost(Cost, SimulatorCost): assets but varying in time) or by assets' names (constant in time but varying across assets), or a :class:`pd.DataFrame` indexed by time and whose columns are the assets' names, if varying both - in time and across assets. If you use a time-indexed pandas object - be careful to include all times at which a backtest is evaluated - (otherwise you'll get a :class:`MissingValueError` exception). If - `None`, the term is ignored. + in time and across assets. See :ref:`the manual page on passing data + `. If `None` (the default) the term is + ignored. :type short_fees: float, pd.Series, pd.DataFrame or None :param long_fees: Fees on long positions expressed as annual percentage; same convention as above applies. :type long_fees: float, pd.Series, pd.DataFrame or None - :param dividends: Dividend rates per period. Dividends are already - included in the market returns by the default data interface - (based on Yahoo Finance "adjusted prices") and thus this parameter - should not be used in normal circumstances. + :param dividends: Dividend rates per period. Same conventions as above. + Our default data interface already includes the dividend rates in the + market returns (*i.e.*, uses total returns, from the adjusted + prices). If, however, you provide your own market returns that do not + include dividends, you may use this. Default None, which disables the + term. :type dividends: float, pd.Series, pd.DataFrame or None - :param periods_per_year: How many trading periods are there in a year, for - example 252 (for trading days in the US). This is - only relevant when using this class as part of a trading policy. If - you leave this to `None` the following happens. - The value of periods per year are estimated at each period by looking - at the past market returns at that point in time: the number of past - periods is divided by the timestamp of the most recent period minus - the timestamp of the first period (in years). That works well in most - cases where there is enough history (say, a few years) and saves the - user from having to manually enter this. - If instead you use this object - as a cost in a market simulator the parameter has no effect. (The - length of each trading period in the simulation is known and so the - per-period rates are evaluated exactly. For example, the rate over a - weekend will be higher than overnight.) + :param periods_per_year: Number of trading period per year, used to obtain + the holding cost per-period from the annualized percentages. Only + relevant in optimization, since in simulation we know the exact + duration of each period. If left to the default, None, uses the + estimated average length of each period from the historical data. + Note that, in simulation, the holding cost is applied to the actual + length of the period between trading times, so for example it will + be higher over a weekend than between weekdays. :type periods_per_year: float or None """ @@ -425,7 +568,10 @@ def __init__(self, short_fees=None, long_fees=None, dividends=None, long_fees) self.dividends = None if dividends is None else DataEstimator( dividends) - self.periods_per_year = periods_per_year + self.periods_per_year = YearDividedByTradingPeriod(periods_per_year) + self._short_fees_parameter = None + self._long_fees_parameter = None + self._dividends_parameter = None def initialize_estimator( # pylint: disable=arguments-differ self, universe, **kwargs): @@ -452,8 +598,11 @@ def initialize_estimator( # pylint: disable=arguments-differ if self.dividends is not None: self._dividends_parameter = cp.Parameter(len(universe) - 1) + # Used by SimulatorCost + super().initialize_estimator(universe=universe, **kwargs) + def values_in_time( # pylint: disable=arguments-differ - self, past_returns, **kwargs): + self, **kwargs): """Update cvxpy parameters. We compute the estimate of periods per year from past returns @@ -461,32 +610,28 @@ def values_in_time( # pylint: disable=arguments-differ with the current values of the user-provided data, transformed to per-period. - :param past_returns: Past market returns (includes cash). - :type past_returns: pandas.DataFrame :param kwargs: Other unused arguments to :meth:`values_in_time`. :type kwargs: dict """ - if not ((self.short_fees is None) - and (self.long_fees is None) - and (self.dividends is None)): - ppy = periods_per_year_from_datetime_index( - past_returns.index) if self.periods_per_year is None else \ - self.periods_per_year - if self.short_fees is not None: self._short_fees_parameter.value = np.ones( - past_returns.shape[1]-1) * _annual_percent_to_per_period( - self.short_fees.current_value, ppy) + self._short_fees_parameter.size + ) * _annual_percent_to_per_period( + self.short_fees.current_value, + self.periods_per_year.current_value) if self.long_fees is not None: self._long_fees_parameter.value = np.ones( - past_returns.shape[1]-1) * _annual_percent_to_per_period( - self.long_fees.current_value, ppy) + self._long_fees_parameter.size + ) * _annual_percent_to_per_period( + self.long_fees.current_value, + self.periods_per_year.current_value) if self.dividends is not None: self._dividends_parameter.value =\ - np.ones(past_returns.shape[1]-1) * self.dividends.current_value + np.ones(self._dividends_parameter.size + ) * self.dividends.current_value def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): """Compile cost to cvxpy expression. @@ -516,112 +661,188 @@ def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): expression -= self._dividends_parameter.T @ w_plus[:-1] assert expression.is_convex() - return expression - def simulate(self, t, h_plus, t_next, **kwargs): - """Simulate cost in a MarketSimulator. - TODO: make sure simulator cost sign convention is - the same as optimization cost! OK. - TODO: make sure DataEstimator returns np.array of correct size! ~OK - TODO: make sure simulator cost estimators are recursively evaluated! - ~OK +class StocksHoldingCost(HoldingCost): + r"""Holding cost specialized to stocks (only borrow fee). - :param t: Current time. - :type t: pandas.Timestamp - :param h_plus: Post-trade holdings. - :type h_plus: numpy.array or pandas.Series - :param t_next: Next period's time. - :type t_next: pandas.Timestamp - :param kwargs: Unused arguments to - :meth:`cvxportfolio.costs.SimulatorCost.simulate`. - :type kwargs: dict - - :returns: Cost in units of value, positive (is subtracted from cash - account). - :rtype: float - """ - - year_divided_by_period = pd.Timedelta('365.24d') / (t_next - t) - - cost = 0. + This implements the simple model described in :paper:`section 2.4 + ` of the paper, refer to that for more details. The cost is, + in terms of the post-trade dollar positions: - # TODO this is a temporary fix, - # we should plug this into a recursive tree - for est in [self.short_fees, self.long_fees, self.dividends]: - if est is not None: - est.initialize_estimator_recursive(universe=h_plus.index, - trading_calendar=[t]) - est.values_in_time_recursive(t=t) + .. math:: - if self.short_fees is not None: - cost += np.sum(_annual_percent_to_per_period( - self.short_fees.current_value, year_divided_by_period) * ( - -np.minimum(h_plus[:-1], 0.))) + s^T_t {(h^+_t)}_-, - if self.long_fees is not None: - cost += np.sum(_annual_percent_to_per_period( - self.long_fees.current_value, - year_divided_by_period) * np.maximum(h_plus[:-1], 0.)) + *i.e.*, a simple borrow fee applies to the short positions. This class is a + simplified version of :class:`HoldingCost`: it drops the ``long_fees`` and + ``dividends`` parameters and keeps only ``short_fees`` with a default + value of 5, *i.e.*, :math:`5\%` annualized borrowing fee. That is a rough + (optimistic) approximation of the cost of shorting liquid US stocks. This + cost is included **by default** in :class:`StockMarketSimulator`, the + market simulator specialized to US (liquid) stocks. - if self.dividends is not None: - # we have a minus sign because costs are deducted from PnL - cost -= np.sum(self.dividends.current_value * h_plus[:-1]) + :param short_fees: Same as in :class:`HoldingCost`: annualized borrow fee + in percent, can be asset- and/or period-specific. Default 5. + :type short_fees: float, pd.Series, pd.DataFrame or None + """ - return cost + def __init__(self, short_fees=5): + super().__init__(short_fees=short_fees) -class StocksHoldingCost(HoldingCost, SimulatorCost): - r"""Holding cost specialized to stocks. +class VolumeHatOrRealized(SimulatorEstimator): + r"""Predictor of market volumes used by :class:`TransactionCost`. - This implements the simple model describe at page 11 of the book, *i.e.* - the cost (in terms of the post-trade dollar positions): + This is used to model the different behaviors in optimization and + simulation. (In the latter case we use the realized volumes.) - .. math:: + :param volume_hat: Estimator of the :math:`\hat V` in optimization. Unused + in simulation. + :type volume_hat: cvx.estimator.Estimator + """ - s^T_t {(h^+_t)}_- + def __init__(self, volume_hat): + self.volume_hat = volume_hat - This class is a specialized version of :class:`HoldingCost`, and you should - read its documentation for all details. Here we - drop most of the parameters and use the default values explained above. - We use a default value of :math:`5\%` annualized borrowing - fee which is a rough (optimistic) approximation of the cost - of shorting liquid US stocks. This cost is included **by default** - in :class:`StockMarketSimulator`, the market simulator specialized - to US (liquid) stocks. + def values_in_time( # pylint: disable=arguments-differ + self, **kwargs): + """Evaluate in optimization. - :param short_fees: Same as in :class:`HoldingCost`. - :type short_fees: float, pd.Series, pd.DataFrame or None - """ + :param kwargs: All arguments to + :meth:`estimator.Estimator.values_in_time`. + :type kwargs: dict - def __init__(self, short_fees=5): + :returns: Current estimate of market volumes. + :rtype: np.ndarray + """ + return self.volume_hat.current_value - super().__init__(short_fees=short_fees) + def simulate( # pylint: disable=arguments-differ + self, current_volumes, **kwargs): + """Evaluate in simulation. + :param current_volumes: Current market volumes. + :type current_volumes: pd.Series + :param kwargs: All other arguments to + :meth:`estimator.SimulatorEstimator.simulate`. + :type kwargs: dict -class TransactionCost(Cost): - """This is a generic model for transaction cost of financial assets. + :raises SyntaxError: If the market volumes are not present in the + market data. - Currently it is not meant to be used directly. Look at - :class:`StocksTransactionCost` for its version specialized - to the stock market. + :returns: Current market volumes. + :rtype: np.ndarray + """ + assert self.volume_hat.current_value is None + if current_volumes is None: + raise SyntaxError( + "If you don't provide volumes you should set b to None" + " in the market simulator's TransactionCost object.") + return current_volumes.values + +class TransactionCost(SimulatorCost): + r"""This is a generic model for transaction cost of financial assets. + + .. versionadded:: 1.2.0 + + This was significantly improved with new options; it was undocumented + before. + + It is described in :paper:`section 2.3 ` of the paper. + + The model is, separated on a single asset (equation 2.2 in the paper) + + .. math :: + + a | x | + b \sigma \frac{{ | x |}^{3/2}}{V^{1/2}} + c x + + where :math:`x` is the dollar traded quantity, + :math:`a` is a coefficient representing fees proportional to the absolute + value traded, like half the bid-ask spread, + :math:`b` is a coefficient that multiplies the market impact term, + typically of the order of 1, + :math:`\sigma` is an estimate of the volatility of the asset returns over + recent periods, + :math:`V` is the market volume traded over the period for the asset + and :math:`c` is a coefficient used to introduce bias in the model, + for example the negative of open-to-close return (if transactions are + executed at close), or the negative of the open-to-VWAP return (if + transactions are executed at the volume-weighted average price). + + In optimization the realized market volume :math:`V` is not known and + we use its forecast :math:`\hat V` instead. + + As done throughout the library, this implementation accepts either + :ref:`user-provided data ` for the various parts of the + model, or uses built-in :doc:`forecaster classes ` to do the + heavy-lifting. + + :param a: Coefficients of the first term of the transaction cost model, for + example half the bid-ask spread, brokerage fees proportional to the + size of each trade, .... :ref:`Usual conventions on passing data + ` apply. Default None, which disables the term. + :type a: float, pd.Series, pd.DataFrame, or None + :param b: Coefficients of the second term of the transaction cost model, + typically of the order of 1. Same conventions. Default None, which + disables the term and invalidates the following three parameters. + :type b: float, pd.Series, pd.DataFrame, or None + :param volume_hat: Forecast of the market volumes, has only effect in + optimization (in simulation the actual volumes are used). You can + pass a DataFrame of externally computed forecasts, or use the default + :class:`cvxportfolio.forecast.HistoricalMeanVolume` (or another + forecaster) to compute the forecasts at each point in time. If you pass + a class, like the default, it is instantiated with parameter + ``rolling`` equal to 1 year, if you prefer a different estimation + lenght you can instantiate the forecaster and pass the instance. + :type volume_hat: float, pd.Series, pd.DataFrame, cvx.forecast.BaseForecast + class or instance + :param sigma: Externally computed forecasts, or forecaster, of the market + returns' volatilities :math:`\sigma`. The default is + :class:`cvxportfolio.forecast.HistoricalStandardDeviation`. If you + pass a class, like the default, it is instantiated with parameter + ``rolling`` equal to 1 year, if you prefer a different estimation + lenght you can instantiate the forecaster and pass the instance. + :type sigma: float, pd.Series, pd.DataFrame, cvx.forecast.BaseForecast + class or instance + :param exponent: Exponent of the second term of the model, default + :math:`3/2`. (In the model above, this exponent applies to + :math:`|z|`, and this exponent minus 1 applies to the denominator + term :math:`V`). You can use any float larger than 1. + :type exponent: float + :param c: Coefficients of the third term of the transaction cost model. + If None, the default, the term is ignored. + :type c: float, pd.Series, pd.DataFrame, or None """ - def __init__(self, a=None, pershare_cost=None, b=0., window_sigma_est=None, - window_volume_est=None, exponent=None): + def __init__( # pylint: disable=too-many-arguments + self, a=0., b=None, volume_hat=HistoricalMeanVolume, + sigma=HistoricalStandardDeviation, exponent=1.5, c=None): self.a = None if a is None else DataEstimator(a) - self.pershare_cost = None if pershare_cost is None else DataEstimator( - pershare_cost) self.b = None if b is None else DataEstimator(b) - self.window_sigma_est = window_sigma_est - self.window_volume_est = window_volume_est - self.exponent = exponent + self.c = None if c is None else DataEstimator( + c, compile_parameter=True) - # these are overwritten by parameters defined below - self.first_term_multiplier = None - self.second_term_multiplier = None + if self.b is not None: + if isinstance(volume_hat, type): + volume_hat = volume_hat( + rolling=pd.Timedelta('365.24d')) + self.market_volumes = VolumeHatOrRealized(DataEstimator(volume_hat)) + + if isinstance(sigma, type): + sigma = sigma( + rolling=pd.Timedelta('365.24d')) + self.sigma = DataEstimator(sigma) + + if exponent < 1.: + raise SyntaxError( + 'Exponent should be >=1, otherwise the' + ' transaction cost model is not convex.') + self.exponent = exponent + self._first_term_multiplier = None + self._second_term_multiplier = None def initialize_estimator( # pylint: disable=arguments-differ self, universe, **kwargs): @@ -632,210 +853,245 @@ def initialize_estimator( # pylint: disable=arguments-differ :param kwargs: Other unused arguments to :meth:`initialize_estimator`. :type kwargs: dict """ - if self.a is not None or self.pershare_cost is not None: - self.first_term_multiplier = cp.Parameter( + if self.a is not None: + self._first_term_multiplier = cp.Parameter( len(universe)-1, nonneg=True) if self.b is not None: - self.second_term_multiplier = cp.Parameter( + self._second_term_multiplier = cp.Parameter( len(universe)-1, nonneg=True) + # SimulatorCost + super().initialize_estimator(universe=universe, **kwargs) + def values_in_time( # pylint: disable=arguments-differ - self, current_portfolio_value, past_returns, past_volumes, - current_prices, **kwargs): + self, current_portfolio_value, **kwargs): """Update cvxpy parameters. - :raises SyntaxError: If the prices are missing from the market data. - :param current_portfolio_value: Current total value of the portfolio. :type current_portfolio_value: float - :param past_returns: Past market returns (includes cash). - :type past_returns: pandas.DataFrame - :param past_volumes: Past market volumes. - :type past_volumes: pandas.DataFrame - :param current_prices: Current open prices. - :type current_prices: pandas.Series :param kwargs: Other unused arguments to :meth:`values_in_time`. :type kwargs: dict """ + if self.a is not None: + self._first_term_multiplier.value = np.ones( + self._first_term_multiplier.size) * self.a.current_value - tmp = 0. + if self.b is not None: - if self.a is not None: - _ = self.a.current_value - tmp += _ *\ - np.ones(past_returns.shape[1]-1) if np.isscalar(_) else _ - if self.pershare_cost is not None: - if current_prices is None: - raise SyntaxError("If you don't provide prices you", - " should set pershare_cost to None") - assert not np.any(current_prices.isnull()) - # assert not np.any(current_prices == 0.) - tmp += self.pershare_cost.current_value / current_prices.values + self._second_term_multiplier.value =\ + (self.b.current_value * self.sigma.current_value + ) / ((self.market_volumes.current_value + 1E-8) + / current_portfolio_value) ** (self.exponent - 1) - if self.a is not None or self.pershare_cost is not None: - self.first_term_multiplier.value = tmp + def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): + """Compile cost to cvxpy expression. + :param w_plus: Post-trade weights. + :type w_plus: cvxpy.Variable + :param z: Trade weights. + :type z: cvxpy.Variable + :param w_plus_minus_w_bm: Post-trade weights minus benchmark + weights. + :type w_plus_minus_w_bm: cvxpy.Variable + + :returns: Cvxpy expression. + :rtype: cvxpy.expression + """ + expression = 0 + if self.a is not None: + expression += cp.abs(z[:-1]) @ self._first_term_multiplier + assert expression.is_convex() if self.b is not None: + expression += ( + cp.abs(z[:-1]) ** (self.exponent) + ) @ self._second_term_multiplier + assert expression.is_convex() + if self.c is not None: + expression += cp.sum(z[:-1] * self.c.parameter) + return expression - if (self.window_sigma_est is None) or\ - (self.window_volume_est is None): - ppy = periods_per_year_from_datetime_index(past_returns.index) - windowsigma = ppy if ( - self.window_sigma_est is None) else self.window_sigma_est - windowvolume = ppy if ( - self.window_volume_est is None) else self.window_volume_est +# Backward compatibility before 1.2.0 - # TODO refactor this with forecast.py logic - sigma_est = np.sqrt( - (past_returns.iloc[-windowsigma:, :-1]**2).mean()).values - volume_est = past_volumes.iloc[-windowvolume:].mean().values + 1E-8 +class SimpleSigmaEst(SimulatorEstimator): + """Simple estimator of sigma for backward compatibility.""" - self.second_term_multiplier.value =\ - self.b.current_value * sigma_est * (current_portfolio_value / - volume_est) ** ( - (2 if self.exponent is None else self.exponent) - 1) + def __init__(self, window_sigma_est): + warnings.warn( + "Passing a number to window_sigma_est is deprecated, " + "You should use a forecaster like the default " + "HistoricalStandardDeviation, instantiating with" + " its 'rolling' argument to choose the length of estimation.", + DeprecationWarning) + self.window_sigma_est = window_sigma_est - def simulate( - self, t, u, past_returns, current_returns, current_volumes, - current_prices, **kwargs): - """Simulate transaction cost in cash units. + def values_in_time( + # pylint: disable=arguments-differ + self, past_returns, **kwargs): + """Compute historical sigma. - :raises SyntaxError: If the market returns are not available in the - market data. + :param past_returns: Past market returns. + :type past_returns: pd.DataFrame + :param kwargs: Other unused arguments. + :type kwargs: dict - :param t: Current timestamp. - :type t: pandas.Timestamp - :param u: Trades vector. - :type u: pandas.Series - :param past_returns: Dataframe of past market returns. - :type past_returns: pandas.DataFrame - :param current_returns: Current period's market returns. - :type current_returns: pandas.Series - :param current_volumes: Current market volumes. - :type current_volumes: pandas.Series or None - :param current_prices: Current market prices. - :type current_prices: pandas.Series or None - :param kwargs: Unused arguments passed by :class:`MarketSimulator`. + :returns: Estimated sigma + :rtype: np.array + """ + + return np.sqrt( + (past_returns.iloc[-self.window_sigma_est:, :-1]**2).mean()).values + + def simulate( + # pylint: disable=arguments-differ + self, current_returns, past_returns, **kwargs): + """Compute historical sigma. + + :param current_returns: Current market returns. + :type current_returns: pd.Series + :param past_returns: Past market returns. + :type past_returns: pd.DataFrame + :param kwargs: Other unused arguments. :type kwargs: dict - :returns: Transaction cost for this period in cash units. - :rtype: float + :returns: Estimated sigma + :rtype: np.array """ - result = 0. - if self.pershare_cost is not None: - if current_prices is None: - raise SyntaxError( - "If you don't provide prices you should" - " set pershare_cost to None") - result += self.pershare_cost.values_in_time_recursive(t=t) * int( - sum(np.abs(u.iloc[:-1] + 1E-6) / current_prices.values)) + return np.std(pd.concat( + [past_returns.iloc[-self.window_sigma_est + 1:, :-1], + pd.DataFrame(current_returns.iloc[:-1]).T], axis=0), axis=0).values - if self.a is not None: - result += sum(self.a.values_in_time_recursive(t=t) - * np.abs(u.iloc[:-1])) + # sum = (past_returns.iloc[-self.window_sigma_est-1:, :-1]**2).sum() + # count = past_returns.iloc[-self.window_sigma_est-1:, :-1].count() + # sum += current_returns**2 + # count += ~current_returns.isnull() + # return np.sqrt(sum / count).values - if self.b is not None: +# Backward compatibility before 1.2.0 - if self.window_sigma_est is None: - windowsigma = average_periods_per_year( - num_periods=len(past_returns)+1, - first_time=past_returns.index[0], last_time=t) - else: - windowsigma = self.window_sigma_est +class SimpleVolumeEst(Estimator): + """Simple estimator of volume for backward compatibility.""" - exponent = (1.5 if self.exponent is None else self.exponent) + def __init__(self, window_volume_est): + warnings.warn( + "Passing a number to window_volume_est is deprecated, " + "You should use a forecaster like the default " + "HistoricalMeanVolume, instantiating with" + " its 'rolling' argument to choose the length of estimation.", + DeprecationWarning) - sigma = np.std(pd.concat( - [past_returns.iloc[-windowsigma + 1:, :-1], - pd.DataFrame(current_returns.iloc[:-1]).T], axis=0), axis=0) - if current_volumes is None: - raise SyntaxError( - "If you don't provide volumes you should set b to None" - f" in the {self.__class__.__name__} simulator cost") - # we add 1E-8 to the volumes to prevent 0 volumes error - # (trades are cancelled on 0 volumes) - result += (np.abs(u.iloc[:-1])**exponent) @ ( - self.b.values_in_time_recursive(t=t) * - sigma / ((current_volumes + 1E-8) ** ( - exponent - 1))) - - assert not np.isnan(result) - assert not np.isinf(result) + self.window_volume_est = window_volume_est - return result + def values_in_time( + # pylint: disable=arguments-differ + self, past_volumes, **kwargs): + """Compute historical mean of volumes. - def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): - """Compile cost to cvxpy expression. + :param past_volumes: Past market volumes + :type past_volumes: pd.DataFrame + :param kwargs: Other unused arguments. + :type kwargs: dict - :param w_plus: Post-trade weights. - :type w_plus: cvxpy.Variable - :param z: Trade weights. - :type z: cvxpy.Variable - :param w_plus_minus_w_bm: Post-trade weights minus benchmark - weights. - :type w_plus_minus_w_bm: cvxpy.Variable + :raises SyntaxError: If the market data does not contain volumes. - :returns: Cvxpy expression. - :rtype: cvxpy.expression + :returns: Estimated volume + :rtype: np.array """ - expression = 0 - if self.a is not None or self.pershare_cost is not None: - expression += cp.abs(z[:-1]).T @ self.first_term_multiplier - assert expression.is_convex() - if self.b is not None: - expression += (cp.abs(z[:-1]) ** ( - 2 if self.exponent is None else self.exponent) - ).T @ self.second_term_multiplier - assert expression.is_convex() - return expression + if past_volumes is None: + raise SyntaxError( + "If you don't provide market volumes you can not use the" + " market impact term of the transaction cost model.") + + return past_volumes.iloc[-self.window_volume_est:].mean().values class StocksTransactionCost(TransactionCost): - """A model for transaction costs of stocks. + """Simplified version of :class:`TransactionCost` for stocks. + + .. versionadded:: 1.2.0 + + We added the ``sigma`` and ``volume_hat`` parameters to support + user-provided values as well as forecasters with various parameters. + We deprecated the old way (``window_sigma_est`` and + ``window_volume_est``) in which that was done before. - See pages 10-11 in - `the book `_. - We don't include the short-term alpha term `c` here because it - can be expressed with a separate `ReturnsForecast` object. + This is included as a simulator cost by default (with default arguments) in + :class:`cvxportfolio.StockMarketSimulator`. - :param a: linear cost, which multiplies the absolute value of each trade. - This can model (half) the bid-ask spread, or any fee linear in the size - of a trade. + :param a: Same as in :class:`TransactionCost`, default 0. :type a: float or pd.Series or pd.DataFrame - :param pershare_cost: per-share trade cost, amount of dollars paid for - each share traded. + :param pershare_cost: Per-share cost: cash paid for each share traded. + Requires to know the prices of the stocks (they are present in the + default market data server + :class:`cvxportfolio.data.DownloadedMarketData`). Default 0.005. :type pershare_cost: float or pd.Series or pd.DataFrame - :param b: coefficient of the non-linear term of the transaction cost model, - which multiplies the estimated volatility for each stock. + :param b: Same as in :class:`TransactionCost`, default 1. :type b: float or pd.Series or pd.DataFrame - :param window_sigma_est: we use an historical rolling standard deviation to - estimate the average size of the return on a stock on each day, and - this multiplies the second term of the transaction cost model. See the - paper for an explanation of the model. Here you specify the length of - the rolling window to use. If None (the default) it uses a length of 1 - year (approximated with the data provided). + :param sigma: Same as in :class:`TransactionCost`. + :type sigma: float, pd.Series, pd.DataFrame, cvx.forecast.BaseForecast + class or instance + :param window_sigma_est: Deprecated, size of rolling window to + estimate historical standard deviations. If left to None (the default) + has no effect. Use instead the ``sigma`` parameter. :type window_sigma_est: int or None - :param window_volume_est: length of the window for the mean of past volumes - used as estimate of each period's volume. Has no effect on the - simulator version of this which uses the actual volume. If None (the - default) it uses a length of 1 year (approximated with the data - provided). - :type window_volume_est: int - :param exponent: exponent of the non-linear term, defaults (if set to - ``None``) to 1.5 for the simulator version, and 2 for the optimization - version (because it is more efficient numerically and the difference is - small, you can change it if you want). - :type exponent: float or None + :param volume_hat: Same as in :class:`TransactionCost`. + :type volume_hat: float, pd.Series, pd.DataFrame, cvx.forecast.BaseForecast + class or instance + :param window_volume_est: Deprecated, size of rolling window to estimate + the mean of past volumes. If left to None (the default) + has no effect. Use instead the ``volume_hat`` parameter. + :type window_volume_est: int or None + :param exponent: Same as in :class:`TransactionCost`. + :type exponent: float + :param c: Same as in :class:`TransactionCost`. + :type c: float or pd.Series or pd.DataFrame or None """ - def __init__(self, a=0., pershare_cost=0.005, b=1.0, window_sigma_est=None, - window_volume_est=None, exponent=1.5): + def __init__( # pylint: disable=too-many-arguments + self, a=0., pershare_cost=0.005, b=1.0, + volume_hat=HistoricalMeanVolume, sigma=HistoricalStandardDeviation, + exponent=1.5, c=None, window_sigma_est=None, window_volume_est=None): + + if window_sigma_est is not None: + sigma = SimpleSigmaEst(window_sigma_est) + + if window_volume_est is not None: + volume_hat = SimpleVolumeEst(window_volume_est) - super().__init__(a=a, pershare_cost=pershare_cost, b=b, - window_sigma_est=window_sigma_est, - window_volume_est=window_volume_est, - exponent=exponent) + super().__init__(# because we update it with pershare_cost + a= 0. if a is None else a, + b=b, c=c, exponent=exponent, + volume_hat=volume_hat, + sigma=sigma, + ) + + self.pershare_cost = DataEstimator(pershare_cost)\ + if pershare_cost is not None else None + + def values_in_time( # pylint: disable=arguments-renamed + self, current_prices, **kwargs): + """Update linear cost with per-share cost. + + :param current_prices: Current (open) market prices. + :type current_prices: pd.Series + :param kwargs: Other unused arguments to + :meth:`cvxportfolio.estimator.Estimator.values_in_time`. + :type kwargs: dict + + :raises SyntaxError: If market prices are not available and + pershare_cost is not None. + """ + + super().values_in_time(current_prices=current_prices, **kwargs) + + if self.pershare_cost is not None: + if current_prices is None: + raise SyntaxError("If the market data doesn't contain prices" + " you should set pershare_cost to None") + assert not np.any(current_prices.isnull()) + assert not np.any(current_prices == 0.) + self._first_term_multiplier.value += \ + self.pershare_cost.current_value / current_prices.values # Aliases diff --git a/cvxportfolio/estimator.py b/cvxportfolio/estimator.py index 71fe45cb5..73db12900 100644 --- a/cvxportfolio/estimator.py +++ b/cvxportfolio/estimator.py @@ -225,6 +225,92 @@ def __repr__(self): return lhs + core + rhs +class SimulatorEstimator(Estimator): + """Base class for estimators that are used by the market simulator. + + .. versionadded:: 1.2.0 + + This is currently used as the base class for + :class:`cvxportfolio.costs.SimulatorCost` but could implement in future + versions more operations done as part of a simulation (back-test) loop, + like filtering out trades (rejecting small ones, ...), or more. It allows + for nested evaluation like it's done by :meth:`values_in_time` and + :meth:`values_in_time_recursive`. Estimators that are used in the market + simulator should derive from this. Some examples are + :class:`DataEstimator`, which is used to select values (like borrow costs) + in simulations as well as in optimization, and + :class:`cvxportfolio.forecast.HistoricalStandardDeviation`, which is used + also in simulation by :class:`cvxportfolio.costs.TransactionCost`. + """ + + def simulate( # pylint: disable=too-many-arguments + self, t, t_next, u, h_plus, past_volumes, + past_returns, current_prices, current_returns, current_volumes, + current_weights, current_portfolio_value, **kwargs): + """Evaluate the estimator as part of a Market Simulator back-test loop. + + Cost classes that are meant to be used in the simulator can + implement this. The arguments to this are the same as for + :meth:`cvxportfolio.estimator.Estimator.values_in_time` plus the + realized returns and volumes in the period, and the trades requested + by the policy, .... + + :param t: Current timestamp. + :type t: pandas.Timestamp + :param u: Trade vector in cash units requested by the policy. + If the market simulator implements rounding by number of shares + and/or canceling trades on assets whose volume for the period + is zero, this is after those transformations. + :type u: pandas.Series + :param h_plus: Post-trade holdings vector. + :type h_plus: pandas.Series + :param past_returns: Past market returns (including cash). + :type past_returns: pandas.DataFrame + :param current_returns: Current period's market returns (including + cash). + :type current_returns: pandas.Series + :param past_volumes: Past market volumes, or None if not available. + :type past_volumes: pandas.DataFrame or None + :param current_volumes: Current period's market volumes, or None if not + available. + :type current_volumes: pandas.Series or None + :param current_prices: Current (open) prices, or None if not available. + :type current_prices: pandas.Series or None + :param current_weights: Current allocation weights (before trading). + :type current_weights: pandas.Series + :param current_portfolio_value: Current total value of the portfolio + in cash units, before costs. + :type current_portfolio_value: float + :param t_next: Timestamp of the next trading period. + :type t_next: pandas.Timestamp + :param kwargs: Reserved for future expansion. + :type kwargs: dict + + :returns: The current value of this instance. + :rtype: any object + """ + raise NotImplementedError # pragma: no cover + + def simulate_recursive(self, **kwargs): + """Evaluate simulator value(s) recursively on sub-estimators. + + :param kwargs: All parameters to :meth:`simulate` that are passed + to all elements contained in a simulator cost object. + :type kwargs: dict + + :returns: The current simulator value evaluated by this instance, if it + implements the :meth:`simulate` method and it returns + something there. + :rtype: numpy.array, pandas.Series, pandas.DataFrame, ... + """ + for _, subestimator in self.__dict__.items(): + if hasattr(subestimator, "simulate_recursive"): + subestimator.simulate_recursive(**kwargs) + if hasattr(self, "simulate"): + self._current_value = self.simulate(**kwargs) + return self.current_value + return None # pragma: no cover + class CvxpyExpressionEstimator(Estimator): """Base class for estimators that are Cvxpy expressions.""" @@ -254,7 +340,7 @@ def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): raise NotImplementedError # pylint: disable=too-many-arguments -class DataEstimator(Estimator): +class DataEstimator(SimulatorEstimator): """Estimator of point-in-time values from internal data. It also implements logic to check that no ``nan`` are returned @@ -452,6 +538,11 @@ def _internal_values_in_time(self, t, **kwargs): # here (probably user-provided) we check if hasattr(self.data, "values_in_time"): + if len(kwargs) == 0: + raise ValueError( + "It seems you're using a custom forecaster as part of a " + "simulate_recursive evaluation, you should derive from " + "SimulatorEstimator instead.") return self.value_checker(self._universe_subselect( self.data.current_value if hasattr(self.data, 'current_value') else self.data.values_in_time(t=t, **kwargs))) @@ -513,6 +604,25 @@ def values_in_time( # pylint: disable=arguments-differ self.parameter.value = result return result + def simulate( # pylint: disable=arguments-differ + self, t, **kwargs): + """Evaluate in simulation (e.g., TransactionCost). + + :param t: Current timestamp. + :type t: pd.Timestamp + :param kwargs: All other unused arguments to + :meth:`SimulatorEstimator.simulate`. + :type kwargs: dict + + :returns: The value from this + :class:`cvxportfolio.estimator.DataEstimator` at current time. + :rtype: int, float, numpy.ndarray + """ + # We don't support evaluation inside DataEstimator in this case + # You should implement simulate/simulate_recursive in each + # SimulatorEstimator wrapped by DataEstimator + return self.values_in_time(t=t) + def __repr__(self): """Pretty-print.""" if np.isscalar(self.data): diff --git a/cvxportfolio/forecast.py b/cvxportfolio/forecast.py index b375dfedd..2ae9079fc 100644 --- a/cvxportfolio/forecast.py +++ b/cvxportfolio/forecast.py @@ -1,4 +1,4 @@ -# Copyright 2016 Enzo Busseti, Stephen Boyd, Steven Diamond, BlackRock Inc. +# Copyright 2023 Enzo Busseti # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -11,34 +11,107 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -"""This module contains classes that make forecasts. - -It implements the models described in Chapter 7 of the book (Examples). - -For example, historical means of market returns, and covariances, are -forecasted here. These are used internally by cvxportfolio objects. In -addition, some of the classes defined here have the ability to cache the -result of their computation online so that if multiple copies of the -same forecaster need access to the estimated value (as is the case in -MultiPeriodOptimization policies) the expensive evaluation is only done -once. The same cache is stored on disk when a back-test ends, so next -time the user runs a back-test with the same universe and market data, -the forecasted values will be retrieved automatically. +r"""This module implements the simple forecasting models used by Cvxportfolio. + +These are standard ones like historical mean, variance, and covariance. In +most cases the models implemented here are equivalent to the relevant Pandas +DataFrame methods, including (most importantly) the logic used to skip over any +``np.nan``. There are some subtle differences explained below. + +Our forecasters are optimized to be evaluated sequentially in time: at each +point in time in a back-test the forecast computed at the previous time step +is updated with the most recent observation. This is in some cases (e.g., +covariances) much more efficient than computing from scratch. + +Most of our forecasters implement both a rolling window and exponential moving +average logic. These are specified by the ``rolling`` and ``half_life`` +parameters respectively, which are either Pandas Timedeltas or ``np.inf``. +The latter is the default, and means that the whole past is used, with no +exponential smoothing. Note that it's possible to use both, *e.g.*, +estimate covariance matrices ignoring past returns older than 5 years and +smoothing the recent ones using an exponential kernel with half-life of 1 year. + +Finally, we note that the covariance, variance and standard deviation +forecasters implement the ``kelly`` parameter, which is True by default. +This is a simple trick explained in +:paper:`section 4.2 (page 28) ` of the paper, simplifies the +computation and provides in general (slightly) higher performance. +For example, using the notation of the paper, the classical definition of +covariance is + +.. math:: + + \Sigma = \mathbf{E}(r_t - \mu)(r_t - \mu)^T, + +this is what you get by setting ``kelly=False``. The default, ``kelly=True``, +gives instead + +.. math:: + + \Sigma^\text{kelly} = \mathbf{E}r_t r_t^T = \Sigma + \mu \mu^T, + +so that the resulting Markowitz-style optimization problem corresponds to +the second order Taylor approximation of a (risk-constrained) Kelly objective, +as is explained briefly :paper:`at page 28 of the paper `, or with +more detail (and hard-to-read math) in `section 6 of the Risk-Constrained Kelly +Gambling paper +`_. + +Lastly, some forecasters implement a basic caching mechanism. +This is used in two ways. First, online (e.g., in back-test): if multiple +copies of the same forecaster need access to the estimated value, as is the +case in :class:`cvxportfolio.MultiPeriodOptimization` policies, the expensive +evaluation is only done once. Then, offline, provided that the +:class:`cvxportfolio.data.MarketData` server used implements the +:meth:`cvxportfolio.data.MarketData.partial_universe_signature` method +(so that we can certify which market data the cached values are computed on). +This type of caching simply saves on disk the forecasted values, and makes it +available automatically next time the user runs a back-test on the same market +data (and same universe). This is especially useful when doing hyper-parameter +optimization, so that expensive computations like evaluating large covariance +matrices are only done once. + +How to use them +~~~~~~~~~~~~~~~ + +These forecasters are each the default option of some Cvxportfolio optimization +term, for example :class:`HistoricalMeanReturn` is the default used by +:class:`cvxportfolio.ReturnsForecast`. In this way each is used with its +default options. If you want to change the options you can simply pass +the relevant forecaster class, instantiated with the options of your choice, +to the Cvxportfolio object. For example + +.. code-block:: + + import cvxportfolio as cvx + from cvxportfolio.forecast import HistoricalMeanReturn + import pandas as pd + + returns_forecast = cvx.ReturnsForecast( + r_hat = HistoricalMeanReturn( + half_life=pd.Timedelta(days=365), + rolling=pd.Timedelta(days=365*5))) + +if you want to apply exponential smoothing to the mean returns forecaster with +half-life of 1 year, and skip over all observations older than 5 years. Both +are relative to each point in time at which the policy is evaluated. """ import logging from dataclasses import dataclass +from typing import Union import numpy as np import pandas as pd -from .errors import ForecastError -from .estimator import Estimator +from .errors import DataError, ForecastError +from .estimator import Estimator, SimulatorEstimator +from .hyperparameters import _resolve_hyperpar logger = logging.getLogger(__name__) def online_cache(values_in_time): - """A simple online cache that decorates values_in_time. + """A simple online cache that decorates :meth:`values_in_time`. The instance it is used on needs to be hashable (we currently use the hash of its __repr__ via dataclass). @@ -94,89 +167,335 @@ class BaseForecast(Estimator): _last_time = None - def _agnostic_update(self, t, past_returns): + def __post_init__(self): + raise NotImplementedError # pragma: no cover + + def initialize_estimator( # pylint: disable=arguments-differ + self, **kwargs): + """Re-initialize whenever universe changes. + + :param kwargs: Unused arguments to :meth:`initialize_estimator`. + :type kwargs: dict + """ + self.__post_init__() + + def _agnostic_update(self, t, past_returns, **kwargs): """Choose whether to make forecast from scratch or update last one.""" if (self._last_time is None) or ( self._last_time != past_returns.index[-1]): logger.debug( '%s.values_in_time at time %s is computed from scratch.', self, t) - self._initial_compute(t=t, past_returns=past_returns) + self._initial_compute(t=t, past_returns=past_returns, **kwargs) else: logger.debug( '%s.values_in_time at time %s is updated from previous value.', self, t) - self._online_update(t=t, past_returns=past_returns) + self._online_update(t=t, past_returns=past_returns, **kwargs) - def _initial_compute(self, t, past_returns): + def _initial_compute(self, **kwargs): """Make forecast from scratch.""" raise NotImplementedError # pragma: no cover - def _online_update(self, t, past_returns): + def _online_update(self, **kwargs): """Update forecast from period before.""" raise NotImplementedError # pragma: no cover +def _is_timedelta(value): + if isinstance(value, pd.Timedelta): + if value <= pd.Timedelta('0d'): + raise ValueError( + '(Exponential) moving average window must be positive') + return True + if isinstance(value, float) and np.isposinf(value): + return False + raise ValueError( + '(Exponential) moving average window can only be' + ' pandas Timedeltas or np.inf.') + @dataclass(unsafe_hash=True) -class HistoricalMeanReturn(BaseForecast): - r"""Historical mean returns. +class BaseMeanVarForecast(BaseForecast): + """This class contains logic common to mean and (co)variance forecasters. - This ignores both the cash returns column and all missing values. + It implements both moving average and exponential moving average, which + can be used at the same time (e.g., ignore observations older than 5 + years and weight exponentially with half-life of 1 year the recent ones). + + Then, it implements the "online update" vs "compute from scratch" model, + updating with a new observations is much cheaper than computing from + scratch (especially for covariances). """ + half_life: Union[pd.Timedelta, float] = np.inf + rolling: Union[pd.Timedelta, float] = np.inf + def __post_init__(self): self._last_time = None - self._last_counts = None - self._last_sum = None + self._denominator = None + self._numerator = None - def initialize_estimator( # pylint: disable=arguments-differ - self, **kwargs): - """Re-initialize whenever universe changes. + def _compute_numerator(self, df, emw_weights): + """Exponential moving window (optional) numerator.""" + raise NotImplementedError # pragma: no cover - :param kwargs: Unused arguments to :meth:`initialize_estimator`. - :type kwargs: dict + def _compute_denominator(self, df, emw_weights): + """Exponential moving window (optional) denominator.""" + raise NotImplementedError # pragma: no cover + + def _update_numerator(self, last_row): + """Update with last observation. + + Emw (if any) is applied in this class. """ - self.__post_init__() + raise NotImplementedError # pragma: no cover + + def _update_denominator(self, last_row): + """Update with last observation. + + Emw (if any) is applied in this class. + """ + raise NotImplementedError # pragma: no cover + + def _dataframe_selector(self, **kwargs): + """Return dataframe we work with. + + This method receives the **kwargs passed to :meth:`values_in_time`. + """ + raise NotImplementedError # pragma: no cover def values_in_time( # pylint: disable=arguments-differ - self, t, past_returns, **kwargs): - """Obtain current value of the mean returns. + self, **kwargs): + """Obtain current value of the historical mean of given dataframe. - :param t: Current time. - :type t: pandas.Timestamp - :param past_returns: Past market returns, including cash. - :type past_returns: pandas.DataFrame - :param kwargs: Unused arguments to :meth:`values_in_time`. + :param kwargs: All arguments to :meth:`values_in_time`. :type kwargs: dict - :returns: Means of past returns (excluding cash). + :returns: Historical means of given dataframe. :rtype: numpy.array """ - self._agnostic_update(t=t, past_returns=past_returns) - return (self._last_sum / self._last_counts).values + self._agnostic_update(**kwargs) + return (self._numerator / self._denominator).values - def _initial_compute(self, t, past_returns): - """Make forecast from scratch.""" - self._last_counts = past_returns.iloc[:, :-1].count() - self._last_sum = past_returns.iloc[:, :-1].sum() + def _emw_weights(self, index, t): + """Get weights to apply to the past obs for EMW.""" + index_in_halflifes = (index - t) / _resolve_hyperpar(self.half_life) + return np.exp(index_in_halflifes * np.log(2)) + + def _initial_compute(self, t, **kwargs): # pylint: disable=arguments-differ + """Make forecast from scratch. + + This method receives the **kwargs passed to :meth:`values_in_time`. + """ + df = self._dataframe_selector(t=t, **kwargs) + + # Moving average window logic + if _is_timedelta(_resolve_hyperpar(self.rolling)): + df = df.loc[df.index >= t-_resolve_hyperpar(self.rolling)] + + # If EMW, compute weights here + if _is_timedelta(_resolve_hyperpar(self.half_life)): + emw_weights = self._emw_weights(df.index, t) + else: + emw_weights = None + + self._denominator = self._compute_denominator(df, emw_weights) + if np.min(self._denominator.values) == 0: + raise ForecastError( + f'{self.__class__.__name__} is given a dataframe with ' + + 'at least a column that has no values.') + self._numerator = self._compute_numerator(df, emw_weights) self._last_time = t - def _online_update(self, t, past_returns): - """Update forecast from period before.""" - self._last_counts += ~(past_returns.iloc[-1, :-1].isnull()) - self._last_sum += past_returns.iloc[-1, :-1].fillna(0.) + # used by covariance forecaster + return df, emw_weights + + def _online_update(self, t, **kwargs): # pylint: disable=arguments-differ + """Update forecast from period before. + + This method receives the **kwargs passed to :meth:`values_in_time`. + """ + df = self._dataframe_selector(t=t, **kwargs) + last_row = df.iloc[-1] + + # if emw discount past + if _is_timedelta(_resolve_hyperpar(self.half_life)): + time_passed_in_halflifes = ( + self._last_time - t)/_resolve_hyperpar(self.half_life) + discount_factor = np.exp(time_passed_in_halflifes * np.log(2)) + self._denominator *= discount_factor + self._numerator *= discount_factor + else: + discount_factor = 1. + + # for emw we also need to discount last element + self._denominator += self._update_denominator( + last_row) * discount_factor + self._numerator += self._update_numerator(last_row) * discount_factor + + # Moving average window logic: subtract elements that have gone out + if _is_timedelta(_resolve_hyperpar(self.rolling)): + observations_to_subtract, emw_weights_of_subtract = \ + self._remove_part_gone_out_of_ma(df, t) + else: + observations_to_subtract, emw_weights_of_subtract = None, None + self._last_time = t + # used by covariance forecaster + return ( + discount_factor, observations_to_subtract, emw_weights_of_subtract) + + def _remove_part_gone_out_of_ma(self, df, t): + """Subtract from numerator and denominator too old observations.""" + + observations_to_subtract = df.loc[ + (df.index >= (self._last_time - _resolve_hyperpar(self.rolling))) + & (df.index < (t - _resolve_hyperpar(self.rolling)))] + + # If EMW, compute weights here + if _is_timedelta(_resolve_hyperpar(self.half_life)): + emw_weights = self._emw_weights(observations_to_subtract.index, t) + else: + emw_weights = None + + self._denominator -= self._compute_denominator( + observations_to_subtract, emw_weights) + if np.min(self._denominator.values) == 0: + raise ForecastError( + f'{self.__class__.__name__} is given a dataframe with ' + + 'at least a column that has no values.') + self._numerator -= self._compute_numerator( + observations_to_subtract, emw_weights).fillna(0.) + + # used by covariance forecaster + return observations_to_subtract, emw_weights + + +@dataclass(unsafe_hash=True) +class BaseMeanForecast(BaseMeanVarForecast): # pylint: disable=abstract-method + """This class contains the logic common to the mean forecasters.""" + + def _compute_numerator(self, df, emw_weights): + """Exponential moving window (optional) numerator.""" + if emw_weights is None: + return df.sum() + return df.multiply(emw_weights, axis=0).sum() + + def _compute_denominator(self, df, emw_weights): + """Exponential moving window (optional) denominator.""" + if emw_weights is None: + return df.count() + ones = (~df.isnull()) * 1. + return ones.multiply(emw_weights, axis=0).sum() + + def _update_numerator(self, last_row): + """Update with last observation. + + Emw (if any) is applied upstream. + """ + return last_row.fillna(0.) + + def _update_denominator(self, last_row): + """Update with last observation. + + Emw (if any) is applied upstream. + """ + return ~(last_row.isnull()) + + +@dataclass(unsafe_hash=True) +class HistoricalMeanReturn(BaseMeanForecast): + r"""Historical means of non-cash returns. + + .. versionadded:: 1.2.0 + + Added the ``half_life`` and ``rolling`` parameters. + + When both ``half_life`` and ``rolling`` are infinity, this is equivalent to + + .. code-block:: + + past_returns.iloc[:,:-1].mean() + + where ``past_returns`` is a time-indexed dataframe containing the past + returns (if in back-test that's relative to each point in time, ), and its + last column, which we skip over, are the cash returns. We use the same + logic as Pandas to handle ``np.nan`` values. + + :param half_life: Half-life of exponential smoothing, expressed as + Pandas Timedelta. If in back-test, that is with respect to each point + in time. Default ``np.inf``, meaning no exponential smoothing. + :type half_life: pandas.Timedelta or np.inf + :param rolling: Rolling window used: observations older than this Pandas + Timedelta are skipped over. If in back-test, that is with respect to + each point in time. Default ``np.inf``, meaning that all past is used. + :type rolling: pandas.Timedelta or np.inf + """ + # pylint: disable=arguments-differ + def _dataframe_selector(self, past_returns, **kwargs): + """Return dataframe to compute the historical means of.""" + return past_returns.iloc[:, :-1] + +@dataclass(unsafe_hash=True) +class HistoricalMeanVolume(BaseMeanForecast): + r"""Historical means of traded volume in units of value (e.g., dollars). + + .. versionadded:: 1.2.0 + + :param half_life: Half-life of exponential smoothing, expressed as + Pandas Timedelta. If in back-test, that is with respect to each point + in time. Default ``np.inf``, meaning no exponential smoothing. + :type half_life: pandas.Timedelta or np.inf + :param rolling: Rolling window used: observations older than this Pandas + Timedelta are skipped over. If in back-test, that is with respect to + each point in time. Default ``np.inf``, meaning that all past is used. + :type rolling: pandas.Timedelta or np.inf + """ + # pylint: disable=arguments-differ + def _dataframe_selector(self, past_volumes, **kwargs): + """Return dataframe to compute the historical means of.""" + if past_volumes is None: + raise DataError( + f"{self.__class__.__name__} can only be used if MarketData" + + " provides market volumes.") + return past_volumes @dataclass(unsafe_hash=True) -class HistoricalVariance(BaseForecast): +class HistoricalVariance(BaseMeanForecast): r"""Historical variances of non-cash returns. + .. versionadded:: 1.2.0 + + Added the ``half_life`` and ``rolling`` parameters. + + When both ``half_life`` and ``rolling`` are infinity, this is equivalent to + + .. code-block:: + + past_returns.iloc[:,:-1].var(ddof=0) + + if you set ``kelly=False`` and + + .. code-block:: + + (past_returns**2).iloc[:,:-1].mean() + + otherwise (we use the same logic to handle ``np.nan`` values). + + :param half_life: Half-life of exponential smoothing, expressed as + Pandas Timedelta. If in back-test, that is with respect to each point + in time. Default ``np.inf``, meaning no exponential smoothing. + :type half_life: pandas.Timedelta or np.inf + :param rolling: Rolling window used: observations older than this Pandas + Timedelta are skipped over. If in back-test, that is with respect to + each point in time. Default ``np.inf``, meaning that all past is used. + :type rolling: pandas.Timedelta or np.inf :param kelly: if ``True`` compute :math:`\mathbf{E}[r^2]`, else :math:`\mathbf{E}[r^2] - {\mathbf{E}[r]}^2`. The second corresponds to the classic definition of variance, while the first is what is obtained by Taylor approximation of the Kelly gambling objective. - (See page 28 of the book.) + See discussion above. :type kelly: bool """ @@ -184,110 +503,349 @@ class HistoricalVariance(BaseForecast): def __post_init__(self): if not self.kelly: - self.meanforecaster = HistoricalMeanReturn() - self._last_time = None - self._last_counts = None - self._last_sum = None - - def initialize_estimator( # pylint: disable=arguments-differ - self, **kwargs): - """Re-initialize whenever universe changes. - - :param kwargs: Unused arguments to :meth:`initialize_estimator`. - :type kwargs: dict - """ - self.__post_init__() + self.meanforecaster = HistoricalMeanReturn( + half_life=_resolve_hyperpar(self.half_life), + rolling=_resolve_hyperpar(self.rolling)) + super().__post_init__() - def values_in_time( # pylint: disable=arguments-differ - self, t, past_returns, **kwargs): + def values_in_time(self, **kwargs): """Obtain current value either by update or from scratch. - :param t: Current time. - :type t: pandas.Timestamp - :param past_returns: Past market returns, including cash. - :type past_returns: pandas.DataFrame - :param kwargs: Unused arguments to :meth:`values_in_time`. + :param kwargs: All arguments to :meth:`values_in_time`. :type kwargs: dict :returns: Variances of past returns (excluding cash). :rtype: numpy.array """ - self._agnostic_update(t=t, past_returns=past_returns) - result = (self._last_sum / self._last_counts).values + result = super().values_in_time(**kwargs) if not self.kelly: result -= self.meanforecaster.current_value**2 return result - def _initial_compute(self, t, past_returns): - """Compute from scratch.""" - self._last_counts = past_returns.iloc[:, :-1].count() - self._last_sum = (past_returns.iloc[:, :-1]**2).sum() - self._last_time = t - - def _online_update(self, t, past_returns): - """Update from estimate at t-1.""" - self._last_counts += ~(past_returns.iloc[-1, :-1].isnull()) - self._last_sum += past_returns.iloc[-1, :-1].fillna(0.)**2 - self._last_time = t + # pylint: disable=arguments-differ + def _dataframe_selector(self, past_returns, **kwargs): + """Return dataframe to compute the historical means of.""" + return past_returns.iloc[:, :-1]**2 @dataclass(unsafe_hash=True) -class HistoricalStandardDeviation(HistoricalVariance): - """Historical standard deviation.""" +class HistoricalStandardDeviation(HistoricalVariance, SimulatorEstimator): + """Historical standard deviation of non-cash returns. + + .. versionadded:: 1.2.0 + + Added the ``half_life`` and ``rolling`` parameters. + + When both ``half_life`` and ``rolling`` are infinity, this is equivalent to + + .. code-block:: + + past_returns.iloc[:,:-1].std(ddof=0) + + if you set ``kelly=False`` and + + .. code-block:: + + np.sqrt((past_returns**2).iloc[:,:-1].mean()) + + otherwise (we use the same logic to handle ``np.nan`` values). + + :param half_life: Half-life of exponential smoothing, expressed as + Pandas Timedelta. If in back-test, that is with respect to each point + in time. Default ``np.inf``, meaning no exponential smoothing. + :type half_life: pandas.Timedelta or np.inf + :param rolling: Rolling window used: observations older than this Pandas + Timedelta are skipped over. If in back-test, that is with respect to + each point in time. Default ``np.inf``, meaning that all past is used. + :type rolling: pandas.Timedelta or np.inf + :param kelly: Same as in :class:`cvxportfolio.forecast.HistoricalVariance`. + Default True. + :type kelly: bool + """ kelly: bool = True - def values_in_time(self, t, past_returns, **kwargs): + def values_in_time(self, **kwargs): """Obtain current value either by update or from scratch. - :param t: Current time. - :type t: pandas.Timestamp - :param past_returns: Past market returns, including cash. - :type past_returns: pandas.DataFrame - :param kwargs: Unused arguments to :meth:`values_in_time`. + :param kwargs: All arguments to :meth:`values_in_time`. :type kwargs: dict :returns: Standard deviations of past returns (excluding cash). :rtype: numpy.array """ variances = \ - super().values_in_time(t=t, past_returns=past_returns, **kwargs) + super().values_in_time(**kwargs) return np.sqrt(variances) + def simulate(self, **kwargs): + # TODO could take last return as well + return self.values_in_time( + t=kwargs['t'], + # These are not necessary with current design of + # DataEstimator + current_weights=kwargs['current_weights'], + current_portfolio_value=kwargs['current_portfolio_value'], + past_returns=kwargs['past_returns'], + past_volumes=kwargs['past_volumes'], + current_prices=kwargs['current_prices'] + ) + +@dataclass(unsafe_hash=True) class HistoricalMeanError(HistoricalVariance): r"""Historical standard deviations of the mean of non-cash returns. + .. versionadded:: 1.2.0 + + Added the ``half_life`` and ``rolling`` parameters. + For a given time series of past returns :math:`r_{t-1}, r_{t-2}, \ldots, r_0` this is :math:`\sqrt{\text{Var}[r]/t}`. When there are missing values we ignore them, both to compute the variance and the count. + + :param half_life: Half-life of exponential smoothing, expressed as + Pandas Timedelta. If in back-test, that is with respect to each point + in time. Default ``np.inf``, meaning no exponential smoothing. + :type half_life: pandas.Timedelta or np.inf + :param rolling: Rolling window used: observations older than this Pandas + Timedelta are skipped over. If in back-test, that is with respect to + each point in time. Default ``np.inf``, meaning that all past is used. + :type rolling: pandas.Timedelta or np.inf + :param kelly: Same as in :class:`cvxportfolio.forecast.HistoricalVariance`. + Default False. + :type kelly: bool """ - def __init__(self): - super().__init__(kelly=False) + kelly: bool = False - def values_in_time(self, t, past_returns, **kwargs): + def values_in_time(self, **kwargs): """Obtain current value either by update or from scratch. - :param t: Current time. - :type t: pandas.Timestamp - :param past_returns: Past market returns, including cash. - :type past_returns: pandas.DataFrame - :param kwargs: Unused arguments to :meth:`values_in_time`. + :param kwargs: All arguments to :meth:`values_in_time`. :type kwargs: dict :returns: Standard deviation of the mean of past returns (excluding cash). :rtype: numpy.array """ - variance = super().values_in_time( - t=t, past_returns=past_returns, **kwargs) - return np.sqrt(variance / self._last_counts.values) + variance = super().values_in_time(**kwargs) + return np.sqrt(variance / self._denominator.values) + + +@dataclass(unsafe_hash=True) +class HistoricalCovariance(BaseMeanVarForecast): + r"""Historical covariance matrix.""" + + kelly: bool = True + + def __post_init__(self): + super().__post_init__() + self._joint_mean = None + + def _compute_numerator(self, df, emw_weights): + """Exponential moving window (optional) numerator.""" + filled = df.fillna(0.) + if emw_weights is None: + return filled.T @ filled + tmp = filled.multiply(emw_weights, axis=0) + return tmp.T @ filled + + def _compute_denominator(self, df, emw_weights): + """Exponential moving window (optional) denominator.""" + ones = (~df.isnull()) * 1. + if emw_weights is None: + return ones.T @ ones + tmp = ones.multiply(emw_weights, axis=0) + return tmp.T @ ones + + def _update_denominator(self, last_row): + """Update with last observation. + + Emw (if any) is applied upstream. + """ + nonnull = ~(last_row.isnull()) + return np.outer(nonnull, nonnull) + + def _update_numerator(self, last_row): + """Update with last observation. + + Emw (if any) is applied upstream. + """ + filled = last_row.fillna(0.) + return np.outer(filled, filled) + + def _dataframe_selector( # pylint: disable=arguments-differ + self, past_returns, **kwargs): + """Return dataframe to compute the historical covariance of.""" + return past_returns.iloc[:, :-1] + + def _compute_joint_mean(self, df, emw_weights): + r"""Compute precursor of :math:`\Sigma_{i,j} = + \mathbf{E}[r^{i}]\mathbf{E}[r^{j}]`.""" + nonnull = (~df.isnull()) * 1. + if emw_weights is None: + return nonnull.T @ df.fillna(0.) + return nonnull.T @ df.fillna(0.).multiply(emw_weights, axis=0) + + def _update_joint_mean(self, last_row): + r"""Update precursor of :math:`\Sigma_{i,j} = + \mathbf{E}[r^{i}]\mathbf{E}[r^{j}]`.""" + return last_row.fillna(0.) + + def _initial_compute( # pylint: disable=arguments-differ + self, **kwargs): + """Compute from scratch, taking care of non-Kelly correction.""" + + df, emw_weights = super()._initial_compute(**kwargs) + + if not self.kelly: + self._joint_mean = self._compute_joint_mean(df, emw_weights) + + def _online_update( # pylint: disable=arguments-differ + self, **kwargs): + """Update from last observation.""" + + discount_factor, observations_to_subtract, emw_weights_of_subtract = \ + super()._online_update(**kwargs) + df = self._dataframe_selector(**kwargs) + last_row = df.iloc[-1] + + if not self.kelly: + + # discount past if EMW + if discount_factor != 1.: + self._joint_mean *= discount_factor + + # add last anyways + self._joint_mean += self._update_joint_mean( + last_row) * discount_factor + + # if MW, update by removing old observations + if observations_to_subtract is not None: + self._joint_mean -= self._compute_joint_mean( + observations_to_subtract, emw_weights_of_subtract) + + def values_in_time(self, **kwargs): + """Obtain current value of the covariance estimate. + + :param kwargs: All arguments passed to :meth:`values_in_time`. + :type kwargs: dict + + :returns: Covariance matrix (excludes cash). + :rtype: numpy.array + """ + + covariance = super().values_in_time(**kwargs) + + if not self.kelly: + tmp = self._joint_mean / self._denominator + covariance -= tmp.T * tmp + + return covariance + +def project_on_psd_cone_and_factorize(covariance): + """Factorize matrix and remove negative eigenvalues. + + :param covariance: Square (symmetric) approximate covariance matrix, can + have negative eigenvalues. + :type covariance: numpy.array + + :returns: Square root factorization with negative eigenvalues removed. + :rtype: numpy.array + """ + eigval, eigvec = np.linalg.eigh(covariance) + eigval = np.maximum(eigval, 0.) + return eigvec @ np.diag(np.sqrt(eigval)) + +@dataclass(unsafe_hash=True) +class HistoricalFactorizedCovariance(HistoricalCovariance): + r"""Historical covariance matrix of non-cash returns, factorized. + + .. versionadded:: 1.2.0 + + Added the ``half_life`` and ``rolling`` parameters. + + When both ``half_life`` and ``rolling`` are infinity, this is equivalent + to, before factorization + + .. code-block:: + + past_returns.iloc[:,:-1].cov(ddof=0) + + if you set ``kelly=False``. We use the same logic to handle ``np.nan`` + values. For ``kelly=True`` it is not possible to reproduce with one single + Pandas method (but we do test against Pandas in the unit tests). + + :param half_life: Half-life of exponential smoothing, expressed as + Pandas Timedelta. Default ``np.inf``, meaning no exponential smoothing. + :type half_life: pandas.Timedelta or np.inf + :param rolling: Rolling window used: observations older than this Pandas + Timedelta are skipped over. If in back-test, that is with respect to + each point in time. Default ``np.inf``, meaning that all past is used. + :type rolling: pandas.Timedelta or np.inf + :param kelly: if ``True`` each element of the covariance matrix + :math:`\Sigma_{i,j}` is equal to :math:`\mathbf{E} r^{i} r^{j}`, + otherwise it is + :math:`\mathbf{E} r^{i} r^{j} - \mathbf{E} r^{i} \mathbf{E} r^{j}`. + The second case corresponds to the classic definition of covariance, + while the first is what is obtained by Taylor approximation of + the Kelly gambling objective. (See discussion above.) + In the second case, the estimated covariance is the same + as what is returned by ``pandas.DataFrame.cov(ddof=0)``, *i.e.*, + we use the same logic to handle missing data. + :type kelly: bool + """ + + # this is used by FullCovariance + FACTORIZED = True + + @online_cache + def values_in_time( # pylint: disable=arguments-differ + self, t, **kwargs): + """Obtain current value of the covariance estimate. + + :param t: Current time period (possibly of simulation). + :type t: pandas.Timestamp + :param kwargs: All arguments passed to :meth:`values_in_time`. + :type kwargs: dict + + :raises cvxportfolio.errors.ForecastError: The procedure failed, + typically because there are too many missing values (*e.g.*, some + asset has only missing values). + + :returns: Square root factorized covariance matrix (excludes cash). + :rtype: numpy.array + """ + + covariance = super().values_in_time(t=t, **kwargs) + + try: + return project_on_psd_cone_and_factorize(covariance) + except np.linalg.LinAlgError as exc: # pragma: no cover + raise ForecastError(f'Covariance estimation at time {t} failed;' + + ' there are (probably) too many missing values in the' + + ' past returns.') from exc @dataclass(unsafe_hash=True) class HistoricalLowRankCovarianceSVD(Estimator): - """Build factor model covariance using truncated SVD.""" + """Build factor model covariance using truncated SVD. + + .. note:: + + This forecaster is experimental and not covered by semantic versioning, + we may change it without warning. + + :param num_factors: How many factors in the low rank model. + :type num_factors: int + :param svd_iters: How many iteration of truncated SVD to apply. If you + get a badly conditioned covariance you may to lower this. + :type svd_iters: int + :param svd: Which SVD routine to use, currently only dense (LAPACK) via + Numpy. + :type svd: str + """ num_factors: int svd_iters: int = 10 @@ -380,124 +938,3 @@ def values_in_time( # pylint: disable=arguments-differ return self.build_low_rank_model(past_returns.iloc[:, :-1], num_factors=self.num_factors, iters=self.svd_iters, svd=self.svd) - - -def project_on_psd_cone_and_factorize(covariance): - """Factorize matrix and remove negative eigenvalues. - - :param covariance: Square (symmetric) approximate covariance matrix, can - have negative eigenvalues. - :type covariance: numpy.array - - :returns: Square root factorization with negative eigenvalues removed. - :rtype: numpy.array - """ - eigval, eigvec = np.linalg.eigh(covariance) - eigval = np.maximum(eigval, 0.) - return eigvec @ np.diag(np.sqrt(eigval)) - -@dataclass(unsafe_hash=True) -class HistoricalFactorizedCovariance(BaseForecast): - r"""Historical covariance matrix, sqrt factorized. - - :param kelly: if ``True`` compute each - :math:`\Sigma_{i,j} = \overline{r^{i} r^{j}}`, else - :math:`\overline{r^{i} r^{j}} - \overline{r^{i}}\overline{r^{j}}`. - The second case corresponds to the classic definition of covariance, - while the first is what is obtained by Taylor approximation of - the Kelly gambling objective. (See page 28 of the book.) - In the second case, the estimated covariance is the same - as what is returned by ``pandas.DataFrame.cov(ddof=0)``, *i.e.*, - we use the same logic to handle missing data. - :type kelly: bool - """ - - kelly: bool = True - - # this is used by FullCovariance - FACTORIZED = True - - def __post_init__(self): - self._last_time = None - self._last_counts_matrix = None - self._last_sum_matrix = None - self._joint_mean = None - - def initialize_estimator( # pylint: disable=arguments-differ - self, **kwargs): - """Re-initialize whenever universe changes. - - :param kwargs: Unused arguments to :meth:`initialize_estimator`. - :type kwargs: dict - """ - self.__post_init__() - - @staticmethod - def _get_count_matrix(past_returns): - r"""We obtain the matrix of non-null joint counts: - - .. math:: - - \text{Count}\left(r^{i}r^{j} \neq \texttt{nan}\right). - """ - tmp = (~past_returns.iloc[:, :-1].isnull()) * 1. - return tmp.T @ tmp - - @staticmethod - def _get_initial_joint_mean(past_returns): - r"""Compute precursor of :math:`\Sigma_{i,j} = - \mathbf{E}[r^{i}]\mathbf{E}[r^{j}]`.""" - nonnull = (~past_returns.iloc[:, :-1].isnull()) * 1. - tmp = nonnull.T @ past_returns.iloc[:, :-1].fillna(0.) - return tmp # * tmp.T - - def _initial_compute(self, t, past_returns): - self._last_counts_matrix = self._get_count_matrix(past_returns).values - filled = past_returns.iloc[:, :-1].fillna(0.).values - self._last_sum_matrix = filled.T @ filled - if not self.kelly: - self._joint_mean = self._get_initial_joint_mean(past_returns) - - self._last_time = t - - def _online_update(self, t, past_returns): - nonnull = ~(past_returns.iloc[-1, :-1].isnull()) - self._last_counts_matrix += np.outer(nonnull, nonnull) - last_ret = past_returns.iloc[-1, :-1].fillna(0.) - self._last_sum_matrix += np.outer(last_ret, last_ret) - self._last_time = t - if not self.kelly: - self._joint_mean += last_ret - - @online_cache - def values_in_time( # pylint: disable=arguments-differ - self, t, past_returns, **kwargs): - """Obtain current value of the covariance estimate. - - :param t: Current time. - :type t: pandas.Timestamp - :param past_returns: Past market returns (including cash). - :type past_returns: pandas.DataFrame - :param kwargs: Extra arguments passed to :meth:`values_in_time`. - :type kwargs: dict - - :raises cvxportfolio.errors.ForecastError: The procedure failed, - typically because there are too many missing values (*e.g.*, some - asset has only missing values). - - :returns: Square root factorized covariance matrix (excludes cash). - :rtype: numpy.array - """ - - self._agnostic_update(t=t, past_returns=past_returns) - covariance = self._last_sum_matrix / self._last_counts_matrix - - if not self.kelly: - tmp = self._joint_mean / self._last_counts_matrix - covariance -= tmp.T * tmp - try: - return project_on_psd_cone_and_factorize(covariance) - except np.linalg.LinAlgError as exc: - raise ForecastError(f'Covariance estimation at time {t} failed;' - + ' there are (probably) too many missing values in the' - + ' past returns.') from exc diff --git a/cvxportfolio/hyperparameters.py b/cvxportfolio/hyperparameters.py index 26c557eec..d024966a3 100644 --- a/cvxportfolio/hyperparameters.py +++ b/cvxportfolio/hyperparameters.py @@ -21,6 +21,7 @@ from numbers import Number import numpy as np +import pandas as pd # GAMMA_RISK_RANGE = [.5, 1., 2., 5., 10.] # GAMMA_COST_RANGE = [0., .1, .2, .5, 1., 2., 5., 10.] @@ -29,6 +30,12 @@ __all__ = ['Gamma', 'RangeHyperParameter'] +def _resolve_hyperpar(possible_hyperpar): + """Return current value if input is hyper-parameter, or input itself.""" + if isinstance(possible_hyperpar, HyperParameter): + return possible_hyperpar.current_value + return possible_hyperpar + class HyperParameter: """Base Hyper Parameter class. @@ -40,7 +47,8 @@ class HyperParameter: """ def __mul__(self, other): - if np.isscalar(other) or isinstance(other, HyperParameter): + if np.isscalar(other) or isinstance(other, HyperParameter) \ + or isinstance(other, pd.Timedelta): return CombinedHyperParameter([self], [other]) return NotImplemented @@ -82,7 +90,7 @@ def current_value(self): """Current value of the hyper-parameter. :returns: Current value. - :rtype: int or float + :rtype: int, float, pd.Timedelta """ raise NotImplementedError # pragma: no cover @@ -104,10 +112,15 @@ def current_value(self): :returns: Current value. :rtype: int or float """ - return sum(( + # we unroll the sum() to support non-numeric (Timedeltas, ...) + summands = list( (le.current_value if hasattr(le, 'current_value') else le) * (ri.current_value if hasattr(ri, 'current_value') else ri) - for le, ri in zip(self.left, self.right))) + for le, ri in zip(self.left, self.right)) + result = summands[0] + for el in summands[1:]: + result += el + return result def collect_hyperparameters(self): """Collect (not combined) hyper-parameters. diff --git a/cvxportfolio/policies.py b/cvxportfolio/policies.py index d39288a9d..9e690367b 100644 --- a/cvxportfolio/policies.py +++ b/cvxportfolio/policies.py @@ -24,6 +24,7 @@ from .errors import (ConvexityError, ConvexSpecificationError, DataError, MissingTimesError, PortfolioOptimizationError) from .estimator import DataEstimator, Estimator +from .forecast import HistoricalMeanVolume from .returns import CashReturn from .utils import flatten_heterogeneous_list @@ -169,17 +170,43 @@ def values_in_time( # pylint: disable=arguments-differ return result class MarketBenchmark(Policy): - """Allocation weighted by last year's total market volumes.""" + """Allocation weighted by last year's average market traded volumes. + + This policy provides an approximation of a market capitalization-weighted + allocation, by using the average of traded volumes in units of value (e.g., + USDOLLAR) over the previous year as proxy. + + .. versionadded:: 1.2.0 + + We added the ``mean_volume_forecast`` parameter. + + :param mean_volume_forecast: Forecaster class that computes the average + of historical volumes. You can also pass a DataFrame containing + your own forecasts computed externally. Default is + :class:`cvxportfolio.forecast.HistoricalMeanVolume` which is + instantiated with parameter ``rolling=pd.Timedelta('365.24d')`` + (that's one solar year in number of days). If you + want to provide a different forecaster, or change the parameters (like + adding exponential smoothing) you should instantiate the forecaster + class and pass the instance. + :type mean_volume_forecast: pandas.DataFrame, cvx.forecast.BaseForecast + class or instance + """ + + def __init__(self, mean_volume_forecast=HistoricalMeanVolume): + if isinstance(mean_volume_forecast, type): + mean_volume_forecast = mean_volume_forecast( + rolling=pd.Timedelta('365.24d')) + self.mean_volume_forecast = DataEstimator( + mean_volume_forecast, data_includes_cash=False) def values_in_time( # pylint: disable=arguments-differ - self, past_returns, past_volumes, **kwargs): + self, past_returns, **kwargs): """Return market benchmark weights. :param past_returns: Past market returns (used to infer universe with cash). :type past_returns: pandas.DataFrame - :param past_volumes: Past market volumes. - :type past_volumes: pandas.DataFrame :param kwargs: Unused arguments to :meth:`values_in_time`. :type kwargs: dict @@ -190,14 +217,9 @@ def values_in_time( # pylint: disable=arguments-differ :rtype: pandas.Series """ - if past_volumes is None: - raise DataError( - f"{self.__class__.__name__} can only be used if MarketData" - + " provides market volumes.") - sumvolumes = past_volumes.loc[past_volumes.index >= ( - past_volumes.index[-1] - pd.Timedelta('365d'))].mean() - result = np.zeros(len(sumvolumes) + 1) - result[:-1] = sumvolumes / sum(sumvolumes) + meanvolumes = self.mean_volume_forecast.current_value + result = np.zeros(len(meanvolumes) + 1) + result[:-1] = meanvolumes / sum(meanvolumes) return pd.Series(result, index=past_returns.columns) class RankAndLongShort(Policy): diff --git a/cvxportfolio/returns.py b/cvxportfolio/returns.py index ee627bb0a..d4ce9668f 100644 --- a/cvxportfolio/returns.py +++ b/cvxportfolio/returns.py @@ -15,6 +15,7 @@ optimization policies and related objects.""" import cvxpy as cp +import numpy as np from .costs import Cost from .estimator import DataEstimator # , ParameterEstimator @@ -112,19 +113,20 @@ class ReturnsForecast(Cost): must be supplied for all assets excluding cash. See the :ref:`passing-data` manual page for more information on how these are passed. - :param r_hat: constant or time varying returns estimates, provided in the - form of a pandas DataFrame indexed by timestamps of trading period and - whose columns are all non-cash assets. Alternatively it can be a pandas - Series indexed by the assets' names (so it is constant in time), a - pandas Series indexed by time (so it is constant across assets), or a - float (constant for all times and assets). Alternatively you can - provide a :class:`cvxportfolio.estimator.Estimator` subclass that - implements the logic to compute the returns forecast given the past - market data, like the default + + :param r_hat: constant or time varying returns estimates, either + user-provided (see :ref:`the manual page on passing data + ` or computed internally by a forecaster class or + instance. The default is :class:`cvxportfolio.forecast.HistoricalMeanReturn` which computes the historical means of the past returns, at each point in the back-test. - :type r_hat: pd.Series or pd.DataFrame or float or - :class:`cvxportfolio.estimator.Estimator` + It is instantiated internally with default parameters, so it computes + the full historical means at each point in time. If you prefer + to change that (e.g., do rolling mean or exponential moving window) + you can instantiate :class:`cvxportfolio.forecast.HistoricalMeanReturn` + with your chosen parameters and pass the instance. + :type r_hat: pd.Series, pd.DataFrame, float, + :class:`cvxportfolio.forecast.BaseForecast` class or instance :param decay: decay factor used in :class:`cvxportfolio.MultiPeriodOptimization` policies. It is as a number in :math:`[0,1]`. At step :math:`\tau` of the MPO policy, where @@ -134,6 +136,8 @@ class ReturnsForecast(Cost): signal. The default value is 1. :type decay: float + + :Example: >>> import cvxportfolio as cvx @@ -145,6 +149,22 @@ class ReturnsForecast(Cost): :math:`\hat{r}_t` are the full average of past returns at each point in time and the risk model is the full covariance, also computed from the past returns. + + :Example: + + >>> my_forecasts = pd.DataFrame(...) + >>> returns_model = cvx.ReturnsForecast(r_hat=my_forecasts) + + With user-provided forecasts. + + :Example: + + >>> from cvxportfolio.forecast import HistoricalMeanReturn + >>> returns_model = cvx.ReturnsForecast( + r_hat=HistoricalMeanReturn(rolling = pd.Timedelta('365d'))) + + Instead of the default (full historical means at each point in time), + use rolling means of 1 year. """ def __init__(self, r_hat=HistoricalMeanReturn, decay=1.): @@ -178,8 +198,9 @@ def values_in_time( # pylint: disable=arguments-differ :param kwargs: All other parameters to :meth:`values_in_time`. :type kwargs: dict """ - self._r_hat_parameter.value = self.r_hat.current_value *\ - self.decay**(mpo_step) + self._r_hat_parameter.value = \ + np.ones(self._r_hat_parameter.size) * \ + self.r_hat.current_value * self.decay**(mpo_step) def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): """Compile to cvxpy expression. diff --git a/cvxportfolio/risks.py b/cvxportfolio/risks.py index 4b9d23e62..ef58a8e77 100644 --- a/cvxportfolio/risks.py +++ b/cvxportfolio/risks.py @@ -51,12 +51,24 @@ class FullCovariance(Cost): and the benchmark weights, respectively, at time :math:`t`. :param Sigma: DataFrame of covariance matrices supplied by the user, or by - default Covariance fitted from the past data. The DataFrame can either + default covariance matrix forecasted from the past data. + The DataFrame can either represents a single constant covariance matrix or one for each point in - time. If it is a class we instantiate it with default parameters. At - each time :math:`t` we project the value of :math:`\Sigma_t` on the - cone of positive semi-definite matrices. - :type Sigma: pandas.DataFrame or Estimator + time: in the latter case you use a Pandas multiindexed dataframe + where the first level are the points in time (of the back-test) and + the second level are the assets, as are the columns. + The default is to use + :class:`cvxportfolio.forecast.HistoricalFactorizedCovariance`, the + :doc:`forecaster class ` that computes the full historical + covariance, at each point in time of a back-test, from past returns. + (It also factorizes it to ease the optimization and caches it on + disk.) It is instantiated with default parameters, if instead you + wish to change them you can pass an instance with your choices of + parameters (like ``rolling`` for moving average and ``half_life`` + for exponential smoothing). You can also pass any other forecaster + estimator that computes a covariance matrix from the past returns. + :type Sigma: pandas.DataFrame, cvxportfolio.forecast.BaseForecast class + or instance """ def __init__(self, Sigma=HistoricalFactorizedCovariance): @@ -119,10 +131,16 @@ class RiskForecastError(Cost): Implements the model defined in :paper:`chapter 4, page 32 ` of the paper. Takes same arguments as :class:`DiagonalCovariance`. - :param sigma_squares: per-stock variances, indexed by time if DataFrame. + :param sigma_squares: Per-asset variances, either constant + (Pandas series) or changing in time (time-indexed Pandas dataframe) + (see :ref:`the passing data manual page `). Default is to use historical variances, using - past returns at each point in time of a backtest. - :type sigma_squares: pd.DataFrame or pd.Series or None + past returns at each point in time of a backtest. If you wish to + change the parameters to + :class:`cvxportfolio.forecast.HistoricalVariance` you can instantiate + it with your choice of parameters and pass the instance. + :type sigma_squares: pd.DataFrame, pd.Series, cvx.forecast.BaseForecast + class or instance """ def __init__(self, sigma_squares=HistoricalVariance): @@ -174,12 +192,27 @@ def compile_to_cvxpy(self, w_plus, z, w_plus_minus_w_bm): class DiagonalCovariance(Cost): - """Diagonal covariance matrix, user-provided or fit from data. + r"""Diagonal covariance matrix, user-provided or fit from data. - :param sigma_squares: per-stock variances, indexed by time if - DataFrame. Default is to use historical variances, using past - returns at each point in time of a backtest. - :type sigma_squares: pd.DataFrame or pd.Series or None + It represents the objective term: + + .. math:: + {(w^+_t - w^\text{b}_t )}^T \mathbf{diag}(\sigma_t^2) + (w^+_t - w^\text{b}_t) + + where :math:`w^+_t` and :math:`w^\text{b}_t` are the post-trade + and the benchmark weights, respectively, at time :math:`t`. + + :param sigma_squares: Per-asset variances, either constant + (Pandas series) or changing in time (time-indexed Pandas dataframe) + (see :ref:`the passing data manual page `). + Default is to use historical variances, using + past returns at each point in time of a backtest. If you wish to + change the parameters to + :class:`cvxportfolio.forecast.HistoricalVariance` you can instantiate + it with your choice of parameters and pass the instance. + :type sigma_squares: pd.DataFrame, pd.Series, cvx.forecast.BaseForecast + class or instance """ def __init__(self, sigma_squares=HistoricalVariance): @@ -283,16 +316,16 @@ class FactorModelCovariance(Cost): passed to :class:`FullCovariance` (by default, historical covariance fitted at each point in time). We take its PCA for the low-rank model, and the remaining factors are used to estimate - the diagonal, as is explained at pages 59-60 of the book. If it is a + the diagonal, as is explained at pages 59-60 of the paper. If it is a class, we instantiate it with default parameters. - :type Sigma: pandas.DataFrame or Estimator + :type Sigma: pandas.DataFrame or cvxportfolio.forecast.BaseForecast :param F_and_d_Forecaster: Only relevant if F or d are None, and Sigma is None. Forecaster that at each point in time produces estimate of F and d. By default we use a SVD-based forecaster that is equivalent to :class:`cvxportfolio.forecast.HistoricalFactorizedCovariance` if there are no missing values. If you pass a class, it will be instantiated with ``num_factors``. - :type F_and_d_Forecaster: Estimator + :type F_and_d_Forecaster: cvxportfolio.forecast.BaseForecast """ def __init__(self, F=None, d=None, Sigma_F=None, num_factors=1, @@ -338,7 +371,8 @@ def initialize_estimator( # pylint: disable=arguments-differ if self.Sigma_F is None: self.factor_exposures_parameter = self.F.parameter else: - # we could refactor the code here so we don't create duplicate parameters + # we could refactor the code here + # so we don't create duplicate parameters self.factor_exposures_parameter = cp.Parameter( self.F.parameter.shape) diff --git a/cvxportfolio/simulator.py b/cvxportfolio/simulator.py index a42d69321..19710e8c9 100644 --- a/cvxportfolio/simulator.py +++ b/cvxportfolio/simulator.py @@ -252,16 +252,17 @@ def simulate( h_plus = h + u # evaluate cost functions - realized_costs = {cost.__class__.__name__: cost.simulate( - t=t, u=u, h_plus=h_plus, - past_volumes=past_volumes, - current_volumes=current_volumes, - past_returns=past_returns, - current_returns=current_returns, - current_prices=current_prices, - t_next=t_next, - periods_per_year=self.market_data.periods_per_year, - windowsigma=self.market_data.periods_per_year) + realized_costs = {cost.__class__.__name__: getattr(cost, + # to support interface before 1.2.0 + 'simulate_recursive' if hasattr(cost, 'simulate_recursive') + else 'simulate')( + t=t, u=u, h_plus=h_plus, past_volumes=past_volumes, + current_volumes=current_volumes, past_returns=past_returns, + current_returns=current_returns, + current_prices=current_prices, + current_weights=current_weights, + current_portfolio_value=current_portfolio_value, + t_next=t_next) for cost in self.costs} # initialize tomorrow's holdings @@ -284,6 +285,13 @@ def _get_initialized_policy(self, orig_policy, universe, trading_calendar): policy.initialize_estimator_recursive( universe=universe, trading_calendar=trading_calendar) + # we also initialize cost objects + for cost in self.costs: + if hasattr(cost, 'initialize_estimator_recursive'): + # to support interface before 1.2.0 + cost.initialize_estimator_recursive( + universe=universe, trading_calendar=trading_calendar) + # if policy uses a cache load it from disk if hasattr(policy, '_cache'): logger.info('Trying to load cache from disk...') @@ -291,15 +299,17 @@ def _get_initialized_policy(self, orig_policy, universe, trading_calendar): signature=self.market_data.partial_universe_signature(universe), base_location=self.base_location) - # if hasattr(policy, 'compile_to_cvxpy'): - # policy.compile_to_cvxpy() - return policy def _finalize_policy(self, policy, universe): policy.finalize_estimator_recursive() # currently unused + for cost in self.costs: + if hasattr(cost, 'finalize_estimator_recursive'): + # to support interface before 1.2.0 + cost.finalize_estimator_recursive() # currently unused + if hasattr(policy, '_cache'): logger.info('Storing cache from policy to disk...') _store_cache( diff --git a/cvxportfolio/tests/test_costs.py b/cvxportfolio/tests/test_costs.py index 4bdeb0fc2..818c57e28 100644 --- a/cvxportfolio/tests/test_costs.py +++ b/cvxportfolio/tests/test_costs.py @@ -107,34 +107,42 @@ def test_hcost(self): """Test holding cost model.""" dividends = pd.Series(np.random.randn(self.N-1), self.returns.columns[:-1]) - hcost = cvx.HoldingCost(short_fees=5, long_fees=3, dividends=dividends) - t = 100 # this is picked so that periods_per_year evaluates to 252 - hcost.initialize_estimator_recursive( - universe=self.returns.columns, trading_calendar=self.returns.index) - expression = hcost.compile_to_cvxpy( - self.w_plus, self.z, self.w_plus_minus_w_bm) - hcost.values_in_time_recursive( - t=self.returns.index[t], past_returns=self.returns.iloc[:t]) - - for _ in range(10): - self.w_plus.value = np.random.randn(self.N) - self.w_plus.value[-1] = 1 - np.sum(self.w_plus.value[:-1]) - - print(expression.value) - - self.assertTrue( - np.isclose(expression.value, - # short fees - - np.sum(np.minimum( - self.w_plus.value[:-1], 0.)) - * (np.exp(np.log(1.05)/252) - 1) - # long fees - + np.sum(np.maximum( - self.w_plus.value[:-1], 0.)) - * (np.exp(np.log(1.03)/252) - 1) - # dividends - - self.w_plus.value[:-1].T @ dividends)) + for hcost, t in [ + [cvx.HoldingCost(short_fees=5, long_fees=3, dividends=dividends), + 100], # this is picked so that periods_per_year evaluates to 252 + [cvx.HoldingCost( + periods_per_year=252, short_fees=5, long_fees=3, + dividends=dividends), + 123], + ]: + + hcost.initialize_estimator_recursive( + universe=self.returns.columns, + trading_calendar=self.returns.index) + expression = hcost.compile_to_cvxpy( + self.w_plus, self.z, self.w_plus_minus_w_bm) + hcost.values_in_time_recursive( + t=self.returns.index[t], past_returns=self.returns.iloc[:t]) + + for _ in range(10): + self.w_plus.value = np.random.randn(self.N) + self.w_plus.value[-1] = 1 - np.sum(self.w_plus.value[:-1]) + + print(expression.value) + + self.assertTrue( + np.isclose(expression.value, + # short fees + - np.sum(np.minimum( + self.w_plus.value[:-1], 0.)) + * (np.exp(np.log(1.05)/252) - 1) + # long fees + + np.sum(np.maximum( + self.w_plus.value[:-1], 0.)) + * (np.exp(np.log(1.03)/252) - 1) + # dividends + - self.w_plus.value[:-1].T @ dividends)) def test_softconstraint(self): """Test code of SoftConstraints that is not covered elsewhere.""" @@ -231,6 +239,40 @@ def test_tcost(self): self.assertTrue(np.isclose(expression.value, est_tcost_lin+est_tcost_nonnlin)) + # also c + tcost = cvx.StocksTransactionCost( + a=0.001/2, pershare_cost=pershare_cost, + b=b, window_sigma_est=250, window_volume_est=250, c=0.001, + exponent=1.5) + tcost.initialize_estimator_recursive( + universe=self.returns.columns, trading_calendar=self.returns.index) + expression = tcost.compile_to_cvxpy( + self.w_plus, self.z, self.w_plus_minus_w_bm) + + tcost.values_in_time_recursive( + t=self.returns.index[34], + current_portfolio_value=value, + past_returns=self.returns.iloc[:34], + past_volumes=self.volumes.iloc[:34], + current_prices=pd.Series(np.ones(self.returns.shape[1]-1), + self.returns.columns[:-1])) + + self.z.value = np.random.randn(self.returns.shape[1]) + self.z.value[-1] = -np.sum(self.z.value[:-1]) + + est_tcost_lin = sum(np.abs(self.z.value[:-1]) * 0.0005) + volumes_est = self.volumes.iloc[:34].mean().values + sigmas_est = np.sqrt((self.returns.iloc[:34, :-1]**2).mean()).values + est_tcost_nonnlin = ( + np.abs(self.z.value[:-1])**(3/2)) @ ( + sigmas_est * np.sqrt(value / volumes_est)) + est_tcost_c = np.sum(self.z.value[:-1] * 0.001) + print(est_tcost_lin) + print(est_tcost_nonnlin) + print(expression.value) + self.assertTrue(np.isclose(expression.value, + est_tcost_lin+est_tcost_nonnlin+est_tcost_c)) + if __name__ == '__main__': diff --git a/cvxportfolio/tests/test_estimator.py b/cvxportfolio/tests/test_estimator.py index f61f2a63e..80a52a4e1 100644 --- a/cvxportfolio/tests/test_estimator.py +++ b/cvxportfolio/tests/test_estimator.py @@ -47,20 +47,27 @@ def test_callable(self): """Test DataEstimator with an internal Estimator.""" estimator = DataEstimator(PlaceholderCallable(1.0)) time = pd.Timestamp("2022-01-01") - self.assertEqual(estimator.values_in_time_recursive(t=time), 1.0) + self.assertEqual(estimator.values_in_time_recursive( + t=time, current_portfolio_value=1000), 1.0) estimator = DataEstimator(PlaceholderCallable(np.nan)) with self.assertRaises(NaNError): - estimator.values_in_time_recursive(t=time) + estimator.values_in_time_recursive( + t=time, current_portfolio_value=1000) data = np.arange(10.0) estimator = DataEstimator(PlaceholderCallable(data)) self.assertTrue( - np.all(estimator.values_in_time_recursive(t=time) == data)) + np.all(estimator.values_in_time_recursive( + t=time, current_portfolio_value=1000) == data)) + + with self.assertRaises(ValueError): + estimator.simulate_recursive(t=time, current_portfolio_value=1000) data[1] = np.nan with self.assertRaises(NaNError): - estimator.values_in_time_recursive(t=time) + estimator.values_in_time_recursive( + t=time, current_portfolio_value=1000) def test_scalar(self): """Test DataEstimator with a scalar.""" diff --git a/cvxportfolio/tests/test_forecast.py b/cvxportfolio/tests/test_forecast.py index 9ac61897b..42ccbaf76 100644 --- a/cvxportfolio/tests/test_forecast.py +++ b/cvxportfolio/tests/test_forecast.py @@ -18,83 +18,333 @@ import numpy as np import pandas as pd -from cvxportfolio.forecast import (ForecastError, +from cvxportfolio.forecast import (ForecastError, HistoricalCovariance, HistoricalFactorizedCovariance, HistoricalLowRankCovarianceSVD, HistoricalMeanError, HistoricalMeanReturn, + HistoricalMeanVolume, HistoricalStandardDeviation, HistoricalVariance) from cvxportfolio.tests import CvxportfolioTest class TestForecast(CvxportfolioTest): - """Test forecast estimators and their caching.""" + """Test forecast estimators and their caching. + + In most cases we test against the relevant pandas function as reference. + """ + + def test_historical_mean_volume(self): + """Test mean volume forecaster.""" + + forecaster = HistoricalMeanVolume() + for tidx in [20, 21, 22, 25, 26, 27]: + cvx_val = forecaster.values_in_time_recursive( + t=self.volumes.index[tidx], + past_returns=self.returns.iloc[:tidx], + past_volumes=self.volumes.iloc[:tidx]) + + self.assertTrue(np.allclose(cvx_val, + self.volumes.iloc[:tidx].mean())) + + with self.assertRaises(ValueError): + cvx_val = forecaster.values_in_time_recursive( + t=self.volumes.index[tidx], + past_returns=self.returns.iloc[:tidx], + past_volumes=None) + + def test_vector_fc_syntax(self): + """Test syntax of vector forecasters.""" + with self.assertRaises(ValueError): + forecaster = HistoricalMeanReturn(rolling=3) + forecaster.values_in_time_recursive( + t=pd.Timestamp.today(), past_returns=self.returns) - def test_mean_update(self): - """Test the mean forecaster.""" - forecaster = HistoricalMeanReturn() # lastforcash=True) + with self.assertRaises(ValueError): + forecaster = HistoricalMeanReturn(rolling=pd.Timedelta('0d')) + forecaster.values_in_time_recursive( + t=pd.Timestamp.today(), past_returns=self.returns) + + with self.assertRaises(ForecastError): + forecaster = HistoricalMeanReturn() + returns = pd.DataFrame(self.returns, copy=True) + returns.iloc[:40, 3:10] = np.nan + forecaster.values_in_time_recursive( + t=self.returns.index[20], past_returns=returns.iloc[:20]) + # test that update throws exception when moving window results + # in invalid data + forecaster = HistoricalMeanReturn(rolling=pd.Timedelta('10d')) returns = pd.DataFrame(self.returns, copy=True) - returns.iloc[:20, 3:10] = np.nan + returns.iloc[20:, 3:10] = np.nan + last_valid_t = 25 + forecaster.values_in_time_recursive( + t=self.returns.index[last_valid_t], + past_returns=returns.iloc[:last_valid_t]) + + with self.assertRaises(ForecastError): + forecaster.values_in_time_recursive( + t=self.returns.index[last_valid_t+1], + past_returns=returns.iloc[:last_valid_t+1]) + + def _base_test_vector_update( + # pylint: disable=too-many-arguments + self, forecaster, fc_kwargs, df_attr, pd_kwargs, df_callable=None, + with_nans=True): + """Base function to test vector updates (mean, std, var).""" + forecaster = forecaster(**fc_kwargs) + + returns = pd.DataFrame(self.returns, copy=True) + if with_nans: + returns.iloc[:40, 3:10] = np.nan for tidx in [50, 51, 52, 55, 56, 57]: t = returns.index[tidx] past_returns = returns.loc[returns.index < t] - mean = forecaster.values_in_time_recursive( + cvx_val = forecaster.values_in_time_recursive( t=t, past_returns=past_returns) - # print(mean) - # self.assertTrue(mean[-1] == past_returns.iloc[-1,-1]) + self.assertFalse(np.any(np.isnan(cvx_val))) self.assertTrue(np.allclose( - mean, past_returns.iloc[:, :-1].mean())) + cvx_val, + df_callable(past_returns.iloc[:, :-1]) + if df_callable is not None else + getattr(past_returns.iloc[:, :-1], df_attr)(**pd_kwargs))) + + def test_mean_update(self): + """Test the mean forecaster.""" + self._base_test_vector_update(HistoricalMeanReturn, {}, 'mean', {}) def test_variance_update(self): """Test the variance forecaster.""" - forecaster = HistoricalVariance(kelly=False) + self._base_test_vector_update( + HistoricalVariance, {'kelly': False}, 'var', {'ddof': 0}) + + def test_stddev_update(self): + """Test the standard deviation forecaster.""" + self._base_test_vector_update( + HistoricalStandardDeviation, {'kelly': False}, 'std', {'ddof': 0}) + + def test_meanerror_update(self): + """Test the forecaster of the standard deviation on the mean.""" + + def _me(past_returns_noncash): + return past_returns_noncash.std(ddof=0)\ + / np.sqrt(past_returns_noncash.count()) + + self._base_test_vector_update( + HistoricalMeanError, {}, None, None, df_callable=_me) + + def _kelly_covariance(self, past_returns_noncash, half_life=None): + """Covariance using Kelly. + + Below we implement also the one not using Kelly. That should be + done by past_returns_noncash.cov(ddof=0) but a bug in pandas + prevents from doing so (see below). + """ + + # EMW + if half_life is not None: + index_in_halflifes = ( + past_returns_noncash.index - past_returns_noncash.index[-1] + ) / half_life + emw_weights = np.exp(index_in_halflifes * np.log(2)) + else: + emw_weights = pd.Series(1., past_returns_noncash.index) + + _ = past_returns_noncash.fillna(0.) + num = _.T @ _.multiply(emw_weights, axis=0) + _ = (~past_returns_noncash.isnull()) * 1. + den = _.T @ _.multiply(emw_weights, axis=0) + + return num / den + + def _nokelly_covariance_emw_nonans(self, past_returns_noncash, half_life): + """This is only without nans.""" + result = self._kelly_covariance( + past_returns_noncash, half_life=half_life) + means = self._mean_emw(past_returns_noncash, half_life) + return result - np.outer(means, means) + + def test_cov_update(self): + """Test the covariance forecaster.""" + + self._base_test_vector_update( + HistoricalCovariance, {'kelly': True}, None, None, + df_callable=self._kelly_covariance) + + def test_cov_update_nokelly(self): + """Test the covariance forecaster without Kelly correction. + + Due to a bug in pandas, we can compare with Pandas' DataFrame.cov + only if the df has no NaNs. + """ + + self._base_test_vector_update( + HistoricalCovariance, {'kelly': False}, 'cov', {'ddof': 0}, + with_nans=False) + + def _base_test_moving_window_vector_update( + # pylint: disable=too-many-arguments + self, forecaster, fc_kwargs, df_attr, pd_kwargs, df_callable=None, + with_nans=True): + """Base test for vector quantities using a moving window.""" + + window = pd.Timedelta('20d') + forecaster = forecaster(**fc_kwargs, rolling=window) returns = pd.DataFrame(self.returns, copy=True) - returns.iloc[:20, 3:10] = np.nan + if with_nans: + returns.iloc[:40, 3:10] = np.nan for tidx in [50, 51, 52, 55, 56, 57]: t = returns.index[tidx] past_returns = returns.loc[returns.index < t] - var = forecaster.values_in_time_recursive( + past_returns_window = past_returns.loc[ + past_returns.index >= t - window] + cvx_val = forecaster.values_in_time_recursive( t=t, past_returns=past_returns) - print(var) - # self.assertTrue(mean[-1] == past_returns.iloc[-1,-1]) - self.assertTrue(np.allclose(var, past_returns.var(ddof=0)[:-1])) - def test_stddev_update(self): - """Test the standard deviation forecaster.""" - forecaster = HistoricalStandardDeviation(kelly=False) + self.assertFalse(np.any(np.isnan(cvx_val))) + test_val = (df_callable(past_returns_window.iloc[:, :-1]) + if df_callable is not None else + getattr(past_returns_window.iloc[:, :-1], + df_attr)(**pd_kwargs)) + + self.assertTrue(np.allclose(cvx_val, test_val)) + + def test_mean_update_moving_window(self): + """Test the mean forecaster with moving window.""" + self._base_test_moving_window_vector_update( + HistoricalMeanReturn, {}, 'mean', {}) + + def test_variance_update_moving_window(self): + """Test the variance forecaster with moving window.""" + self._base_test_moving_window_vector_update( + HistoricalVariance, {'kelly': False}, 'var', {'ddof': 0}) + + def test_stddev_update_moving_window(self): + """Test the standard deviation forecaster with moving window.""" + self._base_test_moving_window_vector_update( + HistoricalStandardDeviation, {'kelly': False}, 'std', {'ddof': 0}) + + def test_meanerror_update_moving_window(self): + """Test standard deviation on the mean with moving window.""" + + def _me(past_returns_noncash): + return past_returns_noncash.std(ddof=0)\ + / np.sqrt(past_returns_noncash.count()) + + self._base_test_moving_window_vector_update( + HistoricalMeanError, {}, None, None, df_callable=_me) + + def test_cov_update_moving_window(self): + """Test the covariance forecaster with moving window.""" + + self._base_test_moving_window_vector_update( + HistoricalCovariance, {'kelly': True}, None, None, + df_callable=self._kelly_covariance) + + def test_cov_update_moving_window_nokelly(self): + """Test the covariance forecaster with moving window without Kelly.""" + + self._base_test_moving_window_vector_update( + HistoricalCovariance, {'kelly': False}, 'cov', {'ddof': 0}, + with_nans=False) + + def _base_test_exponential_moving_window_vector_update( + self, forecaster, fc_kwargs, df_callable=None, with_nans=True): + """Base test for vector quantities using an exponential moving window, + and an exponential moving window in combination with a moving window.""" + half_life = pd.Timedelta('20d') + inst_forecaster = forecaster(**fc_kwargs, half_life=half_life) returns = pd.DataFrame(self.returns, copy=True) - returns.iloc[:20, 3:10] = np.nan + if with_nans: + returns.iloc[:40, 3:10] = np.nan - for tidx in [50, 51, 52, 55, 56, 57]: + for tidx in [50, 51, 52, 55, 56, 57, 58, 59, 60]: t = returns.index[tidx] past_returns = returns.loc[returns.index < t] - std = forecaster.values_in_time_recursive( + + cvx_val = inst_forecaster.values_in_time_recursive( t=t, past_returns=past_returns) - print(std) - # self.assertTrue(mean[-1] == past_returns.iloc[-1,-1]) - self.assertTrue(np.allclose(std, past_returns.std(ddof=0)[:-1])) - def test_meanerror_update(self): - """Test the forecaster of the standard deviation on the mean.""" - forecaster = HistoricalMeanError() + self.assertFalse(np.any(np.isnan(cvx_val))) + + # print(cvx_val - df_callable(past_returns.iloc[:,:-1], half_life)) + self.assertTrue(np.allclose( + cvx_val, df_callable(past_returns.iloc[:, :-1], half_life))) + + # Test with both MA and EMA + half_life = pd.Timedelta('10d') + window = pd.Timedelta('20d') + inst_forecaster = forecaster( + **fc_kwargs, rolling = window, half_life=half_life) returns = pd.DataFrame(self.returns, copy=True) - returns.iloc[:20, 3:10] = np.nan + if with_nans: + returns.iloc[:40, 3:10] = np.nan - for tidx in [50, 51, 52, 55, 56, 57]: + for tidx in [50, 51, 52, 55, 56, 57, 58, 59, 60]: t = returns.index[tidx] past_returns = returns.loc[returns.index < t] - val = forecaster.values_in_time_recursive( + past_returns_window = past_returns.loc[ + past_returns.index >= t - window] + cvx_val = inst_forecaster.values_in_time_recursive( t=t, past_returns=past_returns) - print(val) - # self.assertTrue(mean[-1] == past_returns.iloc[-1,-1]) - self.assertTrue(np.allclose(val, past_returns.std(ddof=0)[ - :-1] / np.sqrt(past_returns.count()[:-1]))) + + self.assertFalse(np.any(np.isnan(cvx_val))) + self.assertTrue(np.allclose( + cvx_val, df_callable( + past_returns_window.iloc[:, :-1], half_life))) + + @staticmethod + def _mean_emw(past_returns_noncash, half_life): + return past_returns_noncash.ewm( + halflife=half_life, times=past_returns_noncash.index + ).mean().iloc[-1] + + def test_mean_update_exponential_moving_window(self): + """Test the mean forecaster with exponential moving window.""" + + self._base_test_exponential_moving_window_vector_update( + HistoricalMeanReturn, {}, self._mean_emw) + + @staticmethod + def _var_emw(past_returns_noncash, half_life): + """We need to do this b/c pandas.DataFrame.emw.var doesn't support + the ddof=0 option.""" + return (past_returns_noncash**2).ewm( + halflife=half_life, times=past_returns_noncash.index + ).mean().iloc[-1] - TestForecast._mean_emw( + past_returns_noncash, half_life)**2 + + def test_variance_update_exponential_moving_window(self): + """Test the var forecaster with exponential moving window.""" + + self._base_test_exponential_moving_window_vector_update( + HistoricalVariance, {'kelly': False}, self._var_emw) + + def test_stddev_update_exponential_moving_window(self): + """Test the std forecaster with exponential moving window.""" + def _std_emw(*args): + return np.sqrt(self._var_emw(*args)) + self._base_test_exponential_moving_window_vector_update( + HistoricalStandardDeviation, {'kelly': False}, _std_emw) + + def test_cov_update_exponential_moving_window(self): + """Test the covariance forecaster with exponential moving window.""" + + self._base_test_exponential_moving_window_vector_update( + HistoricalCovariance, {'kelly': True}, + df_callable=self._kelly_covariance) + + def test_cov_update_exponential_moving_window_nokelly(self): + """Test the covariance forecaster with exponential moving window.""" + + self._base_test_exponential_moving_window_vector_update( + HistoricalCovariance, {'kelly': False}, + df_callable=self._nokelly_covariance_emw_nonans, with_nans=False) def test_counts_matrix(self): """Test internal method(s) of HistoricalFactorizedCovariance.""" @@ -104,7 +354,8 @@ def test_counts_matrix(self): returns.iloc[10:15, 10:20] = np.nan # pylint: disable=protected-access - count_matrix = forecaster._get_count_matrix(returns) + count_matrix = forecaster._compute_denominator( + returns.iloc[:, :-1], None) for indexes in [(1, 2), (4, 5), (1, 5), (7, 18), (7, 24), (1, 15), (13, 22)]: @@ -127,21 +378,21 @@ def test_sum_matrix(self): t=pd.Timestamp('2022-01-01'), past_returns=returns) # pylint: disable=protected-access - sum_matrix = forecaster._last_sum_matrix + sum_matrix = forecaster._numerator for indexes in [(1, 2), (4, 5), (1, 5), (7, 18), (7, 24), (1, 15), (13, 22)]: print() - print(sum_matrix[indexes[0], indexes[1]]) + print(sum_matrix.iloc[indexes[0], indexes[1]]) print((returns.iloc[:, indexes[0]] * returns.iloc[:, indexes[1]]).sum()) self.assertTrue(np.isclose( - sum_matrix[indexes[0], indexes[1]], + sum_matrix.iloc[indexes[0], indexes[1]], (returns.iloc[:, indexes[0]] * returns.iloc[:, indexes[1]]).sum() )) - def test_covariance_update(self): + def test_covariance_update_withnans(self): """Test covariance forecast estimator.""" forecaster = HistoricalFactorizedCovariance() @@ -173,7 +424,7 @@ def _compute_covariance(rets): self.assertTrue( np.allclose(Sigma, _compute_covariance(past_returns))) - def test_covariance_update_nokelly(self): + def test_covariance_update_nokelly_withnans(self): """Test covariance forecast estimator. NOTE: due to a bug in pandas we can't test against diff --git a/cvxportfolio/tests/test_hyperparameters.py b/cvxportfolio/tests/test_hyperparameters.py index 3772adb5f..ed79889ea 100644 --- a/cvxportfolio/tests/test_hyperparameters.py +++ b/cvxportfolio/tests/test_hyperparameters.py @@ -44,11 +44,18 @@ def test_repr(self): - cvx.Gamma() * cvx.StocksTransactionCost() self.assertTrue(str(obj) == - 'ReturnsForecast(r_hat=HistoricalMeanReturn(), decay=1.0)' + 'ReturnsForecast(r_hat=HistoricalMeanReturn(half_life=inf,' + + ' rolling=inf), decay=1.0)' + '- Gamma(current_value=1.0) * FullCovariance(' - + 'Sigma=HistoricalFactorizedCovariance(kelly=True))' - + '- Gamma(current_value=1.0) * StocksTransactionCost(' - + 'a=0.0, pershare_cost=0.005, b=1.0, exponent=1.5)') + + 'Sigma=HistoricalFactorizedCovariance(half_life=inf,' + + ' rolling=inf, kelly=True))' + + '- Gamma(current_value=1.0) * StocksTransactionCost(a=0.0, ' + + 'b=1.0, market_volumes=VolumeHatOrRealized(' + + 'volume_hat=HistoricalMeanVolume(half_life=inf, ' + + "rolling=Timedelta('365 days 05:45:36'))), " + + "sigma=HistoricalStandardDeviation(half_life=inf, " + + "rolling=Timedelta('365 days 05:45:36'), kelly=True), " + + 'exponent=1.5, pershare_cost=0.005)') print(cvx.Gamma() * cvx.Gamma()) print(cvx.Gamma() - cvx.Gamma()) diff --git a/cvxportfolio/tests/test_policies.py b/cvxportfolio/tests/test_policies.py index b368cc01e..5051104da 100644 --- a/cvxportfolio/tests/test_policies.py +++ b/cvxportfolio/tests/test_policies.py @@ -308,7 +308,7 @@ def test_single_period_optimization(self): risk_forecast = cvx.FullCovariance( HistoricalFactorizedCovariance(kelly=False)) tcost = cvx.TransactionCost( - a=1E-3/2, pershare_cost=0., b=None, exponent=2) + a=1E-3/2, b=None, exponent=2) policy = cvx.SinglePeriodOptimization( return_forecast @@ -371,7 +371,7 @@ def test_single_period_optimization_solve_twice(self): policy = cvx.SinglePeriodOptimization( return_forecast - 2 * risk_forecast - - cvx.TransactionCost(a=5 * 1E-4, pershare_cost=0., b=0.), + - cvx.TransactionCost(a=5 * 1E-4, b=0.), constraints=[cvx.LongOnly(), cvx.LeverageLimit(1)], # verbose=True, solver=self.default_socp_solver) @@ -417,7 +417,7 @@ def test_single_period_optimization_infeasible(self): policy = cvx.SinglePeriodOptimization( return_forecast - 2 * risk_forecast - - cvx.TransactionCost(a=5 * 1E-4, pershare_cost=0., b=0.), + - cvx.TransactionCost(a=5 * 1E-4, b=0.), constraints=[cvx.LongOnly(), cvx.LeverageLimit(1), cvx.MaxWeights(-1)], # verbose=True, @@ -520,7 +520,7 @@ def test_multi_period_optimization2(self): return_forecast - 10 * risk_forecast # - TcostModel(half_spread=5 * 1E-4) # , power=2) - - cvx.TransactionCost(a=25 * 1E-4, pershare_cost=0., b=0.), + - cvx.TransactionCost(a=25 * 1E-4, b=0.), constraints=[cvx.LongOnly(), cvx.LeverageLimit(1)], # verbose=True, planning_horizon=planning_horizon, @@ -576,7 +576,7 @@ def test_multi_period_optimization3(self): return_forecast - 10 * risk_forecast # , power=2) - - cvx.TransactionCost(a=5 * 1E-4, pershare_cost=0., b=0.), + - cvx.TransactionCost(a=5 * 1E-4, b=0.), constraints=[cvx.LongOnly(), cvx.LeverageLimit(1)], # verbose=True, terminal_constraint=benchmark, diff --git a/cvxportfolio/tests/test_simulator.py b/cvxportfolio/tests/test_simulator.py index c1ae7f3d8..6257244bb 100644 --- a/cvxportfolio/tests/test_simulator.py +++ b/cvxportfolio/tests/test_simulator.py @@ -214,9 +214,20 @@ def test_holding_cost(self): hcost = cvx.HoldingCost(short_fees=5, dividends=dividends) - sim_hcost = hcost.simulate( + hcost.initialize_estimator_recursive( + universe=h_plus.index, trading_calendar=[t]) + + sim_hcost = hcost.simulate_recursive( t=t, h_plus=h_plus, - t_next=t + pd.Timedelta('1d')) + u=pd.Series(1., h_plus.index), + t_next=t + pd.Timedelta('1d'), + past_returns=None, + current_returns=None, + past_volumes=None, + current_volumes=None, + current_prices=None, + current_portfolio_value=sum(h_plus), + current_weights=None) hcost = -(np.exp(np.log(1.05)/365.24)-1) * sum( -np.minimum(h_plus, 0.)[:-1]) @@ -237,34 +248,63 @@ def test_transaction_cost_syntax(self): tcost = cvx.StocksTransactionCost() # syntax checks with self.assertRaises(SyntaxError): - tcost.simulate(t, u=u, + tcost.initialize_estimator_recursive( + universe=current_returns.index, trading_calendar=[t]) + tcost.simulate_recursive(t=t, u=u, past_returns=past_returns, + t_next=None, h_plus=pd.Series(1., u.index), current_returns=current_returns, past_volumes=past_volumes, current_volumes=current_volumes, - current_prices=None) + current_prices=None, + current_portfolio_value=1000, + current_weights=None,) - tcost = cvx.TransactionCost(pershare_cost=None,) - tcost.simulate(t, u=u, current_prices=None, + tcost = cvx.TransactionCost() + tcost.initialize_estimator_recursive( + universe=current_returns.index, trading_calendar=[t]) + tcost.simulate_recursive(t=t, u=u, current_prices=None, + t_next=None, h_plus=pd.Series(1., u.index), past_returns=past_returns, current_returns=current_returns, past_volumes=past_volumes, - current_volumes=current_volumes) + current_volumes=current_volumes, + current_portfolio_value=1000, + current_weights=None,) - tcost = cvx.TransactionCost() + tcost = cvx.TransactionCost(b=0.) with self.assertRaises(SyntaxError): - tcost.simulate(t, u=u, current_prices=current_prices, + tcost.simulate_recursive(t=t, u=u, current_prices=current_prices, past_returns=past_returns, current_returns=current_returns, past_volumes=None, - current_volumes=None) + current_volumes=None, + current_portfolio_value=None, + current_weights=None,) + + tcost = cvx.StocksTransactionCost(window_volume_est=252) + with self.assertRaises(SyntaxError): + tcost.values_in_time_recursive(t=t, + current_prices=current_prices, + past_returns=past_returns, + past_volumes=None, + current_portfolio_value=1000, + current_weights=None,) + + with self.assertRaises(SyntaxError): + _ = cvx.TransactionCost(b=1, exponent=.9) tcost = cvx.TransactionCost(b=None) - tcost.simulate(t, u=u, current_prices=current_prices, + tcost.initialize_estimator_recursive( + universe=current_returns.index, trading_calendar=[t]) + tcost.simulate_recursive(t=t, u=u, current_prices=current_prices, + t_next=None, h_plus=pd.Series(1., u.index), past_returns=past_returns, current_returns=current_returns, past_volumes=None, - current_volumes=None) + current_volumes=None, + current_portfolio_value=1000, + current_weights=None,) def test_transaction_cost(self): """Test (Stock)TransactionCost simulator's interface.""" @@ -288,10 +328,16 @@ def test_transaction_cost(self): # pylint: disable=protected-access u = MarketSimulator._round_trade_vector(u, current_prices) - tcost = cvx.StocksTransactionCost(a=spreads/2) + tcost = cvx.StocksTransactionCost( + a=spreads/2, window_sigma_est=252) + tcost.initialize_estimator_recursive( + universe=current_returns.index, trading_calendar=[t]) - sim_cost = tcost.simulate( - t, u=u, current_prices=current_prices, + sim_cost = tcost.simulate_recursive( + t=t, u=u, current_prices=current_prices, + t_next=None, h_plus=pd.Series(1000., u.index), + current_portfolio_value=1E4, + current_weights=None, past_returns=past_returns, current_returns=current_returns, past_volumes=past_volumes, @@ -364,13 +410,22 @@ def test_simulate_policy(self): start_time, end_time, include_end=False) ) + for cost in simulator.costs: + cost.initialize_estimator_recursive( + universe=simulator.market_data.full_universe, + trading_calendar=simulator.market_data.trading_calendar( + start_time, end_time, include_end=False) + ) + for (i, t) in enumerate(simulator.market_data.returns.index[ (simulator.market_data.returns.index >= start_time) & ( simulator.market_data.returns.index <= end_time)]): - t_next = simulator.market_data.returns.index[i+1] + t_next = simulator.market_data.returns.index[ + simulator.market_data.returns.index.get_loc(t) + 1] oldcash = h.iloc[-1] past_returns, current_returns, past_volumes, current_volumes, \ current_prices = simulator.market_data.serve(t) + # import code; code.interact(local=locals()) h, _, _, costs, _ = simulator.simulate( t=t, h=h, policy=policy, t_next=t_next, past_returns=past_returns, current_returns=current_returns, @@ -379,8 +434,8 @@ def test_simulate_policy(self): tcost, hcost = costs['StocksTransactionCost' ], costs['StocksHoldingCost'] assert tcost == 0. - # if np.all(h0[:2] > 0): - # assert hcost == 0. + if np.all(h0[:2] > 0): + assert hcost == 0. assert np.isclose( (oldcash - hcost) * (1+simulator.market_data.returns.loc[ t, 'USDOLLAR']), h.iloc[-1]) @@ -411,7 +466,8 @@ def test_simulate_policy(self): for i, t in enumerate(simulator.market_data.returns.index[ (simulator.market_data.returns.index >= start_time) & (simulator.market_data.returns.index <= end_time)]): - t_next = simulator.market_data.returns.index[i+1] + t_next = simulator.market_data.returns.index[ + simulator.market_data.returns.index.get_loc(t) + 1] oldcash = h.iloc[-1] past_returns, current_returns, past_volumes, current_volumes, \ current_prices = simulator.market_data.serve(t) @@ -979,4 +1035,5 @@ def test_cache_missing_signature(self): if __name__ == '__main__': - unittest.main(warnings='error') # pragma: no cover + # unittest.main(warnings='error') # pragma: no cover + unittest.main() # pragma: no cover diff --git a/docs/constraints.rst b/docs/constraints.rst index 1a279d957..dbca11ddd 100644 --- a/docs/constraints.rst +++ b/docs/constraints.rst @@ -8,6 +8,8 @@ Constraints .. autoclass:: DollarNeutral +.. autoclass:: MarketNeutral + .. autoclass:: FactorMaxLimit .. autoclass:: FactorMinLimit @@ -37,15 +39,15 @@ Constraints .. autoclass:: TurnoverLimit +.. _soft-constraints: Soft constraints ---------------- (Almost) all the constraints described above can also be be made "soft". -The concept is developed at pages 37-38 of -`the book `_, -and implemented by the objective term :class:`cvxportfolio.SoftConstraint`, -documented in :ref:`the objective terms page `. +The concept is developed in +:paper:`section 4.6 of the paper `, +and implemented by the objective term :class:`cvxportfolio.SoftConstraint`. In the case of a linear equality constraint, which can be expressed as :math:`h(x) = 0`, @@ -60,7 +62,7 @@ the corresponding soft constraint is the cost :math:`{(\cdot )}_+` denotes the positive part of each element of :math:`h(x)`. -In the book we describe having different penalizers for +In the paper we describe having different penalizers for different elements of each constraint vector (so that the penalizer :math:`\gamma` is a vector). In our implementation this is achieved by constructing @@ -71,20 +73,21 @@ The syntax of our implementation is very simple. We pass a constraint instance t :class:`cvxportfolio.SoftConstraint`, multiply the term by the penalizer, and subtract it from the objective function. For a high value of the penalizer the constraint will be -enforced almost exactly. For a small value, it will be almost ignored. +enforced almost exactly. For a small value, it will be almost ignored. +For example: .. code-block:: python - import cvxportfolio as cvx - - objective = cvx.ReturnsForecast() - gamma_risk * cvx.FullCovariance() + policy = cvx.SinglePeriodOptimization( + objective = + cvx.ReturnsForecast() + - 0.5 * cvx.FullCovariance() + - 10 * cvx.SoftConstraint(cvx.LeverageLimit(3))) - # for large enough value of the penalizer the constraint becomes "hard" - objective -= gamma_longonly * cvx.SoftConstraint(cvx.LongOnly()) - - cvx.MarketSimulator(universe).backtest( - cvx.SinglePeriodOptimization(objective).plot() +is a policy that almost enforces a leverage limit of 3, allowing for some +violation. This can be controlled by tuning the multiplier in front +of :class:`cvxportfolio.SoftConstraint`. Some constraint objects, which are in reality compositions of constraints, can not be used as soft constraints. See their documentation for more details. diff --git a/docs/costs.rst b/docs/costs.rst index 8e3b52f3d..d8352ccc6 100644 --- a/docs/costs.rst +++ b/docs/costs.rst @@ -3,8 +3,14 @@ Cost models ----------- +.. py:module:: cvxportfolio + :noindex: + .. automodule:: cvxportfolio.costs +Costs Documentation +~~~~~~~~~~~~~~~~~~~ + .. py:module:: cvxportfolio :noindex: @@ -20,4 +26,14 @@ Cost models .. autoclass:: TcostModel -.. autoclass:: HcostModel \ No newline at end of file +.. autoclass:: HcostModel + +Base classes (for defining your costs) +====================================== + + +.. autoclass:: cvxportfolio.costs.Cost + +.. autoclass:: cvxportfolio.costs.SimulatorCost + + .. automethod:: simulate diff --git a/docs/estimators.rst b/docs/estimators.rst index 172c10c01..10a637e3f 100644 --- a/docs/estimators.rst +++ b/docs/estimators.rst @@ -1,8 +1,6 @@ Estimators =========== -*Work in progress.* - .. automodule:: cvxportfolio.estimator .. py:module:: cvxportfolio.estimator @@ -22,6 +20,12 @@ Estimators .. automethod:: finalize_estimator_recursive +.. autoclass:: SimulatorEstimator + + .. automethod:: simulate + + .. automethod:: simulate_recursive + .. autoclass:: CvxpyExpressionEstimator .. automethod:: compile_to_cvxpy diff --git a/docs/forecasts.rst b/docs/forecasts.rst index 1ec3dad4a..a325e95bb 100644 --- a/docs/forecasts.rst +++ b/docs/forecasts.rst @@ -1,21 +1,32 @@ Forecasters =========== -*Work in progress.* - .. automodule:: cvxportfolio.forecast .. py:module:: cvxportfolio.forecast :noindex: +Forecasters Documentation +~~~~~~~~~~~~~~~~~~~~~~~~~ + .. autoclass:: HistoricalMeanReturn +.. autoclass:: HistoricalMeanVolume + .. autoclass:: HistoricalVariance .. autoclass:: HistoricalStandardDeviation .. autoclass:: HistoricalMeanError +.. autoclass:: HistoricalFactorizedCovariance + .. autoclass:: HistoricalLowRankCovarianceSVD -.. autoclass:: HistoricalFactorizedCovariance + +Base forecaster classes +~~~~~~~~~~~~~~~~~~~~~~~ + +*Work in progress.* + +.. autoclass:: BaseForecast diff --git a/docs/hello_world.rst b/docs/hello_world.rst index 970c1ee28..05ced20d9 100644 --- a/docs/hello_world.rst +++ b/docs/hello_world.rst @@ -9,6 +9,7 @@ and show the results. This example script is .. literalinclude:: ../examples/hello_world.py :language: python + :lines: 14- This is the output printed to screen when executing this script. You can see many statistics of the back-tests. The timestamps of the back-test are the diff --git a/docs/index.rst b/docs/index.rst index 2de020c7b..f0ce88363 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -69,9 +69,11 @@ Table of Contents hello_world quickstart manual - policies simulator - objective_terms + policies + returns + risks + costs constraints result data diff --git a/docs/internals.rst b/docs/internals.rst index ca2f71b78..b3a1162d3 100644 --- a/docs/internals.rst +++ b/docs/internals.rst @@ -2,11 +2,9 @@ Internal Objects and Interfaces =============================== Here we document the internal objects and interfaces used by Cvxportfolio. -Users rarely need to interact with these directly, but may be interested +Users may not need to interact with these directly, but could be interested in knowing how these work. -*Work in progress.* - .. toctree:: :maxdepth: 2 diff --git a/docs/objective_terms.rst b/docs/objective_terms.rst index b0d131203..95a9e3f07 100644 --- a/docs/objective_terms.rst +++ b/docs/objective_terms.rst @@ -10,11 +10,3 @@ Objective terms costs .. py:module:: cvxportfolio - - -Base classes (for defining your own objective terms) ----------------------------------------------------- - -.. autoclass:: cvxportfolio.costs.Cost - -.. autoclass:: cvxportfolio.costs.SimulatorCost \ No newline at end of file diff --git a/docs/risks.rst b/docs/risks.rst index ea9e07146..6aa0a6c12 100644 --- a/docs/risks.rst +++ b/docs/risks.rst @@ -22,7 +22,7 @@ Risk models Forecast error models ---------------------- +===================== .. autoclass:: ReturnsForecastError diff --git a/examples/market_neutral_nocosts.py b/examples/market_neutral_nocosts.py index 8eace2e29..125947eea 100644 --- a/examples/market_neutral_nocosts.py +++ b/examples/market_neutral_nocosts.py @@ -26,7 +26,8 @@ import cvxportfolio as cvx from cvxportfolio.result import LOG_FORMAT, RECORD_LOGS -from .universes import SP500, DOW30, NDX100 + +from .universes import DOW30, NDX100, SP500 logging.basicConfig(level=RECORD_LOGS, format=LOG_FORMAT) diff --git a/examples/strategies/ndx100_daily.py b/examples/strategies/ndx100_daily.py index cb1949a4b..5f0ef736f 100644 --- a/examples/strategies/ndx100_daily.py +++ b/examples/strategies/ndx100_daily.py @@ -28,7 +28,6 @@ Changed the start time for hyperparameter optimization from 2020-01-01 to 2012-01-01 and the CVXPY solver (and, 2024-01-12, the initial values of hyper-parameter optimization). - """ import cvxportfolio as cvx @@ -109,8 +108,7 @@ def policy(gamma_risk, gamma_trade): plt.show() - from .strategy_executor import main main(policy=policy, hyperparameter_opt_start=HYPERPAR_OPTIMIZE_START, objective=OBJECTIVE, universe=NDX100, - initial_values={'gamma_risk':20., 'gamma_trade':1.}) + initial_values={'gamma_risk': 20., 'gamma_trade': 1.}) diff --git a/examples/strategies/strategy_executor.py b/examples/strategies/strategy_executor.py index 54f3c47ae..b5e5ee398 100644 --- a/examples/strategies/strategy_executor.py +++ b/examples/strategies/strategy_executor.py @@ -291,7 +291,7 @@ def backtest_from_day(self, day): el for el in day_init_holdings.index if not el == self.cash_key] sim = cvx.StockMarketSimulator( market_data=cvx.DownloadedMarketData( - day_universe, min_history=pd.Timedelta('0d'), + day_universe, min_history=pd.Timedelta('0d'), cash_key=self.cash_key)) # This should be done by MarketSimulator, but for safety. diff --git a/git-split.sh b/git-split.sh new file mode 100755 index 000000000..09871d561 --- /dev/null +++ b/git-split.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash +# From https://stackoverflow.com/questions/3887736/keep-git-history-when-splitting-a-file + +if [[ $# -ne 2 ]] ; then + echo "Usage: git-split.sh original copy" + exit 0 +fi + +git mv $1 $2 +git commit -n -m "Split history $1 to $2" +REV=`git rev-parse HEAD` +git reset --hard HEAD^ +git mv $1 temp +git commit -n -m "Split history $1 to $2" +git merge $REV +git commit -a -n -m "Split history $1 to $2" +git mv temp $1 +git commit -n -m "Split history $1 to $2" \ No newline at end of file