Deep Deterministic Policy Gradient and Geometric Brownian Motion for Simulated Portfolio Optimization
Given N independent price series, each following geometric Brownian motion with identical parameters, train a DDPG agent that maximizes the logarithm of the portfolio's value.
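To make the objective concrete: with portfolio value $V_t$, maximizing the expected terminal log-value decomposes into a sum of per-step log returns, which suggests the per-step reward $r_t$ below (a standard formulation; the exact reward shaping used in the code may differ):

$$\max_\pi\; \mathbb{E}\!\left[\log V_T\right] = \max_\pi\; \mathbb{E}\!\left[\log V_0 + \sum_{t=0}^{T-1} r_t\right], \qquad r_t = \log\frac{V_{t+1}}{V_t}.$$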
This project is the culmination of four years of work across multiple quantitative trading algorithm projects.
Suppose an asset's price $S_t$ follows the geometric Brownian motion

$$dS_t = \mu S_t\, dt + \sigma S_t\, dW_t,$$

where $\mu$ is the drift, $\sigma$ is the volatility, and $W_t$ is a standard Wiener process. By Itô's lemma, this has the closed-form solution

$$S_t = S_0 \exp\!\left(\left(\mu - \tfrac{\sigma^2}{2}\right)t + \sigma W_t\right).$$
Given a time step $\Delta t$ and i.i.d. standard normal draws $Z_t \sim \mathcal{N}(0, 1)$, each price path can be simulated exactly with the discrete update

$$S_{t+\Delta t} = S_t \exp\!\left(\left(\mu - \tfrac{\sigma^2}{2}\right)\Delta t + \sigma\sqrt{\Delta t}\, Z_t\right).$$
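A minimal sketch of this simulation step (parameter names such as `mu`, `sigma`, and `dt` are illustrative, not the identifiers used in ./lib/param.hpp):

```cpp
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

// Simulate N independent GBM paths with identical drift and volatility.
std::vector<std::vector<double>> simulate_gbm(int n_assets, int n_steps,
                                              double s0, double mu,
                                              double sigma, double dt,
                                              unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> z(0.0, 1.0);
    std::vector<std::vector<double>> paths(
        n_assets, std::vector<double>(n_steps + 1, s0));
    const double drift = (mu - 0.5 * sigma * sigma) * dt;  // (mu - sigma^2/2) dt
    const double vol = sigma * std::sqrt(dt);              // sigma * sqrt(dt)
    for (int i = 0; i < n_assets; ++i)
        for (int t = 0; t < n_steps; ++t)
            paths[i][t + 1] = paths[i][t] * std::exp(drift + vol * z(rng));
    return paths;
}

int main() {
    auto paths = simulate_gbm(/*n_assets=*/5, /*n_steps=*/252, /*s0=*/100.0,
                              /*mu=*/0.05, /*sigma=*/0.2, /*dt=*/1.0 / 252,
                              /*seed=*/42);
    std::cout << "Asset 0 final price: " << paths[0].back() << "\n";
}
```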
Given some state $s_t$, the actor network $\mu(s_t \mid \theta^\mu)$ deterministically maps it to an action $a_t$ (the target portfolio allocation), while the critic network $Q(s_t, a_t \mid \theta^Q)$ estimates the expected return of taking that action.
Since the critic network directly maps the state-action space to expected reward, the action gradient $\nabla_a Q(s, a \mid \theta^Q)$ can be backpropagated through the actor, yielding the deterministic policy gradient

$$\nabla_{\theta^\mu} J \approx \mathbb{E}\!\left[\nabla_a Q(s, a \mid \theta^Q)\big|_{s=s_t,\, a=\mu(s_t)}\; \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\big|_{s=s_t}\right].$$
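A deliberately tiny illustration of this chain rule, using a linear actor $a = Ms$ and a critic that is linear in the action so the gradients have simple closed forms (all dimensions and values are illustrative; the project's actual networks are nonlinear):

```cpp
#include <array>
#include <iostream>

// Toy deterministic policy gradient: actor a = M s, critic
// Q(s, a) = cs . s + ca . a, so grad_a Q = ca and grad_M J = ca * s^T.
constexpr int S = 3;  // state dimension
constexpr int A = 2;  // action dimension

int main() {
    std::array<double, S> s  = {1.0, -0.5, 2.0};  // sample state
    std::array<double, A> ca = {0.3, -0.1};       // critic's action weights
    double M[A][S] = {{0.0}};                     // actor parameters, init 0
    const double lr = 0.01;

    // One gradient-ascent step on the actor: M += lr * (grad_a Q) s^T.
    for (int i = 0; i < A; ++i)
        for (int j = 0; j < S; ++j)
            M[i][j] += lr * ca[i] * s[j];

    // The resulting action moves in the direction that increases Q.
    for (int i = 0; i < A; ++i) {
        double a = 0.0;
        for (int j = 0; j < S; ++j) a += M[i][j] * s[j];
        std::cout << "a[" << i << "] = " << a << "\n";
    }
}
```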
Let the state space be the current prices of the N assets together with the portfolio's current holdings, and the action space the target allocation weights over those assets.
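A minimal sketch of one way such a state could be encoded as the flat vector fed to the networks (the struct and field names are hypothetical, not the repo's actual types):

```cpp
#include <vector>

// Hypothetical flat state encoding: N prices followed by N holdings.
struct State {
    std::vector<double> prices;    // current price of each asset
    std::vector<double> holdings;  // current fraction of wealth in each asset

    // Concatenate into the input vector consumed by the actor and critic.
    std::vector<double> flatten() const {
        std::vector<double> v(prices);
        v.insert(v.end(), holdings.begin(), holdings.end());
        return v;
    }
};
```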
All hyperparameters can be found in ./lib/param.hpp.
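For orientation, a hypothetical sketch in the spirit of ./lib/param.hpp (every name and value below is an illustrative guess, not the file's actual contents):

```cpp
#pragma once

// Illustrative DDPG/simulation hyperparameters (hypothetical values).
namespace param {
constexpr int    N_ASSETS    = 5;         // number of simulated GBM assets
constexpr double MU          = 0.05;      // GBM drift
constexpr double SIGMA       = 0.2;       // GBM volatility
constexpr double DT          = 1.0 / 252; // simulation time step
constexpr double GAMMA       = 0.99;      // discount factor
constexpr double TAU         = 0.005;     // target-network soft-update rate
constexpr double ACTOR_LR    = 1e-4;      // actor learning rate
constexpr double CRITIC_LR   = 1e-3;      // critic learning rate
constexpr int    BUFFER_SIZE = 1000000;   // replay buffer capacity
constexpr int    BATCH_SIZE  = 64;        // minibatch size
}  // namespace param
```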
The results above show that the model tends to concentrate holdings in the assets with the strongest momentum. Furthermore, the model exhibits arbitrage-like behavior, as its final portfolio value closely tracks the average value of all N assets.
Note that these results are experimental: the model's convergence and behavior can vary with the random seed used to simulate the environment. Further work and more complex optimization objectives would be needed for practical application in real market environments.
References:
- Karl Sigman, "Geometric Brownian Motion," Columbia University lecture notes. https://www.columbia.edu/~ks20/FE-Notes/4700-07-Notes-GBM.pdf
- Timothy P. Lillicrap et al., "Continuous Control with Deep Reinforcement Learning," arXiv:1509.02971. https://arxiv.org/abs/1509.02971
- OpenAI Spinning Up, "Deep Deterministic Policy Gradient." https://spinningup.openai.com/en/latest/algorithms/ddpg.html