Welcome to AssetManagerGym, a custom OpenAI Gym environment designed for training reinforcement learning agents on stock trading tasks. This environment simulates stock market dynamics, allowing agents to learn trading strategies by interacting with historical stock data.
- Introduction
- Environment Description
- Installation
- Usage
- Parameters
- Agent Integration
- Example
- Contributing
- License
## Introduction

AssetManagerGym is a simulation environment that provides a platform for developing and testing reinforcement learning (RL) agents on stock trading tasks. It supports features like:
- Position management (long, short, flat)
- Stop loss and take profit mechanisms
- Customizable risk parameters
- Randomized stock selection for each episode
- Support for training and testing data splits
The environment is compatible with standard RL libraries and can be integrated with various deep RL algorithms.
## Environment Description

- Random Stock Selection: Each episode starts with a randomly selected stock from a provided dataset.
- Position Management: The agent can open long or short positions or choose to remain flat.
- Risk Management: Supports stop loss and take profit levels to simulate realistic trading conditions.
- Customizable Horizon: Set the maximum number of steps per episode.
- Data Splitting: Ability to split data into training and testing sets based on a specified year.
### Action Space

The action space is continuous, representing the agent's trading decisions:

- Range: `[-1.0, 1.0]`
  - `-1.0`: Strong sell (open a short position)
  - `0.0`: Hold (maintain the current position)
  - `1.0`: Strong buy (open a long position)
- Shape: `(1,)`
### Observation Space

The observation space consists of a feature vector extracted from the stock data:

- Type: `Box`
- Shape: `(n_features,)`, where `n_features` is determined dynamically from the dataset.
- Data: Includes features like price (`prc`) and other indicators relevant to trading.
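For orientation, the two spaces correspond to standard Gym `Box` spaces along the following lines. This is only a sketch: the environment constructs its spaces internally, and `n_features` below is a placeholder value.

```python
import numpy as np
from gym import spaces

n_features = 20  # placeholder; in practice determined dynamically from the dataset

# Continuous trading signal in [-1.0, 1.0], shape (1,)
action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

# Feature vector containing the price (`prc`) and other indicators
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(n_features,), dtype=np.float32)

print(action_space.sample())    # e.g. [0.37]
print(observation_space.shape)  # (20,)
```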
### Rewards

The reward is calculated based on the agent's actions and market movements:

- Opening a Position: No immediate reward upon opening a position.
- Holding a Position:
  - Unrealized Profit/Loss: Calculated as the percentage change in price since the position was opened, adjusted for the position type (long or short).
- Closing a Position:
  - Take Profit Hit: The agent receives a reward of `+1` and the episode may terminate.
  - Stop Loss Hit: The agent receives a reward of `-1` (penalty) and the episode may terminate.
- Choosing Not to Trade: The agent can choose to do nothing, and the episode continues unless specified otherwise.

### Episode Termination

An episode may terminate when:

- Stop Loss or Take Profit is Hit: The position is closed because a risk parameter was triggered.
- Horizon is Reached: The maximum number of steps per episode is reached.
- No More Data: No further data is available for the selected stock.
- Custom Conditions: Agent-specific logic can dictate episode termination.
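As a rough sketch, the listed conditions (excluding custom agent-side logic) combine into the `done` flag roughly as follows; this is illustrative, not the environment's actual code.

```python
def episode_done(step_count, horizon, hit_stop_loss, hit_take_profit, has_more_data):
    """Illustrative termination rule: risk trigger, horizon reached, or data exhausted."""
    return hit_stop_loss or hit_take_profit or step_count >= horizon or not has_more_data
```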
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/anashoussaini/assetmanagergym.git
   cd assetmanagergym
   ```

2. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Ensure you have the following installed:

   - `gym`
   - `numpy`
   - `matplotlib`
   - `torch` (for integrating with PyTorch agents)
   - `json` (standard library)
   - `os` (standard library)

3. Prepare the dataset:

   - Place your stock data in a JSON file named `feature_dict.json` in the working directory.
   - The data should be structured as:

     ```json
     {
       "AAPL": {
         "2008": {
           "1": {"prc": 150.0, "feature1": value, ...},
           "2": {"prc": 155.0, "feature1": value, ...},
           ...
         },
         "2009": { ... }
       },
       "GOOG": { ... },
       ...
     }
     ```
## Usage

```python
from assetmanagergym import assetmanagergym

json_file = 'feature_dict.json'
risk_params = {'stop_loss': 0.05, 'take_profit': 0.1}  # 5% stop loss, 10% take profit

env = assetmanagergym(json_file_path=json_file, risk_params=risk_params, mode='train', train_test_split_year=2010)
env.set_horizon(12)  # Set the maximum number of steps per episode

observation = env.reset()
done = False

while not done:
    action = env.action_space.sample()  # Replace with your agent's action
    observation, reward, done, info = env.step(action)
    env.render()  # Optional: visualize the current state
```
## Parameters

| Parameter | Type | Description |
|---|---|---|
| `json_file_path` | `str` | Path to the JSON file containing the stock data. |
| `risk_params` | `dict` | Dictionary with risk parameters: `stop_loss` and `take_profit`. |
| `mode` | `str` | `'train'` or `'test'`; determines which data split to use. |
| `train_test_split_year` | `int` | The year used to split the data into training and testing sets. |
| `discount_factor` | `float` | Discount factor for calculating discounted rewards. |
| `horizon` | `int` | Maximum number of steps per episode. Set with `env.set_horizon(horizon)`. |
## Agent Integration

assetmanagergym is designed to be compatible with deep reinforcement learning agents. Below is an example of how to integrate a custom agent:

```python
agent = YourCustomAgent(env.observation_space, env.action_space)

for episode in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        action = agent.select_action(state)
        next_state, reward, done, info = env.step(action)
        agent.store_transition(state, action, reward, next_state, done)
        agent.learn()  # Update the agent's networks
        state = next_state
    agent.reset()  # Reset the agent's state if necessary
```
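A minimal agent skeleton that satisfies the interface used in the loop above might look like the following. This is a placeholder sketch (`YourCustomAgent` is not provided by the package): it selects random actions and performs no learning.

```python
class RandomAgent:
    """Placeholder agent exposing the methods used in the integration loop."""

    def __init__(self, observation_space, action_space):
        self.action_space = action_space

    def select_action(self, state):
        # Random action in [-1.0, 1.0]; replace with a policy network's output.
        return self.action_space.sample()

    def store_transition(self, state, action, reward, next_state, done):
        pass  # e.g. append the transition to a replay buffer

    def learn(self):
        pass  # e.g. update policy/value networks from stored transitions

    def reset(self):
        pass  # clear any per-episode state the agent keeps
```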
## Example

Here's a full example combining environment setup, agent interaction, and visualization:

```python
import gym
import numpy as np
from assetmanagergym import assetmanagergym


def main():
    json_file = 'feature_dict.json'
    risk_params = {'stop_loss': 0.05, 'take_profit': 0.1}

    env = assetmanagergym(json_file_path=json_file, risk_params=risk_params, mode='train', train_test_split_year=2010)
    env.set_horizon(12)

    observation = env.reset()
    done = False

    while not done:
        action = env.action_space.sample()  # Replace with agent's action
        observation, reward, done, info = env.step(action)
        env.render()

    env.close()


if __name__ == "__main__":
    main()
```
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a new branch:

   ```bash
   git checkout -b feature/your-feature-name
   ```

3. Commit your changes:

   ```bash
   git commit -am 'Add new feature'
   ```

4. Push to the branch:

   ```bash
   git push origin feature/your-feature-name
   ```

5. Open a pull request.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
Note: This README provides an overview of the assetmanagergym environment, including its setup and usage. For detailed documentation and advanced configurations, please refer to the code comments and additional documentation files in the repository.
In addition to the overview above, the environment can be constructed with every parameter passed explicitly:

```python
env = assetmanagergym(
    json_file_path='feature_dict.json',
    risk_params={'stop_loss': 0.05, 'take_profit': 0.1},
    mode='train',
    train_test_split_year=2010,
    discount_factor=0.1
)
```

The main environment methods are:

- `env.reset()`: Resets the environment to a new initial state.
- `env.step(action)`: Advances the environment by one step based on the action.
- `env.render()`: Renders the current state of the environment.
- `env.close()`: Performs any necessary cleanup.
- `env.set_horizon(horizon)`: Sets the maximum number of steps per episode.
Internally, the continuous action value is interpreted as follows:

| Action Value | Interpretation |
|---|---|
| `> 0.5` | Open a long position |
| `< -0.5` | Open a short position |
| `-0.5` to `0.5` | Hold / Do nothing |
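A small sketch of this thresholding rule (illustrative only, not the environment's internal implementation):

```python
def action_to_position(action: float) -> int:
    """Map a continuous action in [-1, 1] to a position: +1 long, -1 short, 0 flat."""
    if action > 0.5:
        return 1    # open a long position
    if action < -0.5:
        return -1   # open a short position
    return 0        # hold / do nothing
```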
The per-step reward is derived from the position's unrealized profit and loss:

- Unrealized P&L:

  $$\text{price\_change} = \frac{\text{next\_price} - \text{entry\_price}}{\text{entry\_price}}$$

  $$\text{unrealized\_pnl} = \text{price\_change} \times \text{position}$$

- Stop Loss Triggered: If `unrealized_pnl <= -stop_loss`, the reward is `-1`.
- Take Profit Triggered: If `unrealized_pnl >= take_profit`, the reward is `+1`.
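Putting these rules together, a rough sketch of the per-step reward logic described above (illustrative only; the environment's actual implementation lives in the package source):

```python
def step_reward(entry_price, next_price, position, stop_loss=0.05, take_profit=0.10):
    """Illustrative reward rule. position is +1 (long), -1 (short), or 0 (flat)."""
    if position == 0:
        return 0.0, False  # no open position: no reward, episode continues

    price_change = (next_price - entry_price) / entry_price
    unrealized_pnl = price_change * position

    if unrealized_pnl <= -stop_loss:
        return -1.0, True          # stop loss hit: penalty, position closed
    if unrealized_pnl >= take_profit:
        return 1.0, True           # take profit hit: reward, position closed
    return unrealized_pnl, False   # still holding: reward is the unrealized P&L
```

For example, a long position opened at 100 that moves to 112 has an unrealized P&L of 0.12, which exceeds the default take-profit of 0.10, so the trade closes with a reward of `+1`.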
The environment includes methods for visualizing rewards and actions:

```python
env.render_with_arrows(rewards, discounted_rewards, actions, steps)
```

- Arrows:
  - ↑: Buy action
  - ↓: Sell action
  - –: Hold action
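A sketch of how the arguments for `render_with_arrows` could be collected during a rollout. The environment construction mirrors the Usage section; the discounting shown here is a simple returns-to-go calculation with the default `discount_factor` and may differ from the environment's internal computation.

```python
from assetmanagergym import assetmanagergym

env = assetmanagergym(
    json_file_path='feature_dict.json',
    risk_params={'stop_loss': 0.05, 'take_profit': 0.1},
    mode='train',
    train_test_split_year=2010,
)
env.set_horizon(12)

rewards, actions = [], []
observation = env.reset()
done = False

while not done:
    action = env.action_space.sample()  # replace with your agent's action
    observation, reward, done, info = env.step(action)
    rewards.append(reward)
    actions.append(action)

# Simple discounted returns-to-go (gamma = 0.1, the default discount_factor)
gamma = 0.1
discounted_rewards, running = [], 0.0
for r in reversed(rewards):
    running = r + gamma * running
    discounted_rewards.append(running)
discounted_rewards.reverse()

steps = list(range(len(rewards)))
env.render_with_arrows(rewards, discounted_rewards, actions, steps)
```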
The environment expects the stock data in a specific JSON format:

```json
{
  "AAPL": {
    "2008": {
      "1": {
        "prc": 150.0,
        "feature1": value,
        "feature2": value,
        ...
      },
      "2": { ... },
      ...
    },
    "2009": { ... },
    ...
  },
  "GOOG": { ... },
  ...
}
```

- Stock Ticker: Top-level keys (e.g., `"AAPL"`, `"GOOG"`).
- Year: Second-level keys (e.g., `"2008"`, `"2009"`).
- Month: Third-level keys (e.g., `"1"` for January).
- Features: Dictionary containing `"prc"` and other features.
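A quick way to sanity-check a dataset against this layout (a sketch; it simply walks the nesting described above):

```python
import json

with open("feature_dict.json") as f:
    feature_dict = json.load(f)

# Walk ticker -> year -> month -> features and print the price for each entry.
for ticker, years in feature_dict.items():
    for year, months in years.items():
        for month, features in months.items():
            print(ticker, year, month, "prc =", features["prc"])
```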
Default risk parameters:

| Parameter | Default Value | Description |
|---|---|---|
| `stop_loss` | `0.05` | Stop loss threshold (5%) |
| `take_profit` | `0.10` | Take profit threshold (10%) |

Default environment parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `discount_factor` | `float` | `0.1` | Discount factor for rewards |
| `horizon` | `int` | `12` | Max steps per episode |
| `mode` | `str` | `'train'` | Mode: `'train'` or `'test'` |
| `train_test_split_year` | `int` | `2010` | Year to split training and testing data |
Good luck, and happy trading and reinforcement learning!