Machine Learning

Stable Baselines

Introduction

This page introduces how to use the stable_baselines library in Python to build, train, save (in the Object Store), and load reinforcement learning (RL) models, through an example of a single-asset deep Q-network (DQN) trading bot.

Import Libraries

Import the stable_baselines, gym, and other supporting libraries.

import gym
import numpy as np
from matplotlib import pyplot as plt
from stable_baselines import DQN
from stable_baselines.deepq.policies import MlpPolicy

Get Historical Data

Get some historical market data to train and test the model. For example, to get data for the SPY ETF during 2020 and 2021, run:

qb = QuantBook()
symbol = qb.AddEquity("SPY", Resolution.Daily).Symbol
history = qb.History(symbol, datetime(2020, 1, 1), datetime(2022, 1, 1)).loc[symbol]
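
The History call returns a pandas DataFrame of daily OHLCV bars indexed by time. The snippet below is an optional sketch for inspecting it; it only assumes the history variable defined above:

print(history.shape)           # (number of trading days, number of columns)
print(list(history.columns))   # typically ['close', 'high', 'low', 'open', 'volume']
history.head()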

Prepare Data

You need some historical data to prepare the data for the model. If you have historical data, manipulate it to train and test the model. In this example, calculate the log return time series of the security:

ret = np.log(history/history.shift(1)).iloc[1:].close
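
Because the first row of the ratio is NaN and is dropped by .iloc[1:], the return series is one observation shorter than the price history. An optional sanity-check sketch, assuming the history and ret variables above:

print(len(history), len(ret))   # ret has one observation fewer than history
print(ret.isnull().sum())       # should be 0 after dropping the first row
ret.head()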

Train Models

You need to prepare the historical data for training before you train the model. If you have prepared the data, build and train the environment and the model. In this example, create a gym environment to initialize the training environment, agent, and reward, and then create an RL model with the DQN algorithm. Follow these steps to create the environment and the model:

  1. Split the data into training and testing sets to evaluate the model.

    X_train = history.iloc[:-50].values
    X_test = history.iloc[-50:].values
    y_train = ret.iloc[:-50].values
    y_test = ret.iloc[-50:].values
  2. Create a custom gym environment class. In this example, create a custom environment with the previous 5 OHLCV data points as the observation and the running portfolio value as the reward.

    class TradingEnv(gym.Env):
        metadata = {'render.modes': ['console']}
        
        FLAT = 0
        LONG = 1
        SHORT = 2
    
        def __init__(self, ohlcv, ret):
            super(TradingEnv, self).__init__()
            
            self.ohlcv = ohlcv
            self.ret = ret
            self.trading_cost = 0.01
            self.reward = 1
            
            # The number of steps the environment has taken; starts at 5 since the previous 5 data points form the observation.
            self.current_step = 5
            # The last action taken
            self.last_action = 0

            # Define the action and observation spaces
            # Using discrete actions, there are 3: FLAT, LONG, and SHORT.
            n_actions = 3
            self.action_space = gym.spaces.Discrete(n_actions)
            # The observation is the previous 5 data points of OHLCV, so the shape is (5, 5)
            self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(5, 5), dtype=np.float64)
    
        def reset(self):
            # Reset the number of steps the environment has taken
            self.current_step = 5
            # Reset the last action
            self.last_action = 0
            # Reset the portfolio value so every episode starts from the same equity
            self.reward = 1
            # Must return an np.array type
            return self.ohlcv[self.current_step-5:self.current_step].astype(np.float32)
    
        def step(self, action):
            if action == self.LONG:
                self.reward *= 1 + self.ret[self.current_step] - (self.trading_cost if self.last_action != action else 0)
            elif action == self.SHORT:
                self.reward *= 1 + -1 * self.ret[self.current_step] - (self.trading_cost if self.last_action != action else 0)
            elif action == self.FLAT:
                self.reward *= 1 - (self.trading_cost if self.last_action != action else 0)
            else:
                raise ValueError("Received invalid action={} which is not part of the action space".format(action))
                
            self.last_action = action
            self.current_step += 1
    
            # Have we iterated through all the data points?
            done = (self.current_step == self.ret.shape[0]-1)

            # Return the new observation, the cumulative reward (portfolio value), the done flag, and an empty info dict
            return self.ohlcv[self.current_step-5:self.current_step].astype(np.float32), self.reward, done, {}
    
        def render(self, mode='console'):
            if mode != 'console':
                raise NotImplementedError()
            print(f'Equity Value: {self.reward}')
  3. Initialize the environment.

    env = TradingEnv(X_train, y_train)
  4. Train the model. In this example, create an RL model and train it with the MLP-policy DQN algorithm. A sanity-check sketch follows these steps.

    model = DQN(MlpPolicy, env, verbose=1)
    model.learn(total_timesteps=1000)
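
After training, you can optionally sanity-check the custom environment and the agent's in-sample behavior. The following is a minimal sketch, not part of the original workflow; it assumes the env and model objects created above and uses the check_env helper, if your stable_baselines version provides it:

from stable_baselines.common.env_checker import check_env

# Verify the custom environment follows the gym interface (emits warnings only)
check_env(env, warn=True)

# Roll out one deterministic (greedy) episode on the training environment
obs = env.reset()
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)

# In this environment the step reward is the running portfolio value
print(f"In-sample final equity: {reward}")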

Test Models

You need to build and train the model before you test its performance. If you have trained the model, test it on the out-of-sample data. Follow these steps to test the model:

  1. Initialize a list to store the equity value (starting with the initial capital) at each time step, and variables to store the last action and the trading cost.

    equity = [1]
    last_action = 0
    trading_cost = 0.01
  2. Iterate through the testing data points to predict and trade at each step.

    for i in range(5, X_test.shape[0]):
        action, _ = model.predict(X_test[i-5:i], deterministic=True)
        
        if action == 0:
            new = equity[-1] * (1 - (trading_cost if last_action != action else 0))
        elif action == 1:
            new = equity[-1] * (1 + y_test[i] - (trading_cost if last_action != action else 0))
        elif action == 2:
            new = equity[-1] * (1 + -1 * y_test[i] - (trading_cost if last_action != action else 0))
        
        equity.append(new)
        last_action = action
  3. Plot the result.

    plt.figure(figsize=(15, 10))
    plt.title("Equity Curve")
    plt.xlabel("timestep")
    plt.ylabel("equity")
    plt.plot(equity)
    plt.show()
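
Beyond the plot, you can summarize the out-of-sample equity curve with a few basic statistics. This is an optional sketch; it assumes the equity list built above and, for the annualization factor, 252 trading days per year:

equity_arr = np.array(equity)
period_returns = equity_arr[1:] / equity_arr[:-1] - 1

total_return = equity_arr[-1] / equity_arr[0] - 1
annualized_volatility = period_returns.std() * np.sqrt(252)
max_drawdown = (equity_arr / np.maximum.accumulate(equity_arr) - 1).min()

print(f"Total return:          {total_return:.2%}")
print(f"Annualized volatility: {annualized_volatility:.2%}")
print(f"Maximum drawdown:      {max_drawdown:.2%}")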

Store Models

You can save and load stable baselines models using the Object Store.

Save Models

  1. Set the key name of the model to be stored in the Object Store.

    model_key = "model"
  2. Call the GetFilePath method with the key.

    file_name = qb.ObjectStore.GetFilePath(model_key)

    This method returns the file path where the model will be stored.

  3. Call the save method with the file path.

    model.save(file_name)

Load Models

You must save a model into the Object Store before you can load it from the Object Store. If you saved a model, follow these steps to load it:

  1. Call the ContainsKey method with the key.

    qb.ObjectStore.ContainsKey(model_key)

    This method returns a boolean that represents if the model_key is in the Object Store. If the Object Store does not contain the model_key, save the model using the model_key before you proceed.

  2. Call the GetFilePath method with the key.

    file_name = qb.ObjectStore.GetFilePath(model_key)

    This method returns the path where the model is stored.

  3. Call the load method with the file path, environment, and policy.

    loaded_model = DQN.load(file_name, env=env, policy=MlpPolicy)

    This method returns the saved model.
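
The loaded model behaves like the one you trained, so you can, for example, query it for an action on the most recent observation window. A minimal usage sketch, assuming the X_test array prepared earlier:

# Predict an action from the latest 5 OHLCV rows (0 = FLAT, 1 = LONG, 2 = SHORT)
obs = X_test[-5:].astype(np.float32)
action, _states = loaded_model.predict(obs, deterministic=True)
print(f"Predicted action: {action}")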
