Hey all,

For another installment of our "mini-series" of examples on how to move your work from the research environment into production, we show how you can implement a basic random forest regression model using the sklearn RandomForestRegressor. Briefly, a random forest is a supervised learning algorithm; here we use it for regression to estimate the importance of each feature in our dataset and turn those importances into weights for a tradeable portfolio.

To start, we continue to use the US Treasuries ETF basket and get the historical data we want. We'll use the most recent 500 hours of historical data to create our train / test data sets.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

qb = QuantBook()

symbols = {}
assets = ["SHY", "TLT", "SHV", "TLH", "EDV", "BIL",
          "SPTL", "TBT", "TMF", "TMV", "TBF", "VGSH", "VGIT",
          "VGLT", "SCHO", "SCHR", "SPTS", "GOVT"]

for i in range(len(assets)):
    symbols[assets[i]] = qb.AddEquity(assets[i], Resolution.Minute).Symbol

# Copy Paste Region For Backtesting.
# ==========================================
# Set up regressor
regressor = RandomForestRegressor(n_estimators=100, min_samples_split=5, random_state=1990)

# Fetch history on our universe
df = qb.History(qb.Securities.Keys, 500, Resolution.Hour)

# Get train/test data: one row per timestamp, one column per symbol
returns = df.unstack(level=1).close.transpose().pct_change().dropna()
X = returns

# In the algorithm, y is built from the real portfolio values in the RollingWindow;
# here we use dummy values so the notebook code runs
y = np.random.normal(100000, 5, X.shape[0])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1990)
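The returns line above packs several reshaping steps into one expression, so here is a small sketch of it on made-up data. It assumes the history frame is indexed by (symbol, time) with a close column, which is how the snippet above treats it; the tickers are reused from the basket, but the prices, volumes and timestamps are invented for illustration.

```python
import pandas as pd

# Invented hourly timestamps and prices, standing in for qb.History output
times = [pd.Timestamp(2020, 1, 2, 10 + i) for i in range(4)]
tickers = ["SHY", "TLT"]
closes = {"SHY": [100.0, 101.0, 102.01, 102.01],
          "TLT": [50.0, 49.0, 49.0, 51.45]}

index = pd.MultiIndex.from_product([tickers, times], names=["symbol", "time"])
df = pd.DataFrame({"close": closes["SHY"] + closes["TLT"], "volume": 1000}, index=index)

# unstack(level=1): move time into the columns -> rows are symbols
# ["close"]:        keep only the close field
# transpose():      rows become timestamps, columns become symbols
# pct_change():     hour-over-hour return per symbol
# dropna():         drop the first row, which has no previous price
returns = df.unstack(level=1)["close"].transpose().pct_change().dropna()

print(returns)
```

The result has one row fewer than the number of timestamps, which is why the algorithm later trims the portfolio values to X.shape[0] entries.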

Ideally, we would pass an array of our portfolio values over time as the second array into train_test_split. We can do that in the algorithm but not in the notebook, so for now we'll create a "dummy" array just to make sure our code runs properly. Once we have our data and have initialized our regressor, we fit the model and then determine the importance of each feature; the features in this case are the different symbols.

# Fit regressor
regressor.fit(X_train, y_train)

# Get weights, keeping each nonzero weight paired with its own symbol
weights = regressor.feature_importances_
mask = weights > 0
symbols = returns.columns[mask]
selected = zip(symbols, weights[mask])

Our final variable selected is a zip of symbol-weight tuples to be used in building our portfolio.
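To get a feel for what feature_importances_ returns, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the tickers AAA, BBB and CCC are hypothetical, and the target is constructed so that it depends almost entirely on the first column.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
tickers = np.array(["AAA", "BBB", "CCC"])  # hypothetical tickers

# Synthetic "returns": 300 observations of 3 features
X = rng.normal(0, 0.01, size=(300, 3))
# Synthetic target driven by the first feature plus a little noise
y = 5.0 * X[:, 0] + rng.normal(0, 0.001, size=300)

regressor = RandomForestRegressor(n_estimators=100, min_samples_split=5, random_state=1990)
regressor.fit(X, y)

# Importances are non-negative and sum to 1 when at least one split was made
weights = regressor.feature_importances_

# Apply the same mask to both arrays so each weight stays with its ticker
mask = weights > 0
selected = list(zip(tickers[mask], weights[mask]))

for ticker, weight in selected:
    print(ticker, round(float(weight), 4))
```

Because the importances already sum to 1, they can be read directly as long-only portfolio fractions. Note that a bare zip is an iterator and can only be consumed once, so wrap it in list() if you want to inspect it before using it.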

We use an algorithm setup similar to the one in our Mean Reversion example, but this time we add a RollingWindow to track our total portfolio value and an Insight Weighting Portfolio Construction Model, which allows us to dictate the weighting scheme using the weights found in the regression.

def Initialize(self):
    #1. Required: Five years of backtest history
    self.SetStartDate(2014, 1, 1)

    #2. Required: Alpha Streams Models:
    self.SetBrokerageModel(BrokerageName.AlphaStreams)

    #3. Required: Significant AUM Capacity
    self.SetCash(1000000)

    #4. Required: Benchmark to SPY
    self.SetBenchmark("SPY")

    self.SetPortfolioConstruction(InsightWeightingPortfolioConstructionModel())
    self.SetExecution(ImmediateExecutionModel())

    self.assets = ["IEF", "SHY", "TLT", "IEI", "SHV", "TLH", "EDV", "BIL",
                   "SPTL", "TBT", "TMF", "TMV", "TBF", "VGSH", "VGIT",
                   "VGLT", "SCHO", "SCHR", "SPTS", "GOVT"]

    self.symbols = {}

    # This holds our rolling portfolio value
    self.portfolioValue = RollingWindow[Decimal](500)

    # This ensures the RollingWindow is fully populated before anything else happens
    self.SetWarmup(500)

    # Add Equity ------------------------------------------------
    for i in range(len(self.assets)):
        self.symbols[self.assets[i]] = self.AddEquity(self.assets[i],Resolution.Hour).Symbol

    self.Schedule.On(self.DateRules.Every(DayOfWeek.Monday), self.TimeRules.AfterMarketOpen("IEF", 30), self.EveryDayAfterMarketOpen)

Once again, we can copy and paste our research notebook code into our Scheduled Event, and by setting qb = self, we barely need to change the code we've pasted in; the main swap is the real portfolio values in place of the dummy y. Unlike the Mean Reversion example, we need to establish a starting position so that our portfolio value changes over time; otherwise the regressor gives us 0 weight for each symbol.

def EveryDayAfterMarketOpen(self):
    if not self.Portfolio.Invested:
        # Start with an equal-weight position so the portfolio value begins to move
        insights = []
        for ticker, symbol in self.symbols.items():
            insights.append( Insight.Price(symbol, timedelta(days=5), InsightDirection.Up, 0.01, None, None, 1/len(self.symbols)) )
        self.EmitInsights(insights)
    else:
        qb = self
        #==============================
        # Initialize instance of Random Forest Regressor
        regressor = RandomForestRegressor(n_estimators=100, min_samples_split=5, random_state = 1990)

        # Fetch history on our universe
        df = qb.History(qb.Securities.Keys, 500, Resolution.Hour)

        # Get train/test data
        returns = df.unstack(level=1).close.transpose().pct_change().dropna()
        X = returns
        # RollingWindow iterates newest-first, so reverse it to match the chronological order of X
        y = [x for x in qb.portfolioValue][::-1][-X.shape[0]:]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1990)

        # Fit regressor
        regressor.fit(X_train, y_train)

        # Get long-only weights, keeping each nonzero weight paired with its own symbol
        weights = regressor.feature_importances_
        mask = weights > 0
        symbols = returns.columns[mask]
        selected = zip(symbols, weights[mask])
        # ==============================

        insights = []
        for symbol, weight in selected:
            insights.append( Insight.Price(symbol, timedelta(days=5), InsightDirection.Up, 0.01, None, None, weight) )
        self.EmitInsights(insights)
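The reason for the initial equal-weight position can be checked in isolation. The sketch below uses synthetic features and a perfectly flat target, standing in for a portfolio that has never been invested; with nothing to explain, the trees never split and every importance comes back as zero. The data is invented for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Synthetic "returns" for 4 symbols over 200 observations
X = rng.normal(0, 0.01, size=(200, 4))
# A flat portfolio value: what an uninvested account would record
y_flat = np.full(200, 1_000_000.0)

regressor = RandomForestRegressor(n_estimators=20, random_state=1990)
regressor.fit(X, y_flat)

# With a constant target there is nothing to split on, so no feature gets credit
flat_importances = regressor.feature_importances_
print(flat_importances)
```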

In this example, we use the weight argument of Insight.Price. Since we are using the Insight Weighting Portfolio Construction Model, the weights given by our model become the weights each symbol gets in our portfolio. Finally, to keep our portfolio value updated:
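As a rough mental model of what happens to those weights, here is a plain-Python sketch. It is not the library's code: target_percents is a hypothetical helper, and it assumes (as I understand the model's documented behaviour) that each insight weight is used as a target portfolio fraction and that the weights are only scaled down proportionally when they add up to more than 1.

```python
def target_percents(insight_weights):
    """Map {symbol: insight weight} to {symbol: target portfolio fraction}.

    Hypothetical helper: assumes weights are used as-is unless their
    total exceeds 1, in which case all are scaled down proportionally.
    """
    total = sum(abs(w) for w in insight_weights.values())
    scale = 1.0 / total if total > 1.0 else 1.0
    return {symbol: w * scale for symbol, w in insight_weights.items()}


# Feature importances sum to 1, so they pass through unchanged
within_budget = target_percents({"TLT": 0.5, "SHY": 0.25, "IEF": 0.25})

# Weights that overshoot are shrunk so the total is 1
overshoot = target_percents({"TLT": 1.5, "SHY": 0.5})

print(within_budget)
print(overshoot)
```

Under that assumption, the random forest importances need no extra normalization before being passed as insight weights.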

def OnData(self, data):
    self.portfolioValue.Add(self.Portfolio.TotalPortfolioValue)
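One thing to keep in mind when reading values back out of a rolling window is the order they come in. The sketch below is a hypothetical stand-in built on collections.deque, not the QuantConnect class; it assumes the window keeps the newest value at index 0 and iterates newest-first, which is my understanding of how RollingWindow behaves.

```python
from collections import deque


class NewestFirstWindow:
    """Hypothetical fixed-size window with the newest item at index 0."""

    def __init__(self, size):
        self._items = deque(maxlen=size)

    def add(self, value):
        # appendleft puts the newest value first; deque drops the oldest from the right
        self._items.appendleft(value)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


window = NewestFirstWindow(3)
for value in [100.0, 101.0, 102.0, 103.0]:
    window.add(value)

newest_first = list(window)          # order as iterated
chronological = newest_first[::-1]   # oldest to newest, matching a time-indexed DataFrame

print(newest_first)
print(chronological)
```

If the window really is newest-first, the values need reversing before they are lined up against chronologically ordered returns.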

And that's it! A quick example of how to transfer your ML research into an algorithm without too much hassle.