From Research To Production -- Random Forest Regression

Hey all,

In this installment of our "mini-series" of examples on how to move your work from the research environment into production, we show how you can implement a basic random forest regression model using the sklearn RandomForestRegressor. Briefly, a random forest is a supervised learning algorithm; here we use it specifically for regression, in order to identify the important features of our dataset and derive weights for building a tradeable portfolio.

To start, we continue to use the US Treasuries ETF basket and fetch the historical data we want. We'll use the most recent 500 hours of historical data to create our train / test data sets.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

qb = QuantBook()

symbols = {}
assets = ["SHY", "TLT", "SHV", "TLH", "EDV", "BIL",
                  "SPTL", "TBT", "TMF", "TMV", "TBF", "VGSH", "VGIT",
                  "VGLT", "SCHO", "SCHR", "SPTS", "GOVT"]

for ticker in assets:
    symbols[ticker] = qb.AddEquity(ticker, Resolution.Minute).Symbol

#Copy Paste Region For Backtesting.
#==========================================
# Set up regressor
regressor = RandomForestRegressor(n_estimators=100, min_samples_split=5, random_state=1990)

# Fetch history on our universe
df = qb.History(qb.Securities.Keys, 500, Resolution.Hour)

# Get train/test data
returns = df.unstack(level=1).close.transpose().pct_change().dropna()
X = returns
# In the algorithm, y comes from the rolling portfolio value (qb.portfolioValue);
# here we use a dummy array so the code runs in research
y = np.random.normal(100000, 5, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1990)
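
To see what the reshaping chain above does, here is a minimal sketch on synthetic data. It assumes the history DataFrame is indexed by (symbol, time) with a close column, which is what the unstack call in the post implies; the tickers and prices are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for the history DataFrame: a (symbol, time) MultiIndex
# with a "close" column. Tickers and prices are invented for this sketch.
times = pd.date_range("2020-01-02 10:00", periods=5, freq=pd.Timedelta(hours=1))
tickers = ["AAA", "BBB"]
index = pd.MultiIndex.from_product([tickers, times], names=["symbol", "time"])
closes = [100.0, 101.0, 102.01, 102.01, 103.0301,   # AAA
          50.0, 50.0, 49.5, 49.5, 50.49]            # BBB
df = pd.DataFrame({"close": closes}, index=index)

# Same chain as in the post: move time into the columns, keep the closes,
# flip to one row per bar and one column per symbol, then take bar-to-bar returns.
returns = df.unstack(level=1).close.transpose().pct_change().dropna()

# The first bar has no previous close, so dropna() removes it.
print(returns.shape)
print(returns)
```

Each row of the result is one bar and each column is one symbol, so the symbols are the features the regressor sees.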

Ideally, we would pass an array of our portfolio values over time as the second array into train_test_split. We can do this in the algorithm, so for now we'll create a "dummy" array just to make sure our code runs properly. Once we have our data and have initialized our regressor, we fit the model and can then determine the importance of each feature, which in this case are the different symbols.
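
As a side note on the split itself, the sketch below (toy arrays, not the post's data) shows that train_test_split keeps each row of X paired with its y value, and that it shuffles rows by default. Whether shuffling is appropriate for time-ordered data is a modeling choice; passing shuffle=False is one option and is shown here only for comparison.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: row i of X is [2i, 2i + 1] and its target is i * 100.
X = np.arange(20).reshape(10, 2)
y = np.arange(10) * 100.0

# Same call shape as in the post: 30% of rows held out, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1990
)

# Rows are shuffled, but every X row still travels with its own target.
pairs_intact = bool(np.all(y_train == X_train[:, 0] * 50.0))

# With shuffle=False the last 30% of rows (the most recent bars) form the test set.
X_tr_ord, X_te_ord, y_tr_ord, y_te_ord = train_test_split(
    X, y, test_size=0.3, shuffle=False
)

print(len(X_train), len(X_test), pairs_intact)
print(y_te_ord)
```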

# Fit regressor
regressor.fit(X_train, y_train)

# Get weights, keeping only the symbols with non-zero importance
weights = regressor.feature_importances_
nonzero = weights > 0
symbols = returns.columns[nonzero]
selected = zip(symbols, weights[nonzero])

Our final variable, selected, is a zip of symbol-weight tuples that we use to build our portfolio.
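
The fit-then-weight step can be tried on synthetic data. The sketch below is an illustration, not the post's dataset: the feature names are invented, and the target is built to depend only on the first column so that the importance ranking is easy to check. It also applies the same boolean mask to both the names and the weights so the pairs stay aligned, and stores them in a dict because a zip object can only be iterated once.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
names = np.array(["AAA", "BBB", "CCC"])  # hypothetical feature (symbol) names

# The target depends only on the first column, plus a little noise.
X = rng.normal(size=(300, 3))
y = 5.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

regressor = RandomForestRegressor(n_estimators=50, min_samples_split=5, random_state=1990)
regressor.fit(X, y)

# feature_importances_ holds one non-negative value per feature.
weights = regressor.feature_importances_

# Filter names and weights with the same mask so each name keeps its own weight.
mask = weights > 0
selected = dict(zip(names[mask], weights[mask]))

print(selected)
print(weights.sum())
```

Because the importances are normalized, they can be used directly as long-only portfolio weights, which is how the algorithm in this post uses them.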

We use a similar algorithm setup as the one in our Mean Reversion example, but this time we add a RollingWindow to track our total portfolio value and an Insight Weighting Portfolio Construction Model, which allows us to dictate the weighting scheme using the weights found in the regression.

    def Initialize(self):

        #1. Required: Five years of backtest history
        self.SetStartDate(2014, 1, 1)
    
        #2. Required: Alpha Streams Models:
        self.SetBrokerageModel(BrokerageName.AlphaStreams)
    
        #3. Required: Significant AUM Capacity
        self.SetCash(1000000)

        #4. Required: Benchmark to SPY
        self.SetBenchmark("SPY")
        
        self.SetPortfolioConstruction(InsightWeightingPortfolioConstructionModel())
        self.SetExecution(ImmediateExecutionModel())
    
        self.assets = ["IEF", "SHY", "TLT", "IEI", "SHV", "TLH", "EDV", "BIL",
                      "SPTL", "TBT", "TMF", "TMV", "TBF", "VGSH", "VGIT",
                      "VGLT", "SCHO", "SCHR", "SPTS", "GOVT"]

        self.symbols = {}
        
        # This holds our rolling portfolio value
        self.portfolioValue = RollingWindow[Decimal](500)

        # This ensures the RollingWindow is fully populated before anything else happens
        self.SetWarmup(500)
        
        # Add Equity ------------------------------------------------ 
        for ticker in self.assets:
            self.symbols[ticker] = self.AddEquity(ticker, Resolution.Hour).Symbol
                
        self.Schedule.On(self.DateRules.Every(DayOfWeek.Monday), self.TimeRules.AfterMarketOpen("IEF", 30), self.EveryDayAfterMarketOpen)

Once again, we can copy and paste our research notebook code into our Scheduled Event, and by setting qb = self, we don't even need to change any of the code we've copied over. Unlike the Mean Reversion example, we need to establish a starting position and let our portfolio value change so that our regressor doesn't give us 0 weight for each symbol.

    def EveryDayAfterMarketOpen(self):
        if not self.Portfolio.Invested:
            insights = []
            for ticker, symbol in self.symbols.items():
                insights.append( Insight.Price(symbol, timedelta(days=5), InsightDirection.Up, 0.01, None, None, 1/len(self.symbols)) )
            self.EmitInsights(insights)
        else:
            qb = self 
            #==============================
            # Initialize instance of Random Forest Regressor
            regressor = RandomForestRegressor(n_estimators=100, min_samples_split=5, random_state = 1990)
    
            # Fetch history on our universe
            df = qb.History(qb.Securities.Keys, 500, Resolution.Hour)
            
            # Get train/test data
            returns = df.unstack(level=1).close.transpose().pct_change().dropna()
            X = returns
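            # RollingWindow keeps the newest value at index 0, so take the most recent
            # values and reverse them into chronological order to match the rows of X
            y = [x for x in qb.portfolioValue][:X.shape[0]]
            y.reverse()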
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1990)
            
            # Fit regressor
            regressor.fit(X_train, y_train)
            
            # Get long-only weights from the feature importances,
            # keeping only the symbols with non-zero importance
            weights = regressor.feature_importances_
            nonzero = weights > 0
            symbols = returns.columns[nonzero]
            selected = zip(symbols, weights[nonzero])
            # ==============================
            
            insights = []
            for symbol, weight in selected:
                insights.append( Insight.Price(symbol, timedelta(days=5), InsightDirection.Up, 0.01, None, None, weight) )
            self.EmitInsights(insights)

In this example, we use the weight argument of Insight.Price. Since we are using the Insight Weighting Portfolio Construction Model, the weights given by our model become the weights we give each symbol in our portfolio. Finally, to keep our portfolio value updated:

    def OnData(self, data):
        self.portfolioValue.Add(self.Portfolio.TotalPortfolioValue)
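
To make the role of the Insight weights more concrete, here is a rough sketch of how weights could map to target allocations. It is not the LEAN implementation: target_percents is a hypothetical helper, and the rule it encodes (use each weight as the target percent, and scale everything down proportionally if the weights sum to more than 1) is my reading of how the Insight Weighting Portfolio Construction Model is documented, so treat it as an assumption.

```python
def target_percents(weights):
    """Hypothetical helper: map symbol -> insight weight to symbol -> target percent.

    Assumed rule: weights are used as-is unless their absolute values sum to
    more than 1, in which case all of them are scaled down proportionally.
    """
    total = sum(abs(w) for w in weights.values())
    scale = 1.0 / total if total > 1.0 else 1.0
    return {symbol: w * scale for symbol, w in weights.items()}


# Weights that already sum to 1 (as feature importances do) pass through unchanged.
unchanged = target_percents({"AAA": 0.5, "BBB": 0.25, "CCC": 0.25})

# Weights that sum to 2 are halved so the total exposure is 1.
scaled = target_percents({"AAA": 1.0, "BBB": 1.0})

print(unchanged)
print(scaled)
```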

And that's it! A quick example of how to transfer your ML research into an algorithm without too much hassle.

This is cool. Is there an easy way to add in some fundamentals data to this framework? I have been experimenting a bit but have not found a way to do that yet. So say instead of just pct change in close we could also grab something from the ValuationRatios Morningstar data. If it comes as a Pandas DataFrame we could then just concat them.

Great tutorial!

If I understand it right, in the definition of EveryDayAfterMarketOpen, I think

y = [x for x in qb.portfolioValue][-X.shape[0]:]

should be

y = [x for x in qb.portfolioValue][:X.shape[0]]
y.reverse()

This is because newer data are added to the beginning of qb.portfolioValue, whereas qb.History(qb.Securities.Keys, 10, Resolution.Hour) adds new data at the bottom of the DataFrame. Reversing y makes it compatible with X in the modeling step.
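
The ordering issue described above can be reproduced without QuantConnect. The sketch below uses a hypothetical NewestFirstWindow class, built on collections.deque, as a stand-in for a window that stores the newest value first; it is not the RollingWindow API, only an illustration of the two slices.

```python
from collections import deque


class NewestFirstWindow:
    """Hypothetical stand-in for a fixed-size window that stores the newest item first."""

    def __init__(self, size):
        self._items = deque(maxlen=size)

    def add(self, value):
        # New values go to the front; once full, the oldest value falls off the back.
        self._items.appendleft(value)

    def __iter__(self):
        return iter(self._items)


window = NewestFirstWindow(5)
for value in range(1, 8):      # portfolio values arrive as 1, 2, ..., 7
    window.add(value)

n_rows = 3                     # pretend X has 3 rows, oldest bar first

# Original slice: the last n_rows items are the OLDEST values, newest first.
original = list(window)[-n_rows:]

# Corrected slice: take the newest n_rows values, then put them in time order.
corrected = list(window)[:n_rows]
corrected.reverse()

print(list(window))   # newest first
print(original)
print(corrected)
```

With the corrected slice, y runs oldest to newest, which matches the row order of a history DataFrame where new data is appended at the bottom.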

Emilio Freire

That's amazing thanks Sherry!!!

Jason Benner

Lucky Dragon

Rahul Chowdhury

Hey Liang,

You are absolutely correct. Thank you for pointing that out!

Best
Rahul

AlMoJo

Hello everybody,

First, thank you very much for this amazing strategy!

Just to confirm, as I am not 100% sure: does this algo use a sliding window for the History DataFrame, so that the ML training adapts to the most recent conditions and the logic is recalculated frequently? I saw History 500.

Also, I will try to extend the X in regressor.fit(X, y) with other columns like ema200 and rsi200, so I was wondering if QuantConnect allows external technical analysis libraries like ta-lib to add TA based on the values in a DataFrame, without the resolution etc.

Cheers 😃

Vladimir

"I was wondering if QuantConnect allows for external technical analysis libraries like ta-lib to add TA."

Here is an example of how we can use ta-lib indicators in QuantConnect.

Sherry Yang INVESTOR
