Factor Investing In QuantConnect - Research Notebook & Backtesting Algorithm

20210330 - NOTE: This was updated in the comment below to deal with a look ahead bias:

https://www.quantconnect.com/forum/discussion/9768/factor-investing-in-quantconnect-research-notebook-amp-backtesting-algorithm/p1/comment-31435

======================

Hi QuantConnect Community!

There are already many open-source tools available out there for analyzing and creating factor-based trading strategies. Among those, perhaps the most popular one is Alphalens by the Quantopian team, which has had a pivotal role in the career of thousands of aspiring Quants like myself looking to learn industry-standard techniques for factor analysis.

As a personal project, and with the objective to better understand these techniques, I did my own implementation (and sometimes interpretation) of some of the features of Alphalens using my favorite engines LEAN and QuantConnect. I am now sharing this product to help QuantConnect users perform factor research and strategy backtesting in their favorite platform. You will find the full product at the end of this article, which can be directly cloned into your QuantConnect account.

I would like to add here that even though the features in this product look very similar to those in Alphalens, I did not use (or look at) any of their open-source code to build it.

I believe factor investing is a very interesting area to explore within financial trading and this is my contribution to the community. I hope this product will improve over time and strongly encourage questions and suggestions!

This article consists of the following parts:

Case Study - Factor Research: How to use the research tools to analyse a long-short equity strategy based on the combination of two factors (momentum and volatility)
Case Study - Risk Research: Explore the influence that some external risk factors might have on our strategy.
Case Study - Backtesting Algorithm: After completing the research phase, we will show how to seamlessly move the strategy to the algorithm to test it against historical data, including slippage and commission modelling.

CASE STUDY - FACTOR RESEARCH

This section corresponds to the FactorAnalysis class whose purpose is to build a long-short portfolio based on statistically significant factors. The Notebook also contains detailed step-by-step instructions.

For this first version, we have focused on factors created using historical price and volume data, simply because we found that requesting historical fundamental data for many tickers and years is still too computationally expensive in QuantConnect. Once this improves, we will work to add fundamental factors to the product.

Initialize Data

The first thing we need to do is to add our start and end dates and initialize the FactorAnalysis class by passing a list of tickers. We need to provide a manual list of tickers because at the time of this article QuantConnect does not offer a dynamic universe in the research environment (looking forward to that!). In our example, we're using a list with all the SP500 constituents as of Q4 2020.

# select start and end date for analysis
startDate = datetime(2017, 1, 1)
endDate = datetime(2020, 10, 1)

# initialize factor analysis
factorAnalysis = FactorAnalysis(qb, tickers, startDate, endDate, Resolution.Daily)

This is going to create a MultiIndex Dataframe with the historical OHLCV daily data needed for analysis.

factorAnalysis.ohlcvDf
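To make the layout concrete, here is a minimal sketch of a MultiIndex OHLCV DataFrame indexed by (symbol, time). It uses synthetic prices and hypothetical tickers, and the names `ohlcv_df` and `single_symbol_df` are illustrative assumptions rather than attributes of the FactorAnalysis class.

```python
import numpy as np
import pandas as pd

# Hypothetical tickers and dates, used only to illustrate the layout
symbols = ["AAA", "BBB"]
dates = pd.bdate_range("2020-01-01", periods=5)
index = pd.MultiIndex.from_product([symbols, dates], names=["symbol", "time"])

# Synthetic prices: a random walk, not real market data
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, len(index)).cumsum()
ohlcv_df = pd.DataFrame(
    {
        "open": close - 0.5,
        "high": close + 1.0,
        "low": close - 1.0,
        "close": close,
        "volume": rng.integers(1_000, 5_000, len(index)),
    },
    index=index,
)

# Selecting one symbol drops the first index level and leaves a SingleIndex DataFrame
single_symbol_df = ohlcv_df.loc["AAA"]
```

Selecting a single symbol this way gives the kind of per-symbol DataFrame that a factor function can work on.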

Create Factors

Now we are going to create two very simple factors, Momentum and Volatility, using the CustomFactor function as follows.

# example of calculating multiple factors using the CustomFactor function
from scipy.stats import skew, kurtosis

def CustomFactor(x):
    '''
    Description:
        Applies factor calculations to a SingleIndex DataFrame of historical data OHLCV by symbol
    Args:
        x: SingleIndex DataFrame of historical OHLCV data for each symbol
    Returns:
        The factor value for each day
    '''
    try:
        # momentum factor --------------------------------------------------------------------------
        closePricesTimeseries = x['close'].rolling(252) # create a 252 day rolling window of close prices
        returns = x['close'].pct_change().dropna() # create a returns series
        # raw = True passes each window as a NumPy array, so w[-1] and w[-252] index by position
        momentum = closePricesTimeseries.apply(lambda w: (w[-1] / w[-252]) - 1, raw = True)

        # volatility factor ------------------------------------------------------------------------
        volatility = returns.rolling(252).apply(lambda w: np.nanstd(w, axis = 0), raw = True)

        # get a dataframe with all factors as columns --------------------------------------------
        factors = pd.concat([momentum, volatility], axis = 1)

    except BaseException as e:
        factors = np.nan

    return factors

What's going on there?

Under the hood, this function gets applied to the OHLCV DataFrame grouped by symbol. That means we can perform calculations for each symbol using any of the OHLCV columns in the grouped 'x' DataFrame.
We want to calculate factors in a rolling fashion so we get a value for each day that is calculated using data up until that day and including that day. By doing this we assume that in backtesting (and live trading) the calculations and trading decisions happen after the market closes and before the next open.
Finally, we concatenate the factors so we get the resulting DataFrame with one column per factor.
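The group-by-symbol mechanics described above can be sketched in plain pandas. This is only an illustration under stated assumptions: the helper `rolling_factors`, the 3-day window and the toy prices are hypothetical, and the real `GetFactorsDf` may be implemented differently.

```python
import pandas as pd

def rolling_factors(x, window):
    """Momentum and volatility for one symbol, using data up to and including each day."""
    close = x["close"]
    # Last close of the window divided by its first close, minus 1
    momentum = close / close.shift(window - 1) - 1
    # Population standard deviation of the last `window` daily returns
    volatility = close.pct_change().rolling(window).std(ddof=0)
    return pd.DataFrame({"Factor_1": momentum, "Factor_2": volatility})

# Toy close prices for two hypothetical symbols
dates = pd.bdate_range("2020-01-01", periods=6)
index = pd.MultiIndex.from_product([["AAA", "BBB"], dates], names=["symbol", "time"])
ohlcv_df = pd.DataFrame(
    {"close": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0,
               50.0, 49.0, 48.0, 47.0, 46.0, 45.0]},
    index=index,
)

# Apply the function to each symbol's SingleIndex DataFrame, then stack the results
factors_df = pd.concat(
    {
        symbol: rolling_factors(group.droplevel("symbol"), window=3)
        for symbol, group in ohlcv_df.groupby(level="symbol")
    },
    names=["symbol", "time"],
)
```

The first rows of each symbol are NaN because the window is not yet full, which is why a long lookback such as 252 days needs extra history before the analysis start date.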

# example of multiple factors
factorsDf = factorAnalysis.GetFactorsDf(CustomFactor)
factorsDf

In order to standardize the data, we apply winsorization and zscore normalization. We won't go over that here so please refer to the Notebook for more information on this.
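For readers who want a rough picture before opening the Notebook, the sketch below shows one common way to winsorize and z-score a single cross-section of factor values. The 5% and 95% clipping limits and the function name `winsorize_and_zscore` are assumptions made for illustration; the Notebook may use different limits or a different method.

```python
import pandas as pd

def winsorize_and_zscore(values, lower=0.05, upper=0.95):
    """Clip one cross-section to its [lower, upper] quantiles, then z-score it."""
    clipped = values.clip(values.quantile(lower), values.quantile(upper))
    return (clipped - clipped.mean()) / clipped.std(ddof=0)

# Toy cross-section for one date: 20 ordinary values and one extreme outlier
raw = pd.Series([float(v) for v in range(1, 21)] + [1000.0])
standardized = winsorize_and_zscore(raw)
```

After clipping, the outlier gets the same standardized value as its nearest neighbour instead of dominating the mean and standard deviation. On a (symbol, time) DataFrame the same function could be applied per date with `groupby(level="time").transform(...)`.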

We have two single factors and we want to combine them into one factor that is some linear combination of the two. We do this using the combinedFactorWeightsDict that takes the factor names and the weights. Note how we could reverse the effect of a factor by assigning a negative weight here. In this example, we will just sum the two.

# dictionary containing the factor name and weights for each factor
combinedFactorWeightsDict = {'Factor_1': 1, 'Factor_2': 1}
#combinedFactorWeightsDict = None # None to not add a combined factor when using single factors
finalFactorsDf = factorAnalysis.GetCombinedFactorsDf(standardizedFactorsDf, combinedFactorWeightsDict)
finalFactorsDf
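The linear combination itself is simple enough to sketch with toy numbers. The helper `combine_factors` and the sample values are hypothetical and only illustrate how a weights dictionary like the one above could be applied, including a negative weight that reverses a factor.

```python
import pandas as pd

def combine_factors(standardized_df, weights):
    """Weighted sum of the factor columns named in `weights`."""
    combined = sum(standardized_df[name] * w for name, w in weights.items())
    return standardized_df.assign(Combined_Factor=combined)

# Toy standardized factor values for three hypothetical stocks
standardized_df = pd.DataFrame(
    {"Factor_1": [1.0, -0.5, 0.2], "Factor_2": [0.5, 0.5, -1.0]},
    index=["AAA", "BBB", "CCC"],
)

equal = combine_factors(standardized_df, {"Factor_1": 1, "Factor_2": 1})
reversed_second = combine_factors(standardized_df, {"Factor_1": 1, "Factor_2": -1})
```

With equal weights AAA ranks highest, while reversing the second factor moves CCC to the top, so the sign of each weight directly changes which stocks end up in the top quantile.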

Create Quantiles And Add Forward Returns

It is time to create our factor quantiles and calculate forward returns in order to assess the relationship between the two.

# inputs for forward returns calculations
field = 'open' # choose between open, high, low, close prices to calculate returns
forwardPeriods = [1, 5, 21] # choose periods for forward return calculations

# inputs for quantile calculations
factor = 'Combined_Factor' # choose a factor to create quantiles
q = 5 # choose the number of quantile groups to create

factorQuantilesForwardReturnsDf = factorAnalysis.GetFactorQuantilesForwardReturnsDf(finalFactorsDf, field, forwardPeriods, factor, q)
factorQuantilesForwardReturnsDf

In order to calculate forward returns, we need to choose the price field to use and the periods we want. In this example, we are calculating the 1, 5 and 21-day forward returns based on Open prices. We use Open prices in order to replicate how the event-driven backtesting will work: we make all calculations after the market close (with data up to and including that day's close) and rebalance positions at the market open.
We select the factor we want to use for the quantiles and how many quantile groups we want to create. We are using the Combined_Factor and 5 quantile groups (quintiles) here.
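As a rough sketch of these two steps, the code below computes forward returns per symbol and assigns quantile groups per date on toy data. The helpers `add_forward_returns` and `add_quantiles` are hypothetical and are not the internals of `GetFactorQuantilesForwardReturnsDf`; in particular, they say nothing about how the product aligns prices with factor dates.

```python
import numpy as np
import pandas as pd

def add_forward_returns(prices, periods):
    """Forward returns per symbol: the price n rows ahead divided by the current price, minus 1."""
    grouped = prices.groupby(level="symbol")
    return pd.DataFrame({f"fwd_{n}": grouped.shift(-n) / prices - 1 for n in periods})

def add_quantiles(factor, q):
    """Assign each stock to one of q groups within each date (1 = lowest factor values)."""
    return factor.groupby(level="time").transform(
        lambda x: pd.qcut(x, q, labels=False) + 1
    )

# Toy data: five hypothetical symbols growing 1% to 5% per day
symbols = ["A", "B", "C", "D", "E"]
dates = pd.bdate_range("2020-01-01", periods=4)
index = pd.MultiIndex.from_product([symbols, dates], names=["symbol", "time"])
growth = np.repeat([1.01, 1.02, 1.03, 1.04, 1.05], len(dates))
step = np.tile(np.arange(len(dates)), len(symbols))
prices = pd.Series(100 * growth ** step, index=index)
factor = pd.Series(np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], len(dates)), index=index)

forward_returns_df = add_forward_returns(prices, periods=[1, 2])
quantiles = add_quantiles(factor, q=5)
```

The last n rows of every symbol have no forward return, so they come out as NaN and drop out of any later averaging.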

Let's have a look at the mean returns.

The next step is to visualize the cumulative returns from each quintile over time. In order to do that, we are going to group by quintile every day and calculate the return for each quintile/day by either using equal-weighting (mean) or factor-based weighting (weight the return of each stock in the quintile by its factor value).

forwardPeriod = 1 # choose the forward period to use for returns
weighting = 'mean' # mean/factor
returnsByQuantileDf = factorAnalysis.GetReturnsByQuantileDf(factorQuantilesForwardReturnsDf, forwardPeriod, weighting)
returnsByQuantileDf

Here we are ideally looking for returns series that deviate from each other in the direction of the quintiles order (top quintile going up while bottom quintile going down).
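The two weighting schemes can be sketched as follows. This is an illustration only: the column names are hypothetical, and weighting by each stock's share of the absolute factor values in its group is an assumption about what factor-based weighting means, which may not match `GetReturnsByQuantileDf`.

```python
import pandas as pd

def returns_by_quantile(df, weighting="mean"):
    """Daily return of each quantile group from columns time, quantile, factor, fwd_return."""
    keys = ["time", "quantile"]
    if weighting == "mean":
        daily = df.groupby(keys)["fwd_return"].mean()
    else:
        # Assumption: weight each stock by its share of the group's absolute factor values
        abs_factor = df["factor"].abs()
        weighted = df.assign(abs_factor=abs_factor,
                             weighted_return=df["fwd_return"] * abs_factor)
        sums = weighted.groupby(keys)[["weighted_return", "abs_factor"]].sum()
        daily = sums["weighted_return"] / sums["abs_factor"]
    return daily.unstack("quantile")

# Toy data: two dates, four stocks, two quantile groups
df = pd.DataFrame(
    {
        "time": ["2020-01-02"] * 4 + ["2020-01-03"] * 4,
        "quantile": [1, 1, 2, 2, 1, 1, 2, 2],
        "factor": [-2.0, -1.0, 1.0, 3.0, -1.0, -1.0, 2.0, 2.0],
        "fwd_return": [-0.02, 0.01, 0.01, 0.03, 0.00, -0.02, 0.02, 0.04],
    }
)

equal_weighted = returns_by_quantile(df, weighting="mean")
factor_weighted = returns_by_quantile(df, weighting="factor")

# Compound the daily group returns into cumulative returns
cumulative = (1 + equal_weighted).cumprod() - 1
```

In this toy data the top group compounds upwards and the bottom group drifts down, which is the kind of separation the plot is meant to reveal.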

Create a Long-Short Portfolio

We are finally in a position to construct a portfolio that exploits the spread between the top and bottom quintiles. In order to do this, we need to select two quintiles and provide some weighting that we want to apply to each of them using the portfolioWeightsDict. This allows for some flexibility in the way we create the portfolio as we can give more or less weight to one of the quintiles.

# dictionary containing the quintile group names and portfolio weights for each
portfolioWeightsDict = {'Group_5': 1, 'Group_1': -1}
portfolioLongShortReturnsDf = factorAnalysis.GetPortfolioLongShortReturnsDf(returnsByQuantileDf, portfolioWeightsDict)
portfolioLongShortReturnsDf
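A minimal sketch of how such a weights dictionary could turn quantile returns into a long-short return series is shown below. The helper `long_short_returns` and the toy returns are hypothetical; whether the product rescales by gross exposure is not stated in this article, so the sketch simply takes the weighted sum.

```python
import pandas as pd

def long_short_returns(returns_by_quantile_df, portfolio_weights):
    """Weighted sum of the selected quantile return columns."""
    weights = pd.Series(portfolio_weights)
    return returns_by_quantile_df[weights.index].mul(weights, axis=1).sum(axis=1)

# Toy daily returns for three quantile groups
returns_by_quantile_df = pd.DataFrame(
    {
        "Group_1": [-0.01, 0.00, 0.01],
        "Group_3": [0.00, 0.01, 0.00],
        "Group_5": [0.02, 0.01, -0.01],
    }
)

portfolio = long_short_returns(returns_by_quantile_df, {"Group_5": 1, "Group_1": -1})
```

Groups that are not named in the dictionary, such as Group_3 here, do not contribute to the portfolio return.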

And the plot!

Spearman Rank Correlation Coefficient

A standard way of assessing the degree of correlation between our factor and forward returns is the Spearman Rank Correlation (Information Coefficient).

The Spearman Rank Correlation measures the strength and direction of association between two ranked variables. It is the non-parametric version of the Pearson correlation and focuses on the monotonic relationship between two variables rather than their linear relationship. Below we plot the daily IC between the factor values and each forward period return, along with a 21-day moving average.

factorAnalysis.PlotIC(factorQuantilesForwardReturnsDf)
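For intuition, the daily IC can be computed by ranking the factor values and the forward returns within each date and taking the Pearson correlation of the ranks, which is the definition of the Spearman coefficient. The sketch below uses toy data and a hypothetical helper `daily_ic`; it is not the code behind `PlotIC`.

```python
import pandas as pd

def daily_ic(df, factor_col, return_col):
    """Spearman rank correlation per date, computed as the Pearson correlation of ranks."""
    return pd.Series(
        {
            time: group[factor_col].rank().corr(group[return_col].rank())
            for time, group in df.groupby("time")
        }
    )

# Toy data: three dates, four stocks each
df = pd.DataFrame(
    {
        "time": ["d1"] * 4 + ["d2"] * 4 + ["d3"] * 4,
        "factor": [1.0, 2.0, 3.0, 4.0] * 3,
        "fwd_return": [0.01, 0.02, 0.03, 0.04,   # same order as the factor
                       0.04, 0.03, 0.02, 0.01,   # exactly reversed
                       0.01, 0.03, 0.02, 0.04],  # two middle stocks swapped
    }
)

ic = daily_ic(df, "factor", "fwd_return")

# The article smooths with a 21-day moving average; the toy data only allows 2
ic_moving_average = ic.rolling(2).mean()
```

An IC of 1 means the factor ordered the stocks exactly as their forward returns did, -1 means exactly reversed, and values near 0 mean no monotonic relationship on that date.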

CASE STUDY - RISK RESEARCH

This section corresponds to the RiskAnalysis class whose purpose is to discover what risk factors our strategy is exposed to and to what degree. As we will see below in more detail, these external factors can be any time series of returns that our portfolio could have some exposure to. Some popular risk factors are provided here (Fama-French Five Factors, Industry Factors), but the user can easily test any other by passing its time series of returns. The Notebook also contains detailed step by step instructions.

Initialize Data

Let's initialize the RiskAnalysis class.

# initialize risk analysis riskAnalysis = RiskAnalysis(qb)

After initializing the RiskAnalysis class, we get two datasets with classic risk factors:

Fama-French 5 Factors: Historical daily returns of Market Excess Return (Mkt-RF), Small Minus Big (SMB), High Minus Low (HML), Robust Minus Weak (RMW) and Conservative Minus Aggressive (CMA).
12 Industry Factors: Consumer Nondurables (NoDur), Consumer durables (Durbl), Manufacturing (Manuf), Energy (Enrgy), Chemicals (Chems), Business Equipment (BusEq), Telecommunications (Telcm), Utilities (Utils), Wholesale and Retail (Shops), Healthcare (Hlth), Finance (Money), Other (Other)

Visit this site for more factor datasets to add to this analysis.

# fama-french 5 factors riskAnalysis.ffFiveFactorsDf.head()

# 12 industry factors riskAnalysis.industryFactorsDf.head()

Let's take a look at the cumulative returns of our long-short strategy together with the returns of the Fama-French 5 Factors.

We can visualize the correlations between the risk factors and our strategy returns.

# plot correlation matrix
factorAnalysis.PlotFactorsCorrMatrix(combinedReturnsDf)

Run Regression Analysis

Fit a Regression Model to the data to analyse linear relationships between our strategy returns and the external risk factors.
Partial Regression plots. When performing multiple linear regression, these plots are useful in analysing the relationship between each independent variable and the response variable while accounting for the effect of all the other independent variables present in the model. Calculations are as follows (Wikipedia):
- (1) Compute the residuals of regressing the response variable against the independent variables but omitting Xi.
- (2) Compute the residuals from regressing Xi against the remaining independent variables.
- (3) Plot the residuals from (1) against the residuals from (2).

riskAnalysis.PlotRegressionModel(combinedReturnsDf, dependentColumn = 'Strategy')
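The three partial regression steps can be sketched with NumPy least squares. The function names and the synthetic data are assumptions for illustration and are not taken from the RiskAnalysis class.

```python
import numpy as np

def residuals(y, X):
    """Residuals of an OLS fit of y on X (X already contains an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_regression(y, X, i):
    """Residual pairs for the partial regression plot of column i of X."""
    intercept = np.ones((len(y), 1))
    others = np.hstack([intercept, np.delete(X, i, axis=1)])
    y_resid = residuals(y, others)        # step 1: response without Xi
    x_resid = residuals(X[:, i], others)  # step 2: Xi on the remaining variables
    return x_resid, y_resid               # step 3: plot y_resid against x_resid

# Synthetic example with three independent variables and known coefficients
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 0.5 + X @ np.array([1.0, -2.0, 0.3]) + rng.normal(scale=0.1, size=200)

x_resid, y_resid = partial_regression(y, X, i=1)
slope = (x_resid @ y_resid) / (x_resid @ x_resid)

# Coefficients of the full multiple regression, for comparison
full_X = np.hstack([np.ones((200, 1)), X])
full_beta, *_ = np.linalg.lstsq(full_X, y, rcond=None)
```

The slope of the line through the residual scatter equals the coefficient of Xi in the full multiple regression, which is what makes these plots a faithful view of each factor's marginal effect.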

Plot Rolling Regression Coefficients

The above relationships are not static through time, therefore it is useful to visualize how these coefficients behave over time by running a Rolling Regression Model (with a given lookback period).

riskAnalysis.PlotRollingRegressionCoefficients(combinedReturnsDf, dependentColumn = 'Strategy', lookback = 126)
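A rolling regression can be sketched by refitting ordinary least squares on each lookback window. The helper `rolling_betas` and the synthetic returns below are hypothetical; the strategy is constructed so that its market exposure changes halfway through, which the rolling coefficients then pick up.

```python
import numpy as np
import pandas as pd

def rolling_betas(returns_df, dependent, lookback):
    """OLS coefficients of `dependent` on all other columns over each rolling window."""
    factors = [c for c in returns_df.columns if c != dependent]
    rows = {}
    for end in range(lookback, len(returns_df) + 1):
        window = returns_df.iloc[end - lookback:end]
        X = np.column_stack([np.ones(lookback), window[factors].to_numpy()])
        beta, *_ = np.linalg.lstsq(X, window[dependent].to_numpy(), rcond=None)
        rows[window.index[-1]] = beta[1:]  # drop the intercept
    return pd.DataFrame.from_dict(rows, orient="index", columns=factors)

# Synthetic factor returns and a strategy whose market beta moves from 0.5 to 1.0
rng = np.random.default_rng(1)
dates = pd.bdate_range("2019-01-01", periods=300)
returns_df = pd.DataFrame(
    rng.normal(0, 0.01, size=(300, 2)), index=dates, columns=["Mkt-RF", "SMB"]
)
market_beta = np.where(np.arange(300) < 150, 0.5, 1.0)
returns_df["Strategy"] = market_beta * returns_df["Mkt-RF"] - 0.2 * returns_df["SMB"]

betas = rolling_betas(returns_df, dependent="Strategy", lookback=126)
```

Calling `betas.describe()` on the result gives a quick numeric summary of how much each exposure varied over time.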

Plot Distribution Of Rolling Exposures

We can now visualize the historical distributions of the rolling regression coefficients in order to get a better idea of the variability of the data.

riskAnalysis.PlotBoxPlotRollingFactorExposure(combinedReturnsDf, dependentColumn = 'Strategy', lookback = 126)

CASE STUDY - BACKTESTING ALGORITHM

The purpose of the research process illustrated above is purely to determine if there is a significant relationship between our factors and the future returns of those stocks. That means the cumulative returns we saw are not realistic and they assumed daily rebalancing of hundreds of stocks without accounting for any slippage or commissions.

In order to test how this strategy would have performed historically we need to run a proper backtest, and for that we have to move from the Research Notebook to the Algorithm Framework.

Below I will explain the most important features and scripts of this part of the product.

Algorithm Framework - main.py

The main.py script includes the following user-defined inputs worth mentioning.

# date rule for rebalancing our portfolio by updating long-short positions based on factor values
rebalancingFunc = Expiry.EndOfMonth

# number of stocks to keep for factor modelling calculations
nStocks = 100

# number of positions to hold on each side (long/short)
positionsOnEachSide = 20

# lookback for historical data to calculate factors
lookback = 252

# select the leverage factor
leverageFactor = 1

We first need to select how often we want to rebalance the portfolio (i.e. recalculate all factors and portfolio weights). Here you can choose among a number of date rules such as Expiry.EndOfMonth.
At every rebalancing, the algorithm will create a dynamic universe of stocks based on DollarVolume and MarketCap. The factors are calculated on this universe (nStocks), and the top and bottom stocks are ultimately selected to trade (positionsOnEachSide).
Finally, we need to decide how much historical data we want for our factor calculations using lookback.
There is also a leverageFactor parameter that can be used to modify the account leverage.
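To illustrate how positionsOnEachSide and leverageFactor could interact, here is a small pandas sketch that picks the top and bottom names by combined factor and assigns target weights. Equal weighting and a dollar-neutral split are assumptions made for this example; the article does not describe the weighting the algorithm actually uses, and `long_short_weights` is a hypothetical helper.

```python
import pandas as pd

def long_short_weights(combined_factor, positions_on_each_side, leverage_factor=1):
    """Equal-weight the top names long and the bottom names short (dollar neutral)."""
    ranked = combined_factor.sort_values(ascending=False)
    longs = ranked.index[:positions_on_each_side]
    shorts = ranked.index[-positions_on_each_side:]
    # Half of the gross exposure goes to each side
    weight = 0.5 * leverage_factor / positions_on_each_side
    weights = pd.Series(0.0, index=combined_factor.index)
    weights[longs] = weight
    weights[shorts] = -weight
    return weights

# Toy combined factor values for six hypothetical stocks
combined_factor = pd.Series(
    {"A": 1.2, "B": -0.3, "C": 0.8, "D": -1.5, "E": 0.1, "F": -0.9}
)
weights = long_short_weights(combined_factor, positions_on_each_side=2, leverage_factor=1)
```

Under these assumptions the weights sum to zero and their absolute values sum to the leverage factor.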

Algorithm Framework - classSymbolData.py

The Backtesting Algorithm has been designed to make it easy to quickly add and remove the factors previously analysed in the notebook. We only need to add the function that calculates the factor to the SymbolData class. For example, have a look at how we add the Momentum and Volatility factors from before.

def CalculateMomentum(self, history):
    closePrices = history.loc[self.Symbol]['close']
    momentum = (closePrices[-1] / closePrices[-252]) - 1
    return momentum

def CalculateVolatility(self, history):
    closePrices = history.loc[self.Symbol]['close']
    returns = closePrices.pct_change().dropna()
    volatility = np.nanstd(returns, axis = 0)
    return volatility

You can add as many functions as you want and then include or exclude them from the strategy by commenting out the function call here. Note how a few functions for other factors are commented out to leave them out of the strategy.

def CalculateFactors(self, history, fundamentalDataBySymbolDict):
    self.fundamentalDataDict = fundamentalDataBySymbolDict[self.Symbol]
    self.momentum = self.CalculateMomentum(history)
    self.volatility = self.CalculateVolatility(history)
    #self.skewness = self.CalculateSkewness(history)
    #self.kurt = self.CalculateKurtosis(history)
    #self.distanceVsHL = self.CalculateDistanceVsHL(history)
    #self.meanOvernightReturns = self.CalculateMeanOvernightReturns(history)

And at the end we just need to add the chosen factors to the property factorsList:

@property
def factorsList(self):
    technicalFactors = [self.momentum, self.volatility]

Algorithm Framework - HelperFunctions.py

In the Research Notebook we could only add factors created using OHLCV data for the reasons already stated above. However, the Backtesting Algorithm allows us to add fundamental factors very easily by using the GetFundamentalDataDict function. As you can see below, we create a dictionary containing the fundamental ratio and its desired direction in the model (1 for a positive effect or -1 for a negative one). For a list of all the fundamental data available in QuantConnect please refer to this page https://www.quantconnect.com/docs/data-library/fundamentals

# dictionary of symbols containing factors and the direction of the factor (1 for sorting descending and -1 for sorting ascending)
fundamentalDataBySymbolDict[x.Symbol] = {
    #fundamental.ValuationRatios.BookValuePerShare: 1,
    #fundamental.FinancialStatements.BalanceSheet.TotalEquity.Value: -1,
    #fundamental.OperationRatios.OperationMargin.Value: 1,
    #fundamental.OperationRatios.ROE.Value: 1,
    #fundamental.OperationRatios.TotalAssetsGrowth.Value: 1,
    #fundamental.ValuationRatios.PERatio: 1
}
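The role of the direction value can be sketched as a sign flip applied after standardization, so that a higher score always means "more attractive". The column names, toy values and the choice of -1 for the P/E ratio below are hypothetical and only illustrate the idea; this is not the code in HelperFunctions.py.

```python
import pandas as pd

# Hypothetical fundamental values for three stocks
fundamentals_df = pd.DataFrame(
    {"ROE": [0.10, 0.20, 0.30], "PERatio": [30.0, 20.0, 10.0]},
    index=["AAA", "BBB", "CCC"],
)

# 1 keeps the natural ordering, -1 reverses it (lower raw value scores higher)
directions = {"ROE": 1, "PERatio": -1}

# Z-score each column across stocks, then flip the sign where the direction is -1
zscores = (fundamentals_df - fundamentals_df.mean()) / fundamentals_df.std(ddof=0)
aligned = zscores * pd.Series(directions)
```

In this toy example CCC has both the highest ROE and the lowest P/E, so after alignment it scores highest on both columns.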

Finally, very much like we did in the Research Notebook, in the Backtesting Algorithm we can also give different weights to each factor to create a combined factor. We do that using the GetLongShortLists function in the HelperFunctions.py script as per below.

normFactorsDf['combinedFactor'] = normFactorsDf['Factor_1'] * 1 + normFactorsDf['Factor_2'] * 1

CLONE THE ALGORITHM

And that was it! Now you can clone the below algorithm into your QuantConnect account and start playing with the different features yourself. The algorithm also has a number of interesting backtesting charts such as Drawdown and Total Portfolio Exposure %, so remember to activate those on the Select Chart box (top right corner of backtesting page).

-------------

Thank you!

Emilio

InnoQuantivity.com

======================


Hello Emilio,

Awesome work with the factor-based modeling! I was wondering if there was a way to include the stocks in the SP500 on a rolling basis. I noticed that the constituents are current and don't take into account delistings! Is there a way to include on a rolling basis the stocks that are part of the index?

Also awesome work with the risk decompositions! I really enjoyed playing around with the notebook!

Thank you all for the kind comments!

Ben Zhang Unfortunately QuantConnect does not support dynamic universe selection in the Research Notebook as of now, only in the backtesting engine. I believe it's a feature in their pipeline though. In order to build this notebook I had to provide a static list of current SP500 tickers for now, but hopefully in the near future we will be able to create universes like the rolling SP500 index in the Research Notebook and this product will become more powerful!

Thanks,

Emilio

Spacetime


amazing! thank you!

Sebastian Brocklesby


This is great. Thank you.

Rémy Heinis


That's really great work thank you !

Laurent Crouzet


Thanks a lot, Emilio! The QC community will most probably profit from the sharing of such a good research!

Ben Zhang


Emilio Freire


Vladimir


A very detailed study deserves a special reward.
I have back tested the strategy from 2007, 6, 1.
It works, like most long-short strategies, only at the beginning of bear markets.
I think Long - Bond strategy will perform much better.
Try that.

Thanks Vladimir !

I will test it out.

It will be good to have this simple Stock-Bond Portfolio as a benchmark to surpass.

Emiliano Fraticelli


Thank you so much @emilio

Erol Aspromatis


This is amazing work Emilio and I will be spending my Holiday going through this model. I'm particularly interested in leveraging this for factor investing.

Your strategy here has done such an amazing job outperforming the other traditional five FF factors, particularly the HML (value factor) in the backtest. The momentum factor has obviously been huge in hindsight, but my hypothesis is that we will eventually see some type of mean reversion, which could indicate that the value factor will perhaps do better in the future.

Anyway, this gives me a great framework to start testing and developing.

Thank you Erol Aspromatis !

Great to see this project is helping the community.

Please do not hesitate to ask questions here if needed about the code and overall algo. I will try my best to answer them :)

Enjoy your holidays!

ASHOK MEHTA


Hi Emilio, This is a very good description of using a research notebook. I am very new to QuantConnect, hence I don't yet know how I can see the code in your notebook. It looks like there is a link to clone the algorithm, but nothing to clone the notebook. If you don't mind, can you please share the notebook? Thanks

Ashok

Hi ASHOK MEHTA !

The research notebook is part of the algorithm as you can see in the below screenshot. When you click on Clone Algorithm you will get a copy of the whole project in your QuantConnect account. To open the notebook simply open the research.ipynb file!

Fabmei


Hello Emilio,

this is great for learning and the detailed description helps a lot with creating and working with dataframes. Thank you! One question regarding the universe selection: I can't seem to figure out where the tickers are added - maybe missing the forest for the trees there. An answer would help a lot working with the algo and trying out different universes. Could you please give some description on that as well?

Best, Fabmei

Hi!

How would I test factors based on fundamentals, for instance Free Cash Flow Yield, using this notebook? It seems you build the CustomFactor using close price and later volatility, all things that can be built from the OHLCV df. I'm worried that in order to test factors like the FCF one I mentioned, I would have to build a strategy on QC, export the return series via .json, upload it to Dropbox, and then finally import the return series into the notebook.

Am I wrong?

Thanks

Derek Melchin

STAFF Pro ,

Hi Fabmei,

The tickers are subscribed to through a custom coarse-fine universe selection model. In the Initialize method in main.py, we see

self.SetUniverseSelection(FactorModelUniverseSelectionModel(benchmark = benchmark,
                                                            nStocks = nStocks,
                                                            lookback = lookback,
                                                            maxNumberOfPositions = maxNumberOfPositions,
                                                            rebalancingFunc = rebalancingFunc))

Refer to our documentation on Universe Selection for more information.

Hi Emiliano,

When switching the factor used in the notebook above, we don't need to run a backtest. The `GetFactorsDf` method in the notebook above returns a DataFrame with the technical factor values at each timestep. The notebook below demonstrates how we can get the FCF fundamental factor into a similar timeseries DataFrame. To merge this fundamental factor logic into the factor analysis notebook above, we just need to define a method like `GetFactorsDf` that uses fundamental factors.

Best,
Derek Melchin

EpsilonOcean


Excellent post. There's a future leak in the presented factor (I had to figure out why performance was so spectacular).

If you're using close prices to create the factor, using open to evaluate it and shifting open prices backwards creates a future leak.

Thanks for the framework, will use it.

Hi EpsilonOcean !

Thanks a lot for that. You're right. There's a half-day look ahead! I was trying to replicate the behaviour of the backtest but I guess the only way is to use close prices to evaluate the factor and that's it? In a backtest we wouldn't get the close to close return but rather open to close return so I was trying to achieve that and introduced the look ahead in the process.

You fixed it by just using 'close' instead of 'open', right? Can you think of any other way to get closer to backtest behaviour?

Thanks!

Hey, yep I just used Close not Open... So I guess we would just have to be mindful of what data is going in and coming out when using the factor scripts.

I don't think you necessarily need to match the backtest perfectly when looking at a factor in research, it's just a really great way to see if there is a predictive component in an idea - then we can combine/alter etc as we wish before the backtest. Quantile spreads are handy too - as a check that it will beat costs.

There have been a few papers recently replicating academic factor studies I'm keen to check out with your framework!
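The look-ahead discussed in this exchange can be seen on a toy price series. The sketch below compares three ways of computing an n-day forward return for a factor that already uses the close of day t. The helper `forward_return_variants` is hypothetical, and the next-day-open variant is only one possible alternative that was not proposed in this thread; the fix actually mentioned above is simply switching the field to 'close'.

```python
import pandas as pd

def forward_return_variants(open_, close, n):
    """Three n-day forward returns for a factor computed with data up to the close of day t."""
    return pd.DataFrame(
        {
            # Starts at the open of day t: overlaps with the intraday move the factor has seen
            "open_same_day": open_.shift(-n) / open_ - 1,
            # Starts at the open of day t+1: the first price tradable after the close of day t
            "open_next_day": open_.shift(-(n + 1)) / open_.shift(-1) - 1,
            # Close of day t to close of day t+n
            "close_to_close": close.shift(-n) / close - 1,
        }
    )

# Toy prices: the only move is a 10% intraday jump on row 1 (open 100, close 110)
open_ = pd.Series([100.0, 100.0, 110.0, 110.0, 110.0])
close = pd.Series([100.0, 110.0, 110.0, 110.0, 110.0])

variants = forward_return_variants(open_, close, n=1)
```

On row 1 the same-day-open return shows +10% even though that jump was already known at the close used by the factor, while the other two variants correctly show 0%.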
