Mean-Reversion Statistical Arbitrage Strategy in Stocks

In this post we take a close look at a statistical arbitrage strategy based on principal component analysis (PCA), derived from the paper "Statistical Arbitrage in the U.S. Equities Market."

Statistical arbitrage strategies use mean-reversion models to take advantage of pricing inefficiencies between groups of correlated securities. This class of short-term trading strategies produces moves that can be contrarian to the broader market movement, and it is often discussed in conjunction with pairs trading. In this algorithm, we use a PCA-based approach rather than an ETF-based approach to limit our universe of stocks. The paper's backtests over 1997-2007 motivate this choice: the PCA-based strategies achieved higher Sharpe ratios than the ETF-based strategies.

Step 1: Select our universe

We select our universe of stocks by dropping securities priced at or below $5 and then picking the ones with the highest dollar volume.

# Keep equities priced above $5 and sort them by DollarVolume in descending order
selected = sorted([x for x in coarse if x.Price > 5],
                  key=lambda x: x.DollarVolume, reverse=True)
symbols = [x.Symbol for x in selected[:self.num_equities]]
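To see the selection logic in isolation, here is a minimal sketch that runs outside the platform. The `CoarseRow` tuple, the `select_universe` helper, and the sample rows are hypothetical stand-ins for illustration; only the `Symbol`, `Price`, and `DollarVolume` field names come from the snippet above.

```python
from collections import namedtuple

# Hypothetical stand-in for one row of coarse universe data.
CoarseRow = namedtuple("CoarseRow", ["Symbol", "Price", "DollarVolume"])

def select_universe(coarse, num_equities, min_price=5):
    """Keep stocks priced above min_price, then take the most liquid ones."""
    liquid = sorted((x for x in coarse if x.Price > min_price),
                    key=lambda x: x.DollarVolume, reverse=True)
    return [x.Symbol for x in liquid[:num_equities]]

# Made-up rows, not real market data.
coarse = [
    CoarseRow("AAA", 12.0, 5e8),
    CoarseRow("BBB", 3.5, 9e8),   # dropped: price is not above $5
    CoarseRow("CCC", 48.0, 7e8),
    CoarseRow("DDD", 20.0, 1e8),
]
symbols = select_universe(coarse, num_equities=2)
print(symbols)  # ['CCC', 'AAA']
```

Note that `BBB` has the highest dollar volume but is excluded by the price filter before the sort is applied.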

Step 2: Reduce dimensions to three principal components

We want to minimize our algorithm's exposure to market factors. PCA is a procedure that extracts uncorrelated components from a possibly correlated set of observations to reveal the factors that contribute most to the variance of the observations as a whole. Applying PCA to the data above lets us reduce dimensionality and select the most relevant market factors to shape our asset universe. Based on the results in the cited paper, and for the sake of demonstration, we chose 3 components to account for the bulk of the variance. In our algorithm, the 3 principal components are computed from the log of the historical close prices.

  # Sample data for PCA: log of the close prices, dropping stocks with missing history
  sample = np.log(history.dropna(axis=1))
  sample -= sample.mean() # Center it column-wise

  # Fit the PCA model for sample data
  model = PCA().fit(sample)

  # Get the first n_components factors
  factors = np.dot(sample, model.components_.T)[:,:self.num_components]
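The projection step can be tried on synthetic data. The sketch below is an illustration under stated assumptions, not the algorithm's code: it swaps scikit-learn's `PCA` for NumPy's SVD, which for column-centered data should give the same principal directions up to sign, and the random-walk prices, sizes, and variable names are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_stocks, n_components = 250, 20, 3

# Synthetic log prices: three shared random-walk factors plus stock-specific noise.
common = rng.normal(0, 0.01, size=(n_days, n_components)).cumsum(axis=0)
loadings = rng.normal(1.0, 0.5, size=(n_components, n_stocks))
noise = rng.normal(0, 0.002, size=(n_days, n_stocks))
sample = common @ loadings + noise
sample -= sample.mean(axis=0)  # center column-wise, as in the snippet above

# Rows of vt are the principal directions, ordered by explained variance.
_, singular_values, vt = np.linalg.svd(sample, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Project the centered data onto the first n_components directions.
factors = sample @ vt[:n_components].T

print(factors.shape)                   # (250, 3)
print(explained[:n_components].sum())  # share of total variance captured
```

Because the synthetic prices are driven by three shared factors, the first three components should capture nearly all of the variance; real price histories will generally be less clean.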

Step 3: Measure price deviation

We model each asset's mean-reverting residual relative to a regression on the principal-component factors. Linear regression gives us a residual series for each stock, and we derive the stock's portfolio weight from its price deviation, as measured by that residual. If the absolute value of a stock's residual is large, the price deviation is high and we give the stock more weight in the portfolio; if it is small, the stock gets less weight. To make the residuals comparable across stocks, we first standardize them into z-scores, so that a larger absolute z-score means a larger deviation. We then keep only the stocks whose most recent z-score is below -1.5 and scale those z-scores by the inverse of the sum of their absolute values, so that the absolute values of the weights sum to one.

    # Train Ordinary Least Squares linear model for each stock
    OLSmodels = {ticker: sm.OLS(sample[ticker], factors).fit() for ticker in sample.columns}

    # Get the residuals from the linear regression after PCA for each stock
    resids = pd.DataFrame({ticker: model.resid for ticker, model in OLSmodels.items()})

    # Standardize the residuals to get z-scores and keep the most recent day
    zscores = ((resids - resids.mean()) / resids.std()).iloc[-1] # z-scores of the most recent day

    # Get the stocks far from mean (for mean reversion)
    selected = zscores[zscores < -1.5]

    # Return the weights for each selected stock
    weights = selected * (1 / selected.abs().sum())
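The regression and weighting steps above can be sketched with NumPy alone. This is a hedged illustration: `np.linalg.lstsq` is swapped in for the statsmodels OLS fit (both fit without an intercept here), and the helper names `latest_zscores` and `weights_from_zscores`, along with the sample numbers, are hypothetical. The weights come out negative because the selected z-scores are negative; how that sign maps to an order direction is not shown in the snippet above and is left open here.

```python
import numpy as np

def latest_zscores(sample, factors):
    """Regress each column of sample on factors and z-score the last residual."""
    coef, *_ = np.linalg.lstsq(factors, sample, rcond=None)
    resids = sample - factors @ coef
    # ddof=1 matches the default of pandas' DataFrame.std()
    z = (resids - resids.mean(axis=0)) / resids.std(axis=0, ddof=1)
    return z[-1]

def weights_from_zscores(zscores, threshold=-1.5):
    """Scale the z-scores below threshold so their absolute values sum to 1."""
    selected = zscores[zscores < threshold]
    if selected.size == 0:
        return selected  # nothing is far enough below its mean
    return selected / np.abs(selected).sum()

# Hypothetical z-scores for five stocks on the most recent day.
z = np.array([-2.0, 0.3, -3.0, 1.8, -1.0])
weights = weights_from_zscores(z)
print(weights)  # [-0.4 -0.6]

# Random inputs just to exercise the regression step.
rng = np.random.default_rng(1)
factors = rng.normal(size=(120, 3))
sample = factors @ rng.normal(size=(3, 8)) + rng.normal(0, 0.1, size=(120, 8))
latest = latest_zscores(sample, factors)
print(latest.shape)  # (8,)
```

Only the two stocks with z-scores of -2.0 and -3.0 pass the threshold, and the stock with the larger deviation receives the larger absolute weight.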

Results

In our algorithm, the portfolio is rebalanced every 30 days, and the backtest runs from January 2010 to August 2019. Over this period of nearly 10 years, the strategy returned more than 7% per year with a maximum drawdown of around 40%. This performance suggests that using PCA combined with linear regression to measure the deviation level is reasonable. To tune the model, we could expand our universe of stocks beyond the current 20 equities or use more principal components. We could also measure the level of deviation in another way or change the rebalancing frequency of the algorithm (30 days in this example).
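For readers who want to compute the two headline metrics on their own equity curve, here is a minimal sketch using their common definitions. The function names and the equity values are hypothetical and are not taken from the backtest; the platform's own statistics may be calculated differently.

```python
import numpy as np

def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / running_peak))

def annualized_return(equity, years):
    """Compound annual growth rate implied by the first and last values."""
    return float((equity[-1] / equity[0]) ** (1.0 / years) - 1.0)

# Hypothetical equity curve, not the backtest's.
equity = [100.0, 120.0, 72.0, 90.0, 200.0]
print(max_drawdown(equity))           # the fall from 120 to 72 is a 40% drawdown
print(annualized_return(equity, 10))  # doubling over 10 years is roughly 7.2% per year
```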

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.

DotKinesis

Thanks for sharing. two thumbs up

Daniel Chen INVESTOR
