This tutorial performs a simple linear regression to build the Capital Asset Pricing Model (CAPM), a classical model developed by William F. Sharpe that builds on Harry Markowitz's portfolio theory. The model yields an alpha and a beta for each asset, and the strategy trades it by going long the stocks with the highest alpha. This tutorial will demonstrate the following:

The implementation of the strategy demonstrates that stocks that beat the market last month are likely to beat the market again in the subsequent month. This algorithm performs well when the market is smooth. However, when market volatility increases, the model fails to capture alpha and performs poorly. What we learn from this is that market fluctuations reduce the statistical significance of the linear regression coefficients, especially when we use daily returns to fit the model.

CAPM Theory

The capital asset pricing model (CAPM) describes the relationship between systematic risk and expected return for assets, typically stocks. The formula for calculating the expected return of an asset given its risk is as follows:

\[r_a = r_f + \beta_a*(r_m - r_f) + \epsilon \]


\[r_a = \text{Return of the asset}\]

\[r_f = \text{Risk-free rate}\]

\[\beta_a = \text{Beta of the security}\]

\[r_m = \text{Expected market return}\]

\[\epsilon = \text{Tracking error}\]

This formula is easier to understand if we rearrange it as follows:

\[(r_a - r_f ) = \beta_a*(r_m - r_f) + \epsilon \]

The left side of the equation gives us the difference between the asset return and the risk-free rate, the "excess return". If we regress the asset excess return on the market excess return, the slope represents the "beta" of the asset. Therefore, beta can also be calculated by the equation:

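\[\beta = \frac{Cov(r_a,r_m)}{Var(r_m)}\]

Because the covariance equals the correlation times both standard deviations, and the variance is the squared standard deviation of the market return, beta can also be written as:

\[\beta = \rho_{a,m}*\frac{\sigma_a}{\sigma_m}\]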

The formula above indicates that beta can be explained as "correlated relative volatility". In practice, beta can be estimated as the slope of a simple linear regression in which the market return is the factor that explains the asset's return, and the return left unexplained by the market (the tracking error term) represents alpha. To make this theory more convenient for our algorithm, we rewrite the CAPM formula in the following form:

\[r_a = \beta*r_m + r_f*(1-\beta) + \epsilon\]

The term r_f*(1-β) on the right-hand side of the equation is very small, making it negligible in the context of the Dow 30 companies. If we regress a stock's return on the return of the benchmark, the slope and intercept will be beta and alpha, respectively.
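The equivalence between the regression slope and the two beta formulas can be checked numerically. The following is a minimal sketch on synthetic data, not part of the algorithm: the return series, the "true" alpha and beta, and the noise levels are all made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic daily returns: 21 observations, matching a one-month lookback
true_alpha, true_beta = 0.001, 1.2
market = rng.normal(0.0005, 0.01, size=21)
asset = true_alpha + true_beta * market + rng.normal(0.0, 0.002, size=21)

# 1) Regress asset returns on market returns: slope = beta, intercept = alpha
beta_ols, alpha_ols = np.polyfit(market, asset, deg=1)

# 2) Covariance over variance (same ddof in both, so the factor cancels)
beta_cov = np.cov(asset, market, ddof=1)[0, 1] / np.var(market, ddof=1)

# 3) Correlation times relative volatility
rho = np.corrcoef(asset, market)[0, 1]
beta_rho = rho * np.std(asset, ddof=1) / np.std(market, ddof=1)

print(f"alpha={alpha_ols:.5f}")
print(f"beta: ols={beta_ols:.4f}, cov/var={beta_cov:.4f}, rho*sigma ratio={beta_rho:.4f}")
```

All three beta values agree up to floating-point rounding, while the estimated alpha and beta differ from the values used to generate the data because only 21 noisy observations are available.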


Our investment logic is simple and straightforward. We assume that stocks which beat the market last month will continue to beat the market. We rank stocks according to their alpha, and each month we "long" the top two stocks. For this strategy to work, we need to do the following at the start of each month:

  • Get the historical price of Dow 30 stocks in the past 21 trading days and calculate their daily rates of return.
  • Conduct simple linear regression on the return of each stock against a benchmark (S&P 500 index, SPY).
  • Rank the stocks by their intercepts.
  • Liquidate all our positions and purchase the first 2 stocks in our sorted list.

Dow Jones components change very infrequently, with the last change being on March 19th, 2015. To make the implementation easier we have simply listed the current Dow components in this algorithm. This means that the earliest start date of this algorithm is March 19th, 2015.

Step 1: Setup Event Handler

In the Initialize method we define a Scheduled Event to trigger a monthly re-balancing of the portfolio.

def Initialize(self):

Step 2: History Function

Each month we get the historical prices of the Dow 30 components using the History API. The data is returned from the API as a pandas.DataFrame indexed by Symbol objects. The close data is selected and the data frame is unstacked so that each column holds the closing prices of one Symbol.

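# Fetch the historical data to perform the linear regression
# (assumed completion of the truncated call: 21 daily bars, close prices, one column per symbol)
history = self.History(
    self.symbols + [self.benchmark],
    21,
    Resolution.Daily).close.unstack(level=0)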
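The reshaping described in this step can be tried outside the QuantConnect runtime. The sketch below uses a hand-made frame with hypothetical tickers and prices, and it assumes a two-level index with the symbol as the first level, so that unstacking level 0 moves the symbols into columns.

```python
import numpy as np
import pandas as pd

# Hypothetical tickers and prices, for illustration only
dates = pd.bdate_range("2016-01-04", periods=4)
index = pd.MultiIndex.from_product([["AAPL", "SPY"], dates], names=["symbol", "time"])
long_df = pd.DataFrame(
    {"close": [100.0, 101.0, 99.0, 102.0, 200.0, 202.0, 201.0, 204.0],
     "volume": np.arange(8) * 1000},
    index=index,
)

# Select the close column and pivot the symbol level into columns
closes = long_df.close.unstack(level=0)

# Daily rate of return per column; the first row is NaN and is dropped
returns = closes.pct_change().dropna()
print(returns)
```

The result has one row per trading day and one column per symbol, which is the layout the regression step expects.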

Step 3: Symbol Selection Function

We aim to trade the two assets with the highest alpha relative to the benchmark. To find the alpha (the intercept of the linear regression), we compute the returns (percentage change of the closing price) of the benchmark and of each asset, then regress the asset returns on the benchmark returns.

def SelectSymbols(self, history):
    '''Select symbols with the highest intercept/alpha to the benchmark'''
    # Requires `import numpy as np` at the top of the algorithm file
    alphas = dict()

    # Get the benchmark returns and build the design matrix [r_m, 1]
    benchmark = history[self.benchmark].pct_change().dropna()
    design = np.vstack([benchmark, np.ones(len(benchmark))]).T

    # Conduct a linear regression for each symbol and save the intercept/alpha
    for symbol in self.symbols:

        # Get the security returns
        returns = history[symbol].pct_change().dropna()

        # Regress the security returns on the benchmark returns:
        # result[0][0] is the slope (beta), result[0][1] is the intercept (alpha)
        result = np.linalg.lstsq(design, returns, rcond=None)
        alphas[symbol] = result[0][1]

    # Select the two symbols with the highest intercept/alpha to the benchmark
    selected = sorted(alphas.items(), key=lambda x: x[1], reverse=True)[:2]
    return [x[0] for x in selected]
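The ranking logic can be sanity-checked without the QuantConnect runtime. The following sketch is a standalone variant, not the algorithm's own method: the function name select_top_alpha, the tickers and the synthetic prices are hypothetical. Each asset is generated with a known daily alpha, so we can see whether the two largest are picked.

```python
import numpy as np
import pandas as pd

def select_top_alpha(closes, benchmark, top_n=2):
    """Rank columns of `closes` by regression intercept against `benchmark`."""
    returns = closes.pct_change().dropna()
    x = returns[benchmark].to_numpy()
    design = np.vstack([x, np.ones(len(x))]).T  # columns: market return, constant
    alphas = {}
    for name in returns.columns:
        if name == benchmark:
            continue
        coef, *_ = np.linalg.lstsq(design, returns[name].to_numpy(), rcond=None)
        alphas[name] = coef[1]  # coef[0] is beta, coef[1] is alpha
    ranked = sorted(alphas, key=alphas.get, reverse=True)
    return ranked[:top_n], alphas

# Synthetic data: each asset is built with a known daily alpha
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.01, size=21)
built_in_alpha = {"A": 0.003, "B": -0.002, "C": 0.001, "D": 0.0}
rets = pd.DataFrame({"MKT": market})
for name, a in built_in_alpha.items():
    rets[name] = a + 1.1 * market + rng.normal(0.0, 0.0005, size=21)

# Turn returns into price paths; the leading row of 100 lets pct_change recover all 21 returns
closes = 100 * (1 + rets).cumprod()
closes = pd.concat([pd.DataFrame(100.0, index=[-1], columns=closes.columns), closes])

selected, alphas = select_top_alpha(closes, "MKT")
print(selected)
```

With this low noise level the function returns the two assets generated with the largest alphas. Raising the noise scale makes the ranking unstable, which is the weakness discussed at the end of this tutorial.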

Step 4: Rebalance Function

This function is where all the action happens. It is executed on the first trading day of each month as a Scheduled Event. The algorithm closes all positions in securities that were not selected using Liquidate and goes 100% long in each of the selected symbols using SetHoldings.

def Rebalance(self):

    # Fetch the historical data to perform the linear regression
    # (assumed completion of the truncated call: 21 daily bars, close prices, one column per symbol)
    history = self.History(
        self.symbols + [self.benchmark],
        21,
        Resolution.Daily).close.unstack(level=0)

    symbols = self.SelectSymbols(history)

    # Liquidate positions that are not held by selected symbols
    for holdings in self.Portfolio.Values:
        symbol = holdings.Symbol
        if symbol not in symbols and holdings.Invested:
            self.Liquidate(symbol)

    # Invest 100% in each of the selected symbols
    for symbol in symbols:
        self.SetHoldings(symbol, 1)
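The bookkeeping of a rebalance can be sketched in plain Python. This is an illustration under stated assumptions rather than QuantConnect code: plan_rebalance and the tickers are hypothetical names. It shows which positions are closed and what the total target exposure is when each selected symbol receives a weight of 1.

```python
def plan_rebalance(invested, selected, weight=1.0):
    """Return (symbols to liquidate, target weights) for one rebalance.

    `invested` is the set of symbols currently held,
    `selected` the symbols chosen this month.
    """
    to_liquidate = sorted(s for s in invested if s not in selected)
    targets = {s: weight for s in selected}
    return to_liquidate, targets

# Hypothetical tickers
to_liquidate, targets = plan_rebalance({"AAA", "BBB"}, ["BBB", "CCC"])
gross_exposure = sum(targets.values())
print(to_liquidate, targets, gross_exposure)
```

Two symbols at a weight of 1 each add up to a target exposure of 2, that is, 200% of portfolio value, so the strategy as written relies on leverage.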


We have demonstrated that during a smooth market, the stocks that beat the market last month are likely to beat the market again in the subsequent month. When the market fluctuates, the statistical significance of the regression coefficients falls and the model's performance deteriorates. We can understand this by looking at the covariance of the benchmark (x) and the asset (y). As the covariance shrinks toward zero, the estimated beta shrinks with it.

\[\hat{\beta} = \frac{Cov[x,y]}{Var[x]} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\]
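To see how noise affects the significance of the coefficients, we can compare the standard errors of the same regression under two noise levels. The sketch below uses synthetic data with x as the regressor and y as the response. The function name, the alpha and beta values and the noise scales are assumptions chosen for illustration.

```python
import numpy as np

def ols_with_std_errors(x, y):
    """Fit y = alpha + beta * x and return (alpha, beta, se_alpha, se_beta)."""
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    beta = np.sum((x - x_bar) * (y - y_bar)) / sxx
    alpha = y_bar - beta * x_bar
    residuals = y - (alpha + beta * x)
    s2 = np.sum(residuals ** 2) / (n - 2)          # residual variance
    se_beta = np.sqrt(s2 / sxx)
    se_alpha = np.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return alpha, beta, se_alpha, se_beta

rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.01, size=21)                 # 21 daily returns of the regressor
noise = rng.normal(0.0, 1.0, size=21)

for label, scale in [("calm", 0.002), ("volatile", 0.02)]:
    y = 0.001 + 1.0 * x + scale * noise            # same alpha and beta, different noise
    alpha, beta, se_a, se_b = ols_with_std_errors(x, y)
    print(f"{label:8s} alpha={alpha:+.4f} (se {se_a:.4f})  beta={beta:+.3f} (se {se_b:.3f})")
```

Because the same noise draw is scaled by ten, both standard errors grow tenfold in the volatile case, so an alpha of the same size becomes much harder to distinguish from zero with only 21 observations.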

As an experiment, we tested the algorithm on market data from 2015. This was a much more volatile period: the market's mean return was close to zero, and it dropped nearly 10% from Aug 18th to Aug 25th of that year. The algorithm performed quite poorly in this year, with a return of -15.463%. The risks associated with this strategy include a high drawdown, a lack of hedging and no stop-loss. Since we are using leverage, the risk is increased, and the algorithm receives a margin call in January as a result. We can improve the performance by applying the following techniques:

  • Conduct optimizations: we can implement mean-variance analysis to determine the asset allocation each month and select more stocks to trade. This will lower our risk and manage the portfolio more scientifically.
  • Take beta into consideration: If we want to be more aggressive, we can select targets by a combination of alpha and beta. This means we choose stocks with a high alpha that are more volatile than the market. If we are conservative investors however, we can make the strategy market-neutral, which means the portfolio would not be affected by the market performance. For example, if we long two stocks with beta 1 and -1 respectively at the same position size, our portfolio becomes market-neutral.
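The market-neutral example in the last bullet can be verified with a short calculation, since the beta of a portfolio is the weighted sum of the betas of its holdings. The asset names and weights below are hypothetical.

```python
def portfolio_beta(weights, betas):
    """Weighted sum of asset betas; weights are fractions of portfolio value."""
    return sum(weights[name] * betas[name] for name in weights)

# Hypothetical assets with betas of 1 and -1, held at equal position size
betas = {"X": 1.0, "Y": -1.0}
weights = {"X": 0.5, "Y": 0.5}
print(portfolio_beta(weights, betas))
```

The result is zero, so the portfolio has no net exposure to market moves under the CAPM.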

