Author: Jing Wu, 2018-06-11


In this tutorial we use regression to predict the stock market return and compare it to the short-term U.S. T-bill rate. The strategy is based on the paper "Striking Oil: Another Puzzle?" by Gerben Driesprong, Ben Jacobsen, and Benjamin Maat (2007). If the predicted return is higher than the risk-free rate, the portfolio is fully invested in stocks; if it is lower, the portfolio is invested in short-term U.S. T-bills. In the paper, the backtesting period starts in 1980 and is divided into an in-sample period, where the regression analysis is performed, and an out-of-sample period, where the regression result is embedded "statically" into the strategy.

In our implementation we adapt the method of the original paper to make it more applicable to the current market. We set the backtesting period to run from 2010 to 2017 and refresh the regression each month to form a rolling, dynamic projection, because empirical evidence shows that the correlation between oil and stocks is not as strong as it was in the 1980s. We use the price of S&P GSCI Crude Oil Total Return Index ETNs to represent the spot oil price. We import the oil price and T-bill data from Nasdaq Data Link. We use Scheduled Events to trigger an event automatically every month and the History function to retrieve data for the regression analysis.

Our analysis shows that this strategy underperforms the market in recent years. Over the 9-year analysis period the algorithm was mostly long the S&P 500 index and performed only 9 trades, as the markets were strongly bullish. The trades that did occur may simply be a product of the weakened relationship between stocks and oil.


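We assume the expected stock return is a linear function of the previous month's oil return. This can be represented by the regression equation:

$$r_t = \alpha + \beta \, r^{oil}_{t-1} + \varepsilon_t$$

where $r_t$ is the stock return in month $t$, $r^{oil}_{t-1}$ is the oil return in the previous month, $\alpha$ is the intercept, $\beta$ is the slope, and $\varepsilon_t$ is the error term.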

The independent variable is the oil return and the dependent variable is the stock return. We use monthly returns over a regression period of 2 years, which gives 22-23 observations. The regression is re-estimated every month, and we use the estimated coefficients to compute the expected stock return from the most recent oil return.
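As a quick check on the observation count, the sketch below uses made-up prices (the variable names are illustrative and not taken from the strategy code) to show how 24 month-end prices yield 23 monthly returns and 22 lagged regression pairs.

```python
import numpy as np

# Hypothetical month-end prices: 24 months, roughly a two-year window.
prices = np.linspace(100.0, 123.0, 24)

# Simple returns r_t = P_t / P_{t-1} - 1; differencing loses one observation.
returns = np.diff(prices) / prices[:-1]

# Pairing each return with the previous month's return loses one more.
lagged_x = returns[:-1]   # predictor: return in month t-1
target_y = returns[1:]    # response: return in month t

n_prices, n_returns, n_pairs = len(prices), len(returns), len(lagged_x)
print(n_prices, n_returns, n_pairs)
```

The exact count in a live backtest depends on how many month-ends fall inside the history window, which is why the text quotes a range.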


The algorithm implementation consists of three main parts: defining the custom imported data, initializing the strategy parameters, and re-balancing the portfolio monthly.

Step 1: Defining Custom Imported Data

We import the oil price and T-Bill data from Nasdaq Data Link, a marketplace for financial, economic and alternative data. In our Initialize function, we use the following commands to add the custom data to our algorithm.

self.oil = self.AddData(NasdaqDataLink, "OPEC/ORB").Symbol
self.tbill = self.AddData(NasdaqDataLink, "USTREASURY/BILLRATES").Symbol

Step 2: Initialization of the Strategy Parameters

In our Initialize function we set the cash amount, the start and end dates, and other parameters specific to this strategy. We set a parameter for the regression analysis period:

self.regPeriod = timedelta(days=2*365)

The variable regPeriod indicates the period of time we are going to take into consideration in our regression analysis. We need to set up the Scheduled Event in Initialize so as to trigger the monthly re-balancing function every month.

self.Schedule.On(self.DateRules.MonthStart(self.spy), self.TimeRules.At(0, 0), self.MonthlyReg)

Step 3: Monthly Re-balancing of the Portfolio

Every month we reconstruct the regression analysis to determine whether to be 100% long stocks or T-Bill contracts. We perform this re-balancing in the MonthlyReg function at the start of each month. We use the History function to retrieve historical data for oil and stocks and then divide the T-Bill rate by 12 to make it comparable to the monthly expected return of stocks.

hist = self.History([self.oil, self.spy], self.regPeriod, Resolution.Daily)
oilSeries = hist.loc[self.oil]['value'].resample('m').last()
spySeries = hist.loc[self.spy]['close'].resample('m').last()
index = sorted(set(oilSeries.index).intersection(spySeries.index))
oilSeries = oilSeries[index]
spySeries = spySeries[index]
rf = float(self.Securities[self.tbill].Price)/12.0
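The month-end sampling and alignment step can be tried outside the platform with synthetic data. The following is a minimal sketch, not the strategy's own code: the helper names `month_end_last`, `align`, and `monthly_rate` are hypothetical, the grouping uses `to_period("M")` instead of `resample`, and the division by 100 assumes the T-bill feed quotes an annualized rate in percent, which should be checked against the data source.

```python
import numpy as np
import pandas as pd

def month_end_last(series):
    """Keep the last observation of each calendar month."""
    return series.groupby(series.index.to_period("M")).last()

def align(a, b):
    """Restrict two series to the index labels they share."""
    common = a.index.intersection(b.index)
    return a.loc[common], b.loc[common]

def monthly_rate(annual_percent):
    """Convert an annualized percent rate to a simple monthly fraction.

    Assumption: the input is in percent (1.2 means 1.2% per year).
    """
    return annual_percent / 100.0 / 12.0

# Synthetic daily data; the oil series deliberately starts one month later.
stock_days = pd.date_range("2016-01-01", "2016-06-30", freq="D")
oil_days = pd.date_range("2016-02-01", "2016-06-30", freq="D")
stock = pd.Series(np.arange(len(stock_days), dtype=float) + 100.0, index=stock_days)
oil = pd.Series(np.arange(len(oil_days), dtype=float) + 40.0, index=oil_days)

# Only the months present in both series survive the alignment.
stock_m, oil_m = align(month_end_last(stock), month_end_last(oil))
rf_monthly = monthly_rate(1.2)

print(stock_m)
print(oil_m)
print(rf_monthly)
```

Aligning on the shared index matters because the two data feeds may not cover exactly the same months, and the regression needs paired observations.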

Then we run an OLS regression with numpy to predict next month's stock return.

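# Monthly simple returns for oil (x) and stocks (y)
x = np.array(oilSeries)
x = np.diff(x) / x[:-1]
y = np.array(spySeries)
y = np.diff(y) / y[:-1]
# Regress this month's stock return on last month's oil return
A = np.vstack([x[:-1], np.ones(len(x[:-1]))]).T
beta, alpha = np.linalg.lstsq(A, y[1:], rcond=None)[0]
# Forecast next month's stock return from the latest oil return
yPred = alpha + x[-1] * beta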
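To see the lagged regression work end to end, here is a self-contained sketch on synthetic prices. The function name `predict_next_return` and the data are assumptions made for illustration. The stock returns are constructed without noise from the previous month's oil returns, so the fit should recover the chosen intercept and slope.

```python
import numpy as np

def predict_next_return(oil_prices, stock_prices):
    """Fit stock_ret[t] = alpha + beta * oil_ret[t-1] and forecast the next stock return."""
    oil = np.asarray(oil_prices, dtype=float)
    stock = np.asarray(stock_prices, dtype=float)
    x = np.diff(oil) / oil[:-1]        # monthly oil returns
    y = np.diff(stock) / stock[:-1]    # monthly stock returns
    # Design matrix: lagged oil return plus a constant column for the intercept.
    A = np.column_stack([x[:-1], np.ones(len(x) - 1)])
    (beta, alpha), *_ = np.linalg.lstsq(A, y[1:], rcond=None)
    return alpha, beta, alpha + beta * x[-1]

# Synthetic data: 23 monthly returns, i.e. 24 month-end prices.
rng = np.random.default_rng(0)
oil_ret = rng.normal(0.0, 0.05, size=23)
stock_ret = np.empty(23)
stock_ret[0] = 0.0
stock_ret[1:] = 0.002 + 0.5 * oil_ret[:-1]   # true alpha = 0.002, true beta = 0.5

oil_prices = 50.0 * np.cumprod(np.concatenate([[1.0], 1.0 + oil_ret]))
stock_prices = 200.0 * np.cumprod(np.concatenate([[1.0], 1.0 + stock_ret]))

alpha, beta, forecast = predict_next_return(oil_prices, stock_prices)
print(alpha, beta, forecast)
```

With real data the fit is noisy, so the recovered coefficients change from month to month as the two-year window rolls forward.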

Finally, we compare the expected stock return with the risk-free rate. If the former is larger than the latter, we invest fully in stocks; otherwise we liquidate our holdings. Because we cannot purchase T-Bill contracts, the performance is likely slightly underestimated.

if yPred > rf:
    self.SetHoldings(self.spy, 1)
else:
    self.Liquidate(self.spy)


We backtested this strategy over the period from 2010 to 2017. It has a Sharpe ratio of 0.726, similar to the benchmark's 0.735 over the same period. Although the annual return closely matches that of the paper, this is largely a coincidence of the strong bull market in recent years. In the monthly regression results, the p-value is in most cases not small enough to reject the null hypothesis that there is no correlation between oil and stocks, so investment decisions based on these statistically insignificant results are almost meaningless. The strategy does not beat the benchmark, mostly because of the weakened correlation between oil and stocks.
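One way to inspect the significance of a monthly fit is to compute the t-statistic of the slope. The sketch below is an illustration on synthetic data, not the procedure used to produce the results above, and the function name `slope_t_stat` is hypothetical. With 22 observations there are 20 degrees of freedom, so an absolute t-statistic below roughly 2.09 fails to reject a zero slope at the 5% level (two-sided).

```python
import numpy as np

def slope_t_stat(x, y):
    """Return the OLS slope of y on x (with intercept) and its t-statistic."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(x) - 2                      # two estimated parameters
    sigma2 = resid @ resid / dof          # residual variance estimate
    se_slope = np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))
    return coef[0], coef[0] / se_slope

rng = np.random.default_rng(1)
oil_ret = rng.normal(0.0, 0.05, size=22)

# A series strongly driven by oil, and one generated independently of it.
strong = 0.5 * oil_ret + rng.normal(0.0, 0.001, size=22)
unrelated = rng.normal(0.0, 0.04, size=22)

beta_strong, t_strong = slope_t_stat(oil_ret, strong)
beta_unrel, t_unrel = slope_t_stat(oil_ret, unrelated)
print(beta_strong, t_strong)
print(beta_unrel, t_unrel)
```

If SciPy is available, `scipy.stats.linregress` reports the corresponding p-value directly.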

Further research and backtesting could be conducted on assets other than oil that have a stronger relationship with stocks.


Strategy code, as well as backtesting result, is attached below.


  1. Driesprong, G., Jacobsen, B., and Maat, B. (2007). Striking Oil: Another Puzzle?, p. 1.
