Hey Everyone,

Today I'm going to implement two concepts in the research notebook: (a) testing for stationarity and (b) normalizing data. These methods can improve an Alpha's ability to forecast and are valuable techniques to understand. Most time-series statistical forecasting methods rest on the assumption that the series being used is stationary, or approximately stationary, either on its own or after a transformation.

Briefly stated, a stationary series is one whose joint probability distribution remains the same over time. Put another way, it is a process whose mean, variance, autocorrelation, etc. are all constant over time.

A stationary series is useful because we can assume that its current statistical properties will persist. Given stationarity, we know approximately what future values will be and what sort of error our forecast will carry, which naturally leads us to exploit the mean-reverting behavior of stationary series for trading purposes.

Unfortunately, equity prices are never stationary. However, we are able to approximate a stationary series by using the returns ("differencing" the series). This preserves the sequential nature of the series and other important properties and is usually a sufficient transformation to render a time-series stationary.

```
# Import our custom functions
from StationarityAndZScores import *

# Import the Liquid ETF Universe helper methods
from QuantConnect.Data.UniverseSelection import *

# Initialize QuantBook and select the Sector ETFs
qb = QuantBook()
symbols = [x for x in LiquidETFUniverse.SP500Sectors]

# Fetch history and compute returns
history = qb.History(symbols, 500, Resolution.Hour)
returns = history.unstack(level=1).close.transpose().pct_change().dropna()
```

To ensure that we can act on our assumption of stationarity, we need to test the data. One of the most common tests for stationarity is the augmented Dickey-Fuller (ADF) test. The ADF test operates on the null hypothesis that a unit root is present in the time-series sample (i.e. the time-series is not stationary). Therefore, when we apply the test to our transformed data, we look for a p-value below 0.05 so that we can confidently reject the null hypothesis in favor of the alternative: the series is stationary.

(I won't go into the details of how an ADF test works, but you can find plenty of information about it online and other resources if you want to dive further into the mathematics.)

```
from statsmodels.tsa.stattools import adfuller
import pandas as pd

def TestStationartiy(returns):
    # Return a pandas Series with True/False for each symbol.
    # Note: adfuller returns a tuple; element [1] is the p-value.
    return pd.Series([adfuller(values)[1] < 0.05 for column, values in returns.iteritems()],
                     index=returns.columns)
```

We can also normalize the returns data and trade based on the z-score, which tells us how much of an outlier a given return is relative to the series' historical mean and standard deviation.

```
def GetZScores(returns):
    # Return a pandas DataFrame containing the z-scores of each column
    return returns.subtract(returns.mean()).div(returns.std())
```
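As a quick sanity check on this transformation (illustrative data, not from the post): after z-scoring, every column should have a mean of approximately 0 and a standard deviation of approximately 1.

```python
# Verify that the z-score transformation standardizes each column:
# mean ~ 0 and standard deviation ~ 1 after the transform.
import pandas as pd

returns = pd.DataFrame({
    'XLE': [0.010, -0.020, 0.005, 0.030, -0.010],
    'XLK': [0.000, 0.010, -0.015, 0.020, -0.005],
})

# Same computation as GetZScores: subtract column mean, divide by column std
z = returns.subtract(returns.mean()).div(returns.std())

print(z.mean().round(10).tolist())  # each entry ~ 0.0
print(z.std().round(10).tolist())   # each entry ~ 1.0
```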

In the research notebook, we can use these functions we've written to manipulate the data.

```
# Test for stationarity
stationarity = TestStationartiy(returns)

# Get z-scores
z_scores = GetZScores(returns)
```

Now we're able to examine the data, transform it, test for stationarity, and create z-scores to use in our actual algorithm. In this demonstration strategy, we'll arbitrarily pick the time to enter a position as when the z-score is one or more standard deviations below zero. Similarly, we'll exit a position when the z-score is one or more standard deviations above zero.
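The entry/exit rule above can be sketched as a small standalone function (the name `zscore_signal` is mine, not from the original algorithm): go long when the latest z-score is below -1, and flatten an existing position when it rises above +1.

```python
# Minimal sketch of the threshold rule described above (hypothetical helper,
# not part of the original algorithm): map a z-score to a trading signal.
def zscore_signal(z_score, invested):
    if z_score < -1:
        return "up"    # enter: return is more than 1 std dev below its mean
    if z_score > 1 and invested:
        return "flat"  # exit: return is more than 1 std dev above its mean
    return None        # otherwise, take no action

print(zscore_signal(-1.5, False))  # -> up
print(zscore_signal(1.2, True))    # -> flat
print(zscore_signal(0.3, True))    # -> None
```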

```
def TransformTestTrade(self):
    qb = self
    symbols = [x.Symbol for x in qb.ActiveSecurities.Values]

    # Copied from the research notebook
    # -------------------------------------------------------------------------
    # Fetch history and returns
    history = qb.History(symbols, 500, Resolution.Hour)
    returns = history.unstack(level=1).close.transpose().pct_change().dropna()

    # Test for stationarity
    stationarity = TestStationartiy(returns)

    # Get z-scores
    z_scores = GetZScores(returns)
    # -------------------------------------------------------------------------

    insights = []

    # Iterate over the symbols
    for symbol, is_stationary in stationarity.iteritems():
        # Only emit Insights for those whose returns exhibit stationary behavior
        if is_stationary:
            # Get the most recent z-score as a scalar
            z_score = z_scores[symbol].iloc[-1]

            if z_score < -1:
                insights.append(Insight.Price(symbol, timedelta(1), InsightDirection.Up))
            elif z_score > 1 and self.Portfolio[symbol].Invested:
                insights.append(Insight.Price(symbol, timedelta(1), InsightDirection.Flat))

    self.EmitInsights(insights)
```

The function above implements our trading strategy with minimal additions to the code we already wrote in the research notebook. To ensure that we trade or emit Insights frequently enough, we schedule this function to run every day, five minutes after market open.

```
def Initialize(self):
    self.SetStartDate(2018, 11, 1)  # Set start date
    self.SetCash(1000000)           # Set strategy cash

    self.SetBrokerageModel(AlphaStreamsBrokerageModel())
    self.SetBenchmark('SPY')
    self.SetExecution(ImmediateExecutionModel())
    self.SetPortfolioConstruction(EqualWeightingPortfolioConstructionModel())

    self.UniverseSettings.Resolution = Resolution.Minute
    self.SetUniverseSelection(LiquidETFUniverse())
    self.AddEquity('XLE')

    # Run TransformTestTrade every day, 5 minutes after market open
    self.Schedule.On(self.DateRules.EveryDay('XLE'),
                     self.TimeRules.AfterMarketOpen('XLE', 5),
                     self.TransformTestTrade)
```

Even though this isn't the best strategy ever written, I was able to test out some useful statistical techniques for analyzing data and then quickly put them into practice to test my ideas. Hopefully, this helps you do the same!