Making Models that Fit the Signal, Not the Noise
One of the most common pitfalls when creating an algorithm is overfitting.
Overfitting is the act of fine-tuning the parameters of an algorithm to fit the detail and noise of backtesting data to the extent that it negatively impacts the performance of the algorithm on new data.
It can manifest itself in a variety of ways, but probably the most common, and the easiest to detect, is adding too many parameters to a model with the hope of producing better backtest results.
To help our quants avoid this common issue, we created the parameter detector built into our development environment. It helps them assess the likelihood that they are overfitting their strategy. Below, we’ll walk through some steps to keep in mind as you design your algorithms to avoid overfitting.
Start by intentionally limiting your parameters.
Since using too many parameters is often a cause of overfitting, try to design your strategy to avoid them where possible. There is no single number of parameters that we recommend, and no hard limit. It depends largely on the overall complexity of the algorithm, but in general, if you have more than 3-4 parameters for a relatively simple algorithm, you are at risk of overfitting.
When you do use parameters, try to ground them in reality by basing them on a well-known constant. For example, a corn-futures strategy that trades on weather could use the region’s historical weather averages instead of a hand-picked threshold. Outsourcing the definition of constants like this can help reduce the “tweaking” you end up doing to your model.
Alternatively, when seeking to detect abnormal values, use a rolling standard deviation wherever possible. Using the weather example above, if you were trying to detect abnormally high temperatures, you could derive a rolling mean and standard deviation of recent temperatures. If daily temperatures are roughly normally distributed, about 95% of days fall within two standard deviations of the mean, so a day more than two standard deviations from the mean is in the most extreme 5% and can reasonably be treated as a weather anomaly. This defines your range as a “rolling parameter” based on a common statistical constant.
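The rolling threshold described above can be sketched in plain Python. This is a minimal illustration, not QuantConnect’s API: the `flag_anomalies` helper, the 30-observation window, and the sample temperatures are assumptions chosen for the example. Note that the window length is still a parameter; only the two-standard-deviation cutoff comes from the statistical constant.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(values, window=30, k=2.0):
    """Return one bool per value: True if it lies more than k standard
    deviations from the mean of the previous `window` values."""
    history = deque(maxlen=window)  # trailing window only, no look-ahead
    flags = []
    for x in values:
        if len(history) < window:
            flags.append(False)  # not enough history to estimate yet
        else:
            mu, sigma = mean(history), stdev(history)
            flags.append(sigma > 0 and abs(x - mu) > k * sigma)
        history.append(x)
    return flags


# Hypothetical daily temperatures: a mild repeating pattern, then a heat spike.
temps = [20.0 + (i % 5) * 0.5 for i in range(40)] + [35.0]
flags = flag_anomalies(temps)
print([i for i, f in enumerate(flags) if f])  # → [40]
```

Because each estimate uses only the trailing window, the threshold adapts as conditions change and never peeks at future values.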
Some algorithms may require more parameters, so the right number is relative, but keep in mind that any model can become overfit more quickly than most people expect.
Resist the temptation to go back and make changes to the model.
Once you’ve run your first backtest, you may have to exercise some self-restraint. It can be tempting to go back and make small changes or add parameters to your model, but in doing so, you run the risk of overfitting.
While it might enhance your model’s performance over the test data, it will almost certainly set you up for failure in live trading.
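To see why, consider a toy experiment in plain Python. Prices follow a pure random walk, so there is no real signal to find; each value in a lookback grid counts as one “backtest,” and the lookback with the best in-sample Sharpe ratio is then run on the unseen second half of the data. The momentum rule, the lookback grid, and the random seed are hypothetical choices for illustration.

```python
import random
from statistics import mean, stdev


def sharpe(returns):
    """Per-period Sharpe ratio (mean / standard deviation), not annualized."""
    s = stdev(returns)
    return mean(returns) / s if s > 0 else 0.0


def momentum_returns(prices, lookback):
    """Hypothetical rule: long if price is above its value `lookback` bars ago, else short.
    The signal at bar t only uses prices up to t; it earns the return from t to t + 1."""
    return [
        (1 if prices[t] > prices[t - lookback] else -1) * (prices[t + 1] / prices[t] - 1)
        for t in range(lookback, len(prices) - 1)
    ]


rng = random.Random(7)
prices = [100.0]
for _ in range(1000):  # pure random walk: no lookback has a real edge
    prices.append(prices[-1] * (1 + rng.gauss(0, 0.01)))

in_sample, out_of_sample = prices[:500], prices[500:]
lookbacks = range(2, 62, 2)  # 30 parameter values = 30 "backtests"
is_sharpe = {lb: sharpe(momentum_returns(in_sample, lb)) for lb in lookbacks}
best = max(is_sharpe, key=is_sharpe.get)
oos_sharpe = sharpe(momentum_returns(out_of_sample, best))
print(f"best lookback {best}: in-sample {is_sharpe[best]:.3f}, out-of-sample {oos_sharpe:.3f}")
```

On a typical run, the winning lookback’s in-sample Sharpe ratio is noticeably positive while its out-of-sample Sharpe ratio drifts back toward zero, because the search selected for noise.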
Recent research suggests that the more backtests you run, the more your reported performance should be discounted to reflect the growing possibility of overfitting. The “Deflated Sharpe Ratio,” introduced by Bailey and López de Prado, attempts to account for the selection bias that arises when researchers run multiple backtests and keep the best one. Keep in mind that institutions seeking to license your strategy care about your research process, and constantly tweaked models will be quickly discarded.
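The deflation can be sketched with Python’s standard library. The functions below follow the formulas as commonly stated in the literature: the expected maximum Sharpe ratio across N trials with zero true skill, and the probability that the observed Sharpe ratio beats that benchmark after adjusting for sample length, skewness, and kurtosis. Treat it as a sketch: Sharpe ratios are per-period (not annualized), and the inputs in the example are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials, sr_variance):
    """Expected maximum Sharpe ratio among n_trials independent backtests whose
    true Sharpe ratio is zero; sr_variance is the variance of the trials' Sharpe ratios."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    z1 = STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    z2 = STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1 - EULER_GAMMA) * z1 + EULER_GAMMA * z2)


def deflated_sharpe_ratio(sr, n_obs, skew, kurtosis, n_trials, sr_variance):
    """Probability that the true Sharpe ratio exceeds the luck-only benchmark.

    sr: observed per-period Sharpe ratio of the selected backtest
    n_obs: number of return observations
    skew, kurtosis: of the returns; kurtosis is raw (3.0 for normal), not excess
    """
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    var_term = 1 - skew * sr + (kurtosis - 1) / 4 * sr**2
    if var_term <= 0:
        raise ValueError("skew/kurtosis inputs give a non-positive variance term")
    z = (sr - sr0) * math.sqrt(n_obs - 1) / math.sqrt(var_term)
    return STD_NORMAL.cdf(z)


# Hypothetical inputs: daily SR of 0.1 over two years (504 days), normal returns.
dsr_few = deflated_sharpe_ratio(0.1, 504, 0.0, 3.0, n_trials=5, sr_variance=0.002)
dsr_many = deflated_sharpe_ratio(0.1, 504, 0.0, 3.0, n_trials=100, sr_variance=0.002)
print(f"after 5 trials: {dsr_few:.2f}, after 100 trials: {dsr_many:.2f}")
```

Holding the backtest fixed and raising the number of trials from 5 to 100 lowers the deflated value substantially, which is the penalty for searching harder.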
If you do make changes, stress test the model.
Any time you make changes to your model, you need to make sure it still performs well. Here are three tests you can try:
- Run it against different data (other assets or time periods): does it perform as well?
- Run it in crisis conditions: does it survive the 2007-2009 financial crisis?
- Run it against randomly generated data: you can generate such data with the LEAN Random Data Generator.
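The randomized-data test can be approximated outside LEAN. The sketch below does not use the LEAN Random Data Generator; it builds random-walk price paths in plain Python as a stand-in, runs one fixed, hypothetical rule over each path, and collects the Sharpe ratios the rule earns by luck alone. The rule, path length, path count, and the `observed_sharpe` value are all assumptions.

```python
import random
from statistics import mean, quantiles, stdev


def sharpe(returns):
    """Per-period Sharpe ratio, not annualized."""
    s = stdev(returns)
    return mean(returns) / s if s > 0 else 0.0


def random_walk(n, rng, vol=0.01, start=100.0):
    """Synthetic prices with independent Gaussian returns: no exploitable pattern."""
    prices = [start]
    for _ in range(n):
        prices.append(prices[-1] * (1 + rng.gauss(0, vol)))
    return prices


def strategy_returns(prices, lookback=20):
    """Hypothetical fixed rule: long if above the price `lookback` bars ago, else flat."""
    return [
        (1 if prices[t] > prices[t - lookback] else 0) * (prices[t + 1] / prices[t] - 1)
        for t in range(lookback, len(prices) - 1)
    ]


rng = random.Random(42)
null_sharpes = [sharpe(strategy_returns(random_walk(500, rng))) for _ in range(200)]
cutoff = quantiles(null_sharpes, n=20)[-1]  # about the 95th percentile of luck-only results

observed_sharpe = 0.08  # hypothetical Sharpe ratio from a real backtest of the same rule
print(f"luck-only 95th percentile: {cutoff:.3f}; observed beats it: {observed_sharpe > cutoff}")
```

If a strategy’s real backtest Sharpe ratio does not clear this luck-only cutoff, or if the rule looks strong on the random paths themselves, that is a warning sign of overfitting or a look-ahead bug.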
Do everything you can to make sure the model behaves consistently across a variety of market conditions and data samples. A model that does has a better chance of performing similarly in the future than one tuned to perform exceptionally well over a small sample of data and a narrow range of market conditions. You can use the LEAN Report Creator to generate reports covering the top 18 crisis events since 1999.
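A simple way to check consistency is to slice an equity curve into named periods and compare the return and maximum drawdown in each. This plain-Python sketch is not the LEAN Report Creator; the `period_report` helper, the period boundaries, and the equity values are made-up illustrations, not results.

```python
from datetime import date


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def period_report(curve, periods):
    """curve: (date, equity) pairs sorted by date; periods: name -> (start, end), inclusive."""
    report = {}
    for name, (start, end) in periods.items():
        values = [v for d, v in curve if start <= d <= end]
        if len(values) < 2:
            continue  # too little data in this window to measure anything
        report[name] = {
            "return": values[-1] / values[0] - 1,
            "max_drawdown": max_drawdown(values),
        }
    return report


# Made-up equity points for illustration only; a real curve would have daily values.
curve = [
    (date(2007, 1, 2), 100.0),
    (date(2008, 1, 2), 110.0),
    (date(2008, 11, 20), 77.0),
    (date(2009, 12, 31), 99.0),
    (date(2011, 1, 3), 120.0),
    (date(2012, 12, 31), 130.0),
]
periods = {
    "2007-2009 crisis": (date(2007, 1, 1), date(2009, 12, 31)),
    "2010-2012": (date(2010, 1, 1), date(2012, 12, 31)),
}
report = period_report(curve, periods)
for name, stats in report.items():
    print(f"{name}: return {stats['return']:+.1%}, max drawdown {stats['max_drawdown']:.1%}")
```

A strategy whose drawdown in the crisis window dwarfs its behavior elsewhere deserves a closer look before you trust its overall backtest.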
Use forward performance testing to ensure that your model really works.
Walk-forward optimization (WFO) is a method for finding and testing the optimal parameters of a trading strategy. Parameters are optimized on one segment of data (the in-sample period), and the strategy is then run on data that was kept isolated from that optimization (the out-of-sample period). The standard method is to use a period of historical data to optimize the initial parameters and then let the strategy operate on the subsequent historical data: in other words, let it “walk forward” in time. The process is typically repeated, re-optimizing each time the windows roll forward. WFO doesn’t need to be done exactly this way, but this is a simple method to simulate how a backtested strategy would have performed in live trading.
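The walk-forward procedure can be sketched in plain Python. This is not QuantConnect’s optimizer: the momentum rule, the lookback grid, and the 250-bar training and 50-bar testing windows are hypothetical. Each step picks the lookback with the best training-window Sharpe ratio, then trades only the following test window with it, so every stitched return is out-of-sample.

```python
import random
from statistics import mean, stdev


def sharpe(returns):
    """Per-period Sharpe ratio, not annualized."""
    s = stdev(returns)
    return mean(returns) / s if s > 0 else 0.0


def rule_returns(prices, lookback):
    """Hypothetical rule: long if above the price `lookback` bars ago, else short."""
    return [
        (1 if prices[t] > prices[t - lookback] else -1) * (prices[t + 1] / prices[t] - 1)
        for t in range(lookback, len(prices) - 1)
    ]


def walk_forward(prices, lookbacks, train=250, test=50):
    """Optimize on each training window, trade the next test window, then roll forward.

    Returns the stitched out-of-sample returns and the lookback chosen at each step.
    """
    oos, chosen = [], []
    start = 0
    while start + train + test < len(prices):
        train_px = prices[start : start + train]
        best = max(lookbacks, key=lambda lb: sharpe(rule_returns(train_px, lb)))
        # Prepend `best` bars of history so the first test bar has a signal;
        # only the returns for the test window's bars are kept.
        test_px = prices[start + train - best : start + train + test + 1]
        oos.extend(rule_returns(test_px, best))
        chosen.append(best)
        start += test  # roll both windows forward by one test period
    return oos, chosen


rng = random.Random(1)
prices = [100.0]
for _ in range(1000):
    prices.append(prices[-1] * (1 + rng.gauss(0, 0.01)))

oos, chosen = walk_forward(prices, lookbacks=[5, 10, 20, 40])
print(f"{len(chosen)} windows, lookbacks chosen: {chosen}")
print(f"stitched out-of-sample Sharpe: {sharpe(oos):.3f}")
```

Rolling the training window forward (rather than anchoring it at the start) is a design choice; an anchored, expanding window is another common variant.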
In QuantConnect, you can choose parameters from within the algorithm itself by using the History API or a rolling window of recent data. This helps you simulate live trading, where you don’t know the best parameters to use ahead of time.
The results of WFO can tell us a lot about a strategy, but mainly they show how fragile or robust a model is. If a strategy’s parameters were chosen to maximize performance over a single test dataset, it will likely struggle in walk-forward testing.
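One common way to summarize that fragility is to compare out-of-sample with in-sample performance across the walk-forward windows, a ratio sometimes called walk-forward efficiency. The sketch below computes it from per-window metric pairs; the choice of metric (for example, Sharpe ratio) and the sample numbers are hypothetical.

```python
from statistics import mean


def walk_forward_efficiency(windows):
    """windows: (in_sample, out_of_sample) metric pairs, one per walk-forward step.

    Returns mean out-of-sample performance / mean in-sample performance. Near 1:
    the optimized edge carried over. Near or below 0: it was likely fit to noise.
    """
    in_sample = mean(i for i, _ in windows)
    if in_sample <= 0:
        raise ValueError("mean in-sample performance must be positive")
    return mean(o for _, o in windows) / in_sample


# Hypothetical per-window Sharpe ratios for two strategies.
robust = walk_forward_efficiency([(1.2, 0.9), (1.0, 0.8)])
fragile = walk_forward_efficiency([(2.5, -0.1), (2.0, 0.1)])
print(f"robust: {robust:.2f}, fragile: {fragile:.2f}")
```

The “fragile” example has the more impressive in-sample numbers, yet none of that performance survives out-of-sample.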
As you begin building your algorithms, remember that overfitting can happen very easily and to anyone.
It’s easy to tell yourself that one little change to a parameter or adding just one more variable won’t overfit your model to a dataset, especially when you see that your resulting equity curve looks so much better. Holding yourself accountable to such strict standards is tough, and even the best quants and data scientists fail to do so at times. Especially when you work on your own or with only a few people, it can be easy to fall into the trap of overfitting due to lack of oversight.
We hope that watching the parameter counter tick up will help remind you to keep your model fitting the signal, not the noise. To learn more about parameter detection, please visit our research guide. Happy coding!