Backtesting

Research Guide

Introduction

Backtesting is the process of simulating a trading algorithm on historical data. By running a backtest, you can measure how the algorithm would have performed in the past. Although past performance doesn't guarantee future results, an algorithm that has a proven track record can provide investors with more confidence when deploying to live trading than an algorithm that hasn't performed favorably in the past. Use the QuantConnect platform to run your backtests because we have institutional-grade datasets, an open-source backtesting engine that's constantly being improved, cloud servers to execute the backtests, and the backtesting hardware is maintained 24/7 by QuantConnect engineers.

QuantConnect aims to teach and inspire our community to create high-performing algorithmic trading strategies. We measure our success by the profits created by the community through their live trading. As such, we try to build the best quantitative research techniques possible into the product to encourage a robust research process.

Hypothesis-Driven Research

We recommend you develop an algorithmic trading strategy based on a central hypothesis. You should develop an algorithm hypothesis at the start of your research and spend the remaining time exploring how to test your theory. If you find yourself deviating from your core theory or introducing code that isn't based around that hypothesis, you should stop and go back to thesis development.

Wang et al. (2014) illustrate the danger of creating your hypothesis based on test results. In their research, they examined the earnings yield factor in the technology sector over time. During 1998-1999, before the tech bubble burst, the factor was unprofitable. If you saw the results and then decided to bet against the factor during 2000-2002, you would have lost a lot of money because the factor performed extremely well during that time.

Hypothesis development is somewhat of an art and requires creativity and great observation skills. It is one of the most powerful skills a quant can learn. We recommend that an algorithm hypothesis follow the pattern of cause and effect. Your aim should be to express your strategy in the following sentence:

A change in {cause} leads to an {effect}.

To search for inspiration, consider causes from your own experience, intuition, or the media. Generally, causes of financial market movements fall into the following categories:

  • Human psychology
  • Real-world events/fundamentals
  • Invisible financial actions

Consider the following examples:

Causeleads toEffect
Share class stocks are the same company, so any price divergence is irrational...A perfect pairs trade. Since they are the same company, the price will revert.
New stock addition to the S&P500 Index causes fund managers to buy up stock...An increase in the price of the new asset in the universe from buying pressure.
Increase in sunshine-hours increases the production of oranges...An increase in the supply of oranges, decreasing the price of Orange Juice Futures.
Allegations of fraud by the CEO causes investor faith in the stock to fall...A collapse of stock prices for the company as people panic.
FDA approval of a new drug opens up new markets for the pharmaceutical company...A jump in stock prices for the company.
Increasing federal interest rates restrict lending from banks, raising interest rates...Restricted REIT leverage and lower REIT ETF returns.

There are millions of potential alpha strategies to explore, each of them a candidate for an algorithm. Once you have chosen a strategy, we recommend exploring it for no more than 8-32 hours, depending on your coding ability.

Research Panel

We launched the Research Guide in 2019 to inform you about common quantitative research pitfalls. It displays a power gauge for the number of backtests performed, the number of parameters used, and the time invested in the strategy. These measures can give a ballpark estimate of the overfitting risk of the project. Generally, as a strategy becomes more overfit on historical data, it is less likely to perform well in live trading. These properties were selected based on the recommended best practices of the global quantitative research community.

Restricting Backtests

According to current research, the number of backtests performed on an idea should be limited to prevent overfitting. In theory, each backtest performed on an idea moves it one step closer to being overfitted as you are testing and selecting for strategies written into your code instead of being based on a central thesis. For more information, see the paper Probability of Backtest Overfitting (Bailey, Borwein, Jim Zho, & López de Prado, 2015).

QuantConnect does not restrict the number of backtests performed on a project, but we have implemented the counter as a guide for your reference. Your coding skills are a factor in how many backtests constitute overfitting, so if you are a new programmer you can increase these targets.

Backtest Count Overfit Reference
0-30: Likely Not Overfit 30-70: Possibly Overfitting 70+ Probably Overfitting

Reducing Strategy Parameters

With just a handful of parameters, it is possible to create an algorithm that perfectly models historical markets. Current research suggests keeping your parameter count to a minimum to decrease the risk of overfitting.

Parameter Overfit Reference
0-10: Likely Not Overfit 10-20: Possibly Overfitting 20+ Probably Overfitting

Limiting Research Time Invested

As you spend more time on one algorithm, research suggests you are more likely to be overfitting the strategy to the data. It is common to become attached to an idea and spend weeks or months to perform well in a backtest. Assuming you are a proficient coder who fully understands the QuantConnect API, we recommend no more than 16 hours of work per experiment. In theory, within two full working days, you should be able to test a single hypothesis thoroughly.

Research Time Overfitting Reference
0-8 Hours: Likely Not Overfit 8-16 Hours: Possibly Overfitting 16 Hours+ Probably Overfitting

Parameter Detection

Using parameters is almost unavoidable, but a strategy trends toward being overfitted as more parameters get added or fine-tuned. Adding or optimizing parameters should only be done by a robust methodology such as walk-forward optimization. The parameter detection system is a general guide to inform you of how many parameters are present in the algorithm. It looks for criteria to warn that code is potentially a parameter. The following table shows the criteria for parameters:

Parameter Types Example Instances
Numeric ComparisonNumeric operators used to compare numeric arguments: <= < > >=
Time SpanSetting the interval of TimeSpan or timedelta
Order EventInputting numeric arguments when placing orders
Scheduled EventInputting numeric arguments when scheduling an algorithm event to occur
Variable AssignmentAssigning numeric values to variables
Mathematical OperationAny mathematical operation involving explicit numbers
Lean APINumeric arguments passed to Indicators, Consolidators, Rolling Windows, etc.

The following table shows common expressions that are not parameters:

Non-Parameter Types Example Instances
Common APIs SetStartDate, SetEndDate, SetCash, etc.
Boolean Comparison Testing for True or False conditions
String Numbers Numbers formatted as part of Log or Debug statements
Variable Names Any variable names that use numbers as part of the name (for example, smaIndicator200)
Common Functions Rounding, array indexing, boolean comparison using 1/0 for True/False, etc.

Overfitting

Overfitting occurs when you fine-tune the parameters of an algorithm to fit the detail and noise of backtesting data to the extent that it negatively impacts the performance of the algorithm on new data. The problem is that the parameters don't necessarily apply to new data and thus negatively impact the algorithm's ability to generalize and trade well in all market conditions. The following table shows ways that overfitting can manifest itself:

Data PracticeDescription
Data Dredging Performing many statistical tests on data and only paying attention to those that come back with significant results.
Hyper-Tuning Parameters Manually changing algorithm parameters to produce better results without altering the test data.
Overfit Regression Models Regression, machine learning, or other statistical models with too many variables will likely introduce overfitting to an algorithm.
Stale Testing Data Not changing the backtesting data set when testing the algorithm. Any improvements might not be able to be generalized to different datasets.

An algorithm that is dynamic and generalizes to new data is more valuable to funds and individual investors. It is more likely to survive across different market conditions and apply to new asset classes and markets.

You can also see our Videos. You can also get in touch with us via Discord.

Did you find this page helpful?

Contribute to the documentation: