Research Guide

Introduction

QuantConnect aims to teach and inspire our community to create high-performing algorithmic trading strategies. We measure our success by the profits created by the community through their live trading. As such, we try to build the best quantitative research techniques possible into the product to encourage a robust research process.

Hypothesis-Driven Research

We recommend you develop an algorithmic trading strategy based on a central hypothesis. You should develop an algorithm hypothesis at the start of your research and spend the remaining time exploring how to test your theory. If you find yourself deviating from your core theory or introducing code that isn't based around that hypothesis, you should stop and go back to thesis development.

Wang et al. (2014) illustrate the danger of creating your hypothesis based on test results. In their research, they examined the earnings yield factor in the technology sector over time. During 1998-1999, before the tech bubble burst, the factor was unprofitable. If you saw the results and then decided to bet against the factor during 2000-2002, you would have lost a lot of money because the factor performed extremely well during that time.

Hypothesis development is somewhat of an art and requires creativity and great observation skills. It is one of the most powerful skills a quant can learn. We recommend that an algorithm hypothesis follow the pattern of cause and effect. Your aim should be to express your strategy in the following sentence:

A change in {cause} leads to an {effect}.

To search for inspiration, consider causes from your own experience, intuition, or the media. Generally, causes of financial market movements fall into the following categories:

Human psychology
Real-world events/fundamentals
Invisible financial actions

Consider the following examples:

Cause	leads to	Effect
Share class stocks are the same company, so any price divergence is irrational...		A perfect pairs trade. Since they are the same company, the price will revert.
New stock addition to the S&P500 Index causes fund managers to buy up stock...		An increase in the price of the new asset in the universe from buying pressure.
Increase in sunshine-hours increases the production of oranges...		An increase in the supply of oranges, decreasing the price of Orange Juice Futures.
Allegations of fraud by the CEO causes investor faith in the stock to fall...		A collapse of stock prices for the company as people panic.
FDA approval of a new drug opens up new markets for the pharmaceutical company...		A jump in stock prices for the company.
Increasing federal interest rates restrict lending from banks, raising interest rates...		Restricted REIT leverage and lower REIT ETF returns.

There are millions of potential alpha strategies to explore, each of them a candidate for an algorithm. Once you have chosen a strategy, we recommend exploring it for no more than 8-32 hours, depending on your coding ability.

Research Panel

We launched the Research Guide in 2019 to inform you about common quantitative research pitfalls. It displays a power gauge for the number of backtests performed, the number of parameters used, and the time invested in the strategy. These measures can give a ballpark estimate of the overfitting risk of the project. Generally, as a strategy becomes more overfit on historical data, it is less likely to perform well in live trading. These properties were selected based on the recommended best practices of the global quantitative research community.

Restricting Backtests

According to current research, you should limit the number of backtests you perform on an idea to prevent overfitting. In theory, each backtest moves the idea one step closer to being overfit, because you end up testing and selecting for variations written into your code instead of a strategy based on a central thesis. For more information, see the paper Probability of Backtest Overfitting (Bailey, Borwein, López de Prado, & Zhu, 2015).
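The effect can be illustrated with a small simulation. The following is a minimal sketch, not the method from the paper: it assumes every "backtest" is pure noise with zero expected return, and it reports the best in-sample Sharpe ratio found as the number of backtests grows. The function names, the 252-day length, and the 1% daily volatility are illustrative assumptions.

```python
import random
import statistics

def sharpe(returns):
    """Per-period Sharpe ratio: mean divided by sample standard deviation."""
    return statistics.mean(returns) / statistics.stdev(returns)

def best_in_sample_sharpe(num_backtests, num_days=252, seed=7):
    """Best Sharpe among `num_backtests` strategies that have no real edge."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(num_backtests):
        # Each hypothetical backtest is zero-mean noise (assumed 1% daily vol).
        daily = [rng.gauss(0.0, 0.01) for _ in range(num_days)]
        best = max(best, sharpe(daily))
    return best

# Same seed for every run, so larger runs contain the smaller ones.
results = {n: best_in_sample_sharpe(n) for n in (1, 10, 100)}
for n, s in results.items():
    print(f"{n:>3} backtests -> best in-sample daily Sharpe {s:.3f}")
```

Because none of the simulated strategies has an edge, any improvement in the best result comes purely from trying more variations, which is the selection effect that the backtest counter is meant to make visible.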

QuantConnect does not restrict the number of backtests performed on a project, but we have implemented the counter as a guide for your reference. Your coding skills are a factor in how many backtests constitute overfitting, so if you are a new programmer you can increase these targets.

Backtest Count Overfit Reference
0-30: Likely Not Overfit	30-70: Possibly Overfitting	70+: Probably Overfitting

Reducing Strategy Parameters

With just a handful of parameters, it is possible to create an algorithm that perfectly models historical markets. Current research suggests keeping your parameter count to a minimum to decrease the risk of overfitting.
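As a rough illustration of this point, the sketch below fits a polynomial with one coefficient per observation to data that is pure noise. It is a toy example under stated assumptions (eight random points, Lagrange interpolation as the "model"), not a trading strategy, and all names in it are hypothetical.

```python
import random

def lagrange_fit(xs, ys):
    """Return a predictor for the polynomial through every (x, y) point.

    The polynomial has as many free coefficients as there are points.
    """
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

rng = random.Random(1)
xs = list(range(8))
ys = [rng.gauss(0.0, 1.0) for _ in xs]  # pure noise, nothing to learn

model = lagrange_fit(xs, ys)
in_sample_error = max(abs(model(x) - y) for x, y in zip(xs, ys))
print(f"largest in-sample error: {in_sample_error:.2e}")
```

The fit reproduces every historical point even though the data contains no structure, so a perfect in-sample match says nothing about how the model behaves on new data.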

Parameter Overfit Reference
0-10: Likely Not Overfit	10-20: Possibly Overfitting	20+: Probably Overfitting

Limiting Research Time Invested

As you spend more time on one algorithm, research suggests you are more likely to be overfitting the strategy to the data. It is common to become attached to an idea and spend weeks or months trying to make it perform well in a backtest. Assuming you are a proficient coder who fully understands the QuantConnect API, we recommend no more than 16 hours of work per experiment. In theory, within two full working days, you should be able to test a single hypothesis thoroughly.

Research Time Overfitting Reference
0-8 Hours: Likely Not Overfit	8-16 Hours: Possibly Overfitting	16+ Hours: Probably Overfitting
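The three reference tables above can be expressed as a small lookup. This is a sketch for illustration only, not how the Research Panel is implemented; the names `THRESHOLDS`, `LABELS`, and `overfit_label` are hypothetical, and because the published bands share their boundary values, the code assumes a boundary value belongs to the higher band.

```python
from bisect import bisect_right

# Band boundaries copied from the three reference tables.
THRESHOLDS = {
    "backtests": (30, 70),
    "parameters": (10, 20),
    "hours": (8, 16),
}
LABELS = ("Likely Not Overfit", "Possibly Overfitting", "Probably Overfitting")

def overfit_label(measure, value):
    """Map a measured value onto one of the three reference bands."""
    # Assumption: a value equal to a boundary falls into the higher band.
    return LABELS[bisect_right(THRESHOLDS[measure], value)]

print(overfit_label("backtests", 12))
print(overfit_label("parameters", 10))
print(overfit_label("hours", 40))
```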

Parameter Detection

Using parameters is almost unavoidable, but a strategy trends toward being overfitted as more parameters get added or fine-tuned. You should only add or optimize parameters with a robust methodology such as walk-forward optimization. The parameter detection system is a general guide to inform you of how many parameters are present in the algorithm. It checks your code against a set of criteria and warns you when an expression is potentially a parameter. The following table shows the criteria for parameters:

Parameter Types	Example Instances
Numeric Comparison	Numeric operators used to compare numeric arguments: <= < > >=
Time Span	Setting the interval of `TimeSpan` or `timedelta`
Order Event	Inputting numeric arguments when placing orders
Scheduled Event	Inputting numeric arguments when scheduling an algorithm event to occur
Variable Assignment	Assigning numeric values to variables
Mathematical Operation	Any mathematical operation involving explicit numbers
Lean API	Numeric arguments passed to Indicators, Consolidators, Rolling Windows, etc.

The following table shows common expressions that are not parameters:

Non-Parameter Types	Example Instances
Common APIs	`SetStartDate`/`set_start_date`, `SetEndDate`/`set_end_date`, `SetCash`/`set_cash`, etc.
Boolean Comparison	Testing for True or False conditions
String Numbers	Numbers formatted as part of `Log`/`log` method or `Debug`/`debug` method statements
Variable Names	Any variable names that use numbers as part of the name (for example, `smaIndicator200`)
Common Functions	Rounding, array indexing, boolean comparison using 1/0 for True/False, etc.
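To make the two tables more concrete, here is a rough sketch of how such criteria could be checked with Python's standard `ast` module. It is not QuantConnect's actual detector, whose implementation is not described here. The ignore list, the function names, and the decision to count only numeric literals are all assumptions made for illustration. The sample algorithm text is only parsed, never executed.

```python
import ast

# Assumption: calls with these names are treated as non-parameters.
IGNORED_CALLS = {"set_start_date", "set_end_date", "set_cash", "log", "debug", "round"}
ORDERING_OPS = (ast.Lt, ast.LtE, ast.Gt, ast.GtE)

def _is_number(node):
    """True for numeric literals such as 50, 0.5, or -3 (but not True/False)."""
    if isinstance(node, ast.UnaryOp):
        node = node.operand
    return (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool))

def _call_name(call):
    func = call.func
    if isinstance(func, ast.Attribute):
        return func.attr
    return func.id if isinstance(func, ast.Name) else ""

def count_candidate_parameters(source):
    """Count numeric literals in comparisons, assignments, arithmetic, and calls."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare):
            # Only <, <=, >, >= count; == against 1/0 is treated as boolean.
            if any(isinstance(op, ORDERING_OPS) for op in node.ops):
                count += sum(_is_number(n) for n in [node.left, *node.comparators])
        elif isinstance(node, ast.Assign):
            count += _is_number(node.value)
        elif isinstance(node, ast.BinOp):
            count += _is_number(node.left) + _is_number(node.right)
        elif isinstance(node, ast.Call) and _call_name(node) not in IGNORED_CALLS:
            args = list(node.args) + [kw.value for kw in node.keywords]
            count += sum(_is_number(a) for a in args)
    return count

sample = '''
self.set_start_date(2020, 1, 1)
self.set_cash(100000)
fast_period = 50
self.fast = self.sma(symbol, fast_period)
self.slow = self.sma(symbol, 200)
if price > 1.02 * self.slow.current.value:
    self.set_holdings(symbol, 0.5)
self.debug("Bought at 200-day breakout")
last = window[0]
'''
print(count_candidate_parameters(sample))  # 4
```

The four hits are the assignment of `50`, the `200` passed to the indicator, the `1.02` multiplier, and the `0.5` order size. The dates, the starting cash, the number inside the debug string, and the array index are skipped, in line with the non-parameter table.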

Overfitting

Overfitting occurs when you fine-tune the parameters of an algorithm to fit the detail and noise of backtesting data to the extent that it negatively impacts the performance of the algorithm on new data. The problem is that the parameters don't necessarily apply to new data and thus negatively impact the algorithm's ability to generalize and trade well in all market conditions. The following table shows ways that overfitting can manifest itself:

Data Practice	Description
Data Dredging	Performing many statistical tests on data and only paying attention to those that come back with significant results.
Hyper-Tuning Parameters	Manually changing algorithm parameters to produce better results without altering the test data.
Overfit Regression Models	Regression, machine learning, or other statistical models with too many variables will likely introduce overfitting to an algorithm.
Stale Testing Data	Not changing the backtesting data set when testing the algorithm. Any improvements might not be able to be generalized to different datasets.

An algorithm that is dynamic and generalizes to new data is more valuable to funds and individual investors. It is more likely to survive across different market conditions and apply to new asset classes and markets.
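One way to avoid testing on stale data is the walk-forward approach mentioned in the Parameter Detection section. The sketch below only shows the window-splitting part: it assumes the history is a sequence of bars indexed from 0, and the function name and arguments are hypothetical. Fitting parameters on each training window and scoring them on the following test window is left out.

```python
def walk_forward_windows(num_bars, train_size, test_size):
    """Yield (train_range, test_range) index pairs that roll forward in time."""
    start = 0
    while start + train_size + test_size <= num_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test block

windows = list(walk_forward_windows(num_bars=10, train_size=4, test_size=2))
for train, test in windows:
    print(f"fit on bars {train.start}-{train.stop - 1}, "
          f"evaluate on bars {test.start}-{test.stop - 1}")
```

With these inputs the generator yields three windows, and every test block lies strictly after the data used for fitting, so each evaluation uses data the parameters have not seen.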

Out of Sample Period

To reduce the chance of overfitting, organization managers can require that all backtests end a certain number of months before the current date. For example, if you set a one-year out-of-sample period, the researchers on your team can't use the most recent year of data in their backtests. An out-of-sample period is helpful because it leaves you a period of unseen data to test your model on after you finish the development stage. Follow these steps to change the backtest out-of-sample period:

Open the organization homepage.
Scroll down to the Backtesting Out of Sample Period section.
Adjust the out-of-sample period duration or click on "No Holdout Period".
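To see what a given setting means for your backtest dates, the following sketch computes the latest end date allowed by a holdout of a given number of months. How the platform calculates the cutoff is not specified here, so the calendar-month arithmetic and the name `latest_backtest_end` are assumptions.

```python
import calendar
from datetime import date

def latest_backtest_end(today, holdout_months):
    """Return the last date a backtest may end, given a holdout in months."""
    months = today.year * 12 + (today.month - 1) - holdout_months
    year, month = divmod(months, 12)
    month += 1
    # Clamp the day so that 31 March minus one month lands on the last day of February.
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

print(latest_backtest_end(date(2024, 6, 15), 12))  # 2023-06-15
print(latest_backtest_end(date(2024, 3, 31), 1))   # 2024-02-29
```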
