A very profitable version of IN and OUT, and why it is likely to fail in real life trading like its siblings

The IN and OUT strategy is very popular on this forum. Unlike many here, I have been very skeptical of the strategy in all its forms, because its alleged performance relies solely on what seem to me to be highly overfitted in-sample backtests. Although most acknowledge the strategy is overfit to some degree, most assume overfitting will merely result in a somewhat less spectacular real-life performance. In reality, it is likely that it will not just fail to meet expectations, but generate negative alpha in the long run.

To illustrate the point, I've constructed my own version of IN & OUT. The strategy switches between QQQ and TLT at the market open, while positions are determined from closing prices. Most IN & OUT strategies incorporate the sector ETFs XLI and XLU, so I will as well, with the difference that I use exponential moving averages to calculate the trends. Additionally, I consider the covariance between QQQ and TLT, for which I calculate an exponentially weighted historic daily covariance:

The figure clearly shows that QQQ and TLT have a strongly negative covariance in periods of crisis over the considered time frame. The strategy is therefore as follows:

Exit QQQ and enter TLT when:

- EMA XLI < EMA XLU (filter setting alpha = 0.05 ~ SMA 40 days)

- EM COV QQQ/TLT < exit limit (filter setting alpha = 0.1 ~ SM COV 20 days, exit limit = -1e-4)
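A minimal Python sketch of the two exit conditions above. Several details are assumptions not stated in the post: the EMAs are taken over daily returns of XLI and XLU (comparing raw prices of two different ETFs would not be meaningful), the covariance uses a standard incremental exponentially weighted update on daily QQQ/TLT returns, and the two conditions are combined with a logical AND. All function names are hypothetical. The `alpha ~ SMA window` correspondence uses the common rule N = 2/alpha - 1, which matches the average age of the data in both filters.

```python
def equivalent_sma_window(alpha):
    """Rule of thumb: an EMA with smoothing alpha has the same average data age
    as an SMA of N = 2/alpha - 1 days (0.05 -> 39, 0.1 -> 19)."""
    return 2.0 / alpha - 1.0


def ema(values, alpha):
    """Recursive exponential moving average, seeded with the first value."""
    out, m = [], values[0]
    for v in values:
        m = (1.0 - alpha) * m + alpha * v
        out.append(m)
    return out


def ew_cov(x, y, alpha):
    """Exponentially weighted covariance of two return series (incremental update)."""
    mx, my, c = x[0], y[0], 0.0
    out = [c]
    for xi, yi in zip(x[1:], y[1:]):
        dx, dy = xi - mx, yi - my      # deviations from the previous means
        mx += alpha * dx
        my += alpha * dy
        c = (1.0 - alpha) * (c + alpha * dx * dy)
        out.append(c)
    return out


def exit_signal(xli_ret, xlu_ret, qqq_ret, tlt_ret,
                alpha_trend=0.05, alpha_cov=0.1, exit_limit=-1e-4):
    """True on days this sketch would hold TLT instead of QQQ.

    Assumption: both conditions must hold (AND); the post does not say
    whether they are combined with AND or OR.
    """
    trend_xli = ema(xli_ret, alpha_trend)
    trend_xlu = ema(xlu_ret, alpha_trend)
    cov = ew_cov(qqq_ret, tlt_ret, alpha_cov)
    return [a < b and c < exit_limit
            for a, b, c in zip(trend_xli, trend_xlu, cov)]
```

Note that the three tunable numbers (`alpha_trend`, `alpha_cov`, `exit_limit`) are exactly the three adjustable parameters counted below.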

There are therefore three adjustable parameters (two filter settings and an exit level for the covariance), which is relatively conservative compared to some of the alternatives presented on this forum. Applying this strategy, we obtain the following equity curve:

The stats are as follows:

CAGR: 27.0%, Sharpe ratio: 1.64, Max drawdown: 15.9%

Not bad, right?
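The post does not say exactly how these three statistics were computed. A minimal Python sketch of the usual definitions, assuming equally spaced daily observations, a zero risk-free rate, and the sample standard deviation of simple returns (`performance_stats` is a hypothetical helper):

```python
import math
import statistics


def performance_stats(equity, periods_per_year=252):
    """CAGR, annualized Sharpe ratio and maximum drawdown of an equity curve.

    Assumptions: equally spaced observations, zero risk-free rate,
    sample standard deviation of simple periodic returns.
    """
    years = (len(equity) - 1) / periods_per_year
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0

    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    sharpe = (statistics.mean(returns) / statistics.stdev(returns)
              * math.sqrt(periods_per_year))

    # Maximum drawdown: largest fall from a running peak, as a fraction.
    peak, max_dd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_dd = max(max_dd, 1.0 - value / peak)
    return cagr, sharpe, max_dd
```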

Unlike the other versions of IN & OUT, it's actually fairly easy to extend the data set to include a longer history, since the only ETF that starts in 2007 is TLT. We will extend the TLT data with the mutual fund VUSTX, which has a high correlation with TLT. We thus have a data set going back to 1999, which includes another major crisis. So what's the out-of-sample performance? Here it is:

CAGR: -5.65%, Sharpe ratio: -0.24, Max drawdown: 69.4%

How can this be? The answer is simple: the strategy is overfit, a conclusion we could have reached from the ETF data alone, which I will discuss in an upcoming post.
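The TLT/VUSTX extension described above amounts to back-filling one price series with the returns of a proxy. A minimal Python sketch, assuming both inputs are total-return (dividend-adjusted) prices, which matters because VUSTX is a mutual fund that distributes income; `splice_history` is a hypothetical helper, not the author's MATLAB code:

```python
def splice_history(proxy, target):
    """Extend `target` backwards in time using the price path of `proxy`.

    proxy, target: dicts mapping ISO dates ('YYYY-MM-DD') to prices.
    The proxy is rescaled so both series agree on the target's first date;
    before that date only the proxy's returns are used.
    """
    start = min(target)  # ISO dates sort chronologically as strings
    if start not in proxy:
        raise ValueError("proxy must have a price on the target's first date")
    scale = target[start] / proxy[start]
    merged = {d: p * scale for d, p in proxy.items() if d < start}
    merged.update(target)
    return dict(sorted(merged.items()))
```

Rescaling at the overlap date keeps the spliced series continuous, so the backtest sees the proxy's returns rather than a price jump at the join.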

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.

I think there are other variations, more popular now, that use, for example, the top 10 value + momentum stocks selected on dollar volume traded and EV/EBITDA rather than the QQQ ETF. Long term, who knows how QQQ will perform, considering tech stocks had an 80% crash after the 2000 dot-com bubble burst. But the other stock selection method I suggested above, with monthly rebalancing, should do fine.

Hi Leandro,

It's not so much about pointing out what will or won't work. I could start a thread on a strategy, but there are already so many of those on this forum. I believe the success of a trading strategy lies more in the process by which strategies are developed than in the individual strategy or product that follows from this process. A good-quality product follows from a good-quality process. Questions like "What constitutes a good strategy?" and "How do we test the goodness of a strategy from a modelling perspective?" are far more important than CAGR, Sharpe ratio, or max drawdown. Why? Because we have to ask whether these figures of merit are in any way representative of how a strategy will perform out-of-sample. In my view, far too much time is spent optimizing in-sample backtests to the point that they are overfit and likely to fail in practice. Look, I get it, we all like to see the next best thing with double-digit returns, but in my view that is not what strategy development is about. So I try to offer a different perspective that isn't about chasing returns, but about expanding our skill set and using a scientific approach.

Goldie Yalamanchi


Leandro Maia


Hi Menno!

I really appreciate your feedback and suggestions pointing out stuff that won't work.

Could you also suggest stuff that would work?

Vovik


Menno,

I take your findings the way I would take those of a commission that concluded that, because a two-wheeled truck built in Bangladesh crashed in 2000, all six-wheeled trucks built by Tesla and all twelve-wheeled trucks built by General Motors were likely to crash in the future.

Your two-wheeled model is more likely to crash than six-wheeled or twelve-wheeled ones.

I should add that I've already demonstrated, through a purged cross-validation, that another "six-wheel" version of IN & OUT doesn't perform out-of-sample. In the world of data-driven financial modelling, six-wheeled and twelve-wheeled trucks are much more likely to crash than two-wheeled ones.
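Purged cross-validation, as the term is commonly used (after López de Prado), removes from each training set the observations whose labels overlap the test block, plus an optional embargo right after it, so that information cannot leak across the split. A minimal Python sketch of the index bookkeeping, assuming contiguous folds and labels that look `label_horizon` bars ahead; this is a generic illustration, not the code used for the analysis mentioned above, and `purged_kfold_indices` is a hypothetical name:

```python
def purged_kfold_indices(n_samples, n_folds, label_horizon, embargo=0):
    """Yield (train, test) index lists for purged, embargoed k-fold CV.

    Assumes sample i's label depends on bars i+1 .. i+label_horizon.
    """
    bounds = [round(i * n_samples / n_folds) for i in range(n_folds + 1)]
    for start, stop in zip(bounds, bounds[1:]):
        test = list(range(start, stop))
        # Purge: training samples whose labels reach into the test block.
        lo = max(0, start - label_horizon)
        # Embargo: skip samples right after the test block.
        hi = min(n_samples, stop + embargo)
        train = [i for i in range(n_samples) if i < lo or i >= hi]
        yield train, test
```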

Vladimir


Menno Dreischor,

I will not believe any of the results of your study until you publish your IN and OUT strategy code, where I'm sure we can find many mistakes and tricks that made it a loser in the out-of-sample period. I am not a scientist, but I know how much QQQ suffered in 2000-2002 and how good that time was for bonds. If you carefully analyze the source of earnings in any IN & OUT strategy, you may find that the main source of profits is bonds, and that it works better in bearish markets. You can show your pessimistic pictures to your boss and colleagues, but please do not post them on the QC forum without code that can confirm them.

Arthur Asenheimer


>> Anyone can mess up a good strategy, few can create one. <<

It's actually more the other way round, imho.

When you define "good strategy" only by its performance metrics like PSR, Sharpe Ratio, CAGR, max Drawdown etc., it's very easy to create a "good strategy". It's much harder to identify the weaknesses of the strategy (or as you would say: to mess up the strategy).

We should always know and check the assumptions we've made when creating a trading strategy.

FYI, during 2000-2003, I made over 600% on my retirement accounts switching between Strong International and Strong Utility based on NDX movements. What about you?

Here's a picture of one of my latest versions using the OUT_DAY approach.

Do you guys like to communicate with pictures?

Jared Broad (staff)

At no one in particular: please keep the discussion civil. We're launching a code of conduct today and will require respectful, courteous debate to participate on QuantConnect. If a comment or discussion cannot be friendly and respectful, please do not reply.

@Vladimir You can ignore these warnings if you want. I have no stake in this other than to offer the perspective of a professional with 15 years of modelling experience. Since I work in MATLAB, I sadly cannot share the code in the forum, but the results can easily be verified by anyone willing to take a critical look and do some out-of-sample testing, which is possible using just the existing ETF data. I've done this for the original version of IN & OUT, and it did not work out-of-sample. I have shown that similar Sharpe ratios can be obtained by applying IN & OUT to random data, and that models with similar Sharpe ratios on real and random data are expected to have severely deflated out-of-sample Sharpe ratios. I understand it's difficult to accept from a layman's perspective that a strategy with seemingly very good in-sample performance can lose not just some of its edge but all of it, or worse, underperform its benchmark in the long run, but this is very common, not just in finance but in chemistry and physics as well. A good builder of statistical models should be skeptical by nature. In my opinion, the process followed in constructing these models doesn't pass muster scientifically. A backtest without out-of-sample testing and robustness analysis is practically worthless. So, if you are really serious about developing quantitative models, you should want to know these things...

Just to be clear, I'm not out to criticize anyone in particular. I like many of the ideas presented here, but sometimes it's useful to go against the grain. I know I won't win any popularity contest for it, but c'est la vie.
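One standard way to quantify the "deflated" Sharpe ratio argument above (swapped in here; the post does not say which method was used) is the expected maximum Sharpe ratio among many independent trials whose true Sharpe is zero, from Bailey and López de Prado's work on the deflated Sharpe ratio. A minimal Python sketch; `sharpe_std` is the standard deviation of the Sharpe estimates across trials, an input you must supply, and the function name is hypothetical:

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials, sharpe_std=1.0):
    """Approximate expected maximum of n_trials independent Sharpe ratio
    estimates whose true value is zero (Bailey and Lopez de Prado)."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return sharpe_std * ((1 - EULER_GAMMA) * z(1 - 1 / n_trials)
                         + EULER_GAMMA * z(1 - 1 / (n_trials * math.e)))
```

For annualized Sharpe estimates with a true value near zero, the standard error is roughly 1/sqrt(years of data), so the result is the Sharpe ratio the best of `n_trials` parameter combinations would be expected to show on pure noise. An in-sample Sharpe ratio should be judged against this bar, not against zero.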

Just to further reinforce my point, here's a simplified IN & OUT that trades only SPY and uses XLI as the signal. We determine positions and enter or exit SPY at the daily close. There are three parameters:

- Lookback period for XLI

- The exit level for XLI

- The number of days we stay out of SPY after an exit
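The post does not define exactly how the XLI lookback and exit level interact. One plausible reading, sketched in Python as an assumption: exit when XLI's trailing return over the lookback falls below the exit level, then stay out for the given number of days (a new trigger restarts the count). The position decided at the close of bar t applies to the return from t to t+1. `spy_exposure` is a hypothetical name.

```python
def spy_exposure(xli_prices, lookback=65, exit_level=-0.04, days_out=5):
    """Return 1 (hold SPY) or 0 (out of SPY) for each bar.

    Assumption: the exit trigger is XLI's trailing return over `lookback`
    bars falling below `exit_level`; we then stay out for `days_out` bars.
    """
    exposure, out_until = [], -1
    for t in range(len(xli_prices)):
        if t >= lookback:
            trailing = xli_prices[t] / xli_prices[t - lookback] - 1.0
            if trailing < exit_level:
                out_until = t + days_out  # (re)start the out period
        exposure.append(0 if t < out_until else 1)
    return exposure
```

With only three knobs, the rule is cheap to evaluate over a whole parameter grid, which is exactly what makes it easy to tune to a handful of historical drawdowns.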

I optimize the parameters for the dummy strategy by maximizing the Sharpe ratio over the period 2007-2020. The optimal parameters are:

- Lookback period for XLI: 65 days

- The exit level for XLI: -4%

- The number of days we stay out of SPY after an exit: 5 days

The equity curve for the period 2007 - 2020:

As before, it performs very well, avoiding all the major drawdowns of SPY. Of course, we have a much longer history for SPY and XLI, so we can run an out-of-sample test for the period 1999-2006. Here is the equity curve:

The out-of-sample test shows the strategy completely breaks down, and generates negative alpha like in my first post.
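The procedure used here (optimize on 2007-2020, then evaluate once on 1999-2006) can be written generically. A minimal Python sketch; `score` stands for any backtest metric, for example the Sharpe ratio of an exposure rule over a data window, and all names are hypothetical:

```python
from itertools import product


def optimize_then_test(score, grid, in_sample, out_of_sample):
    """Pick the parameter set with the best in-sample score, then report its
    score on data that played no part in the selection.

    score(params, data) -> float; grid maps parameter names to candidate lists.
    """
    candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
    best = max(candidates, key=lambda params: score(params, in_sample))
    return best, score(best, in_sample), score(best, out_of_sample)
```

The key design point is that `out_of_sample` is evaluated exactly once, after the parameters are fixed. Re-running the search after looking at the out-of-sample result quietly turns it back into in-sample data.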

I think I have shown that it's just too easy to avoid the three largest drawdowns in SPY with a few adjustable parameters. Add an asset like TLT that is negatively correlated with equities during these periods, and you seemingly have a good strategy. On closer inspection, however, the parameters are overfit to the optimization period, and the negative correlation between TLT and equities was much weaker in another time frame. The model therefore fails to find the proper exit levels out-of-sample and falls out of phase with the market, resulting in catastrophe, while an asset like TLT doesn't provide much intermediate return to soften the blow.

Arthur Asenheimer,

Are you sure that the strategy Menno used to model the crash is the same as those discussed on the QC forum?
Menno used an entirely different strategy; Vovik calls it a two-wheeled model.
Is this a scientific approach?

Are you sure that he did not overfit that strategy to produce the crash?

No recommendation, just a statement.
"The reality is, that it is likely, it will not just fail to meet expectations, but generate negative alpha in the long run."

What is the percent probability behind "likely"?
And when might it happen?
What is VUSTX's beta to TLT?
How long is the long run?

The only useful thing is that he reminded me of the NASDAQ crash, a good time for my retirement accounts. But this has nothing to do with the IN and OUT strategy. There are versions of the strategy that do not use QQQ but dynamically select 10-20 stocks based on momentum and fundamentals.

@Vladimir TLT and VUSTX are highly correlated, since they are both long-term bond funds. I don't have the exact number, but the strategy performs about the same over 2007-2020 whether you use TLT or VUSTX. However, you seem to be missing the point. It has nothing to do with using QQQ, SPY, or a subselection of stocks. It's the fact that it is much too easy to generate signals that help you avoid the three biggest drawdowns of the US stock market over 2007-2020. I have shown two ways, but you can use any other way, and it will have the same critical weakness. Avoiding those drawdowns in-sample and generating excess return with a negatively correlated asset is too easy to achieve for it to have any statistical significance.

Jared Broad,

Sorry, it was Google that gave me this translation.

Menno Dreischor,

It's you who must prove that you did not distort the model. Since you cannot share the code in the forum, as I expected, there is nothing to discuss. I do not believe any of the results of your study, and I am turning off the follow-discussion bell. I wonder how many strategies you have approved in 15 years?

@Vladimir It doesn't matter whether you believe me, because reality will ultimately prove my point. These tests are easy to implement, and if someone with some modelling experience were to tell me that I have overlooked issues with my strategy that can be uncovered with proven model validation techniques, I would not hesitate to apply them. There are, after all, investments on the line. We've been managing investments for well over a decade using a number of different quantitative strategies. I cannot go into details, but I will tell you that our latest machine-learning-based models deliver in-sample Sharpe ratios between 5 and 7, while their out-of-sample Sharpe ratios range from 1.8 to 2.5 over periods of 12 to 20 years.

TLT and VUSTX are highly correlated.

All bonds (5-30 years) are highly correlated because they are in the same asset class.

AGG to TLT beta: 0.4

TMF to TLT beta: 3.0

@Vladimir I'm talking about correlations of >95% and funds that have the same underlying assets. AGG is an aggregate bond ETF, which means that, by its very description, it is significantly different from TLT and cannot be used as a proxy. TMF is leveraged, so it will again not be a good proxy for TLT.
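Beta and correlation answer different questions, which is the crux of this exchange: beta is cov(asset, benchmark) / var(benchmark) and measures scale, while correlation measures how tightly the two move together. A minimal Python sketch on made-up return series (`beta_and_correlation` is a hypothetical helper); a fund that is an exact 3x multiple of its benchmark has beta 3 but correlation 1, so neither number alone establishes a good proxy:

```python
import math


def beta_and_correlation(asset, benchmark):
    """Beta and Pearson correlation of asset returns against benchmark returns."""
    n = len(asset)
    ma, mb = sum(asset) / n, sum(benchmark) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(asset, benchmark)) / (n - 1)
    var_a = sum((a - ma) ** 2 for a in asset) / (n - 1)
    var_b = sum((b - mb) ** 2 for b in benchmark) / (n - 1)
    return cov / var_b, cov / math.sqrt(var_a * var_b)
```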
