This thread is meant to continue the development of the In & Out strategy started on Quantopian. Our first challenge will probably be to translate our ideas to QC code.
I'll start by attaching the version Bob Bob kindly translated at Vladimir's request.
Vladimir:
About your key error, did you also initialize UUP like this?
self.UUP = self.AddEquity('UUP', res).Symbol
Vladimir
Menno Dreischor,
I see that you are trying to comment on one of the very first algos from the Quantopian discussion forum.
Try a somewhat newer one:
the first two published in this thread by Tentor Testivis, or the last ones published by Peter Guenther.
They are different from each other.
Manny Frederick
@Menno, thanks for your critical feedback on this; this type of analysis adds more value for the amateur (like myself) than trying to game the returns. The first rule of trading is to know your risks.
Seeing that you are in the business, what are your views when you apply this critical thinking to a simple mechanical system like Meb Faber's GTAA? (Apologies if this should be in another thread, but I feel your views on a generally accepted strategy among the quants, like the GTAA, as compared to another "in and out" strategy, like this one, merits discussion).
More specifically, producing randomized prices using the covariance structure and applying the system to it as well as the cross-validation study, since the GTAA has severely underperformed in the past few years. Finally, do you think this is an example of overfit of in-sample data vs out-of-sample underperformance?
Manny Frederick Well, let's go through a simple procedure. I will consider the simplest case of a momentum-like strategy, as presented by Faber. In this case I will be using the 150-year data set provided by Antonacci, which includes 4 assets: the S&P 500, EAFE, gold, and bonds. The model will switch between these four assets based on past performance, using a simple moving average over some lookback period. Each month we will select the asset with the highest historic return.
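For concreteness, the monthly selection rule described above can be sketched as follows. Note this is only my reading of the setup, with synthetic data and placeholder column names, not Antonacci's actual series:

```python
import numpy as np
import pandas as pd

# Synthetic monthly total-return index levels for the four assets (made up).
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
prices = pd.DataFrame(
    np.exp(np.cumsum(rng.normal(0.005, 0.04, size=(120, 4)), axis=0)),
    index=dates, columns=["SP500", "EAFE", "Gold", "Bonds"],
)

def select_asset(prices: pd.DataFrame, lookback: int) -> pd.Series:
    """Each month, pick the asset with the highest trailing return."""
    trailing = (prices / prices.shift(lookback) - 1).dropna()
    return trailing.idxmax(axis=1)

picks = select_asset(prices, lookback=12)
print(picks.value_counts())
```

The single free parameter is the lookback period, which is what makes the stability sweep in the next step a one-dimensional exercise.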
First let's look at performance stability for different parameter settings, where we examine lookback periods between 1 and 24 months:
The performance is quite stable across the different settings, to the extent that every setting has resulted in an outperformance relative to the S&P 500 in the long run. So far so good!
Next we perform a leave one year out cross-validation to test the strategy out-of-sample:
The out-of-sample test based on cross-validation is a success, and the out-of-sample performance approaches the best in-sample performance.
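A minimal sketch of what such a leave-one-year-out procedure might look like, with synthetic data and a simplified score (mean monthly return); this is my interpretation of the procedure, not Menno's actual MATLAB code:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly prices for four assets (placeholders, not real data).
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=96, freq="MS")  # eight years
prices = pd.DataFrame(
    np.exp(np.cumsum(rng.normal(0.004, 0.04, size=(96, 4)), axis=0)),
    index=dates, columns=["SP500", "EAFE", "Gold", "Bonds"],
)

def strategy_return(prices: pd.DataFrame, lookback: int) -> pd.Series:
    """Monthly returns from holding last month's top trailing-return asset."""
    trailing = (prices / prices.shift(lookback) - 1).dropna()
    picks = trailing.idxmax(axis=1).shift(1).dropna()  # trade the next month
    monthly = prices.pct_change()
    rets = {t: monthly.loc[t, a] for t, a in picks.items()}
    return pd.Series(list(rets.values()), index=pd.DatetimeIndex(rets.keys()))

def leave_one_year_out(prices: pd.DataFrame, lookbacks=range(1, 13)) -> pd.Series:
    """For each year: pick the lookback that did best with that year removed,
    then score it on the held-out year (ignoring the seam at the gap)."""
    oos = {}
    for y in prices.index.year.unique():
        train = prices[prices.index.year != y]
        best = max(lookbacks, key=lambda lb: strategy_return(train, lb).mean())
        full = strategy_return(prices, best)
        oos[y] = full[full.index.year == y].mean()
    return pd.Series(oos)

oos_res = leave_one_year_out(prices)
print(oos_res)
```

Each year's score is produced by a parameter chosen without seeing that year, which is what makes the resulting series an out-of-sample estimate.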
Finally we do a traditional out of sample test, where we optimize the strategy over the period 1871-1970, and do a forward test with the optimal parameter settings from 1970-2020:
The results look very promising, and while the strategy doesn't outperform the S&P 500 under all market conditions, we can expect it to outperform the S&P 500 in the long run with far less downside risk.
So, why does this strategy work? The underlying idea is simple but sound. There's ample data to test the strategy, and only a single parameter to optimize, making the strategy robust against overfitting. This basic result then lends itself to further improvement, though we should always check whether adding complexity improves the out-of-sample performance.
Vladimir
Joshua Tsai,
The strategy is rather robust (if slightly overfit).
I agree with that.
Nathan Swenson
Strategy says to be in bonds now. Seems like a good idea perhaps. My first trade to sync up with algo.
Goldie Yalamanchi
Nathan Swenson I too started running a live account. Have to switch to real money next (just worried about whether my whole subaccount setup with IB is good for QC and won't log the algo out).
Anyways yes it is buying bonds. I am using TLT/IEF (out) and QQQ (in).
I hear Menno Dreischor's concerns about too many parameters. I am not as advanced a chartist as he is, but in NinjaTrader I have even just picked random indicators using the AI generation tool and then tried them out of sample, and they did terribly.
But I do have good confidence in this IN/OUT algo. I don't know if it will ALWAYS beat the SP500, but I do think its hist_shift logic gives some great risk avoidance compared to other indicators I tried to use, like the 50 or 200 SMA, etc.
So here is the chart after running live from 11/16/2020. If you look at SPY price vs RSI, I think there is actually bearish divergence occurring in the price action at the moment, so I think bonds are the correct answer: the market could head lower. I wouldn't bet the farm on this strategy, but I think overall it won't be dangerous for your money, maybe a 10-15% drawdown at most.
Peter Guenther
Excellent work @all, especially Tristan F ’s coding work and Menno Dreischor ’s insights and results regarding strategy testing!
Menno Dreischor: very interesting analyses, thanks a lot for sharing! You seem to have some very nice simulation capabilities there. Would it be possible to also test the percentile version of the algo (signal = extreme return in the 1% tail) or would this be difficult to implement? Some changes were made to the algo based on the Quantopian discussions concerning moving away from hard-wired cut-off values. Also, in your tests regarding simulated random returns and optimized parameters, does the program give you summary output for the 100,000 runs? I see that you posted one example; yet, how many of the random runs are yielding returns close to or above the original strategy, was it 1, 10, 100, 1000, or more of the runs? I understand that you are looking for that one event to refute a strategy, however if you have this data it would still be interesting to look at it as well.
In terms of additional ways of testing, could we, instead of creating random price charts for the signals use actual realizations of the market and signals’ price data, e.g. using 6-month (or monthly) segments drawn via sampling with replacement, e.g. doing 1,000 runs and see how the strategy is holding up in these different data scenarios. Intuitively, the sampling of actual realizations might preserve some of the evolving dynamics that are going on in the market at different points in time. Not sure whether your software allows such bootstrapping procedures, or what your general take on this approach is.
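Such a block-bootstrap could be sketched like this; the segment length (~126 trading days, roughly 6 months) and the 1,000-run count follow Peter's numbers, while the return series itself is made up:

```python
import numpy as np

def block_bootstrap(returns: np.ndarray, block_len: int, rng) -> np.ndarray:
    """Resample a return series in contiguous blocks, with replacement,
    preserving short-range dynamics within each block."""
    n = len(returns)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    blocks = [returns[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

rng = np.random.default_rng(42)
daily = rng.normal(0.0005, 0.01, size=2520)  # ten years of fake daily returns
paths = [block_bootstrap(daily, block_len=126, rng=rng) for _ in range(1000)]

# One would evaluate the strategy on each resampled path and inspect the
# distribution of outcomes; here terminal wealth is used as a stand-in.
terminal = [np.prod(1 + p) for p in paths]
print(np.percentile(terminal, [5, 50, 95]))
```

In practice one would resample the joint cross-section of all signal and market series together (same block indices for every asset) so that cross-asset dependencies survive the resampling.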
Peter Guenther Thanks for your suggestions. I will take a look at the updated version. However, since I work in MATLAB I will have to see what changes were made and translate them to that environment. When it comes to the 100,000 runs, only a small percentage results in a good performance. However, the point here is that the more parameters you add to your system, the larger the probability that you will be able to find regimes in any set of predictors that correlate with a good or even great asset performance, whether there really exists a relationship or not. Such a spurious relationship will obviously not hold up out of sample or in real life. Look at it this way: if a person tries to sell you a gold watch for a cheap price, and it looks like metal and has a golden color, you may be tempted to take the deal. If I then present you 1,000 more watches, which look exactly the same, and tell you they are fool's gold, would you still trust the person who claims he's selling you a real gold watch? I would not, unless the person involved can present some test that can prove that the watch he or she is selling is actually made of gold. The tests with random data, or with different assets, are there to show that I can present 1,000 different models with different predictors that look every bit as golden and shiny as the strategy under investigation, particularly if these models are presented with some credible narrative to explain the spurious relationship. How would you know which of these is the real deal? Is it because the in and out strategy was presented to you first? What if it was another one of the fake strategies? Or is the in and out strategy also a fake strategy like the others? This is why it is imperative to always do out-of-sample testing on a significant amount of data. Cross-validation is ideally suited for this, since you can use it to do out-of-sample testing on all available data.
Guy Fleury
@Menno, after thousands of tests, you kind of find out that you cannot influence that much the stock price to go in the direction you want, no matter what tools you want to use, simple, complex, or otherwise. Short-term, the predictability of stock prices remains mostly elusive. We make bets that sometimes pay off and sometimes not. But we certainly cannot make a sure bet as to what will be the price tomorrow. Otherwise, we would be all-in all the time.
I look at the trading problem from a mathematical and statistical point of view. Whether it be over future or past data, I can express the outcome of any stock trading strategy using its payoff matrix: F(t) = F_0 + Σ (H ∙ deltaP). A simple expression covering it all. deltaP is the period-to-period price difference matrix for all the stocks in your portfolio, or your entire trading universe if you want. And H is the ongoing inventory held in your portfolio in any of the stocks in your trading universe at any point in time. I cannot change deltaP, but H is under my control and yours. We determine, by our trading methods and procedures, how many shares will be held in our inventory, in which stocks, at which time, and for whatever duration we want. That is what the trading strategy is trying to determine. That is what we can force our trading strategies to do.
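As a small numerical illustration of that payoff-matrix identity (the holdings and prices below are made-up numbers, not from any real strategy):

```python
import numpy as np

# Two stocks over four periods: price paths and the shares held going
# into each period.
prices = np.array([[10.0, 20.0],
                   [11.0, 19.0],
                   [12.0, 21.0],
                   [11.5, 22.0]])
holdings = np.array([[100, 50],
                     [100, 50],
                     [80, 60],
                     [80, 60]])

delta_p = np.diff(prices, axis=0)   # period-to-period price differences
f0 = 100_000.0                      # initial capital F_0

# F(t) = F_0 + sum over periods and stocks of H * deltaP, where H is the
# inventory held across each price change.
pnl = np.sum(holdings[:-1] * delta_p)
print(f0 + pnl)  # → 100270.0
```

The point carries over directly: delta_p is given by the market, while the holdings matrix is the only part the strategy gets to shape.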
The trading procedures are us putting thresholds, barriers, pivotal points all over the place based on whatever reasoning we want. The above equation is trade agnostic, it is only concerned with results, what your strategy does to the trading account. We wait for the thresholds or conditionals to be crossed in order to trigger our trades that they be long or short, initiating positions or closing them, or scaling them up or down. It is all us determining what is to be done under whatever we consider as appropriate stuff to do. That we be wrong or not. That the market complies or not. We could, technically, consider ourselves as another independent random-like variable in all this.
You can make your trading strategy dependent on market conditions or set up your own trading rules. If you want a variance-controlled setting, then you will have to live within those bounds. What I am saying is that we control how we want to trade. It will not change the nature of the market, and we can trade in whichever way we want at any given point in time, for whatever reason, relevant or not. Say you make your next bet on the toss of a fair coin: does it matter if your decision to go for heads was based on the phase of the moon, or on having rolled a 6 on your last throw of a die? Your probability on a fair coin will still be one half. Whatever reason you had to toss that coin would be immaterial, but nonetheless consequential. You would either win or lose.
Here is an annotated chart taken from my book: Beyond the Efficient Frontier.
It says you can “simply” jump over the Market Efficient Frontier to generate your excess return. Albeit, at a higher risk level, but also at a much higher return. It is a trader's choice to stay within the boundaries of Modern Portfolio Theory or go beyond.
@guy fleury Game theory is a nice framework, and has readily been applied to deep learning. However, like any model it will need to be optimized, and whenever we deal with optimization, we have to accept the fact that the number of adjustable parameters you can introduce is limited by the amount of data available.
Leif Trulsson
Vladimir I think Peter Guenther has to answer this, but my guess (without being able to debug the code) is that it's because of how returns_sample is created.
hist = self.History(self.SIGNALS + [self.MRKT] + self.FORPAIRS, 252, Resolution.Daily)['close'].unstack(level=0).dropna()
hist_shift = hist.apply(lambda x: (x.shift(65) + x.shift(64) + x.shift(63) + x.shift(62) + x.shift(61) +
                                   x.shift(60) + x.shift(59) + x.shift(58) + x.shift(57) + x.shift(56) +
                                   x.shift(55)) / 11)
returns_sample = (hist / hist_shift - 1)
self.SIGNALS and self.FORPAIRS are set to: [self.GOLD, self.SLVA, self.UTIL, self.SHCU, self.RICU] and [self.PRDC, self.METL, self.NRES, self.DEBT, self.USDX] respectively, but the following code is used to calculate adjwaitdays:
self.WDadjvar = int(
    max(0.50 * self.WDadjvar,
        self.INI_WAIT_DAYS * max(1,
            np.where((returns_sample[self.GOLD].iloc[-1] > 0) &
                     (returns_sample[self.SLVA].iloc[-1] < 0) &
                     (returns_sample[self.SLVA].iloc[-2] > 0), self.INI_WAIT_DAYS, 1),
            np.where((returns_sample[self.UTIL].iloc[-1] > 0) &
                     (returns_sample[self.INDU].iloc[-1] < 0) &
                     (returns_sample[self.INDU].iloc[-2] > 0), self.INI_WAIT_DAYS, 1),
            np.where((returns_sample[self.SHCU].iloc[-1] > 0) &
                     (returns_sample[self.RICU].iloc[-1] < 0) &
                     (returns_sample[self.RICU].iloc[-2] > 0), self.INI_WAIT_DAYS, 1))))
adjwaitdays = min(60, self.WDadjvar)
and here returns_sample[self.INDU].iloc[-1] and returns_sample[self.INDU].iloc[-2] are being used. So either we have to change the assignment in self.FORPAIRS from self.PRDC to self.INDU (they are the same) or use the reference to self.PRDC in the self.WDadjvar calculation.
But I have to admit that I don't know how QCAlgorithm works under the covers.
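As a side note, the long chain of shifts in hist_shift is just an 11-day moving average displaced by 55 days, so it can be written far more compactly. A quick equivalence check with made-up prices (this asserts only numerical equality, nothing about the algo's logic):

```python
import numpy as np
import pandas as pd

prices = pd.Series(np.random.default_rng(1).uniform(50, 150, size=252))

# Original formulation: mean of the values 55 to 65 days ago.
chained = sum(prices.shift(k) for k in range(55, 66)) / 11

# Equivalent: an 11-day rolling mean, shifted forward 55 days.
compact = prices.rolling(11).mean().shift(55)

print(np.allclose(chained.dropna(), compact.dropna()))  # → True
```

The compact form also makes the much-discussed "55 days ago" displacement explicit as a single number instead of eleven separate shifts.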
Leif Trulsson
Vladimir "Why adjwaitdays today is calculated based on price to 11 day moving average ratio 55 days ago?"
Again, this is something I think Peter Guenther has to answer.
Peter Guenther
@Menno: Great stuff, very much enjoying the discussion! I hope you don't mind me continuing to stir a bit from the other side so that we can see what comes up.
@all:
Gedankenexperiment
1. Randomized returns test: A key implication is that one should minimize the number of free parameters. This is true by design, since the test ‘punishes’ strongly for the number of parameters, and also the number of parameter levels. The intensity of punishment definitely is quite distinct compared with other statistical tests. More parameters and levels give the ‘spurious relations detection’ step more permutations to play around with and then more chances to find a random setup that performs at par or better than the original strategy. My current understanding is that one finding can be enough to refute a strategy. Since the number of permutations is akin to an exponential function of the parameters and their levels (I think one of the previous testing posts comes down to about 4,728,618,000 permutations across the parameters and their levels assuming increments of one), it almost seems inevitable that 4 parameters will perform better than 5 ‒ 3 better than 4 ‒ 2 better than 3 ‒ 1 better than 2. I reckon 0 would be optimal.
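The combinatorics Peter refers to are easy to illustrate; the parameter names and level counts below are invented for the example, not the actual algo's settings:

```python
from math import prod

# Assumed number of distinct settings (levels) for each free parameter.
levels = {"lookback": 252, "wait_days": 60, "n_signals": 5,
          "threshold": 100, "rebalance_freq": 12}

# The search space grows multiplicatively with each parameter added.
grid_size = prod(levels.values())
print(grid_size)

# Dropping one parameter shrinks the space by that parameter's level count,
# which is why fewer parameters leave fewer chances for a spurious,
# great-looking configuration to be found by a randomized search.
print(prod(v for k, v in levels.items() if k != "threshold"))
```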
2. Leave one out test: A key implication is that strategies that can draw on long time series should be preferred. Again, this is true by design, especially if we opt to leave one year out (instead of a percentage of the sample). Assume a strategy for which you have a decade or less worth of data (say you have a cryptocurrency strategy or similar and these companies did not exist earlier). What percentage of the data do we shoot off (i.e., estimate using the rest of the sample)? About 10%. Now, when we have data from 1850 like in Faber's GTAA case, what percentage of the data do we shoot off? 0.6%. How stable will parameters be that are estimated on 99.4% of the original sample (rolling forward)? Extremely – at least they will seem like it by design. So, in this test, strategies with 5 years of data will tend to perform worse in terms of stability than those with data for 10 years, 20 years, 50 years, 100 years, 200 years.
Evaluation
Now, taking a step back, what are we looking at: It seems that the optimal strategy has close to zero parameters and has been around for a generation or ideally more (at least its underlying data have been around for a while, increasing the chances that someone, or many someones, have picked it up).
What do we want to do with this strategy? Ideally, beat the market.
What are the odds that we can beat the market with something that has close to zero parameters/inputs/assumptions and that, in the best case, our parents, grandparents, and great-grandparents may have been trading on? Or, put differently (I'm not good at picture language, but here's a try): what are the odds that we could cut both our arms off, maybe one leg, only use artificial prosthetics tech that has been around for at least 100 years, and successfully compete in the next swimming competition?
One could argue that the testing regime comes down to favoring not highly innovative, widely known strategies that will find it difficult to do well in markets in which data and expert knowledge become increasingly widely available. Notice how Faber's GTAA passed the test with flying colors; its returns (vs SPY) for the past ten years: 2011: 2.4% (vs 2.1%), 2012: 3.0% (vs 16%), 2013: 7.9% (vs 32.4%), 2014: 7.9% (vs 13.7%), 2015: -2.7% (vs 1.4%), 2016: -0.4% (vs 12%), 2017: 12.5% (vs 21.8%), 2018: -2.1% (vs 4.4%), 2019: 6.2% (vs 31.5%), 2020: -0.7% (vs 12.8%) (although of course returns are not the only factor to judge a strategy on).
Now, taking another step back, what do we see: Validity tests have designs and parameters that we choose. These choices can be even more decisive than the parameters that we are setting in our trading strategies (assuming that we are not putting all out money on only one strategy). This is because the validity tests will weed out entire branches of potential trading strategies and favor others. In that sense, via the tests one is making meta-choices that are informed by deep rooted beliefs regarding what a valid trading strategy should look like in one’s own belief system. Related to the tests above, do we believe in parameter minimalism (to limit choice/gaming) or in rich data models (the world is complex requiring many parameters to model it)? Do we believe that true strategies exist that are universally valid over long time periods (almost ‘handed down’) for us to discover or do we believe that strategies inevitably will break down over a certain time period since things are in flux requiring recency in testing and constant strategy innovation?
In addition, would we need to test the validity of a validity test? How successful is the test in identifying trading strategies that perform well? How many strategies that the test identifies as sound turn out to perform poorly (false positives)? How many trading strategies that would have performed well were removed from the consideration set due to the test design and its parameter settings (false negatives)? What is the test’s net value compared with other possible tests?
Implications
This is not to say that the In & Out strategy is valid. It is to suggest that the validity tests which show the strategy is invalid could, depending on one's own belief system, themselves be considered invalid, making the test results invalid as a consequence. It is also not to say that out of sample tests are invalid. Quite the opposite, they are great and needed here. The strategy has some known soft patches, also see the Quantopian archive discussion. I could imagine that these are the required (insurance) costs to avoid larger drawdowns, but they could also be indicators of a more systematic issue. When testing, I reckon that we might be looking at very different out of sample testing designs, depending on beliefs.
As always, we probably need to be careful with leveraged strategies since returns can be substantially smaller and drawdowns substantially larger than what backtests indicate. One could bet a bit but not the farm, as @Goldie Yalamanchi noted above and @Nathan Swenson and @Guy Fleury have indicated earlier. If we are going unleveraged in the SPY/bonds duo, I would not assume that the stakes are unbearably high, since the risk is that one is randomly (if the strategy has no substance) moving in and out of the SPY and bonds, resulting in a mixed bonds/SPY return which may still be able to beat Faber’s GTAA (or not) :)
Vladimir
Leif Trulsson
I asked you because these are your words:
The strength of the algorithm per se does not lay in the data itself, but in how the data
is handled, and in particular this part:...
To my mind, that part of the algorithm is the weakest.
I can show you the results of an algorithm that does not have that part, has half as many variables and simpler calculations, but very similar metrics.
"Intersection of ROC comparison using OUT_DAY approach"
Vladimir
Leif Trulsson
Here are the metrics:
Total Trades: 185
Average Win: 4.26%
Average Loss: -0.51%
Compounding Annual Return: 25.692%
Drawdown: 13.800%
Expectancy: 6.567
Net Profit: 1808.416%
Sharpe Ratio: 1.571
PSR: 95.487%
Loss Rate: 18%
Win Rate: 82%
Profit-Loss Ratio: 8.28
Alpha: 0.217
Beta: 0.016
Annual Standard Deviation: 0.139
Annual Variance: 0.019
Information Ratio: 0.531
Tracking Error: 0.231
Treynor Ratio: 13.511
Total Fees: $2367.16
Leif Trulsson
Vladimir "I can show you the results of an algorithm that does not have that part, has half as many variables and simpler calculations, but very similar metrics."
That would be very interesting, and I would very much like to try that code/strategy.
@Peter Guenther Here's the thing though: validation tests, and the theoretical frameworks in general from which these validation tests are derived, are generally very structured, to the point that invalidating one or more elements won't make the entire structure collapse. It's therefore not surprising that most models that use different approaches to solve a certain set of problems generally give very similar outcomes. For example, in the vast majority of cases it won't really matter which classification algorithm you use, because even if the theoretical approach is different, they appear to share certain commonalities, common "truths", that ensure the outcomes of each algorithm will be similar and logical. If we view the theoretical framework as a meta-algorithm, and the different approaches or methodologies as a set of meta-parameters, the meta-algorithm is highly robust to adjustments of the meta-parameters, and, more importantly, their behaviour is quite often pretty well understood, to the point where we can even predict which approach is better suited to a given situation.
Now, when it comes to a strategy like In & Out, it doesn't behave like a theoretical framework, nor has it been designed as one. It's more like a fairly loose collection of rules to achieve a certain objective, which frequently tends to be approached with tunnel vision. Unlike the theoretical frameworks discussed above, such a loose collection of rules is extremely sensitive to adjustments of its parameters. So, for me it makes little sense to argue against the validity of theoretical tests, because while no one will claim these tests are flawless, they are at least based on sound theoretical frameworks, and, more importantly, their general validity has been proven over and over again for a wide variety of problems in a wide variety of scientific fields.
Finally, I would argue we should approach our strategies as a theoretical framework, rather than a loose collection of opportunistic rules. Trading rules should fit into this framework, and better yet follow logically from the theory. This is the way we approach strategy development. Don't build a car (the strategy), but build a factory for making cars. You can try to control the quality of each of your products, but it is better to control the quality of the process that produces the products.
Ernest Shaggleford
Menno Dreischor
re: "I would argue we should approach our strategies as a theoretical framework, rather than a loose collection of opportunistic rules. "
As financial markets are classified as a second-order chaotic system, where live trading on a prediction can potentially alter the prediction depending on liquidity, order flow and the reaction of other market participants, does the theoretical framework take this into account?
Guy Fleury
@Menno, the chart in my prior post was from the outcome of randomly generated portfolios. It is based on 4,000,000 randomly generated stock price series trying to simulate 20 years of price data distributed in 20,000 portfolios of 200 stocks each. Each price series follows a classic stochastic equation (drift + Wiener process). I did a lot of those simulations with a lot of variations and picked a chart that represented something close to a somewhat average outcome.
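One such drift-plus-Wiener-process series can be generated as below; the drift, volatility, and starting price are illustrative guesses, not the exact settings Guy used:

```python
import numpy as np

def gbm_path(s0, mu, sigma, years, steps_per_year=252, rng=None):
    """Simulate one geometric Brownian motion price path:
    dS = mu*S*dt + sigma*S*dW (drift + Wiener process)."""
    rng = rng or np.random.default_rng()
    dt = 1.0 / steps_per_year
    n = int(years * steps_per_year)
    increments = ((mu - 0.5 * sigma**2) * dt
                  + sigma * np.sqrt(dt) * rng.standard_normal(n))
    return s0 * np.exp(np.concatenate([[0.0], np.cumsum(increments)]))

rng = np.random.default_rng(7)
# One portfolio's worth of series: 200 stocks over 20 years of daily steps.
portfolio = [gbm_path(100.0, mu=0.07, sigma=0.20, years=20, rng=rng)
             for _ in range(200)]
print(len(portfolio), len(portfolio[0]))
```

Scaling this to 20,000 portfolios reproduces the shape of the experiment; adding occasional random outliers on top, as Guy mentions later, would be a separate step.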
My point at the time the chart was made was to demonstrate that even using randomly generated price data, you could have a serious advantage over aiming for a market efficient portfolio residing on a Markowitz efficient frontier.
Even with randomly generated price series you could jump over the efficient frontier. For sure, you had to go for higher risk (as illustrated), but not by that much. The simulations should have had a risk-return curve correlated to the CML line, if not on the CML due to the leveraging. But with no surprise (it was expected from prior research), the simulations produced the upside green curve. It meant that for the same risk level you could get higher incremental returns, and even higher returns for less risk. It is a remarkable chart unexplained by conventional MPT.
Evidently, no fundamental or technical indicator applied and none was used. Whatever other relevant data you might have had would have been purely coincidental if predictive in any way. No machine learning, deep learning, or whatever could help you in predicting those 4,000,000 price series. To make the price series more realistic I even added random outliers.
You could run the test again and again, and everything would change, every single series in every portfolio. Nonetheless, even if you did this, the results would come close to previous tests, never the same, evidently, but still relatively close.
The idea was that if the trading procedures had value they should show it even if using randomly generated price series. And the program structure repeatedly passed that test.
The next step was to test the whole thing using historical data, real-life stock prices. That resulted in another book: Reengineering Your Stock Portfolio where actual stock prices are used to demonstrate the same thing. The program used the CVXOPT optimizer (familiar to Quantopian users) to do the trading. And there too, based on 17 years of market data, you could jump over the efficient frontier as if it was just a line in the sand, a kind of self-imposed barrier.
I agree with Peter's, Vladimir's and your arguments, as I had to go through all of them myself, and in many ways I have considered everyone else's points of view. We could all make points for and against this strategy. And yet, it can still be improved upon. We could also have the same kind of arguments about almost any other strategy. But the argument here should be: can we forgo what we have learned in MPT in order to jump over the efficient frontier? My point in all this is: it is possible, it is feasible, it is doable, and it is reasonable. Albeit at a higher risk level, but there, the added risk might be worth it. It is all a matter of choices.
@Guy Fleury Did you carry out your simulations with a volatility also taken from a random distribution?
Tentor Testivis