This thread is meant to continue the development of the In & Out strategy started on Quantopian. The first challenge for us will probably be to translate our ideas to QC code.
I'll start by attaching the version Bob Bob kindly translated at Vladimir's request.
Vladimir:
About your key error, did you also initialize UUP like this?
self.UUP = self.AddEquity('UUP', res).Symbol
Guy Fleury
@Abbi,
The origin of 'AlphaPowerTrading' comes from a tribute to the 'alpha' as defined by Jensen in the late '60s as the excess return over and above the average market return. However, his paper concluded that the average alpha was negative (by about -1.7%), meaning that professional money managers, on average, did not even cover trading expenses.
A lot of fund managers, at the time, did not like his conclusions since they put little value on their money management skills. And from his premises, we saw the emergence of index funds. An 'if you cannot beat them, at least join them' kind of mentality. Trillions are now managed that way.
The 'power' part comes from the alpha's compounding impact over time as illustrated in the following formula: F(t) = F_0 ∙ (1 + r_m + α − exp)^t, where the 'alpha' is added to the average market return r_m. An 'expense' component was added to represent the operating costs of gaining that added return. It is easy to see that if the alpha is greater than the expenses incurred, α > |exp|, it would improve performance over the long term, even for a small positive alpha value. The impact gets greater as time increases. To get the higher alpha requires a probabilistic edge, some sustainable skills, or better long-term methods of play.
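To make the compounding impact concrete, here is a minimal Python sketch of the formula (not from the original paper; the r_m, alpha, and expense values are purely illustrative):
F0 = 100_000                       # starting capital
r_m = 0.10                         # assumed average market return
expenses = 0.017                   # assumed expense drag (Jensen's ~1.7%)
for alpha in (0.00, 0.02, 0.10):   # no edge, small edge, Buffett-like edge
    for t in (10, 20, 30):         # holding periods in years
        F_t = F0 * (1 + r_m + alpha - expenses) ** t
        print(f"alpha={alpha:.2f}, t={t}y: F(t) = {F_t:,.0f}")
Even a small positive alpha net of expenses compounds into a large difference as t grows, which is the point of the 'power' in the name.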
The Jensen alpha is different from the 'alpha streams' described and used in QC, where any source of profit becomes an 'alpha stream' even if the portfolio ends up producing less than the average market return. Jensen's idea was simple: if there are management skills, they will show in a positive alpha (α > |exp|). The average market return was easily obtainable using mutual or managed funds. So, r_m was a trivial component of the equation since it was available with practically no effort or skills.
I like Mr. Buffett's methods of play where compounding and time are put to the forefront. He has managed an 'alpha' of about 10 points over his 54 years or so on the job with a CAGR close to 20% (10% coming from the average market return and 10% from his alpha skills after trading expenses).
So, for me, he is the benchmark. If over the long term your trading strategy cannot outperform Mr. Buffett's investment methods, you technically missed the boat since there was an easy solution available that would have outperformed your own trading methods. A long-term structured trading plan is what is needed, and backtesting can help you find it. At least, that is how I use this strategy and many others.
You might want to read my 2007 paper on the subject. It is an old paper but still relevant. It is titled appropriately: Alpha Power
Todd
There have been discussions across this and other threads regarding bonds (TLT / SHY) and what we don't know about the future. I didn't want to sidetrack the focus of this thread, so I created a new thread and posted a simple tactical bond strategy for the community that pulls from bond ETFs across maturities, credit quality and yield. It's a simple and automated way to play the yield curve.
Peter Guenther
Welcome to the discussion, Todd. Fantastic work on this bond strategy and thanks a lot for sharing it with the community! You are absolutely right, there is some debate in this thread and the “Amazing returns = …” thread that the ‘out’ side of the in & out strategy has been relatively neglected and that there might be room for improvement/optimization. So, your strategy is definitely timely. Some comments recommend also looking beyond bonds and considering additional alternative assets (e.g., gold), so there might even be room for an ‘alternative asset rotator’ which could then be combined with / plugged into the in & out algo.
Han Ruobin
Hi Peter Guenther, thanks so much for starting this thread. I've briefly looked through the historical discussion from the Quantopian days, and also at the newer updates. I have a small point to make about the use of the bottom one percentile for determining the threshold values for extreme_b. You used the 1% statistical significance level to explain why the bottom one percentile was taken to be the threshold. However, if that were the case, I think a normal distribution should have been assumed, and the threshold taken to be -2.58 standard deviations (the z-score threshold for a one-tailed test at the 1% level) away from the mean of the rolling window. I performed a backtest replacing line 225:
with the following:
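The original snippet is not reproduced in the thread, but for readers following along, a hypothetical sketch of the two threshold variants being compared could look like this (returns_sample is an assumed stand-in for the rolling returns window, not the algo's actual variable):
import numpy as np

returns_sample = np.random.standard_t(df=3, size=252)  # assumed stand-in for the rolling returns window

# Original approach: empirical 1st percentile of the observed returns
extreme_b_percentile = np.nanpercentile(returns_sample, 1)

# Alternative discussed here: mean minus 2.58 standard deviations (normal-distribution cut-off)
extreme_b_zscore = np.nanmean(returns_sample) - 2.58 * np.nanstd(returns_sample)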
I've attached a backtest (code taken from the algo you posted in Jan 2021 on the Amazing returns = … thread), but there doesn't seem to be a huge difference. Nonetheless, I think there is a slight misuse of the statistical confidence concept that I wish to clarify. Or I could have misunderstood why the one percentile was used, in which case I would also like to clarify :)
I also would like to ask if there is any reason why the initial ‘Debt’ and ‘Tips’ values (obtained when self.dcount == 0) are always used in comparison with the current ‘Debt’ and ‘Tips’ to obtain the median, which in turn is used to determine whether there is any inflation at the current moment. It seems more intuitive to me to compare the current prices with prices from perhaps a year or two ago. Considering that the algo runs from 2008 to the present, I would not expect prices from 10 years ago to be relevant to making trading decisions in the present. I also do not understand why there was division involved, so I hope I could get some clarification on this.
Thanks so much for sharing this with the community! I think it's the first time I've seen an algo that makes use of price signals from a variety of sources to make trading decisions, so it's very interesting!
Peter Guenther
Thanks for sharing these observations, Han Ruobin!
Valid idea regarding changing to a ‘mean minus x-times standard deviation’ logic.
In terms of your question concerning misuse, or different use, of stats concepts: Using the 1% extreme from the observed returns sample can have the advantage that we do not need to make any assumption about the distribution of the underlying returns. In contrast, when using the ‘mean – 2.58*sd’ we need to assume that stock returns are normally distributed which does not always hold. For instance, see ‘fat tails’ in returns distributions. Using the 1% extreme of the observed returns takes the returns distribution as is and does not require any distributional assumptions. Not sure whether the following is a completely sound comparison, but if you are into statistics, you could compare typical significance testing (estimate/sd) which is usually based on a normal distribution assumption vs significance testing based on bootstrapping which, similar to the In & Out algo approach, is based on the empirical distribution of the observed data (i.e., also holds for non-normal data).
In terms of ‘Debt’ and ‘Tips’, annual resets are also a valid idea and worth a try. If the backtest starts no earlier than Jan 2012, the preference may be to use RINF (see an earlier algo version) since it directly measures inflation. Regarding why to divide by the base level, this is to calculate a return (relative to the base level). For example, if the base level is $100 and, after a year, the ETF lists at $110, the calculation yields 1.1 (i.e. 10% above the base level). Expressing prices as returns helps make ETFs with different price levels comparable. To measure inflation, we subtract the ‘Tips’ return (bond yield without inflation) from the ‘Debt’ return (bond yield plus inflation). The reason for this calculation is that when inflation expectations are increasing in the market, bond yields increase while TIPS do not. Regarding using the 2008 base level: both the ‘Debt’ underlying (SHY) and the ‘Tips’ underlying (TIP) do not increase extensively over time, so using the 2008 base level vs annual resets might be valid to some degree. Anyway, it’s definitely worth a try to use annual resets.
Good observations, keep them coming and let me know if things don’t add up and how your tests played out!
Han Ruobin
Hi Peter Guenther thank you for your replies!
I have downloaded some data for backtesting on my PC instead of using the QC engines. In my backtest, I decided to go with my definition of short-term inflation using SHY-TIPS, but only looking at the 187 most recent returns. I also assumed no slippage and no trading fees (not very realistic, oops, but I don't think those would be a big problem when performing so few trades and trading very liquid ETFs).
I decided to investigate how the width of the rolling window of prices (in the original algo this is 11) and the lookback period (in the original algo this is 55, I think) used to determine returns (returns = hist/hist_shift) affect the performance of the in & out strategy. I compared it against just holding TQQQ from the same starting period (sometime in January 2010), and I have attached an image of the results. I would attach the file if anyone could tell me how to do it.
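A rough sketch of this kind of grid experiment might look as follows (run_backtest is a hypothetical stand-in for the actual backtest loop, not code from the algo, and the parameter ranges are just examples):
import pandas as pd

def run_backtest(window_width: int, lookback: int) -> float:
    # Placeholder: in practice this would run the in & out logic with the given
    # window width and lookback and return the strategy's total return.
    return 0.0

results = pd.DataFrame(
    [(w, lb, run_backtest(w, lb)) for w in range(5, 31, 5) for lb in range(10, 101, 10)],
    columns=["window_width", "lookback", "total_return"],
)
# Pivot into a grid to eyeball the diagonal band of well-performing combos.
grid = results.pivot(index="window_width", columns="lookback", values="total_return")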
The appearance of a band is rather interesting, and hopefully this could be helpful in dealing with the issue of having fixed constants (why was a window size of 11 lagged by 55 days used to determine returns?). It seems to me that a general rule for the band would be k_min < window_width + lookback < k_max. Strictly speaking, the lookback period is also partially determined by the size of the window (in the above data set, the lookback period is defined as the number of trading days between the most recent end of the window and the current day), so maybe by presenting the data that way I might get a clearer picture of what is happening. I'll update it here sometime later. I'll also explore whether the waiting time affects the band or not.
What I think would be helpful / interesting is an explanation of why there is such a band. My intuitive explanation for why returns are disappointing when window_width + lookback > k_max is that the algo looks too far back (window_width also affects how far back the algo looks) and data from that far in the past has become irrelevant. I do not yet have an explanation for why window_width + lookback < k_min could also yield disappointing results. I don't think it is noise, because if it were, I should be seeing a lot more greens at wider windows.
Peter Guenther
Really great work there, Han Ruobin! Nice illustration using the green coloured returns. Since this strategy tries to time the market, I reckon that there will always be a ‘perfect timing’ in terms of lookback/window (and other) settings. The green diagonal band is indeed quite interesting and somewhat reassuring, since it shows that there is not only one perfect lookback/window combo, but there seem to be several combos that work similarly well. This is definitely a new, interesting insight. Also, great work regarding formalizing the diagonal band. Yes, it’s interesting to muse about this a bit more. One could argue that, at the end of the day, we need to select a specific combo from the band, so a selection criterion would be needed to determine this selection. Alternatively, you could select multiple (or all, if the calculation effort is worth it) combos from the band (according to your formula for the band) and use the results from these combos to determine whether to go ‘out’ of the market or stay ‘in’, e.g. as a majority call based on the different results from the band combos.
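As a minimal sketch of that majority-call idea (out_signal is a hypothetical stand-in for the algo's per-combo signal check, and the combos listed are just assumed examples along the band):
def out_signal(window_width: int, lookback: int) -> bool:
    # Placeholder: in practice this would run the in & out signal logic for this combo.
    return False

band_combos = [(9, 50), (11, 55), (13, 60)]           # assumed combos along the band
votes = [out_signal(w, lb) for w, lb in band_combos]
go_out = sum(votes) > len(votes) / 2                  # go 'out' only if most combos agree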
Regarding percentiles and distribution assumptions: Hmm ... I would argue that percentiles carry no distribution assumption, neither explicit nor implicit, because the, say, 1% most extreme observations are the 1% most extreme observations no matter what the returns distribution looks like. They are the 1% most extreme observations on the left-hand side, always, whether the returns follow a normal distribution, Chi2, Poisson, uniform, or any other distribution. In contrast, when we work with -2.58*sd, we only really know what we will be getting if the returns are normally distributed. For any other distribution, -2.58*sd could be anything, really. For a normal distribution, it means the 1% most extreme (smallest) observations, but for a Chi2 distribution it means something else, and again it means something different for a uniform distribution, etc.
Regarding TIPS and SHY: I think you are right that this should be the other way around, but for a different reason. As I understand it, inflation expectation changes do not move TIPS since these have an inflation protection guarantee. In contrast, changes in inflation expectations move SHY. Specifically, and this is the reason why the calculation needs to be reversed, SHY (i.e. bond prices) goes down as inflation expectations go up since investors demand additional returns to be compensated for inflation. So, let’s say inflation expectations increase by 1%, then SHY (i.e., bond prices) should drop by 1% (i.e., the bond yield increases by 1% to compensate for inflation) while TIPS should stay constant, meaning that more negative values in (SHY-TIPS) indicate higher inflation expectations. So, when we check for above-median inflation expectations, we should either check for (SHY-TIPS) below its historic median or, equivalently, check for whether the reverse-coded difference (i.e., -(SHY-TIPS) = (TIPS-SHY)) is above its historic median. Thus, since we currently work with an above-median type of check in the algo, the easiest fix seems to be to calculate the return difference reverse-coded via TIPS-SHY instead of SHY-TIPS.
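For concreteness, here is a small sketch of the reverse-coded check; the price series and variable names are made up for illustration and are not the algo's actual code:
import numpy as np

rng = np.random.default_rng(0)
shy_prices = 82.0 * np.cumprod(1 + rng.normal(0, 0.001, 300))    # assumed SHY price path
tip_prices = 108.0 * np.cumprod(1 + rng.normal(0, 0.001, 300))   # assumed TIP price path

shy_ret = shy_prices / shy_prices[0]   # return relative to the base level
tip_ret = tip_prices / tip_prices[0]

diff = tip_ret - shy_ret               # reverse-coded: TIPS - SHY
# Above-median values of (TIPS - SHY) now indicate above-median inflation expectations.
inflation_high = diff[-1] > np.median(diff)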
Chak
Actually, percentile has an implicit assumption of a normal distribution. If you want to customize your own distribution, then you need to use the first, second, third, etc. moments to obtain non-normal distributions.
Peter Guenther
No, they do not, Chak 😊 You can use them on a normal distribution, yes, but they mean the same thing, no matter what the distribution is. For example the 1% percentile means the 1% left-most extreme observations … this is true in any distribution, not only a normal distribution.
Chak
https://numpy.org/doc/1.20/reference/generated/numpy.percentile.html
Chak
Moving forward, I'm suggesting that perhaps using np.percentile has some limitations to finding accurate outliers.
Strongs
Hi Chak, I don't know exactly what data you are talking about, but I know how the percentile is used in the in & out strategy. From what I understand of the discussion, taking a certain percentile level doesn't change the type of distribution. The random variable taken into consideration is equity returns, and it is the random variable itself that follows a certain type of distribution, which in financial markets usually tends to be a Student's t. By placing yourself at the most extreme percentile, you are only observing what happens in the tails of the distribution, which is the least likely event.
Peter Guenther
Chak, thanks for the link. I am not sure where this suggests that percentiles assume an underlying normal distribution. They do not. We should not confuse a ‘normalized ranking’ (mentioned in the link) with a normal distribution. These are very different things. To determine percentiles we just rank order the observed values (e.g. historical returns) to determine the percentile of a given observation. What the algo uses this for is to check whether a given, current return is falling into the 1% most extreme (left hand side) observed returns. If it is, this creates an ‘out’ signal. The distribution of the historical returns does not need to be of any particular distribution type. This approach works with all kinds of distributions – the distribution simply does not matter for the approach to work. Hope this helps / clarifies.
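A quick way to see this in code (a standalone illustration, not code from the algo): the empirical 1st percentile marks roughly the 1% smallest observations regardless of the underlying distribution.
import numpy as np

rng = np.random.default_rng(1)
samples = {
    "normal": rng.normal(size=10_000),
    "chi2": rng.chisquare(df=3, size=10_000),
    "uniform": rng.uniform(size=10_000),
}
for name, sample in samples.items():
    cut = np.percentile(sample, 1)            # empirical 1st-percentile cut-off
    share_below = np.mean(sample <= cut)      # ~1% of observations in every case
    print(f"{name}: {share_below:.1%} of observations fall at or below the cut-off")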
Chak
I have read and followed this strategy since it was introduced on quantconnect 10 months ago. While my academic training lies in physics, neuroscience, statistics, and behavioral economics, I didn't really understand enough of the market's implicit undertones to suggest anything. Now, I believe I'm somewhat comfortable enough with this topic to begin to really talk about the limitation of each strategy's variation.
My point in bringing up the limitations of setting a hard rule, in this case 1%, is that it causes the strategy to exit too late or too soon and, subsequently, miscalculates the suggested number of wait days. When you take a combinatorial approach to the out signals, GLD>SLV, XLI>XLU, DBB>UUP, the chances of getting an incorrect signal multiply. Sure, bonds go up when one or two of the signals are TRUE, but the market remains “bullish” as well. What one needs is a malleable cutoff point.
As a corollary example, the PE ratio of growth stocks after the COVID market crash is 25+, while pre-COVID it was an entirely different number. Further, the PE ratio for a growth stock fluctuates across time and perhaps, in some cases, might not even be a leading indicator at all!
Han Ruobin
Thanks Peter Guenther! I agree with you on the effects of inflation expectations on the prices of SHY and TIP. I believe I mistakenly analysed the effects of inflation that has already happened on the prices of SHY and TIP. Thank you for the clarification:)
Regarding the use of the 1% threshold, I agree that this can be applied to any type of probability distribution. However, this ‘1% of all values’ does not necessarily, and I believe probably would never, correspond directly to a ‘99% confidence interval’ in the typical statistics definition unless the distribution was rectangular like I described earlier. I have attached a graph that I plotted in Python to investigate the difference between using 2.58 std as the threshold for the 99% confidence interval and using just the bottom 1% as the threshold. The results are strikingly different.
On this note, I think I want to ask if the assumption that I have had about the use of the 1% threshold is right, which is that it is motivated by the requirement of 99% confidence that the observation is above the mean.
Chak, about your point on the hard cutoff: while I agree that a 1% threshold, and any hardcoded number, has a certain rigidity to it, it already has some degree of malleability. A more rigid criterion would perhaps be a rule like ‘if day-to-day returns are below x, an out-signal is generated’. This is different from a rule like ‘if day-to-day returns are within the bottom 1% of returns in the most recent 187 days, an out-signal is generated’. The 1% threshold used in tandem with the moving window allows the signal-generating threshold to change based on the most recent ‘state’ of the market (defined in the algo as the most recent half a year, I think), and I see this as a form of malleability. Using your example of PE ratios, while selecting stocks based on a fixed PE would yield very different results pre and post COVID, if we selected the top/bottom 25 stocks by PE, we would not expect this stock selection to differ very much pre and post COVID. The idea of using the most extreme 25 stocks is the same as using the most extreme 1% of returns, I think.
Of course, this does not address the point you made about indicators remaining as leading indicators in the long run. I think this would remain a rather difficult problem to solve. However, considering that there is a good deal of economics theory backing the algorithm (e.g. the use of diverging SHY and TIP returns to forecast inflation expectations and thus the impact on whatever ETF we are trading), I do not see the loss of predictive power of the indicators any time soon, if they have any currently. I myself am also still in the midst of evaluating this.
Guy Fleury
The stocks are sorted by returns. So, if you take 100 stocks sorted in this manner, the 1% will take the one stock at the end of the sample, the highest or lowest, no matter what the distribution may be. This last one, or first one, could be 5, even 10+ standard deviations from the mean (an outlier).
Peter Guenther
Nice work there again, Han Ruobin, and valid point, Guy Fleury!
The missing puzzle piece might be the following: We have to be careful that we do not equate a ‘99% confidence interval’ with a ‘99% confidence interval for a normally distributed variable’. The -2.58 std that you are comparing against in your chart is the cut-off value (1% bottom) for normally distributed variables. It is important to note that this cut-off value does not give us the 99% confidence interval cut-off for just any variable, but only for normally distributed variables. Now, stock returns are often not normally distributed, especially in small samples. When an underlying variable does not follow a normal distribution, then resampling and using the 1% percentile will give you an appropriate estimate for the 99% confidence level cut-off value. Therefore, the distances displayed in your chart are a result of the ‘99% confidence interval for a normally distributed variable’ not applying in this context, since the sampled returns are not normally distributed, so that using -2.58 sd creates an error/deviation. That is, the distances are not created by the 1% percentile deviating, but by the -2.58 sd deviating since it assumes a normal distribution that does not show neatly in the data.
Not sure if you have a background or interest in statistics, but to find an equivalent for this point, one example that comes to mind is the test for mediating effects in statistics (variable A affects variable B which affects variable C). Call the effects a (A on B) and b (B on C). The total mediation effect (A on C) can be calculated as a * b. This total effect does not follow a normal distribution (due to the multiplication), irrespective of the fact that the effect components (i.e., a and b) may be normally distributed. To test the statistical significance of this effect (or to create a confidence interval), we cannot use the logics we would use for normally distributed variables (e.g., t = effect / sd) since the assumption of a normal distribution does not hold. Instead, we can use bootstrapping (random draws from the original sample) to create observations of the total mediation effect in many different scenarios and, thereby, generate a distribution. To determine the ‘99% confidence interval’, we would look at the cut-off level provided by the bottom 1% percentile (similar to what the algo does) of the bootstrapped distribution. Again, we cannot use -2.58 sd in this context because the assumption of a normal distribution does not hold (as can be the case for real-life stock returns, especially in small samples). For more details, e.g. see the description and online tool for total mediating effects here: http://quantpsy.org/medmc/medmc.htm
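For readers who want to see that bootstrapping logic spelled out, here is a minimal, simulated sketch; the data, sample size, and effect sizes are made up purely for illustration and this is not the tool from the link:
import numpy as np

rng = np.random.default_rng(42)
n = 200
A = rng.normal(size=n)
B = 0.5 * A + rng.normal(size=n)    # assumed path a = 0.5
C = 0.4 * B + rng.normal(size=n)    # assumed path b = 0.4

boot_ab = []
for _ in range(5_000):
    idx = rng.integers(0, n, n)                       # resample with replacement
    a = np.polyfit(A[idx], B[idx], 1)[0]              # slope of B on A
    X = np.column_stack([np.ones(n), A[idx], B[idx]])
    b = np.linalg.lstsq(X, C[idx], rcond=None)[0][2]  # slope of C on B, controlling for A
    boot_ab.append(a * b)

# Percentile-based cut-off from the bootstrapped distribution, no normality assumption
# needed (analogous to the algo's 1% percentile check).
lower_cut = np.percentile(boot_ab, 1)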
Good reply regarding Chak’s note, reminding us that a 1% percentile is not a hard cut-off, since it is responsive to the actual data that is being sampled. I agree.
Guy Fleury
Trivia: 1% of 10 stocks is zero. 1% of 100 stocks is one. 1% of 199 stocks is one. 1% of 299 stocks is 2. If you trade less than 100 stocks, use a top or bottom function instead of the percentile function to at least have one stock selected. It is nonsensical to use the 1-percentile function if the number of stocks traded is less than 100...
We do not have fractional stocks. The number of stocks traded is an integer. There is no such thing as 0.6 Apple or any other stock for that matter. We have started to have fractional shares, but that applies only to the number of shares bought or sold on this integer stock.
The greater the number of stocks traded, the more stocks the >|2.58σ| region will contain. And the larger the number of stocks in the portfolio, the more the distribution will approach a normal distribution, even if the underlying distribution is not normal.
We cannot predict with any accuracy what will happen to 100 stocks tomorrow, and yet we want to split hairs to the hundredth. All of that is very fuzzy, and any approximation will not make that much of a difference in a 10-year trading strategy.
Peter Guenther
Thanks for sharing, Guy Fleury.
Maybe a thought to add, only a nuance really: The 1% percentile is used to find extreme returns within a given instrument (e.g., say, for the ETF tracking metals). That is, it is a time-series view (e.g., 100 historic returns) instead of a cross-sectional view (e.g., 100 stocks). The 1% percentile tracks whether a given current return is ‘extreme’, i.e. sits within the 1% percentile of the sampled returns (including the current return).
Your point is generally correct that we need to ensure that we have at least 100 returns in the sample, which is always fulfilled in the algo given the lookback of 252 trading days. That said, it seems the algo would still work if we forgot to pay attention to this technical requirement and reduced the lookback period below 100 observations: numpy.nanpercentile() still gives us a usable cut-off when we specify a percentile of 1% and the sample includes fewer than 100 observations. If you have Python at hand (e.g. via Research) and want to give it a try, the following code can be used:
import numpy as np
test = np.array(range(1,51))
np.nanpercentile(test, 1)
The test sample includes 50 observations, from 1 to 50. We request the 1% percentile. It gives us 1.49 as the output, meaning that the most extreme (smallest) observation (i.e., 1) would be categorized as ‘extreme’ since it is smaller than the percentile cut-off value.
Therefore, we would still be within our conceptual logic of checking for extreme returns/observations, although of course a case can be made that the sample size should be made more consistent with the percentile that we are checking for.
Axist
Curious: with all the work that has been done on these, how popular are they when it comes to live trading? Or to combining this strategy with another?