As a fan of machine learning for trading I am always on the lookout for tricks and ways to improve the performance of my algorithms. In this regard I have been playing for a while with the concept of Meta-Labeling in the hope of squeezing additional returns out of my trading models.

[Figure: High-level structure of a meta-labeling algorithm]

By Meta-Labeling I refer to a technique introduced by Marcos Lopez de Prado in his book Advances in Financial Machine Learning to systematically address the issue of sizing the position of trades, or signals, generated by another model (the “main” one). At a high level the technique works as follows:

  1. Have a main model generate the signal, or direction, of a trade. This model can be an ML model or a discretionary one;
  2. Train a secondary machine learning model, the meta model, which takes the main model’s features (if present) and its predictions, and is trained to predict whether the main model was correct or not;
  3. Feed the main model’s predictions into the meta model to estimate the likelihood that the signal is correct, and use this secondary prediction both to decide whether to trade and to size the position to be taken (a minimal sketch follows the list).
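
To make the three steps concrete, here is a minimal sketch in Python. The random features, the choice of RandomForestClassifier and the long/short sizing rule are placeholder assumptions of mine, not code from the book:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))                         # placeholder features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)  # 1 = up move, 0 = down

X_tr, X_te = X[:700], X[700:]
y_tr, y_te = y[:700], y[700:]

# Step 1: the main model predicts the trade direction (an ML model here,
# but it could equally be a discretionary rule).
main = RandomForestClassifier(n_estimators=200, random_state=0)
# Out-of-fold predictions, so the meta-labels are not built on overfit
# in-sample calls.
side_tr = cross_val_predict(main, X_tr, y_tr, cv=5)
main.fit(X_tr, y_tr)

# Step 2: meta-labels mark where the main model was right (1) or wrong (0);
# the meta model sees the same features plus the main model's call.
meta_labels = (side_tr == y_tr).astype(int)
meta = RandomForestClassifier(n_estimators=200, random_state=0)
meta.fit(np.column_stack([X_tr, side_tr]), meta_labels)

# Step 3: out of sample, the main model gives the direction and the meta
# model's probability that the call is correct gives the position size.
side_te = main.predict(X_te)
p_correct = meta.predict_proba(np.column_stack([X_te, side_te]))[:, 1]
position = np.where(side_te == 1, p_correct, -p_correct)  # signed, sized bet
```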

The idea is very interesting and, I believe, has some specific fields of application, but I argue it is not a panacea that can magically improve the performance of a machine learning model that has access to the same data. For instance, in this article by Hudson & Thames (which I follow and generally like) they argue that Meta-Labeling is a general-purpose technique that can improve the performance of Machine Learning models.

Based on my experience and on simple logic, I argue that this technique can improve the performance of existing discretionary trading models but cannot improve the performance of a main Machine Learning model trained end-to-end on the same data.

I outline the reasons below and attach a simple experiment supporting my point:

  1. To improve on the main ML model, the meta model would have to somehow extract more information from the existing features than the main model does. There is no logical reason why the meta model should find more information in the same data than the main one;
  2. If cascading a meta model after a main one actually improved the overall trading performance, there would be no reason to stop there: we could add a meta-meta-model trained on the meta model’s predictions and continue like this ad infinitum;
  3. Having a meta model correctly size the trades, or signals, is as difficult as having a model generate the right signals. Imagine a naive buy-and-hold main model, i.e. a signal always equal to 1: the burden of the performance would fall entirely on the meta model, which would need to decide if and how much to buy. I wish I had such a meta model to tell me how much exposure to take in SPY!

I attach here a simple algorithm that uses machine learning on simple technical features to either:

  • Train a single main model end-to-end and use its predicted probability of a profitable trade as the position size (parameter “use_meta” equal to 0);
  • Train a main model on the trade direction and use a second model, the meta model, to size the position to be taken (parameter “use_meta” equal to 1). A sketch of this switch follows the list.
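
In code, the switch between the two setups looks roughly like the fragment below. The position_sizes helper, the logistic-regression models and the long-only sizing are my own placeholder assumptions, not the exact code I ran:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def position_sizes(X_tr, y_tr, X_live, use_meta):
    """Return position sizes for X_live; y_tr is 1 for a profitable long."""
    if use_meta == 0:
        # Single end-to-end model: its predicted probability of a
        # profitable trade is used directly as the position size.
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        return model.predict_proba(X_live)[:, 1]

    # Main model picks the direction; the meta model sizes the bet by
    # predicting whether the main model's call is correct.
    main = LogisticRegression(max_iter=1000)
    side_tr = cross_val_predict(main, X_tr, y_tr, cv=5)  # out-of-fold calls
    main.fit(X_tr, y_tr)
    meta = LogisticRegression(max_iter=1000).fit(
        np.column_stack([X_tr, side_tr]), (side_tr == y_tr).astype(int)
    )
    side_live = main.predict(X_live)
    p_correct = meta.predict_proba(np.column_stack([X_live, side_live]))[:, 1]
    return np.where(side_live == 1, p_correct, 0.0)  # trade only the longs
```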

[Figure: Comparison of the algorithm with and without Meta-Labeling]

I ran several simulations through a grid search, varying only the parameter “use_meta”, and you can see that, while the range of performance widens, the meta model is not able to squeeze out additional returns compared with a single end-to-end model (its average Sharpe ratio is lower).
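
Conceptually the comparison loop is just the following; run_backtest is a stub returning fake daily returns that stands in for my actual strategy:

```python
import numpy as np

def run_backtest(use_meta, seed):
    # Stub standing in for the real strategy: returns fake daily returns
    # so the scoring loop below runs end to end.
    rng = np.random.default_rng(1000 * use_meta + seed)
    return rng.normal(loc=0.0003, scale=0.01, size=252)

def sharpe(daily_returns, periods_per_year=252):
    r = np.asarray(daily_returns)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

# Same pipeline, same seeds, only "use_meta" changes between the two runs.
results = {use_meta: [sharpe(run_backtest(use_meta, seed)) for seed in range(10)]
           for use_meta in (0, 1)}

for use_meta, scores in results.items():
    print(f"use_meta={use_meta}: mean Sharpe {np.mean(scores):.2f}, "
          f"range [{np.min(scores):.2f}, {np.max(scores):.2f}]")
```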

[Figure: Grid search with use_meta 0/1 over multiple seeds]

To conclude, I believe the merits of this technique are that it highlights the importance of sizing a bet properly and provides a way to do so for existing discretionary models; however, I argue that it cannot increase the performance of a Machine Learning model trained end-to-end.

If you have experimented with this topic and have similar or opposing views, I’d be happy to hear more!

Francesco
www.beawai.com