Hi Everyone,

About two years ago I spent a lot of time learning sklearn and QC.  I put considerable effort into this algorithm before finally giving up on ML in favor of other methods.  That said, it was very enjoyable to create, and I think some of the code in here could be very useful to others.

Some thoughts
1. We use sklearn's ExtraTreesClassifier for the models.  In my experience it provided the best results.
2. The models are very sensitive to the random seed, so we create 10 models per symbol with different random seeds and score each of them.
3. We use walk-forward validation for the testing.
4. I struggled mightily with the scoring of the models.  I tried many methods (you'll see some big blocks of commented-out code).
5. The models are also very sensitive to the features used.
6. There is a very useful features helper class.  It allows you to easily append features by submitting a list such as ['EMA_7','EMA_7_28','EMA_7_28_diff'].  These are the EMA_7, the ratio of EMA_7 to EMA_28, and the difference between the current and previous values of that ratio.  You can use almost any indicator in TA-Lib with it.
7. The code is somewhat messy and I see some places that need to be refactored.
8. The current trading framework is very basic.  We train the models every 3 months and we predict the direction of the equities for the coming week.
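
To make points 1 to 3 concrete, here is a minimal sketch (not the code from the algorithm itself) of training one ExtraTreesClassifier per random seed and scoring each with walk-forward splits.  It assumes sklearn's TimeSeriesSplit (an expanding-window form of walk-forward validation), plain accuracy as the score, and synthetic placeholder data.  The function name walk_forward_scores and all parameter values are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import TimeSeriesSplit


def walk_forward_scores(X, y, seeds, n_splits=5):
    """Return {seed: mean out-of-sample accuracy} over walk-forward folds.

    Hypothetical helper: accuracy is only a stand-in for whatever
    scoring method you settle on.
    """
    splitter = TimeSeriesSplit(n_splits=n_splits)
    scores = {}
    for seed in seeds:
        fold_acc = []
        # Each fold trains on the past and tests on the block that follows it.
        for train_idx, test_idx in splitter.split(X):
            model = ExtraTreesClassifier(n_estimators=50, random_state=seed)
            model.fit(X[train_idx], y[train_idx])
            fold_acc.append(model.score(X[test_idx], y[test_idx]))
        scores[seed] = float(np.mean(fold_acc))
    return scores


# Synthetic placeholder data: 300 time-ordered rows, 4 features,
# binary "direction" label driven mostly by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# Ten models with different seeds, as in point 2.
scores = walk_forward_scores(X, y, seeds=range(10))
best_seed = max(scores, key=scores.get)
print(best_seed, round(scores[best_seed], 3))
```

Comparing the spread of the per-seed scores is a quick way to see how much the seed alone moves the result before deciding how to pick or combine the models.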

I do not think ML is the best way to build an algo, but I do think it's the most fun!  I would love to be proved wrong.  If anyone has ideas for improvements I'm glad to make those changes and see what we can do.

Author