Hi QC, I want to share some work I am doing with LSTM and RL back with the community, since quite a few great posts and comments here have helped me. Please let me know if you have any questions, suggested improvements, or general feedback about the attached.

I'm using an LSTM to predict price and volume, then using those predictions to train an RL agent.

The LSTM predicts price and volume, but dropout in Keras is left active at prediction time, so repeated predictions give a probability distribution instead of a single point estimate.
(Monte Carlo Dropout - great video on this -> https://youtu.be/eHT0raFtl1Q )
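
To make the Monte Carlo Dropout idea concrete, here is a minimal sketch in plain Python. It is not the attached code: the toy linear "model", the function names, and the numbers are all made up for illustration. In Keras the same effect usually comes from calling the model with training=True so the Dropout layers stay on during prediction.

```python
import random
import statistics

def dropout_forward(features, weights, rate, rng):
    """One stochastic forward pass of a toy linear model with dropout left on."""
    keep = 1.0 - rate
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            # Inverted dropout: scale kept units so the expected output is unchanged.
            total += x * w / keep
    return total

def mc_dropout_predict(features, weights, rate=0.2, passes=200, seed=0):
    """Repeat the stochastic pass and summarise the spread of the predictions."""
    rng = random.Random(seed)
    samples = [dropout_forward(features, weights, rate, rng) for _ in range(passes)]
    return statistics.mean(samples), statistics.stdev(samples), samples

# Hypothetical inputs: three features and three weights.
mean, std, samples = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -0.25, 1.0])
```

The list of samples is the empirical distribution; the mean and standard deviation are just one convenient summary of it.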

Price and volume are multiplied together to capture an interaction effect and to keep the network simpler. In the future I plan to add interaction effects around options chains/greeks so that these can be predicted and simulated against actuals.

The probability distribution produced by the LSTM prediction is used as input to the RL environment, with some additional abstraction done. It is appended after each prediction, so the RL agent must trade profitably over a longer and longer time period.
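
One way to picture the "append after each prediction" step is sketched below. This is an assumption about the shape of the data, not the attached implementation: the summary features (mean, standard deviation, rough 5th and 95th percentiles) and the GrowingHistory class are hypothetical stand-ins for the abstraction mentioned above.

```python
import statistics

def summarize(samples):
    """Collapse prediction samples into a small feature tuple (hypothetical abstraction)."""
    ordered = sorted(samples)
    lo = ordered[int(0.05 * (len(ordered) - 1))]  # rough 5th percentile
    hi = ordered[int(0.95 * (len(ordered) - 1))]  # rough 95th percentile
    return (statistics.mean(ordered), statistics.pstdev(ordered), lo, hi)

class GrowingHistory:
    """Observation rows for the RL environment; one row is added per prediction."""

    def __init__(self):
        self.rows = []

    def append_prediction(self, samples):
        self.rows.append(summarize(samples))
        return len(self.rows)  # episode length the agent now has to trade over

history = GrowingHistory()
for day_samples in ([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.5, 3.0], [4.0, 4.0, 4.0]):
    episode_length = history.append_prediction(day_samples)
```

Each new prediction makes the episode one step longer, which is what forces the agent to stay profitable over a growing window.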

The gym environment for the RL agent is still in its early stages. I have used code from GitHub for the gym and plan to customize it to trade different options styles. So instead of just actions 0 and 1, it would also have 2, 3, 4, etc., which could map to different options trading logic, like a covered call or put, that the agent could test and learn on against simulated data.
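
A rough sketch of what a wider discrete action set might look like follows. The meanings assigned to 0 and 1, the extra action names, and plan_orders are all assumptions for illustration; the gym code from GitHub is not reproduced here.

```python
from enum import IntEnum

class Action(IntEnum):
    # Meanings of 0 and 1 are assumed; the post only says the current actions are 0 and 1.
    SELL = 0
    BUY = 1
    # Hypothetical additions for options-style logic.
    COVERED_CALL = 2
    PUT = 3

def plan_orders(action_id, shares=100):
    """Translate a discrete action into a list of (instrument, side, quantity) legs."""
    action = Action(action_id)  # raises ValueError for an unknown id
    if action is Action.BUY:
        return [("stock", "buy", shares)]
    if action is Action.SELL:
        return [("stock", "sell", shares)]
    if action is Action.COVERED_CALL:
        # Long stock plus one short call contract per 100 shares.
        return [("stock", "buy", shares), ("call", "sell", shares // 100)]
    return [("put", "buy", shares // 100)]

legs = plan_orders(2)
```

Keeping the mapping in one function means the environment's action space only needs to grow by one integer per new strategy.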

Issues that I am still working through:
Simmed data is stochastic, so repeated backtests won't reproduce results unless a seed is set. Rather than relying on a seed, a better option would be improved sample coverage, which depends on making the codebase more scalable.
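
For anyone who does want repeatable runs in the meantime, a seeded generator is the usual approach. The sketch below assumes Gaussian draws purely for illustration; simulate_closes and its parameters are hypothetical and not taken from the attached code.

```python
import random

def simulate_closes(mean, std, steps, seed=None):
    """Draw a sequence of simulated values; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)  # private generator, so other code cannot disturb it
    return [rng.gauss(mean, std) for _ in range(steps)]

run_a = simulate_closes(100.0, 2.0, steps=5, seed=42)
run_b = simulate_closes(100.0, 2.0, steps=5, seed=42)
run_c = simulate_closes(100.0, 2.0, steps=5, seed=43)
```

Here run_a and run_b are identical while run_c differs, which is the trade-off described above: a seed gives reproducibility, but only more samples give coverage.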

RL trade logic should be improved in many ways. I would like to first test adding in options and greeks. It would also be interesting to tie the RL agent's rewards to existing QC data like the portfolio. The RL logic should also be able to return continuous values if you want it to decide how much to buy.

Simmed data is still doing some simple things that could be improved. It just rolls the dice from the predicted distribution to get open and close, but many important factors act upon open and close values, so I would need to understand and include those factors in the prediction to simulate open/close values better. Right now, though, price and volume are being simmed in a strong manner for the RL agent. Also, blending of the data is just an average of some random selections, to guard against overfitting to the real data, but this could be improved and tested.
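
The two simple steps described above could look roughly like this. It is one possible reading, not the attached code: drawing open and close independently from the prediction samples, and blending by averaging a real value with a few randomly picked simmed values, are both assumptions, as are the function names.

```python
import random

def sim_open_close(samples, rng):
    """Roll the dice twice: draw open and close independently from the samples."""
    return rng.choice(samples), rng.choice(samples)

def blend(real_value, simmed_values, picks, rng):
    """Average one real value with a few randomly selected simmed values."""
    chosen = rng.sample(simmed_values, picks)
    return (real_value + sum(chosen)) / (picks + 1)

rng = random.Random(7)
predicted = [99.0, 100.0, 101.0, 102.0, 103.0]  # hypothetical prediction samples
sim_open, sim_close = sim_open_close(predicted, rng)
blended = blend(100.5, predicted, picks=3, rng=rng)
```

Because open and close are drawn independently here, nothing ties them to each other or to the day's other drivers, which is the weakness noted above.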

Scalability needs to continue to improve. The LSTM can go for about 6 months before the weights become larger than my backtest machine's memory, and the RL agent uses additional memory on top of that. Scalability needs to improve to make this a more viable strategy. This could be solved by moving to custom hardware, but for now I will look to clean up and improve the codebase.

Buy and sell logic is still basic and just for debugging, but it could go many routes: either emit insights back to a portfolio manager, or continue down the custom trade logic route but write it to better utilize the capital on the table and spread risk, e.g. TSLA at 1% of available.
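
As an illustration of the "TSLA at 1% of available" idea, a fixed-fraction sizing rule can be sketched as follows. The function name, the whole-share rounding, and the example numbers are assumptions, not part of the attached code.

```python
def target_shares(available_cash, price, fraction=0.01):
    """Whole shares affordable when risking a fixed fraction of available cash."""
    if price <= 0:
        raise ValueError("price must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    budget = available_cash * fraction
    return int(budget // price)  # round down so the budget is never exceeded

# Hypothetical numbers: 100,000 available and a 300.00 share price.
shares = target_shares(100_000, 300.0)
```

Capping each name at a small fraction is what spreads risk across the stock list instead of letting one position absorb the capital.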

Paper trading: I am still debugging this to get it to paper trade, so there are likely some bugs here as well. To start with, I need to make sure it warms up and can begin trading immediately, so that stopping and restarting state can happen without issues.

Benchmarks haven't been set or compared against, and I'm just starting to test how the framework generalizes to other stock sets and selection methods.

Other minor bugs, like not trading the last stock in the list.

If you are testing it, make sure to clear out the cached models and datasets before the next test.
