
Continuous Deep Reinforcement Learning on QC

An attempt at implementing and training a TD3 (Twin Delayed DDPG) agent on QuantConnect. Currently, I am debating re-coding the whole project and implementing new training methods. Just wanted to share and see if anyone has had success implementing Deep Q-learning or similar RL code on QC. It's fun to try new features and see what works.



Hi Joe, 

That is actually the next thing I wanted to implement here on QC. At the moment, I am going through a few books/references on DRL, but I should be coding up something in a few weeks. I will get back to this post when I have some results to discuss. In the meantime, thank you for sharing the code :)

Best wishes,

Lorenzo


Awesome! Thanks.


Hi Joe,

Have you tried paper trading on QC hardware? The main limitation I've found on this platform is the cap on training time once an algorithm is deployed. I've still not been able to get LEAN running well with Python and DRL.

Interesting algo; well done on its completion, and thanks for sharing so we can learn.


Awesome share. Reinforcement learning seems interesting - any recommended books or resources on the theory?

On Ryan's point, I've hit some computational bottlenecks as well with deep learning nets on QC due to the time limit for training during backtests. Paper trading/live trading is fine, since an hour a week is enough for training on a weekly or incremental basis, but during backtests an hour spread over 3-5 years of data just takes too long. The only way I could work around this was by splitting backtests into small six-month chunks and doing some bookkeeping via the ObjectStore, initializing each chunk's state from the last one.
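To make that workaround concrete, here is a minimal sketch of the chunk-to-chunk bookkeeping, assuming an illustrative key name ("rl_checkpoint") and a made-up state dict rather than the actual project code:

```python
from AlgorithmImports import *
import json

class ChunkedTrainingAlgo(QCAlgorithm):

    def Initialize(self):
        # Each backtest run covers one ~6 month chunk of the full history
        self.SetStartDate(2020, 1, 1)
        self.SetEndDate(2020, 6, 30)
        self.AddEquity("SPY", Resolution.Daily)

        # Restore whatever the previous chunk left behind, if anything
        self.state = {"episodes_done": 0, "epsilon": 1.0}
        if self.ObjectStore.ContainsKey("rl_checkpoint"):
            self.state = json.loads(self.ObjectStore.Read("rl_checkpoint"))

    def OnEndOfAlgorithm(self):
        # Persist progress so the next chunk's backtest can pick up where this one stopped
        self.ObjectStore.Save("rl_checkpoint", json.dumps(self.state))
```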


Ryan McMullan 

I am unsure how to save my PyTorch model to the ObjectStore, which is needed if we want to keep the trained model for live trading. I did add the ability to save the replay buffer to the ObjectStore, so in theory we can continue training on new data (a rough sketch of that serialization idea is below).

 

Adam W 

Thanks, I agree there are a lot of computational bottlenecks; we need a fast GPU for training. Have you figured out how to save a PyTorch model to the ObjectStore?
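For the replay-buffer part mentioned above, here is a minimal sketch of one way it could be persisted, assuming the buffer is an ordinary Python deque of transition tuples (the key name and helpers are illustrative, not the project's actual code):

```python
import pickle
import base64
from collections import deque

def save_replay_buffer(algo, buffer, key="replay_buffer"):
    # Pickle the buffer and base64-encode it so it round-trips through the
    # string-based ObjectStore.Save/Read calls
    payload = base64.b64encode(pickle.dumps(buffer)).decode("ascii")
    algo.ObjectStore.Save(key, payload)

def load_replay_buffer(algo, key="replay_buffer", maxlen=100_000):
    # Fall back to an empty buffer on the very first run
    if not algo.ObjectStore.ContainsKey(key):
        return deque(maxlen=maxlen)
    payload = algo.ObjectStore.Read(key)
    return pickle.loads(base64.b64decode(payload))
```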


It's unlikely that any external model formats can be directly saved to the ObjectStore (maybe for security reasons?), but perhaps you can save the relevant aspects of the model and serialize them into a compatible format.

I'm not very familiar with reinforcement learning, so I can't comment much on the specifics here, but a deep neural network, for instance, can be characterized entirely by its architecture, layer weights, and internal states. To "save" my models to the ObjectStore, I extract the weights/states, serialize them into JSON, and dump that into the ObjectStore; then I "load" models by rebuilding the same architecture with the pre-trained weights/states. Perhaps a similar methodology could work here.
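For what it's worth, a rough sketch of that weights-as-JSON round trip applied to a PyTorch model; the key name and helper functions are illustrative, and the only assumption is that the same architecture is rebuilt before loading:

```python
import json
import torch

def save_model(algo, model, key="td3_actor"):
    # Turn every tensor in the state_dict into plain nested lists so it is JSON-serializable
    weights = {name: tensor.cpu().tolist() for name, tensor in model.state_dict().items()}
    algo.ObjectStore.Save(key, json.dumps(weights))

def load_model(algo, model, key="td3_actor"):
    # 'model' must be constructed with the same architecture as the saved one
    weights = json.loads(algo.ObjectStore.Read(key))
    state_dict = {name: torch.tensor(values) for name, values in weights.items()}
    model.load_state_dict(state_dict)
    return model
```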


I'll see what we can do to make it possible.




Disregard this if it's an unintelligent comment. I know only a little about RL, and this is the first time I've really come into contact with TD3, but it is very interesting after a bit of reading up.

As more of a thought-provoking suggestion, and something that would complicate the project but possibly make the sentiment tokenization more advanced: in the Tiingo News and Sediment.py section, what's to keep you from also incorporating an NLP transformer like BERT or a mid-weight GPT-2? It could add more words. I've not seen it done with PyTorch, but I assume a TensorFlow-to-PyTorch conversion would be possible. Would it be able to word-score alongside the Tiingo sentiment engine? Just theories here; sadly, I'm not much help on the construction front.


Brandon Schleter You can add a transformer and incorporate its output into the network as an input. Or we can just wait for GPT-4 and ask it what the best stocks to buy are. lol

All kidding aside, I just took the sentiment score and divided it by a large number to use as an input. I wish I had more time to test things like you mentioned.
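If anyone wants to experiment with Brandon's idea, here is a minimal sketch of scoring a headline with a pretrained transformer and squashing it into a small network input, assuming the transformers package is available in the environment (it may not be on QC), and using an illustrative scale factor rather than the project's actual one:

```python
from transformers import pipeline

# Downloads a small pretrained sentiment model on first use
sentiment = pipeline("sentiment-analysis")

def headline_feature(text, scale=10.0):
    # e.g. {'label': 'POSITIVE', 'score': 0.98}
    result = sentiment(text)[0]
    signed = result["score"] if result["label"] == "POSITIVE" else -result["score"]
    # Divide by a largish number, as above, so the feature stays small relative to other inputs
    return signed / scale
```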


Joe Bastulli good one. I feel like even a mid-tier GPT-2 would still be computationally expensive layered into this; I would likely run it as something separate. GPT-4 will just read my Ikea manual, put together my furniture, and self-complete algorithms.

No worries, and while I'm a novice, I'll try a few things along these lines to see if I can use it for sentiment additions.


Hello Joe Bastulli, I would like to know in which part of the project the agent's permitted action space is declared, and where the agent actually places its trades. Also, is the action space continuous? TD3 is a policy-gradient method that only works with continuous action spaces. Thanks.
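(For reference, a continuous action space in a TD3 setup typically means the actor's output is a real-valued vector, e.g. a tanh head producing a target portfolio weight that can be handed to SetHoldings. The following is a generic illustrative sketch, not the project's actual code.)

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim=1, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),  # continuous output in (-1, 1)
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

# Inside the algorithm, the continuous action becomes a portfolio weight, e.g.:
#   weight = float(actor(torch.tensor(state, dtype=torch.float32)).item())
#   self.SetHoldings("SPY", weight)
```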



This discussion is closed