
Using machine learning in backtesting

Hello,

I have trained a sklearn svm regressor in the Research Environment.

Now I want to use this trained regressor in backtesting. How can I do it?

Thanks

Daniele

0

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.


Did you ever figure this out?

0

Sorry, at this time it's not possible to save objects and retrieve them later. It's a feature we've designed and have almost finished implementing, but it isn't ready quite yet.

0


Thanks for the quick reply! I'm new to QuantConnect -- is it possible to train and use an ML algorithm within the standard algorithm framework rather than the research framework?

0

Yes, many people do this: they train during warm-up, for example, and then run with the model in memory. The missing piece we're working on is saving that model so you don't need to run the training each time.
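To make the "train once, keep the model in memory" idea concrete, here is a minimal sketch in plain scikit-learn. It does not use the QuantConnect API: the `InMemoryModel` class, its method names, the `lookback` value and the synthetic prices are all assumptions made for illustration, and in a real algorithm the prices would come from whatever history the algorithm requests.

```python
import numpy as np
from sklearn.svm import SVR


class InMemoryModel:
    """Hypothetical helper: trains once on a price history, then predicts."""

    def __init__(self, lookback=5):
        self.lookback = lookback  # number of past returns used as features
        self.model = None         # stays None until train() has run

    def _windows(self, prices):
        # Turn a price series into (features, target) pairs of simple returns.
        returns = np.diff(prices) / prices[:-1]
        X = np.array([returns[i:i + self.lookback]
                      for i in range(len(returns) - self.lookback)])
        y = returns[self.lookback:]
        return X, y

    def train(self, prices):
        X, y = self._windows(np.asarray(prices, dtype=float))
        # A small epsilon is chosen because daily returns are tiny (assumption).
        self.model = SVR(kernel="linear", epsilon=1e-3).fit(X, y)

    def predict_next_return(self, recent_prices):
        # lookback + 1 prices are needed to form lookback returns.
        recent = np.asarray(recent_prices, dtype=float)[-(self.lookback + 1):]
        returns = np.diff(recent) / recent[:-1]
        return float(self.model.predict(returns.reshape(1, -1))[0])


# Synthetic prices stand in for the history an algorithm would request.
rng = np.random.default_rng(0)
prices = 100 * np.cumprod(1 + rng.normal(0, 0.01, 300))

helper = InMemoryModel(lookback=5)
helper.train(prices[:250])                             # done once, at start-up
prediction = helper.predict_next_return(prices[:260])  # reused on later bars
print(prediction)
```

The fitted model lives only in the object's memory, so the training step has to be repeated every time the backtest starts, which is exactly the cost described above.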

0


Maybe what you can do is persist your model as text using pickle. I tested this in Research and it works. Let me give you an example.

Research training phase:

import pickle
import codecs
import pandas as pd
from sklearn.svm import SVR

# Build the data
df = pd.DataFrame({'x': list(range(100)), 'y': list(range(100))})
df = df.sample(frac=1)
df_train = df.iloc[:70]

# Build the SVM regressor
model = SVR(kernel='linear')
model.fit(
    df_train['x'].values.reshape(-1, 1),  # features must be 2-D
    df_train['y'].values                  # target is expected as a 1-D array
)

# Test the SVM
print(model.predict([[1], [10], [100]]))

# Persist to text
pickle_content = codecs.encode(pickle.dumps(model), "base64").decode()
pickle_content

The last line displays the serialized model as base64 text. In a backtest or in another Research notebook you then load that text back. For example:

import pickle
import codecs

pickle_content = 'gANjc2tsZWFybi5zdm0uY2xhc3NlcwpTVlIKcQApgXEBfXECKFgFAAAAX2ltcGxxA1gLAAAAZXBz\naWxvbl9zdnJxBFgGAAAAa2VybmVscQVYBgAAAGxpbmVhcnEGWAYAAABkZWdyZWVxB0sDWAUAAABn\nYW1tYXEIWAQAAABhdXRvcQlYBQAAAGNvZWYwcQpHAAAAAAAAAABYAwAAAHRvbHELRz9QYk3S8an8\nWAEAAABDcQxHP/AAAAAAAABYAgAAAG51cQ1HAAAAAAAAAABYBwAAAGVwc2lsb25xDkc/uZmZmZmZ\nmlgJAAAAc2hyaW5raW5ncQ+IWAsAAABwcm9iYWJpbGl0eXEQiVgKAAAAY2FjaGVfc2l6ZXERS8hY\nDAAAAGNsYXNzX3dlaWdodHESTlgHAAAAdmVyYm9zZXETiVgIAAAAbWF4X2l0ZXJxFEr/////WAwA\nAAByYW5kb21fc3RhdGVxFU5YBwAAAF9zcGFyc2VxFolYDQAAAGNsYXNzX3dlaWdodF9xF2NudW1w\neS5jb3JlLm11bHRpYXJyYXkKX3JlY29uc3RydWN0CnEYY251bXB5Cm5kYXJyYXkKcRlLAIVxGkMB\nYnEbh3EcUnEdKEsBSwCFcR5jbnVtcHkKZHR5cGUKcR9YAgAAAGY4cSBLAEsBh3EhUnEiKEsDWAEA\nAAA8cSNOTk5K/////0r/////SwB0cSRiiUMAcSV0cSZiWAYAAABfZ2FtbWFxJ0c/8AAAAAAAAFgI\nAAAAc3VwcG9ydF9xKGgYaBlLAIVxKWgbh3EqUnErKEsBSwKFcSxoH1gCAAAAaTRxLUsASwGHcS5S\ncS8oSwNoI05OTkr/////Sv////9LAHRxMGKJQwghAAAAIwAAAHExdHEyYlgQAAAAc3VwcG9ydF92\nZWN0b3JzX3EzaBhoGUsAhXE0aBuHcTVScTYoSwFLAksBhnE3aCKJQxAAAAAAAMBYQAAAAAAAAAAA\ncTh0cTliWAoAAABuX3N1cHBvcnRfcTpoGGgZSwCFcTtoG4dxPFJxPShLAUsChXE+aC+JQwj4y0L3\nXn8AAHE/dHFAYlgKAAAAZHVhbF9jb2VmX3FBaBhoGUsAhXFCaBuHcUNScUQoSwFLAUsChnFFaCKJ\nQxAr8rfEI6WEPyvyt8QjpYS/cUZ0cUdiWAoAAABpbnRlcmNlcHRfcUhoGGgZSwCFcUloG4dxSlJx\nSyhLAUsBhXFMaCKJQwjNmJmZmZm5P3FNdHFOYlgGAAAAcHJvYkFfcU9oGGgZSwCFcVBoG4dxUVJx\nUihLAUsAhXFTaCKJaCV0cVRiWAYAAABwcm9iQl9xVWgYaBlLAIVxVmgbh3FXUnFYKEsBSwCFcVlo\nIoloJXRxWmJYCwAAAGZpdF9zdGF0dXNfcVtLAFgKAAAAc2hhcGVfZml0X3FcS0ZLAYZxXVgLAAAA\nX2ludGVyY2VwdF9xXmgYaBlLAIVxX2gbh3FgUnFhKEsBSwGFcWJoIolDCM2YmZmZmbk/cWN0cWRi\nWAsAAABfZHVhbF9jb2VmX3FlaERYEAAAAF9za2xlYXJuX3ZlcnNpb25xZlgGAAAAMC4xOS4xcWd1\nYi4=\n'

# Load the model
model = pickle.loads(codecs.decode(pickle_content.encode(), "base64"))

# Test it
print(model.predict([[1], [10], [100]]))

Of course you cannot persist a 100 MB ensemble this way, but unless you are working with K-NN, most models don't weigh much. Most of my GBM models, which are boosting models, weigh about 1 MB.
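If you want to check how large a model is before pasting its text anywhere, a small sketch like the following measures the base64 string. The function name `serialized_size_kb` and the toy training data are made up for this example. Keep in mind that base64 inflates the pickled bytes by roughly one third.

```python
import base64
import pickle

import numpy as np
from sklearn.svm import SVR


def serialized_size_kb(model):
    """Return the size, in KB, of the base64 text produced from a pickled model."""
    text = base64.b64encode(pickle.dumps(model)).decode("ascii")
    return len(text) / 1024


# Toy data, similar in spirit to the example above.
X = np.arange(70, dtype=float).reshape(-1, 1)
y = np.arange(70, dtype=float)
model = SVR(kernel="linear").fit(X, y)

size_kb = serialized_size_kb(model)
print(f"{size_kb:.1f} KB of base64 text")
```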

Hope it helps.

1

Can we actually save a trained model to an external source, for example a Dropbox HTTP URL?

0

You're right, Adrien. Maybe you can store the pickle serialization in Dropbox and load it in a backtest with the self.Download method.

https://www.quantconnect.com/docs/algorithm-reference/importing-custom-data#Importing-Custom-Data-Loading-Reference-Data
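A rough sketch of that loading step is below. It assumes that the download call returns the file's contents as a string. To keep the snippet runnable outside QuantConnect, the downloader is passed in as a plain callable and replaced here by a stand-in (`fake_download`). The helper names and the example.com URL are hypothetical.

```python
import base64
import pickle

import numpy as np
from sklearn.svm import SVR


def model_to_text(model):
    """Serialize a model to base64 text (the Research-side step)."""
    return base64.b64encode(pickle.dumps(model)).decode("ascii")


def model_from_text(text):
    """Rebuild a model from base64 text."""
    # Only unpickle text you produced yourself: pickle can run arbitrary code.
    return pickle.loads(base64.b64decode(text))


def load_remote_model(download, url):
    """`download` is any callable that takes a URL and returns the file's text."""
    return model_from_text(download(url))


# Stand-in for the remote file: train a toy model and keep its text locally.
X = np.arange(70, dtype=float).reshape(-1, 1)
trained = SVR(kernel="linear").fit(X, np.arange(70, dtype=float))
stored_text = model_to_text(trained)


def fake_download(url):
    # Pretends to fetch the URL; a real algorithm would pass its own downloader.
    return stored_text


restored = load_remote_model(fake_download, "https://example.com/model.txt")
print(restored.predict([[1.0], [10.0]]))
```

Inside an algorithm the call might look like `load_remote_model(self.Download, url)`, but that usage is an assumption and has not been tested here. The scikit-learn version that loads the pickle should match the one that created it.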
1

Hi Adrien, you cannot export the serialization to Dropbox unless it is a tiny volume of data (less than 10 KB). Sadly, exporting data isn't allowed because of the data licensing agreements that make it possible for us to provide the data.

We are working on a way to save and fetch objects; it'll be ready soon. 

0


Have there been any updates on this issue?

0

Hi Christian,

Sorry, for now it's not possible to save and retrieve trained ML models. However, QC supports a lot of machine learning libraries, including TensorFlow, PyTorch, scikit-learn and Keras. All the supported libraries can be found in the whitelist.

Moreover, it is actually convenient and easy to train ML models and create ML algorithms in QC. Here is one example of how to build an algorithm using TensorFlow. Hope it is helpful for you!

0


I'm trying to do this in the algorithm section and having issues with reading in a CSV. It should just be (using pandas):

data = pd.read_csv('https://url/data.csv')

Right? I'm really just trying to build the data/target arrays. I thought this would be the easy part! It looks like it's timing out, though. Am I trying to use any functionality that isn't allowed here?
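One thing that may be worth trying, offered as a guess rather than a confirmed fix for the timeout, is to fetch the file contents as text first and only then hand the text to pandas. The sketch below covers the parsing half. The function name `csv_text_to_arrays`, the column names and the sample rows are invented for illustration.

```python
import io

import pandas as pd


def csv_text_to_arrays(csv_text, target_column):
    """Parse CSV text and split it into a feature matrix and a target vector."""
    frame = pd.read_csv(io.StringIO(csv_text))
    y = frame[target_column].to_numpy()
    X = frame.drop(columns=[target_column]).to_numpy()
    return X, y


# Sample text standing in for the contents of the remote file.
sample = "open,volume,target\n10.0,1000,0.01\n10.5,1200,-0.02\n"
X, y = csv_text_to_arrays(sample, "target")
print(X.shape, y.shape)
```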

0

Stephen, unless I am misunderstanding this thread, that is exactly the case we are discussing: you can't load a CSV file that contains the trained algorithm into your program.

And if I am understanding the example above correctly, Qilong Chen, you make use of NetTrain, and the training happens over a lookback of 30 days?

def NetTrain(self):
    history = self.History(self.symbols, self.lookback + 1, Resolution.Daily)

    self.prices_x, self.prices_y = {}, {}
    self.sell_prices, self.buy_prices = {}, {}

However, in many ML examples it is useful to train your algo on several years of data, so the CSV option that was alluded to earlier in this thread would be useful.

Perhaps a way around this is to schedule a function that runs only once at start-up, and use an example similar to the one above with a longer lookback?

0

Christian,

You're completely correct. I overlooked a bit here: I missed the import and only saw the export portion! Sorry about that.

I do see the example and agree that I could create a training array from price. But with the warm-up, won't I only be able to access the Symbol attributes like they have here? How can I build true data and target arrays? I would be looking at price points at different times of the day, consolidated price points and volume, and I may even want to include the day of the week or the month. Do you know of a way to do this?
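As a rough illustration of what such data and target arrays could look like, here is a pandas sketch that derives a few features (price return, volume change, day of week, month) from a table of bars and uses the next bar's return as the target. The column names `close` and `volume`, the choice of features and the synthetic bars are all assumptions. How the bars are collected inside an algorithm is left open.

```python
import numpy as np
import pandas as pd


def build_features(bars):
    """bars: DataFrame indexed by date with 'close' and 'volume' columns (assumed layout)."""
    features = pd.DataFrame(index=bars.index)
    features["return_1d"] = bars["close"].pct_change()
    features["volume_change"] = bars["volume"].pct_change()
    features["day_of_week"] = bars.index.dayofweek
    features["month"] = bars.index.month

    # Target: the return realized on the following bar.
    target = bars["close"].pct_change().shift(-1).rename("next_return")

    # Drop the first row (no previous bar) and the last row (no next bar).
    data = features.join(target).dropna()
    X = data.drop(columns="next_return").to_numpy()
    y = data["next_return"].to_numpy()
    return X, y


# Synthetic daily bars standing in for real history.
dates = pd.bdate_range("2020-01-01", periods=30)
rng = np.random.default_rng(1)
bars = pd.DataFrame(
    {
        "close": 100 + rng.normal(0, 1, 30).cumsum(),
        "volume": rng.integers(1000, 5000, 30),
    },
    index=dates,
)

X, y = build_features(bars)
print(X.shape, y.shape)
```

Shifting the target by one bar keeps each row's features strictly earlier than the value being predicted, which avoids look-ahead in training.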

0

Sorry, to respond to myself here: I guess I'm not fully understanding the warm-up, and I can't seem to find it in the documentation tab anymore. I thought it was in there. If I call self.SetWarmUp(time, resolution) in my initialization, does it basically just run the algo for the lookback time I've set? Say I have scheduled events storing values into a global data array, a volume grab and so on, and maybe OnData grabbing some things as events happen, or whatever else I need to fill that array on a day-to-day basis. As long as I call SetWarmUp, will it basically run all of that and build the array over that period?

0

Hello Stephen and Christian,

Thank you for your responses on the ML example I made.

Stephen, SetWarmUp() just feeds in the data from the chosen period before the start date to warm up indicators. It doesn't run any of the trading code and won't execute the scheduled events. You can find the details in our documentation here.

Christian, if you want a longer training period, one way to do this is to increase the number of lookback days. Also, it's really exciting to see you considering option contracts to take the basic example to a new level. To import custom data, you can use the Download() method to import data stored in Dropbox. You might follow the pattern in this example.

The example only serves as a starting point for exploring ML algorithms further. You are free to set different parameters and improve the algorithm in different ways. For example, you can change the resolution from daily to minute-level, weekly or monthly. You can also put the trading logic in OnData() to get flexible trading times instead of using Schedule.On().

Hope this helps!

0


Thank you!  I'm not sure how I was missing the documentation there... 

0


This discussion is closed