QUANTCONNECT COMMUNITY


Machine Learning 1

I'm going to start a series of research posts on Machine Learning. It is very challenging to implement successful ML trading strategies and to prove that they work in real markets. However, since our world is changing so fast and becoming more complex, ML is worth researching, and hopefully we can develop robust models. Let's start with a Linear Regression model. Any comments and suggestions are welcome.

HanByul P

Notebook

HanByul P

Hi all, I found a mistake in the charts. Our entire data history ran from (2010, 1, 1) to (2017, 6, 30), and we were supposed to predict roughly the next 60 days (3% of the entire data). During the ML prep, I dropped about 60 rows to clean up the dataset, which is correct. However, when converting the historical prices to charts, I should have shown the entire price history. That's why the chart in the above Notebook showed the historical prices ending in April instead of on (2017, 6, 30). I fixed this and attached the result below. Thank you.

Notebook

Michael Manus

Nice, thanks for posting!

Very interesting, straight to favorites.

Michael Manus

Have you ever traded live with a machine learning algo?

Hopefully there will be another machine learning algo post, HanByul! Thanks again for posting.

Very good content.

Jing Wu

Nice work HanByul! This is a good example for machine learning 101. I'm confused about this line:

# Shift 'close' upper side as our target variable
dataset['label'] = dataset[predict_col].shift(-predict_out)

I'm confused about why you shift "y" backward instead of using the first "predict_out" days' prices as the response variable values. If you shift "y" backward and drop the NaNs, the remaining values will be the later prices instead of the older prices.

I saw that X uses the top part, X = X[:-predict_out], while y appears to use the later part of the prices. I'm not sure the dates of these two datasets match each other. Could you explain a little?

Thanks for your help!

Jing

HanByul P

Hi Jing and all. First of all, thank you all. Second, I reviewed my posts again and realized that the first post is correct (that is, the charting in the first post is correct). The ML process is correct in both posts anyway; it was only the charting that confused me. Please disregard the second post. Sorry about the confusion.

import math

# Column to predict, and how many rows ahead (3% of the dataset length)
predict_col = 'close'
predict_out = int(math.ceil(0.03*len(dataset)))
dataset['label'] = dataset[predict_col].shift(-predict_out)

The above code constructs 'label' from our 'future' prices. By shifting the 'close' data up into the new column 'label' and matching it with the left-side data, we fill 'label' with future prices. For example, shifting the price of 2017/6/30 (a future price) up by 60 (days) lines it up with the row of 2017/4/30 (past data). This leaves NaNs in the bottom part of 'label' (because we shifted up). I dropped these NaNs later.

All the data to the left of this new 'label' column are our features ('P/E', 'B/V_Yield', 'EVToEBITDA', 'close'), and all of them are already known.
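To make the alignment concrete, here is a minimal sketch on a toy table. The six prices, the dates, and the helper name `add_future_label` are made up for illustration and are not taken from the notebook; only the `shift(-k)` call mirrors the code above.

```python
import pandas as pd

def add_future_label(frame, column, horizon):
    """Return a copy of frame with a 'label' column: `column` shifted up by `horizon` rows."""
    out = frame.copy()
    out["label"] = out[column].shift(-horizon)
    return out

# Toy data: six made-up closing prices (hypothetical, not real AAPL data).
toy = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]},
    index=pd.date_range("2017-01-02", periods=6, freq="D"),
)

labelled = add_future_label(toy, "close", 2)
print(labelled)
```

With a horizon of 2, each row's 'label' holds the close from two rows later, and the last two rows have NaN in 'label' because no future price exists for them yet.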

import numpy as np
from sklearn import preprocessing

# Define X and X_lately ---------------------------------

# First drop the 'label' column so that only the feature columns remain
X = np.array(dataset.drop(columns=['label']))
# Scale all the X data
X = preprocessing.scale(X)
# The bottom predict_out rows are the features we will predict from
X_lately = X[-predict_out:]
# The remaining rows are our X features for training
X = X[:-predict_out]

# Drop the rows whose 'label' is NaN (the bottom predict_out rows)
dataset.dropna(inplace=True)

# Define y ----------------------------------
y = np.array(dataset['label'])

Before doing ML, we need to redefine X and define a new 'X_lately'. What we are going to predict from is the 'X_lately' feature data, which I sliced to exactly the same length as the shift (predict_out). In other words, this ML model will predict the future 60-day prices from the bottom 60 rows of our features (the columns to the left of 'label'). That is 'X_lately = X[-predict_out:]'. All the rows above the bottom 60 are our X (X = X[:-predict_out]), and y is the 'label' that we already shifted up. So we have a clean dataset: the X features (left of 'label') and the y 'label'. I dropped ['label'] from the entire dataset before defining X and 'X_lately', so there was no problem there. Finally, I dropped the NaNs in 'label'.
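The whole split can be checked end to end on synthetic data. This is a rough sketch under stated assumptions, not the notebook's code: the helper `split_for_forecast`, the 'pe' column, and the random prices are all hypothetical, the scaling is done by hand instead of with `preprocessing.scale`, and the fit uses NumPy's least squares as a stand-in for a linear regression model.

```python
import math
import numpy as np
import pandas as pd

def split_for_forecast(dataset, predict_col, fraction):
    """Build the shifted label, then split features into training rows and recent rows."""
    predict_out = int(math.ceil(fraction * len(dataset)))
    data = dataset.copy()
    data["label"] = data[predict_col].shift(-predict_out)

    features = data.drop(columns=["label"]).to_numpy(dtype=float)
    # Standardise each column to zero mean and unit variance over all rows.
    features = (features - features.mean(axis=0)) / features.std(axis=0)

    X_lately = features[-predict_out:]  # recent rows: future price unknown
    X = features[:-predict_out]         # earlier rows: each has a label
    y = data["label"].dropna().to_numpy()
    return X, y, X_lately

# Synthetic data for illustration only (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 210
toy = pd.DataFrame({
    "pe": rng.normal(15.0, 1.0, n),
    "close": 100.0 + np.cumsum(rng.normal(0.0, 1.0, n)),
})

X, y, X_lately = split_for_forecast(toy, "close", 0.03)

# Ordinary least squares with an intercept column (stand-in for the regression model).
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
forecast = np.column_stack([np.ones(len(X_lately)), X_lately]) @ coef
```

X and y end up with the same number of rows, and row i of X is paired with the close from predict_out rows later, which is the date matching asked about earlier in the thread. Note that the scaling here, like the snippet above, uses statistics from all rows including the most recent ones; a stricter setup would compute them from the training rows only.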

I wish I could draw all of this so that you could understand it better. Please excuse my poor explanation. Again, sorry about the confusion over the charting, and thank you all.

Jing Wu

Thanks for the explanation. In your algorithm, you use the current close price as one of the features and try to predict the price 57 days later. That is probably why you shift the future price back to be the label variable. The X variables are today's 'P/E', 'B/V_Yield', 'EVToEBITDA', and 'close', and y is the price 57 days later. Since the feature 'close' and the response variable y are likely to be highly correlated, the coefficient of 'close' will probably be high. Looking forward to your trading algorithm that applies this model!

HanByul P

Hi Jing, yes, that's right. I'm glad my poor explanation made sense to you. I included AAPL's price data in the features (X), and it could be a key factor in the prediction, which is what I expected. Again, this is just a first step in ML, as you know, and we have a long way to go. I will try to turn my research into an algo sometime later. Thank you again, and I'm looking forward to discussing more topics in the next post. ^ ^

Ethan Scott

Nice post. I also noticed the mistakes at first, and then realised you were already aware of them.
