
Using Custom Reddit Data to Trade Stocks

Hi everyone, 

Most of us have heard of the subreddit "wallstreetbets." It is one of Reddit's most popular hubs for retail traders to share their ideas or positions. This algorithm scrapes the subreddit for posts that mention SPCE, LULU, CCL, or SDC (stocks that tend to be popular topics among retail investors). It then runs a sentiment analysis on the comments of each matching post and stores the scores in a data frame.

The algorithm may contradict Odd Lot Theory, which states that the small investor is usually wrong. But today, retail investor participation in the market is at an all-time high, and there are plenty of simple stock analysis tools to help small investors make informed decisions. If the general sentiment from retail investors is up, then they could be purchasing the underlying equity.

Here are the links to the CSV file generated by the Reddit sentiment analysis program.

 

Cheers,

Jovad


The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.


Very interesting work, Jovad. How did you decide which submission statistics to use as inputs to the sentiment score? Is this based on work done elsewhere? And why those ticker symbols?

Odd lot theory is proving to be resoundingly incorrect this year. Recent market moves in the NASDAQ have been attributed to big Softbank option bets. However, I have seen a few analyses from vol traders that suggest: 1) the dollar premium spent on small-lot option trades has swamped the recent trades allegedly from the 'NASDAQ Whale' by as much as 10x; 2) small-lot trades have mostly been in short-dated tenors (<30 DTE) vs. 6-12 months for Softbank; 3) a dealer/MM 'gamma squeeze' is more likely to be caused by these shorter-dated flows, since shorter (longer) dated options have more gamma (vega).
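For anyone who wants to sanity-check the gamma/vega point in 3), here is a minimal sketch using the standard Black-Scholes formulas. The spot, strike, volatility, and tenors below are made-up illustration values, not numbers from the analyses mentioned above, and the comparison is for an at-the-money option only.

```python
from math import exp, log, pi, sqrt


def _norm_pdf(x: float) -> float:
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def bs_gamma_vega(spot: float, strike: float, vol: float,
                  years: float, rate: float = 0.0) -> tuple:
    """Black-Scholes gamma and vega (no dividends); same for calls and puts."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt(years))
    gamma = _norm_pdf(d1) / (spot * vol * sqrt(years))
    vega = spot * _norm_pdf(d1) * sqrt(years)  # per 1.00 change in vol
    return gamma, vega


# Hypothetical at-the-money option at 40% vol: 30 days vs. 9 months to expiry.
gamma_short, vega_short = bs_gamma_vega(100, 100, 0.40, 30 / 365)
gamma_long, vega_long = bs_gamma_vega(100, 100, 0.40, 270 / 365)
print(f"30d : gamma={gamma_short:.4f} vega={vega_short:.2f}")
print(f"270d: gamma={gamma_long:.4f} vega={vega_long:.2f}")
```

Near the money, gamma scales roughly with 1/sqrt(T) and vega with sqrt(T), which is why the short-dated flows matter more for dealer hedging. Far from the money the ordering can differ.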

There is probably still money to be made following the 'dumb-money'...!
 

https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy8xMTkwOGIzYy9wb2RjYXN0L3Jzcw/episode/MDgzNDZlYzMtODAyOC00Yjg0LThiZDctMDRlYmEwMTNiMWNi?hl=en-AU&ved=2ahUKEwiJsNyO2unrAhWKH7cAHXI2BDMQjrkEegQIThAF&ep=6

 


Hi P Chen, 

Thanks for sharing the podcast! After realizing a few flaws in the algorithm, I made a few changes. A backtest of the "upgraded" algorithm is attached, and here is the link to the commented Python code for analyzing Reddit.

We use the SentimentIntensityAnalyzer from nltk.sentiment.vader to determine the sentiment of a body of text. Originally, we only analyzed a submission's comments, but the program now also analyzes the submission title and the submission body text. Once all scores are collected, they are averaged and returned as averageScore.
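A rough sketch of that averaging step is below. It is not the linked program: the function names are hypothetical, and using VADER's "compound" field as the per-text score is an assumption, since the post does not say which field is averaged. The scorer is passed in as a parameter so the averaging can be tried without nltk installed.

```python
from functools import lru_cache
from statistics import mean
from typing import Callable, Iterable


@lru_cache(maxsize=1)
def _analyzer():
    # Lazy import: needs nltk plus a one-time nltk.download("vader_lexicon").
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def vader_compound(text: str) -> float:
    """VADER compound score for one text, in [-1, 1] (assumed scoring field)."""
    return _analyzer().polarity_scores(text)["compound"]


def average_score(title: str, body: str, comments: Iterable[str],
                  scorer: Callable[[str], float] = vader_compound) -> float:
    """Average the sentiment of the title, body, and comments of one submission."""
    texts = [title, body, *comments]
    # Skip empty strings (e.g. link posts with no body) so they do not
    # drag the average toward a neutral score.
    scores = [scorer(t) for t in texts if t and t.strip()]
    return mean(scores) if scores else 0.0
```

Calling `average_score(title, body, comments)` with the default scorer gives one number per submission, which can then be written to the data frame.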

This work is not based on work done elsewhere. However, there are plenty of YouTube tutorials on the Reddit API and a few on sentiment analysis.

I picked those stocks by briefly scrolling through wallstreetbets and picking four tickers. I have included a backtest with 11 more tickers that I saw mentioned in the subreddit. If this algorithm were run in real time, it would make sense to create a separate program that pulls the list of tickers mentioned on the subreddit within a rolling 30-day window (i.e., it would not pull ticker 'F' if it was last mentioned 120 days ago; the discussion must be recent).
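The rolling-window filter could look something like the sketch below. It assumes the mentions have already been collected as (ticker, timestamp) pairs; the function name and input shape are hypothetical and not part of the existing program.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def recent_tickers(mentions: Iterable[Tuple[str, datetime]],
                   as_of: datetime,
                   window_days: int = 30) -> List[str]:
    """Return the tickers mentioned at least once in the last `window_days`."""
    cutoff = as_of - timedelta(days=window_days)
    # A set drops duplicates; a ticker qualifies if any mention is recent.
    return sorted({ticker for ticker, seen in mentions if cutoff <= seen <= as_of})


# Example with made-up timestamps: 'F' was last mentioned 120 days ago.
now = datetime(2020, 9, 15)
mentions = [
    ("F", now - timedelta(days=120)),
    ("SPCE", now - timedelta(days=3)),
    ("SPCE", now - timedelta(days=10)),
    ("LULU", now - timedelta(days=30)),
]
print(recent_tickers(mentions, as_of=now))  # ['LULU', 'SPCE']
```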

Yes, you are right. Odd lot theory has been incorrect this year, and I believe it will continue to be inaccurate. As the friction for small investors to participate in the market declines, a collection of small investors can have a greater market impact, and company news reports may create increased short-term volatility in a stock's price. Here is an interesting short article on a mutual fund manager's thoughts on Robinhood.

Some pitfalls with this algorithm:

  1. There is no filter for predatory wallstreetbets submissions (i.e., link)
  2. It only searches for mentions of ticker symbols, not company names (quick fix)
  3. There is no credibility model
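For pitfall 2, one possible quick fix is to match either the ticker symbol or a company name. The sketch below is only an illustration: the alias table and function names are hypothetical, and treating tickers as case-sensitive whole words (with an optional leading "$") is a design assumption meant to reduce false hits on ordinary words.

```python
import re
from typing import Dict, List

# Illustrative alias table for the four original tickers.
ALIASES: Dict[str, List[str]] = {
    "SPCE": ["Virgin Galactic"],
    "LULU": ["Lululemon"],
    "CCL": ["Carnival"],
    "SDC": ["SmileDirectClub"],
}


def build_pattern(ticker: str, names: List[str]) -> "re.Pattern":
    """Match the upper-case ticker as a whole word, or any company name."""
    # Ticker: case-sensitive, optional "$" prefix, not glued to other letters/digits.
    ticker_part = r"(?<![A-Za-z0-9])\$?" + re.escape(ticker) + r"(?![A-Za-z0-9])"
    # Company names: case-insensitive, whole words.
    name_parts = [r"(?i:\b" + re.escape(name) + r"\b)" for name in names]
    return re.compile("|".join([ticker_part, *name_parts]))


def mentioned_tickers(text: str, aliases: Dict[str, List[str]] = ALIASES) -> List[str]:
    """Return the tickers whose symbol or company name appears in `text`."""
    return sorted(t for t, names in aliases.items()
                  if build_pattern(t, names).search(text))


print(mentioned_tickers("Loading up on virgin galactic calls and $LULU"))
# ['LULU', 'SPCE']
```

A name like "Carnival" can still match unrelated posts, so this widens coverage at the cost of some noise.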
 

One idea to prime this algorithm for live trading is to use a neural network. It would predict the next set of sentiment scores, and those predictions would be used to manage risk. The CSV file would also need to be updated and fed into the algorithm daily. If anything I said is confusing, please let me know.

(One change I made to the backtest was to increase the trailing stop loss from 0.05 to 0.08 to allow more room for recovering losses.)
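To show what the wider stop changes, here is a small stand-alone sketch. It assumes the 0.05 and 0.08 values are the maximum drawdown from the highest price seen since entry on a long position, which is my reading of "trailing stop" rather than the backtest's exact logic. The price path is made up.

```python
from typing import Optional, Sequence


def trailing_stop_exit(prices: Sequence[float], trail_pct: float) -> Optional[int]:
    """Index at which a long position is stopped out, or None if never hit."""
    peak = prices[0]
    for i, price in enumerate(prices):
        peak = max(peak, price)  # the stop level ratchets up with new highs
        if price <= peak * (1 - trail_pct):
            return i
    return None


# Hypothetical price path with a dip followed by a recovery.
path = [100, 104, 98, 103, 110, 100]
print(trailing_stop_exit(path, 0.05))  # 2: stopped out on the dip to 98
print(trailing_stop_exit(path, 0.08))  # 5: survives the dip, exits after the later peak
```

With the 5% stop the position is closed before the recovery to 110; the 8% stop rides through the dip but gives back more on the final drop.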

Cheers,

Jovad Uribe


This discussion is closed