By: Simon Burns
Quant Development Intern
30.07.2013
Designing Sentiment Trading Strategies with Stefan Nann
Stefan Nann is the Co-Founder and CEO of StockPulse, a provider of social media sentiment data for individual and index securities. Stefan studied Business Administration and Information Systems before pursuing graduate studies with a focus on financial markets, analysis of unstructured text and online communities. While a visiting scholar at the Massachusetts Institute of Technology (MIT) Center for Collective Intelligence, Jonas Krauss (Co-Founder) and Stefan Nann built the semantic compilation algorithm, first used for Oscar predictions, that is the backbone of StockPulse’s sentiment scores today. Stefan is an analytical, eloquent and truly visionary individual, and it was a pleasure speaking with him about building his algorithm, valuing input data by source and signal management for modern financial markets. Enjoy our interview with Stefan Nann:
Simon Burns: Stefan Nann, thank you for joining us. Could you start by taking us through your story and how you got into compiling sentiment scores for stocks?
Stefan Nann: Basically, it started when I was with my colleague Jonas Krauss. We met at university and started working together on a study predicting Oscar winners with algorithms. We collected vast amounts of communication data on people’s opinions of movies and actors. This information was found online from sources like the Internet Movie Database (IMDb) forum, where we collected a year’s worth of data.
Using our algorithm, we got 9 out of 10 of our predictions right in 2007. This got us a fair amount of press and validated our work. So we wrote an academic paper, which was also well received. Then we thought, “Okay, if this works for Oscar predictions, then it will probably also work for stocks and other financial instruments.” This was around the time Twitter/StockTwits invented the dollar sign hashtag, which organized financial market communications about stocks and made them readable by our algorithm.
So we started collecting all this data, building a hypothesis and predicting value based on public communications data. The result is StockPulse.
Simon Burns: What is the main purpose of StockPulse? What did you build it for?
Stefan Nann: We built it as a complementary set of information for existing systems. Our users have other news screens and data sources, either technical or fundamental. They should never rely solely on our social media data; that would not make sense.
Simon Burns: There’s a fair amount of noise on Twitter: some of it is market chatter, some of it is not. Clearly the noise is not directly applicable to the markets and doesn’t impact security values. How does StockPulse filter it out and get just the influential tweets into your system?
Stefan Nann: There are several steps we take to ensure that the flurry of tweets that don’t impact markets doesn’t impact our algorithms. The first step is that we do a lot of pre-processing. We are not collecting a few hundred million tweets or posts every day. Twitter receives 500 million tweets a day as of today; there is no way that we could or would want to decipher all of that data.
We pull data with specific calls. We know exactly which communication about stocks, indices or currencies we want, and we get very segmented data within these verticals. For example, pulling Twitter data on a stock is very easy. You can say, find the communication related to Apple stock, which would be organized on Twitter under $AAPL. An issue that we had to face was with European markets. If you go into stock forums in Germany, there is more work to be done to access the concrete information than in the U.S., and this applies to the German markets and the UK markets. So let’s take the example of Deutsche Bank stock. Again, same as Apple, we want to find sentiment data based on all media communications about the stock. You can’t simply pull for $DB; that’s not the German Deutsche Bank, because German markets use the ISIN. It’s a string of numbers that identifies a stock.
However, no one is tweeting the ISIN of Deutsche Bank when they want to talk about Deutsche Bank. No one knows the ISIN, so that’s really quite weird. The communication is poorly organized for our purposes. We have to manually do text analysis on the text coming from all of the aggregated communication. The text analysis looks for keywords, like the words “long” or “short”, which are very specific keywords for the financial markets and define the positive or negative value of a communication piece very quickly. This is how we are able to build up our algorithm-based scoring.
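Editor’s note: the matching problem Stefan describes can be illustrated with a small Python sketch. It is not StockPulse’s code. The regular expression, the alias table, the keyword polarities and all function names are assumptions made for illustration, and the ISIN shown for Deutsche Bank should be treated as an example value.

```python
import re

# Hypothetical lookup tables for illustration; not StockPulse's actual data.
CASHTAG = re.compile(r"\$([A-Za-z]{1,6})\b")
# ISIN -> names people actually write (example value, treat as illustrative).
ALIASES = {"DE0005140008": ["deutsche bank"]}
# Finance-specific keywords and an assumed polarity for each.
POLARITY = {"long": 1, "short": -1}


def find_cashtags(text):
    """Return upper-cased ticker symbols written as $TICKER."""
    return [symbol.upper() for symbol in CASHTAG.findall(text)]


def find_isins(text):
    """Return ISINs whose plain-language alias appears in the text."""
    lowered = text.lower()
    return [
        isin
        for isin, names in ALIASES.items()
        if any(name in lowered for name in names)
    ]


def keyword_score(text):
    """Sum keyword polarities: positive reads bullish, negative bearish."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(POLARITY.get(word, 0) for word in words)
```

A U.S. post such as "Going long $AAPL today" is found by its cashtag alone, while a German forum post such as "Ich bin short Deutsche Bank" has to be matched by name before the keyword score can be attached to the right security.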
Simon Burns: In financial market commentary there is evidently a different value placed by participants on tweets versus blog posts, or between a press release and a magazine feature. How do you account for that?
Stefan Nann: That is a very interesting question because of the different sources we deal with: Twitter, blog posts or news stories. Each source has a different half-life, so a fundamental analysis in a Seeking Alpha post has a much longer half-life than a 140-character tweet. I think the concept is very relevant, especially to trading. A lot of traders trade on information; however, most traders don’t think about the half-life of the data they are trading on. You really need to know the half-life of your data and separate your material information by the different sources it is coming from.
Simon Burns: Great insights. So, based on this approach, how do you value different market signals within posts? How does the specific language that’s used in a blog post or a tweet affect your sentiment score?
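Editor’s note: one common way to express a per-source half-life is exponential decay, where a signal’s weight halves every half-life. The Python sketch below assumes that model. The half-life values and function names are hypothetical, since the interview gives no figures.

```python
# Hypothetical half-lives in hours; the interview gives no actual figures.
HALF_LIFE_HOURS = {"tweet": 1.0, "blog_post": 48.0}


def decay_weight(source, age_hours):
    """Weight halves every half-life: w = 0.5 ** (age / half_life)."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS[source])


def decayed_sentiment(items):
    """Decay-weighted mean of (source, age_hours, score) tuples.

    Returns 0.0 when there is nothing to average.
    """
    weights = [decay_weight(source, age) for source, age, _ in items]
    total = sum(weights)
    if total == 0:
        return 0.0
    weighted = sum(w * score for w, (_, _, score) in zip(weights, items))
    return weighted / total
```

Under these assumed values, a tweet that is two hours old carries a quarter of its original weight, while a blog post of the same age has lost almost none.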
Stefan Nann: Right now we’re not weighting sources differently; they all get the same weight. The weighting happens on the authorship level rather than the content-type level. We have tiered categories for authors, which help determine the value. The hypothesis is that if you have a very reputable Twitter account, you can move markets. For example, 50 Cent was able to move a penny stock hundreds of percentage points with one tweet. We really have to weight sources differently to incorporate this effect. That’s a task we need to do in the next couple of months.
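Editor’s note: author-level weighting of the kind Stefan mentions could look like the Python sketch below. The tier names, the weights and the function name are hypothetical; the interview does not disclose how StockPulse defines or weights its author tiers.

```python
# Hypothetical tiers and weights; the interview does not disclose real ones.
TIER_WEIGHT = {"high": 3.0, "medium": 2.0, "low": 1.0}


def author_weighted_sentiment(posts):
    """Mean of (author_tier, score) pairs, weighted by author tier.

    Returns 0.0 when there are no posts.
    """
    total = sum(TIER_WEIGHT[tier] for tier, _ in posts)
    if total == 0:
        return 0.0
    return sum(TIER_WEIGHT[tier] * score for tier, score in posts) / total
```

With these assumed weights, one bullish post from a high-tier author outweighs one bearish post from a low-tier author, so the combined score stays positive.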
Simon Burns: Well Stefan, thank you so much for your time. It is much appreciated and best of luck with StockPulse!