Alternative data is not a new tool for alpha generation, but its wide availability is. For over a decade, the sell side and funds have used nontraditional data to augment their ability to capture alpha. In the last few years, the amount of alternative data collated into raw data sets has grown significantly, and the data sets themselves have become more accessible to the buy side. As a result, it can feel as though investment strategies that ignore alt data are missing out on potential predictive power.

QuantConnect provides a rich library of alternative data you can import into your algorithm. This data covers corporate fundamentals, macroeconomics, news and events, and price and sentiment data. A key takeaway is that alternative data draws on an ever-expanding set of sources.

Some examples of alt data include sentiment data from sources like Twitter and data from SEC EDGAR filings, the U.S. Energy Information Administration, and the CBOE's VIX datasets. The term alt data began to trend after the 2007 financial crisis and is often used to describe big data specifically useful for economic analysis.

Other examples include web-scraped data, credit card transactions, geolocation data, satellite images, sentiment data, and weather data. Use cases include tracking the number of cars in the parking lots of major businesses as a proxy for quarterly earnings, improving loan grading on the LendingClub platform to predict loan default, and predicting Electronic Arts game sales before products hit the shelves via sentiment analysis of social media posts. In-house development of algorithms using alt data has been successful for big funds, with Greenwich Associates reporting that 72% of investment firms cite alt data as enhancing their signals. Now investors outside of major hedge funds can access the data as well. While the cost of alt data sets is still prohibitive for individual purchase, a platform like QuantConnect overcomes that barrier by making the data available to its users.

With the advantages come challenges for alt data as an industry, which Eagle Alpha projected to surpass $1 billion globally in 2021. These challenges lie in the small amount of historical data available for a given source, the lack of accurate time-stamping of data, and the limited alt data available for assets outside of liquid stocks. Additionally, regulations on what data can be collected and used, in light of increasing concerns about privacy, are placing restrictions on alt data sources. The availability of alt data has grown because of the increase in streaming data sources and the proliferation of storage and processing tools for data engineering, which make it possible to map data to securities in structures a trading platform recognizes.

The journey of raw data from its source to its integration with a trading platform can be complex. For example, QuantConnect’s data engineers consider how the data is ingested, loaded, transformed, and stored. This process, completed by the QC team, ensures you have accurately time-stamped data mapped to securities in a structure integrated with QC's API. As more investors use alt data sources, some question whether insights from a given source will become less useful in identifying a trend not already captured by the market. After all, alt data is not inherently better than traditional data; its benefit is often considered to be the untapped insights a data set contains.
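
To make the time-stamping point concrete, here is a minimal sketch of a point-in-time lookup: given alt data records stamped with the moment they became available, it returns only the latest value a strategy could have known at a given decision time. The tickers, timestamps, values, and the function names `build_index` and `latest_known` are all hypothetical and chosen for illustration; this is not QuantConnect's pipeline or API.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical alt data records: (time the value became available, ticker, value).
records = [
    (datetime(2021, 3, 1, 9, 0), "AAA", 0.20),
    (datetime(2021, 3, 2, 9, 0), "AAA", 0.35),
    (datetime(2021, 3, 2, 18, 0), "BBB", -0.10),
    (datetime(2021, 3, 4, 9, 0), "AAA", 0.05),
]


def build_index(records):
    """Group records by ticker, each group sorted by availability time."""
    index = {}
    for available_at, ticker, value in sorted(records):
        times, values = index.setdefault(ticker, ([], []))
        times.append(available_at)
        values.append(value)
    return index


def latest_known(index, ticker, as_of):
    """Return the most recent value available at or before `as_of`, or None."""
    if ticker not in index:
        return None
    times, values = index[ticker]
    # Number of records whose availability time is <= as_of.
    pos = bisect_right(times, as_of)
    return values[pos - 1] if pos > 0 else None


index = build_index(records)

# A decision made on March 3 sees the March 2 value, never the March 4 one.
print(latest_known(index, "AAA", datetime(2021, 3, 3, 12, 0)))
```

Keying the lookup on when a value became available, rather than on the period it describes, is one way to keep a backtest from using information the market did not yet have.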

Ultimately, the value of alt data lies in how it is used to make trading decisions and to gain insight into industries at large. The growth of data science teams in investment banks represents a change in the tools used in quantitative trading. Data scientists use machine learning models as a vehicle for predictive analysis. Artificial intelligence tools can be a competitive advantage when combined with alternative data.

As individual investors, we can use statistics and scientific thinking to structure strategy hypotheses and use machine learning models to maximize the value of alt data. Pairing alternative data with your strategy, such as one from the QC Strategy Library, can enable you to move beyond passive investing. As the number of data providers grows, some considerations for strategy implementation with alternative data include: 

  • Is the data structured or unstructured? How will we query for what we need?
  • Is additional data cleaning needed? What libraries do we need for cleaning data?
  • Do we need to store the cleaned data or the model created from the data?
  • What hypothesis do we want to test with this data?

As the availability of data grows, understanding which data set is most useful for your specific strategy will be the next important decision for capturing the advantages of alt data. We hope to see you formulate and test your data science-driven ideas in QuantConnect's research environment soon. 

Happy Coding!