Datasets

Overview

Introduction

It's common to use price and fundamental data points to inform trading decisions. Alternative datasets are datasets that aren't classified as market or fundamental data. Alternative data research began gaining popularity in 2007 after the financial crisis, and its usage has continued to grow since. As usage has grown, so has the number of unique alternative datasets available to investors. Most traders research with market and fundamental datasets, so it's difficult to find alpha in them. Alternative datasets receive less attention from investment researchers and contain information not found in market and fundamental datasets, so it can be easier to find alpha in them.

Methodology

The journey of raw data from its source to its integration with a trading platform can be complex. Alternative data is generally created through individuals, business processes, and sensors. Individuals create content on social media platforms, write reviews on e-commerce sites, and visit corporate Wikipedia pages. Businesses process transactions, file SEC reports, and issue company announcements. Sensors collect location data, gather satellite images, and monitor customer behavior.

Data providers capture these forms of data so that traders can use them to inform trading decisions in ways that supplement traditional market and fundamental datasets. With over a million terabytes of data on the internet, there is a virtually limitless amount of information that can be informative for investors when analyzing a security’s performance in the marketplace.

Examples

A common example of alternative data in a trading system is using sentiment analysis on news sources to predict the future price movements of Equities. A second example is using announcements regarding corporate share buyback programs to trade Equities. A third example is using global supply and demand information for US Crude Products to inform Futures trades. To view all of the datasets we have available, see the left navigation menu.

Challenges

A challenge of working with datasets is that it can be difficult to find a reliable, high-quality, and inexpensive data source to use in live algorithms. If a data source has survivorship bias, insufficient history, or hasn’t been cleaned and processed properly, the data can cause poor trading performance. We thoroughly vet the datasets we add to the Dataset Market to ensure they are free of survivorship bias and have a reliable live feed. All of the available datasets have already been cleaned and processed, so you can just focus on the strategy research and development.

Future Outlook

The Dataset Market is still in the early phases of development. Over time, we will add more markets and alternative datasets. The supply of alternative data is ever-expanding and is expected to keep growing: in today’s digital world, more data is created each year than in the previous year. To take advantage of this trend, we have implemented a new onboarding process to integrate many new alternative data sources into the QuantConnect platform. With the Dataset Market, you can import a new alternative dataset into your trading algorithm with just a single line of code. The Dataset Market gives everyone the ability to access high-quality alternative datasets that were previously only accessible to major hedge funds.

Live Trading Considerations

We receive most daily alternative data at 7 AM Eastern Time (ET) and start processing it. In live trading, your algorithm receives this data within 30 minutes of 7 AM ET. In backtesting, this data has a timestamp of midnight, so your algorithm receives it about seven hours earlier than it does in live mode.
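The gap between backtest and live timing described above can be made concrete with a short sketch. This is not part of the QuantConnect API: the helper name `live_arrival_window` and the constants are hypothetical, timestamps are treated as naive Eastern Time values, and the sketch assumes the midnight backtest timestamp falls on the same calendar day as the 7 AM ET delivery.

```python
from datetime import datetime, time, timedelta

# Figures taken from the text above; the names are illustrative only.
LIVE_DELIVERY_TIME_ET = time(7, 0)            # data is received at 7 AM ET
MAX_PROCESSING_DELAY = timedelta(minutes=30)  # delivered within 30 minutes


def live_arrival_window(backtest_timestamp):
    """Return (earliest, latest) live arrival times for a daily data point.

    `backtest_timestamp` is the naive ET timestamp the point carries in a
    backtest (midnight). Assumes delivery happens on that same calendar day.
    """
    earliest = datetime.combine(backtest_timestamp.date(), LIVE_DELIVERY_TIME_ET)
    return earliest, earliest + MAX_PROCESSING_DELAY


backtest_ts = datetime(2024, 3, 4, 0, 0)  # midnight timestamp seen in a backtest
earliest, latest = live_arrival_window(backtest_ts)
lookahead = earliest - backtest_ts  # how much earlier the backtest sees the data
print(earliest, latest, lookahead)
```

One way to use this in research is to hold back any backtest decision that depends on a daily alternative data point until the `latest` time, so the backtest doesn't act on the data hours before a live algorithm could.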

In backtests, your algorithm receives data with perfect timing. If you request minute resolution data, your algorithm receives the bars at the top of each minute. In live trading, bars have a slight delay, so you may receive them milliseconds after the top of each minute.

Note that there is a delay before live data becomes available for backtesting. Data from the live feed is normally available for backtesting after 24-48 hours. If you need to monitor an algorithm closely in the meantime, consider using paper live trading.
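As a rough sketch of the delay described above, the following helper estimates when a live data point should become usable in backtests. The function names and the choice to treat the 48-hour upper bound as the safe cutoff are assumptions for illustration; the actual delay may vary by dataset.

```python
from datetime import datetime, timedelta

# Range quoted in the text above; actual availability may vary by dataset.
MIN_BACKTEST_DELAY = timedelta(hours=24)
MAX_BACKTEST_DELAY = timedelta(hours=48)


def backtest_availability(live_timestamp):
    """Return the (earliest, latest) time a live data point is normally
    expected to be available for backtesting."""
    return live_timestamp + MIN_BACKTEST_DELAY, live_timestamp + MAX_BACKTEST_DELAY


def is_safely_backtestable(live_timestamp, now):
    """Conservative check: True only once the 48-hour upper bound has passed."""
    return now >= live_timestamp + MAX_BACKTEST_DELAY


received = datetime(2024, 3, 4, 7, 30)  # hypothetical live arrival time
print(backtest_availability(received))
# 28.5 hours later: inside the 24-48 hour window, so not yet guaranteed.
print(is_safely_backtestable(received, now=datetime(2024, 3, 5, 12, 0)))
# 48.5 hours later: past the upper bound.
print(is_safely_backtestable(received, now=datetime(2024, 3, 6, 8, 0)))
```

A check like this can help choose a backtest end date that only covers data the backtesting environment is expected to have.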

