QuantConnect

US Equity

Introduction

Download the US Equities dataset to your local machine. You can download the full dataset in bulk to avoid selection bias or download individual tickers to keep the cost low. The dataset contains data for every ticker and trading day. If the resolution you download provides trade and quote data, the download contains both data types. To check which data types each resolution provides, see Resolutions.

To use the CLI, you must be a member of an organization on a paid tier.

Prerequisites

The US Equities dataset depends on the US Equity Security Master dataset because the US Equity Security Master contains information on splits, dividends, and symbol changes. Before you download US Equities data, open the Pricing page of your organization and subscribe to the US Equity Security Master by QuantConnect data package. You need billing permissions to change the organization's subscriptions.

Download in Bulk

After you subscribe to local access (see Prerequisites), open a terminal in your organization workspace and run the following commands to bulk download the data and its prerequisites.

To download the US Equity Security Master, run:

$ lean data download --dataset "US Equity Security Master"

To download the US Equities data for a resolution, run the following command, replacing <resolution> with daily, hour, minute, second, or tick and adjusting the date range:

$ lean data download --dataset "US Equities" --data-type "Bulk" --resolution "<resolution>" --start "20230101" --end "20230105"
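The command above can also be assembled programmatically, which helps catch a mistyped resolution or date before the download starts. The following is a minimal sketch: the helper name build_bulk_command and its validation rules are illustrative assumptions, and it only uses the flags shown in the preceding command.

```python
from datetime import datetime

# Resolutions listed in the instructions above
VALID_RESOLUTIONS = ("daily", "hour", "minute", "second", "tick")

def build_bulk_command(resolution: str, start: str, end: str) -> str:
    """Return the bulk download command for one resolution and date range.

    Hypothetical helper: it only formats the command string, it does not run it.
    """
    if resolution not in VALID_RESOLUTIONS:
        raise ValueError(f"Unknown resolution: {resolution}")
    # Dates use the YYYYMMDD format shown in the example command
    start_date = datetime.strptime(start, "%Y%m%d")
    end_date = datetime.strptime(end, "%Y%m%d")
    if start_date > end_date:
        raise ValueError("start must not be after end")
    return (
        'lean data download --dataset "US Equities" --data-type "Bulk" '
        f'--resolution "{resolution}" --start "{start}" --end "{end}"'
    )

print(build_bulk_command("minute", "20230101", "20230105"))
```

Paste the printed command into the terminal in your organization workspace to run it.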

You can also use the CLI Command Generator on the dataset listing to generate the command. For more information about the interactive wizard and the CLI Command Generator, see Using the CLI.

After you bulk download the US Equities dataset, new daily updates are available at 7 AM Eastern Time (ET) after each trading day. Subscribe to at least one of the US Equity ... Updates by AlgoSeek data packages to unlock the updates. Instead of directly calling the lean data download command, you can place a Python script in the data directory of your organization workspace and run it to update your data files. The following example script updates all data resolutions:

import os
import pandas as pd
from datetime import datetime, time, timedelta
from pytz import timezone
from os.path import abspath, dirname
os.chdir(dirname(abspath(__file__)))

OVERWRITE = False

# Define a method to download the data
def __download_data(resolution, start=None, end=None):
    print(f"Updating {resolution} data...")
    command = f'lean data download --dataset "US Equities" --data-type "Bulk" --resolution "{resolution}"'
    if start:
        end = end if end else start
        command += f" --start {start} --end {end}"
    if OVERWRITE:
        command += " --overwrite"
    print(command)
    os.system(command)

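# Return the most recent date (YYYYMMDD) the script tries to download
def __get_end_date() -> str:
    now = datetime.now(timezone("US/Eastern"))
    # After 07:30 ET, look for the previous day's data
    if now.time() > time(7,30):
        return (now - timedelta(1)).strftime("%Y%m%d")
    # Before 07:30 ET, fall back to the data from two days ago
    print('New data is available at 07:30 AM ET')
    return (now - timedelta(2)).strftime("%Y%m%d")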

def __download_high_frequency_data(latest_on_cloud):
    for resolution in ["minute", "second", "tick"]:
        dir_name = f"equity/usa/{resolution}/spy".lower()
        if not os.path.exists(dir_name):
            __download_data(resolution, '19980101')
            continue
        latest_on_disk = sorted(os.listdir(dir_name))[-1].split('_')[0]
        if latest_on_disk >= latest_on_cloud:
            print(f"{resolution} data is already up to date.")
            continue
        __download_data(resolution, latest_on_disk, latest_on_cloud)

def __download_low_frequency_data(latest_on_cloud):
    for resolution in ["daily", "hour"]:
        file_name = f"equity/usa/{resolution}/spy.zip".lower()
        if not os.path.exists(file_name):
            __download_data(resolution)
            continue
        latest_on_disk = str(pd.read_csv(file_name, header=None)[0].iloc[-1])[:8]
        if latest_on_disk >= latest_on_cloud:
            print(f"{resolution} data is already up to date.")
            continue
        __download_data(resolution)

if __name__ == "__main__":
    latest_on_cloud = __get_end_date()
    __download_low_frequency_data(latest_on_cloud)
    __download_high_frequency_data(latest_on_cloud)

The preceding script checks the date of the most recent SPY data you have for all resolutions. If there is new data available for any of these resolutions, it downloads the new data files and overwrites your hourly and daily files. If you don't intend to download all resolutions, adjust this script to your needs.
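The date check for minute, second, and tick data relies on the file names in each ticker directory starting with a YYYYMMDD date, so plain string comparison orders them chronologically. The following sketch isolates that step. The function names and the sample file names (for example, 20230105_trade.zip) are assumptions for illustration.

```python
def latest_date_on_disk(file_names):
    """Return the newest YYYYMMDD prefix among the zip files, or None if there are none."""
    # Take the part before the first underscore, as the script above does
    dates = [name.split("_")[0] for name in file_names if name.endswith(".zip")]
    return max(dates) if dates else None

def needs_update(latest_on_disk, latest_on_cloud):
    """Return True when the local data is missing or older than the cloud data."""
    # YYYYMMDD strings sort in chronological order, so < compares dates correctly
    return latest_on_disk is None or latest_on_disk < latest_on_cloud

# Hypothetical directory listing
files = ["20230103_trade.zip", "20230105_quote.zip", "20230104_trade.zip"]
print(latest_date_on_disk(files), needs_update(latest_date_on_disk(files), "20230106"))
```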

To update your local copy of the US Equity Security Master, run:

$ lean data download --dataset "US Equity Security Master"

Download by Ticker

To download data for selected tickers instead of the full dataset, run a non-interactive lean data download command. First, download the prerequisite US Equity Security Master:

$ lean data download --dataset "US Equity Security Master"

Then download the US Equities data for the tickers, resolution, and date range you need. Tick, second, and minute resolutions provide separate trade and quote data, so run a command for each data type. For example, to download minute-resolution data for SPY:

$ lean data download --dataset "US Equities" --data-type "Trade" --ticker "SPY" --resolution "Minute" --start "20230101" --end "20230105"
$ lean data download --dataset "US Equities" --data-type "Quote" --ticker "SPY" --resolution "Minute" --start "20230101" --end "20230105"

Hour and daily resolutions provide only trade data, so download only the trade data type for those resolutions.
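The rule above (trade and quote for tick, second, and minute; trade only for hour and daily) can be sketched as a small command builder. The function name build_ticker_commands is hypothetical, it only uses the flags from the preceding commands, and the date range is optional because this page only shows it for minute data.

```python
HIGH_FREQUENCY = ("tick", "second", "minute")

def build_ticker_commands(ticker, resolution, start=None, end=None):
    """Return one download command per data type for a single ticker."""
    res = resolution.lower()
    # High-frequency resolutions have separate trade and quote files
    data_types = ["Trade", "Quote"] if res in HIGH_FREQUENCY else ["Trade"]
    commands = []
    for data_type in data_types:
        command = (
            f'lean data download --dataset "US Equities" --data-type "{data_type}" '
            f'--ticker "{ticker}" --resolution "{res.capitalize()}"'
        )
        # Only add the date range when the caller provides one
        if start and end:
            command += f' --start "{start}" --end "{end}"'
        commands.append(command)
    return commands

for ticker in ["SPY", "AAPL"]:
    for command in build_ticker_commands(ticker, "minute", "20230101", "20230105"):
        print(command)
```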

To download data interactively or to use the CLI Command Generator instead, see Using the CLI.

Size and Format

The following table shows the size and format of the US Equities dataset for each resolution:

Resolution    Size      Format
Daily         2 GB      1 file per ticker
Hour          4 GB      1 file per ticker
Minute        500 GB    1 file per ticker per day
Second        1.5 TB    1 file per ticker per day
Tick          1.5 TB    1 file per ticker per day

For more information about the file format, see the Data / equity directory in the LEAN repository.
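Before you start a bulk download, it can help to compare the sizes above with the free space on the target drive. The following sketch does that with the standard library. It assumes 1 TB = 1,000 GB and treats the table values as approximate. The function names are illustrative.

```python
import shutil

# Approximate dataset size per resolution in GB, taken from the table above
SIZE_GB = {"daily": 2, "hour": 4, "minute": 500, "second": 1500, "tick": 1500}

def required_gb(resolutions):
    """Return the total size in GB of the selected resolutions."""
    return sum(SIZE_GB[resolution.lower()] for resolution in resolutions)

def has_enough_space(resolutions, path="."):
    """Return True if the drive that holds path has room for the selected resolutions."""
    free_gb = shutil.disk_usage(path).free / 1000**3
    return free_gb >= required_gb(resolutions)

wanted = ["daily", "hour", "minute"]
print(f"{required_gb(wanted)} GB needed; enough free space: {has_enough_space(wanted)}")
```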

Price

Both the bulk and by-ticker downloads require the US Equity Security Master. The following table shows the cost of an annual subscription to the US Equity Security Master for each organization tier:

Tier                Price ($/Year)
Quant Researcher    600
Team                900
Trading Firm        1,200
Institution         1,800

Download in Bulk

To download the US Equities dataset in bulk, subscribe to it on the Pricing page of your organization. The price depends on your organization tier and the resolution you need. The following table shows the price ($/year) to download the historical data of each resolution for each organization tier:

Resolution    Quant Researcher    Team      Trading Firm    Institution
Tick          16,800              28,800    33,600          52,800
Second        15,360              28,800    33,600          48,000
Minute        11,760              16,800    31,200          43,200
Hour          2,136               3,480     3,480           3,480
Daily         2,136               3,480     3,480           3,480

After the first bulk subscription ends, subscribe to the updates to keep your local data current. The updates cost the same for all resolutions. The following table shows the price ($/year) of the updates for each organization tier:

Tier                Price ($/Year)
Quant Researcher    600
Team                840
Trading Firm        1,440
Institution         2,640

The following table shows the total cost of downloading the required datasets in bulk at minute resolution on the Quant Researcher tier. Other organization tiers apply their own rates, shown in the preceding tables.

Dataset                      Package         Historical    Updates
US Equity Security Master    Subscription    $600          $600/year
US Equities                  Minute          $11,760       $600/year
Total                                        $12,360       $1,200/year
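The same total can be worked out for any tier and resolution by combining the three price tables in this section. The following sketch copies those prices into dictionaries. The function name bulk_costs is hypothetical, and it assumes every tier combines the packages the same way as the Quant Researcher example.

```python
# Annual US Equity Security Master price per tier ($/year)
SECURITY_MASTER = {"Quant Researcher": 600, "Team": 900, "Trading Firm": 1200, "Institution": 1800}
# Annual price of the US Equities updates per tier ($/year)
UPDATES = {"Quant Researcher": 600, "Team": 840, "Trading Firm": 1440, "Institution": 2640}
# Price of the historical bulk download per resolution and tier ($/year)
HISTORICAL = {
    "Tick":   {"Quant Researcher": 16800, "Team": 28800, "Trading Firm": 33600, "Institution": 52800},
    "Second": {"Quant Researcher": 15360, "Team": 28800, "Trading Firm": 33600, "Institution": 48000},
    "Minute": {"Quant Researcher": 11760, "Team": 16800, "Trading Firm": 31200, "Institution": 43200},
    "Hour":   {"Quant Researcher": 2136, "Team": 3480, "Trading Firm": 3480, "Institution": 3480},
    "Daily":  {"Quant Researcher": 2136, "Team": 3480, "Trading Firm": 3480, "Institution": 3480},
}

def bulk_costs(tier, resolution):
    """Return (historical total, annual updates total) in USD."""
    historical = SECURITY_MASTER[tier] + HISTORICAL[resolution][tier]
    updates = SECURITY_MASTER[tier] + UPDATES[tier]
    return historical, updates

historical, updates = bulk_costs("Quant Researcher", "Minute")
print(f"Historical: ${historical:,}  Updates: ${updates:,}/year")
```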

Download by Ticker

The US Equities dataset is available in several resolutions. The resolution you need depends on the US Equity subscriptions you create in your algorithm and the resolution of data you get in history requests. The following table describes the file format and costs of each resolution:

Resolution    File Format                                                                                            Cost per file
Tick          One file per security per trading day per data format. Quote and trade data are separate files.      6 QCC = $0.06 USD
Second        One file per security per trading day per data format. Quote and trade data are separate files.      5 QCC = $0.05 USD
Minute        One file per security per trading day per data format. Quote and trade data are separate files.      5 QCC = $0.05 USD
Hour          One file per security.                                                                                 300 QCC = $3 USD
Daily         One file per security.                                                                                 100 QCC = $1 USD

For example, the following algorithm subscribes to minute resolution data for a US Equity:

// C#
public class USEquityDataAlgorithm : QCAlgorithm
{
    public override void Initialize()
    {
        SetStartDate(2020, 1, 1);
        SetEndDate(2021, 1, 1);
        // Subscribe to minute-resolution data for SPY
        AddEquity("SPY", Resolution.Minute);
    }
}

# Python
from AlgorithmImports import *

class USEquityDataAlgorithm(QCAlgorithm):
    def initialize(self) -> None:
        self.set_start_date(2020, 1, 1)
        self.set_end_date(2021, 1, 1)
        # Subscribe to minute-resolution data for SPY
        self.add_equity("SPY", Resolution.MINUTE)

The following table shows the data cost of the preceding algorithm on the Quant Researcher tier:

Dataset                      Package                Initial Cost    Ongoing Cost
US Equity Security Master    Download On Premise    $600 USD        $600 USD/year
US Equity                    Minute Download        $25.20 USD      $0.10 USD/day

Initial cost of the US Equity minute download:
1 security over 252 trading days with 2 data formats
=> 1 * 252 * 2 files
= 504 files

504 files @ 5 QCC/file
=> 504 * 5 QCC
= 2,520 QCC
= $25.20 USD

Ongoing cost of the US Equity minute download:
1 security with 2 data formats
=> 1 * 2 files/day
= 2 files/day

2 files/day @ 5 QCC/file
=> 2 * 5 QCC/day
= 10 QCC/day
= $0.10 USD/day

The preceding table assumes you download trade and quote data, but you can run backtests with only trade data.
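The per-file arithmetic above generalizes to any number of securities, trading days, and data formats. The following sketch reproduces it, using the rate of 1 QCC = $0.01 USD implied by the cost-per-file table. The function name by_ticker_cost is illustrative.

```python
def by_ticker_cost(securities, trading_days, data_formats, qcc_per_file):
    """Return (files, cost in QCC, cost in USD) for a by-ticker download."""
    # One file per security per trading day per data format
    files = securities * trading_days * data_formats
    qcc = files * qcc_per_file
    # 1 QCC = $0.01 USD, per the cost-per-file table
    usd = qcc / 100
    return files, qcc, usd

# Trade and quote minute data for 1 security over 252 trading days
print(by_ticker_cost(1, 252, 2, 5))
# Trade data only
print(by_ticker_cost(1, 252, 1, 5))
```

With only trade data, the file count and the cost are half of the figures in the table.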
