QuantConnect
CFD
Introduction
Download the CFD dataset to your local machine. You can download the full dataset in bulk to avoid selection bias or download individual tickers to keep the cost low. The dataset contains data for every ticker and trading day.
To use the CLI, you must be a member of an organization on a paid tier.
Download in Bulk
After you subscribe to the CFD dataset on the Pricing page of your organization, open a terminal in your organization workspace and run the following command to bulk download the data, replacing <resolution> with daily, hour, minute, or second and adjusting the date range:
$ lean data download --dataset "CFD Data" --data-type "Bulk" --resolution "<resolution>" --start "20230101" --end "20230105"
You can also use the CLI Command Generator. For more information, see Using the CLI.
After you bulk download the CFD dataset, new daily updates are available at 3 PM Coordinated Universal Time (UTC) after each trading day. Instead of directly calling the lean data download command, you can place a Python script in the data directory of your organization workspace and run it to update your data files. The following example script updates all data resolutions:
import os
import pandas as pd
from datetime import datetime, time, timedelta
from pytz import timezone
from os.path import abspath, dirname

# Run from the directory that contains this script (the data directory)
os.chdir(dirname(abspath(__file__)))

OVERWRITE = False

# Define a method to download the data
def __download_data(resolution, start=None, end=None):
    print(f"Updating {resolution} data...")
    command = f'lean data download --dataset "CFD Data" --data-type "Bulk" --resolution "{resolution}"'
    if start:
        # If only a start date is given, download that single day
        end = end if end else start
        command += f" --start {start} --end {end}"
    if OVERWRITE:
        command += " --overwrite"
    print(command)
    os.system(command)

# Get the date of the latest data that should be available in the cloud
def __get_end_date() -> str:
    now = datetime.now(timezone("US/Eastern"))
    if now.time() > time(7,30):
        return (now - timedelta(1)).strftime("%Y%m%d")
    print('New data is available at 07:30 AM EST')
    return (now - timedelta(2)).strftime("%Y%m%d")

# Minute and second data: one file per ticker per day, so fetch only the missing dates
def __download_high_frequency_data(latest_on_cloud):
    for resolution in ["minute", "second"]:
        dir_name = f"cfd/oanda/{resolution}/xauusd".lower()
        if not os.path.exists(dir_name):
            # No local data yet: request everything up to the latest available date
            __download_data(resolution, '19980101', latest_on_cloud)
            continue
        # File names start with the date (YYYYMMDD), so the last sorted name is the newest
        latest_on_disk = sorted(os.listdir(dir_name))[-1].split('_')[0]
        if latest_on_disk >= latest_on_cloud:
            print(f"{resolution} data is already up to date.")
            continue
        __download_data(resolution, latest_on_disk, latest_on_cloud)

# Daily and hour data: one file per ticker, so download the full file again when it is stale
def __download_low_frequency_data(latest_on_cloud):
    for resolution in ["daily", "hour"]:
        file_name = f"cfd/oanda/{resolution}/xauusd.zip".lower()
        if not os.path.exists(file_name):
            __download_data(resolution)
            continue
        # The first column of the last row holds the timestamp of the newest bar
        latest_on_disk = str(pd.read_csv(file_name, header=None)[0].iloc[-1])[:8]
        if latest_on_disk >= latest_on_cloud:
            print(f"{resolution} data is already up to date.")
            continue
        __download_data(resolution)

if __name__ == "__main__":
    latest_on_cloud = __get_end_date()
    __download_low_frequency_data(latest_on_cloud)
    __download_high_frequency_data(latest_on_cloud)
The preceding script checks the date of the most recent XAUUSD data you have on disk for each resolution. For minute and second resolutions, it downloads only the files dated from your latest file onward. For daily and hour resolutions, it downloads the full files again whenever they are out of date. If you don't intend to download all resolutions, adjust this script to your needs.
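The "is my data up to date?" check relies on dates in YYYYMMDD form sorting the same way as strings and as calendar dates. The following is a minimal sketch of that check in isolation, assuming file names begin with the date followed by an underscore (as the split('_') call in the script implies). The function names and sample file names are hypothetical, for illustration only:

```python
def latest_date_on_disk(file_names):
    """Return the most recent YYYYMMDD prefix among the file names, or None."""
    dates = sorted(name.split("_")[0] for name in file_names)
    return dates[-1] if dates else None


def needs_update(file_names, latest_on_cloud):
    """Return True when the cloud has data newer than the newest local file."""
    latest_on_disk = latest_date_on_disk(file_names)
    # YYYYMMDD strings compare lexicographically in calendar order.
    return latest_on_disk is None or latest_on_disk < latest_on_cloud


# Hypothetical file names, for illustration only.
files = ["20230103_quote.zip", "20230104_quote.zip"]
print(needs_update(files, "20230105"))
print(needs_update(files, "20230104"))
```

An empty directory is treated as out of date, which mirrors the script's behavior of starting a download when no local data exists.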
Download by Ticker
To download data for selected tickers instead of the full dataset, run a non-interactive lean data download command. For example, to download minute-resolution quote data for XAUUSD:
$ lean data download --dataset "CFD Data" --data-type "Quote" --ticker "XAUUSD" --resolution "Minute" --start "20230101" --end "20230105"
To download data interactively or to use the CLI Command Generator, see Using the CLI.
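If you need several tickers, one option is to generate one command per ticker. The sketch below only builds and prints the commands, using the same flags as the example above. The helper name is hypothetical, and XAGUSD is included purely as a second example ticker, so confirm the tickers you need before running anything:

```python
import shlex


def build_download_args(ticker, resolution, start, end, data_type="Quote"):
    """Build the argument list for one per-ticker download (hypothetical helper)."""
    return [
        "lean", "data", "download",
        "--dataset", "CFD Data",
        "--data-type", data_type,
        "--ticker", ticker,
        "--resolution", resolution,
        "--start", start,
        "--end", end,
    ]


# Example tickers only; replace with the tickers you actually need.
for ticker in ["XAUUSD", "XAGUSD"]:
    args = build_download_args(ticker, "Minute", "20230101", "20230105")
    # shlex.join quotes arguments for a POSIX shell; to execute instead of
    # print, pass the list to subprocess.run(args, check=True).
    print(shlex.join(args))
```

Building an argument list rather than a single string avoids quoting problems with values that contain spaces, such as the dataset name.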
Size and Format
The following table shows the size and format of the CFD dataset for each resolution:
| Resolution | Size | Format |
|---|---|---|
| Daily | 500 MB | 1 file per ticker |
| Hour | 1 GB | 1 file per ticker |
| Minute | 50 GB | 1 file per ticker per day |
| Second | 200 TB | 1 file per ticker per day |
For more information about the file format, see the Data / cfd directory in the LEAN repository.
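To make the "1 file per ticker" and "1 file per ticker per day" formats concrete, the following sketch maps a ticker and resolution to the relative path where the file is expected. The cfd/oanda layout is taken from the update script on this page. The _quote.zip suffix for per-day files and the helper name are assumptions, so check them against the LEAN repository before relying on them:

```python
from pathlib import PurePosixPath


def expected_path(ticker, resolution, date=None):
    """Return the assumed relative path of a CFD data file (hypothetical helper)."""
    resolution = resolution.lower()
    base = PurePosixPath("cfd") / "oanda" / resolution
    if resolution in ("daily", "hour"):
        # One file per ticker that holds the full history.
        return base / f"{ticker.lower()}.zip"
    if date is None:
        raise ValueError("minute and second data need a YYYYMMDD date")
    # One file per ticker per day; the "_quote" suffix is an assumption.
    return base / ticker.lower() / f"{date}_quote.zip"


print(expected_path("XAUUSD", "daily"))
print(expected_path("XAUUSD", "minute", "20230105"))
```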
Price
Download in Bulk
To download the CFD dataset in bulk, subscribe to it on the Pricing page of your organization. The price is the same for all resolutions and organization tiers: $800/year to download the historical data and $200/year for the daily updates.
The following table shows the total cost of downloading the CFD dataset in bulk:
| Dataset | Package | Historical | Updates |
|---|---|---|---|
| CFD Data | Subscription | $800 | $200/year |
Download by Ticker
The CFD dataset is available in several resolutions. The resolution you need depends on the CFD subscriptions you create in your algorithm and the resolution of data you get in history requests. The following table describes the file format and cost of each resolution:
| Resolution | File Format | Cost per file |
|---|---|---|
| Second | One file per contract per trading day. | 3 QCC = $0.03 USD |
| Minute | One file per contract per trading day. | 3 QCC = $0.03 USD |
| Hour | One file per contract. | 3 QCC = $0.03 USD |
| Daily | One file per contract. | 3 QCC = $0.03 USD |
For example, the following algorithm subscribes to minute resolution data for one CFD contract:
public class CFDDataAlgorithm : QCAlgorithm
{
    public override void Initialize()
    {
        SetStartDate(2020, 1, 1);
        SetEndDate(2021, 1, 1);
        AddCfd("XAUUSD");
    }
}

class CFDDataAlgorithm(QCAlgorithm):
    def initialize(self) -> None:
        self.set_start_date(2020, 1, 1)
        self.set_end_date(2021, 1, 1)
        self.add_cfd("XAUUSD")
The following table shows the data cost of the preceding algorithm:
| Dataset | Package | Initial Cost | Ongoing Cost |
|---|---|---|---|
| CFD Data | Minute Download | 1 contract over 314 trading days => 314 files; 314 files @ 3 QCC/file => 314 * 3 QCC = 942 QCC = $9.42 USD | 1 contract/day => 1 file/day; 1 file/day @ 3 QCC/file => 3 QCC/day = $0.03 USD/day |
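The arithmetic in the preceding table can be reproduced for other algorithms. The following is a minimal sketch, assuming the rate of 3 QCC per file and the 1 QCC = $0.01 USD conversion implied by the tables above. The function name and parameters are hypothetical, and you must supply the number of trading days yourself:

```python
QCC_PER_FILE = 3    # from the cost table: 3 QCC per file
QCC_PER_USD = 100   # implied by "3 QCC = $0.03 USD"


def per_ticker_cost(contracts, trading_days, one_file_per_day=True):
    """Return (files, cost in QCC, cost in USD) for a by-ticker download."""
    # Minute and second data: one file per contract per trading day.
    # Hour and daily data: one file per contract.
    files = contracts * trading_days if one_file_per_day else contracts
    qcc = files * QCC_PER_FILE
    return files, qcc, qcc / QCC_PER_USD


files, qcc, usd = per_ticker_cost(contracts=1, trading_days=314)
print(f"{files} files, {qcc} QCC, ${usd:.2f} USD")  # 314 files, 942 QCC, $9.42 USD
```

Passing one_file_per_day=False models hour or daily data, where the cost does not depend on the length of the date range.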