Datasets
Local Data
Introduction
When you run a command like lean backtest or lean optimize, the CLI runs the LEAN engine in a Docker container locally.
When the LEAN engine runs locally, it uses your local market data, which is the data stored in the data directory inside your workspace.
This page explains the structure of your local data directory and ways to source LEAN-compatible data.
Data Storage
When you create a workspace in an empty directory, the CLI downloads the latest data directory from the LEAN repository. This directory contains a standard directory structure from which the LEAN engine reads. Because LEAN strives to use an open, human-readable format, all data is stored in flat files (formatted as CSV or JSON) and compressed using zip. Once downloaded, the data directory tree looks like this:
data
├── alternative/
├── cfd/
├── crypto/
├── equity/
├── forex/
├── future/
├── futureoption/
├── index/
├── indexoption/
├── market-hours/
├── option/
├── symbol-properties/
└── readme.md
By default, the data directory contains a small amount of sample data for all asset types to demonstrate how data files must be formatted. Additionally, the data directory itself and most of its subdirectories contain readme.md files containing more documentation on the format of the data files of each asset type.
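As a quick sanity check on a workspace, the directory listing above can be compared against what is actually on disk. The following is a minimal sketch, not part of the CLI: the function name missing_entries and the constant EXPECTED_ENTRIES are hypothetical, and the list of names is copied from the tree shown on this page.

```python
from pathlib import Path

# Top-level subdirectories taken from the directory tree shown on this page.
EXPECTED_ENTRIES = [
    "alternative", "cfd", "crypto", "equity", "forex", "future",
    "futureoption", "index", "indexoption", "market-hours", "option",
    "symbol-properties",
]


def missing_entries(data_dir):
    """Return the expected subdirectories that are absent from data_dir.

    Hypothetical helper for illustration; the CLI does not ship it.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).is_dir()]
```

Calling missing_entries("data") from the workspace root returns an empty list when every subdirectory from the tree is present.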
Custom Data Directory
You can configure the data directory to use in the data-folder property in your Lean configuration file.
The path this property is set to is used as the data directory by all commands that run the LEAN engine locally.
By default, this property points to the data directory inside your workspace.
If this property is set to a relative path, it is resolved relative to the Lean configuration file's parent directory.
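The resolution rule for relative paths can be illustrated with a short sketch. It only mirrors the rule described above and is not the CLI's actual implementation; the function name resolve_data_folder is hypothetical.

```python
import os
from pathlib import Path


def resolve_data_folder(config_file, data_folder):
    """Resolve a data-folder value the way this page describes.

    Absolute values are used as-is. Relative values are resolved against
    the parent directory of the configuration file. Hypothetical helper.
    """
    value = Path(data_folder)
    if value.is_absolute():
        return value
    # normpath collapses ".." segments without touching the file system.
    return Path(os.path.normpath(Path(config_file).parent / value))
```

For example, with a configuration file at /workspace/config-file (a placeholder path) and a data-folder value of ../shared/data, the sketch yields /shared/data.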
The data directory is the only local directory that is mounted into all Docker containers run by the CLI, so it must contain all the local files you want to read from your algorithms.
You can get the path to this directory in your algorithm using the Globals.DataFolder variable.
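Because only the data directory is mounted into the containers, any extra file an algorithm reads has to live underneath it. Below is a minimal sketch of a helper that builds such a path and refuses anything outside the data directory. The name read_local_file and the example file name are hypothetical; inside a LEAN algorithm you would pass Globals.DataFolder as the first argument.

```python
from pathlib import Path


def read_local_file(data_folder, relative_path):
    """Read a text file stored under the data directory.

    Hypothetical helper. In an algorithm, data_folder would be the value
    of Globals.DataFolder.
    """
    root = Path(data_folder).resolve()
    target = (root / relative_path).resolve()
    # Only the data directory is mounted, so reject paths that escape it.
    if root not in target.parents:
        raise ValueError(f"{relative_path!r} is outside the data directory")
    return target.read_text(encoding="utf-8")
```

A call such as read_local_file(Globals.DataFolder, "custom/my-signals.csv") would then read a file you placed in a custom subdirectory, where both the subdirectory and the file name are placeholders.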
QuantConnect Datasets
The recommended way to source LEAN-compatible data for local usage is by purchasing and downloading it from QuantConnect Datasets.
This data is curated and maintained by QuantConnect and is easy to download with the CLI or by using the ApiDataProvider in LEAN. Before the CLI and ApiDataProvider download new files, they check if your local machine already stores the files.
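The check-before-download behavior can be pictured with a small sketch. It shows the general idea only and makes no claim about how the CLI or the ApiDataProvider implement it; the function name files_to_download and the file paths in the usage note are illustrative assumptions.

```python
from pathlib import Path


def files_to_download(data_dir, requested):
    """Return the requested relative paths that are not yet stored locally.

    Hypothetical helper: a file counts as present if it exists under
    data_dir, with no check on its contents or age.
    """
    root = Path(data_dir)
    return [rel for rel in requested if not (root / rel).is_file()]
```

Given a data directory that already holds equity/usa/daily/spy.zip, requesting that file together with equity/usa/daily/aapl.zip would leave only the second path in the result.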
Data charges are deducted from your organization's QCC balance. You can purchase QCC in the Algorithm Lab.
Other Data Sources
If you already have data of your own, you can convert it to a LEAN-compatible format yourself.
In that case, we recommend that you read the readme.md files generated by the lean init command in the data directory, as these files contain up-to-date documentation on the expected format of the data files.
For development purposes, it is also possible to generate data using the CLI. This generator uses a Brownian motion model to generate realistic market data, which might be helpful when you're testing strategies locally but don't have access to real market data.
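To give a feel for what a Brownian motion price model produces, here is a minimal sketch of a geometric Brownian motion simulator. It is an illustration under assumptions, not the CLI's generator: the page does not specify the exact model or its parameters, so the function name simulate_prices and the default drift, volatility, and step size are all hypothetical.

```python
import math
import random


def simulate_prices(start_price, steps, drift=0.05, volatility=0.2,
                    dt=1 / 252, seed=None):
    """Simulate a price path with geometric Brownian motion.

    Each step multiplies the previous price by exp(growth), where growth
    combines a deterministic drift term and a normally distributed shock.
    All parameter defaults are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(steps):
        shock = rng.gauss(0.0, 1.0)
        growth = ((drift - 0.5 * volatility ** 2) * dt
                  + volatility * math.sqrt(dt) * shock)
        prices.append(prices[-1] * math.exp(growth))
    return prices
```

Passing a fixed seed makes the path reproducible, which is convenient when comparing local test runs. Because each step multiplies by an exponential, the simulated prices stay positive.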