Defining Data Sources

Key Concepts

Introduction

You must set up your environment before you define the DataSource class for your dataset.

You can set up a data source to provide trading data, universe selection data, or both. Trading data is passed to the OnData method in algorithms and is meant to inform trading decisions on an existing universe of securities. Universe selection data is used to select a universe of securities on a daily basis. If your dataset doesn't provide trading data, delete the Lean.DataSource.<vendorNameDatasetName>/<vendorNameDatasetName>.cs file. If your dataset doesn't provide universe selection data, delete the Lean.DataSource.<vendorNameDatasetName>/<vendorNameDatasetName>Universe.cs file.

Data Formats

Your data source class should read CSV files. Bundle dates together where possible to speed up execution. Just ensure the data in the file is in chronological order. You should package your data to avoid reading data outside the algorithm date range. For example, if your dataset provides minute-resolution data for a security, it should bundle the data by date (for example, /alternative/<vendorName>/<datasetName>/minute/aapl/20200202.csv).
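
As a rough illustration of the date-bundled layout described above, the following standalone sketch builds the path of one daily file. The helper name BuildMinuteFilePath and its dataFolder argument are hypothetical and not part of LEAN. Inside a data source class, the same pieces would typically come from Globals.DataFolder, config.Symbol, and the date passed to GetSource.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical helper (not a LEAN API): builds the path of the file that holds
// one day of minute data for one ticker, following the layout
// alternative/<vendorName>/<datasetName>/minute/<ticker>/<yyyyMMdd>.csv
static string BuildMinuteFilePath(
    string dataFolder, string vendorName, string datasetName, string ticker, DateTime date)
{
    return Path.Combine(
        dataFolder,
        "alternative",
        vendorName.ToLowerInvariant(),
        datasetName.ToLowerInvariant(),
        "minute",
        ticker.ToLowerInvariant(),
        date.ToString("yyyyMMdd", CultureInfo.InvariantCulture) + ".csv");
}

// Example: the file for AAPL on February 2, 2020.
// The directory separator in the printed path depends on the operating system.
var path = BuildMinuteFilePath("data", "vendorname", "datasetname", "AAPL", new DateTime(2020, 2, 2));
Console.WriteLine(path);
```

Because each file covers a single date, a reader only has to open the files that fall inside the algorithm date range.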

Set Data Sources

The GetSource method in your dataset class instructs LEAN where to find the data. This method must return a SubscriptionDataSource object, which contains the data location and format.

The following table describes the arguments the SubscriptionDataSource constructor accepts:

Argument | Data Type | Description | Default Value
source | string | Data source location. The path should be completely lowercase unless absolutely required. Don't use special characters in your output path, except - in directory names and _ in file names. Your output files must be in CSV format. |
transportMedium | SubscriptionTransportMedium | The transport medium to be used to retrieve data from the source. |
format | FileFormat | The format of the data within the source. | FileFormat.Csv
headers | IEnumerable<KeyValuePair<string, string>> | The headers to be used for this source. In cloud algorithms, each of the key-value pairs can consist of up to 1,000 characters. | null

QuantConnect hosts your data, so the transportMedium must be SubscriptionTransportMedium.LocalFile and the format must be FileFormat.Csv.
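
The naming rules for the source path can be checked before you write any files. The sketch below is one possible reading of those rules, not an official validator: the function name IsValidOutputPath is hypothetical, it expects a relative path with / separators, and it treats the lowercase recommendation as strict even though the table allows exceptions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical check (not a LEAN API) for a relative output path such as
// "alternative/vendor-name/dataset-name/aapl_2020.csv". Assumed reading of the
// rules: lowercase letters and digits everywhere, '-' only in directory names,
// '_' only in the file name, and a .csv extension.
static bool IsValidOutputPath(string relativePath)
{
    var parts = relativePath.Split('/');
    var directories = parts.Take(parts.Length - 1);
    var fileName = parts[parts.Length - 1];

    return directories.All(d => Regex.IsMatch(d, "^[a-z0-9-]+$"))
        && Regex.IsMatch(fileName, @"^[a-z0-9_]+\.csv$");
}

Console.WriteLine(IsValidOutputPath("alternative/vendor-name/dataset-name/aapl.csv"));  // True
Console.WriteLine(IsValidOutputPath("alternative/VendorName/dataset name/AAPL.csv"));   // False
```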

The following table describes the arguments the GetSource method accepts:

Argument | Data Type | Description
config | SubscriptionDataConfig | The subscription configuration
date | DateTime | Date of this source file
isLiveMode | bool | true if algorithm is running in live mode

You can use these arguments to create SubscriptionDataSource objects representing different locations and formats.

using System;
using System.IO;
using QuantConnect;
using QuantConnect.Data;

public class VendorNameDatasetName : BaseData
{
    public override SubscriptionDataSource GetSource(
        SubscriptionDataConfig config,
        DateTime date,
        bool isLiveMode)
    {
        // File location example:
        // data/alternative/<vendorName>/<datasetName>/aapl.csv
        return new SubscriptionDataSource(
            Path.Combine(
                Globals.DataFolder,
                "alternative",
                "<vendorName>",
                "<datasetName>",
                $"{config.Symbol.Value.ToLowerInvariant()}.csv"),
            SubscriptionTransportMedium.LocalFile,
            FileFormat.Csv);
    }
}

Parse Custom Data

The Reader method of your dataset class takes one line of data from the source location and parses it into one of your dataset objects. You can add as many properties to your dataset objects as you need, but the following table describes the properties you must set. When there is no useable data in a line, the method should return null. LEAN repeatedly calls the Reader method until the date/time advances or it reaches the end of the file.

Property | Description
Symbol | You can set this property to config.Symbol.
Time | The time when the data sample starts.
EndTime | The time when the data sample ends and when LEAN should add the sample to a Slice.
Value | The default data point value.

The following table describes the arguments the Reader method accepts:

Argument | Data Type | Description
config | SubscriptionDataConfig | The subscription configuration
line | string | Content from the requested data source
date | DateTime | Date of this source file
isLiveMode | bool | true if algorithm is running in live mode

You can use these arguments to create BaseData objects from different sources.

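using System;
using System.Globalization;
using QuantConnect;
using QuantConnect.Data;

public class VendorNameDatasetName : BaseData
{
    public override BaseData Reader(
        SubscriptionDataConfig config,
        string line,
        DateTime date,
        bool isLiveMode)
    {
        // Return null when the line has no usable data.
        if (string.IsNullOrWhiteSpace(line))
        {
            return null;
        }

        // File content example:
        // 20200202,123.45
        var data = line.Split(',');

        // Parse the start time into a local variable. Inside the object initializer,
        // Time would refer to the enclosing instance, not the new object.
        var time = DateTime.ParseExact(data[0], "yyyyMMdd", CultureInfo.InvariantCulture);
        return new VendorNameDatasetName()
        {
            Symbol = config.Symbol,
            Time = time,
            EndTime = time.AddDays(1),
            Value = data[1].IfNotNullOrEmpty(
                s => decimal.Parse(s, NumberStyles.Any, CultureInfo.InvariantCulture)),
        };
    }
}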
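
To show the "return null when there is no usable data" rule in isolation, here is a minimal standalone sketch that parses the same file format without any LEAN types. The ParseLine function and its tuple return type are hypothetical stand-ins for Reader and your dataset class. The one-day sample length is an assumption that matches the daily example above.

```csharp
using System;
using System.Globalization;

// Hypothetical standalone parser (not a LEAN API) for lines such as "20200202,123.45".
// It returns null when the line holds no usable data, mirroring what Reader should do.
static (DateTime Time, DateTime EndTime, decimal Value)? ParseLine(string line)
{
    if (string.IsNullOrWhiteSpace(line))
    {
        return null;
    }

    var fields = line.Split(',');
    if (fields.Length < 2
        || !DateTime.TryParseExact(fields[0], "yyyyMMdd", CultureInfo.InvariantCulture,
            DateTimeStyles.None, out var time)
        || !decimal.TryParse(fields[1], NumberStyles.Any, CultureInfo.InvariantCulture, out var value))
    {
        return null;
    }

    // Assumption: a daily sample starts at midnight and ends one day later.
    return (time, time.AddDays(1), value);
}

Console.WriteLine(ParseLine("20200202,123.45") != null);  // True
Console.WriteLine(ParseLine("date,value") == null);       // True (header row)
Console.WriteLine(ParseLine("") == null);                 // True (blank line)
```

Using TryParseExact and TryParse instead of their throwing counterparts lets header rows and malformed lines be skipped rather than stopping the read.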

Resolutions

The following table describes the methods that declare your dataset frequency:

Method | Return Type | Description
SupportedResolutions | List<Resolution> | Gets the supported resolutions for this dataset. Universe data must have hour or daily resolution.
DefaultResolution | Resolution | Gets the default resolution for this data and security type. If a member doesn't specify a resolution when they subscribe to your dataset, LEAN uses the DefaultResolution.
IsSparseData | bool | Indicates whether the data is sparse. If your dataset is not tick resolution and your dataset is missing data for at least one sample, it's sparse. If your dataset is sparse, we disable logging for missing files.

The following table describes the Resolution enumeration members:

Member | Value | Description
Tick | 0 | Tick Resolution (0)
Second | 1 | Second Resolution (1)
Minute | 2 | Minute Resolution (2)
Hour | 3 | Hour Resolution (3)
Daily | 4 | Daily Resolution (4)

The following snippet provides an example implementation of the preceding methods:

public class VendorNameDatasetName : BaseData
{
    public override List<Resolution> SupportedResolutions()
    {
        return DailyResolution;
    }

    public override Resolution DefaultResolution()
    {
        return Resolution.Daily;
    }

    public override bool IsSparseData()
    {
        return true;
    }
}

Time Zones

The DataTimeZone method of your data source class declares the time zone of your dataset. This method returns a NodaTime.DateTimeZone object. The TimeZones class provides helper members, such as TimeZones.Utc and TimeZones.NewYork, that return NodaTime.DateTimeZone objects.

If your dataset provides trading data and universe data, the DataTimeZone methods in your Lean.DataSource.<vendorNameDatasetName>/<vendorNameDatasetName>.cs and Lean.DataSource.<vendorNameDatasetName>/<vendorNameDatasetName>Universe.cs files must return the same time zone.

The following snippet provides an example implementation of the DataTimeZone method:

public class VendorNameDatasetName : BaseData
{
    public override DateTimeZone DataTimeZone()
    {
        return DateTimeZone.Utc;
    }
}
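
The declared time zone matters because the same timestamp in a file maps to different instants depending on the zone it is read in. The following standalone sketch illustrates this with NodaTime directly. It assumes the NodaTime package is referenced, and the ToUtc helper is hypothetical rather than part of LEAN.

```csharp
using System;
using NodaTime;

// Hypothetical helper (not a LEAN API): interprets a timestamp read from a file
// as wall-clock time in the given zone and converts it to UTC.
static DateTime ToUtc(DateTime fileTime, DateTimeZone dataTimeZone)
{
    return LocalDateTime.FromDateTime(fileTime)
        .InZoneLeniently(dataTimeZone)
        .ToDateTimeUtc();
}

var midnight = new DateTime(2020, 2, 2);
var newYork = DateTimeZoneProviders.Tzdb["America/New_York"];

// Midnight in UTC stays at hour 0. Midnight in New York (UTC-5 in February) is 05:00 UTC.
Console.WriteLine(ToUtc(midnight, DateTimeZone.Utc).Hour);  // 0
Console.WriteLine(ToUtc(midnight, newYork).Hour);           // 5
```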

Linked Datasets

The RequiresMapping method of your data source class declares if your dataset is linked to individual securities. Your dataset is linked if any of the following statements are true:

  • Your dataset describes market price properties of specific securities (for example, the closing price of AAPL).
  • Your alternative dataset is linked to individual securities (for example, the Wikipedia page view count of AAPL).

If your dataset is not linked to specific securities (for example, if your dataset contains the weather of New York City), then the RequiresMapping method should return false.

The following snippet provides an example implementation of the RequiresMapping method:

public class VendorNameDatasetName : BaseData
{
    public override bool RequiresMapping()
    {
        return true;
    }
}

String Representations

The ToString method of your data source class provides a string representation of a single data point in your dataset. The string representation should contain the values of your dataset properties in an easy-to-read format.

The following snippet provides an example implementation of the ToString method:

public class VendorNameDatasetName : BaseData
{
    public override string ToString()
    {
        return $"{Symbol} - {SomeCustomProperty}";
    }
}

Clone Objects

The Clone method of your data source class creates a copy of a single data point in your dataset. When LEAN applies fill-forward to your dataset, it calls the Clone method and then modifies the data timestamp.

The following snippet provides an example implementation of the Clone method:

public class VendorNameDatasetName : BaseData
{
    public override BaseData Clone()
    {
        return new VendorNameDatasetName
        {
            Symbol = Symbol,
            Time = Time,
            EndTime = EndTime,
            SomeCustomProperty = SomeCustomProperty,
        };
    }
}
