QuantConnect Lean Algorithmic Trading Engine
How Do I Load Custom Data?
Summary: A guide to preparing your data and creating a new type for your custom data objects.
Difficulty: Easy
Time Required: 20 minutes

Overview

QuantConnect has two key ways to load custom data into your algorithm. Each suits different requirements, and to use them properly you must understand how the data system works.

Data is brought into QuantConnect in two key ways:

  1. Statically through DataFeed classes which pull data from a compile-time static source and,
  2. Dynamically through custom types inside algorithms which define how to read and produce the custom data.

Static Sources

Typically the static sources are a database, file system or live socket stream that you connect to. The data streams from them are generated by a DataFeed class.

In Lean, static sources are used to generate Tick and TradeBar streams, which pipe through the Lean Engine and fire in event handlers. TradeBar and Tick are special data types in Lean because, before they are piped into the algorithm, the individual objects are joined together to form TradeBars and Ticks collections. These collection versions of the data let your code access the TradeBar data through a dictionary keyed by symbol string. The Algorithm Manager performs this joining of the individual TradeBar and Tick objects.
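
The joining step can be pictured with a small sketch. The `SketchBar` type and `BarJoiner.JoinBySymbol` helper below are hypothetical stand-ins, not Lean's actual TradeBar class or Algorithm Manager code. They only illustrate how the individual bars for one time step might be collected into a dictionary keyed by symbol string.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a single bar; not Lean's TradeBar class.
public class SketchBar
{
    public string Symbol;
    public DateTime Time;
    public decimal Close;
}

public static class BarJoiner
{
    // Collect the individual bars for one time step into a dictionary
    // keyed by symbol, so an algorithm can look a bar up by name.
    public static Dictionary<string, SketchBar> JoinBySymbol(IEnumerable<SketchBar> bars)
    {
        var joined = new Dictionary<string, SketchBar>();
        foreach (var bar in bars)
        {
            // A later bar for the same symbol replaces the earlier one.
            joined[bar.Symbol] = bar;
        }
        return joined;
    }
}

public static class BarJoinerDemo
{
    public static void Main()
    {
        // Symbols, time and prices are made-up illustration values.
        var time = new DateTime(2014, 1, 2, 9, 31, 0);
        var bars = new List<SketchBar>
        {
            new SketchBar { Symbol = "SPY", Time = time, Close = 183.50m },
            new SketchBar { Symbol = "IBM", Time = time, Close = 185.25m }
        };

        var joined = BarJoiner.JoinBySymbol(bars);
        Console.WriteLine("SPY close: " + joined["SPY"].Close);
    }
}
```

With the data joined this way, an event handler can read any subscribed symbol from one object instead of receiving each bar separately.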

Dynamic Sources

Dynamic sources are often generated for specific algorithms and are often only needed once. There is no restriction on the data type, but typically they are not market data; they are external technical indicators or fundamentals. The data often comes from a remote source or API and may even be sourced at runtime. Dynamic data can be defined with a few lines of code in the algorithm as a custom, user-defined type that inherits from BaseData. The key difference with dynamic data is that it is singular: it is piped into the algorithm as individual objects, not as collections like a TradeBars object. Dynamic data has a dedicated event handler for your data type.

Static Sourcing

Easy Method

By far the simplest way to get started is to use the existing Lean FileSystemDataFeed class and store your data in the same folder hierarchy that Lean uses, which eliminates the need to write your own DataFeed class. The data is located in /Lean/Data and is organized by asset class, resolution and symbol.

When a new asset stream is requested in the algorithm, Lean pulls the data from that location on disk. If the data is not there, it is skipped.

Figure 1: Example files included in Lean Repository

The data is named according to its date in the format YYYYMMDD_trade.zip and is a zip of a CSV file which contains the raw tick or trade bar data. 

Inside the equity files is data in the following format:

Time       Open      High      Low       Close     Volume
14400000   1689500   1689800   1689500   1689800   800

Time is measured as the number of milliseconds since midnight on that day, and Open, High, Low and Close are scaled by 10,000 (so 1689500 represents 168.95). The FOREX data has a slightly different internal format but uses the same directory structure. If your data is in a similar format, it will plug directly into QuantConnect.
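
As an illustration of the time and price encoding described above, the sketch below converts one equity row into a timestamp and decimal prices. It assumes the fields inside the CSV are comma-separated and appear in the column order shown. `EquityRow` and `EquityLineParser` are hypothetical names for this example, not part of Lean, and the trading date passed in is an arbitrary illustration value.

```csharp
using System;
using System.Globalization;

// Hypothetical container for one parsed row; not a Lean type.
public class EquityRow
{
    public DateTime Time;
    public decimal Open, High, Low, Close;
    public long Volume;
}

public static class EquityLineParser
{
    // Prices in the file are stored multiplied by 10,000.
    private const decimal PriceScale = 10000m;

    // Parse "ms,open,high,low,close,volume" for the given trading date.
    // Assumes comma-separated fields in exactly this order.
    public static EquityRow Parse(string line, DateTime date)
    {
        string[] csv = line.Split(',');
        var inv = CultureInfo.InvariantCulture;
        return new EquityRow
        {
            // Time is milliseconds since midnight on the file's date.
            Time = date.Date.AddMilliseconds(long.Parse(csv[0], inv)),
            Open = decimal.Parse(csv[1], inv) / PriceScale,
            High = decimal.Parse(csv[2], inv) / PriceScale,
            Low = decimal.Parse(csv[3], inv) / PriceScale,
            Close = decimal.Parse(csv[4], inv) / PriceScale,
            Volume = long.Parse(csv[5], inv)
        };
    }
}

public static class EquityLineParserDemo
{
    public static void Main()
    {
        var row = EquityLineParser.Parse(
            "14400000,1689500,1689800,1689500,1689800,800",
            new DateTime(2014, 1, 2)); // arbitrary example date

        Console.WriteLine(
            row.Time.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture)
            + " open=" + row.Open.ToString(CultureInfo.InvariantCulture));
    }
}
```

For the sample row, 14,400,000 milliseconds is four hours after midnight, so the bar is stamped 04:00:00 on the chosen date.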

Hard Method

If you have data stored in a completely different location you can write a custom DataFeed class to extract the data from your database or live data source. The DataFeed is a standalone thread which is started with the Run() method and requested by your Tasks configuration. It must loop inside the Run() method until it has finished loading the requested data, or a cancel has been requested with the Exit() method. For more information on creating a DataFeed plug-in class please see this tutorial.
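
The Run()/Exit() lifecycle described above can be sketched as follows. `SketchDataFeed` is a hypothetical, heavily simplified class: it does not implement Lean's actual DataFeed interface, and the string array stands in for whatever database or live source you read from. Only the method names Run and Exit come from the text above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical, simplified feed; it does not implement Lean's real DataFeed interface.
public class SketchDataFeed
{
    private readonly string[] _source;     // stand-in for a database or socket
    private volatile bool _exitRequested;  // set by Exit() to cancel the loop

    // Rows loaded so far, safe to read from another thread.
    public ConcurrentQueue<string> Loaded { get; } = new ConcurrentQueue<string>();

    // True once Run() has returned, whether it completed or was cancelled.
    public bool IsFinished { get; private set; }

    public SketchDataFeed(string[] source)
    {
        _source = source;
    }

    // Loop until all requested data is loaded or a cancel has been requested.
    public void Run()
    {
        foreach (var row in _source)
        {
            if (_exitRequested) break;
            Loaded.Enqueue(row);
        }
        IsFinished = true;
    }

    // Ask the feed to stop at the next loop iteration.
    public void Exit()
    {
        _exitRequested = true;
    }
}

public static class SketchDataFeedDemo
{
    public static void Main()
    {
        var feed = new SketchDataFeed(new[] { "row1", "row2", "row3" });

        var thread = new Thread(feed.Run); // the feed runs on its own thread
        thread.Start();
        thread.Join();                     // wait for the load to finish

        Console.WriteLine("Loaded " + feed.Loaded.Count + " rows");
    }
}
```

The cancel flag is checked once per row, so a call to Exit() takes effect at the next iteration rather than interrupting a read in progress.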

Dynamic Sourcing - Creating Your Custom Type

Individual algorithms can also request data dynamically at runtime. This requires very little coding and no changes to QuantConnect's DataFeed infrastructure. Creating a dynamic runtime type requires two steps:

  1. Create a new type inheriting from BaseData which implements the GetSource, Reader and Clone methods.
  2. Add a line in your algorithm to request the data.

Creating a New Data Class

Creating a new dynamic data class to load into your algorithm takes a few lines of code and has 3 primary methods:

  • GetSource - Where should the data be sourced from? Loading the data from a file or REST API.
  • Reader - How should we interpret each line of the data? Creating a new object from a string source.
  • Clone - How can I create a new object from myself? This method is optional and only required for fill-forward data streams.

The code below shows a simple implementation that pulls a custom data source from a local file or a remote URL:

Custom Dynamic Data Implementation
public class Bitcoin : BaseData
{
    //Set the defaults and define our data shape:
    public decimal Open = 0;
    public decimal High = 0;
    public decimal Low = 0;
    public decimal Close = 0;
    public decimal VolumeBTC = 0;
    public decimal VolumeUSD = 0;
    public decimal WeightedPrice = 0;
    /// <summary>
    /// 1. Create a default constructor: Custom data types need a default constructor.
    /// We search for a default constructor so please provide one here. 
    /// It won't be used for data, just to generate the "Factory".
    /// </summary>
    public Bitcoin()
    {
        this.Symbol = "BTC";
    }
    /// <summary>
    /// 2. RETURN THE STRING URL SOURCE LOCATION FOR YOUR DATA:
    /// This is a powerful and dynamic select source file method. If you have a large dataset, 10+mb we recommend you break it into smaller files. E.g. One zip per year.
    /// We can accept raw text or ZIP files. We read the file extension to determine if it is a zip file.
    /// </summary>
    /// <param name="config">Subscription data, symbol name, data type</param>
    /// <param name="date">Current date we're requesting. This allows you to break up the data source into daily files.</param>
    /// <param name="datafeed">Datafeed type: Backtesting or the Live data broker who will provide live data. You can specify a different source for live trading! </param>
    /// <returns>string URL end point.</returns>
    public override string GetSource(SubscriptionDataConfig config, DateTime date, DataFeedEndpoint datafeed)
    {
        switch (datafeed)
        {
            //Select a different source location depending on the datafeed endpoint.
            default:
            case DataFeedEndpoint.FileSystem:
                //Verbatim string (@) so the backslashes are not read as escape sequences.
                return @"C:\DropBox\bitcoin.csv";
            case DataFeedEndpoint.Backtesting:
                //Return the location of your dataset. This can be a local or remote source.
                //e.g. date split: "http://my-ftp-server.com/futures-data-" + date.ToString("yyyyMMdd") + ".zip";
                // OR simply return a fixed small data file. If it's remote, Lean will download the file.
                return "http://www.quandl.com/api/v1/datasets/BITCOIN/BITSTAMPUSD.csv?sort_order=asc";
        }
    }
    /// <summary>
    /// 3. READER METHOD: Read 1 line from data source and convert it into Object.
    /// Each line of the CSV File is presented in here. The backend downloads your file, loads it into memory and then line by line feeds it into your algorithm
    /// </summary>
    /// <param name="config">Subscription data, symbol name, data type</param>
    /// <param name="line">string line from the data source file submitted above</param>
    /// <param name="date">Current date we're requesting. This allows you to break up the data source into daily files.</param>
    /// <param name="datafeed">Datafeed type - Backtesting or LiveTrading</param>
    /// <returns>New Bitcoin Object which extends BaseData.</returns>
    public override BaseData Reader(SubscriptionDataConfig config, string line, DateTime date, DataFeedEndpoint datafeed)
    {
        //Create a new Bitcoin object that we'll return to Lean.
        Bitcoin coin = new Bitcoin();
        try
        {
            //Example File Format:
            //Date,Open,High,Low,Close,Volume (BTC),Volume (Currency),Weighted Price
            //2011-09-13,5.8,6.0,5.65,5.97,58.37138238,346.0973893944,5.929230648356
            string[] data = line.Split(',');
            coin.Time = DateTime.Parse(data[0]);
            coin.Open = Convert.ToDecimal(data[1]);
            coin.High = Convert.ToDecimal(data[2]);
            coin.Low = Convert.ToDecimal(data[3]);
            coin.Close = Convert.ToDecimal(data[4]);
            coin.VolumeBTC = Convert.ToDecimal(data[5]);
            coin.VolumeUSD = Convert.ToDecimal(data[6]);
            coin.WeightedPrice = Convert.ToDecimal(data[7]);
            coin.Symbol = "BTC";
            coin.Value = coin.Close;
        }
        catch { /* Do nothing, skip first title row */ }
        return coin;
    }
}
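
The Bitcoin class above does not override Clone. For a fill-forward stream, a Clone override might look like the sketch below. To stay self-contained it uses a hypothetical `SketchBaseData` stand-in with an abstract Clone method rather than Lean's BaseData, so treat the exact signature as an assumption to check against your Lean version.

```csharp
using System;

// Hypothetical stand-in for Lean's BaseData, reduced to what the sketch needs.
public abstract class SketchBaseData
{
    public DateTime Time;
    public string Symbol;
    public decimal Value;

    public abstract SketchBaseData Clone();
}

public class SketchBitcoin : SketchBaseData
{
    public decimal Open, High, Low, Close;

    // Return an independent copy so a fill-forward step can re-stamp the
    // time on the copy without mutating the original data point.
    public override SketchBaseData Clone()
    {
        return new SketchBitcoin
        {
            Time = Time,
            Symbol = Symbol,
            Value = Value,
            Open = Open,
            High = High,
            Low = Low,
            Close = Close
        };
    }
}

public static class CloneDemo
{
    public static void Main()
    {
        var original = new SketchBitcoin
        {
            Symbol = "BTC",
            Time = new DateTime(2011, 9, 13),
            Close = 5.97m,
            Value = 5.97m
        };

        // Fill forward: copy the last known point and move it to the next day.
        var filled = (SketchBitcoin)original.Clone();
        filled.Time = original.Time.AddDays(1);

        Console.WriteLine(original.Time.Day + " -> " + filled.Time.Day);
    }
}
```

Because the copy is a separate object, changing its Time leaves the original data point untouched, which is the property a fill-forward stream relies on.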

Adding Dynamic Data to Your Algorithm

Finally, add the requested data source to your algorithm in the Initialize method and create an event handler that takes the same data type to handle the new data. For example, if your custom type is called "Bitcoin", your event handler would be "public void OnData(Bitcoin data)".

Initialize Bitcoin Data and add Event Handler
namespace QuantConnect
{
    public class CustomDataSourceAlgorithm : QCAlgorithm
    {
        public override void Initialize()
        {
            SetStartDate(2011, 9, 13);
            SetEndDate(DateTime.Now.Date.AddDays(-1));
            SetCash(100000);
            //Define the symbol and "type" of our generic data:
            AddData<Bitcoin>("BTC");
        }
        /// <summary>
        /// Event Handler for Bitcoin Data Events: These Bitcoin objects are created from our "Bitcoin" type and fired into this event handler.
        /// </summary>
        /// <param name="data">A Bitcoin object, streamed into our algorithm synchronized in time with our other data streams</param>
        public void OnData(Bitcoin data)
        {
            //If we don't have any Bitcoin "SHARES" -- invest
            if (!Portfolio.Invested)
            {
                //Bitcoin used as a tradable asset, like stocks, futures etc.
                if (data.Close != 0)
                {
                    Order("BTC", (Portfolio.Cash / Math.Abs(data.Close + 1)));
                    //Log only when an order was actually placed.
                    Console.WriteLine("Buying BTC 'Shares': BTC: " + data.Close);
                }
            }
            Console.WriteLine("Time: " + Time.ToLongDateString() + " " + Time.ToLongTimeString() + " Close: " + data.Close.ToString());
        }
    }
}

See Also