Rendering Data with Notebooks

Introduction

This page explains how to create a script to download and process your dataset with Jupyter Notebooks for QuantConnect distribution.

Using Processing Framework

During this part of the contribution process, you need to edit the Lean.DataSource.<vendorNameDatasetName> / DataProcessing / process.sample.ipynb file so it transforms and moves your raw data into the format and location the GetSource methods expect. The notebook should save all the data history to the output directory in your machine's root directory (for example, C: / output) and it should save a sample of the data history to the Lean.DataSource.<vendorNameDatasetName> / output directory.
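As a rough illustration of the two output locations, the following sketch writes a full history to one directory and a short sample to another. The function name `write_outputs`, the `sample_rows` parameter, the CSV file name, and the row layout are hypothetical. The subdirectory alternative / xyzairline / ticketsales is only the example path used on this page, so substitute whatever path your GetSource methods return.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical relative path; it must match what your GetSource methods expect.
RELATIVE_DIR = Path("alternative") / "xyzairline" / "ticketsales"


def write_outputs(rows, full_root, sample_root, file_name="dal.csv", sample_rows=5):
    """Write every row under full_root and the first sample_rows rows under sample_root."""
    written = []
    for root, subset in ((Path(full_root), rows), (Path(sample_root), rows[:sample_rows])):
        target_dir = root / RELATIVE_DIR
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / file_name
        with target.open("w", newline="") as handle:
            csv.writer(handle).writerows(subset)
        written.append(target)
    return written


# Demo: temporary directories stand in for the root-level output directory
# and the repository's output directory.
demo_rows = [[f"202201{day:02d}", day * 10] for day in range(1, 11)]
with tempfile.TemporaryDirectory() as tmp:
    full_path, sample_path = write_outputs(demo_rows, Path(tmp) / "full", Path(tmp) / "sample")
    full_count = len(full_path.read_text().splitlines())
    sample_count = len(sample_path.read_text().splitlines())
    print(full_count, sample_count)
```

In a real notebook, `full_root` would point at the output directory in your machine's root directory and `sample_root` at the Lean.DataSource.<vendorNameDatasetName> / output directory.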

Follow these steps to set up the downloading and processing script for your dataset:

  1. Change the structure of the Lean.DataSource.<vendorNameDatasetName> / output directory to match the path structure you defined in the GetSource methods (for example, output / alternative / xyzairline / ticketsales).
  2. In the Lean.DataSource.<vendorNameDatasetName> / DataProcessing / process.sample.ipynb file, add some code to time how long it takes to process the entire dataset and how long it takes to update the dataset with one day's worth of data.

    You need this information when you provide the dataset documentation. We need to know how long it takes to process your dataset so we can schedule its processing job.

  3. In the processing file, load the raw data from your source.

    You can fetch data from any of the following sources:

    Source        Considerations
    Local Files   It can help to first copy the data into location.
    Remote API    Stay within the rate limits. You can use the rate gate class.

    You should load and process the data period by period. Use the date range provided to the script to process the specific dates provided.

  4. If your dataset is for universe selection data and it's at a higher frequency than hour resolution, resample your data to hourly or daily resolution.
  5. If any of the following statements are true, skip the rest of the steps in this tutorial:
    • Your dataset is not related to Equities.
    • Your dataset is related to Equities and already includes the point-in-time tickers.

    If your dataset is related to Equities and your dataset doesn't account for ticker changes, the rest of the steps help you to adjust the tickers over the historical data so they are point-in-time.

  6. If you don't have the US Equity Security Master dataset, contact us.
  7. In the Lean.DataSource.<vendorNameDatasetName> / DataProcessing / Program.cs file, remove the statements in the Main method.
  8. In a terminal, compile the data processing project.

    $ dotnet build .\DataProcessing\DataProcessing.csproj

    This step generates a file that the CLRImports library uses.

  9. In the Lean.DataSource.<vendorNameDatasetName> / DataProcessing / process.sample.ipynb file, import the CLRImports library.

    from CLRImports import *

  10. Create and initialize a map file provider.

    map_file_provider = LocalZipMapFileProvider()
    map_file_provider.Initialize(DefaultDataProvider())

  11. Create a security identifier.

    sid = SecurityIdentifier.generate_equity(point_in_time_ticker,
        Market.USA, True, map_file_provider, csv_date)
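
For the resampling step in the list above, a minimal standard-library sketch might look like the following. The function name `resample_hourly` and the (timestamp, value) row layout are hypothetical. Keeping the last observation in each hour and labeling the bucket with the start of the hour are both assumptions, so pick the aggregation (last, sum, mean) and the timestamp convention that suit your dataset.

```python
from datetime import datetime


def resample_hourly(rows):
    """Keep the last value observed in each hour; rows are (timestamp, value) pairs."""
    buckets = {}
    for timestamp, value in sorted(rows):
        # Truncate the timestamp to the start of its hour (an assumed convention).
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour] = value  # later rows in the same hour overwrite earlier ones
    return sorted(buckets.items())


# Made-up minute-level rows for illustration only.
minute_rows = [
    (datetime(2022, 1, 3, 9, 31), 10.0),
    (datetime(2022, 1, 3, 9, 45), 11.0),
    (datetime(2022, 1, 3, 10, 5), 12.0),
]
hourly = resample_hourly(minute_rows)
for hour, value in hourly:
    print(hour.isoformat(), value)
```

If you already load the raw data into a pandas DataFrame in the notebook, its built-in resampling is likely the more convenient route.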

After you finish editing the process.sample.ipynb script, run its cells to populate the Lean.DataSource.<vendorNameDatasetName> / output directory and the output directory in your machine's root directory.
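
When you run the cells, the timing measurements requested in the steps above can be captured along these lines. This is a sketch rather than the required implementation: `process_day`, `process_range`, and the `pause_seconds` throttle are hypothetical names, the body of `process_day` is a placeholder for your own download and transform logic, and the fixed pause is a simple stand-in for a rate limiter, not LEAN's rate gate class.

```python
import time
from datetime import date, timedelta


def process_day(day):
    """Placeholder for loading and transforming one day of raw data (hypothetical)."""
    return f"{day:%Y%m%d},0"


def process_range(start, end, pause_seconds=0.0):
    """Process each date from start to end (inclusive), one day at a time."""
    results = []
    day = start
    while day <= end:
        results.append(process_day(day))
        if pause_seconds:
            time.sleep(pause_seconds)  # crude throttle to stay within API rate limits
        day += timedelta(days=1)
    return results


# Time a full-history run (here, a made-up three-day range).
started = time.perf_counter()
history = process_range(date(2022, 1, 3), date(2022, 1, 5))
full_seconds = time.perf_counter() - started

# Time a one-day update.
started = time.perf_counter()
update = process_range(date(2022, 1, 6), date(2022, 1, 6))
update_seconds = time.perf_counter() - started

print(f"full history: {full_seconds:.4f} s, one-day update: {update_seconds:.4f} s")
```

The two durations are the figures to report in the dataset documentation so the processing job can be scheduled.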

Note: The pull request you make at the end must contain sample data so we can review it and run the demonstration algorithms.

Notebook Processor Examples

The following examples render datasets with Jupyter Notebook processing:
