Rendering Data with Python
Using Processing Framework
During this part of the contribution process, you need to edit the Lean.DataSource.<vendorNameDatasetName> / DataProcessing / process.sample.py file so it transforms and moves your raw data into the format and location the GetSource methods expect. Before you start editing the script, change the structure of the Lean.DataSource.<vendorNameDatasetName> / output directory to match the path structure you defined in the GetSource methods (for example, output / alternative / xyzairline / ticketsales).
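As a minimal sketch of preparing that directory layout, the helper below creates a nested folder under an output directory. The function name `prepare_output_directory` is hypothetical, and the path parts are only the example values from the text; use whatever structure your own GetSource methods define.

```python
from pathlib import Path


def prepare_output_directory(root: Path, *parts: str) -> Path:
    """Create <root>/output/<parts...> if needed and return that directory.

    `root` is assumed to be the Lean.DataSource.<vendorNameDatasetName> folder.
    """
    destination = root.joinpath("output", *parts)
    # parents=True builds every missing level; exist_ok=True makes reruns safe.
    destination.mkdir(parents=True, exist_ok=True)
    return destination
```

For the example path above, a call might look like `prepare_output_directory(Path.cwd(), "alternative", "xyzairline", "ticketsales")`.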
Follow these steps to set up the downloading and processing script for your dataset:
- In the Lean.DataSource.<vendorNameDatasetName> / DataProcessing / process.sample.py file, add some code to time how long it takes to process the entire dataset and how long it takes to update the dataset with one day's worth of data.
- In the processing file, load the raw data from your source.
- If your dataset is for universe selection data and it's at a higher frequency than hour resolution, resample your data to hourly or daily resolution.
- If any of the following statements are true, skip the rest of the steps in this tutorial:
  - Your dataset is not related to Equities.
  - Your dataset is related to Equities and already includes the point-in-time tickers.
- If you don't have the US Equity Security Master dataset, contact us.
- In the Lean.DataSource.<vendorNameDatasetName> / DataProcessing / Program.cs file, remove the statements of the Main method.
- In a terminal, compile the data processing project.
- In the Lean.DataSource.<vendorNameDatasetName> / DataProcessing / process.sample.py file, import the CLRImports library.
- Create and initialize a map file provider.
- Create a security identifier.
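The resampling step in the list above could look roughly like the following sketch, which uses only the standard library. It assumes the raw rows are `(datetime, value)` pairs and that keeping the last observation in each hour is the right aggregation. Both are assumptions: your dataset may need a sum, a mean, or a different timestamp convention. The function name `resample_to_hourly` is hypothetical.

```python
from datetime import datetime


def resample_to_hourly(rows):
    """Reduce (datetime, value) pairs to one pair per hour.

    Each output timestamp is the start of the hour, and the value is the
    last one observed in that hour (assumes rows are in time order).
    """
    buckets = {}
    for timestamp, value in rows:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start] = value  # later rows in the same hour overwrite earlier ones
    return sorted(buckets.items())
```

Truncating with `replace(minute=0, second=0, microsecond=0)` gives hourly buckets; adding `hour=0` to that call would give daily buckets instead.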
You can fetch data from any of the following sources:
Source | Considerations |
---|---|
Local Files | It can help to first copy the data into place. |
Remote API | Stay within the rate limits. You can use the rate gate class. |
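To illustrate staying within a rate limit when pulling from a remote API, here is a small sliding-window limiter written in plain Python. It is a stand-in sketch, not the rate gate class the table refers to, and the class name `SimpleRateLimiter` and its parameters are hypothetical.

```python
import time


class SimpleRateLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._calls = []       # timestamps of calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have aged out of the window.
            self._calls = [t for t in self._calls if now - t < self.period]
            if len(self._calls) < self.max_calls:
                break
            # Sleep until the oldest remaining call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
        self._calls.append(now)
```

Calling `limiter.wait()` immediately before each request keeps the request rate at or below the configured limit.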
You should load and process the data period by period. Use the date range passed to the script to process only those dates.
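A sketch of period-by-period processing, combined with the timing the first step asks for, might look like this. How the date range reaches your script is not specified here, so `start` and `end` are simply function arguments, and `process_day` is a hypothetical callback standing in for your own per-day logic.

```python
import time
from datetime import date, timedelta


def iter_dates(start, end):
    """Yield each calendar date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def process_range(start, end, process_day):
    """Call process_day(day) for every date and return seconds spent per date."""
    timings = {}
    for day in iter_dates(start, end):
        started = time.perf_counter()
        process_day(day)
        timings[day] = time.perf_counter() - started
    return timings
```

Summing the returned values gives the time for the whole range, and any single entry approximates the cost of updating the dataset with one day's worth of data.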
If your dataset is related to Equities and your dataset doesn't account for ticker changes, the rest of the steps help you to adjust the tickers over the historical data so they are point-in-time.
$ dotnet build .\DataProcessing\DataProcessing.csproj
This step generates a file that the CLRImports library uses.
from CLRImports import *
map_file_provider = LocalZipMapFileProvider()
map_file_provider.Initialize(DefaultDataProvider())
sid = SecurityIdentifier.GenerateEquity(point_in_time_ticker, Market.USA, True, map_file_provider, csv_date)
After you finish editing the process.sample.py script, run it to populate the Lean.DataSource.<vendorNameDatasetName> / output directory.
$ python process.sample.py