# Datasets

## Defining Data Models

### Introduction

This page explains how to set up the data source SDK and use it to create data models.

### Part 1/ Set up SDK

1. Open the Lean.DataSource.SDK repository and click .

    Start with the SDK repository instead of existing data source implementations because we periodically update the SDK repository.

2. On the Create a new repository from Lean.DataSource.SDK page, set the repository name to Lean.DataSource.<vendorNameDatasetName> (for example, Lean.DataSource.XYZAirlineTicketSales).

    If your dataset contains multiple series, use <vendorName> instead of <vendorNameDatasetName>. For instance, the Federal Reserve Economic Data (FRED) dataset repository has the name Lean.DataSource.FRED because it has many different series.

3. Click .
4. Clone the Lean.DataSource.<vendorNameDatasetName> repository.

    ```
    $ git clone https://github.com/username/Lean.DataSource.<vendorNameDatasetName>.git
    ```

5. If you're on a Linux terminal, in your Lean.DataSource.<vendorNameDatasetName> directory, change the access permissions of the bash script.

    ```
    $ chmod +x ./renameDataset.sh
    ```

6. In your Lean.DataSource.<vendorNameDatasetName> directory, run the renameDataset.sh bash script.

### Part 3/ Create Universe Models

If your dataset doesn't provide universe data, follow these steps:

1. Delete the Lean.DataSource.<vendorNameDatasetName> / <vendorNameDatasetName>Universe.cs file.
2. Delete the Lean.DataSource.<vendorNameDatasetName> / <vendorNameDatasetName>UniverseSelectionAlgorithm.* files.
3. In the Lean.DataSource.<vendorNameDatasetName> / tests / Tests.csproj file, delete the code on line 8 that compiles the universe selection algorithms.

If your dataset provides universe data, the input to your model should be a CSV file in which the first column is the security identifier and the second column is the point-in-time ticker.

```
A R735QTJ8XC9X,A,17.19,109700,1885743,False,0.9904858,1
AA R735QTJ8XC9X,AA,71.25,513400,36579750,False,0.3992678,0.750075
AAB R735QTJ8XC9X,AAB,16.38,5000,81900,False,0.9902758,1
...
ZSEV R735QTJ8XC9X,ZSEV,10.5,800,8400,False,0.8981684,1
ZTR R735QTJ8XC9X,ZTR,9.56,102300,977988,False,0.0803037,3.97015016
ZVX R735QTJ8XC9X,ZVX,10,15600,156000,False,1,0.666667
```
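To make the column layout concrete, here is a minimal sketch in plain C# that splits one line of the sample file into the security identifier, the point-in-time ticker, and the remaining dataset columns. The `UniverseCsvSketch` class and the `UniverseRow` record are hypothetical names used only for illustration; they are not part of the SDK, and a real Reader method would build a Symbol and your universe class instead.

```csharp
using System;
using System.Linq;

public static class UniverseCsvSketch
{
    // Hypothetical container for one parsed line; not an SDK type.
    public sealed record UniverseRow(string SecurityId, string Ticker, string[] Properties);

    // Splits one CSV line: column 0 is the security identifier,
    // column 1 is the point-in-time ticker, the rest are dataset properties.
    public static UniverseRow ParseLine(string line)
    {
        var csv = line.Split(',');
        if (csv.Length < 2)
        {
            throw new FormatException($"Expected at least 2 columns, found {csv.Length}.");
        }
        return new UniverseRow(csv[0], csv[1], csv.Skip(2).ToArray());
    }

    public static void Main()
    {
        var row = ParseLine("AA R735QTJ8XC9X,AA,71.25,513400,36579750,False,0.3992678,0.750075");
        // Prints "AA R735QTJ8XC9X | AA | 6 properties"
        Console.WriteLine($"{row.SecurityId} | {row.Ticker} | {row.Properties.Length} properties");
    }
}
```

The remaining columns are kept as strings here because the sample mixes numeric and Boolean values; converting each one to its property type is left to your own model.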


Follow these steps to define the data source class:

1. Open the Lean.DataSource.<vendorNameDatasetName> / <vendorNameDatasetName>Universe.cs file.
2. Define the properties of your dataset.
    1. Duplicate lines 33-36 or 38-41 (depending on the data type) for as many properties as there are in your dataset.
    2. Rename the SomeCustomProperty/SomeNumericProperty properties to the names of your dataset properties (for example, Destination/FlightPassengerCount).
    3. Replace the “Some custom data property” comments with a description of each property in your dataset.
3. Define the GetSource method to point to the path of your dataset file(s).

    Use the date parameter as the file name so that Lean reads the file for the date being requested. An example output file path is / output / alternative / xyzairline / ticketsales / universe / 20200320.csv.

4. Define the Reader method to return instances of your universe class.

    The first column in your data file must be the security identifier and the second column must be the point-in-time ticker. With this configuration, use new Symbol(SecurityIdentifier.Parse(csv[0]), csv[1]) to create the security Symbol.

    The date in your data file must be the date that the data point is available for consumption. With this configuration, set the Time to date - Period.

5. Define the DataTimeZone method.

    ```csharp
    public class VendorNameDatasetNameUniverse : BaseData
    {
        public override DateTimeZone DataTimeZone()
        {
            return DateTimeZone.Utc;
        }
    }
    ```

    If you import the QuantConnect namespace (using QuantConnect), the TimeZones class provides helper members to create DateTimeZone objects. For example, you can use TimeZones.Utc or TimeZones.NewYork. For more information about time zones, see Time Zones.

6. Define the SupportedResolutions method.

    ```csharp
    public class VendorNameDatasetNameUniverse : BaseData
    {
        public override List<Resolution> SupportedResolutions()
        {
            return DailyResolution;
        }
    }
    ```

    Universe data must have hour or daily resolution.

    The Resolution enumeration has the following members: Tick, Second, Minute, Hour, and Daily.

7. Define the DefaultResolution method.

    If a member doesn't specify a resolution when they subscribe to your dataset, Lean uses the DefaultResolution.

    ```csharp
    public class VendorNameDatasetNameUniverse : BaseData
    {
        public override Resolution DefaultResolution()
        {
            return Resolution.Daily;
        }
    }
    ```

8. Define the IsSparseData method.

    If your dataset is not tick resolution and is missing data for at least one sample, it's sparse. If your dataset is sparse, we disable logging for missing files.

    ```csharp
    public class VendorNameDatasetNameUniverse : BaseData
    {
        public override bool IsSparseData()
        {
            return true;
        }
    }
    ```

9. Define the RequiresMapping method.

    ```csharp
    public class VendorNameDatasetNameUniverse : BaseData
    {
        public override bool RequiresMapping()
        {
            return true;
        }
    }
    ```

10. Define the Clone method.

    ```csharp
    public class VendorNameDatasetNameUniverse : BaseData
    {
        public override BaseData Clone()
        {
            // Copy every property so the clone is independent of the original.
            return new VendorNameDatasetNameUniverse
            {
                Symbol = Symbol,
                Time = Time,
                EndTime = EndTime,
                SomeCustomProperty = SomeCustomProperty,
            };
        }
    }
    ```

11. Define the ToString method.

    ```csharp
    public class VendorNameDatasetNameUniverse : BaseData
    {
        public override string ToString()
        {
            return $"{Symbol} - {SomeCustomProperty}";
        }
    }
    ```
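
The file-naming rule from the GetSource step and the Time = date - Period rule from the Reader step can be sketched in plain C# without any Lean types. This is an illustration under stated assumptions, not the SDK's implementation: the `UniverseSourceSketch` class, its method names, and the `datasetDirectory` argument are hypothetical, and the one-day period is only an example value.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class UniverseSourceSketch
{
    // Builds <datasetDirectory>/universe/<yyyyMMdd>.csv for the requested date.
    // datasetDirectory is a hypothetical argument standing in for your dataset's folder.
    public static string BuildUniversePath(string datasetDirectory, DateTime date)
    {
        var fileName = date.ToString("yyyyMMdd", CultureInfo.InvariantCulture) + ".csv";
        return Path.Combine(datasetDirectory, "universe", fileName);
    }

    // The date in the file is when the data point becomes available,
    // so the start time of the data point is date - period.
    public static DateTime StartTime(DateTime date, TimeSpan period)
    {
        return date - period;
    }

    public static void Main()
    {
        var date = new DateTime(2020, 3, 20);
        var path = BuildUniversePath(Path.Combine("alternative", "xyzairline", "ticketsales"), date);

        // Prints "20200320.csv"
        Console.WriteLine(Path.GetFileName(path));

        // Prints "2020-03-19" for a one-day period
        var start = StartTime(date, TimeSpan.FromDays(1));
        Console.WriteLine(start.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```

Formatting the date with the invariant culture keeps the file name stable regardless of the machine's regional settings.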

