Hello team!
I am using LEAN-CLI to run locally with data downloaded from Yahoo. Everything works as expected when my custom data class is using SubscriptionTransportMedium.LocalFile, reading the files from the /data directory.
I am trying to use a SQL database instead of reading local files for each symbol. I have my own API endpoint at localhost:5000, which works fine from inside the CLI container via host.docker.internal:5000 (I can see requests coming through). However, no data is being loaded into the security, and I am not sure what I am missing to make it work. Please see the algorithm code at the end below.
I've tested the endpoint with Postman, curl, and a browser, and I see the correct result.
For example:
http://localhost:5000/api/ticker/SPY?date=2014-12-22
will return:
{"date":"2014-12-22","open":"206.75","high":"207.47000122070312","low":"206.4600067138672","close":"207.47000122070312","adjclose":"181.56944274902344","volume":"148318900","ticker":"SPY"}
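For reference, the parsing that a Reader would have to do on this payload can be sanity-checked in plain Python before wiring it into LEAN (a standalone sketch; parse_bar is just an illustrative helper, not a LEAN API — note the endpoint returns the numeric fields as strings):

```python
import json

def parse_bar(line):
    """Parse one JSON line from the endpoint into a float OHLCV dict, or None."""
    data = json.loads(line)
    if not data or "error" in data:
        return None
    # The endpoint returns numeric fields as JSON strings, so convert explicitly
    return {
        "date": data["date"],
        "open": float(data["open"]),
        "close": float(data["close"]),
        "volume": float(data["volume"]),
    }

line = ('{"date":"2014-12-22","open":"206.75","high":"207.47000122070312",'
        '"low":"206.4600067138672","close":"207.47000122070312",'
        '"adjclose":"181.56944274902344","volume":"148318900","ticker":"SPY"}')
bar = parse_bar(line)
print(bar["close"])  # 207.47000122070312
```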
In the backtest log, I can see that the self.Debug in OnData finds SPY from the local file (the close 205.539993286133 debug line), but there is no data from the API endpoint, and in my LoadHistory function the history call comes back empty:
2022-05-26T05:18:07.5664364Z ERROR:: Runtime Error: 'SPY_DB.DBData' wasn't found in the Slice object, likely because there was no-data at this moment in time and it wasn't possible to fillforward historical data. Please check the data exists before accessing it with data.ContainsKey("SPY_DB.DBData") in Slice.cs:line 315
'SPY_DB.DBData' wasn't found in the Slice object, likely because there was no-data at this moment in time and it wasn't possible to fillforward historical data. Please check the data exists before accessing it with data.ContainsKey("SPY_DB.DBData") in Slice.cs:line 315
2022-05-26T05:18:07.5667108Z TRACE:: Debug: Algorithm Id:(1838390587) completed in 1.32 seconds at 0k data points per second. Processing total of 13 data points.
2022-05-26T05:18:07.5671696Z TRACE:: Debug: SPY
SPY_DB
History is empty True 0
[OnSecChanged] Add: [SPY]
[OnSecChanged] Remove: []
close 205.539993286133
Algorithm code:
(including the two custom data classes DBData and YahooData)
# QUANTCONNECT.COM - Democratizing Finance, Empowering Individuals.
# Lean Algorithmic Trading Engine v2.0. Copyright 2014 QuantConnect Corporation.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from AlgorithmImports import *
import os
import json
### <summary>
### Basic template algorithm simply initializes the date range and cash. This is a skeleton
### framework you can use for designing an algorithm.
### </summary>
### <meta name="tag" content="using data" />
### <meta name="tag" content="using quantconnect" />
### <meta name="tag" content="trading and orders" />
class DatabaseAlgorithm(QCAlgorithm):
    '''Basic template algorithm simply initializes the date range and cash'''

    def Initialize(self):
        '''Initialise the data and resolution required, as well as the cash and start-end dates for your algorithm. All algorithms must be initialized.'''
        self.SetStartDate(2015, 1, 1)   # Set Start Date
        self.SetEndDate(2015, 1, 30)    # Set End Date
        self.SetCash(100000)            # Set Strategy Cash
        self.activeStocks = {}
        self.maxPeriod = 5  # max history lookback
        self.days = 5
        self.atrPeriod = 3
        security_db = self.AddData(DBData, "SPY_DB", Resolution.Daily)
        security = self.AddData(YahooData, "SPY", Resolution.Daily)
        self.LoadHistory(security)
        self.LoadHistory(security_db)
        self.AddUniverse("DB-Universe", self.selector)

    def selector(self, date):
        return ["SPY"]
    def OnData(self, data):
        '''OnData event is the primary entry point for your algorithm. Each new data point will be pumped in here.
        Arguments:
            data: Slice object keyed by symbol containing the stock data
        '''
        self.Debug("close " + str(data["SPY"].Close))
        self.Debug("close DB" + str(data["SPY_DB"].Close))
        # for key, sd in self.activeStocks.items():
        #     self.Debug(f"{key} - ATR:{sd.atr.Current.Value} avgVOL:{sd.avgVol.Current.Value} daysHigh:{sd.daysHigh.Current.Value}")

    def OnSecuritiesChanged(self, changes):
        self.changes = changes
        msg = '[OnSecChanged] Add: ' + str([security.Symbol.Value for security in self.changes.AddedSecurities]).replace("\'", "") + '\n'
        msg += '[OnSecChanged] Remove: ' + str([security.Symbol.Value for security in self.changes.RemovedSecurities]).replace("\'", "")
        self.Debug(msg)
    def LoadHistory(self, security):
        sd = SymbolData(security)
        self.Debug(security.Symbol)
        sd.daysHigh = self.MAX(security.Symbol, self.days, Resolution.Daily, Field.Close)
        sd.atr = self.ATR(security.Symbol, self.atrPeriod)
        sd.avgVol = self.SMA(security.Symbol, 30, Resolution.Daily, Field.Volume)
        # warm up our indicators by pushing history through them
        # rolling window must use warmup !IMPORTANT
        history = self.History(security.Symbol, self.maxPeriod, Resolution.Daily)
        if history.empty:
            self.Debug(f"History is empty {history.empty} {history.size}")
            return
        for index, row in history.loc[sd.Symbol].iterrows():
            if index == self.Time:
                continue
            sd.daysHigh.Update(index, row['close'])
            sd.avgVol.Update(index, row['volume'])
            bar = TradeBar(index, sd.Symbol, row['open'], row['high'], row['low'], row['close'], row['volume'])
            sd.atr.Update(bar)
        # add sd to the active dict with all history
        self.activeStocks[sd.Symbol.Value] = sd
class SymbolData:
    def __init__(self, security):
        self.Security = security
        self.Symbol = security.Symbol
        self.daysHigh = None
        self.atr = None
        self.avgVol = None
from datetime import datetime, timedelta

class DBData(PythonData):
    def GetSource(self, config, date, isLiveMode):
        ticker = config.Symbol.Value
        if "_" in ticker:
            ticker = ticker.split("_")[0]
        source = f"http://host.docker.internal:5000/api/ticker/{ticker}?date={date.strftime('%Y-%m-%d')}"
        return SubscriptionDataSource(source, SubscriptionTransportMedium.Rest)

    """
    Example line from DB:
    [
        {
            "date": "1999-12-29",
            "open": 45.06437683105469,
            "high": 56.55400466918945,
            "low": 45.01967239379883,
            "close": 51.50214767456055,
            "adjclose": 44.22134017944336,
            "volume": 10518552,
            "ticker": "A"
        }
    ]
    """
    def Reader(self, config, line, date, isLiveMode):
        equity = DBData()
        equity.Symbol = config.Symbol
        data = json.loads(line)
        # If the response reports an error or is empty, return None
        if "error" in data or not data:
            return None
        equity.Time = datetime.strptime(data["date"], "%Y-%m-%d")
        equity.EndTime = equity.Time + timedelta(days=1)
        # Cast explicitly: the endpoint returns numeric fields as JSON strings
        equity.Value = float(data['close'])
        equity["Open"] = float(data['open'])
        equity["High"] = float(data['high'])
        equity["Low"] = float(data['low'])
        equity["Close"] = float(data['close'])
        equity["AdjClose"] = float(data['adjclose'])
        equity["Volume"] = float(data['volume'])
        return equity
from pathlib import Path

class YahooData(PythonData):
    def GetSource(self, config, date, isLiveMode):
        # The name of the asset is the symbol in lowercase .csv (ex. spy.csv)
        fname = config.Symbol.Value.lower() + '.csv'
        # The source folder depends on the directory initialized in lean-cli
        # https://www.quantconnect.com/docs/v2/lean-cli/tutorials/local-data/importing-custom-data
        source = Path(Globals.DataFolder) / 'yahoo' / fname
        # The subscription method is LocalFile in this case
        return SubscriptionDataSource(source.as_posix(), SubscriptionTransportMedium.LocalFile)

    def Reader(self, config, line, date, isLiveMode):
        equity = YahooData()
        equity.Symbol = config.Symbol
        # Parse the line from the Yahoo CSV
        try:
            data = line.split(',')
            # If the close value is zero, return None
            value = float(data[4])
            if value == 0:
                return None
            equity.Time = datetime.strptime(data[0], "%Y-%m-%d")
            equity.EndTime = equity.Time + timedelta(days=1)
            equity.Value = value
            equity["Open"] = float(data[1])
            equity["High"] = float(data[2])
            equity["Low"] = float(data[3])
            equity["Close"] = float(data[4])
            equity["AdjClose"] = float(data[5])
            equity["Volume"] = float(data[6])
            return equity
        except ValueError:
            # Skip the line; possible error in CSV decoding (e.g. the header row)
            return None
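One detail worth noting in DBData above: the docstring example shows the endpoint returning a JSON array ([{...}]), while the sample response earlier in the post is a single object. If the endpoint can return either shape, the Reader would need to unwrap the array first, otherwise data["date"] fails on a list. A standalone sketch of that unwrapping (unwrap is a hypothetical helper, not part of LEAN):

```python
import json

def unwrap(line):
    """Accept either a single JSON object or a one-element array of objects."""
    data = json.loads(line)
    if isinstance(data, list):
        return data[0] if data else None
    return data or None

print(unwrap('[{"close": "51.5"}]')["close"])   # 51.5
print(unwrap('{"close": "207.47"}')["close"])   # 207.47
print(unwrap("[]"))                             # None
```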
Any help is very much appreciated!
Ido
Fred Painchaud
Hi Ido,
Can you confirm that line 152 (in DBData) is not triggered in your code?
Fred
Ido Elmaliah
Hey Fred,
Thanks for looking at this :)
Following your suggestion, I can confirm that line 152 is triggered, but not because of "error" in data; rather because of not data. It seems that Lean is requesting dates from my endpoint that do not exist in my SQL database. My endpoint is designed to return an empty JSON object when the date does not exist in the SQL table.
Looking at the file reader class, it seems to read the CSV line by line rather than by date, so it will never actually request a missing date.
Here is my endpoint:
And here is the error in Lean (I've tried returning {"test":"test"} instead, and then the error is for {"test":"test"}):
Should I not return None from the Reader, or not return an empty JSON from the endpoint, when the requested date cannot be found?
I have been following mostly this example - https://github.com/QuantConnect/Lean/blob/master/Algorithm.Python/CustomDataBitcoinAlgorithm.py#L89
Thanks!
Ido
Fred Painchaud
Hi Ido,
Returning None from Reader signals to LEAN that there was no data point for the asked date and time. That's why I asked the question since your issue was that you were not receiving data.
So when you execute the run which gets you no data, is None always returned from Reader?
LEAN simply asks your data source for each data point according to the registered resolution (hour, minute…). It first asks GetSource to return the source for the date/time and then reads a line from that source and gives it to Reader for parsing into an object.
If the dates and times LEAN asks for are not found by your Reader method and it returns None, then no data at all will be gathered by LEAN into your algo.
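That request loop can be sketched conceptually in plain Python (a simplified model of the behaviour described above, not LEAN's actual implementation; get_source, reader, and fetch are stand-ins):

```python
import json
from datetime import date, timedelta

def get_source(ticker, day):
    # Mirrors DBData.GetSource: one URL per requested date
    return f"http://host.docker.internal:5000/api/ticker/{ticker}?date={day.isoformat()}"

def reader(line):
    # Mirrors DBData.Reader: None means "no data point for this date"
    data = json.loads(line)
    if not data or "error" in data:
        return None
    return data

def run(fetch, start, end):
    """Ask the source for each calendar day; collect only non-None points."""
    points, day = [], start
    while day <= end:
        line = fetch(get_source("SPY", day))
        point = reader(line)
        if point is not None:
            points.append(point)
        day += timedelta(days=1)
    return points

# Fake fetch: the "DB" only has Jan 2; every other day returns an empty object,
# so reader() returns None for it and the point is simply dropped
fake_db = {"2015-01-02": '{"date":"2015-01-02","close":"205.43"}'}
fetch = lambda url: fake_db.get(url.rsplit("=", 1)[-1], "{}")
print(len(run(fetch, date(2015, 1, 1), date(2015, 1, 5))))  # 1
```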
So your algo is set to execute between Jan 1, 2015 and Jan 30, 2015.
Does your DB have data for Jan 2015?
You are also using Daily resolution so you'll need daily resolution in your DB too.
Fred
Ido Elmaliah
Hey Fred,
Thanks for the clarification.
Actually yes, I do have data for Jan 2015, in fact I have data from 2000 to 2022. Furthermore, the SQL DB is composed of the same CSV files I use when running with local files.
In the Lean log file, I can see errors only for dates that are not found in the DB table, however, they cannot be found in the local CSV file as well (again they are the same).
Please see below Jan 2015 from the CSV file, which is also the data in the DB:
And here are some errors from Lean from using rest api, the endpoint will return {"test": requested_date } when the requested date is not found in the SQL Select query:
Please let me know if you have any insights.
Thanks!
Ido
Fred Painchaud
Hi Ido,
I'm not sure about your exact format since I do not have your DB in JSON form. But from what you wrote, I understand that when a date is NOT in the DB, your endpoint returns the JSON line {"test": <the date string>}.
If that's accurate, then your code does not properly parse that output:
It looks like LEAN silently catches those exceptions (well - it logs them but catches the exceptions) while reading data and moves to the next line - BUT you still don't have a data point for the asked date/time.
You would need:
Then, the logs you see "ERROR:: Error invoking SPY.DBData data reader. Line: {"test":"2015-01-01"} Error: ('date',)" would most probably go away.
But you would still NOT have your data points.
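One defensive approach, sketched here outside LEAN (assuming the endpoint returns {"test": "<date>"} for missing dates, as described above), is to treat any payload missing the OHLCV keys as "no data" and return None, instead of letting a KeyError escape and be logged by the engine:

```python
import json

REQUIRED = ("date", "open", "high", "low", "close", "volume")

def safe_parse(line):
    """Return an OHLCV dict, or None for error/placeholder/malformed payloads."""
    try:
        data = json.loads(line)
    except ValueError:
        return None
    if not isinstance(data, dict) or not all(k in data for k in REQUIRED):
        return None  # covers {}, {"test": "..."} and {"error": ...} payloads
    return {k: (data[k] if k == "date" else float(data[k])) for k in REQUIRED}

print(safe_parse('{"test":"2015-01-01"}'))  # None
```

Inside a PythonData.Reader the same guard would return None early, which is exactly how LEAN expects "no data point for this date/time" to be signalled.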
It is difficult to help without having the DB, but you can debug on your side as well. The first thing I would do is log the line received in Reader. You want to know whether, when the date IS in the DB, the DB actually returns it. So when LEAN asks for, say, Jan 10 2015, what is the DB answering? You first want to understand whether the problem is the DB not returning the right answers, or your Reader code not parsing the answer properly. But from what you said about the format of your data in the JSON DB, your Reader code seems OK.
I hope I am not too lost but it is relatively simple overall:
Since that does not seem to work, something is broken in the above.
It can either be:
Fred
Ido Elmaliah